Introduction

Huntington disease (HD) is a hereditary neurodegenerative disorder caused by a CAG triplet repeat expansion in the gene huntingtin; HD affects approximately 1 in 10,000 individuals in populations of European descent [1–4]. Since HD is a dominantly inherited disease, a person whose parent carries the HD gene mutation has a 50 % chance of inheriting it at the time of conception. About 150,000 individuals in the USA are “at risk” for HD. The age of onset of HD is inversely related to the length of the CAG repeat; for the most common expansion lengths, motor signs and clinical diagnosis of HD (based on characteristic motor symptoms) typically occur between ages 30 and 50. Motor, cognitive, and psychiatric abnormalities may emerge gradually, more than a decade before diagnosis (prodromal HD), and worsen progressively [5]. Although the rate of clinical progression differs for each person, HD is generally fatal within 15–20 years of clinical diagnosis [6]. The fact that this progressive, fatal disease typically strikes individuals during the prime of their lives underscores the need for interventions that slow the disease progression and maximize health-related quality of life (HRQOL).

HRQOL is a multidimensional construct defined as the impact that a disease or disability has on different aspects of well-being [7]. This follows the World Health Organization (WHO) framework for HRQOL which includes physical, social, and emotional well-being [8]. HRQOL differs from general quality of life (QOL), which is a poorly defined concept that lacks a consensus definition and that may or may not be synonymous with HRQOL [9, 10]. Current measures of HRQOL are insufficient to capture the broad extent of functional and symptom distress in HD and are also insensitive to potential intervention effects in HD. Most HRQOL measures used in HD were developed for other clinical populations and are inadequate because of important differences in symptoms across the neurodegenerative diseases. For example, although Parkinson’s disease (PD) and HD are both basal ganglia disorders characterized by motor abnormalities, these motor manifestations present differently; PD is typically characterized by tremor and bradykinesia, whereas HD is typically choreic (involuntary “dance-like” movements) and hyperkinetic [11]. Therefore, a measure of motor functioning developed for PD may not be meaningful for HD. Similarly, although cognitive dysfunction in HD overlaps with cognitive dysfunction in Alzheimer disease (AD), individuals with HD typically have “subcortical” deficits (in attention, processing speed, and executive function), whereas individuals with AD also have prominent “cortical” deficits (in memory, language, and executive function) [12]. Furthermore, generic measures of HRQOL cannot detect subtle differences in function for prodromal HD symptoms [13], and single-item ratings [14, 15] have inadequate sensitivity and reliability to detect change over time [15].
In addition, the only existing HD-specific measure of HRQOL, the HDQoL [16], has evidence to support reliability and construct validity, but did not meet minimally established sample size criteria for its developmental approach and takes ~22 min to complete [17]. Thus, there is a critical need for a well-developed, validated, brief HD-specific patient-reported outcome (PRO) measure of HRQOL. This is especially important given that the focus of clinical interventions for HD is not just to prolong life, but also to prolong quality living.

Recently, there has been an investment in the development of new, state-of-the-art systems to better assess HRQOL for a variety of chronic health conditions. Specifically, the Quality of Life for Neurological Disorders (Neuro-QoL) measurement system [18, 19] and Patient-Reported Outcomes Measurement Information System (PROMIS) [20] were designed to create and disseminate reliable and valid standardized PROs that measure key symptoms and health concepts. PROMIS was developed for use in individuals with chronic conditions, and Neuro-QoL extended this work to neurological disorders (stroke, PD, multiple sclerosis, child and adult epilepsy, amyotrophic lateral sclerosis, and muscular dystrophy). Neuro-QoL and PROMIS offer several advantages over more traditional measures. First, these systems allow for cross-disease comparison. Second, PROMIS and Neuro-QoL contain many identical items that allow linkage between measures such that a score on one PROMIS measure can be used to estimate a score on a Neuro-QoL measure. Third, these PRO systems utilize computerized adaptive test (CAT) technology, a method whereby each individually administered item is selected based on the previous item response. CATs allow for the sensitive measurement of a broad range of symptomatology with the administration of a small subset of items (between 5 and 12 items) without losing the precision of a longer measure. This reduces response burden, which is particularly important in HD where motor, psychiatric, and cognitive symptoms may impair the ability to respond to long questionnaires. The exact subset of items administered in a CAT depends upon item response theory (IRT) calibrations [21]. A calibrated item bank is a set of carefully crafted questions that develop, define, and quantify a common theme [22, 23]. The items can be arranged along a scale, e.g., from no symptoms to extreme symptoms.
The dynamic nature of CAT allows for greater sensitivity across the disease spectrum than most traditional, static measures while still retaining the integrity of the full measure. This is especially relevant in HD, where many measures exhibit floor effects during the prodromal phase of the disease and ceiling effects for the later stages of the disease [24]. CAT also provides better precision and lower standard error than static measures, even when the number of items administered for each is identical [25]. This is true even when short forms target a specific end of a symptom trait (such as low-end or high-end fatigue) [25].
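The adaptive selection described above can be illustrated with a minimal sketch. It is not the PROMIS/Neuro-QoL or Assessment Center algorithm: for brevity it assumes a two-parameter logistic model for yes/no items (the banks in this study are polytomous), a made-up item bank, maximum-information item selection, and an expected a posteriori (EAP) trait estimate. The names `ITEM_BANK`, `run_cat`, `max_items`, and `se_target` are hypothetical.

```python
import math

# Hypothetical 2PL item bank: (discrimination a, difficulty b). Values are invented.
ITEM_BANK = [(1.2, -2.0), (1.5, -1.0), (1.8, 0.0), (1.4, 1.0), (1.1, 2.0), (2.0, 0.5)]
GRID = [g / 10 for g in range(-40, 41)]  # candidate trait levels from -4 to 4


def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses):
    """Posterior mean and SD of theta under a standard normal prior.

    responses is a list of (item_index, response) pairs, response in {0, 1}.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for idx, u in responses:
            p = prob_endorse(theta, *ITEM_BANK[idx])
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, max_items=4, se_target=0.3):
    """Administer items adaptively; answer(item_index) must return 0 or 1.

    Stops when max_items have been given, the bank is exhausted, or the
    posterior SD falls to se_target or below.
    """
    responses = []
    theta, se = eap_estimate(responses)
    limit = min(max_items, len(ITEM_BANK))
    while len(responses) < limit and se > se_target:
        used = {idx for idx, _ in responses}
        remaining = [i for i in range(len(ITEM_BANK)) if i not in used]
        # Choose the unused item that is most informative at the current estimate.
        nxt = max(remaining, key=lambda i: information(theta, *ITEM_BANK[i]))
        responses.append((nxt, answer(nxt)))
        theta, se = eap_estimate(responses)
    return theta, se, [idx for idx, _ in responses]
```

In this sketch the next item depends on the trait estimate updated from all responses so far, which is one common way of implementing selection "based on the previous item response"; the stopping rule combines a maximum test length with a target standard error.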

The purpose of the current study was to develop and validate a PRO measurement system that captures both the generic and more unique aspects of HRQOL in HD (“HDQLIFE”). Given the complexity of the multi-phase study to develop the HDQLIFE, this paper provides a broad introduction to the processes and aims of each phase of the study; further details on the methods and results of each phase are found in the companion articles [26–28]. Broadly, this study focused on validating existing measurement systems to capture generic, relevant aspects of HRQOL for individuals with HD (i.e., Neuro-QoL and PROMIS, described above), and developing additional content that would allow for disease-specific sensitivity utilizing a computer adaptive test framework (using PROMIS measurement development standards [29]). These results are complemented by the companion articles which include detailed presentations of exploratory and confirmatory factor analysis results, graded response model results, differential item functioning analysis results, as well as item-level calibration data and preliminary validation data generated using post hoc computer adaptive test simulations [26–28].

Methods

Literature reviews and a qualitative focus group study [30] were conducted to characterize HD-relevant HRQOL domains (including the identification of relevant Neuro-QoL/PROMIS measures) and develop items for domains that were not captured within these existing systems. Next, a quantitative study served to: validate the existing, relevant Neuro-QoL/PROMIS measures in individuals with HD, and create and validate new, HD-specific item banks (i.e., computer adaptive tests).

1. Item Development.

A qualitative focus group study and literature review were conducted to determine the domains, subdomains, and items that should be used to assess HRQOL in HD [30]. Focus groups were conducted with key HD stakeholders and included six groups with individuals at various stages of diagnosed, symptomatic HD, five groups with individuals either at risk for HD (i.e., have not been tested and were not diagnosed with HD yet but have a parent with HD) or with prodromal HD (i.e., have a positive gene test, but not diagnosed with manifest HD), three groups with non-clinical HD caregivers (e.g., family members), and two groups with HD clinicians (e.g., physicians, nurses). Participants discussed what the term “quality of life” meant to them, what they believed to be the most important aspects of HRQOL, and how HD affected their HRQOL. Focus group discussions were transcribed verbatim and analyzed according to a well-established frequency analysis approach [31]. Detailed qualitative findings have been described elsewhere [30]. Briefly, results showed that several PROMIS/Neuro-QoL measures were relevant in HD and that a number of HD-specific HRQOL issues were not captured by these PROs (see Fig. 1).

Fig. 1 Components of the HDQLIFE measurement system

The next step of the development of the HDQLIFE measurement system was to create preliminary item pools examining chorea, speech and swallowing difficulties, and end of life issues. Each item pool went through several different iterations based on expert review, cognitive debriefing interviews with individuals with HD, literacy review, and translatability review (to enable future translation into different languages). Expert review included insight from measurement development experts and professionals with clinical expertise in HD; experts provided feedback with regard to item overlap, appropriateness of the content to each subdomain, wording suggestions/changes, and content coverage (i.e., that all aspects of the specified domain were represented). Additional items were developed in cases where content coverage was deemed inadequate. All new items were also reviewed by at least 5 individuals with prodromal or symptomatic HD (i.e., cognitive debriefing) to ascertain comprehension, processes used to arrive at a particular response (retrieving relevant information from memory, response selection including motivation and social desirability), and overall relevance of an item to the content being measured [32]. All new items also underwent a literacy review using the Lexile framework [33] to ensure that the items were written no higher than a fifth-grade reading level. Thus, we maximized the accessibility of this measure to participants, regardless of their level of education or cognitive impairment. Finally, a translatability review was conducted to maximize the potential for this measure to be translated into other languages in the future. We focused on Spanish translation for this review. Forward and backwards Spanish translations were conducted by a Spanish-speaking translation scientist to identify potential concerns, such as items that contained wording or concepts that would be difficult to translate.

  • HDQLIFE Chorea Item Pool Literature review and focus group data were used to create an initial item pool of 141 chorea items; 75 items were deleted and 3 items were revised based on expert review, 0 items were deleted and 9 items were revised based on translation review, and 2 items were deleted and 5 items were revised based on cognitive interview feedback. The final chorea item pool was comprised of 64 items.

  • HDQLIFE Speech and Swallowing Item Pool Literature review and focus group data were used to create an initial item pool of 102 speech and swallowing items; 49 items were deleted and 12 items were revised based on expert review, 1 item was deleted and 3 items were revised based on translation review, and 5 items were deleted and 25 items were revised based on cognitive interview feedback. The final speech and swallowing item pool was comprised of 47 items.

  • HDQLIFE End of Life Concerns Item Pool Literature review and focus group data were used to create an initial item pool of 69 items related to end of life concerns; 21 items were deleted and 0 items were revised based on expert review, 0 items were deleted and 39 items were revised based on translation review, and 3 items were deleted and 13 items were revised based on cognitive interview feedback. The final end of life concerns item pool was comprised of 45 items.
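The item counts reported for the three pools can be checked with simple arithmetic. The short sketch below only tallies the figures stated above (initial items minus deletions at the expert, translation, and cognitive interview steps); the variable and function names are illustrative.

```python
# Counts copied from the three item-pool descriptions above.
# "deleted" lists deletions after expert review, translation review,
# and cognitive interviews, in that order. Revised items stay in the pool.
POOLS = {
    "Chorea": {"initial": 141, "deleted": [75, 0, 2]},
    "Speech and Swallowing": {"initial": 102, "deleted": [49, 1, 5]},
    "End of Life Concerns": {"initial": 69, "deleted": [21, 0, 3]},
}


def final_size(pool):
    """Items remaining after all review-based deletions."""
    return pool["initial"] - sum(pool["deleted"])


FINAL_SIZES = {name: final_size(pool) for name, pool in POOLS.items()}
TOTAL_FIELD_TESTED = sum(FINAL_SIZES.values())
```

The resulting pool sizes (64, 47, and 45 items) sum to the 156 items that were field tested.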

2. Quantitative Study.

Once the item pools were developed, all items were field tested in 536 individuals including those with prodromal HD and manifest HD to meet the standards established by PROMIS to develop new CATs [29].

Participants

Participants were 18 years old or older, able to read and understand English, had either a positive test for the HD gene mutation (CAG ≥ 36) without an HD clinical diagnosis (n = 205) or a clinical diagnosis of HD (n = 331), and had the ability to provide informed consent. The Total Functional Capacity (TFC) [34], as determined by clinician-rated administration, was used to classify participants with an HD diagnosis as either early-stage (sum scores of 7–13) or later-stage HD (sum scores of 0–6; described in more detail below). Participants were recruited at several locations in the USA to ensure a geographically diverse sample. This included eight established HD clinics (Los Angeles, CA; Iowa City, IA; Indianapolis, IN; Baltimore, MD; Ann Arbor, MI; Golden Valley, MN; St. Louis, MO; Piscataway, NJ), the National Research Roster for Huntington’s disease, online medical record data capture systems [35], and articles/advertisements in HD-specific newsletters and websites. Participants were also recruited in conjunction with other ongoing research studies, such as Predict-HD (San Francisco, CA; Iowa City, IA; Indianapolis, IN; Baltimore, MD; St. Louis, MO; Cleveland, OH) [36–38], as well as in cooperation with HD support groups and HD specialized nursing home units (Phoenix, AZ; Tucson, AZ; Denver, CO; Jacksonville, FL; Des Moines, IA; Louisville, KY; Lansing, MI; Robbinsdale, MN; Lakewood, NJ; Plainfield, NJ; New York City, NY; Dallas, TX; Seattle, WA). Participants received monetary compensation ($40) for participating in this study.

Measures

Participants were evaluated using the Unified Huntington’s Disease Rating Scale (UHDRS) [39], a standardized clinical rating scale that assesses four components of HD: motor function, cognition, behavior, and functional abilities. Although the UHDRS has several documented shortcomings [24, 40–44], it is the most frequently used assessment measure in HD clinical trials [45] and is included in the common data element recommendations provided by the National Institute of Neurological Disorders and Stroke [46]. The reliability and internal consistency of the four components of the UHDRS have been well studied [39]. We examined Total Functional Capacity (TFC), Total Motor Score (TMS), Independence Scale, and two measures of Cognition (total score for Symbol Digit Modalities Test [SDMT] [47] and Stroop Interference [48, 49]). The TFC is a 5-item measure that provides an index of day-to-day functioning across the domains of occupation, finances, domestic chores, activities of daily living, and care level. Total score ranges from 0 to 13 with higher scores indicating better functioning. The TMS provides a composite measure of oculomotor function, dysarthria, chorea, dystonia, gait, and postural stability; higher scores indicate more motor dysfunction. The Independence Scale is rated from 0 to 100, with higher scores indicating better functioning/greater levels of independence. Executive function measures included the SDMT (processing speed) and Stroop tests (interference); higher scores indicate better performance. Participants also were administered the Problem Behaviors Assessment Scale (PBA-s) [50], which is a clinician-administered assessment of behavior. For the purposes of this study, we examined clinician-rated Apathy, Irritability, Aggression, Anxiety, and Depression.

Participants completed the three newly developed HDQLIFE item pools (n = 64 chorea items, n = 47 speech/swallowing difficulties items, and n = 45 end of life concerns items). Participants also completed CATs for 12 PRO item banks from the Neuro-QoL and PROMIS identified as relevant to HD (Anxiety, Anger, Stigma, Emotional and Behavioral Dyscontrol, Positive Affect and Well-Being, Depression, Ability to Participate in Social Roles and Activities, Satisfaction with Social Roles and Activities, Lower Extremity Function/Mobility, Upper Extremity Function/ADLs, Applied Cognition Executive Functioning, and Applied Cognition General Concerns). Finally, participants completed two generic measures of HRQOL, the 12-Item World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) [51] and the Euro-Qol-5D (EQ5D) [52]. The WHODAS 2.0 is a 12-item standardized self-report measure of functioning and disability; higher scores indicate worse HRQOL. The EQ5D is a 5-item standardized measure of health status; higher scores indicate worse overall HRQOL.

Missing Data

Missing data rates were generally very low. The majority of our sample (99 %) had complete data for clinician-rated motor, functioning, and behavioral assessments (i.e., PBA-s and UHDRS Motor, Independence, and TFC measures); 99 % completed the EQ5D; 93–96 % completed the clinician-administered cognition measures (i.e., UHDRS SDMT and Stroop Interference); 93–95 % completed the HDQLIFE measures; 91–92 % completed the PROMIS/Neuro-QoL measures; and 89 % completed the WHODAS. Not surprisingly, rates of data loss were higher for the late-stage HD participants relative to both of the other HD groups for most of the measures (for prodromal vs. late-stage HD all χ² p < .05 except for HDQLIFE Speech Difficulties; for early vs. late-stage HD all χ² p < .05 except for the HDQLIFE measures and Neuro-QoL Positive Affect and Well-Being). IRT models (the primary method used in this paper) are designed to handle missing data; the less missing data, the more stable parameter estimation. In general, less than 50 % of missing data for IRT models is considered acceptable [53, 54].

Data Capture

Study participants generally completed all measures (clinician-administered and PROs) within a 2-week time frame. PROs were completed through Assessment Center (https://www.assessmentcenter.net), either at a designated computer during the research visit (for individuals with restricted access to a computer or the internet), or on a personal or publicly available computer with an Internet connection. Participants could opt to complete PROs independently, or with the assistance of local site staff or a family member; participants and caregivers were instructed that response selections should always be those of the participant. They were instructed that assistance should be limited to logging in to the online study, reading questions, and/or clicking response options, when appropriate; participants were provided the following written and verbal instructions, “IMPORTANT: It is okay if you ask a caregiver/friend/family member to help you complete this survey (use the mouse and keyboard or touchscreen), but we want to make sure that the answers reflect what you feel and believe. It is not okay for the caregiver/friend/family member to answer questions for you; each response should be based on what you believe and feel.” Upon survey completion, participants were also asked to indicate whether they received help completing the survey: 65 % indicated completing the assessments independently; 15 % indicated receiving assistance from a caregiver/family member/friend; 10 % indicated receiving assistance from study staff; these data were missing for 9 % of participants.
Participants indicating that they received assistance were also asked to indicate the type of assistance they received (participants could indicate more than one response): 89 participants indicated needing help using the computer/iPad (i.e., using mouse and/or keyboard or touchscreen); 67 participants indicated that their caregiver (family member, or friend) helped explain questions; 34 participants indicated that their caregiver (family member, or friend) answered questions by reminding them of important information; and 17 participants indicated that a caregiver (family member, or friend) helped by answering questions.

Psychometric Analysis Steps

Development of the new HRQOL item banks and CATs involved identifying unidimensional sets of items and conducting item response theory (IRT) [55] analyses to develop the calibration data needed to program the CAT. Each item pool (i.e., chorea, speech and swallowing difficulties, and end of life issues) was analyzed separately using factor analyses implemented in MPLUS (version 6.12) [56]; the sample was randomly divided into two separate datasets for these analyses. In the first dataset, exploratory factor analysis (EFA) was used to establish the number of unidimensional factors within each item pool as determined by: eigenvalues >1; scree plot review (i.e., number of factors before the break in the scree plot); and number of factors that explained >5 % of the variance. A promax rotation was then used to examine the association among factors by calculating their loadings (criterion > 0.4) and inter-factor correlations. Each unidimensional set of items (determined by EFA) was then subjected to confirmatory factor analysis (CFA) to assess model fit using the second randomly generated dataset. An iterative process that incorporated clinical input was used to finalize item exclusion/inclusion [57–59].
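Two of the steps above lend themselves to a brief illustration: the random split of the sample into an EFA half and a CFA half, and the two numeric factor-retention criteria (eigenvalues >1 and >5 % of variance explained). The sketch below is not the MPLUS analysis used in the study. It assumes the eigenvalues of an item correlation matrix are already available, and the function names and seed are hypothetical. The scree plot criterion is a visual judgment and is not coded here.

```python
import random


def split_sample(participant_ids, seed=0):
    """Randomly divide participants into two halves (e.g., EFA and CFA sets)."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]


def factors_to_retain(eigenvalues):
    """Count factors meeting each numeric retention criterion.

    eigenvalues: eigenvalues of the item correlation matrix, whose sum
    equals the number of items, so each eigenvalue divided by the sum is
    the proportion of variance explained by that factor.
    """
    total = sum(eigenvalues)
    kaiser = sum(1 for ev in eigenvalues if ev > 1.0)
    variance = sum(1 for ev in eigenvalues if ev / total > 0.05)
    return {"eigenvalue_gt_1": kaiser, "variance_gt_5pct": variance}
```

The two criteria need not agree on the number of factors, which is one reason several criteria and clinical input are typically considered together.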

Once unidimensional item sets were identified, an IRT graded response model (GRM) [60] was implemented in IRTPRO (version 2.1) [61]. To be retained, items had to demonstrate good psychometric properties. Items were also examined for differential item functioning (DIF) based on age, gender, and education; the LORDIF package within R (version 0.3–2) [62] was used to conduct these analyses [63]. DIF is an indication of unexpected behavior by an item on a test, such that an item performs differently for a subgroup of participants when it should not (e.g., men perform better than women). Items exhibiting DIF for age, gender, and/or education were excluded from the final item set.
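For readers unfamiliar with the graded response model named above, the following minimal sketch shows how category response probabilities are obtained for a single polytomous item in the standard logistic form of the model. It is not the IRTPRO implementation, and the parameter values used in any call are illustrative rather than estimates from this study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model probabilities for one item.

    theta: trait level of the respondent.
    a: item discrimination (slope).
    thresholds: ordered category thresholds b_1 < ... < b_{K-1}
        for an item with K response categories.
    Returns a list of K probabilities, one per response category.
    """
    # Cumulative probabilities P(X >= k), with P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

For a five-category item, four thresholds are supplied and the five returned probabilities sum to 1 at any trait level.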

Administration time for these new measures was recorded, and a univariate analysis was conducted to determine whether there were significant differences for the HD groups (prodromal vs. early-, vs. late-HD). An exploratory analysis examining Pearson correlations between CAG repeat number and the new HDQLIFE measures was also conducted.

Validation of PROMIS/Neuro-QoL

Pearson correlations between the PROMIS/Neuro-QoL measures and comparator measures were calculated to examine construct validity. Comparator measures included two generic self-report measures of HRQOL (WHODAS 2.0 [64] and EQ-5D [52]), as well as selected measures from two clinician-rated measures: the UHDRS (TMS, Independence Scale, Symbol Digit Modalities Test [47], and Stroop Interference) [39] and the PBA-s (Apathy, Irritability, Aggression, Anxiety, and Depression) [50]. To demonstrate adequate construct validity, correlations between the new measures and generic measures should be moderate to large (r = 0.5–0.8) and correlations with clinician measures should be small to moderate (r = 0.2–0.4) [65].
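The construct validity check described above can be sketched as a Pearson correlation followed by a comparison against the stated benchmark ranges. The function names are hypothetical, and comparing the absolute value of r is an assumption made here because the comparator scales are scored in different directions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Benchmark ranges taken from the text above.
BENCHMARKS = {"generic": (0.5, 0.8), "clinician": (0.2, 0.4)}


def meets_benchmark(r, comparator):
    """True if |r| falls in the expected range for the comparator type.

    comparator: "generic" (self-report HRQOL) or "clinician" (clinician-rated).
    Using |r| is an assumption of this sketch, not a rule stated in the source.
    """
    low, high = BENCHMARKS[comparator]
    return low <= abs(r) <= high
```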

Sample Size Considerations

The sample size was determined based on the requirements of IRT analysis, the primary method in the current effort. While sample sizes of 200–1000 have been proposed when using a graded response model (GRM), in which a larger sample size can produce more stable parameter estimation [60, 66], rules of thumb dictate that a minimum of 5–10 individuals are needed for every item within an item pool [67–69]. With an average of 50 items per item pool, 500 individuals (10 per item) were needed for reliable item response theory (IRT) calibration data. Additionally, differential item functioning (DIF) analyses (an indication of item bias) can be performed provided that there are at least 200 participants within each condition; sampling stratification considered age (≤40 vs. >40 and ≤50 vs. >50), gender (male vs. female), and education (≤high school vs. >high school) [70].

Results

Participant demographics

Five hundred thirty-six individuals with HD (prodromal and manifest) participated in this study (Table 1); 205 individuals had prodromal HD, 202 had early-stage HD, and 125 had late-stage HD (4 participants did not have enough information to designate a classification). There were no significant group differences for sex, χ²(2, N = 532) = 4.29, p = .12, but there were small differences across groups for education, F(2, 506) = 16.18, p < .0001, with early-HD and late-HD groups having 1 to 1.5 fewer years of education than the prodromal HD group. As expected, since HD symptoms progress with age, analysis of age of the groups showed significant differences among the three groups, F(2, 529) = 45.01, p < .0001. The prodromal group (M = 45.65, SD = 11.99) was significantly younger than both manifest groups, and the early-HD group (M = 51.42, SD = 12.80) was significantly younger than the late-HD group (M = 42.56, SD = 12.08).

Table 1 Demographic data for the HDQLIFE participants

New HDQLIFE CAT Development

Across the 3 item pools, 156 items were field tested. For the chorea item pool, EFA and CFA supported 34 unidimensional items; the final Chorea item bank is comprised of 34 items (detailed analyses can be found in [27]). For the speech and swallowing difficulties item pool, EFA and CFA supported two separate unidimensional sets of items: difficulties with speech (27 unidimensional items) and difficulties with swallowing (15 unidimensional items). The final Speech Difficulties item bank is comprised of 27 items (no items were deleted based on IRT; detailed analyses can be found in [28]), and the final Swallowing Difficulties item bank is comprised of 16 items (1 item was deleted based on IRT; detailed analyses can be found in [28]). Finally, for the end of life item pool, EFA and CFA supported two unidimensional item sets: Concern with Death and Dying (12 unidimensional items) and Meaning and Purpose (7 items). The final Concern with Death and Dying item bank is comprised of 12 items (no items deleted based on IRT; detailed analyses can be found in [26]). There were not enough items retained to develop a CAT for Meaning and Purpose; thus, 4 items comprised the final Meaning and Purpose short form (3 items were deleted based on IRT; detailed analyses can be found in [26]). Four new CATs were developed: Chorea, Speech Difficulties, Swallowing Difficulties, and Concern with Death and Dying. Six-item short forms were selected by expert review for each of these measures; a 4-item short form was developed to assess Meaning and Purpose. The analysis results for the new HDQLIFE measures are shown in Table 2. Average administration time for each new HDQLIFE measure was less than 1 min; univariate analyses indicated significant differences among all three groups for all measures (in all cases, prodromal participants had the fastest completion times, early-HD completion times were in the middle, and late-HD participants had the slowest completion times; Table 3).

Table 2 New HDQLIFE measures statistics
Table 3 Administration times (seconds) for prodromal, early-, and late-stage HD participants

All newly developed HDQLIFE measures are scored on a T metric with a mean of 50 and standard deviation of 10, which is the same metric utilized for Neuro-QoL/PROMIS [71]. Thus, scores below 40 (1.0 SD below the mean) can be considered low and scores above 60 (1.0 SD above the mean) can be considered high. Note that the referent group (i.e., the group used to develop the algorithm for the CATs) for the new measures (i.e., Chorea, Speech Difficulties, Swallowing Difficulties, Concern with Death and Dying, and Meaning and Purpose) is individuals with HD, while the referent group for the Neuro-QoL/PROMIS measures is the general population. There were significant group differences among the three HD groups on all of the HDQLIFE measures except Concern with Death and Dying and Meaning and Purpose (Table 4); differences were in the expected direction. There were also statistically significant though very modest associations between CAG repeat number and all of the new HDQLIFE measures except Meaning and Purpose (r = .21, p < .01 for HDQLIFE Chorea; r = .20, p < .01 for HDQLIFE Speech Difficulties; r = .23, p < .01 for HDQLIFE Swallowing Difficulties; r = .11, p < .05 for HDQLIFE Concern with Death and Dying; and r = −.07, p > .05 for HDQLIFE Meaning and Purpose), providing preliminary support for construct validity of these new measures.
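The T metric described above can be illustrated with a short conversion sketch. It assumes the IRT trait estimate (theta) is standardized to a mean of 0 and a standard deviation of 1 in the referent group, which is the usual convention but is not spelled out in this paper. The function names and the "average" label for the middle band are hypothetical.

```python
def to_t_score(theta):
    """Convert a standardized trait estimate (mean 0, SD 1) to the T metric."""
    return 50.0 + 10.0 * theta


def classify(t_score):
    """Label a T-score using the cutoffs of 1.0 SD below and above the mean."""
    if t_score < 40:
        return "low"
    if t_score > 60:
        return "high"
    return "average"  # label for the 40-60 band is an assumption of this sketch
```

Because the referent group for the new HDQLIFE measures is individuals with HD, a label such as "low" here is relative to other people with HD rather than to the general population.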

Table 4 Average scores for clinician-rated and self-report assessments for prodromal, early-, and late-stage HD participants

Preliminary support for Neuro-QoL/PROMIS Validity

Descriptive information regarding the Neuro-QoL/PROMIS measures, other generic self-reported measures of HRQOL, and clinician-rated assessments is provided in Table 4; for most measures, prodromal HD performed better than early-HD and late-HD, and early-HD performed better than late-HD. Neuro-QoL/PROMIS had moderate to strong relationships with generic self-report measures of HRQOL (r’s ranged from .34 to .74; Table 5). Neuro-QoL/PROMIS measures generally had moderate relationships with clinician-rated measures (r’s ranged from .35 to .70 with the majority between .42 and .49). Correlations tended to be highest between Neuro-QoL/PROMIS physical, social, and cognitive measures and corresponding self-report measures of these same constructs. Correlations were lowest among PROMIS emotion measures and corresponding measures, and highest among Neuro-QoL physical functioning measures and corresponding measures.

Table 5 Pearson correlations for Neuro-QoL and PROMIS health-related quality of life (HRQOL) measures

Discussion

Clinical trials aimed at slowing the progression of HD are underway. Unfortunately, these clinical studies employ few or no PROs [45]. When PROs are included, the specific measures utilized do not allow for cross-study or cross-disease comparison and may not be sufficiently sensitive to detect small but clinically meaningful changes in function [72]. Furthermore, existing PROs are often lengthy and time intensive. This is especially problematic given the regulatory and public interests to include PROs as meaningful endpoints in clinical trials [73]. The HDQLIFE measurement system uses state-of-the-art measurement techniques to help remedy these problems. HDQLIFE includes 12 validated Neuro-QoL/PROMIS measures in HD, as well as five new HD-specific measures: Chorea, Speech Difficulties, Swallowing Difficulties, Concern with Death and Dying, and Meaning and Purpose. HDQLIFE is unique in that it includes new HD-specific items based upon direct input from participants living with HD or the threat of HD, and consultation with experts who work with individuals with HD. HDQLIFE also includes “generic” items from PROMIS and Neuro-QoL to enable comparisons across different medical populations. Thus, HDQLIFE allows for both HD specificity and cross-disease comparisons, providing a significant advantage over more traditional measurement systems that require a trade-off between these two functions.

The HDQLIFE is also the first PRO assessment in HD to utilize item banking and CAT methodology. In CAT, each individual item is selected based on the response to the previous item. This “smart test” allows clinicians and researchers to ascertain a person’s level of functioning using only a minimal number of items without losing the precision of a longer measure. CAT offers several advantages over traditional test administration, including specification of the minimum/maximum number of items, and/or maximum acceptable standard error. Further, most measures can be administered as fixed-length short forms (4–8 items), effectively reducing test length without sacrificing test sensitivity (i.e., administration time for each new HDQLIFE measure was less than 1 min). This is particularly important given time constraints inherent in clinical trials assessment and the need to limit participant burden during test administration, which is especially important during later-stage HD when cognition is compromised and processing speed is slowed [12]. CAT also has the advantage that new items can be evaluated for consistency with the original bank and then added at a later date, allowing for future expansion and adjustment.

There are several limitations to this study. First, the study sample consisted of participants who were recruited through other research studies and through large, established HD clinics; this convenience sample may not represent the HD population at large. Specifically, while there is no evidence to suggest that there are gender differences in HD in the general population [74], our sample included slightly more females (59 %) than males. While this proportion is consistent with other research studies in HD (females comprise 55–64 % of other large HD cohorts [75, 76]), and with research studies more generally [77–79], it is possible that our findings are not fully representative of males with HD. With regard to education, our prodromal participants were more highly educated than the manifest participants. This is not surprising given that individuals with greater education also have greater medical genetic knowledge [80] and are more likely to get medical testing [81] and that individuals with higher education are more likely to participate in HD research studies [36, 37, 82]. Rates for race/ethnicity were consistent with established prevalence rates [83–86] and other large HD research cohorts [75, 76, 87]. Second, participants completed the survey in multiple ways (online during research visits, online at home, by phone interview, or by in-person interview) and assistance was provided when appropriate (e.g., help logging into the online survey, help clicking the responses). Since a portion of the surveys were completed at home, it is possible that some participant answers were influenced by a person providing assistance with the survey. In addition, a small percentage of participants indicated that their caregiver (family member or friend) answered questions for them or answered questions by reminding them of important information.
A recent high-quality meta-analysis indicates that mode of PRO administration, including completing on paper versus electronically or independently versus with help, does not cause bias [88]; however, future work is warranted to better understand how such assistance may have influenced responding. Third, survey completion allowed multiple sittings, as long as it was generally completed within two weeks of the clinic visit when the UHDRS and PBA-s were administered. Since the survey was not always completed at the same time as the in-clinic assessments, it is possible that the correlations between these measures were less robust than if they had been completed concurrently. Lastly, due to the effect of the disease on cognition, some of the HD participants, particularly those in the late stage of the disease, may not provide reliable self-reports of symptoms and concerns. Furthermore, a small portion of our sample (largely later-stage participants) had incomplete survey data; some of this data loss was due to participant fatigue, while other data loss was due to practical limitations related to exceeding study visit lengths for reserved testing space (and an inability to complete the assessment outside of the clinic visit). Thus, additional work is needed to determine when self-report becomes unreliable [89].

The ultimate utility of the HDQLIFE will depend on its demonstration as a clinically meaningful outcome measure in controlled clinical trials of promising treatments for HD. Data from this study support the utility of the HDQLIFE as a standardized outcome instrument for efficiently capturing HRQOL in HD clinical and research settings. HDQLIFE will be available, free of charge, through www.assessmentcenter.net. Since HD is a relatively rare condition, the CAT platform of HDQLIFE should maximize the effectiveness of clinical trials by minimizing the number of participants needed to detect clinically meaningful changes in levels of function. The ability to conduct cross-disease comparisons may support advances in other neurodegenerative diseases. This should allow researchers to more effectively target interventions that are successful in diseases exhibiting symptom overlap with HD. The HDQLIFE offers a briefer and more relevant alternative to current, lengthier assessments of HRQOL. The HDQLIFE can also be used in the clinical setting, allowing patients to more effectively communicate symptoms of concern to treatment providers. Patients can also complete the measures from their home computers, tablets, and smartphones, facilitating better communication with HD specialists who may be geographically far from patients [90]. HDQLIFE provides the next generation of HRQOL measurement specific to HD, a disease that brings unique challenges and thus requires a validated assessment of the aspects of HRQOL that matter to HD patients and their caregivers.