Introduction

Patient-reported outcomes (PROs) refer to patient-perceived health-related quality of life (HRQOL), functional status, and symptom burden [1]. PROs provide a direct way to capture patient symptoms, emotional distress, and participation in activities of living, many of which are poorly captured or underestimated through traditional measures. PROs have been shown to improve patient-physician communication and patient satisfaction [2], as well as to correlate with cancer survival [3], adding important prognostic value to traditional measures. PROs are thus increasingly used in oncology to facilitate care by quantifying a patient’s level of distress or impairment caused not only by the disease but also by its treatment.

The importance of PROs is further highlighted by their increasing use in clinical trial design in cancer care. The United States Food and Drug Administration (FDA) now recognizes PROs as a valid measure of clinical benefit for new drug approval. Clinical benefit is simply defined as “living longer or living better” [1]. Given these standards, trials may be tasked with showing benefit beyond standard clinical measures, including by using PRO assessments as primary and secondary endpoints [4].

Allogeneic hematopoietic cell transplantation (HCT) is a potentially curative therapy for patients with high-risk hematologic malignancies, bone marrow failure syndromes, and other nonmalignant conditions. The treatment itself, however, carries significant morbidity, including complications such as graft-versus-host disease (GVHD). Although the importance of assessing PROs in HCT recipients as part of both routine care and research has been established [5, 6], their use and interpretation are not currently standardized. GVHD, in particular, is an HCT complication associated with significant symptom burden and impaired HRQOL, and PROs are an underused tool for capturing the patient experience of it. Although GVHD has been the subject of numerous clinical trials investigating novel agents and strategies for prevention and treatment, there remains significant variability in the inclusion and measurement of PROs. Several challenges have been recognized, and there has been little progress over the last decade in implementing PROs specific to acute GVHD. Although acute GVHD is arguably among the most common and burdensome complications after transplantation, there are currently no validated measures to capture its symptoms and HRQOL from the patient perspective.

The use of HRQOL assessments after allogeneic HCT has been reviewed previously [6, 7]. The goal of this paper is to focus on the need for development and implementation of PROs that capture HRQOL and symptom burden in acute GVHD.

PROs in acute GVHD: the need

Acute GVHD is a common and well-known complication of allogeneic HCT, which has a significant impact on quality of life, physical functioning, and clinical outcome [8, 9]. Classic acute GVHD predominantly affects the skin, liver, and GI tract, and traditional grading is based upon physician report of patient symptoms, primarily consisting of percent body area of rash, presence of nausea and anorexia, and volume of diarrhea. These clinical scoring systems are limited by wide inter-observer variability, and outcomes for GVHD grades are often heterogeneous and inconsistent, especially for lower grades [10,11,12,13]. The identification and risk-stratification of patients based on clinical staging [14] and blood biomarkers [15, 16] has been proposed as a new treatment paradigm to identify those who are at the greatest risk and require more aggressive upfront therapy, while sparing those who are likely to respond from excess toxicity. Despite these advances and refinements, the critical patient perspective as a sensitive tool to assess symptoms and potentially treatment response and outcome is still lacking.

The most direct way to capture symptoms of disease and treatment is patient self-report, as previous studies have demonstrated that physicians frequently overestimate HRQOL compared with patient report, and parents often underestimate HRQOL of their children [17, 18]. In acute GVHD, PROs may thus optimally convey the true severity of nausea, anorexia, and diarrhea, as well as functional status and overall HRQOL. Treatment, which typically involves high-dose corticosteroids with other immunosuppressive agents added as needed, may lead to further complications and symptoms, such as infections, weakness, and pain. Thus both the acute GVHD process and the effects of treatments used to prevent or treat GVHD may affect patient symptom burden and HRQOL, and PROs may provide a better assessment of the global patient-specific impact of acute GVHD.

PROs in chronic graft-versus-host disease as a model

Quality of life and symptom burden are now established as important measures in the study of chronic GVHD [19,20,21,22,23,24]. Lee et al. first reported the development of a validated symptom scale for chronic GVHD in 2002, demonstrating that patient-reported symptoms were more responsive and sensitive to changes in patient-perceived chronic GVHD severity and activity than physician assessments and generic HRQOL measures [25]. The Lee Chronic GVHD Symptom Scale (LSS) is a 30-item, 7-domain symptom scale covering the skin; eyes and mouth; breathing; eating and digestion; muscles and joints; energy; and mental and emotional aspects.

Several studies conducted through the Chronic GVHD Consortium have demonstrated significant correlation between PRO scores (including the LSS and HRQOL measures such as the Short Form 36 (SF-36) and Functional Assessment of Cancer Therapy-Bone Marrow Transplant (FACT-BMT)) and baseline NIH disease severity, changes in chronic GVHD activity, and physician assessment. Furthermore, patient-reported symptoms and QOL were also found to be significant predictors of failure-free survival, nonrelapse mortality, and overall survival in chronic GVHD [26]. Through multiple observational and interventional trials, studies have strongly established feasibility with high completeness indices of 75–87% of surveys completed at baseline and follow-up. In 2015, the NIH Consensus Development Project for Clinical Trials in Chronic GVHD made a strong recommendation for the inclusion of PROs in therapeutic response trials, recognizing the LSS as a chronic GVHD core measure.

Patient reported outcomes in acute GVHD

In contrast, little progress has been made in the development and implementation of a PRO measure for acute GVHD, and several challenges are cited. The role and complexities of PROs in the context of clinical trial design and as an endpoint in acute GVHD have been previously highlighted [27]. Lee et al. summarized a discussion by the FDA, NIH, Center for International Blood and Marrow Transplant Research (CIBMTR), and American Society for Blood and Marrow Transplant, underscoring several important logistical and analytical challenges in using PROs in acute GVHD, including (1) lack of a valid, reliable, and sensitive tool specific for acute GVHD that can be held to FDA standards in demonstrating “clinical benefit”; (2) data collection logistics, including active patient participation and burden, frequency and timing of assessments, costs, and interpretation; and (3) correlation with objective response criteria and sensitivity to clinically meaningful differences and change in acute GVHD in the presence of many concurrent confounding toxicities, particularly in clinical trials evaluating novel prophylaxis strategies, which often result in only small changes in GVHD rates.

PRO assessment tools

A number of HRQOL instruments have been studied in HCT and have previously been reviewed [7, 28]. Table 1 summarizes the most common measures used across the transplant literature (FACT-BMT, SF-36, and the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ)-C30) [29, 30] and underscores the heterogeneity in measures, from the target population for which each instrument was developed to the symptoms captured, which can overlap and potentially conflict. This heterogeneity makes it difficult to interpret PROs on HCT outcomes across studies.

Table 1 Most frequently used patient reported outcome measures in hematopoietic cell transplant studies.

These challenges have been fully recognized by the community [7], and the Blood and Marrow Transplant-Clinical Trials Network (BMT-CTN) further addressed this issue in a white paper in 2012, emphasizing the importance of collecting PROs and calling for the harmonization of instruments and assessment time points across studies, as well as providing guidance to ensure quality of data collection.

Trials investigating PROs specifically in acute GVHD, however, are very limited and further highlight these issues [31,32,33]. Studies have primarily focused on HRQOL and have not addressed other domains of symptom burden in this patient population. A multi-center phase II–III trial of T-cell depletion versus cyclosporine and methotrexate in unrelated donor transplantation [34] used the FACT-BMT and SF-36, as well as the Center for Epidemiological Studies of Depression (CES-D) at day 100, 6 months, 1 year and 3 years and found no differences in HRQOL between the two arms. BMT-CTN 0802, a phase III, randomized placebo-controlled trial evaluating the addition of mycophenolate mofetil to steroids versus placebo and steroids as therapy for acute GVHD found no significant difference in GVHD-free survival or HRQOL at baseline and day 56 ± 7 days as measured by the MD Anderson Symptom Inventory (MDASI) measure, a 19-item instrument capturing 13 symptoms and 6 items measuring interference with life. Of note, there were generally high completion rates of assessments across both trials (75% and 90%, respectively).

More recent recommendations from the BMT-CTN and CIBMTR have thus called for the use of a single core set of measurement tools, ideally one that is freely available, easy to access, and low in patient burden [7]. The NIH Patient-Reported Outcomes Measurement Information System (PROMIS) assessments are a set of valid, reliable, and flexible tools to capture patient-reported health status. The PROMIS measures consist of item banks covering a number of physical, emotional, and social domains, each with a variable number of questions, which can be combined to form multi-item measures. PROMIS may be administered in multiple ways, including computerized adaptive testing, which uses item response theory and computer technology to select items and calculate a score so as to maximize precision while minimizing patient burden. Fixed-length multi-dimensional measures are also available. Item banks are amenable to adaptation over time, and scores may be used to facilitate comparisons across measures. Assessments are available for both adults and children, as well as for parent-proxy reporting. Additional advantages include availability in multiple languages, free use, and potential for incorporation into the electronic health record. The PROMIS assessments, however, were developed for the general population and are not necessarily specific to HCT or GVHD. Nevertheless, recent studies have demonstrated a high correlation between PROMIS and SF-36 scores in both HCT survivors and GVHD patients [35, 36]. Although the SF-36 was also originally developed for the general population, it has been frequently used and validated in the HCT population.

There remains a gap, however, in identifying and investigating a tool that can more specifically capture the symptom burden of acute GVHD. In cancer clinical trials, adverse events are reported using the Common Terminology Criteria for Adverse Events (CTCAE). The CTCAE consists of 790 discrete adverse events, many of which are symptoms reported by the investigator. As increasing evidence suggests that collecting this information directly from the patient improves the detection and reliability of symptom reporting, the National Cancer Institute developed the Patient-Reported Outcome CTCAE (PRO-CTCAE), a 124-item tool representing 78 symptomatic toxicities and assessing their frequency, severity, and interference. The measure is flexible in the choice of items, so patients need not complete all 124, and it is freely available; it may therefore be well suited to assessing symptom burden in acute GVHD. Although this tool has thus far been used in only a few small studies in HCT and not specifically in GVHD, it has been shown to be feasible, even when administered frequently early post-HCT [37, 38].

Data collection, logistics, and reporting

Although the practicality of frequent and active patient participation in a potentially ill posttransplant population is a concern, several studies have now demonstrated that frequent (weekly to twice weekly) PRO surveillance is feasible with high completion rates [25, 37, 39, 40]. Nevertheless, the burden on the patient and on the center collecting data is an important consideration for the sustainability of PRO collection. Acute GVHD may require frequent PRO assessments. The recall period must be long enough to capture the experience of GVHD, but short enough to identify dynamic changes over the course of the disease. Items must be written in plain language that is easy to understand for a heterogeneous patient population with varying literacy levels [41]. Guidelines for prioritizing assessments across individual center practice, clinical trials, and future CIBMTR implementation are also needed. In addition, a well-planned data collection structure is essential to ensure reliable and consistent PRO data capture [1]. Web-based and electronic tools are likely to aid in this effort by allowing patients and clinicians to readily prompt, access, and track assessments while ensuring secure and accountable data collection, which may also facilitate subsequent analysis [42]. In the modern era, internet access and comfort with web applications have increased significantly; as of 2019, an estimated 90% of adults in the United States used the internet [43]. Many electronic health systems can now link PRO assessments directly into the electronic health record, further facilitating access to and tracking of results for both patients and health care providers.

Although studies are increasingly using and reporting PROs as feasibility is demonstrated, accurate, valid, and accessible reporting of PROs remains a challenge. A review of 795 phase III randomized controlled trials that included an HRQOL outcome demonstrated high variability in the quality of HRQOL reporting. Only 14% of the 795 studies included the four key quality indicators for appropriate clinical application: evidence for instrument validity, inclusion of a PRO hypothesis, information about missing PRO data, and interpretation of HRQOL findings [44]. While this seems to have improved over time, there is a need for a standard approach to reporting PRO data. The CONSORT (Consolidated Standards of Reporting Trials) Statement [45, 46] provides evidence-based recommendations to improve the completeness of reporting of randomized controlled trials. A number of extension statements have been developed for reporting other trial designs as well. Given the increasing use of PRO data to inform patient care, clinical decision-making, and health policy and reimbursement decisions, a CONSORT-PRO extension was published in 2013 [47]. The CONSORT-PRO guidance provides a checklist of five items to be reported in all randomized controlled trials in which PROs are a primary or secondary outcome: (1) the PRO is identified as a primary or secondary outcome in the abstract; (2) the PRO hypothesis is described; (3) evidence of instrument validity and reliability is provided; (4) statistical approaches for dealing with missing data are reported; and (5) PRO-specific limitations of study findings and generalizability to clinical practice are discussed [47].

Although details of statistical approaches to handling missing PRO data are outside the scope of this review, missing data are an important aspect of using and analyzing PROs, particularly in a potentially ill acute GVHD population in which a high frequency of missing data is expected. While a number of statistical methods are frequently used to handle missing longitudinal data, the appropriate analytic strategy depends on why the data are missing [48, 49]. Complete case analyses and standard imputation techniques can be used for data missing randomly. When data are not missing randomly, such as when acutely ill patients with poor HRQOL do not complete follow-up PROs, bias may be introduced, and most statistical methods for handling missing data are not appropriate. In these cases, sensitivity analyses can be conducted to assess the impact of assumptions regarding missing data. For instance, missing values can be set to a worst-case value, such as a PROMIS score of 0, to measure the maximum impact of the missing data.
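The worst-case sensitivity analysis described above can be illustrated with a minimal sketch. The scores, function names, and the choice of 0 as the worst value are hypothetical and serve only to show the mechanics; which end of a scale counts as "worst" depends on the instrument (for a symptom scale it may be the maximum rather than the minimum).

```python
from statistics import mean

def complete_case_change(baseline, follow_up):
    """Mean change (follow-up minus baseline) using only patients
    who completed the follow-up assessment."""
    diffs = [f - b for b, f in zip(baseline, follow_up) if f is not None]
    return mean(diffs)

def worst_case_change(baseline, follow_up, worst_value):
    """Mean change after setting every missing follow-up score to an
    assumed worst-case value (a sensitivity analysis, not an estimate)."""
    filled = [worst_value if f is None else f for f in follow_up]
    return mean([f - b for b, f in zip(baseline, filled)])

# Hypothetical scores for four patients; None marks a missed follow-up.
baseline = [50.0, 45.0, 40.0, 55.0]
follow_up = [52.0, None, None, 57.0]

complete_case = complete_case_change(baseline, follow_up)
worst_case = worst_case_change(baseline, follow_up, worst_value=0.0)
print(complete_case, worst_case)
```

Comparing the two numbers brackets how far the conclusion could shift if the patients who dropped out were all doing as poorly as the scale allows; a finding that survives the worst-case assumption is robust to the missingness.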

Developing a new acute GVHD PRO tool

The development of a new PRO measure typically involves a systematic process using mixed qualitative and quantitative methods, with input from stakeholders including domain experts as well as patients. The process normally begins by identifying a need and then proceeds through analyzing existing measures, defining the concept, generating and improving items, and testing and validating in specific patient populations [50]. Given the large number of existing PRO measures that have been used in cancer clinical trials and within the HCT field, however, the development of an entirely new measure is arguably unnecessary for acute GVHD. Appropriate existing measures do require validation within a specific and new patient population. Several measurement properties must be demonstrated, including construct validity; convergent validity (high correlations with existing measures that address the same concept); divergent validity (low correlations with measures that assess other concepts); known-groups validity (the ability to show differences between groups at any given time point); reliability (stability of the measure and high internal consistency of domain items); and responsiveness to meaningful change (the ability to show change in a given group across time points). Validation thus becomes an ongoing process to demonstrate that an instrument can function effectively in a particular population for a specific purpose. High-quality PRO measures are reviewed and revised over their life spans to continue to address changes in the patient experience.
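Two of the properties named above lend themselves to a short numerical illustration: internal consistency, commonly summarized by Cronbach's alpha, and convergent validity, commonly summarized by a Pearson correlation with an existing measure. The sketch below is illustrative only; the item responses and comparison scores are invented, the function names are hypothetical, and these two statistics are one common choice among several rather than a prescribed validation method.

```python
from math import sqrt
from statistics import mean, variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-patient item-response rows.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five patients answering a three-item symptom domain.
responses = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 5]]
# Hypothetical scores from an existing measure of the same concept.
existing_measure = [10, 14, 19, 27, 30]

alpha = cronbach_alpha(responses)
totals = [sum(row) for row in responses]
r = pearson_r(totals, existing_measure)
print(round(alpha, 3), round(r, 3))
```

A high alpha suggests the domain items move together, and a high correlation with an established instrument supports convergent validity; neither alone establishes that a measure is valid for acute GVHD.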

There thus remains a need for a validated acute GVHD PRO tool. With more routine incorporation of PROs into transplant and GVHD clinical trials and increasing comfort with assessments such as the PROMIS measures, the time has come to validate a measure for standard use in acute GVHD. We propose the use of PROMIS measures and the PRO-CTCAE to best capture the symptom burden, experience, and quality of life of patients with acute GVHD. A pilot trial evaluating the PRO-CTCAE and PROMIS in a cohort of acute GVHD patients is currently ongoing at our institution. The specific PROMIS and PRO-CTCAE measures used in this study are detailed in the Supplementary table.

Conclusion

This review highlights the current need for PROs in acute GVHD. Given the current landscape of PRO assessment tools, the feasibility of PRO collection in the present digital era, and refinements and improvements in GVHD grading and treatment, it is timely to consider development and validation of a PRO measure specific for acute GVHD. The experience of acute GVHD from the patient’s perspective is an integral endpoint that is currently lacking from the GVHD treatment paradigm, and the standard incorporation of an acute GVHD PRO assessment will ultimately enhance the care of these patients.