Introduction

Accurate monitoring of symptomatic adverse events (AEs) is essential in clinical trials to assess and ensure patient safety and to inform decisions about treatment and continued trial participation [1, 2]. In US-based cancer clinical trials, this information has traditionally been documented solely through clinician reporting, using an AE rating system known as the Common Terminology Criteria for Adverse Events (CTCAE) [3]. This library of descriptive terms was developed and is maintained by the U.S. National Cancer Institute (NCI) and consists of 790 items that capture information on discrete events; each is graded by a clinician on a 5-point scale. Of these, 78 items have been identified as symptomatic AEs amenable to patient self-report [4].

Recently, there has been a push toward the use of patient-reported outcomes (PROs), defined as a patient's unfiltered, direct report of a given symptomatic toxicity and considered the “gold standard” for capturing symptomatic AEs [5, 6]. This push was driven by the release of the 2009 U.S. Food and Drug Administration (FDA) Guidance for Industry on the use of PRO measures in medical product development to support labeling claims [7], which in turn led to the NCI initiative to develop a PRO version of the CTCAE (PRO-CTCAE) for use in future US-based oncology clinical trials [4, 8, 9].

As the integration of PRO symptomatic AE ratings into cancer clinical trials becomes commonplace, it is important to understand the degree to which this patient-driven information is complementary to clinician-based CTCAE ratings of observable AEs. The purpose of this review is to summarize the current state of the literature with respect to characterizing the association between clinician-based CTCAE and patient-based PRO ratings.

Methods

Search strategy

With assistance from health sciences librarians, a systematic search for articles published in peer-reviewed medical journals was conducted in the PubMed, EMBASE, Web of Science, and Cochrane Central Register of Controlled Trials (CENTRAL) databases. In PubMed, Medical Subject Headings (MeSH) were used; in EMBASE, Emtree terms were exploded. The search terms were: (“Neoplasms”[Mesh] OR cancer OR cancers OR cancerous OR tumor OR tumors OR tumour OR tumours OR neoplasm* OR neoplastic OR malignan* OR metastatic OR metastasis OR metastases) AND (physician* OR doctor OR doctors OR clinician* OR “general practitioner” OR “general practitioners” OR provider* OR specialist* OR nurse* OR staff OR oncologist*) AND (agreement OR disagreement OR reliability OR concordance OR discrepancy OR consistency OR consensus OR “intraclass correlation coefficient” OR “intraclass correlation coefficient” OR “kappa correlation” OR “kappa coefficient” OR “self assessment” OR “physician assessment” OR “provider assessment” OR “self assessed” OR “physician assessed” OR “provider assessed” OR “patient ratings” OR “physician ratings” OR “provider ratings” OR “patients rated” OR “physicians rated” OR “providers rated” OR “Physician’s Role”[Mesh]) AND (“adverse symptom event” OR “adverse symptom events” OR “adverse event” OR “adverse events” OR “adverse effect” OR “adverse effects” OR “patient reported outcome” OR “quality of life” OR “Quality of Life”[Mesh] OR ECOG OR “Eastern Cooperative Oncology Group” OR “functional problems” OR distress OR “performance status” OR depression). There were no date or language restrictions; each database was searched in its entirety through July 2015.

Selection strategy

Studies were deemed eligible for inclusion if they (1) reported an original quantitative comparison between analogous CTCAE and PRO ratings and (2) included participants aged 18 years or older.

Screening process

For the title screening, abstract review, and full-text review stages, two co-authors were randomly assigned to independently review each article for eligibility, with discrepancies arbitrated by a third co-author who was naïve to the article. At the full-text review stage, each assigned co-author completed standardized coding forms to extract predetermined data from the potentially eligible articles. The reference lists of included full-text articles were also searched for additional studies to consider for inclusion. Study quality was assessed using a modified version of the Downs and Black Study Quality Checklist [10] (i.e., the 16 of the original 27 quality indicators most relevant to the types of studies reviewed). An article was considered fit for inclusion on quality grounds if it met at least 33 % of the modified Downs and Black quality indicators.

For the purposes of this review, cutoffs for all reported agreement metrics, including percentage agreement (expressed as a proportion), Kendall’s tau rank correlation (τ), Goodman and Kruskal’s gamma (γ), Pearson’s correlation (r), Cohen’s kappa (κ), and weighted Cohen’s kappa (κw), were defined as follows: poor (0.00–0.29), moderate (0.30–0.69), and strong (0.70–1.00). Where only sensitivity and specificity were reported, the following cutoffs were used: low (0.00–0.29), moderate (0.30–0.69), and high (0.70–1.00).
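The cutoffs above can be expressed as a small helper. The following is a minimal Python sketch, not code from this review or any included study: the function names and the paired grades are hypothetical, grades are assumed to be integer-coded, and negative coefficients (such as τ = −0.02), which fall outside the review's stated bands, are assumed to count as poor. It also shows how percentage agreement and unweighted Cohen's κ are derived from the same paired clinician and patient ratings, with κ correcting the observed agreement p_o for the agreement p_e expected by chance.

```python
from collections import Counter
from typing import Sequence


def classify_agreement(value: float) -> str:
    """Map an agreement coefficient to this review's poor/moderate/strong bands.

    Assumption: negative coefficients, which the review's 0.00-1.00 bands do not
    cover, are treated as "poor".
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("agreement coefficients must lie in [-1, 1]")
    if value < 0.30:
        return "poor"
    if value < 0.70:
        return "moderate"
    return "strong"


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of paired ratings (e.g., clinician vs. patient grade) that match exactly."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement p_e: product of the two raters' marginal proportions, summed over grades.
    p_e = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    clinician = [0, 1, 1, 2, 0, 3, 1, 0]  # hypothetical clinician CTCAE grades
    patient = [0, 1, 2, 2, 1, 3, 2, 0]    # hypothetical patient-reported grades
    for name, value in (
        ("percent agreement", percent_agreement(clinician, patient)),
        ("Cohen's kappa", cohens_kappa(clinician, patient)),
    ):
        print(f"{name}: {value:.3f} ({classify_agreement(value)})")
    # percent agreement: 0.625 (moderate)
    # Cohen's kappa: 0.500 (moderate)
```

With these hypothetical grades, κ (0.50) is lower than raw agreement (0.625) because, given each rater's marginal distribution, a quarter of the pairs would be expected to match by chance alone.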

Results

A total of 7474 titles were identified through electronic database searching, and 75 additional records were found through hand searching. After duplicates were removed, 5658 articles remained. Following title screening, 908 articles were identified for abstract review, with 93.6 % agreement between the independent pairs of raters. A total of 251 full-text articles were reviewed. Reasons for exclusion at the full-text stage were: CTCAE not used (n = 161), PROs not assessed (n = 18), review article or no original research findings (n = 18), CTCAE and PRO ratings not explicitly compared (n = 12), and non-cancer population (n = 5). Twenty-eight articles met the eligibility criteria and were included in this review (Fig. 1). Each of these articles met at least 47 % of the relevant quality indicators from the modified Downs and Black Study Quality Checklist [10], and 21 of the included articles met 75 % or more of these indicators.

Fig. 1 PRISMA flow chart

Study characteristics

Table 1 provides a summary of clinical characteristics and findings from each of the 28 included studies. Patients were of mixed cancer types, including anal, breast, cervical, chronic myeloid leukemia, endometrial, hematological, lung, ovarian, pelvic, pharyngeal, prostate, and rectal. Given this, the AEs captured were variable, with many common across studies (e.g., dyspnea, fatigue, nausea, neuropathy, pain, vomiting), as well as several that were disease-specific (e.g., erectile dysfunction, hemoptysis, xerostomia).

Table 1 Clinical characteristics and summarized findings of included studies by primary patient-reported outcomes measure (N = 28)

Well-validated PRO measures were used to capture patient-based AE ratings in 20 of the included studies. These included disease-specific modules of the European Organization for Research and Treatment of Cancer (EORTC) [12–20] and Functional Assessment of Cancer Therapy (FACT) [21–23] instruments, as well as the dermatology-specific Skindex-16 [24, 25], the EuroQol EQ-5D [26], the Short-Form 36 Health Survey (SF-36) [27], and two recently developed bowel symptom inventories [28, 29]. Patient-adapted versions of the CTCAE were used in four studies [27, 30–32], and visual analog scales were used to capture patient-rated AEs in three studies [11, 33, 34].

Association between CTCAE and PRO ratings

A poor to moderate association between CTCAE and PRO ratings was reported in the majority of included studies (n = 21). Seven such studies reported this as a general finding, with agreement statistics ranging from κ = 0.00 to r = 0.74 [14, 24, 27, 32, 34, 35]. Whether pain was self-reported via the EuroQol EQ-5D or the EORTC QLQ-C30, four independent studies reported poor to moderate agreement between PRO and CTCAE ratings (r range = 0.33–0.37; κ range = 0.16–0.29) [16, 17, 20, 26]. Similarly, agreement for nausea and vomiting ranged from poor, as captured by Symptom Tracking and Reporting (STAR) (τ = 0.11 and −0.02) [26], to moderate, as captured by the EORTC QLQ-C30 (r range = 0.47–0.48; κ range = 0.41–0.48) [15–17].

Independent studies that used the Bowel Problem Scale in cohorts of patients with rectal [28] or anal [29] cancer found poor agreement with clinician ratings for proctitis (κ = 0.22 and 0.11, respectively) and moderate agreement for diarrhea (κ = 0.64 and 0.68, respectively). Additionally, independent studies of neuropathy reported a poor to moderate relationship between PRO and CTCAE reports, as measured by the FACT Gynecologic Oncology Group–Neurotoxicity Module (FACT/GOG-Ntx) (r range = 0.23–0.69) [23, 36]. A study that used the Patient Neurotoxicity Questionnaire found that patients reported severe neuropathy more frequently (30 %) than clinicians identified it using the CTCAE (10 %) [37].

Several studies (n = 4) reported a mix of poor, moderate, and strong associations between CTCAE and PRO ratings. In a study of 400 patients with lung or genitourinary cancer, Basch and colleagues [30] reported strong percentage agreement between the CTCAE and a patient-adapted CTCAE for nausea (96 %), vomiting (90 %), and pain (70 %), but only moderate agreement for fatigue (55 %). In a study of 82 patients with lung cancer, Brabo and colleagues [13] observed moderate correlations between CTCAE ratings and the EORTC QLQ-C30 sore mouth (r = 0.41, p < 0.01) and alopecia (r = 0.52, p < 0.01) items, but strong correlations for the dysphagia (r = 0.73, p < 0.01) and neuropathy (r = 0.72, p < 0.01) items. In a study of 100 patients with mixed cancer types, Cirillo and colleagues [31] found moderate agreement between the CTCAE and a patient-adapted CTCAE for asthenia (κ = 0.32, 95 % confidence interval (CI) 0.17 to 0.44), but strong agreement between clinicians and patients for nausea (κ = 0.74, 95 % CI 0.64 to 0.85) and diarrhea (κ = 0.71, 95 % CI 0.43 to 0.90). A recently completed study of 66 patients with lung cancer by Tang and colleagues [38] found poor agreement between the CTCAE and the Thoracic Symptom Self-Assessment Tool (TSSAT) for nausea (κw = 0.07, 95 % CI −0.16 to 0.30) and insomnia (κw = 0.03, 95 % CI −0.34 to 0.39), but strong agreement for hemoptysis (κw = 0.71, 95 % CI 0.35 to 1.00).
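Tang and colleagues report weighted kappa (κw) rather than unweighted κ. For ordinal grades, κw gives partial credit when clinician and patient grades differ by only one level, so it can diverge from κ on the same data. The minimal Python sketch below illustrates the calculation; it is not the analysis used in that study, whose weighting scheme is not stated here. The function name, the linear and quadratic disagreement weights, the 0-based grade coding, and the example grades are all assumptions for illustration.

```python
from typing import Sequence


def weighted_kappa(
    a: Sequence[int], b: Sequence[int], n_categories: int, weighting: str = "linear"
) -> float:
    """Weighted Cohen's kappa for ordinal grades coded 0..n_categories-1.

    kappa_w = 1 - sum(w_ij * o_ij) / sum(w_ij * e_ij), where o_ij are observed
    joint proportions, e_ij are proportions expected from the two raters'
    marginals, and w_ij are disagreement weights: |i - j| (linear) or
    (i - j)**2 (quadratic).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if any(not 0 <= g < n_categories for g in (*a, *b)):
        raise ValueError("grades must lie in 0..n_categories-1")
    n, k = len(a), n_categories

    # Observed joint distribution of (rater a grade, rater b grade).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]                       # rater a marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater b marginals

    def weight(i: int, j: int) -> int:
        d = abs(i - j)
        return d if weighting == "linear" else d * d

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("weighted kappa is undefined when expected disagreement is 0")
    return 1.0 - obs_dis / exp_dis


if __name__ == "__main__":
    clinician = [0, 1, 1, 2, 0, 3, 1, 0]  # hypothetical clinician grades
    patient = [0, 1, 2, 2, 1, 3, 2, 0]    # hypothetical patient grades
    print(round(weighted_kappa(clinician, patient, 4), 3))               # 0.676
    print(round(weighted_kappa(clinician, patient, 4, "quadratic"), 3))  # 0.824
```

With these hypothetical grades every disagreement is a single grade apart, so κw (about 0.68 with linear weights, 0.82 with quadratic weights) exceeds the unweighted κ of 0.50 for the same pairs. The choice of weights therefore matters when comparing κw values across studies.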

Three studies reported strong agreement between CTCAE and PRO ratings: a study of chemotherapy-induced peripheral neuropathy in 281 patients, as captured by the EORTC QLQ-CIPN20 [12]; a study of edema, neuropathy, and pain in 1031 patients with lung cancer, as captured by the FACT-Taxane module [22]; and a study of nausea in 30 patients with mixed cancer types, as measured by a visual analog scale (r = 0.88, p < 0.001) [11].

Discussion

As the oncology clinical encounter increasingly incorporates patient-reported data, clinicians need to understand how these data relate to the standard AE ratings they make in clinical trials and routine practice. This review found that, regardless of which self-report measure was used, the majority of studies that directly compared CTCAE and PRO ratings reported a poor to moderate association between clinician- and patient-based AE ratings, either globally or for individual symptoms (e.g., nausea, neuropathy, pain, vomiting).

This discordance between clinicians and patients provides further evidence that PRO measures capture unique, valuable information that can complement CTCAE ratings. For instance, a systematic review by Gotay and colleagues demonstrated that PRO measures are correlated with survival in cancer clinical trials and provide information above and beyond conventional clinical assessments [39].

Currently, all AEs in clinical trials, including symptomatic AEs, are documented by providers at clinic visits to meet federal requirements and facilitate evaluation of new therapies; patient self-report plays a minimal role. This practice is problematic, as our previous work has demonstrated that AEs frequently go undetected by clinicians or are rated as less severe than patients report them [26, 40]. Additionally, the CTCAE recall period encompasses the entire interval since the patient's last clinic visit. Because a follow-up visit may not occur for 2–4 weeks, information about AEs experienced early in that interval can be lost.

Recognizing the importance of accurately capturing the patient perspective as a complement to clinician-based reporting, the NCI has developed a patient-language version of the CTCAE (PRO-CTCAE) [4]. PRO-CTCAE is an electronic system for patient self-reporting of those CTCAE AEs found to be amenable to patient reporting. This library of 78 symptoms was identified and developed through cognitive interviews with patients and extensive quantitative validation in a large multicenter study [8, 9]. PRO-CTCAE uses a 7-day recall period and can be completed routinely by patients; it may therefore be used to systematically capture AEs that are not apparent at the time of clinician-based CTCAE grading.

This review has several limitations. The timing of the respective CTCAE and PRO ratings was not explicitly stated in the majority of included articles, though it was implied that these AEs were assessed in close proximity. Future studies should clearly report the time elapsed between clinician- and patient-based AE reporting, so that differences in timing (e.g., ratings made at separate visits, or between visits versus at the next or previous clinic visit) can be ruled out as an influence on the level of agreement between these rating sources. Additionally, a meta-analysis would have been the optimal way to characterize the relationship between CTCAE and PRO ratings. Unfortunately, given the variety of PRO measures used and the numerous instances in which study authors did not report detailed statistical coefficients for each AE captured, a meta-analysis was infeasible.
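To illustrate why comparable per-AE coefficients and sample sizes would be needed, the following is a minimal Python sketch of one common way to pool correlation coefficients: a fixed-effect, inverse-variance meta-analysis on Fisher's z scale. It is not a method used in this review; the function name and the (r, n) inputs are hypothetical. It accepts only Pearson r values with their sample sizes, so the κ, κw, τ, γ, and percentage-agreement values reported by the included studies could not simply be fed into it.

```python
import math
from typing import Iterable, Tuple


def pool_correlations(
    studies: Iterable[Tuple[float, int]], z_crit: float = 1.96
) -> Tuple[float, Tuple[float, float]]:
    """Fixed-effect pooled Pearson r and confidence interval via Fisher's z.

    Each study is an (r, n) pair. On the z = atanh(r) scale the sampling variance
    is approximately 1 / (n - 3), so each study is weighted by n - 3. The default
    z_crit of 1.96 gives an approximate 95% interval.
    """
    total_weight = 0.0
    weighted_z = 0.0
    for r, n in studies:
        if n <= 3 or not -1.0 < r < 1.0:
            raise ValueError("each study needs n > 3 and -1 < r < 1")
        weight = n - 3                      # inverse-variance weight
        weighted_z += weight * math.atanh(r)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no studies supplied")
    z_bar = weighted_z / total_weight
    se = 1.0 / math.sqrt(total_weight)      # standard error of the pooled z
    low, high = z_bar - z_crit * se, z_bar + z_crit * se
    # Back-transform the estimate and interval limits to the correlation scale.
    return math.tanh(z_bar), (math.tanh(low), math.tanh(high))


if __name__ == "__main__":
    # Hypothetical (r, n) pairs for a single AE; not data from the included studies.
    estimate, (low, high) = pool_correlations([(0.35, 60), (0.45, 120), (0.30, 90)])
    print(f"pooled r = {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Given the heterogeneity of instruments and populations described above, a random-effects model would likely be more defensible than the fixed-effect model sketched here, but it has the same input requirement of one comparable coefficient and sample size per study.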

To help clinicians understand the relationship between their CTCAE ratings and PRO assessments, future work should seek to systematically equate CTCAE and PRO ratings. Approaches that integrate PROs with clinician reporting of AEs, especially the CTCAE and PRO-CTCAE, would improve our understanding of patient and clinician ratings and help clinicians and policy makers interpret clinical trial results.