Introduction

In clinical oncology trials, the National Cancer Institute Common Toxicity Criteria (CTC) is the standard grading system for monitoring adverse events [1]. The CTC assessment contains subjective and objective elements including analytic tests, objective examination, and the patients’ symptom experience [2]. Toxicity information is usually obtained during a patient–physician interaction involving clinician interpretation of patient reporting and determination of the severity grade [3]. The CTC assessment focuses on the physical effect of a symptom on an individual patient, whereas the quality of life (QoL) assessment captures the level of distress associated with the symptom. Symptom experience as reported by patients in self-assessment questionnaires is often not documented in medical records, and case report forms do not capture the same information as QoL questionnaires do [4–6]. Numerous studies have documented systematic underreporting of symptoms by clinicians compared to patients in cancer samples as well as in non-cancer samples [7–10]. Geels et al. [6] found that the rate of symptom detection was higher with patient self-reports than with physician assessments. In a European-Canadian Intergroup ovarian cancer trial, patients more often rated symptoms as severe or moderate, but these were less often documented as higher grade toxicity symptoms by clinicians [11]. The agreement between symptoms reported by clinicians and symptoms reported by patients was low. Fromme et al. [12] examined the reliability of physician identification of adverse events compared with patient QLQ-C30 reported symptoms in a chemotherapy trial. The results showed that physicians did not report approximately one half of the symptoms identified by the QLQ-C30 assessment, and the QLQ-C30 did not detect approximately one half of the adverse events reported by physicians for the same symptoms. Huschka et al. [13] found that the median time to the first occurrence of a severe adverse event reported by CTC grading was 304 days, whereas the median time to the first 10-point decline in patient-reported QoL was 142 days. Self-reports are more sensitive to underlying changes in functional status, and changes appear in them sooner than in physician ratings [14].

Assessments of QoL capture information from the patient perspective. Consequently, patient self-reporting is the standard approach to assess QoL. The draft guidance of the United States Food and Drug Administration emphasized that patient-reported outcomes should “come directly from the patient without interpretation of the patient’s responses by a physician or anyone else” [15]. The usefulness of patient-reported outcomes in clinical trials has been a subject of recent research [8, 9, 11]. Basch et al. [8] adapted the CTC assessment for use by patients and described the commonalities and discrepancies between adverse event reporting by patients and by clinicians. For most symptoms, agreement between patients and physicians was high when using the same format. For subjective symptoms, the agreement was lower, and clinicians tended to underestimate the severity compared to patient-reported symptoms. Butler et al. [11] found that the agreement between clinician ratings and patient ratings was higher when patients experienced no or only mild toxicities.

Many QoL instruments measure specific symptoms such as fatigue, nausea/vomiting, pain, dyspnea, constipation, or diarrhea [16]. These symptoms are adverse events commonly monitored by the CTC scale, so there is a certain degree of overlap between toxicity grading and QoL reports. Some researchers question whether QoL instruments provide the same information as toxicity data [17]. Toxicity and QoL seem to be associated, but the relationship remains unclear. This study aims to determine the relationship between physician-reported symptoms based on the CTC assessment and patient-reported QoL assessments at cycles 3 and 6. We assessed the agreement of specific chemotherapy-related symptoms reported by patients and by clinicians.

Materials and methods

Participants

Data were considered for inclusion from three closed randomized controlled ovarian cancer trials. Trial 1 (AGO-OVAR3, n = 798) compared the combination of carboplatin/paclitaxel with paclitaxel/cisplatin [17]. Trial 2 (AGO-OVAR5, n = 1,308) compared carboplatin/paclitaxel and epirubicin with carboplatin/paclitaxel [18]. Trial 3 (AGO-OVAR7, n = 1,282) compared carboplatin/paclitaxel followed by topotecan with carboplatin/paclitaxel [19]. These controlled trials were conducted within the German AGO Ovarian Cancer Study Group; two of them were intergroup trials with the French GINECO. The primary objectives of these trials were to evaluate the effects of the chemotherapy regimens, all of which showed comparable antitumor efficacy. Details of the trial designs and the results have been published previously [18–20]. Secondary endpoints of all trials included toxicity, QoL, and response to treatment.

Measures

The National Cancer Institute CTC (version 2.0) was used to assess toxicity symptoms related to chemotherapy [1]. Adverse events and toxicities were graded by study investigators. Each symptom was rated on a 4- or 5-point grading scale with 0 indicating an absence of toxicity and grade 3 or 4 indicating the most severe toxicity. All observed toxic effects were recorded continuously; blood chemistry parameters were measured before each chemotherapy cycle and hematologic parameters were measured weekly.

The EORTC QLQ-C30 was used to assess patients’ QoL. When the trials were designed, no specific QoL module for ovarian cancer was available. The QLQ-C30 includes five functional scales, three symptom scales, a global QoL/health status scale, and six single items. Most of the questions use a 4-point Likert scale ranging from 1 “not at all” to 4 “very much,” except the items on global QoL/health status, which are scored from 1 to 7. The QLQ-C30 meets the standards for reliability [16]. All scores were linearly transformed to 0–100 and analyzed according to the scoring manual [21]. Higher scores on the functioning scales and the global QoL/health status scale indicate a higher level of functioning and a better QoL. Higher scores on the symptom scales represent a higher level of symptoms. The EORTC QLQ-C30 was completed within 3 weeks after cycles 1, 3, and 6 and at every 6-month follow-up. Informed consent was obtained from all patients before randomization. The trials were designed in accordance with good clinical practice guidelines, the German drug laws, and the Declaration of Helsinki.
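The linear transformation to 0–100 can be illustrated with a minimal sketch. It assumes the standard EORTC scoring approach, in which the raw score is the mean of a scale's items and functional scales are reversed so that higher values mean better functioning. The function name, its parameters, and the item responses are hypothetical, and the missing-item rules of the scoring manual are not handled here.

```python
from statistics import mean


def qlq_c30_score(item_responses, item_range, functional):
    """Linearly transform raw item responses to a 0-100 scale score.

    item_range is the difference between the highest and lowest possible
    response (3 for the 1-4 items, 6 for the 1-7 global QoL items).
    Assumption: no missing items; the scoring manual's rules for
    incomplete scales are not implemented in this sketch.
    """
    raw_score = mean(item_responses)
    scaled = (raw_score - 1) / item_range
    if functional:
        # Functional scales are reversed: a high score means good functioning.
        scaled = 1 - scaled
    return scaled * 100


# Hypothetical responses for one patient (illustration only).
physical_functioning = qlq_c30_score([1, 2, 1, 1, 2], item_range=3, functional=True)
nausea_vomiting = qlq_c30_score([2, 3], item_range=3, functional=False)
global_qol = qlq_c30_score([5, 6], item_range=6, functional=False)
```

With this convention, a symptom scale answered with the lowest category on every item scores 0, and a functional scale answered the same way scores 100.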

Data and statistical analysis

The statistical analysis of the clinical trial data was restricted to patients who received six cycles of chemotherapy and had valid CTC and QoL assessments at cycles 3 and 6. Descriptive statistics were used to describe the demographic and clinical characteristics of the participants. Spearman’s rank correlation coefficients were calculated to assess the association between the CTC raw scores and the QLQ-C30 raw scores. Correlation coefficients of <0.30 indicate a weak relationship, 0.30–0.50 a moderate relationship, and >0.50 a high relationship [22]. Five toxicity symptoms (nausea, emesis/vomiting, pain, dyspnea, and constipation) allowed a direct comparison since clinicians and patients rated the same symptoms on the CTC scale and on the QLQ-C30, respectively. These symptoms were matched, and the proportion of patients for whom clinicians and patients provided an identical grade (exact agreement) was evaluated for each symptom. The proportion of disagreement for each symptom (one-, two-, or three-point deviation) was also assessed.
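As an illustration of the correlation analysis, the following sketch computes Spearman’s rank correlation as the Pearson correlation of average ranks (so that tied grades are handled) and labels the result using the thresholds given above. The function names and the example data are hypothetical; this is not the software used in the trials, which the text does not specify.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Label a coefficient using the thresholds stated in the text."""
    r = abs(rho)
    if r < 0.30:
        return "weak"
    if r <= 0.50:
        return "moderate"
    return "high"


# Hypothetical paired ratings for four patients (illustration only).
ctc_grades = [0, 0, 1, 2]      # clinician-rated CTC grade
qlq_responses = [1, 2, 2, 4]   # patient-rated QLQ-C30 item response
rho = spearman_rho(ctc_grades, qlq_responses)
label = strength(rho)
```

Using average ranks matters here because both scales have only a few categories, so ties are the rule rather than the exception.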

Results

Participant characteristics

The baseline characteristics of the study participants are summarized in Table 1. The mean age was 57.5 years. The majority of patients (74.1%) were diagnosed with stage III disease. Almost half of the study participants (49.1%) received carboplatin/paclitaxel as the standard treatment. Overall, 2,110 of the 3,048 randomized patients (69%) had valid CTC and QoL assessments to be included in this analysis (Fig. 1).

Table 1 Characteristics of the participants with complete toxicity and quality of life data at cycles 3 and 6 (N = 2,110)
Fig. 1 CONSORT diagram for patients with ovarian cancer randomized in three clinical chemotherapy trials

A previous analysis showed that patients who completed QoL questionnaires at all assessment times did not differ from dropouts in terms of QoL at baseline, stage of disease, and survival [23]. Table 2 shows grade 2 or greater symptoms captured by the CTC reporting system. Leukopenia and neutropenia grade ≥3 were the most frequently reported hematologic toxicities. Alopecia, nausea, emesis/vomiting, constipation, peripheral neuropathy, myalgia, pain, and dyspnea were the most frequently reported non-hematologic symptoms. Clinical results of the three randomized trials have been reported previously [18–20].

Table 2 Toxicities captured by the CTC grading system

Association between patient-reported quality of life and clinician-rated toxicity

The correlations between patient-reported QoL and clinician-reported toxicity are displayed in Table 3. A severe adverse event was considered present if it was documented grade ≥2 for hematologic toxicities or grade ≥3 for non-hematologic toxicities. Only adverse events of at least 5% incidence were included. The correlations between the QLQ-C30 functioning scales and the CTC grading were weak (<0.30). Moderate correlations were found between the QLQ-C30 symptom scales and the CTC ratings. The coefficients ranged from 0.32 to 0.49 after cycle 3 for nausea, vomiting, constipation, pain, and dyspnea. The correlations were similar after cycle 6, ranging from 0.31 to 0.46, except for constipation, which showed a high correlation (0.55). In summary, the correlations between toxicity and the various QoL functioning domains were low, whereas most correlations on a symptom level were moderate.

Table 3 Spearman rank correlation coefficients between quality of life and toxicity

We further explored the five toxicity symptoms (nausea, emesis/vomiting, pain, dyspnea, and constipation) reported on the CTC scale that had corresponding items on the QLQ-C30. Table 4 shows the percent agreement between the physician ratings (CTC) and the patient ratings (QLQ-C30) after cycles 3 and 6. Exact agreement between clinician and patient reporting ranged from 54.2% to 77.5% after cycle 3 and from 54.7% to 80.8% after cycle 6. The highest proportion of agreement was found for emesis/vomiting after cycles 3 and 6; the lowest was found for pain after cycle 3 and for dyspnea after cycle 6. When symptom grading differed, patients reported greater severity for pain, constipation, and dyspnea. However, clinicians graded emesis/vomiting and nausea as more severe than patients did. Disagreements of two or three points occurred in fewer than 5% of cases for all symptoms when physicians provided more severe ratings and in fewer than 15% when patients provided more severe ratings (not shown).
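The agreement figures can be tallied as in the following minimal sketch. The text does not state how the CTC grades and the QLQ-C30 response categories were aligned, so the sketch assumes, purely for illustration, that QLQ-C30 responses 1–4 are shifted to 0–3 and CTC grades are capped at 3. The function name and the ratings are hypothetical.

```python
from collections import Counter


def agreement_summary(clinician_grades, patient_grades):
    """Proportions of exact agreement and of deviations by direction.

    Deviations are computed as patient grade minus clinician grade, so a
    positive value means the patient reported greater severity.
    """
    diffs = [p - c for c, p in zip(clinician_grades, patient_grades)]
    n = len(diffs)
    counts = Counter(diffs)
    return {
        "exact": counts[0] / n,
        "patient_more_severe": sum(v for d, v in counts.items() if d > 0) / n,
        "clinician_more_severe": sum(v for d, v in counts.items() if d < 0) / n,
        "two_or_more_points": sum(v for d, v in counts.items() if abs(d) >= 2) / n,
    }


# Hypothetical ratings of one symptom for five patients (illustration only).
qlq_responses = [1, 2, 4, 1, 3]   # QLQ-C30 item, 1 "not at all" to 4 "very much"
ctc_grades = [0, 1, 1, 1, 2]      # clinician CTC grade

# Assumed alignment (not specified in the text): shift 1-4 to 0-3, cap CTC at 3.
patient = [r - 1 for r in qlq_responses]
clinician = [min(g, 3) for g in ctc_grades]

summary = agreement_summary(clinician, patient)
```

Keeping the sign of the deviation makes it possible to report separately how often the patient and how often the clinician gave the more severe rating.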

Table 4 Percent agreement between toxicity symptoms (CTC) and patient-reported symptoms (QLQ-C30)

Discussion

The CTC grading system has a long tradition of reporting adverse events in clinical cancer trials. More recently, patient-reported outcomes have received increased attention in clinical trial settings. In the German AGO Ovarian Cancer Study Group, QoL assessments have been incorporated in most phase III first-line trials in addition to the clinician-reported toxicity. In this study, we analyzed the data of three randomized chemotherapy trials and determined the relationship between clinician-assessed toxicity and patient-reported QoL.

In our study, as in several others [6, 7, 9, 11], the correlations between clinician-rated toxicity and patient-reported QoL functioning status were weak; on a symptom level, the correlations were moderate. Basch et al. [8] reported that patients and clinicians generally agree on the severity of symptoms when using the same questions. Others found that even if the same instrument, e.g., the QLQ-C30, was used, physicians tended to systematically underestimate the patients’ overall QoL, social functioning, and role functioning [10]. Systematic underreporting of symptoms in terms of frequency, severity, and onset by clinicians compared to patients has been documented [9]. In concordance with others [8], we found that agreement for more observable symptoms such as emesis or vomiting was higher than for more subjective symptoms such as pain and dyspnea.

The impact of symptoms on QoL depends on personal values and may vary from patient to patient, which is not captured by the CTC ratings. When patients make an assessment, they probably use an intraindividual strategy and compare their current health condition with their condition before treatment, when they were healthy. In contrast, clinicians may use an interindividual comparison strategy and make their judgment based on their clinical experience. They see a broad range of patients and evaluate whether the patient’s clinical health status has improved or declined. It seems that a physician-reported toxicity symptom recorded with the CTC system may not be the same as a patient-perceived symptom detected with a QoL questionnaire. The CTC system records incidences of severe toxicity that indicate a need for immediate medical intervention, whereas QoL measures are intended to detect problems from the patients’ perspective.

Our study, as well as many others, showed a disparity between clinician-graded symptoms and patient-reported symptoms, which may have clinical implications [5, 7, 11]. Because the CTC guides treatment decisions, patients whose symptoms are underreported may not receive the required supportive care interventions during chemotherapy. There is evidence that patient-reported outcome assessments identify significantly more toxicity symptoms than the CTC reporting system [9]. QoL measurements can detect severe adverse events earlier than the CTC system [13]. Since the CTC has the potential for underreporting, especially of subjective symptoms, the patient’s perspective should be considered the gold standard. Validated QoL measures or patient-reported symptom assessment scales should be used more frequently when considering treatment alternatives [3, 15].

It should be noted that the documentation systems for assessing toxicity and QoL are different, and caution is warranted when interpreting the results. Patients and clinicians may have considered different time frames, which is a limitation of the study. Patient reports using the QLQ-C30 refer to the past week, whereas the physician’s assessment of toxicity may reflect the time period between two chemotherapy cycles. Nevertheless, the data were collected in accordance with common practice in clinical trials. The strengths of the study are that the protocols of the three clinical chemotherapy trials were comparable and used the same inclusion/exclusion criteria. Patients were homogeneous in terms of their clinical characteristics. All patients underwent similar treatment regimens according to a phase III protocol. The timing of QoL assessment and toxicity documentation was consistent, and standardized measurements were used.

Our study results suggest that the inclusion of patient-reported outcome measures in clinical trials may be a useful approach to assess treatment-related symptoms in addition to the common toxicity documentation. Clinicians should not entirely rely on the CTC grading to capture adverse events and information related to patient well-being, but consider patient-reported symptoms or changes in QoL as well.