Low associations between clinician-assessed toxicity and morbidity after antineoplastic treatment and patient-reported symptoms in health-related quality of life (HR-QoL) are consistently described in the literature [4, 6, 14, 16]. Modern oncology research has, therefore, increasingly recognized that the assessment of morbidity in clinical studies should combine traditional morbidity scoring by clinicians with patient-reported scores to give a more complete picture of cancer and treatment-related side effects [4]. Consequently, the number of HR-QoL studies has steadily increased over recent years [11].

In view of the major methodological differences between morbidity scoring and self-reported symptoms in HR-QoL, a certain amount of variation is acceptable and understandable. While morbidity grading systems are supposed to follow more or less objective criteria, patient-reported symptoms are inherently based on a subjective self-evaluation of the impact on quality of life. The aim of this study was, therefore, not to evaluate the general concordance between clinician-assessed and patient-reported symptoms, but to focus on mismatches with high clinical relevance, i.e., obvious discrepancies that suggest underreporting of morbidity.

Within the EMBRACE study, the curative treatment of patients with locally advanced uterine cervical cancer by definitive radio(chemo)therapy, including magnetic resonance imaging (MRI)-guided brachytherapy, is evaluated with regard to clinical outcome: survival, local control, morbidity, and HR-QoL. This ongoing multicenter prospective observational study provides a large patient cohort and allows a comparison between morbidity assessed by clinicians and patient-reported symptoms assessed by HR-QoL questionnaire.

Material and methods

Patients

In July 2011, follow-up data of 306 patients of the ongoing EMBRACE study were available at the time point 3 months after treatment. Of these, a total of 223 patients (73%) had both morbidity and quality of life data completed and were analyzed.

EMBRACE inclusion criteria are a newly biopsy-proven squamous, adeno- or adenosquamous carcinoma of the uterine cervix, FIGO stage IB–IVB (para-aortic metastatic nodes below L1–L2 only), treated with definitive radio(chemo)therapy with curative intent. Informed consent was obtained from all patients, and the study was approved in all participating centers by the respective National Ethics Committee.

Symptom assessment

Early morbidity is assessed prospectively 3 months after treatment with the Common Terminology Criteria for Adverse Events (CTCAE v.3), which is widely accepted throughout the oncology community as the standard classification and severity grading scale for adverse events in cancer therapy clinical trials [12]. The grading system follows the general rules of grade 1 (G1) mild, G2 moderate, G3 severe symptoms, G4 life-threatening consequences, and G5 death related to side effects [12]. In parallel, patients’ quality of life is assessed 3 months after treatment with the EORTC-QLQ-C30 [1] and CX24 questionnaire [7]. Patients are asked about the occurrence of symptoms and are offered four answer categories: “not at all”, “a little”, “quite a bit” and “very much”. CTCAE and EORTC-QLQ overlap in 12 symptoms, which are summarized in Tab. 1. At the time of CTCAE assessment, the clinicians were unaware of the quality of life ratings.

Tab. 1 Overlap between CTCAE v.3 morbidity and patient-reported symptoms by EORTC-QLQ-C30/CX24

Analyses

The SPSS statistical software system (SPSS Inc., Chicago, IL, USA) was used for calculations. All EORTC-QLQ symptoms were assessed at the single-item level, except fatigue syndrome, which combines three items and was recoded. Cases with missing data for a given symptom in either grading system were excluded from analyses involving that variable. No imputations were performed.

The concordance between CTCAE and EORTC-QLQ was analyzed with regard to discrepancies. For each symptom, a mismatch was defined as a CTCAE grade of 0 (no symptom) in a patient who reported substantial symptoms (“quite a bit” or “very much”) in the EORTC-QLQ (Tab. 2).

Tab. 2 Categorization for mismatches, sensitivity and specificity analysis between CTCAE v.3 and EORTC QLQ C30/CX24

The absolute number of mismatches was calculated per symptom, together with the relative percentage in the overall cohort. In addition, the relative percentage of mismatch only in patients reporting substantial problems in the EORTC-QLQ is given (Tab. 3). Analysis of the number of mismatches per patient and per center was performed.

Tab. 3 Summary of results: comparison between CTCAE v.3 and EORTC QLQ C30/CX24

For sensitivity and specificity analysis, both grading systems were categorized into positive or negative scoring, analogous to the mismatches (Tab. 2). Sensitivity and specificity of the CTCAE grading are given for each symptom, with the EORTC-QLQ patient-reported symptoms as the gold standard. Sensitivity was calculated as Se = [true positive/(true positive + false negative)], meaning the probability that the clinician scores morbidity to some degree (G > 0) in CTCAE, given that the patient reports “quite a bit” or “very much” symptoms in the EORTC-QLQ. Specificity was calculated as Sp = [true negative/(false positive + true negative)], meaning the probability of G0 in CTCAE, given that the patient reports no or just “a little” symptoms in the EORTC-QLQ.
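The categorization and the two formulas can be sketched in a few lines of Python. This is only an illustration of the definitions given above, not the SPSS procedure used in the study; the function names and the toy data are hypothetical, and pairs with a missing value are dropped in line with the handling of missing data described earlier.

```python
from collections import Counter

# EORTC-QLQ answers counted as "positive" (substantial symptoms).
SUBSTANTIAL = {"quite a bit", "very much"}


def classify(ctcae_grade, qlq_answer):
    """Return TP, FN, FP or TN, taking the patient report as gold standard."""
    clinician_positive = ctcae_grade > 0
    patient_positive = qlq_answer in SUBSTANTIAL
    if patient_positive and clinician_positive:
        return "TP"
    if patient_positive:
        return "FN"  # CTCAE G0 despite substantial symptoms: a mismatch
    if clinician_positive:
        return "FP"
    return "TN"


def sensitivity_specificity(pairs):
    """Compute (Se, Sp, number of mismatches) for one symptom.

    `pairs` holds (CTCAE grade, EORTC-QLQ answer) tuples; pairs with a
    missing value (None) are skipped, no imputation is done.
    """
    counts = Counter(
        classify(grade, answer)
        for grade, answer in pairs
        if grade is not None and answer is not None
    )
    tp, fn, fp, tn = (counts[k] for k in ("TP", "FN", "FP", "TN"))
    se = tp / (tp + fn) if tp + fn else None
    sp = tn / (tn + fp) if tn + fp else None
    return se, sp, fn


# Hypothetical toy data for a single symptom (not study data).
toy_pairs = [
    (1, "quite a bit"), (2, "very much"),      # TP
    (0, "very much"), (0, "quite a bit"),      # FN (mismatches)
    (0, "not at all"), (0, "a little"), (0, "not at all"),  # TN
    (1, "a little"),                           # FP
    (None, "a little"),                        # missing grade, skipped
]
print(sensitivity_specificity(toy_pairs))
```

Returning the false-negative count alongside Se and Sp makes explicit that the mismatches analyzed here are exactly the false negatives of this categorization.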

Results

Patients’ characteristics

A total of 223 patients from 14 EMBRACE centers were included in the analysis, with a median age of 49 years (range 27–83 years). Tumor stage according to FIGO classification was stage IB: 36, IIA: 18, IIB: 111, IIIA: 3, IIIB: 46, IVA: 5, and IVB: 2. WHO performance status before treatment was PS0 in 169 patients, PS1 in 46, PS2 in 5, and PS3 in 1 patient. The treatment included pelvic external beam radiotherapy, concomitant chemotherapy, and MRI-guided brachytherapy. At the time point of analysis 3 months after end of treatment, 16 of the 223 patients had persistent disease.

Discrepancies between CTCAE v.3 and EORTC-QLQ C30/CX24

The total number of substantial (EORTC-QLQ positive) symptoms reported by patients was 360. Of these, 159 (44%) were mismatches between CTCAE and EORTC-QLQ, found in 88 patients.

Symptoms with the highest occurrence of mismatches (over 10%) are urinary frequency, fatigue, and insomnia. In addition, in the subgroup of 73 sexually active patients, 15.1% mismatches in vaginal dryness and the feeling of shortening and tightening of the vagina were found. Results are described in detail in Tab. 3. A graphical example of the association and the mismatches regarding urinary frequency is provided in Fig. 1.

Fig. 1 Scatter dot plot of the association between CTCAE v.3 and EORTC-QLQ regarding urinary frequency, absolute numbers per category given, mismatches circled

In 39% of all patients (88/223), at least one mismatch was found. One mismatch was present in 47 patients, two mismatches in 25 patients, three in 7 patients, four in 4 patients, and five in 5 patients.
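The per-patient distribution above is consistent with the totals reported earlier (88 patients, 159 mismatches), which a short check makes explicit. The variable names are illustrative only; the numbers are taken directly from the text.

```python
# Number of patients (value) with a given number of mismatches (key),
# as reported in the text.
mismatches_per_patient = {1: 47, 2: 25, 3: 7, 4: 4, 5: 5}
cohort_size = 223

patients_with_mismatch = sum(mismatches_per_patient.values())
total_mismatches = sum(k * n for k, n in mismatches_per_patient.items())
share_of_cohort = patients_with_mismatch / cohort_size

print(patients_with_mismatch)            # 88
print(total_mismatches)                  # 159
print(f"{100 * share_of_cohort:.1f}%")   # 39.5%
```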

The five major participating centers, each of which contributed 20–40 patients, were compared regarding the number of patients with at least one mismatch. Large institutional differences were found: center 1 had mismatches in only 4% of patients, center 2 in 13%, center 3 in 24%, center 4 in 54%, and center 5 in 71% of patients (relative to the number of patients included by the center).

Overall sensitivity is moderate (around 50% with some exceptions) and specificity is high (around 80–90%) in nearly all symptoms (see details in Tab. 3). The sensitivity of 0% in hemorrhage/bleeding is due to just 1 patient reporting “quite a bit” of blood in stools, who was graded G0 in CTCAE. Of 17 patients reporting relevant problems in bowel control, only 2 were graded for anal incontinence in CTCAE, resulting in a low sensitivity of 11.8%. Low specificity was found in fatigue syndrome (68.1%) and vaginal dryness (74.5%): even when reporting no or just a little symptoms, a substantial number of patients received a CTCAE grading to some degree.

Discussion

The aim of this study was to focus on mismatches with high clinical relevance, which suggest underreporting of early morbidity 3 months after end of treatment. If the patient reports symptoms as being “quite a bit” or “very much” and this is not recognized by the CTCAE scoring (G0), there is an obvious discrepancy.

This approach takes into account that it is not possible to translate the symptoms occurring “a little”, “quite a bit” or “very much” into CTCAE G1, 2, 3 grading. Neither grading system follows a linear association. A certain amount of variation between CTCAE scoring and patient-reported EORTC-QLQ symptoms is quite understandable. In the majority of the 12 items investigated, the CTCAE grading relies partly on patients’ statements, and symptoms are not directly observable or measurable. For example, diarrhea can only be graded by the number of stools per day the patient reports, which is then combined by the clinician with the medical intervention needed. The only exception is limb edema, which has a truly objective definition of inter-limb discrepancy in volume or circumference. Several items (fatigue, insomnia, and hot flashes) rely only on the patients’ subjective evaluation and report. The patients’ report of symptoms strongly depends on individual factors, like psychological coping strategies, the patient–clinician relationship, communication factors like interpersonal sympathy and trust, and the setting of the medical encounter.

In total, with 223 patients and 12 overlapping items, there are 2,676 possibilities for concordance. Thus, 159 mismatches represent a 6% overall discrepancy. But taking into account that the best agreement between patient and clinician on the rating is the absence of symptoms [8], we selected the patients with substantial subjective problems. Overall, 360 EORTC-QLQ positive symptoms were reported and 159 of them were not recognized by CTCAE grading. This leads to an overall underestimation of morbidity in 44%, given that the patient experiences symptoms “quite a bit” or “very much”.
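The two percentages in this paragraph share the same numerator and differ only in the denominator, which the following short calculation makes explicit. It is a plain arithmetic check using the figures from the text; the variable names are illustrative, and the count of possible comparisons ignores missing items, as the text does.

```python
patients = 223
overlapping_items = 12
mismatches = 159
substantial_symptoms = 360  # EORTC-QLQ "quite a bit" or "very much"

# Denominator 1: every patient-item pair in the cohort.
possible_comparisons = patients * overlapping_items
overall_pct = 100 * mismatches / possible_comparisons

# Denominator 2: only symptoms the patient rated as substantial.
conditional_pct = 100 * mismatches / substantial_symptoms

print(possible_comparisons)        # 2676
print(f"{overall_pct:.1f}%")       # 5.9%
print(f"{conditional_pct:.1f}%")   # 44.2%
```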

Several explanations for the mismatches can be hypothesized, either related to patients or related to clinicians. We expected some patients to underreport morbidity consistently during the medical encounter (due to interpersonal factors), leading to a high frequency of mismatches in a few patients. Our results, showing a wide range in the number of mismatches per patient, do not clearly support this explanation (although this cannot be verified statistically).

On the other hand, we found large institutional differences (two centers with 4% vs. 71% of patients with at least one mismatch), which leads to the assumption that some EMBRACE participating centers tend to underestimate early morbidity. The five centers analyzed are from five different countries in Europe, and sociocultural differences may have contributed to the diverging results. This is also supported by Laugsand et al. [10], who found “substantial variations between countries” in a large multicenter study of patient–clinician agreement. One possible explanation of clinician-related mismatches, described by Dische in 2003 [5], is the tendency in clinical studies to put more emphasis on identifying severe G3/4 than milder morbidity. Klee et al. [9] even conclude that quality of life includes information about milder morbidity that is not usually captured in the clinicians’ morbidity scoring systems. In addition, it may be questioned how generally or specifically some clinicians ask about the symptoms and how much information is altered or lost in the communication process [4, 13].

In our study, some symptoms seem to be prone to mismatches, especially urinary frequency/urgency, fatigue syndrome, and insomnia. Vistad et al. [16] compared clinician-assessed RTOG/EORTC morbidity and LENT SOMA questionnaires of 147 cervical cancer patients more than 5 years after radiotherapy. They found a slightly higher proportion of mismatches (17%) regarding bladder symptoms (combining five urinary symptoms), where patients reported severe and clinicians reported no problems.

Fatigue syndrome and insomnia are not organ-related symptoms and rely entirely on subjective interpretation, as they cannot be described consistently in concrete observations. Therefore, they often remain under-recognized [2, 3]. Nevertheless, they are known to be among the most common and distressing symptoms in cancer [3], with profound effects on quality of life [15]. Similar results to our study were found by Laugsand et al. [10], who report underestimation by clinicians regarding fatigue in 13.4%, insomnia in 10.2% and diarrhea in 7.3%.

For the vaginal symptoms, conclusions have to be drawn very carefully, because a subgroup of patients reported vaginal problems only in association with sexuality, while CTCAE grading relies on objective findings of the gynecological examination. It is not clear how vaginal dryness is recorded during examination, either by asking the patient (who will probably just report problems in case of sexual activity) or by relying on the clinician’s perception (which is probably not correlated with the lubrication due to sexual arousal). The same problem can be hypothesized regarding vaginal stenosis. It can be observed objectively, but the feeling of vaginal shortening and tightening strongly depends on sexual intercourse and the male partner.

Overall sensitivity in our study is moderate as expected, taking into account the true positive and the false negative (which we defined as mismatches); our results are comparable to Fromme et al. [6] (fatigue 62%, insomnia 36%, diarrhea 60%). Overall specificity is around 80–90% in all symptoms, again reflecting the fact that the best agreement between patient and clinician on the rating is the absence of symptoms [8].

Of course, we also found mismatches in the other direction (patient reports no or just a little symptoms, CTCAE shows a G > 0), which we do not report here in detail. They can be explained by the strict definition we used to categorize both grading systems and mainly stem from patients reporting the symptom “a little”, which is graded as G1 in CTCAE.

Conclusion

Analysis of mismatches indicates a high risk of underestimation of morbidity: nearly half of the substantial patient-reported symptoms in the EORTC-QLQ are not recognized by the clinician in CTCAE scoring to any degree (G0). These discrepancies seem to be mainly due to institutional differences. For the prospective assessment of symptoms in clinical studies, it is therefore essential to integrate patient-reported symptoms into the traditional scoring systems, in order to obtain a complete and comprehensive picture of the symptom burden.