Introduction

Cushing’s disease (CD) is a rare, devastating endocrine disease caused by an adrenocorticotropic hormone (ACTH)-secreting pituitary adenoma [1]. Of these tumours, 80% can be classified as microadenomas and 20% as macroadenomas. The elevated ACTH levels secreted by these tumours stimulate the adrenal glands to produce excess cortisol, leading to the development of clinical signs and symptoms of hypercortisolism. Patients often suffer from the disease for years before it comes to medical attention and is appropriately diagnosed. The median time between onset of symptoms and final diagnosis is about 2 years [2, 3], ranging from < 1 month to 16 years. CD most commonly affects adults aged 20–50, primarily females. It is associated with severe morbidity and premature mortality [4,5,6,7,8,9,10], mostly due to cardiometabolic comorbidities [6, 11]. Already at the time of diagnosis, 58–85% of patients have hypertension, 32–41% are obese, 20–47% have diabetes mellitus, and 38–71% have dyslipidaemia [12]. Prior prolonged exposure to glucocorticoids may also affect the central nervous system, leading to persistent cognitive and mood alterations such as major depression in 50–81% of patients at the time of diagnosis [9]. Osteoporosis, present in 31–50% of patients [12], and fractures, especially vertebral fractures, can further add to morbidity. Other conditions such as hypopituitarism, including persistent adrenocortical insufficiency after surgery, may also contribute to the increased mortality risk [13]. Finally, chronic exposure to hypercortisolism has a significant impact on patients’ health-related quality of life (HRQoL) in long-term follow-up [14,15,16].

To date, no clinician-reported outcome tool has been developed for managing CD in clinical practice. However, there is a need for such a tool to assess the magnitude of changes in clinical signs and symptoms. This is particularly important because patients present with extremely variable disease courses, ranging from mild symptoms to aggressive disease. In addition, the response to treatment differs considerably between individual patients. According to a recent review, overall initial remission rates in clinical studies ranged from 25 to 100%, with a mean remission rate of 77.8%, and recurrence rates following initial remission ranged from 0 to 65.6%, with a mean recurrence rate of 13.2% [17]. The role of medical treatment has increased in recent years as novel compounds have been developed. For treatment with mifepristone, where 24-h urinary free cortisol (UFC) measurement is not useful for therapy monitoring, the response to treatment has been shown to correlate with clinical features [18]. However, the occurrence and magnitude of symptoms do not always correlate with biochemical activity [19].

The aim of this study was to develop a multidimensional and integrated clinical scoring instrument that encompasses, summarizes and weights appropriately the desired clinical benefits of a treatment for CD.

Methods

Patients

No animals were used in this research project. Our research involved human participants, who were included only if they provided informed consent. Patients with confirmed CD aged 18 years or older were included in the study irrespective of medical history, current or previous treatment and disease activity. The cohort therefore consisted of a mixed population of patients with CD with or without previous surgical treatment, radiotherapy and/or medical treatment. Patients were treated from 2012 to 2016 in two academic centres in Munich participating in the German Cushing’s Registry CUSTODES (Cushing-Syndrom Register-Therapie und Outcome in Deutschland). Both centres are specialized in the treatment of patients with pituitary diseases, and all patients had given written informed consent prior to inclusion to participate in a joint specialized Cushing’s consultation hour, which forms the basis of this study. A total of 99 patients with at least two visits were included in the study, of whom 28 were treated at the Max Planck Institute and 71 at the University Hospital of the Ludwig-Maximilians-University. The study had no bearing on the decision-making process regarding patients’ treatment.

Data were registered in a web-based database, which serves as a regional data platform in a consortium of different clinical and epidemiological centres in Munich (part of the German Cushing’s Registry).

Study design

This was a retrospective study to identify relevant variables composing a clinical CD score. For this purpose, a panel of 42 variables potentially relevant to the clinical course of CD was first predefined by a group of clinical endocrinology experts, taking the relevant literature into account (Table 1). These variables covered a broad range of domains typically impacted by the disease, such as the cardiovascular and metabolic system, neuropsychiatric status, musculoskeletal system, quality of life, clinical signs and symptoms, pituitary function and biochemical variables.

Table 1 Description of the variables assessed

Variables as well as biochemical disease activity were assessed at baseline and at least at one further time point after 12 months. Owing to the real-world clinical setting of this study, follow-up visits did not always take place exactly 12 months after the initial visit. The primary endpoint was the identification of variables whose changes from baseline to the follow-up visit(s) explain well the variance in the alteration of biochemical activity between the two visits. If the alteration of biochemical activity from baseline to follow-up is regarded as an influential factor with categories representing disease status (see the next section for details), the primary endpoint can equivalently be defined as the detection of those clinical variable changes on which disease activity exerts a significant effect.

For validation of the clinical score, should such a score be established by the present study, a prospective, multicentre, European follow-up study was planned in a cohort of patients with active CD before (treatment-naïve) and after treatment initiation.

Assessment of biochemical activity

Biochemical activity was assessed by measuring UFC levels as a surrogate biochemical parameter of clinical benefits [20] in CD. UFC was always collected twice and was assayed with the ADVIA Centaur XP chemiluminescence immunoassay (Siemens, Erlangen, Germany). Biochemical results were classified according to biochemical control, i.e. normal or elevated UFC concentrations according to the respective reference ranges, achieved in two consecutive visits at least 12 months apart (Table 2). In this sense, the biochemical activity that represents disease activity was classified into four categories: category 0 = 1st and 2nd visit with uncontrolled biochemical activity, category 1 = 1st visit with controlled and 2nd visit with uncontrolled biochemical activity, category 2 = 1st visit with uncontrolled and 2nd visit with controlled biochemical activity, category 3 = 1st and 2nd visit with controlled biochemical activity. The study then focused on identifying clinical variables with significant differences between the four categories of disease activity. UFC was not measured where it was not useful, such as in patients with previous bilateral adrenalectomy, or if patients refused a 24-h urine collection. Since changes in UFC levels were the key reference parameter for biochemical disease activity in our predefined study protocol, these patients were excluded from analysis. In the end, biochemical data were available for 138 visit-pairs from only 48 patients.

Table 2 Definition of categories of biochemical activity
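As an illustration only, the four categories can be expressed as a lookup on the control status at the two visits. The sketch below assumes that a visit counts as controlled when UFC does not exceed the upper limit of the reference range. The function names, the example UFC values and the limit are hypothetical and are not taken from the study.

```python
def is_controlled(ufc: float, upper_limit_of_normal: float) -> bool:
    """Assumption: 'controlled' means UFC is not above the upper limit of normal."""
    return ufc <= upper_limit_of_normal


# (controlled at 1st visit, controlled at 2nd visit) -> category as defined in the text
CATEGORY_BY_STATUS = {
    (False, False): 0,  # uncontrolled at both visits
    (True, False): 1,   # loss of biochemical control
    (False, True): 2,   # achievement of biochemical control
    (True, True): 3,    # controlled at both visits
}


def activity_category(first_controlled: bool, second_controlled: bool) -> int:
    """Map the control status of two consecutive visits to category 0-3."""
    return CATEGORY_BY_STATUS[(first_controlled, second_controlled)]


# Hypothetical example values in arbitrary units, not study data.
uln = 100.0
category = activity_category(is_controlled(180.0, uln), is_controlled(60.0, uln))
```

In this hypothetical example the first value lies above the limit and the second below it, so the visit-pair falls into the category describing achievement of biochemical control.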

Statistical analyses

The primary endpoint was evaluated using multivariate analyses of variance (MANOVAs) applied to separate variable sets, each containing the clinical variables of one of the areas listed in Table 1. Biochemical activity was treated as an influential factor with the four categories defined above. Although most clinical variables in Table 1 are evidently not normally distributed after their transformation into ordinal data, applying analyses of variance to them is nonetheless justified, because these analyses are very robust against violations of the normality and homogeneity conditions. If the variance analyses identified a small set of clinical variables with significant differences between the biochemical activity categories (candidates for the scoring system), a discriminant analysis with these candidates was to be performed subsequently in order to assess their discriminative power between the categories. Candidate variables showing high discriminative power in the discriminant analysis were to be declared final components of a multidimensional score system in the treatment of CD.
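To make the multivariate test statistic concrete, the following minimal sketch computes Wilks' lambda, the ratio of the determinant of the within-group sum-of-squares-and-cross-products (SSCP) matrix to that of the total SSCP matrix, for a set of variables grouped by category. It is not the software used in the study, it omits the F approximation needed for a p-value, and all function names are hypothetical. Values close to 1 indicate that the categories barely differ in their mean vectors.

```python
from collections import defaultdict


def _mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sscp(rows, centre):
    """Sum of squares and cross-products of the rows around a centre vector."""
    p = len(centre)
    m = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - c for x, c in zip(row, centre)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def _det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return det


def wilks_lambda(observations, categories):
    """Wilks' lambda = det(within-group SSCP) / det(total SSCP)."""
    groups = defaultdict(list)
    for row, cat in zip(observations, categories):
        groups[cat].append(row)
    p = len(observations[0])
    within = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        s = _sscp(rows, _mean_vector(rows))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    total = _sscp(observations, _mean_vector(observations))
    return _det(within) / _det(total)
```

A full MANOVA would convert this statistic into an approximate F value with the appropriate degrees of freedom; a statistics package is the better choice for that step.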

In all variance analyses (as well as in any subsequent discriminant analyses), pairs of consecutive visits were used as sample units. Although the required independence between sample units may be violated in that case, we accepted this possible inconsistency in favour of better test power, since, as already mentioned, biochemical activity by UFC was measured in only 48 of the 99 patients who participated in the study.
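The construction of visit-pairs as sample units can be sketched as follows. The sketch assumes that a pair consists of two chronologically adjacent visits of the same patient and that a pair is kept only if the visits are at least 365 days apart; the exact pairing rule of the study is not specified here, and the function name, patient identifiers and dates are hypothetical. It also shows why independence may be violated: one patient can contribute several pairs.

```python
from datetime import date


def consecutive_visit_pairs(visits_by_patient, min_gap_days=365):
    """Return (patient_id, first_visit, second_visit) for adjacent visits.

    Assumption: only chronologically adjacent visits form a pair, and the
    pair is kept only if the visits are at least `min_gap_days` apart.
    """
    pairs = []
    for patient_id, visit_dates in visits_by_patient.items():
        ordered = sorted(visit_dates)
        for first, second in zip(ordered, ordered[1:]):
            if (second - first).days >= min_gap_days:
                pairs.append((patient_id, first, second))
    return pairs


# Hypothetical visit dates, not study data.
example_visits = {
    "P1": [date(2012, 1, 10), date(2013, 2, 1), date(2014, 3, 1)],
    "P2": [date(2013, 5, 1), date(2013, 9, 1)],
}
example_pairs = consecutive_visit_pairs(example_visits)
```

Here the hypothetical patient P1 contributes two pairs that share the middle visit, whereas P2 contributes none because the two visits are too close together.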

Note: If a considerably larger sample of patients had been available, an association analysis such as linear or categorical regression would also have been appropriate here. However, the sample of 48 patients with registered UFC biochemical activity was too small for adequate regression analyses with one dependent and 42 independent variables. The sample of 138 visit-pairs also seems inadequate for regression analyses since, in contrast to MANOVA, these analyses are very sensitive to violations of their prerequisites. The expected, partly strong dependencies between consecutive visits for some of the clinical variables considered could be a major obstacle to applying regression analyses here. Similarly, investigating correlations between changes in the clinical variables and changes in UFC biochemical activity could also have helped to identify clinical variables closely associated with UFC activity. With such pairwise correlations, however, possible inter-correlations between the 42 clinical variables could not have been evaluated and properly taken into account and would have had to be ignored.

Results

Patients

A total of 99 patients were enrolled, each with at least two consecutive visits. Altogether, 266 visit-pairs with an interval of more than 12 months between consecutive visits were registered for these patients. Baseline characteristics of the patients are shown in Table 3.

Table 3 Baseline characteristics of the patients (n = 99)

Biochemical activity of CD and response to treatment

Data on biochemical activity measured by UFC were available for 138 (51.9%) of 266 visit-pairs in 48 patients. In 48 (34.8% of 138) visit-pairs in 21 patients, UFC was not controlled at the first visit (defined by UFC above the upper limit of normal), whereas it was controlled in 65.2% of the first visits (defined by normal UFC according to the laboratory-specific reference range). At the consecutive visit after 1 year, patients were controlled in 70.3% of visits and not controlled in 29.7%. Since changes of biochemical activity were expected to translate into changes of clinical disease activity, we specifically identified those patients who newly achieved or lost disease control during follow-up. In 41 (29.7%) visit-pairs of 17 patients, a change in biochemical category was observed between two visits: in 17 (12.3%) visit-pairs, involving 7 patients, from controlled to not controlled (loss of biochemical control as assessed by UFC), and in 24 (17.4%) visit-pairs from uncontrolled to controlled (achievement of biochemical control) (Fig. 1). Patients with a change in biochemical control category in these 41 visit-pairs were the ones mainly expected to show a change of clinical disease characteristics that could potentially form a clinical score.

Fig. 1

Categories of biochemical response between consecutive visits in patients with CD (n = 138 visits)

Biochemical control was absent at both visits in 24 (17.4%) visit-pairs, whereas stable biochemical control at both visits was observed in 73 (52.9%) visit-pairs (Fig. 1).
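The reported percentages can be cross-checked from the four category counts alone. The short sketch below uses only the counts stated in the text (24, 17, 24 and 73 visit-pairs for categories 0 to 3, and 266 registered visit-pairs); the variable and function names are illustrative.

```python
# Visit-pair counts per category as reported in the text (Fig. 1).
pair_counts = {0: 24, 1: 17, 2: 24, 3: 73}
total_pairs = sum(pair_counts.values())
registered_pairs = 266


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "pairs_with_ufc": percent(total_pairs, registered_pairs),
    # Uncontrolled at the first visit: categories 0 and 2.
    "first_visit_uncontrolled": percent(pair_counts[0] + pair_counts[2], total_pairs),
    # Controlled at the first visit: categories 1 and 3.
    "first_visit_controlled": percent(pair_counts[1] + pair_counts[3], total_pairs),
    # Controlled at the second visit: categories 2 and 3.
    "second_visit_controlled": percent(pair_counts[2] + pair_counts[3], total_pairs),
    # Uncontrolled at the second visit: categories 0 and 1.
    "second_visit_uncontrolled": percent(pair_counts[0] + pair_counts[1], total_pairs),
    # Change of category between visits: categories 1 and 2.
    "changed_category": percent(pair_counts[1] + pair_counts[2], total_pairs),
}
```

The marginal percentages at the first and second visit follow directly from summing the relevant categories, which confirms that the figures given above are mutually consistent.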

Sensitivity of clinical variables to changes of biochemical activity

In order to evaluate the sensitivity of clinical variables to changes of biochemical activity, or in other words to test which of the clinical variables are significantly influenced by changes in the biochemical response, variance analyses were applied to different sets of clinical variables as described in the “Statistical analyses” section. Particular interest was focused on the comparisons (a) between categories 0 and 2 (persistently uncontrolled patients versus patients who changed from uncontrolled to controlled), (b) between categories 3 and 2 (patients with stable control versus patients who changed from uncontrolled to controlled) and (c) between categories 3 and 1 (patients with stable control versus patients who changed from controlled to uncontrolled).

The variance analyses revealed no global effect of biochemical activity on any of the considered sets of clinical variables (quality of life dimensions, laboratory, anthropometric, musculoskeletal variables and others), irrespective of the sample units (Wilks multivariate tests of significance; p-values not significant for any variable set). The subsequent investigation of simple effects (which, strictly speaking, was not necessary in the absence of global effects) likewise did not produce a significant effect on any single variable (univariate F-tests within the MANOVAs, p-values n.s.). Consequently, no significant differences in clinical disease variables were observed in the aforementioned comparisons (a), (b) and (c). As an example, Fig. 2 shows the receiver operating characteristic (ROC) curves of the eight SF-36 HRQoL dimension scores and the BDI global score for the comparison of uncured (persistently uncontrolled) patients with cured patients, i.e. patients who changed from uncontrolled to controlled [comparison (a)]. Changes in biochemical activity were not reflected by changes of these variables in these patients.

Fig. 2

Sensitivity of the eight SF-36 HRQoL dimension scores and the total BDI score to changes of biochemical activity. True positive and false positive rates correspond to the familiar terms “sensitivity” and “1 - specificity”, respectively, when factor effects on variables are investigated with ROC curves. For the scores considered, the areas under the corresponding ROC curves did not point to any significant effect. BDI Beck Depression Inventory, HRQoL health-related quality of life, SF short form
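For readers less familiar with ROC analysis, the area under the curve (AUC) can be computed directly as the proportion of pairs in which a value from one group exceeds a value from the other, with ties counted as one half. The sketch below is a generic illustration of that definition, not the procedure used to produce Fig. 2; the function name and the example score changes are hypothetical.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the probability that a positive-group score exceeds a
    negative-group score, counting ties as one half."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical score changes, not study data: the two groups overlap fully.
changes_newly_controlled = [1, 2]
changes_persistently_uncontrolled = [1, 2]
auc_example = roc_auc(changes_newly_controlled, changes_persistently_uncontrolled)
```

An AUC near 0.5 means that the score change does not separate the two groups, whereas values near 1 or 0 indicate strong separation.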

As a consequence of the negative results of the variance analyses, the planned subsequent discriminant analysis was waived, since, apart from some differences in interpretation, both analyses deliver almost the same output.

Discussion

The results in our real-world cohort of patients with CD show biochemical and clinical activity at baseline and during one or more 12-month follow-up interval(s). In about a third of the visits, a change of biochemical activity from controlled to uncontrolled or vice versa was observed between two consecutive visits. However, these changes of biochemical disease activity did not translate into significant changes of clinical disease activity, and statistical analyses of the data could not identify key parameters that discriminate between biochemically cured and non-cured patients, as defined by 24-h UFC concentrations, within one year of follow-up.

Although there is a need for a clinical score reflecting the clinical benefits of a treatment, no such score has been developed to date in patients with CD. Previously, only Nugent et al. generated a Bayesian equation for the differential diagnosis of Cushing’s syndrome using the incidence of signs and symptoms in 211 patients [21]. The clinical features included osteoporosis, central/generalized obesity, weakness, bruising/acne, plethora, coloured striae, edema, hirsutism, oligomenorrhea, headache, abnormal glucose tolerance, age < 35 years, diastolic blood pressure > 105 mmHg, red blood cell volume > 49 fL and serum potassium < 3.6 mEq/L [21]. This tool returned a “confident” diagnosis in 9/38 patients with Cushing’s syndrome and excluded it in 45/93 [21]. However, the equation was never used in clinical practice due to its low positive and negative predictive values.

There are several possible reasons why we failed to identify significant clinical variables reflecting biochemical disease activity in our patients. First, the variables selected a priori may not have been adequate. However, this appears unlikely since they consisted of the key symptoms and features of the disease according to the literature and clinical experience [22, 23].

Second, there is frequently an interval of up to several years between onset of symptoms and diagnosis and treatment, which may already have resulted in alterations that persist despite biochemical remission following surgery. Recent evidence suggests that symptoms often persist despite biochemical remission following surgery, and normalisation of cortisol secretion may not result in complete resolution of comorbidities associated with CD [9, 24]. In particular, increased cardiovascular risk may persist despite long-term remission of hypercortisolaemia [9]. Prior prolonged exposure to glucocorticoids may also have irreversible effects on the central nervous system, leading to persistent cognitive and mood alterations [24]. Even mortality remains increased in patients with CD after initial biochemical remission, suggesting that cure does not directly reverse the metabolic consequences of long-term overexposure to cortisol [9, 12, 13]. Hence, clinical changes between the two visits may have been too small, or absent altogether, for any variable to discriminate between the categories. Put another way, 12 months may have been too short a follow-up period, bearing in mind that changes of variables such as bone mineral density may take much longer to become significant.

Third, more than two-thirds of the patients in our study were in biochemical remission both at baseline and follow-up. Therefore, the number of patients showing a change of biochemical status from active to controlled disease may have been too small to yield significant effects in the variance analyses (MANOVAs).

Finally, the results of our study emphasize the limitations of a multicentre, real-world study. In particular, missing data may have left our web-based study underpowered to detect the true effect sizes. Our results therefore highlight the importance of more rigorous quality control of data acquisition, also in a real-world observational setting.

With regard to pituitary diseases, a clinician-reported outcome tool for managing the disease in clinical practice has so far been developed only for acromegaly. This so-called SAGIT® tool (Signs and symptoms, Associated comorbidities, GH concentration level, IGF-1, Tumor) assesses the status and the evolution of this particular disease and is intended to guide treatment decision-making. Results of a pilot study with 11 endocrinologists applying the score have been published [25]. Like our proposed CD score, the SAGIT® tool relies on symptoms and associated comorbidities but, unlike the proposed CD score, it also includes measures of biochemical disease activity (growth hormone and insulin-like growth factor 1) and tumour characteristics. However, validation in larger cohorts of patients with acromegaly is still lacking. Furthermore, an Acromegaly Disease Activity Tool (ACRODAT®) is in development [26], which also includes patient-oriented outcome measures such as quality of life, similar to the present study. These tools might assist clinicians towards more meaningful patient management in pituitary disease.

In conclusion, it was not possible to construct a clinical scoring system for patients with CD that encompasses, summarizes and appropriately weights the desired clinical benefits of a treatment in our population of patients. Our results highlight the need for thorough patient selection and quality control in future attempts to generate a clinical scoring system. Accordingly, we have initiated a subsequent study investigating the construction of a clinical scoring system specifically in newly diagnosed, treatment-naïve patients with CD.