Introduction

Endogenous Cushing’s Syndrome (CS) is associated with increased morbidity and mortality [1, 2]; therefore, its prompt recognition in high-risk populations is essential. Diagnosing CS can be challenging in clinical practice, because cortisol-related signs and symptoms overlap with those of metabolic syndrome or psychiatric disorders, conditions that may themselves be associated with abnormalities in the ACTH–cortisol axis, defined as functional hypercortisolism [3, 4].

The Endocrine Society’s guidelines recommend screening for CS using one of the following tests, because they evaluate common features of CS: the corticotroph feedback to cortisol with the 1-mg dexamethasone suppression test (1-mg DST); the cortisol rhythm with the late night salivary cortisol (LNSC) assay; and the daily cortisol excretion with 24-h urinary-free cortisol (UFC) measurement [5]. CS is a rare disease, whereas its signs and symptoms are very common; screening tests should therefore prioritize sensitivity (SE) over specificity (SP).

Salivary cortisol is increasingly used to assess adrenal-related disorders [6]. Since the impairment of the circadian cortisol rhythm is a characteristic marker of CS [3, 5], LNSC seems the best choice to screen for CS because of its non-invasive, stress-free, and easy collection in outpatients [6,7,8,9,10,11]. Chemiluminescent immunoassays (CLIA) are used worldwide in clinical chemistry to measure cortisol; however, they suffer from cross-reactivity with other steroids (especially cortisone), which increases the number of false-positive results. The Endocrine Society guidelines suggest using mass spectrometry [5]; however, its use in clinical practice is limited [12], and some authors advocate studying the real diagnostic accuracy of chemiluminescence before discarding it, since the diagnostic accuracy of LNSC measured with liquid chromatography–tandem mass spectrometry (LC–MS/MS) is not clearly superior [13].

The aims of our study were: a) to study the diagnostic accuracy of LNSC in a large prospective series of consecutive patients; b) to assess the SP of LNSC in different control groups; and c) to compare the results obtained with CLIA with those obtained with LC–MS/MS.

Materials and methods

Patients and protocol

We prospectively evaluated 281 consecutive patients, from November 2010 to December 2014, routinely attending the Division of Endocrinology in Ancona:

  (a) Control group: 117 subjects with uni- or multinodular goitre. There were 51 males and 66 females, mean age was 45 years (range 23–72), and mean body mass index (BMI) was 22.8 ± 1.3 kg/m². Exclusion criteria were BMI < 20 or > 25 kg/m², mental disorders, pregnancy or puerperium, and current steroid or estrogen–progestin treatment.

  (b) Suspected CS: 164 patients, selected on the basis of clinical signs or symptoms. The presence of at least three common features of hypercortisolism (summarized in Fig. 1), BMI > 30 kg/m², uncontrolled diabetes mellitus, or adrenal incidentaloma was sufficient to suspect CS. This group comprised 52 males and 112 females; mean age was 43 years (range 18–79) and mean BMI was 30.8 ± 8.5 kg/m². In subjects with adrenal incidentaloma, imaging features of adrenal malignancy and biochemical evidence of primary aldosteronism or pheochromocytoma were exclusion criteria.

    Fig. 1 Prevalence of signs and symptoms in the suspected-CS group

All subjects were outpatients and collected one saliva sample between 23:00 and 24:00; written instructions were provided to ensure proper saliva collection.

Patients with increased LNSC, high clinical suspicion of CS, or adrenal incidentaloma were hospitalized to measure UFC (three collections on consecutive days) and to assess midnight serum cortisol (MSC) and serum cortisol suppression after 1-mg DST. Serum and urinary cortisol levels were measured with an electro-chemiluminescence immunoassay (ECLIA Roche, Modular E170, Indianapolis, IN, USA).

To exclude possible interferences, patients had to avoid glucocorticoid or estrogen drugs for at least 8 weeks; saliva was collected at least 60 min after smoking, eating, drinking, or brushing teeth; and hepatic and renal diseases were excluded in all subjects (healthy controls and suspected CS). If a patient was taking a drug known to affect the tests that could not be discontinued, we discarded the results. To avoid false-negative results, all non-CS patients were re-assessed after 12 months to exclude cyclical CS.

Saliva was collected into a commercial polyester-based sampling device with citric acid (Salivette® Sarstedt, Nümbrecht, Germany); all samples were stored at 4 °C in the patient’s fridge immediately after collection, until delivery to the laboratory. LNSC was measured with a previously described chemiluminescence assay (CLIA automated method Access, Beckman Coulter, Brea, CA, USA) [14]. Briefly, the method is linear over 1–1655 nmol/L, the intra-assay coefficient of variation is < 7%, and cross-reactivity with other steroids is 2.1% with corticosterone, 5.3% with 17-hydroxyprogesterone, 8% with cortisone, and 0.04% with dexamethasone. At least 25 µL of saliva was needed to measure cortisol with CLIA, and this volume was available in all subjects (at least 200 µL of saliva was collected from every patient).

In selected cases (increased LNSC with CLIA, or high clinical suspicion of CS despite a normal CLIA result), a previously described [15] liquid chromatography–tandem mass spectrometry (LC–MS/MS) method, using an Agilent Technologies 1100 series HPLC and a triple quadrupole mass spectrometer (Santa Clara, California, USA), was used to confirm the result.

An LNSC value > 16 nmol/L with CLIA, or > 3.3 nmol/L with LC–MS/MS, was considered the threshold for CS diagnosis. These thresholds had been previously calculated in an independent cohort (122 healthy volunteers and 28 CS patients, not included in the present manuscript to rule out any interpretation bias); SE and SP were 97.2%/98.1% with CLIA (AUC 0.973) and 95.3%/97.4% with LC–MS/MS (AUC 0.947, unpublished data).

In accordance with the STARD (Standards for Reporting Diagnostic accuracy studies) criteria, the reference standard was confirmed endogenous hypercortisolism. Briefly, Cushing’s Disease (n = 36) was diagnosed in patients with positive ACTH immunostaining of the pituitary adenoma, an ACTH pituitary/peripheral gradient > 3 after CRH stimulation in petrosal sinus sampling, or at least two of the following criteria: an 80% decrease of serum cortisol after 8-mg DST; a ≥ 50% rise in ACTH or ≥ 20% rise in cortisol levels after the CRH stimulation test; MRI confirmation of a pituitary adenoma ≥ 6 mm; and long-term remission after pituitary surgery. The remaining ACTH-dependent CS patients had ectopic CS (n = 2). Adrenal CS (n = 9) was diagnosed in patients with ACTH levels < 10 ng/L and evidence of an adrenal lesion.

All patients gave their informed consent; the local Ethics Committee approved the protocol, and the study was conducted in accordance with the principles of the Declaration of Helsinki.

Statistical analyses

Continuous data and salivary cortisol concentrations are shown as mean and standard deviation. We first assessed the normality of distribution using the Kolmogorov–Smirnov Z test, and then compared the groups using Student’s t test for unpaired data, with the P value adjusted according to Levene’s test for equality of variances; Spearman’s r was used to assess correlation among variables. SE and SP were calculated at different cut-off levels to perform Receiver Operating Characteristic (ROC) analyses, and positive and negative predictive values (PPV and NPV, respectively) were calculated. We also calculated the positive and negative likelihood ratios (LRpos and LRneg), since they are independent of disease prevalence. An LRpos > 10 indicates a large and often conclusive increase in the likelihood of disease, values of 5–10 a moderate increase, and values of 2–5 a small increase. LRneg values of 0.2–0.5 indicate small, 0.1–0.2 moderate, and < 0.1 conclusive decreases in the likelihood of disease [16]. The 95% CI for LRpos and LRneg was calculated with the method of Simel et al. [17].
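The accuracy measures listed above can be made concrete with a short sketch. The Python function below is illustrative only: it derives SE, SP, PPV, NPV, LRpos, and LRneg from a 2 × 2 table and computes 95% confidence intervals for the likelihood ratios with the log method attributed to Simel et al. [17]. The function name and the example counts are hypothetical; they are not the study data, and the analyses in this study were run in SPSS rather than with this code.

```python
import math


def diagnostic_accuracy(tp, fn, fp, tn, z=1.96):
    """Summarize a 2 x 2 diagnostic table (all four cells must be > 0)."""
    se = tp / (tp + fn)        # sensitivity
    sp = tn / (tn + fp)        # specificity
    ppv = tp / (tp + fp)       # positive predictive value
    npv = tn / (tn + fn)       # negative predictive value
    lr_pos = se / (1 - sp)     # positive likelihood ratio
    lr_neg = (1 - se) / sp     # negative likelihood ratio

    # Log method: standard error of ln(LR), then back-transform the limits.
    se_ln_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_ln_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))

    def ci(lr, se_ln):
        return lr * math.exp(-z * se_ln), lr * math.exp(z * se_ln)

    return {
        "SE": se, "SP": sp, "PPV": ppv, "NPV": npv,
        "LRpos": lr_pos, "LRpos_CI": ci(lr_pos, se_ln_pos),
        "LRneg": lr_neg, "LRneg_CI": ci(lr_neg, se_ln_neg),
    }


# Hypothetical counts for illustration only (NOT the study data).
result = diagnostic_accuracy(tp=45, fn=5, fp=20, tn=180)
for name, value in result.items():
    print(name, value)
```

In this sketch PPV and NPV change if the proportion of diseased subjects in the table changes, whereas LRpos and LRneg depend only on SE and SP, which is the reason given above for reporting them.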

Linear regression was used to examine the relationship between UFF or UFF/UFE and age. Statistical analysis was performed using the SPSS 16 software package (SPSS, Inc, Chicago, IL, USA). The significance level was set at a p value < 0.05 for all the tests.

Results

Patients with suspected CS, confirmed CS, and non-CS presented higher LNSC levels than the control group (Table 1, Fig. 2). Only one subject with a final diagnosis of CS, as depicted in Fig. 3, presented normal LNSC. On the other hand, we found a large number of false-positive results: 35 out of 81 subjects with increased LNSC were non-CS (15 diabetic and 20 obese patients). LNSC was similar between the control group and patients with adrenal incidentaloma. The diagnostic accuracy of LNSC (and of the other tests, when performed according to our protocol) is summarized in Table 2. All patients with CS presented at diagnosis with unsuppressed serum cortisol after 1-mg DST; only one had normal MSC (with UFC and LNSC both increased) and one had normal UFC (with impaired cortisol rhythm).

Table 1 Late Night Salivary Cortisol (LNSC) levels measured with chemiluminescence (CLIA) assay in the whole cohort

Fig. 2 Scatter plot of LNSC levels (measured with CLIA; the dotted line represents the 16 nmol/L cut-off) in the different populations: 117 control group and 164 suspected CS (obesity n = 61, diabetes n = 27, adrenal incidentaloma n = 29, CS n = 47)

Fig. 3 Study design

Table 2 Diagnostic accuracy of Late Night Salivary Cortisol (LNSC, measured with chemiluminescence assay, CLIA) and other first-line tests for Cushing’s Syndrome (CS) in our patients, according to selection criteria

Considering non-CS patients (n = 75, 40 with normal and 35 with increased LNSC), the number of false-positive results was high: fewer than half of the patients had normal results in all the first-line screening tests (LNSC, MSC, UFC, and 1-mg DST); 44% had increased UFC, 27% impaired MSC, and 15% unsuppressed serum cortisol after 1-mg DST. Combining all tests did not improve diagnostic performance.

Overt hypercortisolism was excluded in all patients with an adrenal incidentaloma. Only one subject presented with slightly increased LNSC (18.3 nmol/L); nevertheless, cortisol suppression after 1-mg DST, UFC, and MSC were normal. Two patients with adrenal incidentaloma presented an impaired serum cortisol rhythm (increased MSC), and only one showed sub-optimal cortisol suppression after DST (72 nmol/L, thus indicating subclinical hypercortisolism). Overall, another four patients had serum cortisol of 50–138 nmol/L after 1-mg DST, but all the other first-line screening tests were normal, and clinical signs or symptoms of overt hypercortisolism were absent.

Considering 16 nmol/L as the threshold for CS diagnosis, LNSC overall showed an SE of 97% (95% CI 81.7–99.3) and an SP of 84% (95% CI 77.2–87.1) in the whole group of subjects, with an LRpos of 5.56 (95% CI 4.14–7.46) and an LRneg of 0.045 (95% CI 0.007–0.307).

However, when we considered only the non-CS group (patients with an increased likelihood of having CS) as the comparison, the number of false-positive results increased, and the SP therefore decreased to 70% (95% CI 59.8–76.3), with an LRpos of 3.071 (95% CI 2.328–4.051) and an LRneg of 0.054 (95% CI 0.008–0.371). SP dropped to 60% (95% CI 49–68.3) if we excluded patients with adrenal incidentaloma, with an LRpos of 2.346 (95% CI 1.823–3.02) and an LRneg of 0.063 (95% CI 0.009–0.433). We therefore re-computed the LNSC threshold by comparing the CS group with the non-CS group: raising the cut-off to 21.9 nmol/L would increase SP to 77%, at the cost of reducing SE to 92%.
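The trade-off between SE and SP when the cut-off is moved can be sketched by recomputing both measures at each candidate threshold. The snippet below is a minimal illustration with hypothetical LNSC values (they are not the study data, and the function name is an assumption); a result is treated as positive when LNSC is strictly above the cut-off, following the "> 16 nmol/L" convention used for CLIA.

```python
def se_sp_at_cutoff(cs_values, non_cs_values, cutoff):
    """Return (SE, SP) when values strictly above the cut-off count as positive."""
    true_positives = sum(v > cutoff for v in cs_values)
    true_negatives = sum(v <= cutoff for v in non_cs_values)
    return true_positives / len(cs_values), true_negatives / len(non_cs_values)


# Hypothetical LNSC values in nmol/L, for illustration only (NOT the study data).
cs = [17.5, 23.0, 25.4, 31.0, 40.2, 55.1, 62.3, 80.0]
non_cs = [4.1, 6.3, 8.8, 10.2, 12.5, 15.0, 17.1, 18.9, 20.4, 24.0]

for cutoff in (16.0, 21.9):
    se, sp = se_sp_at_cutoff(cs, non_cs, cutoff)
    print(f"cut-off {cutoff} nmol/L: SE {se:.1%}, SP {sp:.1%}")
```

With these made-up values, raising the cut-off moves some non-CS subjects from false positive to true negative and one CS subject from true positive to false negative, which is the same direction of change reported above for the 21.9 nmol/L threshold.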

We measured cortisol with LC–MS/MS in patients with increased LNSC in CLIA or high clinical suspicion of CS. The two subjects in the control group with increased LNSC were false positives, since their first cortisol measurement in chemiluminescence (17.2 and 18.6 nmol/L) was not confirmed in LC–MS/MS (respectively, 0.2 and 0.4 nmol/L). Spectrometry also confirmed the false-negative LNSC result of the patient with confirmed CS and normal cortisol rhythm in CLIA (respectively, 1 and 0.6 nmol/L). Of the 35 non-CS subjects with false-positive increased LNSC in CLIA, a sufficient frozen saliva volume for LC–MS/MS was available for 21 samples; in 12 of them (6 obese and 6 diabetic patients), LC–MS/MS showed a normal LNSC, contradicting CLIA. In the other nine patients, CLIA and LC–MS/MS results were concordant, confirming increased LNSC.

Patients with false-positive or false-negative screening results were similar in age, weight, BMI, and glycated haemoglobin levels, whether all non-CS subjects or only diabetic or obese subjects were considered.

As reported in Table 3, we arbitrarily stratified the obese and diabetic non-CS patients with false-positive results into mild, moderate, and severe grades of hypercortisolism. Most patients had mild or moderate hypercortisolism, and UFC was the screening test with the highest number of false-positive results.

Table 3 False-positive results in patients with obesity or diabetes, stratified by severity of hypercortisolism relative to the upper limit of normality (ULN): mild < 1.5 × ULN, moderate 1.5–2.5 × ULN, severe > 2.5 × ULN
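The grading used in Table 3 can be expressed as a small helper, shown here as an illustrative sketch. The function name is hypothetical, the "normal" label for results at or below the ULN is an assumption added for completeness, and the example uses a made-up ULN of 16 nmol/L; results exactly at 1.5 or 2.5 × ULN are assigned to the moderate grade, as the table ranges imply.

```python
def grade_hypercortisolism(value, uln):
    """Grade a test result by its ratio to the upper limit of normality (ULN)."""
    ratio = value / uln
    if ratio <= 1:
        return "normal"      # assumption: not part of the Table 3 grades
    if ratio < 1.5:
        return "mild"        # < 1.5 x ULN
    if ratio <= 2.5:
        return "moderate"    # 1.5-2.5 x ULN
    return "severe"          # > 2.5 x ULN


# Hypothetical results against a hypothetical ULN of 16 (same units as the result).
for value in (10.0, 20.0, 32.0, 48.0):
    print(value, grade_hypercortisolism(value, uln=16.0))
```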

Discussion

Despite the apparently benign character of the disease, endogenous hypercortisolism is associated with severe comorbidities and increased mortality: a delayed or missed CS diagnosis in high-risk populations is detrimental [1, 2]. Screening tests therefore need high sensitivity, to detect all patients with the disease, while avoiding low specificity, which increases the number of unnecessary confirmatory tests. Saliva collection is convenient, especially for outpatients, and the diagnostic accuracy of LNSC reported in clinical studies [7, 9,10,11, 18, 19] and confirmed in meta-analyses [20] is so high that it has been proposed as the single first-line screening test in a diagnostic flow chart for suspected CS [6]. Nevertheless, published studies reported retrospective series, and it is well known that retrospective case–control studies of diagnostic tests may overestimate their accuracy.

Therefore, our aim was to study the diagnostic accuracy of LNSC in a prospective study, considering consecutive non-selected patients with suspected CS (to describe a “real-life” approach). Moreover, we considered LNSC alone and combined with the other first-line screening tests, especially in high-risk categories for CS.

We confirm in a prospective study the high diagnostic accuracy of LNSC, in particular its high SE (97%), high NPV (99%), and low LRneg (0.045): the likelihood of having CS in patients with normal LNSC is extremely low. Notably, the only CS patient with normal LNSC presented with an overt picture of CS, thus limiting the clinical impact of this false-negative result. It is well known that some patients with confirmed CS may present with a normal cortisol rhythm; alternatively, lower LNSC levels may be due to inadequate soaking of the saliva collection device, or to fluctuation, as in cyclical CS [21]. Our result confirms previously reported series that used electro-chemiluminescence (ECLIA) [9, 18, 22]; only a few authors reported a lower SE [8], probably because of different analytic or pre-analytic protocols (international standardization of salivary cortisol handling and measurement is lacking). Recently, Repetto and coworkers reported lower cortisol concentrations with CLIA than with ECLIA, the latter validated by the manufacturer for the measurement of salivary cortisol [23]. We confirm a high SE for LNSC, as well as for UFC, MSC, and 1-mg DST: all the proposed tests are generally appropriate for CS screening, given their low false-negative rates. A large prospective study would be needed to establish which test should be used first in different clinical situations (obesity, metabolic syndrome, hypertension, osteoporosis, and so on). In our opinion, all three screening tests should be performed in a referral center, and the screening strategy has to be tailored to the patient.

Despite the high SE, we observed an unsatisfactorily low SP, which was adequate only in the whole group of patients (84%). SP was low in the non-CS group (70%) as well as in diabetic or obese subjects (60%, considered the population at high risk of having CS). Consequently, the LRpos was low: a patient with increased LNSC levels does not always have CS, and other tests are needed to confirm or exclude the clinical suspicion. It has been reported that LNSC levels increase with age, and contrasting data have been reported regarding a relationship between LNSC and BMI [24, 25]. Despite these considerations, the high number of false-positive results in our cohort could be related to a real subclinical or mild form of CS, characterized by impaired cortisol rhythm but adequate pituitary feedback and daily glucocorticoid secretion (in terms of normal cortisol suppression after 1-mg DST and normal UFC, respectively). On the other hand, several authors have previously reported that LNSC performs poorly in identifying subclinical hypercortisolism [10, 26, 27]. Another source of false positives is the threshold itself: to avoid selection bias, we adopted an ROC-based cut-off previously calculated in a cohort different from that reported in this manuscript. However, had we re-calculated the LNSC threshold in this group of non-CS patients (with obesity, diabetes, and adrenal incidentaloma), raising the cut-off to 21.9 nmol/L, we would have increased SP (77%) while preserving an acceptably high SE (which is of utmost importance when diagnosing a rare disease such as CS). The optimal threshold for LNSC (or other steroids) is a critical issue in the real-life clinical practice of the endocrine laboratory, because a wide range of cut-off values with different assays has been reported in the literature [7,8,9,10,11,12,13, 18,19,20,21,22,23,24,25,26]: an effort should be made to harmonize the measurement (assays) and the upper limit of normality for salivary cortisol.
Finally, obese or diabetic subjects might harbour a functional hypercortisolism, with chronic activation of the hypothalamic–pituitary–adrenal axis without endogenous CS [4]; moreover, in the previous literature, some authors did not consider a high-risk population but healthy volunteers (with a reduced likelihood of having CS, thus increasing SP by avoiding subjects with functional hypercortisolism) [18, 19, 23]. We could speculate that there is a continuum of cortisol rhythm impairment between healthy controls, functional hypercortisolism (obesity and diabetes), and CS. Some authors observed an SP higher than ours with ECLIA (97 and 94%) [8, 22] and CLIA (97%) [23], or using a radio-immunometric assay [18], but they considered different populations of suspected CS, with high or low likelihood of hypercortisolism, which affects SP (low SP in case of suspected CS or functional hypercortisolism, and high SP in case of healthy controls). To conclude, as underlined by the low PPV and LRpos of LNSC, an accurate selection of patients before prescribing any screening test (not only LNSC) is mandatory to reduce the number of false positives (because the clinical pictures of suspected CS and CS overlap in most cases).
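The dependence of a positive result on patient selection can be illustrated with the standard odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The sketch below applies the whole-cohort LRpos and LRneg reported in the Results to a few pre-test probabilities; those pre-test probabilities and the function name are hypothetical assumptions chosen for illustration, not estimates from this study.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


LR_POS = 5.56   # whole-cohort LRpos reported in the Results
LR_NEG = 0.045  # whole-cohort LRneg reported in the Results

# Hypothetical pre-test probabilities of CS (assumptions, NOT study data).
for pre in (0.01, 0.05, 0.20):
    after_positive = post_test_probability(pre, LR_POS)
    after_negative = post_test_probability(pre, LR_NEG)
    print(f"pre-test {pre:.0%}: increased LNSC -> {after_positive:.1%}, "
          f"normal LNSC -> {after_negative:.2%}")
```

Under these assumptions, a normal LNSC lowers the probability of CS to well below the pre-test value at every level, whereas an increased LNSC leaves the diagnosis uncertain unless the pre-test probability was already substantial.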

Regarding the diagnostic accuracy of the other first-line screening tests, 1-mg DST and UFC also showed high SE, as previously reported [16, 20]. One of the pitfalls of all screening tests for CS is SP: in our cohort of non-CS patients, only 53% had normal results in all tests. UFC and MSC presented the worst SP (respectively, 56 and 73%), and 1-mg DST presented the best SP. On the other hand, LNSC is easier to perform and requires neither nurses for blood sampling nor the administration of any drug (dexamethasone, which could be cumbersome).

First-line screening tests do not always accurately distinguish functional hypercortisolism, the so-called pseudo-Cushing state [4, 28,29,30]; this was also the case in our cohort, in which the tests showed similar diagnostic accuracy. As previously described, the impairment of cortisol rhythm yielded less satisfactory results in diagnosing CS (albeit measured with MSC); on the other hand, other authors suggested using MSC as a second-line test for functional hypercortisolism, because cortisol rhythm is impaired only in endogenous CS [31, 32]. We found that the SE of LNSC was similar to that of MSC, and that increasing the threshold could partially improve SP (as previously reported [28]). Nevertheless, saliva sampling is easy to manage, convenient for outpatients (hospitalization is not required), and stress-free: further investigations are needed to establish the impairment of cortisol rhythm in functional hypercortisolism.

The proper definition of autonomous cortisol secretion in patients with adrenal incidentaloma is a challenge, especially when it is subclinical. It is well known that LNSC is useful only to exclude overt CS and that its role in detecting subclinical hypercortisolism is limited [26, 27, 33, 34]. In our cohort, the SP of LNSC was affected only when obesity or diabetes mellitus was considered, and the diagnostic performance of LNSC in excluding overt CS was high, as was that of the other screening tests (all patients suppressed serum cortisol after 1-mg DST and presented with normal UFC). Therefore, to detect subclinical hypercortisolism related to cardiovascular events and mortality [35, 36], we suggest using 1-mg DST as the first-line screening test in adrenal incidentaloma; in case of inadequate cortisol suppression (50–138 nmol/L), LNSC or UFC should be considered to exclude overt CS [37].

One of the critical issues for cortisol measurement, not only in saliva, is that the widespread use of immunoassays may give rise to falsely high values due to cross-reactivity with cortisol metabolites, especially cortisone (8% cross-reactivity in our assay, and cortisone is present in saliva at high concentrations [12]). Although LC–MS/MS is a reference method, it has been described in clinical practice in only a few studies, with contradictory results [7, 11, 12, 38]. We used mass spectrometry to confirm the unexpected results obtained with chemiluminescence: the increased LNSC found with CLIA in two controls was not confirmed with LC–MS/MS. About half of the re-tested patients with increased LNSC in CLIA were confirmed with LC–MS/MS, thus increasing SP only partially; further studies are needed to establish whether LC–MS/MS offers a tangible gain in diagnostic accuracy for cortisol-related disease. However, we acknowledge that only a part of the CLIA samples was re-analyzed with spectrometry.

Besides its strengths (i.e., the prospective design and the “double” control group, one with normal subjects and one with patients selected for some feature of suspected hypercortisolism), our work presents some limitations. First, the design involved hospitalization of patients with increased LNSC levels or high clinical suspicion of CS. Moreover, we collected only one saliva sample, since our aim was to propose an easy-to-manage screening test for clinical practice. It has been reported that performing two LNSC collections does not seem to improve the diagnostic accuracy compared to only one [11, 39]. On the other hand, Raff in 2012 suggested measuring two LNSC samples with immunoassay because of high cortisol variability and pre-analytical errors [6]. Considering that the Endocrine Society’s guidelines for CS also proposed two samples for UFC or LNSC [5], the debate about performing one or more saliva collections is still ongoing.

To conclude, LNSC presents a high diagnostic accuracy to exclude hypercortisolism in patients with normal cortisol levels, even when measured with chemiluminescence; therefore, its use is suitable in clinical practice. LC–MS/MS could be used to reduce the number of false-positive results; nevertheless, some subjects without functional hypercortisolism could have a mild impairment of cortisol rhythm, not leading to an overt CS.