Introduction

Acromegaly is an insidious, slowly developing syndrome caused by long-term exposure to elevated levels of growth hormone (GH) and insulin-like growth factor I (IGF-I) [1]. In most cases GH hypersecretion is due to a pituitary adenoma; it is only rarely associated with a hypothalamic or ectopic GH-releasing hormone (GHRH)-producing tumor.

The annual incidence is approximately three cases per million, whereas the prevalence, estimated at around 60 cases per million, is probably higher, since the diagnosis is frequently missed and often significantly delayed. Indeed, epidemiological studies have shown that the diagnosis is generally preceded by approximately 4–10 years of unrecognized active disease [2, 3]. This interval allows the systemic complications (cardiovascular, respiratory, gastrointestinal, metabolic, neoplastic and skeletal) to become established, and these complications are responsible for the high mortality rate in untreated patients [4].

Early detection and rapid control of acromegaly are therefore crucial to prevent irreversible complications. The therapeutic goals are to restore adequate pituitary function, to reduce morbidity, and to bring mortality down to that expected for age and sex, which may be achieved by normalizing GH and IGF-I values [5–7]. For this reason, the diagnostic and follow-up protocol must be as simple and inexpensive as possible, so that it can be implemented at the slightest suspicion of disease.

The criteria for diagnosis and for evaluating disease activity were first defined in a consensus statement developed at a workshop held in Cortina in 1999 [8]. Briefly, a random GH value below 0.4 μg/l with a normal IGF-I excludes active acromegaly. If either GH or IGF-I is abnormal, an oral glucose tolerance test (OGTT) should be performed, and the GH nadir should be below 1 μg/l. Ten years later, no substantial modifications have been proposed, except for a revision of the cut-off values, which should be lowered in view of the higher purity of the most recent GH International Standard preparations used for assay calibration [9, 10].
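The 1999 Cortina decision rule summarized above can be written as a short function. This is a minimal sketch, assuming the original cut-offs (0.4 μg/l for random GH, 1 μg/l for the OGTT nadir) and an IGF-I normality flag supplied by the caller; the function name and the returned labels are hypothetical, and, as noted above, the cut-offs would need lowering for assays calibrated against newer standards.

```python
from typing import Optional

# Original Cortina (1999) cut-offs in μg/l; later consensus lowered them.
RANDOM_GH_CUTOFF = 0.4
OGTT_NADIR_CUTOFF = 1.0


def cortina_assessment(random_gh: float, igf1_normal: bool,
                       ogtt_nadir: Optional[float] = None) -> str:
    """Hypothetical labelling of one patient under the 1999 Cortina criteria."""
    # A low random GH together with a normal IGF-I excludes active disease.
    if random_gh < RANDOM_GH_CUTOFF and igf1_normal:
        return "active disease excluded"
    # Otherwise (GH or IGF-I abnormal) an OGTT is required.
    if ogtt_nadir is None:
        return "OGTT required"
    # The GH nadir after the glucose load should fall below 1 μg/l.
    if ogtt_nadir < OGTT_NADIR_CUTOFF:
        return "GH suppressed"
    return "GH not suppressed"
```

For example, `cortina_assessment(2.0, True, 0.8)` returns `"GH suppressed"`, whereas the same call without a nadir value only signals that an OGTT is needed.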

In a prospective study of 50 acromegalic subjects, we previously reported that the 120th minute sample after OGTT is as reliable as the GH nadir and that, together with IGF-I and/or the acid-labile subunit (ALS), it provides sufficient information for routine assessment of disease activity [11]. This protocol is useful in clinical management because it reduces the cost of hormonal assessment and the discomfort for the patient, and because the 120th minute sample is also used for the diagnosis of diabetes mellitus [12].

Recently, Carmichael et al. [13] reported a similar percentage of discordant values between IGF-I and both basal and post-glucose nadir GH in subjects treated with somatostatin receptor ligands (SRL), so that post-OGTT GH was no better than the basal value. On this basis, evaluating the GH nadir after OGTT seems useless for assessing disease activity in patients treated with somatostatin analogues, and basal GH and IGF-I alone could be sufficient.

In the present study we retrospectively analyzed data from a cohort of acromegalic patients followed between 1988 and 2005 in order to estimate the performance of the various GH parameters in both the diagnosis and follow-up of acromegaly.

Materials and methods

Tests

Post-glucose GH values were retrieved from 279 OGTTs performed in our unit from January 1988 to December 2005 in 93 acromegalic patients (30 males and 63 females, age range 24–83 years). Samples were obtained at 0, 30, 60, 90, and 120 min after an oral glucose load (75 g). Following the criteria used by Carmichael et al. [13], in order to account for changes in the sensitivity of the GH assays used and to reflect contemporary criteria for cure, post-glucose values were considered pathologic if not suppressed below 2 μg/l for patients studied before 1 June 1998, or below 1 μg/l for patients studied after that date, when the assay method was changed from Nichols Allegro to Immulite. The cut-off for basal values was always 2.5 μg/l.

In 61 subjects (22 males and 39 females), 77 OGTTs were available for a comparative study together with a GH day curve (DC), defined as at least 5 samples obtained on the same day and performed no more than 7 days before or after the OGTT. The thresholds for pathologic minimum and mean DC values were 1 and 2.5 μg/l, respectively, for values measured with the newer automated ICMA, and 2 and 2.5 μg/l, respectively, for values obtained with the older RIA and IRMA assays (see below).
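The era-dependent cut-offs described in the two paragraphs above can be captured in a few helper functions. This is a minimal sketch, assuming that a value at or above its cut-off counts as pathologic (the text does not state how values exactly at the cut-off were handled) and that tests performed on or after 1 June 1998 use the ICMA thresholds; all function and constant names are hypothetical.

```python
from datetime import date

ASSAY_SWITCH = date(1998, 6, 1)  # Nichols Allegro IRMA replaced by Immulite ICMA
BASAL_CUTOFF = 2.5               # μg/l, basal GH, used in both eras
MEAN_DC_CUTOFF = 2.5             # μg/l, mean day-curve GH, used in both eras


def post_glucose_cutoff(test_date: date) -> float:
    """Cut-off (μg/l) for post-glucose GH and for the minimum DC value."""
    return 1.0 if test_date >= ASSAY_SWITCH else 2.0


def is_pathologic(value: float, cutoff: float) -> bool:
    # Assumption: a value at or above the cut-off counts as pathologic.
    return value >= cutoff


def classify_day_curve(samples: list[float], test_date: date) -> dict[str, bool]:
    """Classify a GH day curve (at least 5 same-day samples) by its min and mean."""
    if len(samples) < 5:
        raise ValueError("a day curve needs at least 5 samples")
    mean_value = sum(samples) / len(samples)
    return {
        "minimum_pathologic": is_pathologic(min(samples), post_glucose_cutoff(test_date)),
        "mean_pathologic": is_pathologic(mean_value, MEAN_DC_CUTOFF),
    }
```

The same minimum DC value of 1.5 μg/l would thus be pathologic for a curve recorded in 2000 but not for one recorded in 1995, which is how a single cohort spanning two assays can be classified consistently.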

Assays

GH was measured by three different methods over the years. The first (1988) was a commercial RIA from Sorin Biomedica (Saluggia, Italy), with a standard curve calibrated against WHO 66/217 (1 mg = 2 IU) and an analytical sensitivity of 0.23 μg/l. The second (from 1989 to May 1998) was an IRMA from Nichols Institute (Nichols Allegro, San Juan Capistrano, CA, USA), with a standard curve calibrated against NIAMDD-hGH-RP-1 (equivalent to WHO 66/217) and an analytical sensitivity of 0.06 μg/l. The third (from June 1998 to December 2005) was an ICMA (Immulite) from DPC (Los Angeles, CA, USA), calibrated against WHO 80/505 (1 mg = 2.59 IU), with an analytical sensitivity of 0.01 μg/l.
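The calibrant conversion factors quoted above link international units to mass units. The sketch below converts between mIU/l and μg/l using only the two factors given in the text; it is a unit conversion only, not a harmonization of results between assays (which, as noted earlier, requires revised cut-offs), and the function names are hypothetical.

```python
# IU per mg of GH for each calibrant, as stated in the text.
IU_PER_MG = {
    "WHO 66/217": 2.0,    # NIAMDD-hGH-RP-1 is stated to be equivalent
    "WHO 80/505": 2.59,
}


def miu_per_l_to_ug_per_l(miu_per_l: float, standard: str) -> float:
    # 1 IU/mg equals 1 mIU/μg, so dividing mIU/l by IU/mg yields μg/l.
    return miu_per_l / IU_PER_MG[standard]


def ug_per_l_to_miu_per_l(ug_per_l: float, standard: str) -> float:
    # Inverse conversion: μg/l multiplied by mIU/μg yields mIU/l.
    return ug_per_l * IU_PER_MG[standard]
```

For instance, 5 mIU/l corresponds to 2.5 μg/l against WHO 66/217, but to about 1.93 μg/l against WHO 80/505.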

IGF-I was measured on acid-extracted samples, using reagents provided by Nichols Institute (San Juan Capistrano, USA) from 1988 to 1993 and by Biosource (Nivelles, Belgium) from 1993 to 2005. To test the interchangeability of the results obtained with these two methods, 114 samples were assayed with both. The correlation coefficient was 0.98, with a slope of 1.09 and an intercept of −2.17 ng/ml, differences that are not clinically relevant. Bland–Altman analysis [14] showed a mean discordance of 17 ng/ml, corresponding to about 5% of the observed concentration, which is roughly similar to the inter-assay coefficient of variation.
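The method-comparison step above can be sketched as a Bland–Altman computation on paired results. We assume that the "mean discordance" corresponds to the Bland–Altman mean difference (bias) and express it as a percentage of the mean concentration; if the authors meant the mean absolute difference, the calculation would differ. The function name and output keys are hypothetical.

```python
import statistics


def bland_altman(a: list[float], b: list[float]) -> dict[str, float]:
    """Bland-Altman agreement statistics for the same samples measured by two methods."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired series of equal length (n >= 2)")
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = statistics.mean(diffs)   # mean difference between methods
    sd = statistics.stdev(diffs)    # sample SD of the paired differences
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,  # 95% limits of agreement
        "upper_loa": bias + 1.96 * sd,
        "bias_pct": 100 * bias / statistics.mean(means),  # bias relative to mean level
    }
```

Comparing `bias_pct` with the inter-assay coefficient of variation is the comparison the paragraph makes when it judges the two IGF-I kits interchangeable.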

For IGF-I, a standard deviation score (SDS) was calculated for each value on the basis of data obtained from 1,234 normal subjects of both sexes, aged 20–80 years and grouped by decade of age. An IGF-I level >2 SDS for age conventionally identifies active acromegaly.
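The SDS described above is a z-score against age-decade reference data. This minimal sketch assumes untransformed IGF-I values (the text does not say whether a log transformation was used) and pools age 80 with the 70s decade; the reference table is supplied by the caller, and the values used in any example must be treated as hypothetical placeholders, not the authors' normative data.

```python
def age_decade(age: int) -> int:
    """Map an age of 20-80 years to the start of its decade (20, 30, ..., 70)."""
    if not 20 <= age <= 80:
        raise ValueError("reference data cover ages 20-80 only")
    return min(age // 10 * 10, 70)  # assumption: age 80 is pooled with the 70s


def igf1_sds(igf1: float, age: int, reference: dict[int, tuple[float, float]]) -> float:
    """SDS = (value - decade mean) / decade SD, using a caller-supplied table."""
    mean, sd = reference[age_decade(age)]
    return (igf1 - mean) / sd


def igf1_elevated(sds: float, threshold: float = 2.0) -> bool:
    # An SDS above +2 is conventionally taken to indicate active disease.
    return sds > threshold
```

With a purely hypothetical reference of mean 200 and SD 50 ng/ml for a given decade, an IGF-I of 310 ng/ml gives an SDS of 2.2 and is flagged as elevated.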

The different therapeutic groups are described in Table 3.

Statistical analysis was performed with SPSS and MedCalc for Windows.

Results

In all 279 OGTTs examined (Table 1), the discordance between IGF-I and T0 was not significantly higher than that between IGF-I and the nadir or between IGF-I and T120. Among the GH values, the lowest discordance was found between T120 and the nadir; it was significantly (P < 0.0001) lower than the discordance between T0 and the nadir and between T0 and T120 (Table 1; Fig. 1). ROC curve analysis of the nadir and T120, using IGF-I >2 SDS as the marker of disease activity, showed almost identical curves (difference between areas = 0.001, P = 0.949).
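The ROC comparison above rests on the area under each curve. The sketch below computes the empirical AUC in its Mann–Whitney form, treating IGF-I >2 SDS as the positive class and a GH value (nadir or T120) as the score. It does not reproduce the test for the difference between two correlated AUCs (methods such as DeLong's are typically used for paired curves); the function name is hypothetical.

```python
def roc_auc(scores: list[float], labels: list[bool]) -> float:
    """Empirical ROC AUC; labels are True where IGF-I > 2 SDS (active disease)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # active case ranked above a controlled one
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))
```

Calling `roc_auc` once with nadir values and once with T120 values on the same tests yields the two areas whose difference the paragraph reports.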

Table 1 Percent discordance between OGTT basal (T0), nadir and 120th minute (T120) GH values and IGF-I
Fig. 1 Concordance between the OGTT nadir and the T0 and T120 GH values. Axes intersect at 1 μg/l; therefore, discordant T120 values are plotted in the upper left area.

In the 77 subjects who also had DC data (Table 2), the log-transformed minimum and mean DC GH values were the best correlated with IGF-I (r = 0.707 and 0.699, respectively), whereas the nadir had the lowest discordance rate with IGF-I. Correlations among GH values gave r values of around 0.99 for all parameters considered. The discordance rate between the nadir and the minimum DC was lower than that between the nadir and the mean DC, but none of the differences was statistically significant (Table 2).

Table 2 Correlation and discordance between GH post glucose (OGTT) and spontaneous daytime values (DC) and IGF-I

Analysis performed using only GH data obtained by ICMA showed similar results.

Among the different therapeutic groups, no significant differences were found in the percent discordance of T0, nadir and T120 GH with IGF-I (Table 3; Fig. 2).

Table 3 Concordance and discordance between OGTT GH values and IGF-I according to different treatments
Fig. 2 Percent discordance of T0 and nadir GH values with IGF-I according to the treatment used at the time of testing. Concordant values indicate agreement between GH and IGF-I on disease control, whereas discordant values are split into those with high GH and those with high IGF-I. SRL, somatostatin receptor ligand therapy; DA, dopamine agonist therapy; no treatment, post-surgical patients without medical treatment.

Discussion

In diagnosing and defining disease activity in acromegaly, a single random GH value is often influenced by the sampling conditions (stress of venipuncture, physical exercise, etc.). GH secretion should, therefore, be assessed by evaluating either spontaneous secretion (integrated GH concentration, evaluated by continuous or very frequent sampling, or day curve, evaluated by multiple sampling throughout the day) or response to glucose load.

In our experience, about 12% of post-glucose samples were discordant with T0 (Table 1); therefore, with a single random sample, about 12% of our patients could have been erroneously classified as having active disease. Comparison between the nadir and the 120th minute values showed a much lower discordance (5%), and all discordant values except one (obtained only 1 month postoperatively) lay near the cut-off lines (Fig. 1).
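The discordance percentages discussed above compare two binary classifications of the same tests. A minimal sketch follows, assuming each test has already been classified as pathologic (True) or not for each parameter; the function names are hypothetical.

```python
def percent_discordance(a: list[bool], b: list[bool]) -> float:
    """Percentage of paired tests classified differently (True = pathologic)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty paired classifications")
    disagreements = sum(x != y for x, y in zip(a, b))
    return 100.0 * disagreements / len(a)


def discordance_breakdown(a: list[bool], b: list[bool]) -> tuple[int, int]:
    """Counts of (pathologic only by a, pathologic only by b)."""
    only_a = sum(x and not y for x, y in zip(a, b))
    only_b = sum(y and not x for x, y in zip(a, b))
    return only_a, only_b
```

The breakdown separates, for example, tests pathologic at T0 but suppressed after glucose from the opposite pattern, which is the distinction behind the claim that a single random sample tends to overcall active disease.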

Comparison between DC and OGTT data (Table 2) showed correlation coefficients of around 0.99 among all values; the discordance between the nadir and the minimum DC was much lower than that with the mean DC, and, surprisingly, the discordance rate between the nadir and the mean DC was higher than that with T0. This could be due partly to the different cut-off values, which were somewhat arbitrary and mainly selected on the basis of contemporary criteria of cure, and partly to the limitations of the cut-off principle for borderline values. Indeed, a small difference in value can determine whether a result is classified as “controlled” or “uncontrolled”.

As expected, the correlation between IGF-I and GH (Table 2) was highest with the DC data and lowest with T0. However, the discordance rate did not parallel the correlation, as the highest discordance was observed with the DC data, even higher than with T0. In our opinion, this again reflects the limitations of the cut-off principle and is probably proportional to the number of values near the cut-off levels.
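Why values near the cut-off drive discordance can be made concrete with a simple error model. This sketch assumes Gaussian measurement error with a constant coefficient of variation; the 5% CV in the usage note is a hypothetical figure (the text reports a similar inter-assay CV only for IGF-I, not for GH), and the function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_pathologic(true_value: float, cutoff: float, cv: float) -> float:
    """P(measured value >= cutoff), assuming Gaussian error with SD = cv * true_value."""
    sd = cv * true_value
    return 1.0 - normal_cdf((cutoff - true_value) / sd)
```

With a 1 μg/l cut-off and a 5% CV, a true value of 0.98 μg/l is classified as pathologic roughly one time in three, a true value exactly at the cut-off half the time, and a value of 0.5 μg/l essentially never. The more tests that fall in this borderline band, the higher the observed discordance, whatever the underlying correlation.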

Our data confirm the experience of others [15] regarding the reliability of multiple GH sampling, irrespective of the time span over which sampling is performed (the whole day or only the morning hours) and probably in proportion to the number of samples considered; however, we found no gain in clinical performance of the day curve over the OGTT. Very recently, Ribeiro-Oliveira et al. [16], in 40 newly diagnosed untreated acromegalic patients, showed that no patient with mean 24-h GH >4.3 μg/l had an OGTT nadir below 1 μg/l, whereas over 50% of patients with lower GH secretion suppressed below 1 μg/l and around 30% below 0.4 μg/l. This finding is very important because it shows that normal glucose suppression does not necessarily exclude the diagnosis in naïve patients, and probably does not exclude active disease during follow-up. Our patients were not comparable with those of Ribeiro-Oliveira et al., but in our experience none of the 26 patients with mean DC >4.5 μg/l showed GH suppression below 1 μg/l. Moreover, among all subjects with a pathological IGF-I (54/77 patients), 18 had a GH nadir below 1 μg/l (10 below 0.4 μg/l) and 23 had a mean DC below 2.5 μg/l (12 below 1 μg/l). This shows that neither multiple daily sampling nor lower OGTT cut-off values help to resolve the GH/IGF-I discrepancy.

ROC curve analysis of the OGTT nadir and T120 demonstrated that the two values are virtually equivalent; the 120th minute sample therefore provides virtually the same information as the nadir. Since cost containment is strongly pursued by health authorities and hospital managers, the 120th-minute-only strategy helps optimize the follow-up of patients with acromegaly by minimizing the number of samples without significant loss of efficiency, especially as, in our experience, all but one subject with discordance between the 120th minute and the nadir had borderline values.

Because IGF-I reflects the integrated GH concentration, it could theoretically replace GH measurement. However, discordant results between GH and IGF-I are frequently observed, partly because of nonspecific interference from factors such as thyroid function and nutritional status, and for this reason all recent consensus recommendations advise considering both GH and IGF-I [8, 17, 18]. Our experience confirms this concept and shows a discordance rate of around 30% (Table 1). Carmichael et al. [13] reported a higher discordance rate in SRL-treated patients than in patients without medication or on dopamine agonists. Although the overall discordance rate is similar, we found no difference among the three treatment groups (Table 3; Fig. 2). A possible reason for this disagreement is the relatively low number of subjects on DA treatment in our series (only 19). The much lower discordance rate in our SRL-treated patients compared with those of Carmichael et al. (around 27 and 48%, respectively) is mainly due to a preponderance of patients with high GH and normal IGF-I in their series, whereas in ours the number of subjects with high GH and low IGF-I was comparable to the number with low GH and high IGF-I (Table 3). In this group, the correlation between nadir log GH and IGF-I was 0.62 in our subjects and 0.44 in those of Carmichael et al., suggesting a possible methodological difference between the two series. It is worth noting that a comparison of different latest-generation GH assays by Arafat et al. [19] showed not only marked differences between the assays tested but also significant influences of sex, age and BMI.

In all patients, the discordance of GH with IGF-I was slightly but not significantly lower after the glucose load (Table 1). This would suggest that the glucose load is useless; however, the discordance rate between T0 and the nadir or T120 was much higher than that between the nadir and T120, showing that the glucose load eliminates nonspecific GH increases. Therefore, for the diagnosis and follow-up of acromegaly, also in view of the findings of Ribeiro-Oliveira et al. [16], we support the strategy of a basal and a 120th minute post-glucose sample, which makes testing less expensive and more feasible. When GH and IGF-I are discordant, a day curve may give additional information; however, the increased cost makes it largely impractical, especially since the discordant values are almost invariably close to the cut-off values.

Overall, considering all concordant GH results (mean DC, minimum DC and OGTT nadir; Fig. 3) in comparison with IGF-I, 40/77 subjects showed complete concordance (29 uncontrolled and 11 controlled), and in 20/77 subjects all GH values were concordant with one another but discordant with IGF-I (3 with uncontrolled GH and controlled IGF-I, 17 with controlled GH and uncontrolled IGF-I). A reduced IGF-I with high GH could be ascribed to nonspecific interference with IGF-I production or to a peripheral effect of SRL on IGF-I production [13, 20], whereas an IGF-I disproportionately high compared with GH could be due to the synthesis by tumor cells of a variant GH molecule not “seen” by the highly specific modern assay systems.
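The grouping used above can be sketched as a per-subject classification followed by a tally. This is an illustrative sketch, assuming each GH parameter (for example mean DC, minimum DC and OGTT nadir) and IGF-I have already been flagged as uncontrolled (True) or controlled (False); the function names and labels are hypothetical.

```python
from collections import Counter


def concordance_pattern(gh_uncontrolled: list[bool], igf1_uncontrolled: bool) -> str:
    """Classify one subject; each flag is True when that marker is uncontrolled."""
    if not gh_uncontrolled:
        raise ValueError("at least one GH parameter is required")
    if len(set(gh_uncontrolled)) > 1:
        return "GH parameters disagree"
    gh_state = gh_uncontrolled[0]  # all GH parameters agree with one another
    if gh_state == igf1_uncontrolled:
        return "concordant, uncontrolled" if gh_state else "concordant, controlled"
    if gh_state:
        return "GH uncontrolled, IGF-I controlled"
    return "GH controlled, IGF-I uncontrolled"


def tally(subjects: list[tuple[list[bool], bool]]) -> Counter:
    """Count subjects in each concordance pattern."""
    return Counter(concordance_pattern(gh, igf1) for gh, igf1 in subjects)
```

Subjects labelled "GH parameters disagree" are those excluded from the comparison, which retains only subjects whose GH results were all concordant.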

Fig. 3 Comparison of OGTT, DC and IGF-I results in the same subject. Only subjects in whom all GH results were concordant are shown. The numbers above the columns indicate the number of subjects in each group.

We confirm the previously reported log relationship between GH and IGF-I (Fig. 4) [21, 22]. Clinically, this means that small variations in the low GH concentration range produce greater changes in IGF-I than similar variations in the high range. Indeed, when subjects with OGTT nadir values below or above 5 μg/l are considered separately, linear regression of IGF-I on GH gives slopes of 62.1 and 8.2, respectively (Fig. 4): for every 1 μg/l decrease in GH below 5 μg/l, IGF-I decreases by 62.1 μg/l, whereas above 5 μg/l the reduction is only 8.2 μg/l. Consequently, when treating an acromegalic patient, we must consider that a therapy producing a large GH reduction from very high values does not necessarily produce a proportional reduction in IGF-I; conversely, IGF-I is much more sensitive to GH variations in the low concentration range.

Fig. 4 Correlation between GH nadir after OGTT and IGF-I. Subjects are grouped by GH nadir below or above 5 μg/l to show that the regression slope is much steeper in the low GH range.

Moreover, in an integrated view of the pathogenesis and control of acromegaly, the discordance between different parameters and cut-off values and the pitfalls of the available markers push us to search for new markers and methods for evaluating disease activity and monitoring the effectiveness of treatment, with the aim of normalizing life expectancy [23].

In conclusion, an extended review of our experience in the diagnosis and follow-up of acromegaly has shown that:

  1. The minimum DC GH value is the GH parameter best correlated with IGF-I, but in clinical practice evaluation of GH after a glucose load together with IGF-I is the most practical approach.

  2. Because the 120th minute sample is almost superimposable on the nadir, basal and 120th minute GH values can reliably replace multiple sampling, simplifying the test and significantly reducing the cost of follow-up.

  3. Different treatment modalities do not influence the discordance rate between GH and IGF-I.