Introduction

In 2006, the European Neuroendocrine Tumor Society (ENETS) proposed a three-tiered grading system for neuroendocrine tumors (NET), based on mitotic count and Ki-67 labeling index [1]. In 2010, the World Health Organization endorsed a new classification of NET, incorporating the grading system previously proposed by ENETS [2]. The clinical value of the ENETS grading system has been validated, and the importance of grading for prognosis within each TNM stage has been confirmed [3, 4]. More recently, some reports have suggested that the prognostic value of grading in pancreatic NET (P-NET) may be improved when the traditional 2 % cutoff value for NET G2 is replaced by a 5 % Ki-67 value [5–7]. This proposal awaits confirmation by further data.

It is well known that different areas in single or familial multiple NETs (or different sites of metastasis) may display different proliferative activities [8]. Notably, guidelines suggest that mitotic index should be computed on at least 50 fields in tumor areas with the highest cell density [1, 9, 10]. Thus, many authorities discourage using small biopsies or cytological samples for evaluation of NET grading, and the WHO classification stated that NET grading should be determined on surgical specimens [2]. Furthermore, the lack of standardization in Ki-67 labeling quantification has been highlighted [9, 11]. Despite these concerns, there is little doubt that, if available, a reliable preoperative determination of NET grading would be useful in clinical practice. First of all, the benefit of resecting low-grade, nonfunctioning, small (<2 cm) P-NETs has recently been disputed; accordingly, a wait-and-see approach has been proposed for these patients [12]. Furthermore, grading is pivotal in choosing the most appropriate therapeutic option in unresectable P-NETs [13].

Endoscopic ultrasonography-guided fine needle aspiration (EUS-FNA) is a well-established diagnostic tool in gastrointestinal malignancies. Although many studies have dealt with diagnosis and grading of P-NETs by EUS-FNA, comparison between cytological specimens and their histological counterparts is available only in a subset of patients [14–20]. Unfortunately, methods used to assess Ki-67 labeling index were not always detailed in these studies, which showed conflicting results.

The aim of this multicenter, retrospective study is to compare the results of grading by Ki-67 labeling index, as measured on cytological samples obtained by EUS-FNA and on the corresponding surgical specimens of patients with P-NETs. We also investigated whether the adoption of a 5 % (instead of 2 %) cutoff value for defining G2 tumors may lead to a better concordance between cytological and histological grading.

Materials and methods

We reviewed the databases of the pathology departments of five institutions (IRCCS-ASMN, Reggio Emilia; Azienda Ospedaliero-Universitaria Policlinico, Modena; IRCCS-Humanitas Clinical Institute, Rozzano; Bellaria and Maggiore Hospitals, Bologna; Santa Maria della Misericordia Hospital, Udine) from January 2007 to October 2013. We retrieved the files of all the patients who had been diagnosed with NET on the basis of EUS-FNA specimens and had subsequently been treated by complete surgical resection. According to the ENETS/WHO scheme, Ki-67 cytological index was evaluated by manual counting as a percentage of 500–2,000 cells, and we decided to exclude cytological samples that included fewer than 500 tumor cells. Thus, we identified 62 patients as potentially eligible for the study, but the cytological samples were not adequate in 14 of them, and only 48 consecutive patients with 53 lesions were included in our final analysis. In particular, we collected 49 well-differentiated P-NETs, showing a monotonous population of predominantly single cells with round or ovoid nuclei and a salt and pepper chromatin pattern, and four cases of poorly differentiated pancreatic neuroendocrine carcinomas (P-NECs), characterized by nests of small to intermediate neoplastic cells with fusiform nuclei, minimal cytoplasm, granular chromatin, and inconspicuous nucleoli. Details of these patients are shown in Table 1. Three patients, all affected by MEN-I, harbored multiple lesions (two patients, two lesions; one patient, four lesions). The cytological diagnosis of P-NET was confirmed by surgery in all patients, and the mean time between EUS-FNA and surgery was 58.5 days (range 0–154). The five centers contributed 16, 11, 8, 7, and 6 patients, respectively.

Table 1 Characteristics of the patients and of their lesions

All patients consented to EUS-FNA and surgery, and the study was conducted in accordance with the principles of the Helsinki Declaration.

Ki-67 labeling index was measured on cytological samples and their corresponding surgical specimens; histology obtained after complete resection was adopted as the reference standard. All cytological samples were re-evaluated independently by two expert cytologists (G.C.; P.B.) and classified according to the 2010 WHO criteria. The same pool of samples was also classified using the alternative 5 % cutoff value for G2 P-NET. Any difference of interpretation was resolved by joint review by the two cytologists. Histological specimens were reviewed by a second team of pathologists (A.F.; L.L.; L.D.T.). According to the WHO recommendation, areas of highest density of Ki-67 immunoreactivity (hot spots) were selected, and Ki-67 labeling index was measured by manual counting at the microscope by each of the three pathologists. Each group of pathologists was blinded to the results of the other group.
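The classification step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which relied on manual counting and expert review; the function names, the use of strict "greater than" comparisons, and the 20 % boundary between G2 and G3 (taken from the 2010 WHO scheme) are assumptions made for illustration only.

```python
def ki67_index(positive_cells: int, counted_cells: int, min_cells: int = 500) -> float:
    """Ki-67 labeling index as a percentage of counted tumor cells.

    min_cells mirrors the adequacy threshold used in this study (500 cells);
    the function name and signature are hypothetical.
    """
    if counted_cells < min_cells:
        raise ValueError(f"inadequate sample: fewer than {min_cells} tumor cells")
    if not 0 <= positive_cells <= counted_cells:
        raise ValueError("positive_cells must lie between 0 and counted_cells")
    return 100.0 * positive_cells / counted_cells


def grade(ki67_percent: float, g2_cutoff: float = 2.0, g3_cutoff: float = 20.0) -> str:
    """Assign G1/G2/G3 from the Ki-67 index alone.

    Assumption: values strictly above a cutoff fall in the higher grade.
    g2_cutoff=2.0 is the traditional value; pass 5.0 for the alternative cutoff.
    """
    if ki67_percent > g3_cutoff:
        return "G3"
    if ki67_percent > g2_cutoff:
        return "G2"
    return "G1"


# A lesion with 30 positive nuclei among 1,000 counted cells (index 3 %)
index = ki67_index(30, 1000)
print(grade(index))                 # traditional 2 % cutoff
print(grade(index, g2_cutoff=5.0))  # alternative 5 % cutoff
```

With these assumptions, the same 3 % lesion is graded G2 under the traditional cutoff and G1 under the alternative one, which is the kind of reclassification examined in this study.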

EUS-FNAs were performed by experienced gastroenterologists (each with more than 1,500 EUS procedures) using different types of scopes and needles (see Table 1), according to local common practice and at the operators’ discretion. Cytological samples were also prepared according to the local practice of each center. In three out of five institutions, on-site cytological assistance was routinely available. Direct, alcohol-fixed smears (19 lesions, from Modena and Rozzano) or cell-block paraffin sections (34 lesions, from Reggio Emilia, Bologna, and Udine) were used for immunocytochemical Ki-67 staining. Immunocytochemical evaluation on archival Papanicolaou-stained fixed fine needle aspirate smears has already been validated [21–23]. Selected Pap-stained slides were placed in xylene until the coverslip could be easily removed. After removal of the coverslip, the slides were rehydrated in decreasing ethanol grades; immunocytochemical staining was then performed using Dako mouse monoclonal Ki-67 Mib 1 antibody (1:100 dilution for 1 h) on a standard automated Bench-Mark XT autostainer (Roche). Similarly, Dako mouse monoclonal Ki-67 Mib 1 antibody was used when immunohistochemistry was performed on 4-μm-thick formalin-fixed deparaffinized tissue sections. Heat-induced epitope retrieval was done with CC1 solution (Ventana-Roche) for 30 min, and the staining was developed using the UltraView DAB Universal Detection Kit (Ventana-Roche) on a Bench-Mark XT autostainer (Roche). Distinct nuclear staining in tumor cells was regarded as positive staining.

The distribution of Ki-67 proliferative index values in cytological and histological samples was compared using the Wilcoxon rank sum test. Concordance between cytological and histological grade was expressed as percent of concordant pairs and measured by weighted kappa. The related weights were calculated according to the Cicchetti-Allison method. We chose the weighted coefficients instead of the simple (unweighted) ones because they were more suitable for ordered variables like the grade scores; we also provided the simple kappa for completeness.
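As an illustration of the agreement statistics, the following Python sketch computes simple and weighted kappa with Cicchetti-Allison (linear) weights, w_ij = 1 - |i - j| / (k - 1). It is not the SAS code used for the study. The two 3×3 tables are reconstructed here from the sensitivity and specificity fractions reported in the Results (for example 32/34, 10/15, and 4/4 for the 2 % cutoff), on the assumption that no G1-G2 lesion was cross-classified with G3; they are not copied from Table 2.

```python
def kappa(table, weighted=True):
    """Cohen's kappa for a square agreement table (rows: cytology, columns: histology).

    weighted=True uses Cicchetti-Allison (linear) weights on equally spaced
    category scores; weighted=False gives the simple (unweighted) kappa.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    # Observed and chance-expected (weighted) agreement
    p_obs = sum(w(i, j) * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(w(i, j) * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


# Reconstructed counts (assumption, see lead-in); order of categories: G1, G2, G3
cutoff_2 = [[32, 5, 0],
            [2, 10, 0],
            [0, 0, 4]]
cutoff_5 = [[42, 1, 0],
            [3, 3, 0],
            [0, 0, 4]]

for name, tab in (("2 % cutoff", cutoff_2), ("5 % cutoff", cutoff_5)):
    print(name,
          "weighted kappa = %.2f" % kappa(tab, weighted=True),
          "simple kappa = %.2f" % kappa(tab, weighted=False))
```

Under these assumptions the sketch gives point estimates that round to the kappa values reported in the Results; confidence intervals and significance tests are not reproduced here.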

Furthermore, sensitivity and specificity of cytology in predicting the histological grade of the neoplasm are provided. Due to the small size of some subgroups involved in the sensitivity and specificity calculation, the confidence intervals for sensitivity and specificity were calculated using the Clopper-Pearson (exact) method. Because of the discrete nature of the binomial distribution, the actual coverage of an interval computed at the nominal 95 % level is not exactly 95 %, but at least 95 %. In other words, this approach can result in wider (conservative) confidence intervals. All statistical calculations were performed on a per-lesion basis. SAS system, release 9.2, was used for statistical calculations.
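The Clopper-Pearson interval can be sketched in plain Python by inverting the binomial tail probabilities with bisection. This is a minimal illustration, not the SAS routine used for the study; the function names are hypothetical, and the example counts (10 of 15 and 4 of 4) are the sensitivity fractions reported in the Results.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, iterations=100):
    """Bisection on [0, 1] for a function f that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower limit: the p for which P(X >= successes) equals alpha / 2
    if successes == 0:
        lower = 0.0
    else:
        lower = _root(lambda p: (1 - binom_cdf(successes - 1, n, p)) - alpha / 2)
    # Upper limit: the p for which P(X <= successes) equals alpha / 2
    if successes == n:
        upper = 1.0
    else:
        upper = _root(lambda p: alpha / 2 - binom_cdf(successes, n, p))
    return lower, upper


for x, n in ((10, 15), (4, 4)):
    lo, hi = clopper_pearson(x, n)
    print("%d/%d = %.1f %% (95 %% CI %.1f-%.1f)" % (x, n, 100 * x / n, 100 * lo, 100 * hi))
```

When all observations are successes, as for 4 of 4, the upper limit is fixed at 100 % and only the lower limit is informative, which is why small subgroups yield such wide intervals.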

Results

Representative examples of assessment of Ki-67 proliferative index on cytological and respective histological samples are shown in Fig. 1. The mean number of hot spots counted for each patient was eight (standard deviation ±1.8; range 4–10); these corresponded to a mean number of 1,603 cells counted (standard deviation ±360; range 800–2,000).

Fig. 1 Examples of cytological (left) and respective histological (right) Ki-67 index assessment: a, b, P-NET grade 1 orig. mag. ×100; c, d, P-NET grade 2 orig. mag. ×100; e, f, grade 3 orig. mag. ×100

The distribution of Ki-67 proliferative index in cytological and histological samples is shown in Fig. 2. In lesions classified as G1 on the basis of their histological specimens, the median value of Ki-67 proliferative index was 1 (range 1–6) in cytological and 1 (1–2) in histological samples (p = 0.407); the corresponding figures were 3 (2–10) and 5 (4–15) in G2 lesions (p = 0.059) and 60 (30–90) and 62.5 (30–90) in G3 cases (p = 1).

Fig. 2 The distribution of Ki-67 index in cytological and histological samples. Patients are grouped (G1, G2, G3) according to the histological grade, and the distributions are compared using the Wilcoxon rank sum test. The length of the box represents the interquartile range (IQR), which contains the central 50 % of the data (from the 25th to the 75th percentile). The line through the inside of the box represents the median value. The adjacent values are represented by the T-shaped lines that extend from each end of the box, from the largest observation which is less than or equal to the 75th percentile plus 1.5 times the IQR, to the smallest observation which is greater than or equal to the 25th percentile minus 1.5 times the IQR. Values outside this range, “outliers,” are represented as individual points

As shown in Table 2, the results of cytological and histological grading were concordant in 46 out of 53 (86.8 %) lesions and in 41 out of 48 patients (85.4 %). When a >5 % cutoff for G2 was chosen, the concordance rate was 49/53 lesions (92.5 %) and 44/48 patients (91.7 %). Agreement analysis was conducted with both weighted and simple kappa, for the 2 and 5 % cutoff values, and showed in each case a significant agreement between cytology and histology (p < 0.0001). Both weighted and simple analyses suggested a modest and not significant improvement in agreement going from the 2 to the 5 % cutoff: weighted kappa was 0.77 (95 % CI 0.60–0.94) and 0.82 (95 % CI 0.64–1.00), respectively, for the 2 % and the 5 % cutoff values; simple kappa was 0.73 (95 % CI 0.53–0.92) and 0.75 (95 % CI 0.51–0.98), respectively, for the 2 % and the 5 % cutoff values. All four cases with a cytological diagnosis of G3 neuroendocrine carcinoma (NEC, small cell type) were confirmed by histology, and no cases of G1-G2 NETs were upgraded to G3 NECs at histological grading. The cytological grading assessed on 19 smears and 34 cell block sections was consistent with histology in 17 and 29 cases, respectively. In no case was the grading evaluated both on smears and on cell block sections. No differences in the concordance rates between cytological and histological grading were observed depending on the type of needle used in performing EUS-FNA (not shown).

Table 2 Concordance between cytological and histological grading

The assessment of the accuracy of cytological grading in predicting the true histological grade was done in two steps. Firstly, we assessed the accuracy of cytology in distinguishing G3 NECs and G1-G2 NETs: a 100 % sensitivity (4/4, 95 % CI 39.8–100.0) and a 100 % specificity (49/49, 95 % CI 92.8–100.0) were found. Secondly, we considered the diagnoses of G2 versus G1 NET for the cutoff values (2 and 5 %) under study, and we observed comparable figures for each cutoff: namely, sensitivity was 66.7 % (10/15, 95 % CI 38.4–88.2) and 75.0 % (3/4, 95 % CI 19.4–99.4), respectively, for 2 and 5 % cutoffs; the corresponding figures for specificity were 94.1 % (32/34, 95 % CI 80.3–99.3) and 93.3 % (42/45, 95 % CI 81.7–98.6).

Discussion

To our knowledge, this is the largest published series of P-NETs, with grading assessed both on cytological material and on surgical specimens. We found a high concordance between cytological and histological grading evaluated on 49 cases of well-differentiated P-NETs and 4 cases of small cell P-NECs; furthermore, EUS-FNA specimen was confirmed to be an accurate tool to distinguish G3 NECs from G1-G2 NETs, but it was less satisfactory in separating G1 from G2 NETs. In particular, cytology was found to be highly specific in the diagnosis of both G2 and G3 tumors, but the sensitivity was poor for G2 NETs and high for the prediction of G3 NECs. Nevertheless, the confidence intervals of the latter sensitivity were wide, reflecting the small number of NECs in our study. Future, larger series may provide a more accurate estimate of the sensitivity of cytological grading for diagnosis of NECs.

Our findings were obtained with different types of EUS-FNA needles, and we did not observe any difference when proliferative activity was measured on direct alcohol-fixed smears or cell block sections. In most patients, multiple passes were performed during EUS-FNA in order to sample different areas of the lesion and to improve the abundance of cells in the sample. The 2010 WHO classification mandates the use of both mitotic count and Ki-67/MIB1 index in assigning the grade on histological material. The mitotic count was not performed on our cytological specimens; at least 50 HPFs are required for a reliable measurement [1, 2], and this requirement could not be fulfilled in most of our samples; moreover, in our experience, nuclear crushing hampered a reproducible identification of mitotic figures in some cytological smears. Recently, McCall et al., in a series of 264 well-differentiated P-NETs, found that over one third of their cases had discordant grades when assessed on the basis of the mitotic rate (G1) and of the Ki-67 index (G2). These patients had shorter overall survival than patients classified as G1 on both mitotic rate and Ki-67 index; this may suggest that Ki-67 labeling index should be assessed on all P-NETs [24]. According to the WHO recommendation, we evaluated the Ki-67 labeling index by manual counting only when a minimum of 500 to 2,000 tumor cells were present, in order to ensure that the sample was representative. This simple criterion, although reasonable, has not yet been validated on cytological samples. However, the need for objective quantification of the Ki-67 proliferative index, by digital image analysis and manual methods, has recently been stressed [25]. In the 27 cases reported by Hasegawa et al., the concordance between cytological and histological grades improved when aspirates with fewer than 2,000 cells were excluded [20]. Clinicians should be fully aware of the limits of cytological grading, and this parameter always needs to be integrated with other clinical elements to define the optimal therapeutic plan.

Theoretically, tumor heterogeneity may hamper the clinical value of Ki-67 labeling index, as assessed on core biopsies or cytological aspirates. In a recent study [26], “virtual biopsies” were constructed from 45 surgically resected liver metastases of well-differentiated neuroendocrine carcinoma; the single biopsies correctly identified all G1 cases, but only 8 of 23 G2 cases (34.8 %). Interestingly, both the grade based on the single biopsy and the grade based on the whole metastasis correlated with overall survival. Also, cytological grading obtained on EUS-FNA aspirates of primary pancreatic NETs was reported to be predictive of survival [15, 20]. We included in the present multicenter, retrospective study only patients who underwent surgery after EUS-FNA; we did not perform a survival analysis because, as a consequence of the study design, we had no opportunity to standardize follow-up schedules, treatment regimens, and other confounding variables among the centers. Furthermore, 10 patients had been referred by other centers, and no follow-up data were available; 5 out of the 38 remaining patients died during a mean follow-up of 883 days (standard deviation ±534), but none of the deaths were related to the disease. Disease recurred in 1/21 G1 patients and in 3/13 and 2/4 G2 and G3 cases, respectively.

When we consider clinical studies based on EUS-FNA, the different case mix and the reference to different classifications (WHO 2004 and WHO 2010) and to different Ki-67 cutoff values make comparisons difficult. Kaklamatos et al. [16] published a letter reporting a poor concordance (54 %) between cytological and histological Ki-67 measurement; in other published studies, the concordance rate ranged from 75 to 83 %. In most of these series, cytology was able to correctly classify the few high-grade cases included. De Angelis et al. reported a 68.4 % concordance using a 3 % cutoff value, but concordance improved to 94.7 % using a 20 % cutoff value [17]. Apparently, when well-differentiated neoplasms are classified using the 2 % Ki-67 cutoff value, the performance of cytology is not completely satisfactory. In a study based on the 2004 WHO classification [15], a 75 % concordance was reported: five out of six uncertain behavior NETs turned out to be well-differentiated carcinomas. A 100 % concordance between cytology and histology has been reported when a 5 % cutoff value was chosen to define G2 NETs [18]; in our series, the adoption of the 5 % cutoff value did not significantly improve the diagnostic performance of cytology in the diagnosis of G2 NETs. Nevertheless, the choice of the best cutoff should be based not on the concordance rate between cytology and histology, but on the prognostic value of the chosen value. Large, purposely designed studies are needed.

Our study has some limitations. First of all, it is retrospective, and we included only patients with an EUS-FNA diagnosis of NET: in these patients, the quality of cytological material was obviously rather good, and this could have biased our results. Furthermore, our series is rather small and included only 4 G3 cases. Nevertheless, the percentage of P-NEC cases in our series was consistent with the literature, considering that these tumors are often unresectable [2, 27]. Finally, due to the design of our study, our data could not be used to confirm the prognostic value of Ki-67 assessment or to choose the best cutoff values.

In conclusion, our study showed that cytological samples including at least 500–2,000 tumor cells are suitable for a reliable assessment of Ki-67 labeling index. Cytological Ki-67 labeling index may be useful for preoperative classification of NETs and may give a valuable contribution for clinical and therapeutic decision making, when surgical treatment is not advisable.