Introduction

Many classifications have been proposed for pre-neoplastic lesions of the laryngeal mucosa, but no consensus has yet been reached. Among these, the World Health Organization (WHOC), Ljubljana (LC) and squamous intraepithelial neoplasia (SINC) classifications are the most widely favored [16]. All three are included in the latest classification of the World Health Organization [7]. The categories of these classifications are intended to indicate prognosis. Inter-observer agreement (IA), or conversely inter-observer variability, is an important property of any classification. There are a few studies on the IA of WHOC for head and neck lesions [8–18], but there are no data for LC and SINC [14]. In the present study, multiple observers evaluated laryngeal biopsies by WHOC, LC and SINC in order to determine whether any of the classifications yields better IA than the others.

Materials and Methods

Hematoxylin and eosin (H&E)-stained sections from 42 laryngeal biopsies were gathered from 14 pathologists interested in head and neck pathology, participating from 8 cities and 11 centers. The slide set was circulated by post among the centers, and each participant was requested to review the slides and evaluate the sections according to WHOC, LC and SINC, based on the criteria described in the recent article by Gale et al. [6] (Table 1). In addition to these diagnostic categories, epithelium without any hyperplastic or neoplastic lesion and invasive squamous cell carcinoma were also accepted as diagnoses. The participants were informed that if they gave a diagnosis spanning two categories, such as “moderate-severe dysplasia”, the higher category would be used for statistical analysis. During the evaluation of the slides, the pathologists knew only the localization of the lesion.

Table 1 World Health Organization 2005 (WHO), Ljubljana (L) and squamous intraepithelial neoplasia (SIN) classifications [7]

All the participants were requested to send their diagnoses electronically to one center, where the data were collated for statistical analysis.

The participants were asked to complete a questionnaire including the years spent as a pathology specialist, years of special interest or specific working in head and neck pathology, the number of laryngeal biopsies evaluated in September 2009, the classification they apply during daily practice for laryngeal pre-neoplastic lesions, and comments about the working set and organization of the study.

Statistical Analysis

The Friedman test, the Wilcoxon signed-rank test and Spearman’s correlation analysis were performed with SPSS (version 11.0), and the linear weighted kappa analysis was performed with MedCalc.
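As an illustration of the pairwise correlation step, the following is a minimal sketch of Spearman's rank correlation computed for every pair of observers. It is not the SPSS procedure used in the study: the function names, observer labels and grades are hypothetical, and the grades are assumed to be coded as ordered integers. Ties, which are unavoidable with only a few ordinal categories, are handled with average ranks.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks.

    Undefined (division by zero) if one observer gives every case the same grade.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical grades (1 = lowest category ... 6 = highest), not study data.
grades = {
    "observer_a": [1, 2, 3, 4, 5, 6],
    "observer_b": [1, 2, 3, 4, 5, 6],
    "observer_c": [6, 5, 4, 3, 2, 1],
}

# One coefficient per unordered pair of observers.
pairwise_rho = {
    (p, q): spearman(grades[p], grades[q]) for p, q in combinations(grades, 2)
}
```

With real data, each list would hold one observer's grades for the same cases in the same order, and cases with a missing grade would have to be dropped for that pair.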

Results

Distribution of Cases

The diagnosis favored by most of the participants was recorded for each case. On this basis, the distribution of the laryngeal biopsies by WHOC was: squamous cell hyperplasia, 5 (12%); mild dysplasia, 11 (26.2%); moderate dysplasia, 12 (28.6%); severe dysplasia, 7 (16.7%); carcinoma in situ, 5 (12%); and invasive squamous cell carcinoma, 2 (4.8%) cases.

The Profile of the Participants

The participants had been pathology specialists for 5–24 years and they had been dealing with head and neck pathology for 5–18 years. The number of laryngeal biopsies diagnosed per month ranged from 11 to 31. WHOC was the predominantly applied classification, SINC was applied by one, and WHOC and LC were applied together by one, while two participants reported each case according to the three classifications.

Evaluation Results of Laryngeal Biopsies

All the participants (n = 14) applied both the WHOC and LC to laryngeal biopsies, but only 6 applied SINC.

There was a significant difference between the participants when all the cases were graded by WHOC (Friedman’s test, P < 0.001). Two of the participants gave lower grades (mean: 2.24 ± 1.32 and 2.60 ± 1.29) and one gave higher grades (mean: 3.62 ± 1.15) than the others (Wilcoxon signed-rank test with Bonferroni correction, P < 0.001).

There was also a significant difference between the participants when all the cases were graded by LC (Friedman’s test, P < 0.001). Again, two of the participants gave lower grades (mean: 2.45 ± 0.74 and 2.50 ± 0.63) and one gave higher grades (mean: 3.05 ± 0.73) than the others (Wilcoxon signed-rank test at the 0.05 level with Bonferroni correction, P < 0.001), but this time the participant who gave significantly higher grades was a different one.

There was a significant difference between the participants when all the cases were graded by SINC (Friedman’s test, P < 0.001). One participant, who was among those mentioned above, graded the cases lower than the others (mean: 2.14 ± 1.20) (Wilcoxon signed-rank test, P < 0.003).
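The Bonferroni correction divides the overall significance level by the number of comparisons made. As a rough sketch, assuming an overall level of 0.05 and assuming that every pair of observers was compared (the text does not state the number of comparisons explicitly), the per-comparison threshold could be computed as follows; the function name is illustrative.

```python
from math import comb


def bonferroni_threshold(alpha, n_observers):
    """Per-comparison significance level when every observer pair is tested."""
    n_comparisons = comb(n_observers, 2)  # number of unordered pairs
    return alpha / n_comparisons


# Assumed overall level of 0.05; observer counts are taken from the text.
sinc_threshold = bonferroni_threshold(0.05, 6)    # 0.05 / 15
whoc_threshold = bonferroni_threshold(0.05, 14)   # 0.05 / 91
```

Under these assumptions, the threshold for the 6 observers who applied SINC is 0.05/15, or about 0.0033, and the threshold for all 14 observers is 0.05/91, or about 0.00055.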

Ninety-one pairwise Spearman correlation analyses were performed for the WHOC results, each comparing two participants. There was no correlation for 2 (2.2%) comparisons. Correlations were mild, moderate and strong for 33 (36.3%), 45 (49.5%) and 11 (12.1%) pairs, respectively.

For LC, there were no, mild, moderate and strong correlations for 13 (14.3%), 34 (37.4%), 40 (44.0%) and 4 (4.4%) comparisons, respectively.

The SINC results covered 15 comparisons, as only 6 participants applied this method. There were mild, moderate and strong correlations for 4 (26.7%), 10 (66.7%) and 1 (6.7%) pairs, respectively (Table 2).

Table 2 The distribution of correlation coefficients (r) comparing pairs of observers’ results for the laryngeal biopsies by the World Health Organization classification (WHOC), Ljubljana classification (LC) and squamous intraepithelial neoplasia classification (SINC) (Spearman’s Correlation Analysis)
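The numbers of comparisons reported above follow from counting unordered pairs of observers: 14 observers give 91 pairs and 6 observers give 15. The short sketch below reproduces these counts and the WHOC percentages from the reported pair counts; the helper names are illustrative only.

```python
from math import comb


def pair_count(n_observers):
    """Number of unordered observer pairs, n * (n - 1) / 2."""
    return comb(n_observers, 2)


def percentages(counts):
    """Share of each correlation category, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * c / total, 1) for label, c in counts.items()}


whoc_pairs = pair_count(14)  # all 14 observers applied WHOC and LC
sinc_pairs = pair_count(6)   # only 6 observers applied SINC

# Pair counts per correlation strength for WHOC, as reported in the text.
whoc_share = percentages({"none": 2, "mild": 33, "moderate": 45, "strong": 11})
```

Because each share is rounded separately, the rounded percentages need not add up to exactly 100.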

The mean correlation coefficient (MCC) for WHOC was significantly higher than that for LC (0.55 ± 0.15 vs. 0.48 ± 0.14; P < 0.001, paired-samples t test).

The MCCs of the 6 participants who evaluated the cases by all three methods (WHOC, LC and SINC) were compared, and there was no significant difference (P = 0.24, Friedman’s test).

Linear weighted kappa analysis could be performed for 65, 54 and 7 pairs with WHOC, LC and SINC, respectively; missing values prevented the analysis for the remaining pairs. The mean kappa values for WHOC, LC and SINC were 0.40 ± 0.11 (range: 0.19–0.65), 0.42 ± 0.12 (range: 0.5–0.71) and 0.42 ± 0.11 (range: 0.29–0.62), respectively (Table 3). For the matching pairs, the mean kappa values with WHOC, LC and SINC were 0.42 ± 0.10 (range: 0.19–0.56; 33 pairs), 0.41 ± 0.12 (range: 0.15–0.63; 33 pairs) and 0.37 ± 0.07 (range: 0.29–0.47; 5 pairs), respectively. When the matching pairs with available kappa values were compared, there was no significant difference between WHOC and LC (P = 0.44, paired-samples t test, 33 comparisons), between WHOC and SINC, or between LC and SINC (P = 0.078 and P = 0.144, respectively, Wilcoxon signed-rank test).

Table 3 The distribution of wKappa values for pairs of observers, with the World Health Organization classification (WHOC), Ljubljana classification (LC) and squamous intraepithelial neoplasia classification (SINC) (MedCalc)

Discussion

The classification of pre-neoplastic lesions of the head and neck squamous mucosa has been a controversial issue. The criteria of the three most favored classifications, WHOC, LC and SINC, can be compared, but none of the categories match perfectly. Mild dysplasia of WHOC and SIN I share features with basal/parabasal hyperplasia of LC. Moderate and severe dysplasia, or SIN II and III, have many features of atypical hyperplasia of LC, but SIN III also includes the lesions that are described as carcinoma in situ by WHOC and LC [16, 18].

The criteria for uterine cervical dysplasia have been adapted to laryngeal and oral lesions by WHOC, but the prognostic value and IA of this method in the head and neck are not as good as they are in the cervix [19]. The oral cavity is the other head and neck region where these classifications are applied. The mean IA with WHOC for the oral cavity is also lower than that for the cervix: the kappa values were 0.15–0.41 and 0.23–0.45 in two series [8, 9], and the mean kappa value was 0.58 in another [12]. For laryngeal cases, the kappa value with WHOC was 0.32 in the series by McLaren et al. [11]. There are other series in which categories were censored, but we do not believe that these series reflect the IA of WHOC [14, 15]. The lack of data on the IA of LC and SINC for head and neck lesions is the basis of this study, which also provides further data on the IA of WHOC.

Studies designed to determine IA have some shortcomings. During the examination, the participants are asked to work differently from their daily practice. They have to evaluate the same set of cases one after another. They cannot request new sections, apply additional methods or comment on the findings, and they have to select a category. They cannot ask for a repeat biopsy. The sections being evaluated may come from different centers with variable quality of tissue processing and staining. The participants are asked to apply classifications other than those they are used to. For example, in this series, only two participants stated that they applied all three classifications in daily practice, and one stated that he/she used two (WHOC and LC). The others applied only one, most often WHOC.

In routine reporting of biopsies, the responsibility is high, and probably more time is spent than in a research project like the present one. There may be a bias towards seeing a pre-neoplastic lesion, as the working set was assembled for this purpose. Last but not least, errors may arise when the results are recorded during the evaluation, entered into the computer, and homogenized for statistical analysis. We tried to minimize the effect of the above-mentioned factors with as much care as possible.

The number of cases studied in this series is lower than in the previous studies that evaluated the IA of WHOC for laryngeal biopsies, but in this series there were more participants, and the participants were requested to apply three classifications, allowing the classifications to be compared in one series [8, 9, 11, 12]. The participants in this series were pathologists interested in head and neck pathology; hence, the results may not be representative of pathologists who do not have a special interest or experience in this field. On the other hand, although all the participants were interested in head and neck pathology, some gave significantly lower or higher grades than the others.

In this series, weighted kappa statistics were preferred because unweighted kappa considers all disagreements to be equally important, whereas weighted kappa gives higher values when the degree of disagreement is low and lower values when the degree of disagreement is high. For best results with this method, the categories should preferably be nearly equally spaced from each other [20]. Because we chose the weighted kappa statistics, we had to censor the simple and basal-parabasal hyperplasia categories for LC, as in the recent article by Gale et al. [6]; these two were grouped together as benign lesions with low malignant potential.
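The difference between the two statistics can be illustrated with a minimal sketch of Cohen's kappa with and without linear weights. This is not the MedCalc implementation used in the study; the function name, the integer coding of the grades (0 to k − 1) and the example ratings are assumptions made for illustration. With linear weights, a disagreement between categories i and j earns partial credit of 1 − |i − j| / (k − 1).

```python
def kappa(rater_a, rater_b, n_categories, weighted=True):
    """Cohen's kappa for two raters using integer grades 0..n_categories-1.

    weighted=True uses linear agreement weights 1 - |i - j| / (k - 1);
    weighted=False counts only exact matches as agreement.
    """
    n = len(rater_a)
    k = n_categories
    # Joint proportions of (grade by A, grade by B) and their marginals.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[a][b] += 1 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * joint[i][j] for i, j in cells)
    expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return (observed - expected) / (1 - expected)


# Hypothetical grades on a 4-category scale, not study data.
rater_a = [0, 0, 1, 1, 2, 2, 3, 3]
near_miss = [0, 1, 1, 0, 2, 3, 3, 2]  # every disagreement is one category apart
far_miss = [0, 2, 1, 3, 2, 0, 3, 1]   # every disagreement is two categories apart

unweighted_near = kappa(rater_a, near_miss, 4, weighted=False)
unweighted_far = kappa(rater_a, far_miss, 4, weighted=False)
weighted_near = kappa(rater_a, near_miss, 4)
weighted_far = kappa(rater_a, far_miss, 4)
```

In this constructed example both comparisons have four exact matches out of eight and identical marginals, so unweighted kappa is about 0.33 for each. Linear weighted kappa separates them: about 0.60 when the disagreements are between adjacent categories and about 0.20 when they are two categories apart.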

In this series, the correlation coefficients for WHOC were higher than those for LC for laryngeal pre-neoplastic lesions. The wkappa values of LC, WHOC and SINC were not different. These kappa values were in concordance with previous findings, showing fair, nearly moderate agreement (wkappa = 0.40) with WHOC. Both LC and SINC showed moderate agreement (wkappa = 0.42 for each), but there was no significant difference between the three classifications.

Previous studies have suggested that the molecular basis of LC for laryngeal neoplastic progression is stronger than that of the others [21–24]. This series presents data about the IA of WHOC, and novel information about the IA of LC and SINC, for laryngeal pre-neoplastic lesions. We could not achieve good IA, as measured by wkappa, with any of the three most favored classifications for pre-neoplastic laryngeal lesions, in agreement with previous findings. These results probably point to a need for improvement of the classifications or of the pathologists, or even better, of both. Although the MCC of WHOC was higher, the wkappa values were not significantly different; therefore, the findings in this series do not favor any of the classifications in terms of better IA for laryngeal lesions.