Introduction

Colorectal cancer (CRC), clinically defined as the infiltration of the submucosa by neoplastic cells, is the most common malignancy of the gastrointestinal tract [1, 2]. In a polypoid or sessile lesion, when the infiltration is limited to the submucosa and the muscularis propria is spared (pT1 in the TNM classification), the lesion is termed a malignant polyp and is, as such, amenable to curative endoscopic resection [2]. The potential to prevent more advanced neoplasms (> pT1 in the TNM classification) relies on the endoscopist’s ability to identify and characterize superficial neoplastic lesions (SNL) at colonoscopy. These are usually understood as lesions whose macroscopic morphology suggests neoplastic infiltration limited to the submucosa [3, 4]. International guidelines [5,6,7] recommend an accurate macroscopic characterization of these lesions in order to predict the histology in vivo and, consequently, to guide the most appropriate therapeutic approach. Although readily available, the parameters evaluated in white light endoscopy, such as size, site, and morphology (according to the Paris classification [3]), have a far from optimal predictive ability, as they are not always in keeping with the histology of resected neoplasms. To improve the concordance between optical characterization and the corresponding histology, new imaging techniques, such as zoom magnification and traditional or virtual chromoendoscopy, have provided additional details. By surveying both the glandular pattern and the vascularity of the neoplasm, these techniques can greatly improve the distinction between neoplastic and non-neoplastic polyps and, among neoplastic ones, between those that infiltrate the submucosa and those that do not [7,8,9].

Currently, there are several optical endoscopic classifications of superficial colonic lesions [10, 11]. The Kudo classification [12], introduced in 1994, was the first to enter clinical practice. Using traditional chromoendoscopy combined with zoom magnification (up to 100x) of endoscopic images, the authors drew attention to the microarchitecture of the glandular orifices (the so-called pit pattern), which would facilitate the distinction among non-neoplastic, adenomatous, and cancerous lesions. The advent of virtual chromoendoscopy, mainly Narrow Band Imaging (NBI), sped up the acquisition of this information, and several optical classification systems have since been proposed, although they have been only partly validated. The main limitation of the NBI classifications is the requirement for zoom magnification, a tool that is widespread among Eastern endoscopists but of limited use in Western practice [7, 10, 11]. To overcome this problem, a new system, the NBI International Colorectal Endoscopic (NICE) classification, which employs NBI without magnification, has been developed [12, 13]. To date, it appears to be the only validated system that can be adopted in clinical practice with unmagnified endoscopy. On the other hand, because endoscopes with magnification are of limited availability, the original Kudo system has also been adopted in several centers by means of NBI without magnification; however, this practice has not yet been experimentally validated [7, 15,16,17,18,19]. Indeed, although unmagnified NBI cannot display the exact pit pattern, but only a pit-like pattern, several authors suggest its use anyway [7, 15].

In the present investigation, Western endoscopists attempted to categorize 64 superficial colonic lesions by both the NICE and the Kudo classifications, without magnification of the endoscopic pictures. The main purpose was to evaluate the relative performance of the two systems in predicting in vivo the histology ultimately reported for the resected specimens. As a secondary aim, we evaluated the inter-observer agreement among participants in classifying the neoplasms according to each of the two classification systems.

Materials and methods

This study was carried out at the Gastroenterology and Digestive Endoscopy Division of the “Casa Sollievo della Sofferenza” Foundation, IRCCS (San Giovanni Rotondo, Italy). We conducted a comparative observational study in which 11 endoscopists were involved.

Before the study began, all operators attended a 1-h internal meeting at which the pertinent literature was reviewed and the details of both the NICE and the Kudo classifications were explored. Subsequently, two different sets of 25 endoscopic pictures of superficial colonic lesions, retrieved from figures in the available literature, were emailed to the investigators in a PowerPoint file, preceded by a summary of each system. The lesion category reported in the legends of these illustrations was used as the gold standard. Respondents were blinded to these descriptions and had to assess the lesion subtype using both the NICE and the Kudo classifications. For the latter, indigo carmine-spray pictures were employed. After submitting their individual responses, each investigator was informed of the “correct” classification. In a final meeting with all participants, residual doubts concerning the two systems were resolved.

Study design

Archived videos of colonoscopies recorded at our Endoscopy Unit with Olympus high-definition colonoscopes (CF-Q 180, CF-H 185, CF-H 190, and CF-HQ 190; Olympus Medical Systems, Tokyo, Japan), in white light and NBI mode but without magnification, were retrieved. Exclusion criteria were poor bowel preparation, familial polyposis and non-polyposis syndromes, chronic inflammatory bowel diseases, and pedunculated polyps. Lesions of all sizes (diminutive, ≤ 5 mm; small, 6–9 mm; large, ≥ 10 mm) with sessile or flat (simple or mixed) morphology, classified according to the Paris system [3], were included. After selecting 64 high-quality recordings depicting superficial colonic lesions, we created short videoclips ranging in length from 7 s to 4 min. We sent them to the participants via a Google Drive link on two occasions 4 months apart: in the first mailing, observers had to classify the lesions according to the NICE classification (categories 1, 2, and 3); 4 months later, they inspected the same lesions and characterized them according to the Kudo classification (I, II, IIIS, IIIL, IV, Vi not in demarcated area, Vi in demarcated area, Vn), as suggested by Matsuda and colleagues [20].

The histology of the resected specimens was taken as the gold standard for the correct classification. Lesions categorized according to the Kudo system were grouped using the clinical classification of Matsuda et al [10, 20], as follows: (1) non-neoplastic (Kudo’s I, II), (2) noninvasive neoplastic (Kudo’s IIIS, IIIL, IV, Vi not in demarcated area), and (3) invasive neoplastic (Kudo’s Vi in demarcated area, Vn) (Fig. 1). With the NICE classification, subtypes 1, 2, and 3 were likewise categorized as non-neoplastic, noninvasive neoplastic, and invasive neoplastic, respectively (Fig. 1).

Fig. 1. Relationship between Kudo’s and NICE classification and clinical classification suggested by Matsuda [20].
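The grouping described above can be written down as a simple lookup table. The following is only an illustrative sketch: the category labels are taken from the text, while the names `KUDO_TO_CLINICAL`, `NICE_TO_CLINICAL`, and `clinical_class` are hypothetical and were not part of the study.

```python
# Clinical grouping of Kudo and NICE categories, as described in the text
# (after Matsuda et al.). All identifiers here are illustrative only.
KUDO_TO_CLINICAL = {
    "I": "non-neoplastic",
    "II": "non-neoplastic",
    "IIIS": "noninvasive neoplastic",
    "IIIL": "noninvasive neoplastic",
    "IV": "noninvasive neoplastic",
    "Vi not in demarcated area": "noninvasive neoplastic",
    "Vi in demarcated area": "invasive neoplastic",
    "Vn": "invasive neoplastic",
}

NICE_TO_CLINICAL = {
    1: "non-neoplastic",
    2: "noninvasive neoplastic",
    3: "invasive neoplastic",
}


def clinical_class(system, category):
    """Return the clinical class for a category of the given system."""
    if system == "kudo":
        return KUDO_TO_CLINICAL[category]
    if system == "nice":
        return NICE_TO_CLINICAL[category]
    raise ValueError(f"unknown classification system: {system!r}")


print(clinical_class("kudo", "Vn"))
print(clinical_class("nice", 2))
```

Mapping both systems onto the same three clinical classes is what allows each optical diagnosis to be compared directly with histology.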

As to the corresponding histology and management of these three categories in our series, we considered as non-neoplastic the hyperplastic lesions, which can be managed without treatment or, at most, with endoscopic resection, and as noninvasive neoplastic the adenomatous SNL and the intra-mucosal carcinomas, both amenable to endoscopic treatment. Finally, in agreement with Puig et al [21], we considered as invasive neoplastic the T1 carcinomas with deep submucosal invasion (> 1000 μm) and the T2 carcinomas, which should consequently be referred for surgical resection [20].

Outcome

The main outcome of this study was the overall accuracy of each of the two optical classifications in predicting the final histologic diagnosis of superficial colonic lesions, together with the comparison between the two systems. Distinct sensitivity sub-analyses were pre-planned to investigate the ability of each classification to distinguish (1) neoplastic vs. non-neoplastic, (2) non-invasive vs. invasive neoplastic, and (3) endoscopically vs. surgically amenable lesions. As a secondary outcome, the inter-observer agreement among the 11 participants, overall and for individual categories, was also assessed.

Statistical analysis

Performance was assessed in terms of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy. The latter was reported with its 95% confidence interval. Differences in diagnostic accuracy were analyzed by the chi-square test. Differences in sensitivity, specificity, PPV, and NPV were analyzed by the exact binomial test for paired data. A p value of less than 0.05 was considered statistically significant. Inter-observer agreement was estimated using Cohen’s kappa coefficient (κ). To overcome a potential kappa paradox [22, 23], agreement was also assessed by means of Gwet’s AC1 coefficient, and 95% confidence intervals (95%CI) were calculated. All statistical analyses were performed using SAS Software Release 9.4 (SAS Institute, Cary, NC).
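For readers less familiar with these measures, the sketch below shows how Se, Sp, PPV, NPV, and accuracy follow from a 2×2 table of optical diagnosis against histology. It is not the SAS code used in the study. The counts are hypothetical, and the normal-approximation (Wald) interval is an assumption, since the text does not state which confidence interval method was applied.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from the cells of a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "Se": tp / (tp + fn),        # true positives among diseased
        "Sp": tn / (tn + fp),        # true negatives among non-diseased
        "PPV": tp / (tp + fp),       # diseased among test-positives
        "NPV": tn / (tn + fn),       # non-diseased among test-negatives
        "accuracy": (tp + tn) / total,
    }


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (assumed method)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, chosen only to illustrate the calculation.
m = diagnostic_metrics(tp=90, fp=5, fn=10, tn=45)
low, high = wald_ci(m["accuracy"], n=150)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
print(f"accuracy 95% CI: {low:.3f} to {high:.3f}")
```

Because each lesion was rated by several observers, the unit of analysis (rating versus lesion) changes the effective `n`; the sketch makes no claim about which unit the study used.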

Results

The size, gross morphology, and histology of the superficial colonic lesions depicted in the 64 videos were as follows: 12 lesions were diminutive, 17 small, and 35 large; as to Paris macroscopic morphology, 43 lesions were of a single type (24 defined as 0-Is, 17 as 0-IIa, and the remaining two as 0-IIb) and 21 were of mixed type (9 as 0-IIa+Is, 8 as 0-IIa+IIc, and 4 as 0-Is+IIc); histology showed 7 hyperplastic lesions, 39 low-grade and 5 high-grade dysplastic lesions, and 13 carcinomas [3 intra-mucosal (high-grade dysplasia according to the WHO classification [24]), 5 deeply infiltrating the submucosa, and 5 infiltrating the muscularis propria].

The in vivo ability of the NICE and the Kudo classifications to predict the histologic subtypes of the 64 superficial lesions is shown in Table 1. Taking histology as the gold standard, the overall diagnostic accuracy was 82% (95%CI: 79–85) with the former system and 81% (95%CI: 78–84) with the latter; the difference was not significant (p = 0.78). Next, we calculated the agreement between the in vivo and the ex vivo definition for each category of lesions (Fig. 1); the average percentages of correct classifications provided by the 11 study participants are shown in Table 1. For non-neoplastic lesions, the value was 73% (95%CI: 63–83) with the Kudo system and 87% (95%CI: 79–94) with the NICE system, a statistically significant difference (p = 0.03); for all other categories, accuracy rates were ≥ 80% with each of the two systems and did not differ significantly.

Table 1 Diagnostic performances of the Kudo and NICE classifications for the in vivo prediction of histology of 64 superficial colonic lesions

Sub-analyses (Table 2)

A. Non-neoplastic (hyperplastic) vs. neoplastic (noninvasive and invasive) lesions

The two classification systems had a comparable, high diagnostic accuracy in distinguishing hyperplastic from proliferative polyps: 90% (95%CI: 88–92) with the Kudo and 91% (95%CI: 89–93) with the NICE system. When Se, Sp, PPV, and NPV were considered separately, the two classifications had optimal (> 90%) values for both Se and PPV, comparable but low (≤ 55%) values for NPV, and acceptable but significantly different values for Sp: 73% (95%CI: 63–83) for the Kudo system and 87% (95%CI: 79–94) for the NICE classification (p = 0.02). At the exact binomial test, the PPV values also differed (p = 0.01).

Table 2 Statistical sub-analysis among the NICE and the Kudo classifications

B. Non-invasive vs. invasive neoplastic lesions

For the second sub-analysis, we excluded the seven non-neoplastic lesions and considered only the neoplastic ones (57 videoclips). Ascertaining in vivo whether a macroscopically superficial proliferative lesion may or may not involve the submucosa is of paramount clinical relevance. For this purpose, neither classification system performed better than the other (p > 0.05). The individual rates for Se, Sp, PPV, and NPV were similar between the two classifications, but it is worth noting that the NPV values were numerically higher than the PPV values with each system: NPV values of 96% (95%CI: 94–98) for the Kudo and 97% (95%CI: 96–99) for the NICE classification, and PPV values of 69% (95%CI: 61–77) and 65% (95%CI: 58–73), respectively.

C. Endoscopic vs. surgical approach

Endoscopic resection was considered the standard of care for 54 superficial lesions (the seven non-neoplastic and the 47 noninvasive neoplastic lesions); the remaining 10 were considered candidates for surgical resection. With the histology of the resected specimens as the reference for verifying the endoscopists’ therapeutic choice, the indication was correct in 91% of cases with both the Kudo and the NICE classification systems. The rates of Se, Sp, and NPV for each classification were satisfactorily high, whereas lower, though still valuable, PPV values were recorded: 68% (95%CI: 60–76) for Kudo and 65% (95%CI: 58–73) for NICE. The comparison of the two systems did not yield a significant difference.

The inter-observer agreement (Table 3)

The concordance among the 11 participating operators in characterizing the endoscopic features of the 64 superficial colonic lesions was assessed by measuring both the κ and the AC1 values. Overall, according to the κ statistic, the inter-observer agreement was 0.49 (95%CI: 0.39–0.59) with the Kudo system and 0.56 (95%CI: 0.46–0.66) with the NICE system; the AC1 values were 0.66 (95%CI: 0.58–0.73) and 0.67 (95%CI: 0.59–0.75), respectively. On the Landis and Koch scale [25], the agreement was rated moderate with the κ statistic and substantial with Gwet’s statistic.

Table 3 Inter-observer agreement (κ- and AC1-values with 95% confidence intervals) for the NICE and the Kudo classifications

Next, we further detailed the agreement for the in vivo identification of hyperplastic lesions, proliferative noninvasive lesions, and invasive cancers. As reported in Table 3, Cohen’s κ values for each group ranged from 0.10 to 0.25 with both classifications, all indicating slight to fair agreement [25]; the corresponding AC1 values ranged from 0.56 to 0.71 with the Kudo classification and from 0.69 to 0.78 with the NICE classification, indicating moderate to substantial agreement. In the latter analysis, the NICE classification numerically outperformed the Kudo system for two of the three histologic subtypes considered. However, the only difference in agreement category was found in the group of non-neoplastic lesions (0.56 vs. 0.74), corresponding to moderate and substantial agreement, respectively.

Discussion

For pedunculated polyps, the accuracy of optical characterization is of secondary relevance, as endoscopic removal is recommended for all of them. In contrast, for sessile and flat lesions, an accurate evaluation and prediction of invasive behavior has relevant therapeutic implications [8], as the risk of lymph node spread arises only when the infiltration extends beyond the mucosa. In addition, recent studies have further specified that the risk is almost zero when infiltration is limited to the first 1000 μm of the submucosa (sm1) [5, 20]. Consequently, the in vivo recognition of a possible submucosal invasion of SNL is crucial for indicating the most appropriate treatment approach: to spare surgery for lesions that are endoscopically resectable; to avoid endoscopic resection for advanced neoplasms; to proceed with an en bloc resection of high-grade SNL or those with suspected superficial invasion of the submucosa (e.g., Kudo Vi); and to abstain from taking biopsies of lesions with no apparent signs of deep submucosal invasion, so as to avert a fibrotic reaction that would hamper endoscopic resection. Moreover, relying on the characterization process, an endoscopist could also avoid costly histological examination of all the diminutive lesions found, or leave in situ the non-neoplastic ones in the recto-sigmoid region [2, 5, 6, 8, 26,27,28].

To this end, several classification systems have been developed, but many of them lack systematic external validation and have not been compared with each other to identify the best-performing one [5,6,7]. To the best of our knowledge, this study is the first to compare the overall performance of the NICE and the Kudo classifications. In two previous works, both using magnified images of the neoplasms, the authors compared the overall abilities of classification systems: Kudo’s pit pattern, assessed with the help of chromoendoscopy or NBI, was compared with the Hiroshima NBI classification, with comparable or, at most, higher rates for the former [29, 30]. Other studies, with or without magnification, reported the comparative performance of the many, not widely validated systems (e.g., Sano, Hiroshima, Showa classifications), but compared their abilities only partially (e.g., evaluating exclusively the distinction between neoplastic and non-neoplastic lesions) [31,32,33,34,35,36,37,38]. The NICE classification has been compared with another characterization system in only one work [39]. Finally, considering a classification as a whole, very few studies have assessed the inter-observer agreement with unmagnified NBI endoscopy [17].

In an attempt to partly overcome these shortcomings, we carried out a comparative study in which 11 Western endoscopists classified 64 superficial lesions of the colon with both the Kudo and the NICE systems. A distinctive advantage of this investigation is that the neoplasms were categorized without magnification, which makes our findings relevant even for centers where magnification technology is unavailable. For each of the purposes studied, both systems yielded values comparable to, and sometimes higher than, those reported in the current literature with the same endoscopic tools [1, 13,14,15,16,17,18, 21, 32].

As indicated in Table 1, both systems were, overall, highly accurate (> 80%) and comparable in predicting in vivo what histology would show after the lesions were resected. This value was slightly higher than those reported in previous studies of unmagnified NBI endoscopy (76.7% using the NICE and 71.7% using the Kudo) [1, 17]. The concordance between the in vivo and the ex vivo characterization of the superficial neoplasms was excellent for the distinction of non-neoplastic from neoplastic lesions and of non-invasive from invasive neoplastic ones, and for the indication of the most appropriate therapeutic choice, that is, whether endoscopic or surgical resection should be advised (Table 2).

Closer scrutiny of the results highlighted some interesting features. In predicting a specific lesion category (Table 1), the accuracy rates of the two classifications were ≥ 80% for each group, with the notable exception of non-neoplastic lesions assessed with the Kudo system, for which the value was 73% and significantly lower than with the NICE system. Moreover, the NICE classification performed better in terms of Sp and PPV when the analysis was restricted to the distinction between non-neoplastic and neoplastic lesions (Table 2). For this latter purpose, our rates were in keeping with those reported in the pertinent literature using the same technologies, except for the NPV [13, 16, 18, 32]. We attributed this value to the small size of the non-neoplastic category, as only seven such lesions were compared with 57 neoplastic ones. The diagnostic performances of the two classifications in distinguishing invasive from noninvasive SNL and in directing the therapeutic option were high and comparable, matching or exceeding previously reported values [14, 15, 21]. However, as in previous studies, suboptimal PPV values (65% to 69%) emerged, and these results were again attributed to the unbalanced distribution of the observations: the “illness” was present in only 10 lesions, so the number of “true positives” was small.
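The dependence of predictive values on the balance between categories can be illustrated with Bayes’ rule. In the sketch below, the prevalence of neoplastic lesions (57 of 64) and the specificity of 0.87 come from the text, whereas the sensitivity of 0.93 is an assumed value standing in for the reported “> 90%”; the function name `predictive_values` is hypothetical.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV implied by Se, Sp, and disease prevalence (Bayes' rule)."""
    tp = se * prevalence                # diseased, test positive
    fn = (1 - se) * prevalence          # diseased, test negative
    tn = sp * (1 - prevalence)          # non-diseased, test negative
    fp = (1 - sp) * (1 - prevalence)    # non-diseased, test positive
    return tp / (tp + fp), tn / (tn + fn)


# Se = 0.93 is an assumption; Sp and the 57/64 prevalence are from the text.
ppv_skewed, npv_skewed = predictive_values(0.93, 0.87, 57 / 64)
ppv_even, npv_even = predictive_values(0.93, 0.87, 0.5)

print(f"prevalence 57/64: PPV={ppv_skewed:.2f}, NPV={npv_skewed:.2f}")
print(f"prevalence 0.50:  PPV={ppv_even:.2f}, NPV={npv_even:.2f}")
```

With the same Se and Sp, the NPV is lower when non-neoplastic lesions are rare, and by the same reasoning the PPV for invasive cancer is lower when invasive lesions are rare.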

As for the calculated κ values (Table 3), the overall agreement among raters was comparable between the two classification systems: 0.49 for the Kudo and 0.56 for the NICE. A higher agreement, though still lower than the > 0.80 value reported in the pertinent literature, was found with the AC1 statistic [17]. When we evaluated the agreement for each subtype, the κ values were poor (0.10–0.25) with each of the two classifications, whereas with Gwet’s statistic the AC1 values ranged from moderate to substantial agreement. This issue, known as the “κ paradox,” describes the situation in which the κ value is low despite a high level of agreement. Mathematically, this effect arises because κ is influenced by the prevalence problem, due to a skewed distribution of categories, and by the degree to which coders disagree [22]. On the Landis and Koch scale [25], our AC1 values differed in agreement category between the NICE and the Kudo classifications only for non-neoplastic lesions, for which a higher agreement was found with the former (Table 3).
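The κ paradox can be reproduced with a small numerical sketch. The example below is a simplification: it uses the two-rater forms of Cohen’s κ and Gwet’s AC1, whereas the study involved 11 raters, and the ratings are hypothetical, constructed so that one category is far more prevalent than the other.

```python
from collections import Counter


def _observed_agreement(r1, r2):
    """Proportion of items on which the two raters agree."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters (chance term from each rater's marginals)."""
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    cats = set(r1) | set(r2)
    po = _observed_agreement(r1, r2)
    pe = sum((c1[k] / n) * (c2[k] / n) for k in cats)
    return (po - pe) / (1 - pe)


def gwet_ac1(r1, r2):
    """Gwet's AC1 for two raters (chance term from averaged marginals)."""
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    cats = set(r1) | set(r2)
    if len(cats) < 2:
        raise ValueError("AC1 needs at least two observed categories")
    po = _observed_agreement(r1, r2)
    pi = [(c1[k] + c2[k]) / (2 * n) for k in cats]
    pe = sum(p * (1 - p) for p in pi) / (len(cats) - 1)
    return (po - pe) / (1 - pe)


# Hypothetical ratings of 100 lesions: agreement on 92, but 94% of each
# rater's calls fall in one category ("neo").
rater1 = ["neo"] * 90 + ["non"] * 2 + ["neo"] * 4 + ["non"] * 4
rater2 = ["neo"] * 90 + ["non"] * 2 + ["non"] * 4 + ["neo"] * 4

print(f"kappa = {cohen_kappa(rater1, rater2):.2f}")
print(f"AC1   = {gwet_ac1(rater1, rater2):.2f}")
```

Although the raters agree on 92 of 100 lesions, κ is low because its chance-agreement term is large when the marginal distributions are skewed, while AC1 remains high.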

Our study has limitations. First, we acknowledge that our results might reflect the experience of a single endoscopic center and may not be indicative of multicenter practice. A further study should assess performance among observers working in different units to confirm the reliability of our rates. The second limitation is the lack of sm1 lesions in our series, owing to their rarity [14]. Unfortunately, even with magnification, these lesions are the most difficult to characterize. To overcome this problem, some authors have even proposed a multi-step classification system that involves the sequential use of three endoscopic classifications (NICE, JNET, Kudo) together with magnification, NBI, and traditional chromoendoscopy [8]. Another limitation could be the lack of magnification. However, it should be acknowledged that this tool is missing in the majority of endoscopy units worldwide, and unmagnified vision is the most widely adopted approach in daily clinical endoscopic practice. Finally, the slightly lower performance of the Kudo classification could depend on our training module, which used only dye-based images because the literature lacks NBI images assessed with this system.

In conclusion, our results suggest that the NICE classification may give greater confidence in recognizing a non-neoplastic polyp, thus saving the cost of histologic examination. Moreover, although the NICE classification outperformed the Kudo classification in some specific sub-analyses, the two systems can be considered comparable. Finally, the overall accuracy of about 80% is probably not high enough to allow a confident therapeutic choice (i.e., leaving the lesion in place, or endoscopic or surgical removal); however, higher (> 90%) accuracy rates were recorded in differentiating non-neoplastic from neoplastic lesions and non-invasive from invasive polyps, and in suggesting an endoscopic versus a surgical option for removal. These findings would justify a safer clinical use of both systems and could allow the use of the NBI Kudo classification in daily endoscopic practice even in centers where magnification is not available.