1 Introduction

Gastrointestinal (GI) diseases are among the most frequently occurring diseases and pose a real threat to population health. They include GI bleeding, Crohn’s disease, tumors, and ulcers. According to Global Cancer Observatory (GCO) statistics issued in 2020, an estimated 18.99 million new cancer cases and 10.1 million cancer deaths were projected [1]. Three of the nine most common cancer types occur in the GI tract: esophageal, stomach (gastric), and colorectal cancer. In particular, polyps represent the main cause of colorectal cancer [2]. Polyps range in size from very small to large, and most studies show that diminutive polyps are frequently overlooked, with a miss rate of 14 to 30% [3]. Careful investigation and early detection are therefore crucial to decreasing the risk of colorectal cancer.

Various clinical methods have been proposed for detecting GI diseases. Among these methods, endoscopy is considered the gold standard [4, 5]. Specifically, gastroscopy, colonoscopy, and wireless capsule endoscopy (WCE) are the most common endoscopic modalities for examining and evaluating the GI tract. Gastroscopy is used to examine the upper parts of the GI tract, such as the esophagus, the stomach, and the first part of the small bowel (duodenum). Colonoscopy is used for evaluating the lower GI parts, such as the colon and rectum. However, gastroscopy and colonoscopy offer limited visualization of the long GI tract: of the small bowel, only the proximal duodenum and the ileum can be accessed by these procedures. In addition, these endoscopic procedures are highly invasive and lead to substantial patient discomfort and anxiety. To alleviate these problems, WCE was introduced in 2000 [6], enabling non-invasive examination of the entire GI tract. In a WCE procedure, a small capsule is swallowed, travels through the GI tract, and is eventually excreted through the anus. Since WCE is non-invasive, it has received growing attention for diagnosing many GI diseases.

Generally, in WCE, the capsule is swallowed and allowed to travel naturally through the GI tract for up to 8 h, during which it typically captures a large number of images (approximately 55,000–60,000). These images are then stored on a computer and manually reviewed by endoscopists to detect possible abnormalities. Going through all of the images is a tedious and time-consuming process that is prone to human error. More importantly, false-positive and false-negative results can arise, mainly because important details are missed. False-positive results may cause unnecessary anxiety to patients, whereas false-negative results may delay the detection of critical conditions, which may then progress to a stage where the disease becomes incurable or even fatal. Moreover, the ability to detect and classify abnormalities differs from one physician to another. Therefore, an accurate automatic method for the detection and classification of GI abnormalities is highly important for reaching timely treatment decisions while saving substantial cost and labor.

Several methods for the automatic detection of gastrointestinal anomalies have been reported in the literature. The Kvasir dataset [7] is an annotated dataset of endoscopic images of the GI tract. Most researchers who worked on this dataset focused on the detection and multi-class classification of anatomical landmarks and diseased tissues in the GI tract. Pogorelov et al. [7] proposed a system to classify multi-class GI endoscopic images. They used several techniques, including convolutional neural networks (CNN) (among them the pre-trained Inception V3 model), random forests, and the logistic model tree (LMT). Among these methods, the LMT approach outperformed the others with an accuracy of 93.3%. On the same dataset, Agrawal et al. proposed a technique that employs novel features for training a support vector classifier for GI images [8]. In this method, three types of features were extracted: conventional hand-crafted features and CNN features obtained from the VGGnet and Inception-V3 pre-trained models. By fusing these feature types, an accuracy of 0.961 and an F1-score of 0.847 were obtained.

Liu et al. [9] used the bidirectional marginal Fisher analysis (BMFA) and support vector machines (SVM) to classify various landmarks and anomalies of the GI tract. The authors used six types of hand-crafted features provided along with the Kvasir dataset [7]. Experimental results with accuracy, recall, and specificity values of 0.9257, 0.7028, and 0.9575 were obtained, respectively. Nadeem et al. [10] fused different textures and deep learning features. Firstly, CNN features were obtained using the VGG-19 pre-trained model. These CNN features were then combined with conventional texture features, namely the Haralick features and the local binary pattern (LBP) features. Finally, by utilizing the logistic regression classifier, the system achieved accuracy and an F1 score of 83% and 82%, respectively. Furthermore, Gamage et al. [11] proposed a system utilizing CNN features for GI image classification. These features were extracted from three types of pre-trained networks: DenseNet-201, ResNet-18, and VGG-16. Feature vectors were then fused and fed into an artificial neural network (ANN) classifier, reaching an accuracy of over 97%.

Various techniques have been devised for polyp detection [12,13,14,15,16]. In [15], the authors proposed a polyp detection technique that utilizes texture features of the Red, Green, and Blue (RGB) and Hue, Saturation, and Intensity (HSI) color spaces for differentiating polyp tissues from the normal ones. Texture features based on the discrete wavelet transform as well as uniform LBP features were fused and used to train SVM classifiers for polyp detection. Experimental results showed that the classifier based on RGB color features achieved the best accuracy of 91.6%.

Most of the above-mentioned studies used pixel-based representations. However, it is crucial to note that such a representation is not inherently natural but rather an artifact of digital imaging. Consequently, a more intuitive and perceptually meaningful approach involves working with image representations that consider both image geometry and appearance. Addressing this need, superpixel image representations have been proposed [17, 18]. In such models, similar image pixels are clustered based on the similarity of appearance or color attributes. These superpixel models could speed up the image processing time and improve the results on different image processing tasks [19].

The term superpixel is used to describe a cluster of similar pixels with a similar color or appearance property [20]. Superpixels have been used for various types of applications in medical imaging [21, 22]. Several methods have been proposed in the literature for generating superpixels. Those can be mainly classified as: graph-based algorithms [23, 24], gradient-ascent methods [25, 26], or clustering methods [19].

Earlier methods investigated the applicability of superpixels in GI image analysis. Iakovidis et al. investigated using salient superpixels for blood detection in the images of wireless capsule endoscopy (WCE) [21]. Xing et al. dealt with the GI bleeding detection problem using color histograms of WCE image superpixels, and a subspace KNN classifier [27]. While these two superpixel-based methods achieved good performance, they were tailored only to identify GI bleeding spots.

This work presents a superpixel-based segmentation and classification approach for endoscopic images of the GI tract. This study is an extension of our previous work [28], which performed GI image superpixel segmentation using the simple linear iterative clustering (SLIC) method [19]. This paper exploits another superpixel segmentation method, namely linear spectral clustering (LSC) [29], and compares its outcomes against those of the SLIC-based segmentation method as well as pixel-based methods. Various texture and color features were extracted from the generated superpixels to train and test binary support vector machines for each GI part. In addition, the segmentation outcomes of both the SLIC and LSC methods were evaluated and compared using the Dice coefficient and the intersection-over-union. The main contribution of this work is to show how the SLIC and LSC superpixel methods can be used to segment and classify various GI diseases and landmarks in endoscopic images. The proposed superpixel-based methods lead to better classification performance for the different GI regions than conventional pixel-based classification methods.

2 Materials and Methods

2.1 Dataset

Two datasets of GI images were used for realizing the proposed framework: the Kvasir-V2 and Kvasir-SEG datasets. The Kvasir-V2 dataset was used for GI image classification, whereas the Kvasir-SEG dataset was utilized for evaluating the segmentation outcomes of different superpixel methods. The details of both datasets are explained as follows.

Kvasir V2 Dataset:

This dataset consists of an annotated and verified collection of GI images taken with an endoscope [7]. The main objective of this dataset is to facilitate the evaluation and comparison of different methods for GI image classification, detection of GI landmarks, object localization in GI images, and the diagnosis of endoscopic diseases of the GI tract. The dataset has 8000 images representing eight image classes, with 1000 images for each class. All the images were annotated by highly trained experts. The image classes can be broadly grouped into three categories, namely, anatomical landmarks (z-line, pylorus, and cecum), GI diseases (esophagitis, polyps, and ulcerative colitis), and polyp removal procedures (dyed and lifted polyps, and dyed resection margins). The image resolution in this dataset ranges from 720 × 576 up to 1920 × 1072 pixels. Sample images from this dataset are shown in Fig. 1.

Fig. 1 Sample images from the Kvasir dataset: (a) esophagitis, (b) normal z-line, (c) normal pylorus, (d) polyp

Kvasir-SEG Dataset:

This dataset consists of 1000 annotated polyp images with their respective truth masks [30]. The image resolution in this dataset ranges from 332 × 487 to 1920 × 1072 pixels. This dataset is mainly used to develop new and improved techniques for segmenting, detecting, localizing, and classifying polyps. In our work, we used this dataset to quantitatively evaluate and compare segmentation results of both SLIC- and LSC-based segmentation methods. A few samples of polyps and the corresponding masks are depicted in Fig. 2.

Fig. 2 Sample polyp images and their corresponding masks from the Kvasir-SEG dataset

2.2 Proposed Method

Figure 3 presents the general block diagram of the proposed system. The system includes modules for superpixel segmentation, feature extraction, superpixel classification, decision-level fusion, and finally GI image classification. The system was implemented in MATLAB 2016b on a Lenovo IdeaPad 330 computer with an Intel Core i7 processor and 8 GB of RAM.

Fig. 3 A block diagram of the proposed computer-aided diagnosis system for detection and classification of diseases and landmarks of the GI tract

All the images were resized to the size of the smallest image in the Kvasir dataset (720 × 576) before applying the other image processing modules. This was done to reduce the computational cost and thus speed up detection and classification in GI images.

2.2.1 Superpixel Segmentation

Superpixel Segmentation with Linear Spectral Clustering

The linear spectral clustering (LSC) algorithm [29] was proposed based on an investigation of the relationship between the objective functions of normalized cuts [20] and weighted K-means [31]. The LSC algorithm preserves perceptually important global image properties. In addition, this algorithm has linear complexity and high memory efficiency. In particular, the LSC algorithm uses simple weighted K-means clustering for image segmentation and thereby avoids the high complexity of the spectral method for minimizing the normalized cuts. In the LSC method, image pixels are mapped into a 10-dimensional feature space to improve linear separability. The original study reports that the LSC-based segmentation method provides better segmentation results than the superpixel algorithms it was compared against [29]. Figure 4 illustrates the LSC-based segmentation results for some sample images from the Kvasir v2 dataset.

Fig. 4 LSC-based superpixel segmentation of sample images from the Kvasir dataset with different numbers of superpixels: (a) 25, (b) 50, and (c) 100 superpixels

The LSC segmentation output is controlled by tuning the ratio r = Cs/Cc, where Cs and Cc are parameters that weight spatial proximity and color similarity, respectively. A careful selection of the r parameter can lead to a better segmentation output that adheres to natural image boundaries. When the r-value is large, spatial proximity dominates and superpixels with high shape regularity are formed, while fewer boundary pixels are correctly recovered. On the contrary, if the r-value is small, the distance in color dominates, forcing pixels with similar colors to be clustered together. Consequently, irregular superpixels with better boundary adherence are generated. Such a trend can be visually observed in Fig. 5. Therefore, the selection of the r-value can be considered as seeking a balance between shape regularity and boundary adherence.

Fig. 5 LSC-based superpixel segmentation for a sample esophagitis image with 25 superpixels at r values of (a) 0.05, (b) 0.1, (c) 0.3, and (d) 0.5

Superpixel Segmentation with Simple Linear Iterative Clustering

Simple linear iterative clustering (SLIC) [19] is another commonly used method for superpixel segmentation. The SLIC method clusters image pixels to efficiently generate compact and nearly uniform superpixels. The SLIC technique requires two parameters: the number of superpixels (K) and a compactness value (c) that controls the smoothness of the superpixel contours. A large c-value means high dominance of the spatial proximity criterion, resulting in regular and compact segmentation. The c-value typically ranges from 1 to 20. In this paper, we conducted a grid search to find the best K and c-value for each classification problem. Figure 6 shows the SLIC-based segmentation results for some sample images from the Kvasir v2 dataset.

Fig. 6 SLIC-based superpixel segmentation of sample images from the Kvasir dataset with different numbers of superpixels: (a) 25, (b) 50, and (c) 100 superpixels
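
To make the role of the compactness value more concrete, the following minimal sketch evaluates the combined color-spatial distance of the original SLIC formulation [19], D = sqrt(d_c² + (d_s/S)² c²), where S = sqrt(N/K) is the grid interval for N pixels and K superpixels. The function name, the pixel tuples, and the sample values are illustrative assumptions; this is not the MATLAB implementation used in this study.

```python
import math

def slic_distance(pixel, center, grid_step, compactness):
    """Combined SLIC distance between a pixel and a cluster center.

    pixel, center: (l, a, b, x, y) tuples, i.e. CIELAB color plus position.
    """
    d_color = math.dist(pixel[:3], center[:3])   # Euclidean color distance
    d_space = math.dist(pixel[3:], center[3:])   # Euclidean spatial distance
    return math.sqrt(d_color ** 2 + (d_space / grid_step) ** 2 * compactness ** 2)

n_pixels = 720 * 576            # image size after resizing
K = 25                          # number of superpixels
S = math.sqrt(n_pixels / K)     # grid interval between initial cluster centers

# Hypothetical cluster center and two candidate pixels.
center = (50.0, 10.0, 10.0, 100.0, 100.0)
same_color_far = (50.0, 10.0, 10.0, 100.0 + S, 100.0)   # identical color, one grid step away
other_color_near = (55.0, 10.0, 10.0, 100.0, 100.0)     # 5 color units away, same position

for c in (1, 20):
    d_far = slic_distance(same_color_far, center, S, c)
    d_near = slic_distance(other_color_near, center, S, c)
    winner = "same-color pixel" if d_far < d_near else "nearby pixel"
    print(f"c = {c}: the {winner} is closer to the center")
```

With a small c, the distant pixel of identical color is judged closer to the center (color dominates); with a large c, the spatially nearby pixel wins, which is the behavior that produces regular, compact superpixels.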

2.2.2 Feature Extraction

In this work, we investigated different types of texture and color superpixel features for GI image segmentation and classification. In particular, we employed three types of texture features, namely, local binary patterns, second-order statistical features derived from gray-level co-occurrence matrices, and first-order statistical features. We explored these features for grayscale images as well as color images in the RGB and HSV color spaces.

Texture features

  (a) Local binary patterns

    A local binary pattern (LBP) is an effective visual descriptor mostly utilized for classification in computer vision [32]. These patterns give simple and efficient representations of local image characteristics. Numerous LBP applications have been reported including face detection [33, 34], demographic classification [35, 36], and other related applications [37, 38]. In addition, LBP has been used for the detection and classification of GI diseases such as GI bleeding, tumor and other disease regions of various endoscopic images [39,40,41].

    Figure 7(a) demonstrates how LBP is calculated for a 3 × 3 image block with a radius of 1 and 8 neighbors. In Fig. 7(b), the relationship between different values of the radius (R) and the number of neighbors (M) is shown. For an image with a center pixel coordinate (\({x}_{c}\), \({y}_{c}\)), M neighboring pixels, and a neighborhood radius R, the LBP code can be calculated as [42]:

    $${LBP}_{M,R}({x}_{c}, {y}_{c})=\sum_{m=0}^{M-1}s({U}_{m}-{U}_{c})\times {2}^{m}$$
    (1)

    where \({U}_{c}\) and \({U}_{m}\) represent the gray-scale intensities of the center pixel and the m-th neighboring pixel, respectively. The function \(s\left(x\right)\) is defined as:

    $$s\left(x\right)=\begin{cases}1, & \text{if } x\ge 0\\ 0, & \text{if } x<0\end{cases}$$
    (2)

    Fig. 7 Example calculation of local binary patterns: (a) LBP value calculation for 3 × 3 blocks with a radius R = 1 and M = 8 neighbors, (b) the relationship between R and M for circular LBP with R = {1, 2, 3} and M = {8, 12, 16}

    Based on the above formulas, we can evaluate LBP codes and produce endoscopic image features for GI image classification.

  (b) Gray-level co-occurrence matrices

    Haralick et al. introduced statistical texture features based on gray-level co-occurrence matrices (GLCM) [43]. This technique has been widely used in image analysis tasks [44, 45], especially in biomedical image analysis [46, 47]. For this approach, feature extraction is carried out in two steps: GLCM computation followed by the calculation of statistical GLCM-based texture features. A GLCM records how often a pixel with one gray level occurs at a fixed geometric offset from a pixel with another gray level. The horizontal direction (0°) with a default offset of one (nearest neighbor) was used in this paper.

    We computed 18 GLCM features in this paper. These features are the autocorrelation, contrast, energy, entropy, correlation, cluster prominence, cluster shade, dissimilarity, sum variance, homogeneity, maximum probability, sum of squares (variance), sum average, sum entropy, difference variance, difference entropy, information measure of correlation, and inverse difference moment.

  (c) First-order statistical (FOS) features

    We computed first-order statistical (FOS) descriptors from first-order histograms of gray-level images. The first-order histogram \(H(i)\) for a gray-level intensity value \(i\) is calculated as [48]:

    $$H\left(i\right)=\frac{GP(i)}{T}$$
    (3)

    where \(GP(i)\) is the number of image pixels with the gray level \(i\) and \(T\) is the total number of pixels in the image. Based on the above definition of the first-order histogram, the mean image intensity (\(\mu\)) and its central moments \({\mu }_{k}\) are given by:

    $$\mu=\sum_{i=0}^{L-1}i\,H(i)\quad\mathrm{and}\quad\mu_k=\sum_{i=0}^{L-1}\left(i-\mu\right)^k H(i)$$
    (4)

    where L is the total number of gray-level values, and \(k=\) 2, 3, 4.

    The variance (\({\mu }_{2}\)), skewness (\({\mu }_{3}\)), and kurtosis (\({\mu }_{4}\)) are the most widely used central moments in medical image analysis. In our study, we adopt these moments along with the mean intensity.
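
The three texture descriptors above can be sketched in a few lines. The following example is a simplified illustration, assuming images stored as nested lists of integer gray levels, a clockwise neighbor ordering for the LBP code, a non-symmetric GLCM, and natural-log entropy; the function names are hypothetical, and only three of the 18 GLCM descriptors are shown. It is not the MATLAB code used in this study.

```python
import math

def lbp_code(block):
    """LBP code of the center pixel of a 3x3 block (R = 1, M = 8), per Eqs. (1)-(2)."""
    center = block[1][1]
    # Neighbors visited clockwise from the top-left corner (an assumed convention).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    return sum((1 if block[r][c] >= center else 0) << m
               for m, (r, c) in enumerate(offsets))

def glcm_horizontal(image, levels):
    """Normalized GLCM for the 0-degree direction with an offset of one pixel."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Contrast, energy (sum of squared entries), and entropy of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v ** 2 for _, _, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    return contrast, energy, entropy

def fos_features(image, levels):
    """Mean and central moments mu_2..mu_4 from the first-order histogram, per Eqs. (3)-(4)."""
    pixels = [v for row in image for v in row]
    hist = [pixels.count(i) / len(pixels) for i in range(levels)]
    mean = sum(i * h for i, h in enumerate(hist))
    moments = [sum((i - mean) ** k * h for i, h in enumerate(hist)) for k in (2, 3, 4)]
    return mean, moments

# Toy inputs chosen only for illustration.
block = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
toy_image = [[0, 0, 1],
             [1, 1, 0],
             [0, 1, 1]]

code = lbp_code(block)
contrast, energy, entropy = glcm_features(glcm_horizontal(toy_image, levels=2))
mean, (mu2, mu3, mu4) = fos_features(toy_image, levels=2)
print(code, contrast, energy, mean, mu2)
```

In practice, these functions would be applied to the pixels of each superpixel (or to each image channel) rather than to a toy array.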

Multi-Channel GLCM and FOS Features

Texture features were extracted from gray-level images, as well as from images in two common color spaces, namely, the RGB and HSV color spaces. Specifically, the GLCM and FOS features were extracted from six image channels: five intensity channels (red, green, blue, hue, and gray-scale) and the LBP map associated with the gray-scale image. Also, we computed 36-bin LBP histograms for the gray-scale images. So, the overall number of features is (18 GLCM features + 4 FOS features) × 6 channels + 36 LBP histogram features = 168 features. From now on, we use the term multichannel GLCM (mGLCM) to indicate the features extracted from the aforementioned six image channels. We define the multichannel FOS features (mFOS) similarly.
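
As a quick check of the feature count, the sketch below lays out one possible feature vector. The channel names and the ordering of the entries are assumptions made for illustration; only the counts (18, 4, 6, and 36) come from the text.

```python
N_GLCM = 18          # GLCM descriptors per channel
N_FOS = 4            # mean, variance, skewness, kurtosis per channel
N_LBP_BINS = 36      # bins of the gray-scale LBP histogram
CHANNELS = ["red", "green", "blue", "hue", "gray", "lbp_map"]

layout = []
for channel in CHANNELS:
    layout += [f"{channel}_glcm_{i}" for i in range(N_GLCM)]
    layout += [f"{channel}_fos_{i}" for i in range(N_FOS)]
layout += [f"lbp_hist_{i}" for i in range(N_LBP_BINS)]

# (18 + 4) * 6 + 36 feature names in total
print(len(layout))
```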

2.2.3 Evaluation Metrics

To compare the classification performance of different pixel-based and superpixel-based methods, we used common evaluation metrics: accuracy, precision, recall, and specificity. In addition, we generated the receiver operating characteristic (ROC) curves and the associated areas under the ROC curves (AUC).
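
For reference, the four threshold-based metrics can be computed from the entries of a binary confusion matrix using their standard definitions, as in the following minimal sketch. The function name and the counts are hypothetical and are not results from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # also called sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for a 600-image test set (300 images per class).
metrics = binary_metrics(tp=270, fp=45, tn=255, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```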

We also evaluated the segmentation quality of both the SLIC and LSC superpixel-based methods using two metrics: the Dice similarity coefficient (DSC) and the intersection over union (IoU). We computed these metrics using the ground-truth polyp segmentation data provided by the Kvasir-SEG dataset.

Dice Similarity Coefficient (DSC)

The DSC is a standard measure for pixel-wise comparison of the predicted and ground-truth segmentation results. This measure is defined as:

$$DSC\left(F,G\right)=\frac{2\left|F\cap G\right|}{\left|F\right|+\left|G\right|}=\frac{2\,TP}{2\,TP+FP+FN}$$
(5)

where \(F\) and \(G\) stand for the predicted and ground-truth object segmentation, respectively. Here, TP, FP, and FN represent the true-positive, false-positive, and false-negative counts, respectively.

Intersection over Union (IoU)

This metric measures the similarity between the predicted and ground-truth segmentation outcomes. The IoU metric can be defined mathematically as:

$$IoU\left(F,G\right)=\frac{\left|F\cap G\right|}{\left|F\cup G\right|}=\frac{TP(t)}{TP(t)+FP(t)+FN(t)}$$
(6)

where \(t\) is the threshold value, which was set to \(t=0.5\) in this work.
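
Both segmentation metrics can be computed directly from the pixel-wise TP, FP, and FN counts, as in the minimal sketch below. It assumes that the predicted and ground-truth masks are already binary, non-empty, and stored as nested lists of equal shape; the function name and the toy masks are illustrative.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU of two binary masks (nested lists of 0/1)."""
    pairs = [(p, g) for p_row, g_row in zip(pred, truth)
             for p, g in zip(p_row, g_row)]
    tp = sum(1 for p, g in pairs if p and g)        # predicted and true
    fp = sum(1 for p, g in pairs if p and not g)    # predicted only
    fn = sum(1 for p, g in pairs if g and not p)    # missed
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

# Toy masks for illustration only.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
truth = [[0, 0, 1, 1],
         [0, 1, 1, 1],
         [0, 0, 0, 0]]

dice, iou = dice_and_iou(pred, truth)
print(f"DSC = {dice:.4f}, IoU = {iou:.4f}")
```

The two metrics are monotonically related through IoU = DSC / (2 − DSC), so they always rank a pair of segmentations of the same image in the same order.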

2.2.4 Classification of GI Images

In this study, support vector machine (SVM) [49] classifiers were used to classify various GI diseases and landmarks of endoscopic images. The SVM classifier has been used previously in the detection and classification of wireless-capsule endoscopy (WCE) images [12, 50,51,52]. The SVM classifier seeks to find a hyper-plane that maximizes the margin between samples of different classes. In our study, we considered an SVM with a polynomial-type kernel. For SVM training, we randomly chose 700 images from each class of the Kvasir v2 dataset. The remaining 300 images of each class were used for testing the trained SVM classifiers. We compared the classification results of both pixel- and superpixel-based methods based on values of accuracy, recall, specificity, precision, and AUC.
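
The random 700/300 split per class can be sketched as follows. The file names, the fixed seed, and the function name are hypothetical placeholders; the actual image lists and random state used in this study are not specified here.

```python
import random

def split_per_class(items_by_class, n_train, seed=0):
    """Randomly split each class into n_train training items and the rest for testing."""
    rng = random.Random(seed)   # fixed seed (an assumption) for reproducibility
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        train += [(name, label) for name in shuffled[:n_train]]
        test += [(name, label) for name in shuffled[n_train:]]
    return train, test

# Hypothetical file names standing in for the 1000 images of two classes.
data = {
    "esophagitis": [f"esophagitis_{i:04d}.jpg" for i in range(1000)],
    "normal-z-line": [f"z_line_{i:04d}.jpg" for i in range(1000)],
}
train, test = split_per_class(data, n_train=700)
print(len(train), len(test))
```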

3 Results and Discussions

3.1 Qualitative Comparison and Parameter Tuning for GI Classification Methods

The SLIC- and LSC-based segmentation methods have specific parameters to control the segmentation processes. Varying these parameters affects the segmentation results both quantitatively and qualitatively. To examine this effect, both segmentation methods were qualitatively compared at three different superpixel numbers (K): 25, 50, and 100. The effects of this variation of the K value on the LSC and SLIC segmentation outcomes are illustrated in Figs. 4 and 6, respectively. On the one hand, LSC-based segmentation achieves good adherence to natural image boundaries with visually intuitive, perceptually satisfactory, and uniform segmentation. On the other hand, SLIC-based segmentation provides regular and compact superpixels but lacks uniformity and does not agree well with natural image boundaries. In Table 1, SLIC- and LSC-based segmentation results are compared in terms of the computational time at various superpixel numbers. There is a slight increase in computational time as the number of superpixels increases. The table also shows that the SLIC method is moderately faster than the LSC one.

Table 1 Computational times (in sec.) of both the SLIC and LSC segmentation methods, where K is the number of superpixels

Figure 5 depicts how variations in the r parameter could affect LSC-based segmentation outcomes. We evaluated these outcomes at different r parameter values to recognize this effect. Clearly, small r values lead to good adherence and perceptually uniform segmentation outputs. On the contrary, as the r-value increases, the segmentation results become more compact and regular, but adherence to image boundaries is gradually lost.

Furthermore, classification based on the LSC method was evaluated at different r values (see Fig. 8). First, classification results for each GI part were found at different superpixel numbers (K = 10, 15, 20, 25, and 30) for a single r value (0.1). Then, we selected the K value at which the best results were obtained. The K values associated with the best performance for the upper, middle, and lower GI parts were found to be 10, 15, and 20, respectively. After fixing the K value for each GI part, we evaluated the classification performance at different r values of 0.1, 0.2, 0.3, 0.4, and 0.5. Finally, we obtained superior results at r values of 0.1, 0.05, and 0.4 for the classification problems of the upper, middle, and lower GI parts, respectively.

Fig. 8 Accuracy at different values of the r parameter for the LSC superpixel segmentation
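
The two-stage tuning of K and r described above can be sketched as a sequential search. Here, `evaluate` is a hypothetical callable that would segment the images, train the classifier, and return the test accuracy for a given (K, r) pair; the toy scoring function below only exercises the search logic and is not derived from the reported results.

```python
def two_stage_search(evaluate, k_values, r_values, r_initial):
    """Pick the best K at a fixed initial r, then the best r at that K."""
    best_k = max(k_values, key=lambda k: evaluate(k, r_initial))
    best_r = max(r_values, key=lambda r: evaluate(best_k, r))
    return best_k, best_r

def toy_accuracy(k, r):
    """Stand-in for a real train-and-test run (illustration only)."""
    return 1.0 - abs(k - 15) / 100 - abs(r - 0.3)

best_k, best_r = two_stage_search(
    toy_accuracy,
    k_values=[10, 15, 20, 25, 30],
    r_values=[0.1, 0.2, 0.3, 0.4, 0.5],
    r_initial=0.1,
)
print(best_k, best_r)
```

A sequential search of this kind evaluates far fewer configurations than a full grid over K and r, at the cost of possibly missing combinations where the two parameters interact.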

3.2 Classification of GI Images with Different Feature Combinations

We applied different combinations of the extracted feature types, namely, LBP, mGLCM, and mFOS. We first tried each feature type individually. Then, we used pairs of feature types: LBP-mFOS, LBP-mGLCM, and mFOS-mGLCM. Finally, we combined the three feature types as LBP-mFOS-mGLCM. This gives a total of seven combinations: LBP, mFOS, mGLCM, LBP-mFOS, LBP-mGLCM, mFOS-mGLCM, and LBP-mFOS-mGLCM. We then compared the classification results of all combinations based on accuracy, recall, specificity, precision, and AUC values. ROC curves for the best-performing feature combinations were also created.
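
The seven feature combinations are simply all non-empty subsets of the three feature types, which the following short sketch enumerates. The variable names are illustrative.

```python
from itertools import combinations

FEATURE_TYPES = ["LBP", "mFOS", "mGLCM"]

# All non-empty subsets, ordered by size: singles, pairs, then the full triple.
feature_sets = ["-".join(combo)
                for size in range(1, len(FEATURE_TYPES) + 1)
                for combo in combinations(FEATURE_TYPES, size)]
print(feature_sets)
```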

Table 2 illustrates the classification results of the upper GI part (esophagitis versus normal z-line). Based on the parameter fine-tuning experiments in Section 3.1, we fixed the compactness parameter of the SLIC method at c = 1, while the r parameter of the LSC method was set to 0.1 (see Fig. 8). As shown in Table 2, the classification accuracy is improved when the SLIC-based segmentation is employed, particularly, when using the mGLCM features only. According to the medical context of our GI classification problems, recall (or sensitivity) is crucially important as it reflects the classifier's ability to detect anomalies. With this relative importance of the recall metric, the LSC-based method appears to be superior as it outperforms the other methods with a recall (or sensitivity) of 79%. In addition, the LSC-based method achieved the best average performance with an AUC value of 83.1% when the mFOS-mGLCM combination was used. However, the SLIC-based approach gives superior results with accuracy, specificity, and precision values of 77.54%, 77.89%, and 76.33%, respectively.

Table 2 Pixel- and superpixel-based classification results for the upper GI part (esophagitis versus normal z-line) at K = 10

Figure 9 shows the ROC curves of the three classification methods with the mFOS-mGLCM feature combination, which gives the highest AUC scores for classifying the images of the upper GI part into esophagitis and normal z-line images. As it can be observed, the LSC-based superpixel method generally outperforms the two other methods across the whole range of the false-positive rates. Indeed, the LSC method achieved the highest AUC value of 83.37% which is superior to those of the pixel- and SLIC-based methods.

Fig. 9 ROC curves of the classification methods of the upper GI part (esophagitis versus normal z-line) using the mFOS-mGLCM feature combination

In Table 3, the classification results of the middle GI part (polyps versus normal pylorus) are shown. As the table shows, both superpixel-based classification methods were slightly better than the pixel-based one. This relative improvement can be particularly observed when the mGLCM and the LBP-mGLCM features were respectively used for SLIC- and LSC-based classification. Both the SLIC- and LSC-based methods achieved a classification accuracy of 98.5%. In addition, the high recall values for both methods show that our superpixel-based approach is nearly perfect for the detection of anomalies in the middle GI part. These recall values are 99.66% and 100% for the SLIC- and LSC-based classification, respectively. Furthermore, using the LBP-mGLCM feature combination, the LSC-based method provided a slightly higher AUC value of 99.78% compared to the other two methods.

Table 3 Pixel- and superpixel-based classification results for the middle GI part (polyps versus normal pylorus) at K = 15

Table 4 shows the classification results of the lower GI part using both superpixel- and pixel-based methods. The results show that the superpixel-based methods are moderately better than the pixel-based ones. For the superpixel-based methods and the LBP-mGLCM feature combination, the LSC method outperformed the SLIC one with an accuracy of 93.67%, a specificity of 95.8%, and a precision of 96%. However, the SLIC-based method achieved a higher value of recall (or sensitivity) of 94.16%. Moreover, the SLIC method achieved the best AUC score of 97.4%.

Table 4 Pixel- and superpixel-based classification results for the lower GI part (ulcerative colitis versus normal-cecum) at K = 20

Figure 10 illustrates the ROC curves of the three considered methods for the task of classifying the middle GI part (polyps versus normal pylorus) with the best LBP-mGLCM feature combination. Obviously, the LSC method achieved a slightly better ROC curve than the two other methods. Besides, the LSC method narrowly outperforms the other two methods with an AUC score of 99.78%.

Fig. 10 ROC curves of the classification methods of the middle GI part (polyps versus normal pylorus) with the LBP-mGLCM feature combination

Figure 11 shows the ROC curves for classifying the lower GI section (cecum versus ulcerative colitis) using the three classification methods with the LBP-mGLCM feature combination. The results show that the SLIC-based method outperforms the other methods for most of the range of the false-positive rates. In addition, while the AUC values of the three methods are quite close, the SLIC method achieved the best AUC value of 97.4%.

Fig. 11 ROC curves of the classification methods of the lower GI part (cecum versus ulcerative colitis) with the LBP-mGLCM feature combination

3.3 Comparative Evaluation Against Ground-Truth Polyp Segmentation

In this part of the study, we used 80 images of both small and large polyps for qualitative comparisons of the SLIC- and LSC-based segmentation methods. Samples of these images are shown in Fig. 12. We followed a series of steps to locate the polyps. First, segmentation maps were computed at various superpixel numbers. Then, we computed the Dice similarity coefficient between each generated superpixel and the ground-truth mask. Finally, the superpixel with the highest Dice coefficient score was chosen as the polyp object. As a result, superpixel numbers of 50 and 15 were used for the segmentation of small and large polyps, respectively. From visual inspection, the SLIC-based method provides better performance in segmenting small polyps. For large polyps, both methods produce good segmentation results, with the LSC-based method giving slightly better visual output than the SLIC-based one.

Fig. 12 Visual comparison of superpixel-based polyp segmentation methods: (a) original images, (b) ground-truth maps, (c) SLIC-based segmentation outputs, (d) LSC-based segmentation outputs
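
The superpixel-selection step described above (choosing the superpixel with the highest Dice score against the ground truth) can be sketched as follows. The sketch assumes a superpixel label map and a non-empty binary ground-truth mask stored as nested lists of equal shape; the function name and the toy data are illustrative.

```python
def best_superpixel(labels, truth):
    """Return the superpixel label with the highest Dice score and that score."""
    flat = [(lab, g) for l_row, g_row in zip(labels, truth)
            for lab, g in zip(l_row, g_row)]
    truth_area = sum(g for _, g in flat)
    scores = {}
    for label in {lab for lab, _ in flat}:
        area = sum(1 for lab, _ in flat if lab == label)
        overlap = sum(1 for lab, g in flat if lab == label and g)
        scores[label] = 2 * overlap / (area + truth_area)   # Dice of this superpixel
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy label map with four superpixels and a toy ground-truth mask.
labels = [[0, 0, 1, 1],
          [0, 2, 2, 1],
          [3, 2, 2, 1]]
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 1]]

label, score = best_superpixel(labels, truth)
print(label, round(score, 4))
```

Because this selection uses the ground-truth mask, it measures how well a single superpixel can match the polyp region rather than providing a fully automatic detector.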

In Fig. 13, both the LSC- and SLIC-based segmentation outputs were computed at various superpixel numbers (K) based on the Dice coefficient values. For small polyps, we tried numbers of superpixels in the range from K = 10 to K = 100. A peak Dice coefficient value was obtained at K = 50 with the LSC-based method (see Fig. 13(a)). Likewise, for large polyps, we tried numbers of superpixels starting from K = 10 up to K = 50, and obtained a peak Dice coefficient value at K = 15 (Fig. 13(b)) with the SLIC-based method.

Fig. 13

Computation of the Dice coefficient values at various numbers of superpixels (K) for the segmentation of (a) small polyps, and (b) large polyps

Figure 14(a) shows box plots of the Dice coefficient and IoU values for the segmentation of 40 images of large polyps using the SLIC-based method. The mean values of these metrics are 84.75% and 73.88%, respectively. Figure 14(b) shows the corresponding box plots for LSC-based segmentation, for which the mean Dice coefficient and IoU are 85.68% and 75.02%, respectively. For small polyps, Fig. 15 shows the same box plots for the SLIC- and LSC-based methods. The mean Dice and IoU values are lower than those for large polyps: 76.31% and 62% for the SLIC-based method, and 71.65% and 56.79% for the LSC-based method. All mean values are listed in Table 5.
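For reference, the two metrics reported here can be computed per image and then averaged, as in the sketch below. It assumes binary masks stored as nested lists of 0/1 values, and the helper names `dice_and_iou` and `mean_scores` are hypothetical rather than taken from the study. For a single image the two metrics are linked by IoU = DSC / (2 − DSC), so the IoU never exceeds the Dice coefficient. The means of the two metrics over many images need not satisfy this relation exactly.

```python
def dice_and_iou(pred, truth):
    """Return (dice, iou) in [0, 1] for two binary masks given as 2-D lists."""
    inter = pred_area = truth_area = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            pred_area += bool(p)
            truth_area += bool(t)
            inter += bool(p and t)
    union = pred_area + truth_area - inter
    if union == 0:
        # Both masks are empty: treated as perfect agreement (a convention).
        return 1.0, 1.0
    return 2 * inter / (pred_area + truth_area), inter / union


def mean_scores(pairs):
    """Mean (dice, iou) in percent over an iterable of (pred, truth) pairs."""
    scores = [dice_and_iou(p, t) for p, t in pairs]
    n = len(scores)
    mean_dice = 100 * sum(d for d, _ in scores) / n
    mean_iou = 100 * sum(i for _, i in scores) / n
    return mean_dice, mean_iou
```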

Fig. 14

Box plots comparing Dice coefficient and IoU for superpixel-based segmentation across 40 large polyp images, showcasing (a) SLIC-based and (b) LSC-based segmentation results

Table 5 Mean DSC and IoU values of the SLIC- and LSC-based segmentation methods

Fig. 15

Box plots comparing Dice coefficient and IoU for superpixel-based segmentation across 40 small polyp images, showcasing (a) SLIC-based and (b) LSC-based segmentation results

Furthermore, we carried out a Student’s t-test to check whether the differences between the SLIC- and LSC-based segmentation results for large and small polyps are statistically significant. In particular, a paired t-test was performed in MS Excel. A p-value was computed for each of the Dice similarity coefficient (DSC) and the IoU, using the values obtained for the 40 images of large polyps and the 40 images of small polyps. The obtained p-values are summarized in Table 6. A significance level of 0.05 was used. For large polyps, the p-values for the DSC and IoU (see Table 6) show no significant difference between the SLIC- and LSC-based methods. For small polyps, however, the p-values show a significant difference between the two methods. Together with the higher mean DSC and IoU values in Table 5, this indicates that the SLIC-based method is moderately better than the LSC-based method in segmenting small polyps.
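The test in this study was run in MS Excel. As an illustration of the underlying computation only, the sketch below derives the paired t statistic from per-image score differences. The function names are hypothetical, and the critical value is an assumption supplied by the caller: for 40 image pairs (39 degrees of freedom) the two-sided critical value at the 0.05 level is approximately 2.02. Obtaining an exact p-value requires the t-distribution, which a statistics library such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two paired samples.

    a, b: per-image scores (e.g. Dice values) of two methods on the same images.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def is_significant(t, critical):
    """Two-sided decision: the difference is significant when |t| > critical."""
    return abs(t) > critical
```

A paired test is the natural choice here because both methods are evaluated on the same images, so each image contributes one difference and between-image variability cancels out.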

Table 6 The p-values of the DSC and IoU obtained from paired Student’s t-tests for the difference between SLIC- and LSC-based segmentation of 40 images of large polyps and 40 images of small polyps

4 Conclusion

In this study, we developed a superpixel-based computer-aided diagnosis system to enhance endoscopic image analysis, enabling more precise differentiation between diseased and healthy patterns in patients. Our analysis showed that superpixel-based classification, particularly using Linear Spectral Clustering (LSC) and Simple Linear Iterative Clustering (SLIC), is superior to traditional pixel-based methods. LSC demonstrated excellent boundary adherence and preserved image structure. SLIC showed greater efficacy in classifying upper GI tract images, whereas LSC was more effective for the middle and lower GI sections, albeit with higher computational demands. These results highlight the importance of selecting the superpixel method according to the specific requirements and of optimizing its parameters for each classification scenario. Furthermore, we assessed segmentation quality using the Dice coefficient and Intersection over Union (IoU) metrics. SLIC was more effective in small polyp segmentation, while LSC matched its performance in segmenting larger polyps.

In the future, combined superpixel and deep learning methods for the classification of gastrointestinal (GI) diseases and landmarks can be devised. This could enable the model to capture a broader and more nuanced range of characteristics associated with GI conditions, potentially improving the diagnostic capabilities of the computer-aided diagnosis (CAD) system. The synergy between the precise segmentation provided by superpixel methods and the pattern recognition capabilities of deep learning could thus lead to a more robust and effective system for GI disease classification. Moreover, the CAD system in this study could be improved by performing a more detailed analysis of the number of superpixels, fine-tuning superpixel parameters that could affect segmentation output, and combining various types of features.