Abstract
Gastrointestinal (GI) diseases are among the most frequently occurring diseases that pose a significant threat to people’s health. The gold standard for diagnosing these diseases is endoscopic examination, yet this approach is resource-intensive, requiring costly equipment and specialized training. This study explores an alternative approach for GI image segmentation and classification, employing Simple Linear Iterative Clustering (SLIC) and Linear Spectral Clustering (LSC) superpixel methods. Analyzing images from the comprehensive Kvasir dataset, which represents different GI tract sections, the research applies three distinct features—local binary pattern, gray-level co-occurrence matrices, and first-order statistical features—for Support Vector Machine (SVM) classification. The results demonstrate that superpixel-based classification methods exceed traditional pixel-wise techniques in terms of accuracy and efficiency. Specifically, SLIC excels in upper GI tract analysis, yielding 77.33% accuracy, 77.89% sensitivity, and 76.8% specificity. Conversely, LSC shows superior performance for middle and lower GI sections, with accuracy, sensitivity, and specificity of 98.5%, 100%, and 97.1% for the middle GI, and 93.67%, 91.72%, and 95.8% for the lower GI tract, respectively. Moreover, SLIC operates faster than LSC. These findings highlight superpixel methods' potential to improve GI disease diagnosis, promising more efficient, accurate medical imaging.
1 Introduction
Gastrointestinal (GI) diseases are among the most frequently occurring diseases, posing a real threat to population health. These GI diseases include GI bleeding, Crohn’s disease, tumors, and ulcers. According to the Global Cancer Observatory (GCO) statistics issued in 2020, there were an estimated 18.99 million new cancer cases and 10.1 million cancer deaths [1]. Among these cases, three of the nine most common cancer types occur in the GI tract: esophageal, stomach (gastric), and colorectal cancer. In particular, polyps represent the main cause of colorectal cancer [2]. Polyps range in size from very small to large. Most studies show that diminutive polyps are frequently overlooked, with a miss rate of 14 to 30% [3]. Careful investigation and early detection are crucial to decreasing the risk of colorectal cancer.
Various clinical methods have been proposed for detecting GI diseases. Among these methods, endoscopy is considered the gold standard [4, 5]. Specifically, gastroscopy, colonoscopy, and wireless capsule endoscopy (WCE) are the most common endoscopic modalities for examining and evaluating the GI tract. Gastroscopy is used to examine the upper parts of the GI tract, such as the esophagus, stomach, and the first part of the small bowel (duodenum). Colonoscopy is used for evaluating lower GI parts such as the colon and rectum. However, gastroscopy and colonoscopy offer limited visualization of the long GI tract; only the proximal duodenum and the ileum can be accessed by these procedures. In addition, these endoscopic procedures are highly invasive and cause substantial patient discomfort and anxiety. To alleviate these problems, wireless capsule endoscopy (WCE) was introduced in 2000 [6], enabling non-invasive examination of the entire GI tract. In a WCE procedure, a small capsule is swallowed, travels passively through the GI tract, and is finally excreted through the anus. Since WCE is a non-invasive procedure, it has gained increasing attention for diagnosing many GI diseases.
Generally, in WCE, the capsule is swallowed and allowed to travel naturally through the GI tract for up to 8 h, during which it typically captures a large number of images (approximately 55,000–60,000). These images are then stored on a computer and manually investigated by endoscopists to detect possible abnormalities. Obviously, it is a tedious and time-consuming process for endoscopists to go through all the images. The review process is also prone to human error. More importantly, false-positive and false-negative results can arise, mainly due to missed details. False-positive results may cause unnecessary anxiety to patients, whereas false-negative results may delay the detection of critical disease conditions, which may evolve to a stage where the disease becomes incurable or even fatal. Moreover, the ability to detect and classify abnormalities varies from one physician to another. Therefore, an accurate automatic method for the detection and classification of GI abnormalities is highly important for arriving at timely treatment decisions while saving substantial cost and labor.
Several methods related to the automatic detection of gastrointestinal anomalies have been reported in the literature. The Kvasir dataset [7] is an annotated dataset of endoscopic images of the GI tract. Most researchers who worked on this dataset focused on the detection and multi-class classification of anatomical landmarks and diseased tissues in the GI tract. Pogorelov et al. [7] proposed a system to classify multi-class GI endoscopic images. They used different types of techniques, such as convolutional neural networks (CNN) (including the pre-trained Inception V3 model), random forests, and the logistic model tree (LMT). Among these methods, the LMT approach outperformed the others with an accuracy of 93.3%. On the same dataset, Agrawal et al. proposed a technique that employs novel features for training a support vector classifier for GI images [8]. In this method, three types of features were extracted: conventional hand-crafted features and CNN features obtained from the VGGnet and Inception-V3 pre-trained models. By fusing these features, an accuracy of 0.961 and an F1-score of 0.847 were obtained.
Liu et al. [9] used the bidirectional marginal Fisher analysis (BMFA) and support vector machines (SVM) to classify various landmarks and anomalies of the GI tract. The authors used six types of hand-crafted features provided along with the Kvasir dataset [7]. Accuracy, recall, and specificity values of 0.9257, 0.7028, and 0.9575 were obtained, respectively. Nadeem et al. [10] fused different texture and deep learning features. Firstly, CNN features were obtained using the VGG-19 pre-trained model. These CNN features were then combined with conventional texture features, namely the Haralick features and the local binary pattern (LBP) features. Finally, by utilizing the logistic regression classifier, the system achieved an accuracy of 83% and an F1-score of 82%. Furthermore, Gamage et al. [11] proposed a system utilizing CNN features for GI image classification. These features were extracted from three types of pre-trained networks: DenseNet-201, ResNet-18, and VGG-16. Feature vectors were then fused and fed into an artificial neural network (ANN) classifier, reaching an accuracy of over 97%.
Various techniques have been devised for polyp detection [12,13,14,15,16]. In [15], the authors proposed a polyp detection technique that utilizes texture features of the Red, Green, and Blue (RGB) and Hue, Saturation, and Intensity (HSI) color spaces for differentiating polyp tissues from the normal ones. Texture features based on the discrete wavelet transform as well as uniform LBP features were fused and used to train SVM classifiers for polyp detection. Experimental results showed that the classifier based on RGB color features achieved the best accuracy of 91.6%.
Most of the above-mentioned studies used pixel-based representations. However, it is crucial to note that such a representation is not inherently natural but rather an artifact of digital imaging. Consequently, a more intuitive and perceptually meaningful approach involves working with image representations that consider both image geometry and appearance. Addressing this need, superpixel image representations have been proposed [17, 18]. In such models, similar image pixels are clustered based on the similarity of appearance or color attributes. These superpixel models could speed up the image processing time and improve the results on different image processing tasks [19].
The term superpixel is used to describe a cluster of similar pixels with a similar color or appearance property [20]. Superpixels have been used for various types of applications in medical imaging [21, 22]. Several methods have been proposed in the literature for generating superpixels. Those can be mainly classified as: graph-based algorithms [23, 24], gradient-ascent methods [25, 26], or clustering methods [19].
Earlier methods investigated the applicability of superpixels in GI image analysis. Iakovidis et al. investigated using salient superpixels for blood detection in the images of wireless capsule endoscopy (WCE) [21]. Xing et al. dealt with the GI bleeding detection problem using color histograms of WCE image superpixels, and a subspace KNN classifier [27]. While these two superpixel-based methods achieved good performance, they were tailored only to identify GI bleeding spots.
This work presents a superpixel-based segmentation and classification approach for endoscopic images of the GI tract. This study is an extension of our previous work [28], which performed GI image superpixel segmentation using the simple linear iterative clustering (SLIC) method [19]. This paper exploits another superpixel segmentation method, namely linear spectral clustering (LSC) [29], and compares its outcomes against those of the SLIC-based segmentation method as well as pixel-based methods. Various texture and color features were extracted from generated superpixels to train and test binary support vector machines for each GI part. In addition, segmentation outcomes of both SLIC and LSC methods were evaluated and compared based on evaluation metrics, such as the Dice coefficient and the intersection-over-union. The main contribution of this work is to show how SLIC and LSC superpixel methods can be used to segment and classify various GI diseases and landmarks in endoscopic images. The proposed superpixel-based methods lead to superior classification performance for the different GI regions, clearly better than that of conventional pixel-based classification methods.
2 Materials and Methods
2.1 Dataset
Two datasets of GI images were used for realizing the proposed framework: the Kvasir-V2 and Kvasir-SEG datasets. The Kvasir-V2 dataset was used for GI image classification, whereas the Kvasir-SEG dataset was utilized for evaluating the segmentation outcomes of different superpixel methods. The details of both datasets are explained as follows.
Kvasir V2 Dataset:
This dataset consists of an annotated and verified collection of GI images taken with an endoscope [7]. The main objective of this dataset is to facilitate the evaluation and comparison of different methods for GI image classification, detection of GI landmarks, object localization in GI images, and the diagnosis of endoscopic diseases of the GI tract. The dataset has 8000 images representing eight image classes, with 1000 images for each class. All the images were annotated by highly trained experts. The image classes can be broadly divided into three categories, namely, anatomical landmarks (z-line, pylorus, and cecum), GI diseases (esophagitis, polyps, and ulcerative colitis), and polyp removal procedures (dyed and lifted polyps, and dyed resection margins). The image resolution in this dataset ranges from 720 × 576 up to 1920 × 1072 pixels. Sample images from this dataset are shown in Fig. 1.
Kvasir-SEG Dataset:
This dataset consists of 1000 annotated polyp images with their respective truth masks [30]. The image resolution in this dataset ranges from 332 × 487 to 1920 × 1072 pixels. This dataset is mainly used to develop new and improved techniques for segmenting, detecting, localizing, and classifying polyps. In our work, we used this dataset to quantitatively evaluate and compare segmentation results of both SLIC- and LSC-based segmentation methods. A few samples of polyps and the corresponding masks are depicted in Fig. 2.
2.2 Proposed Method
Figure 3 presents the general block diagram of the proposed system. The system includes modules for superpixel segmentation, feature extraction, superpixel classification, decision-level fusion, and finally GI image classification. The system was implemented in MATLAB 2016b on a Lenovo IdeaPad 330 computer with an Intel Core i7 processor and an 8-GB RAM.
All the images were resized to the size of the smallest image in the Kvasir dataset (720 × 576) before applying other image processing modules. This was done to reduce the computational cost and thus speed up detection and classification in GI images.
2.2.1 Superpixel Segmentation
Superpixel Segmentation with Linear Spectral Clustering
The linear spectral clustering (LSC) algorithm [29] was proposed based on an investigation of the relationship between the objective functions of normalized cuts [20] and weighted K-means [31]. The LSC algorithm preserves perceptually essential global image properties. In addition, this algorithm has linear complexity and high memory efficiency. In particular, the LSC algorithm uses simple weighted K-means clustering for image segmentation. The LSC approach avoids the high complexity of the spectral method for minimizing the normalized cuts. In the LSC method, image pixels are mapped into a 10-dimensional feature space to improve linear separability. The study in [29] shows that the LSC-based segmentation method provides better segmentation results than existing superpixel algorithms. Figure 4 illustrates the LSC-based segmentation results for some sample images from the Kvasir v2 dataset.
The LSC segmentation output is controlled through tuning the ratio r = Cs/Cc, where Cs and Cc are parameters weighting spatial proximity and color similarity, respectively. A careful selection of the r parameter can lead to a better segmentation output that adheres to natural image boundaries. When the r-value is large, superpixels with high shape regularity will be formed while fewer boundary pixels are correctly recovered. On the contrary, if the r-value is small, the distance in color dominates, forcing pixels with similar colors to be clustered together. Consequently, irregular superpixels with better boundary adherence will be generated. Such a trend can be visually observed in Fig. 5. Therefore, the selection of the r-value can be considered as seeking a balance between shape regularity and boundary adherence.
Superpixel Segmentation with Simple Linear Iterative Clustering
Simple linear iterative clustering (SLIC) [19] is another commonly used method for superpixel segmentation. The SLIC method clusters image pixels to efficiently generate compact and nearly uniform superpixels. The SLIC technique requires two parameters: the number of superpixels (K) and a compactness value (c) that controls the smoothness of the superpixel contours. A large c-value means high dominance of the spatial proximity criterion, resulting in regular and compact segmentation. The c-value typically ranges from 1 to 20. In this paper, we conducted a grid search to find the best K and c-value for each classification problem. Figure 6 shows the SLIC-based segmentation results for some sample images from the Kvasir v2 dataset.
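To make the roles of K and c concrete, the following is a minimal grayscale SLIC sketch in Python with NumPy (an illustrative toy, not the MATLAB implementation used in this study; real SLIC also restricts the search to a 2S × 2S window around each center rather than using brute-force assignment):

```python
import numpy as np

def slic_gray(img, K=4, c=10.0, n_iter=5):
    """Toy SLIC on a 2-D grayscale image: grid-seeded weighted k-means over
    (intensity, y, x); compactness c trades color similarity for spatial proximity."""
    h, w = img.shape
    S = int(np.sqrt(h * w / K))                    # grid interval between seeds
    ys, xs = np.mgrid[S // 2:h:S, S // 2:w:S]
    centers = np.stack([img[ys, xs].ravel().astype(float),
                        ys.ravel().astype(float),
                        xs.ravel().astype(float)], axis=1)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=int)
    for _ in range(n_iter):
        # distance D^2 = d_color^2 + (c/S)^2 * d_spatial^2, as in SLIC
        d_color = (img[None, :, :] - centers[:, 0, None, None]) ** 2
        d_space = ((yy[None] - centers[:, 1, None, None]) ** 2 +
                   (xx[None] - centers[:, 2, None, None]) ** 2)
        labels = (d_color + (c / S) ** 2 * d_space).argmin(axis=0)
        for k in range(len(centers)):              # recompute cluster centers
            m = labels == k
            if m.any():
                centers[k] = [img[m].mean(), yy[m].mean(), xx[m].mean()]
    return labels
```

Raising c in this sketch inflates the spatial term, yielding the more regular, compact superpixels described above.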
2.2.2 Feature Extraction
In this work, we investigated different types of texture and color superpixel features for GI image segmentation and classification. In particular, we employed three types of texture features, namely, local binary patterns, second-order statistical features derived from gray-level co-occurrence matrices, and first-order statistical features. We explored these features for grayscale images as well as color images in the RGB and HSV color spaces.
Texture features
(a) Local binary patterns
A local binary pattern (LBP) is an effective visual descriptor mostly utilized for classification in computer vision [32]. These patterns give simple and efficient representations of local image characteristics. Numerous LBP applications have been reported including face detection [33, 34], demographic classification [35, 36], and other related applications [37, 38]. In addition, LBP has been used for the detection and classification of GI diseases such as GI bleeding, tumor and other disease regions of various endoscopic images [39,40,41].
Figure 7(a) demonstrates how LBP is calculated for a 3 × 3 image block with a radius of 1 and 8 neighbors. In Fig. 7(b), the relationship between different values of the radius (R) and the number of neighbors (M) is shown. For an image with a center pixel coordinate (\({x}_{c}\), \({y}_{c}\)), M neighboring pixels, and a neighborhood radius R, the LBP code can be calculated as [42]:
$${LBP}_{M,R}({x}_{c}, {y}_{c})={\sum }_{m=0}^{M-1}s({U}_{m}-{U}_{c})\times {2}^{m}$$(1)where \({U}_{c}\) and \({U}_{m}\) represent the gray-scale intensities of the center pixel and the m-th neighboring pixel, respectively. The function \(s\left(x\right)\) is defined as:
$$s\left(x\right)=\left\{\begin{array}{ll}1& if\;x\ge 0\\ 0& if\;x<0\end{array}\right..$$(2)Based on the above formulas, we can evaluate LBP codes and produce endoscopic image features for GI image classification.
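Equations (1)–(2) can be sketched for the R = 1, M = 8 case as follows (a minimal Python illustration; the clockwise-from-top-left neighbor ordering here is one convention among several, chosen for the example):

```python
import numpy as np

def lbp_code(block):
    """LBP code of a 3x3 block's center pixel (R=1, M=8 neighbors),
    following Eq. (1): s(U_m - U_c) weighted by 2^m."""
    uc = block[1, 1]
    # 8 neighbors ordered clockwise starting from the top-left corner
    neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                 block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    # s(x) = 1 if x >= 0 else 0, per Eq. (2); 2^m implemented as a left shift
    return sum(int(um >= uc) << m for m, um in enumerate(neighbors))
```

A histogram of these per-pixel codes over a superpixel then serves as its LBP texture descriptor.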
(b) Gray-level co-occurrence matrices
Haralick et al. introduced statistical texture features based on gray-level co-occurrence matrices (GLCM) [43]. This technique has been widely used in image analysis tasks [44, 45], especially in biomedical image analysis [46, 47]. For this approach, feature extraction is carried out in two steps: GLCM computation followed by the calculation of statistical GLCM-based texture features. A GLCM records how often a pair of gray levels occurs at two pixels separated by a fixed geometric offset. The horizontal direction (0°) with a default offset of one pixel (nearest neighbor) was used in this paper.
We computed 18 GLCM features in this paper. These features are the autocorrelation, contrast, energy, entropy, correlation, cluster prominence, cluster shade, dissimilarity, sum variance, homogeneity, maximum probability, sum of squares (variance), sum average, sum entropy, difference variance, difference entropy, information measure of correlation, and inverse difference moment.
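As a brief illustration of the two-step procedure, the sketch below (Python/NumPy, not the study's MATLAB code) builds the normalized horizontal-offset GLCM and computes three of the listed statistics; the remaining features follow the same pattern:

```python
import numpy as np

def glcm_features(img, levels=8):
    """Normalized GLCM for the horizontal direction (0 deg), offset 1,
    plus three of the Haralick-type statistics used in this work.
    `img` must contain integer gray levels in [0, levels)."""
    img = np.asarray(img)
    P = np.zeros((levels, levels))
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        P[i, j] += 1                       # count horizontal gray-level pairs
    P /= P.sum()                           # normalize to joint probabilities
    I, J = np.indices(P.shape)
    contrast = ((I - J) ** 2 * P).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1.0 + np.abs(I - J))).sum()
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}
```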
(c) First-order statistical (FOS) features
We computed first-order statistical (FOS) descriptors from first-order histograms of gray-level images. The first-order histogram \(H(i)\) for a gray-level intensity value \(i\) is calculated as [48]:
$$H\left(i\right)=\frac{GP(i)}{T}$$(3)where \(GP(i)\) is the number of image pixels with the gray-level \(i\) and \(T\) is the total number of pixels in the image. Based on the above definition of the first-order histogram, the mean image intensity (\(\mu\)) and its central moments \({\mu }_{k}\) are given by:
$$\begin{array}{ccc}\mu=\sum_{i=0}^{L-1}iH(i)&\mathrm{and}&\mu_k=\sum\nolimits_{i=0}^{L-1}\left(i-\mu\right)^kH(i)\end{array}$$(4)where L is the total number of gray-level values, and \(k=\) 2, 3, 4.
The variance (\({\mu }_{2}\)), skewness (\({\mu }_{3}\)), and kurtosis (\({\mu }_{4}\)) are the most widely used central moments in medical image analysis. In our study, we adopt these moments along with the mean intensity.
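Equations (3)–(4) translate directly into a short routine (a Python/NumPy sketch for illustration):

```python
import numpy as np

def fos_features(img, levels=256):
    """Mean intensity and central moments mu_2..mu_4 from the first-order
    histogram H(i) of Eqs. (3)-(4). `img` must hold non-negative integers."""
    img = np.asarray(img).ravel()
    H = np.bincount(img, minlength=levels) / img.size   # Eq. (3): GP(i)/T
    i = np.arange(levels)
    mu = (i * H).sum()                                  # mean intensity
    mu2, mu3, mu4 = [(((i - mu) ** k) * H).sum() for k in (2, 3, 4)]
    return mu, mu2, mu3, mu4
```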
Multi-Channel GLCM and FOS Features
Texture features were extracted in gray-level images, as well as images in two common color spaces, namely, the RGB and HSV color spaces. Specifically, the GLCM and FOS features were extracted from 5 image channels (red, green, blue, hue, and gray-scale channels), as well as the LBP map associated with the gray-scale image. Also, we computed 36-bin LBP histograms for the gray-scale images. So, the overall number of features is (18 GLCM features + 4 FOS features) × 6 channels + 36 LBP histogram features = 168 features. From now on, we use the term multichannel GLCM (mGLCM) to indicate the features extracted from the aforementioned six image channels. We define the multichannel FOS features (mFOS) similarly.
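The feature-vector size quoted above can be tallied as follows:

```python
# Tally of the full feature-vector length described in the text
GLCM_FEATS, FOS_FEATS = 18, 4      # per-channel texture descriptors
CHANNELS = 6                       # red, green, blue, hue, gray, LBP map
LBP_HIST_BINS = 36                 # 36-bin gray-scale LBP histogram
total = (GLCM_FEATS + FOS_FEATS) * CHANNELS + LBP_HIST_BINS
print(total)  # 168
```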
2.2.3 Evaluation Metrics
To compare the classification performance of different pixel-based and superpixel-based methods, we used common evaluation metrics: accuracy, precision, recall, and specificity. In addition, we generated the receiver operating characteristic (ROC) curves and the associated areas under the ROC curves (AUC).
We have also evaluated the segmentation quality of both the SLIC and LSC superpixel-based methods based on certain metrics. These are: the Dice similarity coefficient (DSC) and the intersection over union (IoU). We computed these metrics using the ground-truth polyp segmentation data provided by the Kvasir-SEG dataset.
Dice Similarity Coefficient (DSC)
It is a standard measure for pixel-wise comparison of the predicted and ground-truth segmentation results. This measure is defined as:
$$DSC=\frac{2\left|F\cap G\right|}{\left|F\right|+\left|G\right|}=\frac{2TP}{2TP+FP+FN}$$(5)where \(F\) and \(G\) stand for the predicted and ground-truth object segmentation, respectively. Here, TP, FP, and FN represent the true-positive, false-positive, and false-negative counts, respectively.
Intersection over Union (IoU)
This metric measures the similarity between the predicted and ground-truth segmentation outcomes. The IoU metric can be defined mathematically as:
$$IoU=\frac{\left|F\cap G\right|}{\left|F\cup G\right|}=\frac{TP}{TP+FP+FN}$$(6)where the predicted mask is binarized at a threshold value \(t\), which was set to \(t=0.5\) in this work.
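In terms of the standard TP/FP/FN formulations of these two metrics, both can be computed for binary masks in a few lines (a Python/NumPy sketch for illustration):

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice similarity coefficient and IoU for a pair of binary masks,
    using the TP/FP/FN overlap counts."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # predicted and true object pixels
    fp = np.logical_and(pred, ~gt).sum()   # predicted but not true
    fn = np.logical_and(~pred, gt).sum()   # true but missed
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is consistent with the DSC values reported in Sect. 3.3 exceeding the corresponding IoU values.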
2.2.4 Classification of GI Images
In this study, support vector machine (SVM) [49] classifiers were used to classify various GI diseases and landmarks of endoscopic images. The SVM classifier has been used previously in the detection and classification of wireless-capsule endoscopy (WCE) images [12, 50,51,52]. The SVM classifier seeks to find a hyper-plane that maximizes the margin between samples of different classes. In our study, we considered an SVM with a polynomial-type kernel. For SVM training, we randomly chose 700 images from each class of the Kvasir v2 dataset. The remaining 300 images of each class were used for testing the trained SVM classifiers. We compared the classification results of both pixel- and superpixel-based methods based on values of accuracy, recall, specificity, precision, and AUC.
3 Results and Discussions
3.1 Qualitative Comparison and Parameter Tuning for GI Classification Methods
The SLIC- and LSC-based segmentation methods have specific parameters to control the segmentation processes. Varying these parameters will affect segmentation results both quantitatively and qualitatively. For this purpose, both segmentation methods were qualitatively compared at three different superpixel numbers (K): 25, 50, and 100. The effects of this variation of the K value on the SLIC and LSC segmentation outcomes are illustrated in Figs. 4 and 6, respectively. On the one hand, LSC-based segmentation achieves good adherence to natural image boundaries with visually intuitive, perceptually satisfactory, and uniform segmentation. On the other hand, SLIC-based segmentation provides regular and compact superpixels but lacks uniformity and does not agree well with natural image boundaries. In Table 1, SLIC- and LSC-based segmentation results are compared in terms of the computational time at various superpixel numbers. As expected, there is a slight increase in computational time as the number of superpixels increases. The table also shows that the SLIC method is moderately faster than the LSC one.
Figure 5 depicts how variations in the r parameter could affect LSC-based segmentation outcomes. We evaluated these outcomes at different r parameter values to recognize this effect. Clearly, small r values lead to good adherence and perceptually uniform segmentation outputs. On the contrary, as the r-value increases, the segmentation results become more compact and regular, but adherence to image boundaries is gradually lost.
Furthermore, classification based on the LSC method was evaluated at different r values (see Fig. 8). First, classification results for each GI part were found at different superpixel numbers (K = 10, 15, 20, 25, and 30) for a single r value (0.1). Then, we selected the K value at which the best results were obtained. The K values associated with the best performance for the upper, middle, and lower GI parts were found to be 10, 15, and 20, respectively. After fixing the K value for each GI part, we evaluated the classification performance at different r values of 0.1, 0.2, 0.3, 0.4, and 0.5. Finally, we obtained superior results at r values of 0.1, 0.05, and 0.4 for the classification problems of the upper, middle, and lower GI parts, respectively.
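The two-stage tuning just described (fix r, sweep K; then sweep r at the best K) can be sketched as follows. Here `evaluate` is a hypothetical placeholder for the real pipeline (segment, extract features, train/test the SVM, and return a score):

```python
def tune_lsc(evaluate, K_grid=(10, 15, 20, 25, 30),
             r_grid=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Two-stage grid search for LSC parameters.
    `evaluate(K, r)` is a hypothetical stand-in returning a quality score."""
    best_K = max(K_grid, key=lambda K: evaluate(K, 0.1))     # stage 1: r fixed
    best_r = max(r_grid, key=lambda r: evaluate(best_K, r))  # stage 2: K fixed
    return best_K, best_r
```

A full joint search over (K, r) would be more thorough but proportionally more expensive, since each evaluation involves re-segmenting and re-training.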
3.2 Classification of GI Images with Different Feature Combinations
We applied different combinations of the extracted feature types, namely, LBP, mGLCM, and mFOS. So, we first tried each feature type individually. Then, we used pairs of feature types: LBP-mFOS, LBP-mGLCM, and mFOS-mGLCM. Finally, we tried combining the three feature types as LBP-mFOS-mGLCM. So, we had a total of seven combinations: LBP, mFOS, mGLCM, LBP-mFOS, LBP-mGLCM, mFOS-mGLCM, and LBP-mFOS-mGLCM. Then, we compared the classification results of all combinations based on accuracy, recall, specificity, precision, and AUC values. ROC curves for the best-performing feature combinations were also created.
Table 2 illustrates the classification results of the upper GI part (esophagitis versus normal z-line). Based on the parameter fine-tuning experiments in Section 3.1, we fixed the compactness parameter of the SLIC method at c = 1, while the r parameter of the LSC method was set to 0.1 (see Fig. 8). As shown in Table 2, the classification accuracy is improved when the SLIC-based segmentation is employed, particularly, when using the mGLCM features only. According to the medical context of our GI classification problems, recall (or sensitivity) is crucially important as it reflects the classifier's ability to detect anomalies. With this relative importance of the recall metric, the LSC-based method appears to be superior as it outperforms the other methods with a recall (or sensitivity) of 79%. In addition, the LSC-based method achieved the best average performance with an AUC value of 83.1% when the mFOS-mGLCM combination was used. However, the SLIC-based approach gives superior results with accuracy, specificity, and precision values of 77.54%, 77.89%, and 76.33%, respectively.
Figure 9 shows the ROC curves of the three classification methods with the mFOS-mGLCM feature combination, which gives the highest AUC scores for classifying the images of the upper GI part into esophagitis and normal z-line images. As it can be observed, the LSC-based superpixel method generally outperforms the two other methods across the whole range of the false-positive rates. Indeed, the LSC method achieved the highest AUC value of 83.37% which is superior to those of the pixel- and SLIC-based methods.
In Table 3, the classification results of the middle GI part (polyps versus normal pylorus) are shown. As the table shows, both superpixel-based classification methods were slightly better than the pixel-based one. This relative improvement can be particularly observed when the mGLCM and the LBP-mGLCM features were respectively used for SLIC- and LSC-based classification. Both the SLIC- and LSC-based methods achieved a classification accuracy of 98.5%. In addition, the high recall values for both methods show that our superpixel-based approach is nearly perfect for the detection of anomalies in the middle GI part. These recall values are 99.66% and 100% for the SLIC- and LSC-based classification, respectively. Furthermore, using the LBP-mGLCM feature combination, the LSC-based method provided a slightly higher AUC value of 99.78% compared to the other two methods.
Table 4 shows the classification results of the lower GI part using both superpixel- and pixel-based methods. The results show that the superpixel-based methods are moderately better than the pixel-based ones. For the superpixel-based methods and the LBP-mGLCM feature combination, the LSC method outperformed the SLIC one with an accuracy of 93.67%, a specificity of 95.8%, and a precision of 96%. However, the SLIC-based method achieved a higher value of recall (or sensitivity) of 94.16%. Moreover, the SLIC method achieved the best AUC score of 97.4%.
Figure 10 illustrates the ROC curves of the three considered methods for the task of classifying the middle GI part (polyps versus normal pylorus) with the best LBP-mGLCM feature combination. Obviously, the LSC method achieved a slightly better ROC curve than the two other methods. Besides, the LSC method narrowly outperforms the other two methods with an AUC score of 99.78%.
Figure 11 shows the ROC curves for classifying the lower GI section (cecum and ulcerative colitis) using the three classification methods with the LBP-mGLCM feature combination. The results show that the SLIC-based method outperforms the other methods for most of the range of the false-positive rates. In addition, while the AUC values of the three methods are quite close, the LSC method achieved the best AUC value of 97.4%.
3.3 Comparative Evaluation Against Ground-Truth Polyp Segmentation
In this part of the study, we used 80 images of both small and large polyps for qualitative comparisons of the SLIC- and LSC-based segmentation methods. Samples of these images are shown in Fig. 12. We followed a series of steps to get specific locations of the polyps. First, segmentation maps were computed at various superpixel numbers. Then, we used the Dice coefficient value to obtain the exact locations of the polyp tissues. This was done by computing the Dice similarity coefficient for each generated superpixel and its respective ground truth. Finally, the superpixel with the highest Dice coefficient score was chosen to be the polyp object. As a result, superpixel numbers of 15 and 50 were used for the segmentation of small and large polyps, respectively. From visual inspection, the SLIC-based method provides better performance in segmenting small polyps. However, for segmenting large polyps, both methods produce good segmentation results, with the LSC-based method giving slightly better visual output than the SLIC-based one.
In Fig. 13, both the LSC- and SLIC-based segmentation outputs were computed at various superpixel numbers (K) based on the Dice coefficient values. For small polyps, we tried numbers of superpixels in the range from K = 10 to K = 100. A peak Dice coefficient value was obtained at K = 50 with the LSC-based method (see Fig. 13(a)). Likewise, for large polyps, we tried numbers of superpixels starting from K = 10 up to K = 50, and obtained a peak Dice coefficient value at K = 15 (Fig. 13(b)) with the SLIC-based method.
Figure 14(a) shows box plots of the Dice coefficient and IoU values resulting from the segmentation of 40 images of large polyps using SLIC-based segmentation. The mean values of these metrics are 84.75% and 73.88%, respectively (See Table 5). Figure 14(b) shows similar box-plot results with LSC-based segmentation. The corresponding mean values of the Dice coefficient and the IoU are 85.68% and 75.02%, respectively (See Table 5). For small polyps, Fig. 15 shows similar box plots for the SLIC- and LSC-based segmentation methods. The mean Dice and IoU values are generally much lower than those of large polyps: 71.65% and 56.79% for the LSC-based method, and 76.31% and 62% for the SLIC-based method (See Table 5).
Furthermore, we carried out a Student’s t-test to check the significance of the statistical difference between the SLIC- and LSC-based segmentation results for large and small polyps. In particular, a paired t-test comparison was performed in MS Excel. A p-value was computed for both the Dice similarity coefficient (DSC) and the IoU values computed for 40 images of large and small polyps. The obtained p-values are summarized in Table 6. A significance level of p = 0.05 was used to assess the statistical significance of the difference between the SLIC- and LSC-based segmentation. Considering the obtained p-values for the Dice coefficient and IoU (see Table 6), the results show that there is no significant difference between the SLIC- and LSC-based methods in the segmentation of large polyps. However, for the segmentation of small polyps, the p-values show that there is a significant difference between the SLIC- and LSC-based segmentation methods. As a result, the SLIC-based method is moderately better than the LSC-based one in segmenting small polyps.
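The paired t-statistic underlying this test can be computed directly from the per-image score differences (a NumPy sketch; converting the statistic to a p-value additionally requires the Student-t CDF, e.g. from scipy.stats, which is omitted here):

```python
import numpy as np

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom of a paired (dependent-samples)
    t-test between two sets of per-image scores, e.g. SLIC vs LSC Dice values."""
    d = np.asarray(a, float) - np.asarray(b, float)   # per-image differences
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))       # mean diff / its SE
    return t, n - 1
```

The paired form is appropriate here because both methods are evaluated on the same 40 images, so each image contributes one SLIC-LSC score difference.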
4 Conclusion
In this study, we developed a superpixel-based computer-aided diagnosis system to enhance endoscopic image analysis, enabling more precise differentiation between diseased and healthy patterns in patients. Our comprehensive analysis revealed that superpixel-based classification, particularly using Linear Spectral Clustering (LSC) and Simple Linear Iterative Clustering (SLIC), is superior to traditional pixel-based methods. LSC demonstrated excellent boundary adherence and preserved image structure. SLIC was more effective for classifying upper GI tract images, while LSC performed better on the middle and lower GI sections, albeit with higher computational demands. The study emphasized the importance of selecting the appropriate superpixel method based on specific requirements and optimizing parameters for each classification scenario. Furthermore, we assessed segmentation quality using the Dice coefficient and Intersection over Union (IoU) metrics, finding that SLIC was more effective in small polyp segmentation, while LSC matched its performance in segmenting larger polyps, illustrating the nuanced strengths and weaknesses of each method in different applications.
In the future, combined superpixel and deep learning methods for the classification of gastrointestinal (GI) diseases and landmarks can be devised. This could enable the model to capture a broader and more nuanced range of characteristics associated with GI conditions, potentially improving the diagnostic capabilities of the computer-aided diagnosis (CAD) system. The synergy between the precise segmentation provided by superpixel methods and the pattern recognition capabilities of deep learning could thus lead to a more robust and effective system for GI disease classification. Moreover, the CAD system in this study could be improved by performing a more detailed analysis of the number of superpixels, fine-tuning superpixel parameters that could affect segmentation output, and combining various types of features.
References
Global Cancer Observatory. Available: https://gco.iarc.fr/. Accessed 4 Nov 2020
Amersi F, Agustin M, Ko CY (2005) Colorectal cancer: Epidemiology, risk factors, and health services. Clin Colon Rectal Surg 18(3):133–140. Thieme Medical Publishers. https://doi.org/10.1055/s-2005-916274
van Rijn JC, Reitsma JB, Stoker J, Bossuyt PM, van Deventer SJ, Dekker E (2006) Polyp miss rate determined by tandem colonoscopy: a systematic review. Am J Gastroenterol 101(2):343–350. https://doi.org/10.1111/j.1572-0241.2006.00390.x
Take I, Shi Q, Zhong Y (2015) Progress with each passing day: role of endoscopy in early gastric cancer. Transl Gastrointest Cancer 4(6):423–428. https://doi.org/10.3978/j.issn.2224-4778.2015.09.04
Mannath J, Ragunath K (2016) Role of endoscopy in early oesophageal cancer. Nat Rev Gastroenterol Hepatol 13(12):720–730. Nature Publishing Group. https://doi.org/10.1038/nrgastro.2016.148
Iddan G, Meron G, Glukhovsky A, Swain P (2000) Wireless capsule endoscopy. Nature 405(6785):417–418. https://doi.org/10.1038/35013140
Pogorelov K et al (2017) Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In: Proceedings of the 8th ACM Multimedia Systems Conference, MMSys 2017. Association for Computing Machinery, Inc, pp. 164–169. https://doi.org/10.1145/3083187.3083212
Agrawal T, Gupta R, Narayanan S (2019) On Evaluating CNN Representations for Low Resource Medical Image Classification. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings. Institute of Electrical and Electronics Engineers Inc., pp. 1363–1367. https://doi.org/10.1109/ICASSP.2019.8682397
Liu Y, Gu Z, Cheung WK-W (2017) HKBU at MediaEval 2017 - Medico: Medical Multimedia Task.
Nadeem S, Tahir MA, Naqvi SSA, Zaid M (2018) Ensemble of texture and deep learning features for finding abnormalities in the gastro-intestinal tract. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Verlag, pp. 469–478. https://doi.org/10.1007/978-3-319-98446-9_44
Gamage C, Wijesinghe I, Chitraranjan C, Perera I (2019) GI-Net: Anomalies Classification in Gastrointestinal Tract through Endoscopic Imagery with Deep Learning. In: MERCon 2019 - Proceedings, 5th International Multidisciplinary Moratuwa Engineering Research Conference. Institute of Electrical and Electronics Engineers Inc., pp. 66–71. https://doi.org/10.1109/MERCon.2019.8818929
Billah M, Waheed S, Rahman MM (2017) An automatic gastrointestinal polyp detection system in video endoscopy using fusion of color wavelet and convolutional neural network features. Int J Biomed Imaging 2017. https://doi.org/10.1155/2017/9545920
Chen PJ, Lin MC, Lai MJ, Lin JC, Lu HHS, Tseng VS (2018) Accurate classification of diminutive colorectal polyps using computer-aided analysis. Gastroenterology 154(3):568–575. https://doi.org/10.1053/j.gastro.2017.10.010
Kang J, Gwak J (2019) Ensemble of instance segmentation models for polyp segmentation in colonoscopy images. IEEE Access 7:26440–26447. https://doi.org/10.1109/ACCESS.2019.2900672
Li B, Meng MQH (2012) Automatic polyp detection for wireless capsule endoscopy images. Expert Syst Appl 39(12):10952–10958. https://doi.org/10.1016/j.eswa.2012.03.029
Tajbakhsh N, Gurudu SR, Liang J (2016) Automated polyp detection in colonoscopy videos using shape and context information. IEEE Trans Med Imaging 35(2):630–644. https://doi.org/10.1109/TMI.2015.2487997
Maghsoudi OH (2017) Superpixels based segmentation and SVM based classification method to distinguish five diseases from normal regions in wireless capsule endoscopy. https://arxiv.org/abs/1711.06616v1. Accessed 16 Nov 2023
Boschetto D, Mirzaei H, Leong RWL, Grisan E (2016) Superpixel-based automatic segmentation of villi in confocal endomicroscopy. In: 3rd IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2016. Institute of Electrical and Electronics Engineers Inc., pp. 168–171. https://doi.org/10.1109/BHI.2016.7455861
Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S (2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Mach Intell 34(11):2274–2281. https://doi.org/10.1109/TPAMI.2012.120
Ren X, Malik J (2003) Learning a classification model for segmentation. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 10–17. https://doi.org/10.1109/iccv.2003.1238308
Iakovidis DK, Chatzis D, Chrysanthopoulos P, Koulaouzidis A (2015) Blood detection in wireless capsule endoscope images based on salient superpixels. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS. Institute of Electrical and Electronics Engineers Inc., pp. 731–734. https://doi.org/10.1109/EMBC.2015.7318466
Nguyen BP, Heemskerk H, So PTC, Tucker-Kellogg L (2016) Superpixel-based segmentation of muscle fibers in multi-channel microscopy. BMC Syst Biol 10(S5):124. https://doi.org/10.1186/s12918-016-0372-2
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905. https://doi.org/10.1109/34.868688
Veksler O, Boykov Y, Mehrani P (2010) Superpixels and supervoxels in an energy optimization framework. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Verlag, pp. 211–224. https://doi.org/10.1007/978-3-642-15555-0_16
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619. https://doi.org/10.1109/34.1000236
Levinshtein A, Stere A, Kutulakos KN, Fleet DJ, Dickinson SJ, Siddiqi K (2009) TurboPixels: Fast superpixels using geometric flows. In: IEEE transactions on pattern analysis and machine intelligence. pp. 2290–2297. https://doi.org/10.1109/TPAMI.2009.96
Xing X, Jia X, Meng MHQ (2018) Bleeding detection in wireless capsule endoscopy image video using superpixel-color histogram and a subspace knn classifier. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. Institute of Electrical and Electronics Engineers Inc., pp. 3594–3597. https://doi.org/10.1109/EMBC.2018.8513012
Dalju HB, Rushdi MA, Morsy A (2021) Superpixel-based segmentation and classification of gastrointestinal landmarks and diseases. BioCAS 2021 - IEEE biomedical circuits and systems conference, proceedings. https://doi.org/10.1109/BIOCAS49922.2021.9645002
Li Z, Chen J (2015) Superpixel segmentation using Linear Spectral Clustering. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. IEEE Computer Society, pp. 1356–1363. https://doi.org/10.1109/CVPR.2015.7298741
Jha D et al (2020) Kvasir-SEG: a segmented polyp dataset. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer, pp. 451–462. https://doi.org/10.1007/978-3-030-37734-2_37
Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957. https://doi.org/10.1109/TPAMI.2007.1115
Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987. https://doi.org/10.1109/TPAMI.2002.1017623
Jin H, Liu Q, Lu H, Tong X (2004) Face detection using improved LBP under bayesian framework. In: Proceedings - third international conference on image and graphics, pp. 306–309. https://doi.org/10.1109/icig.2004.62
Taouche C, Batouche MC, Chemachema M, Taleb-Ahmed A, Berkane M (2014) New face recognition method based on local binary pattern histogram. In: STA 2014 - 15th international conference on sciences and techniques of automatic control and computer engineering. Institute of Electrical and Electronics Engineers Inc., pp. 508–513. https://doi.org/10.1109/STA.2014.7086724
Sun N, Zheng W, Sun C, Zou C, Zhao L (2006) Gender classification based on boosting local binary pattern. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), Springer Verlag, pp. 194–201. https://doi.org/10.1007/11760023_29
Yang Z, Ai H (2007) Demographic classification with local binary patterns. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Verlag, pp. 464–473. https://doi.org/10.1007/978-3-540-74549-5_49
Gao X, Li SZ, Liu R, Zhang P (2007) Standardization of face image sample quality. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Verlag, pp. 242–251. https://doi.org/10.1007/978-3-540-74549-5_26
Ma B, Zhang W, Shan S, Chen X, Gao W (2006) Robust head pose estimation using LGBP. In: Proceedings - international conference on pattern recognition. pp. 512–515. https://doi.org/10.1109/ICPR.2006.1006
Maghsoudi OH, Alizadeh M, Mirmomen M (2017) A computer aided method to detect bleeding, tumor, and disease regions in Wireless Capsule Endoscopy. In: 2016 IEEE Signal Processing in Medicine and Biology Symposium, SPMB 2016 – Proceedings. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SPMB.2016.7846852
Li B, Meng MQH (2009) Small bowel tumor detection for wireless capsule endoscopy images using textural features and support vector machine. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009. pp. 498–503. https://doi.org/10.1109/IROS.2009.5354726
Li B, Meng MQH (2009) Computer-aided detection of bleeding regions for capsule endoscopy images. IEEE Trans Biomed Eng 56(4):1032–1039. https://doi.org/10.1109/TBME.2008.2010526
Ojala T, Pietikäinen M, Mäenpää T (2000) Gray scale and rotation invariant texture classification with local binary patterns. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Verlag, pp. 404–420. https://doi.org/10.1007/3-540-45054-8_27
Haralick RM, Dinstein I, Shanmugam K (1973) Textural features for image classification. IEEE Trans Syst Man Cybern SMC-3(6):610–621. https://doi.org/10.1109/TSMC.1973.4309314
Soh LK, Tsatsoulis C (1999) Texture analysis of sar sea ice imagery using gray level co-occurrence matrices. IEEE Trans Geosci Remote Sens 37(2 I):780–795. https://doi.org/10.1109/36.752194
Eleyan A, Demirel H (2009) Co-occurrence based statistical approach for face recognition. In: 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009. pp. 611–615. https://doi.org/10.1109/ISCIS.2009.5291895
Htay TT, Maung SS (2018) Early stage breast cancer detection system using GLCM feature extraction and K-Nearest Neighbor (k-NN) on Mammography image. In: ISCIT 2018 - 18th International Symposium on Communication and Information Technology. Institute of Electrical and Electronics Engineers Inc., pp. 345–348. https://doi.org/10.1109/ISCIT.2018.8587920
Souaidi M, el Ansari M (2019) Multi-scale analysis of ulcer disease detection from WCE images. IET Image Process 13(12):2233–2244. https://doi.org/10.1049/iet-ipr.2019.0415
Aggarwal N, Agrawal RK (2012) First and second order statistics features for classification of magnetic resonance brain images. J Signal Inf Process 03(02):146–153. https://doi.org/10.4236/jsip.2012.32019
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1007/bf00994018
Veronese E, Grisan E, Diamantis G, Battaglia G, Crosta C, Trovato C (2013) Hybrid patch-based and image-wide classification of confocal laser endomicroscopy images in Barrett’s esophagus surveillance. In: Proceedings - international symposium on biomedical imaging. pp. 362–365. https://doi.org/10.1109/ISBI.2013.6556487
Ghatwary N, Ahmed A, Grisan E, Jalab H, Bidaut L, Ye X (2019) In-vivo Barrett’s esophagus digital pathology stage classification through feature enhancement of confocal laser endomicroscopy. J Med Imaging (Bellingham) 6(1):014502. https://doi.org/10.1117/1.JMI.6.1.014502
Souza L et al (2019) Barrett’s esophagus identification using color co-occurrence matrices. In: Proceedings - 31st Conference on Graphics, Patterns and Images, SIBGRAPI 2018. Institute of Electrical and Electronics Engineers Inc., pp. 166–173. https://doi.org/10.1109/SIBGRAPI.2018.00028
Acknowledgements
The first author would like to acknowledge the financial support provided by the African Biomedical Engineering Mobility (ABEM) project, which is funded by the Intra-Africa Academic Mobility Scheme of the European Union.
Funding
This work is supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (RS-2023–00240521) and project for Industry-University-Research Institute platform cooperation R&D funded by Korea Ministry of SMEs and Startups in 2022 (S3310765).
Author information
Contributions
Hika Barki: Conceptualization, Methodology, Software, Writing – original draft, writing – review and editing, Validation, Data curation, Visualization, Investigation. Gelan Ayana: Methodology, Writing – review and editing, Validation, Data curation, and visualization. Muhammad Rushdi: Conceptualization, Methodology, Software, writing – review and editing. Validation, Data curation, Visualization, Investigation. Ahmed Morsy: Conceptualization, Methodology, writing – review and editing. Validation, Data curation, Visualization, Investigation. Se-woon Choe: Methodology, Writing – original draft, writing – review and editing, Data curation, Validation, Investigation, Funding acquisition, Supervision.
Ethics declarations
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Barki, H., Ayana, G., Rushdi, M. et al. Superpixel-based Landmark Identification and Disease Diagnosis from Gastrointestinal Images. J. Electr. Eng. Technol. 19, 3373–3389 (2024). https://doi.org/10.1007/s42835-024-01903-x