1 Introduction

Colorectal cancer, also known as bowel cancer, is a common type of cancer that affects both men and women [29]. Colorectal cancer accounts for approximately 694,000 deaths in the less developed regions of the world [30]. The American Cancer Society estimated about 132,000 new cases of colorectal cancer in 2015 [36]. Several gastrointestinal conditions are directly linked with colorectal cancer, such as hemorrhoids and short bowel syndrome, to name but a few. These conditions can be diagnosed through colonoscopy, but the complete procedure is time consuming and is further constrained by the limited number of specialists [25]. Push gastroscopy methods are also utilized in clinics for the diagnosis of gastrointestinal diseases such as ulcer, polyp, and bleeding; however, these methods are not suitable for examining the small bowel because of its complex structure [17]. In 2000, this problem was addressed by a new technology named wireless capsule endoscopy (WCE), which is able to identify gastrointestinal diseases directly from the small bowel [8]. WCE technology is now widely used in hospitals for the diagnosis of gastrointestinal diseases such as ulcer and bleeding [21]. It has recently been reported that about one million patients have been successfully examined with WCE [17].

One constraint of this procedure is that it is time consuming, which makes it somewhat arduous, because an ulcer appears only for a short duration in the entire video. It is therefore possible for a physician to miss an ulcer region during this process. Moreover, some irregularities are hidden from the naked eye because of challenges such as changes of shape, texture, and color similarities.

For an accurate diagnosis, several researchers have proposed computer-aided diagnosis (CAD) methods, which consist of a few fundamental steps, including lesion segmentation, feature extraction and reduction, and classification. Segmentation of the lesion region is important for detecting the diseased part in WCE images, and several methods have been introduced in the literature for this purpose. In [16], the authors introduced a new color-features-based approach for the detection of gastrointestinal diseases such as ulcer and bleeding from WCE images. For the color features, they calculated chromaticity moments to distinguish normal and abnormal regions, and the performance of the color features was analyzed using neural networks (NN). In [20], the authors introduced a technique for the detection of ulcer and bleeding from WCE images in which texture and HSV color features are extracted and then processed by a Fisher scoring (FS) method. Through this FS approach, the features with maximum information are selected and later classified using a multilayered NN.

For feature extraction, several methods have been utilized by researchers, including color features [33], discriminative joint features [38], scale invariant feature transform (SIFT) features [39], texture features [19], and Log filter banks [2], to name but a few [24, 27]. The feature reduction step plays a vital role, and several methods exist, such as principal component analysis (PCA) and linear discriminant analysis (LDA). A range of classifiers work robustly to accurately identify infectious regions, such as artificial neural networks [10], support vector machines (SVM) [22], K-nearest neighbors (KNN), and naïve Bayes [31], to name but a few.

However, several challenges still exist in computer-based methods that degrade system accuracy. In the segmentation process, these challenges include lesion irregularity, texture, color similarity, complex background, and border ambiguity, all of which make the segmentation process more complex. Moreover, in the classification phase, robust features produce accurate results, but the selection of the most appropriate features from a large pool of features remains a major challenge.

To address these challenges, we propose a new technique for gastrointestinal disease detection and classification from WCE images, namely CCSFM (color-based contour saliency fusion method). The proposed approach is an integration of four primary phases. In the first phase, HSI transformation is performed on the original RGB image to obtain the hue, saturation, and intensity channels. A threshold-based weighted function then selects the channel with maximum information, which is later segmented using an active contour model. In the second phase, YIQ transformation is performed on the RGB image prior to computing the maximum and minimum pixel values of each channel, which produce a mask function. Thresholding is then applied to the mask image, and its pixels are later fused with the active contour image. In the third phase, color, local binary pattern (LBP), and GLCM features are extracted from the mapped RGB segmented images. The extracted features are fused by a simple concatenation method prior to the reduction step, and only those feature values having high probability are selected for further processing. Finally, a multilayer perceptron neural network is employed for the classification of the reduced vector. Our major contributions are enumerated below:

  1. A new saliency-based segmentation technique is proposed based on YIQ color transformation, which is carried out on the RGB image. A mask function is constructed that utilizes the maximum and minimum pixel values of each channel to support the segmentation process.

  2. A new maximum a posteriori probability (MAP) estimation method is implemented for the fusion of the proposed saliency image and the active contour segmented image.

  3. A feature selection methodology is proposed, which defines a threshold value based on the probability of the maximum-occurring feature to select the most discriminant features.

  4. A new database is constructed, which comprises 9000 RGB images of gastrointestinal diseases such as ulcer and bleeding. Moreover, for a fair comparison, 3000 healthy RGB images are also provided. Selected image samples are shown in Fig. 1: (a) ulcer, (b) bleeding, and (c) healthy.

Fig. 1 Sample WCE images: first row, ulcer samples; second row, bleeding samples; third row, healthy samples

2 Related work

A substantial amount of work has been done in the field of medical imaging to develop computerized methods capable of assisting physicians [11,12,13]. Several computer-based methods have been proposed for the identification and classification of GI diseases from WCE images, which make diagnostics easier for doctors. Kundu et al. [15] utilized the Y plane of the YIQ color transformation for automatic detection of ulcer from WCE images. The total pixels of the Y plane are taken as features, which are classified using SVM with a Gaussian RBF kernel function. Shipra et al. [34] introduced statistical color-based features for bleeding detection from WCE images. These statistical color features are extracted from RGB images and classified into bleeding and non-bleeding images using an SVM classifier. Suman et al. [33] extracted color features from several color spaces, including RGB, CMYK, LAB, HSV, XYZ, and YUV, to make ulcer and non-ulcer regions distinct. These features are then combined using a cross-correlation approach in order to provide a fair comparison between the two patterns. Finally, SVM is used for classification, achieving a classification accuracy of 97.89%. Said et al. [5] presented a texture-features-based approach for anomaly identification from WCE images. Initially, texture features such as LBP variance and the discrete wavelet transform (DWT) are extracted from GI disease images to tackle illumination changes. The extracted features are finally classified using SVM and MLP, achieving promising results. Charfi et al. [4] followed a hybrid feature extraction methodology for ulcer recognition from WCE images. These features include complete local binary patterns (CLBP) and global local oriented edge magnitude patterns (GLOEMP). The CLBP features are extracted for the texture information of ulcer images, whereas the GLOEMP features are calculated for the color information. Thereafter, both CLBP and GLOEMP features are integrated in the form of a vector and classified using SVM and MLP. Yuan et al. [37] introduced an automated approach for ulcer identification from WCE images, which consists of two phases. In the first phase, a multi-level superpixel-based saliency approach is proposed, which draws the outline of the ulcer region. Color and texture features are extracted from each level, and all levels are integrated to construct a final saliency map. In the second phase, a saliency max-pooling (SMP) method is proposed and combined with the locality-constrained linear coding (LLC) method to achieve a classification accuracy of 92.65%. Fu et al. [7] presented a computer-based method for bleeding identification from WCE images. The image pixels are grouped using a superpixel segmentation approach, and features are extracted from the superpixels and later fed to SVM for classification. In addition, several other algorithms have been proposed in this domain, generating a pool of solutions [34].

The above-mentioned techniques mostly rely on color and texture features. Inspired by these methods, we propose a new approach for the detection and classification of GI diseases such as ulcer and bleeding from WCE images, based on an improved saliency method and MLPNN.

3 Materials and methods

In this section, a novel approach is presented for GI disease detection and classification from WCE images. The fundamental steps of the proposed approach are: a) active contour-based segmentation using HSI color transformation; b) the proposed saliency method based on YIQ color space; c) fusion of the segmented images; d) feature extraction and reduction; and e) classification using artificial neural networks. A detailed description of each step is provided below, and the schema of the proposed framework is shown in Fig. 2.

Fig. 2 Proposed framework for the detection and classification of gastrointestinal diseases from WCE image samples

3.1 Active contour-based segmentation

In a few WCE images, there exists a smooth color variation between the diseased and healthy regions, which is one of the causes of incorrect segmentation. To tackle this problem, we implemented the active contour model without edges [3] on an HSI color transformation. The entire process comprises three sub-phases. In the first phase, HSI color transformation is performed, followed by a weighted function that extracts the most suitable channel in the second phase. Finally, the selected channel is fed into the active contour segmentation method, which does not use an edge function to stop the evolving curve at the desired boundary locations.

Let ξ(x, y) ∈ (R × C × 3) denote an input RGB image. Its HSI conversion is defined using Eqs. (1)-(3) as:

$$ {\xi}_H=\cos^{-1}\left(\frac{n}{d+\epsilon}\right) $$
(1)

where \( n=\frac{1}{2}\left(\left({\xi}_R-{\xi}_G\right)+\left({\xi}_R-{\xi}_B\right)\right) \) and \( d=\sqrt{{\left({\xi}_R-{\xi}_G\right)}^2+\left({\xi}_R-{\xi}_B\right)\left({\xi}_G-{\xi}_B\right)} \). ξH represents the hue channel, n denotes the sum of differences between the RGB channels, and d is a distance measure over the RGB pixels. The three channels red, green, and blue are denoted by ξR, ξG, and ξB, calculated as \( {\xi}_R=\frac{r}{\sum_{k=1}^m{\xi}_k} \), \( {\xi}_G=\frac{g}{\sum_{k=1}^m{\xi}_k} \), and \( {\xi}_B=\frac{b}{\sum_{k=1}^m{\xi}_k} \), where m = 3 and k ∈ {1, 2, 3} indexes the RGB channels.

$$ {\xi}_S=1-\frac{3\times {n}_1}{d_1} $$
(2)

where n1 = φ(φ(ξR, ξG, ξB)) and φ is a minimum operator that selects the minimum value from each index. The first application of φ returns three values, one from each extracted channel (red, green, and blue), and the second φ selects the minimum of these three values. Moreover, d1 is the sum of the pixels of the ξR, ξG, and ξB channels, defined as d1 = ξR + ξG + ξB.

$$ {\xi}_I=\frac{\left({\xi}_R+{\xi}_G+{\xi}_B\right)}{3} $$
(3)

where ξI is the intensity channel. To select the channel with maximum information, we utilize our previously published work [1, 6], which was tested on natural images and is now applied to a medical problem. In this technique, a weighting criterion is implemented to identify the gray channel incorporating maximum information about the foreground object. These weights are calculated based on the object's distance from the center (wdc), the number of connected components (wcl), boundary connectivity (wbc), and a generated distance matrix (wdm). Therefore, the cumulative weight relies on the four terms combined below:

$$ {\xi}_f={w}_{dc}+{w}_{bc}+{w}_{cl}+{w}_{dm} $$
(4)
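To make the HSI conversion concrete, the following is a minimal NumPy sketch of Eqs. (1)-(3). The channel-weighting step of Eq. (4) is only stubbed, since its component weights come from our earlier work [1, 6]; the variance-based stand-in and all function names are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def rgb_to_hsi(rgb, eps=1e-8):
    """Convert an RGB image (float array in [0, 1], shape HxWx3) to H, S, I per Eqs. (1)-(3)."""
    total = rgb.sum(axis=2, keepdims=True) + eps
    norm = rgb / total                       # chromaticity coordinates xi_R, xi_G, xi_B
    r, g, b = norm[..., 0], norm[..., 1], norm[..., 2]

    # Eq. (1): hue from the inverse cosine of n / (d + eps)
    n = 0.5 * ((r - g) + (r - b))
    d = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    hue = np.arccos(np.clip(n / (d + eps), -1.0, 1.0))

    # Eq. (2): saturation from the per-pixel minimum of the normalized channels
    saturation = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)

    # Eq. (3): intensity as the mean of the original channels
    intensity = rgb.mean(axis=2)
    return hue, saturation, intensity

def select_channel(channels):
    """Stand-in for the weighted selection of Eq. (4): the weights w_dc, w_bc, w_cl and
    w_dm are computed as in [1, 6], so here the channel with the largest variance is
    picked purely for illustration."""
    return max(channels, key=lambda c: float(c.var()))
```

A typical call would be `xi_f = select_channel(rgb_to_hsi(img))`, after which xi_f is passed to the active contour step described next.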

Later, the active contour method is applied to the selected channel to separate the healthy and diseased regions. The energy functional of the active contour method is defined as:

$$ F\left({s}_1,{s}_2,C\right)=\mu \cdot L(C)+\nu \cdot A\left(in(C)\right)+{\lambda}_1{\int}_{in(C)}{\left|{\xi}_f-{s}_1\right|}^2\,dx\,dy+{\lambda}_2{\int}_{out(C)}{\left|{\xi}_f-{s}_2\right|}^2\,dx\,dy $$
(5)

where λ1 = 2 and λ2 = 4 are constants that control the brightness effects, μ ≥ 0 weights the length term, ν ≥ 0 weights the area term, C is the evolving curve, L is the curve length, A is the region area inside C, in(C) denotes the region inside the curve, out(C) denotes the region outside the curve, and s1 and s2 are the average values of ξf inside and outside the curve, which depend on C and are defined as:

$$ {s}_1\left(\varphi \right)=\frac{\int_{\varOmega }{\xi}_f\ H\ \left(\varphi \left(x,y\right)\right) dxdy}{\int_{\varOmega }H\left(\varphi \left(x,y\right)\right) dxdy} $$
(6)
$$ if\;{\int}_{\varOmega }H\ \left(\varphi \left(x,y\right)\right) dxdy>0 $$
(7)
$$ {s}_2\left(\varphi \right)=\frac{\int_{\varOmega }{\xi}_f\left(1-H\left(\varphi\ \left(x,y\right)\right)\right) dxdy}{\int_{\varOmega}\left(1-H\ \left(\varphi \left(x,y\right)\right)\right) dxdy} $$
(8)
$$ if{\int}_{\varOmega}\left(1-H\ \left(\varphi\ \left(x,y\right)\right)\right) dxdy>0 $$
(9)

The process starts with the initialization of a mask (φ0). We introduce an automatic mask initialization technique, described in Algorithm 1:

Algorithm 1 Automatic mask initialization

It is observed that the average diameter of a diseased region is about 2 mm. Therefore, we initialize the mask close to the lesion boundary with a size of approximately 3 mm. In this method, we consider the maximum pixel value, as the lesion area is darker than the healthy tissue. Next, s1(φn) and s2(φn) are computed, and time-dependent partial differential equations (PDEs) are used to obtain φn + 1. Finally, the exit condition stops the process when there is no change in the solution for three consecutive iterations. The output of the resultant active contour image ξact(s, C) is shown in Fig. 3.
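As an illustration of this step, the sketch below applies scikit-image's Chan-Vese implementation (active contours without edges) to the selected channel with the λ1 and λ2 values stated above. The library does not reproduce the 3 mm initialization of Algorithm 1, so a small box around the darkest pixel is used here as a hedged stand-in; that choice, and the box size, are assumptions made only for this example.

```python
import numpy as np
from skimage.segmentation import chan_vese

def segment_lesion(channel):
    """Segment the selected gray channel with the Chan-Vese model (Eq. 5)."""
    # Rough automatic initialization: a small box centred on the darkest pixel,
    # standing in for the lesion-sized mask of Algorithm 1.
    y, x = np.unravel_index(np.argmin(channel), channel.shape)
    init = -np.ones_like(channel, dtype=float)
    half = max(channel.shape) // 20
    init[max(0, y - half):y + half, max(0, x - half):x + half] = 1.0

    # lambda1 = 2 and lambda2 = 4 follow the values stated in the text.
    mask = chan_vese(channel, mu=0.25, lambda1=2, lambda2=4, init_level_set=init)
    return mask  # boolean array: True inside the evolving curve
```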

Fig. 3 Active contour segmentation in HSI color space

3.2 Improved saliency-based segmentation

In this section, we present a new improved saliency-based method for ulcer detection from WCE images. The proposed saliency method consists of the following series of steps: a) RGB to YIQ conversion; b) extraction of the YIQ channels and computation of the maximum and minimum pixel values; and c) generation of a mask function from these maximum and minimum values. A detailed flow of the proposed saliency method is shown in Fig. 4.

Fig. 4 Framework of proposed saliency-based segmentation

Initially, the YIQ transformation is performed, in which the Y channel represents the luminance factor and the I and Q channels represent the chrominance factor. One reason to select this color space is its closeness to the human visual system, which is more sensitive along the I axis than along the Q axis. The conversion from RGB to YIQ is defined as:

$$ \left(\begin{array}{c}{\xi}_Y\\ {\xi}_I\\ {\xi}_Q\end{array}\right)=\left(\begin{array}{ccc}{\alpha}_1 & {\alpha}_2 & {\alpha}_3\\ {\alpha}_4 & {\alpha}_5 & {\alpha}_6\\ {\alpha}_7 & {\alpha}_8 & {\alpha}_9\end{array}\right)\left(\begin{array}{c}{\xi}_R\\ {\xi}_G\\ {\xi}_B\end{array}\right) $$
(10)

where ξY, ξI, and ξQ represent the luminance and the two chrominance channels, respectively, and (α1, …, α9) denote the transformation coefficients, which lie in the range −1 to 1. The exact values used are ξY ∈ {0.299, 0.587, 0.114}, ξI ∈ {0.596, −0.274, −0.321}, and ξQ ∈ {0.211, −0.523, 0.311} [9]. The next step is to find the maximum (ϱmax) and minimum (ϱmin) pixel values of the extracted channels:

$$ {\boldsymbol{\varrho}}_{\boldsymbol{max}}\left(\boldsymbol{i}\right)=\mathit{\operatorname{MAX}}\left({\xi}_{YIQ}(i)\right) $$
(11)
$$ {\boldsymbol{\varrho}}_{\boldsymbol{min}}\left(\boldsymbol{i}\right)=\mathit{\operatorname{MIN}}\left({\xi}_{YIQ}(i)\right) $$
(12)

where the index i ∈ {1, 2, 3} indicates the three extracted channels Y, I, and Q, respectively. The mask function is defined as follows:

$$ {\boldsymbol{\xi}}_{\boldsymbol{mask}}\left(\boldsymbol{BW}\right)=\boldsymbol{F}\left({\xi}_{YIQ}\left(x,y\right)\right) $$
(13)
$$ \boldsymbol{F}\left(\boldsymbol{x},\boldsymbol{y}\right)=\left\{\begin{array}{c}1\kern3em if\ {\boldsymbol{\varrho}}_{\boldsymbol{min}}\left(\mathbf{1}\right)\le {\xi}_Y(i)\le {\boldsymbol{\varrho}}_{\boldsymbol{max}}\left(\mathbf{1}\right)\\ {}1\kern3em if\ {\boldsymbol{\varrho}}_{\boldsymbol{min}}\left(\mathbf{2}\right)\le {\xi}_I(i)\le {\boldsymbol{\varrho}}_{\boldsymbol{max}}\left(\mathbf{2}\right)\\ {}1\kern3em if\ {\boldsymbol{\varrho}}_{\boldsymbol{min}}\left(\mathbf{3}\right)\le {\xi}_Q(i)\le {\boldsymbol{\varrho}}_{\boldsymbol{max}}\left(\mathbf{3}\right)\\ {}0\kern3em Otherwise\end{array}\right. $$
(14)

where ξmask(BW) is a binary image and F(x, y) is the mask function, generated from the maximum and minimum pixel values. The effects of the saliency-based ulcer detection are shown in Fig. 5. Moreover, as a refinement step, we apply morphological operations such as opening and closing to remove extraneous pixels.
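The sketch below expresses the RGB-to-YIQ conversion of Eq. (10) and the mask function of Eqs. (13)-(14) in NumPy, using the coefficient matrix quoted above. The per-channel bounds are exposed as a parameter: by default they are the raw channel extrema of Eqs. (11)-(12), but since those bounds admit every pixel, tightened bounds (not specified in the text and therefore an assumption here) can be passed in to obtain a selective mask.

```python
import numpy as np

# RGB -> YIQ coefficients, as quoted for Eq. (10).
YIQ_MATRIX = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.274, -0.321],
                       [0.211, -0.523,  0.311]])

def rgb_to_yiq(rgb):
    """Apply Eq. (10): each output channel is a linear combination of R, G, B."""
    return rgb @ YIQ_MATRIX.T                 # shape (H, W, 3) -> (H, W, 3)

def saliency_mask(rgb, bounds=None):
    """Mask function of Eqs. (13)-(14): a pixel is kept when any channel lies
    inside its [min, max] bounds."""
    yiq = rgb_to_yiq(rgb)
    if bounds is None:
        lo = yiq.reshape(-1, 3).min(axis=0)   # Eq. (12)
        hi = yiq.reshape(-1, 3).max(axis=0)   # Eq. (11)
    else:
        lo, hi = bounds                       # tightened bounds, an illustrative assumption
    inside = (yiq >= lo) & (yiq <= hi)        # per-channel tests of Eq. (14)
    return inside.any(axis=2).astype(np.uint8)
```

Morphological opening and closing (for example via scipy.ndimage or skimage.morphology) would then be applied to the returned binary mask as the refinement step described above.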

Fig. 5 Selected samples from improved saliency-based segmentation

3.3 Maximum a posteriori probability (MAP) based pixel fusion

The images obtained from the different algorithms may exhibit distinct patterns; therefore, to improve quality, we combine the characteristics of both images. Several methods have been proposed for image fusion, including pixel-based, region-based, and pyramid-transform-based fusion, among others. Inspired by these methods, we implement a maximum a posteriori (MAP) based fusion method, which combines the valuable pixels of both segmented images into a single matrix.

Let β1 represent the pixels of the active contour segmented image ξact(s, C), \( \overline{\beta} \) the pixels of the proposed saliency image, and ξpro(x, y) the fused image, of dimension R(256 × 256). According to Bayesian estimation, the joint posterior distribution of β1 and \( \overline{\beta} \) given d is defined as [35]:

$$ P\left({\beta}_1,\overline{\beta}|d\right)=\frac{P\left(d|\overline{\beta}\right)P\left({\beta}_{1,}\overline{\beta}\right)}{P(d)} $$
(15)

where d represents the total points of both segmented images, and the likelihood satisfies:

$$ P\left(d|\overline{\beta}\right)=P\left(d|{\beta}_1,\overline{\beta}\right) $$
(16)

The joint prior is further factorized, which is later used to obtain the final fused image:

$$ P\left({\beta}_1,\overline{\beta}\right)=P\left({\beta}_1|\overline{\beta}\right)P\left(\overline{\beta}\right) $$
(17)

Here, \( P\left({\beta}_1,\overline{\beta}\right) \) is the joint probability distribution of β1 and \( \overline{\beta} \). Given the total pixels d of the segmented images, the MAP estimates \( {\overline{\beta}}_{map} \) and βmap are calculated as:

$$ \left\{{\beta}_{map},{\overline{\beta}}_{map}\right\}=\mathit{\arg}{\max}_{\beta_1,\overline{\beta}}\left\{P\left({\beta}_1,\overline{\beta}|d\right)\right\} $$
(18)
$$ {\boldsymbol{\xi}}_{\boldsymbol{pro}}\left(\boldsymbol{x},\boldsymbol{y}\right)=\mathit{\arg}{\max}_{\beta_1,\overline{\beta}}\left\{\frac{P\left(d|\overline{\beta}\right)P\left({\beta}_1|\overline{\beta}\right)P\left(\overline{\beta}\right)}{P(d)}\right\} $$
(19)

where ξpro(x, y) is the fused segmented binary image, which is mapped to the original RGB image U(x, y) as:

$$ {{\boldsymbol{\xi}}_{\boldsymbol{pro}}}^{\boldsymbol{rgb}}\left(\boldsymbol{x},\boldsymbol{y}\right)={\boldsymbol{\xi}}_{\boldsymbol{pro}}\left(\boldsymbol{x},\boldsymbol{y}\right)\times \boldsymbol{U}\left(\boldsymbol{x},\boldsymbol{y}\right) $$
(20)

Figure 6 shows the series of steps from the active contour segmentation to the final mapped RGB image.
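For illustration only, the sketch below combines the two binary masks and maps the result onto the RGB image as in Eq. (20). The full MAP estimator of Eqs. (15)-(19) depends on prior models that are not spelled out above, so the pixel-wise combination shown here, which keeps a pixel when the estimated posterior of "lesion" exceeds that of "background" under a naive independence assumption, is a simplified stand-in rather than the exact derivation; the prior and accuracy values are purely illustrative.

```python
import numpy as np

def map_fuse(mask_contour, mask_saliency, prior=0.3, acc1=0.9, acc2=0.85):
    """Pixel-wise MAP fusion of two binary masks: each mask is treated as a noisy,
    independent observation of the latent lesion label with the stated accuracies."""
    m1 = mask_contour.astype(bool)
    m2 = mask_saliency.astype(bool)

    def likelihood(obs, acc):
        # P(observation | lesion), P(observation | background)
        return np.where(obs, acc, 1 - acc), np.where(obs, 1 - acc, acc)

    l1_fg, l1_bg = likelihood(m1, acc1)
    l2_fg, l2_bg = likelihood(m2, acc2)
    post_fg = prior * l1_fg * l2_fg
    post_bg = (1 - prior) * l1_bg * l2_bg
    return (post_fg > post_bg).astype(np.uint8)  # arg-max over the two labels

def map_to_rgb(fused_mask, rgb):
    """Eq. (20): multiply the binary mask into each RGB channel."""
    return rgb * fused_mask[..., None]
```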

Fig. 6 Image fusion using proposed MAP approach

3.4 Features extraction

Features provide the pattern information of images, which is later utilized in the classification phase [10, 26,27,28]. In the medical domain, the classification of GI diseases has received much attention over the last few years. Several feature extraction and reduction techniques have been proposed by researchers for this purpose, but a set of challenges remains: a) color similarity between diseased and healthy regions; b) the shape of diseased regions; and c) selection of the most discriminant features. These problems are addressed in this article by combining three different types of features: color, LBP, and gray level co-occurrence matrix (GLCM) features.

In the first stage, we extract singular value decomposition (SVD) based color features from the RGB mapped image. The purpose of SVD is to reduce the degrees of freedom in a complex system [32, 40]. Since ξprorgb(x, y) is the RGB mapped image, its extracted channels are ξR, ξG, and ξB. Let ΔA of size (N × N) be the matrix of an RGB mapped channel, where ΔA ∈ ξprorgb(x, y) and N2 represents the total number of pixels of each extracted channel.

$$ {\Delta}^A=H\boldsymbol{\delta} {V}^T $$
(21)
$$ {\Delta}^A=\left[\begin{array}{cccc}{h}_{1,1} & {h}_{1,2} & \cdots & {h}_{1,N}\\ {h}_{2,1} & {h}_{2,2} & \cdots & {h}_{2,N}\\ \vdots & \vdots & \ddots & \vdots \\ {h}_{N,1} & {h}_{N,2} & \cdots & {h}_{N,N}\end{array}\right]\times \left[\begin{array}{cccc}{\delta}_1 & 0 & \cdots & 0\\ 0 & {\delta}_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & {\delta}_N\end{array}\right]\times \left[\begin{array}{cccc}{v}_{1,1} & {v}_{1,2} & \cdots & {v}_{1,N}\\ {v}_{2,1} & {v}_{2,2} & \cdots & {v}_{2,N}\\ \vdots & \vdots & \ddots & \vdots \\ {v}_{N,1} & {v}_{N,2} & \cdots & {v}_{N,N}\end{array}\right] $$
(22)
$$ {\Delta}^A={\sum}_{i=1}^{N}{\delta}_i{h}_i{v_i}^T $$
(23)

where H and V are (N × N) orthogonal matrices and T denotes the transpose of matrix V. Moreover, hi and vi denote the column vectors of H and V, respectively. The diagonal elements of δ, denoted by δi, are called the singular values of ΔA and satisfy δ1 ≥ δ2 ≥ ⋯ ≥ δj ≥ δj + 1 = ⋯ = δN = 0. The size of the extracted SVD color feature vector for each channel is (1 × 125), as shown in Fig. 7. Thereafter, three mean features, one per channel, are appended to the SVD matrix, which is later used for classification.
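A hedged sketch of the SVD-based color descriptor follows. The later experiments use a 1 × 128 color vector (125 SVD values plus 3 channel means), so the truncation length here is taken from that figure; how the singular values are pooled across the three channels is not fully specified above, so computing them on the channel mean of the mapped image is an assumption made only for this example.

```python
import numpy as np

def svd_color_features(rgb_mapped, n_singular=125):
    """Return a 1 x (n_singular + 3) color descriptor: truncated singular values
    (Eqs. 21-23) plus one mean per RGB channel."""
    gray = rgb_mapped.astype(float).mean(axis=2)        # pooling assumption, see lead-in
    s = np.linalg.svd(gray, compute_uv=False)            # singular values, descending
    svd_part = s[:n_singular]
    channel_means = rgb_mapped.reshape(-1, 3).mean(axis=0)
    return np.concatenate([svd_part, channel_means])     # 125 + 3 = 128 values
```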

Fig. 7 Flow diagram of proposed feature extraction and reduction process

Secondly, we extract LBP and GLCM features for texture analysis. LBP features are mostly utilized for face recognition [23], but over the last few years they have also been used in medical imaging for disease classification [39]. The LBP operator labels the pixels of a given image using a thresholding function, which is applied to the neighborhood of each pixel and produces a binary output. The LBP features are calculated as follows:

$$ {\xi}_{LBP}\left(P,R\right)={\sum}_{p=0}^{P-1}s\left({O}_p-{O}_c\right){2}^p $$
(24)
$$ s(x)=\left\{\begin{array}{c}1\kern3.75em if\ x\ge 0\\ {}0\kern2.75em Otherwise\end{array}\right. $$
(25)

where Op denotes the neighborhood pixels, Oc the central pixel, s(x) the sign function, P the number of symmetric neighborhood pixels, and R the radius of the circle. LBP gives an output vector of dimension N × 59, where N denotes the number of images used for feature extraction.
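The 59-dimensional output suggests the uniform (non-rotation-invariant) LBP variant with P = 8 neighbors, which yields exactly 59 codes; the sketch below uses scikit-image's implementation under that assumption.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_features(gray, P=8, R=1):
    """59-bin LBP histogram (Eqs. 24-25) for one grayscale image."""
    # 'nri_uniform' with P = 8 produces 58 uniform codes plus one non-uniform bin = 59.
    codes = local_binary_pattern(gray, P, R, method='nri_uniform')
    hist, _ = np.histogram(codes, bins=59, range=(0, 59), density=True)
    return hist  # 1 x 59 feature vector
```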

Finally, we extract 22 GLCM features [19], including contrast, entropy, difference variance, difference entropy, information measure of correlation 1, information measure of correlation 2, and several more. The extracted features are finally fused by a simple concatenation method, which gives a fused output vector of size (N × 206), denoted by Ψfused.
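A minimal sketch of the GLCM texture descriptor and the serial concatenation follows. scikit-image's graycoprops exposes only a subset of the 22 Haralick-style measures listed above, so the remaining ones would have to be derived from the normalized co-occurrence matrix directly; the feature count shown here is therefore illustrative, not the full 22-dimensional vector.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_uint8):
    """A few co-occurrence statistics averaged over four orientations."""
    glcm = graycomatrix(gray_uint8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ['contrast', 'correlation', 'energy', 'homogeneity', 'dissimilarity']
    feats = [graycoprops(glcm, p).mean() for p in props]
    # Entropy of the orientation-averaged co-occurrence matrix (one of the listed measures).
    p = glcm.mean(axis=(2, 3))
    feats.append(-(p[p > 0] * np.log2(p[p > 0])).sum())
    return np.array(feats)

def fuse_features(color_vec, lbp_vec, glcm_vec):
    """Serial (concatenation-based) fusion into a single row vector."""
    return np.concatenate([color_vec, lbp_vec, glcm_vec])
```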

Thereafter, we implement a new but simple probability-distribution-based feature reduction method, which consists of two steps. In the first step, we calculate the probability of each feature of the fused vector. The higher-probability features are then selected and used in the reduction function. The probability over the fused vector is defined as follows:

Let Ψfused(i) denote the feature at index i of the fused vector and Pr(i) the probability value of each extracted feature i, defined as:

$$ \Pr (i)={\sum}_{i=1}^K\frac{x(i)}{K} $$
(26)

where x(i) denotes the number of favorable features, K the total number of features, and Pr(i) the probability of feature index i. A higher-probability feature is then selected and plugged into the reduction function to remove the irrelevant features.

$$ g(i)= argmax\ \left(\mathit{\Pr}(i)\right) $$
(27)
$$ {\xi}_{sel}(FV)=\left\{\begin{array}{cl}{\xi}_i & if\ \ {\Psi}_{fused}(i)\ge g(i)\\ 0 & Otherwise\end{array}\right. $$
(28)

where ξsel(FV) is the final selected feature vector, which is later fed into the MLPNN [10] for classification. The cost function of the MLPNN is:

$$ {F}_{cost}\left(x,y\right)=\Phi {\sum}_{q=1}^Q\left({\left( Target- Actual\right)}^T\left( Target- Actual\right)\right) $$
(29)
$$ {F}_{cost}\left(x,y\right)=\Phi {\sum}_{q=1}^Q{e_q}^2 $$
(30)
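To tie the selection rule of Eqs. (26)-(28) to the classifier, the sketch below normalizes the fused feature matrix column-wise into probabilities, thresholds against the largest value as in Eq. (27), and trains a multilayer perceptron. The column-wise normalization, the threshold_scale parameter, and the hidden-layer size are assumptions for illustration, and scikit-learn's MLPClassifier is used only as a stand-in for the feed-forward MLPNN described above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def select_features(fused, threshold_scale=1.0):
    """Probability-based selection in the spirit of Eqs. (26)-(28): normalize the mean
    feature responses into probabilities, take the largest probability as the threshold
    g (Eq. 27), and keep the columns whose probability reaches threshold_scale * g.
    With threshold_scale = 1.0 only the peak column(s) survive, so a value below 1 is
    typically more useful in practice."""
    pr = fused.mean(axis=0)
    pr = pr / pr.sum()                       # Eq. (26): probability per feature index
    g = pr.max()                             # Eq. (27)
    keep = pr >= threshold_scale * g         # Eq. (28)
    return fused[:, keep], keep

def classify(X, y):
    """Hedged usage sketch: X is the fused (N x 206) matrix, y the class labels."""
    X_sel, _ = select_features(X, threshold_scale=0.5)
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
    return cross_val_score(mlp, X_sel, y, cv=10).mean()   # 10-fold CV accuracy
```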

4 Experimental results and discussion

In this section, experimental results of the proposed method are presented in both numerical and graphical form. To demonstrate the validity of the proposed method, we collected 9000 WCE images from 6 patients, provided by POF Hospital Wah Cantt, Pakistan. The collected WCE images are divided into three categories: a) ulcer, b) bleeding, and c) healthy; selected samples are shown in Fig. 1. These 9000 WCE images (3000 ulcer, 3000 bleeding, and 3000 healthy), each with a resolution of 381 × 321, were extracted from 18 videos of the 6 subjects, each belonging to one of the aforementioned categories. The ground truth images were provided by a specialist doctor who labeled the images; a few sample ground truth images are shown in Fig. 8. Classification results are generated with MLPNN, but to provide a fair comparison, a set of classifiers is also selected, including fine tree (FTree), quadratic discriminant analysis (QDA), linear SVM (LSVM), quadratic SVM (QSVM), cubic SVM (CSVM), fine Gaussian SVM (FGSVM), medium Gaussian SVM (MGSVM), fine KNN, medium KNN (MKNN), cosine KNN, cubic KNN, weighted KNN, boosted tree, and bagged tree. The performance of these classification methods is analyzed using six statistical measures: sensitivity, AUC, FPR, FNR, accuracy, and computation time. The results are reported in two steps: a) ulcer segmentation results and b) classification results. All simulations were performed in MATLAB 2017b on a desktop computer with an Intel Core i7 processor and 8 GB of RAM.

Fig. 8 Segmentation results along with their ground truth images

Description of classifiers in terms of selected parameters

| Classifier | Description |
| --- | --- |
| FTree | Preset: Fine tree; Maximum number of splits: 100; Split criterion: Gini's diversity index; Surrogate decision splits: off |
| QDA | Preset: Quadratic discriminant; Covariance structure: Full |
| LSVM | Preset: Linear SVM; Kernel function: Linear; Kernel scale: Automatic; Box constraint level: 1; Multi-class method: One-vs-One; Standardized data: true |
| QSVM | Preset: Quadratic SVM; Kernel function: Quadratic; Kernel scale: Automatic; Box constraint level: 1; Multi-class method: One-vs-One; Standardized data: true |
| CSVM | Preset: Cubic SVM; Kernel function: Cubic; Kernel scale: Automatic; Box constraint level: 1; Multi-class method: One-vs-One; Standardized data: true |
| FGSVM | Preset: Fine Gaussian SVM; Kernel function: Gaussian; Kernel scale: 11; Box constraint level: 1; Multi-class method: One-vs-One; Standardized data: true |
| MGSVM | Preset: Medium Gaussian SVM; Kernel function: Gaussian; Kernel scale: 44; Box constraint level: 1; Multi-class method: One-vs-One; Standardized data: true |
| Fine KNN | Preset: Fine KNN; Number of neighbors: 1; Distance metric: Euclidean; Distance weight: Equal; Standardize data: true |
| MKNN | Preset: Medium KNN; Number of neighbors: 10; Distance metric: Euclidean; Distance weight: Equal; Standardize data: true |
| Cosine KNN | Preset: Cosine KNN; Number of neighbors: 10; Distance metric: Cosine; Distance weight: Equal; Standardize data: true |
| Cubic KNN | Preset: Cubic KNN; Number of neighbors: 10; Distance metric: Minkowski; Distance weight: Equal; Standardize data: true |
| WKNN | Preset: Weighted KNN; Number of neighbors: 10; Distance metric: Euclidean; Distance weight: Square inverse; Standardize data: true |
| Boosted Tree | Preset: Boosted tree; Ensemble method: AdaBoost; Learner type: Decision tree; Maximum number of splits: 20; Number of learners: 30; Learning rate: 0.1 |
| MLPNN | Type: Feed forward; Learning rate: 0.1 |

4.1 Segmentation accuracy

In this step, we present the segmentation accuracy of the proposed method, computed using the relation:

$$ Accuracy=\frac{TP+ TN}{TP+ TN+ FP+ FN} $$
(33)

where TP represents true positive values, TN true negative values, FP false positive values, and FN false negative values. These values are calculated by comparing each segmented image with its corresponding ground truth image. The proposed method is tested on all selected images of the ulcer type, and a few of its accuracy results are given in Table 1. The maximum segmentation accuracy achieved is 97.46%, with an average accuracy of 87.96%. A comparison is also carried out with [14], which achieved a maximum segmentation accuracy of 87.09%. Sample ulcer segmentation results with ground truth images are shown in Fig. 8, and bleeding segmentation results are shown in Fig. 9.
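A short sketch of how the pixel-level accuracy of Eq. (33) can be computed from a segmented mask and its ground truth; variable names are illustrative.

```python
import numpy as np

def segmentation_accuracy(pred_mask, gt_mask):
    """Pixel-wise accuracy of Eq. (33): (TP + TN) / (TP + TN + FP + FN)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return (tp + tn) / (tp + tn + fp + fn)
```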

Table 1 Segmentation accuracy of proposed model
Fig. 9 Bleeding segmentation from WCE images: a original image, b proposed segmentation method, c mapped RGB, and d border detection

4.2 Classification accuracy

In this section, we present the proposed classification results in terms of accuracy, sensitivity, FNR, FPR, and AUC. The classification results are computed for five distinct scenarios, as given in Table 2. In the first scenario, all three classes (ulcer, bleeding, and healthy) are selected and their LBP features are extracted. For this purpose, a 50:50 training and testing split is adopted: 50% of the samples from each class are selected for testing, and the results are validated using 10-fold cross-validation. The maximum testing accuracy achieved in scenario 1 is 99.60%, with a sensitivity of 0.996, FPR of 0.000, FNR of 0.4, and AUC of 1.00 on MLPNN, as given in Table 3. Moreover, the best accuracies of some other supervised learning methods, such as fine KNN, CSVM, QSVM, and WKNN, using the same features are 99.50%, 99.40%, 99.00%, and 99.00%, respectively. The confusion matrix presented in Fig. 10 confirms the performance of MLPNN. The computation time of each classifier, including MLPNN, is also provided; the best time is 11.60 s (MLPNN), whereas the worst testing computation time is 41.26 s (fine tree), as given in Table 3. The computation times of all classification methods are shown in Fig. 11.

Table 2 Different scenarios of classification for gastrointestinal diseases
Table 3 Classification results on LBP features
Fig. 10 Confusion matrix for LBP features using MLPNN

Fig. 11 Computational time comparison of different classifiers on LBP features

In the second scenario, GLCM features are extracted to perform classification. Ten-fold cross-validation is performed, and the maximum classification results achieved are an accuracy of 97.3%, FNR of 2.7%, FPR of 0.013, sensitivity of 0.971, and AUC of 0.9866, as given in Table 4. The classification accuracy of MLPNN is confirmed by the confusion matrix presented in Fig. 12. The second-highest accuracy, sensitivity, and AUC are 97.2%, 0.970, and 0.9866, respectively, achieved with QSVM. The computation times are also calculated; the best computation time is 8.44 s for MLPNN, whereas the worst execution time is 35.08 s with a classification accuracy of 82.2%, achieved with MGSVM. The computation times of all classification methods are plotted in Fig. 13.

Table 4 Classification results for GLCM features
Fig. 12 Confusion matrix for MLPNN on GLCM features

Fig. 13 Computational time comparison of different classifiers using GLCM features

In the third scenario, color features are extracted from the tested images. The color features are extracted from the RGB mapped images and have dimension 1 × 128; the extracted color descriptor consists of 125 SVD features and 3 mean features. Thereafter, 10-fold cross-validation is performed, and a maximum accuracy of up to 99.7% is obtained on MLPNN, as presented in Table 5. The confusion matrix shown in Fig. 14 confirms the performance of MLPNN. Tables 3, 4, and 5 show that the color features perform better for the classification of GI diseases. The best computation time for the color features is 9.02 s, achieved on MLPNN, as shown in Fig. 15. It is slightly higher than for the GLCM features because of the larger number of features: the GLCM feature vector has dimension 1 × 42, whereas the color feature vector has dimension 1 × 128. Nevertheless, Tables 3 and 4 clearly show that the color features perform better compared to the LBP and GLCM features.

Table 5 Classification results using SVD features
Fig. 14 Confusion matrix for MLPNN using SVD features

Fig. 15 Computational time comparison of classifiers using SVD features

After computing the classification results on the individual feature vectors, the features are fused using the serial-based method in the fourth scenario. The major aim of feature fusion is to improve the classification accuracy and also reduce the computation time. The fused features are stored in one matrix, and 10-fold cross-validation is performed. The fusion process produces a maximum classification accuracy of 99.5% on MLPNN, as presented in Table 6. The performance of MLPNN is confirmed by the confusion matrix shown in Fig. 16. However, we notice that the classification accuracy of MLPNN on the fused features is lower than with the color features (99.70%) and the LBP features (99.60%), although the computation time improves to 8.116 s. The computation time is 11.61 s for the LBP features, 8.44 s for the GLCM features, and 9.02 s for the color features, whereas the computation time of MLPNN on the fused vector is 8.116 s, as shown in Fig. 17, which is better than in the previous scenarios.

Table 6 Fusion of SVD, LBP, and GLCM features
Fig. 16 Confusion matrix for fusion of SVD, GLCM, and LBP features on MLPNN

Fig. 17 Computational time comparison of classifiers on fused features

Finally, the proposed feature selection method is applied to the fused feature vector to select the best features, making the proposed method more reliable and efficient in terms of classification accuracy and computation time. The LBP, GLCM, and color features are extracted from the 50% testing images and fused into one matrix. Thereafter, the best features are selected and 10-fold cross-validation is performed. The maximum classification accuracy achieved is 100% on both MLPNN and FGSVM, but the performance of MLPNN is better than that of FGSVM in terms of computation time, as given in Table 7. The performance of MLPNN is confirmed by the confusion matrix given in Fig. 18. Moreover, the proposed selection results are supported by the ROC plots shown in Fig. 19, which are plotted for each class and reveal the maximum AUC and minimum FP rate.

Table 7 Classification results after the proposed feature selection algorithm
Fig. 18 Confusion matrix after MLPNN classification using proposed feature selection method

Fig. 19 Verification of proposed feature selection results through ROC plots; the ROC curve of each class (ulcer, healthy, and bleeding) is plotted separately

The classification results in Table 7 show that the other supervised learning methods also perform well on the best-selected features, achieving an average classification accuracy of 99.03%. The best computation time on the selected features is 3.624 s, achieved on MLPNN; the second-best time is 7.007 s for FTree, as shown in Fig. 20. From all of the above results and discussion, it is clear that the proposed method performs well with the probability-based selected features and achieves improved performance in terms of accuracy, sensitivity, FPR, and computation time. In addition, we also applied three validation methods, hold-out, leave-one-out, and K-fold, to compute the classification results with the proposed feature selection method; the results are presented in Tables 8 and 9. These statistics make clear that the proposed method still performs exceptionally well, achieving classification accuracies of 99.7% and 99.6% (Table 10).

Fig. 20 Computational time comparison of classifiers on the selected most discriminant features

Table 8 Classification accuracies comparison using three different cross-validation methods
Table 9 More statistical parameters based analysis of proposed results on MLPNN
Table 10 Comparison with existing methods

Moreover, a general comparison with existing methods in terms of accuracy and sensitivity rate is presented in Table 10. The time parameter is not included in the comparison table because most authors in the literature do not report it and focus only on classification accuracy. From the analysis, it is concluded that the proposed method outperforms existing methods in terms of accuracy and sensitivity. A few labeled recognition results of the proposed method are shown in Fig. 21.

Fig. 21 Labeled results using proposed method from WCE images

5 Conclusion

A new method is proposed for detecting and classifying GI diseases from WCE images. The fusion of HSI active contour segmentation and a newly improved saliency method via MAP estimation is the crux of this framework. The statistical measures clearly verify that all three selected classes are accurately classified with the proposed method, achieving a maximum accuracy of 100% and a best computation time of 3.624 s. Moreover, we make use of multiple features, including color, LBP, and GLCM, to generate a robust feature set comprising a good range of features.

From the above discussion, we conclude that color transformation plays a key role in ulcer segmentation from WCE images. It highlights not only the primary infected regions but also the regions where the RGB color space fails to reveal a difference. Moreover, the fusion of the set of features worked well. In future work, we will focus on fusing a larger number of features and on adding further feature selection and dimensionality reduction steps, which will not only select the most discriminant features but also remain computationally inexpensive in terms of time. Additionally, a deep CNN method (DenseNet, Inception V3) will be implemented using transfer learning on more than 20,000 images from a larger number of patients.