1 Introduction

Wireless capsule endoscopy (WCE) is a capsule-shaped, non-invasive tool equipped with a camera to record video from the entire digestive tract [34]. This device has many advantages over conventional methods: it can access organs, such as the small bowel, that are inaccessible to other common methods [27, 45], and it captures more realistic images of body organs than other non-invasive devices such as CT scans.

The video recorded by this technology usually contains more than 50,000 frames [26]. Some frames may contain an abnormality, but abnormalities are present in only a few video frames, and their size is usually small compared to the background. Inspecting the recorded video to find the symptoms of a disease (abnormality) is a tedious task for physicians because it usually takes more than 1.5 h [18, 39]. This can lead to a notable miss rate by the specialist. Therefore, a computer-aided method is essential to automatically detect suspicious frames containing abnormalities and suggest them to a physician for further investigation.

Different abnormalities may be present in WCE images, but the most important are bleeding, ulcers, angiodysplasia (AD), polyps, and lymphoid hyperplasia (LH). These lesions sometimes occur in the small intestine, an organ that WCE investigates very easily but that is difficult to reach with other common methods [45]. In this study, we focus on providing a computer-aided method to identify these lesions.

The most common abnormality is bleeding, which appears as red and brown spots in the image [23]. AD lesions are small vascular malformations of the gut with a cherry-red appearance, and they are one important cause of bleeding in the gastrointestinal tract [25]. Ten percent of the world's population suffers from ulcers, which can be a symptom of very serious diseases [49]; an ulcer usually appears light gray or white with a spot pattern [37]. LH results from rapidly proliferating lymphocytic cells and has a spherical shape with a light-yellow or light-gray color [21]. Polyps may be precancerous lesions and can be seen protruding from the mucosal wall [6]. In Fig. 1, WCE images with the different abnormalities considered in this study are shown.

Fig. 1
figure 1

Sample WCE images: from left to right (a) normal frame, (b) frame with AD, (c) LH, (d) ulcer, (e) bleeding and (f) polyp lesion. The lesion area is marked with a white circle

There are two approaches to designing a computer-aided diagnosis system: traditional approaches based on hand-crafted feature extraction, and recent approaches based on deep learning. There are many traditional approaches for WCE abnormality detection; in our recent works, we also focused on traditional methods [4, 6, 7]. Recently, deep learning methods based on Convolutional Neural Networks (CNNs) have surpassed traditional methods in medical image processing and diagnosis systems. However, the lack of a large, public, annotated dataset of WCE images is a great challenge in training CNNs for abnormality detection, because training a deep structure needs a large and well-balanced dataset to optimize a tremendous number of parameters.

Due to this challenge, using pre-trained models and transfer learning is an effective way to take advantage of deep learning. In transfer learning, a CNN model pre-trained on a dataset with sufficient data from another domain is used to extract features from the target data. In this paper, the pre-trained ResNet50 model is used to extract deep features, and hand-crafted features are also extracted. We will show that this combination surpasses current WCE classification systems, which focus on a single traditional or deep approach.

In the proposed method, deep features are extracted using the pre-trained ResNet50 structure as the feature extractor, taking the output of its AveragePooling2D layer. For the hand-crafted features, the region of interest (ROI), i.e., the suspicious region, is first extracted using the Expectation–Maximization (EM) algorithm. The proposed ROI extraction method can extract areas that are distinct from the background in terms of color and texture. Then, suitable features associated with color, texture, and shape are extracted from the ROI.

To extract the color features, the worthier color channels, in which the lesions appear most distinctly, are first identified, and eight statistical features, such as the mean of the pixels, are extracted from these channels. The histograms of these channels are also used to extract another type of color feature. Uniform Local Binary Patterns (uniform-LBP) [46] are used to extract texture features. Finally, the Histogram of Oriented Gradients (HOG) [11] is used to describe the shape of the ROI.

To reduce the number of features produced in the feature extraction step, a subset of features is selected using the maximum relevance and minimum redundancy (MRMR) algorithm. Finally, to classify images into six classes (normal, or image with bleeding, AD, polyp, LH, or ulcer lesion), the selected features are fed to a Support Vector Machine (SVM). The main contributions and novelty of this study are listed below:

  • Introducing a novel method to extract the most distinct area from the image background as the ROI, considering both color and texture characteristics.

  • Proposing a new fast segmentation method for WCE images that is used for EM initialization.

  • Combining deep and hand-crafted features to simultaneously classify several types of WCE images, including bleeding, AD, polyp, LH, ulcer, and normal, for the first time.

This paper is organized as follows. The hypothesis and limitations are mentioned in Sect. 2, and the related works are reviewed in Sect. 3. Sect. 4 is devoted to explaining the proposed method. Several experiments are carried out and the method is evaluated in Sect. 5. Finally, the conclusions of the research and future works are presented in Sect. 6.

2 Hypothesis and Limitations

WCE abnormality detection systems face several limitations and challenges. The images produced by this technology have lower quality than those from traditional endoscopy or colonoscopy. The high similarity between some WCE frames with different abnormalities complicates identification (see Fig. 2). WCE images may also suffer from low brightness, noise, blurriness, and low resolution [29].

Fig. 2
figure 2

High similarity between two WCE images belonging to different classes. The AD lesion is marked with a red line

The assumption considered to solve the problem is that there is only one type of abnormality in each frame. The abnormalities considered include bleeding, AD, polyp, LH and ulcer.

3 Related Works

Many researchers have introduced computer-aided systems for abnormality detection in WCE frames [5, 22, 41, 43]. Various types of abnormalities, including ulcers, LH, polyps, tumors, bleeding, AD, and Crohn's disease, can exist in WCE images. Most existing research considered bleeding or ulcer lesions [5, 29, 41], and few studies exist on AD or polyp detection in WCE images [45, 46, 49]. To our knowledge, LH was considered only in our recent work [6], even though its similarity to malignant lymphoma makes its diagnosis very important for doctors.

Bleeding detection has been investigated in various studies. Yuan's method [50] proposed bleeding detection based on color histogram analysis. Colors such as blue, which are rarely seen in WCE images, were not represented in the histogram: the colors in the images were divided into k clusters using the k-means algorithm, and the cluster centers were considered the words of the histogram. Each pixel was then mapped to the nearest word, and the words, with the number of pixels assigned to them, formed a histogram. This histogram was used as the feature vector, and an SVM classified WCE images as bleeding or normal. This method can only identify lesions whose color is distinct from the background.
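As an illustration of this bag-of-color-words idea, the following minimal Python sketch builds the color vocabulary with k-means and turns each frame into a word histogram; `train_images` and the cluster count are our own assumptions, not values from Yuan's paper [50].

```python
import numpy as np
from sklearn.cluster import KMeans

def color_word_histogram(image_rgb, kmeans):
    """Map every pixel to its nearest color 'word' and return the
    normalized histogram of word occurrences."""
    pixels = image_rgb.reshape(-1, 3).astype(np.float64)
    words = kmeans.predict(pixels)              # nearest cluster center
    hist = np.bincount(words, minlength=kmeans.n_clusters)
    return hist / hist.sum()

# Learn the color vocabulary from a subsample of training pixels
# (train_images is assumed to be a list of RGB arrays).
rng = np.random.default_rng(0)
pool = np.vstack([im.reshape(-1, 3) for im in train_images])
sample = pool[rng.choice(len(pool), 50_000, replace=False)]
kmeans = KMeans(n_clusters=80, n_init=10).fit(sample)

# Each histogram is then fed to an SVM for bleeding vs. normal classification.
X = np.array([color_word_histogram(im, kmeans) for im in train_images])
```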

Caroppo et al. [8] introduced a deep transfer learning method for bleeding detection, in which features were extracted from three CNN models (ResNet50, InceptionV3, and VGG19) and a supervised machine learning method classified them into normal and bleeding classes. Another bleeding detection method was introduced by Hajabdollahi et al. in Ref. [20]: first, the informative components in different color spaces were recognized; then, a simplified CNN structure, obtained by simultaneous quantization and pruning, was used for detection. In the results section, we compare our method with these three bleeding detection methods. In [33], a deep transfer learning method was also used to detect bleeding frames in WCE images: in the pre-trained Xception model, the fully connected layer was removed and replaced with layers compatible with the number of classes, and the network was trained with a faster learning rate in the new layers and a very slow learning rate in the remaining layers.

Different computer-aided methods have investigated AD lesions. Deeba et al. [15] introduced a saliency-map-based method to detect AD lesions. The saliency map combines a color distinctness map and a pattern distinctness map: the color map is the logarithmic ratio of the red component to the green component in RGB, and the pattern map is obtained by computing the distance of all overlapping patches from the average patch in the image. The accuracy of this method was considerable on a dataset of 3602 images, but it can only be used for red lesions such as AD, whereas our method can detect lesions of different colors.

In the method of Fonseca et al. [17], transfer learning was also used to classify WCE frames into normal or abnormal classes, the latter covering three abnormalities (angiectasia, fresh blood, and polyp). The feature vector was extracted from the deepest layer of a pre-trained CNN model, and the focal loss function was used for binary classification. In our recent work [7], we introduced a method for bleeding and AD lesion detection: first, the ROI was extracted by the EM algorithm; then, color features based on histograms and statistical properties were extracted from the ROI; finally, a multilayer perceptron classified the WCE images. In that work, we were limited to detecting red lesions. The present study extends our recent work [7] in several aspects in order to detect lesions with different colors and textures: the EM algorithm is developed to extract lesions with distinct colors and textures and is no longer limited to red lesions; we propose a fast method for EM initialization; and we extend the color histogram-based features, which previously revealed unexpected red color changes in the background, to reveal the color changes of the lesions considered in the present study.

4 Proposed Method

In this research, a novel method is proposed to detect the most common abnormalities in WCE images, including ulcer, bleeding, AD, LH, and polyp. The main steps of the proposed method are shown in Fig. 3. Each of these steps will be described below.

Fig. 3
figure 3

The main steps of the proposed method

4.1 Preprocessing

Some information is printed in the boundary area of WCE images. In the preprocessing step, a circular mask is applied to the image to eliminate this text, which could affect the detection process. In Fig. 4, the process of eliminating the text in a frame margin is shown. In the considered mask, each pixel inside the circle has a value of one and each pixel outside has a value of zero. This mask is multiplied element-wise by the original image; as a result, the values inside the circle remain those of the original image, and the values outside the circle become zero.
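The following is a minimal sketch of this masking step for an RGB frame; the small radius margin is an assumption, since the paper does not specify the mask radius.

```python
import numpy as np

def apply_circular_mask(image_rgb):
    """Zero out the frame border so overlaid text cannot affect detection."""
    h, w = image_rgb.shape[:2]
    cy, cx = h / 2.0, w / 2.0
    radius = min(h, w) / 2.0 - 2            # assumed small safety margin
    yy, xx = np.ogrid[:h, :w]
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    # Inside the circle the original values are kept; outside becomes zero.
    return image_rgb * inside[..., None]
```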

Fig. 4
figure 4

Eliminating information in the boundary area. a Original image, b Circular binary mask, and c Result of applying the circular mask to the image

4.2 Extract Hand-crafted features

Hand-crafted feature extraction consists of three steps: image enhancement, ROI extraction, and feature extraction.

4.2.1 Image Enhancement

WCE frames usually suffer from low illumination. Therefore, to achieve more accurate classification, we use the fast, efficient algorithm introduced in [47] for enhancing low-brightness images. This method first inverts the image as in Eq. (1), where I is the low-lighting image, \(P\) is the inverted image, and \({I}^{c}(x)\) is the intensity of the image pixels in one color channel. Then the de-hazing algorithm is applied to the inverted image using Eqs. (2) and (3). Equation (2) models the hazy image: \(O\left(x\right)\) is the intensity of the original object, \(t\left(x\right)\) is the fraction of light that reaches the camera from the object, and A is the global atmospheric light.

In the de-hazing algorithm, \(O\left(x\right)\) can be recovered from \(P\left(x\right)\) by estimating \(A\) and \(t(x)\). To estimate A, the 100 pixels whose minimum intensity among the three RGB channels is highest are first selected; among them, the pixel with the highest sum of RGB values is chosen. \(t(x)\) is estimated by Eq. (3), where \(\theta\) is set to 0.8 and \(\beta \left(x\right)\) is a \(9\times 9\) square block around x. This setting is borrowed from the literature [47]. Finally, the image is inverted again. In Fig. 5, the results of applying this method to two frames are shown.

Fig. 5
figure 5

Result of low lightening enhancement on two WCE frames, (a) original image (b) enhanced image

$${P}^{c}\left(x\right)=255-{I}^{c}(x)$$
(1)
$$P\left(x\right)=O\left(x\right)t\left(x\right)+A(1-t\left(x\right))$$
(2)
$$t\left(x\right)=1-\theta \underset{c\in \{r,g,b\}}{\mathit{min}}\left(\underset{y\in \beta (x)}{\mathit{min}}\frac{{P}^{c}(y)}{{A}^{c}}\right)$$
(3)
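A compact sketch of Eqs. (1)–(3) is given below, assuming an 8-bit RGB input; scipy's `minimum_filter` stands in for the minimum over the block \(\beta(x)\), and the lower bound on \(t(x)\) is our own numerical safeguard, not part of the original algorithm.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def enhance_low_light(I, theta=0.8, block=9, t_min=0.1):
    I = I.astype(np.float64)
    P = 255.0 - I                                      # Eq. (1): invert
    # Estimate A: among the 100 pixels with the brightest dark channel,
    # take the one whose R + G + B sum is highest.
    dark = minimum_filter(P.min(axis=2), size=block)
    idx = np.argsort(dark.ravel())[-100:]
    flatP = P.reshape(-1, 3)
    A = flatP[idx][np.argmax(flatP[idx].sum(axis=1))]
    # Eq. (3): transmission from the normalized dark channel.
    t = 1.0 - theta * minimum_filter((P / A).min(axis=2), size=block)
    t = np.clip(t, t_min, 1.0)[..., None]
    O = (P - A * (1.0 - t)) / t                        # solve Eq. (2) for O(x)
    return np.clip(255.0 - O, 0, 255).astype(np.uint8) # invert back
```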

4.2.2 ROI Extraction

ROI selection is an important step in hand-crafted feature extraction. The lesion areas are very small relative to the background in most WCE frames, so extracting features from the full image can lead to low detection accuracy. One solution is to divide images into patches and extract features from each patch, which leads to a very large number of features. Another is to select the ROI and then extract features from that area. In the proposed method, ROIs are extracted based on the combination of the distinctive texture and color characteristics of lesions. Before introducing the ROI extraction method, the texture map it relies on is explained.

4.2.3 Texture map

The texture map is extracted based on the method of Margolin et al. introduced in [28]. The WCE frame is divided into overlapping patches of 3 × 3 pixels, and the average of all patches (the average patch) is calculated. Then, the L1 norm (Manhattan distance) of each patch from the average patch in PCA coordinates is calculated and considered as the distance of that patch; a patch is more distinctive if its distance is longer than that of other patches. In the first column of Fig. 6, four WCE images with different lesions are shown, and the corresponding texture maps can be seen in the second column. Finally, a Gaussian filter with a standard deviation of 3 is applied to the extracted texture map to make the lesion area smoother, which helps the segmentation in the next step. In the experimental results section, we show that the best value for the standard deviation is 3; this value was obtained by trial and error. In the third column of Fig. 6, the blurred texture maps after applying this filter are shown.

Fig. 6
figure 6

Texture map of four WCE images (a) from top to bottom, images with AD, LH, ulcer, and polyp, (b) the corresponding extracted texture map, (c) the final texture map after blurring

Note that only the heterogeneous patches in an image are used, in order to compute the PCA more quickly. To select them, the simple linear iterative clustering (SLIC) algorithm first divides the image into 200 superpixels (patches), and then the 50% of patches with higher variance are kept.
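A condensed sketch of this texture (pattern distinctness) map is shown below: the L1 norm of each mean-subtracted 3 × 3 patch in PCA coordinates, followed by the σ = 3 Gaussian blur. For brevity, the SLIC-based patch pre-selection is omitted, so this version fits the PCA on all patches.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.image import extract_patches_2d
from scipy.ndimage import gaussian_filter

def texture_map(gray, patch=3, sigma=3.0):
    h, w = gray.shape
    P = extract_patches_2d(gray, (patch, patch)).reshape(-1, patch * patch)
    P = P.astype(np.float64) - P.mean(axis=0)     # subtract the average patch
    coords = PCA(n_components=patch * patch).fit_transform(P)
    dist = np.abs(coords).sum(axis=1)             # L1 norm in PCA coordinates
    dmap = np.zeros((h, w))
    dmap[1:h - 1, 1:w - 1] = dist.reshape(h - patch + 1, w - patch + 1)
    return gaussian_filter(dmap, sigma=sigma)     # smooth the lesion area
```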

4.2.4 EM algorithm for image segmentation

Investigation of the distribution of pixels in lesion and non-lesion areas shows that they differ in terms of color and texture. This finding is illustrated in Fig. 7: the distributions of pixels in all channels of the RGB color space and in the texture map are close to normal in both areas, but with different parameters (mean and variance). Therefore, to discriminate between areas with and without lesions, the WCE image pixels in the RGB channels and the texture map can be modeled as a combination of several multivariate normal distributions.

Fig. 7
figure 7

Distributions of pixel values for an image (first row) in the AD lesion area (second row, first column) and the normal area around the lesion (second row, second column), in the red, green, and blue channels and the pattern distinctness (PD) map, respectively, from the third to sixth rows

WCE images can thus be segmented using the finding that the pixel distribution of lesion areas differs from that of other areas. The parameters of the different normal distributions in the image must be estimated, and each pixel can then be assigned to the distribution under which it is most probable. The EM algorithm solves this problem: when we have a training set D = \(\left\{{x}^{1},{x}^{2},\dots ,{x}^{n}\right\}\) without labels but know the data are generated by k distinct distributions, the EM algorithm can estimate the parameters of those distributions.

In this algorithm, pixels are the data points, and the label of pixel \({x}^{i}\) is denoted \({l}^{i}\). The probability of \({x}^{i}\) belonging to class j, \(p\left({l}^{i} =j\right)\), is denoted \({\theta }_{j}\), where \(\theta\) follows a multinomial distribution with \(\theta_j\geq0\;\mathrm{and}\;\sum\nolimits_{j=1}^k\theta_j=1\). The data in class j, \(\left(x^i\left|l^i=j\right.\right)\), follow a multivariate normal distribution with mean \({\mu }_{j}\) and covariance \({\Sigma }_{j}\). The EM algorithm models the data using Eq. (4), the log-likelihood of the data.

$$\mathcal{l}\left(\theta ,\mu ,\Sigma \right)=\sum\limits_{i=1}^{m}log\sum\limits_{{l}^{i}=1}^{k}p\left({x}^{i}|{l}^{i};\mu ,\Sigma \right)p({l}^{i};\theta )$$
(4)

This likelihood is maximized with two iterative steps, the E-step and the M-step. In the E-step, Eq. (5) is evaluated using the current parameter estimates; it gives the probability \({w}_{j}^{i}\) of \({x}^{i}\) belonging to class j. In the M-step, Eqs. (6)–(8) update the parameters based on the E-step estimates. After convergence, each pixel is assigned to its most probable distribution, yielding the segmented image.

$${w}_{j}^{i}=\frac{\frac{1}{(2\pi {)}^\frac{n}{2}|{\Sigma }_{j}{|}^\frac{1}{2}}\mathit{exp}\left(-\frac{1}{2}{({x}^{i}-{\mu }_{j}{)}^{T}\Sigma }_{j}^{-1} ({x}^{i}-{\mu }_{j})\right). {\theta }_{j}}{{\sum }_{h=1}^{k}\frac{1}{(2\pi {)}^\frac{n}{2}|{\Sigma }_{h}{|}^\frac{1}{2}}\mathit{exp}\left(-\frac{1}{2}{({x}^{i}-{\mu }_{h}{)}^{T}\Sigma }_{h}^{-1} ({x}^{i}-{\mu }_{h})\right). {\theta }_{h}}$$
(5)
$${\theta}_{j}=\frac{1}{m}\sum\limits_{i=1}^{m}{w}_{j}^{i}$$
(6)
$${\mu}_{j}=\frac{{\sum }_{i=1}^{m}{w}_{j}^{i}{x}^{i}}{{\sum }_{i=1}^{m}{w}_{j}^{i}}$$
(7)
$${\Sigma}_{j}=\frac{{\sum }_{i=1}^{m}{w}_{j}^{i}{({x}^{i}-{\mu }_{j})({x}^{i}-{\mu }_{j})}^{T}}{{\sum }_{i=1}^{m}{w}_{j}^{i}}$$
(8)
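The EM iterations of Eqs. (5)–(8) correspond to fitting a Gaussian mixture; the sketch below uses scikit-learn's implementation on 4-D pixel vectors (R, G, B, texture value), seeded with the fast initial segmentation of the next subsection. Function and variable names are ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_segment(image_rgb, tmap, init_labels, k=5):
    h, w = tmap.shape
    X = np.column_stack([image_rgb.reshape(-1, 3).astype(np.float64),
                         tmap.ravel()])
    # Seed the component means from the fast initial segmentation.
    means = np.array([X[init_labels.ravel() == j].mean(axis=0)
                      for j in range(k)])
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          means_init=means, max_iter=100)
    labels = gmm.fit_predict(X)      # E- and M-steps of Eqs. (5)-(8)
    return labels.reshape(h, w)      # each pixel -> most probable component
```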

4.2.5 EM algorithm initialization

An important requirement for accurate and fast convergence of the EM algorithm is an appropriate starting point. For this purpose, the image pixels are divided into k segments using a very fast method, and this segmentation is taken as the starting point of the EM algorithm.

For EM initialization, we propose a fast ROI extraction method. First, a joint normal distribution \(X\sim \mathcal{N}\left(\mu ,\Sigma \right)\) is fitted to the image pixels, where X is the three-dimensional vector of image pixels in RGB color space (\(X={\left[R,G,B\right]}^{T}\)); collected over the whole image, X forms a \(3\times (m\cdot n)\) matrix for an image of size \(m\times n\). The parameters of the joint normal distribution are the mean vector (\(\mu\)) and the \(3\times 3\) covariance matrix (\(\Sigma\)).

$$\mu =E\left[X\right]={\left[E\left[R\right], E\left[G\right], E\left[B\right]\right]}^{T}$$
(9)
$${\Sigma }_{i,j}=Cov\left[{X}_{i},{X}_{j}\right]$$
(10)

where \(E[\cdot]\) denotes the expected value and \(Cov[\cdot,\cdot]\) the covariance; \({\Sigma }_{i,j}\) is the covariance between the pixel values in components i and j of the color space. Then, the probability density function (PDF) of each image pixel is calculated by Eq. (11), where \(|\cdot|\) is the matrix determinant.

$$f\left(x\right)=\frac{1}{\sqrt{{\left(2\pi \right)}^{3}\left|\Sigma \right|}}\mathrm{exp}\left(-\frac{1}{2}{\left(x-\mu \right)}^{T}{\Sigma }^{-1}\left(x-\mu \right)\right)$$
(11)

For fast segmentation, the complement of \(f\left(x\right)\), \(\overline{f(x)}=1-f(x)\), is first calculated for all pixels, and the image is then divided into k levels using the multi-level Otsu thresholding method [32]. For an image with L gray levels \(\{0, 1, \dots, L-1\}\), the image histogram can be defined as \(\left\{{f}_{0}, {f}_{1}, \dots , {f}_{L-1}\right\}\), where \({f}_{i}\) is the occurrence frequency of level \(i\). The Otsu method segments the image into k clusters by selecting optimal threshold values from the set \(T = \left\{\left({t}_{1}, {t}_{2}, \dots, {t}_{k-1}\right)\right|0 < {t}_{1} < {t}_{2} < \dots < {t}_{k-1}< L\}\); the optimal thresholds are those that maximize the between-cluster variance. The best value for k is 5, as determined in the second experiment of the results section. This segmentation is taken as the initial state of the EM algorithm. In Fig. 8, the initial segmentation of one WCE image with an AD lesion is shown.
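A minimal sketch of this fast initialization follows: one joint normal is fitted to the RGB pixels (Eqs. (9)–(11)), the complement of its PDF is taken, and multi-Otsu thresholding splits it into k levels. Rescaling the PDF by its maximum before taking the complement is our own normalization, since a 3-D Gaussian density is not bounded by one.

```python
import numpy as np
from scipy.stats import multivariate_normal
from skimage.filters import threshold_multiotsu

def fast_init_segmentation(image_rgb, k=5):
    X = image_rgb.reshape(-1, 3).astype(np.float64)
    mu = X.mean(axis=0)                        # Eq. (9)
    sigma = np.cov(X, rowvar=False)            # Eq. (10)
    f = multivariate_normal(mu, sigma).pdf(X)  # Eq. (11)
    fbar = 1.0 - f / f.max()                   # complement, rescaled to [0, 1]
    thresholds = threshold_multiotsu(fbar, classes=k)
    labels = np.digitize(fbar, thresholds)     # k initial segments
    return labels.reshape(image_rgb.shape[:2])
```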

Fig. 8
figure 8

Initial segmentation for EM algorithm (a) initial image, (b) mask image, (c) \(\overline{f(x)}\), (d) 5-level thresholding of \(\overline{f(x)}\)

4.2.6 ROI selection

The output of the EM algorithm is a segmented image, from which one segment must be selected as the ROI. As mentioned before, the lesion area is more distinctive in the texture map and has higher values than the background. Therefore, for each segment, the mean value of its pixels in the texture map is calculated, and the ROI is the segment with the highest mean value. Finally, the selected ROI may contain some small dots or lines; hence, blobs with eccentricity > 0.9 (an ellipse with eccentricity zero is a circle, while one with eccentricity one is a line segment) and blobs smaller than 100 pixels are removed to obtain the final ROI. The result of applying the proposed ROI extraction method to WCE images with different lesions is illustrated in Fig. 9.
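The selection rule can be sketched as follows with scikit-image: keep the EM segment with the highest mean texture-map value, then drop elongated or tiny blobs using the stated eccentricity and area thresholds.

```python
import numpy as np
from skimage.measure import label, regionprops

def select_roi(labels, tmap, ecc_max=0.9, min_area=100):
    k = labels.max() + 1
    means = [tmap[labels == j].mean() for j in range(k)]
    roi = labels == int(np.argmax(means))        # most distinctive segment
    cleaned = np.zeros_like(roi)
    for blob in regionprops(label(roi)):
        # Keep only compact (eccentricity <= 0.9) and large (>= 100 px) blobs.
        if blob.eccentricity <= ecc_max and blob.area >= min_area:
            cleaned[tuple(blob.coords.T)] = True
    return cleaned
```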

Fig. 9
figure 9

ROI extraction for WCE images with different lesions. The columns, from left to right, are respectively, (a) the original image, (b) the lesion selected area from the database, (c) the initial segmentation for EM algorithm, (d) the selected segment using the EM algorithm, and (e) the final result of the proposed segmentation method. The rows are image with LH lesion, AD lesion, Ulcer, Bleeding and Polyp

4.2.7 Feature Extraction

In this step, we extract three types of descriptors, including color, texture, and shape to detect different lesions.

To extract the color descriptors, the worthier color channels of different color spaces for representing each abnormality are first identified by the method we introduced in [7]. For each abnormality and for each channel of the RGB, LAB, HSV, and YCbCr color spaces, one worthiness measure is calculated. To compute this measure for one abnormality in a specific channel, 30 percent of the images with this abnormality in the dataset are selected randomly. For each image, the Normalized Cumulative Histogram (NCH) is calculated; then the lesion area is removed from the image and the NCH is calculated again. The mean absolute difference (MAD) between the two NCHs, of the original image and of the image with the abnormal region removed, is then computed. Finally, the average MAD over all selected images is taken as the measure for that channel.
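A minimal sketch of this worthiness measure for one channel is given below; `images` is assumed to be the randomly selected single-channel images of one abnormality and `lesion_masks` their annotated lesion masks.

```python
import numpy as np

def nch(values, bins=256, vmax=255):
    """Normalized cumulative histogram of a set of pixel values."""
    hist, _ = np.histogram(values, bins=bins, range=(0, vmax))
    c = np.cumsum(hist).astype(np.float64)
    return c / c[-1]

def channel_worthiness(images, lesion_masks):
    """Average MAD between the NCHs with and without the lesion region."""
    mads = []
    for ch, mask in zip(images, lesion_masks):
        full = nch(ch.ravel())
        without_lesion = nch(ch[~mask])   # drop the annotated lesion area
        mads.append(np.mean(np.abs(full - without_lesion)))
    return float(np.mean(mads))
```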

In Table 1, the first two worthy color channels for each abnormality are reported. From the table, the worthier channels for identifying abnormalities are 'a' and 'b' in CIELab, 'R' in RGB, 'Cr' and 'Cb' in YCbCr, and 'V' in HSV. In the following, two sets of color features are extracted from these six channels.

Table 1 The first three worthy channels for different abnormality

In the first set, eight statistical parameters, including contrast, mode, mean, median, entropy, variance, minimum, and maximum, are extracted from the ROI pixels in each of the six worthy channels. These parameters have been used in several WCE abnormality detection studies [31]. The number of features extracted in this step is 48.

In the second set, we extend the work introduced in [7] to reveal unexpected color changes in the background caused by lesions. The main idea of that work was that, in an image with an AD lesion, there is a sudden change in the NCH of the ROI+ area (the ROI and its surroundings) in the 'a' channel of the CIELab color space. Our previous work applied only to red lesions such as AD or bleeding, because these lesions are highlighted in the 'a' channel, where a sudden change can be seen in the NCH around the lesion. In the present work, we extract the sudden change from the NCH of the ROI+ for all six worthy channels.

To extract the sudden change from a specific channel, the ROI+ is first obtained by applying a morphological dilation with a 20 × 20 square structuring element to the ROI. Then, the NCH of the ROI+ is computed. Finally, the NCH values are quantized into 10 levels, and the number of NCH values falling in each level gives the features of the considered channel. Thus, 10 features are extracted per channel, and 60 features in total.
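A sketch of this second feature set for one channel is shown below; the 20 × 20 dilation and the 10 quantization levels follow the text, while the 256-bin histogram is an assumption for 8-bit channels.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def nch_change_features(channel, roi, levels=10):
    roi_plus = binary_dilation(roi, structure=np.ones((20, 20)))  # ROI+
    hist, _ = np.histogram(channel[roi_plus], bins=256, range=(0, 255))
    nch = np.cumsum(hist) / max(hist.sum(), 1)       # NCH of ROI+
    # Quantize NCH values into 10 levels and count values per level.
    level = np.minimum((nch * levels).astype(int), levels - 1)
    return np.bincount(level, minlength=levels)      # 10 features per channel
```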

Uniform-LBP is used to extract texture features from the ROI (non-ROI areas are set to zero). LBP is a well-known feature extraction method [2, 3] used in many WCE abnormality detection methods [10, 30]. In the LBP algorithm, the eight pixels on a circle of radius one around each pixel are considered its neighbors (see Fig. 10). Each pixel is compared with its neighbors along the circle in clockwise order, producing an 8-digit binary number, which can take 256 different values. The feature vector in the LBP algorithm is the histogram of these values. In the uniform-LBP method, only uniform patterns have a separate bin in the histogram, and a single bin is devoted to all non-uniform patterns. Uniform patterns have at most two 0–1 or 1–0 transitions in the binary number. Therefore, uniform-LBP yields a histogram with 59 bins, which is the length of the feature vector.

Fig. 10
figure 10

Neighbors of each pixel in LBP algorithm
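The 59-bin descriptor can be computed with scikit-image as sketched below; `'nri_uniform'` yields the 58 non-rotation-invariant uniform codes plus one bin for all non-uniform patterns.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def uniform_lbp_features(gray, roi):
    masked = np.where(roi, gray, 0)          # non-ROI pixels are set to zero
    codes = local_binary_pattern(masked, P=8, R=1, method='nri_uniform')
    hist, _ = np.histogram(codes, bins=59, range=(0, 59))
    return hist / max(hist.sum(), 1)         # 59-bin normalized histogram
```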

HOG is used to describe the shape of the ROIs. To extract an equal number of HOG features from every ROI, the region is fitted into a square of \(48\times 48\) pixels. The parameters used in the HOG algorithm are reported in Table 2; 180 shape features are extracted in this step.

Table 2 Parameters of the HOG algorithm
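A sketch of this shape descriptor is given below. The cell and block settings are illustrative stand-ins, not the actual values of Table 2, so the resulting feature count differs from the 180 reported in the paper.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_features(gray, roi):
    rows, cols = np.nonzero(roi)
    crop = gray[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    patch = resize(crop, (48, 48))               # fixed-size ROI square
    return hog(patch, orientations=9, pixels_per_cell=(12, 12),
               cells_per_block=(2, 2), block_norm='L2-Hys')
```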

4.3 Extracting Features from the Pre-trained Model

In this step, we use the pre-trained ResNet50 structure as the feature extractor for abnormality detection in WCE images. The basic idea of Residual Networks (ResNets) [40] is to skip some blocks of convolutional layers via shortcut connections, forming so-called residual blocks. This structure mitigates the degradation problem of deep networks and improves training efficiency. The basic blocks in ResNet50 follow two design rules: (1) layers producing output feature maps of the same size have the same number of filters; (2) when the feature map size is halved, the number of filters is doubled.

ResNet50 is pre-trained on the ImageNet dataset and can be used to extract discriminative features from WCE frames based on transfer learning. Due to the small number of WCE datasets and the limited number of images in them, no fine-tuning step is performed, to avoid overfitting [8]; the images are fed directly to the architecture, and a 1 × 2048 feature vector is extracted from the AveragePooling2D layer of ResNet50.
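A minimal Keras sketch of this extractor follows; resizing the frames to 224 × 224 is our assumption for compatibility with the ImageNet weights.

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Frozen ResNet50 whose global average pooling output is the 1 x 2048 vector.
extractor = ResNet50(weights='imagenet', include_top=False,
                     pooling='avg', input_shape=(224, 224, 3))

def deep_features(batch_rgb):
    """batch_rgb: (n, 224, 224, 3) array of resized WCE frames."""
    return extractor.predict(preprocess_input(batch_rgb.astype(np.float64)))
```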

4.4 Feature Selection

In total, 2395 features are extracted in the previous steps: 2048 deep features and 347 hand-crafted features. We use the maximum relevance and minimum redundancy (MRMR) [16] feature selection method to select a subset of them. This method minimizes the redundancy within a feature set while maximizing its relevance to the response variable y (the classes); it ranks all features and returns their indices and scores ordered by importance.

An important issue is selecting the appropriate size of the feature subset (\(\xi\)) from the ranked features. To select this size, we plot the feature scores in descending order and then plot the first derivative of this curve, which shows its instantaneous slope. A moving average of length 15 is applied to smooth the derivative. Finally, \(\xi\) is the first point in the smoothed graph where the slope falls below a threshold (T). In Sect. 5, the value of T is estimated by trial and error.
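The subset-size rule can be sketched directly from this description; `scores` are the MRMR importance scores (computable with, e.g., a third-party MRMR package), and T = 0.0005 is the threshold estimated later in the experiments.

```python
import numpy as np

def select_subset_size(scores, T=0.0005, win=15):
    s = np.sort(scores)[::-1]                  # scores in descending order
    slope = np.diff(s)                         # instantaneous slope
    smooth = np.convolve(slope, np.ones(win) / win, mode='same')
    below = np.nonzero(np.abs(smooth) < T)[0]  # first point flatter than T
    return int(below[0]) if below.size else len(s)
```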

4.5 Abnormality Detection

The features selected in the previous step are fed to an SVM to classify WCE images into the different classes: normal, AD, ulcer, polyp, LH, and bleeding. SVM classification has been used in many medical image classification approaches [14, 38]. The kernel used in the SVM algorithm is a polynomial kernel of degree 2.
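A minimal sketch of this classification step with scikit-learn is given below; `X_train_selected`/`X_test_selected` denote the MRMR-selected feature matrices (our names), and hyperparameters other than the kernel and degree are library defaults, not values from the paper.

```python
from sklearn.svm import SVC

clf = SVC(kernel='poly', degree=2)       # degree-2 polynomial kernel
clf.fit(X_train_selected, y_train)       # six-class training labels
y_pred = clf.predict(X_test_selected)
```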

5 Experimental Results and Discussion

In this section, the datasets used in this research are introduced first, followed by the evaluation metrics; the experiments and results are then discussed.

5.1 Datasets

In this research, four publicly available datasets were used to evaluate the proposed method and compare it with existing methods: the Kvasir dataset [36], the Gastrointestinal Image Analysis Challenge 2017 (GIANA 2017) dataset [1], the Red-lesion endoscopy dataset [9], and the KID dataset [24].

Kvasir-Capsule is a publicly available dataset with labeled frames of different classes, created in 2020 and updated in 2021 [39]. It contains 47,238 images in several classes, ten of which correspond to pathological findings (angiectasia, blood, erythematous, hematin, erosion, foreign bodies, ulcer, polyp) or normal frames. Images in this dataset are annotated by medical doctors (experienced endoscopists): the lesion area is marked with a box, and the class of each frame is specified. The image size in this dataset is \(336\times 336\) pixels. In Table 3, the number of each lesion in the dataset is specified, and some image samples are shown in Fig. 11.

Table 3 Kvasir-Capsule dataset
Fig. 11
figure 11

Image sample of the Kvasir-capsule dataset, from left to right image with (a) AD, (b) ulcer, (c) polyp, (d) bleeding and (e) normal frames with the corresponding ground truth mask in the dataset

The GIANA 2017 dataset contains 1200 WCE frames: 600 normal and 600 with AD lesions. The frame size is \(704\times 704\) pixels, and the frames are fully annotated by experts. Some example frames from this dataset are shown in Fig. 12. The Red-lesion endoscopy dataset consists of 3895 images: 2325 normal and 1570 bleeding WCE frames, each with a binary ground truth mask. The frame size in this dataset is \(320\times 320\) pixels; in the proposed method, we resize all images to \(320\times 320\) pixels.

Fig. 12
figure 12

Image sample of the GIANA 2017 dataset, from left to right (a) AD, and (b) normal frames with the corresponding ground truth mask in the dataset

The public KID dataset contains two albums with different lesions (KID dataset 1 and KID dataset 2). The images, of size \(360\times 360\) pixels, are fully annotated by experts. The lesions in KID dataset 1 are AD, aphthae, bleeding, chylous cysts, and nodular lymphangiectasia; KID dataset 2 contains inflammatory, normal, polypoid, and vascular frames. Figures 13 and 14 show image samples from the Red-lesion endoscopy dataset and the KID dataset, respectively.

Fig. 13
figure 13

Image sample of the Red-lesion endoscopy dataset, from left to right (a) bleeding, and (b) normal frames with the corresponding ground truth mask in the dataset

Fig. 14
figure 14

Image sample of the KID dataset, from left to right (a) AD, (b)bleeding, and (c) polyp frames with the corresponding ground truth mask in the dataset

We combined these four datasets to evaluate our proposed method. In Table 4, the number of frames from each dataset participating in the final fusion dataset is given.

Table 4 The fusion dataset

5.2 Evaluation Criteria

Several metrics widely used in medical image segmentation and classification are employed in this research: accuracy (AC), precision, recall, false-positive rate (FPR), false-negative rate (FNR), Matthews Correlation Coefficient (MCC), F-measure, intersection over union (IoU), and dice score (DS), introduced in Eqs. (12)–(20). They are based on four well-known basic quantities: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

$$AC=\frac{TP+TN}{TP+TN+FP+FN}$$
(12)
$$Precision=\frac{TP}{TP+FP}$$
(13)
$$Recall=\frac{TP}{TP+FN}=1-FNR$$
(14)
$$FPR=\frac{FP}{FP+TN}$$
(15)
$$FNR=\frac{FN}{FN+TP}$$
(16)
$$F-Measure=2\frac{Precision*Recall}{Precision+Recall}$$
(17)
$$MCC=\frac{TP*TN-FP*FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$
(18)
$$IoU=\frac{TP}{TP+FP+FN}$$
(19)
$$DS=\frac{2TP}{2TP+FP+FN}$$
(20)

Note that all metric values lie in the [0, 1] range. All metrics except FPR and FNR indicate better performance when closer to one; FPR and FNR are better when closer to zero.
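For reference, the sketch below computes the metrics of Eqs. (12)–(20) from the four basic counts (zero denominators are not guarded, for brevity).

```python
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                          # = 1 - FNR
    return {
        'AC': (tp + tn) / (tp + tn + fp + fn),
        'Precision': precision,
        'Recall': recall,
        'FPR': fp / (fp + tn),
        'FNR': fn / (fn + tp),
        'F-measure': 2 * precision * recall / (precision + recall),
        'MCC': (tp * tn - fp * fn) /
               ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5,
        'IoU': tp / (tp + fp + fn),
        'DS': 2 * tp / (2 * tp + fp + fn),
    }
```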

5.3 Results

In this section, the proposed method is evaluated and compared with existing methods in terms of segmentation and classification. Our investigation shows that no existing method detects these six classes together; hence, the comparisons are with studies that consider some of these lesions.

In the first experiment, the ROI extraction method was evaluated. Among the existing datasets, AD lesions are annotated accurately in the GIANA 2017 and KID datasets, and other methods have also reported results on this lesion; therefore, we compared our method with existing AD lesion segmentation methods. First, we ran an experiment to choose the best standard deviation for the texture map extraction step: the images with AD lesions in the GIANA 2017 dataset were segmented by the ROI extraction method with the standard deviation varied from 1 to 6. The average DS over all images with AD lesions was calculated for each value, and the best result belonged to σ = 3, as can be seen in Fig. 15.

Fig. 15
figure 15

Average of DS in ROI extraction step versus choosing different standard deviation in texture map extraction

The best number of distributions (k) in the ROI extraction method was determined in the next experiment. To this end, a test was run for each candidate value of k: the ROI was extracted and the DS was calculated for 30% of the frames with accurate annotation (AD and bleeding lesions in the GIANA 2017, Red-lesion endoscopy, and KID datasets), and the average DS was computed for each test. These values are shown in Fig. 16; as can be deduced from the figure, the best value for k is 5.

Fig. 16
figure 16

Average of DS versus number of distributions

Vieira et al. introduced two methods (method 1 [43] and method 2 [44]) for AD lesion segmentation and detection based on the EM algorithm and a hidden Markov model. In Fig. 17, our method, with and without initialization, is compared with these two methods on the KID dataset. In the boxplot of the proposed method, the first, second, and third quartiles, as well as the minimum and maximum, were higher than for the other methods. Therefore, the proposed method with initialization segmented AD lesions more accurately than the other methods. In addition, its boxplot was more compact, so the proposed method was more reliable. The initialization also had a positive impact on the accuracy of the proposed method.

Fig. 17
figure 17

Comparing our ROI extraction method with Vieira et al. methods (method 1 [43] and method 2 [44])

The ROI extraction method was also compared with the method of Deeba et al. [15] on the AD lesions in the GIANA 2017 dataset; the authors of [15] shared their implementation code on GitHub. From the results in Fig. 18, it can be concluded that our proposed method was more accurate in extracting the ROI.

Fig. 18
figure 18

Comparison of the ROI extraction method with Deeba's method [15]

We also compared our method with two other methods whose results were reported on the GIANA 2017 dataset, in terms of DS and IoU, in Table 5. In Shvets's method [35] and Gobpradit's method [19], the AD lesions were segmented using deep learning. The table shows the superiority of the present method over these two methods.

Table 5 Comparing ROI extraction method with two deep learning methods Shvets’s method [35] and Gobpradit’s method [19]

One experiment was conducted to select the appropriate size of the feature subset (\(\xi\)) in the feature selection step. First, the best value for \(\xi\) was found by trial and error; then, the corresponding value in the smoothed slope graph of the scores was selected as the threshold for all other experiments. In the trial-and-error step, the accuracy of the method was calculated for different values of \(\xi\) and plotted in Fig. 19; the maximum corresponds to a subset of 900 features. In Fig. 20, the scores of the features ranked by the MRMR algorithm for classifying the fusion dataset are plotted, and the corresponding smoothed slope graph is shown in Fig. 21. This graph has a value of 0.0005 at point 900, so this value is chosen as the threshold (T).

Fig. 19
figure 19

Accuracy of the proposed method versus feature subset size

Fig. 20
figure 20

Features scores (in order from highest to lowest score)

Fig. 21
figure 21

Smoothed slope graph of scores

The abnormality detection system was tested on the fusion dataset; the results are given in Table 6 and the related confusion matrix in Table 7. The results show the high performance of the proposed method in detecting different abnormalities.

Table 6 Evaluation of the proposed method with different Metrics on the fusion dataset
Table 7 The confusion matrix

In the next experiment, we evaluated the impact of the main components of the proposed method, namely hand-crafted feature extraction, deep feature extraction, and feature selection. To investigate each component, it was removed from the proposed method and the remaining steps were tested on the fusion dataset. The results in Table 8 show that removing any component decreased the performance in all evaluation criteria; therefore, each component has a positive impact on the proposed method.

Table 8 The impact of different steps of proposed method on the fusion dataset

We also tested the effectiveness of the different steps of hand-crafted feature extraction, similarly to the previous experiment. For this purpose, we considered the proposed method without deep features, i.e., with hand-crafted features only, and tested the impact of removing the image enhancement and ROI extraction steps. From the results reported in Table 9, it can be concluded that these steps also have a positive effect on the performance of the proposed method.

Table 9 The effectiveness of different steps of Hand-crafted feature Extraction on the fusion dataset

In the next experiment, the pre-trained ResNet50 structure is compared with other pre-trained structures. To perform this evaluation, the images in the fusion dataset are fed to each network, and the deep features are extracted from the fully connected layer of each considered structure. The classification results of the different structures are reported in Table 10. As can be seen, the ResNet50 deep network performed best in all metrics.

Table 10 Investigating the performance of using different deep structures in the proposed method

We also considered a simple CNN [48] and a deep CNN for comparison with our proposed method. The structure of the simple CNN is shown in Table 11; this network was trained on the fusion dataset. ResNet50 was chosen as the deep structure and retrained on the fusion dataset via transfer learning: the last layers were replaced so that the output contains the same number of nodes as our classes, and the weights of the earlier layers were frozen by setting their learning rates to zero. The results in Table 12 confirm the superiority of our proposed method.

Table 11 Structure of simple CNN
Table 12 Comparing proposed method with a simple CNN and deep CNN

Since the fusion dataset is unbalanced, the effectiveness of the method was checked in the next experiment on a balanced dataset. For this purpose, 590 images were randomly selected for each of the normal, bleeding, ulcer, AD, and LH classes. The method was evaluated on these images and the classification results are reported in Table 13. The results show that the proposed method also performs well on the balanced dataset.

Table 13 Evaluation of the proposed method on the balanced dataset

We also evaluated the method on all lesions existing in the Kvasir dataset. The results of classifying the 10 classes in this dataset are shown in Table 14. Although the method was designed to identify specific lesions, the reported results show that it is not limited to those lesions and can identify additional ones. We also compared our results on this dataset with our recent work [6], which was likewise evaluated on the Kvasir dataset, in Table 15. As can be seen, the proposed method improved significantly over our previous method.

Table 14 Evaluation of the proposed method with different Metrics on the Kvasir dataset
Table 15 Comparing the proposed method with our recent work [6] on the Kvasir-dataset

Next, the proposed method was compared with other methods on different datasets. First, it was compared with four other methods, including Deeba's method, Yuan's method, and our two recent works, on the GIANA 2017 dataset. The results reported in Table 16 show that Yuan's method and Deeba's method performed poorly, while our recent works and the proposed method performed well on this dataset; the best performance in all evaluation criteria belonged to the proposed method.

Table 16 Comparing the proposed method with our two recent works [6, 7], the Yuan’s method [50] and the Deeba method [15] on the GIANA 2017

In the next experiment, the proposed method was compared with the method of Caroppo et al. [8] on the Red-lesion endoscopy dataset. As mentioned in the related works section, they proposed a deep transfer learning method for bleeding detection and reported results on this dataset; therefore, we also tested our proposed method on it. From the results reported in Table 17, the superiority of our proposed method in all evaluation criteria can be inferred.

Table 17 Comparing the proposed method with the Caroppo et al. [8] and our previous works [6, 7] on the red lesion endoscopy

The proposed method was also compared with the method of Hajabdollahi et al. [20]. The results of that method were reported on the fusion of two publicly available datasets: 728 normal and five bleeding frames from KID dataset 1, and 50 bleeding frames from Deeba's dataset [13]. We also evaluated our method on this dataset, with results shown in Table 18; our method performed better in all evaluation criteria.

Table 18 Comparing the proposed method with our recent work [7] and the Hajabdollahi et al. method [20] on the dataset presented in [32]

We also compared our method with the method of Fonseca et al. [17]. In that method, the authors used the Kvasir capsule dataset to classify images into normal and abnormal classes, where the abnormal class contained all images in the polyp, AD, and bleeding classes. We used the same dataset to validate and compare our method with theirs. The results shown in Table 19 indicate the superiority of the proposed method.

Table 19 Comparing the proposed method with Fonseca et al. method on Kvasir capsule dataset

Finally, the proposed method was compared with the method of De Maissin et al. [12] on the CrohnIPI dataset [42], which contains several types of abnormality caused by Crohn's disease. This dataset includes seven types of abnormalities, three of which are related to ulcers. We evaluated our method on all abnormalities in this dataset. De Maissin et al. reported the confusion matrix resulting from their method, from which we calculated the evaluation criteria. The results are reported in Table 20. As can be seen, the proposed method performed better in all evaluation criteria except FPR; note that in abnormality detection in medical images, the most important metric is FNR, since missing an abnormal image is far more costly than suggesting a normal frame as suspicious. Table 21 summarizes all comparisons done in this study, showing the main research achievements in WCE image processing.

Table 20 Comparing the proposed method with the De Maissin et al. method on the CrohnIPI dataset
Table 21 The main research achievements in the WCE image processing

6 Conclusions

In this paper, a new computer-aided method was proposed to detect different lesions in WCE images. The proposed method is based on the combination of deep and traditional learning: deep features were extracted from the pre-trained ResNet50 model. Due to the lack of a public and large database of capsule endoscopy images, most existing methods have used traditional learning for abnormality detection. The hand-crafted features were extracted in three main steps, namely image enhancement, ROI extraction, and feature extraction from the ROI, with ROI extraction being the most important step. Most existing methods are limited to extracting abnormalities with a specific color or texture, so they can identify only a few abnormalities; the results of this research showed that the proposed method can extract different types of abnormalities with no restrictions on color and texture. The results also showed that combining traditional and deep features increased the performance of the proposed method. In fact, we could surpass current WCE classification systems, which focus on a single traditional or deep approach.

In the hand-crafted feature extraction steps, we proposed an EM-based method to extract ROIs, initialized with a novel fast segmentation method. We also proposed a novel method for extracting the sudden color changes that lesions of different colors cause in the background. The proposed method was evaluated on four different datasets, and the results show its ability to detect abnormalities in WCE images. The proposed method was also compared with several existing lesion detection methods, and in all comparisons its superiority was demonstrated.