Abstract
Visualization of the inner gastrointestinal (GI) tract is an important aspect of diagnosing diseases such as bleeding and colon cancer. Wireless capsule endoscopy (WCE) provides painless imaging of the GI tract with little discomfort to patients, using a near-light imaging model with burst light-emitting diodes (LEDs). The imaging system is designed to minimize battery power; the capsule moves through the GI tract by natural peristalsis, and the color video data are captured via a wireless transmitter in the WCE. Despite the advantages of WCE videos, the obtained frames exhibit uneven illumination and sometimes contain darker regions that require subsequent enhancement for better visualization of regions of interest. In this work, we extend a human visual system (HVS) based image enhancement model that uses a feature-linking neural network model based on the precise timing of spiking neurons. Experimental results on various WCE frames show that our method obtains better enhancement of regions of interest and, compared with other enhancement approaches in the literature, better quality restorations in general. Further, we show that our enhancement method improves automatic image segmentation and 3D shape-from-shading reconstruction, indicating that it is viable for use within computer-aided diagnosis systems for GI tract diseases.
1 INTRODUCTION
Colonoscopy is the imaging standard of choice for visualization as well as identification of various abnormalities of the human gastrointestinal (GI) tract. Though widely used, colonoscopy is a painful procedure that requires careful preparation and monitoring of the patients who undergo it. Wireless capsule endoscopy (WCE) is an exciting imaging system that uses a pill-sized camera to transmit hours of video data wirelessly to a body-worn receiver [1–3]. The video data can then be analyzed on a computer (see Fig. 1a). As the WCE is a wireless imaging system, it moves with the aid of natural peristalsis, which in turn makes the capsule tumble and turn in an unconstrained manner (see Fig. 1b). Despite the opportunity to image the inner GI tract with little or no pain compared to the traditional tethered colonoscopy systems, WCE images pose challenges to automatic computer vision systems in terms of image quality [4–8].
One of the important image processing tasks in WCE imagery is the enhancement of unevenly illuminated images caused by the unconstrained motion of the camera system. There has been significant progress in contrast enhancement for natural images, and some prior works have applied these techniques to contrast enhancement in WCE images. Here we highlight those most relevant to our work. The majority of traditional enhancement models rely on image histogram computations, such as histogram equalization [9] or contrast-limited adaptive histogram equalization [10]. Other techniques include spectral optimal contrast-tone mapping [11] and an inverse diffusion model based on a partial differential equation [12]. Extending histogram equalization to color RGB images is straightforward. Recently, there have been efforts to incorporate human visual system (HVS) based models [13] that mimic how our visual system solves color constancy. However, to the best of our knowledge, these models have not yet been applied to WCE or even general endoscopy imagery, which usually contains nonuniform illumination, specular artifacts, dark regions, etc. In this work, we propose a consistent model that not only follows an HVS-driven approach but also obtains improved enhancement compared with traditional histogram-based and other models applied to WCE images. Our work follows the feature-linking model (FLM) and converts the color red, green, blue (RGB) images into the hue, saturation, value (HSV) space [14] to boost chromaticity, thereby avoiding the streaking artifacts associated with histogram-type models. The FLM approach is based on the precise timing of spikes in neuronal elements and is inspired by the neuroscience of explicit spike times in neural representations. We choose appropriate parameters in the FLM model and apply it frame by frame to WCE videos as well.
Experimental results on various WCE images indicate that our approach obtains better quality enhancement without artifacts such as streaking, saturation, and color mix-up. We also provide comparative experimental results against previous enhancement models [9–11] from the literature. Further, we quantitatively show that our model obtains better results than existing enhancement techniques in terms of image quality metrics, and thereby that it can be a viable preprocessing step in automatic computer-aided diagnosis systems for WCE. To demonstrate the benefit of enhancement prior to automatic image processing tasks such as mucosa-lumen segmentation [15, 16] and 3D reconstruction with the shape-from-shading technique [17], we present example use cases indicating the improvements obtained.
We organized our paper as follows. In Section 2 we introduce our model for enhancing WCE images via an HVS-consistent neural spiking approach. In Section 3 experimental results and comparisons with related enhancement models are given to highlight the performance of our model. In Section 4 we conclude the paper.
2 CAPSULE ENDOSCOPY ENHANCEMENT WITH HVS CONSISTENT METHOD
2.1 HVS Consistent Model
We follow Zhan et al. [13] for a feature-linking model (FLM) based on spiking neurons, thereby modeling a neuronal mechanism inspired by the human visual system. The FLM can be utilized to enhance digital images using the timing of the first spike, where the majority of the image information is contained. In the FLM, each neuron corresponds to a given image pixel, and the intensity values are encoded in the stimulus. In this work, we use the WCE frame that needs to be enhanced as the input to the FLM, and the computed time matrix provides the enhanced output frame. The FLM method is based on the HVS, simulates the Mach band effect, and its processing mechanism is consistent with the Weber-Fechner law. The overall process of the FLM can be described by three main constituents:
▪ membrane potential,
▪ threshold value, and
▪ action potential.
We show the flow of the FLM approach in Fig. 2a, and the steps involved in computing the time matrix \(T\), starting with the stimuli matrix \(S\), in Fig. 2b. The terms \(U\), \(\Theta \), and \(Y\) involved are described next.
The neural membrane potential term \(U_{ij}\) can be written as
Here the notation \((i,j)\) represents each neuron, with the neighbors of the neuron represented by \((k,l)\) or \((p,q)\); \(Y_{ij}\) is the postsynaptic action potential, \(S_{ij}\) is the stimulus given to neuron \((i,j)\), \(M_{ij,kl}\) is the synaptic weight applied to the feeding inputs, \(W_{ij,pq}\) is the synaptic weight applied to the linking inputs, \(f\) is the attenuation time constant, \(d > 0\) is a constant for global inhibition, \(\alpha \) is the feeding strength, and \(\beta \) is the linking strength. The input to the threshold function is based on the postsynaptic action potential and can be written as
Here, \(h\) is the magnitude adjustment, \(g\) is the attenuation time constant, and \(Y_{ij}\) is the postsynaptic action potential. In order to produce an action potential, we compare the membrane potential against the threshold \(\Theta \):
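To make the three constituents concrete, the following NumPy sketch implements a simplified FLM-style iteration: a leaky membrane potential with linking from neighboring spikes, an action potential fired when the potential exceeds a decaying dynamic threshold, and a time matrix recording first-spike times, which encodes the enhanced image. The update rules, the kernel, and every parameter value here are illustrative simplifications for a toy demonstration, not the paper's exact formulation; `conv2`, `flm_enhance`, and `K` are our own names.

```python
import numpy as np

def conv2(X, K):
    """'Same'-size 2D convolution with zero padding."""
    kr, kc = K.shape
    r, c = X.shape
    P = np.pad(X, ((kr // 2, kr // 2), (kc // 2, kc // 2)))
    out = np.zeros_like(X)
    for i in range(kr):
        for j in range(kc):
            out += K[kr - 1 - i, kc - 1 - j] * P[i:i + r, j:j + c]
    return out

def flm_enhance(S, n_iter=60, f=0.95, g=0.93, h=1e3, beta=0.2):
    """Simplified FLM-style enhancement (illustrative parameters): the
    time matrix T of first spikes is inverted so that early-firing
    (bright-stimulus) neurons map to high output intensities."""
    # Illustrative 3x3 linking kernel with inverse-distance weights
    K = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]]) / 6.0
    U = np.zeros_like(S)              # membrane potentials
    Y = np.zeros_like(S)              # action potentials (spikes)
    T = np.full(S.shape, np.inf)      # first-spike times
    Theta = np.ones_like(S)           # dynamic thresholds
    for n in range(1, n_iter + 1):
        Lk = conv2(Y, K)              # linking input from neighbor spikes
        U = f * U + S * (1.0 + beta * Lk)   # membrane potential update
        Y = (U > Theta).astype(float)       # action potential
        T = np.where((Y > 0) & np.isinf(T), n, T)  # record first spikes
        Theta = g * Theta + h * Y     # decays; jumps high after a spike
    T[np.isinf(T)] = n_iter + 1       # neurons that never fired
    out = T.max() - T                 # early spike -> high intensity
    return (out - out.min()) / (out.max() - out.min() + 1e-8)
```

Running `flm_enhance` on a stimulus ramp shows the expected behavior: brighter stimuli fire earlier and are mapped to higher output values.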
2.2 Overall Flow
In the overall FLM flow (see Fig. 2b), the stimuli matrix \(S\) is set from the pixel values of a given WCE frame \(I\) (input image), with a neuron at every pixel location \((i,j)\) and \(r \times c\) the total number of neurons, where \(r\) and \(c\) are the numbers of neurons per column and per row, respectively. First, we convert a given WCE RGB frame \(I\) to the HSV space and apply the min-max normalization
Here, we assume that the input image \(I\) is 8-bit coded and set a small value \(\epsilon > 0\). The above normalization keeps the stimuli matrix \(S\) strictly greater than zero, since neurons receiving a zero stimulus would never fire when their thresholds are positive.
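A minimal sketch of this normalization step, assuming the common form \(S = (V - V_{\min} + \epsilon)/(V_{\max} - V_{\min} + \epsilon)\) applied to the HSV value channel (the function name and the default \(\epsilon\) of one 8-bit level are our assumptions):

```python
import numpy as np

def stimulus_from_value(V, eps=1.0 / 255):
    """Min-max normalize the HSV value channel into (0, 1].

    eps > 0 keeps the darkest pixel strictly positive so that every
    neuron receives a nonzero stimulus and can eventually fire.
    """
    V = V.astype(float)
    return (V - V.min() + eps) / (V.max() - V.min() + eps)
```

By construction the maximum stimulus is exactly 1 and the minimum is \(\epsilon/(V_{\max} - V_{\min} + \epsilon) > 0\).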
3 EXPERIMENTAL RESULTS
3.1 Setup, Parameters, and Data
We set the optimal parameters of the HVS-consistent FLM approach based on experiments conducted on various WCE videos. We apply presmoothing with a Gaussian filter and use the following attenuation time constant \(f\):
with \(c_0, c_1 > 0\) and standard deviation \(\sigma_f = 1\). We further fix the constants in the FLM method as \(h = 2^{10}\), \(d = 2\), \(g = 0.911\), \(\alpha = 0.01\), \(\beta = 0.03\). Next, the initial values of \(U\), \(Y\), and \(T\) are all set to 0. The threshold \(\Theta_{ij}\) is initially set as follows
where \(L\) is the Laplacian derivative operator [17] and \( \otimes \) is the convolution operation. The synaptic weight matrices are given the following numerical values
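Since this initialization amounts to a small convolution, a sketch may help; the 3×3 Laplacian stencil, the magnitude, and the positive offset are all assumed forms, and the paper's exact operator and weight values are not reproduced here.

```python
import numpy as np

# One common discrete 3x3 Laplacian kernel (assumed form)
LAP = np.array([[0.,  1., 0.],
                [1., -4., 1.],
                [0.,  1., 0.]])

def initial_threshold(S):
    """Sketch of an edge-aware initial threshold: convolve the stimulus
    with a Laplacian, take magnitudes, and offset to stay positive."""
    r, c = S.shape
    P = np.pad(S, 1, mode="edge")      # replicate borders
    out = np.zeros_like(S)
    for i in range(3):
        for j in range(3):
            out += LAP[2 - i, 2 - j] * P[i:i + r, j:j + c]
    return np.abs(out) + 1.0           # positive offset (assumption)
```

On a flat region the Laplacian response vanishes, so the threshold reduces to the constant offset; near edges it is raised, delaying spikes there.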
We experimented with various parameter-sweep options; the above parameters were found to be optimal, and small deviations do not yield drastic changes in the enhanced WCE video frames. This is similar to the observations of [13], and the parameter \(\beta \) is perhaps the most important one, as it directly affects the final contrast of the enhanced images. Setting it at different values and benchmarking against visual scoring by gastroenterologists is an interesting future direction for visual experiments. Here we concentrate on benchmarking against automatic contrast enhancement methods instead, and leave benchmarking with humans, which involves inter-observer variability across gastroenterologists of various experience levels, as potential future work. We implemented the method frame by frame in MATLAB on a MacBook Pro laptop with a Core i7 CPU and 8 GB of RAM. The method takes 0.50 seconds to enhance a color RGB image of size \(512 \times 512\), and this needs to be reduced further, as typical state-of-the-art WCE hardware now produces 4 to 30 frames per second.
3.2 Comparison with Other Enhancement Models
We conducted experiments on various WCE videos consisting of frames with normal mucosal regions, dark regions, illumination problems, intestinal juices, bleeding, polyps, diverticula, etc. We use these images to compare and benchmark different enhancement methods: histogram equalization (HISTEQ) [9], contrast-limited adaptive histogram equalization (CLAHE) [10], spectral optimal contrast-tone mapping (SOCTM) [11], inverse diffusion (INVDIFF) [12], and the FLM method. Figure 3 shows the original input images that need enhancement (top row) along with the optimal outputs of the different enhancement models. The CLAHE results over-amplify the over-illuminated regions, producing a washed-out appearance devoid of mucosal information; this can be detrimental for diagnosis based on vasculature [4, 6]. In contrast, our approach obtains better enhancement of obscured regions of interest and does not over-saturate the bright mucosal surfaces.
To quantitatively compare the outputs obtained by the various enhancement models and to assess image quality, we utilized the following blind metrics (as ground truth is not available).
▪ Local contrast (LC):
where \(\mathcal{L} = \left( {I \circ I} \right) \otimes \mathcal{E} - \left( {\mathcal{L}_m \circ \mathcal{L}_m} \right)\), \(\mathcal{L}_m = I \otimes \mathcal{E}\), with \( \circ \) the Hadamard product, that is, elementwise multiplication, \( \otimes \) the convolution operation, \(I\) the input image, and \(N = r \times c\), where \(r\) is the number of pixels in the vertical dimension and \(c\) is the number of pixels in the horizontal dimension. The matrix \(\mathcal{E}\) is based on the identity matrix
We note that higher values indicate better enhancement.
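Interpreting \(\mathcal{L}\) as a local variance map, the LC metric can be sketched as the mean local standard deviation computed by box filtering; the window size and the normalized averaging form of \(\mathcal{E}\) are assumptions on our part, and `local_contrast` is our own name.

```python
import numpy as np

def local_contrast(I, k=3):
    """Mean local standard deviation: local variance via a k x k
    box filter (E assumed to be a normalized averaging window),
    then square root and global average."""
    I = I.astype(float)
    r, c = I.shape
    E = np.ones((k, k)) / (k * k)
    P = np.pad(I, k // 2, mode="edge")        # local means
    P2 = np.pad(I * I, k // 2, mode="edge")   # local second moments
    Lm = np.zeros_like(I)
    Ls = np.zeros_like(I)
    for i in range(k):
        for j in range(k):
            Lm += E[i, j] * P[i:i + r, j:j + c]
            Ls += E[i, j] * P2[i:i + r, j:j + c]
    var = np.clip(Ls - Lm * Lm, 0.0, None)    # variance, clipped for safety
    return float(np.mean(np.sqrt(var)))
```

A constant image scores 0, and scaling down an image's contrast lowers its LC value, matching the "higher is better" reading above.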
▪ Spatial frequency (SF):
where \(R_F\) is the row frequency and \(C_F\) is the column frequency, computed as follows
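A sketch of the standard spatial-frequency metric, assuming the usual definition with root-mean-square first differences along rows and columns (the function name is ours):

```python
import numpy as np

def spatial_frequency(I):
    """SF = sqrt(RF^2 + CF^2) with RF/CF the RMS of horizontal and
    vertical first differences over the N = r*c pixels."""
    I = I.astype(float)
    N = I.size
    RF = np.sqrt(np.sum((I[:, 1:] - I[:, :-1]) ** 2) / N)  # row frequency
    CF = np.sqrt(np.sum((I[1:, :] - I[:-1, :]) ** 2) / N)  # column frequency
    return float(np.sqrt(RF ** 2 + CF ** 2))
```

A constant image scores 0, while any intensity variation raises the score, so sharper, better-enhanced frames score higher.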
▪ Mean gradient (MG):
where \(G_{ij}\) represents the gradient magnitude of the image at pixel \((i,j)\), with higher values representing better quality.
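A sketch of the mean-gradient metric; the exact discrete form (forward differences and the 1/2 factor inside the square root) is an assumption, and `mean_gradient` is our own name.

```python
import numpy as np

def mean_gradient(I):
    """Average gradient magnitude over the image using forward
    differences: G_ij = sqrt((dx^2 + dy^2) / 2)."""
    I = I.astype(float)
    dx = I[:-1, 1:] - I[:-1, :-1]   # horizontal forward differences
    dy = I[1:, :-1] - I[:-1, :-1]   # vertical forward differences
    G = np.sqrt((dx ** 2 + dy ** 2) / 2.0)
    return float(np.mean(G))
```

On a unit-slope horizontal ramp every pixel contributes \(\sqrt{1/2}\), so the mean gradient equals \(1/\sqrt{2}\); a flat image scores 0.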
Table 1 shows quantitative comparisons of the different enhancement models under the no-reference (blind) image quality metrics LC, SF, and MG. Results are averaged across 30 different WCE frames containing various structures to represent the heterogeneous nature of WCE videos. The overall values indicate better enhancement across images, and the HVS-driven model outperformed the other models, consistent with the visual results. Further, our model obtained higher values in all three metrics, indicating the promise of using our approach for WCE video enhancement with structure preservation.
3.3 Applications
As an application, and to show that the enhancement model can be a viable preprocessing step in automatic computer-aided diagnosis systems for WCE, we first apply it to a mucosa-lumen segmentation pipeline based on the active contour paradigm [15]. Figure 4 shows an example of the improvement obtained by applying the enhancement model prior to lumen segmentation. Figure 4a shows the input (top row) and its segmentation result (bottom row); the blue curve indicates that the segmentation failed to capture the lumen and converged on the illumination-saturated area instead. After applying the FLM-based model, we obtain Fig. 4b (top row), and the segmentation now yields Fig. 4b (bottom row), where the white curve indicates a near-perfect capture of the lumen area.
Next, we show that an enhancement step before applying shape from shading for 3D reconstruction of WCE frames [17] improves the visualizations. Figure 5 shows the improvement in the 3D visualization of a WCE frame containing a polyp (a precursor of colorectal cancer). Figure 5a shows the input image (top row) and its 3D shape-from-shading reconstruction (bottom row), indicating that the illumination affects the visualization near the polyp and the surrounding mucosal folds. After applying the FLM-based model, we obtain Fig. 5b (top row), and shape from shading now produces a better reconstruction (Fig. 5b, bottom row). Other possible applications include extracting vessel structures for better polyp recognition [6, 18] and benchmarking other segmentation models [19, 20].
4 CONCLUSIONS
In this work, we considered a human visual system based enhancement model for wireless capsule endoscopy (WCE) videos with illumination problems. The feature-linking model (FLM) inspired approach is well suited to enhancing inhomogeneously illuminated WCE frames and does not create the artifacts associated with global histogram-based techniques or with other tailored WCE enhancement models based on spectral optimal contrast-tone mapping and inverse diffusion. Our experimental results and benchmarking showed that the HVS-consistent model avoids saturation artifacts and obtains better enhancement results with better structure preservation. We are currently working on reducing the computational time of the HVS model to enable real-time enhancement of WCE videos, along with testing the model on flexible imaging color enhancement (FICE) imagery [6] and on improving colorectal polyp detection [7].
REFERENCES
G. Iddan, G. Meron, A. Glukhovsky, and P. Swain, “Wireless capsule endoscopy,” Nature 405 (6785), 417 (2000).
M. Keroack, “Video capsule endoscopy,” Curr. Opin. Gastroenterol. 20 (5), 474–481 (2004).
A. Moglia, A. Menciassi, and P. Dario, “Recent patents on wireless capsule endoscopy,” Recent Pat. Biomed. Eng. 1 (1), 24–33 (2008).
V. B. S. Prasath and H. Kawanaka, “Vascularization features for polyp localization in capsule endoscopy,” in Proc. 2015 IEEE Int. Conf. on Bioinformatics and Biomedicine (BIBM) (Washington, DC, USA, November 2015), pp. 1740–1742.
V. B. S. Prasath, “On fuzzification of color spaces for medical decision support in video capsule endoscopy,” in Proc. 26th Modern Artificial Intelligence and Cognitive Science Conference 2015 (Greensboro, NC, USA, 2015), pp. 1–5. http://CEUR-WS.org/Vol-1353/paper_20.pdf
V. B. S. Prasath, “Vascularization from flexible imaging color enhancement (FICE) for polyp localization,” J. Med. Life 10 (2), 147–149 (2017).
V. B. S. Prasath, “Polyp detection and segmentation from video capsule endoscopy: A review,” J. Imaging 3 (1), Article No. 1, 1–15 (2017).
V. B. S. Prasath, “Automatic image and video analysis for capsule endoscopy – an open frontier,” Int. J. Rob. Eng. 3 (1), 1562–1580 (2018). https://www.vibgyorpublishers.org/content/ijre/fulltext.php?aid=ijre-3-007
R. Hummel, “Image enhancement by histogram transformation,” Comput. Graphics Image Process. 6 (2), 184–195 (1977).
K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics Gems IV, Ed. by P. S. Heckbert (Academic Press, Boston, 1994), pp. 474–485.
V. B. S. Prasath and R. Delhibabu, “Automatic contrast enhancement for wireless capsule endoscopy videos with spectral optimal contrast-tone mapping,” in Computational Intelligence in Data Mining – Volume 1, Ed. by L. Jain, H. S. Behera, J. K. Mandal, and D. P. Mohapatra, Smart Innovation, Systems and Technologies (Springer, New Delhi, 2015), Vol. 31, pp. 243–250. https://doi.org/10.1007/978-81-322-2205-7_23
B. Li and M. Q.-H. Meng, “A novel enhancement method for capsule endoscopy images,” Int. J. Inf. Acquis. 4 (2), 117–126 (2007).
K. Zhan, J. Teng, J. Shi, Q. Li, and M. Wang, “Feature-linking model for image enhancement,” Neural Comput. 28 (6), 1072–1100 (2016).
V. B. S. Prasath, D. N. H. Thanh, L. T. Thanh, S. N. Quang, and S. Dvoenko, “Wireless capsule endoscopy image enhancement with a human visual system consistent model,” in Proc. 14th Int. Conf. on Pattern Recognition and Information Processing (PRIP ’2019) (Minsk, Belarus, 2019), pp. 271–274.
V. B. S. Prasath and R. Delhibabu, “Automatic image segmentation for video capsule endoscopy,” in Computational Intelligence in Medical Informatics, Ed. by N. B. Muppalaneni and V. K. Gunjan, SpringerBriefs in Applied Sciences and Technology (Springer, Singapore, 2015), pp. 73–80. https://doi.org/10.1007/978-981-287-260-9_7
M. Mackiewicz, J. Berens, and M. Fisher, “Wireless capsule endoscopy color video segmentation,” IEEE Trans. Med. Imaging 27 (12), 1769–1781 (2008).
V. B. S. Prasath and H. Kawanaka, “Near-light perspective shape from shading for 3D visualizations in endoscopy systems,” in Proc. 2017 IEEE Int. Conf. on Bioinformatics and Biomedicine (BIBM) (Kansas City, MO, USA, November 2017), pp. 2293–2295.
D. N. H. Thanh, S. Dvoenko, V. B. S. Prasath, and N. H. Hai, “Blood vessels segmentation method for retinal fundus images based on adaptive principal curvatures and image derivative operators,” in Int. Workshop on Photogrammetric & Computer Vision Techniques for Video Surveillance, Biometrics and Biomedicine (Moscow, Russia, May 2019); Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLII-2/W12, 211–218 (2019).
D. N. H. Thanh, V. B. S. Prasath, L. M. Hieu, and N. N. Hien, “Melanoma skin cancer detection method based on adaptive principal curvature, colour normalization and features extraction with the ABCD rule,” J. Digit. Imaging (2019). Published online: 17 Dec. 2019 https://doi.org/10.1007/s10278-019-00316-x
D. N. H. Thanh, N. N. Hien, V. B. S. Prasath, L. T. Thanh, and N. H. Hai, “Automatic initial boundary generation methods based on edge detectors for the level set function of the Chan-Vese segmentation model and applications in biomedical image processing,” in Frontiers in Intelligent Computing: Theory and Applications, Ed. by S. Satapathy, V. Bhateja, B. Nguyen, et al., Advances in Intelligent Systems and Computing (Springer, Singapore, 2020), Vol. 1014, pp. 171–181. https://doi.org/10.1007/978-981-13-9920-6_18
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
V. B. Surya Prasath received his PhD in Mathematics from the Indian Institute of Technology Madras in 2010. He is currently an assistant professor in the Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, and is also affiliated with the Departments of Biomedical Informatics and Electrical Engineering and Computer Science, University of Cincinnati, USA. He was a postdoctoral fellow at the Department of Mathematics, University of Coimbra, Portugal (2010–2012). From 2012 to 2017 he was an assistant professor at the Computational Imaging and VisAnalysis (CIVA) Lab of the University of Missouri, USA. He held summer fellowships/visits at Kitware Inc., NY, USA; The Fields Institute, Canada; and the Institute for Pure and Applied Mathematics (IPAM), University of California Los Angeles, USA. He has over 180 publications in international peer-reviewed journals and conference proceedings. His research interests are nonlinear PDEs, regularization methods, inverse and ill-posed problems, variational and PDE-based image processing, and computer vision with applications in remote sensing, biometrics, and biomedical imaging.
Dang Ngoc Hoang Thanh graduated from Belarusian State University in 2008 and received an M.Sc. in Applied Mathematics in 2009; he received a PhD in Computer Science from Tula State University, Russia, in 2016. He is currently an assistant professor at the Department of Information Technology, School of Business Information Technology, University of Economics Ho Chi Minh City (UEH), Vietnam. He was a lecturer and researcher at the Department of Information Technology, Hue College of Industry, Vietnam. He is a member of the scientific organizations INSTICC (Portugal), ACM (USA), and IAENG (Taiwan), and a program committee member of international conferences such as IEEE ICCE 2018 (Vietnam), IWBBIO (Spain), IEEE ICIEV (USA), IEEE ICEEE (Turkey), ICIEE (Japan), ICoCTA (Australia), ICMTEL (UK), etc. He has over 50 publications in international peer-reviewed journals and conference proceedings, 5 book chapters, one book, and one European patent. His research interests are image processing, computer vision, machine learning, data mining, computational mathematics, and optimization.
Le Thi Thanh graduated from Voronezh State University in 2009 and received an M.Sc. in Applied Mathematics in 2011; she received a PhD in Applied Mathematics from Tula State University, Russia, in 2018. She is currently an assistant professor at Ho Chi Minh City University of Transport, Vietnam. She has over 20 publications in international peer-reviewed journals and conference proceedings. Her research interests are mathematical models, nonlinear PDEs, image processing, computational mathematics, and dynamical systems.
Nguyen Quang San graduated from Belarusian State University in 2015 and received an M.Sc. in Theoretical Physics in 2016. He is a PhD candidate in Theoretical Physics at Belarusian State University, Belarus, and currently works as a lecturer and researcher at Nha Trang University, Vietnam. He has over 15 publications in international peer-reviewed journals and conference proceedings, and one book. His research interests are image processing, quantum computing, quantum mechanics, nonlinear PDEs, and dynamical systems.
Sergey Dvoenko received the Dr. Sci. degree in 2002 from the Dorodnitsyn Computing Centre of the Russian Academy of Sciences (CC of RAS), in the field of Theoretical Foundations of Informatics (05.13.17 of RAS), with the thesis “Pattern Recognition Methods for Arrays of Interconnected Data.” He received his PhD degree in 1992 after the postgraduate course at the Institute of Control Sciences of the Russian Academy of Sciences (ICS of RAS), in the field of Computer Sciences (05.13.16 of RAS), with the thesis “Learning Algorithms for Event Recognition in Experimental Waveforms.” Since 2003, he has been a professor at the Institute of Applied Mathematics and Computer Sciences of Tula State University (IAMCS of TSU) in Tula, Russia. His recent courses include Data Analysis (Machine Learning and Clustering), Decision Theory, Operational Research, Functional and Logical Programming, System Analysis, and Algorithms and Calculus Theory. His scientific and research interests include image processing, hidden Markov models and fields in applied problems, machine learning and pattern recognition, cluster analysis, and data mining. He has over 60 scientific publications (papers in peer-reviewed journals and international conference proceedings) and one European patent. He is a member of the Russian Association for Pattern Recognition and Image Analysis (RAPRIA).
Prasath, V.B., Thanh, D.N., Thanh, L.T. et al. Human Visual System Consistent Model for Wireless Capsule Endoscopy Image Enhancement and Applications. Pattern Recognit. Image Anal. 30, 280–287 (2020). https://doi.org/10.1134/S1054661820030219