Abstract
Celiac disease is a genetically determined disorder of the small intestine, caused by an immune response to ingested gluten-containing food. The resulting damage to the small intestinal mucosa hampers nutrient absorption and is characterized by diarrhea, abdominal pain and a variety of extra-intestinal manifestations. Celiac disease is currently diagnosed with invasive and costly methods such as endoscopic biopsy, and detecting the disease by histopathologic analysis of biopsies can be challenging because of suboptimal sampling. In this study, video capsule images were obtained from celiac patients and controls for comparison and classification. DAISY descriptors are used to project two-dimensional images onto one-dimensional vectors. Shannon entropy is then used to extract features, after which a particle swarm optimization algorithm coupled with normalization selects the 30 best features for classification. The performance statistics of this approach were tabulated: using 10-fold cross-validation, the accuracy, positive predictive value, sensitivity and specificity obtained in distinguishing celiac from control video capsule images were 89.82%, 89.17%, 94.35% and 83.20%, respectively. When manual methods are used instead of the automated approach described in this study, technical limitations and inconclusive results may hamper diagnosis. Our findings suggest that the computer-aided detection system presented herein can provide diagnostic information, and thus may give clinicians an important tool for validating a diagnosis of celiac disease.
Introduction
Celiac disease is an autoimmune disorder triggered in genetically susceptible individuals by the consumption of food containing dietary gluten. Gluten is a protein found in some grains such as wheat, barley and rye [1]. In celiac disease, the immune reaction to gluten destroys the small intestinal villi, which are primarily responsible for absorbing nutrients into the blood. Damage to the villi can result in diarrhea, abdominal discomfort and malabsorption [2].
Celiac disease occurs worldwide, with a prevalence of about 1%. Although more celiac cases are being identified as diagnostic tools improve, a large number of cases remain undiagnosed [3]. Detecting celiac disease is challenging because of limited awareness among some clinicians and patients, and limited diagnostic capabilities in under-resourced countries. The prevalence of celiac disease is also high among patients with associated conditions such as diabetes mellitus and thyroid and liver diseases [4]. Hence, accurate diagnosis and treatment are imperative, which motivates the need for an accurate, automated diagnostic tool.
Celiac disease can be pathologically subdivided, depending upon the severity of mucosal atrophy, into stages 0, 1, 2, and 3, using the Marsh scoring system [4].
Stage 0: This is known as the pre-infiltrative stage; the mucosa appears normal.
Stage 1: This is the infiltrative stage, in which the surface lining of the intestine (the epithelium) contains more lymphocytes than normal.
Stage 2: This is the infiltrative-hyperplastic stage, also known as crypt hyperplasia. In addition to an increase in intraepithelial lymphocytes, the depressions between the intestinal villi, known as “crypts”, become longer than those of the normal intestinal lining.
Stage 3: At this stage, increased intraepithelial lymphocytes are accompanied by crypt hyperplasia and shrinkage of the villi (villous atrophy). Stage 3 can be subdivided into stages 3a, 3b and 3c, reflecting progressive flattening of the villi [5]. Figure 1 illustrates a healthy intestinal mucosa and an affected (atrophic) mucosa.
Intestinal biopsy during standard endoscopy, with analysis of the villi in the biopsy specimen, is the gold standard for celiac disease diagnosis [7]. The duodenum, the proximal part of the small intestine, is usually the region most affected by celiac disease, so endoscopic biopsies are taken from the proximal duodenum. However, besides being invasive, this method can be costly [7]. In contrast, video capsule endoscopy (VCE) is noninvasive and relatively inexpensive [8]. The subject swallows a capsule containing an embedded video camera, together with some water, and the capsule then transits along the gastrointestinal tract. The images it acquires along the way are recorded by a device attached to a belt worn by the subject. The recording device is removed from the belt after about eight hours, and the capsule is excreted with the feces. From the recorded video, a reader can only judge whether villous atrophy is present and whether it appears severe or mild; a quantitative classification based on VCE images is currently not possible. Visual examination of VCE images is also subject to interobserver variability and has limited spatiotemporal resolution. Diagnosing celiac disease via VCE is therefore imprecise and laborious, which motivates a more practical computer-aided detection (CAD) system for diagnosing the disease from VCE. This study presents a computerized technique to assist in the diagnosis of celiac disease, and summarizes the CAD systems employed in other studies.
Table 1 summarizes the CAD systems used by other researchers for the automated classification of celiac disease. Vecsei et al. [9] used Fourier-domain features with SVM, kNN and Bayes classifiers, achieving an accuracy of 94%, while Koh et al. [6] examined nonlinear features and obtained an accuracy of 86.47%. Ciaccio et al. [10, 11] used statistical measures such as the mean and standard deviation to compare image brightness, achieving specificities of 96% and 93.5%, respectively. The same researchers also investigated dominant frequency analysis [12] and histogram-level techniques [14], achieving a sensitivity of 84.6% for combined and single basis images, respectively. Gadermayr et al. [13] and Zhou et al. [15] derived features using deep learning techniques. In this study, DAISY descriptors are used as a novel feature extraction method prior to assessing the presence of celiac disease. After the video image borders were masked, features were extracted with the DAISY descriptor. The features were then reduced using Shannon entropy, after which particle swarm optimization (PSO) was used to select features for classification.
Data Used
The data used in this study are VCE images obtained from 16 healthy individuals and 21 celiac patients at Columbia University Medical Center in New York. Each subject swallowed a PillCam™ SB capsule (Medtronic, Minneapolis, MN), which is designed to acquire images of the small intestine. Figure 2 presents images of damaged (celiac) and normal villi in the duodenum. Two PillCam models were used in the study: the SB2 [16] and the SB3 [17]. The PillCam SB2 was used to acquire images from 9 celiac patients and 8 control individuals, and the PillCam SB3 from 12 celiac patients and 8 control individuals. The SB2 acquires images at two frames per second, whereas the SB3 uses an adaptive frame rate of 2–6 frames per second. The image resolution is 576 × 576 pixels, and the videoclip function enables between 100 and 1000 images to be downloaded as a movie. The SB2 images were captured from the duodenal bulb, distal duodenum and jejunum, as well as from the jejunum and ileum in close proximity. The SB3 images were captured from three levels: the duodenum, jejunum and ileum.
Methodology
Videos from both the SB2 and SB3 capsules were used in this study. Table 2 lists the characteristics of the images obtained from the small intestine. The images acquired from the PillCam devices were masked before DAISY descriptor features were extracted [18]. The DAISY descriptor transforms a local image region into a low-dimensional, continuous descriptor suitable for matching and classification. The DAISY descriptor [19] can be computed densely or in regions extracted around points of interest. This design reduces error rates, computational cost and the storage requirements of the descriptor. The DAISY descriptor pipeline typically comprises five stages: feature detection, summation, robust normalization, PCA dimensionality reduction, and quantization and compression. The first three stages are relevant to all types of application. The feature detector computes a vector of k feature detector responses for every pixel in the input data. In the summation stage, the feature detector responses are pooled spatially using two-dimensional Gaussian weighting profiles. These pooling centers are arranged like flower petals, with their size increasing radially away from the descriptor center. In the normalization stage, the vector produced by the summation stage is normalized to unit length.
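The feature detection, summation and normalization stages above can be sketched in a few lines of NumPy. This is a simplified DAISY-style descriptor written for illustration, not the implementation used in this study: the rectified-gradient orientation maps, the Gaussian scale assigned to each ring, nearest-pixel sampling of the petals and unit normalization of the whole per-pixel vector are all assumptions. With 3 rings, 8 petals per ring and 8 orientations, each pixel receives (1 + 3 × 8) × 8 = 200 values, the descriptor length quoted in the DAISY section. For real use, scikit-image's `skimage.feature.daisy` is a maintained implementation whose defaults use the same 3 × 8 × 8 layout.

```python
import numpy as np


def _gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def dense_daisy(img, radius=15, rings=3, histograms=8, orientations=8):
    """Per-pixel DAISY-style descriptors, shape (H, W, (1 + rings*histograms) * orientations).

    Hypothetical simplification: parameter values and scale choices are assumptions.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    gy, gx = np.gradient(img)
    # Feature detector: rectified gradient projected onto each orientation.
    angles = 2 * np.pi * np.arange(orientations) / orientations
    omaps = [np.maximum(gx * np.cos(a) + gy * np.sin(a), 0.0) for a in angles]
    # Summation: Gaussian pooling, with a larger sigma for each outer ring.
    sigmas = [radius * (i + 1) / (2 * rings) for i in range(rings + 1)]
    layers = [np.stack([_gaussian_blur(m, s) for m in omaps], axis=-1) for s in sigmas]
    yy, xx = np.mgrid[0:h, 0:w]
    hists = [layers[0]]                       # central histogram
    for r in range(rings):
        ring_radius = radius * (r + 1) / rings
        for j in range(histograms):           # flower-like "petals" on this ring
            t = 2 * np.pi * j / histograms
            sy = np.clip(np.rint(yy + ring_radius * np.sin(t)).astype(int), 0, h - 1)
            sx = np.clip(np.rint(xx + ring_radius * np.cos(t)).astype(int), 0, w - 1)
            hists.append(layers[r + 1][sy, sx])
    desc = np.stack(hists, axis=2).reshape(h, w, -1)
    # Normalization: scale each pixel's descriptor to unit length (zero stays zero).
    norms = np.linalg.norm(desc, axis=-1, keepdims=True)
    return desc / np.maximum(norms, 1e-12)
```

For a 576 × 576 frame, `dense_daisy(frame).reshape(-1, 200)` gives a 331,776 × 200 matrix. The sketch favors readability over memory: it holds every pooled layer at once, which is several hundred megabytes at full resolution.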
The features were vectorized, and Shannon entropy [20] was used to reduce the feature size to 300. PSO [21] was then employed to select a subset of features for classification. Figure 3 shows the methodology, and Fig. 4 illustrates the method.
Pre-processing
The borders of the acquired video images were first masked to remove any artifacts surrounding the image, so that these artifacts would not affect the results of the automated system.
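A rectangular frame is the simplest form of border masking and is sketched below. The margin width, the fill value and the choice of a rectangular rather than, for example, a circular mask are assumptions, because the text does not specify how the mask was defined.

```python
import numpy as np


def mask_border(img, margin=16, fill=0.0):
    """Return a copy of `img` with a `margin`-pixel frame set to `fill`.

    margin=16 is a hypothetical value; the study does not state the width.
    Works for grayscale (H, W) and colour (H, W, C) arrays.
    """
    out = np.array(img, dtype=float, copy=True)
    if margin <= 0:
        return out                  # nothing to mask (also avoids out[-0:] = whole image)
    out[:margin] = fill             # top rows
    out[-margin:] = fill            # bottom rows
    out[:, :margin] = fill          # left columns
    out[:, -margin:] = fill         # right columns
    return out
```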
DAISY
The scale-invariant feature transform (SIFT) is commonly used to detect and extract distinctive local features from images [22]. In contrast, DAISY has been shown to be well suited to dense computation while performing favorably. Hence, DAISY descriptors are used in this study. Figure 5 presents the shape of the DAISY descriptor. The descriptor consists of histograms of oriented gradients, measured around a point of interest at several orientations and radii. Gradients are measured in several distinct orientations, keeping only the positive component in each direction. The descriptor has a flower-like layout with overlapping regions. In this study, SIFT was applied to the DAISY descriptors, causing a change in gradient. The descriptors of each 2-D image (331,776 × 200, i.e. a 200-element descriptor for each of the 576 × 576 = 331,776 pixels) were projected to a 1-D vector (1 × 66,355,200). The optimal features were then extracted from this 1-D vector.
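The dimensions quoted above follow directly from the image resolution and the descriptor length, and the projection to 1-D amounts to a reshape. The snippet below checks the arithmetic; the row-major ordering (each pixel's 200 values kept contiguous) and the helper name `flatten_descriptors` are assumptions for illustration.

```python
import numpy as np

HEIGHT = WIDTH = 576   # PillCam image resolution given in the Data Used section
DESC_LEN = 200         # descriptor length per pixel, as stated in the text

n_rows = HEIGHT * WIDTH          # one descriptor per pixel -> 331,776 rows
n_values = n_rows * DESC_LEN     # length of the 1-D vector -> 66,355,200


def flatten_descriptors(desc):
    """Reshape an (n_rows, DESC_LEN) or (H, W, DESC_LEN) array into a 1 x N row vector.

    Row-major (C) order is an assumption: each pixel's values stay contiguous.
    """
    return np.asarray(desc).reshape(1, -1)
```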
Data reduction
The acquired data were quantitatively analyzed using entropy. Entropy measures the disorder or variability of pixel intensities, or more generally the variation among samples [23]. It is therefore useful for detecting pixel intensity variation: the entropy is high when pixel intensities vary widely. Shannon entropy, the most basic form of entropy, is defined as [20]:
$$H = -\sum_{k} p_k \log p_k$$
where $p_k$ denotes the probability of a data point falling in bin k. Shannon entropy was computed for each non-overlapping window of 221,184 values, which reduces the 66,355,200-element vector to 300 features (66,355,200 / 221,184 = 300). This reduction helps mitigate overfitting, so that the classifier can yield accurate results [24,25,26].
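The windowed entropy reduction can be sketched as follows. The number of histogram bins (256) and the base-2 logarithm are assumptions, since the text does not state them; the window length follows from splitting the vector into 300 equal, non-overlapping parts.

```python
import numpy as np


def shannon_entropy(values, bins=256):
    """H = -sum_k p_k log2(p_k) over a histogram of `values`.

    The bin count (256) and base-2 logarithm are assumptions.
    """
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())


def windowed_entropy(vector, n_windows=300, bins=256):
    """Split a 1-D vector into `n_windows` equal, non-overlapping windows; one entropy each."""
    v = np.asarray(vector, dtype=float).ravel()
    if v.size % n_windows:
        raise ValueError("vector length must be divisible by n_windows")
    windows = v.reshape(n_windows, -1)     # e.g. 66,355,200 / 300 = 221,184 values per window
    return np.array([shannon_entropy(w, bins) for w in windows])
```

A window of identical values gives an entropy of 0, while a window whose values spread evenly over all 256 bins gives log2(256) = 8 bits. This matches the reasoning in the Discussion that images with more pixel variation yield higher entropy.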
Selection of features
Following feature extraction and reduction, the PSO technique was used to automatically select a group of optimal features for classification. PSO is an optimization technique inspired by swarm intelligence, first proposed by Kennedy and Eberhart [27], and is used here for feature selection. PSO emulates the behavior of a flock of birds searching for food in a particular area. Initially the birds do not know where the food is, but with each round they learn more about its location by following the bird closest to the food source. Analogously, each particle in PSO, representing a candidate solution (here, a candidate feature subset), acts like a bird moving through the parameter space. Each particle has its own velocity, which determines its direction of movement, and the particles follow the best solution found so far. This process repeats until the iteration limit is reached. The two common variants are the 'global best' and 'local best' algorithms: the former updates particle positions based on the best particle in the entire swarm, while the latter does so based on each particle's neighbors [28]. In this study, 30 discriminant features were selected using PSO for classification. PSO and normalization were executed within each fold of the 10-fold cross-validation, and the accuracy was averaged across folds.
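A minimal global-best PSO for selecting k features is sketched below. Several details are assumptions rather than the authors' settings: each particle holds one continuous score per feature and its k highest-scoring features form its subset; the fitness is the training accuracy of a simple nearest-centroid rule, used only as a stand-in for the classifier; and the inertia and acceleration constants `w`, `c1`, `c2` are typical illustrative values. Consistent with the per-fold procedure described above, such a search would be run on the training portion of each fold only.

```python
import numpy as np


def nearest_centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid rule (stand-in fitness; y holds 0/1 labels)."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    d0 = ((X - mu0) ** 2).sum(axis=1)
    d1 = ((X - mu1) ** 2).sum(axis=1)
    return float(((d1 < d0).astype(int) == y).mean())


def pso_select(X, y, k=30, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (sorted indices of k selected features, best fitness)."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, X.shape[1]))   # one score per feature per particle
    vel = np.zeros_like(pos)

    def subset(p):
        return np.argsort(p)[-k:]                  # indices of the k highest scores

    def fitness(p):
        return nearest_centroid_accuracy(X[:, subset(p)], y)

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia + pull toward each particle's best + pull toward the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if pbest_fit.max() > gbest_fit:
            gbest = pbest[pbest_fit.argmax()].copy()
            gbest_fit = pbest_fit.max()
    return np.sort(subset(gbest)), float(gbest_fit)
```

A 'local best' variant would replace `gbest` in the velocity update with the best position among each particle's neighbors.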
Classification
The central part of each videoclip, a segment spanning 20% of its total length, was selected to train and test the classifiers. The total number of frames was divided by three to obtain the midmost frames. Several classifiers, including the support vector machine (SVM) [29], were used in the study. The SVM classifiers used radial basis function (RBF) kernels and polynomial kernels of order 1, 2 and 3. The 10-fold cross-validation technique [30] was used to evaluate the performance of the developed CAD system. The statistical metrics accuracy (Acc), sensitivity (Sen), specificity (Spe) and positive predictive value (Ppv) were used to gauge system performance.
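The frame selection and evaluation protocol can be sketched as three small helpers. All are illustrative assumptions: `central_segment` simply takes the middle 20% of a clip, `stratified_kfold` deals each class round-robin into 10 folds, and `zscore_per_fold` stands in for the unspecified per-fold normalization, estimating its statistics on the training split only so that no test information leaks into training. Whether the study split folds by image or by subject is not stated; splitting by subject would keep frames from the same patient out of both the training and test sets.

```python
import numpy as np


def central_segment(n_frames, fraction=0.2):
    """Indices of the central `fraction` of a clip (20% in this study)."""
    length = max(1, int(round(n_frames * fraction)))
    start = (n_frames - length) // 2
    return np.arange(start, start + length)


def stratified_kfold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin into k folds."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        for i, idx in enumerate(rng.permutation(np.flatnonzero(y == label))):
            folds[i % k].append(idx)
    everything = np.arange(len(y))
    for fold in folds:
        test = np.sort(np.array(fold, dtype=int))
        yield np.setdiff1d(everything, test), test


def zscore_per_fold(X_train, X_test):
    """Normalize both splits with mean/std estimated on the training split only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                   # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd
```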
Results
Thirty features were identified using the PSO method. Figure 6 presents boxplots of the selected features used to differentiate the normal and abnormal classes. After 10 iterations, the average performance (Acc, Sen, Spe and Ppv) was obtained. Figure 7 shows the values attained in each fold with the SVM RBF classifier. With 30 features, performance varies only marginally across folds.
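The four metrics follow from the confusion-matrix counts of each fold, with celiac taken as the positive class; the per-fold values are then averaged. The helpers below are a plain-Python sketch with hypothetical names; note that averaging per-fold metrics can differ slightly from computing metrics on counts pooled over all folds.

```python
def confusion(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) counts for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Acc, Sen, Spe and Ppv from confusion-matrix counts (NaN if a denominator is 0)."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "Acc": ratio(tp + tn, tp + tn + fp + fn),
        "Sen": ratio(tp, tp + fn),     # true positive rate
        "Spe": ratio(tn, tn + fp),     # true negative rate
        "Ppv": ratio(tp, tp + fp),     # precision
    }


def average_over_folds(per_fold):
    """Mean of each metric across folds."""
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}
```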
Table 3 presents the performance of the best classifier used in this study. The average over the 10 repetitions is taken as the performance of the proposed technique. Table 3 shows that, with 30 features and the 10-fold cross-validation method, the SVM with an RBF kernel parameter (sigma) of 1.4 achieved the highest accuracy, 89.82%, among the classifiers tested.
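To make the role of sigma concrete, the RBF kernel can be written as K(x, z) = exp(-||x - z||² / (2σ²)). This convention is an assumption: some toolboxes scale the squared distance differently (for example, without the factor of 2), so the same numeric sigma can correspond to different kernels. Under this convention, sigma = 1.4 corresponds to gamma = 1 / (2 × 1.4²) ≈ 0.255 in the `exp(-gamma ||x - z||²)` form used by scikit-learn's `SVC`.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.4):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma^2)) for rows a_i of A and b_j of B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 for round-off.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


gamma = 1.0 / (2.0 * 1.4 ** 2)   # equivalent "gamma" parameterization, about 0.255
```

A larger sigma makes the kernel smoother (distant points still look similar), while a smaller sigma makes the decision boundary more local.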
Discussion
The DAISY descriptor, explored in our study for feature extraction and classification of celiac disease, proved effective for extracting discriminative features from VCE images of the small intestine. This technique has previously been used to estimate dense depth maps from wide-baseline image pairs, where it performed well for depth estimation and occlusion detection [31]. Figure 2 shows that the VCE image of normal villi contains more pixel variation, which contributes to high entropy. In contrast, the VCE image of celiac villi contains more pixels with the same intensity, which reduces the variability and in turn lowers the entropy. This is consistent with the higher mean Shannon entropy of the control images compared with the celiac images, as seen in Fig. 6, and indicates that the features used are discriminative. The greater variation in local detail evident in control images may result from densely packed normal villi, which are minute in size, whereas in celiac patients with villous atrophy the villi clump together [16, 17], which would tend to smooth the local pixel variation. Compared with the other techniques in Table 1, the proposed method used the largest number of images while achieving relatively high accuracy, so it appears best suited for the automated assessment of celiac disease.
Thus, extracting DAISY descriptors from the masked images, selecting features with the PSO technique and classifying with the SVM (RBF) classifier is helpful for the automated diagnosis of celiac disease.
Nonetheless, Zhou et al. [15] investigated a CAD system based on deep learning. It uses a 22-layer deep convolutional neural network (CNN) and distinguished normal from celiac images with an accuracy of 100%. CNNs are a class of deep learning models that circumvent the conventional feature extraction and selection processes [32, 33]: the model is trained to learn the salient features directly from the input data. However, this technique requires a large dataset. Hence, in future work, we hope to gather more images and adopt deep learning techniques to classify celiac disease automatically, thereby further improving automation.
The proposed system has an advantage and a disadvantage. The advantage is:
- Compared with some of the other studies, this study used a larger subject pool (37 subjects) and more images, and yielded better accuracy. Hence, the developed system may be more robust to differing data characteristics.
The main disadvantage is:
- The performance of the developed system could be further improved by using more images. A larger dataset would also enable advanced deep learning methods to be applied.
Deep learning models have provided new impetus for the accurate diagnosis of diseases. Deep learning models include the artificial neural network (ANN), convolutional neural network (CNN), long short-term memory (LSTM) network and autoencoders. These models, which are at the forefront of artificial intelligence, consist of multiple processing layers that allow representations to be learned from large amounts of data [34]. Deep learning is being used to process video, speech and audio, as well as images. It is valued for its ability to automatically extract the best features and then classify signals or images, bypassing the manual feature extraction and selection steps that are central to conventional machine learning techniques. Hence, deep learning paradigms coupled with machine learning techniques are likely to be important to the future of celiac disease diagnosis.
Conclusion
Celiac disease is a disorder characterized by hypersensitivity to gluten that affects the small intestine and also has systemic effects. It remains underdiagnosed in many countries and may lead to serious health conditions if left undetected. As discussed in the Introduction, conventional methods of celiac disease diagnosis are often slow, manual and costly, so a CAD system could be useful. Herein, a CAD system was described that diagnoses celiac disease automatically using DAISY descriptors. An accuracy of 89.82% was attained with the 10-fold cross-validation technique. The proposed system is therefore likely to be useful to healthcare professionals in diagnosing early stages of celiac disease.
References
1. Parzanese, I., Qehajaj, D., Patrinicola, F., Aralica, M., Chiriva-Internati, M., Stifter, S., et al., Celiac disease: From pathophysiology to treatment. World Journal of Gastrointestinal Pathophysiology 8(2):27–38, 2017.
2. Silvester, J., and Duerksen, D., Celiac disease. Canadian Medical Association Journal 185(1), 2013.
3. Rubio-Tapia, A., Ludvigsson, J. F., Brantner, T. L., Murray, J. A., and Everhart, J. E., The prevalence of celiac disease in the United States. American Journal of Gastroenterology 107(10):1538–1544, 2012.
4. World Gastroenterology Organisation, Celiac disease. WGO Global Guidelines:1–29, 2016.
5. Oberhuber, G., Granditsch, G., and Vogelsang, H., The histopathology of coeliac disease: time for a standardized report scheme for pathologists. European Journal of Gastroenterology & Hepatology 11(10):1185–1194, 1999.
6. Koh, J. E. W., Hagiwara, Y., Oh, S. L., Tan, J. H., Ciaccio, E. J., Green, P. H., Lewis, S. K., and Acharya, U. R., Automated diagnosis of celiac disease using DWT and nonlinear features with video capsule endoscopy images. Future Generation Computer Systems 90:86–93, 2019.
7. Green, P. H. R., The role of endoscopy in the diagnosis of celiac disease. Gastroenterology & Hepatology 10(8):522–524, 2014.
8. Ciaccio, E. J., Lewis, S. K., Bhagat, G., and Green, P. H., Coeliac disease and the video capsule: what have we learned till now. Annals of Translational Medicine 5(9):197, 2017.
9. Vecsei, A., Fuhrmann, T., Liedlgruber, M., Brunauer, L., Payer, H., and Uhl, A., Automated classification of duodenal imagery in celiac disease using evolved Fourier feature vectors. Computer Methods and Programs in Biomedicine 95:68–78, 2009.
10. Ciaccio, E. J., Tennyson, C. A., Bhagat, G., Lewis, S. K., and Green, P. H. R., Classification of videocapsule endoscopy image patterns: comparative analysis between patients with celiac disease and normal individuals. BioMedical Engineering OnLine 9:44, 2010.
11. Ciaccio, E. J., Tennyson, C. A., Lewis, S. K., Krishnareddy, S., Bhagat, G., and Green, P. H., Distinguishing patients with celiac disease by quantitative analysis of videocapsule endoscopy images. Computer Methods and Programs in Biomedicine 100(1):39–48, 2010.
12. Ciaccio, E. J., Tennyson, C. A., Bhagat, G., Lewis, S. K., and Green, P. H. R., Robust spectral analysis of videocapsule images acquired from celiac disease patients. BioMedical Engineering OnLine 10:78, 2011.
13. Gadermayr, M., Wimmer, G., Uhl, A., Kogler, H., Vecsei, A., and Merhof, D., Fully-automated CNN-based computer aided celiac disease diagnosis. Image Analysis and Processing 6978:467–478, 2011.
14. Ciaccio, E. J., Tennyson, C. A., Bhagat, G., Lewis, S. K., and Green, P. H. R., Use of basis images for detection and classification of celiac disease. Bio-Medical Materials and Engineering 24:1913–1923, 2014.
15. Zhou, T., Han, G., Li, B. N., Lin, Z., Ciaccio, E. J., Green, P. H., and Qin, J., Quantitative analysis of patients with celiac disease by video capsule endoscopy: a deep learning method. Computers in Biology and Medicine 85:1–6, 2017.
16. Ciaccio, E. J., Bhagat, G., Lewis, S. K., and Green, P. H., Extraction and processing of videocapsule data to detect and measure the presence of villous atrophy in celiac disease patients. Computers in Biology and Medicine 78:97–106, 2016.
17. Ciaccio, E. J., Bhagat, G., Lewis, S. K., and Green, P. H., Recommendations to quantify villous atrophy in video capsule endoscopy images of celiac disease patients. World Journal of Gastrointestinal Endoscopy 8(18):653–662, 2016.
18. Winder, S., Hua, G., and Brown, M., Picking the best DAISY. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
19. Tola, E., Lepetit, V., and Fua, P., DAISY: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(5), 2010.
20. Shannon, C. E., A mathematical theory of communication. The Bell System Technical Journal 27(3):379–423, 1948.
21. Couceiro, M., and Ghamisi, P., Particle swarm optimization. In: Fractional Order Darwinian Particle Swarm Optimization: Applications and Evaluation of an Evolutionary Algorithm, pp. 1–10, 2016.
22. Lowe, D. G., Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2):91–110, 2004.
23. Acharya, U. R., Faust, O., Kadri, N. A., Suri, J. S., and Yu, W., Automated identification of normal and diabetes heart rate signals using nonlinear measures. Computers in Biology and Medicine 43(10):1523–1529, 2013.
24. Guyon, I., and Elisseeff, A., An introduction to variable and feature selection. Journal of Machine Learning Research 3:1157–1182, 2003.
25. Hua, J., Tembe, W. D., and Dougherty, E. R., Performance of feature-selection methods in the classification of high-dimension data. Pattern Recognition 42:409–424, 2009.
26. Kohavi, R., and John, G. H., Wrappers for feature subset selection. Artificial Intelligence 97(1–2):273–324, 1997.
27. Kennedy, J., and Eberhart, R., Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, WA, Australia, 1995.
28. Girdhar, A., Swarm intelligence and flocking behaviour. International Journal of Computer Applications (ISSN 0975-8887), 2015.
29. Cortes, C., and Vapnik, V., Support-vector networks. Machine Learning 20:273–297, 1995.
30. Duda, R. O., Hart, P. E., and Stork, D. G., Pattern Classification, 2nd edition. John Wiley and Sons, New York, 2001.
31. Tola, E., Lepetit, V., and Fua, P., A fast local descriptor for dense matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, USA, 2008.
32. Lee, J. G., Jun, S., Cho, Y. W., Lee, H., Kim, G. B., Seo, J. B., and Kim, N., Deep learning in medical imaging: General overview. Korean Journal of Radiology 18(4):570–584, 2017.
33. Faust, O., Hagiwara, Y., Tan, J. H., Oh, S. L., and Acharya, U. R., Deep learning for healthcare applications based on physiological signals: A review. Computer Methods and Programs in Biomedicine 161:1–13, 2018.
34. LeCun, Y., Bengio, Y., and Hinton, G., Deep learning. Nature 521(7553):436–444, 2015.
This article is part of the Topical Collection on Image & Signal Processing
Cite this article
Vicnesh, J., Wei, J.K.E., Ciaccio, E.J. et al. Automated diagnosis of celiac disease by video capsule endoscopy using DAISY Descriptors. J Med Syst 43, 157 (2019). https://doi.org/10.1007/s10916-019-1285-6
DOI: https://doi.org/10.1007/s10916-019-1285-6