Convolutional Neural Network Architectures for the Automated Diagnosis of Celiac Disease

Wimmer, G.; Hegenbart, S.; Vecsei, A.; Uhl, A.

doi:10.1007/978-3-319-54057-3_10

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10170)

Included in the following conference series:

International Workshop on Computer-Assisted and Robotic Endoscopy


Abstract

In this work, convolutional neural networks (CNNs) are applied to the computer-assisted diagnosis of celiac disease based on endoscopic images of the duodenum. To evaluate which network configurations are best suited for the classification of celiac disease, several CNNs were trained using different numbers of layers and filters and different filter dimensions. The results of the CNNs are compared with the results of popular general purpose image representations such as Improved Fisher Vectors and LBP-based methods as well as a feature representation especially designed for the classification of celiac disease. We show that the deeper CNN architectures outperform these comparison approaches and that combining CNNs with linear support vector machines further improves the classification rates by about 3–7%, leading to distinctly better results (up to 97%) than those of the comparison methods.

1 Introduction

Convolutional neural networks (CNNs) are gaining more and more interest in computer vision. The increase in computational power provided by GPUs has led to more sophisticated and deeper architectures, which have proven in various challenges to be the state of the art in image classification. Generally, thousands or millions of images are required as a data corpus to obtain deep architectures that generalize well. In endoscopic image classification, however, the amount of data available for training is often limited to a few hundred or a few thousand images, or even fewer. Another difference to datasets such as those used in ILSVRC or Places is that image classification problems in medical scenarios are often reduced to a few categories instead of thousands. Consequently, deep architectures designed for recognizing images from thousands of categories could be too complex for the classification of celiac disease.

CNNs are already widely used for computer-aided diagnosis in medical scenarios [10], but not yet for computer-aided diagnosis using endoscopic imagery. We found only three publications in this area, two about the classification of digestive organs using wireless capsule endoscopy images [19, 21] and one about lesion detection in endoscopic images [20]. Since the classification of celiac disease can be considered a texture classification problem and CNNs are state of the art in texture recognition, CNNs are promising image representations for the automated classification of celiac disease.

In this experimental study we apply CNNs to the classification of celiac disease using an experimental setup especially adapted for endoscopic imagery, and we try to answer the following open questions:

1. Are deep architectures suited to classifying celiac disease, or are simpler and shallower architectures better suited in such a scenario because of the small amount of training data and the small number of categories?
2. What are the best network configurations, e.g. the number of filters and their dimensions?
3. How well do CNNs perform compared to other state-of-the-art approaches?
4. Are linear support vector machines (SVMs) able to further improve the results when applied to the activations of the nets?

2 Celiac Disease

Celiac disease is a complex autoimmune disorder that occurs in genetically predisposed individuals of all age groups after the introduction of gluten-containing food. The gastrointestinal manifestations invariably comprise an inflammatory reaction within the mucosa of the small intestine caused by a dysregulated immune response triggered by ingested gluten protein. During the course of the disease, hyperplasia of the enteric crypts occurs and the mucosa eventually loses its absorptive villi, leading to a diminished ability to absorb nutrients. Fasano et al. [5] state that more than 2 million people in the United States, about one in 133, have the disease. People with untreated celiac disease are at risk of developing various complications such as osteoporosis, infertility and other autoimmune diseases including type 1 diabetes, autoimmune thyroid disease and autoimmune liver disease. An early diagnosis is therefore of the highest importance.

Endoscopy with biopsy is currently considered the gold standard for the diagnosis of celiac disease (CD). Computer-assisted systems for the diagnosis of CD have the potential to improve the whole diagnostic work-up by saving costs, time and manpower while at the same time increasing the safety of the procedure. A further motivation for such a system is that the inter-observer variability is reported to be high [1, 12]. A survey on computer-aided decision support for the diagnosis of celiac disease can be found in [9].

Besides standard upper endoscopy, several new endoscopic approaches for diagnosing CD have been evaluated and found their way into clinical practice [2]. The most notable techniques include the modified immersion technique (MIT [7]) under traditional white-light illumination (denoted as \(\text {WL}_{\text {MIT}}\)), as well as MIT under narrow band imaging [3, 17] (denoted as \(\text {NBI}_{\text {MIT}}\)). These specialized endoscopic techniques were specifically designed for improving the visual confirmation of CD during endoscopy.

In this work we differentiate between healthy mucosa and mucosa affected by celiac disease using images gathered by \(\text {NBI}_{\text {MIT}}\) as well as \(\text {WL}_{\text {MIT}}\) endoscopy. Examples of the two classes for both endoscopy types are shown in Fig. 1. In [6] it was shown that the choice of \(\text {NBI}_{\text {MIT}}\) or \(\text {WL}_{\text {MIT}}\) as imaging modality has a significant impact on the underlying feature distribution of general purpose image representations. However, it was also shown that systems trained on images from both modalities generalize well without requiring additional domain adaptation techniques and that combining both modalities improves the accuracies when the amount of training data is insufficient (as is probably the case for CNNs).

3 CNN Architectures

All our networks share the same basic architecture. They consist of a variable number of convolutional blocks (CONV) using rectified linear units (RELU) for non-linearity, local response normalization (LRN) [11] and max-pooling (POOL), followed by two fully connected blocks (FC) using RELU and dropout and a last fully connected block acting as soft-max classifier: [CONV, RELU, LRN, POOL]\(^n\) \(\rightarrow \) [FC, RELU, DROPOUT]\(^2\) \(\rightarrow \) [FC, SOFTMAXLOSS]. We vary only the number of convolutional blocks, the filter dimensions and the number of filters. To provide a systematic analysis, we trained networks with \(n=1, 2, 3\) and 4 convolutional blocks using different filter dimensions and different numbers of filters in each layer. We follow the general approach of employing large filter dimensions in lower layers and successively smaller filters in higher layers.
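The block pattern above can be made concrete with a small Python sketch. It only enumerates the layer names in order for a given number of convolutional blocks; it is not the original MatConvNet code, and the function name `layer_sequence` is a hypothetical helper introduced here for illustration.

```python
def layer_sequence(n_conv_blocks):
    """Return the ordered layer names for a net with n convolutional blocks."""
    if n_conv_blocks < 1:
        raise ValueError("need at least one convolutional block")
    layers = []
    # [CONV, RELU, LRN, POOL] repeated n times
    for _ in range(n_conv_blocks):
        layers += ["CONV", "RELU", "LRN", "POOL"]
    # [FC, RELU, DROPOUT] repeated twice
    for _ in range(2):
        layers += ["FC", "RELU", "DROPOUT"]
    # final fully connected block acting as soft-max classifier
    layers += ["FC", "SOFTMAXLOSS"]
    return layers


for n in (1, 2, 3, 4):
    print(n, " -> ".join(layer_sequence(n)))
```

Only the number of repetitions of the first block changes between the four architecture families; filter sizes and filter counts are separate parameters that this sketch does not model.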

A high number of filters per layer allows the training process to adapt to highly abstract features. However, it is unclear in the context of celiac disease and endoscopic imagery in general if such abstract features are visible or even useful for prediction. Consequently, we analyze the impact of the number of filters per layer by training multiple nets of the same architecture with varying numbers of filters. We generally rely on the concept of increasing the number of filters from the lower to the higher layers by a factor of two per layer.
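As a rough illustration of the doubling rule, the following Python sketch lists the number of filters per convolutional block when the count is doubled from block to block. The helper name `filters_per_block` is hypothetical, and the sketch assumes strict doubling; the exact per-layer counts of the evaluated nets are specified in Tables 1 to 4 and may deviate from this simple rule.

```python
def filters_per_block(first_layer_filters, n_conv_blocks):
    """Filter counts per convolutional block, doubling from one block to the next.

    Assumption: strict doubling. The tables of the paper define the actual values.
    """
    return [first_layer_filters * 2 ** i for i in range(n_conv_blocks)]


for n_first in (10, 48, 96):
    print(n_first, filters_per_block(n_first, 4))
```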

All our models are initialized and trained using the same set of techniques. The coefficients of the nets are randomly initialized following He et al. [8] and the bias terms are initialized to 0. All architectures use max-pooling with a window size of three and a stride of two. Stochastic gradient descent (SGD) with weight decay (\(\lambda = 0.0005\)) and momentum (\(\mu = 0.9\)) is used for the training of the models. Regularization is achieved using dropout (\(p = 0.5\)) during training. Training is performed on batches of 128 images each, which are randomly chosen from the training data for each iteration and subsequently augmented (see Sect. 4.1). The learning rate is initialized at 0.01 and is divided by three, up to four times, whenever the training loss stops improving with the current learning rate. For this, at every 250th iteration we compute the average loss of the previous 250 iterations. If the currently computed average loss is greater than 0.99 times the previously computed average loss and if the current learning rate has been in use for at least 1000 iterations, then the learning rate is divided by three. Due to the differing number of parameters among the architectures, optimization is continued until the training loss shows no improvement over 2500 iterations, but at least until the learning rate has been reduced for the fourth time. The model of the iteration achieving the lowest training loss is then used for validation.
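The learning rate rule described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the original training code: the class name `PlateauSchedule` and its attribute names are hypothetical, the 250-iteration windows are assumed to be non-overlapping, and the stopping criterion (no improvement over 2500 iterations) is left out.

```python
class PlateauSchedule:
    """Divide the learning rate by `factor` when the windowed training loss stalls."""

    def __init__(self, lr=0.01, factor=3.0, window=250,
                 min_iters_per_lr=1000, tolerance=0.99, max_reductions=4):
        self.lr = lr
        self.factor = factor
        self.window = window
        self.min_iters_per_lr = min_iters_per_lr
        self.tolerance = tolerance
        self.max_reductions = max_reductions
        self.reductions = 0
        self._window_losses = []
        self._prev_avg = None
        self._iters_at_lr = 0

    def step(self, loss):
        """Record one iteration's training loss and return the learning rate to use next."""
        self._window_losses.append(loss)
        self._iters_at_lr += 1
        if len(self._window_losses) == self.window:
            avg = sum(self._window_losses) / self.window
            self._window_losses = []
            # Stalled: the current average is greater than 0.99 times the previous one.
            stalled = self._prev_avg is not None and avg > self.tolerance * self._prev_avg
            if (stalled
                    and self._iters_at_lr >= self.min_iters_per_lr
                    and self.reductions < self.max_reductions):
                self.lr /= self.factor
                self.reductions += 1
                self._iters_at_lr = 0
            self._prev_avg = avg
        return self.lr


schedule = PlateauSchedule()
for _ in range(5000):
    schedule.step(1.0)  # a loss that never improves
print(schedule.reductions, schedule.lr)
```

With a constant loss, the rate is reduced once every 1000 iterations until the limit of four reductions is reached, which is the fastest decay the rule permits.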

Our learning rate configuration and stopping condition are specifically adapted to our celiac disease image data to achieve high results without needing too much time for training (the nets were trained for \({\approx }10000\) iterations on average). Since we train 36 different nets (4 (different numbers of convolutional blocks) \(\times \) 3 (different filter sizes) \(\times \) 3 (different filter numbers)) on 10 different training splits (see Sect. 4.1), we had to choose configurations that keep the training time per network limited.
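The size of this experiment grid can be enumerated with a short Python sketch. The filter size labels `"small"`, `"medium"` and `"large"` are placeholders chosen here for illustration, since the concrete filter dimensions are defined in the architecture tables.

```python
from itertools import product

conv_blocks = [1, 2, 3, 4]                    # Very-Shallow, Shallow, Deep, Very-Deep
filter_sizes = ["small", "medium", "large"]   # placeholder labels, see Tables 1 to 4
first_layer_filters = [10, 48, 96]
n_splits = 10

configs = list(product(conv_blocks, filter_sizes, first_layer_filters))
training_runs = len(configs) * n_splits

print(len(configs))    # number of distinct nets
print(training_runs)   # number of nets times number of training splits
```

Each of the 36 configurations is trained once per split, which gives 360 training runs in total and explains why the training time per run had to be kept limited.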

3.1 Very-Shallow Networks

We start off with a very uncommon variation of CNNs using only a single convolutional block. By analyzing architectures growing from very shallow to deep, we hope to gain some insight into the problem. Although this sort of architecture is quite uncommon and might not fit into the general CNN schemes, the lower abstraction of features in endoscopic images and the small number of categories (two) make it necessary to start with such shallow architectures. The Very-Shallow networks (see Table 1) are trained with \(N=10, 48\) and 96 filters to analyze the impact of the number of filters on the results.

Table 1. Architecture of the Very-Shallow networks. The first row in a convolutional block (CONV) specifies the receptive field size of the convolutional filters and their number (N). The second row indicates the stride (st.) and padding (pad). We also indicate the dimensionality of the fully connected (FC) blocks.

3.2 Shallow Networks

The next generation of architectures is based on the Very-Shallow networks, but the number of convolutional blocks is increased to two. As with the previous and the following, deeper network architectures, the network is trained with different numbers of filters (\(N=10, 48\) and 96 filters in the first convolutional layer). The network architecture of the Shallow nets is shown in Table 2.

Table 2. Architecture of the Shallow networks.

3.3 Deep Networks

The third generation of nets uses 3 convolutional blocks and can therefore be considered our first deep architecture. The network architecture of the Deep nets is shown in Table 3.

Table 3. Architecture of the Deep networks, where \(m_{b}^{a}=\max (a,b)\) denotes the number of convolutional filters.

3.4 Very-Deep Networks

In our last generation of nets we use 4 convolutional blocks (see Table 4). Although the term Very-Deep is not quite true considering the number of layers of other very-deep architectures, we use the term to easily distinguish between our four basic architectures.

Table 4. Architecture of the Very-Deep networks, where \(m_{b}^{a}=\max (a,b)\).

4 Experimental Setup and Results

4.1 Experimental Setup

Our celiac disease image database consists of 1661 RGB image patches of size \(128\times 128\) pixels that are gathered by means of flexible endoscopes using \(\text {NBI}_{\text {MIT}}\) as well as \(\text {WL}_{\text {MIT}}\). The database consists of 1045 images gathered by \(\text {WL}_{\text {MIT}}\) endoscopy (587 healthy images and 458 affected by celiac disease) and 616 images gathered by \(\text {NBI}_{\text {MIT}}\) endoscopy (399 healthy images and 217 affected by celiac disease). So in total 986 image patches show healthy mucosa and the remaining 675 image patches show mucosa affected by celiac disease. The images were captured from 353 patients.
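The counts given above can be cross-checked with a few lines of Python. The dictionary layout and the helper `total` are illustrative choices made here; the numbers are taken directly from the text.

```python
counts = {
    ("WL_MIT", "healthy"): 587,
    ("WL_MIT", "celiac"): 458,
    ("NBI_MIT", "healthy"): 399,
    ("NBI_MIT", "celiac"): 217,
}


def total(modality=None, label=None):
    """Sum the image counts, optionally restricted to a modality and/or a class."""
    return sum(n for (m, l), n in counts.items()
               if modality in (None, m) and label in (None, l))


print(total())                    # all image patches
print(total(modality="WL_MIT"))   # white-light images
print(total(modality="NBI_MIT"))  # narrow band imaging images
print(total(label="healthy"), total(label="celiac"))
print(round(100 * total(label="healthy") / total(), 1))  # share of the majority class in percent
```

The share of the majority class (about 59.4%) is a useful reference point when reading the classification rates: a classifier that always predicts healthy mucosa would already reach this rate on the full database.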

Due to the relatively small amount of data, we perform cross-validation to achieve a stable estimation of the generalization error. We generated 10 (fixed) splits for training and validation (80% training and 20% validation) and took care that images of a single patient never appear in both the training and the validation set of a split. All nets are trained using the training portion of our data corpus. The final validation was performed on the left-out part.
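A patient-wise split of this kind could be generated as in the following Python sketch. It is an assumption-laden illustration: the function name `patient_wise_split` is hypothetical, and the 80/20 ratio is applied to patients here, so the ratio in terms of images is only approximate. How the original splits were drawn is not specified beyond the constraints stated above.

```python
import random


def patient_wise_split(patient_ids, train_fraction=0.8, seed=0):
    """Split image indices so that all images of one patient land on the same side.

    patient_ids[i] is the patient identifier of image i.
    """
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    train = [i for i, p in enumerate(patient_ids) if p in train_patients]
    val = [i for i, p in enumerate(patient_ids) if p not in train_patients]
    return train, val


# Toy example: 50 images from 10 hypothetical patients, 10 fixed splits.
toy_patient_ids = [i % 10 for i in range(50)]
splits = [patient_wise_split(toy_patient_ids, seed=s) for s in range(10)]
train, val = splits[0]
print(len(train), len(val))
```

Fixing the seed per split keeps the 10 splits reproducible, so that every network configuration is evaluated on exactly the same partitions.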

The image data is normalized by subtracting the mean image of the training portion. We then linearly scale each image to the range \([-1,1]\). Due to the small amount of available data, we use data augmentation to increase the number of images for training. Augmentation is applied to the batches of images extracted for training. The augmentation is based on cropping one sub-image (\(112\times 112\) pixels) at a randomly chosen position from each training image. Subsequently, the sub-image is randomly rotated (0\(^{\circ }\), 90\(^{\circ }\), 180\(^{\circ }\) or 270\(^{\circ }\)) and randomly either horizontally reflected or not. Validation is performed using majority voting over five crops of the validation image: the upper left, upper right, lower left, lower right and center part.
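The training augmentation and the five-crop voting described above might look as follows in NumPy. This is a sketch, not the original MatConvNet pipeline: the function names `augment`, `five_crops` and `majority_vote` are hypothetical, images are assumed to be arrays of shape height × width × channels, and the normalization step is omitted.

```python
import numpy as np


def augment(image, rng, crop=112):
    """Random crop, random rotation by a multiple of 90 degrees, random horizontal flip."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    patch = np.rot90(patch, k=int(rng.integers(0, 4)))  # 0, 90, 180 or 270 degrees
    if rng.integers(0, 2):
        patch = patch[:, ::-1]                          # horizontal reflection
    return patch


def five_crops(image, crop=112):
    """Upper left, upper right, lower left, lower right and center crop."""
    h, w = image.shape[:2]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               ((h - crop) // 2, (w - crop) // 2)]
    return [image[t:t + crop, l:l + crop] for t, l in offsets]


def majority_vote(labels):
    """Return the label predicted most often among the crops."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]


rng = np.random.default_rng(0)
image = rng.random((128, 128, 3))
print(augment(image, rng).shape)
print(len(five_crops(image)))
print(majority_vote([1, 1, 0, 1, 0]))
```

With two classes and five crops a tie cannot occur, so the vote is always well defined.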

In our experiments, we compute the overall classification rate (OCR) for each split and report the mean OCR over all 10 splits with the respective standard deviation.
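The reported statistic can be computed as in the following Python sketch. The function names are hypothetical, and the sketch assumes the sample standard deviation (`statistics.stdev`); the text does not state whether the sample or the population standard deviation was used.

```python
from statistics import mean, stdev


def overall_classification_rate(y_true, y_pred):
    """Percentage of correctly classified images in one validation split."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def summarize(ocr_per_split):
    """Mean OCR over all splits and its (sample) standard deviation."""
    return mean(ocr_per_split), stdev(ocr_per_split)


# Toy example with made-up labels, only to show the call pattern.
print(overall_classification_rate([0, 1, 1, 0], [0, 1, 0, 0]))
print(summarize([90.0, 92.0, 91.0]))
```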

The CNNs are implemented using the MatConvNet framework [18]. In addition to the CNN soft-max classifier, we employ linear SVMs as provided by the LIBLINEAR library [4]. For this, the training and test samples are fed through the CNNs and the output of the second fully connected layer is extracted as the feature vector for SVM classification. The size of the extracted feature vector per image is \(1024\times 1\) in case of the Very-Deep architectures and \(512\times 1\) for the other architectures. Augmentation is also applied when extracting features from the nets for SVM classification. It is basically the same as for the training of the nets, with one difference: the patches of the training images are extracted from the fixed center position instead of from random positions (8 patches per image: 4 different rotations, each either horizontally flipped or not). The SVM cost factor (C) is found using cross-validation on the training data.
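The eight deterministic patches per training image used for SVM feature extraction can be sketched in NumPy as follows. The function name `center_patches` is hypothetical; the sketch covers only the patch generation, not the forward pass through the net or the SVM training.

```python
import numpy as np


def center_patches(image, crop=112):
    """Center crop in 4 rotations, each with and without horizontal flip (8 patches)."""
    h, w = image.shape[:2]
    top, left = (h - crop) // 2, (w - crop) // 2
    center = image[top:top + crop, left:left + crop]
    patches = []
    for k in range(4):                    # 0, 90, 180 and 270 degrees
        rotated = np.rot90(center, k=k)
        patches.append(rotated)
        patches.append(rotated[:, ::-1])  # horizontally flipped version
    return patches


image = np.arange(128 * 128 * 3).reshape(128, 128, 3)
patches = center_patches(image)
print(len(patches), patches[0].shape)
```

Each patch would then be fed through the trained net, and the activations of the second fully connected layer would serve as one training sample for the linear SVM.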

Additionally, we combine CNNs, principal component analysis (PCA) and SVMs by applying PCA to the CNN features, resulting in 100 principal components, which are then classified using SVMs.
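The PCA step can be illustrated with a NumPy sketch based on the singular value decomposition. The function names `fit_pca` and `apply_pca` are hypothetical, the random matrix merely stands in for real CNN features, and fitting the projection on the training features only is an assumption made here.

```python
import numpy as np


def fit_pca(train_features, n_components=100):
    """Estimate the mean and the leading principal directions from training features."""
    mean = train_features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train_features - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(features, mean, components):
    """Project features onto the principal directions."""
    return (features - mean) @ components.T


rng = np.random.default_rng(0)
train_features = rng.normal(size=(300, 512))  # stand-in for 512-dimensional CNN features
mean, components = fit_pca(train_features)
reduced = apply_pca(train_features, mean, components)
print(reduced.shape)
```

The reduced vectors would replace the raw CNN features as input to the linear SVM.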

We compare the CNNs against three popular general purpose image representations and one feature representation especially developed for the classification of celiac disease. As general purpose image representations we use multi-resolution local binary patterns (LBP [13]) and multi-resolution local ternary patterns (LTP [15]), both with 3 scales, 8 neighbors and uniform patterns. As the third general purpose method we employ improved Fisher vectors (IFV [14]) computed from SIFT descriptors on a dense \(6\times 6\) pixel grid. The fourth method, denoted in the following as the fractal analysis based method (FRAC [16]), was especially developed for the classification of celiac disease and is based on pre-filtering images using the rotation invariant MR8 filterbank, followed by computing the local fractal dimension (see [16]) of the resulting filter responses and applying the bag-of-visual-words (BoW) approach to them. We rely on in-house MATLAB implementations for LBP, LTP and FRAC and use the implementation of IFV as provided by VLFeat. The comparison methods are classified using SVMs in a manner analogous to the CNN features.
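To make the notion of uniform patterns with 8 neighbors concrete, the following Python sketch computes a basic LBP code and tests it for uniformity. It uses the common convention of thresholding neighbors against the center pixel with `>=`; this is an assumption, since the in-house implementation used for the comparison is not described in detail, and the function names are hypothetical.

```python
def lbp_code(neighbors, center):
    """8-bit LBP code: bit i is set when neighbor i is at least as bright as the center."""
    return sum(1 << i for i, value in enumerate(neighbors) if value >= center)


def is_uniform(code, bits=8):
    """A pattern is uniform if its circular bit string has at most two 0/1 transitions."""
    transitions = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1
        transitions += a != b
    return transitions <= 2


code = lbp_code([5, 5, 5, 5, 1, 1, 1, 1], center=3)
print(code, is_uniform(code))
print(sum(is_uniform(c) for c in range(256)))  # number of uniform 8-bit patterns
```

With 8 neighbors there are 58 uniform patterns; all remaining patterns are usually collected in a single additional histogram bin, which gives 59 bins per scale.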

4.2 Results

The results of our experiments are presented in Table 5. The standard deviations are given in brackets. The best result of each network architecture and classification strategy is given in bold face numbers.

Table 5. Results of the CNNs and comparison methods

As can be seen in Table 5, the highest CNN results are achieved using the Deep and Very-Deep network architectures combined with large or medium sized filters. Using only 10 filters in the first convolutional layer is insufficient for the classification of celiac disease, whereas using 48 filters achieves results similar to using 96. The two deeper CNN architectures with large or medium sized filters achieve classification rates of \({\approx }90\%\) and hence outperform the comparison methods, whose highest classification rate is 89.5% (LTP). Combining CNNs and SVMs further improves the results by about 3–7%. Additionally applying PCA to the CNN features has only a minimal effect on the results. The best results (\({\approx }97\%\)) are achieved using SVM classification (with or without PCA) applied to the CNN features of the Very-Deep net with 96 filters of size \(11 \times 11 \times 3\) in the first convolutional layer.

5 Conclusion

In this work we showed that deep CNN architectures are well suited for the classification of celiac disease based on endoscopic image data. These CNNs outperform other state-of-the-art image representation approaches. Simpler and shallower architectures cannot compete with the deeper architectures. Using large or medium filter dimensions generally leads to higher results than using smaller filter dimensions.

Applying SVMs to the activations of the nets further improves the results of the CNNs by about 3–7%, up to a maximum of \({\approx }97\%\). The highest result was achieved using SVM classification, the deepest architecture (Very-Deep), the largest filter dimension and the highest number of filters (96 filters of size \(11\times 11\times 3\) in the first convolutional layer).

References

1. Biagi, F., Rondonotti, E., Campanella, J., Villa, F., Bianchi, P.I., Klersy, C., Franchis, R.D., Corazza, G.R.: Video capsule endoscopy and histology for small-bowel mucosa evaluation: a comparison performed by blinded observers. Clin. Gastroenterol. Hepatol. 4(8), 998–1003 (2006)
2. Chand, N., Mihas, A.A.: Celiac disease: current concepts in diagnosis and treatment. J. Clin. Gastroenterol. 40(1), 3–14 (2006)
3. Emura, F., Saito, Y.: Narrow-band imaging optical chromocolonoscopy: advantages and limitations. World J. Gastroenterol. 14(31), 4867–4872 (2008)
4. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
5. Fasano, A., Berti, I., Gerarduzzi, T., Not, T., Colletti, R.B., Drago, S., Elitsur, Y., Green, P.H.R., Guandalini, S., Hill, I.D., Pietzak, M., Ventura, A., Thorpe, M., Kryszak, D., Fornaroli, F., Wasserman, S.S., Murray, J.A., Horvath, K.: Prevalence of celiac disease in at-risk and not-at-risk groups in the united states: a large multicenter study. Arch. Intern. Med. 163, 286–292 (2003)
6. Gadermayr, M., Hegenbart, S., Kwitt, R., Uhl, A.: Narrow band imaging versus white-light: what is best for computer-assisted diagnosis of celiac disease? In: Proceedings of the 13th IEEE International Symposium on Biomedical Imaging (ISBI 2016), pp. 355–359, April 2016
7. Gasbarrini, A., Ojetti, V., Cuoco, L., Cammarota, G., Migneco, A., Armuzzi, A., Pola, P., Gasbarrini, G.: Lack of endoscopic visualization of intestinal villi with the immersion technique in overt atrophic celiac disease. Gastrointest. Endosc. 57, 348–351 (2003)
8. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: CoRR (2015)
9. Hegenbart, S., Uhl, A., Vécsei, A.: Survey on computer aided decision support for diagnosis of celiac disease. Comput. Biol. Med. 65, 348–358 (2015)
10. Jiang, J., Trundle, P., Ren, J.: Medical image analysis with artificial neural networks. Comput. Med. Imaging Graph. 34(8), 617–631 (2010)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
12. Niveloni, S., Fiorini, A., Dezi, R., Pedreira, S., Smecuol, E., Vazquez, H., Cabanne, A., Boerr, L.A., Valero, J., Kogan, Z., Maurino, E., Bai, J.C.: Usefulness of videoduodenoscopy and vital dye staining as indicators of mucosal atrophy of celiac disease: assessment of interobserver agreement. Gastrointest. Endosc. 47(3), 223–229 (1998)
13. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29(1), 51–59 (1996)
14. Perronnin, F., Liu, Y., Sanchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. In: Proceedings of CVPR 2010, pp. 3384–3391 (2010)
15. Tan, X., Triggs, B.: Enhanced local texture feature sets for face recognition under difficult lighting conditions. In: Zhou, S.K., Zhao, W., Tang, X., Gong, S. (eds.) AMFG 2007. LNCS, vol. 4778, pp. 168–182. Springer, Heidelberg (2007). doi:10.1007/978-3-540-75690-3_13
16. Uhl, A., Vécsei, A., Wimmer, G.: Fractal analysis for the viewpoint invariant classification of celiac disease. In: Proceedings of the 7th International Symposium on Image and Signal Processing (ISPA 2011), Dubrovnik, Croatia, pp. 727–732, September 2011
17. Valitutti, F., Oliva, S., Iorfida, D., Aloi, M., Gatti, S., Trovato, C.M., Montuori, M., Tiberti, A., Cucchiara, S., Di Nardo, G.: Narrow band imaging combined with water immersion technique in the diagnosis of celiac disease. Dig. Liver Dis. 46(12), 1099–1102 (2014)
18. Vedaldi, A., Lenc, K.: Matconvnet - convolutional neural networks for matlab. In: Proceeding of the ACM International Conference on Multimedia, pp. 689–692 (2015)
19. Yu, J., Chen, J., Xiang, Z.Q., Zou, Y.X.: A hybrid convolutional neural networks with extreme learning machine for wce image classification. In: IEEE International Conference on Robotics and Biomimetics (ROBIO) 2015, pp. 1822–1827, December 2015
20. Zhu, R., Zhang, R., Xue, D.: Lesion detection of endoscopy images based on convolutional neural network features. In: 8th International Congress on Image and Signal Processing (CISP), pp. 372–376, October 2015
21. Zou, Y., Li, L., Wang, Y., Yu, J., Li, Y., Deng, W.J.: Classifying digestive organs in wireless capsule endoscopy images based on deep convolutional neural network. IEEE Int. Conf. Digit. Signal Process. (DSP) 2015, 1274–1278 (2015)

Author information

Authors and Affiliations

Department of Computer Sciences, University of Salzburg, Salzburg, Austria
G. Wimmer, S. Hegenbart & A. Uhl
Department Pediatrics, St. Anna Children’s Hospital, Vienna, Austria
A. Vecsei

Corresponding author

Correspondence to G. Wimmer.

Editor information

Editors and Affiliations

Robarts Research Institute, London, Ontario, Canada
Terry Peters
Imperial College London, London, United Kingdom
Guang-Zhong Yang
Johns Hopkins University, Baltimore, Maryland, USA
Nassir Navab
Graduate School of Information Science, Nagoya University, Nagoya, Japan
Kensaku Mori
Department of Computer Science, Xiamen University, Xiamen, China
Xiongbiao Luo
KUKA Robotics, Augsburg, Bayern, Germany
Tobias Reichl
Robarts Research Institute, Western University, London, Canada
Jonathan McLeod

Cite this paper

Wimmer, G., Hegenbart, S., Vecsei, A., Uhl, A. (2017). Convolutional Neural Network Architectures for the Automated Diagnosis of Celiac Disease. In: Peters, T., et al. Computer-Assisted and Robotic Endoscopy. CARE 2016. Lecture Notes in Computer Science, vol 10170. Springer, Cham. https://doi.org/10.1007/978-3-319-54057-3_10

DOI: https://doi.org/10.1007/978-3-319-54057-3_10
Published: 22 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54056-6
Online ISBN: 978-3-319-54057-3
eBook Packages: Computer Science, Computer Science (R0)
