1 Introduction

Glaucoma is a complex disease that damages the optic nerve and can cause irreversible blindness when it is not diagnosed properly. This “sneak thief of sight” can affect anyone at any age, including newborn babies. According to doctors, it progresses in such a way that the patient does not notice any complication. By 2040, the number of people affected by glaucoma is likely to increase to 111.8 million [2]. The optic nerve carries visual information from both eyes to the brain. The eyes continuously produce aqueous humor, which fills the front part of the eye. If the drainage channels of the aqueous humor are blocked, the intraocular pressure (IOP) increases and the optic nerve may become permanently damaged. There are five major types of glaucoma: open-angle glaucoma, angle-closure glaucoma, congenital glaucoma, normal-tension glaucoma and secondary glaucoma. To diagnose glaucoma, an ophthalmologist needs to perform a comprehensive eye examination, including tonometry, gonioscopy, ophthalmoscopy, nerve fiber analysis and perimetry. These diagnostic procedures are expensive and time consuming. To address this problem, we propose a CNN architecture for glaucoma detection.

Glaucoma is one of the most dangerous causes of blindness. Patients sometimes have no symptoms, and vision may remain 6/6 until a late stage; there are no specific symptoms during the early stage. Some patients may never have increased intraocular pressure. Conversely, some patients with high intraocular pressure are not diagnosed with glaucoma, a condition called ocular hypertension. Early detection of glaucoma, combined with immediate treatment, has been shown to prevent major problems.

Vision loss caused by glaucoma cannot be reversed by treatment; even surgery cannot restore it. In the USA, blindness is the third most formidable health complication after cardiac attack and cancer. Only better awareness can prevent permanent visual disability. In this era of artificial intelligence, automated health care systems are capable of identifying diseases within a short period. To serve the medical community, a deep learning algorithm can help to detect glaucoma, making the diagnostic workflow faster than the conventional one, so that affected people can get proper treatment during the first stage of the disease. The resulting blindness is preventable if glaucoma is diagnosed early and effective treatment is provided. That is what motivated us to conduct this thesis.

2 Problem Statement

The eyes are important sensory organs that provide sight. The parts of the eye include the cornea, sclera, choroid, iris, pupil, lens, ciliary muscle, suspensory ligament, conjunctiva, anterior chamber (between cornea and iris), posterior chamber (between iris and lens), macula, vitreous humour, aqueous humour, hyaloid canal, retina, optic nerve, optic disc, blood vessels and fovea.

Glaucoma is an eye disease that damages the optic nerve and causes vision loss. The optic nerve carries visual information from the eye to the brain. The optic nerve head, called the optic disc, connects the retina and the optic nerve. The center of the optic disc is called the optic cup. When the optic cup enlarges and occupies more of the optic disc area, the cup-to-disc ratio (CDR) increases. When the CDR is above the normal range, the eye is suspected to be glaucomatous. Doctors need to perform many tests, such as an ophthalmic test, tonometry, ophthalmoscopy, perimetry, pachymetry and gonioscopy. After getting the results of the different tests, the doctor has to decide whether the eye is glaucomatous or not. Careful evaluation is important for detecting glaucoma, and there is a high chance of an inaccurate result due to lack of skill. This work proposes an efficient method for detecting glaucoma that reduces both time and cost [25], in order to assist ophthalmologists and optometrists.
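To make the cup-to-disc ratio concrete, the following is a minimal sketch that computes an area-based CDR from two binary segmentation masks. It is only an illustration under stated assumptions: the function name `cup_to_disc_ratio` and the toy masks are hypothetical, and the area-based definition follows the wording of the paragraph above, whereas clinical practice often uses the ratio of vertical diameters instead. It is not part of the classification pipeline proposed in this work.

```python
import numpy as np


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Area-based CDR: number of cup pixels divided by number of disc pixels.

    Assumption: both masks are binary arrays of the same shape, and the
    cup lies inside the disc.
    """
    cup_area = int(np.count_nonzero(cup_mask))
    disc_area = int(np.count_nonzero(disc_mask))
    if disc_area == 0:
        raise ValueError("disc mask is empty")
    return cup_area / disc_area


# Toy 10x10 masks (hypothetical data, not taken from any dataset in this paper).
disc = np.zeros((10, 10), dtype=bool)
disc[2:8, 2:8] = True   # 6 x 6 = 36 disc pixels
cup = np.zeros((10, 10), dtype=bool)
cup[4:7, 4:7] = True    # 3 x 3 = 9 cup pixels

cdr = cup_to_disc_ratio(cup, disc)  # 9 / 36
```

An enlarged cup inside the same disc would raise this value, which is the behaviour the paragraph describes.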

  • An automated system for glaucoma diagnosis.

  • Applied augmentation techniques to obtain varied images.

  • Used a large amount of image data.

  • Collected images from different available datasets.

  • Compared with other popular CNN methods.

3 Literature Review

O. J. Afolabi et al. [5] introduced a redesigned U-Net model named U-Net Lite together with an XGB (extreme gradient boost) algorithm. On RIM-ONE v3 and Drishti-GS, the extreme gradient boost algorithm achieved an accuracy of 88.6 and an AUC-ROC value of 93.6. P. K. Chaudhary and R. B. Pachori [10] proposed order-zero and order-one 2D-FBSE-EWT (two-dimensional Fourier-Bessel series expansion based empirical wavelet transform) methods at quarter, half and full frequency scales, which are used to decompose a fundus image into meaningful sub-images. From the obtained sub-images, two methods are studied for detection: a conventional ML-based method and an ensemble ResNet50-based method.

S. Pathan et al. [23] used image processing methods to define an automated framework for computer-aided diagnosis (CAD) of glaucoma. The pre-processing algorithm identifies and excludes blood vessels for effective optic disc (OD) and optic cup (OC) segmentation. A decision tree classifier and a circle finder approach enable robust OD segmentation. The proposed OC segmentation method aims to enhance the OC region by creating a new channel, because the variability between the pixels of the OD and OC is low. The threshold values obtained for the segmentation algorithms are not limited to a single dataset. Feature extraction requires domain knowledge of glaucoma, such as the CDR and NRR area, as well as statistical color and texture features. The classifiers used are SVM, ANN and an ensemble of AdaBoost classifiers with dynamic selection methods, which identify whether a fundus image is affected or not. Ten-fold cross validation is also performed for all three classifiers.

Mufti Mahmud et al. [20] stated that rapid advances in hardware-based technologies over recent years have opened up additional opportunities for life scientists to gather multimodal data in applications such as omics, clinical imaging, bioimaging and (brain/body)-machine interfaces, which in turn have created new opportunities for developing dedicated data-intensive AI techniques. In particular, recent research in reinforcement learning, deep learning and their combination promises to advance the future of AI. Mufti Mahmud et al. also discussed different CNN architectures [19] and stated that biological data from various application domains is multimodal, multidimensional and complex in nature. The authors added that “Currently, a massive amount of such data is publicly available”. The availability of these data brought a significant challenge in analyzing them and recognizing patterns in them, which necessitated the use of sophisticated machine learning [21] tools.

Saxena et al. [28] proposed a CNN-based architecture that differentiates between glaucoma and non-glaucoma patterns. The complete model comprises six layers. The authors used ROI extraction, dropout and data augmentation for preprocessing the data. For the experiments, they used the SCES and ORIGA datasets and obtained values of 0.822 and 0.882 for ORIGA and SCES, respectively.

Palakvangsa-Na-Ayudhya et al. [22] proposed an automated system using a Mask Region-based Convolutional Neural Network (Mask R-CNN) [32]. Mask R-CNN extends Faster R-CNN by adding a branch that predicts segmentation masks on each ROI alongside the existing branch for object classification and bounding box regression. This automatic screening system calculates the CDR. They used four datasets, Drishti-GS1 and RIM-ONE (r1, r2, r3), both individually and combined. On the individual datasets they obtained, for 50 epochs, RIM-ONE r3 0.66, Drishti-GS1 0.73, RIM-ONE r1 0.74 and RIM-ONE r2 0.78, and for 100 epochs, RIM-ONE r3 0.68, Drishti-GS1 0.75, RIM-ONE r1 0.75 and RIM-ONE r2 0.85. On the combined dataset, with computation times of 8 h, 4 h and 2 h they obtained 0.68 (400 epochs), 0.71 (200 epochs) and 0.64 (100 epochs), respectively. They then set the number of epochs to 200 with 10-fold cross validation and achieved an accuracy of 0.78.

Pinos-Velez et al. [24] diagnosed glaucoma using the ISNT rule. In a normal eye the CDR is below 0.3. The ISNT rule was used to measure the width of the retinal rim.

M. Juneja et al. [17] proposed a deep learning [18] approach, the disc cup segmentation glaucoma network (DC-GNet). This segmentation network extracts the CDR, DDLS and ISNT features from fundus images. The input images to the CNN model were cropped to 512 × 512 pixels and resized to 256 × 256 pixels. The network has 28 layers, comprising pooling, dropout, 2D convolutional and upsampling layers. Disc segmentation achieved an accuracy of 0.937 (Drishti dataset) and 0.996 (RIM-ONE dataset), and cup segmentation achieved an accuracy of 0.900 (Drishti dataset) and 0.978 (RIM-ONE dataset).

Debasree Sarkar and Soumen Das [27] proposed a method that uses a median filter for noise reduction. Thresholding is applied to extract the OD and OC. Using the RIM-ONE dataset they obtained an accuracy of 97.58. A. Serener and S. Serte [29] proposed a system that detects early and advanced glaucoma automatically. They applied the ResNet50 and GoogLeNet algorithms and obtained accuracies of 79 and 83, respectively.

4 Method

As shown in Fig. 1, after collecting the fundus images we divided them into two sets: training and testing images. We trained our model after applying augmentation techniques. During training, we set aside 600 of the training images to validate the model. We then evaluated it on the test images.
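The hold-out step described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in this work: the helper `hold_out_validation`, the file names and the total of 5000 training images are hypothetical (the real counts are those of Table 1), and the sketch assumes the 600 validation images are drawn at random from the original training images.

```python
import random


def hold_out_validation(train_paths, n_val=600, seed=0):
    """Split a list of training image paths into (train, validation) parts.

    Assumption: validation images are sampled uniformly at random,
    and the two returned lists do not overlap.
    """
    if n_val >= len(train_paths):
        raise ValueError("need more training images than validation images")
    shuffled = list(train_paths)           # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names and count, for illustration only.
all_train = [f"fundus_{i:05d}.png" for i in range(5000)]
train_paths, val_paths = hold_out_validation(all_train)
```

Fixing the seed keeps the validation subset identical across runs, which makes comparisons between models easier to interpret.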

Fig. 1. System architecture.

4.1 Data Collection

For this work, we collected images from the ACRIMA dataset [1], the LAG dataset [3] and the Glaucoma dataset, and combined them.

Table 1. Dataset details.

4.2 Data Augmentation

Data augmentation is a process that increases the diversity of the training data without gathering new data. It acts as a regularizer, improves the performance of the model [14] and helps to avoid over-fitting. The neural network treats augmented images as distinct images. The Keras deep learning library provides data augmentation facilities, and we augmented our data using its ImageDataGenerator class. We applied rotation, width shift, height shift, zooming, shear, channel shift and horizontal flip. Applying these augmentation techniques [30] generated more image data. We used data augmentation [12] only on the training dataset; to evaluate the model we used original images rather than augmented ones.

Shifting an image means moving all pixels in one direction. Two types of shift are possible: width shift and height shift. Shifting changes the position of an object. Flipping an image means reversing the columns or rows of pixels for a horizontal or vertical flip, respectively; it mirrors the object left to right or top to bottom. Rotation turns an image clockwise or anticlockwise by an angle between 0 and 360\(^{\circ }\). Zooming either zooms in on or zooms out of an image: a value less than 1 zooms in, a value greater than 1 zooms out, and a value equal to 1 has no effect. Shearing an image means slanting it into a parallelogram shape while one axis remains fixed. In channel shift, the RGB channel values are shifted randomly.
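Three of the transformations described above can be illustrated with plain NumPy. This is a minimal sketch for intuition only; it is not the Keras ImageDataGenerator implementation used in this work. The function names, the zero fill for vacated pixels, the 8-bit value range and the tiny test image are all assumptions made for the example.

```python
import numpy as np


def horizontal_flip(img):
    """Reverse the column order of an (H, W, C) image (left-right mirror)."""
    return img[:, ::-1, :]


def width_shift(img, pixels):
    """Move all pixels right by `pixels` columns (left if negative).

    Assumption: vacated columns are filled with zeros.
    """
    w = img.shape[1]
    pixels = max(-w, min(w, pixels))  # clamp so the slices stay valid
    out = np.zeros_like(img)
    if pixels > 0:
        out[:, pixels:, :] = img[:, :w - pixels, :]
    elif pixels < 0:
        out[:, :w + pixels, :] = img[:, -pixels:, :]
    else:
        out[:] = img
    return out


def channel_shift(img, delta):
    """Add an offset to every channel value, clipped to the 8-bit range."""
    shifted = img.astype(np.int16) + delta
    return np.clip(shifted, 0, 255).astype(np.uint8)


# Tiny 2 x 4 RGB image with distinct values (hypothetical data).
img = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)

flipped = horizontal_flip(img)
shifted = width_shift(img, 1)

# A random offset, as in the "shifted randomly" description above.
rng = np.random.default_rng(0)
delta = int(rng.integers(-30, 31))
recolored = channel_shift(img, delta)
```

Each function returns an array of the same shape as its input, so the augmented images can be fed to the network exactly like the originals.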

Fig. 2. Augmented images.

Figure 2 shows an original fundus image and images after applying flipping, shearing, rotating, shifting, zooming and channel shifting on the original image.

4.3 Inception V3

Inception V3 [31] is a pre-trained CNN model [9]. It is computationally efficient, is designed to use less computational power, and acts as a multi-level feature extractor. The Inception V3 model is a collection of symmetric and asymmetric building blocks, including convolution, max pooling, average pooling, dropout, concatenation and fully connected layers. The loss is computed using Softmax. A schematic diagram [16] is given in Fig. 3. We collected images according to Table 1 and then applied the augmentation techniques [7] of Sect. 4.2 to obtain varied fundus images. We trained the Inception V3 model on the augmented training dataset. Our model has 312 layers in total: 1 input layer, 94 Conv2D layers, 94 batch normalization layers, 94 activation layers, 11 mixed layers, 8 average pooling layers, 4 max pooling layers, 2 concatenate layers, 3 global average pooling layers and 1 dense layer. We evaluated our model on the test fundus images. Finally, the model is able to classify a fundus image as normal or glaucomatous.

Fig. 3. Schematic diagram of Inception V3.

5 Results

5.1 Evaluation Criteria

There are different performance metrics [13] for evaluating a model. In this work we use the confusion matrix, accuracy, precision, recall, specificity and F1 score to evaluate performance. The confusion matrix gives a clear picture of the true positives, false positives, true negatives and false negatives.

  • True Negative (TN): When the actual value was negative and predicted negative.

  • True Positive (TP): When the actual value was positive and predicted positive.

  • False Negative (FN): When the actual value was positive but predicted negative.

  • False Positive (FP): When the actual value was negative but predicted positive.

$$\begin{aligned} Accuracy = (TP+TN)/(TP+TN+FP+FN) \end{aligned}$$
(1)
$$\begin{aligned} Specificity = TN/(TN+FP) \end{aligned}$$
(2)
$$\begin{aligned} Recall= TP/(TP+FN) \end{aligned}$$
(3)
$$\begin{aligned} Precision= TP/(TP+FP) \end{aligned}$$
(4)
$$\begin{aligned} F1= 2TP/ (2TP+FP+FN) \end{aligned}$$
(5)
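Equations (1) to (5) translate directly into code. The following minimal sketch computes all five metrics from the four confusion-matrix counts; the function name `classification_metrics` and the example counts are hypothetical and are not results from this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (1)-(5) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (1)
        "specificity": tn / (tn + fp),                # Eq. (2)
        "recall": tp / (tp + fn),                     # Eq. (3)
        "precision": tp / (tp + fp),                  # Eq. (4)
        "f1": 2 * tp / (2 * tp + fp + fn),            # Eq. (5)
    }


# Hypothetical counts, for illustration only.
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

With these counts the accuracy is 170/200 = 0.85, the specificity 90/100 = 0.90, the recall 80/100 = 0.80, the precision 80/90 ≈ 0.889 and the F1 score 160/190 ≈ 0.842.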

5.2 Comparison of Different Types of CNN Model

Fig. 4. AUC curves: (a) Inception V3, (b) DenseNet121, (c) ResNet50.

The area under the curve (AUC) shown in Fig. 4 measures the capability of a classifier to separate the classes. The higher the AUC, the better the classifier performs. The AUC lies between 0 and 1 and is an important evaluation criterion. Fig. 4(a) shows that Inception V3 has the highest AUC value, 0.9387.
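One way to see why the AUC measures class separation is its pairwise interpretation: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below implements this directly; the function name `roc_auc` and the labels and scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Assumption: labels are 1 (glaucoma) or 0 (normal); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted glaucoma probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

A classifier that ranks every glaucomatous image above every normal image reaches an AUC of 1, while random scores give a value near 0.5.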

Table 2. Performance of various models.

Fig. 5. Comparison between Inception V3, ResNet50 and DenseNet121.

From Table 2 and Fig. 5, we can see that, due to the uneven class distribution, DenseNet121 has a higher precision (normal class) and recall (glaucoma class) than the other two models. The F1 score is the harmonic mean of precision and recall, so to cope with the uneven classes we should consider the F1 score rather than precision or recall alone. According to Table 2, Fig. 4 and Fig. 5, the Inception V3 model has the highest test accuracy, AUC value and F1 score, so Inception V3 is the best of the three classifiers for this problem.
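To make the link between the F1 score and Eqs. (3) to (5) explicit, F1 can be written as the harmonic mean of precision and recall. Substituting the definitions of precision and recall and simplifying recovers Eq. (5):

$$\begin{aligned} F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} = \frac{2 \cdot \frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}{\frac{TP}{TP+FP} + \frac{TP}{TP+FN}} = \frac{2TP^{2}}{TP(TP+FN) + TP(TP+FP)} = \frac{2TP}{2TP+FP+FN} \end{aligned}$$

Because a harmonic mean is pulled towards the smaller of its two arguments, a model cannot reach a high F1 score by being strong on precision alone or on recall alone, which is why it is a reasonable summary when the classes are unevenly distributed.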

6 Conclusion

Glaucoma is a condition associated with damage to the optic nerve that causes permanent blindness. This approach to medical image processing technology [26] will broaden the application of glaucoma detection. This thesis work produces computer-generated results that can support the clinician’s judgment in glaucoma detection. The model detects normal fundus images better than glaucomatous ones because the dataset contains more normal fundus images. We obtained lower accuracy than some previous work because we used a much larger amount of data than the others; moreover, our dataset is a collection of different publicly available datasets.

Although the optic disc is the brightest part of the fundus, we did not use a multi-level segmentation [4] technique. Also, our target classes were not equally distributed between positive and negative samples.

In future we will train this model on good-quality images, and to cope with the data imbalance issue we will introduce a resampling technique. We plan to build different integrated models to improve the detection of glaucoma. For the integration, we will use different algorithms and techniques along with Inception V3, such as CNN, RNN, LSTM [11], deep learning [6] and belief rule base [8, 15]. We also plan to extend our study of convolutional neural networks to the detection of multiple ocular diseases such as cataract, retinal detachment and diabetic retinopathy.