
1 Introduction

Diabetic retinopathy (DR) is one of the most serious eye diseases. The Union Health Ministry's first National Diabetes and Diabetic Retinopathy Survey (2015–19) revealed that the prevalence of diabetic retinopathy in India is 16.9%, while that of sight-threatening DR is 3.6% [1]. Detecting and treating DR at an initial stage may avoid harmful consequences later. Diabetic retinopathy is classified into four stages: Stage 1 (mild DR), Stage 2 (moderate DR), Stage 3 (severe DR), and Stage 4 (proliferative DR, PDR) [2]. The first three stages are collectively termed non-proliferative DR (NPDR). Since early treatment is essential to avoid severe complications, we developed a machine learning model that classifies a fundus image as normal, NPDR, or PDR.

Fundus photography is the procedure of capturing images of the inner eye through the pupil. A fundus camera is a specialized low-power microscope attached to a camera. It is used to examine internal eye structures such as the optic disc, the retina, and the lens. Figure 1 shows a fundus photograph of a normal eye. Figure 2 shows a diabetic retinopathy fundus image with microaneurysms, exudates, and hemorrhages indicated.

Fig. 1
A photograph of a normal fundus eye. The vasculature is normal in course. There are no lesions, scars, or pigmentary changes in the macula or periphery.

Normal fundus eye image

Fig. 2
A photograph of a diabetic retinopathy fundus eye. Arrows mark leaky blood vessels, dot-like hemorrhages, and small bright patches of exudates.

Diabetic retinopathy fundus eye image indicating microaneurysms, exudates, and hemorrhages

Mild NPDR: This is the earliest stage of DR, also termed background retinopathy. The minute blood vessels in the retina start developing small bulges, called microaneurysms, which may leak small amounts of blood into the retina.

Moderate NPDR: This is the second stage of the disease. The retinal blood vessels start swelling, which may affect their blood-carrying capacity. Physical changes in the retina and hard exudates can be observed.

Severe NPDR: At this stage, blockages in the blood vessels increase, reducing the blood supply to the retina. The resulting lack of blood signals the retina to generate new blood vessels. Reaching this stage indicates a high risk of vision loss; treatment can stop further loss, but vision that has already been lost cannot be recovered.

Proliferative Diabetic Retinopathy: At this stage, fresh blood vessels start developing in the retina. Because these newly formed vessels are fragile and thin, they tend to bleed. PDR can result in vitreous hemorrhage or retinal detachment.

2 Related Work

Amol et al. [3] put forth the use of a multilayer perceptron neural network (MLPNN) to detect diabetic retinopathy in retinal images. Swati et al. [4] performed a comparative analysis of KNN and SVM classifiers and obtained an accuracy of 85.60% for the SVM classifier. Yashal Shakti Kanungo [5] obtained the best results for DR classification using the Inceptionv3 transfer learning model. Researchers such as Kranthi et al. [6] and Vidya et al. [7] used preprocessed images to train machine learning models based on SVM, KNN, and artificial neural networks (ANN). Mohamed Chetoui et al. [8] obtained an accuracy of 0.904 using a support vector machine with a radial basis function kernel. Kaur et al. [9] built a neural network model and compared its performance with an existing support vector machine (SVM) model; the neural network performed better than the SVM. Sonali et al. [10] first segmented the optic disc and retinal nerves and then extracted features using the gray-level co-occurrence matrix (GLCM) method. Robiul Islam [11] developed a deep learning model with transfer learning from the VGG16 model, combined with a novel color preprocessing technique. Revathy et al. [12] preprocessed images using color space conversion and zero padding, followed by median filtering and adaptive histogram equalization, and then performed image segmentation and feature extraction. Classification was done using a combination of KNN, random forest, and SVM, which yielded an accuracy of 82%; of the three individual models, the SVM performed best with an accuracy of 87.5%. Satwik et al. [13] used transfer learning to detect diabetic retinopathy with the pre-trained models SEResNeXt32x4d and EfficientNetB3, obtaining accuracies of 85.13% and 91.42%, respectively. Ayala et al. [14] implemented a transfer learning model using DenseNet on two publicly available datasets, APTOS and Messidor, obtaining accuracies of 81% and 64%, respectively. Rajkumar et al. [15] also used transfer learning with the pre-trained ResNet-50 model and reported an accuracy of 89.4%.

From the literature survey, we observed that earlier research in this field was limited to traditional machine learning algorithms, whereas more recent work has applied neural networks and transfer learning. We also noticed that some researchers trained their models without preprocessing the images, and others limited themselves to only one or two transfer learning approaches. Considering these limitations of the existing work, we developed a methodology that preprocesses the images and then applies three transfer learning techniques, together with a comparative study, as explained in detail in the proposed work.

3 Proposed Work

3.1 Dataset

The dataset plays a vital role in training a machine learning model. The images we used for training originally belonged to the Diabetic Retinopathy Detection dataset provided by EyePACS, a free platform for retinopathy screening, while a portion of the images in our dataset came from the APTOS 2019 Blindness Detection dataset. Both datasets are available on the official Kaggle website.

The former comprised 35,126 images: 708 proliferative, 873 severe, 5,292 moderate, 2,443 mild, and 25,810 normal eye images.

The latter comprised 3,662 fundus eye images: 295 proliferative, 193 severe, 999 moderate, 370 mild, and 1,805 normal eye images.

We observed that the datasets are highly imbalanced; training a machine learning model on them directly would therefore produce biased results. To avoid this, we created a balanced dataset by choosing an equal number of clear images per category from the available data. The dataset created for binary classification consisted of 2,000 images belonging to two classes: normal eye images and diabetic retinopathy eye images. The dataset generated for multiclass classification consisted of 3,000 images belonging to three classes: normal, NPDR, and PDR. In this way, we obtained a balanced dataset for training our model.
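The following minimal sketch illustrates how such a class-balanced subset can be assembled, assuming the images are stored in one folder per class; the folder names, class labels, and sample size shown here are illustrative assumptions, not the exact layout used in this work.

```python
# Sketch: build a class-balanced subset by sampling an equal number of
# images per class. Paths and counts below are illustrative only.
import os
import random
import shutil

SOURCE_DIR = "fundus_images"          # assumed layout: one sub-folder per class
TARGET_DIR = "balanced_dataset"
CLASSES = ["normal", "npdr", "pdr"]   # three classes for the multiclass split
IMAGES_PER_CLASS = 1000               # 3 x 1000 = 3,000 images in total

random.seed(42)
for label in CLASSES:
    files = os.listdir(os.path.join(SOURCE_DIR, label))
    chosen = random.sample(files, IMAGES_PER_CLASS)   # equal count per class
    os.makedirs(os.path.join(TARGET_DIR, label), exist_ok=True)
    for name in chosen:
        shutil.copy(os.path.join(SOURCE_DIR, label, name),
                    os.path.join(TARGET_DIR, label, name))
```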

3.2 Preprocessing

The original dataset consisted of color fundus eye images. As mentioned in the literature review, many researchers performed image preprocessing before training their models. Hence, we experimented with various image processing techniques and finalized the sequence best suited to the images in our dataset. This preprocessing sequence is shown in Fig. 3.

Fig. 3
A flowchart includes the original image, green channel extraction, gray scale conversion, and contrast limited adaptive histogram equalization.

Image preprocessing sequence

An RGB color fundus photograph comprises three channels: red, green, and blue. The original image was separated into these components using the 'split' function from the OpenCV Python library. To visualize a channel in its actual color, that channel can be retained while the other two are set to zero. For further processing, we treated the green channel as a grayscale image, as it displayed the best contrast between the optic disc and the retinal tissue: it showed better contrast than the red and blue channels and was slightly better than the grayscale image obtained directly from the RGB image. To enhance the contrast further, we applied contrast limited adaptive histogram equalization (CLAHE). The preprocessing of an example eye image is shown in Fig. 4, and a minimal code sketch follows the figure.

Fig. 4
Four photographs of a diabetic retinopathy eye with leaky blood vessels, dot-like hemorrhages, and small bright patches of exudates. a. Original photograph, and b to d, with green channel extraction, green component grayscale image, and contrast enhancement using CLAHE.

Image preprocessing
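A minimal sketch of this preprocessing sequence, assuming OpenCV (cv2); the CLAHE clip limit and tile size shown are illustrative choices, not values reported in this work.

```python
# Sketch: green channel extraction followed by CLAHE contrast enhancement.
import cv2

def preprocess(path):
    image = cv2.imread(path)                 # OpenCV loads images in BGR order
    blue, green, red = cv2.split(image)      # separate the three channels
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(green)            # enhance the green-channel image
    return enhanced

processed = preprocess("fundus_sample.png")  # hypothetical file name
cv2.imwrite("fundus_sample_clahe.png", processed)
```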

3.3 Machine Learning Techniques

To develop a binary classification model, we applied several machine learning techniques, implementing them in Python by following the steps shown in Fig. 5.

Fig. 5
A flowchart includes image pre-processing, fitting the machine learning algorithm to the training set, test result prediction, and result analysis using a confusion matrix and classification report.

Machine learning algorithm implementation using Python

The image preprocessing procedure has been described above. We then fitted various machine learning algorithms to our training set.

To evaluate our results, we observed the test accuracy and plotted a confusion matrix for each algorithm. The confusion matrix makes it easy to see how many images were classified correctly and whether the results are biased.

The labels of the confusion matrix fall into the following categories; a minimal evaluation sketch follows the list.

True Negative: The model predicted No DR, and the actual value is also No DR.

True Positive: The model predicted DR, and the actual value is also DR.

False Negative: The model predicted No DR, but the actual value was DR.

False Positive: The model predicted DR, but the actual value was No DR [16].
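The sketch below shows how these quantities and the classification report can be obtained with scikit-learn, assuming the true and predicted labels are encoded as 1 = DR and 0 = No DR; the label arrays shown are placeholders for illustration.

```python
# Sketch: evaluation with a confusion matrix and classification report.
from sklearn.metrics import confusion_matrix, classification_report

y_test = [0, 0, 1, 1, 1, 0]   # placeholder true labels (0 = No DR, 1 = DR)
y_pred = [0, 1, 1, 1, 0, 0]   # placeholder predicted labels

# With labels ordered [0, 1], the flattened matrix unpacks as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(classification_report(y_test, y_pred, target_names=["No DR", "DR"]))
```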

Decision tree classification is a predictive modeling technique used in various sectors. A decision tree is built algorithmically by repeatedly splitting the dataset according to conditions on the feature values. We obtained a testing accuracy of 56.0% for this model. Naive Bayes classification is a technique derived from Bayes' theorem under the assumption that all predictors are independent, i.e., that the presence of a feature in a class does not depend on any other feature in the same class. The testing accuracy obtained for the Naive Bayes classification model was 60.5%.

The K-nearest neighbor (KNN) algorithm considers the similarity between the existing classes and new data: it stores the entire training set and assigns a new data point to the class it most closely resembles, so new data can readily be classified into a well-matched class. We obtained a testing accuracy of 63.75% for this model. A random forest classifier first generates decision trees on data samples, then collects the prediction from each tree and chooses the final answer by voting; averaging over the trees also reduces overfitting. The testing accuracy obtained for this model was 65.25%. An SVM model separates the categories with a hyperplane in a multi-dimensional space; the algorithm constructs the hyperplane iteratively to reduce the error, aiming for the maximum-margin hyperplane that splits the dataset into its categories. We obtained a testing accuracy of 67.5% for the SVM binary classification model. A comparison sketch of these classifiers is given below, and their confusion matrices are shown in Figs. 6, 7, 8, 9, and 10.
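The following sketch compares the five classifiers with scikit-learn, assuming the preprocessed images have been flattened into feature vectors; the placeholder data, feature size, and hyperparameters are assumptions for illustration, not the exact settings used in this study.

```python
# Sketch: fit the five classifiers and compare their test accuracies.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X = np.random.rand(200, 64 * 64)      # placeholder: flattened grayscale pixels
y = np.random.randint(0, 2, 200)      # placeholder binary labels (DR / No DR)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Decision tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.4f}")
```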

Fig. 6
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 110 and 85. The entries in row 2 are 91 and 114. The color scale ranges from 85 to 115.

Confusion matrix for decision tree classifier

Fig. 7
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 135 and 92. The entries in row 2 are 66 and 107. The color scale ranges from 60 to 140.

Confusion matrix for Naive Bayes classifier

Fig. 8
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 164 and 108. The entries in row 2 are 37 and 91. The color scale ranges from 25 to 175.

Confusion matrix for KNN classifier

Fig. 9
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 132 and 69. The entries in row 2 are 70 and 129. The color scale ranges from 60 to 140.

Confusion matrix for random forest classifier

Fig. 10
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 134 and 67. The entries in row 2 are 63 and 136. The color scale ranges from 60 to 140.

Confusion matrix for SVM classifier

The result analysis of the machine learning models is shown in Fig. 11. Since the highest testing accuracy obtained using these machine learning techniques was only 67.5%, we turned to transfer learning techniques.

Fig. 11
A vertical bar graph plots accuracy in percentage. The accuracies of the decision tree classifier, Naive Bayes classifier, K-Nearest Neighbors, Random Forest classifier, and support vector machine are 56.0, 60.5, 63.75, 65.25, and 67.5 percentages.

Result analysis of the machine learning models

3.4 Transfer Learning Techniques

Transfer learning is a process in which a neural network model is first trained on a problem similar to the one to be solved [17]. It is mainly used when the dataset contains too little data to train a model from scratch. To implement transfer learning, we used three pre-trained models, namely ResNet-50, VGG16, and EfficientNetB0, for both binary and multiclass classification.

The following steps were used to implement transfer learning in Python. First, we imported the necessary Python libraries and loaded the pre-trained model. The arguments used here were 'input_shape', 'weights', and 'include_top'. We used the weights of the ImageNet database, and the input shape was 224 × 224 pixels. The parameter 'include_top' was set to false to remove the last layer of the model, so that we could add our own input and output layers for our custom data. Since the existing layers of the model are already trained, they do not need to be trained again: the 'trainable' parameter of these layers is set to false, freezing them. If this step is skipped, the model re-learns weights that have already been trained on many images, which costs additional time and memory; setting 'layer.trainable' to false avoids this. The next step is flattening, where the feature maps are converted into a one-dimensional array. We then added two dense layers with the 'ReLU' activation function and used the 'softmax' activation function for the output layer, where the number of nodes equals the number of classes: 2 for binary classification and 3 for multiclass classification. The process flow of the transfer learning implementation in Python is shown in Fig. 12, and a model-assembly sketch is given after the figure.

Fig. 12
A flowchart includes importing the libraries, loading the pre-trained model with required arguments, removing the last layer of the model, freezing the existing layers, flattening, adding dense layers, compiling the model, data augmentation, fitting the model to our dataset, and result analysis.

Transfer learning implementation using Python
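A minimal sketch of the model-assembly steps in Fig. 12, assuming TensorFlow/Keras; ResNet-50 is shown, but VGG16 or EfficientNetB0 can be loaded the same way. The sizes of the two dense layers are illustrative assumptions, not values reported in this work.

```python
# Sketch: load a frozen pre-trained backbone and add custom output layers.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

NUM_CLASSES = 2   # 2 for binary classification, 3 for multiclass

base = ResNet50(input_shape=(224, 224, 3), weights="imagenet",
                include_top=False)          # drop the original output layer
for layer in base.layers:
    layer.trainable = False                 # freeze the pre-trained layers

x = Flatten()(base.output)                  # flatten to a 1-D feature vector
x = Dense(256, activation="relu")(x)        # two dense layers with ReLU
x = Dense(128, activation="relu")(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)  # one node per class
model = Model(inputs=base.input, outputs=outputs)
```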

For compiling the model, 'categorical_crossentropy' was used as the loss function and 'Adam' as the optimizer. We then used the ImageDataGenerator class for data augmentation, an approach that increases the diversity of the training data by applying random transformations to the images. The next step was fitting the model to our dataset of fundus eye images; training ran for ten epochs.
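Continuing the previous sketch, the compile, augmentation, and fitting steps could look as follows; the directory names, augmentation ranges, and batch size are assumptions made for illustration, while the ten epochs match the procedure described above.

```python
# Sketch: compile, augment, and fit the model assembled in the previous sketch.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# `model` is the frozen-backbone network built in the previous sketch.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=15,
                               horizontal_flip=True, zoom_range=0.1)
test_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory("balanced_dataset/train",
                                           target_size=(224, 224),
                                           batch_size=32,
                                           class_mode="categorical")
test_data = test_gen.flow_from_directory("balanced_dataset/test",
                                         target_size=(224, 224),
                                         batch_size=32,
                                         class_mode="categorical")

model.fit(train_data, validation_data=test_data, epochs=10)
```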

Once training is complete and the testing accuracy of the model has been obtained, analyzing the results is crucial. We did so with the help of a confusion matrix and a classification report. As mentioned earlier, we used the ResNet-50, EfficientNetB0, and VGG16 models, all pre-trained on the ImageNet image database.

ResNet-50 is a convolutional neural network that is fifty layers deep. Its residual connections are designed to overcome the vanishing gradient problem, a major limitation of very deep convolutional neural networks.

EfficientNet provides a family of models (B0–B7) that offer a fine balance between efficiency and accuracy.

VGG16 is a convolutional neural network with 13 convolutional layers and 3 dense layers. The VGG architecture closely resembles classic convolutional networks; its main idea was to make the network deeper by stacking additional convolutional layers, which was made practical by limiting the convolutional windows to 3 × 3 pixels.

4 Result Analysis

4.1 Binary Classification

Out of the 200 DR images in the testing dataset, the ResNet-50 model correctly detected 174, and out of the 200 normal eye images, it correctly detected 177. The accuracy obtained here was 87.75%.
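This accuracy follows directly from the confusion matrix in Fig. 13: (174 + 177) / 400 = 0.8775, i.e., 87.75%.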

Similarly, the accuracy obtained for EfficientNetB0 was 90.75%, and for VGG16 it was 90%. The confusion matrices of the binary classification models are shown in Figs. 13, 14, and 15.

Fig. 13
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 174 and 26. The entries in row 2 are 23 and 177. The color scale ranges from 40 to 360.

Confusion matrix for ResNet-50 binary classification model

Fig. 14
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are D R and no D R. The entries in row 1 are 176 and 24. The entries in row 2 are 13 and 187. The color scale ranges from 20 to 180.

Confusion matrix for EfficientNetB0 binary classification model

Fig. 15
A color-coded confusion matrix of true labels versus predicted labels with 2 columns and 2 rows. The column and row headings are normal and D R. The entries in row 1 are 183 and 17. The entries in row 2 are 23 and 177. The color scale ranges from 20 to 180.

Confusion matrix for VGG16 binary classification model

4.2 Multiclass Classification

The testing set in our dataset consisted of 600 images for multiclass classification with an equal number of NPDR, PDR, and normal fundus eye images.

Out of the 200 fundus eye images in each class, the ResNet-50 model correctly detected 180 normal, 139 NPDR, and 127 PDR images. The accuracy obtained for multiclass classification using ResNet-50 was 74.33%. Similarly, the accuracy obtained using EfficientNetB0 was 77.5%, and the accuracy obtained using VGG16 was 81.5%. The confusion matrices of the multiclass classification models are shown in Figs. 16, 17, and 18.

Fig. 16
A color-coded confusion matrix of true labels versus predicted labels with 3 columns and 3 rows. The column and row headings are normal, N P D R, and P D R. The entries are, row 1, 180, 20, and 0, row 2, 60, 139, and 1, row 3, 69, 4, and 127. The color scale ranges from 0 to 180.

Confusion matrix for ResNet-50 multiclass classification model

Fig. 17
A color-coded confusion matrix of true labels versus predicted labels with 3 columns and 3 rows. The column and row headings are normal, N P D R, and P D R. The entries are, row 1, 147, 57, and 2, row 2, 31, 164, and 5, row 3, 7, 33, and 160. The color scale ranges from 0 to 160.

Confusion matrix for EfficientnetB0 multiclass classification model

Fig. 18
A color-coded confusion matrix of true labels versus predicted labels with 3 columns and 3 rows. The column and row headings are normal, N P D R, and P D R. The entries are, row 1, 147, 51, and 2, row 2, 2, 198, and 0, row 3, 7, 49, and 144. The color scale ranges from 0 to 200.

Confusion matrix for VGG16 multiclass classification model

The testing accuracies of the transfer learning models for binary and multiclass classification are shown in Fig. 19. As discussed in the 'Related Work' section, most researchers have used color images directly for training. We observed that training the model without preprocessing the images did not give satisfactory results, so we first preprocessed the images and then trained the model. We also applied various machine learning and transfer learning techniques and presented a comparative study of these techniques.

Fig. 19
A double bar graph plots accuracy for binary and multiclass classification in % for Resnet 50, V G G 16, and efficient net B 0. The accuracies for binary classification are 87.75, 90, and 90.75, and for multiclass classification are 74.33, 81.5, and 77.5.

Testing accuracy of the transfer learning models for binary and multiclass classification

5 Conclusion

In this study on the classification of fundus images for diabetic retinopathy, various machine learning models, namely decision tree, random forest, Naive Bayes, K-nearest neighbors, and support vector machine, were employed for binary classification. Among these models, the support vector machine classifier gave the highest testing accuracy, 67.5%, which was poor compared to the expected accuracy. To improve performance, transfer learning techniques based on ResNet-50, VGG16, and EfficientNetB0 were then employed for binary classification, where VGG16 and EfficientNetB0 reached high testing accuracies of around 90%. For multiclass classification, the testing accuracy was highest for VGG16 at 81.5%. Thus, VGG16 performed consistently well for both types of classification and gave the best results for multiclass classification. It was also observed that the testing accuracy could be further improved if more images were available for training the model, in particular more fundus images of the proliferative diabetic retinopathy condition.

This research work can be employed to develop an application that classifies a fundus image as normal, NPDR, or PDR by providing real-time fundus images to the application. As future scope, this work could be extended to apply the machine learning algorithms to real-time images for dynamic diagnosis through a web application.