Introduction

Diabetic retinopathy (DR) is an illness that causes irreversible vision loss in some people with diabetes mellitus. The increasing glucose level in blood enhances its viscosity, which leads to fluid leakage into the surrounding tissues in the retina. This ultimately results in vision loss. As the disease progresses, lesions (MicroAneurysms (MA), Hemorrhage (HM), Exudates, and neovascularization) appear in the retina. These are considered the central components of DR degradation [1]. Non-Proliferative DR (NPDR) and Proliferative DR (PDR) are the main stages of disease severity. Based on lesions, NPDR is further classified as mild, moderate, and severe [2]. The critical phase with lesions and neovascularization is termed PDR. The changes in the retina at different stages are depicted in Fig. 1. The earliest noticeable sign of DR is the presence of small, red dots called MA in the small blood vessels of the retina [3]. Retinal hemorrhage is another complication of DR that occurs due to hypertension and occlusion of retinal veins. Sometimes the small HMs may resemble MAs. The exudates are yellow flicks composed of lipids and proteins residues that filter out from the damaged capillaries. DR, in its severe phase, is hard to cure. Therefore, it is important to detect DR early on to plan and execute efficient management strategies. Thus, several techniques are being developed to detect and determine the severity of DR lesions. The challenging step is to accurately extract the essential features from fundus images for precise classification of DR. Artificial Neural Network (ANN) architectures have provided elegant solutions to image classification problems, including disease detection using biomedical images. Among the ANN techniques, Convolutional Neural Networks (CNN’s) are futuristic deep learning architectures that have led to many breakthroughs in automated object detection and classification. These deep learning architectures can extract even minute features that are useful for accurate classification of images. In this work, a Multipath Convolutional Neural Network (M-CNN) is designed to extract DR features from retinal fundus images that can be used in Machine Learning (ML) classifiers for DR classification.

Many research studies are underway to detect and grade DR through neural networking approaches. Some of the groundbreaking DR-classification methods based on different feature extraction techniques are reviewed in this section. A CNN method was proposed by [4] for diagnosing and classifying DR from fundus images based on severity; this method resulted in 75% average accuracy on 5,000 validation images. Another study [5] combined CNN extracted features with support vector machine (SVM) classifiers for lung disease detection (using lung sounds and spectrograms). Yet another study combined CNN with Biometric Pattern Recognition (BPR) [6]. In these two methods, CNN was used as a feature extractor. In [7], the automatic evaluation of the DR severity using ANN was demonstrated. Images of four lesions were extracted and fed into the multilayer feed-forward neural network for grading the disease stages. In [8], a two-stage CNN was used to diagnose abnormal lesions of DR from the fundus image.

Features such as area, perimeter, and count of the DR lesions were extracted in [9], and an ANN was implemented for DR classification into mild, moderate, and severe. The authors of [10] used the findings of a validated red lesion detection method to perform automated classification of DR. Assessment was performed using data from a public database by the leave-one-out validation method and to show the viability of automatic DR screening. In [11], the retinal fundus image was first divided into four sub-images. Haar wavelet transformation was applied to extract features, and better feature selection was achieved using Principal Component Analysis (PCA). Then for DR or No DR classification, a backpropagation neural network and one rule classifier were used. DR screening using a four-layer CNN was proposed in [12]. The results were evaluated by performing cross-validation. The essential five-class grading of DR was implemented in [13] by extracting the Hard exudates area, blood vessels area, texture, entropies, and bifurcation points. For classification, a combination of texture and morphological changes was considered. Probabilistic Neural Network (PNN) classifier parameter (\(\sigma\)) was tuned using a genetic algorithm and particle swarm optimization. A bi-channel CNN for DR detection was proposed in [14]. The green channel component was selected using the unsharp masking method from the original input image, and the red channel component was converted into a greyscale image. Further, these two images were given to CNN for detection purposes.

For DR detection, the authors of [15] demonstrated how to tackle blurred retinal images. They used a regularized filter deblurring algorithm to boost the effectiveness of the technique. The blood vessel, MAs, and exudates areas were then computed to give the ANN classifier input. A DR screening using inception-v3 was explained in [16]. The evaluation was carried out using the Kaggle database. In [17], modified Alexnet architecture was used for DR grading from the retinal fundus images. CNN architecture with sufficient pooling layers was suggested to classify the fundus images according to the disease severity. Local features from the retinal fundus images were extracted in [18] using the Local Binary Patterns (LBP) technique. The detection was then made through ML classifiers, particularly Random Forest (RF), Support Vector Machines (SVM), and ANN.

In [19], a SURF-BRISK combined local feature extraction method was implemented, and the most relevant 30 features were selected using the Minimum Redundancy-Maximum Relevance (MR-MR) method. Then the chosen features were fed into different classifiers for DR classification. Another feature extraction technique using a combination of the Haralick and Anisotropic Dual-Tree Complex Wavelet Transform (ADTCWT) was suggested in [20]. This feature extraction method was a time-consuming process as it necessitated the extraction of features using two complex methods. The DR grading method implemented in [21] used a small CNN architecture for feature extraction, followed by ML classifiers for DR classification using different databases. IDx-DR is the first FDA (Food and Drug Administration) approved autonomous AI system for DR screening. In [22], the validation of IDx-DR device is performed for the screening of Referable DR (RDR) and Vision-Threatening Retinopathy (VTDR). The studies in [23], used spanish population to validate the IDx-DR system for DR screening. According to their observation, the system had high specificity of 100% while sensitivity is of 82%. In recent years, multipath and multiscale neural networks have become common for various classification problems. A multipath-multiscale CNN for pulmonary nodule classification from Computed Tomography (CT) images was introduced in [24]. This method reportedly overcome the high variance of nodule characteristics in the CT images during classification. Thus, multipath architectures could be utilized for adequate feature extraction from images. In [25], a multipath ensemble CNN was designed, and the network’s evaluation was carried out on different databases. A 3D-multipath neural network for DR grading was reported in [26]. This work combines the features from Optical Coherence Tomography Angiography (OCTA) Scans, Demographic, and Clinical Bio-markers. The machine learning classifiers were used for DR grading, which resulted in an average accuracy of 96.8%.

Fig. 1
figure 1

Different stages of DR in fundus images

Fig. 2
figure 2

Proposed method

Fig. 3
figure 3

Proposed M-CNN architecture

Fig. 4
figure 4

Visualization of M-CNN Feature map from a mild NPDR image a Original Image (Mild NPDR category from Messidor database), b Feature Maps

Methodology

Efficient automated methods for accurate grading of DR from retinal fundus images are required to detect the disease. Conventional CNN has been used in DR detection and grading in recent years. This method uses convolutional kernels (filters), activation function (usually ReLU), pooling layers, and fully connected layers [27]. For the first time in 2012 by Alex Krizhevsky [28], CNN was proposed as a winning entry to the ILSVRC challenge [29]; CNN’s have revolutionized the domain of pattern recognition and data inference, especially in the field of computer vision. This work presents a novel M-CNN architecture for extracting features from the retinal fundus images for DR grading, and popular machine learning classifiers are used to grade the disease. The method called transfer learning via feature extraction [30] is adopted. The classifiers are trained with the extracted M-CNN features. Different classifiers (SVM, Random Forest and, J48) are evaluated by calculating the performance metrics from the corresponding confusion matrices for classifiers. After different stages of evaluation, the best classifier for DR grading with M-CNN features chosen. The proposed work flow is demonstrated in Fig. 2.

Architecture specifications

The proposed CNN architecture is shown in Fig. 3. The image size to the CNN input layer is \(196 \times 196\). In the proposed network, two feed-forward paths are designed. The first is the main path that resembles conventional CNN. The second is intended for multipath feature extraction. It begins after the first CL and concatenates both the feature maps before the fully connected layers. The activation function ReLU is used as it works with better gradient change than sigmoid and tanh functions [31]. The weighted sum of inputs and biases is computed by the Activation Function (AF), which is used to determine whether a neuron can be fired or not. The first convolutional layer uses a \(5 \times 5\) convolutional operation using eight kernels. Then the path is branched. The previous layer’s feature maps are given to the second path that performs a \(9 \times 9\) convolutional operation with 32 kernels and a max-pooling layer. The main trail leads to a max-pooling layer and, again, to a \(5\times 5\) convolution layer. The number of kernels in the network are chosen after the trial and error procedure. After different trials, the prescribed number of kernels in Fig. 3 provides better DR feature extraction. Some of the crucial trials in the kernel count and size selection are demonstrated in Table 1. The minimum error rate obtained is 0.97% and those kernel parameters are chosen to build the M-CNN architecture. The max-pooling operation downsamples the input using the maximum value from each cluster of neurons at the previous layer. The downsampling reduces the spatial resolution of the successive layers, helping to preserve the relevant local structures. Then, the main path and the secondary path are eventually concatenated, and the feature maps are given to the weighted transform layer (\(1 \times 1\) without bias) to integrate the features before giving into the Fully Connected Layer (FCL). The first and second FCLs are designed with 128 and 64 hidden neurons, respectively. It was reported in [32] that the softmax classifier degrades the prediction performance of the network. Before taking the features from the second fully connected layer, a dropout (technique to fire out units in a neural network) of 0.5 is used after first fully connected layer to avoid the overfitting during the training of ML classifier. Therefore, a CNN and an ML classifier can improve the entire classification system’s efficiency. Hence the 64 features from the second FCL are provided to different classifiers for classification.

Table 1 Evaluation of different Kernel parameters in the M-CNN layers for DR classification

Procedure of feature extraction using M-CNN

The extracted features are crucial factors that decide the efficacy of an automated system. A system with the best feature extraction capability can have accurate classification rates. In this work, the M-CNN architecture is designed for extracting the DR features from retinal fundus images for grading the disease stage. So that the issues with the database size can be resolved upto some extent. Generally, the features extracted using a traditional neural network may have losses in the global structures because of the too short or long straight forward path. This can be rectified using shortcut paths. The multipath extracted features help preserve global structures’ losses, thereby producing more relevant global and local structures from the M-CNN. After concatenating the feature maps from the two paths, the output competent feature vectors for classification is taken from the second FCL. The main issues facing in multipath feature extraction are (1) A chance to deceive CNN in analyzing the global structures while transferring the features from the current layer to the shortcut path. (2) If the image resolution is poor, the network will become susceptible to global noise interference [24]. These issues can be resolved by including sufficient convolutional layers in the shortcut path. The proposed M-CNN works best with a \(9 \times 9\) convolutional layer with 32 kernels in the short cut path for DR classification. The implementation of M-CNN is done through Keras [33], a Python-written high-level Application Program Interface (API). An example for the feature maps obtained from the final convolutional layer after concatenation of multi-paths in M-CNN is demonstrated in Fig. 4. for the mild NPDR input image from Messidor database. The 64 feature maps obtained from the final convolutional layer is analyzed to verify the presence of DR features. Even the input size is \(196 \times 196\) (resized the orginal image size), the M-CNN retains the features that are difficult to visualize with the naked eye.

Feature extraction using pre-trained networks

Pre-trained networks are now available that can be used for classification problems. At the same time, these deep CNNs can be adapted as feature extractors. In such cases, the features are extracted from the intermediate layers. In this work, two pre-trained networks (ResNet-50 [34], VGG-16 [35]) are used for DR feature extraction from retinal fundus images. These networks are used with the pre-trained weights and extract the features from the last pooling layer. Then these features are used in different classifiers for DR classifiers. While analyzing the feature maps in the intermediate layers, it is observed that there is loss of minute features as the network becomes deeper. So, it is necessary to evaluate the effect of such features in DR classification. It is discussed in Sect. 4.3.

Classifiers

After the feature extraction using the M-CNN method, the images are classified into different categories using machine learning classifiers such as SVM, Random Forest, and J48. The classifiers are trained and validated using the extracted M-CNN features.

Support vector machine (SVM)

The Support Vector Machine (SVM) [36] is a supervised learning strategy relevant to binary classification tasks. The SVM classifier is helpful in situations in which the input data are non-linearly separable in space. It is also suitable for many multiclass classification problems. For this, it uses a “one vs. all” scheme. In this method, the multiclass problem is split into a binary classification problem for each class. The steps for creating the classifier pseudo-code are adopted from [19].

Random forest

Random Forest [37] is an ensemble model classifier with a collection of decision trees structured to have different random vectors [38] for each of them. The steps for obtaining pseudo-code are as described in [19].

J48

J48 is the java version of C4.5 [39] Decision Tree (DT) intended for data mining. In DT, the information gain is the fundamental parameter in the design. The equations related to the J48 classifier are adopted from [20]. During the decision tree construction, the most substantial information gain is picked as the test feature for the current node. To make the classifier more efficient, we use the maximum depth parameter value as 3, finalized through experiments with random values. The decision tree is used for reduced error pruning [40] to reduce the complexity with fewer power nodes. The classification of input feature vectors is carried out according to the conditions during validation/testing.

Performance analysis

A system’s performance appraisal is critical as it establishes the efficiency of a new system. In the following steps, the consistency of the proposed model is assessed.

K-fold cross validation

The performance of the classifier is evaluated using the technique called K-fold cross-validation [41]. In order to clear up the issue with imbalanced database in Table 2, stratified random sampling is involved in the cross validation method. In this method of cross validation, the data splitting assures same class distribution in each subset. The cross-validation procedure is followed as in [19].

Evaluation metrics

The evaluation results are stored in the confusion matrix format [42]. For example, consider a “\(C \times C\)” matrix with \(P_{ij}\) as elements (where, \(\text {i},\text {j}=1,2,3\ldots\), no. of classes). In this matrix, let J represent True Positives (TP) count, K-False Negatives (FN) count, M and N denote the False Positives (FP) and True Negatives (TN) count respectively. TP and TN present correctly classified data, while FP and FN are the incorrectly classified information. In multiclass classification, the TPs and FPs can be acquired for each actual class i by taking p predicted classes through equation (6). Then the model performance can be analyzed by calculating different evaluation metrics from the confusion matrices.

$$\begin{aligned} \begin{aligned}&\# \quad TPs, \quad J_i = P_{ii}\\&\# \quad FNs, \quad K_i = \Sigma _{j=1}^{p}P_{ij}-J_i\\&\# \quad FPs, \quad M_i = \Sigma _{j=1}^{p}P_{ji}-J_i\\&\# \quad TNs, \quad N_i = \Sigma _{j=1}^{p}\Sigma _{k=1}^{n}P_{ik}-J_i-M_i-K_i \end{aligned} \end{aligned}$$
(1)

Accuracy determines the overall strength of the system. It is established using Eq. 2:

$$\begin{aligned} Accuracy_i = \frac{J_i}{J_i+K_i+M_i+N_i} \end{aligned}$$
(2)

False Positive Rate (FPR) describes the incorrect positive predictions rate during the classification. For an ideal classifier, the FPR is 0.0. It is evaluated from the confusion matrix using the following equation:

$$\begin{aligned} FPR_i=\frac{M_i}{M_i+N_i} \end{aligned}$$
(3)

Precision represents the efficiency with which the system makes perfect positive predictions. It is measured as,

$$\begin{aligned} Precision_i=\frac{J_i}{J_i+M_i} \end{aligned}$$
(4)

Recall, also called as sensitivity explains how a model prevents FNs effectively.

$$\begin{aligned} Recall_i=\frac{J_i}{J_i+K_i} \end{aligned}$$
(5)

F1-score is required to evaluate the model’s accuracy when there is imbalanced data input. It determines the harmonic mean of precision and recall.

$$\begin{aligned} (F1-score)_i=\frac{2J_i}{2J_i+M_i+K_i} \end{aligned}$$
(6)

Specificity valuates the effectiveness of preventing false positives(FPs) during the classification. The FPR and specificity total equals 1.

$$\begin{aligned} Specificity_{i}=\frac{N_i}{M_i+N_i} \end{aligned}$$
(7)

Kappa-score (K-score) is the classifier’s consistency metric that measures the inter-observer reliability. This is a ratio of observed accuracy (\(R_O\)) to predicted accuracy (\(R_E\)) and is estimated as:

$$\begin{aligned} K-score = \frac{(R_O-R_E)}{(1-R_E)} \end{aligned}$$
(8)

Except for the FPR, high values of the other measurements reflect an excellent classifier performance. There was some difficulty in using the detailed class efficiency measures for the analysis of the model. Therefore, the weighted average values of the evaluation metrics are calculated for easy evaluation of the system. If \(M_1\) indicates the class 1 (\(C_1\)) evaluation metric and \(M_2\) indicates class 2 (\(C_2\)) evaluation metric, then the weighted average of metric \(W_{em}\) can be written as:

$$\begin{aligned} W_{em}=\frac{(M_1*|C_1|)+(M_2*|C_2|)}{|C_1|+|C_2|} \end{aligned}$$
(9)

Experimental results

This section evaluates how the M-CNN features influence the performance of SVM, Random Forest, and J48 classifiers for DR grading and to select the classifier that functions best for the DR grading while using M-CNN features. For that, different evaluation metrics are calculated from the confusion matrices obtained after the model’s cross-validation.

Database description

The databases used in this work are IDRiD [43], Kaggle (for DR detection) [44], MESSIDOR [45]. The IDRiD contains 413 images, Kaggle database containing 35126 images. The MESSIDOR database includes 1200 images. The database description is given in Table 2. The DR features are extracted from the images in these databases separately and those features are used to train the ML classifiers. The ML classifiers doesn’t require a very large database like what a deep neural network requires.

Table 2 Database description

Proposed method of DR multiclass classification

The proposed system for DR grading consists of an M-CNN for feature extraction and ML classifier for DR multiclass classification/severity grading. The designed network is treated as an arbitrary feature extractor. In order to extract the most efficient features, it is required to pre-train the designed M-CNN with fundus images of DR. The M-CNN is pre-trained with a total of 53679 (not used for performance evaluation of the system) DR category fundus images taken from Kaggle, Messidor and IDRiD databases. The best learning rate has been found experimentally to be 0.003. A momentum factor of 0.9 is used to make the training less noisy and converge faster to the objective. The problem of overfitting is compromised by using a dropout factor of 0.5. The network is pre-trained for 100 epochs with multiple iterations to get the optimized weights that make the network a good DR feature extractor. After that the the images are given into the M-CNN pre-trained network for forward propagation. Then from the second fully connected layer the output features are collected. The classifier can be selected only after evaluating different classifiers using these M-CNN features from the images of different databases. The advantage of multiple path CNN is the extraction of local as well as global features. Already deep neural networks are computationally expensive to train. So, this issue is solved by using the M-CNN extracted features in the ML classifiers. The stratified 10-fold cross validation is then applied to evaluate the performance of the mentioned classifiers. More deeper and wider network leads to loss of minute features in the DR image. In this work the minute features are important for mild NPDR classification. So, this specified network architecture paves a way for better DR feature extraction.

Confusion matrices of each classifiers

The primary fact that required in the assessment of a model is the confusion matrix. The confusion matrices that shows the classifier’s efficiency are obtained by performing 10- fold cross validation using extracted M-CNN features in each classifier. The performance metrics are further calculated from the corresponding confusion matrices using the basic equations mentioned in Sect. 2.5.2. The confusion matrices obtained for three classifiers using M-CNN extracted features from IDRiD, Kaggle and MESSIDOR databases are provided in Tables 3, 4 and 5 respectively.

Performance metrics calculation using proposed feature extraction

The evaluation metrics described in Sect. 2.5.2 are calculated from the confusion matrices obtained for each classifier. The different measures used for better classification efficacy are FPR, Specificity, Precision, Recall, F1-Score, and Accuracy. The specificity and recall are needed to understand the test’s strength of the classifier. Specificity determines the proportion of the actual negatives, and recall determines the proportion of actual positives that are predicted correctly. The F1-score is the combined metric of precision and recall. In giving weightage to both precision and recall values, this F1-score can be considered an evaluation metric rather than an accuracy metric. The detailed metrics are tabulated in Table 6 for IDRiD database, Table 7 for Kaggle database, and Table 8 for MESSIDOR database. The weighted average values for each metric are calculated from the detailed efficiency measures and shown in Table 9. The other performance metrics for each classifier, such as validation accuracy and K-score, are depicted in Table 10. The inter-rater reliability implied by K-score represents the extent to which the values gathered in the experiment are accurate representations of the calculated data. The K-score ranges from 0.81 to 1.00 represent almost perfect agreement [46]. Training a CNN with small database doesn’t produce a good classification model. At the same time, CNNs are capable of extracting minute features from images. When comparing the results obtained using different database, it is clear that the database size has an important role in modeling a perfect classifier. The IDRiD database contains only 413 images. This is not enough to train the M-CNN. In this work the proposed M-CNN is used for DR feature extraction and the extracted features from 413 images are used to train the ML classifiers. The same is done for the other two databases.

Performance metrics calculation using pre-trained network feature extraction

There are many existing CNNs that produce good results in different classification problems. The features extracted using the ResNet-50 and VGG-16 networks are fed into each classifier. Then the system is evaluated by applying K-fold cross valuation. The K-score and the accuracy obtained for each classifier are given in Table 11. The IDRiD and MESSIDOR databases are used for the experiments.

The DR classification performance of ResNet-50 and VGG-16 via the transfer learning method is also analyzed. For fine tuning the network, the fully connected layers in the ResNet-50 and VGG-16 are removed. Then a global average pooling layer is added to the output of the backbone model. The overfitting is avoided at this layer as there is no parameter to optimize in the global average pooling. Then three fully connected layers are used with batch normalization between them. The first fully connected layer consists of 512 nodes, second one with 64 nodes. A drop out of 0.5 is used after the second fully connected layer. Then the last fully connected layer using ‘softmax’ activation with number of nodes equal to the number of classes is added for the final DR classification. Table 12 shows the results of transfer learning based DR classification for different databases.

Table 3 Confusion matrix for evaluation using IDRiD Database
Table 4 Confusion matrix for evaluation using Kaggle database
Table 5 Confusion matrix for evaluation using MESSIDOR database

Discussions

Confusion matrix analysis

From the confusion matrices, it is seen that the J48 classifier is capable of more effective DR grading than the others. Suppose the mild NPDR and the Normal categories are analyzed, in that case, it is clear that the proposed M-CNN feature extraction with the J48 classifier is effective in detecting DR. Another important factor is that the classification using J48 classifier gives better results in early detection of DR. While analyzing the confusion matrices the mild NPDR and normal images are almost classified correctly. So, the model has the capability of early detection of DR.

Efficiency evaluation

While analyzing the evaluation metrics results in Sect. 3.4, it is clear that the SVM classifier does not perform well in DR grading for all the databases. The Random Forest classifier performs better than the SVM classifier. SVM uses the “one vs. all” method in multiclass problems, which induces difficulties in analyzing the output. Random Forests can handle categorical features well, and therefore, in multiclass problems, it outperforms SVM to some extent. J48, which gives the highest specificity, precision, recall, and F1-score for our classification problem, works best with M-CNN features for DR grading. The FPR obtained for the J48 classifier is nearly 0 in the case of all databases. According to the initial evaluation, the M-CNN features with the J48 classifier are suitable for DR grading. In the evaluation of performance metric for each classifier in Table 10, the J48 classifier can be seen to have a validation accuracy of above 99% using all databases. The K-score is also higher for the J48 classifier. When the other classifiers show K- score of less than 0.95, the J48 classifiers offer K- score above 0.98, which shows the proposed model’s markable efficacy. The results are highlighted in the performance evaluation tables to notice the efficiency of J48 classifier than the other classifiers.

Table 6 Detailed efficiency measures calculated from confusion matrix of IDRiD Database
Table 7 Detailed efficiency measures calculated from confusion matrix of Kaggle Database
Table 8 Detailed efficiency measures calculated from confusion matrix of MESSIDOR Database
Table 9 Weighted average values from the detailed efficiency measures
Table 10 Validation accuracy and Kappa Score for each classifier with M-CNN feature extraction using different databases

DR grading using pre-trained networks

Evaluation of DR grading using pre-trained networks is performed in this section. The pre-trained networks are used as feature extractors of DR and also used as a DR multiclass classifier utilizing the transfer learning method. On analyzing the Table 11, ResNet-50 features using IDRiD, the J48 classifier performs well with a K-score of 0.901 and an accuracy of 92.46%. But for the MESSIDOR database, the J48 performs with a K-score of 0.892 and an accuracy of 91.22%. On analyzing the VGG-16 features using IDRiD and MESSIDOR databases, the SVM classifier performs better than the other classifiers. However, the K-score and classifier accuracy for SVM is low. The Table 12 shows the performance of pre-trained CNNs in the DR grading. The Kaggle, MESSIDOR and IDRiD databases are used for evaluation. The results shows that the difference in the number of layers and the database size affects the DR classification performance of existing pre-trained CNNs. Compare to ResNet-50, VGG-16 shows more accuracy using Kaggle database. In the case of medical images, as the network becomes deeper there might be chance of losing the useful features. So the classification performance degrades in deeper networks [47]. Then also the performance of proposed method is better than the VGG-16.

Table 11 Evaluation of the classifiers using pre-trained network extracted features for DR Grading
Table 12 Evaluation of pre-trained networks via transfer learning for DR Grading

Comparison of the proposed (M-CNN + J48) system with existing DR classification methods

Many methods are implemented for DR classification using different techniques. The relevant existing methods are compared with the efficiency of the proposed method, and the comparisons are tabulated in Table 13. The technique proposed in [20] shows almost the same ability as in the proposed work. But the time complexity for extracting features is less in the proposed method because we use two different feature extraction methods to obtain the features. The ADTCWT and Haralick are functional feature extractors, but it requires more time for extracting the features than our M-CNN feature extraction method. When considering both the time consumption and classification efficacy, it can be concluded that the proposed model is the fastest method that can be used for DR grading. The proposed system works better than using the M-CNN alone for the DR classification. The features are collected from the second fully connected layers and replaced the third fully connected layer and softmax function with ML classifiers; this helps reduce the model’s time complexity. The efficiency measures obtained for the proposed work are almost the same as those reported in [21]. While considering the early detection of DR, the proposed method works best in classifying normal and mild NPDR images, which is a milestone in DR classification. From the comparative analysis and the efficiency measures of the proposes system, it can be seen that the proposed system works best for early and fast DR classification.

Table 13 Comparison of proposed work with existing methods

Conclusion

DR is vision-threatening morbidity that has become widely prevalent in recent times. This work proposes an automated early DR diagnosis and fast grading technique. M-CNN extraction and ML classifier are used to extract relevant features from fundus images and classify the lesions according to their severity levels. The model is analyzed using IDRiD, Kaggle, and MESSIDOR databases. The ML classifiers used in the experiments are SVM, Random Forest, and J48. The features extracted using pre-trained networks are also used for evaluation. The proposed method exhibits an average validation accuracy of 99.62% and a K-score of 0.995. After many experiments, it is seen that the M-CNN features show the best performance with the J48 classifier. The M-CNN and J48 classifier combination can thus be used for early and fast automatic prediction, and grading of DR. This multipath network can be modified to predict other retinal diseases, enhancing the retinal health care monitoring system.