1 Introduction

Diabetic Retinopathy (DR) is a complication of diabetes that appears in the back of the eye (retina), causing injuries that can range from mild vision problems to loss of sight [29]. DR is a serious eye condition that leads to vision loss and blindness in patients with diabetes [4]. High blood sugar levels cause severe damage to the blood vessels in the retina. Blood vessels in the eye start to leak fluid, causing the macula to swell or thicken and preventing blood from passing through. At times, the abnormal growth of new blood vessels on the retina can lead to permanent loss of vision. Early and precise DR detection is critical for both the care and management of diabetic disease. Once the disease is identified, the patient can be regularly tested to correctly track its progression. Color deviations such as red, yellow, blue, and green, as well as incorrect illumination, lead to difficulties in the observation of eye fundus images, which in turn leads to problems in the diagnosis. Therefore, an efficient method is needed to detect retinopathy as early as possible, so that blindness can be avoided.

1.1 Motivation

Many algorithms have been previously designed for an accurate diagnosis of DR. While automatic detection methods are significant for DR, systematic screening for DR has been recognized as a cost-effective way to save health service resources. Automatic retinal image analysis is emerging as a vital screening tool for early DR detection, which can lessen the workload associated with manual grading as well as save diagnosis costs and time. Motivated by this, a novel Shape Adaptive box linear filtering-based Gradient Deep Belief network classifier (SAGDEB) Model was developed for early and accurate DR detection with minimal time consumption.

1.2 Major contributions of the paper

To overcome the existing issues, a novel SAGDEB was developed with the following contributions:

  • The inclusion of three different stages, namely pre-processing, feature extraction, and classification.

  • To enhance the peak signal-to-noise ratio (PSNR), shape adaptive box linear filtering is applied in the SAGDEB for image pre-processing. The filtering technique uses the Wilcox index to identify blurred pixels, which are replaced by the average of all the neighboring pixels, thereby increasing the quality of the images.

  • To minimize the DR identification time, the SAGDEB performs Isomap geometric feature extraction, which extracts shape, color, and texture features from retina images.

  • The extracted features are sent to the Adaptive gradient Tversky Deep belief network classifier, which uses the Tversky index to compare them with the testing disease features. Based on the similarity value, the sigmoid activation function is used to detect the different levels of DR.

  • To minimize the DR classification error, the input weights are updated by applying an adaptive gradient method.

  • At last, extensive experimental evaluations are performed utilizing various performance metrics to underline the improvement of the proposed SAGDEB over conventional techniques.

1.3 Organization of this paper

The article is organized as follows: Section 2 reviews the related works developed to predict DR. In Section 3, the proposed methodology (SAGDEB) is explained in detail. Section 4 describes the experimental dataset used in this work and in the works used for comparison. Section 5 summarizes the implementation details. Performance results and discussions are presented in Section 6, and the conclusion of the work is given in Section 7.

2 Literature Review

An ensemble of deep convolutional neural networks (DCNNs) was developed for accurate detection and grading of DR using fundus images. In [1], the multistage patch-based deep CNN (MPDCNN) improves the classification accuracy of DR levels; its features help the model learn the important information in DR images. However, the designed technique was not tested on DR cases with high-risk characteristics, and it also failed to provide a complete and more accurate automatic DR grading system for retinopathy screening. A paradigm called Few-Shot Learning (FSL), which uses a relatively small amount of training data, was developed in [2], using a DRNet framework to grade and identify DR. The detection of DR and its severity level is essential to initiate clinical interventions and avoid adverse outcomes. However, it failed to minimize the time consumption of DR detection.

A deep learning architecture was developed in [3], which used segmented fundus image features to detect DR. However, the complexity analysis of DR detection was not addressed. In [4], an automated DR grading technique was developed, classifying features by severity level using deep learning and machine learning (ML) algorithms. Nonetheless, the classifier was not suitable for fast automatic prediction and grading of DR.

In [5], both a CNN and a hybrid DCNN were developed as classifiers by extracting eye features, although no user interface was implemented, so the system could not be used in real-time applications. A reformed capsule network was introduced in [6] to identify DR via the extraction of features from fundus images. However, the designed network failed to identify retinal issues in diabetic patients early enough to improve diagnosis and avoid vision loss.

A CNN with singular value decomposition (SVD) was developed in [7] to improve the classification and detection of DR, but it consumed more time across the different stages of DR detection. An intelligent computer-aided system was introduced in [8], using a pre-trained CNN for DR screening on fundus images. However, the designed system depended on the quality of the fundus images used to forecast the DR level, which suffered from reduced quality, minimal contrast, and saturation at the early stages.

In [9], a novel image processing scheme and a stacked deep learning technique were developed to diagnose the DR level by removing unnecessary reflectance. However, beyond modifying and improving the images in the dataset, the developed method was not suitable for improving the feature extraction capabilities. A severity diagnosis method for DR named GCA-EfficientNet (GENet) was introduced in [10], with the algorithm based on the dimension of the feature map. However, it failed to enhance the DR detection performance with deeper learning models.

In [11], a survey gives a comprehensive review of the use of ML in the medical field, highlighting standard technologies as well as how they affect medical diagnosis; however, the error rate was not minimized. In [12], a compendious review of different medical imaging modalities and an evaluation of related multimodal databases, along with statistical results, was provided. However, it failed to minimize the diabetic retinopathy detection time.

A multitask deep learning model was developed in [13] to identify all the different levels of DR more precisely, using regression. This method included a real-time implementation within medical environments for diabetic retinopathy eye assessment. A Generative Adversarial Network (GAN)-based visualization method was developed in [14] for DR detection. However, the method failed to improve the reliability and fairness of the algorithm.

Local Extrema Quantized Haralick Features with a Long Short-Term Memory (LSTM) network were developed in [15] to identify DR; however, the DR detection performance was not improved. In [16], a global transformer block and a relation transformer block were developed to detect small medical patterns for the identification of DR lesions. However, it failed to achieve better performance in DR multi-lesion segmentation with less memory required.

To improve the classification accuracy of DR, a hybrid attention method integrated with a residual CNN algorithm was developed in [17]. However, the algorithm did not apply multimodal integration to train the model and enhance the classification accuracy. A DCNN, together with a genetic algorithm-based feature selection method, was developed in [18] to detect DR by extracting features from fundus images. Nonetheless, it failed to accurately distinguish between individuals with normal signs and those with non-proliferative signs. An automated and interpretable referable DR screening model was developed in [19] but was not able to reduce the time consumption of DR screening.

A DR detection system was developed in [20], based on applying deep learning to ultra-wide-field fundus photography. However, this work failed to apply the model to large volumes of images. A deep symmetric CNN was developed in [21] for the detection of DR, but it failed to detect more objects both simultaneously and accurately. In [22], a novel automatic deep learning-based approach was introduced for severity detection using a single color fundus photograph; it was not able to provide good performance, although it improved the dataset size. In [23], the Regularized Anisotropic Filtered Tanimoto Indexive Deep Multilayer Perceptive Connectionist Network (RAFTIDMPCN) was designed, which uses many layers of nodes for a deep analysis of the input and was able to enhance the classification results. However, the DR detection time was not reduced.

A CNN was employed in [25] to detect colorectal cancer with maximum accuracy. A novel writing hand pose detection algorithm was developed in [26] for robust fingertip discovery and tracking, but its time consumption was high. A new image compression method was developed in [27] to improve the quality of the reconstructed image. A recurrent self-evolving Takagi–Sugeno–Kang fuzzy neural network (RST-FNN) was introduced in [28] for forecasting blood glucose levels. However, the accuracy was not sufficient.

3 Proposed Methodology

Diabetic Retinopathy (DR) is a complication of diabetes that appears in the back of the eye (retina), causing injuries that can range from mild vision problems to loss of sight. Various algorithms have been previously designed for an accurate diagnosis of DR [30]. Building on this motivation, a novel SAGDEB was developed for early and accurate DR detection with minimal time consumption.

Figure 1 illustrates the architecture of the proposed SAGDEB, which performs the DR identification with improved accuracy and minimal time consumption. The DR dataset used was taken from Kaggle. First, the retina fundus images \({F}_{1},{F}_{2},\dots ,{F}_{n}\) are gathered from the image dataset; following this step, three different processes are performed: image pre-processing, feature extraction, and classification.

Fig. 1 Architecture of the proposed SAGDEB

Image pre-processing is performed to enhance the image quality by correcting the blurred pixels in the image without removing important parts of the image content. This is done by using the shape adaptive box linear filtering, which accurately identifies the blurred pixels through the Wilcox index. Afterward, the blurred pixels are replaced by the average of all the neighboring pixels. Next, the Isomap geometric feature extraction process is carried out to obtain shape, color, and texture features from the pre-processed images. Isomap is a nonlinear dimensionality reduction process that computes a quasi-isometric, low-dimensional embedding of a set of high-dimensional images. Finally, adaptive gradient Tversky deep belief network-based classification is used for DR identification with higher accuracy. The deep learning technique uses the Tversky similarity coefficient to compare the extracted features (i.e., the features obtained in the feature extraction stage) with the testing features (i.e., the diseased features in the dataset). Then, the sigmoid activation function classifies diabetic retinopathy into five different classes, namely No DR, Mild, Moderate, Severe, and Proliferative DR. A detailed description of the SAGDEB method is given in the following sections.

3.1 Shape adaptive box linear filtering-based image preprocessing

Pre-processing is the fundamental process of the proposed SAGDEB, enhancing the image quality by suppressing undesired distortion, artifacts, and noise. The images are recorded by sensors equipped with the scanning tool, which can lead to errors in the geometry and brightness values of the pixels. In the image pre-processing steps, these distortions are corrected using an appropriate image filtering technique. The filtering process reduces noise and enhances both contrast and image smoothness. To remove the noise artifacts and enhance the image quality, the proposed SAGDEB uses the shape adaptive box linear filtering method.

Figure 2 illustrates the different processes of image pre-processing, which uses the shape adaptive box linear filtering technique. Here, the input fundus images are denoted as \({F}_{1},{F}_{2},{F}_{3},\dots ,{F}_{n}\), and the pixels are denoted by \({a}_{1},{a}_{2},{a}_{3},\dots ,{a}_{m}\). The pixels are arranged in a \(3*3\) filtering window.

$$k= \left[\begin{array}{ccc}{a}_{1}& {a}_{2}& {a}_{3}\\ {a}_{4}& {a}_{5}& {a}_{6}\\ {a}_{7}& {a}_{8}& {a}_{9}\end{array}\right]$$
(1)

where \(k\) denotes the pixel arrangement in the \(3*3\) filtering window. In this filtering window, the center value is determined by sorting the pixels in ascending order and taking the one in the middle. If two pixels share the middle position, the average of these two pixels is taken as the middle pixel.

Fig. 2 Flow process of shape adaptive box linear filtering-based preprocessing

After sorting the pixels in the filtering window, the blurred pixel is identified by measuring the variance between the pixels, applying the Wilcox variation index, an index of qualitative variation in statistical analysis. The variation between the center pixel and the neighboring pixels is measured as given below,

$$WI=\sum \nolimits_{i=1}^{n}\sum \nolimits_{j=1}^{m}\left|{a}_{c}-{a}_{ij}\right|$$
(2)

where \(WI\) indicates the Wilcox variation index, \({a}_{c}\) denotes the center pixel, and \({a}_{ij}\) indicates the neighboring pixels. If the variation is high, the pixel is marked as blurred. After identifying the blurred pixels, each one is replaced by the average value of its neighboring pixels. This helps to maintain the shape of the object in the input image.

$$y= \left(\frac{\sum {a}_{ij}}{n}\right)$$
(3)

where \(y\) denotes the output of the pre-processing: an image in which the blurred pixels were replaced with the average of all the pixels in the neighborhood, including the center pixel itself. Consequently, the noise in the input images is removed, and the quality of the fundus image is improved. The shape adaptive Wilcox box linear filtering algorithm is given below,

Algorithm 1 Shape adaptive Wilcox box linear filtering

Algorithm 1 shows the different steps of pre-processing used to obtain improved images. To summarize, the input retinal fundus images are gathered from the database and, for each one, the pixels are first arranged in the filtering window. Then, the center pixel is identified by sorting all pixels, the variation between the center pixel and the neighboring pixels is measured to identify the blurred pixels, and each blurred pixel is replaced by the average of all the pixels. As a result, a contrast-enhanced, smooth retinal image is obtained, enhancing the PSNR.
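As a concrete illustration of Algorithm 1, the following is a minimal NumPy sketch of the shape adaptive Wilcox box linear filter. The variation threshold that decides when a pixel counts as blurred is an assumption, since the paper does not state its value:

```python
import numpy as np

def wilcox_box_filter(image, threshold=200.0):
    """Sketch of Algorithm 1 on a 2-D grayscale fundus image.

    A pixel is treated as blurred when its Wilcox variation index
    (Eq. 2) exceeds `threshold` (a hypothetical cut-off) and is then
    replaced by the mean of its 3x3 neighbourhood (Eq. 3).
    """
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = image.astype(float).copy()
    rows, cols = image.shape
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + 3, j:j + 3]      # 3x3 filtering window (Eq. 1)
            flat = np.sort(window.ravel())         # sort pixels in ascending order
            center = flat[len(flat) // 2]          # middle value of the sorted window
            wi = np.abs(center - window).sum()     # Wilcox variation index (Eq. 2)
            if wi > threshold:                     # high variation -> blurred pixel
                out[i, j] = window.mean()          # replace by the average (Eq. 3)
    return out
```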

3.2 Isomap geometric feature extraction

The Isomap feature extraction process is performed to obtain image features like shape, texture, intensity, and color from the pre-processed images. Isomap achieves nonlinear dimensionality reduction by computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional images. Isomap is a type of isometric mapping, providing a simple method to estimate the intrinsic geometry of image pixels based on a rough estimate of each pixel's neighbors. Isomap is very capable and generally applicable to a wide range of image pixels and dimensionalities.

Isometric mapping is carried out based on pair-wise distances between data points, which are generally measured using the straight-line Euclidean distance. Isomap is distinguished by instead measuring the geodesic distance induced by the neighborhood of pixels.

The shape feature of the given input image is extracted by defining the contour's region. The contour areas are identified by estimating the Euclidean distance between the center of the image and the edge of the object.

The centroid of the image is calculated as given below,

$${C}_{x}=\frac{1}{n}\sum \nolimits_{i=1}^{n}{x}_{i}$$
(4)
$${C}_{y}=\frac{1}{n}\sum \nolimits_{i=1}^{n}{y}_{i}$$
(5)

where \(({C}_{x},{C}_{y})\) denotes the centroid of the image contour, and (x, y) is a coordinate in the contour area of the image. The Euclidean distance is measured as given below,

$$D=\sqrt{{\left({C}_{x}-x\right)}^{2}+{\left({C}_{y}-y\right)}^{2}}$$
(6)

where ‘D’ is the distance, the point \(({C}_{x},{C}_{y})\) represents the centroid of the image, and the point (x, y) denotes a point in the contour area of the image. In this way, the shape of the object is identified through isometric mapping using the Euclidean distance.
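The shape descriptor of Eqs. (4)–(6) can be sketched in a few lines of Python; representing the contour as an (n, 2) array of boundary coordinates is an assumption:

```python
import numpy as np

def shape_feature(contour):
    """Distance profile from the contour centroid (Eqs. 4-6).

    contour: array of shape (n, 2) holding (x, y) boundary points.
    Returns the Euclidean distance of every boundary point from the
    centroid, which characterises the shape of the object.
    """
    cx, cy = contour.mean(axis=0)                  # centroid (Eqs. 4 and 5)
    return np.sqrt((cx - contour[:, 0]) ** 2
                   + (cy - contour[:, 1]) ** 2)    # Eq. (6) per boundary point
```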

The texture feature of the input image is measured to supply the spatial data of the pixels' intensities,

$$TX=\frac{1}{{\delta }_{i}*{\delta }_{j}}\sum \nolimits_{i}\sum \nolimits_{j}\left({a}_{i}-{\mu }_{i}\right)\left({a}_{j}-{\mu }_{j}\right)$$
(7)

where ‘TX’ indicates the texture feature, \({\mu }_{i}\) and \({\mu }_{j}\) indicate the means of the pixels \({a}_{i}\) and \({a}_{j}\), respectively, and \({\delta }_{i}\) and \({\delta }_{j}\) are their standard deviations.

Color is the most effective visual feature used in medical images, with colors usually described in a 3-dimensional space. It may be Red, Green, and Blue (RGB); Hue, Saturation, and Value (HSV); Hue, Saturation, and Brightness (HSB); or one among many other color spaces.

The original image is RGB color, which is then transformed into HSV, as shown below:

$$h=\left\{\begin{array}{c}0+\frac{\left(g-b\right)}{\left(M_{ax}-M_{in}\right)}\ast60,\;if\;M_{ax}=r\\2+\frac{\left(b-r\right)}{\left(M_{ax}-M_{in}\right)}\ast60,\;if\;M_{ax}=g\\4+\frac{\left(r-g\right)}{\left(M_{ax}-M_{in}\right)}\ast60,\;if\;M_{ax}=b\end{array}\right.$$
(8)
$$H=h+360\;if\;h<0$$
(9)
$$sat=\left(\frac{{M}_{ax}-{M}_{in}}{{M}_{ax}}\right)$$
(10)
$$val={M}_{ax}\left(rgb\right)$$
(11)

where \({M}_{ax}\) denotes the maximum pixel value of the image, \({M}_{in}\) indicates the minimum pixel value, r, g, and b indicate the red, green, and blue channels, sat indicates the saturation, and val denotes the value. The feature extraction algorithm is given below,

Algorithm 2 Isomap geometric feature extraction

Algorithm 2 represents the procedure for feature extraction from the input retinal fundus images. First, the pre-processed image is given as input to the feature extraction algorithm. The shape features are computed by estimating the distance between the center and the contour area, the texture features are extracted by identifying the spatial relationship between pixels, and the color features are extracted through statistical measures. The feature extraction time of the proposed technique is low, which lowers the total DR detection time.
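The texture and color features of Eqs. (7)–(11) admit a similarly compact sketch. Reading Eq. (7) as the normalised covariance between each pixel and its right-hand neighbour is an interpretation, and the standard-library colorsys routine is used in place of the piecewise HSV formula:

```python
import colorsys
import numpy as np

def texture_feature(gray):
    """One reading of Eq. (7): normalised spatial covariance between
    each pixel a_i and its right-hand neighbour a_j."""
    a_i = gray[:, :-1].astype(float).ravel()
    a_j = gray[:, 1:].astype(float).ravel()
    cov = np.mean((a_i - a_i.mean()) * (a_j - a_j.mean()))
    denom = a_i.std() * a_j.std()              # the deviations of Eq. (7)
    return cov / denom if denom else 0.0

def color_feature(rgb):
    """Mean HSV triple of an RGB image (Eqs. 8-11), all values in [0, 1]."""
    pixels = rgb.reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])
    return hsv.mean(axis=0)                    # (mean hue, saturation, value)
```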

3.3 Adaptive gradient Tversky Deep belief network classifier-based diabetic retinopathy identification

The last step is the classification, which is performed using the Adaptive gradient Tversky Deep belief network with the extracted features. The classification model is simple and easy to implement: a supervised machine learning process for categorizing images into different classes. Classification is carried out to find the DR images in a precise and accurate manner. A deep belief network is a probabilistic, generative graphical model with multiple layers of latent hidden units: two visible units (input and output) and one or more hidden units. The main benefit of a deep belief network is its low computational complexity when processing large inputs. The method works layer by layer.

Figure 3 illustrates the construction of the Adaptive gradient Tversky Deep belief network, which consists of two visible units (input and output) and one or more hidden units, as well as sub-hidden layers. Each layer includes small individual units named artificial neurons or nodes. The main purpose of a neuron is to process the given input and to forward the output to other nodes through the activation function. The connection between neurons is called a synapse.

Fig. 3 Construction of the adaptive gradient Tversky Deep belief network

The input unit receives the extracted image features. The activity of a neuron in the input unit is measured as the product of a set of weights and inputs.

$$N= \sum \nolimits_{i=1}^{n}({f}_{i}*{w}_{i})+ b$$
(12)

where N is the activity of the neuron in the input unit, \({w}_{i}\) denotes the weight assigned to the input \({f}_{i}\), and ‘b’ indicates a bias that stores the value ‘1’. The input is transferred into the hidden unit, where the pattern matching process is performed by applying the Tversky similarity measure.

The Tversky similarity is used to measure the relationship between two variables (i.e. extracted features and testing patterns). The relationship between these two variables is measured as given below,

$$TS=\frac{{f}_{i}\cap {f}_{t}}{p\left({f}_{i} \Delta {f}_{t}\right)+ q\left({f}_{i}\cap {f}_{t}\right)}$$
(13)

where TS indicates the Tversky similarity coefficient, \({f}_{i}\) denotes the extracted image features, \({f}_{t}\) the testing disease features, \({f}_{i}\cap {f}_{t}\) indicates the mutual dependence between the two feature sets, \({f}_{i} \Delta {f}_{t}\) indicates the variance (symmetric difference) between them, and p and q designate parameters of the Tversky index (p, q ≥ 0). The similarity coefficient TS provides an output value between [0, 1].
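A minimal set-based sketch of Eq. (13) is given below. Treating the feature vectors as sets of discretised values is an assumption, and the defaults p = 0.5, q = 1.0 (under which the index reduces to the Dice coefficient) are illustrative, since the paper does not fix the parameters:

```python
def tversky_similarity(f_ext, f_test, p=0.5, q=1.0):
    """Tversky similarity coefficient of Eq. (13) over two feature sets."""
    a, b = set(f_ext), set(f_test)
    inter = len(a & b)                  # mutual dependence, f_i ∩ f_t
    sym_diff = len(a ^ b)               # variance, f_i Δ f_t
    denom = p * sym_diff + q * inter
    return inter / denom if denom else 0.0

# e.g. tversky_similarity([1, 2, 3], [2, 3, 4]) -> 2 / (0.5*2 + 2) ≈ 0.67
```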

To identify the different levels of DR, the results are given to the activation function in the hidden unit; in this case, the sigmoid activation function is used. The activation function is a numerical equation that determines the output of a deep belief network. The outputs of the activation function also help to normalize the range between 0 and 1.

The activation function is efficient and minimizes the computation time, since the deep belief network is trained on a large amount of data. The sigmoid activation is used to make a clear disease prediction. The mathematical formula for the sigmoid activation function is given below,

$$\beta ={(1+{\text{exp}}(-TS))}^{-1}$$
(14)

where β denotes a sigmoid activation function. Based on the activation function results, the different levels of DR can be identified as No DR, Mild, Moderate, Severe, and Proliferative DR.
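A sketch of Eq. (14) and the five-way level assignment follows. The cut-points that map the activation onto the five classes are hypothetical, as the paper states only that the sigmoid output determines the level; since TS lies in [0, 1], the activation falls in roughly [0.50, 0.73], which the assumed cuts subdivide:

```python
import math

DR_LEVELS = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]

def sigmoid(ts):
    """Sigmoid activation of Eq. (14)."""
    return 1.0 / (1.0 + math.exp(-ts))

def dr_level(ts, cuts=(0.55, 0.60, 0.65, 0.70)):
    """Map the activation onto the five DR classes (cuts are assumptions)."""
    beta = sigmoid(ts)
    for level, cut in enumerate(cuts):
        if beta < cut:
            return DR_LEVELS[level]
    return DR_LEVELS[-1]
```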

After the classification, the error rate of DR prediction is measured by using the squared difference between the actual and observed classification results. The error rate is calculated as given below,

$${\sigma }_{E}=\frac{1}{n}\sum \nolimits_{i=1}^{n}{\left({Z}_{a}-{Z}_{p}\right)}^{2}$$
(15)

where \({\sigma }_{E}\) denotes the classification error rate, n indicates the number of input samples, \({Z}_{a}\) indicates the actual classification result, and \({Z}_{p}\) indicates the output generated by the deep belief network. To minimize the error, the deep belief network updates its weights.

The adaptive gradient method is applied to update the weight of the input. The mathematical formula for weight updating is given below,

$${w}_{j+1}={w}_{j}-\frac{\tau }{\sqrt{k+\varepsilon }} .\left(\frac{\partial {\sigma }_{E}}{\partial {w}_{j}}\right)$$
(16)
$${k}_{t}={k}_{t-1}+{\left(\frac{\partial {\sigma }_{E}}{\partial {w}_{j}}\right)}^{2}$$
(17)

where \({w}_{j+1}\) denotes the updated weight, \({w}_{j}\) the current weight, τ a constant learning rate (τ = 0.01), and \(\varepsilon ={10}^{-7}\). In the beginning, k is initialized to 0, and the process is repeated until the error is minimized. At last, the results are sent to the output layer of the deep belief network classifier.
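One update step of Eqs. (16) and (17) can be sketched as follows, with the gradient supplied by whatever backpropagation routine trains the network:

```python
import numpy as np

def adagrad_update(w, grad, k, tau=0.01, eps=1e-7):
    """One adaptive gradient step (Eqs. 16 and 17).

    w:    current weights w_j
    grad: gradient of the error sigma_E with respect to w_j
    k:    running sum of squared gradients, initialised to 0
    """
    k = k + grad ** 2                          # Eq. (17): accumulate squared gradient
    w = w - tau / np.sqrt(k + eps) * grad      # Eq. (16): per-weight scaled step
    return w, k

# Usage inside the training loop, repeated until the error of Eq. (15)
# stops decreasing:  w, k = adagrad_update(w, grad, k)
```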

Algorithm 3 Adaptive gradient Tversky Deep belief network classifier algorithm

The above algorithm describes the step-by-step process of DR identification through the Adaptive gradient Tversky Deep belief network classifier. Initially, the extracted features are given to the input unit and then sent to the first hidden layer, where the similarity between the features is computed. After that, the sigmoid activation function is applied to the similarity value and, as a result, the different levels of DR are correctly identified. The adaptive gradient method is then applied to update the input weights. Finally, the different levels of DR are output at the output layer with improved accuracy.

4 Performance Analysis

The experimental assessment of the SAGDEB Model and two conventional methods, namely MPDCNN [1] and DRNet [2], was implemented using the MATLAB simulator. High-resolution retina fundus images were collected from the Diabetic Retinopathy Arranged dataset, taken from https://www.kaggle.com/amanneo/diabetic-retinopathy-resized-arranged [24]. In this work, validation is performed using this public dataset to estimate the detection performance of the proposed method. The dataset provides fundus images with class labels for classification and was utilized to evaluate the DR detection performance. It includes 5 different types of images, namely No DR, Mild, Moderate, Severe, and Proliferative DR, represented on a scale of 0 to 4. The dataset consists of 35,126 images; the high-resolution retina images were gathered under a diversity of imaging conditions that affect the visual appearance of the left and right eye. For this experiment, 35,000 images were used for DR identification. Tenfold cross-validation is employed in the proposed SAGDEB Model for splitting the dataset. The Diabetic Retinopathy Arranged dataset is divided into two sets, namely training and testing: most of the retinal images (70%) are used for training, and the remaining retinal images (30%) are employed for testing.
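A minimal sketch of this splitting scheme using scikit-learn is shown below; the file listing and placeholder labels are assumptions for illustration only:

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical listing: one path and one grade (0-4) per fundus image.
image_paths = [f"img_{i:05d}.jpeg" for i in range(1000)]
grades = [i % 5 for i in range(1000)]   # placeholder: 0 = No DR ... 4 = Proliferative DR

# 70/30 train/test split, preserving the five-class distribution.
train_x, test_x, train_y, test_y = train_test_split(
    image_paths, grades, test_size=0.30, stratify=grades, random_state=42)

# Tenfold cross-validation over the training portion.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
folds = list(skf.split(train_x, train_y))
```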

5 Implementation details

In this study, we developed a Shape Adaptive box linear filtering-based Gradient Deep Belief network classifier (SAGDEB) Model for accurate DR detection. The SAGDEB model comprises pre-processing, feature extraction, and classification.

  • We compared our SAGDEB model to existing MPDCNN [1] and DRNet [2] using the Diabetic Retinopathy Arranged dataset to validate the results.

  • Initially, retina fundus images are collected from the dataset.

  • Second, Shape adaptive box linear filtering is used to perform preprocessing to minimize the noise.

  • Third, the isomap geometric feature extraction is utilized for extracting the features such as shape, texture and color with less time.

  • Finally, the Adaptive gradient Tversky Deep belief network classifier is employed for classifying the DR images. The deep belief network includes several layers, namely the input unit, hidden units, and output unit. The extracted image features are taken into the input layer and transmitted to the hidden layers. The extracted and testing features are examined via the Tversky similarity index. The sigmoid activation function is utilized to find the different levels of DR. The error is reduced by using the adaptive gradient method. In the output layer, the categorization outcomes are obtained.

The hyperparameters used in the evaluation are summarized in Table 1.

Table 1 Hyper parameters description

6 Result Analysis and Discussion

The performance of the proposed SAGDEB technique and two conventional methods, namely MPDCNN [1] and DRNet [2], is discussed in terms of PSNR, accuracy, error rate, and detection time. Performance analyses of the different metrics are presented through tables and graphical illustrations.

6.1 Peak signal to noise ratio (PSNR)

PSNR is utilized for measuring the image quality based on the mean square error. It is estimated from the difference between the original retinal image and the pre-processed image. The PSNR is formulated as given below,

$${PS}_{nr}=10*{{\text{log}}}_{10}\left[\frac{{M}^{2}}{Err}\right]$$
(18)
$$Err={\left[{F}_{p}-{F}_{o}\right]}^{2}$$
(19)

where \({PS}_{nr}\) refers to the peak signal-to-noise ratio, \({F}_{p}\) denotes the preprocessed image, \({F}_{o}\) denotes the original image, ‘M’ is the maximum possible pixel value (i.e., 255), and \(Err\) denotes the error of the image preprocessing, taken as the mean squared difference between the two images. The PSNR is measured in decibels (dB).
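A sketch of Eqs. (18) and (19), taking Err as the mean squared pixel difference, consistent with the text's description of PSNR as based on the mean square error:

```python
import numpy as np

def psnr(original, preprocessed, max_val=255.0):
    """Peak signal-to-noise ratio of Eqs. (18) and (19), in dB."""
    err = np.mean((preprocessed.astype(float)
                   - original.astype(float)) ** 2)   # Eq. (19), averaged per pixel
    if err == 0:
        return float("inf")                          # identical images
    return 10.0 * np.log10(max_val ** 2 / err)       # Eq. (18)
```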

Figure 4 shows the graphical representation of PSNR for different retina image sizes. The sizes of the images are given on the x-axis and the performance values on the y-axis. The PSNR of SAGDEB is represented by the blue curve, while the PSNR of the existing MPDCNN [1] and DRNet [2] is shown in red and green, respectively. Increasing the retinal image size causes a slight increase or decrease in the PSNR and vice versa. In the first iteration of the simulations, using 39.05 KB retinal images, the PSNR was found to be 50.06 dB with the SAGDEB technique, 49.04 dB with [1], and 47.70 dB with [2]; here the PSNR of SAGDEB was observed to be comparatively higher than that of [1] and [2]. In the second iteration, using 45.64 KB retinal images, the PSNR was 40.45 dB with SAGDEB, 38.64 dB with [1], and 37.73 dB with [2]; again, the PSNR of SAGDEB was comparatively higher than that of [1] and [2]. Among the three methods, SAGDEB achieves the highest PSNR. The shape-adaptive Wilcox box linear filtering-based image preprocessing was applied to obtain the preprocessed image. For each collected original retina image, the pixels are arranged in the filtering window and the center pixel value is identified. Afterward, the blurred pixel is identified by estimating the variation between the center pixel and the neighboring pixels using the Wilcox index. Finally, the blurred pixel is replaced by the average of all the pixels in the window. A pixel that deviates from the center is termed a “noisy pixel”; these noisy pixels are eliminated from the filtering window. This process of the proposed SAGDEB model improves the peak signal-to-noise ratio and minimizes the mean square error. The quality estimation is performed on the preprocessed image. The average of the ten comparison results indicates that the PSNR of the SAGDEB method is improved by 5% compared to [1] and by 7% compared to [2].

Fig. 4 Graphical representation of peak signal-to-noise ratio for the proposed SAGDEB and the existing MPDCNN [1] and DRNet [2]

6.2 Diabetic retinopathy identification accuracy

Accuracy is measured as the number of retinal images correctly identified as No DR, Mild, Moderate, Severe, or Proliferative DR, out of the total number of retinal images. The accuracy is mathematically calculated as,

$${Acc}_{DRI}=\left[\frac{Number\;of\;images\;correctly\;identified}{Number\;of\;images}\right]*100$$
(20)

where \({Acc}_{DRI}\) indicates the DR identification accuracy, which is measured as a percentage (%).

Figure 5 shows the graphical illustration of the DR identification accuracy along with the number of retina images. The number of retina images is given on the horizontal axis and the accuracy on the vertical axis. Three curves, colored blue, red, and green, characterize the accuracy of the three techniques, namely the proposed SAGDEB, the existing MPDCNN [1], and DRNet [2], respectively. This can be proven through statistical evaluation: with 3500 retinal images used as input to measure the accuracy, the SAGDEB technique achieved 96.85% DR identification accuracy.

Fig. 5 Graphical representation of diabetic retinopathy identification accuracy for the proposed SAGDEB and the existing MPDCNN [1] and DRNet [2]

With the comparison methods, the accuracy of MPDCNN [1] and DRNet [2] was observed to be 94.85% and 92.85%, respectively. Among the three methods, the proposed SAGDEB improves the DR identification accuracy. The accuracy was improved because the adaptive gradient Tversky deep belief network classifier was used: the deep learning classifier compares the extracted features with the testing disease features by applying the Tversky similarity index. Afterward, the sigmoid activation function uses the similarity value to assign the different levels of DR, resulting in an increase in accuracy. The average of the ten comparison results proves that the accuracy of the SAGDEB is significantly increased, by 3% and 6% when compared to the existing [1] and [2], respectively.

6.3 Error rate

The error rate is measured as the number of retinal images inaccurately identified as No DR, Mild, Moderate, Severe, or Proliferative DR, out of the total number of retinal images taken for the experimental evaluation. The error rate is calculated as

$${R}_{E}=\left[\frac{Number\;of\;images\;incorrectly\;identified}{Number\;of\;images}\right]*100$$
(21)

where \({R}_{E}\) indicates the error rate, measured as a percentage (%).
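Both Eq. (20) and Eq. (21) reduce to a comparison of predicted and actual class labels, as the following sketch shows:

```python
def accuracy_and_error(pred_levels, true_levels):
    """DR identification accuracy (Eq. 20) and error rate (Eq. 21), in %."""
    correct = sum(p == t for p, t in zip(pred_levels, true_levels))
    acc = correct / len(true_levels) * 100.0
    return acc, 100.0 - acc     # the error rate is the complement of the accuracy

# e.g. accuracy_and_error(["Mild", "No DR"], ["Mild", "Severe"]) -> (50.0, 50.0)
```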

Figure 6 shows the simulation results of the error rate in DR identification. The error rate is measured for different numbers of retina images, from 3500 to 35,000. The error rates of the three methods, namely SAGDEB and the existing MPDCNN [1] and DRNet [2], are represented by three colors, namely blue, red, and green, respectively. As illustrated in Fig. 6, the SAGDEB technique minimizes the error rate. The error rate is minimized by applying the adaptive gradient function in the deep belief network classifier: the input weights are updated depending on the estimated error value for each classification result, and this process is iterated until the error is minimized. The accurate outcomes are transmitted to the output layer of the deep belief network classifier. Therefore, the classification of the different retinopathy levels is achieved with reduced error. The observed results indicate that the error rate of the SAGDEB is considerably minimized, by 39% and 50%, when compared to the existing [1] and [2].

Fig. 6 Graphical illustration of error rate for the proposed SAGDEB and the existing MPDCNN [1] and DRNet [2]

6.4 Diabetic retinopathy identification time

Identification time is measured as the amount of time consumed to detect the DR level, namely No DR, Mild, Moderate, Severe, or Proliferative DR. The overall time taken for diabetic retinopathy identification is calculated as,

$${T}_{DRI}=\sum \nolimits_{i=1}^{n}{F}_{i}*T\left(ISI\right)$$
(22)

where \({T}_{DRI}\) indicates the diabetic retinopathy identification time, \({F}_{i}\) denotes the number of images, and \(T\left(ISI\right)\) denotes the time for identifying the disease in a single image. This metric is measured in milliseconds (ms).

Figure 7 shows the graphical representation of the DR identification time for a varying number of retina images. With the collected retina images, the time consumption of the DR identification was analyzed and plotted in the graph. In Fig. 7, the DR identification time increases as the number of retina images increases. However, the DR identification time is minimized when using SAGDEB, in comparison with MPDCNN [1] and DRNet [2]. Considering 3500 retina images, the time taken for diabetic retinopathy identification was 66.5 ms using the SAGDEB, 89.25 ms using MPDCNN [1], and 98 ms using DRNet [2]. The minimum time consumption of SAGDEB is due to the application of the Isomap geometric feature extraction: the extracted features are given to the deep belief network classifier, which then identifies the different levels of DR with minimal time. The overall performance results of the proposed SAGDEB are compared to the existing results. The comparison depicts that the proposed SAGDEB minimizes the time consumption by 9% and 16% when compared to the existing methods.

Fig. 7 Graphical representation of diabetic retinopathy identification time for the proposed SAGDEB and the existing MPDCNN [1] and DRNet [2]

6.5 Sensitivity

Sensitivity, also known as recall or the true positive rate, computes the ratio between the true positives and the sum of the true positives and false negatives. It is mathematically estimated as follows:

$$Sen=\frac{TP}{TP+FN}*100$$
(23)

where \(Sen\) indicates the sensitivity, \(TP\) denotes the true positives, and \(FN\) represents the false negatives. It is estimated as a percentage (%).

The sensitivity results of the three different methods, the SAGDEB and the existing MPDCNN [1] and DRNet [2], are illustrated in Fig. 8. Compared to the existing deep learning methods, SAGDEB achieves a higher sensitivity. For the experiment, 3500 retinal images were considered. The sensitivity of SAGDEB was observed to be 98.54%, while MPDCNN [1] and DRNet [2] achieved recall results of 97.5% and 96.43%, respectively. Based on these results, SAGDEB increases the sensitivity by 2% and 3% compared to the existing methods [1] and [2], respectively. The improvement is owing to the application of the adaptive gradient Tversky deep belief network classifier, in which true positives are accurately identified while false negatives are minimized.

Fig. 8 Graphical representation of sensitivity for the proposed SAGDEB and the existing MPDCNN [1] and DRNet [2]

6.6 Specificity

Specificity is computed as the percentage of retinal images appropriately detected as No DR, Mild, Moderate, Severe, or Proliferative DR out of the total number of retinal images. It is estimated as a percentage (%).

$$SP=\frac{TN}{TN+FP}*100$$
(24)

From the above Eq. (24), the specificity \(SP\) is measured as the percentage ratio of the true negatives \(TN\) to the sum of the true negatives and the false positives \(FP\).
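Both rates follow directly from the confusion counts, as in the sketch below:

```python
def sensitivity(tp, fn):
    """True positive rate of Eq. (23), in %."""
    return tp / (tp + fn) * 100.0

def specificity(tn, fp):
    """True negative rate of Eq. (24), in %."""
    return tn / (tn + fp) * 100.0

# e.g. with 985 true positives and 15 false negatives:
# sensitivity(985, 15) -> 98.5
```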

The performance analysis of the specificity with respect to the number of retinal images is depicted in Fig. 9. The number of retinal images collected from the dataset ranges from 3500 to 35,000. The graphical results indicate that the specificity of the proposed SAGDEB is increased compared to the conventional [1] and [2]. Considering 3500 retinal images, the specificity of the SAGDEB was observed to be 96.5%, while the specificity of the existing methods [1] and [2] was observed to be 94.42% and 92.39%, respectively. Based on the observed results, the proposed SAGDEB enhances the specificity by 3% and 6% when compared to the existing methods. The improvement in specificity is achieved by applying the deep belief network classifier: the Tversky similarity index is applied to compare the extracted features with the testing disease features, and, based on the similarity value, the sigmoid activation function is used to obtain the different levels of DR with higher specificity.

Fig. 9 Graphical representation of specificity for the proposed SAGDEB and the existing MPDCNN [1] and DRNet [2]

7 Conclusion

A novel automatic deep learning model called SAGDEB was developed to identify DR from retinal fundus images at an early stage. First, the input retina images are pre-processed using the shape adaptive box linear filtering technique, which removes the blurred pixels. Following this, the different geometric features are extracted from the preprocessed images. Finally, an Adaptive gradient Tversky Deep belief network classifier is employed to classify the different levels of diabetic retinopathy disease with higher accuracy and minimal error. To validate the performance of the proposed SAGDEB method, the simulation was conducted with the Diabetic Retinopathy Arranged dataset. The quantitative analysis of the three different deep learning methods is derived in terms of PSNR, accuracy, error rate, and identification time. The comparative results indicate that the proposed SAGDEB significantly improves the accuracy and PSNR and minimizes the time and error rate compared to the conventional methods.