Introduction

Glaucoma (Gl) is an ocular disease in which the optic nerve linking the eye and brain is impaired due to increased intraocular pressure (IOP) [1]. It is an irreversible neurodegenerative ocular disease that was first described in the 17th century and has been recognized as a major cause of blindness since the 19th century [2]. Despite technological advances and the availability of effective treatments, glaucoma remains a leading cause of irreversible blindness across the globe. According to a systematic review and meta-analysis of 50 population-based studies conducted by Tham et al. [3], 111.8 million people are anticipated to suffer from glaucoma by the year 2040, of which more than 85 million cases are expected in Africa and Asia. The estimated cases for each continent are provided in Table 1.

Generally, glaucoma is diagnosed from three findings: an IOP above 22 mmHg without medication, glaucomatous visual field defects, and glaucomatous cupping of the optic disc [4]. Presently, identifying glaucomatous structural changes and damage remains a challenging aspect of glaucoma detection methods [5]. The shape and size of the optic cup and disc are vital attributes that need to be considered during glaucoma diagnosis [6]. Comparing Fig. 1a with Fig. 1b shows an enlarged cup, which is a clear sign of glaucomatous optic neuropathy [4].

Table 1 Estimated cases (in millions) with primary glaucoma by the year 2040
Fig. 1 a Optic nerve with normal cup; b optic nerve with increased cup due to glaucoma

Timely diagnosis and treatment can greatly help in preventing loss of vision due to glaucoma. Therefore, glaucoma detection in the early stages is critical and needs to be improved through the introduction of novel techniques for screening, detection, and diagnosis of changes over time [7]. The collation of large medical datasets and recent developments in artificial intelligence (AI) have incited great research interest in developing deep learning algorithms that can more rapidly and precisely detect the glaucomatous damage on diagnostic tests in comparison to the conventional manual approaches [8,9,10,11].

Automated glaucoma detection has several advantages over the manual approach. Easy identification of minute abnormalities, less time consumption, and reduced human error are some of these advantages. Moreover, it is possible to develop automated glaucoma detection systems via combined image processing techniques utilizing either deep learning (DL) or machine learning (ML) techniques.

The deep learning approach begins with the collection of images with and without glaucoma. Image preprocessing techniques are then applied to reduce noise and prepare the images for the feature extraction stage. The images are fed into the DL framework, which automatically extracts features and learns the associated weights that encode the classification rules. The weights are optimized iteratively to improve the classification outcomes. Lastly, an unseen set of images is used to test the optimized weights. However, this architecture requires a large set of training images, so its performance can be severely restricted when only a limited number of images is available. Figure 2 depicts the deep learning pipeline for glaucoma detection.

Fig. 2 Deep learning approach in glaucoma detection
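The workflow described above (collect labeled images, preprocess, optimize weights iteratively, test on unseen images) can be illustrated with a deliberately small sketch. It is not a deep network: a single logistic unit stands in for the DL framework, the "images" are synthetic four-pixel lists, and all names (`normalize`, `train`, `make_toy_dataset`) are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def normalize(image):
    """Preprocessing stand-in: scale 8-bit pixel values to the range 0..1."""
    return [p / 255.0 for p in image]


def predict(weights, bias, x):
    """Probability of the glaucoma class under a single logistic unit."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=200, lr=0.5):
    """Iteratively optimize the weights by gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def make_toy_dataset(n, rng):
    """Synthetic 4-pixel 'images' (assumption): label 1 is brighter than label 0."""
    images, labels = [], []
    for i in range(n):
        label = i % 2
        base = 180 if label else 60
        images.append([base + rng.randint(-30, 30) for _ in range(4)])
        labels.append(label)
    return images, labels


rng = random.Random(0)
train_images, train_labels = make_toy_dataset(40, rng)
test_images, test_labels = make_toy_dataset(20, rng)  # never seen during training

weights, bias = train([normalize(img) for img in train_images], train_labels)
predictions = [int(predict(weights, bias, normalize(img)) >= 0.5) for img in test_images]
test_accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print("accuracy on unseen images:", test_accuracy)
```

The toy classes are trivially separable, so the sketch says nothing about real fundus images; it only makes the order of the stages and the separation of training and test data concrete.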

Motivation

Considering the rapid advancements in the machine learning domain, many researchers have recently employed transfer learning and DL approaches to build automated glaucoma detection systems. However, only a few studies in the existing literature have comprehensively reviewed the latest DL approaches used for glaucoma detection and discussed the available datasets and processing techniques. Recently, Thompson et al. [7] published a review paper that discussed the DL techniques employed in the screening, detection, and diagnosis of glaucoma progression. That paper critically reviewed the latest applications of DL models to glaucoma detection, their benefits, and the challenges linked with the development of these models. However, it mainly focused on glaucoma detection with standard automated perimetry (SAP) and optical coherence tomography (OCT). Moreover, the available glaucoma-labeled datasets, image pre-processing techniques, and classification techniques were not fully addressed in that study.

Lately, Tong et al. [12] and Sarki et al. [1] have reviewed the applications of deep learning in detecting ocular diseases including cataract, diabetic macular edema (DME), glaucoma (Gl), and diabetic retinopathy (DR). However, due to the wider scope of these studies, most of the state-of-the-art literature based on the DL techniques for glaucoma detection could not be covered.

In another study, Barros et al. [4] conducted a systematic review on the ML algorithms used in the retinal image processing for glaucoma detection and diagnosis. They shortlisted only 18 articles for review out of which only 8 specifically covered the DL methods while the remaining 10 included varied ML approaches. Moreover, only the articles published till August 2019 were included in this review paper.

Sengupta et al. [13] also reviewed the applications of DL for ophthalmic diagnosis using retinal fundus images. This paper described different fundus image databases that might be employed for DL purposes and assessed the DL applications in the detection of lesions and segmentation of blood vessels, optic cup, and optic disc. The researchers also discussed the DL models for classifying diseases like diabetic retinopathy, glaucoma, and macular degeneration and identified directions for future research as well. But none of the studies published recently were covered in this paper. Moreover, only 11 studies addressing the DL techniques for glaucoma detection were covered in their review [13].

Very recently, Janani et al. [14] conducted a survey on the early investigation of glaucoma using different ML, TL, and DL techniques. The paper presented the latest segmentation and detection approaches and discussed current challenges and trends to give readers an overview of the existing state of research. However, the scope of this survey is very limited, and only 10 studies from 2017 to 2020 are covered.

Thus, considering the shortcomings in the above-discussed review papers, it seemed important to perform a comprehensive review of the recent studies that employed DL and TL techniques for glaucoma detection.

Contribution

This paper provides a thorough overview of the recent developments in glaucoma detection using deep learning approaches by reviewing the state-of-the-art literature from the below-mentioned perspectives:

  1. Available glaucoma databases.
  2. Image pre-processing techniques for glaucoma detection.
  3. Deep learning methods proposed for glaucoma detection.
  4. Performance metrics for evaluating the glaucoma detection algorithms.

The remainder of the paper is organized as follows. The next section discusses the available glaucoma-labeled datasets and reviews the papers on the basis of the datasets employed. The subsequent sections, in order, assess the pre-processing techniques applied to the retinal fundus images in the selected papers, analyze the disease classification techniques employed in the research articles, highlight the key observations and findings of this study, cover the research gaps and future recommendations, and conclude the study. Table 2 shows the distribution of the selected 61 articles with respect to the review objective.

Table 2 Categorization of selected articles based on study target

Glaucoma Datasets

In the selected papers, different public and private datasets of retinal images are used and divided into training and testing examples. Drishti-GS [82] and RIM-ONE [83] are the most commonly employed datasets in recent studies on glaucoma detection. The Drishti-GS dataset is used in 20 out of 61 studies, while 22 studies employed the RIM-ONE dataset for training and evaluation purposes. The description of all the datasets used in the articles is included in Table 3.

High-resolution fundus (HRF) [84]: This dataset was established by a collaborative research group to support comparative studies on automatic segmentation algorithms for retinal fundus images. The public database currently contains 15 images of healthy patients, 15 images of patients with diabetic retinopathy, and 15 images of glaucomatous patients. Binary gold standard vessel segmentation images are available for each image. Masks determining the field of view (FOV) are also provided for particular datasets. The gold standard data were generated by a group of experts working in the field of retinal image analysis and clinicians from the cooperating ophthalmology clinics.

Drishti-GS1 Dataset [85, 86]: This dataset consists of a total of 101 images, divided into 50 training and 51 testing images. All the images have been marked by 4 eye experts with varying clinical experience. All images were collected at Aravind Eye Hospital, Madurai, from visitors to the hospital, with their consent. Glaucoma patient selection was done by clinical investigators based on clinical findings during the visit. Selected patients were between 40 and 80 years of age, with roughly equal numbers of males and females. The data collection protocol was as follows:

Online retinal fundus image dataset for glaucoma (ORIGA) [87]: ORIGA is an online retinal fundus image dataset for glaucoma analysis and research. Currently, ORIGA contains 650 retinal images annotated by trained professionals from Singapore Eye Research Institute where 482 are normal and 168 are glaucomatous images.

The Singapore Chinese Eye Study (SCES) [88]: SCES is a dataset that contains 1676 fundus images, of which 46 are glaucoma cases.

Automatic glaucoma assessment using fundus images (ACRIMA) [89]: This dataset includes 705 fundus images (396 glaucoma and 309 normal images). They are part of the ACRIMA project and were obtained with prior consent from glaucoma and normal patients in accordance with the ethical standards set forth in the 1964 Declaration of Helsinki. All patients were selected by experts according to their criteria and clinical findings during the examination. Most of the fundus images in this database are from the left and right eyes previously dilated and centered on the optic disc. Some were rejected because of artifacts, noise, and low contrast. All images in the ACRIMA database are annotated by two glaucoma experts with 8 years of experience. No other clinical information was taken into account when labeling the images.

Retinal image dataset for optic nerve evaluation (RIM-ONE) [90]: RIM-ONE is a dataset that contains 159 stereo eye fundus images with a resolution of 2144 \(\times\) 1424. The right part of the stereo image is disregarded. Two sets of ground-truths for optic disc and optic cup are available. The first set is commonly used for training and testing. The second set acts as a “human” baseline.

DRIONS-DB [91]: This dataset contains 110 eye fundus images with a resolution of 600 \(\times\) 400. Two sets of ground-truth optic disc annotations are available. The first set is commonly used for training and testing. The second set acts as a “human” baseline.

High-resolution fundus (HRF) [92]: HRF is a dataset that contains 601 fundus images divided into 4 groups: normal (300 images), glaucoma (101 images), cataract (100 images), and retina disease (100 images).

Retinal fundus glaucoma challenge (REFUGE) [93]: REFUGE provides 1200 fundus images with ground truth segmentations and clinical glaucoma labels, and is currently the largest existing dataset of this kind.

Ocular disease intelligent recognition (ODIR) [94]: ODIR is a structured ophthalmic dataset of 5000 patients with age, color fundus photographs of the left and right eyes, and doctors’ diagnostic keywords, collected by Shanggong Medical Technology Co., Ltd. from different hospitals and medical centers in China. The challenge covers eight labeled categories and a total of 6392 images: 2873 normal, 1608 diabetes, 284 glaucoma, 293 cataract, 266 age-related macular degeneration, 128 hypertension, 232 pathological myopia, and 708 other diseases/abnormalities.

Retinal fundus images for glaucoma analysis (RIGA) [95]: RIGA is a dataset that includes 3 different files: (1) the MESSIDOR file contains 460 original images and 460 manually marked images per ophthalmologist, for a total of 3220 images; (2) the Bin Rushed Ophthalmic Center file contains 195 original images and 195 manually marked images per ophthalmologist, for a total of 1365 images; (3) the Magrabi Eye Center file contains 95 original images and 95 manually marked images per ophthalmologist, for a total of 665 images. Altogether, the dataset comprises 750 original images and 4500 manually marked images. The images are saved in JPG and TIFF format.
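The RIGA counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the inference that each original image comes with six manual markings follows from dividing each file total by its number of originals and is not a figure taken from the dataset documentation.

```python
# (original images, total images reported for the file), as stated in the text
riga_files = {
    "MESSIDOR": (460, 3220),
    "Bin Rushed": (195, 1365),
    "Magrabi": (95, 665),
}

markings_per_image = {}
total_originals = 0
total_markings = 0
for name, (originals, total) in riga_files.items():
    copies, remainder = divmod(total, originals)
    if remainder != 0:
        raise ValueError("file total is not a multiple of the originals: " + name)
    markings_per_image[name] = copies - 1  # subtract the original image itself
    total_originals += originals
    total_markings += originals * (copies - 1)

print(markings_per_image)
print("originals:", total_originals, "manual markings:", total_markings)
```

Every file yields the same ratio, and the sums reproduce the 750 original and 4500 manually marked images reported for the whole dataset.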

The Illinois Ophthalmic Database Atlas (I-ODA) [96]: I-ODA is a dataset that contains retinal images of patients from the Illinois Eye and Ear Infirmary of the University of Illinois Chicago (UIC).

Large-scale attention-based glaucoma (LAG) [97]: LAG is a dataset that contains 4854 fundus images from Tongren Hospital in Beijing, including 1711 positive and 3143 negative samples for glaucoma. Each fundus image is diagnosed by a qualified glaucoma specialist, taking into account morphological and functional analysis, namely intraocular pressure, visual field loss, and manual assessment of the optic disc.

Joint Shantou International Eye Centre (JSIEC) [98]: JSIEC is a dataset comprising 1087 high-resolution retinal fundus images taken at the Joint Shantou International Eye Centre, China. The dataset is classified into 37 categories, one of which contains 54 normal images and another 13 glaucoma images.

Table 3 Available datasets for glaucoma detection and the works that used them

Image Preprocessing Techniques

Several image pre-processing steps are typically performed to enhance the images, since a network can extract more unique and salient features when the images are clearer and brighter [1]. This section provides an overview of the image pre-processing techniques employed in the selected recent studies. In the RGB color space, the green channel offers better contrast and more information than the red and blue channels; therefore, some studies extract the green channel before further processing. For example, Chaudhary et al. [74] extracted the green channel from the RGB image for further processing because of the increased sensitivity of human vision to green [99].
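Green channel extraction amounts to keeping the second component of every RGB pixel. A minimal sketch, assuming the image is held as a list of rows of `(red, green, blue)` tuples; this representation and the function name `extract_green_channel` are illustrative choices, not taken from the reviewed studies.

```python
def extract_green_channel(rgb_image):
    """Return a 2-D list holding only the green component of each pixel.

    rgb_image is assumed to be a list of rows, each row a list of
    (red, green, blue) tuples with values in 0..255.
    """
    return [[pixel[1] for pixel in row] for row in rgb_image]


toy_image = [
    [(200, 90, 30), (210, 120, 40)],
    [(190, 60, 20), (220, 150, 50)],
]
green = extract_green_channel(toy_image)
print(green)
```

With an array library, the same operation is usually a single slice along the channel axis.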

Contrast enhancement is an important image pre-processing technique. Gour et al. [69] used contrast limited adaptive histogram equalization (CLAHE) to improve the contrast of the image in all channels. CLAHE operates on small regions of an image rather than on the complete image, and thus achieves better performance on fundus images than other methods [100]. Applying CLAHE standardizes the quality of all the images in the dataset for classification, so that the analysis of different attributes is not influenced by sharp variations in image contrast [54] and the learning complexity is reduced [49]. Several recent studies, including [25, 32, 49, 54, 57], employ CLAHE to enhance the contrast of the images.
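The core step that CLAHE applies to each small region can be sketched as a clip-limited histogram equalization of one tile. The sketch below is a simplification under stated assumptions: it handles a single grayscale tile, takes the clip limit as an absolute pixel count, and spreads the clipped excess uniformly over all bins. Full CLAHE additionally splits the image into tiles and interpolates between neighboring tile mappings, which is omitted here. The name `clipped_equalize` is hypothetical.

```python
def clipped_equalize(tile, clip_limit=4, levels=256):
    """Equalize one tile (2-D list of ints in 0..levels-1) with a clipped histogram."""
    pixels = [p for row in tile for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip each bin and redistribute the removed counts evenly over all bins,
    # which limits how strongly any single gray level can be stretched.
    excess = sum(max(0, h - clip_limit) for h in hist)
    bonus = excess / levels
    hist = [min(h, clip_limit) + bonus for h in hist]

    # The cumulative histogram, rescaled to the output range, is the mapping.
    total = float(len(pixels))
    mapping = []
    running = 0.0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[p] for p in row] for row in tile]


low_contrast_tile = [
    [100, 100, 100, 100],
    [100, 100, 100, 100],
    [101, 101, 101, 101],
    [102, 102, 103, 103],
]
enhanced = clipped_equalize(low_contrast_tile, clip_limit=4)
print(enhanced)
```

The input tile spans only four gray levels; after the mapping, the same pixels are spread over a much wider range while their ordering is preserved.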

Image resizing is another popular image pre-processing technique. The images are resized to a lower resolution in accordance with the system requirements. For instance, Borwankar et al. [26] scaled the images down to 256 \(\times\) 256 to reduce time and computational complexity, whereas the images are resized to 512 \(\times\) 512 in [32].
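As a rough illustration of downscaling, the following sketch resizes a 2-D image by nearest-neighbor sampling. The interpolation method is an assumption made for brevity; the reviewed studies do not state which method they used, and bilinear or area interpolation is more common in practice.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D list of pixels by picking the nearest source pixel."""
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for r in range(new_height):
        src_r = r * old_height // new_height
        row = [image[src_r][c * old_width // new_width] for c in range(new_width)]
        resized.append(row)
    return resized


image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
small = resize_nearest(image, 2, 2)
print(small)
```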

The region around the optic disc of the retina is especially affected in glaucoma and is regarded as the main region of interest (ROI) for Gl classification [69]. Orlando et al. [21] showed that better results are achieved when optic nerve head (ONH) images are used as the CNN input instead of complete retinal images. In [31,32,33,34], the researchers cropped the images around the ONH in the pre-processing step and then used these crops as input for their models. Some researchers also masked the optic discs and blood vessels to avoid false Gl detection and segmented out the uninformative black borders of the images to focus on the ROI [1]. In [48], the ARGALI approach is employed to remove the bright fringe and obtain the center and radius of the trimming circle for ROI extraction. Chakrabarty et al. [66] used adaptive thresholding to obtain binary images. Gour et al. [69] cropped the fundus images with the optic disc as the center of the extracted ROI.
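Cropping a square ROI around a known optic disc center can be sketched as follows. The center coordinates are assumed to be given (for example, by a separate localization step); the function name `crop_roi` and the `half_size` parameter are hypothetical, and the crop is simply clamped at the image borders.

```python
def crop_roi(image, center_row, center_col, half_size):
    """Return the square window around (center_row, center_col), clamped to the image."""
    height, width = len(image), len(image[0])
    top = max(0, center_row - half_size)
    bottom = min(height, center_row + half_size + 1)
    left = max(0, center_col - half_size)
    right = min(width, center_col + half_size + 1)
    return [row[left:right] for row in image[top:bottom]]


# 5 x 5 toy image whose pixel value encodes its position (row * 5 + col)
fundus = [[r * 5 + c for c in range(5)] for r in range(5)]
roi = crop_roi(fundus, center_row=2, center_col=2, half_size=1)
print(roi)
```

Clamping means that a center close to the border yields a smaller crop; padding the image instead would keep the crop size constant.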

Table 4 Image pre-processing techniques used in the reviewed articles

Another important pre-processing step, known as image augmentation, is typically applied in the case of class imbalance. Images are cropped, resized, rotated, and mirrored to produce new images when a dataset contains fewer Gl images than healthy retinal images [1]. Augmentation is an ML technique that is commonly employed in medical imaging due to the unavailability of sufficient images [101]. It applies several image transformations to the available images in order to create modified versions of them and thus a larger dataset. The major objective of augmentation is to prevent overfitting, which is a typical issue when working with small datasets. For instance, Ovreiu et al. [31] applied different transformations, such as vertical and horizontal flipping, brightening by a factor in the range of 0.2–1, rotating from 0 to 180 degrees, and zooming by a factor in the range of 0.2–1. These transformations enlarge the image dataset to avoid overfitting of the trained model while keeping accurate image details. Similarly, Joshi et al. [29] applied four types of image augmentation to the original dataset: resizing to half scale, resizing to double scale, 15 degrees anticlockwise rotation, and 15 degrees clockwise rotation.
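A few of the transformations mentioned above can be sketched on a 2-D grayscale image. The sketch is a simplified illustration: rotation is restricted to 90 degrees (arbitrary angles such as 15 degrees require interpolation), the brightness factor of 0.5 is one arbitrary value from the quoted 0.2–1 range, and all function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: the first row becomes the last column."""
    return [list(col) for col in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale pixel values by a positive factor and cap them at 255."""
    return [[min(255, int(round(p * factor))) for p in row] for row in image]


def augment(image):
    """Return the modified copies produced from one source image."""
    return [
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
        adjust_brightness(image, 0.5),
    ]


source = [[10, 20], [30, 40]]
augmented = augment(source)
print(len(augmented), "new images from one original")
```

In a training setup such transformations are usually applied only to the training split, so that the test images remain unmodified.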

A summary of pre-processing techniques used in the reviewed studies is given in Table 4.

Glaucoma Eye Disease Classification Techniques

This section reviews the deep learning approaches employed for glaucoma detection. Out of the 61 chosen articles, 25 used a transfer learning (TL) technique, 17 proposed new deep learning methods, 11 used a combination of machine learning classifiers such as backpropagation neural network (BPNN), support vector machine (SVM), and random forest (RF), and 8 used deep learning with optical coherence tomography (OCT) for glaucoma detection and classification.

Table 5 Articles using transfer learning for Gl detection

TL Based on DL Approaches

Overall 25 out of 61 works have adopted a transfer learning approach for glaucoma detection through DL. An overview of these studies is given in Table 5.

In [15], transfer learning is used for detecting glaucoma using colored fundus images. The researchers used 10-fold Cross-Validation (CV) for evaluating the AUC of the model and achieved an AUC of 96.30%. Asaoka et al. [16] employed ResNet architecture and carried out tests using two datasets acquired from different institutes. Asaoka et al. used the data augmentation technique and utilized the area under the receiver operating characteristic curve (AROC) for measuring performance. They achieved an AROC of 99.7% and 94.8% in a dataset without augmentation and with augmentation respectively.
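Many of the results in this section are reported as the area under the ROC curve (AUC or AROC). One way to compute it, sketched below, uses its rank interpretation: the AUC equals the probability that a randomly chosen glaucomatous image receives a higher score than a randomly chosen healthy one, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]           # 1 = glaucoma, 0 = healthy
scores = [0.9, 0.4, 0.5, 0.1]   # hypothetical model outputs
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; library implementations sort the scores instead.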

Phan et al. [17] compared three CNN models, namely DenseNet-201, ResNet-152, and VGG19, for the classification of retinal images. They applied deep CNNs to 3312 (2687 non-Gl, 256 Gl-suspected, and 369 Gl) retinal images and achieved an AUC of 90%. Al Ghamdi et al. [18] proposed a semi-supervised TL convolutional neural network model for automated glaucoma detection. They achieved a sensitivity (SE) of 91.7%, a specificity (SP) of 93.3%, and an accuracy (Acc) of 92.4% using the RIM-ONE database.
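Sensitivity, specificity, and accuracy, as quoted above, all derive from the four cells of a binary confusion matrix. A minimal sketch with glaucoma as the positive class; the labels and predictions are made-up values used only to exercise the formulas.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = glaucoma and 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of glaucomatous images that are detected (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of healthy images that are correctly cleared (true negative rate)."""
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    """Share of all images that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn))
```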

Diaz et al. [19] used five ImageNet-pretrained models (ResNet50, InceptionV3, VGG19, Xception, and VGG16) for automated glaucoma assessment using 1707 retinal images from five databases (ACRIMA, sjchoi86-HRF, RIM-ONE, Drishti-GS1, and HRF). They achieved an average sensitivity of 93.46%, a specificity of 85.80%, and an AUC of 96.05%. Cerentinia et al. [20] employed the GoogLeNet architecture to detect the presence of glaucoma. They used images from varied databases and obtained an accuracy of 90% for HRF, 86.4% for RIM-ONE(r3), 86.2% for RIM-ONE(r2), 94.2% for RIM-ONE(r1), and 87.3% for the combination of the three RIM-ONE versions.

Orlando et al. [21] employed two convolutional neural network models, VGG-S and OverFeat, to develop an automatic glaucoma detection system. AUC values of 71.8% and 76.3% were achieved for VGG-S and OverFeat, respectively. De et al. [22] used InceptionResNetV2, InceptionV3, ResNet50, VGG-19, and VGG-16 for detecting glaucoma on RIM-ONE datasets. The combination of Logistic Regression and InceptionResNet yielded promising results on RIM-ONE (r3) with an AUC of 86%. The same classifier with ResNet resulted in an AUC of 95.7% on RIM-ONE (r2). In [23], a VGG network was used for the classification of non-glaucoma and glaucoma images based on visual field (VF) study results. For this purpose, VF samples were obtained from 3 ophthalmic centers in China, and a sensitivity of 93.2%, a specificity of 82.6%, and an accuracy of 87.6% were achieved.

Gómez-Valverde et al. [24] used VGG19 with a CNN TL approach for the classification of glaucoma using one private and two public datasets (DRISHTI-GS and RIM-ONE). Serener et al. [25] proposed automated detection of advanced and early glaucoma using fundus images. In this work, TL is used for training and fine-tuning the GoogLeNet and ResNet-50 deep CNN algorithms for Gl classification. The two models are evaluated in terms of area under the ROC curve, specificity, sensitivity, and accuracy. The obtained results indicate that GoogLeNet outperforms ResNet-50 in the detection of both advanced and early glaucoma. Borwankar et al. [26] also presented a robust CNN-based model using the ResNet architecture to detect glaucoma. This model achieved an F1 score of 98.8% and an accuracy of 98.9% on the classification of glaucomatous images.

In [27], a TL-based model is designed for diagnosing IOP in the optic nerve. An improved validation accuracy of 91.2% is achieved through this model using the RIM-ONE, ORIGA, and DRIVE datasets. The overall training time is substantially reduced using the TL approach, and inter-observer errors are minimized. Kim et al. [28] proposed a TL-based approach in which Gradient-weighted Class Activation Mapping (Grad-CAM) and CNNs were employed for the detection and localization of glaucoma. This approach showed promising results, achieving a ROC-AUC score of 92% and an accuracy of 91% for the detection task.

Joshi et al. [29] proposed a cost-efficient automated Gl detection and pre-screening architecture for suspected glaucoma in retinal images. The fundus images obtained from local hospital datasets and different public databases are used in training. The five-fold cross-validation of the trained model is done and 95.848% specificity, 89.054% sensitivity, and 93.698% accuracy are achieved. The obtained results also showed that this method has a fast glaucoma screening time and is scale and rotation invariant, and resolution-independent. Gour et al. [30] proposed an automated multi-label multi-class TL-based convolutional neural network for detecting ocular diseases using the ODIR database. In [31], the authors investigated the option of employing residual networks for detecting glaucoma in the early stages. They used a ResNet50 network that is pre-trained using the ImageNet dataset and achieved a validation accuracy of 96.95%.
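The k-fold cross-validation mentioned above (five folds here, ten folds in [15]) partitions the samples so that each one is used for testing exactly once. A minimal sketch of the index split follows; shuffling and class stratification, which are advisable for imbalanced glaucoma datasets, are left out for brevity, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
for fold_number, (train, test) in enumerate(folds, start=1):
    print("fold", fold_number, "test indices:", test)
```

The reported score is then typically the average of the metric over the k test folds.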

Yu et al. [32] introduced a modified version of the U-Net architecture with a ResNet model. The fundus images are taken from the RIM-ONE, DRISHTI-GS1, and RIGA databases for glaucoma assessment. Their proposed approach achieved state-of-the-art performance on all three datasets. The U-Net architecture is also employed in [33], where Kim et al. proposed automated techniques for optic cup and disc segmentation from regions of interest (ROI) in retinal images for glaucoma detection. They implemented two (multi-class and binary) fully convolutional networks and tried two ROIs (masked ROI and original ROI) as inputs for estimating the best segmentation results. They employed the RIGA dataset for training and testing the fully convolutional networks and achieved improved performance compared to the existing algorithms.

In [34], the authors developed a generalized DL model for glaucoma classification on fundus images. The model is trained and tested for three different DL architectures, namely ResNet-152, GoogLeNet, and ResNet-50, using five databases including ACRIMA, sjchoi86-HRF, RIM-ONE, Drishti-GS1, and HRF. The model is fine-tuned in order to achieve satisfactory specificity, accuracy, and AUC performance when any of the datasets is used for testing.

Wang et al. [35] also used a TL approach for Gl detection. The results obtained on three datasets (iSee, ORIGA, and REFUGE) indicate the efficiency of the proposed system in terms of different metrics including G-mean, F1, and recall. Claro et al. [36] proposed an automated glaucoma detection approach using CNNs, shape, and texture descriptors. They used 6 retinal image datasets for evaluating the proposed approach. The best result (an accuracy of 92.78% and 93.35% for the performance set and development set, respectively) is achieved through the concatenation of CNNs with GLCM. Norouzifard et al. [37] developed a DL model for Gl detection from fundus images employing InceptionResNetV2 and compared it with another commonly employed model, VGG19. They employed the TL approach to overcome the overfitting issue caused by the limited quantity of input images. The average sensitivity and specificity of InceptionResNet-V2 on re-test and test datasets were 93.3%, 90.9%, 90.1%, and 100% respectively.

Ahn et al. [38] proposed a TL-based glaucoma detection approach that involved CNN models and logistic classification. They achieved AUC values of 87.9%, 88.6%, and 92.2% on different datasets. Shibata et al. [9] presented a deep residual learning algorithm with ResNet for glaucoma screening. They used a training dataset with 1364 glaucomatous and 1768 healthy images and achieved an AUC of 96.5%.

Manop [39] proposed deep transfer learning of CNN models for detecting glaucoma using ResNet50V2, VGG16, InceptionV3, and Xception. Since the existing dataset has a small number of images, the study uses data augmentation techniques to increase the effective number of images. The results show that the proposed models can perform the glaucoma classification task. The VGG16, ResNet50V2, InceptionV3, and Xception models achieved accuracies of 97.27%, 94.53%, 95.31%, and 94.92%, respectively. The comparison reveals that the deep transfer learning model with the VGG16 architecture achieves the highest performance, with an AUC of 98.94%.

Xi Xu et al. [40] proposed a transfer learning technique that leverages fundus features learned from similar ophthalmic data to facilitate the diagnosis of glaucoma. Specifically, they introduced a transfer induced attention network (TIA-Net) for automatic glaucoma detection. The proposed framework extracts discriminative features that fully characterize the glaucoma-related deep patterns under limited supervision. TIA-Net was tested on two real clinical datasets and achieved accuracies of 85.7% and 76.6%, sensitivities of 84.9% and 75.3%, specificities of 86.9% and 77.2%, and AUCs of 92.9% and 83.5%, respectively.

Touhidul et al. [41] proposed a glaucoma detection model based on transfer learning. Various pretrained models were used, such as VGG-16, VGG-19, DenseNet121, InceptionV3, and ResNet50. In addition, local interpretable model-agnostic explanations (LIME) were used to explain the predictions of each model. The comparison reveals that ResNet50 outperforms the other models with an accuracy of 94.7%.

Table 6 Articles proposing new DL model for Gl detection

DL Approaches

In some studies, the authors developed a new deep learning-based framework for automated Gl detection. Table 6 includes an overview of these research works with model details.

In [42], Chen et al. designed a 6-layered CNN framework and achieved AUCs of 88.7% and 83.1% on the SCES and ORIGA datasets, respectively, using a Softmax classifier. Raghavendra et al. [43] developed an 18-layer CNN model for glaucoma detection using 937 glaucomatous and 589 non-glaucomatous fundus images. They achieved a specificity of 98.3%, a sensitivity of 98%, and an accuracy of 98.13%. Pal et al. [44] put forward a new multi-model deep learning network, termed G-EyeNet, for detecting glaucoma using the Drishti-GS and DRIONS datasets and achieved an AUC of 92.3%.

Juneja et al. [45] developed an intelligent system based on optic disc and optic cup segmentation. A DL architecture is designed in which convolutional neural networks are employed for automated Gl detection. In this system, two neural networks operate in conjunction to segment the optic disc and cup. They used fundus images from the RIM-ONE r2 and DRISHTI-GS datasets for testing and achieved an accuracy of 93% and 95.8% for cup and disc segmentation, respectively.

Karkuzhali et al. [46] proposed the use of optic cup and disc segmentation for testing glaucoma on the basis of GLCM-CNN classification. The proposed approach is found to be more reliable compared to the irregular visual field, intraocular pressure, and previous GLCM-CNN-based classification methods. Islam et al. [47] also presented a convolutional neural network-based method for early detection of ocular diseases including glaucoma. They achieved an AUC of 80.5%, a Kappa score of 31%, and an F1-score of 85%. Saxena et al. [48] also proposed a DL-based mechanism for detecting glaucoma. It is a six-layer framework in which a CNN is employed to classify patterns for glaucoma detection. The proposed architecture achieved satisfactory AUC values of 82.2% and 88.2% for the ORIGA and SCES datasets, respectively.

Huang et al. [49] designed a DL method for simultaneous segmentation of optic cup and optic disc in a retinal image. The lightweight concept of MobileNetV2 and encoder-decoder structure of DeepLabV3+ is employed for the simplification of the DL model. This is for reducing the burden of high-resolution clinical image input and enhancing the images’ features through histogram equalization and polar coordinate transformation for better generalization. This method achieved excellent results on varied datasets.

Mojab et al. [50] developed a new multi-task DL-based model, named interpretable glaucoma detector (InterGD), for glaucoma detection. The model has two major components, a prediction module and a segmentation module, which are efficiently incorporated in a unified multi-task architecture to allow end-to-end training. Phasuk et al. [51] proposed an effective glaucoma screening network that achieved an AUC value of 94% on public datasets including DRISHTI-GS, RIM-ONE R3, and ORIGA. Zilly et al. [52] used an ensemble learning-based CNN framework for retinal image segmentation and glaucoma detection. The proposed segmentation algorithm outperformed the existing approaches on the public DRISHTI-GS dataset on several metrics. The proposed approach provides effective results with an AUC of 94% even with limited dataset availability.

Raja et al. [53] designed a novel DLRNL technique to improve early detection of glaucoma. This objective is achieved by applying the damped least-squares (DLS) method, Morlet wavelet transformation, and balanced histogram thresholding method in recurrent DNN. The performance of the proposed technique is measured in terms of false-positive rate, detection time, and detection accuracy and it achieved improved results compared to the state-of-the-art approaches. In the DLRNL technique, the false positive rate of Gl detection is reduced to 38% and 49% when compared to the state-of-the-art SP3S [102] and MLP classification [103] respectively.

Dos et al. [54] presented a method for automated glaucoma classification using capsule network (CapsNet), which is a state-of-the-art DL model in which the hierarchical spatial relationships between attributes are analyzed for representing images so that fewer training samples are required compared to classic CNNs for attaining effective classification. They achieved promising results with 80.1% of kappa score, 90.4% of AUC, 90.59% of F1-score, 94.64% of precision, 86.88% of recall, and 90.90% of accuracy. The CapsNet achieved improved results in comparison to the other commonly employed frameworks and even the TL techniques.

Bajwa et al. [55] performed a two-stage study: a segmentation stage employing regions with convolutional neural network (RCNN), and a classification stage in which the regions are classified as healthy or glaucomatous using a deep CNN. They used 780 fundus images and achieved an AUC of 87.4%, a sensitivity of 71.17%, and an accuracy of 79.67%. Sharma et al. [56] proposed a robust DL-based CNN architecture for dealing with the glaucoma detection problem. The proposed network is composed of 6 convolutional layers with varied activation functions along with pooling layers to capture detailed and abstract features of the input image. The probability of an image being glaucomatous is predicted by the proposed model. The model is capable of automatically detecting glaucoma with a specificity of 90%, a sensitivity of 100%, and an accuracy of 95%. Shah et al. [57] proposed two new techniques, i.e. weak region of interest model-based segmentation (WRoIM) and parameter-shared branched network (PSBN), for the identification of cup and disc boundaries. Contrary to the past methods, the proposed techniques involved end-to-end training using a single neural network architecture and employed dynamic cropping rather than the classic computer vision-based or manual cropping. They achieved a performance comparable to state-of-the-art techniques with fewer network parameters on the RIM-ONE v3 and Drishti-GS1 datasets. The results showed that the proposed techniques can serve as a useful tool for accurate and fast screening of glaucoma.

Silvia et al. [58] proposed an early glaucoma detection system based on a densely connected neural network (DenseNet) with 201 layers, initially pre-trained on ImageNet, using the ACRIMA dataset. An accuracy of approximately 97% and an F1-score of 96.9% were obtained.

Raveenthini et al. [59] proposed an automated framework for detection of diabetic retinopathy and glaucoma using non-linear features. A support vector machine (SVM) classifier with different kernels was used. The results show that the SVM with a radial basis function (RBF) kernel gave the maximum accuracy of 85%, with a sensitivity of 84% and a specificity of 94.32%. Tasnim et al. [60] proposed a glaucoma detection system in which they compared three pretrained models: InceptionV3, ResNet50, and DenseNet121. Accuracies of 85.29%, 77.61%, and 81.53% were obtained for these models respectively.

Shubham et al. [61] proposed a deep learning-based system for glaucoma diagnosis using retinal fundus images. In this work, a noise removal algorithm was used to enhance the quality of the images. To accurately identify glaucoma from retinal fundus images, an ensemble of three pretrained models was used (VGG16, ResNet50, and GoogLeNet). In this ensemble learning strategy, each single model contributes equally to the final prediction. The proposed framework proved to be highly effective in classification performance, with an accuracy of 91.11%, a specificity of 95.90%, and a sensitivity of 85.55%.

Marriam et al. [62] proposed an efficient deep learning approach for automatic glaucoma detection using the optic disc and optic cup. The proposed framework includes three steps for glaucoma localization and classification. First, deep features of suspicious samples are computed using the EfficientNet-B0 feature extractor. Then, EfficientDet-D0’s bidirectional feature pyramid network (BiFPN) module takes the computed features from EfficientNet-B0 and performs top-down and bottom-up keypoint fusion multiple times. In the last step, local regions containing glaucomatous lesions with associated classes are predicted. An accuracy rate of 98.21% was achieved.

Mohammed et al. [63] proposed a novel decision support system based on deep learning to diagnose glaucoma. First, the images were cropped using segmentation to ensure that the optic disc is centered in the image. Second, a noise removal algorithm was used to enhance the quality of the images. Five pre-trained models were used in this study (Densenet121, InceptionV3, Resnet50v2, Resnet101, and Mobilenet). The hyperparameters of the models were fine-tuned to improve the performance. The prediction outputs of the five models were averaged to obtain the final prediction. The results showed that the proposed method can identify glaucoma from eye fundus images with an accuracy of 90.05%, a sensitivity of 85.05%, a specificity of 96.01%, and an AUC of 96.50%.
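The prediction-averaging strategy described for ensembles such as those in [61, 63] can be illustrated with a minimal sketch. The function name `average_ensemble`, the 0.5 decision threshold, and the probability values are assumptions for illustration; they are not taken from the cited studies.

```python
def average_ensemble(model_probs, threshold=0.5):
    """Average per-model glaucoma probabilities with equal weights.

    model_probs: list with one entry per model; each entry is a list of
    predicted glaucoma probabilities, one per image (same image order).
    Returns (averaged probabilities, binary labels).
    The threshold of 0.5 is an assumed default, not a value from [61, 63].
    """
    n_models = len(model_probs)
    n_images = len(model_probs[0])
    averaged = [
        sum(model[i] for model in model_probs) / n_models
        for i in range(n_images)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Hypothetical outputs of three fine-tuned models for two fundus images.
    probs = [
        [0.9, 0.2],
        [0.7, 0.4],
        [0.8, 0.3],
    ]
    averaged, labels = average_ensemble(probs)
    print(averaged, labels)
```

Equal weighting is the simplest design choice; weighting each model by its validation performance is a common alternative, but the reviewed studies describe an unweighted average.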

Table 7 Articles using combined DL and ML for Gl detection

Approaches Combining DL and ML

A combination of DL and ML classifiers, including backpropagation neural network (BPNN), support vector machine (SVM), and random forest (RF), is proposed in 11 papers for glaucoma detection. An overview of these studies is given in Table 7.

Al-Bander et al. [64] carried out feature extraction using CNN and then employed SVM for glaucoma and non-glaucoma classification. They achieved a sensitivity of 85%, a specificity of 90.8%, and an accuracy of 88.2%. Pandey et al. [65] developed glaucoma detection methods using machine learning techniques, image processing techniques, and DL-based CNN model on the Bin Rushed database. Features, such as RDR and CDR, are extracted using image processing techniques followed by classification of images using K-Nearest Neighbors, decision tree, support vector machine, and neural network. An accuracy of 99.6% is achieved in this study.
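As an illustration of a hand-crafted feature such as the cup-to-disc ratio (CDR), the sketch below computes a vertical CDR from binary segmentation masks. This is one common definition of the CDR and is not necessarily the one used by Pandey et al. [65]; the function names and the toy masks are assumptions for illustration.

```python
def vertical_extent(mask):
    """Return the height in pixels of the region marked with 1s in a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return (rows[-1] - rows[0] + 1) if rows else 0


def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio: cup height divided by disc height.

    Assumption: both masks are lists of rows of 0/1 values with the same shape.
    """
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height


if __name__ == "__main__":
    # Hypothetical 6x4 masks: the disc spans rows 1-4, the cup spans rows 2-3.
    disc = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    cup = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(vertical_cdr(cup, disc))  # 0.5
```

A feature of this kind can then be passed, alone or with other features, to a conventional classifier such as an SVM or a decision tree.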

In [66], Chakrabarty et al. developed a DL–ML hybrid model with image processing for classifying high-resolution fundus images into non-glaucomatous and glaucomatous ones. They used the publicly available HRF database for this study and achieved 100% training and testing accuracy. Li et al. [67] proposed a classification-based Gl detection approach that integrated both holistic and local features. ConvNets are employed for representing the ROI features. SVM is applied to the deep features for detecting glaucoma. The proposed model achieved an AUC of 83.84% on the ORIGA dataset.

Touahri et al. [68] compared two different classification methods, one based on CNN classifiers and the other on Twin SVM (TWSVM) method. They employed these methods as a computer-aided diagnostic system for the automated classification of glaucomatous fundus images using the RIMONE dataset. The effectiveness of the proposed model is shown through several experimental results. Gour et al. [69] also developed an automated glaucoma diagnostic system using fundus images. The fundus images of HRF and Drishti-GS1 databases are classified into glaucomatous and non-glaucomatous images using an SVM classifier. The performance of this approach is compared with the latest Gl detection approaches, including the DL ones, in terms of AUC and accuracy. The proposed system achieved 83.4% and 79.2% classification accuracy and AUC of 88% and 86% for HRF and Drishti-GS1 datasets respectively.

Abbas et al. [70] presented an unsupervised approach for detecting glaucoma, in which CNN is used for extracting features and a deep belief network (DBN) Model with a Softmax classifier is employed to make the final decision. Diaz et al. [71] came up with a new approach by proposing a novel retinal image synthesizer and deep convolutional generative adversarial networks (DCGAN)-based semi-supervised learning technique for glaucoma assessment. The model is trained using 86,926 retinal images taken from 14 public and 1 private database and achieved a high classification AUC of 90.17%. Thankur et al. [72] also proposed an ML-based classification approach for improved glaucoma detection using retinal images.

In a recent study, Bisneto et al. [73] proposed another approach for automated Gl detection employing a generative adversarial network (GAN) for optic disc segmentation, followed by the use of texture descriptors for the classification of healthy and glaucomatous regions. They used 556 retinal images to evaluate the proposed method and achieved an area under the ROC curve of 1 and an accuracy of 100%. Chaudhary et al. [74] presented a novel method, termed two-dimensional Fourier-Bessel series expansion-based empirical wavelet transform (2D-FBSE-EWT), in which the Fourier–Bessel series expansion (FBSE) spectra of 0th and 1st order are used for detecting boundaries. The fundus images are decomposed into sub-images. Two methods, one based on classic machine learning and the other based on ensemble ResNet-50, are proposed to detect glaucoma from the sub-images. These methods showed better performance than state-of-the-art glaucoma detection approaches. For the RIM-ONE datasets, the first method achieved an accuracy of 95.51% and 90% using the random forest (RF) classifier and SVM respectively. The second method also showed promising results, achieving AUC, specificity, sensitivity, and accuracy values equal to 96%, 83.3%, 94.3%, and 91.1% respectively.

Table 8 Articles using OCT with DL

Glaucoma Diagnosis with Optical Coherence Tomography (OCT)

Deep learning has also been used lately in some studies to detect and diagnose the progression of glaucoma using optical coherence tomography (OCT). Over the past few years, spectral-domain optical coherence tomography (SDOCT) has turned out to be the most commonly employed tool for the diagnosis and detection of structural damages caused by glaucoma [104]. This section reviews some of the recent works in this area.

Measurements of the retinal nerve fiber layer (RNFL), macula, and optic nerve head (ONH) are routinely used in clinical settings for diagnosing diseases and detecting their progression [105]. However, traditional structural damage assessment through SDOCT requires segmentation of the structure of interest to enable the extraction of proper measurements such as RNFL thickness. Although software is used for the automated segmentation of the area of interest, the output is still quite imperfect. Several works have reported segmentation errors of 19.9–46.3% in SDOCT scans of the retinal nerve fiber layer [106, 107]. Although it is possible to manually check and correct the errors, this approach takes a lot of time and is quite hard to perform in a busy medical setting. Moreover, the analysis of multiple regions and parameters also makes the interpretation of SDOCT difficult. The integration of all the details acquired from sectoral and global RNFL thickness measurements along with macular assessment and topographic ONH parameters can be quite tricky for the clinician. Furthermore, the chance of error is also increased due to the involvement of a larger number of parameters [7].

Considering these shortcomings associated with OCT interpretation, DL models can offer alternate methods for quantifying structural damages without depending upon the pre-defined attributes acquired from the automated segmentation software. DL algorithms are capable of automatically learning attributes from the data, provided that there is an adequate amount of available data. Thus, these models can utilize raw SDOCT images with no requirements of the input or previously defined attributes (Table 8).

Taking these points into consideration, Mariottoni et al. [75] showed that it is possible to train a segmentation-free DL algorithm to predict RNFL thickness from a raw OCT B-scan. A high correlation was observed between the segmentation-free predictions and the conventional thickness of the retinal nerve fiber layer (\(r = 0.983, P < 0.001\)), with an absolute error of almost 2 \(\mu\)m in the case of high-quality images. Moreover, the DL model succeeded in extracting reliable RNFL thickness values from images where the classic segmentation method failed.
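The two agreement measures quoted above, the Pearson correlation coefficient \(r\) and the mean absolute error, can be computed as in the following minimal sketch. The thickness values are hypothetical and are not data from Mariottoni et al. [75]; the function names are likewise assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_absolute_error(x, y):
    """Average absolute difference between paired measurements."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical RNFL thickness values in micrometres.
    conventional = [80.0, 90.0, 100.0, 110.0]   # from segmentation software
    predicted = [82.0, 88.0, 101.0, 109.0]      # from a segmentation-free model
    print(pearson_r(conventional, predicted))
    print(mean_absolute_error(conventional, predicted))  # 1.5
```

A high \(r\) alone does not guarantee small errors, which is why correlation and absolute error are usually reported together.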

In another study, Thompson et al. [76] demonstrated that the raw SDOCT B-scans can be used to train a DL algorithm to directly discriminate glaucomatous eyes from the healthy ones. Their proposed DL algorithm showed an improved diagnostic performance compared to the classic parameters of RNFL thickness and achieved an AUC value of 96% in comparison to the 87% value for the global peripapillary RNFL thickness (\(P < 0.001\)). Similarly, Maetschke et al. [77] proposed a DL algorithm capable of distinguishing between healthy and glaucomatous eyes using unsegmented, raw OCT volumes of the ONH. This algorithm also exhibited better performance compared to the traditional SDOCT parameters, with an AUC of 94% compared to the logistic regression model that combined SDOCT parameters and acquired an AUC value of 89%.
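Since the studies above are compared mainly by AUC, a short sketch of how this value can be obtained may be helpful. The code uses the rank-based (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen glaucomatous eye receives a higher score than a randomly chosen healthy eye. The labels and scores are hypothetical, and this pairwise implementation is meant for illustration rather than for large datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for glaucomatous, 0 for healthy.
    scores: model outputs where a higher value means "more likely glaucoma".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores for two glaucomatous and two healthy eyes.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 0.75
```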

In addition to the ONH scans [77, 78] and RNFL [75], deep learning has also been employed for investigating macular scans [79]. Asaoka et al. [8] illustrated that a DL model developed using an 8 \(\times\) 8 macular grid shows better results in the detection of glaucomatous damages than the ganglion cell or RNFL thickness measurements. Moreover, the DL model depicted better performance compared to the conventional random forest and SVM techniques applied to the macular measurements.

In another work, Muhammad et al. [79] built a hybrid DL system for glaucoma detection using swept-source wide-field OCT. They employed a pre-trained CNN for extracting features from the probability map images to use them as an input to the RF model for classification. The proposed model exhibited better performance than the traditional summary OCT parameters. However, a small sample having only 45 healthy and 57 glaucomatous images was used in this study which makes it unlikely to allow for enough generalizability and variation.

Besides the posterior segment OCT analysis, some works have also applied DL models upon anterior segment OCT images to diagnose angle closure or narrow angles [80, 81]. Fu et al. [81] achieved an AUC of 96% with a specificity of 92% and a sensitivity of 90% for a DL system trained for angle-closure detection using Visante OCT images. In another study, Xu et al. [80] used American–Chinese eyes for testing three multi-class convolutional neural networks. The ResNet18 classifier showed the best results in the detection of gonioscopic angle closure and achieved an AUC value of 92.8%. Considering the complexity involved in interpreting anterior segment OCT images, these models showed promising output for the automated assessment of those images to detect the presence of the narrow angles.

Metrics for Performance Evaluation

Different parameters are used for evaluating the efficiency of the classifiers. These metrics include area under the curve (AUC), accuracy (Acc), sensitivity (SE), specificity (SP), F1-score, precision, recall, G-mean, and Kappa score. The details of these parameters are covered in [108]. The most commonly used evaluation metrics include Accuracy which is chosen as a performance indicator in 29 studies, AUC used in 27 articles, while Specificity and Sensitivity are used as performance indicators in 16 and 17 studies respectively.

The other employed performance metrics include G-mean (1 study), Kappa score (1 study), precision (5 studies), recall (4 studies), and F1-score (6 studies). Table 9 shows the used metrics with their equations.

Table 9 Metrics with their equations
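For reference, the confusion-matrix-based metrics listed above can be computed as in the following sketch. It uses the standard textbook definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); the notation may differ from that of Table 9, and the counts in the example are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumption: every denominator is non-zero (both classes occur and the
    model predicts both classes at least once).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
        "g_mean": g_mean,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical test set: 50 glaucomatous and 50 healthy images.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

AUC is not included here because it depends on the ranking of continuous scores rather than on a single confusion matrix.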

Discussion

Artificial intelligence is an exciting technology that is gaining attention in varied domains all across the research community. Machine learning has a rich history in the scientific field [109, 110]. Currently, deep learning-based models in machine learning are effectively employed in imaging for pre-processing, segmentation, classification, and detection. In the reviewed studies, the convolutional neural network is found to be the most commonly used DL architecture, where 51 out of 61 studies have employed CNN architecture. It can be said that CNN is presently the most dominant deep neural network especially for glaucoma detection along with diagnosing any other pathological sign from the clinical images (Table 10).

Table 10 Performance evaluation metrics used in the selected articles

Moreover, it has been observed that the deep learning approach showed a good performance, particularly for binary classification. The binary classification is mostly done between the glaucomatous and healthy (non-glaucomatous) cases. For instance, Bajwa et al. [55] and Al-bander et al. [64] employed DL techniques for the identification of non-glaucomatous and glaucomatous retinal images. Moreover, the DL approaches employed in most of the articles have efficiently identified a large number of cases with prominent pathological indications. However, there is a need for such efficient classifiers that can show outstanding detection performance for the early stages of glaucoma developments as well. This is because early glaucoma detection is particularly crucial for taking appropriate preventive measures to avoid blindness caused by the deterioration over time.

Furthermore, the detailed analysis of the selected studies showed that DL has great potential in the health care sector, particularly in the domain of detecting ocular diseases like glaucoma. However, the high computational costs and the requirement of large databases are found to be major issues associated with deep learning techniques. Therefore, transfer learning and data augmentation techniques are used in some recent studies as an alternative way of optimizing and reducing network training. For instance, Asaoka et al. [16] adopted a transfer learning approach to reduce the extensive training involved in the classic DL approaches, and obtained a high AUC of 99.7%.

Although no standardized metrics have been found in the literature for evaluating the performance of glaucoma detection models, the authors have used different performance indicators for assessing their proposed work. The authors have predominantly used a combination of metrics including sensitivity, specificity, and accuracy to validate the performance of their proposed approaches. For instance, Raghavendra et al. [43] used a CNN for detecting structural damages due to glaucoma and reported a sensitivity of 98%, a specificity of 98.3%, and an accuracy of 98.13% for their presented method. Moreover, sensitivity, accuracy, and AUC form another widely employed metric combination, which is particularly suitable in DL approaches with an imbalanced image class. However, resampling or augmentation techniques are used in such cases for solving the issue of data imbalance. For instance, Chen et al. [42] employed augmentation techniques for overcoming the overfitting issue and obtained an AUC of 88.7% and 83.10% on the SCES and ORIGA datasets respectively. Some other metrics are also used for measuring performance, such as G-mean used by Wang et al. [35], Kappa score used by Islam et al. [47], F1-score used in [35, 47, 50, 54, 55, 66], and precision used in [50, 54, 55, 66, 68].
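The resampling idea mentioned above can be illustrated with random oversampling of the minority class. This is a generic sketch rather than the procedure of any reviewed study; the function name, the file-name placeholders, and the fixed seed are assumptions for illustration.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes are balanced.

    Returns new (samples, labels) lists in which every class has as many
    entries as the largest class in the input.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) for under-represented classes.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical training set: 2 glaucomatous (1) and 6 healthy (0) images.
    images = ["g1", "g2", "h1", "h2", "h3", "h4", "h5", "h6"]
    classes = [1, 1, 0, 0, 0, 0, 0, 0]
    balanced_images, balanced_classes = random_oversample(images, classes)
    print(balanced_classes.count(1), balanced_classes.count(0))  # 6 6
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.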

Future Research

Although a lot of progress has been made in the detection of glaucoma progression using DL approaches, certain open research challenges need to be addressed in the future. This section discusses these shortcomings and indicates the possible improvements that are needed in this regard.

Unavailability of adequate data: At present, large datasets of the clinical images are unavailable and manually annotated data is also scarce. However, a huge set of fundus images is typically required for training DL algorithms, since a smaller training set may result in unsatisfactory output with respect to accuracy. This issue can be solved in the future by:

  1. Employing numerous enhancement techniques, such as color settings, cropping, shifting, and rotating.

  2. Using advanced augmentation techniques, such as Mixup [111] and CutMix [112].

Moreover, Generative Adversarial Network (GAN) can also be employed for training the DL architecture with more distinctive attributes and robustness, as argued in [71, 73, 113]. Successful implementation of generative adversarial networks can greatly help in generating big volumes of medically related synthetic data. This will not only facilitate in increasing the availability of relevant data but will also be useful in avoiding privacy concerns [13].
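As an example of the advanced augmentation techniques listed above, Mixup [111] forms a new training sample as a convex combination of two existing samples and of their one-hot labels, with a mixing coefficient drawn from a Beta distribution. The sketch below works on flat lists of pixel values; the function names, the value of `alpha`, and the toy inputs are assumptions for illustration.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha).

    alpha = 0.2 is an assumed example value, not a recommendation from the
    reviewed studies.
    """
    return rng.betavariate(alpha, alpha)


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1]."""
    mixed_x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y


if __name__ == "__main__":
    # Hypothetical two-pixel "images" with one-hot labels [glaucoma, healthy].
    glaucoma_image, glaucoma_label = [0.0, 1.0], [1.0, 0.0]
    healthy_image, healthy_label = [1.0, 1.0], [0.0, 1.0]

    lam = sample_lambda()
    print(mixup_pair(glaucoma_image, glaucoma_label, healthy_image, healthy_label, lam))
```

Because the labels are blended as well, the network is trained on soft targets, which acts as a regularizer when only small datasets are available.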

Architectures specific for clinical imaging: In deep learning, various transfer learning-based architectures, including VGGNet, AlexNet, and GoogLeNet, are available to train a new set of images, like clinical photographs. However, these architectures are less appropriate with respect to classification accuracy for clinical data. For instance, Li et al. [23] employed VGG for glaucoma detection using retinal fundus images and achieved almost 87.6% accuracy. This is mainly because these TL architectures are developed for objects like flowers, animals, etc. Therefore, these architectures might not be appropriate for real-time clinical images. Some work is thus needed to implement a transfer learning-based framework that is trained on proper clinical images instead of generic objects, so that it may function as a general framework and be retrained later to improve the classification accuracy on medical images.

Improvements in DL models: Although deep learning approaches have shown exceptional performance in medical imaging and detection of ocular diseases, these DL models can be improved further by increasing computational power through an increase in the network capacity [114, 115] while taking the risk of overfitting into consideration. Moreover, the effectiveness of these models can also be enhanced by creating object-based models instead of image-based ones. For instance, in order to detect a particular malformation in the eyes, a deep neural network should be designed to learn about only that malformation while ignoring the other types of malformations. Ouyang et al. [116] have also pointed out the effectiveness of object-based identification over the one based on images.

Selection of optimum values for DL architectures: Neural networks have shown exceptional results in detecting ocular diseases; however, the complexity associated with modulation is not very obvious. For example, the hyperparameters of existing deep learning techniques, like AlexNet or CNN, are fine-tuned by many researchers to enhance classification efficiency. However, in some cases, the reasoning behind the predictions of deep learning frameworks is not well understood and the model is treated as a black box. Thus, it is still quite tricky to identify the most effective model and the best possible values for modules in different layers as well as the total number of hidden layers. Moreover, domain-specific knowledge is also required to select values for regularization, learning rate, and the number of epochs. Therefore, automated optimization algorithms can be introduced in the future for finding the optimum values for different deep learning architectures on varied glaucoma datasets.
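One simple form of such automated optimization is random search over a predefined hyperparameter space. The sketch below is only an illustration of the idea: `toy_objective` is a hypothetical stand-in for training a network and returning its validation score, and the search space values are assumptions rather than settings from any reviewed study.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample configurations at random and keep the best-scoring one.

    objective: callable that maps a configuration dict to a score (higher is better).
    space: dict mapping each hyperparameter name to a list of candidate values.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Hypothetical stand-in for a validation score; peaks at lr=1e-3, epochs=20."""
    return (
        1.0
        - abs(config["learning_rate"] - 1e-3) * 10
        - abs(config["epochs"] - 20) / 100
    )


if __name__ == "__main__":
    search_space = {
        "learning_rate": [1e-4, 1e-3, 1e-2],
        "epochs": [10, 20, 30],
    }
    print(random_search(toy_objective, search_space, n_trials=200))
```

In practice each call to the objective would involve a full training run, so more sample-efficient strategies such as Bayesian optimization are often preferred when the budget is small.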

Unavailability of standardized performance evaluation metrics: Another open challenge that needs to be addressed in the future is the unavailability of standardized metrics that can be used for evaluating the performance of the models designed for glaucoma detection [13]. Different metrics have been used by different researchers for measuring the effectiveness of their proposed work. This variability makes it quite difficult to have a comparison among different DL architectures developed for a particular state of glaucoma. For instance, Chakrabarty et al. [66] reported accuracy of 100% and argued that their design is better than most of the other state-of-the-art approaches, while with respect to area under the curve, Asaoka et al. [16] achieved 99.7% which is higher than other AUC values reported in the literature.

Integration of telehealth with deep learning: A considerable portion of the world population residing in the rural regions is particularly suffering from a lack of access to health experts. Telehealth has emerged to be a promising solution under such circumstances [117]. Thus, there could be a possibility of combining telehealth, cloud computing, and neural networks in the future for diagnosing glaucoma from retinal fundus photographs. For instance, patients from a rural region can capture fundus images using a mobile phone and transfer them using cloud computing to a platform where a glaucoma detection model (designed using a deep learning approach) is implemented. The configured system can then detect glaucoma by analyzing the image followed by returning the diagnosis and prescription to the patients.

Conclusion

Deep learning techniques hold a promising future in the domain of glaucoma diagnosis and progression detection. In the past few years, DL models have exhibited exceptional performance in the detection and quantification of glaucomatous damages through fundus images and thus have shown potential for cost-effective glaucoma screening tests.

This paper provides a thorough systematic review of the latest techniques used in the literature for glaucoma detection. The selected papers are reviewed from the perspective of datasets, image pre-processing techniques, and classification methods employed in these studies. With regard to classification approaches, this review included papers that (1) employed transfer learning, (2) designed a new deep learning network, (3) adopted a combination of ML and DL approaches, and (4) utilized optical coherence tomography. A comprehensive discussion on the findings is also covered in this paper.

Papers published from 2015 to 2022 have been considered for review. Although considerable developments have been made with artificial intelligence and deep learning in the diagnosis and detection of glaucoma progression, a lot of work remains to be done. This paper can serve as an important study to understand the state-of-the-art developments in glaucoma detection and might be expanded further in the future to include an updated review of the challenging and rapidly growing domain of glaucoma detection.