
Introduction

Biomedical research has experienced a paradigm shift as artificial intelligence (AI) analysis has become more prevalent. As AI-based tools are deployed clinically, their applications are projected to expand [1-3]. Because both ophthalmology and early AI tools have had a strong focus on image-based diagnosis, ophthalmology has emerged at the forefront of clinical AI applications [4, 5].

As AI applications mature beyond imaging, AI analysis of omics data also holds great promise; advanced analytical tools such as AI can uncover meaningful relationships between clinical characteristics and the complex, high-dimensional data that characterize molecular etiologies, such as genomic, lipidomic, metabolomic, and proteomic data [6]. These molecular data are easily accessed in ophthalmology. Patients with ocular conditions undergo frequent procedures in clinical and surgical contexts, allowing relatively easy access to biofluids such as serum, plasma, tears, aqueous humour, and vitreous humour, which present opportunities for large omics datasets to be analyzed using AI [7].

AI analysis of these biofluid markers has varied applications in ophthalmology, and the variability in methodologies reflects this wide range of applications, including pathogenic exploration [8, 9], diagnosis [10, 11], guidance of treatment selection [7, 12], and definition of distinct disease subtypes [13, 14]. Because AI algorithms have diverse functions, selection of an algorithm is highly dependent on a study's goals or intended clinical application. For example, supervised learning is a machine learning (ML) technique that learns to map an input to an output using example input–output pairs, called the training set, that have been defined by an expert [15]. Supervised AI algorithms can subsequently predict analogous outcomes or classify cases in a new data set, the test set. Supervised AI algorithms include artificial neural networks (ANN), support-vector machines (SVM), and discriminant analyses (DA). In contrast, unsupervised AI requires no example input–output pairs and can determine patterns in a data set based on similarities or differences [16]. Unsupervised AI is particularly valuable in the analysis of large, high-dimensional data sets; examples include hierarchical cluster analysis and principal component analysis (PCA). Finally, bioinformatics applications such as Gene Ontology (GO) translate complex biomarker profiles or findings into interpretable data (Table 1) [17].
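The distinction between a training set and a test set can be made concrete with a minimal sketch. It uses a nearest-centroid classifier, a deliberately simple supervised method chosen here for brevity (it is not one of the ANN, SVM, or DA algorithms named above), and the biomarker values, labels, and function names are all hypothetical.

```python
from statistics import mean


def fit_centroids(X, y):
    """Learn one centroid (mean biomarker vector) per expert-assigned label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training set: two biomarker concentrations per patient,
# labelled by an expert as "case" or "control".
X_train = [[5.1, 2.0], [4.8, 2.2], [5.5, 1.9], [1.2, 0.8], [0.9, 1.1], [1.5, 0.7]]
y_train = ["case", "case", "case", "control", "control", "control"]

# Hypothetical held-out test set; its labels are used only to score predictions.
X_test = [[4.9, 2.1], [1.1, 0.9]]
y_test = ["case", "control"]

model = fit_centroids(X_train, y_train)
predictions = [predict(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

The key design point is that the test labels never influence `fit_centroids`; an unsupervised method, by contrast, would receive only the biomarker matrix with no labels at all.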

Table 1 Summary of common classes of artificial intelligence used in the analysis of ocular biofluid markers. The information presented in this table was gathered from a number of the included studies and augmentative papers [73-80]

Given the variability in applications of AI to analyze biofluid markers and the wide spectrum of AI algorithms utilized, understanding how best to deploy each algorithm and how to consider biofluids in ophthalmology practice and research is challenging. This study summarizes the types of AI and bioinformatics used in biofluid marker analysis in ophthalmology, with a focus on methodological considerations. We also explore how research has strategically deployed these analysis techniques for common and unique use-cases. Finally, we describe the AI algorithm parameters, the goals of AI application, commonly accessed biofluids, and identify areas for future investigations.

Methods

This study was conducted in accordance with the PRISMA Extension for Scoping Reviews (PRISMA-ScR) [18, 19]. The protocol was prospectively registered in PROSPERO (reg. CRD42020196749). Ethics approval from our Institutional Review Board was not required given that this is a review of previously published studies. Given the large quantity of papers identified on this topic, a scoping review was deemed most appropriate for characterizing literature that used AI algorithms for biofluid marker analysis in ophthalmic conditions. This preliminary exploratory assessment was undertaken to determine the potential size and scope of available research literature. As all ophthalmic conditions were surveyed, the literature was highly heterogeneous, varying by study design, outcome measures, and omics discipline. Numerous AI algorithms and bioinformatics tools were also examined. Scoping reviews are particularly useful in such broad and complex areas that have not been reviewed comprehensively. We sought to create a database of papers that use AI to analyze biofluid markers in ophthalmology, which can subsequently be analyzed in different ways, such as by disease state.

Search strategy

A search strategy was developed following an extensive literature review and consultation with an experienced librarian. Five electronic databases, Embase, Medline, the Cochrane Central Register of Controlled Trials, the Cochrane Database of Systematic Reviews, and Web of Science, were comprehensively searched from inception to August 11, 2020. The search was updated on July 14, 2021, to capture articles published between these dates. No language or study design restrictions were placed on the search. To ensure search sensitivity, free-text and Medical Subject Heading (MeSH) terms of the respective databases pertaining to the concepts of “ophthalmology”, “AI/bioinformatics”, and “proteomics, metabolomics, lipidomics” were included in the search strategy. The complete search strategy is contained within Appendix A. The reference lists of all included studies were hand searched for relevant articles not captured in the initial database searches.

Selection criteria

Study inclusion criteria were: (1) original peer-reviewed study; (2) biofluid marker concentrations were analyzed, notably lipidomics, metabolomics, or proteomics from serum, plasma, tear fluid, vitreous humour, aqueous humour, or ophthalmic biopsy; (3) the study population had intra-ocular ophthalmic conditions or a systemic disease affecting intra-ocular structures/physiology, or was healthy (in the case of exploratory studies). Study exclusion criteria were: (1) non-ophthalmic conditions; (2) extra-ocular ophthalmic conditions (e.g. strabismus); (3) ophthalmic disease affecting only pediatric patients (e.g. retinopathy of prematurity); (4) studies using non-human subjects (animal studies, in-vitro studies), post-mortem samples, or enucleated eyes; (5) studies restricted to non-biofluid markers (imaging); (6) studies restricted to genomic or transcriptomic biomarkers; (7) abstracts, non-peer-reviewed articles, reviews, systematic reviews, and meta-analyses; (8) studies using only regression analysis. Note that studies combining AI analysis of biofluid markers with other types of data, such as imaging, were included. However, AI algorithms applied within software used to produce the raw data (e.g. pre-processing the spectra in mass spectrometry) were not included. For the purposes of this manuscript, the definition of AI remains broad, as there is no consensus on the definition of AI within the scientific community. Notably, statistical methods such as regression analysis are often considered a basic form of AI, but they are not discussed in this manuscript because they are ubiquitous in modern research.

Abstracts and titles were screened for inclusion by two independent reviewers in the first stage of screening. In the subsequent stage of screening, the full manuscript texts were screened by two independent reviewers. Conflicts between reviewers in these stages were resolved by a third independent reviewer. Covidence (Melbourne, Australia) was used to manage manuscript files and study eligibility status.

Data collection and extraction

One reviewer performed data extraction for each study using standardized data collection forms, with 10% of the extractions verified by a second independent reviewer to ensure agreement and consistency between data extractors. Key data extracted from each article included country of publication, disease of interest, study objective, types of AI used, AI algorithm accuracy, biofluid analyzed, and significant findings.

Synthesis of evidence

Descriptive synthesis of evidence was undertaken for the included studies. The characteristics of the included papers were described, including the diseases studied, the biofluids analyzed, and the AI algorithms deployed in analysis. The AI and bioinformatics methodologies utilized in the included papers were summarized. Algorithm accuracy is also explored in the results, although no calculations were applied to the accuracy measurements given the variability in reporting. No formal risk of bias assessment was performed [20].

Included studies were categorized according to study objectives into the following categories: 1) Diagnosis or prognosis; 2) Identifying characteristics; 3) Treatment decisions; and 4) Exploratory. Studies characterized as “Diagnosis or prognosis” sought to either diagnose disease or predict progression using AI. “Identifying characteristics” studies detailed exploration of biomarkers with the goal of exploring the pathogenic mechanisms or factors that contribute to disease progression. Among the “Treatment decisions” studies, the objective was to predict outcomes following treatment selection or guide selection of therapeutic or surgical options using biomarkers. Finally, in the “Exploratory” studies, there was an untargeted exploration of biomarkers with no specific disease of interest; for example, a study with the goal of describing the proteomic profile of the aqueous humour in a healthy patient.

Results

The included studies utilized heterogeneous methods, and had highly variable findings and objectives. Firstly, a summary of the characteristics of the studies included is presented. Next, the methodologies and aims of AI algorithms are assessed by dividing them into supervised and unsupervised AI. Commonly encountered applications as well as unique examples are presented to illustrate their use in investigating various ophthalmological conditions. Then, their predictive accuracy is appraised. Finally, the most common biofluids and significant biomarkers are summarized.

Study characteristics

A total of 10,262 articles were found in the literature search after deduplication, and 177 studies met inclusion criteria (Fig. 1). The complete list of included papers and study characteristics is contained in Appendix B. Included studies had a global distribution, with the largest proportions performed in China (27%), the USA (17%), Germany (8%), Japan (5%), Singapore (5%), and Spain (5%). In total, 31 countries were represented. The most commonly studied ocular diseases were diabetic eye diseases, with 50 papers (28%) focusing on diabetic retinopathy (DR), proliferative DR, and diabetic macular edema (DME). Glaucoma was explored in 25 studies (14%), age-related macular degeneration (AMD) in 20 studies (11%), dry eye disease in 10 studies (6%), and uveitis in 9 studies (5%). Sixty-three studies explored other ocular diseases. The majority of studies (97, 55%) were classified as “Identifying characteristics”, while 53 (30%) were classified as “Diagnosis or prognosis”, 17 (10%) as “Treatment decisions”, and 10 (6%) as “Exploratory”.

Fig. 1

PRISMA flowchart diagram for study identification and selection

AI characteristics

Table 1 summarizes the activities, strengths, and weaknesses of the algorithms commonly used in the included papers. A summary of how often these algorithms are deployed and which diseases they are used to study is contained in Table 2. Supervised learning was used in 91 papers (51%), unsupervised AI in 83 (46%), and bioinformatics in 85 (48%). Ninety-eight papers (55%) used more than one class of AI (i.e., more than one of supervised, unsupervised, bioinformatics, or statistical techniques), while 79 (45%) used only one. Study sample sizes ranged from 1 to 19,084, and the number of input variables in analyses ranged from one to thousands. A common analysis pathway in the included studies was to first perform dimensionality reduction using unsupervised techniques such as PCA, then apply supervised learning, such as discriminant analysis, to differentiate between cases and controls, and finally perform bioinformatics analysis, such as pathway analysis, to output information about the biological processes implicated. Similarly, bioinformatics tools were often deployed with unsupervised techniques to translate groups of biomarkers into information regarding biological processes or pathways, from which treatment targets or disease etiology could be inferred.

Table 2 Characteristics of artificial intelligence and bioinformatic analyses of biofluid markers by ophthalmic disease

Supervised AI

Supervised AI was the most commonly used class of AI. Discriminant analysis was used in 58 (33%) papers, making it the most common supervised learning technique. Other common supervised algorithms were random forest (31, 18%), ANN (25, 14%), SVM (24, 14%), and decision trees (16, 9%). Most often, these algorithms were applied to differentiate cases of ophthalmic disease from controls, using proteomic or metabolomic profiles, or individual proteins in combination with demographic, genomic, or imaging markers, as input variables. These tools were implemented to diagnose a wide range of ophthalmic diseases, such as DR [21, 22], glaucoma [23, 24], and uveitis [25]. Other predictive applications included discriminating between different diseases or disease subtypes [26] and predicting the long-term risk of progression of an ophthalmic condition [27, 28].
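Because discriminant analysis was the most frequently used supervised technique, a minimal sketch of the two-class Fisher linear discriminant may help illustrate what it does: it finds the weighted combination of biomarkers that best separates cases from controls relative to the spread within each group. The data are simulated, the helper names are hypothetical, and the included studies may have used other variants (e.g. partial least squares discriminant analysis) implemented in dedicated software.

```python
import numpy as np


def fit_fisher_lda(X, y):
    """Two-class Fisher linear discriminant.

    Returns a projection vector w and a threshold halfway between the
    projected class means (this assumes equal priors for the two classes).
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of each class's scatter about its own mean.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu1 - mu0)
    threshold = 0.5 * (mu0 @ w + mu1 @ w)
    return w, threshold


def predict_lda(X, w, threshold):
    """Label samples 1 (case) when their projection exceeds the threshold."""
    return (X @ w > threshold).astype(int)


rng = np.random.default_rng(0)
# Hypothetical data: 20 controls and 20 cases with 3 biomarkers each;
# cases have raised means for the first two biomarkers only.
controls = rng.normal(loc=[1.0, 1.0, 0.0], scale=0.3, size=(20, 3))
cases = rng.normal(loc=[2.0, 1.8, 0.0], scale=0.3, size=(20, 3))
X = np.vstack([controls, cases])
y = np.array([0] * 20 + [1] * 20)

w, t = fit_fisher_lda(X, y)
# Training accuracy is optimistic; a held-out test set or cross-validation
# would be needed for an unbiased estimate.
train_acc = (predict_lda(X, w, t) == y).mean()
print(w, train_acc)
```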

Supervised AI was also used to determine the most influential biomarkers on an algorithm’s predictive value, thereby implying possible biological significance of the biomarker in disease. During the learning process, the supervised AI was trained on expert graded data in order to identify differentially expressed biomarkers between cases and controls in various ophthalmic diseases [29,30,31,32,33,34]. Notably, as humans are required to classify data in the training set, there is potential for error if samples are incorrectly classified. Among the included studies there was inconsistent reporting of the diagnostic guidelines used to classify data, the processes for training the supervised AI, the size of the test and training sets, and the specific algorithm activities. This could have introduced error into a substantial portion of the studies, reducing the external validity of their findings and compromising study reproducibility.
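One generic way to ask which biomarkers most influence a supervised model's predictions is permutation importance: shuffle one biomarker across patients and measure how much accuracy drops. The sketch below is illustrative only; the included studies generally relied on algorithm-specific importance measures (for example, those built into random forest software), and the nearest-centroid model, data, and function names here are hypothetical.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Hypothetical stand-in model: one mean biomarker vector per label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one biomarker column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace biomarker j with shuffled values, breaking its link to the label.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(mean(drops))
    return importances


# Hypothetical data: biomarker 0 separates cases from controls; biomarker 1 does not.
X = [[5.0, 1.0], [5.2, 2.0], [4.8, 3.0], [5.1, 1.5],
     [1.0, 1.0], [1.2, 2.0], [0.8, 3.0], [1.1, 1.5]]
y = ["case"] * 4 + ["control"] * 4

model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)
print(importances)
```

As the paragraph above notes, such rankings inherit any errors in the expert labels used for training, so they indicate candidate biomarkers rather than establishing biological significance.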

Unsupervised AI

PCA was the most widely used unsupervised technique, found in 48 papers. Hierarchical cluster analysis (38) and k-means clustering (4) were also used. PCA was a highly versatile algorithm, commonly deployed both alone and in conjunction with supervised AI. It was implemented within a large proportion of ML studies, often as a step prior to a second ML analysis. In these instances, it was applied to determine, without reference to group labels, whether the disease and control groups were distinguishable based on the biomarkers measured, and to identify or remove confounding factors and outliers that caused the disease and control groups to cluster in an unexpected manner [23, 35,36,37,38,39,40,41]. When deployed in this way, the results often determined how the data would best be applied in the study's final predictive AI model, taking into account the relative importance of certain biomarkers and confounding factors [23, 35,36,37,38,39,40,41]. PCA was also included as a comparator model among AI algorithms to determine which algorithm produced the highest predictive accuracy, and it achieved the highest accuracy in contexts with highly complex datasets. Finally, several studies used PCA to identify biomarkers of interest within discriminative principal components, which were subsequently analyzed by ontological methods to understand the implications for specific molecular pathways [41,42,43,44].
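The two PCA uses described above, checking whether groups separate without using their labels and flagging biomarkers that drive a discriminative component, can be sketched as follows. This is a minimal illustration assuming a simulated samples-by-biomarkers matrix and centring without scaling; the function name and data are hypothetical, not taken from any included study.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via singular value decomposition of the column-centred data matrix.

    Returns sample scores, component loadings (one row per component), and the
    fraction of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)  # centre each biomarker; unit-variance scaling is a common extra step
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    loadings = Vt[:n_components]
    scores = Xc @ loadings.T
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(1)
n_per_group, n_markers = 15, 50
# Hypothetical matrix: 30 samples x 50 biomarkers of random noise,
# with biomarkers 0-4 raised in the "disease" group only.
X = rng.normal(size=(2 * n_per_group, n_markers))
X[n_per_group:, :5] += 4.0
group = np.array([0] * n_per_group + [1] * n_per_group)  # used only after PCA, for inspection

scores, loadings, explained = pca(X)
# Biomarkers with the largest absolute loadings on PC1 are candidates for follow-up.
top_markers = np.argsort(-np.abs(loadings[0]))[:5]
print(explained.round(2), sorted(top_markers.tolist()))
# Well-separated mean PC1 scores suggest the groups are distinguishable.
print(scores[group == 0, 0].mean(), scores[group == 1, 0].mean())
```

Note that the sign of each principal component is arbitrary, so only the separation between group scores, not their direction, is meaningful.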

Hierarchical cluster analysis was almost always used with other forms of AI analysis. For example, biomarkers of interest were identified via clustering, and the strength of relationships was subsequently compared using techniques such as discriminant analysis or regression [34, 45, 46]. Other forms of clustering analysis were also commonly used alone or in tandem with ontological analysis, and were deployed to (1) determine whether biomarker profiles can distinguish experimental and control groups in an unsupervised fashion, (2) identify molecularly distinct subgroups that may not have been anticipated, and (3) objectively cluster a disease cohort into distinct subgroups that are useful for predicting the disease course. In use case (2), these algorithms enabled the identification of characteristic biomarkers and the translation of these markers into meaningful pathways; for example, Zhavoronkov determined that TGF-β was elevated in primary open angle glaucoma patients using hierarchical cluster analysis, and used pathway analysis to link the biomarker to pro-fibrotic pathways leading to extracellular matrix remodeling in the trabecular meshwork and lamina cribrosa [47]. Heatmaps were often used in tandem with hierarchical cluster analyses to visually depict the most up- or downregulated proteins or protein clusters in a given group of patients [48,49,50]. In a small number of instances, heatmaps were used without a cluster analysis, still providing a visual guide to biomarker patterns but without an objective assignment of statistically distinct groups. K-means clustering was used to cluster a disease cohort into distinct subgroups [51, 52]. In this use case, cluster analyses were particularly useful in defining subgroups that shared common characteristics in disease states that may be fairly heterogeneous in underlying etiology.
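Use case (3), partitioning a disease cohort into subgroups with k-means, can be illustrated with a minimal sketch of Lloyd's algorithm. The deterministic farthest-point initialisation is an assumption made here so the example is reproducible (random or k-means++ initialisation is more common in practice), and the two-biomarker cohort is hypothetical.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Initialise with the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical cohort: two biomarker levels per patient, three latent subgroups.
cohort = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
          [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],
          [9.0, 1.0], [9.2, 1.1], [8.8, 0.9]]
labels, centroids = kmeans(cohort, k=3)
print(labels)
```

Unlike hierarchical clustering, k-means requires the number of subgroups to be chosen in advance, which is one reason the two methods were often paired with other analyses.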

Bioinformatics

Many protein, metabolite, and gene ontology tools were used, both to define functional or structural groups and to conduct the pathway analyses themselves. These included Kyoto Encyclopedia of Genes and Genomes (KEGG), MetaboAnalyst, REACTOME, STRING, PANTHER, DAVID, and SWISSPROT. KEGG was the most commonly used, found in 39 studies. In some cases, rather than using GO to identify changes in pathological groups, studies including Aretz et al. (2013) and Dor et al. (2019) applied GO functional annotation to characterise the most prominent functional pathways in healthy human vitreous and tear fluid, respectively [53, 54]. In another unique use case, Velez et al. (2017) used hierarchical clustering and pathway analysis to identify the most prominent therapeutic targets for individual cases of neovascular inflammatory vitreoretinopathy [55]. This group chose and implemented effective pharmacologic therapies for individual patients based on their most prominently dysregulated proteins and pathways, allowing for direct clinical application of the findings [55].
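Many of these tools report pathway or GO-term enrichment using an overrepresentation test. A minimal sketch of the underlying idea is a one-sided hypergeometric test: given all proteins detected, how surprising is it that so many of the significant proteins fall in one pathway? The counts below are hypothetical, and the tools named above differ in their exact statistics and multiple-testing corrections, so this is not a reproduction of any specific tool.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the background (all proteins detected)
    K: background proteins annotated to the pathway
    n: proteins flagged as significant
    k: significant proteins annotated to the pathway
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 500 proteins detected, 20 in the pathway,
# 40 significant, of which 8 are in the pathway (about 1.6 expected by chance).
N, K, n, k = 500, 20, 40, 8
p_value = hypergeom_enrichment_p(N, K, n, k)
fold_enrichment = (k * N) / (n * K)  # observed / expected overlap
print(p_value, fold_enrichment)
```

In practice, one such test is run per pathway or GO term, so the resulting p-values would normally be corrected for multiple comparisons before interpretation.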

AI Predictive accuracy

Amongst the identified studies, AI was often used to differentiate disease status from controls or to predict disease subtype. Quantitative outcomes expressing the efficacy of a given predictive model were presented in multiple ways, including percentage accuracy, percentage sensitivity and specificity, or the area under the curve (AUC) of a receiver operating characteristic (ROC) curve; in total, 82 papers (46%) reported accuracy. Accuracies of the AI tools used in each study are contained in Appendix B. A number of studies, particularly those of patients with diabetic retinopathy (DR), stated an aim to optimize predictive accuracy for DR detection [22, 56, 57]. Some studies also compared different predictive AI algorithms to maximize accuracy [26, 58,59,60]. Although summarizing the accuracies of these models is challenging given their differing objectives and variable accuracy reporting, many of the models described achieved strong performance, with AUCs over 0.85 and accuracy, sensitivity, and specificity over 90%. While no definitive trends in accuracy emerged between different AI algorithms, ANNs, random forest models, and decision trees tended to exhibit the highest accuracy. The majority of analyses implemented validation methods, such as separate training and test samples or tenfold cross-validation, to reduce the likelihood that the estimated accuracy reflected overfitting to the training data or chance. Several studies applied AI algorithms with the alternative goal of determining the biomarkers most influential for accurate prediction. This required algorithms that expressed the relative importance or rank of input variables in the model, with random forest, k-means clustering, and PCA algorithms facilitating this goal.
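Because the included studies reported accuracy, sensitivity and specificity, and AUC interchangeably, a short sketch of how each is computed may help when comparing them, along with how tenfold cross-validation partitions a data set. The scores, the 0.5 decision threshold, and the function names are hypothetical; the AUC uses the pairwise (Mann–Whitney) formulation, which equals the area under the empirical ROC curve.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = disease)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation (no shuffling)."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Hypothetical model outputs for 4 cases (1) and 4 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is one reason the two kinds of figures are not directly comparable across studies.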

Biofluids and significant biomarkers

Serum was the most commonly accessed biofluid, used in 53 studies (30%). Aqueous humour was analyzed in 30 studies (17%), tears in 25 (14%), plasma in 18 (10%), vitreous humour in 17 (10%), and tissue biopsy in 12 (7%). The most common biopsy locations were cornea, pterygium, and conjunctiva. Combinations of biofluids were used in 16 (9%) studies. The complete proteomic profile was examined in 82 (46%) studies, the metabolic profile in 39 (22%), and the cytokine profile in 7 (4%). Given the expansive nature of some of the included studies, the number of significant biomarkers found ranged from none to thousands. In some of the studies with thousands of significant findings, the identified biomarkers were not fully detailed, making compilation of significant biomarkers for each disease challenging [36]. Additionally, while some biomarkers were implicated in the development, progression, or treatment of a specific disease across multiple studies, for most significant findings there was conflicting evidence in other studies. The biomarkers and pathways implicated in diabetic eye disease, glaucoma, AMD, ocular surface diseases, and uveal diseases are summarized in Table 3.

Table 3 Implicated biomarkers and pathways in common ophthalmic diseases

Discussion

The current scoping review summarizes the methods of AI and bioinformatics as they have been applied for analysis of ocular biofluid markers. The database of studies presented could be further analyzed for specific disease states and types of AI. With ophthalmology being at the forefront of medical AI development, it is important that ophthalmologists be aware of these developing technologies and remain mindful of the possibility that these technologies could be incorporated into clinical practice in the near future.

One of the most self-evident advantages of bioinformatic methods in proteomic and metabolomic studies, particularly overrepresentation/enrichment analyses, is that they provide specific insights into the complex molecular mechanisms and actions occurring in a pathological or physiological state [42, 61]. These methods can also be applied to genomic and transcriptomic data, but because RNA concentrations are not always proportional to the amount of protein produced, proteomic analysis could provide more specific insight into the activity of specific mechanisms. In ophthalmology, pathway analyses conducted on vitreous or aqueous fluid samples have the particular advantage of providing insight into the specific dysregulations occurring in the organ of interest [12, 27, 43]. Data from detailed fundoscopic or optical coherence tomography images could greatly complement bioinformatic data, providing insight into both the micro- and macroscopic pathologies occurring. Pathway analyses are also advantageous in very small patient samples or rare diseases, as they do not require the statistical power needed to achieve accurate AI predictions [55]. However, because pathway analyses indicate significantly altered molecular pathways but do not make predictions, their results serve only as indicators for further investigation in the population of interest. Finally, Velez et al. [55] demonstrated that bioinformatics applied to a patient's proteome can support individualized therapeutic management.

In the included studies, one of the most effective ways to approach a predictive hypothesis was to compare the accuracies of multiple algorithms, assuming each was designed and implemented properly, to see which model performed best with the given biofluid markers and patient population. In many instances in the included literature, a random forest model outperformed the other tested models in accuracy, in particular cases even outperforming ANNs, which are often thought to be the most accurate predictive tools [59, 62]. The reason random forest models often exhibited slightly better accuracy than other algorithms is unclear and merits further investigation; these models are worth considering for inclusion in future biofluid marker studies. Broadly, ANNs and decision trees also had strong predictive accuracy. PCA was often used with supervised AI, in part because it can improve the accuracy of other algorithms via dimensionality reduction. It is also easy to implement for a wide variety of uses.

Interestingly, despite a multitude of AI models across many applications demonstrating strong predictive accuracy, no definitive characteristic biomarkers emerged for most diseases. As noted above, most studies found biomarkers significantly associated with disease development, progression, or treatment, but few were confirmed by other studies, and conflicting findings were common. As such, AI tools remain valuable for predictive applications but have shown restricted utility in exploring disease etiology. In principle, AI tools should be well suited to such applications, but a number of issues in the included studies prevent strong agreement between studies. The complete activities of the algorithms were rarely explained, a so-called “black-box” approach [61]. Further, the rationale for AI algorithm selection was often omitted. As such, studies with analogous objectives, participants, and data sets could be using wholly different selection parameters for biomarkers, and variation in AI activities could cause disagreement in biomarker significance. Many studies did not describe their patient population in detail, so factors such as demographics, comorbidities, lifestyle, or medication use may have altered their biomarker profiles. For example, all of the patients recruited by Li et al. were unrelated Han Chinese individuals recruited from the Zhongshan Ophthalmic Center, which could theoretically influence their distinct biomarker profiles [63]. There was also intrinsic variability in the biomarker profiles of clinically similar patients [45]. Additionally, biofluid extraction techniques varied significantly between studies, with differing locations of biomarker extraction and small quantities of biofluid analyzed; volumes of ocular biofluids extracted ranged from 25 to 1000 µL. While small volumes technically fall within the acceptable range for analysis, small aliquots can be susceptible to changes in the microenvironment, an issue exacerbated by differences in storage technique, sample handling, and the dilution of samples for analysis. Future efforts should describe analytical methods in detail and comprehensively describe the study population. Our group has previously published systematic reviews of AI analysis of biofluid markers in AMD [64], glaucoma [65], corneal disease [66], uveal disease [67], and retinal occlusive disease [68].

Although regression is outside the scope of this review, it is worth acknowledging that regressions are argued to be the simplest form of ML, a view that remains controversial [69]. Regressions are highly restricted, simple, supervised prediction models [69]. Although less powerful and less useful in highly complex datasets, they should not be discounted when they are the appropriate method for a simple question with a relatively small number of input variables. Over 40 articles in the current review included logistic regressions, either as comparator models against the other AI models tested or to quantify associations determined by other AI methods. Regressions achieved accuracies often comparable to other types of AI, in some instances achieving a higher (but not statistically significantly higher) area under the curve than a compared ANN [70]. Limitations of this review include the restriction to English-language papers only. While a more focused systematic review could have explored these concepts in more depth, the database of studies we have created will allow for this in future research.
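To illustrate how simple such a model is, the sketch below fits a one-biomarker logistic regression by batch gradient descent on the log-loss. The data are hypothetical, and gradient descent is chosen here only for transparency; statistical packages typically use other solvers (such as iteratively reweighted least squares).

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression weights and intercept by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(X, y):
            # Gradient of the log-loss: (predicted probability - label) * input
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of disease for one patient's biomarker vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical single-biomarker data with overlapping classes (1 = disease).
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print(w, b, predict_proba(w, b, [4.0]))
```

The fitted coefficient is directly interpretable as a log-odds change per unit of biomarker, which is part of why regressions remain useful comparators for more complex models.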

The studies included in this scoping review vary in both methodology and objectives. Numerous studies provide examples of AI tools that could be directly applied to clinical practice following further development and investigation. For example, AI tools can support the diagnosis of glaucoma, either in a screening context or to augment a clinician's own decision making [71, 72]. Automated AI tools could enable glaucoma screening at primary care facilities or in low-resource settings, leading to earlier diagnoses, improved outcomes, and more efficient use of specialist time. Alternatively, AI tools could be used to predict responsiveness to anti-VEGF therapy in wet AMD, potentially sparing a patient countless uncomfortable injections or supporting the preservation of their vision [12]. While an exhaustive list of potential clinical applications is beyond the scope of this or any other manuscript, AI has the potential to transform clinical ophthalmology. However, it is crucial to note that none of the included studies reported a clinically proven application of AI.

Conclusion

AI and bioinformatic analyses offer major advantages in understanding and treating ophthalmic diseases. When used with biofluid markers as input variables, they provide improved disease detection, a better understanding of molecular etiologies, and the ability to offer individualized targeted therapy to patients. However, despite the promise of AI tools with diagnostic or prognostic power, none of these tools have been directly integrated into or tested in clinical workflows. Therefore, most AI-based applications using ocular biofluids are still in the translational stage and have not yet proven a clear use in clinical trials. Additionally, it is important to consider the role of these tools in a clinical context to ensure their thoughtful implementation and to guard against poor technical understanding or inappropriate use [3]. Many AI algorithms are currently being used in ophthalmology, and selecting a tool appropriate for the intended task is crucial. Given the progression of AI towards use in both research and the clinic, ophthalmologists should be broadly aware of the commonly used algorithms and their applications. Future directions include the development of robust, open-source algorithms that use both biofluid and imaging variables to make predictions regarding disease exploration, diagnosis, or prognostication. Furthermore, it is imperative to establish validation models and evaluate approaches to clinical deployment. A cost-effectiveness analysis of implementation in clinical practice, as well as training for ophthalmologists in their use, may increase clinical acceptance.