Introduction

Approximately 57.4 million people were living with dementia worldwide in 2019, and population aging continues to drive the rising prevalence: by 2050, the number of people with dementia is forecast to more than double, reaching 152.8 million [1]. Dementia is generally understood as an acquired loss of cognitive ability caused by brain disease or injury that is severe enough to interfere with functional activities of daily living [2]. Dementia due to Alzheimer’s disease (ADD) is the most common form, accounting for about 60% of all dementia cases [3]. Between ADD and normal cognition there can be a prodromal, intermediate stage called mild cognitive impairment (MCI). MCI is marked by lower performance in one or more cognitive domains relative to the person’s previous level, although this does not fully interfere with independence in daily life [4]. Based on statistical data, the prevalence of MCI is about twice that of dementia, with an estimated annual conversion rate to dementia of 10–15% [5]. The rising prevalence of MCI and ADD places a substantial economic burden on both individuals and society [6].

Early detection and diagnosis of ADD could motivate patients and caregivers to engage in early, effective therapeutic intervention, potentially improving quality of life and reducing ADD-related mortality [7]. However, diagnosing ADD and MCI in the clinic remains challenging. At present, positron emission tomography (PET) targeting amyloid plaque deposition and tau tangles is the only FDA-approved test for the diagnosis of Alzheimer’s disease (AD) [8, 9]. Nevertheless, these scans are costly. Neuropsychological assessment is the most widely used approach for gauging overall cognitive functioning and screening for cognitive impairment [10, 11]. However, comprehensive neuropsychological assessment still faces many obstacles in clinical practice: it is time-consuming and requires a trained, qualified professional, which limits its application [12]. Furthermore, most language-based screening tools are affected by educational level and age [13]. Thus, a reliable, noninvasive, and affordable marker is urgently needed to support the diagnosis of cognitive impairment in routine clinical work.

Eye movement, as a sensitive, low-cost, non-invasive, and portable marker, has emerged as a potential tool for detecting cognitive change or deterioration [14, 15]. Eye movements are triggered and regulated by an intricate neural network involving numerous cortical and subcortical regions, which develop specific pathological changes decades before clinical symptoms appear in patients with AD. Hence, eye movement has the potential to identify subtle cognitive deterioration in a timely manner [16]. Abnormalities in eye movements have been repeatedly observed in patients with AD. For example, patients with ADD have longer prosaccade and anti-saccade latencies and make more anti-saccade errors than cognitively normal older adults [17, 18]. A critical issue, then, is whether eye movement has the potential to diagnose AD.

Several studies have aimed to distinguish patients with MCI and AD from those with normal cognition, or AD from MCI, using eye movements [19,20,21]. These studies achieved diagnostic results with an area under the receiver operating characteristic curve (AUC) of up to 0.752. Nonetheless, for effective clinical application, eye movement must be able to distinguish ADD, MCI, and normal cognition simultaneously, which requires further exploration. Moreover, the relationship between eye movements and specific cognitive subdomains remains unclear, even though prosaccade latency, anti-saccade errors, and saccadic intrusions have been found to correlate with the Mini-Mental State Examination (MMSE) [22]. Therefore, the critical issue of how eye movements can support the diagnosis of AD remains inadequately addressed.

In this study, we collected demographic information, cognitive scores, and eye movement metrics from participants who completed fixation, smooth pursuit, prosaccade, and anti-saccade tasks, and we screened for key eye movement markers capable of distinguishing ADD, MCI, and normal cognition. In addition, machine learning models were constructed to explore the diagnostic potential of eye movements. As a secondary objective, we aimed to determine the correlation of eye movement parameters with specific cognitive subdomains assessed by neuropsychological scales.

The main findings of our study are as follows. First, 16 key eye movement features were identified for diagnosing cognitive impairment, including saccadic latency, error rate, and number of forward saccades. Second, 17 three-classification diagnostic models for ADD, MCI, and normal cognition were established, with the gradient boosting classifier (GBC) model demonstrating superior performance, achieving an accuracy of 68.2% and an F1-score of 66.32%. Finally, novel associations were observed between eye movement parameters in the prosaccade and anti-saccade tasks and Free and Cued Selective Reminding Test (FCSRT)-immediate total recall, instrumental activities of daily living (IADL), and Boston Naming Test (BNT)-total score.

Related Work

Jessica et al. summarized studies that used eye-tracking technology and computational analysis to measure and compare eye movements in participants with different cognitive statuses. The authors introduced three tasks that are more complex than saccadic eye movement tasks: a reading task, a visual exploration task, and a free-viewing video task. They also noted that eye trackers have so far been used only in controlled laboratory environments, whereas applying computational techniques to analyze recorded videos could extend eye tracking to a wider range of scenarios [23].

Opwonya et al. used eye movement parameters to build logistic regression, random forest, support vector machine, and extreme gradient boosting classification models for predicting cognitive status; the logistic regression algorithm exhibited the highest AUC of 0.715 for normal cognition vs MCI classification [19]. Logistic regression was also employed by Jiang et al. to select discriminatory features and build classification models, achieving an AUC of 0.807 for normal cognition vs MCI classification [20].

Liu et al. proposed a novel deep learning-based approach called the Depth-induce Integrated Comparison serial attention Network (DISCN). A multi-image integration module was used to combine visual stimuli and eye movement heatmaps into a visual saliency map, which was then passed to a serial attention module for diagnosis. DISCN achieved an AUC of 0.81 for normal cognition vs AD classification [24].

The aforementioned studies achieved high accuracy in binary classification because of the clear distinctions between the two groups. However, three-classification diagnosis of AD is more clinically practical, yet more formidable, than binary classification [25, 26]. Currently, only a few diagnostic models have achieved an accuracy above 60% in three-classification diagnosis of AD [27]. Tong et al. utilized a nonlinear graph fusion approach for three-classification diagnosis of AD, achieving an accuracy of 60.2% [28]. Lin et al. achieved a three-classification diagnostic accuracy of 66.7% using a decision tree algorithm with multimodal data comprising magnetic resonance imaging (MRI), PET, cerebrospinal fluid markers, and genetic features [29]. In another study, multiclass diagnosis of AD was performed using electroencephalography (accuracy of 65.52%) and functional near-infrared spectroscopy (accuracy of 58.62%) [30].

Methods

Participants and Diagnosis Procedures

In this study, 258 subjects were recruited from the Memory Clinic, Department of Geriatrics, The First Affiliated Hospital of Chongqing Medical University and the surrounding community, including 111 patients with ADD, 81 patients with MCI, and 66 people with normal cognition. Detailed demographic information is provided in Table 1, and the flowchart is shown in Fig. 1, Step I. Diagnoses were made by experienced geriatric psychiatrists based on the results of clinical evaluation and cognitive assessment. AD was diagnosed according to the 2011 criteria of the National Institute on Aging and the Alzheimer’s Association for clinically probable AD [31]. MCI was diagnosed according to the MCI International Working Group consensus criteria [4]. Inclusion criteria were being 60 years or older, participating in the study voluntarily, and having sufficient hearing and vision to complete the neuropsychological tests and eye-movement calibration. Patients with a history of alcohol or drug abuse, severe psychiatric illness, intracranial tumor, or stroke were excluded. The protocol was approved by the Medical Ethics Committee of The First Affiliated Hospital of Chongqing Medical University (approval number: 20212901; date of ethics approval: 10 May 2021). All participants or their legal guardians signed informed consent after receiving a detailed explanation of the study.

Table 1 Participants’ demographic characteristics
Fig. 1
figure 1

The flowchart of the proposed model, composed of four steps: data collection and preprocessing; feature selection; establishment and evaluation of the classification models; and correlation between the model’s important indicators and the neuropsychological scales

Neuropsychological Evaluation

Neuropsychological batteries were used to assess general cognitive status, attention, executive function, language, visuospatial skills, and memory. General cognitive status was evaluated with the MMSE and the Alzheimer’s Disease Assessment Scale-Cognitive subscale (ADAS-Cog) [32, 33]; attention with the Trail Making Test-A (TMT-A) and Digit Span Test (DST) [34]; executive function with the TMT-B [35]; language with the BNT [36]; visuospatial skills with the Clock Drawing Test (CDT) [37]; and memory with the FCSRT and the Auditory Verbal Learning Test (AVLT) [38, 39]. Activities of daily living were assessed with the physical self-maintenance scale (PSMS) and the IADL scale [40].

Assessment of Eye Movement

Eye movement parameters were recorded during four visual tasks: gazing directly at a target (fixation task), following a moving target (smooth pursuit task), looking toward a jumping target (prosaccade task), and looking away from a jumping target (anti-saccade task). A desktop-mounted eye-tracker (Tobii4L model, Ji Zhi Medical Technology Co. Ltd, China) monitored eye movements using the 250 Hz pupil-corneal reflex mode. Tasks were displayed on a 15.6-inch LED monitor with a screen resolution of 1920 × 1080 pixels. All subjects were seated approximately 60 cm from the screen; their heads were fixed on a support frame and their eye position was horizontally aligned with the center of the screen. Proper head fixation was indicated by two bright circles, around 0.5–1 cm in size, appearing in the center of the screen. The instrument tracked eye movements across horizontal (± 30°) and vertical (± 30°) positions. Testing was performed in a separate, quiet room without strong light stimulation. The built-in software, Tobii Studio, was used to collect and analyze the eye-movement data.

Candidate Feature Selection

Univariate analysis was carried out for the eye movement parameters and general information. Specifically, Kruskal–Wallis tests were used to compare continuous variables, and two-sided chi-squared tests were used to compare categorical variables. Next, the significant variables detected by univariate analysis were subjected to further screening. Light gradient boosting machine-based recursive feature elimination (LightGBM-RFE) and a lasso-multinomial algorithm, each in conjunction with five-fold cross-validation, were employed to obtain two optimal feature sets. Finally, the intersection of the two optimal feature sets was used to build the machine learning models. The flowchart is shown in Fig. 1, Step II.
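For illustration, a minimal Python sketch of this two-track selection is given below. It assumes the screened eye movement features are held in a pandas DataFrame `X` with diagnosis labels `y` (illustrative names); the exact estimator settings may differ from those used in the study.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegressionCV

# Track 1: LightGBM-based recursive feature elimination with five-fold CV,
# scored by macro F1.
rfe = RFECV(LGBMClassifier(random_state=0), step=1, cv=5, scoring="f1_macro")
rfe.fit(X, y)
rfe_features = X.columns[rfe.support_]

# Track 2: multinomial lasso (L1-penalized logistic regression) with five-fold CV;
# features with any non-zero coefficient across the three classes are kept.
lasso = LogisticRegressionCV(
    Cs=20, cv=5, penalty="l1", solver="saga",
    multi_class="multinomial", max_iter=5000
)
lasso.fit(X, y)
lasso_features = X.columns[np.any(lasso.coef_ != 0, axis=0)]

# Final candidate set: the intersection of the two tracks.
selected = sorted(set(rfe_features) & set(lasso_features))
```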

Establishment and Validation of Machine Learning Models

The diagnostic model for classifying ADD, MCI, and normal cognition was constructed from the filtered feature set using machine learning algorithms including gradient boosting classifier (GBC) [41], LightGBM [42], random forest classifier [43], extra trees classifier [44], naïve Bayes [45], logistic regression [46], ridge classifier [47], linear discriminant analysis [48], decision tree classifier [49], quadratic discriminant analysis [50], AdaBoost classifier [51], support vector machine with linear kernel [52], K neighbors classifier [53], and dummy classifier [54]. These algorithms were implemented using the PyCaret package (version 2.3.3). Tenfold cross-validation, leaving one fold out in turn, was used to validate the models, and the evaluation metrics included accuracy, macro_Sensitivity, macro_Specificity, AUC, and macro_F1. The metrics were calculated as follows: Accuracy = (TP + TN)/N; macro_Sensitivity = (Sensitivity_class1 + Sensitivity_class2 + Sensitivity_class3)/3; macro_Specificity = (Specificity_class1 + Specificity_class2 + Specificity_class3)/3; macro_F1 = (F1_class1 + F1_class2 + F1_class3)/3, where N is the number of samples. The model with the highest macro_F1 value was selected as the final model. The hyperparameters of the models and their tuning ranges are listed in Supplementary Table 1. The flowchart is shown in Fig. 1, Step III.
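As an illustration of this workflow, the sketch below uses the PyCaret 2.x classification API; the DataFrame `df` and the target column name `diagnosis` are assumptions, and the actual session settings may have differed.

```python
from pycaret.classification import setup, compare_models, tune_model

# df is assumed to hold the selected eye movement features plus a
# three-level "diagnosis" column (ADD / MCI / NC); names are illustrative.
setup(data=df, target="diagnosis", fold=10, session_id=42)

# Train the candidate classifiers and rank them by (macro-averaged) F1.
best = compare_models(sort="F1")

# Hyperparameter tuning of the top model, again optimizing F1.
tuned = tune_model(best, optimize="F1")
```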

Fully Connected Neural Network (FCNN) is a fundamental deep learning model that uses a hierarchical structure to map input data to output results [55]. The FCNN consists of an input layer, hidden layers, and an output layer. The input layer accepts a feature vector of dimension input_dim; the first hidden layer contains 64 neurons and the second hidden layer contains 32 neurons, both with the ReLU (rectified linear unit) activation function, which introduces non-linearity; and the output layer contains 3 neurons corresponding to the 3 categories, with a softmax activation that outputs the probability of each category. For model compilation, the optimizer is adaptive moment estimation (Adam) and the loss function is categorical cross-entropy, which is suitable for multi-class problems.
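The architecture described above maps directly onto a few lines of Keras; the sketch below is a minimal illustration assuming one-hot encoded labels, not the exact training script used in the study.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_fcnn(input_dim: int) -> keras.Model:
    """FCNN as described: two ReLU hidden layers (64, 32) and a 3-way softmax output."""
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_fcnn(input_dim=16)
# model.fit(X_train, y_train_onehot, epochs=100)  # labels assumed one-hot encoded
```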

The FT-Transformer is a recent transformer-based architecture for tabular data, building on the success of well-established architectures such as convolutional neural networks and recurrent neural networks [56]. The FT-Transformer used here includes a linear embedding layer, two transformer encoder layers, a fully connected layer, and a dropout layer. The parameters of the transformer encoder layers are as follows: the input feature dimension is 16, the embedding dimension is 32, each layer employs 4 parallel attention heads, and the dropout rate is 0.1, meaning that 10% of neurons are randomly “dropped out” during each forward pass. The activation function is ReLU, and training is conducted over 100 epochs with the Adam optimizer. Finally, model performance is evaluated using tenfold cross-validation.
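A simplified PyTorch sketch of an FT-Transformer-style model with these settings is given below; the per-feature embedding, mean pooling, and feed-forward width are illustrative assumptions rather than the exact implementation.

```python
import torch
from torch import nn

class FTTransformerSketch(nn.Module):
    """Simplified FT-Transformer-style classifier: per-feature linear embedding,
    two transformer encoder layers (4 heads, dropout 0.1), pooling, 3-way head."""
    def __init__(self, n_features: int = 16, d_model: int = 32, n_classes: int = 3):
        super().__init__()
        self.embed = nn.Linear(1, d_model)          # embeds each scalar feature to 32 dims
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=64,
            dropout=0.1, activation="relu", batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.dropout = nn.Dropout(0.1)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                            # x: (batch, 16)
        tokens = self.embed(x.unsqueeze(-1))         # (batch, 16, 32)
        encoded = self.encoder(tokens)               # (batch, 16, 32)
        pooled = encoded.mean(dim=1)                 # average over feature tokens
        return self.head(self.dropout(pooled))       # (batch, 3) logits

# model = FTTransformerSketch()
# optimizer = torch.optim.Adam(model.parameters())
# loss_fn = nn.CrossEntropyLoss()   # trained for 100 epochs with tenfold CV in the paper
```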

The self-attention model [57] consists of an embedding layer, a self-attention layer, average pooling, a dropout layer, and a fully connected layer. The embedding layer raises the dimensionality, transforming the input feature dimension from 16 to an embedding dimension of 32, thereby enhancing the model’s capacity to capture information. The self-attention layer comprises the computation of attention scores, application of softmax, and weighted summation. Specifically, attention scores are calculated as the dot-product similarity between queries and keys, normalized by dividing by the square root of the vector dimension; the softmax function converts the attention scores into attention weights; and the weighted summation multiplies these attention weights by the corresponding values to produce the final output. The dropout rate is set to 0.1, the model is trained for 100 epochs, and the Adam optimizer is used for model optimization.
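The attention computation described here can be sketched in PyTorch as follows; the per-feature embedding and the linear query/key/value projections are illustrative assumptions.

```python
import math
import torch
from torch import nn

class SelfAttentionSketch(nn.Module):
    """Single-layer scaled dot-product self-attention over per-feature embeddings."""
    def __init__(self, n_features: int = 16, d_model: int = 32, n_classes: int = 3):
        super().__init__()
        self.embed = nn.Linear(1, d_model)       # lift each scalar feature to 32 dims
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                         # x: (batch, 16)
        tokens = self.embed(x.unsqueeze(-1))      # (batch, 16, 32)
        q, k, v = self.q(tokens), self.k(tokens), self.v(tokens)
        # attention scores: query-key dot products scaled by sqrt(embedding dim)
        scores = q @ k.transpose(1, 2) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)   # normalize scores to attention weights
        attended = weights @ v                    # weighted sum of the values
        pooled = attended.mean(dim=1)             # average pooling over tokens
        return self.fc(self.dropout(pooled))      # (batch, 3) logits
```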

Analysis of Key Features

The top five most important features were extracted from the best-performing machine learning model. Pearson’s correlation analysis was used to calculate the correlations between these five features and the cognitive scales. Group comparisons and correlations were then performed for the three most relevant scales. The flowchart is shown in Fig. 1, Step IV.
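A minimal sketch of this correlation step follows, assuming the top features and scale scores are stored in two aligned pandas DataFrames with illustrative column names.

```python
import pandas as pd
from scipy.stats import pearsonr

# top_features and scale_scores are assumed DataFrames indexed by participant,
# e.g. top_features["AST_SL"] and scale_scores["IADL"]; names are illustrative.
rows = []
for feat in top_features.columns:
    for scale in scale_scores.columns:
        r, p = pearsonr(top_features[feat], scale_scores[scale])
        rows.append({"feature": feat, "scale": scale, "r": r, "p": p})

corr_table = pd.DataFrame(rows)   # pairwise Pearson r and p-values
```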

Statistical Analysis Environment

All statistical analyses and computations were performed using R version 4.2.2 and Python version 3.6.2. Categorical variables are expressed as frequency (percentage). Continuous variables conforming to a normal distribution are expressed as mean ± standard deviation, and those not conforming to a normal distribution are presented as median (interquartile range). Differences with p < 0.05 were considered statistically significant. Plots were generated with the Matplotlib and ggplot2 packages.

Results

Eye Movement Characteristics in Different Severities of Cognitive Impairment

A total of 38 eye movement features were acquired using the eye tracker’s built-in software. Of these, 25 differed significantly among the ADD, MCI, and normal cognition groups in the univariate analysis. Detailed data are shown in Table 2.

Table 2 Comparison of multiple models of eye movement in different individuals with cognitive impairment

Selection of Diagnostic Markers via Lasso and LightGBM-RFE

Two distinct algorithms (Lasso and LightGBM-RFE) were used to screen feature sets from the meaningful variables identified through univariate analysis. For the Lasso algorithm, the feature set was selected at the lambda value with the minimum mean cross-validated error, yielding a set of 21 parameters (Fig. 2A, B). For the LightGBM-RFE algorithm, the feature set was selected based on the highest macro F1 value, and the classifier produced the minimum error with 20 features (Fig. 2C). Overall, the 16 features shared between the Lasso and LightGBM-RFE selections were retained as diagnostic markers of cognitive impairment for further analysis (Fig. 2D). The optimal feature set encompassed features from all four tasks, especially the prosaccade and anti-saccade tasks. Detailed feature information is provided in Table 3.

Fig. 2
figure 2

A Lasso coefficient profiles of the twenty-five eye movement parameters. B Selection of the optimal lambda value through tenfold cross-validation. C Line graph showing the macro F1 value for different numbers of eye movement parameters in the LightGBM-RFE model. D Screening of sixteen eye movement parameters shared by the Lasso and LightGBM-RFE algorithms

Table 3 The 16 eye movement parameters screened by Lasso and LightGBM-RFE algorithms

Establishing Machine Learning Model for Diagnosing ADD, MCI and Normal Cognition

After feature selection, the machine learning models were trained using PyCaret; Supplementary Table 2 describes their performance. Model performance was evaluated using AUC, precision, recall, accuracy, and F1-score. The GBC, LightGBM, random forest classifier, and extra trees classifier were the four best-performing models according to the F1-score. Logistic regression was also carried forward to parameter optimization because it is widely used for classification problems. After parameter optimization, the GBC model showed the best balanced prediction performance, with an accuracy, recall, precision, and F1-score of 0.6820, 0.6681, 0.6761, and 0.6632, respectively. Table 4 and Fig. 3 show the specific performance of the five machine learning models.

Table 4 The specific performance of the five machine learning models
Fig. 3
figure 3

The confusion matrix and receiver operating characteristic of the five machine learning models

To further validate the performance of the GBC model, comparative experiments were conducted with the FCNN, FT-Transformer, and self-attention models. The accuracy, recall, precision, and F1-score of the FCNN model were 0.6508, 0.6508, 0.6870, and 0.6553, respectively (Fig. 4A). The corresponding metrics were 0.6091, 0.5970, 0.5969, and 0.5802 for the FT-Transformer model (Fig. 4B) and 0.5818, 0.5576, 0.6053, and 0.5404 for the self-attention model (Fig. 4C). These results indicate that the three deep learning models performed worse than the GBC model. This discrepancy may arise because the eye-movement data in this study are one-dimensional feature vectors, which may not be well suited to end-to-end deep learning. Additionally, the three models require a substantial volume of data for robust training in order to learn effective feature representations and generalize well, whereas the GBC model may perform better on small-scale structured datasets.

Fig. 4
figure 4

The confusion matrix and receiver operating characteristic of FCNN, FT-transformer, and self-attention model

Correlations Between the Top Five Eye Movement Parameters Ranked by Importance in the GBC Model and Neuropsychological Scales

To determine the contribution of individual features to the GBC model’s performance, we computed the Gini importance and identified the five most important features: saccadic latency in the anti-saccade task, error rate in the anti-saccade task, saccadic latency in the prosaccade task, number of forward saccades in the anti-saccade task, and the ratio of gaze duration in the right area to the total area (%) in the prosaccade task. Detailed results are presented in Fig. 5.
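As an illustration, Gini (impurity-based) importances can be read directly from a fitted scikit-learn gradient boosting classifier, which is the estimator PyCaret wraps for GBC; `X` and `y` are illustrative names for the selected features and diagnosis labels.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# X: DataFrame of the 16 selected features, y: diagnosis labels (illustrative names).
gbc = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based (Gini) importances, sorted to extract the top five features.
importance = pd.Series(gbc.feature_importances_, index=X.columns)
top5 = importance.sort_values(ascending=False).head(5)
print(top5)
```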

Fig. 5
figure 5

The plot shows the ranking of the relative importance of features in GBC models. Abbreviation: AST_SL, saccadic latency in anti-saccade task; AST_ER, error rate in anti-saccade task; PT_SL, saccadic latency in prosaccade task; AST_NFS, number of forward saccades in anti-saccade task; PT_GDRA/TA, gaze duration in right area/total area; AST_FL, fixation latency in anti-saccade task; PT_NBS, number of backward saccades in prosaccade task; PT_FP, fixation point in prosaccade task; FT_FP, fixation point in fixation task; PT_NFS, number of forward saccades in prosaccade task; AST_FFD, first fixation duration in anti-saccade task; PT_SAS, saccadic average speed in prosaccade task; AST_SD, saccadic distance in anti-saccade task; SPT_GD, gaze duration in smooth pursuit task; FT_FJON, fixation jump-offs number in fixation task; AST_FD, fixation duration in anti-saccade task

Thereafter, correlation analyses were conducted between these features and the neuropsychological test scores (Fig. 6A). Saccadic latency in the prosaccade task was positively correlated with IADL (r = 0.41; p < 0.001) and negatively correlated with CDT (r = −0.49; p < 0.001) and TMT-A (r = −0.38; p < 0.001). Saccadic latency in the anti-saccade task was negatively correlated with FCSRT-immediate total recall (r = −0.34; p < 0.001), BNT-total score (r = −0.37; p < 0.001), and TMT-A (r = −0.46; p < 0.001). Error rate in the anti-saccade task was negatively correlated with CDT (r = −0.48; p < 0.001), TMT-A (r = −0.57; p < 0.001), and TMT-B (r = −0.49; p < 0.001). Number of forward saccades in the anti-saccade task was negatively correlated with reverse-DST (r = −0.44; p < 0.001), TMT-A (r = −0.55; p < 0.001), and TMT-B (r = −0.46; p < 0.001). Detailed subgroup analyses are presented in Fig. 6B.

Fig. 6
figure 6

A Heatmap of correlation between eye movement parameters and neuropsychological scales. B Scatter plot of correlation between eye movement parameters in subgroups and neuropsychological scales. Abbreviation: AST_SL, saccadic latency in anti-saccade task; AST_ER, error rate in anti-saccade task; PT_SL, saccadic latency in prosaccade task; AST_NFS, number of forward saccades in anti-saccade task; PT_GDRA/TA, gaze duration in right area/total area; MMSE, Mini-mental State Examination; FCSRT_ITR, immediate total recall in the Free and Cued Selective Reminding Test; FCSRT_ISC, index of sensitivity of cueing in the Free and Cued Selective Reminding Test; ADAS_CTS, total scores in the Alzheimer’s Disease Assessment Scale-Cognitive section; TMT_A_NC, number of correct in Trail Making Test-A; TMT_B_NC, number of correct in Trail Making Test-B; FDST, Forward Digit Span Testing; RDST, Reverse Digit Span Testing; AVLT_IR, immediate recall in Auditory Verbal Learning Test; AVLT_SR, short-term free delayed recall in Auditory Verbal Learning Test; AVLT_LR, long-term free delayed recall in Auditory Verbal Learning Test; AVLT_REC, recognition in Auditory Verbal Learning Test; BNT_TS, total scores in Boston Naming Test; BNT_CS, phonemic cue scores in Boston Naming Test; CDT, Clock Drawing Test; PSMS, physical self-maintenance scale; IADL, instrumental activities of daily living

Discussion

The present study establishes a three-classification diagnostic model for ADD, MCI, and normal cognition with an accuracy of 68.2% and an F1-score of 66.32%. Moreover, novel associations between eye movements and episodic memory, language, and IADL were detected. Together, our results indicate that eye movement biomarkers are of great value for the diagnosis of AD in clinical practice. To the best of our knowledge, this is the first study to evaluate the potential of eye movement biomarkers to differentiate ADD, MCI, and normal cognition and to elucidate the associations between eye movement features and individual cognitive functions.

The Three-Classification Model for Diagnosing AD Based on Eye Movements

In this study, the proposed three-classification diagnostic model could accurately and directly distinguish between ADD, MCI, and normal cognition. In contrast, the binary-classification models established in previous studies, such as those of Opwonya et al. [19], Jiang et al. [20], and Liu et al. [24], are only able to differentiate AD or MCI from normal cognition. Other studies that included normal cognition, MCI, and AD did not construct diagnostic models but simply compared eye movement parameters among the three groups, which limits their practical value in clinical work [58, 59]. Therefore, our study addresses this gap and establishes a foundation for future application of eye movement in the clinical aiding diagnosis of AD.

Non-invasive, Easy-to-Implement, and Cost-Effective Eye Movement Technology

In a recent study, multimodal data encompassing MRI, PET, cerebrospinal fluid biomarkers, and genetic features were employed for multiclass diagnosis of AD, achieving an accuracy of 66.7% and an F1-score of 64.9% [25]. However, such multimodal methods are complicated, invasive, and costly. Although the features adopted in that study are central pathological hallmarks of AD, it is noteworthy that the clinical symptoms of AD patients do not always align with the evidence of pathological change [58]. Hence, using pathological criteria to identify the early stages of AD, especially MCI, is problematic.

Eye movement involves a complex oculomotor control system formed by a wide range of cerebral regions, and there is evidence that AD-related pathology can affect the oculomotor brain areas [23, 60,61,62]. Therefore, analyzing eye movement data can reveal specific patterns in the processing of visual stimuli in patients with AD, which may be associated with the cognitive and neurological features of early AD. Furthermore, eye movement is a non-invasive, rapid, easy-to-perform, and cost-effective technique that is readily accepted by patients [63]. Combining eye movement parameters with the GBC model can help identify patients with AD in a timely manner and prompt necessary treatment interventions.

Proposed Model Outperforms Current Studies in Accuracy

The accuracy of our proposed model based on eye movement parameters for the three-way classification of ADD, MCI, and normal cognition is 68.20%, which is higher than that of the majority of current studies [29]. In the study by Cicalese et al. [30], multiclass diagnosis of AD using electroencephalography and fNIRS achieved accuracies of 65.52% and 58.62%, respectively, both lower than our model. In fact, the model in this study also exhibits diagnostic accuracy superior to current binary classification models: the DISCN model developed by Liu et al. [24] achieved an AUC of 0.81 in differentiating AD from normal cognition, whereas our model attained an AUC of 0.87.

The high accuracy of our model can be attributed to several factors. First, unlike the image data used by Liu et al. and Cicalese et al., which are easily affected by noise and variation that can interfere with the model and reduce accuracy, the structured data used in this study are less susceptible to such factors; in addition, structured data features are highly intuitive and clearly relevant. Second, of the 17 machine learning models tested in this study, the GBC model demonstrated the best performance. The GBC operates as an ensemble method, integrating multiple weak learners into a powerful model, which enables high accuracy in prediction and classification tasks [64]. Lastly, participants were strictly selected according to the inclusion and exclusion criteria, ensuring the quality and reliability of the data.

Anti-saccade Tasks Are More Valuable for the Diagnosis of AD than the Other Three Tasks

Ranking the features by importance showed that the five most important parameters are all derived from the prosaccade and anti-saccade tasks. This observation suggests that the saccadic tasks may possess greater diagnostic value for AD than the fixation and smooth pursuit tasks. Moreover, the anti-saccade task may be more sensitive to cognitive changes than the prosaccade task.

Prosaccade latency, anti-saccade latency, and anti-saccade error rate were the three most significant biomarkers for discriminating between patients with and without cognitive impairment, in good agreement with previous findings [15, 22, 65]. In addition, the number of forward saccades and the ratio of gaze duration in the right area to the total area also performed well in the class diagnosis of AD, which has not been reported in previous studies. A saccade in the same direction as the target’s movement is classified as a forward saccade. Patients with ADD and MCI made a greater number of forward saccades in the anti-saccade task than those with normal cognition, indicating an increased number of erroneous saccades in the direction opposite to the instructions, similar to the anti-saccade error rate. Meanwhile, the proportion of gaze duration in both the left and right areas relative to the total area was reduced in patients with ADD and MCI. This reduction may be attributed to slowed reaction speed in AD patients, resulting in prolonged gaze duration in the middle area.

Novel Associations Between Eye Movements and Episodic Memory, Language, and IADL

While progressive memory loss is the primary cognitive deficit in all patients with AD, other non-memory domains, including attention, visuospatial abilities, and language, are also affected at the early stage [31]. Attention and executive function, in particular, are often the first and most common non-memory domains affected by AD [66]. In this study, saccadic latency in the prosaccade task and saccadic latency, error rate, and number of forward saccades in the anti-saccade task were negatively correlated with the TMT-A and DST tests. This suggests a potential association between attention deficits and increased saccadic latency and error rate in patients with AD, in line with prior research [17, 67]. Indeed, an extensive literature demonstrates the close relationship between saccadic eye movements and attention; however, the relationship between eye movement deficits and episodic memory impairment has received less attention [68].

Episodic memory impairment is widely recognized as a prominent cognitive characteristic of the early stages of AD [69]. It has been reported that the FCSRT-immediate total recall score has the best value for predicting dementia progression over a 2-year follow-up [70]. Our results reveal a negative association between anti-saccade parameters and FCSRT-immediate total recall, indicating that eye movements might serve as an indicator of episodic memory in patients with AD. We therefore hypothesize that eye movement parameters, particularly in the anti-saccade task, may also have the potential to predict the progression of dementia; subsequent investigations will address this issue.

IADL is considered important for maintaining an independent lifestyle in the community [67]. This study identified correlations of an increased anti-saccade error rate and prolonged prosaccade latency with decline in IADL. Another study showed a strong positive correlation between anti-saccade performance and driving performance: older participants at risk of driving impairment exhibited abnormal anti-saccade error rates, similar to patients with dementia [71]. Overall, saccadic latency in the prosaccade task and the anti-saccade error rate have the potential to serve as useful screening tools for IADL in individuals with AD.

Eye movement abnormalities can affect the language processing abilities of patients with AD [72]. Our study revealed that eye movement parameters in the prosaccade and anti-saccade tasks are significantly associated with the total scores for spontaneous naming and semantic cueing naming in the BNT, while showing no correlation with phonemic cueing scores. It is established that spontaneous naming errors and semantic cueing errors, in contrast to phonemic cueing errors, are early indicators of AD [73]. Our study indirectly supports this conclusion and proposes that eye movement abnormalities have the potential to serve as an important indicator in the evaluation of language impairment in AD patients.

Future Work

Although this study has made progress, much work remains to be done. For instance, other types of dementia, including frontotemporal dementia, Lewy body dementia, and dementia in Parkinsonism, were not included in the study. Future research should encompass these types of dementia to determine the specificity of eye movement for the diagnosis of AD.

Because MCI shares features with both ADD and normal cognition (NC), including MCI reduces diagnostic accuracy. The ROC curves in Fig. 3A show that the diagnostic accuracy for MCI (AUC = 0.736) is lower than that for NC (AUC = 0.895) and ADD (AUC = 0.870). As previously indicated by Lin et al., distinguishing progressive MCI from stable MCI is crucial for improving the accuracy of three-classification diagnosis of AD [29]. Our future work will undertake the classification of MCI subtypes to achieve better accuracy.

Eye movement data can be collected flexibly, in a variety of comfortable environments beyond the confines of hospitals, which neuroimaging techniques cannot match. With the advent of digital healthcare, a mobile phone camera combined with machine learning algorithms could enable mobile phone eye tracking [74]. Our results provide a scientific basis for remote evaluation of cognitive function and for assessing the efficacy of remote cognitive rehabilitation. Furthermore, Jessica et al. proposed that computer vision techniques can be used to analyze the eye movements of AD patients in naturalistic scenarios, with areas of interest utilized for early detection [23]. This points the way for future research on eye movement-based diagnosis of AD.

In summary, future research could incorporate more diverse datasets and subtype the MCI patients. Moreover, there is potential in exploring mobile phone eye tracking for remote assessment of cognitive function.

Conclusion

In this study, a three-classification model for distinguishing ADD, MCI, and normal cognition was established based on machine learning and eye movement features. The classification accuracy of the model is 68.20%, which is higher than that of the majority of previous models. Furthermore, this study also reveals correlations between eye movement parameters and various subdomains of cognition, including attention, episodic memory, immediate memory, language, and visuospatial skills.