1 Introduction

In the evolving landscape of language learning research, the integration of eye-tracking technology has marked a significant milestone, offering insights into the cognitive processes underlying language acquisition within interactive environments. This technological advance has catalyzed the development and adoption of interactive language learning environments, providing learners with dynamic and immersive experiences that significantly enhance language acquisition (Shadiev & Li, 2023). Eye-tracking technology has facilitated a nuanced investigation into how learners navigate and assimilate visual and linguistic stimuli, thereby uncovering the cognitive mechanisms that underlie language learning (Bacca-Acosta et al., 2022a, b; Chen et al., 2022).

Diverging from conventional self-reporting methodologies, eye-tracking research offers a direct window into the cognitive engagement of learners within multimedia contexts. By harnessing detailed eye movement data, this approach enables the empirical scrutiny of gaze behavior, shedding light on how learners integrate visual and textual information—a process critical to understanding multimedia learning dynamics (Chen et al., 2024). The increasing reliance on eye-tracking technology to explore visual attention’s temporal dynamics signifies its pivotal role in demystifying multimedia information processing. Through meticulous analysis of eye movement data, researchers can extract granular insights into attention allocation during learning processes.

Furthermore, the application of eye-tracking technology in language learning research has illuminated the interactions between learners and multimedia content, allowing for precise inferences regarding which elements captivate learners’ attention and how they interpret visual and linguistic cues. Technological advancements have made interactive language learning environments more accessible, thereby enriching the language learning experience through dynamic and immersive methodologies (Aryadoust & Ang, 2021; Chen et al., 2023a).

Beyond its capacity to unveil how learners engage with content, eye-tracking technology provides a lens through which the cognitive foundations of language learning can be explored. Its broad applicability across disciplines such as psychology, neuroscience, and education underscores its significance. In particular, within the realm of computer-assisted language learning (CALL), eye-tracking research has forged new pathways for understanding cognitive processes integral to language acquisition (Chen et al., 2023b; Dagienė et al., 2021). Monitoring eye movements offers unparalleled insights into the processing of visual and linguistic information within CALL environments, informing the development of more effective educational materials and strategies.

Despite the growing interest in eye-tracking research within CALL, a systematic review that critically analyzes this body of literature has been conspicuously absent. Such a review is imperative for aggregating research findings, delineating key outcomes, and identifying areas ripe for further inquiry. This article endeavors to fill this gap by presenting a systematic review of eye-tracking research in interactive language learning environments. It aims to critically analyze existing literature, pose pertinent research questions, and suggest future research directions. This endeavor seeks to broaden our comprehension of eye-tracking technology’s role in language learning research and its potential to enhance CALL material development and pedagogical practices, thereby making a significant contribution to advancing language learning methodologies.

1.1 Statement of problem

Eye-tracking research has emerged as a valuable tool for investigating the cognitive processes involved in CALL and informing the development of more effective and efficient CALL materials and instructional strategies (Bahari et al., 2023). Despite the potential benefits of eye-tracking technology, previous research has identified numerous challenges and limitations associated with its use in computer-mediated language learning, and these benefits and challenges have not yet been synthesized systematically. To address this gap in the literature, the present study aimed to conduct a systematic review of the available research on the affordances and challenges of eye tracking for CALL. Specifically, the study sought to synthesize the existing literature on the use of eye-tracking technology in CALL and identify the benefits and challenges associated with its implementation.

2 Method

The present systematic review utilized the PRISMA scheme (Page et al., 2021) to select articles. The PRISMA guidelines are known for their adaptability, validity, and transparency, making them suitable for conducting systematic reviews across various subjects and disciplines. Notably, the guidelines have been successfully applied to studies related to computer-assisted language learning (Bahari, 2021). Bahari (2023) described the benefits of PRISMA in five categories: improved transparency (by providing a clear and comprehensive checklist of items that should be included in the report), reduced bias (by requiring authors to report all relevant information, including details about the search strategy, study selection, data extraction, and risk of bias assessment), increased reproducibility (by providing a standardized framework for reporting the review), improved quality (by providing a standardized framework for conducting and reporting systematic reviews), and better dissemination (by making reviews more accessible and understandable to a wider audience).

2.1 Eligibility criteria

Studies meeting the following criteria were included in this systematic review:

  1. Analyzing the results of eye-tracking measurement devices;

  2. Peer-reviewed;

  3. Conducted in any language but published in English;

  4. Published between 2010 and 2022.

Articles were excluded if:

  1. They were published in a language outside the authors’ language skills (English, Persian, Turkish, and Mandarin);

  2. They were book/app reviews, dissertations, review articles, or conference proceedings.

2.2 Search strategy and databases

Sensitivity and specificity analyses (i.e., getting “the right stuff” and avoiding “the wrong stuff”) were conducted by testing and validating the search strategy through backward and forward citation searching. For the final keyword search strategy, a Boolean logic method was applied to major online databases (JSTOR, EBSCOHost, ProQuest, Web of Science, Scopus, Education Source, ERIC, Social Services Abstracts, IEEE Xplore, and Electronic Journal Center). The references in the related reviews were hand-checked to locate all related studies.

The following descriptors were included in the final search strategy: “eye-tracking-based language learning”, “eye-tracking-assisted language learning”, “language learning through eye-tracking”, “eye-tracking-based tools and strategies for language learning”, “eye-tracking-based language learning environments”, “eye-tracking in computer-assisted language learning”, “eye-tracking in distance language learning”, “eye-tracking in online language learning”, “eye-tracking in augmented reality language learning”, and “eye-tracking in virtual world language learning”. Using these parameters resulted in an initial pool of 132 articles.
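As an illustration of how such descriptors can be combined with Boolean logic, the following minimal sketch joins the ten phrases into a single OR query of exact-phrase terms. The function name `build_boolean_query` is hypothetical, and the review does not report the exact query syntax submitted to each database, so this should be read as one plausible form rather than the strategy actually used. Field codes and date filters differ between databases and are deliberately left out.

```python
# Hypothetical sketch: the source lists the descriptors but not the exact
# query syntax, so the OR-joined form below is an assumption.
DESCRIPTORS = [
    "eye-tracking-based language learning",
    "eye-tracking-assisted language learning",
    "language learning through eye-tracking",
    "eye-tracking-based tools and strategies for language learning",
    "eye-tracking-based language learning environments",
    "eye-tracking in computer-assisted language learning",
    "eye-tracking in distance language learning",
    "eye-tracking in online language learning",
    "eye-tracking in augmented reality language learning",
    "eye-tracking in virtual world language learning",
]


def build_boolean_query(phrases):
    """Quote each phrase for exact matching and join the phrases with OR."""
    quoted = ['"' + phrase.strip() + '"' for phrase in phrases]
    return " OR ".join(quoted)


query = build_boolean_query(DESCRIPTORS)
print(query)
```

Keeping the descriptors in a list makes it easy to rerun the same query after adding or removing a phrase during sensitivity and specificity testing.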

2.3 Data selection

For this systematic review, a comprehensive search strategy was developed to retrieve records from various databases. A total of 132 records were initially identified based on predefined eligibility criteria designed to ensure relevance and rigor in the selection process. These criteria included a focus on eye-tracking technology within interactive language learning environments, publication between 2010 and 2022 to ensure currency of data, and publication in peer-reviewed journals to guarantee quality. In addition to database searches, records were identified through other resources, including reference lists of relevant studies and expert recommendations, to ensure thorough coverage of the field.

Upon retrieval, records underwent a rigorous screening process. First, 17 duplicates were removed so that each study was considered only once. Next, 38 records were excluded at the title and abstract stage because they were irrelevant to the scope of this review, as determined by a preliminary assessment against the eligibility criteria. A further 43 articles that did not meet the eligibility criteria, such as those not employing eye-tracking technology in a language learning context or lacking empirical data, were also excluded. The remaining 34 articles were subjected to a full-text review for final inclusion in the study. This selection process is summarized in Fig. 1, which presents the PRISMA flow diagram.

To uphold the reliability of the study selection process, two trained researchers independently assessed a randomly selected 25% (n = 29) of the articles for interrater agreement. This step was crucial to ensure objectivity and consistency in applying the eligibility criteria. The assessment resulted in a 94% agreement rate, indicating a high level of concordance between the researchers. Discrepancies were resolved through discussion until a consensus was reached.
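The record counts reported in this section can be checked with simple arithmetic. The sketch below subtracts each reported exclusion from the 132 identified records and arrives at the 34 articles named as data sources in Section 2.4. The last step rests on an assumption that the text does not confirm: that the 25% reliability sample (n = 29) was drawn from the 115 records left after duplicate removal.

```python
# Counts as reported in the text of Section 2.3.
identified = 132
duplicates = 17
exclusions = [
    ("duplicates removed", duplicates),
    ("excluded at title/abstract screening", 38),
    ("excluded against eligibility criteria", 43),
]

remaining = identified
for stage, count in exclusions:
    remaining -= count
    print(f"{stage:<40} -{count:<3} -> {remaining} left")

included = remaining  # expected to equal the 34 reviewed articles

# Assumption (not stated in the source): the 25% sample for interrater
# agreement was taken from the records remaining after duplicate removal.
after_duplicates = identified - duplicates
reliability_sample = round(0.25 * after_duplicates)
print(f"included: {included}, reliability sample: {reliability_sample}")
```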

Fig. 1 PRISMA flow diagram

2.4 Data coding, data analysis, and data sources

In this systematic review, a comprehensive and rigorous data analysis process was employed to scrutinize the collected data from the selected eye-tracking studies. Drawing on the methodological underpinnings of previous research, an inductive paradigmatic analysis was meticulously conducted. This analytical approach facilitated the identification of emergent themes and patterns within the data, leading to the delineation of four primary data categories: challenges, affordances, research design, and theoretical frameworks.

  • Challenges: This category encompassed the documented negative effects and limitations associated with employing eye-tracking technology in language learning research. It aimed to provide a critical perspective by acknowledging the potential drawbacks alongside the benefits, thus offering a balanced view of eye-tracking research applications.

  • Affordances: The second category highlighted the documented potentials and advantages of utilizing eye-tracking technology in language learning studies. This category underscored how eye-tracking facilitates a deeper understanding of the cognitive processes involved in language acquisition, thereby showcasing its utility.

  • Research Design: The studies were further categorized based on their methodological approach—quantitative (employing statistical procedures), qualitative (without statistical procedures), or mixed methods (a combination of both approaches). This classification was informed by the nature of data collection and analysis techniques used in each study, allowing for a nuanced understanding of how different research designs contribute to insights gained from eye-tracking in language learning contexts.

  • Theoretical Frameworks: The fourth category focused on the theoretical underpinnings guiding the reviewed studies. This involved a detailed examination of the theoretical perspectives employed in the research, including cognitive theories of multimedia learning and theories specific to language acquisition. Theoretical frameworks were critically analyzed to understand how they influenced the study designs and interpretations of findings.

Data for this systematic review were extracted from 34 peer-reviewed articles published between 2010 and 2022. Each article was meticulously selected based on its relevance to the scope of this review and its contribution to understanding the application of eye-tracking technology in language learning contexts.

The data analysis process was designed to be both iterative and reflective, ensuring that emerging themes could be incorporated into the analysis framework as the review progressed. This approach not only allowed for a comprehensive synthesis of the existing literature but also facilitated the identification of gaps in current research and opportunities for future studies. By employing a rigorous and transparent data analysis process, this review aims to provide a robust foundation for understanding the current state of eye-tracking research in language learning and its implications for future research directions.

The theoretical framework adopted in each study was the fourth data coding category, which informed us about the theoretical perspectives of the reviewed studies (see Table 1).

Table 1 Research questions, data categories, and access links to research articles

3 Results

3.1 Affordances of eye-tracking research

The systematic review meticulously catalogues a series of affordances (see Table 2) brought to light by eye-tracking research in the context of computer-assisted language learning. These affordances underscore the technology’s capability to deepen our comprehension of the multifaceted processes involved in language acquisition. The ensuing table delineates these affordances, encapsulating the diverse ways in which eye-tracking technology can be harnessed to refine and enhance language learning methodologies and instructional design.

Table 2 Affordances of eye-tracking in CALL research

3.1.1 Digital reading analysis

Various studies have employed eye-tracking research to explore how language learners read texts in a second language. These studies have investigated how learners allocate attention to different parts of the text, process vocabulary, and use contextual cues to comprehend meaning. For instance, Liu et al. (2019) used eye tracking in e-reading and found that learners fixated significantly on vocabulary focus and glosses when reading for vocabulary acquisition, whereas they fixated more intensely on illustrations when reading for comprehension. This suggests that language teachers should create instructional materials that direct students’ attention toward specific multimedia supports tailored to different literacy goals. Numerous studies have confirmed the efficacy of eye-tracking research for CALL studies. Eye-tracking technology can measure participants’ eye movements as they interact with language learning materials, such as text, images, or videos. This provides insights into how learners process language and how different features of instructional materials impact language learning outcomes. One of the main advantages of eye-tracking research for CALL is the ability to analyze how learners read and process text. Salmerón et al. (2017) found that eye-tracking technology can be used to measure different aspects of reading behavior, including fixation duration, saccade length, and regressions. These measurements can provide valuable insights into how learners process and comprehend written language. Their study revealed that deep processing of pertinent hypertext sections positively correlates with enhanced performance, regardless of an individual’s reading comprehension abilities. Therefore, it is crucial to teach all students not to overuse hypertext scanning, irrespective of their reading comprehension skills. Eye-tracking research for CALL can leverage these insights to develop instructional materials that better support language learning. For example, researchers can use eye-tracking data to identify which parts of a text learners focus on most, how long they spend reading each part, and how frequently they regress to previous sections. Such information can optimize instructional materials for better comprehension, vocabulary acquisition, and overall language learning outcomes. Additionally, researchers can analyze how different features of instructional materials impact reading behavior and comprehension by varying the font size, color, or layout of a text. This can provide insights into designing more effective instructional materials for language learning.
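To make the three reading measures named above concrete, the following minimal sketch computes mean fixation duration, mean saccade length, and the number of regressions from a short fixation sequence. The `Fixation` record, the pixel coordinates, and the sample values are hypothetical and are not drawn from any reviewed study. A regression is counted here as any move to an earlier word in reading order, which is one common operationalization among several.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean


@dataclass
class Fixation:
    x: float            # horizontal gaze position in pixels (hypothetical units)
    y: float            # vertical gaze position in pixels
    duration_ms: float  # how long the eye rested at this position
    word_index: int     # position of the fixated word in reading order


def reading_measures(fixations):
    """Return mean fixation duration, mean saccade length, and regression count."""
    pairs = list(zip(fixations, fixations[1:]))
    # Saccade length: straight-line distance between consecutive fixations.
    saccades = [hypot(b.x - a.x, b.y - a.y) for a, b in pairs]
    # Regression: the next fixation lands on an earlier word than the current one.
    regressions = sum(1 for a, b in pairs if b.word_index < a.word_index)
    return {
        "mean_fixation_ms": mean(f.duration_ms for f in fixations),
        "mean_saccade_px": mean(saccades) if saccades else 0.0,
        "regressions": regressions,
    }


# Hypothetical sample: the reader moves forward, jumps back to word 1, then resumes.
sample = [
    Fixation(100, 200, 220, 0),
    Fixation(180, 200, 250, 1),
    Fixation(260, 200, 310, 2),
    Fixation(180, 200, 180, 1),
    Fixation(340, 200, 240, 3),
]
measures = reading_measures(sample)
print(measures)
```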

3.1.2 Vocabulary acquisition analysis

Vocabulary acquisition analysis is among the affordances of eye-tracking research for capturing and analyzing how learners acquire new words and phrases (Bahari, 2022). Eye-tracking technology can measure various aspects of vocabulary acquisition, such as how often learners fixate on new vocabulary words, how long they spend processing the words, and whether they return to the words later in the text or lesson. The review indicated that when designing e-books for language learning, two key considerations must be taken into account. Firstly, not all types of multimedia supports are equally effective or valuable for L2 learners. Secondly, proficient L2 learners attend to different multimedia supports depending on the presentation mode and literacy goal (Liu et al., 2019). Pellicer-Sánchez (2016) reported that eye-tracking research findings on CALL can be used to inform the development of instructional materials that better support vocabulary acquisition. For example, researchers can use eye-tracking data to identify which types of instructional materials are most effective for teaching new vocabulary, such as flashcards, vocabulary lists, or contextualized texts.

Additionally, researchers can use eye-tracking data to identify which features of instructional materials, such as the use of pictures or audio, are most effective for teaching new vocabulary (Godfroid & Schmidtke, 2013; Siyanova-Chanturia et al., 2011). The analysis of vocabulary acquisition through eye-tracking research allows CALL researchers to analyze how different instructional approaches impact vocabulary acquisition. For example, researchers can compare the effectiveness of explicit vocabulary instruction versus implicit instruction, or the effectiveness of rote memorization versus contextualized learning. This can provide insights into how to design more effective instructional materials for language learning. However, it is important to acknowledge that despite the aforementioned benefits, certain studies examining the effects of visual supports on comprehension, similar to those investigating the impact of reading supports on vocabulary acquisition, have yielded inconclusive results due to methodological concerns, such as the conflation of photos and illustrations as multimedia supports of the same nature.

3.1.3 Grammar processing and learning

Eye-tracking research has emerged as a valuable tool for investigating how learners process grammar in a second language in the field of computer-assisted language learning. It allows researchers to examine how learners attend to different grammatical structures, process grammatical errors, and use contextual cues to understand grammar in context (Spit et al., 2022). Moreover, eye-tracking technology enables CALL researchers to capture and analyze various aspects of grammar processing and learning, such as fixation duration, saccade length, and revisiting patterns (Benati, 2022). The insights gained from eye-tracking research can inform the development of instructional materials that better support grammar processing and learning in CALL (Hou et al., 2022). Therefore, it is suggested that CALL researchers use eye-tracking data to identify the most effective types of instructional materials for teaching grammar, such as explicit instruction, implicit instruction, or contextualized learning. Additionally, researchers can compare the effectiveness of different types of feedback, such as corrective feedback versus recast feedback, or the effectiveness of different types of tasks, such as gap-fill exercises versus sentence completion exercises. This can provide valuable insights into designing more effective instructional materials for language learning.

3.1.4 Pronunciation acquisition and processing

Eye-tracking research has contributed to our understanding of how learners attend to and process pronunciation in a second language. It has investigated how learners perceive and process different phonetic features, attend to visual and auditory cues, and integrate these cues to improve their pronunciation (Boers et al., 2017; Kobylyanskaya, 2022). Eye-tracking technology can measure various aspects of pronunciation acquisition and processing, such as fixation duration, saccade length, and revisiting patterns (Chen et al., 2023a, b, c). The insights gained from eye-tracking research can inform the development of instructional materials that better support pronunciation acquisition and processing in computer-assisted language learning (Mohsen, 2016). For example, researchers can use eye-tracking data to identify the most effective types of instructional materials for teaching pronunciation, such as audio recordings, videos, or interactive pronunciation exercises. Moreover, studies have shown that visual aids, online videos, and practice exercises are effective features of instructional materials for teaching pronunciation (Wang et al., 2020). Therefore, eye-tracking research for CALL can provide valuable insights into designing more effective instructional materials for language learning. By identifying which types of instructional materials and features are most effective for teaching pronunciation, CALL researchers can optimize language learning outcomes and better support learners in acquiring and processing pronunciation in a second language.

3.1.5 CALL design and evaluation

Eye-tracking research has informed the design and evaluation of CALL materials and activities, including how to design effective multimedia materials, how to evaluate the effectiveness of different CALL approaches, and how to personalize CALL instruction based on individual learners’ needs and preferences (Alkan & Cagiltay, 2007). Eye-tracking technology can measure various aspects of learners’ interactions with CALL materials, such as where they focus their attention, how long they spend on different parts of the materials, and how they process the language presented in the materials (Godfroid et al., 2018). Eye-tracking research for CALL can use these insights to inform the design and evaluation of CALL materials (Frenck-Mestre, 2005; Godfroid et al., 2015). For example, researchers can use eye-tracking data to identify which types of CALL materials are most effective for teaching specific language skills, such as reading comprehension, vocabulary acquisition, grammar processing, or pronunciation.

Furthermore, eye-tracking data can be utilized by researchers to determine the most effective features of CALL materials, such as images, audio, or video, for supporting language learning (Mohamed, 2018). Additionally, eye-tracking research for CALL can leverage CALL design and evaluation as a tool to assess the effectiveness of existing CALL materials. For instance, comparing the effectiveness of different versions of the same CALL materials or evaluating the effectiveness of CALL materials versus traditional classroom instruction can offer valuable insights into enhancing existing CALL materials and designing more effective CALL materials in the future.

3.1.6 Learners’ cognitive processes

Eye-tracking can provide valuable insights into learners’ cognitive processes, including attention, perception, and memory, which can inform the design of more effective CALL programs (Bolzer et al., 2015). Using eye-tracking research, Bolzer et al. (2015) reported that the use of fixation durations and number of transitions has been shown to be effective in facilitating cognitive processes that involve focused attention and careful consideration. Incorporating eye-tracking technology can offer learners immediate performance feedback, thereby helping them improve their language proficiency more efficiently and productively (Kao et al., 2019). Moreover, Kao et al. (2019) propose that utilizing e-books with integrated selfies can influence both the visuo-cognitive processes implicated in reading and the self-reported emotions of motivation and engagement. Eye-tracking technology has the capability to quantify various aspects of cognitive processing, including attentional focus, working memory allocation, and decision-making processes (Hegarty, 2010). Integrating eye-tracking research with CALL can facilitate the creation of instructional materials that more effectively support learners’ cognitive processes (Chien et al., 2015; De Koning et al., 2010; Knoblich et al., 2001). By using eye-tracking data, researchers can determine the most effective types of instructional materials for engaging learners’ attention, sustaining their focus, and reducing cognitive load. In addition, eye-tracking data can be utilized by researchers to determine the most effective aspects of instructional materials, such as scaffolding or practice exercises, for supporting learners’ cognitive processes. Moreover, eye-tracking research for CALL can leverage learners’ cognitive processes as a tool to analyze how diverse instructional approaches influence their cognitive processing (Schugar et al., 2013; Takacs et al., 2015). For instance, comparing the efficiency of different feedback types, such as explicit and implicit feedback, or the effectiveness of different task types, such as problem-solving versus memorization tasks, can offer valuable insights into designing more effective instructional materials for language learning that better support learners’ cognitive processes.

3.2 Challenges of eye-tracking research

The review highlighted several challenges and limitations associated with the use of eye tracking in CALL (see Table 3).

Table 3 Challenges of eye-tracking research in CALL

3.2.1 Limited sample size

Eye-tracking studies in CALL often involve a small number of participants, which may not represent the wider population and can limit the generalizability of the findings (Godfroid et al., 2013). Such studies face the challenge of potential bias and limited validity due to small sample size. This can lead to inaccurate or biased results and may not provide sufficient statistical power to detect meaningful differences or relationships between variables. Moreover, the limited sample size may restrict the generalization of the findings to other populations or contexts (Chun et al., 2016). To overcome these limitations, researchers should carefully consider the sample size required to address the research question and ensure that the sample is representative of the population of interest. In addition, researchers may consider alternative methods such as meta-analysis to combine results across multiple studies and increase the generalizability of the findings.
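The link between sample size and statistical power can be illustrated with a standard normal-approximation formula for comparing two group means. The sketch below is not a procedure taken from the reviewed studies. The function name `n_per_group`, the two-sided alpha of 0.05, and the power of 0.80 are illustrative assumptions, and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided, two-sample
    comparison of means, using the normal approximation
    n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Conventional small, medium, and large standardized effect sizes (Cohen's d).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, a medium effect already calls for dozens of participants per group, which helps explain why small eye-tracking samples often lack the power to detect meaningful differences.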

3.2.2 Equipment limitations

The use of eye-tracking equipment in research can be hindered by various limitations, including cost and availability (Winke et al., 2013). Technical limitations of eye-tracking devices, such as low spatial or temporal resolution, sensitivity to environmental factors, and potential for measurement errors (Ponce et al., 2018; Lai et al., 2013), can also affect the reliability and validity of the data collected. Furthermore, the high cost of eye-tracking equipment can limit the scale and longitudinal nature of studies, while limited availability in certain regions can affect the generalizability of the findings (Cook et al., 2018). To overcome these challenges, researchers must carefully select and calibrate their eye-tracking devices, ensuring that they are suitable for the research question and that the data collected is accurate and reliable. Alternative approaches, such as using low-cost or portable eye-tracking devices or integrating eye-tracking data with other measures, may also be considered to enhance the validity and reliability of the findings (Liu, 2014).

3.2.3 Difficulty in interpreting data

Eye-tracking research in CALL involves the collection and analysis of data on learners’ eye movements as they interact with language learning materials, such as text, images, or videos. Although this approach can provide valuable insights into how learners process language and the impact of different instructional materials on language learning outcomes, it can also pose challenges in interpreting the data collected (Kang, 2014; Russell, 2005). One such challenge is the complexity of the eye-tracking data and the difficulty in interpreting it without specialized software and statistical analysis techniques (Yaros & Cook, 2011). Additionally, establishing connections between eye-tracking data and learning outcomes can be challenging due to difficulties in defining and measuring these associations (Mayer, 2010). The definition and analysis of regions of interest within the instructional materials can also lead to different results and interpretations. Furthermore, the context of the instructional materials, as well as the learners’ language learning goals, must be considered when interpreting eye-tracking data for CALL. The instructional materials themselves may also be complex or ambiguous, further complicating the interpretation of the eye-tracking data. To address these challenges, researchers must carefully design their studies to ensure that the collected eye-tracking data is relevant and meaningful for addressing the research question. This may involve defining regions of interest based on the learners’ language learning goals, using appropriate statistical techniques to analyze the eye-tracking data, and considering the context of the instructional materials. By doing so, researchers can increase the validity and reliability of their eye-tracking studies in CALL, leading to more accurate and meaningful results.
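The point that region-of-interest definitions can change results is easy to show with a small example. The sketch below sums fixation durations per rectangular area of interest (AOI) and then repeats the calculation with a slightly wider gloss region. The AOI names echo the multimedia supports discussed earlier, but the `AOI` class, the coordinates, and the fixation values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        """True if the gaze point lies inside this rectangle (edges included)."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def dwell_times(fixations, aois):
    """Sum fixation durations (ms) per AOI; unmatched fixations go to 'none'."""
    totals = defaultdict(float)
    for x, y, duration_ms in fixations:
        # The first AOI that contains the point wins; AOIs here do not overlap.
        hit = next((aoi.name for aoi in aois if aoi.contains(x, y)), "none")
        totals[hit] += duration_ms
    return dict(totals)


# Hypothetical fixations as (x, y, duration_ms); the third sits in the gap
# between the text and the gloss.
fixations = [(50, 40, 200), (120, 40, 250), (410, 50, 150),
             (450, 50, 300), (500, 200, 400)]

text = AOI("text", 0, 0, 400, 300)
illustration = AOI("illustration", 420, 120, 600, 300)
strict = dwell_times(fixations, [text, AOI("gloss", 420, 0, 600, 100), illustration])
padded = dwell_times(fixations, [text, AOI("gloss", 405, 0, 600, 100), illustration])

print("strict gloss boundary:", strict)
print("padded gloss boundary:", padded)
```

Moving the gloss boundary by 15 pixels reassigns the borderline fixation and raises gloss dwell time by half, so reporting how AOIs were drawn matters for comparing studies.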

3.2.4 Interference from other factors

Eye-tracking studies for CALL may face limitations due to interference from various external factors, including lighting, head movement, and other stimuli (Tai & Chen, 2021). This interference can complicate the interpretation of the results and pose challenges in identifying the impact of specific factors on language learning outcomes (Koć-Januchta et al., 2020). Learners’ prior language knowledge, motivation, and cognitive abilities can also impact their eye movements and learning outcomes, potentially confounding the study results (Wang et al., 2020). Furthermore, the use of learning materials can result in elevated levels of extraneous cognitive load, particularly when learners need to integrate information from two sources that are either temporally or spatially separated. Split attention is a challenging issue for materials with high element interactivity, such as those with complex content and high intrinsic load. Additionally, individual differences in learners’ preferences for instructional materials or learning styles, as well as interference from the learning environment, can affect the validity of the data collected (Clark & Mayer, 2016). To overcome these limitations, researchers must carefully control for interference from external factors during the design and implementation of their eye-tracking studies. This may involve controlling for learners’ prior knowledge, motivation, and cognitive abilities, as well as carefully selecting and controlling the learning environment. Researchers may also need to consider using statistical techniques to account for individual differences among participants or isolate the effects of specific instructional material features. Despite potential limitations, careful study design and data analysis can help to mitigate the effects of external interference and produce more accurate and meaningful results in eye-tracking research for CALL.

3.2.5 Limited ecological validity

Eye-tracking research for computer-assisted language learning is often conducted in laboratory settings, which can limit its ecological validity by failing to accurately reflect real-world learning environments (Liou, 2012; Qu-Lee & Balcetis, 2022). Ecological validity refers to the extent to which the findings of the research can be generalized to real-world settings and the degree to which the research accurately reflects the natural environment in which language learning occurs. Eye-tracking studies are typically conducted in controlled laboratory settings where stimuli are carefully controlled and manipulated to elicit specific eye movements and responses (Meißner & Oll, 2019). While this method is useful for understanding the underlying mechanisms of eye movements during language learning, it may not accurately reflect the complexity and variability of real-world language learning situations. Therefore, the limited ecological validity of eye-tracking research in CALL can pose a challenge for generalizing the findings to real-world language learning scenarios.

3.2.6 Limited focus

Eye-tracking research in language education often focuses narrowly on specific aspects of language, such as reading comprehension or vocabulary acquisition, and so fails to provide a comprehensive view of language learning (Kang et al., 2022). Eye-tracking analyses typically draw on a limited set of variables, such as fixation duration, saccade amplitude, or scan path, which may not capture the full range of cognitive processes involved in language learning, including attention, memory, and motivation. Furthermore, what the eye-tracking data capture may be shaped by the instructional materials used in the study, potentially limiting the generalizability of the findings (Gass et al., 2019). A related challenge is that eye-tracking data may be interpreted in isolation from other measures, such as behavioral or self-report measures (Jarodzka & Brand-Gruwel, 2017). While eye-tracking data can provide valuable insights into how learners process language, they may not provide a complete picture of language learning outcomes without additional data from other sources. To address these limitations, researchers must carefully select the variables of interest and consider how they relate to other aspects of language learning, such as attention, memory, or motivation. Researchers may also need to combine eye-tracking data with other types of data to build a more complete understanding of language learning outcomes.
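
To make the variables named above concrete, the following sketch derives mean fixation duration, saccade amplitudes, and scan path length from a short list of fixations. It is a simplified illustration: the `Fixation` structure and the sample values are hypothetical, and amplitudes are expressed as pixel distances between consecutive fixations, whereas published studies usually report them in degrees of visual angle.

```python
import math
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal position in pixels
    y: float            # vertical position in pixels
    duration_ms: float  # how long the gaze stayed at this position


def mean_fixation_duration(fixations):
    """Average dwell time per fixation, in milliseconds."""
    return sum(f.duration_ms for f in fixations) / len(fixations)


def saccade_amplitudes(fixations):
    """Straight-line distance (pixels) between each pair of consecutive fixations."""
    return [
        math.hypot(b.x - a.x, b.y - a.y)
        for a, b in zip(fixations, fixations[1:])
    ]


def scanpath_length(fixations):
    """Total distance travelled by the gaze across the whole sequence."""
    return sum(saccade_amplitudes(fixations))


# Hypothetical three-fixation sequence.
fixations = [Fixation(0, 0, 200), Fixation(30, 40, 250), Fixation(30, 100, 150)]
```

All three measures describe where and for how long the eyes rested; none of them directly encodes memory or motivation, which is why the text above recommends pairing them with other data sources.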

3.2.7 Ethical considerations

The use of eye-tracking technology in research on language learning has the potential to raise ethical concerns around issues such as informed consent, privacy, and confidentiality. Eye-tracking data can potentially capture sensitive information, such as personal preferences, attitudes, or biases, which may be unintentionally disclosed during the study (Indrarathne & Kormos, 2017). Additionally, the use of eye-tracking technology can potentially cause discomfort or harm, such as eye strain or fatigue, which may affect participants’ performance or willingness to participate in the study. Another challenge related to ethical considerations is the need to obtain informed consent from participants. Researchers must ensure that participants are fully informed of the purpose and nature of the study, as well as any potential risks or discomfort associated with the use of eye-tracking technology. It is also necessary to ensure that participants have the opportunity to withdraw from the study at any time without penalty (Bacca-Acosta et al., 2022a, b). To address these ethical concerns, researchers must carefully consider the ethical implications of using eye-tracking technology in research and take steps to minimize any potential risks or discomfort to participants. This may involve obtaining informed consent, ensuring participant privacy and confidentiality, and using appropriate data storage and analysis methods. By addressing these ethical considerations, researchers can conduct eye-tracking studies in a responsible and ethical manner that respects the rights and welfare of their participants.

3.3 Adopted research methods

Among the reviewed articles, experimental studies were the most commonly adopted research method in eye-tracking research for computer-assisted language learning, accounting for approximately 38% of the studies. Correlational studies were second (approximately 23%), followed by mixed-methods studies (approximately 19%) and case studies (approximately 15%). Longitudinal studies were the least common, accounting for approximately 5% of the studies reviewed.

3.4 Adopted theoretical frameworks

The present systematic review provides insight into the theoretical frameworks adopted in eye-tracking research for computer-assisted language learning. The review identified five commonly used theoretical frameworks: cognitive load theory, information processing theory, multimodal learning theory, social-cognitive theory, and constructivism. Cognitive load theory, which posits that learning is more effective when cognitive load is managed appropriately, was the most frequently adopted framework, accounting for approximately 28% of the studies reviewed. Information processing theory, which proposes that learners process information in a series of stages, was second at approximately 22%. Multimodal learning theory, which suggests that learning is most effective when multiple modes of representation are used, was third at approximately 20%. Social-cognitive theory, which proposes that learning occurs through interaction with others and the environment, was adopted in approximately 14% of the studies reviewed. Finally, constructivism, which posits that learning is an active process in which learners construct their own understanding of the material, was adopted in approximately 11% of the studies reviewed.

3.5 The status quo and future directions

Overall, the review findings indicate that eye-tracking research in CALL has contributed to a better understanding of how learners engage with language learning materials and activities, as well as how to design effective CALL instruction that supports learners’ cognitive and language development. However, there is still much to be explored in this field, and ongoing research is needed to further investigate the potential of eye-tracking technology for improving language learning outcomes.

Future eye-tracking research on computer-assisted language learning is likely to move in the following directions. First, multimodal integration: research is likely to explore how learners integrate different modalities, such as audio, video, and text, during language learning. By examining how learners allocate attention across modalities, researchers can identify the most effective combinations of materials and activities for language learning. Second, individual differences: research is likely to investigate how individual differences, such as cognitive styles, learning preferences, and motivation, affect learners’ gaze behavior during language learning. By understanding how learners with different characteristics engage with language learning materials and activities, researchers can develop more personalized and effective CALL instruction. Third, naturalistic settings: research is likely to explore how learners engage with language learning materials and activities in more naturalistic settings, such as real-world communication situations or immersive virtual environments. By examining learners’ gaze behavior in these settings, researchers can identify the most effective ways to support language learning in authentic contexts. Fourth, longitudinal studies: researchers are likely to conduct more longitudinal studies to examine how learners’ gaze behavior changes over time as they engage in CALL instruction. By tracking gaze behavior over an extended period, researchers can identify patterns of development in language learning and the factors that contribute to successful language acquisition. Fifth, neurolinguistic perspectives: research is likely to incorporate neurolinguistic perspectives to examine the neural basis of language learning and the relationship between gaze behavior and cognitive processes. By combining eye-tracking technology with neuroimaging techniques, researchers can investigate the neural mechanisms underlying language processing and acquisition.

In summary, future eye-tracking research in CALL is likely to integrate multiple approaches and perspectives to provide a more comprehensive understanding of the cognitive and neural mechanisms underlying language learning and the factors that contribute to successful language acquisition.

4 Discussion

This systematic review highlights the potential of eye-tracking technology to improve language learning outcomes by offering insights into the cognitive processes essential for language acquisition and the optimization of instructional strategies. Eye-tracking technology’s capacity to identify which aspects of language learning materials engage learners’ attention facilitates the creation of more effective materials, closely aligned with learners’ cognitive processing needs (Bahari, 2023; Ozcelik et al., 2010). Moreover, by tracking learners’ gaze patterns, it lays the groundwork for personalized instruction tailored to individual learning needs. This technology also serves as a tool for assessing learning outcomes, enabling an evaluation of how interactions with language learning materials evolve and thereby aiding the refinement of instructional strategies (Bahari, 2021; Salmerón et al., 2017). Additionally, eye-tracking technology allows for a detailed examination of how individual differences, such as age, proficiency level, and learning style, affect language learning processes. However, fully leveraging eye-tracking technology for language learning research and computer-assisted language learning design is fraught with challenges. The cost and complexity of eye-tracking equipment present significant obstacles to its widespread adoption in language learning research and CALL design contexts. Moreover, the potential for data collection and analysis errors demands a high level of technical expertise, which researchers and developers unfamiliar with the technology may lack (Schugar et al., 2013; Takacs et al., 2015). Ethical considerations, particularly regarding participant privacy and confidentiality, require meticulous attention. A notable limitation is the focus of eye-tracking research on visual aspects of language learning, potentially neglecting other crucial cognitive processes involved in language acquisition.
This limitation can be addressed by integrating complementary measures such as EEG or fMRI, offering a more comprehensive understanding of cognitive processes at play (Liberman & Dubovi, 2023). Eye-tracking technology has proven to be an invaluable tool in elucidating cognitive processing intricacies during language learning activities. Its application transcends mere observation, providing quantifiable data on learners’ attentional focus, engagement levels, and interaction patterns with language learning materials. For instance, Bahari (2023) and Ozcelik et al. (2010) have shown how eye-tracking can identify areas within educational content that significantly capture learner attention. This capability extends beyond merely identifying viewed parts of a text or multimedia resource; it involves a sophisticated analysis correlating viewing patterns with comprehension, retention, and application of language concepts. The significance of this technology lies in its ability to transform complex visual engagement data into actionable insights that directly inform instructional strategy design and optimization. By understanding where learners’ attention is most actively engaged, educators and material designers can tailor content to align more closely with learners’ cognitive processes. This alignment is crucial for creating learning environments that capture and sustain learner interest, facilitating deeper cognitive processing of language materials. The implications of these findings are profound, addressing the first research question by illustrating eye-tracking’s role in identifying effective instructional strategies and supporting hypotheses related to instructional strategy improvement and learner engagement. These insights can be leveraged to create more engaging and cognitively stimulating language learning experiences. 
Navigating the complexities inherent in utilizing eye-tracking technology for language learning research and CALL application development presents a multifaceted challenge. The financial and technical demands associated with eye-tracking apparatus—highlighted by Liberman and Dubovi (2023)—constitute a formidable barrier, affecting the technology’s accessibility and integration into mainstream research and educational practice. This concern directly relates to the second research question about the feasibility of widespread application of eye-tracking in language learning contexts. Moreover, the ethical landscape surrounding eye-tracking technology use in educational settings necessitates rigorous scrutiny. Eye movement data collection and analysis pose significant ethical questions, especially regarding participant consent, privacy, and sensitive data handling. These considerations necessitate adopting stringent ethical guidelines and protocols to protect participant rights and ensure research practice integrity. Additionally, the potential for inaccuracies in data collection and analysis underscores the need for methodological rigor. Eye-tracking data precision depends on equipment calibration, experimental condition control, and sophisticated data analysis techniques. Researchers must employ robust methodological frameworks to mitigate these challenges and ensure their findings’ reliability and validity. In essence, while eye-tracking technology’s affordances offer promising avenues for enhancing language learning through nuanced cognitive process insights, successfully harnessing these benefits depends on overcoming significant financial, technical, ethical, and methodological hurdles. Addressing these challenges is imperative for advancing language learning research and CALL design fields, ensuring that eye-tracking technology can be effectively leveraged to foster innovative educational practices and outcomes.

5 Implications

5.1 Theoretical implications

The findings of this study carry significant implications for the design and implementation of computer-assisted language learning programs, informed by a comprehensive overview of the benefits and challenges of eye-tracking technology for language learning. These implications are deeply rooted in cognitive load theory (CLT) and multimedia learning theory (MLT), offering a theoretical framework that enriches our understanding of how eye-tracking technology can enhance language learning.

Firstly, understanding learners’ cognitive processes through eye-tracking sheds light on the intricate dynamics of information processing and attention allocation during language learning. This insight aligns with CLT, which emphasizes the importance of managing cognitive load to optimize learning (Paas et al., 2003). By identifying how different elements of language learning materials contribute to cognitive load, CALL researchers and designers can develop materials that facilitate more efficient learning processes.

Secondly, the efficacy of eye-tracking research in investigating the effects of multimedia elements on language learning is underscored by MLT. According to MLT, learners process visual and auditory information through separate channels, each with a limited capacity for processing (Lawson & Mayer, 2022). Eye-tracking technology’s ability to pinpoint which multimedia elements are most engaging and how they impact learning outcomes enables the creation of materials that effectively leverage both visual and auditory channels, thereby enhancing learning efficiency.

These theoretical frameworks, CLT and MLT, not only inform the study’s objectives but also provide a robust basis for interpreting the findings. They highlight the potential of eye-tracking technology to reveal critical insights into learners’ cognitive processes and the design of multimedia language learning materials. By leveraging these insights, CALL programs can be developed to better meet the needs of learners, making language learning more effective and engaging.

5.2 Pedagogical implications

The pedagogical implications of eye-tracking research in computer-assisted language learning (CALL) are both significant and varied, offering a range of applications that can profoundly impact educational practices. This section elaborates on specific, actionable ways these insights can be integrated into educational settings.

  • Personalized Learning: Eye-tracking research equips educators with a nuanced understanding of learners’ attention and perception patterns. For instance, if eye-tracking reveals that a learner consistently focuses on certain parts of a language task while neglecting others, educators can create customized content that addresses these overlooked areas, thereby fostering a more balanced learning experience. This personalization extends to adjusting the pace and complexity of lessons to match individual learning curves.

  • Adaptive Learning Systems: Utilizing eye-tracking data, adaptive learning systems can be developed to dynamically adjust instructional content based on real-time analysis of a learner’s gaze behavior. For example, if a learner is found to spend excessive time on basic vocabulary exercises, the system could automatically introduce more challenging tasks to maintain engagement and ensure progression.

  • Enhanced Feedback and Assessment: Real-time feedback based on eye-tracking data can significantly enhance language learning efficiency. For example, if eye-tracking identifies frequent hesitations or regressions in reading comprehension tasks, immediate feedback can be provided to guide the learner through difficult passages, highlighting keywords or offering explanations for complex phrases.

  • Instructional Material Design: Insights from eye-tracking studies can inform the creation of more effective instructional materials. For example, if data shows that learners tend to skip over certain types of exercises or fail to engage with specific content sections, these materials can be redesigned for better engagement and information retention.

  • Assessment Tools: Eye-tracking technology offers a sophisticated means of assessing language proficiency. By analyzing how learners interact with materials pre- and post-instruction, shifts in attention and cognitive processing can be detected, indicating areas of improvement or further need. For instance, a decrease in time spent on reading comprehension questions post-instruction could indicate an increase in proficiency.

  • Teacher Training: Eye-tracking research can also enrich teacher training programs by providing insights into learners’ cognitive processes. Teachers can use these insights to develop strategies that better align with how students process language information. For example, understanding that students with lower proficiency levels may require more visual cues in learning materials can lead to the development of more effective teaching aids.
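
The feedback scenario described under Enhanced Feedback and Assessment can be sketched in a few lines. The example below counts regressions (backward moves to an earlier word) in a sequence of fixated word positions and flags a passage when the share of regressions exceeds a threshold. It is an illustrative sketch only: the function names, the sample sequences, and the 0.3 threshold are assumptions and are not drawn from the reviewed studies or from any particular adaptive learning system.

```python
def count_regressions(word_indices):
    """Count backward moves in a sequence of fixated word positions."""
    return sum(
        1
        for prev, curr in zip(word_indices, word_indices[1:])
        if curr < prev
    )


def needs_support(word_indices, max_regression_rate=0.3):
    """Return True when the share of backward moves exceeds the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    transitions = len(word_indices) - 1
    if transitions <= 0:
        return False
    return count_regressions(word_indices) / transitions > max_regression_rate


# Hypothetical reading traces: indices of the word fixated at each step.
smooth = [0, 1, 2, 3, 4, 5]
effortful = [0, 1, 2, 1, 2, 3, 1, 2, 0, 1]
```

Here `needs_support(effortful)` is true because three of the nine transitions move backward, whereas the smooth trace contains none. A real system would need to calibrate the threshold against comprehension data, since rereading can also reflect careful rather than struggling reading.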

5.3 Technical details and limitations of eye-tracking technology

Eye-tracking technology stands as a pivotal element in research, utilizing sophisticated devices to monitor and record the duration and location of a person’s gaze within a visual environment. These devices span from high-end stationary systems, noted for their exceptional spatial resolution and accuracy, to more accessible mobile eye trackers designed for natural movement and interaction. The foundation of this technology lies in corneal reflection tracking and pupil center tracking to ascertain gaze direction.

Technical Specifications: The review highlights the employment of various eye-tracking systems, notably the Tobii Pro TX300. This system is distinguished by its high sampling rate of up to 300 Hz, essential for capturing the rapid eye movements known as saccades. With a spatial accuracy within 0.5 degrees of visual angle, the Tobii Pro TX300 enables detailed analysis of gaze patterns and fixations. Its robust design and user-friendly interface render it adaptable for a broad spectrum of research environments, from controlled laboratory settings to dynamic, real-world scenarios.
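
The two figures quoted above can be translated into more intuitive quantities. The sketch below converts a sampling rate into the time between samples and converts a visual angle into its on-screen extent using the standard relation size = 2 × distance × tan(angle / 2). The 65 cm viewing distance is an assumed value chosen for illustration and is not taken from the review.

```python
import math


def sample_interval_ms(sampling_rate_hz):
    """Time between consecutive gaze samples, in milliseconds."""
    return 1000.0 / sampling_rate_hz


def visual_angle_to_cm(angle_deg, viewing_distance_cm):
    """On-screen extent subtended by a visual angle at a given viewing distance."""
    return 2.0 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2.0)


interval = sample_interval_ms(300)        # roughly 3.33 ms between samples
error_cm = visual_angle_to_cm(0.5, 65.0)  # 65 cm viewing distance is an assumption
```

Under these assumptions, an accuracy of 0.5 degrees corresponds to roughly 0.57 cm on the screen, which matters when areas of interest are as small as individual words. At 300 Hz, an eye movement lasting 30 ms would span about nine samples.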

Limitations: Eye-tracking technology, despite offering significant insights, faces inherent limitations. A primary concern involves “data loss” or “tracking loss,” particularly with participants who use glasses or contact lenses, or during rapid movements. The initial calibration process, requiring high precision, can be time-consuming. Additionally, the cost and complexity of advanced eye-tracking systems present considerable obstacles to their broad application.

Mitigation Strategies: In response to these limitations, the review identifies various strategies adopted by researchers. To counteract data loss, an increased sample size is often utilized to offset potential data unavailability. Software algorithm advancements have enhanced tracking success rates for participants with corrective lenses. Moreover, calibration procedures have evolved to be more user-friendly, minimizing setup time and enhancing participant comfort. Facing financial and logistical challenges, some researchers have opted for more economical mobile eye-tracking solutions, albeit with compromises in data precision and reliability.
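
The sample-size strategy mentioned above amounts to simple arithmetic: if a share of recordings is expected to be unusable, the recruitment target is the desired number of usable participants divided by the expected retention rate, rounded up. The sketch below shows this calculation; the function name and the example loss rates are hypothetical and are not reported in the review.

```python
import math


def participants_to_recruit(target_usable, expected_loss_rate):
    """Number of participants to recruit so that, after the expected share
    of unusable recordings, about `target_usable` datasets remain."""
    if not 0 <= expected_loss_rate < 1:
        raise ValueError("expected_loss_rate must be in the range [0, 1)")
    return math.ceil(target_usable / (1.0 - expected_loss_rate))


# Hypothetical plan: 40 usable datasets with 15% expected tracking loss.
recruit = participants_to_recruit(40, 0.15)
```

With these illustrative numbers, 48 participants would need to be recruited. The expected loss rate itself is best estimated from pilot data or from comparable studies using the same equipment and population.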

Ethical Considerations: The ethical implications of using eye-tracking technology warrant attention. Obtaining informed consent is critical, particularly concerning the collection and storage of potentially sensitive data. It is imperative that participants are thoroughly briefed on the study’s nature and the utilization of their data, ensuring transparency and informed participation.

6 Conclusion

In light of the insights garnered from this systematic review, it is evident that eye-tracking technology harbors significant promise for augmenting the effectiveness and efficiency of computer-assisted language learning. This technology offers unparalleled insights into the cognitive processes that underpin language learning, thereby informing the development of CALL materials and instructional strategies that are both more effective and tailored to individual learner needs. The integration of eye-tracking technology into both CALL research and practice facilitates a deeper understanding of how learners interact with educational materials and the cognitive mechanisms that drive language acquisition. This, in turn, enables the creation of personalized and engaging learning experiences that align closely with learners’ cognitive processes.

Future research in this domain should aim to broaden the scope of inquiry to encompass a diverse array of learner populations, including those with learning disabilities and non-native speakers across various proficiency levels. Such research could illuminate how different groups uniquely engage with CALL materials, leading to the development of more inclusive and effective educational designs. Additionally, longitudinal studies are critical for evaluating the long-term impacts of eye-tracking-enhanced CALL interventions, offering insights into the sustainability of learning improvements over time. Exploring the integration of eye-tracking technology with emerging tools such as virtual reality (VR) and augmented reality (AR) represents another promising research direction. This approach has the potential to provide immersive, context-rich learning experiences that closely mimic real-life language use. Furthermore, conducting cross-cultural studies to examine how cultural differences affect learners’ visual attention patterns and cognitive strategies in language learning could lead to the creation of more customized and effective CALL materials.

By charting these future research directions, this review not only highlights the transformative potential of eye-tracking technology in the field of CALL but also outlines a clear path forward for research aimed at fully leveraging this technology. The ultimate goal is to ensure that language learning interventions are deeply informed by a comprehensive understanding of the cognitive processes involved in language acquisition, thereby enhancing the quality and impact of language education worldwide.

Notes:

Note 1: Challenges

This term refers to the problems and limitations reported when eye-tracking is used in language education and research. These challenges need to be addressed in future studies to ensure the validity and reliability of research findings. Reported challenges include issues with data collection, data analysis, and the interpretation of results, as well as the difficulty of ensuring participant compliance and the high cost of equipment.

Note 2: Affordances

This term refers to the facilitative effects of using eye-tracking in language education and research. Teachers and scholars can draw on these affordances to enhance language education and research outcomes. Reported affordances include the ability to capture and analyze real-time data on language processing, measure the effectiveness of language teaching methods, and identify areas for improvement in language instruction. Eye-tracking also provides opportunities for developing innovative teaching and learning strategies that can enhance language education and research.