1 Introduction

Augmented Reality (AR) has emerged as a transformative tool in educational technology, offering immersive and interactive learning experiences. Despite its growing adoption, understanding the cognitive impact of AR on students remains a critical area of exploration. This study focuses on measuring and analyzing the cognitive load in biomechanics AR lectures.

Prior studies on AR-based learning have mostly focused on its effectiveness, learning outcomes, and user experience, leaving the cognitive load it imposes on learners poorly understood. This study seeks to fill this gap by analyzing pupil dilation and comparing mental workload in AR settings. To assess mental demand, we used the NASA Task Load Index (NASA-TLX).

We propose that there may be a relationship between changes in pupil size and perceived mental effort, as determined through a combination of objective physiological data and subjective cognitive evaluations. The results of our research aim to provide a comprehensive understanding of the cognitive demands placed on individuals participating in augmented reality (AR) learning settings.

The research problem is twofold: firstly, to measure and quantify the cognitive load experienced by students using eye-tracking technology within AR educational settings, and secondly, to investigate the correlation between pupil area and cognitive load, assessing mental exertion in these settings. By achieving these objectives, this study seeks to provide actionable insights for educators and developers in AR-based learning, enhancing both the effectiveness and efficiency of the learning process. The significance of this research project lies in its potential to bring about transformative advancements in the field of educational technology, particularly in the context of AR-based learning.

This research emphasizes the key elements of assessing and analyzing cognitive load among students in AR educational environments. Developing a technique for gauging mental demand in AR-driven learning presents a new strategy for understanding and enhancing student education. It will help educators customize their teaching methods to boost engagement and productivity, ultimately enhancing the effectiveness of the learning experience. Through an assessment of the mental demand on student learning, we will identify factors that either hinder or facilitate learning outcomes and task execution efficiency in AR settings. Such insights empower researchers to develop interventions that significantly improve learner achievement and overall performance.

To measure cognitive workload in the AR learning environment, we implemented pupil dilation analysis. This physiological response reflects different cognitive states, such as mental workload and demand. To compare participants’ subjective mental demand with the pupil dilation data, we gathered information using the NASA-TLX, a tool for evaluating perceived workload. Research has shown that pupil size tends to increase with the level of cognitive effort [1, 2]. When a task imposes greater mental demands, such as increased attention, memory, or problem-solving, the pupils may dilate in response to the increased demand for processing resources [3]. Because of this relationship, pupillometry can be used as an objective measure to gauge mental workload. In studies where NASA-TLX is used, pupil dilation measurements can serve as a corroborating physiological marker to support subjective mental demand ratings [4]. While NASA-TLX relies on subjective self-report measures of workload across six dimensions (including mental demand), pupillometry can provide complementary objective data. The combination of subjective ratings with physiological data can enhance the understanding of the actual workload experienced by individuals [5]. Objective measures like pupil dilation can help calibrate the subjective ratings given in the NASA-TLX.

In this study, we hypothesize that pupil area, as measured by eye tracking, will differ significantly between learning and problem-solving lectures in AR environments. Specifically, we expect to observe larger pupil dilation during problem-solving tasks than during learning tasks, indicative of increased cognitive workload. Our hypothesis is based on the premise that problem-solving activities require more complex cognitive processing, which will also result in higher cognitive workload ratings on the NASA-TLX questionnaire.

2 Literature Review

This literature review examines key research findings on the measurement of cognitive workload in various applications, with particular attention to eye tracking, pupil area, and their impact on student performance. Cognitive load theory, as initially proposed by Sweller et al. [6], underscores the importance of managing cognitive demands to optimize the learning process. In educational technology, cognitive load is a central consideration, as digital tools and AR applications can either facilitate or impede learning, depending on the cognitive resources they consume. Eye tracking technology has gained prominence within cognitive load research, offering insights into the precise areas to which learners direct their visual attention during tasks [7,8,9]. Previous studies showed that eye tracking is a valuable tool for capturing cognitive load by tracking gaze patterns and fixation durations [10,11,12,13,14]. In the context of AR-based learning, eye tracking can show how cognitive load fluctuates in response to changing visual stimuli and interactive elements. Pupil dilation could serve as a physiological marker closely linked to cognitive load [15]. Ahern and Beatty [16] demonstrated that cognitive tasks demanding heightened mental effort correspond to increased pupil dilation. This implies that pupil area can function as a real-time indicator of cognitive engagement during AR learning experiences. Numerous studies have examined the relationship between cognitive load and student performance. A meta-analysis conducted by Sweller [17] found that elevated cognitive load can impede the achievement of learning outcomes, ultimately resulting in reduced performance. This highlights the importance of optimizing cognitive load within AR-based educational settings to improve student achievement.

AR's potential in education is vast, providing opportunities for interactive 3D visualizations and simulations. AR has the potential to enrich spatial comprehension, critical thinking, and problem-solving skills [18]. While AR has been increasingly integrated into educational settings, offering promising avenues for enhanced learning experiences, a specific aspect of its impact remains underexplored: the analysis of pupil size as an indicator of cognitive load in AR learning environments. Previous studies have examined the general effects of AR on learning outcomes and student engagement. For example, research on the application of AR in educational settings has examined its influence on student motivation and performance, as seen in Braarud [19]. Similarly, the study “AR Learning Environment Integrated with EIA Inquiry Model: Enhancing Scientific Literacy and Reducing Cognitive Load of Students” has underscored the potential of AR in improving scientific literacy and reducing cognitive load, highlighting the effectiveness of the EIA (Experience–Inquiry–Application) model in this domain.

However, these studies have not specifically focused on using pupil size as a metric for cognitive load in AR learning environments. This gap presents a unique opportunity for our research. Our study aims to fill this lacuna by leveraging pupillometry – the study of pupil size variation – as a novel approach to gauge cognitive workload in AR-based educational settings. By focusing on the correlation between pupil size and cognitive load, our research endeavors to provide new insights into the physiological responses of learners engaged in AR experiences. This approach is pioneering in its attempt to objectively measure the cognitive impact of AR on learners, a dimension that has been relatively overlooked in existing literature. By doing so, our study not only contributes to the broader understanding of AR's educational implications but also opens new pathways for assessing and optimizing cognitive engagement in digital learning environments.

Mental Demand refers to the amount of mental and perceptual activity required by a task. This can include thinking, decision making, calculating, remembering, looking, searching, and any other mental activities. NASA-TLX has been used for evaluating the mental exertion and cognitive involvement needed to execute a task, as perceived by the individuals themselves [20,21,22]. The rating is typically on a scale from low to high. For instance, a task might be considered to have low mental demand if it is simple, straightforward, and requires minimal thought or concentration [23]. Conversely, a task with high mental demand might be complex, challenging, involve intricate decision-making, or require sustained attention and concentration. Understanding the mental demand of a task is crucial for evaluating the potential for cognitive overload, which can occur when the demands of a task exceed an individual's cognitive capacity. It is also important for the design of systems and tasks, especially in ensuring that they are within the capabilities of the user, thereby increasing safety and efficiency. NASA-TLX is used not only in research but also in the design and evaluation of products, in the workplace, and in the assessment of training programs. In a typical NASA-TLX assessment, after completing a task, a participant is asked to reflect on the mental demand it required and to provide a rating. This score is then combined with the ratings from the other five dimensions (Physical Demand, Temporal Demand, Performance, Effort, and Frustration) to calculate an overall workload score [19]. The outcomes of the mental demand assessment can inform changes to task design, indicate the need for additional training or resources, or suggest modifications to improve user interaction and reduce the potential for errors. By assessing mental demand and the other subscales, the NASA-TLX provides a comprehensive view of workload that can inform improvements in system design.
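As an illustration of how the six subscale ratings can be combined into an overall workload score, the sketch below implements two commonly used NASA-TLX variants: the unweighted ("raw") mean and the weighted mean based on 15 pairwise comparisons. The text does not state which variant was applied in this study, so the function name `tlx_overall`, the dictionary keys, and all rating and weight values are illustrative assumptions rather than study data.

```python
# Sketch of NASA-TLX overall workload scoring (assumed 0-100 subscale ratings).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def tlx_overall(ratings, weights=None):
    """Return the overall workload score.

    ratings: dict mapping each subscale to a 0-100 rating.
    weights: optional dict of pairwise-comparison tallies (must sum to 15).
             If omitted, the unweighted ("raw TLX") mean is returned.
    """
    if weights is None:
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
    total = sum(weights[s] for s in SUBSCALES)
    if total != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total


# Hypothetical ratings and weights for one participant (not study data).
ratings = {"mental": 70, "physical": 10, "temporal": 40,
           "performance": 30, "effort": 60, "frustration": 30}
weights = {"mental": 5, "physical": 0, "temporal": 2,
           "performance": 3, "effort": 4, "frustration": 1}

raw_score = tlx_overall(ratings)
weighted_score = tlx_overall(ratings, weights)
print(raw_score, round(weighted_score, 2))
```

Because this study analyzes the Mental Demand subscale on its own, the overall score is shown only to clarify how the subscales relate to one another.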

3 Methodology

3.1 Experimental Design

Twelve participants (average age = 20.6) from the University of Missouri were recruited. They completed a questionnaire covering their age, gender, academic status, and prior experience with AR. The flowchart in Fig. 1 outlines the procedural steps of the study. Participants began by giving informed consent and providing demographic information via the questionnaire. They then proceeded to the setup and calibration of the eye-tracking equipment and the Microsoft HoloLens 2 device. The experiment was conducted in two parts, with a mandatory gap of at least four hours between them to prevent data interference. Upon completion of each part, participants filled out the NASA-TLX form to assess their mental workload. Data analysis began only after both parts were completed; its details are explained in the next section. We then conducted a statistical analysis of the experimental data and NASA-TLX forms, aiming to establish a relationship between the measured pupil dilation and the subjective workload reported by the participants. This structured approach ensures a systematic collection and analysis of data pertinent to understanding cognitive load in AR learning environments.

Fig. 1. Schematic flowchart diagram of the experimental setup.

Fig. 2. Dikablis eye tracker and HoloLens, and eye tracker placement.

After the experiment was explained, participants were equipped with the Microsoft HoloLens 2 headset, followed by the placement of the Dikablis eye tracker over their eyes (see Fig. 2), and a powering device was slung across their body (see Fig. 3). Once the eye tracker and HoloLens 2 were properly placed on the participant, both devices were calibrated to ensure accurate eye data collection.

Fig. 3. Powering unit for the eye tracker hung across the body.

Two experiments (lecture 1 and lecture 2) were conducted with a minimum gap of 4 h and a maximum gap of 48 h between them. Lecture 1 is a basic biomechanics and ergonomics AR learning session. Lecture 2 is more challenging, as participants must use the knowledge from the first session to solve problems in biomechanics and ergonomics [24]. In each learning session, participants were asked to complete multiple modules (7 in lecture 1 and 8 in lecture 2).

Fig. 4. Experimental setup describing the layout.

We set up a table with an indoor location sensor to trace the participant’s location during AR learning [25]. This table also functioned as a navigational tool for transitioning between different AR scenes (see Figs. 4 and 5). We used the Q-Track NFER system for accurate indoor positioning. The NFER system gathers information about the participants’ movements, allowing us to examine their interaction with the AR material and their movement within the educational area.

Fig. 5. AR environment setup showing the instructor dictating a biomechanics module.

Our custom-built client program was configured to promptly receive positional data via the locator receiver as soon as participants moved the table to a marked location (see Fig. 6). Upon identifying the specific area, the program initiated the Windows Device Portal to execute the corresponding AR application and project the scene onto the HoloLens device.
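The location-triggered scene selection described above can be pictured with the following minimal sketch. It is not the authors' client program: the zone coordinates, the `Zone` class, the function `app_for_position`, and the application names are all hypothetical, and the actual launch through the Windows Device Portal is deliberately left out.

```python
# Hypothetical sketch: map a reported table position to the AR scene to launch.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Zone:
    """A rectangular marked location on the floor (coordinates are assumed, in meters)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    app: str  # hypothetical name of the AR application tied to this location

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Illustrative layout with two marked locations (not the study's real layout).
ZONES = (
    Zone(0.0, 1.0, 0.0, 1.0, app="BiomechanicsScene1"),
    Zone(2.0, 3.0, 0.0, 1.0, app="BiomechanicsScene2"),
)


def app_for_position(x: float, y: float, zones: Sequence[Zone] = ZONES) -> Optional[str]:
    """Return the application for the zone containing (x, y), or None between zones."""
    for zone in zones:
        if zone.contains(x, y):
            return zone.app
    return None


print(app_for_position(0.5, 0.5))  # inside the first zone
print(app_for_position(1.5, 0.5))  # between zones
```

In a full client, a non-`None` result would be the trigger for requesting that the HoloLens start the corresponding application.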

Fig. 6. Participant with the laptop along with the location tracking equipped table.

Following each AR learning scene, participants were required to answer a quiz question related to the material they had just studied and to rate their confidence in their answer. They then viewed a feedback screen. Once they had reviewed this screen, they could proceed to the next location to engage with the following AR scene. While participants engaged with the AR scene and answered the quiz, their pupils were tracked and monitored in real time (see Fig. 7). After completion of the lecture, the eye tracker data were saved from the DLAB eye tracking software as a CSV file.

Fig. 7. Dikablis eye tracking software interface showing the eye pupil.

3.2 Data Analysis for Pupil Eye Tracking

Once the participant data was gathered, multiple steps were undertaken to cleanse the data for statistical analysis, aiming to uncover its relationship with the NASA TLX Mental Demand parameter. Utilizing the Dikablis scene view camera footage, the eye tracking dataset was segmented into learning and solving phases for each AR scene (7 modules for lecture 1 and 8 modules for lecture 2).
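The segmentation step can be sketched as follows, assuming that each pupil sample carries a timestamp and that the phase boundaries for every AR scene have already been annotated from the scene camera footage. The tuple layouts, the function name `segment_samples`, and the sample values are assumptions made for illustration; the actual CSV export may be organized differently.

```python
# Sketch: group timestamped pupil samples by (scene, phase) using annotated intervals.
from collections import defaultdict


def segment_samples(samples, intervals):
    """Assign each sample to the first interval that contains its timestamp.

    samples:   iterable of (timestamp_s, pupil_area)
    intervals: iterable of (scene, phase, start_s, end_s); the end is exclusive
    Returns a dict mapping (scene, phase) to the list of pupil areas in that span.
    Samples that fall outside every interval are ignored.
    """
    segments = defaultdict(list)
    for timestamp, area in samples:
        for scene, phase, start, end in intervals:
            if start <= timestamp < end:
                segments[(scene, phase)].append(area)
                break
    return dict(segments)


# Illustrative values only (not study data).
samples = [(0.0, 1000.0), (1.0, 1100.0), (2.0, 1300.0), (3.0, 1250.0), (4.0, 1400.0)]
intervals = [(1, "learning", 0.0, 2.0), (1, "solving", 2.0, 4.0)]

segments = segment_samples(samples, intervals)
print(segments)
```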

Given the variation in pupil size among individuals, which can range from 800 to 2500 square millimeters, we collected data on the initial size of each participant's pupils before exposing them to the visual stimuli created for this experiment. This baseline measurement acts as a reference for tracking changes in pupil size in response to the AR learning experience.

After that, we applied normalization to the pupil area data using Eq. 1 [26], where Pnorm represents the normalized pupil area, Pi denotes each data point of the pupil area (i = 1, …, n), min(P) is the smallest pupil area observed in the participant's entire set of data points, and max(P) signifies the largest pupil area from that same set of data.

$$P_{norm}=\frac{P_{i}-\min(P)}{\max(P)-\min(P)}$$
(1)
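A minimal sketch of Eq. 1 applied to one participant's samples is given below. The function name `normalize_pupil_area` and the sample values are illustrative assumptions; the handling of empty or constant input is an added safeguard that the text does not describe.

```python
# Sketch of Eq. 1: per-participant min-max normalization of pupil area.
def normalize_pupil_area(pupil_area):
    """Scale one participant's pupil-area samples to the range [0, 1]."""
    values = [float(v) for v in pupil_area]
    if not values:
        raise ValueError("no pupil-area samples")
    p_min, p_max = min(values), max(values)
    if p_max == p_min:
        # Eq. 1 would divide by zero if every sample were identical.
        raise ValueError("pupil area is constant; Eq. 1 is undefined")
    span = p_max - p_min
    return [(v - p_min) / span for v in values]


# Illustrative values only (not study data).
samples = [800.0, 1225.0, 1650.0, 2500.0]
normalized = normalize_pupil_area(samples)
print(normalized)
```

Normalizing within each participant puts every participant on the same 0 to 1 scale, so that differences between phases can be compared across individuals with different absolute pupil sizes.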

4 Results

4.1 Pupil Dilation Analysis

Upon examining the changes in pupil dilation among participants, we identified a significant pattern between mental demand and the variation in pupil dilation from the baseline phase to the problem-solving phase. Table 1 presents the variations in pupil size from the baseline phase (B) to the problem-solving phase (S), labeled ‘B-S-1’ to ‘B-S-7’. For instance, ‘B-S-1’ signifies the difference in pupil area between phases B and S for AR scene 1. These values represent the normalized difference in pupil size when participants were engaged in a specific AR scene, compared to their initial pupil size. These data suggest a quantifiable link between physiological responses and cognitive load. The last column, titled ‘Mental Demand’, shows the subjective rating of mental effort reported by participants, with values ranging from 20 to 80. Higher values indicate more demanding tasks. Pupil dilation varied across tasks, from as low as 0.0010 to as high as 0.2846. By comparing the pupil dilation data with the self-reported mental demand, we were able to assess the validity of using pupillometric data as an objective metric for cognitive workload in AR learning environments.

Table 1. Difference between the absolute values of the normalized pupil data in the baseline phase and the problem-solving phase.
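One way to obtain a B-S value of the kind reported in Table 1 is sketched below. It assumes that each phase is summarized by the mean of its normalized pupil-area samples and that the difference is reported as a non-negative value; the text does not spell out the summary statistic, so both choices, the function name `phase_difference`, and the sample values are assumptions.

```python
# Sketch: baseline-to-solving difference (B-S) for one participant and one AR scene.
from statistics import fmean


def phase_difference(norm_baseline, norm_solving):
    """Absolute difference between mean normalized pupil area in the two phases."""
    return abs(fmean(norm_solving) - fmean(norm_baseline))


# Illustrative normalized samples (not study data).
baseline = [0.20, 0.22, 0.18]
solving = [0.30, 0.34, 0.32]

b_s_1 = phase_difference(baseline, solving)
print(round(b_s_1, 4))
```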

4.2 Relation Between Pupil Dilation and Mental Demand

According to the results (see Table 2), we found a significant relationship between pupil dilation and mental demand in AR scenes 1, 2, 3, 5, and 6 of lecture 1.

Table 2. Regression coefficients relating problem-solving pupil dilation to Mental Demand in lecture 1.

Figure 8 displays a linear trend illustrating the association between the mental workload predicted from pupil size and the actual mental workload reported by participants. The ascending trend line indicates that higher predicted levels of mental workload correspond to higher actual levels. The pink band surrounding the trend line signifies the confidence interval, which provides an estimate of where the actual trend line might fall with a certain level of confidence.

For lecture 1, the RMSE is 7.3448 (see Fig. 8). RMSE indexes the average discrepancy between the model’s predictions and the observed values, so a smaller value indicates a more accurate model. An R-squared value of 0.90 signifies a strong fit, with the model accounting for 90% of the variance in actual mental workload. A P-value of 0.0197 indicates a statistically significant relationship between the predicted and actual mental workload, as it falls below the conventional threshold of 0.05.
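The way R-squared and RMSE follow from a fitted linear model can be sketched as below. The example fits an ordinary least-squares model with an intercept using NumPy on synthetic data: 12 simulated participants and 5 simulated B-S predictors. None of the numbers are study data, the function name `fit_linear_model` is hypothetical, and RMSE is computed here as the square root of the mean squared residual, which is an assumption, since some statistical packages divide by the residual degrees of freedom instead.

```python
# Sketch: ordinary least squares with R-squared and RMSE on synthetic data.
import numpy as np


def fit_linear_model(X, y):
    """Fit y = b0 + X @ b by least squares; return (coef, predicted, r_squared, rmse)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    residuals = y - predicted
    sse = float(residuals @ residuals)              # unexplained variation
    sst = float(((y - y.mean()) ** 2).sum())        # total variation
    r_squared = 1.0 - sse / sst
    rmse = float(np.sqrt(sse / len(y)))             # assumption: divide by n
    return coef, predicted, r_squared, rmse


# Synthetic stand-in for 12 participants and 5 B-S predictors (not study data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.3, size=(12, 5))
true_coef = np.array([120.0, 40.0, 90.0, -60.0, 80.0])
y = 30.0 + X @ true_coef + rng.normal(0.0, 3.0, size=12)

coef, predicted, r_squared, rmse = fit_linear_model(X, y)
print(f"R^2 = {r_squared:.2f}, RMSE = {rmse:.2f}")
```

Plotting `predicted` against `y` gives the kind of actual-versus-predicted view described for Fig. 8. The overall P-value would come from an F-test on the same sums of squares, which this sketch omits.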

Fig. 8. Relation between the predicted mental load and pupil area for lecture 1.

Fig. 9. Relation between the predicted mental load and pupil area for lecture 2.

However, there was no significant relation between pupil dilation and mental demand in lecture 2 (see Fig. 9). The P-value is 0.3385, which is above the conventional threshold of 0.05, so the relationship observed between predicted and actual mental demand is not statistically significant.

Mental demand differed notably between lecture 1 and lecture 2, as illustrated in Fig. 10. The mental demand of lecture 2 is significantly higher than that of lecture 1. This could imply that the pattern of pupil dilation becomes more unpredictable with increased mental demand.

Fig. 10. Mental Demand comparison between lecture 1 and lecture 2.

5 Discussion

This study aimed to find a new way to measure cognitive workload using pupil dilation in an AR learning environment. Our findings revealed a significant relationship between predicted and actual mental demand, as indicated by the regression results. The regression model (see Table 2), with an R-squared value of 0.90, has strong explanatory power, with the predicted mental demand accounting for a substantial portion of the variance in the actual mental demand measurements. However, this held only for lecture 1. There was no significant relationship between predicted and actual mental demand in lecture 2. A potential reason for this discrepancy may lie in the differing levels of workload between the two lectures. Figure 10 illustrates that the mental demand during lecture 2 was substantially higher than during lecture 1. The AR learning session in lecture 2 posed greater challenges, requiring participants to apply knowledge from the first session to address problems in biomechanics. This increase in task complexity could significantly diminish the predictability of pupil dilation responses. If the task is more demanding than expected, mental demand may rise, leading to increased variability in pupil dilation as participants adapt to the real level of difficulty.

The regression coefficients in the model of lecture 1 indicate the relationship each predictor has with the dependent variable, mental demand. For instance, B-S-1, B-S-3, and B-S-6 are notable for their significant positive relationship with mental demand, suggesting these conditions notably increase cognitive workload. In other words, the positive values of these coefficients suggest that an increase of the pupil dilation difference between baseline phase and problem-solving phase is associated with a corresponding rise in mental demand. While the effect of B-S-2 lacks statistical significance, its inclusion resulted in the highest R-squared value compared to any other combination. B-S-5 shows a notable negative correlation, signifying that as the pupil dilation difference between the baseline and problem-solving phases increases, there is a decrease in mental demand. The statistical significance of the lecture 1 model is reinforced by the P-value of 0.0197, suggesting that the predictors used in the model are indeed relevant to estimating mental demand in an AR setting. Further investigation is necessary to understand why certain AR scenes exhibit a positive relationship between pupil dilation and mental demand, while others demonstrate a negative relationship.

Pupil dilation is widely recognized as an indicator of cognitive load, though finding a strong linear regression model has proven difficult. However, in this study, we have successfully found a strong linear regression pattern at a medium level of participant workload. The model's high predictive validity has practical implications for the development of adaptive AR systems. For instance, real-time monitoring of pupil area could be integrated into AR applications to assess learner engagement and cognitive load, thereby allowing for dynamic adjustments to the complexity of the content. Such adaptability could enhance learning efficiency and reduce cognitive overload, potentially leading to better educational outcomes.

6 Conclusion

In this study, we examined the effects of AR-based lectures on pupil dilation, utilizing it as an indicator of mental demand. By comparing pupil dilation measurements with cognitive load evaluation methods such as the NASA Task Load Index, we find that fluctuations in pupil size could reflect varying cognitive loads, aligning with the mental demands of AR lectures. Initial results reveal a notable link between greater pupil dilation and increased cognitive workload in AR settings. This study introduces an innovative approach for assessing cognitive workload in AR environments through the analysis of pupil dilation data. The regression model used in our study reliably identifies pupil dilation as an indicator of cognitive workload in AR learning settings. Our results emphasize the model's effectiveness in detecting variations in mental demand.

As for limitations, our study did not account for how individual differences related to stress and cognitive demand could affect pupil dilation. Some individuals might exhibit more significant pupil dilation as a reaction to increased cognitive demand due to stress, whereas others may not show a physiological response to the same level of demand. This variability could arise from personal differences among learners or from external factors not accounted for in our model. Future studies should aim to include additional physiological or environmental variables to improve the predictive power of the model. Moreover, it is essential to explore how these findings apply across a larger sample size that includes diverse age groups. Further research is needed to refine AR learning environments, ensuring they effectively balance educational engagement with the cognitive demands placed on learners.