Keywords

1 Introduction

Sanders and McCormick believed that 80% of the information received by humans comes from the eyes, and that the eyes are not only a main input device, but also an output device [1]. The foundation of the use of eye tracking in psychology is based on the eye-mind hypothesis, first proposed by [2], stating that the fixation of the eye is where the information is being processed by the mind. Eye movement is a process in which light reflects off an object at which we are looking, enters the eye, and falls on the central fovea to enable us to see a clear image. This process involves continuous movement of the eye, and the detection technology to observe eye movement information is called eye tracking technology.

In daily life, whether it is recognizing objects, reading, looking at pictures, or looking for things, eye movement is required to collect the necessary information in various tasks. In this sense, eye tracking was regarded as the most effective method in visual information processing [3], and also a multi-functional tool of cognitive comprehension. Rayner pointed out that eye tracking technology provides an important tool for natural and real-time exploration of cognitive thinking [4]. The information receiving process can reflect the correlation between eye movement and psychological change(s) of the reader. This method was hence widely used to understand the reading process and other related topics in various research experiments (e.g. eye movement characteristics, perceptual breadth, information integration, etc.). Eye movement data were also used to test the cognitive process during different cognitive tasks [5]. In recent years, with the development of hardware and the innovation of software technology, the collection and analysis of eye-movement trajectory data has become easier [6], which allows researchers to record individual fixation, saccade, and blink events in real time. The first two, i.e. fixation and saccade, are the two most widely used parameters in the field of image cognition. In research, “fixation” is usually defined as visual attention in a specific area, a gaze eye movement which lasts for 200–300 ms or longer; on the other hand, “saccade” is defined as rapid, ballistic movement of the eyes between different points of fixation, which is a rapid eye movement that helps the eyes land on a specific visual target. During such eye movement, although some peripheral information can be obtained, information processing is constrained [4]. The reading process includes a series of saccades and fixations, and the sequence is called a scan path [7, 8].

Based on the two basic eye movement behaviors, i.e. fixation and saccade, Lin and You pointed out in an empirical research on the application of eye tracking technology to the reading of scientific graphs and texts that commonly-used eye movement indicators include: total fixation duration, fixation count, number of saccades, sequence of fixation, ratio of total fixation duration, times of regression in a text zone, and saccade amplitude. Molina, Navarroa, Ortega, and Lacruz believed that in the process of eye tracking data analysis, the heat map can be qualitatively interpreted, and regions of interest (ROIs) can be defined according to research purposes or hypotheses for quantitative analysis, e.g. the number of fixation points in each ROI, fixation duration, or the sequence of fixation points among different ROIs [9].

Mayer believed that eye tracking technology provides a unique opportunity to understand learners’ perceptual processing in learning [10], and helped examine the influence of specific teaching methods on learning, which can serve as a reference for teaching material design [11]. At present, desktop (fixed) eye trackers are adopted in most research, where 2D materials are presented to subjects on the computer monitor. By capturing the reflection of the infrared light off of the user’s eyes, an image processing algorithm (based on the pupil position and the point of light on the eye) can determine where on a computer monitor the user is currently looking [5]. With the vigorous development of information technology, many teachers now use multimedia materials in their teaching. This trend has received considerable attention, and has helped many learners by providing an easier access to the content. However, these multimedia materials are dynamic stimuli, and present limitations in their application for desktop eye trackers, such as the definition of ROIs and analysis software support issues. To solve the problems above, portable eye trackers have been used in many studies, which are more user-friendly as they allow users to watch the actual 3D environment or objects. In addition, the eye tracking images can also be projected back to the 3D environment with ideal processing speed and accuracy [12]. In recent years, many researchers have been devoted to the study of game-based learning, through which learners can acquire knowledge and solve problems at the same time. Since games can easily stimulate learners’ interests, they usually prove to be great incentives. In addition, psychological experiments have also confirmed that the incorporation of multimedia can stimulate learners’ multiple senses (e.g. sight, hearing, taste, touch, etc.) in the process of learning, which can stimulate the brain and facilitate success in terms of learning purposes. Thus, eye tracking technology has been frequently used in multimedia learning research lately, as instructional designers attempt to revise multimedia learning materials by tapping into research of visual attention via eye tracking technology, in order to improve learners’ cognitive processing and learning effectiveness [13].

Therefore, this study used portable eye tracker and development tools to design a set of game-based English vocabulary learning materials for the experiment. Based on the aforementioned research background, this study aimed to answer the following questions:

  1. 1.

    In the DGBL context, what is the participants’ cognitive processing sequence among different ROIs?

  2. 2.

    In the DGBL context, what can be inferred from the participants’ eye tracking data among different ROIs?

2 Methods

2.1 Participants

The present study recruited 38 students in the third year of study in the school of information and design at a university in southern Taiwan. Due to the limitations of the eye tracker used in this study, those with high myopia (≤−10 diopters) or severe astigmatism (≥2 diopters) are not qualified for the experiment. Regardless of gender, the 38 participants are divided into High Competence Group (n = 5), Intermediate Competence Group (n = 8), and Low Competence Group (n = 25), according to their performance in the English courses at the same university.

2.2 Materials

The experiment material in this study is an English vocabulary game developed (with HTML5) and designed by the researcher for the purpose of Digital Game-Based Learning (DGBL). The “English Vocabulary Learning Game” have three types of questions (i.e. elementary, intermediate, and advanced). Each type consists of four questions, which are randomly presented on the monitor. The game is three minutes long and its start page was shown in Fig. 1. As the participant pressed the start button, he or she would immediately enter the answering mode interface; at the same time, the eye tracker would record the participant’s eye movement data throughout the game so that the game itself can help achieve the goals of teaching. The presentation of an example word in the “English Vocabulary Learning Game” was shown in Fig. 2.

Fig. 1.
figure 1

Start page of the “English Vocabulary Learning Game”.

Fig. 2.
figure 2

Layout of an example word in the “English Vocabulary Learning Game”.

2.3 Instruments

This study used a portable eye tracker (EyeNTNU_p) to conduct eye tracking. This eye tracker has an infrared pupil detection lens which can record wide-angle (up to 120°) real-life scenes and is very suitable for the angle of view of human eyes. The equipment can record the points and durations of fixation in the actual view of the participant as he or she perceives the external 3D environment, thus proving highly adequate for eye tracking which records dynamic reading behaviors. The hardware and software equipment of the portable eye tracker in this study was shown in Fig. 3. During the actual experiment, the 5-point calibration process was carried out under the guidance of the researcher or the assistant. A photo of participant wearing the eye tracker during the actual experiment was shown in Fig. 4.

Fig. 3.
figure 3

The portable eye tracker and tablet device used in the present study.

Fig. 4.
figure 4

A participant wearing the eye tracker.

In this study, the game was created and designed based on the game design methodology to ensure that the game content can incentivize the participants’ learning motivation [14]. Two professional English teachers with competence in game design were commissioned to review the content validity, so as to successfully construct cognitive processing in the present study.

2.4 Experimental Process

Before the portable eye tracker started to record eye movement data, it would first undergo a 5-point calibration so that the system could accurately collect eye-movement data. After the research team had assisted the participant to correctly put on the portable eye tracker, the participant would receive a series of instruction for calibration, with participant’s head moving as little as possible. When the calibration was in progress, both eyes of the participant are making saccadic movements, and will move quickly in the same direction and amplitude. In order for the eyes to focus on the stimulus, the fovea will be aligned at the same position. After calibration was completed, the formal experiment was conducted. A photo of the portable eye tracker-enabled experiment was shown in Fig. 5.

Fig. 5.
figure 5

The portable eye tracker-enabled experiment in progress.

2.5 Data Analysis

In this study, the eye tracking experiments were enabled by the portable eye tracker, and two auxiliary tools (i.e. Dynamic ROI Tool and Fixation Calculator Tool) were adopted to execute the preliminary compilation of eye movement data. Then, the participants’ eye movement data were processed in terms of fixation patterns to analyze the participants’ visual attention distribution when they were exposed to the DGBL content. The system architecture of the portable eye movement analysis software was shown in Fig. 6.

Fig. 6.
figure 6

System architecture of the portable eye movement analysis software

After participants viewed the DGBL content, the eye tracking system used in this study automatically exported a video file (.wmv) to help define the dynamic ROIs. After the dynamic ROIs had been defined for the entire video, the ROI definition tool module would generate a dynamic ROI file, which would be subsequently imported into the visualization analysis software for eye movement indicator analysis. In this study, a digital game-based English vocabulary learning game was designed as the experiment material, with its layout divided into four ROIs based on the eye tracking data analysis program. The four ROIs and their respective names were shown in Fig. 7, with the red boxes signifying the regions which needed to be defined after the experiment. Once the participant had finished the questions in the game-based learning content, he or she could raise a hand to inform the researcher or assistant, which marks the completion of the experiment.

Fig. 7.
figure 7

The four ROIs defined in this study.

The present study collected eye movement and fixation data via portable eye tracker and analyzed such data with eye movement trajectory visualization analysis software. The eye movement indicators applied in this study were as follows:

  1. 1.

    Latency of First Fixation (LFF): The latency from the onset of the stimulus to the initial fixation on the defined ROI, when the participant is involved in the DGBL content.

  2. 2.

    Duration of First Fixation (DFF): The duration of fixation when a participant’s first fixation is formed in an ROI as he or she is involved in the DGBL content.

  3. 3.

    Total Fixation Durations (TFD): The total fixation duration when a participant’s fixation falls into a certain ROI, as he or she is involved in the DGBL content.

  4. 4.

    Fixation Counts (FC): Fixation counts reflect the importance of the ROI. More fixation counts indicate that the participant considers the ROI more important or thinks that it can provide more clues.

3 Results and Discussions

3.1 Differences in the Participants’ Visual Attention Patterns

The latency of first fixation (LFF) indicator illustrated that the 38 participants’ sequence of first fixation at the four ROIs was ROI3→ROI4→ROI2→ROI1, as shown in Table 1. In addition, differences in the first fixation sequences can be observed among the three different groups. The LFF indicator illustrated that the High English Competence Group’s sequence of first fixation at the four ROIs was ROI4→ROI1→ROI2→ROI3, as shown in Table 2. The Intermediate English Competence Group’s sequence of first fixation at the four ROIs was ROI3→ROI2→ROI4→ROI1, as shown in Table 3. The Low English Competence Group’s sequence of first fixation at the four ROIs was ROI3→ROI4→ROI2→ROI1, as shown in Table 4. Hence, it can be observed that the fixation sequence of the four ROIs from participants of different groups was different.

As for the duration of first fixation (DFF) indicator, the data collected illustrated that the ROI which received the longest DFF from the 38 participants was ROI3, as shown in Table 1. In addition, the DFF indicator showed that the ROI which received the longest DFF from the High English Competence Group was ROI4, as shown in Table 2. The ROI which received the longest DFF from the Intermediate English Competence Group was ROI3, as shown in Table 3. The ROI which received the longest DFF from the Low English Competence Group was also ROI3, as shown in Table 4. Hence, it can be seen that during the experiment, for participants of Intermediate and Low Competence Groups, ROI3 (English definitions of the vocabulary) was the ROI which received the most visual attention, showing a difference from participants of the High Competence Group in terms of visual attention during the first fixation.

According to the indicators of total fixation durations (TFD) and fixation counts (FC) from the 38 participants, the sequence of ROIs with different levels of fixation was ROI3, ROI4, ROI1 and then ROI2, in terms of both TFD and FC, as shown in Table 1. This result showed that when participants were answering the questions in the digital game-based English vocabulary learning material, their visual attention was more focused on the definitions of the vocabulary in order to successfully complete the tasks. As for the Chinese translations of the vocabulary, it received the least TFD and FC, probably since it is the native (first) language of the participants. In addition, the TFD and FC indicators also showed different patterns among the groups. For the Intermediate English Competence Group, the sequence of ROIs in terms of TFD was ROI3, ROI1, ROI4, and then ROI2. However, in terms of FC, the sequence of ROIs was ROI3, ROI4, ROI1, and then ROI2, as shown in Table 3. This result showed that in the digital game-based English vocabulary learning context, the Intermediate English Competence Group’s visual attention was, as in the overall trend, more focused on the definitions of the vocabulary. However, they were also paying attention to the visual cues (i.e. the relevant pictures) so as to successfully complete the tasks assigned. On the contrary, participants from the High and Low English Competence Groups simply focused on the definitions of the vocabulary to answer the questions, as shown in Tables 2 and 4.

Table 1. List of LFF, DFF, TFD, and FC in each ROI from all participants
Table 2. List of LFF, DFF, TFD, and FC in each ROI from the High English Competence Group
Table 3. List of LFF, DFF, TFD, and FC in each ROI from the Intermediate English Competence Group
Table 4. List of LFF, DFF, TFD, and FC in each ROI from the Low English Competence Group

3.2 Cognitive Comprehension in the Game-Based Learning Setting

According to the 38 participants’ final results of the game, the percentage of correct answers was 50%, the percentage of incorrect answers was 25%, and the percentage of missing answers was 25%. In terms of different levels of English proficiency, the percentage of correct answers was 83.3% for the High Competence Group, 58.3% for the Intermediate Competence Group, and 33.3% for the Low Competence Group. However, it was worth noting that the percentage of missing answers was as high as 25%, and most missing answers were submitted by participants from the Low Competence Group. This result was likely due to the fact that this digital vocabulary game consisted of a number of more advanced English words, which created nuisances in cognitive comprehension for the said participants. Although each word is accompanied by its definition as well as a relevant picture, participants from the Low Competence Group could only answer the simple questions correctly due to the difficulty arising from issues of cognitive comprehension. As a consequence, they either had an incorrect answer or decided to simply skip the question when the English word is of intermediate or advanced difficulty.

4 Conclusions and Suggestions

The present study aimed to explore differences in visual attention of participants with different levels of English proficiency in the DGBL context based on eye movement data. The results of the study can not only pinpoint discrepancies in the participants’ cognitive comprehension and corresponding visual behaviors, but also serve as reference for the design of DGBL materials in the future. The conclusions of this study are presented below in terms of 1) eye movement behaviors and 2) cognitive comprehension in the game-based learning setting:

  1. 1.

    Eye movement behaviors: In terms of LFF, the learners’ fixation sequence was ROI3→ROI4→ROI2→ROI1. Nonetheless, for the three groups with different levels of English proficiency, different fixation sequences of the four ROIs can be observed. In terms of DFF, the ROI which received the longest DFF from the learners was ROI3. Nonetheless, differences can also be identified among the groups with different levels of English proficiency. For learners of Intermediate and Low Competence Groups, ROI3 was the ROI which received the most visual attention, showing a difference from learners of the High Competence Group in terms of visual attention during the first fixation. According to the indicators of Total Fixation Durations (TFD) and Fixation Counts (FC) from the learners, the sequence of ROIs with different levels of fixation was ROI3, ROI4, ROI1 and then ROI2, in terms of both TFD and FC. This result showed that when learners were answering the questions in the digital game-based English vocabulary learning material, their visual attention was more focused on the definitions of the vocabulary in order to successfully complete the tasks.

  2. 2.

    Cognitive comprehension: According to the final results of the game, the overall percentage of correct answers is relatively low and the percentage of missing answers is relatively high. In terms of different levels of English proficiency, learners from the High Competence Group do have a higher percentage of correct answers as compared to the learners from the Intermediate and Low Competence Groups, which is consistent with the learners’ academic performance in English courses. As for the relatively high percentage of missing answers, it is very likely contributed by learners from the Low Competence Group, who encountered difficulties in cognitive comprehension when the English word in question was of intermediate or advanced difficulty. As a consequence, they either had an incorrect answer or decided to skip the challenging questions. Hence, this digital English vocabulary game can reflect the discrepancies in the cognitive comprehension of the learners with different levels of English proficiency.