1 Introduction

The learning scenario for English as a Foreign Language (EFL) learners is of vital importance [19]. Situated learning can simulate authentic or virtual situations which learners may confront abroad or in their daily life [2]. Students can therefore experience how to use English properly in realistic conditions, or solve problems in particular situations using the knowledge they have learned and organized. If the classroom could be situated in a contextual setting that allows the students to experience language learning, such as role play or simulation, it would enhance their learning motivation and achievements [37]. A previous study mentioned three augmented reality (AR) features, namely the integration of the virtual and real worlds, interaction in real time, and operation in three-dimensional (3D) space [3]. Directly using AR in the real world could more effectively support students in achieving the course objectives. There is thus a real need for teachers or instructional designers to design AR learning activities [29].

Recently, an increasing number of studies have explored whether it is feasible to integrate games and learning. Research on mobile-assisted game-based learning has found that the quality of flow inspired by games with different levels of difficulty was the same as long as the level matched the capabilities of the individual students [6]. How to link the learning knowledge in the game to the learning objectives of the class is one of the challenges [16, 20, 47]. Strengthening the connection between the content of the game and the knowledge of the learning subjects during the game playing process could significantly improve the students’ learning effectiveness [14, 33]. It is too late to make the connection between the game and the knowledge once the game is over. Therefore, learners should construct knowledge while playing [14, 33]. Otherwise, even if they gain high scores, their subject knowledge will not be improved [16]. When students are unable to effectively link the game context with the subject knowledge, the course instructor and game designer should provide guidance and help them simultaneously play the game and learn the knowledge [13, 16, 41, 42]. Game-based e-learning has been confirmed as being able to strengthen students’ learning motivations and promote their enjoyment of learning when they learn from the situated context in the game scenarios [18]. In comparison with traditional learning, students have been found to have better memory retention in situated game-based learning [23].

The different settings of game characters can have various learning effects according to gender [17]. Therefore, this study employed two different game-based models; one used a collective game-based design (CGB) [8], while the other used a sequential-mission gaming design (SMG) [45], in an AR English learning system for third-grade students who were further divided by gender. The collective game-based AR English learning system was designed according to the interests of the children, while the sequential-mission system required them to complete each stage one by one. Both the collective and the sequential-mission gaming designs included the characteristics of competition. Strong competitiveness may be one of the factors of gender difference [17]. Moreover, the cognitive load, flow experience, and learning effectiveness of students who used the CGB and SMG designs to assist their language learning were explored.

2 Literature review

2.1 Augmented reality and game-based learning

The techniques of showing AR include monitor-based augmented reality displays, video see-through augmented reality displays, and optical see-through augmented reality displays [31]. In this study, we adopted the monitor-based augmented reality display method and used mobile device screens to show the augmented information and reality objects captured from real life [35]. When an AR system provides the individual students with immediate feedback on their progress, self-paced and self-directed learning can occur [10].

With the advanced development of mobile devices, AR is leading to explosive development in the entertainment industry, such as the Pokemon Go game. However, AR can also be used in educational contexts. The current study, therefore, employed AR with two different mechanisms in an EFL class. The gender difference between attitudes and behaviors when using AR has rarely been explored, although many studies have investigated the differences between the performance and attitudes of males and females exposed to information and technology conditions. We, therefore, felt that it would be valuable to identify the most appropriate treatment for students of different genders when teachers introduce AR into their learning activities.

2.2 Gender differences

The different settings of game characters can have various learning effects according to gender [17, 48]. Scholars have emphasized that studies should explore the effects of gender on game-based learning [21]. However, while it has been suggested that gender differences be taken into consideration when developing educational games [44], some previous game-based learning studies found no differences [28, 40], especially among young children. One study showed no gender difference in performance after learning with AR technology [22]. On the other hand, some studies have found gender differences. For example, Robertson [44] found that female students had better learning effectiveness than males because they took more time to write dialogues in the games they were playing, while Hou and Li [24] found that male students outperformed female students in game-based learning. Another study using AR in a school environment indicated that AR is more exciting and attractive to male than to female students [4]. Due to the contradictory findings of these previous studies, the aim of the current study was to further explore which features of game-based learning benefit females and males, especially young students. It was supposed that, regardless of gender, the third-grade students would make significant progress in their learning and achieve excellent learning effectiveness using the different AR gaming designs. In addition, it was expected that the females and males would both achieve similar learning effectiveness.

2.3 Flow experience

“Flow state” is the term used to describe the state of mind entered while performing an activity and refers to an optimal experience which happens in certain activities [15]. When an individual is immersed in a flow state, he or she will experience high concentration, time passing rapidly, a balance between challenge and skills, and positive enjoyment [5]. Therefore, when a student has a high flow state during an activity, he or she will feel excited, ecstatic, and be highly efficient during the learning process. Once the activity has finished, the experience of the flow state during that period of time is referred to as “flow experience.” It should be noted that passive activities usually do not elicit flow experiences, as individuals have to actively do something to enter a flow state. However, it is worth exploring the flow states of students when they conduct different AR game learning activities. Recently, therefore, many game-based learning studies have considered the flow state of students as they play games [7, 26, 49]. In the current study, it was supposed that the flow experience of the students using different AR gaming designs would vary. In addition, whether the gender difference and different learning systems had an impact on the flow experience of the students was also explored.

2.4 Cognitive load

Cognitive load includes extraneous and intrinsic cognitive load. Mental effort refers to extraneous cognitive load, which occurs when students process additional information that does not help their learning but still increases the load on their working memory [43]; in other words, the learning approach and the structure of the instructional material influence the students’ mental effort. On the other hand, the degree of difficulty and the amount of learning material have a great impact on the students’ mental load, or intrinsic cognitive load. The students’ cognitive load increases when they have difficulties linking what they are learning with their prior knowledge during the learning process. Therefore, it was hypothesized that the mental load of the third-grade students using different AR gaming designs would vary, while their mental effort would be similar. The study also explored whether gender and learning system had interactive effects on the students’ mental effort and mental load.

3 Method

3.1 Learning content

The learning content of the English AR educational games was concerned with the learning objectives of the third-grade students, which are related to their English learning textbook. The textbook adopted by the school for these EFL beginners was the series “Our Discovery Islands,” published by Pearson. In Lesson Three, there were seven new words: pen, pencil, chair, book, ruler, table, and eraser. The spelling instructional video made by the instructors demonstrated the pronunciation of the word one time, followed by spelling out the word two times, and then the pronunciation one more time. As the letters were read out, they were shown in red. The students watched their screen and spelled out each word and pronounced the vocabulary loudly together according to what they saw in the video. Because the first ten students in the class who collected all the targets or correctly completed all the stages were rewarded, all the students in both groups tried their best to complete the learning game as quickly as they could.

3.2 System framework and function

In this study, two AR systems were developed based on different game mechanisms for learning the target English vocabulary, that is, the new words in the first textbook of the third-grade elementary school English course. One game mechanism was named the collective game-based (CGB) design, while the other was called the sequential-mission gaming (SMG) design. In both systems, there were a total of seven reality targets corresponding to seven objects in the real-life situation. In the CGB mode, these seven targets could be collected in any order at the same station, whereas in the SMG mode they had to be collected stage by stage at different stations. The AR system framework is shown in Fig. 1.

Fig. 1 System framework for the CGB and SMG AR English learning systems

For tracking and recording the students’ learning process, they had to log into their account before starting to play the game. After practicing using the system, they could enter the AR game assigned to them. Every student was equipped with a tablet PC and earphones so they could watch the information on their own screen and hear the pronunciation and spelling of the English vocabulary, as shown in Fig. 2a.

Fig. 2 Process of using the multimedia learning materials with sound and animation to complete one learning target

After the students studied the learning material of each target item (i.e., Fig. 2a), there was an immediate spelling matching test on the tablet PC (i.e., Fig. 2b) which was designed to help them review and reflect on what they had learned. Because the students were very young, it was not appropriate to ask them to type on the small screen. Therefore, the spelling matching tests were designed so that they only had to use their finger to drag the letters into the right sequence. After spelling the word correctly, they could then click the submit icon. The system would automatically assess whether the word was spelt correctly or not.
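The drag-and-drop spelling check described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `make_letter_tiles` and `is_spelled_correctly` are hypothetical, and the original system's implementation is not described in the paper.

```python
import random

def make_letter_tiles(word, rng=None):
    """Return the letters of `word` in shuffled order (hypothetical tile model)."""
    rng = rng or random.Random()
    tiles = list(word)
    rng.shuffle(tiles)
    return tiles

def is_spelled_correctly(arranged_tiles, target_word):
    """Compare the learner's left-to-right tile order with the target word."""
    return "".join(arranged_tiles).lower() == target_word.lower()

# Example: the learner drags the tiles into some order and presses submit.
tiles = make_letter_tiles("pencil", random.Random(0))
print(tiles, is_spelled_correctly(list("pencil"), "pencil"))
```

Because the learner only reorders existing tiles, no keyboard input is needed, which matches the design rationale given for young students.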

In the SMG mode, if the students answered incorrectly two times in a row, they were directed back to the learning material. After reading it, they took the same test again, with two more chances to answer correctly. In this mode, the students had to answer correctly in order to go on to the next assigned stage. Regardless of their level of prior knowledge, all of the students competed to be the fastest to complete the game.

In the CGB mode, the first time the students answered incorrectly during the test, the system would provide them with a hint, giving them one letter, to help them answer; for example, the hint for pen was “p_ _”. After the students read the hint, they could answer the question again by rearranging the positions of the letters, as shown in Fig. 2b. If they answered correctly within the two attempts, the system would record that they had collected that reality target. However, if they answered incorrectly again, the system would leave the learning target directly and record that the student had not collected it. The student could then choose one learning target from all the remaining reality targets in the station. If the students answered correctly, the system would give them positive feedback, as shown in Fig. 2c. Finally, they could choose any failed targets and re-study the multimedia learning material when they had worked through all of the reality targets.
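The difference between the two retry rules can be summarized in a short sketch. It is an interpretation of the description above rather than the authors' code: the function names, the event labels, and the hint format for words longer than "pen" are assumptions, and the SMG loop assumes that the study-and-retest cycle simply repeats until the learner answers correctly.

```python
def make_hint(word):
    """First letter followed by blanks, e.g. 'pen' -> 'p_ _' (format assumed)."""
    return word[0] + " ".join("_" * (len(word) - 1))

def run_cgb_target(word, answers):
    """CGB rule: a hint after the first miss; a second miss leaves the target uncollected."""
    events = []
    for attempt, answer in enumerate(answers[:2], start=1):
        if answer == word:
            events.append("collected")
            return True, events
        if attempt == 1:
            events.append("hint:" + make_hint(word))
    events.append("not_collected")
    return False, events

def run_smg_stage(word, answers):
    """SMG rule: two misses in a row send the learner back to the material, then retest."""
    events = []
    misses = 0
    for answer in answers:
        if answer == word:
            events.append("stage_cleared")
            return True, events
        misses += 1
        if misses == 2:
            events.append("restudy")
            misses = 0
    return False, events  # the supplied answers ran out before the stage was cleared

print(run_cgb_target("pen", ["pne", "pen"]))
print(run_smg_stage("pen", ["pne", "nep", "pen"]))
```

In this sketch the CGB learner can leave a target after two misses and return to it later, whereas the SMG learner cannot advance until the current word is spelled correctly.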

3.3 Participants

Two classes of third graders with an average age of nine from an elementary school in Taipei County in Taiwan participated in this study. They were learning English as a Foreign Language (EFL), which they studied for 3 hours a week. It was the first semester of the English course, and the students were all beginners. A total of 20 students (12 males and 8 females) in one class were assigned to use the SMG system; 18 students (9 males and 9 females) in the other class were assigned to use the CGB system. The two classes were taught by the same instructor, a female teacher with more than 10 years of elementary school teaching experience. All students had previous experience of using a tablet PC and were familiar with using their fingers to draw on a tablet.

3.4 Measuring tools

The research tools in this study included a pretest and a posttest of English vocabulary, and questionnaires for measuring the students’ flow experience and cognitive load.

The pretest aimed to assess whether the basic knowledge of the students in the two groups was equivalent before they participated in the learning activity, while the posttest assessed their comprehension of the vocabulary after the experiment. Both tests consisted of seven matching items with scores ranging from 0 to 7. The assessment items were developed by two experienced English teachers and corresponded to the learning content in the experiment.

The cognitive load questionnaire was developed based on the measures of Paas [38] and Sweller, Van Merrienboer, and Paas [46]. It consisted of eight items in a seven-point Likert rating scheme, including three items for “mental effort” and five for “mental load” [27]. The Cronbach’s alpha values of the questionnaire and the dimensions of mental load and mental effort were 0.90, 0.87, and 0.86, respectively.

The flow experience questionnaire was modified from the measure developed by Kiili et al. [30]. It consisted of nine items in a five-point Likert rating scheme, such as “The user interface of the learning system was easy to use. I could easily find all the necessary functionalities and information” and “I really enjoyed the playing experience. It was so gratifying that I want to capture it again for its own sake.” The Cronbach’s alpha value of the questionnaire reported in the original study was 0.78, implying acceptable internal consistency.
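For readers unfamiliar with the reliability coefficient cited above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score table using the standard formula. The function name and the toy responses are hypothetical and are not data from this study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of item scores."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy example (made-up scores): two items that move together perfectly.
toy = [[1, 2], [2, 3], [3, 4]]
print(cronbach_alpha(toy))
```

Values closer to 1 indicate that the items vary together; a questionnaire whose items were unrelated would yield a value near 0.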

3.5 Experimental procedure

The implementation of the lesson plan took a total of 4.5 periods of 40 min each (around half a day). Before the experiment, the students spent 20 min taking the English vocabulary pretest. Thereafter, they spent 40 min learning how to operate the system on the tablet PC.

Using the assigned AR learning system, the students took two periods to learn the new words which were selected from their textbook. Figure 3 shows the experimental procedure. Afterwards, the posttest of learning effectiveness was conducted and the students’ cognitive load and flow experience were investigated. Finally, the students and teachers were interviewed about their experience of using the AR learning systems in the English course.

Fig. 3 Experimental procedure

3.6 Data analysis

In this study, a two-way MANOVA was employed to analyze the effects of gender and AR game design on the students’ flow experience and cognitive load. Cognitive load was further divided into mental effort and mental load. The two independent variables were thus system and gender, and the three dependent variables were flow experience, mental effort, and mental load.

In terms of learning effectiveness, the study conducted a two-way ANCOVA to analyze the effect of gender and AR game design on the English vocabulary learning effectiveness, with the pretest as the covariate. In addition, paired sample t-tests of the posttest and pretest for each system were employed to compare the improvement in the students’ vocabulary proficiency.

As for the issue of sample size, it is rather difficult to achieve statistical significance for a smaller sample size in comparison with a large sample [12]. The small sample size is one of the research limitations of this study which needs to be addressed in future studies.

Since the sample size involved in this study is rather small, the coefficient of effect size may offer more informative implications of the data in this case. Because the coefficient of effect size is the standardized statistical score, it can give a practical estimate of the noteworthiness of the results and help future studies to conduct meta-analysis of different studies [34]. Consequently, this study reports effect sizes along with the results of the statistical significance testing.
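As an illustration of how the partial eta squared values reported in Section 4 relate to the F statistics, the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error) can be sketched as below. The degrees of freedom are not stated in the text and are assumptions here: with 38 students in four gender-by-system cells and one covariate, the ANCOVA error df would be 33; with three dependent variables, the multivariate interaction test would have 3 and 32 df.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Assumed degrees of freedom (not reported in the paper); F values from Section 4.
print(round(partial_eta_squared(2.05, 1, 33), 3))   # ANCOVA interaction
print(round(partial_eta_squared(3.516, 3, 32), 3))  # MANOVA interaction
```

Under these assumed degrees of freedom, the function returns about 0.058 and 0.248, which agrees with the effect sizes reported for the ANCOVA interaction and the multivariate interaction, respectively.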

4 Results

The assumption of homogeneity of covariance matrices was met for the two-way MANOVA, as the value of Box’s M was 12.04 (p > 0.05). The multivariate tests showed that there was an interaction between gender and system (F = 3.516*, Wilks’ Lambda p = 0.026 < 0.05, Partial η2 = 0.248). Tests of the null hypothesis that the error variance of each dependent variable is equal across groups showed that the assumption of homogeneity of variance was not violated for flow experience with F = 0.319 (p > 0.05), mental effort with F = 1.074 (p > 0.05), or mental load with F = 1.146 (p > 0.05). Table 1 further presents the between-subject effects. The effects of the two factors on each dependent variable are reported in the following sections.

Table 1 Tests of between-subject effects for each dependent variable

4.1 Flow experience results

A significant effect was observed for the interaction between gender and system (F = 7.602**, p < 0.01) on the students’ flow experience, implying that directly investigating the simple main effects of the independent variables was reasonable. Table 2 shows not only that the males reported higher flow experience than the females on average, but also that the females using SMG had significantly lower flow experience in comparison with the females using CGB and the males using either SMG or CGB. Figure 4 shows the interaction effects between gender and system on the students’ flow experience.

Table 2 Simple main effects of gender and system on flow experience

Fig. 4 Interactions between gender and system

From Table 1, it can be seen that there was also a main effect of system on the students’ flow experience (F = 8.784**, p < 0.01). When the flow experience of the students using the CGB and SMG English AR games was compared, it was found that there was a significant difference, with that of the students using the CGB mode (Adjusted mean = 30.22) higher than that of the students using the SMG mode (Adjusted mean = 25.04).

4.2 Cognitive load results

The students’ cognitive load included their mental effort and their mental load. Mental effort refers to extraneous cognitive load, while mental load means intrinsic cognitive load. Mental load is mainly caused by the quantity of learning material and the interactive effects between the learning material and the student’s proficiency [46], while mental effort mainly results from the design of the learning material. For example, if the system provided the students with enough guidance, they would not waste mental effort learning how to use it.

No significant interaction between gender and learning system was found for mental effort (F = 1.490, p > 0.05) or mental load (F = 1.219, p > 0.05), implying that directly investigating the main effects of the independent variables was reasonable.

The Mann–Whitney test was conducted to compare the effects of the learning systems on the students’ cognitive loads, as shown in Table 3. The results showed no significant difference between the mental effort of the students who used the CGB and the SMG modes (Z = −0.52, p > 0.05). The mental effort of the students who used the CGB mode (mean = 1.83; SD = 1.07) was as low as that of the students who learned with the SMG mode (mean = 1.92; SD = 0.78). However, there was a significant difference between the mental load of the students in the two groups (Z = −2.13*, p < 0.05), with that of the students who used the CGB mode (mean = 1.57; SD = 1.04) significantly lower than that of the students who learned with the SMG mode (mean = 2.10; SD = 0.77).

Table 3 Mann–Whitney test between the CGB and SMG modes for mental effort and mental load
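To make the Z statistic in the Mann–Whitney comparison concrete, the sketch below computes the U statistic and its normal approximation from two lists of scores. It is a simplified illustration: it applies neither a tie correction nor a continuity correction, so statistical packages may give slightly different values, and the sample scores are made up rather than taken from the study.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_z(group_a, group_b):
    """Return (U for group_a, Z) using the uncorrected normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mean_u) / sd_u

# Made-up scores: every value in the first group is below every value in the second.
print(mann_whitney_z([1, 2, 3], [4, 5, 6]))
```

A negative Z here means that the first group tends to have lower ranks than the second, which is the direction reported for the CGB group's mental load.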

4.3 Improvement in learning effectiveness

Before the experiment, the mean pretest scores of the students using the CGB and the SMG modes were 5.33 (SD = 1.82) and 4.90 (SD = 1.80), respectively, and there was no significant difference between the two groups (Z = 0.88, p > 0.05).

After the experiment, the Mann–Whitney Test was conducted to compare the posttest scores of the students in the two groups. The results displayed no significant difference between the academic knowledge of the students using the two modes (Z = 0.25, p > 0.05). The learning effectiveness of the students using the CGB mode (mean = 5.67; SD = 1.68) was similar to that of the students using the SMG mode (mean = 5.45; SD = 1.73), implying that the students achieved the same academic knowledge level.

The students in the SMG mode made significant progress in learning effectiveness (Paired sample t = 2.34*; p < 0.05), as shown in Table 4. However, the students in the CGB mode did not make such remarkable progress (Paired sample t = 1.46; p > 0.05). In sum, the SMG mode forced the students to repeatedly study the learning material until they passed the spelling task of the learning target, which seems to have been useful in helping them remember the spelling.

Table 4 Paired sample t-tests of the posttest and pretest for each system
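The paired sample t statistic used for the pretest-to-posttest comparison is the mean of the per-student score differences divided by its standard error. A minimal sketch follows; the function name and the scores are hypothetical and do not reproduce the study's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (t, degrees of freedom) for paired pre/post scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up scores for five students on a 0-7 vocabulary test.
pre = [4, 5, 3, 6, 5]
post = [5, 5, 5, 7, 6]
print(paired_t(pre, post))
```

Because each student serves as his or her own control, the test is sensitive to consistent gains even when the two groups' posttest means are similar.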

4.4 Gender and system effects on learning effectiveness

In this study, we further analyzed the effects of gender and the different game-based learning systems on the students’ learning effectiveness. A two-way ANCOVA was employed using the pretest scores of learning achievement as a covariate, learning system (CGB/SMG) and gender (female/male) as independent variables, and the posttest scores of learning achievement as the dependent variable. After verifying that the assumption of homogeneity of regression was not violated with F = 0.641 (p > 0.05), the posttest scores of the four groups were analyzed using two-way ANCOVA. As shown in Table 5, no significant interaction between the independent variables was observed (F = 2.05, p > 0.05, Partial η2 = 0.058) for the students’ learning achievements, implying that directly investigating the main effects of the independent variables was reasonable.

Table 5 Two-way ANCOVA for the posttest

It was found that the learning effectiveness (Adjusted mean = 5.71, SE = 0.23) of the female students was similar to that of the males (Adjusted mean = 5.38; SE = 0.21) (F = 1.08, p > 0.05, Partial η2 = 0.032), as shown in Table 6. Meanwhile, no significant difference was found between the learning achievements of the students who learned with the two learning systems (F = 0.13, p > 0.05, Partial η2 = 0.004).

Table 6 Descriptive data and adjusted posttest scores for each group

The students who used the two learning systems all performed well. The adjusted mean score in the posttest of the students who learned with the CGB system was 5.49 compared with 5.60 for those who learned with the SMG system. In sum, the two systems achieved high learning effectiveness for both the female and male students, and no significant gender difference was found.

4.5 Teacher interviews

The teachers (i.e., one instructor and a pre-service teacher) both indicated that the students were not highly motivated to learn in the traditional class when the instructional material in the textbook only included a picture and text, even though the textbook also provided them with a short story context. However, when the game-based scenario was integrated with the actual learning environment, the students were very excited. After the learning activities, the teachers found that the students would like to use a similar system to learn and asked whether the application was free.

In addition, because the system has log-in records (see Fig. 5) in the database, the teachers could further explore individual students’ learning processes. The teachers liked to refer to the learning portfolios so as to adjust their future instruction. They stated that the students experienced learning English in realistic conditions. As a result, the teachers knew that they had changed from being instructors to advisors or facilitators giving opinions, scaffolding, or providing feedback during the learning activities.

Fig. 5 Student uses his/her account to log into the AR system

Because it is relatively difficult for third-grade students to type on a tablet PC, the log-in process is the only time when they had to use the virtual keyboard. The teachers appreciated this aspect of the design.

The teachers also pointed out some system features which they found helpful in the learning activities. Before the main learning activities, the system provided the students with an explanation and system operation instructions when they clicked the guidance button on the log-in page, as shown in Fig. 5. The teachers found that the students showed great interest in this, although they had only operated the demonstration system, not the target content. The teachers agreed that the system provided an innovative and appropriate way for the young children to interact with it; that is, the students directly used their fingers to move and drag the letter icons shown on the screen. Those icons were actually in picture format rather than text. The teachers stated that this approach was helpful for the young students who were not familiar with typing on the small screen and who were only novices in learning English. When the students spelled the vocabulary by correctly ordering the letter icons, the system judged whether they had correctly spelled the word. The students were then very excited about getting feedback when they answered the questions. After the experiment, the teachers also wanted to design new learning units by applying AR. Overall, the two approaches gave the teachers involved a very successful experience of conducting mobile learning integrated with the actual environment for learning a foreign language.

5 Discussion

This study employed two different game-based models, a collective game-based (CGB) design and a sequential-mission gaming (SMG) design, in an AR English learning system for third-grade students and conducted an experiment to explore the students’ learning effectiveness, cognitive load, and flow experience. These two AR gaming designs overcame the boring process of conventional vocabulary learning (e.g., learning passively by listening to the teacher’s explanations) [25], by immersing the students in the situated surroundings and the corresponding learning content.

One concern of using AR in an educational context is that the students who apply it in their learning may be cognitively overloaded by the large amount of information they encounter [51] when they have to use multiple technological devices to complete the tasks. Therefore, we explored the students’ cognitive loads during the intervention. It was found that the mental load of the students who used the CGB mode was lower than that of the students who learned with the SMG mode. There are two possible reasons for this difference. Firstly, the CGB mode provided the students with spelling cues or hints to help them recall what they had just learned. Secondly, the CGB mode did not force the students to go back to the same reality target and learn it again if they answered incorrectly two times. The somewhat higher mental load may have reduced the flow experience of the students who used the SMG AR English learning system; however, they made significant progress in their learning. Scholars have noted that excessive mental load is not good for students’ learning outcomes [39], so some system designs have tended to focus on lowering the users’ cognitive load [11]. However, the results of this study remind researchers that too little mental load is not necessarily better, as was also mentioned in a previous study [50]. Sometimes, students may gain more with a small degree of loading and challenge. If the mental load is too low and the challenge is too easy for the students, they may not learn as much as they would by overcoming difficulties. They can gain benefits from conquering their learning obstacles, thus achieving higher learning effectiveness. The proper integration of information and appropriate difficulty of challenge are of vital importance when instructors or researchers design an AR system for personalized learning for any subject.

The students who used the CGB and SMG modes achieved a similarly high level of academic knowledge, although only the gain of the SMG group from pretest to posttest was statistically significant (Table 4). At the same time, the mental effort of the students who used the CGB mode was as low as that of the students who learned with the SMG mode. Taken together, these results imply that the two AR learning systems for third-grade students to learn English vocabulary in situated surroundings were easy and efficient for the students to use because, as scholars have noted, lower mental effort with higher performance is the most efficient way of learning [39, 46].

In sum, the students had excellent learning effectiveness in the posttest, regardless of whether they used the CGB or the SMG AR educational game system in this study. A previous study hypothesized that there would be no gender differences in test scores or engagement in a game if the designed AR game included elements targeting both genders [17]. However, their results showed that there were statistically significant differences in the posttest between males and females using the AR game, and they inferred that the possible reason for contradicting their original hypothesis was that the female students had more trouble learning to use the AR platform, as girls have lower 3D spatial ability than boys on average [48]. In contrast, the spatial ability of the different genders did not have such a significant impact on learning in the current study due to two possible reasons. Firstly, the current study was limited to a classroom scenario, so spatial ability did not have so much impact on the two AR games in this study. Secondly, we provided a training session before the game began, following the suggestion of Echeverría et al. [17].

In this study, we also compared the cognitive load and flow experience of the students using the CGB and the SMG AR educational game systems and found that both systems elicited similarly low mental effort. These results conform to previous studies which found that each student equipped with a tablet PC for mobile learning had better learning effectiveness [1], and especially effective use of AR [32]. It was concluded that the CGB system was better able to foster flow experience, as the students in this group reported a higher flow experience than those in the SMG group. It was inferred that an important reason for this was that the students in the CGB group were able to control the steps themselves. When AR is effectively applied in education, it can increase students’ learning interest and concentration [51, 52]. In terms of flow experience, the males and females both reported high levels in the CGB AR educational game, whereas the males reported higher flow experience than the females in the SMG game. Therefore, it is suggested that instructors adopt the CGB mode more frequently in their classes for young students to promote students’ flow state in game-based learning.

In the post-interview with eight students from each group, it was found that they preferred to use their fingers to move and drag the letter icons to spell the words rather than typing with the keyboard. The small keyboard has been identified as one of the difficulties of using mobile technology in learning [9, 36]. In this study, we therefore adopted an interface design which allowed the letters to be moved easily on the screen so that the students would not encounter this difficulty. If the students moved any letter to the wrong position, the system would show encouraging feedback with a cute picture asking them to rearrange the letters and spell the word again. The students said that they wanted to earn points in the game, so they immediately retried and did not feel bored at all. Scholars have mentioned the importance of encouragement in mobile learning. The students in the current study also indicated that they were excited about getting instant responses. They received encouraging feedback when answering incorrectly, and more exciting feedback when answering correctly. The students stated that they felt very excited during the learning process. These results illustrate that both systems provided encouragement for the students.

6 Conclusions

It was found that the students using the CGB mode for learning English not only experienced a higher flow state but also had a lower mental load in comparison with the students using the SMG mode, although there was no significant difference between the learning effectiveness of the students using the two systems. The students were beginner English language learners because the Ministry of Education in Taiwan requires English to be taught in elementary school from the third grade. However, in practice, some students start to learn English privately as early as preschool. This explains why the average performance of the students was quite high in the pretest. Therefore, it is suggested that in future studies the two English AR game systems be provided to younger students to examine how they learn, and that their learning results be compared with the outcomes of this study. In addition, it is suggested that learning styles or preferences could be taken into consideration in future studies. The main limitation of the current study was its small sample size (i.e., two classes). The study only explored the two factors of gender (i.e., male and female) and AR gaming system (i.e., CGB and SMG). Since gender does not seem to have a remarkable impact on learning effectiveness, other personal characteristics such as learning style or cognitive traits (working memory capacity, inductive reasoning ability, and associative learning skills) could be taken into account in future studies. Moreover, it would be valuable to analyze the behavioral patterns of students using AR learning systems for personalized learning in their actual surroundings.