Introduction

The benefits of well-designed educational games for promoting students’ engagement and performance in mathematics classrooms have been well documented (Clark et al. 2016). When educational games are used efficiently, students can become more engaged in learning and grasp abstract concepts more easily (Lee and Hao 2015). Nevertheless, not all educational games are designed equally. A common challenge is balancing what recreational video games do well (engaging players and encouraging replayability) with the learning goals of educational games. Our research and development of educational games are grounded in theories of engagement known to facilitate learning. In particular, we have sought to understand students’ engagement state during gameplay (game engagement), which can be differentiated from long-lasting engagement in mathematics learning. We define game engagement as situational engagement, as distinct from dispositional engagement, following prior researchers (Lau and Roeser 2002; Rotgans and Schmidt 2011; Thijs and Verkuyten 2009). Consistent with Thijs and Verkuyten (2009), who found that situational engagement was moderated by individual students’ characteristics and contextual characteristics, we hypothesized that game engagement could be influenced by both contextual factors (educational games) and student factors (students’ prior experiences with the games and student perceptions of educational games). For students using any instructional technology, including educational games, engagement states drive the moment-by-moment use of the medium or device. The levels of interest, energy, attention, and memory available for learning fluctuate, ebbing and flowing with the complexity of information as it is presented and utilized (or not) by the student. We defined the engagement state during gameplay as game engagement and developed a survey instrument to measure it. When reinforced, learning further deepens engagement states, and students come to display academic engagement.
Academic engagement concerns the relationship between engagement and achievement and involves a comparatively longer engagement state than game engagement. Fredricks et al. (2004) subdivide academic engagement into three domains: behavioral, emotional, and cognitive engagement. Behavioral engagement covers behavioral aspects such as participation and involvement in academic activities (Connell and Wellborn 1991) and following school rules (Finn and Rock 1997). Emotional engagement indicates emotional reactions to academics and valuing academic outcomes (Voelkl 1997). Cognitive engagement is defined as the level of investment in learning, including being strategic and willing to exert the necessary effort to obtain difficult skills (Fredricks et al. 2004). In this study, we did not examine academic engagement but focused on game engagement that can lead to mathematics achievement.

The challenge is that there are insufficient instruments that can evaluate the game engagement of students in academic settings although several studies used instruments to measure the game engagement of digital games in non-educational settings (Chou and Ting 2003; Grüsser et al. 2006; Hull et al. 2013; Kneer and Glock 2013; Kim et al. 2008, 2012; Lemmens et al. 2009).

In response to the current needs, we have developed an instrument to measure the game engagement of grade 5 students, taking several iterative steps toward that goal. We first conducted extensive literature reviews on existing surveys measuring school and game engagement of older groups (i.e., high school students, college students). Based on prior research on engagement, we created items to assess the engagement state, attempting to align with our comprehensive theoretical model of engagement. In the following stage, we revised and modified original items to obtain a valid and reliable instrument measuring both baseline engagement and improved engagement before and after the educational game intervention. This paper presents the validation process of the game engagement instrument from initial items to the final engagement constructs.

After finalizing our constructs of game engagement, we explored factors associated with engagement in gameplay and analyzed data collected from several rounds of testing. Following several studies in the field, we examined the effects of students’ perception of their ability to learn games (Ha et al. 2007; Hsu and Lu 2004), gender (Bressler and Bodzin 2013; Hoffman and Nadelson 2010), students’ prior ability level (Barkatsasa et al. 2009), and the amount of gameplay (Hoffman and Nadelson 2010; Sherry 2001). In the next step, we expanded our effort to analyze the effect of game engagement on students’ mathematics performance. In the analysis at this stage, we included students’ perception of educational game features that may also significantly relate to students’ performance.

The examination of game engagement and game features in mathematics classrooms is of critical importance considering how little empirical research on educational games has addressed them. We hope that our research findings add new information to the research field of educational technology.

To examine students’ game engagement, we used the [Math App], which was developed by our research team to motivate students aged 10–12 toward deeper learning of fractions. For the game development, we appropriated theories of mathematical thinking to ground the design of educational games, realizing that our mission is to educate and not just entertain. Using the [Math App], we assessed students’ engagement states after gameplay in interactive learning environments with real-life examples and explored whether improved game engagement leads to increased mathematics performance among students who are at the pre-algebraic learning stage.

For the study, the research team recruited one hundred 5th graders, who played the [Math App] to learn fractions in one of two conditions: control (40-min gameplay) or experimental (200-min gameplay). Before assigning students to the two conditions, we collected data on students’ perceptions of their ability to learn games, gender, and prior ability in mathematics; we assessed student game engagement and perception of game features after the gameplay. We analyzed the data with a series of structural equation models, each comprising a measurement model and a structural model. Throughout the whole process, the following research questions guided the study:

  1. Does the game instrument produce reliable and valid subdomains (behavioral, emotional, and cognitive) of game engagements? If it does not, what are the subdomain constructs converged with the game instrument?

  2. Do students’ characteristics such as game learning ability, gender, and prior performance level relate to game engagement?

  3. Does the game engagement show a significant association with students’ math performance?

  4. Does [Math App] gameplay differentiated by the gameplay amount show a significant effect on students’ game engagement?

  5. Does students’ perception of game features show a significant association with math performance?

Literature review

The [Math App] development

According to the National Mathematics Advisory Panel (2008), students’ development of pre-algebraic concepts marks a critical point in their progression toward algebra. Grade 5 students in the United States commonly rely on part-whole conceptions alone (Olive and Vomvoridi 2006; Yang et al. 2010). On the other hand, students who learn to coordinate partitioning and iterating to produce fractional sizes go on to construct splitting operations, developing a distinct advantage for algebra-readiness (Hackenberg and Lee 2015). Thus, a goal of pre-algebra education should be to support the construction of sophisticated fractions conceptions and splitting operations, by designing engaging instructional technologies that require students to coordinate partitioning and iterating operations in goal-directed activities.

Our approach, drawing from a growing body of research in instructional technology for mathematics education, has been to focus on designing and deploying educational games in elementary and middle schools. Our position is that educational games, when tested rigorously, allow for engaging experiences whereby learners focus on actions that support the understanding of targeted mathematical content. Specifically, our educational game design rests on two fundamental pillars: hypothetical learning trajectories (HLTs) and factors of learner engagement. We position the design by considering what types of games would support engagement within an HLT for fractions knowledge that contributes to algebra-readiness. In the US, fractions pose a notoriously difficult challenge for students throughout their school years. Recognizing this, several researchers have begun exploring the educational game design that might contribute to student success in gaining requisite fractions knowledge (Clarke and Roche 2010). A study by Kebritchi et al. (2010) examined the effects of educational games on learners’ proficiency and engagement in mathematics. Results revealed that the educational game led to significant improvement in mathematics performance scores for 193 students. In particular, students who played games in the classroom had greater motivation than those who played games only in school labs that were removed from instruction-as-usual settings. Other researchers have found that educational games for mathematics provide affective engagement for students’ learning (Howard-Jones and Demetriou 2009).

Despite these encouraging findings, there is a need for additional research regarding the effect of educational games on learning specifically in classroom settings. Furthermore, little has been done to synthesize information on how established learning theories and instructional strategies are applied to the design of educational games to guide research and practice. One conclusion is that current recreational video game titles on the market do not stretch students’ abilities (challenge), provide sufficient feedback (assessment), or allow students to reflect (reflection and articulation). Our intent is to investigate through educational game design, development, and experimentation the associations among the learning of fractions, the forms of engagement evidenced in educational gameplay, and the game design principles and methods used to influence engagement and thus propel learners toward higher levels of understanding and proficiency.

Game engagement

Despite the growing popularity of studies on game engagement aimed at using educational games as new instructional tools in schools (Hull et al. 2013; Kim et al. 2008; Kneer and Glock 2013), researchers have not reached consensus on the definition of game engagement. Several studies routinely describe game engagement as serious involvement in games. In most studies of non-educational games, researchers defined game engagement by focusing only on emotional aspects. A typical example of the definition of game engagement in those studies is “flow,” which is described as feelings of happiness so intense that players lose their self-consciousness when players’ skill and games’ challenges are balanced in playing digital games (Brockmyer et al. 2009; Csikszentmihalyi 2014; Hoffman and Nadelson 2010).

The definition of game engagement for educational games is somewhat different. Many studies emphasize behavioral aspects, although a few studies still use the concept of flow (Chu et al. 2010). For example, researchers such as Bangert-Drowns and Pyke (2002) and Lim et al. (2006) paid attention to the behavioral characteristics of engagement while students were playing the educational games. The researchers determined the engagement level of students by assessing on-task behaviors such as problem-solving and displaying self-regulatory behaviors to complete the tasks of the educational games. In a similar vein, Lowrie and Jorgensen (2011) conceptualized game engagement by adopting behavioral indicators of visual/spatial engagement such as reading graphs or maps in the games.

In our work, we differentiate engagement from motivation, which is the intent, drive, emotion, and energy for learning and achievement. Our engagement construct is based on the concept of action and is divided more specifically into behavioral, cognitive, and emotional expressions (Martin 2007; Martin et al. 2017; Reschly and Christenson 2012; Skinner et al. 2009). Thus, in our study, we avoided addressing motivation-related theories such as self-efficacy, expectancy and value, need achievement, goal orientation, self-determination, and attributions suggested by Pintrich (2003). Instead, we paid attention to engagement featuring a tripartite framework, which many educators have since developed into two- and four-subtype models (Fredricks et al. 2016).

At the same time, we emphasize a comprehensive theoretical model of engagement states that guides research, design, and deployment. Behavioral (motor) engagement includes speed and accuracy of fine and gross motor movements that involve manipulation of physical or virtual stimuli. These are the behavioral manifestations of cognitive processing that is brought to bear during engagement with stimuli and information, associated with the brain stem, limbic system, and motor cortex neural activity (Lowrie and Jorgensen 2011; Posner 1978). Cognitive engagement includes attention span and regulation (i.e., effortful sustained attention and attention switching), inhibitory control (i.e., withholding an impulsive response in order to perform a more deliberate response), and short-term working memory. These provide maintenance of relevant information and suppression of irrelevant information that enables accurate and efficient learning and problem-solving to occur (Engle 2002). They comprise part of the inter-related “executive functions” that are associated with neural activity in the frontal and prefrontal cortex (Mandinach and Corno 1985; Rueda et al. 2004). Emotional (affective) engagement emphasizes the degree of positive and negative reactions (Reschly and Christenson 2012). Specifically, we examine approach and avoidance of learning technologies, measured as expressions and experiences of positive emotion (e.g., excitement, happiness) and negative emotion (e.g., frustration, irritation, anxiety) states. These emotional states play a role in enhanced or reduced engagement with, and learning from, educational stimuli (Chu et al. 2010; Posner and Rothbart 2007). In our work with educational games, we consider the connections among these three domains of engagement (Ainley 1993) and also explore two-subdomain engagement constructs.

Perceptions of students’ ability to learn games

We chose students’ perceived competence in games as an important factor for game engagement. Students’ perception of their ability to learn technology is a critical factor for the adoption of technology, as Davis (1989) emphasized. Perceived ability to learn technology refers to “the degree to which a person believes that using a particular system would be free of effort” (Davis 1989, p. 320). While research on the direct relationship between students’ perceived ability and game engagement is lacking, research on perceived ability and other psychological constructs suggests that such a relationship is plausible. For example, Ryan et al. (2006) found that players’ perceived ability in digital games was related to their enjoyment and sense of presence during the gameplay, and their motivation for future digital gameplay, consistent with the findings on non-digital games involving physical activities (Pagnano-Richardson and Henninger 2008). Motivated by its importance, researchers have sought to identify factors that influence players’ perceived ability. Studies found that players’ sense of ability could be enhanced in specific gaming contexts in which players received positive feedback (Burgers et al. 2015; Roscoe et al. 2013), felt autonomous (Ryan et al. 2006), and experienced in-game success (Rieger et al. 2014).

Gender difference

Many studies report a gender difference in the amount of time spent playing games. According to the national survey by the Kaiser Family Foundation (Rideout et al. 2010), boys aged 11 to 14 spend an average of 1.37 h per day playing games, while girls of the same ages spend 49 min. Although there is a clear gender difference in the amount of gameplay time, there is no research consensus on gender differences in the level of game engagement while playing educational games. In the study by Hoffman and Nadelson (2010), males tended to be engaged in digital games almost twice as much as females. Bressler and Bodzin (2013), in contrast, did not find a gender difference in game engagement.

To understand gender differences in game engagement, it is important to consider design characteristics of games as emphasized by Bressler and Bodzin (2013). According to Bressler and Bodzin, females preferred games with narrative and inquiry-based features and were engaged just as much as males. In their study, Hoffman and Nadelson (2010) found that females were less engaged in the games that did not offer much socialization opportunity.

Prior achievement

While there are few studies on the relationship between students’ prior ability and game engagement, some studies explored the association between student ability and academic engagement in learning with technology. For example, as a part of their study, Barkatsasa et al. (2009) examined the association between different levels of math performance and math engagement in technology-mediated learning. According to their results, students with higher ability in mathematics displayed stronger engagement in math, whereas low-achieving students showed lower math engagement. Furthermore, Mandinach and Corno (1985) investigated the interrelationships between student engagement and intellectual ability using a digital game. The researchers found that students with different levels of intellectual ability showed different patterns of engagement in solving problems using the digital game. Compared to students of low ability, students with high ability tended to demonstrate high levels of engagement and to adjust their level of engagement more flexibly to deal with the various challenges of the game.

Amount of game play

In an effort to reach a conclusion on mixed effects of digital games, Gentile (2011) conducted a comprehensive literature review on digital games and concluded that the amount of gameplay is one of the important factors to consider. When exploring the effect of the amount of gameplay, researchers have been concerned about too much play of non-educational digital games, particularly violent games. The study results supported their concerns as too much play of non-educational games tended to lead to adverse outcomes such as game addiction, aggression, and health issues (Anderson et al. 2010; Vandewater et al. 2004).

In contrast to the negative effect of excessive play of non-educational games, Hoffman and Nadelson (2010) found a positive influence of the amount of time spent playing educational games. In their study of engagement in games, Hoffman and Nadelson included the hours of playing educational games that require strategies and problem-solving skills. The researchers reported that participants spent an average of 2.7 h per week playing these games and that longer game time led to higher engagement. Despite this positive relationship between educational games and engagement, researchers need to remember that varying amounts of gameplay can lead to different results. According to Sherry (2001), when players played a game for only a short time, they became frustrated because they did not have enough time to get accustomed to it. On the other hand, players felt bored when playing digital games for long periods because they felt that they had to play longer than they liked.

Game features

Game features of educational games deal with enjoyable game activities that are designed to promote instructional objectives (Hays 2005, 2010). Wilson et al. (2009) conclude that researchers must first explore the attributes of games to understand their effects on the outcomes. Although researchers have studied various types of attributes such as fantasy, mystery, conflict, or interaction in games (Garris et al. 2002; Hays 2005; Wilson et al. 2009), three main attributes (goals, challenge, and feedback) are the focus of our study because they are known as the main elements of educational games and learning environments (Atkinson and Hirumi 2010; Bowman 1982; Garris et al. 2002; Hays 2005; Prensky 2010; Wilson et al. 2009).

Moreover, many educational games need to improve those attributes. According to Kebritchi and Hirumi (2008), the common challenges in designing effective educational games include solid pedagogical foundations, the presentation of clear goals, provision of enough feedback, and the usage of the optimal level of challenges appropriate to students’ abilities. In this research, we focused on the three main attributes of our [Math App] and assessed the impact of those features on math performance. Among researchers highlighting the importance of the specified goals of educational games, Garris et al. (2002) evidenced that clear and specific goals helped students to understand the relationship between goal and feedback, which are the essential elements in triggering greater engagement. Prensky (2008) highlighted the importance of feedback saying that if students know that they are doing right or wrong during the learning process, it will increase students’ engagement, while Malone (1981) argued that challenge is one of the most important attributes to engage students.

Methodology

The [Math App]

This study examined the effects of the [Math App], which was developed to support the mathematical learning of late elementary and middle school students in class. The app aims to promote students’ understanding of fractions by helping them engage in mathematical games and practice their conceptual understanding with real-life tasks. Although we plan to examine the effects of the [Math App] for 5th to 8th graders, this study was designed to examine the effect of the [Math App] only for 5th grade students who had started to grasp pre-algebra concepts.

In the [Math App], students produce and ship a chocolate bar to fill a customer order. The game has appealing features with dynamic and attractive stimuli, with music and colors, and badges and trophies that can be collected. While students were playing the game, they listened to background music with different sounds for correct and incorrect productions of a chocolate bar. To complete the mission successfully, students need to identify the chocolate bar size as a fraction relative to the whole chocolate bar, split the bar, and iterate operations on the bar. When students passed each of the three levels of tasks, they received electronic trophies as rewards.

To demonstrate our instructional application, we developed a fully functional prototype, the [Math App], to run on iOS devices (primarily iPads, but available for iPod Touches and iPhones). The [Math App] was designed to engage students in a goal-directed activity that elicits potentially novel uses of existing mental operations, specifically partitioning and iterating. As owners of the candy factory, students received an order from a customer in the form of a displayed candy bar (“customer order”) and tried to produce a candy bar of the size requested. To reproduce the chocolate bars, students needed to partition a whole candy bar into a number of equal parts and then iterate one of those parts the appropriate number of times by clicking on the menu bar. When students thought that they had produced chocolate bars of the correct size, they measured their produced chocolates by clicking on the “Measure” button and shipped the chocolates to customers by clicking on the “Ship” button. Students then received feedback on their produced chocolates in the form of chocolate bars of the correct size shown for comparison.
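The partition-then-iterate task described above can be illustrated with a short sketch. This is not the app's implementation; the function names (`partition`, `iterate`, `produce_bar`, `matches_order`) and the example order of 3/5 are hypothetical and are used only to show how the two operations combine into a fractional size.

```python
from fractions import Fraction


def partition(whole: Fraction, parts: int) -> Fraction:
    """Split a whole bar into `parts` equal pieces; return the size of one piece."""
    if parts <= 0:
        raise ValueError("parts must be a positive integer")
    return whole / parts


def iterate(piece: Fraction, times: int) -> Fraction:
    """Repeat one piece `times` times; return the size of the resulting bar."""
    if times < 0:
        raise ValueError("times must be non-negative")
    return piece * times


def produce_bar(parts: int, times: int, whole: Fraction = Fraction(1)) -> Fraction:
    """Partition the whole bar, then iterate one of the pieces."""
    return iterate(partition(whole, parts), times)


def matches_order(produced: Fraction, order: Fraction) -> bool:
    """Stand-in for the 'Measure' step: does the produced bar equal the order?"""
    return produced == order


# Hypothetical customer order: a bar that is 3/5 of the whole bar.
order = Fraction(3, 5)
bar = produce_bar(parts=5, times=3)
print(bar, matches_order(bar, order))  # prints "3/5 True"
```

Iterating a piece more times than the number of parts yields a bar larger than the whole (for example, 6 iterations of a fourth give 3/2), which is one reason coordinating the two operations goes beyond a part-whole conception.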

At Level 1 (the easiest level), the customer order and the whole candy bar contained partitioning marks so that students could employ existing part-whole concepts to make sense of the task. At Levels 2 and 3, those candy bars were un-partitioned, requiring students to estimate the relative sizes of the bars, using partitioning and iterating operations with the game’s drag and drop interface. In essence, the [Math App] had an evidence-based worked example presented in a game format (Fig. 1).

Fig. 1 Screenshots of [Math App]

Participants and experiment

The initial participants in the study were 107 fifth grade students from five mathematics classes in an elementary school; seven students dropped out of the study, leaving 100 participants. The school was located in a remote rural area of Virginia in the U.S. where the majority of students were from families with low economic and educational status, with a per capita annual income of $21,816 (2009–2013) (US Census Quickfacts, http://quickfacts.census.gov/qfd/states/51/51063.html, accessed January 2016). Some of those students did not have Internet service or new technology, including digital games, at home. Most students were aged 10 to 11, but four students were age 12, and the gender ratio was roughly balanced. The five classes who participated in the study were taught by two teachers: three classes by one teacher and two by the other. The study used a quasi-experimental design because random assignment of individual students was not an option (see Fig. 2 for details). Two of the three classes from the first teacher and one of the two classes of the second teacher were selected randomly and assigned to the treatment condition (called the game condition), and the remaining two classes were assigned to the control condition. Among the 100 students, 64 had an opportunity to participate in the treatment and assessments of the study. In the game condition, students played the game regularly and consecutively to understand the game fully and to have an opportunity to use their understanding in their mathematics class. Thirty-five female and 29 male students in the game condition played the [Math App] for 20 min a day for ten school days. While playing the [Math App], students were expected to acquire the basic fraction concepts at the first level for 3 days, the intermediate concepts at the second level for 3 days, and the advanced concepts at the third level for 4 days, for a total of approximately 200 min of gameplay.
In the control condition, 18 female and 18 male students completed paper-and-pencil exercises for 20 min a day for ten school days and had two sessions of playing the [Math App] to gain experience with the game. Therefore, students in the control condition were exposed to the game for 40 min to get accustomed to it.
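The exposure times and group sizes reported above follow from simple arithmetic, which the following sketch makes explicit. The variable names are ours, and the sketch assumes that each control-condition game session lasted the same 20 min as a game-condition session, which is what the reported 40-min total implies.

```python
MINUTES_PER_SESSION = 20  # one 20-min session per school day

# Game condition: school days spent at each level, as reported in the text.
game_days_per_level = {"level_1": 3, "level_2": 3, "level_3": 4}
game_sessions = sum(game_days_per_level.values())
game_minutes = game_sessions * MINUTES_PER_SESSION

# Control condition: two familiarization sessions with the game
# (assumed here to be 20 min each).
control_sessions = 2
control_minutes = control_sessions * MINUTES_PER_SESSION

# Group sizes by gender, as reported in the text.
group_sizes = {
    "game": {"female": 35, "male": 29},
    "control": {"female": 18, "male": 18},
}
group_totals = {name: sum(counts.values()) for name, counts in group_sizes.items()}
analytic_sample = sum(group_totals.values())

initial_participants = 107
dropouts = 7

print(game_sessions, game_minutes, control_minutes)
print(group_totals, analytic_sample)
```

The totals agree with the reported design: ten game sessions, 200 versus 40 min of gameplay, and 64 plus 36 students making up the 100 who remained after seven of the initial 107 dropped out.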

Fig. 2 Research design

Data on students’ demographic information, perception on game learning ability, and mathematics performance were collected 1 week before students played the [Math App]. One week after students completed game sessions, data on students’ game engagement, game features, and mathematics performance were collected.

Game engagement instrument

This study used a game engagement instrument to assess how students were engaged in the [Math App]. The instrument was developed through several iterations of comprehensive literature review, expert review, and pilot testing, following Messick’s framework (1995) and addressing the content, structural, and substantive aspects of validity. To build content validity, we conducted a comprehensive literature review to ensure that our items properly represent the body of knowledge on game engagement, and our items were reviewed by multiple experts in psychology, instructional design and technology, and educational measurement. We sought to address structural validity by developing our instrument from an internal model with three subcomponents consistent with the tripartite conceptualization of engagement (Fredricks et al. 2004). After developing the game engagement instrument with three subcomponents and their respective items, we conducted a series of pilot tests to confirm that students understood the items as intended, thus addressing substantive validity.

The initial instrument contained three subdomains of behavioral, emotional, and cognitive engagement, which were constructed on the basis of theoretical backgrounds of engagement: the behavioral domain was intended to include attributes of attention, participation, diligence, persistence, on-task activities, and interaction; the emotional domain was to include emotional states of interest, happiness, desire, satisfaction, and immersion in activities; and the cognitive domain was to include traits like concentration, the use of superficial and high-level strategies, and mental processes of planning, monitoring, and regulating. After theoretical consideration, all the items were fine-tuned so that elementary and middle school students could understand them. The example items of behavioral engagement were as follows: “I worked through [Math App] by finishing each step.” “I completed [Math App] following instructions.” “I did my best while playing [Math App].” “I followed the instructions of [Math App].” The example items of emotional engagement were as follows: “I enjoyed solving math problems using [Math App].” “I felt disappointed when I had to stop playing [Math App].” “I became bored when I played [Math App].” “I liked to play new levels of [Math App].” The example items of cognitive engagement were as follows: “I was fully concentrated on [Math App].” “I could see what [Math App] tried to teach.” “I tried to link [Math App] to other topics in math.” “I used my strategies while playing [Math App].”

The game engagement instrument consists of 30 items that address the three domains. In responding to the items, which were written in statement form, students chose one of four options ranging from 1 (Strongly disagree) to 4 (Strongly agree). In this study, all items were initially specified in a three-subdomain measurement model to validate them. After several iterations of analyses, the analysis converged on a two-subdomain model with 12 items; the detailed outcomes are reported in the results section.
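For readers unfamiliar with 4-point Likert responses, the sketch below shows one common way to turn such responses into a descriptive subscale score. It is an illustration only: the study itself models the items as indicators of latent variables rather than averaging them, the item keys are hypothetical labels for the example emotional items, and reverse-scoring the negatively worded "bored" item is our assumption, not a procedure stated in the text.

```python
SCALE_MIN, SCALE_MAX = 1, 4  # 1 = Strongly disagree ... 4 = Strongly agree


def reverse_score(response: int) -> int:
    """Flip a response on the 1-4 scale (1 <-> 4, 2 <-> 3)."""
    return SCALE_MIN + SCALE_MAX - response


def subscale_mean(responses, items, reversed_items=frozenset()):
    """Mean of the listed items, reverse-scoring any negatively worded ones."""
    scores = []
    for item in items:
        value = responses[item]
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{item}: response {value} is outside the scale")
        scores.append(reverse_score(value) if item in reversed_items else value)
    return sum(scores) / len(scores)


# Hypothetical responses from one student to the four example emotional items.
student = {
    "emo_enjoyed": 4,
    "emo_disappointed_to_stop": 3,
    "emo_bored": 1,  # negatively worded; reverse-scored under our assumption
    "emo_liked_new_levels": 4,
}
emotional_items = list(student)
score = subscale_mean(student, emotional_items, reversed_items={"emo_bored"})
print(score)  # prints 3.75
```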

Game feature assessment

During the research design stage, we also developed a game feature instrument to collect data on students’ perceptions and evaluation of our [Math App]. Game feature items were adapted from existing surveys (Fu et al. 2010; Sweetser and Wyeth 2005). The original scales consist of various heuristics on usability, playability, instructional issues, game mechanics, and attributes. In this research, we focused on only three game features (goals, feedback, and challenge) because they provide important information relevant to our [Math App] in middle school classrooms. The initial survey consisted of 13 items in three domains, and the final survey after refinement consists of 10 items: three items for goals, three items for feedback, and four items for challenge. The items were rated on a 4-point Likert scale (1 = Strongly disagree, 2 = Somewhat disagree, 3 = Somewhat agree, and 4 = Strongly agree) (Table 1).

Table 1 Game feature survey

Instrument assessing mathematics performance

This study used another set of items to assess students’ mathematics performance, particularly their fraction proficiency before (PreMath) and after (PostMath) playing the [Math App]. The mathematics test contained multiple-choice questions assessing students’ understanding of equivalent fractions and of multiplying, dividing, and comparing fractions. The mathematics performance instrument contained a total of 15 questions: four were previously released Standards of Learning (SOL) questions, and 11 were SOL-similar questions developed by the research team. The instrument showed defensible reliability (α = 0.849). The PreMath score was an important contextual factor and was included in the model to control for students’ ability prior to gameplay. PostMath scores ranged from 0 to 15, with a mean of 10.11.
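For readers unfamiliar with the reliability statistic, the sketch below shows how an α coefficient can be computed from item-level scores, assuming the reported α is Cronbach's alpha (the text does not name the coefficient explicitly). The answer matrix is hypothetical and is not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix: one row per student, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across students.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 (incorrect/correct) answers: 5 students x 4 items.
answers = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 3))
```

The coefficient rises as the items covary more strongly relative to their individual variances, which is why it is commonly read as an index of internal consistency.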

Students’ background variables

As an important factor affecting students’ game engagement, the study also examined students’ perceptions of their ability to learn games, using one item: “I learn digital math games fast.” This item was rated on a 4-point Likert-type scale ranging from 1 (strongly disagree) to 4 (strongly agree). The study model also included students’ gender (0 = male and 1 = female).

Analysis

This study first examined background information on the variables using descriptive statistics and correlation analyses. As the main statistical analysis, the study built a series of structural equation models (SEM) using SmartPLS 3 (Ringle et al. 2015). Using partial least squares (PLS) estimation, SmartPLS produces comparatively reliable and less biased results even with small samples (Hair et al. 2016) because PLS is a non-parametric estimation method that does not require the normality assumption. Beginning with measurement models, we developed a full SEM model that integrated a measurement model and a structural model (Schumaker and Lomax 2004). We investigated the validity of the engagement variables as latent variables using the measurement model and examined the structural relationships among the exogenous variables (perception of the ability to learn games, gender, and prior math ability), the mediator variables (subdomains of engagement and game features), and the endogenous variable (math performance). To investigate the factor structure of engagement, we first built SEM models that contained three factors of engagement (3-Factor SEM) and then developed SEM models with two factors of engagement (2-Factor SEM), considering the multi-dimensional nature of engagement constructs. In the final model (2-Factor Full SEM), we specified the variables of student-perceived gaming ability, gender, prior math ability, two subdomains of engagement, game features, and math performance.

Results

The [Math App], game engagement, and mathematics performance

As presented in Table 2, a total of 100 fifth graders participated in this study, with 64 students in the longer gameplay group and 36 students in the shorter gameplay group. Although the two groups differed in size, the homogeneity assumption, an important assumption of the analysis, was not violated. The mean game engagement score was 3.37 for the behavioral subdomain and 3.06 for the combined emotional and cognitive subdomain.

Table 2 Bivariate correlation and descriptive statistics

Students in the longer gameplay group demonstrated slightly higher behavioral engagement (M = 3.39) than those in the shorter gameplay group (M = 3.32), as well as slightly higher combined emotional/cognitive engagement (M = 3.14 vs. M = 2.86). However, the differences were not significant for either behavioral (t = −.65, p > .05) or combined emotional/cognitive engagement (t = −1.91, p > .05), as shown in Table 3.
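The group comparison above can be sketched as an independent-samples t test. This is a minimal illustration that assumes the pooled-variance (Student) form, which is plausible given that the homogeneity assumption held, and a shorter-minus-longer ordering, which would produce the negative signs reported. The scores are hypothetical, and the function returns only the statistic and degrees of freedom; a p value would require a t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's t statistic (group_a minus group_b) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its df.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical engagement scores on the 1-4 scale (not the study's data).
shorter = [3.0, 3.5, 2.5, 3.25, 3.75]
longer = [3.5, 3.0, 3.75, 3.25, 4.0, 3.5]
t_stat, df = pooled_t(shorter, longer)
print(f"t({df}) = {t_stat:.2f}")
```

Because the pooled form assumes equal population variances, an unequal-variance (Welch) test would be the usual alternative if the homogeneity assumption had been violated.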

Table 3 Frequencies and t test results

As shown in Table 3, we also examined the effect of the [Math App] on mathematics performance. Descriptive statistics showed that the average mathematics score of all students was 10.11 out of a total of 15. The mathematics performance of the shorter and longer gameplay groups did not differ significantly (t = .29, p > .05), with averages of 10.25 and 10.05, respectively. Mathematics performance did not show any significant relationship with behavioral engagement (r = .09, p > .05) or combined emotional/cognitive engagement (r = .04, p > .05). The behavioral engagement of male students (M = 3.39) was slightly higher than that of female students (M = 3.36); however, the difference was not significant (t = .30, p > .05). The same pattern between male (M = 3.12) and female students (M = 3.00) was detected for combined emotional/cognitive engagement (t = .92, p > .05).

3-factor and 2-factor SEM models

The 3-Factor SEM model included three subdomains of engagement: behavioral, emotional, and cognitive. Figure 3 shows the factor loadings and path coefficients for the 3-factor SEM model. Nine items loaded on the behavioral engagement factor, with factor loadings ranging from .89 to 1.30. Two items loaded on the emotional engagement factor, with factor loadings of .98 and 1.02, respectively. Two items loaded on the cognitive engagement factor, with factor loadings of .85 and 1.07.

Fig. 3 3-factor model

The 2-factor SEM model contained two subdomains of engagement: behavioral engagement and combined emotional/cognitive engagement. Figure 4 shows the factor loadings and path coefficients for the 2-factor SEM model. For behavioral engagement, the factor loadings of the nine items ranged from .89 to 1.30. For emotional/cognitive engagement, four items loaded on the factor, with loadings ranging from .89 to 1.18.

Fig. 4 2-factor model

Full SEM with two factors of engagement: 2-factor Full SEM

The hypothesized 2-factor Full SEM model was tested with two subdomains of engagement, one game feature factor, students’ perception of their gaming competence, gender, gameplay time, prior math performance, and post math performance. The results indicated an acceptable fit of the model to the data, χ² = 2978.19, NFI = .80, and SRMR = .02. An NFI of .80 is considered a fair fit (Smith and McMillan 2001), and an SRMR value of .05 or less is considered an adequate fit (MacCallum et al. 1996), collectively suggesting that the fit of the 2-factor SEM model to the data was acceptable.
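To make the two fit indices concrete, the sketch below implements their textbook definitions: SRMR as the root mean square of the residuals between observed and model-implied correlations, and NFI as the proportional reduction in χ² relative to a null model. It is an illustration under those assumed definitions, not a reproduction of how SmartPLS computes them, and the matrices and χ² values are hypothetical.

```python
from math import sqrt


def srmr(observed, implied):
    """Standardized root mean square residual for two correlation matrices.

    Averages squared residuals over the lower triangle, diagonal included.
    """
    p = len(observed)
    squared = [
        (observed[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    ]
    return sqrt(sum(squared) / len(squared))


def nfi(chi2_model, chi2_null):
    """Normed fit index: proportional improvement over the null model."""
    return (chi2_null - chi2_model) / chi2_null


# Hypothetical observed and model-implied correlation matrices.
observed = [[1.0, 0.5], [0.5, 1.0]]
implied = [[1.0, 0.3], [0.3, 1.0]]
print(round(srmr(observed, implied), 3))

# Hypothetical chi-square values for a fitted model and its null model.
print(round(nfi(20.0, 100.0), 2))
```

Smaller SRMR values indicate that the model reproduces the observed correlations more closely, while NFI values closer to 1 indicate a larger improvement over the null model.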

The 2-factor SEM included two factors of engagement and one factor of game features, as shown in Fig. 5. All the factor loadings (lambda y) were significant (p < .01), ranging from .90 to 1.00. Specifically, nine items loaded on the behavioral engagement factor, with factor loadings ranging from .99 to 1.00. Four items loaded on the emotional/cognitive engagement factor, with loadings ranging from .95 to .98. Ten items loaded on the game feature factor, with factor loadings ranging from .90 to 1.00.

Fig. 5 2-factor full SEM model

In addition to these three factors, the 2-factor SEM included the following variables: student gender, perceived gaming ability, gameplay time, prior math achievement, and post math achievement. According to the path coefficients, three paths were significant: gaming ability to Behavioral Engagement (β direct = .54, p < .01); gaming ability to Emotional and Cognitive Engagement (β direct = .54, p < .01); and gaming ability to Game Feature (β direct = .53, p < .01). Table 4 shows the indirect and total effects. Among the indirect and total effects, the total effects of the same three paths were significant: gaming ability to Behavioral Engagement (β total = .54, p < .01); gaming ability to Emotional and Cognitive Engagement (β total = .54, p < .01); and gaming ability to Game Feature (β total = .54, p < .01). These findings on the total effects were aligned with those on the direct effects.
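The relationship between direct, indirect, and total effects can be sketched as follows, assuming the standard path-analysis decomposition in which an indirect effect is the product of the coefficients along a mediated route and the total effect is the direct effect plus all indirect effects. The function names and the coefficients in the example are hypothetical and are not estimates from Table 4.

```python
from math import prod


def indirect_effect(*path_coefficients):
    """Product of the coefficients along one mediated route."""
    return prod(path_coefficients)


def total_effect(direct, routes):
    """Direct effect plus the sum of all indirect (mediated) effects."""
    return direct + sum(indirect_effect(*route) for route in routes)


# Hypothetical coefficients: X -> Y directly (0.30) and X -> M -> Y (0.50, 0.40).
example_total = total_effect(0.30, [(0.50, 0.40)])
print(round(example_total, 2))
```

Under this decomposition, a total effect that equals the direct effect implies a negligible indirect component, which is the pattern one would expect when the reported direct and total coefficients coincide.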

Table 4 Parameter estimates from 2-factor full SEM model

These path coefficients indicated that students who perceived that they learned digital math games fast tended to show a higher degree of behavioral game engagement as well as combined emotional and cognitive engagement. These students also tended to appreciate the design features of the [Math App] more.

Discussion

Game engagement

The main purpose of the study was to explore students’ engagement states during educational gameplay, as differentiated from long-lasting engagement. Using a series of measurement models, we validated two subdomains of game engagement (behavioral and emotional/cognitive) with good fit statistics. In particular, behavioral engagement emerged as a solid construct: all of its indicators loaded on the construct with significant loadings. However, the other construct, the combination of emotional and cognitive engagement, did not perform as well as hypothesized and converged with only a small number of indicators. We attributed these results to the nature of game engagement, which is a short-lived state of engagement. Moreover, the behavioral engagement items asked students directly about their behaviors with the [Math App], which seemed to make them easier to answer.

Gameplay time, math achievement, and gender

Using the game engagement instrument developed by the research team, we examined whether game engagement increased after playing the [Math App]. The analysis found no difference in game engagement between the shorter and longer gameplay conditions. We arranged 40 min of gameplay in the shorter condition so that students could approach the [Math App] and play it as a new game, but this condition may have been too short to build game engagement (Gentile 2011; Hoffman and Nadelson 2010; Sherry 2001). The 200 min of gameplay in the longer condition gave students enough time to become familiar with the game but may have been somewhat too long; as a result, students who became bored with the game might have become less engaged. We suggest that future studies measure game engagement using finer-grained time metrics and identify the optimal gameplay time by considering a potential curvilinear relationship between gameplay time and game engagement. An optimal gameplay time can prevent the frustration that results from having too little time to get accustomed to the rules of the game, while still giving students enough time to build game engagement before they start getting bored.

We also examined the effect of the [Math App] on students’ mathematics performance, particularly on the targeted pre-algebraic concepts. We did not find any significant difference between the two groups, which showed similar average scores in mathematics performance. Our results did not confirm previous studies (e.g., Howard-Jones and Demetriou 2009; Kebritchi et al. 2010) that showed a significant effect on students’ mathematics achievement in classrooms or school labs. We attributed the non-significant effect of our educational game on students’ performance to the students’ age level. The students in our study were 5th graders who had just started to learn pre-algebraic concepts and were still at too early a stage to show a full understanding of fractional concepts. As we noted in both treatment and control students, students’ grasp of fractional concepts was not fully demonstrated in the performance scores.

When we related game engagement to mathematics performance, we did not find any significant association, contrary to our prediction. In the conception stage, our research team hypothesized that the engagement state would influence academic performance, as many researchers of educational technology argue (Hull et al. 2013; Kim et al. 2008; Kneer and Glock 2013). However, this hypothesis was not confirmed in this study. We reasoned that this result could also be due to students’ low performance and ability level. When the majority of students show low performance in math class with low variability (a floor effect), the possible effect of game engagement on performance may be hidden.

In this study, we did not find any significant gender difference in game engagement, consistent with the findings of Bressler and Bodzin (2013). By exploring differences across the two subdomains of behavioral and combined cognitive/emotional engagement, our findings extend existing studies that focused on only one domain, particularly emotional engagement (Bressler and Bodzin 2013; Hoffman and Nadelson 2010). To help resolve the mixed results on gender differences (e.g., Hoffman and Nadelson 2010), we suggest exploring gender differences across different gameplay times, particularly in an optimal gameplay condition somewhere between our shorter and longer gameplay times.

Conclusion

To deepen the understanding of game engagement, we explored the predictive ability of perceived gaming competence and game features. Echoing the conclusion of Negini et al. (2014), we encourage educators to make informed decisions about designing digital games so that students’ perceived competence, particularly in-game competence, increases; students will then become more engaged in game-mediated learning and benefit more from digital games. Educators also need to pay special attention to students who might suffer from low perceived competence. For example, girls showed a significantly lower degree of competence than boys (Law et al. 2009). By assessing students’ perceived competence prior to gameplay, educators can identify at-risk students with low perceived competence. Educators then need to ensure that these students play the game at a level matched to their ability, experience in-game success frequently, and receive ample positive feedback during gameplay so that their in-game competence is enhanced, possibly resulting in heightened engagement. Importantly, educators need to make sure that heightened engagement results in learning by focusing on game design for meaningful learning, not only for game engagement.

As part of our study, under Messick’s (1995) framework, we sought to provide evidence of construct validity for the game engagement instrument in terms of its content, structural, and substantive aspects. We suggest that future studies address the generalizability aspect by investigating whether our findings on instrument features and interpretations are invariant across games (e.g., educational vs. non-educational), ages, subjects (e.g., science), and contexts.