1 Introduction

The notion that serious games are more than mere entertainment is a safe definition, but it does little to state the exact nature of serious games. A precise definition of serious games is elusive and varies with the application and the opinion of the person defining it (Breuer and Bente 2010; Susi et al. 2007). The earliest, and still widely used, definition states that serious games have a deliberate educational intent and are not meant to be engaged with for entertainment alone (Abt 1970). The learning intent of serious games rests on a constructivist rationale with the learner as its central cog (Cheng et al. 2014). Osterman (1998) proposes that constructivist pedagogical strategies should: (a) engage the learner; (b) provide opportunities to explore, articulate and represent knowledge; (c) challenge existing conceptual views and heighten awareness of problems; and (d) allow students to test the efficacy of new ideas. Gee (2005) complements these pedagogical strategies, stating that the learning principles of serious games include that: (a) nothing happens without player input; (b) earlier problems encourage the player to build hypotheses that can be applied to later problems; (c) lateral exploration and thinking allow players to reconceive their goals; and (d) a new class of problems is regularly thrown at players, restarting their cycle of mastery. All in all, learning in serious games means allowing learners to apply their current knowledge in a digital environment, with the aim of acquiring, of their own volition, a new skill set to overcome contextual challenges (Boyle et al. 2011).

This does not mean that serious games are only about activities that educate, instruct or train; rather, the addition of pedagogy is what sets them apart from entertainment games (Zyda 2005). The aesthetics component of the mechanics, dynamics and aesthetics (MDA) framework (Hunicke et al. 2004) encapsulates the fun element of games as a set of “desirable emotional responses evoked within the player” (Hunicke et al. 2004) when engaging with the game. In a game context, aesthetics is also sometimes used to denote the audio-visual (or realism) sensations encountered by players (Niedenthal 2009). The MDA framework also stresses the importance of mechanics (the rules and actions through which players interact with the game) and dynamics (the behavior that emerges at run time as those mechanics respond to player input).

The challenge facing serious games, though, is to find a balance between ludic goals and skills or knowledge transfer goals, so that neither a dominant game mode (detracting from the learning outcomes) nor a dominant learning mode (removing the fun element) emerges (Giessen 2015). Not all researchers agree that an equal balance should be struck. Zyda (2005) argues that pedagogy should be subordinate to story and entertainment, while researchers such as Michael and Chen (2006) argue that educating the player should be the primary goal of serious games. Realism (or fidelity) is a further source of confusion. Research shows that fidelity levels and knowledge transfer are not necessarily positively correlated (Feinstein and Cannon 2002). Norman et al. (2012) corroborate this in their review, finding no significant learning advantage of high-fidelity simulators over low-fidelity simulators. The conundrum, however, is that exposure to high-fidelity entertainment games leads target players to ask for serious games to be as audio-visually realistic as possible (Visschedijk and Van der Hulst 2012). We could find no literature establishing the appropriate level of fidelity for a serious game. Owing to the unacceptably low reliability with which fidelity could be operationalized, Cook et al. (2013) went so far as to drop the fidelity variable from their comprehensive review of instructional design features for simulation-based education. Moreover, further literature searches revealed that many of the reported design recommendations, development guidelines and assessments of serious games are circumstantial, being based on single games within a specific application field. The field of serious games has been accused of being disorganized and fragmented across disciplines and geographies (De Freitas and Ketelhut 2014).
Undoubtedly, the vast application field of serious games plays a role in this fragmentation, but we contend that it should not be made the scapegoat for the fact that “quality games designed specifically to promote learning are hard to find” (Mcmahon and Henderson 2011).

Yet the use of computer games to foster learning has steadily found favor among government policy makers, health professionals, advertisers, training practitioners and educators (Connolly et al. 2012). Many researchers (Hyungsup 2014; Mortara et al. 2014; Sacfung et al. 2014; Wiemeyer 2010) also report on games that teach a diverse range of fields and subjects, including sports, resilience among the elderly, fire evacuation and cultural heritage. The rising popularity of serious games and their growing number of application fields make discovering the success factors that enhance learning with serious games a worthwhile endeavor.

To meet the objectives of this article, we define serious games as digital games that have an intended impact on cognition, behavior or motor skills. We refer to success factors as those factors that positively affect the learning intention of serious games. These success factors may also be taken as practical guidelines that address the economic considerations commercial serious game producers face. To concretize our view of what constitutes a game, we followed Prensky’s (2001) 12 elements that make games engaging, thereby excluding (among others) pure simulators, graphic novels and intelligent tutoring systems from this systematic review. We are also cognizant of the long-standing debate surrounding the term serious games and have noted that it is starting to fall out of favor among the field’s authors. The founding chairman of Games for Health Europe, Mr J van Rijswijk, suggested the term applied games as a replacement for the aging term serious games during the 2015 Games for Health Finland seminar. Identifying an appropriate replacement term is beyond the scope of this article, and we have therefore elected to retain this long-standing, and admittedly oxymoronic, term while it remains familiar to academic authors.

2 Purpose of the article

This article aims to uncover the success factors that enhance learning with serious games by performing a systematic review of prior quality studies. Producing serious games is expensive and time-consuming, and much care needs to be taken to ensure that they are instrumental to learning while remaining aesthetically pleasing. A set of practical production guidelines would contribute to the increased delivery of quality serious games. We aim to address the question: What practical guidelines can serious game producers incorporate to promote successful learning with games? Consolidating the success factors of serious games from a widely distributed literature and conflicting opinions will bring a much-needed measure of consistency to the field.

Previous summative work in the field of serious games either: (a) mentions select serious game requirements only perfunctorily (Dondlinger 2007; Van Eck 2006a); (b) focuses on the empirical effectiveness of serious games (Sitzmann 2011; Vandercruysse et al. 2012; Wouters et al. 2013); (c) concentrates on the psychological aspects that cause players to learn from serious games (Dondlinger 2007); (d) lists and compares games within a specific field without an in-depth analysis or discussion of the game elements (Bhoopathi et al. 2007; Rodriguez et al. 2014); or (e) deliberates on implementation strategies for serious games (Blakely et al. 2009; Webster and Celik 2014). We do not downplay the significance of each of these reviews (and others) in the field of serious games, but a comprehensive list and clarification of the factors that make learning with serious games successful is evidently not available. Our systematic literature review on successful learning with serious games addresses this gap and provides such a list as a set of practical serious game design guidelines.

This article is neither an exposé on models (Minovic et al. 2011; Peters and Vissers 2004; Sherry 2013), frameworks (Echeverría et al. 2011; Moseley et al. 2014; Westera et al. 2008), theoretical orientation (Kriz and Hense 2006) and other fundamental serious games research, nor a summary of existing literature reviews on similar constructs (Bedwell et al. 2012; Hainey 2007). Rather, it examines and extracts the practical matters that researchers who build serious games have encountered. Although the crux of this article is to assist serious game producers with an assimilated set of practical success factors for enhanced learning through games, we also unavoidably touch on engagement (Whitton 2011), motivation (Garris et al. 2002), gamer demographics (Whitton 2013) and other foundational aspects of serious games.

3 Method

3.1 Databases searched and search terms

This study searched digital databases relevant to computer science, information science and technology, health sciences (the most prominent sector for serious games), education and social sciences. The following databases were searched: Web of Science core collection, Science Direct, EBSCO (comprising ERIC (Education Resources Information Center), Applied Science and Technology Source, MasterFILE Premier, PsycINFO, SocINDEX, Academic Search Premier, Business Source Premier, CINAHL, E-Journals, MEDLINE, PsycARTICLES and Teacher Reference Center), Electronic Journal Services, JSTOR and Scopus. The search covered the period 2000 to 2015 and was limited to academic journal articles.

After consultation with librarians of the North-West University, we chose to employ broad search terms. After several test searches that included secondary terms, we were of the opinion that too few hits were returned for the time period and number of databases in question, which raised concern over the possible omission of pertinent work. The following search terms were used: serious games; games-based learning; simulation games; gamification; edutainment; educational games; and games for learning. Advergames were excluded because this review is not aimed at games that create awareness, but at studies targeting (a) knowledge acquisition; (b) behavior change; (c) affective impact; (d) motor skills; and (e) other skills attainment. Probing the 17 databases with these terms resulted in 119 searches with a total of 30 070 hits. All hits were exported to ENDNOTE™ X7 (an electronic reference manager) and stored in 119 distinct subgroups, one per search term per database. A series of publication removal steps reduced the number of hits to 1 232 articles.
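The figure of 119 searches follows from pairing each of the seven search terms with each of the 17 databases. The sketch below makes that arithmetic explicit. It assumes (our reading; the text does not spell it out) that the 17 databases are the five stand-alone services plus the 12 databases accessed through EBSCO.

```python
from itertools import product

# The seven search terms listed in Section 3.1
SEARCH_TERMS = [
    "serious games", "games-based learning", "simulation games",
    "gamification", "edutainment", "educational games", "games for learning",
]

# Assumption: the 17 databases are the five stand-alone services
# plus the 12 databases searched through EBSCO
DATABASES = [
    "Web of Science core collection", "Science Direct",
    "Electronic Journal Services", "JSTOR", "Scopus",
    # EBSCO constituents
    "ERIC", "Applied Science and Technology Source", "MasterFILE Premier",
    "PsycINFO", "SocINDEX", "Academic Search Premier",
    "Business Source Premier", "CINAHL", "E-Journals", "MEDLINE",
    "PsycARTICLES", "Teacher Reference Center",
]

# One search, and one reference-manager subgroup, per (term, database) pair
searches = list(product(SEARCH_TERMS, DATABASES))

print(len(SEARCH_TERMS), "x", len(DATABASES), "=", len(searches))  # 7 x 17 = 119
```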

To further reduce the number of hits (in order to qualitatively analyze a realistic number of articles), we decided to focus only on the most prominent authors among the remaining hits. A list of authors from the remaining hits was drawn up, and all authors occurring three or more times were confirmed as prominent through their authorship analytics in the Scopus database. We admit that this was somewhat restrictive and could have led to the omission of pertinent work. However, while analyzing the Scopus author analytics, we appended some articles by authors with fewer than three entries in our database to the list for the ensuing inclusion/exclusion process, owing to those authors’ prominence in Scopus. After this step, 397 articles remained.
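The first part of this filter, flagging authors who occur three or more times, amounts to a frequency count over author lists. In the sketch below the author names are hypothetical placeholders, and the subsequent manual steps (confirmation against Scopus author analytics and re-inclusion of some less frequent but prominent authors) are deliberately not modeled.

```python
from collections import Counter

MIN_OCCURRENCES = 3  # threshold described in Section 3.1

# Hypothetical author lists, one per retained article
records = [
    ["Author A", "Author B"],
    ["Author A", "Author C"],
    ["Author A"],
    ["Author B", "Author D"],
    ["Author B"],
]

def candidate_authors(records, threshold=MIN_OCCURRENCES):
    """Authors appearing in at least `threshold` articles (counted once per article)."""
    counts = Counter(author for authors in records for author in set(authors))
    return {author for author, n in counts.items() if n >= threshold}

print(sorted(candidate_authors(records)))  # ['Author A', 'Author B']
```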

3.2 Selection of articles for inclusion

Three researchers scrutinized each article’s title, abstract and conclusion to determine which articles could be excluded as irrelevant to the research question. Further conditions then determined which studies were appropriate for the review. The selected articles: (a) concerned digital games with interactive gameplay; (b) had a positive learning impact; (c) contained a description of, or user feedback on, the game; (d) did not evaluate entertainment games for learning potential; and (e) did not discuss EXCEL™-, ACCESS™- or POWERPOINT™-based games. Applying these inclusion/exclusion criteria, we found 72 articles relevant for this review. During the coding procedure, a further nine articles were removed for reasons including (a) no impact being measured; (b) articles being too similar to others already included; or (c) discussion of a framework as opposed to a game analysis. In total, 63 articles underwent in-depth analysis.
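The successive reductions reported in Sections 3.1 and 3.2 can be tallied as a simple selection funnel. The sketch below only restates the reported counts and computes how many records each stage removed; the stage labels are our own shorthand.

```python
# Record counts after each stage, as reported in Sections 3.1 and 3.2
funnel = [
    ("database hits", 30070),
    ("after publication removal", 1232),
    ("after author filter (incl. Scopus additions)", 397),
    ("after inclusion/exclusion criteria", 72),
    ("after removals during coding", 63),
]

# Number of records removed between consecutive stages
removed = [prev_n - n for (_, prev_n), (_, n) in zip(funnel, funnel[1:])]

for (label, n), r in zip(funnel[1:], removed):
    print(f"{label}: {n} remaining ({r} removed)")
```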

3.3 Coding and synthesis procedure

This review used ATLAS.TI™, computer-assisted qualitative data analysis software, to amalgamate all the natural-language (qualitative) data from the pool of collected literature. Themes and categories were identified a priori during the review’s scope determination phase, providing a basis for the coding process. As the coding progressed, categories were added, updated or deleted, whereas the pre-identified themes remained unchanged. A partially grounded theory approach combined with the constant comparison method (CCM) (Boeije 2002) enabled us to code the data inductively. The CCM further enabled us to refine the network view by categorizing, coding, delineating categories and connecting them as a first instance of serious game success factors (Boeije 2002). Table 1 gives a brief overview of the coding and synthesis steps following the initial theme and category identification.

Table 1 Steps of the constant comparative analysis process this review followed as adapted from Boeije (2002)

Abiding by the three steps outlined above led to the clustering of serious game categories around five definitive themes. The backstory and production theme pivots around the narrative of the game and how players progress through the storyline. Its production elements highlight the aspects serious game producers should consider regarding design, the development techniques at their disposal and the weaving of learning content into the narrative. Fidelity in all its guises (physical, functional, psychological and interaction), used to present learning in an immersive, user-appealing way, is the sole focus of the realism theme. Artificial intelligence (AI) and adaptivity are not discussed at the programmatic or algorithmic level; rather, we extract which AI (and associated progress-tracking) techniques bring about improved fun and learning experiences. The interaction theme primarily describes the mechanics (for the purpose of this review, mechanics refers to the action-reaction feedback loop players experience with game objects) and other physical properties mediating the engagement between players and the learning content while stimulating their enjoyment. The final theme, feedback and debriefing, gives a deeper account of the communication afforded to players, both in-game and from their environment, in response to their gameplay activities. It also points out the learning approaches and support that should be present for a motivating learning experience, and pays particular attention to reflective and debriefing practices.

Several measures were put in place to ensure the validity and reliability of the data extraction method and to avoid bias. The themes and categories used for data extraction were derived from existing literature and corroborated by serious game experts. A single author coded two articles on separate occasions (a week apart), yielding moderate intra-rater reliability (Cohen’s kappa = 0.78). Two researchers independently coded randomly selected studies, showing strong inter-rater reliability (Cohen’s kappa = 0.83). We applied the Cohen’s kappa interpretation scale proposed by McHugh (2012). Given that research of rich analytical value with a reliability level within or somewhat below the 0.8–0.9 range may proceed (Riffe et al. 1998), we confidently went ahead with our review. Where the same data appeared in duplicate publications, only the most complete article was included (Kitchenham 2007). Table 2 presents the 63 articles that underwent analysis and captures each article’s data that led to the formation of the five themes. Where one article discussed multiple games, each game’s distinct properties were recorded separately. We also consolidated multiple articles discussing the same game into a single table entry.
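Cohen’s kappa corrects the raw agreement between two coders, p_o, for the agreement expected by chance, p_e: κ = (p_o − p_e)/(1 − p_e). The Python sketch below computes it for two coders’ category labels and maps the result onto the bands of McHugh’s (2012) scale (0.60–0.79 moderate, 0.80–0.90 strong). The article does not state which tool produced its kappa values, and the theme codes in the usage example are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders assigning nominal categories to the same items."""
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder1)
    # Observed agreement: share of items given the same category by both coders
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement: product of each coder's marginal proportions per category
    c1, c2 = Counter(coder1), Counter(coder2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

def mchugh_level(kappa):
    """Level of agreement according to McHugh's (2012) interpretation scale."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    if kappa >= 0.21:
        return "minimal"
    return "none"

# Hypothetical theme codes assigned by two coders to ten text segments
coder1 = ["story", "realism", "AI", "story", "feedback",
          "interaction", "story", "realism", "feedback", "AI"]
coder2 = ["story", "realism", "AI", "story", "feedback",
          "interaction", "realism", "realism", "feedback", "AI"]

k = cohens_kappa(coder1, coder2)
print(round(k, 2), mchugh_level(k))  # 0.87 strong
```

Applied to the values reported above, mchugh_level(0.78) returns "moderate" and mchugh_level(0.83) returns "strong", matching the labels used in the text.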

Table 2 Primary articles and extracted serious game success factors

4 Findings and discussion

From the initial content breakdown, the 63 articles under review (covering 55 unique games) addressed learning areas such as knowledge transfer, behavior change, affective impact and the acquisition of soft skills (e.g., communication, collaboration or managerial skills); no motor skill acquisition games were encountered. Most attention was given to knowledge transfer (28 games), followed by soft skills acquisition (15 games) and behavior change (8 games). Very few studies were dedicated to affective impact (4 games). Admittedly, a single serious game can have primary and secondary learning areas. For instance, a game may aim to effect behavior change toward diet and physical activity, but doing so requires transferring knowledge about healthy foods and exercise. We focused only on the primary aims of the reviewed games.

An explanation for the predominance of studies dealing with knowledge transfer may lie in the target audience and purpose of the serious games in the reviewed articles. We classified the target audience by their level of education at the time the games were researched. Only one game was aimed at a pre-primary audience, 18 games were implemented among primary (elementary and middle) school students, 17 games at high (secondary) schools and six games at undergraduate level, and the remaining 13 games were geared toward further professional development or training. The reason most often cited for the school games was to assist students in mastering subjects with a history of poor achievement.

Content analysis also revealed, unexpectedly given the relatively lower production cost of 2D games, that 2D and 3D games are almost equally common (18 and 22 games, respectively), with five games using a pseudo-3D (2.5D) environment and six games combining 2D and 3D styles within a single game. Two games were physical games that required no screen-based gameplay but relied instead on communication through mobile technology. The remaining two games are not clearly described and have no accompanying screenshots.
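Each of the three classifications above (primary learning area, target audience and visual style) partitions the same 55 games, so each set of counts should sum to 55. The short check below confirms that the reported figures are internally consistent and expresses them as percentages; it uses no data beyond the counts given in the text.

```python
UNIQUE_GAMES = 55

# Game counts per classification, as reported in Section 4
classifications = {
    "primary learning area": {"knowledge transfer": 28, "soft skills": 15,
                              "behavior change": 8, "affective impact": 4},
    "target audience": {"pre-primary": 1, "primary": 18, "high school": 17,
                        "undergraduate": 6, "professional": 13},
    "visual style": {"2D": 18, "3D": 22, "2.5D": 5, "mixed 2D/3D": 6,
                     "physical (mobile)": 2, "not described": 2},
}

for name, counts in classifications.items():
    total = sum(counts.values())
    shares = {k: f"{100 * v / total:.1f}%" for k, v in counts.items()}
    print(f"{name}: {total} games", shares)
```

Knowledge transfer, for instance, accounts for 28 of the 55 games, or about 51 %.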

The following sections synthesize the data from Table 2 into a comprehensive report of the most recurring points raised under each of the serious game themes: backstory and production; realism; AI and adaptivity; interaction; and feedback and debriefing. These in-depth discussions are followed by the limitations of the study, a conclusion presenting practical guidelines, and suggestions for future research.

4.1 Backstory and production

The backstory, or game narrative, refers to the storyline and game world that players encounter and, ideally, immerse themselves in as they play. More than this, the narrative is what players attempt to uncover to progress in the game. Whereas audio-visual techniques seize players’ attention, it is the story or plot that keeps them engaged (Couceiro et al. 2013) and motivated (Hämäläinen 2008) to return to the game once the novelty of captivating graphics and sound wears off. Ke (2008) observed that sensory stimuli become familiar and their attention-grabbing properties diminish. The importance of narrative for learning can therefore not be overlooked: if the game is being played, the player is engaging with the learning material. A good story, however, does not imply credible learning. Learning must be stealthily integrated into the appealing storyline (Ke and Abras 2013) without disrupting the player’s immersive experience. Players become frustrated, and as a result reluctant to play, when confronted by distinct breaks between learning and playing activities (Brom et al. 2010; Hwang et al. 2013a). This dilemma can be resolved by ensuring that the intended learning material (the real-life skill) is mirrored by the knowledge or skill required to progress in the game narrative (Cheng and Annetta 2012; Couceiro et al. 2013; Hämäläinen 2008; Ke 2008). For example, if a game’s intention is to teach safe laboratory practices, its narrative should only progress when the player displays the correct behavior within the game. Linking the game’s reward mechanic directly to the desired learning outcome further reinforces the intended impact (Kiili 2005).
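The laboratory-safety illustration can be made concrete with a small sketch in which a scene only advances when the player displays every required behavior, and points are awarded for exactly those behaviors. The class, scene and action names are hypothetical and do not come from any of the reviewed games.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    name: str
    required_actions: frozenset[str]  # correct lab behaviors the player must show

@dataclass
class LabSafetyStory:
    scenes: list[Scene]
    current: int = 0
    score: int = 0

    @property
    def finished(self) -> bool:
        return self.current >= len(self.scenes)

    def attempt(self, actions) -> bool:
        """Advance the plot only if every required behavior was displayed."""
        if self.finished:
            raise RuntimeError("the story is already complete")
        scene = self.scenes[self.current]
        if not scene.required_actions <= set(actions):
            return False  # the narrative stays put; the player must try again
        # The reward is tied directly to the target behaviors
        self.score += len(scene.required_actions)
        self.current += 1
        return True

story = LabSafetyStory([
    Scene("enter the lab", frozenset({"wear goggles", "wear lab coat"})),
    Scene("mix reagents", frozenset({"use fume hood"})),
])
```

Here story.attempt({"wear goggles"}) returns False and leaves the plot where it is, whereas adding "wear lab coat" advances to the next scene and earns two points.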

Players enjoy crafting their own narrative, not in the sense that they want to be the story writers, but in that they prefer not to be led through the game in a linear fashion. Linear gameplay entails progressing from one game scenario to another in a fixed sequence that remains unchanged with every restart of the game. Players are excited by the opportunity to explore and to mold their own narrative (Ke and Abras 2013; Kickmeier-Rust and Albert 2010; Verpoorten et al. 2014), another motivating feature that draws them back to the game. However, one study showed that learning style might determine the preferred type of progression. Hwang et al. (2012b) found significantly better learning results among students who played games personalized to their learning styles than among those who played games not matched to their learning styles. This places serious game makers in a predicament. Creating an exclusively open-ended or exclusively linear game is bound to alienate a subset of the game’s target audience, lowering its learning effectiveness, whereas serious games should aim to maximize learning. Having said this, and keeping the player’s desire for control in mind, serious game producers should give players the option to choose which progression version of the game they wish to play or, perhaps more elegantly, let the game adapt to the learning style of the player. True, this means that two alternate versions of the game are required, but through game design, costs can possibly be restricted to locking and unlocking scenarios, with minimal differentiation of game assets between the two versions. Torrente et al. (2014) express the value of this technique in the context of making a game playable by users with cognitive disabilities. We are inclined to believe that the perceived learning gain could outweigh the additional development cost of this lock-unlock technique.
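The lock-unlock idea can be sketched as a single scenario list shared by both versions of the game, with the progression mode deciding only which scenarios are playable. The scenario names are hypothetical, and how the mode is chosen (by the player, or from a learning-style profile) is left open, as in the text.

```python
from enum import Enum

class Progression(Enum):
    LINEAR = "linear"
    OPEN_ENDED = "open-ended"

# Hypothetical scenarios; both versions reuse the same assets
SCENARIOS = ["orientation", "market", "clinic", "laboratory", "finale"]

def unlocked(mode: Progression, completed: set[str]) -> list[str]:
    """Return the scenarios the player may currently enter."""
    if mode is Progression.OPEN_ENDED:
        return list(SCENARIOS)  # explore in any order
    playable = []
    for name in SCENARIOS:
        playable.append(name)  # completed scenarios stay replayable
        if name not in completed:
            break  # linear mode stops at the first unfinished scenario
    return playable
```

In this sketch only the unlocking rule differs between the two versions, which is the kind of low-cost differentiation the paragraph above has in mind.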

We have ascertained that narrative keeps players involved with the learning material (or content), but the storyline must be coupled closely with the content rather than merely presenting it attractively. The narrative must deliver a fitting context for the learning material (Bellotti et al. 2012; Cheng et al. 2014; Couceiro et al. 2013). It makes little sense to devise a story and plot centered on horse riding when the learning content involves dental hygiene. Such mismatches between story and content confuse players, add cognitive load, diminish immersion and curtail enjoyment. Ke (2008) found such exogenous fantasy less effective in promoting learning than storylines embedded in the context of the learning material. In addition, a number of studies advocate aligning serious game content with a curriculum or professional training material (Hong et al. 2013a; Papastergiou 2009a; Van Eck 2006b; Virvou and Katsionis 2008). This stands in contrast to research showing that analogy facilitates learning (Donnelly and McDaniel 1993; Gentner and Holyoak 1997; Mayo 2001). This review did not encounter any evidence for or against analogy-driven serious games. We nevertheless posit that extrinsic motivation beyond in-game rewards, progress metrics and status drives the will to play serious games as much as intrinsic motivation (personal interest in playing a game) does. For example, players would be more likely to engage in serious gameplay if they could earn extra course credits, and they could subsequently attribute improved formal assessment of their competencies to the skills and knowledge gained through gameplay.

Serious games, much more so than entertainment games, are aimed at very specific, well-defined target groups, each with its own preferences and requirements. Involving end users in the design and play-testing phases of production (Schmitz et al. 2015a), for both physical attributes and storyline, is highly valuable and recommended (Hwang et al. 2013b). Studies implementing user-collaborative design strategies (Arnab et al. 2013; Buttussi et al. 2013; Knight et al. 2010; Squire 2013) not only evidenced success in transferring knowledge or abilities, but also received affirmative feedback regarding gameplay satisfaction. Knight et al. (2010) claimed that involving users in the design process even before the first pilot trial would have saved considerable time during subsequent development-evaluation-refinement iterations. Several articles identify subject matter experts as another valuable addition to the production side of serious games (Arnab et al. 2013; Cheng et al. 2014; González-González et al. 2014; Schmitz et al. 2015a). Serious games, or games in general for that matter, cannot render every aspect of gameplay exactly as it would be in real life (something simulators strive to achieve), and therefore require some in-game abstraction. Implementing abstraction without harming the learning effect of the game requires expert understanding of the content. From a time perspective alone, and given the wide range of disciplines and topics applicable to serious games, developers should consult subject matter experts every time a game is made. Lastly, although a variety of teaching and learning approaches and theories are invoked, little mention is made, except by Hong et al. (2013b), of pedagogical experts ensuring their sound integration into the game.

4.2 Realism

Realism describes how closely a game replicates or resembles real life. Since an extensive discussion of fidelity is beyond the scope of this article, it suffices to define realism as the physical (graphical and auditory aspects), functional (simulation accuracy and non-player character (NPC) response) and psychological (noise/interference, emotional content and time pressure) dimensions of fidelity. We established fidelity preferences through careful consideration of the presented screens, qualitative (sometimes verbatim) reports of player feedback and descriptions offered by the authors. Triangulating the data within and between groups of articles led us to conclude that, from a pure gaming perspective, the higher the fidelity of the game (i.e., the more realistic the game), the higher the game appreciation. From a learning perspective, this statement becomes considerably less certain, as other authors claim that overly rich presentation of the context distracts students from in-game learning tasks (Ke and Abras 2013; Papastergiou 2009a; Virvou and Katsionis 2008). Kiili (2005) argues that players do not require audio-visually rich games and that, as long as the game achieves flow, it will be an effective learning tool. This directly contrasts with multiple studies showing game screenshots with high-polygon modeling (Chittaro and Buttussi 2015; Verpoorten et al. 2014) and describing detailed game sounds (Chittaro and Sioni 2015; González-González et al. 2014; Van der Spek et al. 2013). Some insist that accurate 3D modeling and realistic sound help players understand and experience the physical space (Baranowski et al. 2011; Byun and Loh 2015; Dickey 2011). All in all, this presents serious game makers with another hard-to-solve puzzle. Staying with the visuals encountered in the articles, yet another layer of fidelity complexity comes into play.
Physical fidelity appears to depend on the age (or possibly the learning level) of the target players. Games aimed at pre- and early-elementary school children clearly contained unsophisticated 2D graphics with distinctly more stylized artwork. Although we cannot accurately portray the evolution of the visuals (owing to the limited number of articles), the physical fidelity of games at high school level and beyond was decidedly more complex and realistic (most often in 3D). We therefore suggest that serious game designers take the age of their target audience into account before deciding on a serious game’s physical fidelity assets. More holistically, we advocate that serious game designers involve their playing audience from the outset when designing a game. With such narrow target audiences, designers may find very specific cultural nuances and environmental conditions that influence fidelity preferences. Notwithstanding age, culture or environment, we did find that players demand more sounds (Papastergiou 2009a), better graphics (Couceiro et al. 2013) and more credible and varied NPC responses (Johnson and Mayer 2010). In such instances, we suggest creating games that are high in realism without cluttering the game world with unnecessary objects. Annetta et al. (2009b) support this view by suggesting a realistic game-world presentation with less emphasis on animation, text and audio that do not aid learning. Bellotti et al. (2009) advocate a highly realistic game environment focused solely on the learning aspects. In this way, both the learning and gaming camps of serious games can rest assured that players will be eager to play without the risk of detracting from learning efficacy.

Players enjoy fashioning their own narrative, and this desire for control extends to the player character (PC), a frequent feature of successful serious games (Alamri et al. 2014; Bellotti et al. 2009; González-González et al. 2014; Schmitz et al. 2014; Soflano et al. 2015a). Creating their own avatars excites players (Brom et al. 2010; Ke and Abras 2013). Avatar personalization gives players an opportunity to create a unique reflection of themselves, their perceived selves or (most likely) their desired selves within the game (Gee 2005). All of this stimulates a sense of player relevance and game immersion (González-González and Blanco-Izquierdo 2012), in turn promoting motivation to play and thereby increasing engagement with the learning material. We are aware that offering highly sculpted avatars is a tall order for serious game developers owing to budgetary, resource and time constraints. With this in mind, we propose a three-tier hierarchy of avatar distinctiveness options: at the top is detailed avatar creation; second is avatar customization, which enables players to change ready-made characters in terms of clothing, hair color and ethnicity; and third is avatar selection, where players choose an avatar from a prepared range of characters. Players clearly want the option of a unique PC, and it is furthermore imperative that both male and female avatars be available so that players of both genders have an enhanced PC experience during gameplay (Couceiro et al. 2013).
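The proposed three-tier hierarchy maps naturally onto an ordered enumeration that a production team could use to scope avatar features against budget. In the sketch below, the feature lists and the assumption that each tier includes the features of the tiers beneath it are ours; the gender check reflects the requirement attributed to Couceiro et al. (2013).

```python
from enum import IntEnum

class AvatarTier(IntEnum):
    SELECTION = 1      # choose from a prepared range of characters
    CUSTOMIZATION = 2  # modify a ready-made character
    CREATION = 3       # detailed avatar creation

# Hypothetical feature sets per tier
TIER_FEATURES = {
    AvatarTier.SELECTION: ["choose preset character"],
    AvatarTier.CUSTOMIZATION: ["change clothing", "change hair color", "change ethnicity"],
    AvatarTier.CREATION: ["sculpt detailed avatar"],
}

def features(tier: AvatarTier) -> list[str]:
    """Features offered at `tier`, assuming higher tiers include lower ones."""
    return [f for t in AvatarTier if t <= tier for f in TIER_FEATURES[t]]

def check_gender_coverage(available: set[str]) -> None:
    """Raise if male and female avatars are not both available."""
    missing = {"male", "female"} - available
    if missing:
        raise ValueError(f"missing avatar options: {sorted(missing)}")
```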

Engaging with non-player characters (NPCs) is vital to serious games. NPCs are game companions and can take the guise of peer players (in multi-player games) or game-controlled characters whose roles vary between guiding gameplay, conveying or explaining learning material and providing humorous distractions. Given these roles, communication between players and NPCs is a key focus of functional fidelity. The primary modes of NPC-to-player communication are textual or vocal, while player-to-NPC communication remains textual. We did not encounter voice recognition for player-to-NPC communication during this review; only Torrente et al. (2014) discussed this technology at length, in the context of assisting players with limited mobility to control the PC’s movement. We therefore focus our discussion on NPC-to-player communication. Players prefer NPCs to interact with them through voice rather than text (Johnson and Mayer 2010), which is once more in line with the overarching sentiment of players wanting more realistic games. We reason that NPC voice communication is less flow-disruptive than text, predominantly owing to the time it takes to read text and the additional cognitive load this activity creates. From a learning perspective, players are accustomed to facial expression and intonation in traditional education settings and know how to interpret these social cues. They hence enjoy engaging with support that offers information and encouragement through social cues (Van Eck 2006b). Byun and Loh (2015) point out that voice-overs positively affect play-learner engagement and that text cannot replicate the intonation, pauses and emotion of spoken words. For this reason, Kuk et al. (2012) and Bellotti et al. (2009) insist that pedagogical NPCs in particular should have voice and emotional characteristics. If voice-over recordings cannot be produced, serious game producers should keep textual bursts to the point (Bellotti et al. 2012), using polite rather than direct (Johnson and Mayer 2010) natural language (González-González and Blanco-Izquierdo 2012). Nevertheless, authors agree that players enjoy interacting with voice-enabled NPCs (Annetta et al. 2009a; Virvou and Katsionis 2008).

Serious games are firmly grounded in constructivist learning theory (Cheng et al. 2014; Zin and Yue 2013), whose underlying assumptions include thoughtful reflection and a non-competitive environment (Jonassen 1994). That is, constructivist learning is about new environments that challenge the player’s existing knowledge framework and allow them time to explore and test this conflict. We therefore find it counter-intuitive to engage in explorative learning with time limits in place, and were startled to find that 36 % of the studies used time pressure as a form of psychological fidelity. Stepping away from the learning aspect and focusing on gameplay, we believe that competition may hold the key. Serious games producers do not deny that competition is one of the primary drivers of gameplay (Arnab et al. 2013; Hong et al. 2013b; Hwang et al. 2012a; Zin and Yue 2013). We take the stance that serious games are successful if they get played (granted that the pedagogic infusion is sound), and fun games get played. Therefore, if competition supports this goal, it should be included. From a learning perspective, however, competition among students is often discouraged. This leaves competing against the clock as an option that satisfies both learning and playing: time is an adversary that cannot be physically singled out among a group of learners, yet it remains a worthy opponent drawing players to the game. Furthermore, time pressure can offer an avenue for collaborative learning, an approach many authors highlight (Admiraal et al. 2014; Hämäläinen 2011; Kiili 2005; Schmitz et al. 2015b). Serious games producers must be equally aware that severe time pressure can be detrimental to cognitive learning (Hong et al. 2015).

4.3 AI and adaptivity

In our view, a disappointingly small proportion (21 %) of the reviewed articles employed what we broadly consider AI: a seemingly intelligent or unscripted game response to player activity (by this definition, for example, Super Mario Bros™ has no AI). AI influences serious games on two fronts: (a) adjustment within the game through agents; and (b) adjustment of the game itself by means of adaptivity. Game agents operate either as reflex or as goal-directed agents. Reflex agents are programmed to react to instantaneous player actions, while goal-directed agents continuously aim to steer the game toward their preset goal state (Russell and Norvig 2014); no article was found that employed goal-directed agents. Adaptivity does not respond to a single action, but rather constructs user profiles for matching game presentation with player characteristics (Cheng et al. 2015; González-González et al. 2014).

Player progress tracking, whether by recording player activity in a database or by raising programmatic flags, is the core of both reflex agents and adaptivity (Soflano et al. 2015a). Such flags include: (a) taking too long to answer a question; (b) repeated mistakes; or (c) aimless roaming of the game world. Reflex agents respond to these triggers by informing the player, intervening when misconceptions occur, giving hints or providing appropriate feedback. Agents often act in the guise of an NPC, making their intervention both unobtrusive and timely (Bellotti et al. 2009). The unobtrusive nature of the intervention sustains player immersion (Kickmeier-Rust and Albert 2010), while its timeliness (receiving help when it is needed) has clear formative learning benefits. Kuk et al. (2012) observed students giving more correct answers after agent interventions. Natural NPC reactions (Barab et al. 2012) and tailored responses (Thompson et al. 2010) from reflex agents deliver a greater sense of enjoyment among players.
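The flag-and-response loop described above can be sketched in Python. This is a minimal illustration, not code from any of the reviewed games: the thresholds, the `PlayerState` fields and the NPC lines are hypothetical placeholders that would in practice be tuned through play-testing. It shows the defining trait of a reflex agent, namely that each raised flag maps directly to an intervention without any goal state, and it keeps the interventions in a list that could later feed debriefing.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; real values would come from play-testing.
ANSWER_TIME_LIMIT_S = 60.0
MAX_REPEATED_MISTAKES = 3
ROAMING_LIMIT_S = 120.0

# Reflex rules: each flag maps straight to an NPC line (placeholder text).
NPC_RESPONSES = {
    "slow_answer": "Guide: Need a hint? Look again at the clue on the notice board.",
    "repeated_mistakes": "Guide: Let's work out together why that answer doesn't fit.",
    "aimless_roaming": "Guide: Your next task is at the harbour; follow the map marker.",
}


@dataclass
class PlayerState:
    """Progress-tracking record for one player (hypothetical fields)."""
    seconds_on_question: float = 0.0
    consecutive_mistakes: int = 0
    seconds_without_progress: float = 0.0
    interventions: list[str] = field(default_factory=list)


def raise_flags(state: PlayerState) -> list[str]:
    """Derive the three programmatic flags named in the text."""
    flags = []
    if state.seconds_on_question > ANSWER_TIME_LIMIT_S:
        flags.append("slow_answer")
    if state.consecutive_mistakes >= MAX_REPEATED_MISTAKES:
        flags.append("repeated_mistakes")
    if state.seconds_without_progress > ROAMING_LIMIT_S:
        flags.append("aimless_roaming")
    return flags


def reflex_agent(state: PlayerState) -> list[str]:
    """React to the current flags only: no goal state, no planning."""
    messages = [NPC_RESPONSES[flag] for flag in raise_flags(state)]
    state.interventions.extend(messages)  # retained for later debriefing
    return messages


# Example: a player who has been stuck on a question for 75 s.
stuck = PlayerState(seconds_on_question=75.0)
print(reflex_agent(stuck))
```

A goal-directed agent would instead compare the game state with a target state and choose actions to close the gap, which is why it needs more machinery than this lookup table.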

Adaptivity’s role is principally to enhance the learning facet of serious games. Hwang et al. (2012b) and Soflano et al. (2015a) created games that adapted to player learning styles, while Bellotti et al. (2012) applied adaptivity to adjust the game’s difficulty level according to player ability profiles. These approaches yielded significant learning gains over their respective control groups, leading us to conclude that adaptivity not only has the “potential to significantly shorten completion time” (Soflano et al. 2015a), but also promotes individualized learning, something not easily achieved in traditional education.
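Difficulty adaptivity of the kind Bellotti et al. (2012) describe can be illustrated with a deliberately simple Python sketch. It is not their method: it assumes a single ability score, updated as an exponential moving average of correct answers, and four hypothetical difficulty tiers.

```python
from dataclasses import dataclass

# Hypothetical difficulty tiers, ordered from easiest to hardest.
LEVELS = ("easy", "medium", "hard", "expert")


@dataclass
class AbilityProfile:
    """Running estimate of a player's ability in [0, 1] (assumed model)."""
    ability: float = 0.3
    smoothing: float = 0.3  # weight given to the most recent outcome

    def update(self, correct: bool) -> None:
        # Exponential moving average: recent answers count more than old ones,
        # and the estimate stays within [0, 1].
        outcome = 1.0 if correct else 0.0
        self.ability = (1 - self.smoothing) * self.ability + self.smoothing * outcome


def choose_level(profile: AbilityProfile) -> str:
    """Map the ability estimate onto a difficulty tier."""
    index = min(int(profile.ability * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]


profile = AbilityProfile()
print(choose_level(profile))  # medium
for _ in range(5):
    profile.update(correct=True)
print(choose_level(profile))  # expert
```

A real ability profile would usually be richer, for example tracking separate scores per topic or per skill, but the principle of matching task difficulty to an estimate built from many actions, rather than reacting to a single one, is the same.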

Inasmuch as AI boosts enjoyable learning, we must not lose sight of the computational costs involved (Ketamo and Kiili 2010), especially when implementing machine learning and neural networks. This is further compounded by the increased popularity of web-based multi-player games requiring centralized processing (Ketamo and Kiili 2010). The simplest solution to this dilemma comes from Virvou et al. (2005), who generated a pre-defined knowledge domain. Players are kept from straying outside its boundaries by allowing only multiple-choice interactions whose answers do not extend beyond the existing domain. Brom et al. (2011) promote the selection of audience-appropriate machine learning algorithms, noting that algorithms more capable than the game’s desired impact requires take longer (through unnecessary computation) to process seemingly simple tasks and, as a result, frustrate players. Ketamo and Kiili (2010) resolved their quandary by pruning the resultant neural network so that it remained within a pre-determined size.
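The text does not describe how Ketamo and Kiili (2010) pruned their network, so the following is only a generic illustration of keeping a network within a fixed size: magnitude-based pruning, which retains the `max_connections` strongest weighted connections and drops the rest. The dictionary representation of connections and the example concept names are assumptions made for this sketch.

```python
def prune_to_budget(
    weights: dict[tuple[str, str], float], max_connections: int
) -> dict[tuple[str, str], float]:
    """Keep only the `max_connections` connections with the largest |weight|.

    `weights` maps (source_node, target_node) to a connection weight.
    Ties keep their insertion order, because Python's sort is stable.
    """
    if max_connections < 0:
        raise ValueError("max_connections must be non-negative")
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:max_connections])


# Hypothetical concept network learned during play.
network = {
    ("fish", "water"): 0.9,
    ("fish", "fur"): -0.7,
    ("bird", "water"): 0.1,
    ("bird", "wings"): 0.8,
}
pruned = prune_to_budget(network, max_connections=3)
print(sorted(pruned))  # [('bird', 'wings'), ('fish', 'fur'), ('fish', 'water')]
```

Capping the number of connections bounds both memory and the cost of each evaluation, which matters most when many players share centralized processing.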

4.4 Interaction

What sets serious games apart from other forms of edutainment is interactivity. Games require user input and respond accordingly, in turn prompting the next player action in a repeated player-game feedback loop. Games present players with an interface comprising the game world, where action and consequence are experienced audio-visually, and a heads-up display (HUD) that communicates the current status of the game back to the user. Serious games producers should avoid complex interfaces, as these take time and effort to become accustomed to (González-González and Blanco-Izquierdo 2012), may frustrate novice players into quitting the game (Kiili 2005; Van der Spek et al. 2013) and induce additional cognitive load (Hong et al. 2013b; Hwang et al. 2015). The game interface should therefore be as straightforward as possible, both in the way players provide input to the game and in the way information is returned to the player. Chittaro and Buttussi (2015), Knight et al. (2010) and Soflano et al. (2015a) use a minimalist control mechanic (point-and-click for movement and selection) in place of the entertainment game standard (simultaneous use of mouse and keyboard for movement and other actions), because students are not interested in a game with hard-to-use controls (Zin and Yue 2013).

Serious games producers should be cognizant that their intended audience is not necessarily familiar with games. Consequently, the HUD should convey the game status and available gameplay tools to the player in a clear and unsophisticated manner. Virvou and Katsionis (2008) found that novice players did not use the available game-world map or inventory; as a result, players got lost in the game and did not access the learning material provided via the inventory. A complex interface will therefore turn players away, causing them to miss out on what could have been a fun-filled learning experience. Many successful entertainment games resolve this by allowing users to customize the controls, the HUD and/or the level of gameplay complexity (e.g., crash damage in racing games). We promote the notion that games should become complex gradually, as players welcome incremental increases in the difficulty of game tasks (Couceiro et al. 2013; Ke 2008). This is an integral part of flow theory, which holds that challenge should constantly be on the fringes of player ability (Csikszentmihalyi 2008), and it also relates to scaffolding as an approach to learning (Vygotsky 1978). To commence the scaffolding process, several studies advocate introductory or practice levels that allow players to acclimatize to the game’s interface (Hämäläinen 2011; Hwang et al. 2013b; Ke and Abras 2013; Van Eck 2006b). Hong et al. (2015) noted that increased practice time improved game-learning performance.

As collaborative learning is a dominant approach in successful serious games, it comes as no surprise that player-to-player interaction emerged as a leading success factor. Even when play is not intended to be collaborative, players often share gameplay tactics and solutions with one another. Kiili and Perttula (2012) described an accelerated form of this behavior, in which players transformed a single-player game (not programmatically, but through gameplay) into a turn-based collaborative effort. Player-to-player interaction modes include chat interfaces (Dickey 2011), avatar communication through text (Bellotti et al. 2009) and integration with voice communication tools such as Skype™ (Hämäläinen 2011). Although in-game player-to-player communication offers distinct solutions, we have not uncovered any forums outside of gameplay that allow players to share their game accomplishments. We postulate that incorporating opportunities to discuss game tactics and achievements in post-game debriefing sessions will give weaker learners a different voice to speak with, which, in turn, would facilitate increased participation during debriefing.

4.5 Feedback and debriefing

Serious games feedback presents a double-barreled option: (a) in-game feedback experienced through a variety of in-game reward mechanics or NPC interaction; and (b) post-game debriefing and reflection sessions, which ultimately elucidate the learning material and place the game-learning experience into a greater context. In-game feedback affords players the opportunity to experience the immediate cause and effect of their activities (Cheng and Annetta 2012; Johnson and Mayer 2010), while instant updates of the game’s current status give players a sense of progress and competitive standing (Cheng et al. 2015; Kiili 2005; Kuk et al. 2012). The latter leads us to concur with a multitude of studies that recommend displaying the game’s reward mechanics (e.g., points, leaderboard and level indicators), resource tools (e.g., inventory items and maps) and/or time-related elements throughout (Arnab et al. 2013; Chittaro and Sioni 2015; Verpoorten et al. 2014). Resource tools such as an integrated map indicating the PC and other key locations best serve vast game worlds (Hämäläinen 2011; Sadler et al. 2015), preventing frustration and wasted time.

Serious games can provide a setting conducive to a high degree of formative learning through the exploration of possible cause and effect in a risk-free environment (Cheng et al. 2015), with the knowledge of likely safe recovery (Hwang et al. 2015; Ketamo and Kiili 2010; Squire 2013). To further enhance this setting, real-time teacher support should be the gold standard for serious game producers. Players appreciate unobtrusive support when necessary (Ke 2008; Kuk et al. 2012) and pedagogical intervention when they ask for it (Ke and Abras 2013; Serrano-Laguna et al. 2014; Van Eck 2006b). Several studies demote the lecturer to technical support, for the obvious reason that doing otherwise would risk the study’s credibility. The reality, however, is that serious games should supplement the learning environment, not replace it (Sadler et al. 2015; Schmitz et al. 2015a; Virvou et al. 2005). Because teachers cannot be omnipresent during gameplay, we suggest dedicated gameplay times during which facilitators take part in the game. In multi-player games, the teacher’s presence is appreciated as a game avatar who can provide support (Barab et al. 2012; González-González et al. 2014). Single-player serious games require either a chat interface to a master computer or a communal learning space with the teacher physically present (Annetta et al. 2014; Brom et al. 2011); alternatively, the game could be played via a projector on a big screen in a turn-based fashion (Kiili and Perttula 2012), augmented with teacher discussion (Arnab et al. 2013) or narration. This last option diminishes the interactive advantage of serious games, reducing players to spectators of a story in which they were previously actors.

Debriefing is the most important opportunity for players to process and consolidate their in-game learning events (Crookall 2014). Although debriefing does not change the game’s appearance, in-game learning activities can certainly support it, not only through players’ recollection of memorable gameplay moments but also through game-generated progress-tracking reports. Progress tracking has the advantage, especially in health-related games (Alamri et al. 2014), of delivering immediate player progress reports, which the facilitator can use as input for post-game debriefing (Baranowski et al. 2011; Hong et al. 2013b). Couceiro et al. (2013) envision a future version of their game with a progress-storing mechanism for remote inspection and delayed debriefing. Although some articles described the effective use of debriefing discussions during paused gameplay (Arnab et al. 2013; Brom et al. 2010), we are reluctant to recommend incorporating this into serious game designs. Only one study mentioned using chat logs as progress tracking specifically suited to post-game debriefing (González-González et al. 2014). Nevertheless, given our earlier finding that chat interfaces are a success factor for multi-player serious games, and the relatively small programming step required to record these logs, we foresee chat logging as a definite value-add for post-game debriefing. Even if conversations are not about the learning material, valuable input about the fun, or not-so-fun, elements of the game may arise. Facilitators should, however, scrutinize the chat logs before making them public. The usefulness of progress tracking extends beyond current players, as it could reveal gameplay trends or game flaws, allowing scaffolding to be tweaked for future learners or game errors to be removed. Games with errors are known to deter gameplay (Torrente et al. 2014; Virvou and Katsionis 2008).
Lastly, the prospect of sharing the backend databases used for feedback and debriefing with AI or “micro-adaptivity” (Kickmeier-Rust and Albert 2010) should convince serious game producers of a positive cost-benefit ratio, for both fun and learning.
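Recording chat for debriefing is indeed a small programming step. The Python sketch below is a hypothetical in-memory design (the class and method names are our own, and a real game would persist messages to its backend database): each message is stored with a UTC timestamp, and the export step lets the facilitator withhold selected players’ messages after reviewing the log and before sharing it.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ChatMessage:
    player_id: str
    text: str
    timestamp: str  # ISO 8601, UTC


class ChatLogger:
    """In-memory chat log kept for post-game debriefing (hypothetical design)."""

    def __init__(self) -> None:
        self.messages: list[ChatMessage] = []

    def record(self, player_id: str, text: str) -> None:
        """Store one chat message together with the time it was sent."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.messages.append(ChatMessage(player_id, text, stamp))

    def export_for_facilitator(self, withheld_players: frozenset[str] = frozenset()) -> str:
        """Serialise the log as JSON, omitting players the facilitator withholds."""
        kept = [asdict(m) for m in self.messages if m.player_id not in withheld_players]
        return json.dumps(kept, indent=2)


log = ChatLogger()
log.record("p1", "The lever opens the bridge, try it first!")
log.record("p2", "This level is boring.")
print(log.export_for_facilitator(withheld_players=frozenset({"p2"})))
```

Keeping the raw log intact and filtering only at export time means the facilitator can still use withheld messages (such as complaints about dull levels) to improve the game, without making them public.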

5 Limitations

We examined only academic serious games, thereby limiting ourselves to games that have been designed and tested within academic contexts and that may not be representative of the best in the field. Limitations common to reviews in general include the choice of search terms and the delineation of the time period. Additional limitations of this review include: (a) picking salient work from the sheer volume of available articles within the review’s scope; and (b) the difficulty of locating some journal articles that we would have liked to review for inclusion. The most overt limitation with regard to article inclusion came about in the search phase. Although we had identified them initially, we did not search the IEEE and ACM databases because of the large number of articles (1 232) we had already amassed before reaching these databases. Admittedly, we may have missed some key research, but we remain confident that searching these databases would not have made a noteworthy impact on our findings. With regard to the field of application, only 20 % of the articles reported on post-study professional development and/or training. It seems that schools and higher education have replaced the military as the primary consumers of serious games. Multiple reviews (Akl et al. 2008; DeSmet et al. 2014; Papastergiou 2009b) show that the health sector readily embraces serious games, yet only 14 % of our inclusions represented this segment. The health sector (among others) employs games that are geared not just toward learning but also toward bringing about behavior change or improving motor skills. We have limited our discussion to the learning construct of serious games and have therefore not included serious games with other aims. Our view, however, is that each different purpose of a serious game brings its own set of requirements, which is best reported in separate reviews.
In spite of these limitations, the authors concur that the list of common serious games success factors isolated from the articles is valid and representative for serious games across a broad spectrum of applications.

6 Conclusion

What practical guidelines can serious game producers incorporate to guarantee successful learning with games? From our analysis, it is the playing audience who hold the key to successful serious games. From the reviewed articles, we conclude that players want to have fun before they value the learning benefit serious games can offer them. Serious games producers must not impede this hunger for fun, but rather use it to stealthily engage the player with the required learning material. This implies that games need to have replay value rather than being a once-off learning endeavor. Such single learning exercises often elicit positive player feedback simply because they present the learning material in a fresh way, a novelty effect that wears off with repeated exposure. We uncovered five themes (backstory and production; realism; AI and adaptivity; interaction; and feedback and debriefing), which provided containers for the various success factors that combat this novelty effect.

When players encounter a dull-looking game with little or no story, they will play it once or twice with great enthusiasm, perhaps a third time after some persuasion, and then turn their backs on it. Confront players with the high degree of realism and open-ended narrative they favor and, admittedly, learning will be minimal, at least until the sensory stimuli become familiar and lose their attention-grabbing ability. The value lies in what remains: a game deemed worthy of repeated play. A narrative that has no distinct breaks between learning and playing, while providing a fitting context for the learning material, must now take over as the motivation for gameplay. This motivation drives players to become adept at the skills required to progress in the game. Hence, the game-task skills should mirror the intended learning impact.

No matter how captivating the game, learners will not step away from it with the desire to learn more about its subject material. This would be akin to changing a player’s sphere of personal interest; not many Rollercoaster Tycoon™ players build their own backyard loop-the-loop tracks or read up on the physics of a Ferris wheel. Serious games producers should rather maintain the situational interest (playing a good game) that has been cultivated thus far. This can be achieved by promoting the player’s sensation of flow (immersion) and avoiding game elements that disrupt it. Flow theory suggests that player abilities increase as gameplay progresses and that challenges should always be on the edge of player ability. Game tasks should therefore become gradually more difficult so that the player’s cycle of mastery is continuously challenged, as constructivist learning theory also prescribes. Immediate in-game feedback, an intuitive game interface with minimalist control mechanics and an uncomplicated heads-up display will prevent flow interference.

Thus far, the player has been captivated, motivated and immersed in going through a fun-learning process. AI can polish off the aesthetic by establishing an emotional connection with the player through personalized responses and gameplay modes maintained by progress-tracking mechanisms. These mechanisms for fun can be shared with the learning aspect of serious games by assisting post-game debriefing activities designed to place the in-game learning experience into a greater perspective. Some researchers herald debriefing as the most important learning mediator for the serious games experience. We recommend utilizing progress-tracking reports of in-game learning activities and possibly chat logging to further enhance the value of debriefing.

The production team of serious games involves a medley of artistic, programming, subject-matter and pedagogical expertise. The bona fide success component of the production team, however, is the end user. Involving a homogeneous target player group early in the design and play-testing phases will ensure that the game will be enjoyed. Enjoyable games are not just played when the intervention is due or when the curriculum demands it; enjoyable games are played by choice. We reiterate that this is the true measure of successful serious games, as players will be engaging with the learning material when they would otherwise have been doing something fun.

7 Future research

Our analysis revealed minimal theoretical underpinning as a design basis of serious games. The limited use of theoretical approaches is possibly explained by their diversity. We found an almost one-to-one ratio of theoretical approaches to game designs in the reviewed articles. This may partly be the cause of allegations that the field of serious games is scattered with inconsistent research. We suggest a consolidation of existing frameworks, theories and models drawing out the most significant aspects from each of them into an understandable and practically implementable approach concretizing the requirements for successful serious games.

In keeping with the pragmatic nature of this review, we would like to see a summary of, and application recommendations for, the different authoring tools and techniques that speed up the creation of serious games without jeopardizing quality. Petridis et al. (2010) proposed similar ideas in their investigation of game engines suitable for serious games production. Their work, however, is limited to game engines and tied to the idea of developing serious games from scratch. We suggest extending their work to XML-driven platforms for serious game development such as <e-Adventure> (Torrente et al. 2014), or to technologies such as Neverwinter Nights, where game modding is the fundamental means of speeding up serious game creation (Byun and Loh 2015; Soflano et al. 2015a) and which can be further augmented with ScriptEase (González-González and Blanco-Izquierdo 2012). This suggested work may even include specific lower-level techniques such as enhanced billboard modeling (Bellotti et al. 2012).

A further managerial tool, more often associated with corporate settings but which could be equally useful to serious games makers, is a risk assessment model: not just an impact-likelihood presentation of the risks associated with building and implementing serious games, but one that also provides appropriate mitigation strategies for those risks. Although serious game risk analysis was outside the scope of this review, we scrutinized a considerable number of articles without encountering pertinent work toward establishing a risk profile for serious games development.

Most of the serious games research brought forward in this review, as well as many of the excluded studies, examines the impact of serious games on the end consumer. Moreover, these studies were conducted under the guidance of researchers well versed in the environmental requirements of serious games implementation. We advise shifting some of the research emphasis to the supply side of serious games, and suggest that more attention be given to the professional development (PD) of serious games protagonists in their qualified capacities as trainers or educators. Although some studies (Papastergiou 2009a; Sadler et al. 2015; Torrente et al. 2009) refer to the importance of teaching teachers to implement serious games, a formal theoretical framework (or set of guidelines) for teaching the teachers would be a valuable addition to the field. A natural progression would be to refine such a framework by evaluating its training effectiveness through well-designed experiments comparing the impact and reception of games between those who have received PD and those who have not.

Our research indicates that 2D and 3D games are almost equally popular, with 33 and 40 %, respectively, of the 55 unique games encountered falling into these two styles. Since cost and time are explicitly mentioned as recurring factors in building serious games, it would be valuable to determine practically useful time and cost structures that show the real implications of developing and maintaining 2D versus 3D games. Before this, however, we feel a comparative meta-analysis of the effectiveness of each of these environments should be undertaken, so that a true cost-benefit analysis for each style could be initiated. Furthermore, our analysis could not reveal whether specific playing audiences prefer 2D or 3D games. We speculate that age and preferred game style are likely correlated. Further research will have to verify this and other explicit style preferences and combine them with clear cost and time structures, which would provide a powerful guide for commencing any serious game production. Careful consideration should also be given to pseudo-3D (2.5D) games and to combinations of 2D and 3D environments, which made up 9 and 11 %, respectively, of the games investigated.

A last, but fundamental, aspect of games that raises many questions is competition. Some learning theories hold that competition may raise anxiety levels, causing players either to quit gameplay or to avoid it altogether, both of which are detrimental to the desired learning aspects of serious games. From a gameplay perspective, however, competition is what drives the motivation to play games. Given the conclusion of this review, that a successful serious game is one that is played by choice, we find ourselves hard-pressed to eliminate competition from serious games. One of the major findings of this review, that players appreciate a sense of control over their gameplay environment, may reveal the answer. We suggest creating an environment in which players can switch off the competitive elements of gameplay (player score, leaderboard or time pressure), while monitoring which players actually do so and attempting to correlate this with (among possible other factors) learning style, prior achievement in the learning subject and/or previous gameplay exposure. Carefully crafted experiments that determine a game’s learning impact with and without competition, taking player preference into account, would round off this topic and may provide a clear-cut answer to the competition-in-serious-games debate.
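The monitoring step of this proposal could start as simply as the Python sketch below. It is a hypothetical illustration: the element identifiers, the coarse prior-achievement bands and the `Player` class are assumptions, and computing opt-out rates per band is only a descriptive first step before any formal correlation analysis.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Competitive elements named in the text; the identifiers are our own.
COMPETITIVE_ELEMENTS = ("score", "leaderboard", "timer")


@dataclass
class Player:
    player_id: str
    prior_achievement: str  # hypothetical banding, e.g. "low" / "medium" / "high"
    disabled: set[str] = field(default_factory=set)

    def switch_off(self, element: str) -> None:
        """Let the player hide one competitive element of gameplay."""
        if element not in COMPETITIVE_ELEMENTS:
            raise ValueError(f"unknown competitive element: {element}")
        self.disabled.add(element)

    def shows(self, element: str) -> bool:
        return element not in self.disabled


def opt_out_rates(players: list[Player], element: str) -> dict[str, float]:
    """Share of players in each prior-achievement band who switched `element` off."""
    totals: dict[str, int] = defaultdict(int)
    opted_out: dict[str, int] = defaultdict(int)
    for player in players:
        totals[player.prior_achievement] += 1
        if not player.shows(element):
            opted_out[player.prior_achievement] += 1
    return {band: opted_out[band] / totals[band] for band in totals}


cohort = [Player("a", "low"), Player("b", "low"), Player("c", "high")]
cohort[0].switch_off("leaderboard")
print(opt_out_rates(cohort, "leaderboard"))  # {'low': 0.5, 'high': 0.0}
```

The same per-player record could be extended with learning style or previous gameplay exposure, and joined with learning outcomes for the with/without-competition experiments suggested above.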