Introduction

Modern surgical training curricula aim to create surgical experts in a time- and cost-efficient manner [1]. What distinguishes experts from nonexperts in surgery is the ability to recognize and solve problems promptly and adequately, the acquisition of which requires prolonged deliberate practice [2]. Patient safety concerns limit opportunities to train in everyday clinical practice, so educators seek alternative methods [2, 3]. Whereas technical surgical skills can be trained efficiently using simulators [4], diagnostic reasoning and problem-solving (e.g., in the emergency department or on the ward) are largely learned in the workplace, as few alternatives currently exist.

Serious games are “interactive computer applications … with challenging goals, that are fun to play, and supply users with skills, knowledge or attitudes useful in reality” [5]. A well-designed serious game combines psychological factors, design, and technology to engage learners in voluntary training. Games have many effects on the brain, most of which occur at a behavioral level. Serious gaming can influence the brain’s adaptive neural plasticity, resulting in both structural [6] and functional [7] changes. Hence, whereas many may perceive video games as mere entertainment, they actually form a potent tool for learning. Players are required to develop and test hypotheses to overcome a given challenge, a process closely related to experiential learning [8, 9]. Serious games could thus serve as a powerful tool for cognitive enhancement in surgical training by mimicking clinical problem-solving. To date, games that train clinical decision-making outside the operating room remain novel, and evidence of their value is scarce [10].

A serious game was developed to train correct diagnosis and management of biliary tract disease. The game is based on a quiz format in which time pressure and competition are used to trigger the player’s intrinsic motivation [11]. When assessing the value of a game as a serious training instrument, information on its validity is essential. This includes face validity (the degree of perceived resemblance between medical constructs represented in the game and reality), content validity (the degree to which the game content covers the targeted medical construct), and construct validity (the degree to which outcome parameters can measure differences between experts and novices) [12]. This study hypothesized that both surgeons and trainees would regard the medical constructs as realistically represented (face validity) and that surgeons would outperform trainees, indicating the game to be both robust and believable (construct validity).

Materials and methods

Participants

Surgeons, surgical residents, and medical students from an academic hospital were recruited between September and November 2013. All 41 participants received a standardized instruction tutorial, which did not include instruction on surgical content. Participants received a personal invitation with a personal login code. Participants played a minimum of one gaming session, averaging five minutes of playtime. Scores were compared between groups of different levels of expertise: surgeons (licensed surgeons), residents (surgical trainees), interns (experience as a physician on surgical wards, no surgical training), master-degree students (medical students in surgical clerkship), and bachelor-degree students (medical students without clinical experience). After playing, participants filled out a questionnaire. Questionnaire results were linked to performance results through the login code; results were assessed anonymously.

Serious game

The serious game Medialis (Little Chicken Game Co., Amsterdam, The Netherlands) contains cases on the diagnosis and management of patients with biliary tract disease (Table 1). The medical content was validated prior to the study. Four surgical residents (postgraduate years 5 and 6) with experience in gastrointestinal surgery (>50 laparoscopic cholecystectomies as the primary surgeon) independently evaluated each case and marked it as “valid” or “invalid.” Cases marked invalid were removed or corrected. The results were rechecked by two surgeons (M.S. and S.L.) until all cases were considered valid. The prototype was then tested for fluency, reading time, and clarity of imaging.

Table 1 Description of 3 of the 97 cases

The serious game contained 97 cases: cholecystolithiasis (15), laparoscopic cholecystectomy (15), bile duct injuries (9), biliary pancreatitis (8), choledocholithiasis (5), cholecystitis (4), gallbladder carcinoma (5), adenomyomatosis (3), and a variety of minor topics regarding biliary tract disease (33). Learning objectives concerned pathophysiologic and epidemiologic background, workup, treatment, and surgical management based on the residency teaching curriculum [13].

Cases consisted of an image, information describing a clinical problem, and possible solutions (Fig. 1). Participants solved as many cases as possible within one play session. The player had a maximum of 10 s to solve each case. After each attempt, players received feedback focused on their solution. Gaming mechanics included playing against time, competition, and sharing high scores among players on popular social platforms (leaderboards) to increase players’ visibility and boost motivation. The latter two options were turned off during this research phase, ensuring players’ anonymity. Participants’ performance was automatically measured through the following parameters: correct, incorrect, or neither correct nor incorrect solution (+1, −1, and 0 points, respectively); time required to solve each case; total score; and total play time.
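As a minimal sketch, the scoring rule above can be expressed as follows. The function and parameter names are illustrative rather than taken from the game’s source, and scoring a timed-out case as 0 points is an assumption, as the game’s timeout behavior is not specified in the text.

```python
TIME_LIMIT_S = 10.0  # maximum time allowed per case

def score_case(chosen: str, correct: str, neutral: set, elapsed_s: float) -> int:
    """Return +1 for a correct solution, 0 for a neither-correct-nor-incorrect
    solution, and -1 for an incorrect one."""
    if elapsed_s > TIME_LIMIT_S:
        # Assumption: an expired case scores 0 points; the game's actual
        # timeout behavior is not described in the text.
        return 0
    if chosen == correct:
        return 1
    if chosen in neutral:
        return 0
    return -1

# Example: an incorrect answer given within the time limit scores -1.
print(score_case("ERCP", "laparoscopic cholecystectomy", {"expectant management"}, 7.2))
```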

Fig. 1

Serious game (screenshots). The player is presented with a case (left) with four possible solutions. After making a choice, the player receives feedback (center), points, and extra session time. After a session, the player can review his or her statistics (right)

Questionnaire

The questionnaire (Google Docs; Google, Mountain View, CA, USA) contained items on demographic characteristics (10), realism (5), educational and testing value (8), perceived desirability and preferred user groups (7), perceived user experience (8), and game implementation (5). A medical psychologist checked the items for formulation and consistency. Participants could elaborate through open text boxes. Statements were scored on five-point Likert scales (1, disagree; 2, slightly disagree; 3, neither agree nor disagree; 4, slightly agree; 5, agree). Results were compared between three groups: experts (surgeons), surgical trainees (interns and residents), and novices (bachelor- and master-degree students). A median score >3.49 was considered a positive opinion toward the statement.
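To illustrate this decision rule with hypothetical responses (not study data), the median of a five-point Likert item can be computed and compared against the 3.49 cutoff:

```python
import statistics

# Hypothetical Likert responses (1 = disagree ... 5 = agree).
responses = [4, 5, 3, 4, 2, 5, 4, 4]

median = statistics.median(responses)
print(f"median = {median}, positive opinion: {median > 3.49}")
```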

Statistical analysis

Performances of user groups with clinically relevant levels of expertise were compared (surgeons, surgical residents, interns, master-degree students, and bachelor-degree students). Proportions of correctly solved cases and mean case times were compared with parametric and nonparametric tests, respectively. Score improvements during repeated sessions were compared through Wilcoxon signed rank testing. IBM SPSS Statistics version 20 (IBM, Armonk, NY, USA) was used.
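These group comparisons can be sketched in Python/SciPy as a stand-in for the SPSS analyses; the arrays below are hypothetical placeholders, not study data:

```python
from scipy import stats

# Hypothetical per-participant proportions of correctly solved cases
# (parametric comparison across groups).
surgeons  = [0.77, 0.82, 0.70, 0.79]
residents = [0.67, 0.63, 0.71, 0.66]
students  = [0.50, 0.44, 0.55, 0.48]
f_stat, p_anova = stats.f_oneway(surgeons, residents, students)

# Hypothetical mean case times in seconds (nonparametric comparison).
t_surgeons  = [8.5, 8.4, 8.6, 8.3]
t_residents = [8.4, 7.9, 9.1, 8.2]
t_students  = [8.9, 8.2, 9.3, 8.7]
h_stat, p_kruskal = stats.kruskal(t_surgeons, t_residents, t_students)

print(f"ANOVA p = {p_anova:.3f}; Kruskal-Wallis p = {p_kruskal:.3f}")
```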

Results

Characteristics

The 41 participants completed a median of 70 cases [interquartile range (IQR) 39–87]. Demographic characteristics are described in Table 2. Age differences between the groups were statistically significant (one-way ANOVA with post hoc Bonferroni, p < 0.01). A significantly higher proportion of master- and bachelor-degree students were female (Chi-square test compared to the other groups, p = 0.03). There were no significant differences in video game experience between the groups.

Table 2 Demographic characteristics of the participants

Face validity

Overall, 34 participants completed the face validity questionnaire (one expert, one trainee, and five students failed to complete it). The majority found the presentation of images, radiology, and clinical situations realistic (88.2, 97.1, and 91.2 %, respectively) (Table 3). In total, 60.6 % stated that decisions in the serious game are based on realistic elements, and 66.7 % found their clinical experience to be helpful.

Table 3 Participants’ opinions on representation of important medical constructs in the serious game (medians in bold)

The majority believed the serious game to be useful for learning disease background (77.6 %), decisions during workup and treatment (82.4 %), perioperative decisions (64.7 %), knowledge of medical technology (55.9 %), and risk management (53.0 %) (Supplementary Table 1). Participants found it useful for testing clinical decision-making (76.5 %) and for monitoring trainees’ progress (64.7 %). The majority considered training with this serious game desirable (77.5 %) and more fun than classic training (94.1 %) (Supplementary Table 2). The majority of surgeons and residents found it useful for training students (58.9 %) and residents (94.1 %). Surgeons also found the serious game useful for training surgeons (100 %). Interestingly, surgical residents disagreed with surgeons on this item (Kruskal–Wallis, p = 0.02).

The majority of players enjoyed the game, considering it fun (91.2 %) and challenging (85.3 %). Furthermore, 44.1 % said they felt involved during game play (Supplementary Table 3). The majority did not feel frustrated (55.9 %) or bored (79.4 %), and 44.1 % did not feel easily distracted. Experts and trainees tended to find the serious game less frustrating than novices did, although this difference was not statistically significant (Kruskal–Wallis, p = 0.07).

The majority of participants considered the serious game an addition to regular surgical training (79.4 %), although not as an obligatory component (Supplementary Table 4). The majority (81.8 %) would recommend it to colleagues, and 58.8 % would play in their free time. A majority (55.9 %) considered it a useful contribution to patient safety.

Surgeons were overall more positive about the game (median 3.89, IQR 0.47) than residents (median 3.57, IQR 0.39) or students (median 3.68, IQR 0.82) (Kruskal–Wallis, p = 0.04). The student group showed a wider range of attitudes than the other groups.

Construct validity

All participants completed the first play session, 22 completed a second (5 residents, 2 interns, 8 master-degree students, 7 bachelor-degree students), and 10 completed a third (3 residents, 1 intern, 3 master-degree students, 3 bachelor-degree students).

Figure 2 shows the combined proportion of correct and neither-correct-nor-incorrect choices during the first session for each study group. Surgeons (mean 0.77, SD 0.09) solved more cases than residents (mean 0.67, SD 0.05), interns (mean 0.60, SD 0.09), master-degree students (mean 0.50, SD 0.10), and bachelor-degree students (mean 0.39, SD 0.03). One-way analysis of variance (ANOVA) with post hoc Bonferroni correction showed the differences between surgeons and interns, master-degree students, and bachelor-degree students to be statistically significant (p < 0.01). Differences between residents versus master-degree and bachelor-degree students were statistically significant (p < 0.01), as were differences between interns and bachelor-degree students and between master-degree and bachelor-degree students (p < 0.001 and p = 0.035, respectively).
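For reference, the post hoc step can be sketched as pairwise comparisons with a Bonferroni adjustment; the data below are hypothetical placeholders, and SPSS performs this adjustment internally:

```python
from itertools import combinations
from scipy import stats

# Hypothetical per-participant proportions correct, keyed by group.
groups = {
    "surgeons":  [0.77, 0.82, 0.70, 0.79],
    "residents": [0.67, 0.63, 0.71, 0.66],
    "students":  [0.50, 0.44, 0.55, 0.48],
}

pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)  # Bonferroni: divide alpha by the number of comparisons

for a, b in pairs:
    t_stat, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p < alpha_adj else "n.s."
    print(f"{a} vs {b}: p = {p:.4f} ({verdict} at adjusted alpha = {alpha_adj:.4f})")
```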

Fig. 2

Proportion of correctly and not-incorrectly resolved cases (combined) of participants’ first session on the serious game (n = 41)

Figure 3 shows the average time participants required to solve first-session cases. Surgeons required a median of 8.47 s (IQR 0.72), residents 8.41 s (IQR 3.16), interns 7.71 s (IQR 4.02), master-degree students 8.99 s (IQR 1.66), and bachelor-degree students 8.20 s (IQR 2.26). The differences were not significant (Kruskal–Wallis, p = 0.73).

Fig. 3

Participants’ mean time necessary to resolve a case during the first session of serious gaming (n = 41)

Figure 4 shows that participants improved their performance during their second gaming session [median proportion 0.72 (IQR 0.23) vs. 0.48 (IQR 0.22) during the first; Wilcoxon signed rank, p < 0.001]. Intra-group comparison showed a statistically significant improvement within two sessions for residents [median 0.66 (IQR 0.10) in session 1 and 0.84 (IQR 0.21) in session 2; p = 0.043], master-degree students [median 0.48 (IQR 0.19) in session 1 and 0.77 (IQR 0.13) in session 2; p = 0.012], and bachelor-degree students [median 0.38 (IQR 0.04) in session 1 and 0.56 (IQR 0.23) in session 2; p = 0.043].
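A minimal sketch of this paired comparison (hypothetical paired proportions, not the study’s data):

```python
from scipy import stats

# Hypothetical paired proportions correct for participants who
# completed two sessions (the study had n = 22; six pairs shown here).
session1 = [0.48, 0.52, 0.45, 0.50, 0.41, 0.55]
session2 = [0.72, 0.70, 0.68, 0.75, 0.66, 0.74]

w_stat, p_value = stats.wilcoxon(session1, session2)
print(f"Wilcoxon signed-rank p = {p_value:.4f}")
```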

Fig. 4

Proportion of correctly and not-incorrectly resolved cases (combined) in the first and second round of serious gaming (n = 22)

Discussion

Valid serious games have the potential to shorten surgical trainees’ learning curves in clinical reasoning and problem-solving, increasing time efficiency in surgical training. They provide direct feedback on the player’s decisions by distributing rewards and punishments. They support learning at different paces and allow practice to the point of mastery and automaticity. Repetition is a precondition for long-term potentiation: as gamers repeat actions during play, the strengthening of synaptic connections supports memory storage and learning. Medialis is the first serious game to show face, content, and construct validity for training clinical decision-making relevant to surgeons outside the operating theater. The quality of decision-making (proportion correct) appears to be a valid assessment parameter in the game, whereas the time required to solve cases does not.

According to the study results, serious gaming could be of considerable use to junior surgical residents, enhancing their level of functioning and saving valuable time that can be spent otherwise, such as in surgery itself. Moreover, structured assessment of skills and competencies is becoming more relevant to modern curricula. New curricula have introduced systems of entrustable professional activities and statements of awarded responsibility to measure and assess trainees’ competencies in specific clinical activities [14, 15]. A serious game that validly measures specific levels of competence could facilitate this entrustment process. Such serious games could also play a role in the training and recertification of surgical specialists.

Learning in serious games is based on two principles. First, gameplay closely mimics the problem-solving required in diagnostic and therapeutic reasoning (i.e., testing and readjusting hypotheses), thereby leading to experiential learning [8, 9]. Learning in this study occurred over time, as participants significantly increased their scores in consecutive sessions. Second, high levels of engagement in gameplay are thought to draw players into a state of “flow” [9]. As such, players are completely absorbed, ignoring all external stimuli and focusing their attention solely on the gameplay and thus on prolonged deliberate practice [9]. Flow occurs mostly when the level of difficulty is optimally adjusted to the player’s skill level. The results show that this serious game complies with these principles, as 44 % of participants felt engaged while gaming (67 % of the surgeons), even though two of the main competitive game mechanics were turned off during the study. Of the students, 8 of 17 found the game frustrating (compared to 1 of 11 trainees and 1 of 6 experts). According to flow theory, the level of difficulty is too high for some students, which could be addressed by incorporating adjustable difficulty levels.

The majority of surgical residents and surgeons have experience with video games and are therefore likely to embrace using a video game in surgical training. Players are enthusiastic about dealing with everyday medical content in an engaging way. Positive statements by experienced users indicate that serious games have the potential to overcome the appreciation problems of simulators in surgical training: although simulators are generally considered both efficient and attractive for teaching surgical skills, residents do not voluntarily practice on them in their free time [16].

Serious games are currently experiencing a development surge, even though randomized trials on the learning outcomes of individual games remain scarce [10, 17]. Recent studies show promising use of game mechanisms for optimizing adherence to surgical skills training. Verdaasdonk et al. showed that “gamifying” simulator exercises by adding real-time competition and reward systems significantly increased exposure time and determination to play in 31 surgical professionals [11]. Badurdeen et al. showed that the performance of 20 surgical trainees on commercially developed, off-the-shelf video games with motion-sensing controllers (Nintendo Wii™) correlated with laparoscopic simulator performance [18]. Jalink et al. showed that altering the software of laparoscopic simulators into a custom-made video game environment, controlled by laparoscopic handles, yields similar correlations [19]. Beyond the training of technical skills, this study is the first to describe a serious game’s value for training clinical disease-specific decisions outside the operating theater.

This study has several limitations. First, analysis of subjective opinions may induce a Hawthorne effect: exposure to questioning could have affected participants’ behavior and thus influenced the results [20]. By distributing questionnaires and game access online and anonymously, social desirability effects were partially mitigated. Second, a power analysis could not be performed beforehand because an estimated effect size from previous research was lacking. Therefore, a lack of power cannot be excluded. Nevertheless, the magnitude of the differences in performance and the limited variance within the experienced groups yielded statistically significant differences in this study.

Third, a limitation of the face validity test is the relatively high number of participants who failed to complete the questionnaire (one expert, one trainee, and five students). In face validity testing, the opinions of experts and trainees are considered most important, and dropout in these groups was relatively limited. Sex differences between the groups conform to the proportions in the general medical student population [21], the surgical trainee population, and surgeons [11, 19].

Conclusions

This study is the first of its kind to demonstrate face, content, and construct validity of a medical serious game on clinical decision-making. It indicates the potential of Medialis to train and assess surgical trainees and professionals in an entertaining way. Future research should determine the long-term sustainability of learning outcomes and the transfer of skills to clinical practice. Considering the enthusiasm of licensed surgeons, the alleged “digital gap” between surgical residents and surgeons regarding mHealth applications appears to be closing fast.