
1 Introduction

Gamification, “the intentional use of game elements for a gameful experience of non-game tasks and context” [1], is an interdisciplinary research area that combines game design, user experience design, behavioral economics, and motivational psychology [2]. Gamification is most often a means to an end: many gamified services aim to increase motivation and engagement, with the ultimate goal of promoting a certain behavior. To date, review studies that systematically investigate the effect of gamification have mostly focused on a combination of psychological outcomes (e.g., intrinsic motivation, engagement) and/or behavioral outcomes at large (e.g., usage, retention) [3].

As the gamification field matures [4, 5], there have been calls for a stronger effort to define precise research questions on the basis of existing theorizations, and to further refine the understanding of the kind and size of the effects gamification has on individuals [5]. Advancing the field therefore requires a clearer articulation of the type of psychological and behavioral outcomes expected.

In this study, we focus specifically on one behavioral outcome of gamification: adherence, i.e., the extent to which individuals use a gamified service or system and experience its content, as defined or implied by its creators, in order to derive a certain benefit [6,7,8]. So far, only a few gamification studies have focused on adherence itself, and those that do typically limit themselves to one specific product and application domain, such as health apps [9, 10], online learning software [11, 12], or programming [13]. To date, no study has systematically investigated the effect of gamification on adherent behavior across diverse implementations and disciplines.

The contributions of this paper are twofold. The first contribution is empirical: based on 27 academic papers, we report on the impact of gamification on adherence across disciplines. The results of our systematic review lend support to the hypothesis that gamification has a positive effect on adherence. The second contribution is theoretical: we further the understanding of adherence in gamification studies and promote both a more refined conceptualization and a set of standard elements to be present in any gamification study on adherence.

2 Background

We first discuss the concept of adherence and define it as one specific behavioral outcome that is pertinent to much gamification research, yet different from engagement. We end this section with the research objectives of the paper.

2.1 The Concept of Adherence

Although the term adherence is perhaps most established in the health domain [14], the concept itself has long been used in other domains as well. The Oxford dictionary defines adherence in layperson terms as “attachment or commitment to a person, cause, or belief.” This definition shows that adherence is not about short-lived, single-point-in-time behavior: attachment and commitment imply sustained behavior. The Cambridge dictionary defines adherence as “the fact of someone behaving exactly according to rules, beliefs, etc.” This definition highlights the existence of rules and beliefs; in other words, there is an envisioned, intended behavior. Taken together, the two definitions indicate that adherence has both a temporal aspect (behavior as it unfolds over a longer period) and an intended usage aspect (behavior according to rules or beliefs).

More targeted definitions that elaborate on these temporal and behavioral aspects can be found in the health domain. For example, the World Health Organization (WHO) defines adherence as “the extent to which a person’s behaviour – taking medication, following a diet, and/or executing lifestyle changes, corresponds with agreed recommendations from a health care provider” [14]. With respect to eHealth, a form of interactive technology close to the realm of gamified technology, Christensen et al. [15] put forward the following definition of adherence: “the extent to which individuals experience the content of the Internet intervention.” Again, these definitions encompass a temporal aspect (experiencing a certain ‘dosage’ of content) and intended use (following agreed recommendations). Building on this, Kelders et al. [6] propose a definition of adherence for interactive systems and services in eHealth that encapsulates both aspects: “the extent to which individuals experience the content to derive maximum benefit from the intervention, as defined or implied by its creators.” This definition lends itself well to the domain of gamification, where adherence can be defined as “the extent to which individuals use a gamified service or system and experience its content, as defined or implied by its creators, in order to derive a certain benefit.” It also applies to situations beyond eHealth, such as e-learning and customer loyalty.

Building on Kelders’ definition of adherence, Sieverink et al. [7] suggest three elements that should be present in any adherence study:

  1. The ability to measure the usage behavior of individuals.
  2. An operationalization of intended use.
  3. An empirical, theoretical, or rational justification of the intended use.

Together, these three elements ensure not only that the behavioral outcome is measured empirically, but also that it is compared against a pre-specified value or threshold (i.e., the intended use), and that authors can justify this value or threshold.

2.2 Adherence Versus Engagement

At the heart of gamification lie the interrelated concepts of engagement and behavior change [1]. Gamification scholars may therefore regard adherent behavior simply as an outcome of increased engagement. Although related, adherence and engagement are different concepts. For example, people can be engaged in losing weight but still not adhere to their weight-loss plan. As discussed above, adherence starts from the notion of usage behavior that is sustained and as intended, whereas engagement foregrounds the affective, psychological experience. For example, Brown and Cairns [16], Brockmyer et al. [17], and Denisova et al. [18] conceptualize engagement as a multidimensional construct encompassing a user’s absorption, flow, presence, and immersion. Hence, a gamified intervention can have engaged users who do not fully experience the content of the service as intended by its creators or prescribers (and are thus not adherent). Conversely, users may adhere to the recommended gamified plan yet lack engagement. Because the boundary between the two is blurry, the elusive concept of ‘engagement’ is frequently measured through behavioral variables such as ‘returning visits’ or ‘regular use’ [19]. The reverse also occurs, where a less strict understanding of adherence equates it with measures of engagement. For example, [20] and [21] measure non-adherence as a “lack of participant engagement,” i.e., they use engagement as a proxy for adherence. As the field of gamification matures, however, it is beneficial to delineate and separate these theoretical concepts more clearly, to refine our understanding and measurement of the impact of specific gamification strategies.

2.3 Studies on Gamification and Adherence

There are multiple review studies on the effect of gamification on psychological (e.g., engagement, motivation) and behavioral (e.g., retention, increased usage) outcomes, e.g., [1, 22,23,24,25]. However, only the systematic review of Brown et al. [26] on web-based mental health interventions explicitly studies adherence. The authors reported that web-based health interventions incorporating gamification features had a higher mean adherence rate. Yet they also found that both adherence and usage data were inconsistent or underreported.

In sum, the study by Brown et al. [26] suggests that gamification has a positive impact on adherence, but it is limited to the mental health domain. This study therefore broadens the scope and systematically investigates the effect of gamification on adherent behavior across applications and disciplines. We explore to what extent current studies conceptualize adherence according to the definition put forward by Kelders et al. [6] and follow the standards recommended by Sieverink et al. [7], i.e., measuring usage behavior, operationalizing intended use, and justifying the intended use. Additionally, we aim to better understand which gamification techniques are most popular and have the strongest impact, and which disciplines conduct the most studies on gamification and adherent behavior.

3 Materials and Methods

3.1 Search String

The protocol used to find and review the studies was developed according to the PRISMA guidelines [27]. Because this systematic review focuses solely on gamification techniques, we used the truncated keyword ‘gamif*’. For adherence, we used the synonyms of Brown et al. [26], with two changes. First, we modified ‘retention rate’ to ‘retention’ to also include studies that, for example, report on customer and employee retention. Second, we added ‘compliance’ and ‘concordance’ to include papers that use a less authoritative approach to describe adherence [28]. Building on their work, we thus arrived at the following extended search string:

gamif* AND (adherence OR attrition OR dropout OR drop-out OR noncompleters OR non-completers OR “lost to follow up” OR withdrawal OR nonresponse OR non-response OR “completion rate” OR “did not complete” OR retention OR loss OR compliance OR concordance).
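To re-screen exported titles and abstracts locally, the search string can be approximated in a few lines. The following is a minimal Python sketch under stated assumptions: it does not reproduce the query syntax of any particular database (each of the seven databases handles truncation, phrases, and field restrictions differently), and the names `ADHERENCE_TERMS` and `matches_query` are illustrative only.

```python
import re

# 'gamif*' truncation: any word starting with "gamif" (gamification, gamified, ...).
GAMIF = re.compile(r"\bgamif\w*", re.IGNORECASE)

# Adherence-related terms from the search string (phrases kept whole).
ADHERENCE_TERMS = [
    "adherence", "attrition", "dropout", "drop-out", "noncompleters",
    "non-completers", "lost to follow up", "withdrawal", "nonresponse",
    "non-response", "completion rate", "did not complete", "retention",
    "loss", "compliance", "concordance",
]

# Whole-word / whole-phrase match on any of the terms (the OR group).
ADHERENCE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in ADHERENCE_TERMS) + r")\b",
    re.IGNORECASE,
)


def matches_query(text: str) -> bool:
    """Approximate 'gamif* AND (term OR term OR ...)' on a title/abstract."""
    return bool(GAMIF.search(text)) and bool(ADHERENCE.search(text))


print(matches_query("Gamified learning and student retention"))
```

Word boundaries keep, for instance, ‘loss’ from matching ‘lost’; a real database may instead stem or truncate terms, so counts obtained this way would not necessarily match the database counts.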

3.2 Data Collection

To find gamification studies across disciplines, we conducted a comprehensive search of seven electronic databases, which produced a set of 1122 papers: Scopus (life sciences, social sciences, physical sciences and health sciences, n = 300), PubMed (life sciences and biomedical, n = 86), ACM Digital Library (all computing and information technology domains, n = 222), IEEE Xplore (computer science, electrical engineering and electronics, n = 56), Web-of-Science (multidisciplinary, n = 193), ScienceDirect (physical sciences and engineering, life sciences, health sciences, and social sciences and humanities, n = 12), and ProQuest (multidisciplinary, n = 253).
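As a quick consistency check, the per-database counts listed above add up to the reported total, which is the count before duplicates were removed. The dictionary below simply restates those numbers; the variable names are illustrative.

```python
# Records retrieved per database, as listed in Sect. 3.2 (before de-duplication).
records = {
    "Scopus": 300,
    "PubMed": 86,
    "ACM Digital Library": 222,
    "IEEE Xplore": 56,
    "Web-of-Science": 193,
    "ScienceDirect": 12,
    "ProQuest": 253,
}

total = sum(records.values())
print(total)  # 1122
```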

Fig. 1. Flow diagram according to the PRISMA guidelines.

3.3 Inclusion Criteria

Our review focused on high-quality research reporting original work on the effect of gamification on adherence. From this perspective, we developed the following inclusion criteria:

  1. Peer-reviewed conference or journal papers.
  2. Full papers (minimum length of four pages).
  3. Explained research methods.
  4. Researched effect of gamification on adherence as main research subject.
  5. Reported how gamification was applied.
  6. Reported the effect of gamification on adherence.
  7. Reported behavioral or attitudinal measurements.

Criteria 1–2 were chosen to maximize the inclusion of high-quality and original research. Criteria 3–4 were included to enable an assessment of the quality of the work. Criterion 5 ensured that the included papers report on gamification, and not on serious games or persuasive technology. Finally, criteria 6–7 were chosen to ensure that the included papers research the effect of gamification on adherence in a user study, rather than only providing a conceptual discussion.

3.4 Exclusion Criteria

The exclusion criteria were designed, among other things, to exclude duplicate reporting of earlier versions of studies that were fully reported later. We excluded papers with the following characteristics:

  1. Extended abstracts, work-in-progress, workshops.
  2. Study protocols or conceptual designs.
  3. Studies that only cover serious games.
  4. Studies that do not report an effect.
  5. Systematic reviews.
  6. Non-scholarly books.
  7. Papers not written in English.

Criteria 1–2 exclude early and incomplete versions of studies. Criterion 3 excludes studies that mislabel serious games as gamification. Criterion 4 ensures that the effect on adherence can be compared. Criterion 5 excludes studies that do not focus on one particular original study, such as reviews. Criterion 6 excludes books that do not have a scholarly focus. Finally, criterion 7 limits the review to original research written in English.

3.5 Classification

Effect: Studies were classified as ‘significantly positive’ when they explicitly mentioned a significant positive effect, and as ‘positive trend’ when they only mentioned a trend. Studies reporting no effect were classified as ‘no effect’, and studies reporting a negative effect as ‘negative’.
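The effect coding rule can be written as a small decision function. This is a hypothetical sketch: the inputs `direction` and `significant` stand for the coders’ reading of what each paper reports (not for a re-analysis of its data), and the function name is illustrative. Significance only separates the two positive categories; any reported negative effect is coded ‘negative’.

```python
def classify_effect(direction: str, significant: bool) -> str:
    """Map the author-reported effect on adherence to a review category.

    direction: 'positive', 'none', or 'negative', as stated by the authors.
    significant: True if the authors explicitly report a significant effect.
    """
    if direction == "negative":
        return "negative"
    if direction == "none":
        return "no effect"
    if direction == "positive":
        return "significantly positive" if significant else "positive trend"
    raise ValueError(f"unknown direction: {direction!r}")


print(classify_effect("positive", significant=False))
```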

Adherence: To investigate how adherence is defined and measured in the gamification domain, we built on the findings of Brown et al. [26] to classify the papers using the terms attrition, dropout, noncompleters, lost to follow-up, participant withdrawal, nonresponse, completion rate, did not complete, retention, loss, and compliance. We also added effectiveness as a classification term, because two papers explicitly used it. Additionally, drawing on Sieverink et al. [7], we developed an Adherence Rationale Index (ARI) to classify studies as follows:

  A. The study specifies intended use and provides a theoretical justification.
  B. The study specifies intended use but lacks a theoretical justification.
  C. The study specifies neither intended use nor a theoretical justification.
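Because the ARI rests on two yes/no judgments, it can be expressed as a small decision function. This is a hypothetical sketch (the function name and boolean inputs are illustrative). The ARI as defined above does not list a study that justifies an intended use it never specifies; the sketch assumes such a case cannot meaningfully occur and codes it as C.

```python
def adherence_rationale_index(specifies_intended_use: bool,
                              justifies_intended_use: bool) -> str:
    """Return the ARI class ('A', 'B', or 'C') for one study."""
    if specifies_intended_use and justifies_intended_use:
        return "A"  # intended use specified and theoretically justified
    if specifies_intended_use:
        return "B"  # intended use specified, no theoretical justification
    # Assumption: without a specified intended use there is nothing to
    # justify, so every remaining case is coded C.
    return "C"


print(adherence_rationale_index(True, False))
```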

Gamification techniques: In this systematic review, we used the classification proposed by Hamari et al. [3] to classify gamification techniques, as shown in Table 2: points, leaderboards, achievements/badges, levels, story/theme, clear goals, feedback, rewards, progress, and challenge. Because several studies included social motivational affordances in their gamification research, we augmented Hamari’s set with ‘social affordances’, which groups guilds/teams, social network, social status, opponents, and direct communication. Finally, although serious game studies were excluded by the exclusion criteria, ‘serious game’ was added to the coding table because four studies [10, 29,30,31] each used three conditions in their study design: a control condition, a gamified intervention, and a serious game.

Study design and criteria: Each paper was classified as either a randomized controlled trial (RCT) or a baseline study. Studies were classified as an RCT when they had a randomized control group in parallel with the intervention group, and as a baseline study when they compared their results with a baseline value. Additionally, we recorded the sample size, demographics, and duration of each study.

Scientific fields: A scoping review by O’Donnel et al. [32] shows that gamification has become a multidisciplinary research topic, applied and used in several domains. For consistency with that review, we used its ten categories to classify the primary scientific field of each paper: (1) Sciences; (2) Information & Computing Science & Technology; (3) Medical & Health Sciences; (4) Education; (5) Economics, Commerce, Management, Tourism & Services; (6) Psychology & Cognitive Sciences; (7) Law & Legal Studies; (8) Engineering, Built Environment & Design; (9) Arts, Humanities, & Social Sciences; and (10) Games, Digital Entertainment Media. These categories condense the 22 top-level divisions of the Australian and New Zealand Standard Research Classification.

3.6 Intercoder Reliability

All studies were coded by two independent coders (RDC and JG), and intercoder reliability was calculated using Cohen’s kappa statistic. The mean kappa was 0.78 (± 0.13), and all values were significant (P < 0.001). Overall, all intercoder reliability values were at an acceptable level, i.e., > 0.60 [33, 34].

The adherence terms, the adherence rationales (Table 1), and the scientific fields were coded consistently, with kappa values of 0.82 (P < 0.001), 1.0, and 0.80 (P < 0.001), respectively. The gamification techniques were also coded reliably, with kappa values between 0.62 and 1.0 depending on the technique. The technique with the lowest intercoder reliability was ‘story/theme’ (kappa = 0.62). This lower agreement is due to the blurry line between graphic additions and a theme: is the addition of a penguin [35], for example, a graphical asset, a story, or even an avatar?
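For readers less familiar with the statistic, Cohen’s kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each coder’s label frequencies: kappa = (p_o − p_e) / (1 − p_e). The sketch below is a minimal plain-Python implementation; the codes in the example are invented for illustration and are not the review’s data. It does not compute the P-values reported above.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes (1 = technique present) for ten papers, e.g. 'story/theme'.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.58: p_o = 0.80, p_e = 0.52
```

In this invented example the coders agree on 8 of 10 papers, yet kappa falls just below the 0.60 threshold, because half of that agreement would be expected by chance alone.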

4 Results

As shown in Fig. 1, a total of 1122 papers were retrieved from the database searches using the search terms described in Sect. 3.1. After removing duplicates and screening the papers against the inclusion and exclusion criteria, 99 papers were assessed in full text. Twenty-seven papers focusing on both gamification and adherence met the criteria and were thus included in this systematic review.

4.1 Effect of Gamification on Adherence

The results as reported by the authors are summarized in Tables 1, 2, and 3. Of the 27 studies, 33% reported a statistically significant positive effect of gamification techniques on adherence. Additionally, 37% reported positive trends, but without significant effects. Finally, 30% reported no effect at all, while no studies reported a negative effect. However, 26% of the studies [29, 30, 36,37,38,39,40] reported a decrease in adherence over time, either to the intervention or to the gamification techniques themselves. For example, the results of Bodduluri et al. [29] suggest “that the motivating effects of gamification ‘wear-off’ and become boring as participants continue in a session that is lengthy, unless there is greater variety or progression of challenge in the task.” Similarly, Dugas et al. [36] state that their participants’ motivation diminished as the study continued, which resulted in decreasing adherence over time. Fotaris et al. [30] noticed the demotivating aspect of leaderboards, as “students [...] began to lose interest once they trailed behind in the leaderboard.”
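The percentages above correspond to whole-study counts out of 27. The short check below reconstructs them. The counts are inferred from the text rather than copied from Table 1: 19 positive studies (Sect. 5.1), split as 9 significant and 10 trends to match 33% and 37%; 8 studies with no effect (6 RCTs plus 2 baseline studies, Sect. 4.4); and the seven studies cited for a decline over time.

```python
# Outcome counts inferred from the reported percentages (n = 27 studies).
counts = {
    "significantly positive": 9,
    "positive trend": 10,
    "no effect": 8,
    "negative": 0,
}
n_studies = sum(counts.values())

# Percentages rounded to whole numbers, as in the text.
shares = {label: round(100 * k / n_studies) for label, k in counts.items()}

# Seven studies [29, 30, 36-40] reported a decrease in adherence over time.
decline_share = round(100 * 7 / n_studies)

print(n_studies, shares, decline_share)
```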

Table 1. Overview of the terms used: ARI (A: studies specify intended use and provide a theoretical justification; B: studies specify intended use but do not provide a theoretical justification; C: studies neither specify intended use nor provide a theoretical justification), measurement variables, study design, sample size, duration, and reported outcome (++ significantly positive, + positive trend, = no effect).

4.2 Adherence Measurement Variables

As illustrated in Table 1, the term used most frequently was not adherence (15%) but retention (26%). Other terms were compliance (19%), completion rate (15%), attrition (11%), effectiveness (7%), and dropout (4%). Regarding the ARI, the majority of studies (70%) neither specified intended use nor provided a theoretical justification; instead, they followed a “the more, the better” approach. Only 19% specified intended use but lacked a theoretical foundation, and only three studies both specified intended use and provided some theoretical foundation. Cafazzo et al. [43] assessed treatment adherence at baseline and post-intervention using the validated 14-item Self-Care Inventory [55]. A participant was considered adherent with three or more measurements, as “frequent self-monitoring of blood glucose (≥ times daily) is associated with better glycemic control among patients with type 1 diabetes.” Gremaud et al. [37] selected a threshold of 1250 steps per day as a conservative estimate, based on previous research that found that adding 1385 steps per day resulted in significant reductions in multiple cardiometabolic risk factors. Finally, Leinonen et al. [10] followed the Finnish national recommendation of at least 1.5 h of daily physical activity for those aged 13 to 18 years [56]. Overall, most studies simply quantified adherent behavior with a behavioral measure, ranging from the daily average frequency of blood glucose measurements [43] and the number of exercises completed [47] to the number of app sessions [40]. Even studies that conceptualized engagement as (part of) adherence relied exclusively on behavioral measurements. For example, Stanculescu et al. [40] claim that “The average session length falls into the online behavior metrics and is a good indicative of user engagement.” Some studies, such as that of Dugas et al. [36], combine several variables to calculate adherence: “points were used to assess treatment adherence during the intervention. Points were allocated for achieving daily goals related to reporting and reaching target levels of glucose, exercise, nutrition, and medication adherence.”
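The three elements recommended by Sieverink et al. [7] can be made concrete in code. The sketch below is hypothetical: `IntendedUse`, `is_adherent`, and `adherence_rate` are illustrative names, the threshold and usage logs are invented, and the decision rule (mean daily usage at or above a pre-specified threshold) is only one possible operationalization. A study could instead require the threshold to be met on every day, or on a minimum share of days.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class IntendedUse:
    # Element 2: operationalization of intended use as a daily threshold.
    min_daily_count: float
    # Element 3: empirical, theoretical, or rational justification (free text).
    justification: str


def is_adherent(daily_counts: Sequence[float], intended: IntendedUse) -> bool:
    """Element 1: usage measured per day (e.g., logged measurements).

    Assumed rule: adherent if the mean daily count meets the threshold.
    """
    if not daily_counts:
        return False
    return sum(daily_counts) / len(daily_counts) >= intended.min_daily_count


def adherence_rate(usage: Mapping[str, Sequence[float]], intended: IntendedUse) -> float:
    """Share of participants classified as adherent."""
    if not usage:
        raise ValueError("no participants")
    adherent = sum(is_adherent(days, intended) for days in usage.values())
    return adherent / len(usage)


# Hypothetical example: at least 3 logged measurements per day on average.
intended = IntendedUse(min_daily_count=3, justification="hypothetical guideline")
usage = {
    "p1": [4, 3, 3],  # mean 3.33 -> adherent
    "p2": [1, 2, 0],  # mean 1.00 -> not adherent
    "p3": [3, 3, 3],  # mean 3.00 -> adherent (threshold is inclusive)
}
print(adherence_rate(usage, intended))
```

Recording the justification next to the threshold keeps the pre-specified intended use and its rationale together, which is exactly what most of the reviewed studies lacked.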

Table 2. Frequency of the gamification techniques used in the included studies grouped by reported outcome. Note that all studies use at least three different gamification techniques.

4.3 Gamification Techniques

As shown in Table 2, a multitude of gamification techniques were integrated to improve adherence. All studies used at least three different gamification techniques, with an average of 5.7 (± 1.8) techniques. The majority of the studies implemented points (85%) and feedback (67%). Points were often simple numerical values awarded for an action or a combination of actions, and they took distinct forms. For example, Ryan et al. [38] use the step count as points, while Dugas et al. [36] use traditional points. Points were typically displayed on a leaderboard (63%) or used to calculate badges/achievements (52%). All four studies [31, 45, 47, 48] that did not use points focused on progress in a certain theme/story.

Feedback was also implemented in many forms: either as immediate feedback with pop-up messages, e.g., [54], or as reports, e.g., [13]. Additionally, information visualizations [43], summary screens [11], and the ability to monitor results [50] were classified as feedback. Rewards were implemented with similar variety, such as virtual rewards [31, 38, 54], candy [30], iTunes music [43], actual money [13, 50], physical trophies [51], or grades [13].

Another highly popular technique was the addition of social affordances (56%). Scase et al. [31] reported that the bonding between participants helped encourage them to adhere. Like all other gamification techniques, social affordances were implemented in different ways: social networking [10, 31, 39, 43, 46, 54], teams [13, 30, 38, 51], opponents [11], social status [30], and communication features [36].

Challenges (30%) and levels (30%) were the least commonly used gamification techniques in the selected studies.

Table 3. The representative scientific fields of study of the included studies grouped by reported outcome.

4.4 Study Design Criteria

Of the included studies, 56% used an RCT design to study the effect of gamification on adherence, while 44% compared the gamified version with a baseline (see Table 1). The difference between the two designs was most noticeable among papers that reported no effect of gamification on adherence: six RCTs reported no effect, compared with only two baseline studies.

Sample sizes varied greatly, ranging from 16 [35] to 1763 (284 versus 1479 in the control group) [48]. The median sample size was 97 participants, with lower and upper quartiles of 30 and 200 participants, respectively. Study duration also varied, from one day [35] (where adherence was measured as more usable trials) to one year [45], as shown in Table 1.

4.5 Scientific Fields and Gamification Studies on Adherence

As shown in Table 3, the Education, Medical & Health Sciences, and Psychology & Cognitive Sciences fields are well represented in the selected studies, with ten, nine, and four studies, respectively. Humanities & Social Sciences, Economics, Engineering, and ICT were each represented by one study.

5 Discussion

We first discuss the results obtained with respect to the impact of gamification on adherence across disciplines. Next, we reflect on the extent to which the studies followed the recommendations of Sieverink et al. [7]. We end the discussion with a reflection on the specific gamification strategies used.

5.1 Impact of Gamification on Adherence

The results of our systematic review lend support to the hypothesis that gamification has a positive effect on adherence: nineteen of the 27 studies report a significantly positive effect or a positive trend. We emphasize that we applied strict standards of scientific quality: we only included peer-reviewed conference or journal papers, and we limited ourselves to studies that (1) used an experimental design and (2) studied adherence as a research goal rather than as a consequence of the methodology used. Moreover, sample sizes (median 97 participants) and intervention durations were generally substantial, suggesting that the studies were conducted in a scientifically adequate manner.

Nevertheless, we must remain cautious. Although none of the papers reported a negative effect, 26% mentioned some form of decrease in adherence over time. This is in line with the work of Koivisto et al. [57], who found that the perceived usefulness of gamification declines with use. This novelty effect should be considered when evaluating gamified systems and services, as it can skew the results of adherence studies [58]. It also highlights the importance of specifying how adherence is measured.

5.2 Pre-specifying and Justifying Adherence Measurements

Our study also shows that further debate is needed on how to conceptualize and measure adherent behavior in gamification studies. As shown in Table 1, authors used a wide range of terms and measurement variables. This is, of course, a natural consequence of the different disciplines and research objectives. Across disciplines, however, we found that both intended use and empirical, theoretical, or rational justifications of intended use were mostly lacking. Only five studies specified intended use without justifying it [13, 31, 38, 41, 50], and only three [10, 37, 43] also provided a justification of the intended use, as recommended by Sieverink et al. [7].

Unfortunately, failing to specify intended use a priori, and failing to provide an accompanying justification, may introduce researcher bias, as researchers are currently completely unconstrained in defining what to measure and how. Moreover, it is hard to compare the effects of gamification on adherence when measurement variables and thresholds differ between studies. The lack of a pre-specified intended use and of its justification in current studies therefore calls for caution in making bold claims about the impact of gamification on adherence.

5.3 Gamification Techniques

We found that all papers used a combination of at least three distinct gamification techniques, with an average of 5.7 (± 1.8) techniques. At first sight, this high average may indicate mature implementations. On the other hand, the frequent occurrence of points and leaderboards might also suggest that the included studies focus largely on the PBL (points, badges, leaderboards) triad [59]. Limiting gamification to the PBL triad risks failing to capture what makes games engaging, which in turn leads to ineffective systems; this is a fundamental criticism of the field [60]. Given the wide range of gamification techniques and the large variety in their implementation (and perhaps their quality), no conclusions can be drawn about the relation between adherence and the specific gamification techniques used.

6 Limitations

This study has several limitations. First, the terms used in the search string might have affected the results, as we did not include domain-specific constructs of adherence, such as customer loyalty and learner conversion. Such studies may in fact implicitly report adherence measurements when they report impact. In these studies, however, improving adherence is often not the main goal, and they were therefore excluded from this review. Moreover, our search string was based on the findings of Brown et al. [26], and we acknowledge that these terms could have induced a bias toward papers in the health domain. However, 37% of the studies were from the education domain, 33% from the health domain, and 15% from psychology and cognitive sciences. This suggests that we were able to include studies across disciplines.

Second, we did not score the ‘quality’ of the implemented gamification techniques or of the gamified application. Gamification designers often criticize the approach of using individual gamification techniques without acknowledging the quality of the implementation [61]. Moreover, users perceive gamified applications as ‘gestalts’, that is, as one whole rather than a mere atomistic addition of gamification elements [60]. Hence, future research may attempt to include measures of the quality of implemented gamification techniques as perceived by end-users.

7 Conclusions and Future Work

This paper first explored the concept of adherence and presented a tailored definition for the gamification domain: “the extent to which individuals use a gamified service or system and experience its content, as defined or implied by its creators, in order to derive a certain benefit.” It then reported on a systematic literature review summarizing the published research on the effect of gamification on adherence across disciplines. Twenty-seven papers focusing on both gamification and adherence, and including empirical measurements, met the criteria. The results of our systematic review lend support to the hypothesis that gamification has a positive effect on adherence. However, our results also suggest the need for a more refined conceptualization, as well as a set of standard elements to be present in any gamification study on adherence: (1) the ability to measure the usage behavior of individuals, (2) an operationalization of intended use, and (3) an empirical, theoretical, or rational justification of the intended use.