1 Introduction

The procedural generation of complex and believable narrative has long been a subject of research, with much of it focusing on the generation of full and coherent narratives, following the tradition of Tale-Spin (Meehan 1977). Although the quality and coherence of these generated artifacts have increased significantly since Meehan presented his seminal work, even the most advanced systems such as GPT-3 (Brown et al. 2020) produce output that, although very impressive and convincing, is still fairly easily distinguishable from human output. Furthermore, these highly advanced generators require significant computing resources and extensive training data sets to achieve a sufficient level of believability, to a degree where it is very difficult to understand the underlying model creating the output of the algorithm (Brown et al. 2020). This makes it difficult to integrate these algorithms into games, and to test them as part of a greater user experience. These complex approaches also trade off a large amount of user control in favor of a more complex output, an aspect of Artificial Intelligence (AI) technologies that has been raised as a concern for game developers (Johansson et al. 2012). Thus developers may have to design and build games around these algorithms and technologies, rather than using them to improve their intended game design. As such, a simpler approach may be more favorable.

The aim of this chapter is to highlight a different approach to generative narrative—an approach that may be more amenable to the current design approaches used in the games industry, where it has already seen some success. In this chapter, I will describe what I call naive narrative generation—an approach aimed at lowering the complexity of generative content, and outsourcing parts of the narrative building onto the player. This approach is akin to the storytelling approach used by fairy tales or old folklore, called bardic storytelling by Murray (1997). I will also introduce the concepts of high-fidelity and low-fidelity generation, discussed from the perspective of narratives. These concepts will be used to describe how different approaches to generative narratives influence how stories are conveyed to the player of the game, and how these narratives in turn influence the player experience. I will also highlight why these techniques and approaches warrant further study from the academic community, and how they can be used to make academic research more relevant to practicing game developers.

2 Naive Narrative Generation

At the core of this chapter is the concept of naive generation, particularly as it pertains to narrative. The concept borrows from the computer science definition of a naive algorithm, i.e. an algorithm that is very simple and uses a very direct and uncomplicated approach. Computational naivete is characterized by the process having very few constraints and rules, with little consideration given to factors like speed or solving the problem in a clever way. Applied to content generation, naivete instead means putting less emphasis on capturing all the nuances and depth of the generated artifact, and instead focusing on producing an easily understood generator that can be built quickly and efficiently, accepting a less complex solution in exchange for shorter development time.

If we apply the idea of naive generation to narratives, we aim to create very simple snippets of narrative. Where GPT-3 (Brown et al. 2020) creates a story several pages long, simpler systems such as Tale-Spin (Meehan 1977) create a story that can be contained in a single paragraph. In both of these cases, the story being created is still highly interconnected and each sentence builds on what has happened before, albeit to a limited degree in Tale-Spin, on account of its age. A naively generated narrative would be a loose collection of sentences that fulfill the barest level of interconnectedness. As an example, consider the following story:

The Myth of Torg the Barbarian: Torg the Barbarian was orphaned at the age of 3. Torg the Barbarian fought the Dragon of Helmsford. Torg the Barbarian was crowned the ruler of Helmsford.

In just three short sentences we have told the Myth of Torg, starting with the humble origins of Torg the Barbarian, describing his great deed in fighting the Dragon of Helmsford, and ending with him being crowned the ruler of the same town. As a coherent story this is not very impressive, and does not even reach the level of interconnectedness of the underlying model of Tale-Spin, a 40-year-old system. However, by reducing the amount of detail and contextualization we have neatly side-stepped many of the problems of narrative coherence. Note that we have no information about Torg the Barbarian, who he is as a character, and why he was actually crowned the ruler of Helmsford. Furthermore, we know next to nothing about the Dragon of Helmsford, nor about Helmsford itself. However, upon reading this very simple narrative we may still get a sense of the tragic and heroic character Torg the Barbarian, and his valiant defense of Helmsford. None of these things are mentioned in the story, but there is a wide open narrative space for the reader to extrapolate and add their own embellishments. By providing narrative in this simple form, we prompt the reader to apply their own creativity and imagination to fill in the blanks of the narrative.
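To make the idea concrete, the following is a minimal sketch of the kind of hypothetical generator that could produce stories in the style of The Myth of Torg. It is an illustration under stated assumptions and not a description of any existing system: apart from Torg, the Dragon, and Helmsford, every name and phrase in the content pools is invented for this example, as is the function name generate_myth.

```python
import random

# Hypothetical content pools. Only the Torg/Helmsford entries come from the
# example story; everything else is invented for illustration.
HEROES = ["Torg the Barbarian", "Ysolde the Tinker", "Brann the Quiet"]
TOWNS = ["Helmsford", "Greywater", "Oakhollow"]
ORIGINS = [
    "was orphaned at the age of 3",
    "was raised by shepherds",
    "was exiled from the capital",
]
DEEDS = [
    "fought the Dragon of {town}",
    "broke the siege of {town}",
    "cured the plague in {town}",
]
REWARDS = [
    "was crowned the ruler of {town}",
    "was made the warden of {town}",
    "was given the keys to {town}",
]


def generate_myth(rng):
    """Return a title line followed by three loosely connected sentences."""
    hero = rng.choice(HEROES)
    town = rng.choice(TOWNS)
    # The only interconnection is the shared hero and the shared town.
    return [
        f"The Myth of {hero}:",
        f"{hero} {rng.choice(ORIGINS)}.",
        f"{hero} {rng.choice(DEEDS).format(town=town)}.",
        f"{hero} {rng.choice(REWARDS).format(town=town)}.",
    ]


if __name__ == "__main__":
    print(" ".join(generate_myth(random.Random())))
```

The generator has no model of causality or character. Whatever coherence the output has comes from repeating the same hero and town across three fixed slots, which leaves the reader to supply the rest.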

By any reasonable measure this narrative format should not be engaging, and it usually is not beyond this little toy example. The output from this hypothetical generator is formulaic and repetitive, and unlikely to be engaging in the longer term. This type of ultra-simplistic narrative is also a prime example of what Compton (2016) describes as the 10,000 bowls of oatmeal problem, i.e. that a generator can easily spit out 10,000 different variants of something, but they are all more or less the same and provide very little interesting content. However, when it is presented in this simple form the Myth of Torg actually turns into a story. This phenomenon, where simple snippets of narrative are turned into a larger story, is also something we see in fairly successful games, for example RimWorld (Ludeon Studios 2016), Dwarf Fortress (Bay 12 Games 2006), and Caves of Qud (Freehold Games 2015). The style of narrative presentation used by RimWorld, Caves of Qud, and Dwarf Fortress was also common in older games, for example Sid Meier’s Colonization (Microprose 1994), developed when technical constraints made it difficult to achieve the level of portrayal used in modern AAA games. We also see this style of narrative presentation being successfully used in some AAA games, for example the Crusader Kings series of games (Paradox Development Studio 2012, 2020). But why does it work? To answer that question, we must first explore the concept of fidelity and how it applies to generative content, including narrative.

3 Fidelity

In this chapter I use the concept of fidelity to describe the sensory complexity and detail by which game content is presented. A high-fidelity game puts heavy emphasis on having state-of-the-art graphics, detailed character animations, a complex soundscape, and a generally very finely detailed game world. Conversely, a low-fidelity game tends towards more abstract representation with simpler graphics (perhaps even 2D), less advanced animations, a simpler soundscape, and overall less detail in the game world. Examples of high-fidelity games include Assassin’s Creed: Valhalla (Ubisoft Montreal 2020), Cyberpunk 2077 (CD Projekt Red 2020), and Grand Theft Auto V (Rockstar North 2013). Examples of low-fidelity games include the aforementioned RimWorld (Ludeon Studios 2016), Dwarf Fortress (Bay 12 Games 2006), and Caves of Qud (Freehold Games 2015), as well as older games that were restricted by the technology of the times, for example Sid Meier’s Colonization (Microprose 1994). Fidelity, however, is not strictly binary. Games such as the Crusader Kings series (Paradox Development Studio 2012, 2020) occupy a middle ground: they are not as sensorily complex as the high-fidelity games, but they are also significantly more complex than the low-fidelity games. Thus, fidelity should be measured on a scale, where different games occupy different places on the scale.

If the game reinforces a low-fidelity experience by presenting the player with a less complex sensory experience (i.e. simpler graphics, less complex audio environments, and the like) the player is then conditioned to expect a certain experience in all aspects of the game. Thus, the game can “get away” with a less complex representation of, for example, Non-Player Character (NPC) behavior. In a game like Assassin’s Creed: Valhalla (Ubisoft Montreal 2020) the player would expect a more complex representation of a greeting between characters, where each of the parties is represented using a highly detailed 3D model, and the characters are shown clasping forearms and conversing through pre-recorded sound files. In contrast, in RimWorld (Ludeon Studios 2016) the same type of event is represented by moving the characters next to each other and showing speech bubbles with iconography above their heads. These representations are vastly different, but end up being conducive to a positive player experience in their respective games. The reason for this is that both games make sure to situate the player’s experience using the game’s level of fidelity.

Fidelity influences how the player’s understanding of the game is situated, i.e. how the player’s preconceived notions of the game influence their expectations of the gaming experience (Warpefelt 2020). As described by Kultima and Stenros (2010) the player’s experience begins before they even start playing the game, and involves things like advertising, concept art, and trailers for the game. Even at this early stage the player starts establishing expectations of what kind of game this is going to be, and that process carries on into, throughout, and between play sessions. Because of this, it is critical that the game continuously feeds back into the player’s understanding of it, and reinforces the experience that is intended by the designer. Part of this process is the fidelity by which the game is presented, and how that sets a certain expectation (Warpefelt 2020). Borrowing from the field of human-computer interaction, the game must continuously reinforce that it has a certain character (Janlert and Stolterman 1997), i.e. a collection of characteristics that cast it as being a recognizable form of game, not only in terms of genre but also in terms of complexity of presentation. Thus, the game should continuously reinforce the level of fidelity of the game’s presentation. If it fails to do so, the game may be experienced as discordant and confusing. Conversely, if the game manages to reinforce the level of fidelity and properly temper the player’s expectations, the game experience will seem more harmonious.

3.1 Generative Fidelity

Generative fidelity is fidelity as applied to generative content. In terms of generative narrative, the fidelity of the presentation of the game also sets the expectation that the narrative will be presented at a matching level of fidelity. Because of this, a game like RimWorld can again “get away” with a much simpler representation of the narrative. In RimWorld there is comparatively little pre-programmed narrative, and instead the narrative of the game is presented through a sequence of short texts describing events, and through in-world events. Conversely, games like Assassin’s Creed: Valhalla have a large amount of pre-programmed narrative snippets in the form of NPC behavior, cut scenes to provide story, or world events.

3.2 Summarizing Fidelity

To summarize, fidelity is the degree to which a game provides a detailed representation of content. The level of fidelity influences the expectations of the player on how the content is going to be presented, and as the fidelity of graphical presentation increases, so must the fidelity of narrative presentation. Conversely, a lower-fidelity graphical presentation can also allow a game to maintain a lower level of narrative fidelity while still making the narrative seem believable. In the following section we will discuss some of the problems associated with procedural generation, and how they can possibly be sidestepped by reducing the level of fidelity of the generated content.

4 Limits of Narrative Generation

There are a few common issues with procedural content generation. In addition to the aforementioned 10,000 bowls of oatmeal problem (Compton 2016), we also have the twin problems of expressive range described by Smith and Whitehead (2010) and the Kaleidoscope Problem described by Cardona-Rivera (2017). Lastly, we have the problem of the Black Hole of AI and the trade-off between complexity and control of game design described by Johansson et al. (2012).

As previously discussed, the 10,000 bowls of oatmeal problem (Compton 2016) describes the fact that it is trivially easy for a generator to create many different combinations of things, but that few of them are actually interesting, much like oatmeal. Even if you create thousands of different bowls of plain oatmeal, the problem remains that plain oatmeal is just bland. As an antidote to this problem, Compton (2016) suggests two degrees of uniqueness for generated artifacts: perceptual differentiation and perceptual uniqueness. Perceptual differentiation is the less difficult to achieve of the two, where each generated artifact is perceptually not identical to the next one, but not necessarily entirely unique. Compton exemplifies perceptual differentiation using trees: if the trees are all identical or insufficiently varied they will not be seen as natural by even a casual observer, but if they fulfill a basic level of variance they can still be different enough that they fulfill the aesthetic needs of the game, despite each tree not being unique and memorable. An example of this principle in practice would be SpeedTree (IDV 2021), a technology commonly used in games and cinema to generate various forms of vegetation. The various trees used by SpeedTree are not necessarily entirely unique, but they are varied enough because players generally do not scrutinize the foliage to a degree where uniqueness matters. The more difficult level described by Compton (2016) is perceptual uniqueness, where each generated item is distinct and unique from the next, or characterful as Compton describes it. Compton does not further define the concept of characterful, but by connecting to the concept of character presented by Janlert and Stolterman (1997) we can describe it as a collection of characteristics that make the generated object distinct from other objects in the game.
Compton (2016) uses the metaphor of the object being the lead actor instead of an extra in a movie, and notes that not everyone can be the main character, which also matches the concept of character as described by Janlert and Stolterman (1997). A unique object set against a background of more homogeneous objects will stand out more than a unique object lost in a sea of other unique objects. Thus, there exists a need for perceptually different as well as perceptually unique objects. In a concrete example, a game may use SpeedTree to generate the vast majority of a forest, and then use uniquely designed trees as visual markers to guide the player through the level or to provide narrative told through the environment of the game world (Fernández-Vara 2011).

Applied to the space of narrative, the 10,000 bowls of oatmeal problem captures the fact that it is trivially easy for us to generate narratives along the lines of The Myth of Torg that was described above. Each narrative runs the risk of becoming much like the next, and there is seemingly very little of either perceptual differentiation or uniqueness in The Myth of Torg style narratives. Fortunately, the player’s understanding of narratives is situated by the fidelity of the game. As we have seen in games such as RimWorld (Ludeon Studios 2016) and Dwarf Fortress (Bay 12 Games 2006), small snippets of narrative like those presented in The Myth of Torg can still achieve believability if they are properly situated with an appropriate level of fidelity, i.e. if the game sets the player’s expectations to be in accordance with the content and how it is presented. Thus, if the player’s expectations are set in accordance with the complexity of the narrative it will still be perceived as believable and engaging, an effect that we see in the aforementioned games.

However, the problems described by Compton (2016) are fundamentally related to the Kaleidoscope Problem described by Cardona-Rivera (2017), and the problem of expressive range described by Smith and Whitehead (2010). Where Compton discusses what makes generated content dull or interesting, Cardona-Rivera as well as Smith and Whitehead provide the mechanism by which this happens: once the player starts finding patterns in the generated content they can quickly discern what the generator is capable of creating, and that breaks the illusion of the game. This, in turn, causes the breakdown of various aspects of the gaming experience, such as immersion (McMahan 2003; Ermi and Mäyrä 2005) and suspension of disbelief (Coleridge 1817). Even if the narrative presented by the game is properly situated with an appropriate level of fidelity, the player experience may still be negatively affected if the player is able to perceive the borders of the expressive range of the generator.
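As a small illustration of how quickly the borders of a naive generator become visible, the sketch below enumerates every story a three-slot template in the style of The Myth of Torg can produce. This is a crude counting exercise under stated assumptions, not the analysis method of Smith and Whitehead (2010): the content pools and the function name all_myths are hypothetical and invented for this example.

```python
from itertools import product

# Hypothetical content pools, invented for illustration.
ORIGINS = [
    "was orphaned at the age of 3",
    "was raised by shepherds",
    "was exiled from the capital",
]
DEEDS = [
    "fought the Dragon of {town}",
    "broke the siege of {town}",
    "cured the plague in {town}",
]
REWARDS = [
    "was crowned the ruler of {town}",
    "was made the warden of {town}",
    "was given the keys to {town}",
]


def all_myths(hero, town):
    """Enumerate every three-sentence story the template can produce."""
    myths = []
    for origin, deed, reward in product(ORIGINS, DEEDS, REWARDS):
        myths.append((
            f"{hero} {origin}.",
            f"{hero} {deed.format(town=town)}.",
            f"{hero} {reward.format(town=town)}.",
        ))
    return myths


myths = all_myths("Torg the Barbarian", "Helmsford")
distinct_sentences = {sentence for myth in myths for sentence in myth}

print(len(myths))               # 27 distinct stories
print(len(distinct_sentences))  # built from only 9 distinct sentences
```

For a fixed hero and town, the 27 stories are assembled from just nine sentences, so a player who has read a handful of them has likely already seen most of what the generator can say.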

The obvious solution to this problem is to apply more technologically advanced solutions to create higher quality generated experiences. However, there are two problems with this approach, as described by Johansson et al. (2012). Game developers are hesitant to implement technologies that take away their control over the design of the game, what Johansson et al. (2012) call trading complexity for control. Furthermore, spending work hours adding more AI technology to a game is often subject to severe diminishing returns—something Johansson et al. (2012) call the Black Hole of AI. As stated by Johansson et al. (2012), players are unlikely to notice the more advanced technology due to diminishing returns, and the implementation of those features may have come at the expense of other parts of the game. As AI models grow more complex they grow more expensive to use, and they become more difficult to implement. Because of this, integrating them into a game design becomes very time consuming. Both of these factors add to the development costs of a game, something that is already problematic for the games industry (Tschang 2007). Thus, the financial and labor realities of game development projects favor less complex and more predictable generative approaches—or in our terminology more naive generative approaches. However, as we see above these approaches come with their own design constraints in terms of expressive range and the need for situation.

5 The Role of Naive and Low-Fidelity Generation

Given the problems and trade-offs associated with implementing generative content in games, it is clear that there exists a large need for future research into how we apply generative technologies in games. Games already make extensive use of generative technology (as seen in the examples earlier in this chapter) but as described in Sect. 4 there are also distinct problems associated with generative content. As previously mentioned, one area that has been (and remains) especially problematic is the generation of complete narratives. Fortunately, this is where a naive and low-fidelity approach to generated content may be useful.

One approach game developers can apply to alleviate the issues of narrative generation is to use the inherently emergent nature of game narratives to their advantage, especially when implementing naive and low-fidelity generative technologies. The narrative of a game is not simply a linear experience presented to a passive consumer; instead, the player takes an active role in the gaming experience. Calleja (2009) presents a division between the scripted narrative, which comprises the parts of the game narrative that are hard-coded into the game, and the alterbiography, which is the player’s own first-person story of how they played the game. In the case of generative narratives, the scripted narrative is what is created by the generative process, whereas the player’s alterbiography remains the player’s own story. Thus, the goal of the game in terms of narrative is to provide the relevant scripted narrative pieces to help build an engaging alterbiography for the player.

What I advocate in this chapter is essentially taking what SpeedTree (IDV 2021) does for vegetation generation and applying it to narrative generation. The vast majority of the content created by SpeedTree is not unique or a “main character” as per Compton’s terminology. Instead, it forms a distinct but still fairly uniform background on which the more distinct features are set. Similarly, there is a lot of content in games that exists to support world building. As noted by Bartle (2004) and later Warpefelt and Verhagen (2016), this is the role already played by one of the most prominent AI-powered features in games: NPCs, which play a large role in making the game world seem alive. If they are presented in coherence with other diegetic game elements, they become part of the game’s environment, which is used to enhance the gaming experience and provide narrative to the player, or what Fernández-Vara (2011) calls indexical storytelling. By simply existing in the game world, NPCs help build the narrative of the game.

If we apply this reasoning to other generated artifacts in the game world, we can create a body of less perceptually unique but still differentiated content that acts as a backdrop to larger, more unique events. This “thickens” the narrative experience of the player, essentially providing the scripted narrative described by Calleja (2009) through the mechanism of indexical storytelling described by Fernández-Vara (2011). This then influences and supports the player’s construction of their alterbiography (Calleja 2009). Described using the theory presented by Barthes and Duisit (1975), the content created using naive and low-fidelity approaches thus acts as a catalyst for the player’s interpretation of the narrative, and provides a set of narrative hints that help the player integrate various forms of key events, or what Barthes and Duisit (1975) call cardinal functions. Between these cardinal functions are gaps in the narrative that need to be filled by the player, what Barthes and Duisit call interstitial gaps. The filling of these gaps is performed using the mechanism of indexical storytelling (Fernández-Vara 2011), which the player uses as a catalyst (Barthes and Duisit 1975) to create their alterbiography (Calleja 2009).

As an example of how this type of narrative presentation has been done in practice, consider RimWorld (Ludeon Studios 2016). RimWorld is a survival game where the player is in charge of a small collection of colonists trying to survive on, and eventually escape from, a hostile alien planet. The narrative of the game is mostly driven by events, or what Mateas and Stern (2007) would call story beats, set against a background of longer periods of peaceful existence. These events (or beats) occur on multiple levels: a narrative director will by default throw events at the player to keep tension at an engaging but reasonable level, weather changes can be either a blessing or a curse for the player, and colonists may experience various emotional states (nervous breakdowns, falling in love, depression, contentment, or even death) that can cause events. The player’s colony may develop friendly or hostile relationships with neighboring tribes or colonies, and the lives of the colonists. Each of these events is only loosely connected to the others, but over time the player starts inferring connections between them and creating meaning from them, building an alterbiography of the colony as they experienced it.

In a sense, the approach to narrative used in games like RimWorld is very similar to how Murray (1997) describes bardic storytelling, i.e. as a set of formulas within formulas used in conjunction with variable content to create reasonably novel stories by essentially plugging in names and events to weave narratives. Some of these narratives grew to become very formulaic, as can be evidenced by the work of Propp (1968) on the decomposition of folk tales into simple grammars.

In games, naive and low-fidelity generated content takes on a role that helps build the atmosphere of the game. The mechanism that makes naive and low-fidelity generation work for this purpose is that it presents familiar patterns in familiar combinations, similar to bardic storytelling (Murray 1997). It should be noted, however, that naive and low-fidelity narrative generation is not necessarily simple random substitution, but instead a curated selection among different contextually appropriate options, again in line with how Murray (1997) describes bardic storytelling. Nor does a naive approach necessarily mean that the implementation is simplistic at the cost of all other aspects. Each event must be properly situated within a context that makes it legible to the player, and conducive to an engaging gaming experience. When we take a low-fidelity approach to generation, however, that context can be more or less rigorously defined, and using a low-fidelity presentation can help spark the player’s own creative process and make them fill out the gaps in the narrative by integrating it as a part of their alterbiography (Calleja 2009).
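The difference between random substitution and a curated selection among contextually appropriate options can be sketched as follows. This is one possible reading of the idea, under stated assumptions, and not a description of how any of the games discussed here are implemented: the Event type, the tag names, and the tell function are all hypothetical. Each event declares which tags must or must not already hold, and the generator only chooses among events that fit the story so far.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical story event with a very small notion of context."""
    text: str
    requires: frozenset = frozenset()  # tags that must already hold
    forbids: frozenset = frozenset()   # tags that must not hold
    adds: frozenset = frozenset()      # tags set once the event is told


def tags(*names):
    return frozenset(names)


# Invented for illustration; only the Torg sentences come from the chapter.
EVENTS = [
    Event("{hero} was orphaned at the age of 3.",
          forbids=tags("origin"), adds=tags("origin")),
    Event("{hero} fought the Dragon of {town}.",
          requires=tags("origin"), forbids=tags("dragon"), adds=tags("dragon")),
    Event("{hero} fled from the Dragon of {town}.",
          requires=tags("origin"), forbids=tags("dragon"),
          adds=tags("dragon", "disgraced")),
    Event("{hero} was crowned the ruler of {town}.",
          requires=tags("dragon"), forbids=tags("disgraced", "ended"),
          adds=tags("ended")),
    Event("{hero} left {town} in shame.",
          requires=tags("disgraced"), forbids=tags("ended"), adds=tags("ended")),
]


def tell(hero, town, rng, max_events=5):
    """Pick events at random, but only among those that fit the context."""
    state = set()
    story = []
    for _ in range(max_events):
        options = [e for e in EVENTS
                   if e.requires <= state and not (e.forbids & state)]
        if not options:
            break
        event = rng.choice(options)
        story.append(event.text.format(hero=hero, town=town))
        state |= event.adds
    return story


if __name__ == "__main__":
    print(" ".join(tell("Torg the Barbarian", "Helmsford", random.Random())))
```

The generator remains naive in the sense used in this chapter, since it has no plot model and no notion of character, but the tags keep it from crowning a hero who fled from the dragon.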

6 Conclusion

To summarize, current high-fidelity approaches to procedural narrative generation, like GPT-3 (Brown et al. 2020), are technically costly and difficult to use, and because of that they are likely to be less useful for game development (Johansson et al. 2012). Furthermore, developers who use these advanced techniques need to trade off control over their game design, and will have to invest extensive development time into making them integrate with the game, as described by Johansson et al. (2012).

A naive and low-fidelity approach to generating narratives may be more applicable in many cases. Using a naive generative approach allows for greater control over the content presented in the game. This approach may also make it less difficult to integrate the generator into the game design. Furthermore, a naive and low-fidelity approach may help build believability for generated narratives created using a more high-fidelity approach by providing more context for the more complex narrative, and acting as a “thickening agent” for the overall gaming experience. In the words of Barthes and Duisit, naive and low-fidelity generation creates integrative content that acts as catalytic (Barthes and Duisit 1975) and indexical markers (Fernández-Vara 2011; Warpefelt 2020), and lets the player infer the cardinal functions (Barthes and Duisit 1975) of the story. As described by Barthes and Duisit (1975) the interstitial gaps between cardinal functions can be infinitely filled with content. In the case of naive and low-fidelity narrative generation we simply provide less scaffolding and rely more heavily on the player’s catalytic filling process.

Although much research into generative content has been focused on complex tasks such as full game generation or the generation of highly complex narratives, there has been comparatively little focus on how to help developers create the vast areas of the game that need to be what Compton (2016) calls perceptually different, but not necessarily what she calls perceptually unique. These less diverse and unique parts still play a critical role in supporting the player experience, and the generation of them and their role in contributing to the narrative of the game is currently understudied, despite these elements commonly featuring in various forms. In a talk at the Game Developers Conference Abernathy and Rouse (2014) also make the point that players tend to more strongly recall the parts of the game experience where the narrative intersected with game play rather than the big story beats and game plots presented by the game. This lends further strength to the argument that focusing on the naive and low-fidelity generative parts may be a productive approach if we want to increase the relevance of academic research to industry practice.

Thus, this chapter makes the case for naive and low-fidelity generation of content. Note that this is not simple random substitution, and that a naive approach still needs the generated content to be contextually relevant. However, the required level of contextual relevance can be managed and contained by using a low-fidelity approach to generation, which engages the player’s own creative process.

Despite the relative proliferation of low-fidelity presentation of generative content and narrative in existing games, there has been comparatively little research into how that type of technology supports game design. Granted, naive approaches are likely not exciting from the perspective of pure technical research, but they may still be interesting from the perspective of game design research. Given the costs associated with developing games (Tschang 2007) and with integrating AI technologies (Johansson et al. 2012) it would be beneficial to investigate what constitutes the minimum viable level of generative content, and how that level changes depending on the fidelity of the game’s presentation.