1 Introduction

1.1 Overview

Since the advent of Mixed and Augmented Reality systems, conveying digital information for museums and historical sites has been one of their major application areas [6, 23], next to maintenance support systems [17]. With varying hardware and varying principal software components, several museum applications have been conceived in research projects over time [4, 9]. The connection of the technology with location-based games [3] and storytelling [11, 12] was recognized as an opportunity for learning and entertainment before the term ‘serious games’ was even coined. The existing examples range from research prototypes never used outside labs to evaluated systems. They differ mainly in the degree to which a concept is readily applicable. Moving from the labs towards usable systems, requirements change: technology must be invisible and functional, and story and interactive system must be fully integrated. AR systems, even those that are supposed to help people, often come with complex and unfamiliar interaction styles that novice users first have to learn. A meta-study of user experience research with AR systems has pointed out that a majority of evaluations have been conducted with technophilic male test subjects [2].

However, within the last three to five years, personal handheld devices with a reasonable screen size and a variety of sensors have become ubiquitous as a platform for gaming and entertainment, including mainstream concepts for museums and tourism.

In this paper, we report on the project SPIRIT, in which a system and a case study were developed as a running prototype that was finally evaluated with end users, namely visitors of an outdoor museum. In order to develop a full experience, an interdisciplinary team tackled several aspects within the project:

  • The SPIRIT player app: An Augmented Reality system, running on off-the-shelf mobile phone and tablet hardware (Android), sensing the environment with GPS, camera, and gyroscope/accelerometer as well as Bluetooth signals from beacons, and using image processing for recognising locations in the unprepared environment

  • STARML: A formalized content structure as XML dialect to be authored by storytellers or game designers, extending the existing location-based standard ARML [15] with elements for plot- and GUI management, thus enabling interactive storytelling and game design for creators

  • A plot engine reading the content structure, managing the presentation in time and in relation to contexts and variables

  • Concepts for user interaction, resulting in several interaction patterns that make use of the wide variety of available sensors and provide guidance and learnability for users, tested in formative evaluation cycles

  • A case study with an example fictional scenario story (“Spirit: Aurelia, Saalburg AD 233”) that fulfils requirements of being adequate to visited locations and connected to historical facts, running with one interaction pattern fully elaborated with a tested GUI.

In summary, the result combines several technical developments with practical solutions to design challenges. The achievements consist of a playable case study prototype with location-dependent content that can be experienced at a specific historical outdoor site (the Saalburg Roman Fort [20], see Fig. 1), and of a functional extension concept allowing authors and future developers to create similar experiences. In the following, the paper is organized along the aspects addressed, which are finally integrated in the holistic user experience. We summarize each of the individual solutions, explaining their rationale and their relation to the state of the art. In each section, we also discuss the constraints the system places on scenario designs, and we show possibilities for extending the system in the context of serious games and interactive storytelling.

Fig. 1.
User holding up a tablet, having tracked down spirits of Roman soldiers (of the year AD 233) at the present-day “Principia” building, who discuss the recent attacks of the Germanic tribes. (Videos including artist credits are available at https://www.youtube.com/user/hsrmstory.)

2 Design Constraints Set by the Augmented Reality System

2.1 Overview

First, we describe the AR system we developed, in order to later discuss its influence on content design. Grounded in the original idea for the applied research project, the whole experience and technology is designed around the metaphor of ‘meeting spirits of the past’ at a location where the referenced historic characters lived their lives. Hence, the system as well as the premise for creating content is first and foremost location-based, with the claim to present matching media triggered by users’ orientation in the environment.

This was the motivation to base the structure on principles defined by the Augmented Reality Markup Language ARML of the Open Geospatial Consortium [15], which is also used by other outdoor spatial AR applications, such as Wikitude [24]. The technical concepts of the AR player application have been described in previous papers [5], so only a brief summary is given here. ARML describes visual assets that overlay features as representations of the real world. The SPIRIT player extends the ARML-defined asset media types by the use of video. In particular, semi-transparent video with a cutout effect (see Fig. 1) is obtained by rendering the video stream as a SurfaceTexture using the MediaPlayer Android API and modifying it using fragment shaders. The videos need to be pre-produced using a green/blue-screen technique with chroma-keying and some post-production, to create the ghost-like illusion of ‘floating in thin air’. Further, ARML specifies an anchor for triggering augmentation of a real-world feature, distinguishing between a location anchor and a trackable anchor. GPS data and (for indoor environments) beacon signals are interpreted to recognize users’ locations.
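The cutout effect described above can be illustrated with a small CPU-side sketch. This is not the project's shader code: the class name, the green key colour, the two thresholds and the global "ghost" opacity are all assumptions made for illustration. It only shows the kind of per-pixel decision a chroma-key fragment shader makes.

```java
/** Hypothetical CPU-side illustration of chroma keying; not the SPIRIT shader code. */
final class ChromaKeySketch {

    /**
     * Returns an opacity in [0, 1] for an RGB colour (components in [0, 1]) keyed against green.
     * low and high are assumed thresholds on how strongly green dominates the other channels.
     */
    static double alpha(double r, double g, double b, double low, double high) {
        double dominance = g - Math.max(r, b);
        if (dominance <= low) {
            return 1.0; // foreground (the actor): fully opaque
        }
        if (dominance >= high) {
            return 0.0; // key colour (the screen): fully transparent
        }
        return 1.0 - (dominance - low) / (high - low); // soft edge in between
    }

    /** Applies the key plus a global "ghost" opacity to one packed ARGB pixel. */
    static int keyPixel(int argb, double ghostOpacity) {
        double r = ((argb >> 16) & 0xFF) / 255.0;
        double g = ((argb >> 8) & 0xFF) / 255.0;
        double b = (argb & 0xFF) / 255.0;
        // Thresholds 0.1 and 0.4 are placeholder values, not taken from the paper.
        double a = alpha(r, g, b, 0.1, 0.4) * ghostOpacity;
        int a8 = (int) Math.round(a * 255.0);
        return (a8 << 24) | (argb & 0x00FFFFFF);
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", keyPixel(0xFF00FF00, 0.8)); // pure green-screen pixel
        System.out.printf("%08X%n", keyPixel(0xFFC08060, 0.8)); // skin-like foreground pixel
    }
}
```

A global opacity below 1.0 keeps even the foreground slightly translucent, which is one plausible way to support the ghost-like impression on top of the camera image.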

Further, a video-based tracker has been implemented using OpenCV and the ORB algorithm [19] for matching camera input with stored reference images. The reference images can be a series of photographs of a real location, taken under different lighting conditions. Gyroscope and tilt/inclination sensors can also be used to trigger media. Consequently, mobile devices to be used for the application (phone or tablet) need to be equipped with a rear/main camera, GPS, gyroscope, tilt/inclination sensors, and the graphical capability of displaying AR video. Bluetooth/BLE is also useful for indoor navigation. For our prototype, we use Android with Java for direct hardware access to the above components, as well as OpenGL and the OpenCV library for image detection.
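To make the matching step more concrete, the following plain-Java sketch compares binary descriptors by Hamming distance, which is how ORB-style descriptors are commonly matched. It deliberately does not use the OpenCV API; the class name, the descriptor layout as `long[]`, and the thresholds `maxDistance` and `minMatches` are assumptions for illustration. Selecting among several reference sets mirrors the idea of storing photographs of one location under different lighting conditions.

```java
/** Hypothetical illustration of matching binary (ORB-style) descriptors; not the OpenCV API. */
final class OrbStyleMatchSketch {

    /** Number of differing bits between two equally long binary descriptors. */
    static int hamming(long[] a, long[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Long.bitCount(a[i] ^ b[i]);
        }
        return distance;
    }

    /** Counts camera-frame descriptors whose nearest reference descriptor is close enough. */
    static int goodMatches(long[][] frame, long[][] reference, int maxDistance) {
        int count = 0;
        for (long[] f : frame) {
            int best = Integer.MAX_VALUE;
            for (long[] r : reference) {
                best = Math.min(best, hamming(f, r));
            }
            if (best <= maxDistance) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns the index of the reference image with the most good matches,
     * or -1 if no reference image reaches minMatches.
     */
    static int bestReference(long[][] frame, long[][][] references,
                             int maxDistance, int minMatches) {
        int bestIndex = -1;
        int bestCount = minMatches - 1;
        for (int i = 0; i < references.length; i++) {
            int count = goodMatches(frame, references[i], maxDistance);
            if (count > bestCount) {
                bestCount = count;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

In a real pipeline the descriptors would come from a feature detector, and a robust matcher would typically add a ratio or cross-check test; the sketch omits both for brevity.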

In relation to the state of the art in Augmented Reality systems, our system does not address 6DOF tracking and registration of 3D objects, which is an emerging technology but still a challenge in unprepared environments [1, 10]. However, compared with the 2D concepts of, for example, Wikitude, the SPIRIT player includes reasonable advancements, integrating sensors such as the gyroscope for more complex spatial interactions, and providing the interface to the STARML markup language and the plot engine described in the next section.

2.2 Discussion

The system has proven able to recognize locations, images and orientations quickly and effectively in the outdoor environment, even under differing lighting conditions. On the other hand, there is a certain level of imprecision in the exact superimposition of media assets. Fortunately, for displaying ‘spirits’ floating in thin air, this is largely negligible. Thus, the base technology constrains creators to think mainly of communication scenarios with virtual characters that are displayed from one side without a 3D impression.

At this point the system is imperfect, yet “satisficing” (see Footnote 1). Asking what added value AR could contribute to the user experience in a (partly) educational setting, we found that it is exactly the connection to reality that marks the difference from more conventional media. Two main benefits, especially for museums, can be associated with this property: (a) AR displays explanations and illustrations directly on physical objects, letting users see correct relationships between explanation and object parts; and (b) AR facilitates the potential experience of an ‘aura’ of ‘the real’, encouraging people to visit historical places [12].

The SPIRIT system clearly caters for the latter level of amazement in order to be rewarding, entertaining and motivating, while precise superimposition of visual features is less important. However, designers of games and location-based storytelling need to work and ‘write’ around this constraint. This means that exact positions in the outdoor environment are hard to reference within a game or story, as the system is best suited for communicating dialogic scenes with ‘spirits’ integrated visually into a camera background.

3 Formalized Content Structure and Plot Engine

3.1 Overview

Based on this technology including the ARML standard, we developed an extended formalized content structure as well as a plot engine that manages the different sensorial contexts (i.e., interpreted user behavior) and provides pre-authored storytelling content appropriate to the situation.

We extended the OGC standard ARML and developed it further into so-called STARML (Storytelling Augmented Reality Markup Language), which we described in a preliminary version in [7]. Basically, this includes author-friendly terminology to support the description of a more complex stage for action. For example, the declaration of GPS coordinates is integrated in tags to define “active areas” (see Fig. 2). Within an active area, file paths to reference images, such as photos of the environment, can be declared to define so-called “backdrops”, in front of which pre-recorded video storytelling can be staged (compare Fig. 2). In order to organize scenes with storytelling and/or game progress, several “chunks” of video pieces can be declared that can be concatenated by the plot engine into a linear presentation, yet interruptible between single video files. Because authors can attach additional preconditions to individual “chunk” elements, authoring of nonlinear content presentations is possible. For example, as illustrated in Fig. 2, authors can distribute several groups of spirits around one user location. These could also have a fictional conversation with each other in one scene, provided that the user turns around, step by step, to visualize them. This is intended to be managed by interrupting and resuming chunks of videos. Layered parallel to videos, sound files can be declared (for example to support continuous ambient sound independent of the user’s triggering).
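As an illustration of how a declared GPS coordinate could delimit an “active area”, the sketch below tests whether a user position lies within a radius of the declared point using the haversine formula. The paper does not state how the player delimits an area, so the circular shape, the radius, the class and field names, and the example coordinates are all assumptions.

```java
/** Hypothetical model of an "active area" as a circle around a declared GPS coordinate. */
final class ActiveAreaSketch {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius

    final String id;
    final double lat;     // degrees
    final double lon;     // degrees
    final double radiusM; // assumed trigger radius in metres

    ActiveAreaSketch(String id, double lat, double lon, double radiusM) {
        this.id = id;
        this.lat = lat;
        this.lon = lon;
        this.radiusM = radiusM;
    }

    /** Great-circle distance between two WGS84 coordinates (haversine formula). */
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    /** True if the user position lies inside this area. */
    boolean contains(double userLat, double userLon) {
        return distanceMeters(lat, lon, userLat, userLon) <= radiusM;
    }

    public static void main(String[] args) {
        // Placeholder coordinates, not the real position of any Saalburg location.
        ActiveAreaSketch gate = new ActiveAreaSketch("eastern_gate", 50.0, 8.0, 25.0);
        System.out.println(gate.contains(50.0001, 8.0));
        System.out.println(gate.contains(50.001, 8.0));
    }
}
```

Since consumer GPS is only accurate to a few metres, a generous radius fits the caption's remark that spirits are staged roughly in front of a backdrop rather than at the coordinate itself.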

Fig. 2.
Location-dependent content structure of an “active area” including a declared GPS coordinate (user/tablet position), a prepared “backdrop” image to start a scene (here in front of the eastern gate), as well as a gyroscope condition triggering further video files when a 90° turn to the right is recognized by the tablet. Note that spirits are not staged ‘at’ the GPS coordinate, rather ‘roughly’ in front of a backdrop or in a viewing direction.

STARML has been extended to also include descriptions of more interactions. For example, authors can declare content working only on the condition that the gyroscope signals a turn to one side by the user. Another interaction possibility that can be declared is a timer that supports the design of a “timed-gaze” input selection or similarly, timed proximity (e.g. proximity to a beacon placed in the environment).
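The timer-based interactions mentioned above (timed gaze, timed proximity) share one mechanism: a condition must hold without interruption for a given duration before it counts as a selection. A minimal sketch of such a dwell timer follows; the class name, the millisecond time base and the fire-once behaviour are assumptions, not details taken from STARML.

```java
/** Hypothetical dwell timer for "timed-gaze" or "timed proximity" style input. */
final class DwellTimerSketch {
    private final long dwellMs;
    private long heldSinceMs = -1; // -1 means the condition is currently not held
    private boolean fired = false;

    DwellTimerSketch(long dwellMs) {
        this.dwellMs = dwellMs;
    }

    /**
     * Feed the current state of the condition (e.g. "backdrop in view" or "beacon nearby").
     * Returns true exactly once per uninterrupted hold, when the dwell time has elapsed.
     */
    boolean update(boolean conditionHolds, long nowMs) {
        if (!conditionHolds) {
            heldSinceMs = -1; // any interruption resets the timer
            fired = false;
            return false;
        }
        if (heldSinceMs < 0) {
            heldSinceMs = nowMs; // hold starts now
        }
        if (!fired && nowMs - heldSinceMs >= dwellMs) {
            fired = true;
            return true;
        }
        return false;
    }
}
```

A caller would invoke `update` on every sensor or frame callback and treat a `true` result as an event to hand to the plot logic.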

Lastly, STARML can also be used to declare dynamic graphical user interface (GUI) elements, such as buttons, dialog boxes, graphics, and subtitles. ‘Dynamic’ here refers to the feature that GUI elements can optionally be handled as plot-dependent content elements, similar to videos, except that they have screen coordinates (while videos move according to the device movement, staying adjusted to the camera image). Other dynamic elements are plot-dependent text notes that can be read as one goes or in between, such as for including fact information. This could also be feedback for users on their achievements, such as an inventory of spirits met, and a tour history.

The plot engine interprets all STARML elements, checks preconditions after each event and determines the next possible action or event to be played. Implemented like a state machine, it processes propositional strings as pre- and post-conditions to advance the plot. These concepts have been inspired by narrative structures used in previous projects of the interactive storytelling community [13, 16, 22], and adapted to be specific for location-based Mixed Reality using implicit sensors next to deliberate choices of players.
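The precondition/postcondition mechanism can be sketched as follows. This is a simplified illustration under stated assumptions, not the SPIRIT plot engine: the class and method names, the proposition strings such as `backdrop:eastern_gate`, and the rule "play the first unplayed chunk whose preconditions all hold" are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical plot engine sketch: chunks guarded by propositional pre- and postconditions. */
final class PlotEngineSketch {

    /** One playable content element with the propositions it requires and asserts. */
    static final class Chunk {
        final String id;
        final Set<String> pre;
        final Set<String> post;

        Chunk(String id, Set<String> pre, Set<String> post) {
            this.id = id;
            this.pre = pre;
            this.post = post;
        }
    }

    private final List<Chunk> chunks = new ArrayList<>();
    private final Set<String> state = new HashSet<>();  // propositions currently true
    private final Set<String> played = new HashSet<>(); // ids of chunks already played

    void add(Chunk chunk) {
        chunks.add(chunk);
    }

    /** Records a sensor or user event as a proposition, e.g. "turn:right". */
    void onEvent(String proposition) {
        state.add(proposition);
    }

    /** Plays the first unplayed chunk whose preconditions all hold, if there is one. */
    Optional<String> playNext() {
        for (Chunk c : chunks) {
            if (!played.contains(c.id) && state.containsAll(c.pre)) {
                played.add(c.id);
                state.addAll(c.post); // postconditions advance the plot state
                return Optional.of(c.id);
            }
        }
        return Optional.empty();
    }
}
```

For instance, a chunk requiring both `played:gate_intro` and `turn:right` would stay blocked until the introductory chunk has played and a turn event has been reported, which corresponds to the linear-but-interruptible presentation described for video chunks.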

3.2 Discussion

STARML as a content structure enables the separation of player app and content, and thus allows the system to be extended by authoring alternative content. This XML dialect had to be developed in close connection with the case studies of content ideas. It was also changed constantly during iterations of user interaction design cycles and formative evaluations, as these influenced user requirements concerning GUI elements and their possible temporal behavior. We therefore expect that further ideas for redesign or content adaptations are likely to require STARML extensions again, as the consequences of authoring ideas are hardly foreseeable before they are tested with users. For the same reason, many aspects of the case study, such as the STARML content structure, the interaction design and the concrete story with its dialogs, were developed at the same time, or rather in iterations. As a matter of course, this also meant that the plot engine was under constant change. In the interdisciplinary team, consisting of software developers, a creative writer, and interaction designers, this was partly also a source of friction, due to chicken-and-egg situations. For further extensions of the system at the content structure level, these mutual dependencies have to be taken into account.

In the case study, a creative author was involved to write compelling dialogs and an interesting story about deals, corruption and the life of families during the Roman occupation of Germanic territory (around AD 230). Visitors of the Saalburg stroll about the area and encounter parts of the story, depending on the locations they find. The STARML possibilities of authoring non-linear dialogs have not (yet) been fully exploited by the authoring team. Besides conceptual challenges for ‘making’ the story, this would also have posed more difficulties for end-user interaction, which will be described next. For the sake of getting the case study up and running with non-professional users, keeping it simple (thus linear) was important in the first instance. However, this challenge will be tackled again in follow-up work that is currently starting.

4 User Interaction Patterns

4.1 Overview

One of the first visions of this project was to fully realize the metaphor of just meeting spirits of history, possibly with no profane GUI elements reminiscent of an operating system, but directly using the sensors and camera of the mobile device. However, without any markers in the environment, users had no clue where to point their device or where to go. Increasingly over the course of design and development, GUI elements were added together with several ideas to simplify or constrain the huge variety of possibilities.

Interaction design was done in many iterative cycles of designing and testing with users. In order to test preliminary ideas before the AR system was fully implemented, we developed a mockup tool (MockAR [8]) that allowed the design technique of wireframing with AR overlays. Further, previous prototypes approached the interactive story experience by using graphical stand-ins and voices of the authors, before the final bluescreen shooting (which involved most of the costs) could happen. As far as possible, tests were done in the office before traveling to the location. In this case, excluding the aspects of GPS and search, we mounted posters with backdrop photographs on office wall corners for experiencing alternative solutions of content and GUI design. Getting closer to the current solution, 20 subsequent test cycles with fresh users have shown that the pattern of interaction is still not easy to learn if nobody demonstrates it. However, it is easy to remember. As a solution, a tutorial has been designed that can take the role of demonstration.

The result is an interaction pattern that has to be learned at the first location, and can be repeated at all further locations. The following user actions are part of the pattern, repeating a certain order of actions.

  • Searching for a next location, with the help of the map or with the help of a stencil, which represents a memory image shown by the spirit Aurelia in the Saalburg story. User tests have shown that both ways of searching were preferred by different people, and both were found to be entertaining.

  • Aligning the stencil, which corresponds to finding a “backdrop”, triggers the first video available in an active area (compare Figs. 2 and 3 left), thus starting a scene.

  • Eventually, arrows appear at the side of the screen, as an affordance for the user to make a quarter turn in the indicated direction. This causes the next part of the dialog to be triggered. In user tests, this was the hardest part for most users to adopt, because the turning movement needs to have a certain dynamic to register with the gyro sensor. Once learned, it can be repeated easily.

  • This can repeat for a while, depending upon the authored content structure at one location.

  • In the particular Saalburg story, after all content elements of one active area are played out, the main spirit Aurelia turns to the user and shows a memory image of the next location (“active area”).

  • Then the user can walk and search for the next location and repeat, until the end of the story.

  • In addition, it is possible to access the menu at the bottom of the screen at any time. It provides access to historical facts suited to the current fictional scene (see Fig. 3, right). Further, users can check the spirits they have met so far and see their current progress through the locations already visited. Finally, the menu contains a button to access the map.
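The quarter-turn step of the pattern above can be sketched as integrating the gyroscope's yaw rate until roughly 90° is reached, while ignoring movements that are too slow. This is one possible reading of the "certain dynamic" users had to learn, not the project's implementation: the class name, the target angle, the minimum rate and the reset rule are assumptions. The sign of the result only reflects the sign of the accumulated rotation; which sign means "right" depends on the device's axis convention.

```java
/** Hypothetical quarter-turn detector that integrates gyroscope yaw rate over time. */
final class QuarterTurnDetectorSketch {
    private final double targetRad;      // angle that counts as a quarter turn, e.g. about 80 degrees
    private final double minRateRadPerS; // slower movement is ignored and resets the turn
    private double accumulatedRad = 0.0;
    private long lastTimestampNs = -1;

    QuarterTurnDetectorSketch(double targetRad, double minRateRadPerS) {
        this.targetRad = targetRad;
        this.minRateRadPerS = minRateRadPerS;
    }

    /**
     * Feed one gyroscope sample (rotation rate around the vertical axis in rad/s).
     * Returns +1 or -1 when a quarter turn in the positive or negative direction
     * has been completed, and 0 otherwise.
     */
    int onSample(double yawRateRadPerS, long timestampNs) {
        if (lastTimestampNs < 0) {
            lastTimestampNs = timestampNs; // first sample only sets the time base
            return 0;
        }
        double dtSeconds = (timestampNs - lastTimestampNs) * 1e-9;
        lastTimestampNs = timestampNs;

        if (Math.abs(yawRateRadPerS) < minRateRadPerS) {
            accumulatedRad = 0.0; // too slow: treat as drift or hesitation
            return 0;
        }
        accumulatedRad += yawRateRadPerS * dtSeconds;
        if (Math.abs(accumulatedRad) >= targetRad) {
            int direction = accumulatedRad > 0 ? 1 : -1;
            accumulatedRad = 0.0;
            return direction;
        }
        return 0;
    }
}
```

Using a target slightly below 90° and resetting on slow movement trades tolerance against false triggers; the actual values would have to be tuned in user tests like those described above.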

Fig. 3.
Left: Search screen with memory stencil overlay fitting the eastern gate backdrop. Middle: Triggering of a scene, including a subtitle and an update notification of “new facts” (at the bottom) associated with the scene. Right: Interaction with the facts menu in reading posture.

The project could hardly make use of existing design principles for mobile Augmented Reality, except for generic principles like those of affordance, feedback and constraints [14]. For this particular application area, there is little reference material, as many applications include AR only as a very brief interaction rather than as a medium for directly experiencing stories. Concepts for games with a larger proportion of AR interaction have specific interactions tuned to the particular game design [18], which is similarly the case for the SPIRIT project.

4.2 Discussion

Interaction design is closely related to the way of interactive storytelling, as it defines possible user participation or influence on the story. The previously described system of sensors including STARML enables a plethora of options for how this could be designed. One option would be to convey all information, including how to use the system, through the dialogs with the spirits encountered, that is, within ‘the storytelling’ (see Footnote 2). However, such a decision constrains the storytelling to entertaining direct dialogs with users, and also diminishes the believability of a historical character, such as the spirit of a Roman soldier.

Design decisions were made after again revisiting the question of the added value of Augmented Reality in the interactive educational and entertainment application. We expected AR to facilitate a potential experience of an ‘aura’ of ‘the real’. We made users turn around on the spot to look in different directions, also so that they could discover spatial relationships. For example, standing in front of the eastern gate and turning to the right lets us look towards the “Limes”, the frontier built by the Romans towards the Germanic territory. These aspects can be stressed in the fictional story by letting the spirits look in this direction for the arrival of their scouts (compare Fig. 2). They can also be addressed by text in the ‘Facts’ menu. In this case, the interaction design is especially effective if the story content is written in direct accordance with it.

Further, we found that not only the technology, but also the design of characters including their acting could have an effect on the experience of presence of the spirits in ‘reality’. This is a reason why we separated fictional play of the story of the spirits from factual information, to let the spirits stay ‘in character’. Facts can be consulted at the users’ convenience, or not at all, as they wish. In our formative evaluation cycles during design, test users differed in their tendency to use the “Facts” button, which for us underlined the necessity to design this feature as ‘optional’.

Within the project, we have also explored other interaction patterns through prototypes, which lend themselves to different kinds of content genres, such as adventure games. For example, we cached beacons at two different locations to identify fictional user/player friendship with either Romans or Germanics, and we let authors set a timer for reaching a certain goal in physical space. In principle, these kinds of interactions could be fleshed out with an extension of the system, and a new framing story matching this kind of experience would then need to be written.

5 Storytelling

5.1 Overview

The SPIRIT system has been conceived as a location-based Augmented Reality system for interactive storytelling, which is the reason for the particular design of many of its components. So far, we have one playable prototype running with one location-based story connecting to the Roman occupation of northern Europe, to be experienced at a particular place, the Saalburg Roman fort.

The story has been designed in a laborious collaboration of a writer and designers with the rest of the technical team. The more the characters, dialogs and actions of the story take up knowledge of the specific location and connect to historical facts, the stronger we expect the experience of real presence at the place through technology to be. However, this location-dependency can also be a drawback, as one has to go to the trouble of traveling there for the experience. Nevertheless, the storytelling part in particular is easily extensible without much technical development, given that the current interaction pattern remains unchanged.

This has been demonstrated in small authoring evaluations with student groups in Media Management, who did not have programming experience. By just replacing GUI elements and authored video content, and adopting the exact interaction scheme of searching, watching and turning around, one group built an entertaining welcome application for the foyer of a big company, to be used in the recruitment of young adults as employees. However, more student groups pursued history themes around the city of Wiesbaden, as these lend themselves better to meeting ‘spirits of the past’ that guide us with their memories. In summary, although many stories can be told, it is advisable to question the suitability of the medium for each story, as it is fascinating, but also complicated.

5.2 Discussion

A challenge that still needs more exploration is making use of the existing content structure for ‘interactive’ or non-linear interactions in storytelling. First, there is a barrier to be crossed by creative story authors to fully embrace interactivity. Second, in our tests with one prototypical nonlinear scene, we found that end-users had many more difficulties in using it than with a strictly linear scene. Similar to the situation depicted in Fig. 2, we created a distributed dialog scene with two groups of spirits facing each other from different positions. Users could freely look back and forth to either group of spirits and listen to them, as they wished. Their turn would trigger the spirits’ dialog to start, instead of the system prompting the user where to look. Unfortunately, this kind of frequent non-linearity is compounded by the difficulty of moving the device with the right momentum, so that the sensors respond. Here more exploration is needed in all aspects, including more designed feedback for the user through an interface.

Other, more global non-linearity, for example based on an undefined order of active areas to visit, is not so much a technical problem with the system as a potential practical one. At first sight, it seems similar to known adventure game styles. However, our evaluation of the case study with 107 museum visitors revealed other aspects concerning the museum situation that need to be considered. A large group of subjects complained about having to walk one segment twice, although from the story’s logic it made absolute sense, for example to return to the eastern gate. Visitors rather expected a guided ‘tour’ with ever-new sights. Because the app is location-based, the effort of walking may outweigh the ideal of having free choices. Some visitors expected more factual information about buildings rather than fictional scenes. The latter, including love, personal dreams and corruption, had the goal of stimulating imagination of what one’s life could have been like at this place. As mentioned in Sect. 4.2, we decided to provide a separate Facts menu that users are free to consult or ignore. Our preliminary tests (with subjects from close to our lab) supported the hypothesis that some users would not want to hear talk about facts. The picture was different in the on-site evaluation, where the subjects were museum visitors who had deliberately come to the site before they knew of the app.

6 Conclusion

With the SPIRIT project, we developed a system for a specialized form of location-based interactive storytelling or gaming with Augmented Reality. At several levels, the system is optimized for this way of connecting a story or game to a place, while other, more general features of Augmented Reality are not supported as much. The concept separates the story content from its structure, which is tuned to the running app and interaction design. In addition to AR video content, interactive elements such as GUI items can also be managed by the plot engine. Under the umbrella of creating specialized AR experiences with the goal of creating a feeling of local presence, this system is extensible. Non-programming authors are able to create content easily, as long as existing interaction patterns are adopted. Otherwise, with more development effort, further interaction patterns could be designed.

The case study of content production and user evaluation points out that research is still to be done on the suitability of developed content, its structure and technology, and on the matching of target groups and situations. The study showed how strongly all aspects, including those beyond system design, depend on one another for the experience to succeed. Although the described system is extensible, there are certain limitations inherent in the subject matter of location-based Augmented Reality. There is the potential that struggling with unfamiliar interaction patterns distracts somewhat from feeling the presence of spirits in the environment, and even from remembering the content after the experience.