
1 Background

The theater director Peter Brook has argued that all that is needed to make theater is for a person to walk across any empty space while someone else watches [1]. Of course, most theatrical productions involve far more than this, but Brook suggests that these are the minimum requirements. Three key elements within this definition will be important to consider as we discuss how augmented reality (AR) can be used in theatrical performance. Brook’s definition is framed in the present tense, suggesting an immediacy that distinguishes theater from film—its closely related storytelling cousin. When someone watches a film, they are watching something that has already happened. Film is fixed and can no longer respond to outside input.

On the other hand, when someone watches theater, they are watching something that is happening now. However subtly, the act of an audience member bearing witness changes the performance of the actor on stage. Therefore, while virtual reality technology could enable someone to experience the opening night of Hamilton, the hip-hop Broadway musical that has captured international attention, the experience would be more akin to film than to theater, since the performance has already happened.

This immediacy creates a sense of community between the performers and the people watching. This community is yet another feature of theater that separates it from film and is perhaps part of what has made theater an enduring art form, despite the popularity of television and movies. Although VR can provide someone with the ability to experience a performance that is happening live, it may still be difficult for the technology to create the communal nature of theater when everyone is sealed off from one another inside their headsets. AR, however, offers a broader range of possibilities, and as such may be a better method for innovating theater spaces in the twenty-first century.

Brook states that someone walking across a space can be theater. Thus, any space where someone is doing something can become a theater space. There is a long history of researchers exploring the combination of theater and other live performance media with augmented reality. Some of this research has involved augmenting the space itself, while other work has augmented the human action or interaction. Researchers have developed virtual sets [2], platforms for dance and theater events [3, 4], and the ability to combine actors and robots [5]. Dorsey et al. were among the first to present research in this subject area [6], designing virtual sets for opera stages with computer-generated projected environments. Sparacino et al. presented an augmented reality platform for dance and theater [4, 7]. This work used body and gesture tracking in conjunction with virtual actors and human actors in performance events. Other types of augmented reality in live performance events include dancing events [8], body painting [9], and games for children with learning difficulties [10]. Jessop described approaches for mapping user gestures for performances [11, 12], providing a framework for developing performance expression recognition systems in live performance, interactive installations, and video games. Jessop’s work outlines the tools, technologies, challenges, and opportunities for utilizing gestures in interactive mediums. Benford et al. provided a survey of many augmented performance events in their work, Performance-Led Research in the Wild [3]. The work describes how practice, studies, and theory are all interleaved into interactive public exhibitions and how challenges, such as balancing artistic and research interests, can push up against institutional norms.

Much of this work has explored two different aspects of augmenting live performance events. One aspect of augmentation that has been explored is the use of virtual actors. For example, Mavridis developed an augmented reality technology that enabled combinations of pseudo-3D projections and humanoid robots to create a mixed reality theater piece [5]. This work is similar to the presented work in that it aimed to empower the actor, but it focused mainly on human and virtual character interaction rather than on the user interface. Another aspect of augmenting live performance is the projection of virtual backdrops and sets. For example, Jacquemin et al. described an implementation of interactive projective sets [2]. More recently, Marner et al. presented Half Real, which demonstrated a theater production that featured projected environments, actor tracking, and audience interactivity [13]. Lee et al. have also demonstrated projection-based augmented reality for dynamic objects in live performance events [14]. The presented work’s motivation is similar to these works in the augmentation of stage performance, building upon them to enable new modes of stage-based interaction.

The onstage hologram concerts by Hatsune Miku provide a unique experience for fans of the virtual Vocaloid singer. Developed in 2007 by Crypton Future Media in Japan, Hatsune Miku is part digital 3D avatar and part music production software. Miku is capable of singing any lyrics a user provides. Despite being an entirely digital persona, both in vocal talent and appearance, she can perform live at concerts through the use of onstage holographic projections [15]. This blending of fictional personas and real-life spaces has been seen in other musicians’ live performances such as the Gorillaz, who frequently use projection mapping and onstage holograms to allow the fictional band members to appear before their fans [16]. These touring acts demonstrate the viability of fictitious personas appearing live before audiences using augmented reality technologies, which pushes against the boundaries of what constitutes live theater.

The IETM, the International Network for Contemporary Performing Arts, comprises more than 500 performing arts organizations from around the world and represents individuals who create theater, dance, circus, interdisciplinary live art forms, and new media. At their 2016 Spring Plenary Meeting in Amsterdam, keynote speaker Joris Weijdom presented on mixed reality and the future of theater [17]. At the close of his remarks, he quoted Shakespeare and reminded his audience that “all the world’s a stage, and all the men and women merely players.” In doing so, Weijdom challenged theater makers worldwide to think not only of ways that technology could augment and enhance a performance on a traditional stage but also how technology might enable theater to be made by anyone, anywhere.

2 Augmented Reality in Theatrical Performance

The following two examples of uses of augmented reality (AR) in live theatrical performance demonstrate the two versions of Weijdom’s challenge for the future of theater making. In the ALICE Project, AR technology allowed all of the production’s technical elements to be controlled by the performer, resulting in precise synchronization of technology and human movement. Next, What the Moon Saw utilized AR to facilitate a theatrical production in a nontraditional performance space and create opportunities for audience interaction and agency. These two projects, viewed together, show the breadth of technical possibilities of integrating AR into theater. At one end of the spectrum, What the Moon Saw presented an augmented performance environment that was so intuitive that young audience members rarely needed any help navigating the world of the play. At the other end, the ALICE Project relied on a highly trained and well-rehearsed performer to safely and effectively execute the visual spectacle of Lewis Carroll’s Wonderland in front of an audience.

2.1 ALICE Project

The technology behind most live theatrical performance events has been standardized into multiple entertainment control systems. Traditional performance practice dictates that each of these systems has an operator who runs it in conjunction with, though independently of, the onstage performer. The performer’s movement either influences the operation of these systems or is driven by their output. For example, a spotlight operator follows the movement of a dancer, while an actor adjusts their speaking tempo to sync with a recorded video. The philosophy of our Augmented Live Interactively Controlled Environment (ALICE) Project is to place the control of these systems with the performer. This dynamic shift was developed and tested through multiple performances in the spring of 2014. Support for this research was provided by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison with funding from the Wisconsin Alumni Research Foundation (WARF).

The ALICE Project is a multidisciplinary interactive production methodology that melds traditional theatrical production disciplines with emerging technologies. The ALICE Project enables the performer (i.e., actor, dancer, musician, etc.) to simultaneously interact with and control multiple aspects of a dynamic stage environment. By integrating video projection, motion control, motion capture, a video game engine, and virtual reality technologies together, the project enables new possibilities in live performance that enhance the experiences of both the performer and the audience.

2.1.1 Technical Components

The ALICE system comprised several interconnected, automated control systems that interacted with the actor and responded to the motion of their body. The system consisted of three main elements: motion sensors, a video game engine, and a stage motion control system. The actor’s joint locations were continuously monitored on the stage using multiple Microsoft Kinect v2 sensors, which transmitted these joint locations as data to both the video game engine and the stage motion control system. The research team selected Unity as the video game engine to provide the interactive content environment because of its physics modeling and object interaction capabilities. The computer running the custom Unity world drove a multi-projector system. Two projectors provided a rear-projected display behind the performers, thus preventing the performers from occluding the projection. Two ceiling-mounted stage projectors displayed images on the stage floor around the performer, allowing the virtual world to spill out onto the stage (Fig. 18.1).

Fig. 18.1 Projector placement

The video display responded to how the actor moved within different operational zones of the stage motion control system. One such zone was the treadmill. As the actor walked or ran on the treadmill, the Kinect sensors provided the positional data necessary to keep them at the center of the zone, regardless of pace, while the Unity world responded accordingly. Thus, when the actor walked, the environment changed slowly; when the actor ran, the environment changed quickly. Augmenting the stage space in this way created the sense of moving through vast environments.
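
A minimal sketch of the control idea behind this zone, written in Python with assumed gains and limits: belt speed ramps to keep the tracked actor near the center of the treadmill, and the virtual world scrolls at whatever speed the belt is running. This is illustrative only; the production logic ran on the stage automation system rather than in application code like this.

```python
# Illustrative sketch (not the production automation code): a simple
# controller keeps the tracked actor centered on the treadmill, and the
# virtual world scrolls at the resulting belt speed.

MAX_BELT_SPEED = 3.0      # m/s, assumed safety limit
CENTER_GAIN = 1.5         # controller gain (assumed tuning value)

def belt_speed(actor_z: float, zone_center_z: float, current_speed: float,
               dt: float) -> float:
    """Adjust belt speed so the actor drifts back toward the zone center.

    actor_z is the actor's position along the treadmill axis as reported
    by the depth sensor; positive error means the actor has walked ahead
    of center, so the belt must speed up to carry them back.
    """
    error = actor_z - zone_center_z
    target = current_speed + CENTER_GAIN * error * dt
    return max(0.0, min(MAX_BELT_SPEED, target))

def world_scroll_speed(current_belt_speed: float, scale: float = 1.0) -> float:
    """The projected environment scrolls in proportion to the belt speed,
    so walking changes the scene slowly and running changes it quickly."""
    return current_belt_speed * scale

if __name__ == "__main__":
    speed = 0.0
    # Actor steps forward of center; the belt ramps up over a few frames.
    for frame in range(5):
        speed = belt_speed(actor_z=0.4, zone_center_z=0.0,
                           current_speed=speed, dt=1 / 30)
        print(f"frame {frame}: belt {speed:.2f} m/s, "
              f"scroll {world_scroll_speed(speed):.2f} m/s")
```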

Similarly, the flying zone made it possible for the actor to create the sense of both flying and falling. The actor’s hand positions controlled both the height at which the performer flying system raised them above the stage and the speed at which the video game world changed. For example, hands raised straight up created rapid apparent falling, while hands held out to the sides slowed it. All of these motion functions were safeguarded against failure by SIL 3-rated functional safety features (Safe Limited Speed, Safe Limited Acceleration, etc.). This research project utilized IEC 61800-5-2 (Adjustable Speed Electrical Power Drive Systems - Part 5-2: Safety Requirements - Functional) functional safety features to enhance performer safety in this unique performance environment (Fig. 18.2).

Fig. 18.2 Actor using flying performer system
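
The mapping from hand pose to flight behavior can be sketched as a simple normalization and scaling step. The Python below is a hypothetical illustration with assumed heights and limits; in the actual system, speed and acceleration bounds were enforced by the certified drive-level safety functions discussed in the safety considerations below, not by code like this.

```python
# Illustrative sketch of the flying-zone mapping: the higher the actor's
# hands relative to their head, the faster the apparent fall in the game
# world and the higher the lift target. Values and names are assumptions;
# the real speed/acceleration limits were enforced by the drive's
# IEC 61800-5-2 safety functions, not by application code.

MAX_LIFT_HEIGHT = 2.5   # m above stage (assumed)
MAX_FALL_SPEED = 6.0    # m/s of apparent fall in the virtual world (assumed)

def hand_raise_fraction(left_hand_y, right_hand_y, head_y, arm_span=0.7):
    """0.0 when hands are at or below head height, 1.0 when fully raised."""
    raise_amount = max(left_hand_y, right_hand_y) - head_y
    return max(0.0, min(1.0, raise_amount / arm_span))

def lift_and_fall(left_hand_y, right_hand_y, head_y):
    """Map the hand pose to a lift height target and an apparent fall speed."""
    f = hand_raise_fraction(left_hand_y, right_hand_y, head_y)
    return f * MAX_LIFT_HEIGHT, f * MAX_FALL_SPEED

if __name__ == "__main__":
    # Hands out to the sides (near shoulder height): slow apparent fall.
    print(lift_and_fall(left_hand_y=1.4, right_hand_y=1.4, head_y=1.6))
    # Hands straight up: rapid apparent fall and full lift height.
    print(lift_and_fall(left_hand_y=2.3, right_hand_y=2.3, head_y=1.6))
```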

2.1.2 Technological Affordances in Performance

The research team chose to adapt Lewis Carroll’s Alice’s Adventures in Wonderland into the script for this stage performance. Following the start of the show, no external input, except for one hold-to-run safety button, was required for the actor to create an Alice character who truly lived in the imagined worlds created by Lewis Carroll. The actor’s movements, captured via multiple Microsoft Kinect v2 sensors, directly controlled all the show systems. Their position on the treadmill was maintained at the center regardless of the speed at which they walked or jogged. The projected world behind Alice scrolled along to keep pace via a communication link between the automation system and the Unity video game engine, which managed the digital world. This allowed Alice to interact with projected images and traverse the world via the treadmill (Fig. 18.3). For example, when an animated White Rabbit appeared, Alice could follow it. Unlike previous theater methods, however, the actor was not beholden to the predetermined and fixed speed at which the White Rabbit would run away. In the ALICE Project, the actor playing Alice could walk curiously, quicken slightly, and then break into a full run. The dynamic system responded to ensure the White Rabbit would always remain one step ahead.

Fig. 18.3 Actor on treadmill

When Alice followed the White Rabbit down the rabbit hole, the actor raised their hands and the performer flying system lifted them above the stage. The Unity engine responded accordingly, and Alice fell down “a very deep well” [18] into Wonderland; the higher the actor raised their hands, the faster they appeared to fall (Fig. 18.4). The actor’s position in the stage environment was detected by the Kinect sensors and dynamically controlled by the automation system.

Fig. 18.4 Actor appearing to fall

The inclusion of automation in the ALICE Project is what makes this performance methodology genuinely unique. Traditional performance practice dictates that each specialization (e.g., performer flying, video projections, etc.) has its own operator. Within the ALICE Project, however, the actor is in control of the stage environment. Through the use of multiple technologies, the research team has developed a stage environment in which the actor dynamically controls lighting, sound, projection, and automation systems (Fig. 18.5). This liberates the actor to live more fully within an imagined environment.

Fig. 18.5 Flying controlled by actor

2.1.3 Safety Considerations

Safety was a primary design consideration throughout the development of this entertainment technology. The aforementioned hold-to-run enable button empowered an external safety observer to halt the automation system whenever needed to maintain safety. This safety function was one of several utilized in the system. Formal Risk Analysis and Risk Reduction (RA/RR) processes conducted throughout the development determined that multiple levels of safety were required to protect the Alice performer throughout the performance. Highlights of the safety system include dual-zone sensing to activate the treadmill and lift zones; advanced variable frequency drive technology using Safe Limited Speed (SLS), Safe Limited Acceleration (SLA), and Safe Limited Position (SLP) [19]; and use of an E1.43-compliant performer flying hoist system [20]. The performer flying system allowed for performer-controlled flying within the normal operational zone while preventing overtravel and excessive speed or acceleration (Fig. 18.6). The integration of these safety features with purpose-designed interactive automation control software allowed actor-driven automation to be performed without incident throughout the performances.

Fig. 18.6 ALICE Project performer flying safety diagram
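
To show the shape of these safety behaviors, the following Python sketch mirrors a hold-to-run gate together with speed, acceleration, and position clamps. All names and limit values are assumptions for illustration; in the ALICE Project these functions (SLS, SLA, SLP) were implemented in the SIL 3-rated drive hardware itself.

```python
# Illustrative application-side mirror of the safety concepts described
# above: a hold-to-run gate plus speed, acceleration, and position clamps.
# In the actual system, SLS/SLA/SLP were implemented in the SIL 3-rated
# drive hardware; this sketch only shows the shape of the logic.

from dataclasses import dataclass

@dataclass
class SafetyLimits:
    max_speed: float = 0.5        # m/s   (assumed)
    max_accel: float = 0.8        # m/s^2 (assumed)
    min_pos: float = 0.0          # m, stage floor
    max_pos: float = 2.5          # m, top of normal operational zone

def safe_command(target_speed, last_speed, position, hold_to_run_pressed,
                 dt, limits=SafetyLimits()):
    """Clamp a requested hoist speed to the configured limits.

    If the external observer releases the hold-to-run button, or the hoist
    is at the edge of its permitted travel, the commanded speed is zero.
    """
    if not hold_to_run_pressed:
        return 0.0
    # Safe Limited Speed: bound the magnitude of the commanded speed.
    speed = max(-limits.max_speed, min(limits.max_speed, target_speed))
    # Safe Limited Acceleration: bound the change from the last command.
    max_delta = limits.max_accel * dt
    speed = max(last_speed - max_delta, min(last_speed + max_delta, speed))
    # Safe Limited Position: stop motion that would leave the zone.
    if (position >= limits.max_pos and speed > 0) or \
       (position <= limits.min_pos and speed < 0):
        speed = 0.0
    return speed

if __name__ == "__main__":
    print(safe_command(2.0, 0.0, 1.0, True, dt=1 / 30))   # clamped ramp-up
    print(safe_command(2.0, 0.0, 1.0, False, dt=1 / 30))  # button released -> 0
```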

2.1.4 Sensor Considerations

With performer safety as a primary goal, considering the limitations and reliability of the sensing systems used is an important element of such interactive design. These limitations can be intrinsic to the capture technology used or extrinsic environmental factors that influence the sensing of the performer. Intrinsic limitations include the resolution, range, capture rate, depth-sensing technology, and spectrum visible to the camera systems. Extrinsic factors include the size of the tracked space, the lighting and visibility of the tracking targets, and the number of targets to be tracked. Two broad categories of sensing involve augmenting the performer or augmenting the space. Augmenting the performer might include passive motion capture reflective targets, active trackers such as the HTC Vive trackers, or accelerometer and positional data from activity trackers and smart devices. While some of these involve adding tracking cameras or base stations to the environment, they all also require advance preparation of the performance participants. Another approach is to augment the space such that it is capable of directly sensing the desired data. This was the route taken in the ALICE Project, using two Microsoft Kinect v2 cameras.

The Microsoft Kinect family of sensors uses infrared light to generate depth data; however, the technique varies by model. The first-generation Kinect is a structured light sensor, which projects a known pattern of infrared light onto the scene and calculates depth information based on the resulting changes to that pattern. This can result in errors when two sensors’ structured light patterns interfere with each other. Subsequent Kinect models use a time-of-flight camera system that also suffers from competing infrared signals in certain use cases. For example, stage lighting, or anything very hot, serves as a potent infrared source that can disturb the sensing capabilities. Despite these limitations, infrared depth sensing confers a powerful advantage over visible light depth systems: it can sense depth without visible light. This is of particular interest in theatrical productions, where one faces a high dynamic range in the intensity, configuration, color, and coverage of lighting. Using infrared-based systems facilitates the tracking of performers in complete darkness, allowing performers to be tracked offstage or, in this project, while in midair. Visible light-based stereo systems are not without their advantages; depth cameras that rely on stereo disparity provide a significantly longer tracking range in well-illuminated venues, such as outdoors. Choosing the sensor best suited to project needs involves considering the environment of the performance and the goals of the data captured.

2.1.5 Data Considerations

While the Kinect sensor primarily captures depth, the system can also infer human body positions and movements. Because these are the system’s best guesses, body parts that are occluded or in unusual positions can be incorrectly identified. Fortunately, the Kinect v2s used in this project can indicate the confidence level of each reported joint position. The system may report points with poor confidence for a number of different reasons. For example, when a performer’s body is completely out of the sensor’s view, the system will report no confidence. Low confidence points include those at the edges of sensing thresholds or those that are occluded. Occluded points have their joint locations predicted by the Kinect’s machine learning kinematic models. Medium and high confidence points have predicted joint locations that correspond to points actually visible to the sensor (Fig. 18.7).

Fig. 18.7 ALICE Project system topology

The distinction between predicted and sensed points is particularly important for the automatic control systems used in augmenting the performance in the ALICE Project. As stated above, the Kinect generates joint positions by running machine learning algorithms trained to detect human body poses from depth information. This is highly beneficial for robustness and usability in general application development, as it allows a consistent representation of the body pose and position to be maintained during occlusion, interference, or range issues. However, there is no guarantee of the accuracy or precision of the predicted positions. While an incorrect guess is not catastrophic when displaying a digital avatar, these inferred points become potentially dangerous when the data drives a treadmill or lift system carrying a human performer. These safety considerations led the ALICE Project research team to use multiple sensors. Multiple sensors ensured ample high confidence points, overcame occlusion with extra viewing angles, and provided redundancy for safety. Poor confidence points could be rejected from the control systems to ensure the performer’s safety and the reliability of the control system. In practice, the developed system provided a stable representation of the actor, enabling seamless transitions between the various operational zones around the stage.
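
The following Python sketch illustrates one plausible way to apply these ideas: joint reports from two sensors are fused, and any joint without at least one tracked (non-inferred) reading is rejected before it can influence machinery. The data structures and names are assumptions; only the tracked/inferred/not-tracked confidence categories come from the Kinect SDK.

```python
# Illustrative sketch of how joint reports from two sensors can be fused,
# rejecting low-confidence (inferred or untracked) joints before the data
# is allowed to drive stage machinery. Names and structure are assumptions;
# the Kinect SDK reports per-joint states of tracked, inferred, or not tracked.

from statistics import mean

TRACKED, INFERRED, NOT_TRACKED = "tracked", "inferred", "not_tracked"

def fuse_joint(reports):
    """Fuse one joint's readings from multiple sensors.

    reports: list of (state, (x, y, z)) tuples, one per sensor.
    Returns the average of tracked readings, or None if no sensor has a
    high-confidence view of the joint (in which case control systems should
    hold their last safe state rather than act on a guess).
    """
    tracked = [pos for state, pos in reports if state == TRACKED]
    if not tracked:
        return None
    return tuple(mean(axis) for axis in zip(*tracked))

if __name__ == "__main__":
    # Sensor A sees the right hand directly; sensor B only infers it.
    right_hand = [(TRACKED, (0.31, 1.42, 2.05)), (INFERRED, (0.48, 1.39, 2.10))]
    print(fuse_joint(right_hand))          # -> tracked reading only
    # Neither sensor has a confident view: reject the joint entirely.
    left_foot = [(INFERRED, (0.1, 0.0, 2.2)), (NOT_TRACKED, (0.0, 0.0, 0.0))]
    print(fuse_joint(left_foot))           # -> None
```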

2.1.6 Considerations for Working with Performers

One surprise the research team discovered from working with actors and performers was how different this group was from human subjects in other research studies. Most actors’ training prepares them to quickly adapt their performance to align with the technology’s capabilities. This, however, created an unexpected tension between the researchers and the actors. The researchers sought to design technology that adapted to an actor’s physicality, but this was a completely foreign experience for actors unaccustomed to having their individual preferences attended to so closely. This was particularly true as the team fine-tuned the flying and treadmill zones for a particular actor during the design process. Because the team built the flying system specifically for the actor, they could not directly test much of the user interface. Instead, they had to rely on the actor’s responses to changes in various parameters. As a result, the only way to test the responsiveness between the actor’s actions and the lift, for example, was through trial and error (Fig. 18.8). This led to many conversations in which the research team would create a series of mock-ups and ask the actor which one felt most natural, only for the actor to respond that their preference was irrelevant and that they would do whatever seemed to best fit the overall aesthetic of the production.

Fig. 18.8 ALICE Project actor testing the flying system

Another challenge for the actors came from interacting with 3D spatial triggers that they could not see. These triggers were set up such that, from the audience’s point of view, the actor’s actions would align with virtual objects (Fig. 18.9). For instance, the actor could grab a virtual bottle by passing their hand through the space the bottle would physically occupy. Unfortunately, because the projected perspective was rendered for the audience and not the actor, the actor had no visual cues as to where these physical/virtual collisions should occur. While actors are trained in hitting marks, these generally occur in 2D space, for instance, making sure one is standing in the correct location on the stage. Adjusting to moving an action through a volume of space with limited cues sometimes proved to be a challenge for the performer.

Fig. 18.9 ALICE Project actor reaching for a virtual object
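
Conceptually, each of these triggers reduces to a volume test in stage coordinates: the grab fires when a tracked hand joint enters the space the virtual prop would occupy. The Python sketch below illustrates that test with an assumed trigger position and radius; the production triggers were authored directly in Unity.

```python
# Illustrative sketch of a 3D spatial trigger: the actor "grabs" a virtual
# prop when a tracked hand joint enters the volume that prop would occupy
# on stage. The trigger position and radius are assumptions; in production
# these were authored in Unity to line up with the audience's perspective.

import math

def in_trigger(hand_pos, trigger_center, radius):
    """True if the hand joint is inside the spherical trigger volume."""
    return math.dist(hand_pos, trigger_center) <= radius

if __name__ == "__main__":
    bottle_trigger = {"center": (0.6, 1.1, 3.2), "radius": 0.15}  # stage meters
    hand_path = [(0.2, 1.0, 3.2), (0.45, 1.08, 3.2), (0.58, 1.12, 3.22)]
    for hand in hand_path:
        hit = in_trigger(hand, bottle_trigger["center"], bottle_trigger["radius"])
        print(hand, "-> grabbed" if hit else "-> not yet")
```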

Finally, having actors be part of the technological development process proved to be a unique experience all around. As both groups were unsure of what was possible, either physically or technologically, each needed to learn how the other operated. Much like other forms of interdisciplinary research, while integration challenges existed between the groups, the payoff more than made up for them.

2.2 What the Moon Saw

Lewis Carroll’s Alice’s Adventures in Wonderland contains fantastic environments that were previously difficult to produce for audiences given their strong preconceived visual notions created by major motion pictures. However, the technology of the ALICE Project afforded the creative team the ability to create an augmented live performance that satisfied these cultural expectations for the audience. The significance of this technology for the practice of theater making is the shift in the actor’s ability to interact with the created narrative world, but this remained a passive experience for the audience. Building upon these developments of using video game and interactive technologies in live theatrical performance, the research team worked to develop an environment that would facilitate interactive possibilities for young audiences. To do this, the team elected to adapt Hans Christian Andersen’s What the Moon Saw [21] into a dynamic and interactive play for children. These efforts yielded a new storytelling methodology that allowed for nonlinear storytelling and audience interaction within augmented reality performance.

To fully develop an interactive production, the project team developed a new script based on Hans Christian Andersen’s fairy tale What the Moon Saw that sought to incorporate the new performance methodology. The original tale is a loosely connected set of 32 vignettes framed by a story of the Moon paying nightly visits to a lonely child, Andersen’s nameless protagonist. As the Moon tells stories of what he has seen in his travels around the world each night, the child fills a sketchbook with drawings inspired by the tales. Because each vignette within the original Andersen text essentially functions as a unique and self-contained story unit, the order in which an audience encounters them does not matter. This made What the Moon Saw an excellent source of material for the creative team to adapt into a nonlinear story. To provide a coherent narrative frame for the adaptation, the team’s playwright crafted a story that takes place over the course of a single evening. In the adapted script, the Moon visits a child, whom the playwright named Erika, and shows her various episodes inspired by Andersen’s original vignettes. The playwright designed the script in such a way as to provide significant audience agency over the narrative, which will be discussed later in the chapter. The resulting play combined augmented reality (AR) technologies with more traditional theatrical conventions.

The Unity game engine was the technical medium through which the team created the audience’s interactive theatrical experience. What the Moon Saw featured a variety of both interactive and noninteractive scenes rendered by the team’s designers in Unity. Some of the noninteractive elements included static backdrops akin to traditional matte paintings, prerendered cartoon-style computer-generated movies that played as a dynamic backdrop, and dynamic cameras that rendered the digital scenes using Unity’s real-time graphics engine. The show’s interactive elements included body motion-tracked mini-games for audience members, scene transitions driven by performer body tracking, showrunner-cued transitions, and digital puppeteering for audience members. The whole project ran off a single VR-capable desktop with a dedicated GPU. The use of body tracking for the interactive elements of the performance allowed for a seamless onboarding process for audience participants, giving them a sense of stepping into the simulated world on the stage.

The research team achieved body tracking through the use of one Kinect v2 camera system; the Kinect is a combined visible-light and depth camera with machine learning models that enable both human body and pose tracking. Data generated by the Kinect streamed to Unity via the Microsoft Body2Basics app, which provides 25 body joint positions for up to six humans. The researchers mounted the Kinect on a frame above and downstage of the performers, allowing full tracking of the space in front of the projected screens (Fig. 18.10). An active USB 3.0 extension cable was necessary to connect the stage-mounted Kinect to the backstage rendering and projection systems. This data fed the simulated reality in Unity, whose rendered output was then presented to the users, enabling the interactive experiences.

Fig. 18.10 Actors standing in front of static projected image

Simulation output was displayed through two theater-grade Epson 1080p projectors, which the team oriented vertically to rear-project the AR environments. Manual keystoning and lens offsets allowed the team to tune the displays specifically to the projection surface. The performance space for What the Moon Saw contained a wall of frosted glass doors that served as both the projection surface and the actors’ entrances and exits. The mobility of the projection surface allowed for immersive transitions by the performers. By opening doors in the projection surface, performers were able to step out of the virtual world of the digital game and into the stage’s physical world.

Running off a single computer and using a single Kinect camera lent a great deal of portability and ease of setup and takedown to the system. As part of the performance space’s limitations, the team had to remove the entire technical infrastructure after each rehearsal and then reassemble it before the next rehearsal session. The system was versatile enough to enable rapid deployment, as well as to augment the performance space.

2.2.1 Agentic Affordances of AR in Theater for Young Audiences

The research team behind What the Moon Saw utilized the live motion capture technology developed in the ALICE Project, facilitated through a Microsoft Kinect v2 and the Unity video game engine, to augment the performance environment. In creating a new form of performance methodology, it was important for the research team to first consider the unique affordances that AR could contribute to live theater. Perhaps most notably, AR provided young audience members a chance to join actors in the performance space and express agency over the narrative outcome. Young people could enact their agentic capacity in two ways. First, young audience members could interact with elements within the video game world; for example, someone could push a rock within the video game world over a cliff. Second, young audience members could embody figures or characters and control their movements within the environment. This phenomenon could be imagined as a sort of digital puppetry.

2.2.1.1 Chicken Game

In considering ways to utilize the first interactive affordance, the playwright adapted two vignettes from Hans Christian Andersen’s original text into scenes requiring the successful completion of a task-oriented game. During one night in the Andersen text, the Moon shows the protagonist a chicken farm in which dozens of chickens have escaped the coop. A young girl is distraught over the situation because she is responsible for the uncaged chickens and is worried that her father, the farmer, will be angry. This premise provided the foundation for a scene with a game-driven narrative, which is to say that successful completion of the game was a vital element of the storytelling.

At the start of the scene, 30 chickens bounced and squawked about the screen. The actors portraying the Moon and Erika explained to the young audience that the girl would need help getting the chickens back into the coop and then selected multiple volunteers to come to the stage to herd the chickens. Facilitated through the Kinect sensor’s joint tracking (Fig. 18.11), the young audience members then used their hands and feet to move the bouncing chickens into a targeted chicken coop (Fig. 18.12). By successfully rounding up the chickens, the volunteers solved the girl’s problem of the escaped chickens and brought the scene’s narrative to its conclusion.

Fig. 18.11 Joint tracking of young audience members playing the Chicken Game

Fig. 18.12 Young audience members playing the Chicken Game
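
The herding mechanic can be summarized as a repulsion-plus-goal check: chickens are pushed away from nearby tracked hand and foot joints, and the scene completes once every chicken sits inside the coop region. The Python below is a simplified sketch of that logic under assumed values; the show itself relied on Unity's collider and physics systems.

```python
# Illustrative sketch of the Chicken Game's herding logic. In the production
# this was implemented with Unity colliders and physics; here, chickens are
# simply pushed away from nearby tracked hand and foot joints, and the scene
# is complete once every chicken is inside the coop region. Values are assumed.

import math
import random

PUSH_RADIUS = 0.5   # meters within which a body joint repels a chicken
PUSH_SPEED = 0.8    # meters per step a repelled chicken moves
COOP = {"x_min": 4.0, "x_max": 5.0, "y_min": 0.0, "y_max": 2.0}

def step_chicken(chicken, joints):
    """Push the chicken away from the nearest tracked joint, if close enough."""
    x, y = chicken
    nearest = min(joints, key=lambda j: math.dist(chicken, j))
    if math.dist(chicken, nearest) < PUSH_RADIUS:
        dx, dy = x - nearest[0], y - nearest[1]
        length = math.hypot(dx, dy) or 1.0
        x, y = x + PUSH_SPEED * dx / length, y + PUSH_SPEED * dy / length
    return (x, y)

def in_coop(chicken):
    x, y = chicken
    return COOP["x_min"] <= x <= COOP["x_max"] and COOP["y_min"] <= y <= COOP["y_max"]

if __name__ == "__main__":
    random.seed(1)
    chickens = [(random.uniform(0, 4), random.uniform(0, 2)) for _ in range(5)]
    hands_and_feet = [(1.0, 1.0), (2.5, 0.5)]  # tracked joints, stage meters
    chickens = [step_chicken(c, hands_and_feet) for c in chickens]
    print(f"captured {sum(in_coop(c) for c in chickens)} of {len(chickens)} chickens")
```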

2.2.1.2 Penguin Game

Similarly, the playwright adapted a second vignette, set in the Arctic, in which walruses hunt seals, into a task-oriented game that involved coaxing penguins back into their exhibit at the zoo. In this instance, the actors portraying the Moon and Erika elicited help from young audience members to recapture the penguins before they waddled too close to the walrus exhibit. Again, facilitated through the Kinect sensor, the young audience members used their bodies to deflect sliding penguins into the targeted penguin exhibit (Fig. 18.13). During the performance, two audience members would play the game, and once they had collected all the penguins and returned to their seats, the scene would advance. The game could support up to three players, and the number of escaped penguins could be adjusted for increased difficulty. Seven penguins provided a sufficient challenge during the show without taking up too much time.

Fig. 18.13 Young audience members playing the Penguin Game

The tasks of the Chicken Game and the Penguin Game were virtually identical. The actors provided the audiences with the ability to choose between the two scenes. Thus, only one of the games was played during an individual performance. Choosing the way in which the larger narrative unfolded on stage was yet another way in which the young audience members exerted agency over the story. This will be discussed later in the chapter.

2.2.1.3 Swan Game

The final scene of What the Moon Saw utilized the second affordance: embodiment. This interaction involved a coordinated effort by audience members to help an injured swan fly back to its flock, and it required the most precise body tracking of all the interactions. Using the position and rotation of their shoulders and arms, audience volunteers were able to puppeteer the wings of the swan. Though capable of working with one user, the experience was designed as a cooperative effort between two players.

The actors brought two volunteers to the stage and asked them to help the swan fly again. Thus, the young audience members were presented with an ill-defined problem that they had to work together to solve, providing them an opportunity to cooperate within their agentic control over the narrative. Standing facing the projection screen, the audience member on the left controlled the swan’s left wing with their left arm, and likewise for the audience member on the right. In order to make the swan fly, both players had to flap their wings in unison (Fig. 18.14). If one player flapped too fast or too frequently, the extra force would tip the swan over, and the duo would have to try again. Successfully helping the swan take off from the lake brought the scene to its narrative conclusion: once the swan reached a certain height, a prerendered video of the swan flying away to its flock played and triggered the scene’s transition.

Fig. 18.14 Young audience members playing the Swan Game
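
The cooperative mechanic can be sketched as two arm-elevation measurements feeding one shared state: the swan climbs only when both wings are raised together, and a large left/right imbalance tips it over. The thresholds and climb rate in the Python below are assumptions used purely to illustrate the logic described above.

```python
# Illustrative sketch of the Swan Game's cooperative mechanic: each player's
# arm angle drives one wing, the swan only climbs when both wings flap in
# near unison, and a large imbalance tips it over. Thresholds are assumptions;
# the production version ran in Unity on Kinect joint data.

import math

def wing_angle(shoulder, wrist):
    """Arm elevation in degrees (0 = arm level, 90 = straight up)."""
    dy = wrist[1] - shoulder[1]
    dx = abs(wrist[0] - shoulder[0]) or 1e-6
    return math.degrees(math.atan2(dy, dx))

def update_swan(height, left_angle, right_angle,
                climb_rate=0.05, tip_threshold=40.0):
    """Return (new_height, tipped). The swan climbs only if both wings are
    raised together; a large left/right difference tips it over."""
    if abs(left_angle - right_angle) > tip_threshold:
        return 0.0, True                      # tipped: restart from the lake
    if left_angle > 20.0 and right_angle > 20.0:
        height += climb_rate * min(left_angle, right_angle) / 90.0
    return height, False

if __name__ == "__main__":
    # Both players raise their arms together: the swan climbs.
    h, tipped = update_swan(0.0, wing_angle((0, 1.4), (-0.6, 1.9)),
                            wing_angle((0, 1.4), (0.6, 1.9)))
    print(round(h, 3), tipped)
    # One player flaps much harder than the other: the swan tips over.
    print(update_swan(h, 85.0, 10.0))
```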

2.2.2 Storytelling Affordances of AR

In addition to affording young audience members the opportunity to help complete the story of each scene, the AR created by the video game world of Moon also allowed the audience to choose the direction of the larger narrative. Unlike Andersen’s original literary version, which takes place over 32 nights, this theatrical version takes place in just one night. Like Andersen’s protagonist, Erika is lonely and isolated; in this version, these feelings stem from her family’s recent move to a new town. The Moon magically comes to the window and offers to show Erika many wonderful aspects of her new home. By design, however, the Moon does not have time to show Erika everything, which allows the actors to ask the audience to vote for the next scene at the conclusion of each preceding scene.

To make the voting process simple, the playwright wrote two options for the audience to choose between each time. The designers created text that appeared on clouds, with single words describing the settings of the scene options. In the previously mentioned scenes with uncaged chickens and escaped penguins, the words farm and zoo appeared. The actors asked the audience to vote and then, utilizing the technology’s interactive affordance, slid the appropriate text cloud to the side to trigger the next scene to start.
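
Structurally, this voting mechanic amounts to a small branching scene graph in which each decision point exposes two labeled options. The Python sketch below uses hypothetical scene names to show that branching shape; it is not the production's actual scene list.

```python
# Illustrative sketch of the vote-driven, nonlinear scene structure: each
# decision point offers two labeled options (such as "farm" and "zoo"), and
# the winning cloud determines which scene plays next. The scene names and
# graph below are assumptions used only to show the branching shape.

SCENE_GRAPH = {
    "erika_bedroom": {"farm": "chicken_game", "zoo": "penguin_game"},
    "chicken_game": {"lake": "swan_game", "city": "city_vignette"},
    "penguin_game": {"lake": "swan_game", "city": "city_vignette"},
}

def next_scene(current_scene: str, winning_label: str) -> str:
    """Advance the narrative based on the option the audience voted for."""
    options = SCENE_GRAPH.get(current_scene, {})
    if winning_label not in options:
        raise ValueError(f"{winning_label!r} is not an option after {current_scene!r}")
    return options[winning_label]

if __name__ == "__main__":
    scene = "erika_bedroom"
    for vote in ["zoo", "lake"]:          # audience votes collected by the actors
        scene = next_scene(scene, vote)
        print("now playing:", scene)
```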

Central to the story of What the Moon Saw is Erika’s sketchbook. The graphics designer created a larger-than-life sketchbook that often filled the entire screen. The interactive features of the technology allowed the actors to turn the giant pages of the AR sketchbook and trigger animated drawings that gave the illusion of Erika drawing the elaborate scenes the Moon described.

2.2.3 Considerations for Public Interaction

The creative team staged What the Moon Saw in a large, open space within the Wisconsin Institute for Discovery (WID) on the University of Wisconsin–Madison campus. The building is open to the public and is not a traditional theater space. The location was chosen, in part, because of the presence of large, frosted glass walls that would allow for rear projection on one side and audience seating on the other. In using the frosted glass as a projection surface, the team addressed one of the primary concerns of attempting this project with young audience members: potential damage to a projection screen during performances. The team predicted that the ubiquity of touch-screen technology would make playing a large video game fairly intuitive for young audiences. While interacting with the AR environment did not require touching the screen, projecting onto sturdy glass walls mitigated the risk of damage.

The motion capture sensors facilitated the human interaction within the AR environment, which meant that the team needed to consider how and where to place the sensors. Microsoft recommends that Kinect sensors be set between 2 feet and 6 feet from the ground [22]. This, however, is not ideal in a theatrical setting because the sensors would then obstruct the audience’s view of the stage area. The team was concerned that placing the sensors on the floor would be a tripping hazard that could cause injury to young audience members and damage to equipment. The team found that hanging the sensors from scaffolding, such that they were 8 feet above the ground (Figs. 18.15 and 18.16), alleviated the tripping hazard without sacrificing sensor efficacy.

Fig. 18.15 Placement of Kinect and projectors

Fig. 18.16 Eye-level view of placement of Kinect with a member of the research team, for scale

2.2.3.1 Sensor Considerations

Whereas performer safety was a primary goal in the ALICE Project, the sensing goal in What the Moon Saw was to provide a robust and flexible system. In contrast to the single-performer tracking in the ALICE Project, What the Moon Saw involved tracking a wide range of body sizes and shapes; the audience members ranged from kindergartners to middle school students. Also, unlike the single performer in the ALICE Project, there were anywhere between two and six bodies to be tracked at one time. The research team decided to use a single Kinect v2 as it provided the ability to track up to six targets simultaneously. The need for redundant data or occlusion resolution was minimized in this application because the data drove interactive games instead of physical systems with safety concerns.

One of the problems that eluded resolution for quite some time was a phenomenon of ghost bodies. Occasionally, the Kinect would begin confidently tracking floating bodies that did not actually exist, and once the ghost bodies were tracked, the true performers were no longer reported by the Kinect. Fully blocking the sensor’s cameras with a piece of cardboard for a few seconds cleared these errors. After further observation, the research team noticed a sliver of light leaking through gaps in the projection surface; the infrared energy from this light overwhelmed the camera. The team confirmed this by looking through the Kinect’s IR diagnostic view and observing very bright patches coming from the hot, intense IR source of the projector’s lamp. This demonstrates the importance of evaluating how tracking systems and equipment might interfere with one another. Another example of these conflicts involves the IR laser tracking used in Valve’s Lighthouse system; if placed close to a Kinect, the infrared light projected by the Kinect can overpower the IR signals from the base stations and cause catastrophic tracking failures.

2.2.3.2 Data Considerations

The Chicken and Penguin Games could have been achieved using depth masking information alone. Collisions with the space occupied by participants were sufficient to interact with the Unity physics system that drove the animals’ movements. This method has the advantage of quicker implementation and a single object against which to check collisions. It would also allow many users to be in the space, because the sensor tracks occupied space rather than individual bodies and joints. However, the Swan Game required the ability to recognize the positions of two users and track specific arm movements. This cannot easily be done directly from the depth data but is a task better suited to body and joint tracking. This requirement led the team to forgo the simpler depth masking technique and instead use the full-body tracking provided by the Kinect API (application programming interface) throughout the project.
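
The trade-off between the two approaches can be made concrete with a small sketch: depth masking only needs a comparison against an empty-stage background frame, while joint tracking exposes named joints that support gestures like the swan's wing flaps. The arrays and joint names in the Python below are illustrative stand-ins for real sensor data.

```python
# Illustrative contrast between the two sensing approaches discussed above.
# Depth masking treats any pixel occupied by a participant as potential
# collision geometry (enough for the Chicken and Penguin Games), while joint
# tracking yields named joints per body (needed for the Swan Game's arms).
# The tiny arrays below are stand-ins for real sensor frames.

import numpy as np

def occupancy_mask(depth_frame_m, background_m, threshold_m=0.1):
    """Pixels where the live depth differs from the empty-stage background
    by more than the threshold are treated as occupied by a participant."""
    return np.abs(depth_frame_m - background_m) > threshold_m

def arm_vector(joints, side="left"):
    """Joint tracking: return the shoulder-to-wrist vector for one arm."""
    return joints[f"{side}_wrist"] - joints[f"{side}_shoulder"]

if __name__ == "__main__":
    background = np.full((4, 6), 3.0)            # empty stage, 3 m to the doors
    live = background.copy()
    live[1:3, 2:4] = 2.2                         # a child standing in the space
    print(occupancy_mask(live, background).astype(int))

    joints = {"left_shoulder": np.array([0.0, 1.3, 2.4]),
              "left_wrist": np.array([-0.5, 1.7, 2.4])}
    print(arm_vector(joints, "left"))            # used for wing puppeteering
```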

A further consideration that applied to both interaction approaches resulted from relocating the physical mounting location of the Kinect. In the ALICE Project, the Kinect tracked the performer from the front, but in What the Moon Saw, the team needed to mount the Kinect above the stage. The research team made this change primarily to prevent young audience members from tripping over the sensor or its cables. Still, this nonstandard positioning required a transformation, measurement, and calibration step in Unity for each performance location. Before each performance, a performer would walk the space’s boundaries using Unity’s live preview mode while an operator adjusted the digital space to correspond with the sensor’s reported body positions.
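
In effect, this calibration step establishes a rigid transform from the overhead sensor's coordinate frame into stage coordinates. The Python sketch below shows such a transform with an assumed tilt angle and mounting height; in practice the equivalent values were adjusted by hand in Unity while a performer walked the space.

```python
# Illustrative sketch of the per-venue calibration: a rigid transform maps
# points from the overhead-mounted sensor's frame into stage coordinates.
# The tilt angle and mounting offset below are assumptions; in practice an
# operator adjusted them in Unity while a performer walked the boundaries.

import numpy as np

def sensor_to_stage(point_sensor, tilt_deg, sensor_height_m):
    """Rotate about the x-axis by the sensor's downward tilt, then translate
    by the mounting height so results land in floor-level stage coordinates."""
    t = np.radians(tilt_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(t), -np.sin(t)],
                      [0, np.sin(t),  np.cos(t)]])
    offset = np.array([0.0, sensor_height_m, 0.0])
    return rot_x @ np.asarray(point_sensor) + offset

if __name__ == "__main__":
    # A head joint reported by the sensor, with the sensor roughly 8 ft
    # (2.4 m) up and tilted downward toward the stage (assumed values).
    head_in_sensor_frame = [0.2, -1.0, 2.0]
    print(sensor_to_stage(head_in_sensor_frame, tilt_deg=-35.0,
                          sensor_height_m=2.4))
```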

2.2.4 Multiple Forms of Public Engagement

The Wisconsin Institute for Discovery hosts a monthly event called Saturday Science, which is open to the public. Learners of all ages have an opportunity to explore a variety of scientific disciplines through hands-on engagement. In July 2019, the team displayed the motion capture technology and the interactive games at the monthly Saturday Science. Following minor tweaking based on this user testing, the team then mounted the full production of What the Moon Saw at the August Saturday Science. In October 2019, the team remounted the production for the WID’s annual Science Fest, attended by more than 2000 elementary and middle school students from across the state of Wisconsin. In total, the team produced 10 performances of What the Moon Saw that nearly 500 audience members enjoyed.

In between performances, the research team ran the audience participation games as an open arcade experience, inviting audience members after each show to play any of the three games from the play. During the performance, two or three audience members would play the Chicken Game, and once they completed it and returned to their seats, the scene would advance. However, this was not a technical limitation, and the game could support up to six players; six players frequently played the Chicken Game successfully during postshow free play. The number of chickens to capture could be adjusted based on the number of players and the desired game length. Thirty chickens provided a sufficient in-show challenge without taking up too much time. Postshow free play variants included hundreds of chickens for larger groups, creating a veritable sea of chickens that moved in waves like a fluid. The Chicken Game was a clear crowd favorite, as it both amazed toddlers and excited teenagers.

While there was an opportunity to play the Penguin Game during the open arcade, the game did not scale up as well as the Chicken Game. When there were too many penguins or too many players, the penguins stacked up and blocked the entrance to the target enclosure, or players accidentally blocked access to it. When the team made the humanoid colliders visible to the players (Fig. 18.17), the young people had an easier time understanding that they were blocking the entrance. Though many audience members would play and win the Penguin Game once, it was the Chicken Game that people wanted to play multiple times.

Fig. 18.17 Humanoid collider shown behind ALICE Project actor

The popularity of the Swan Game seemed to fall somewhere between the other two. For open play, the designers disabled the video projection trigger from the performance, enabling players to fly the swan as high as possible. Though not prompted by any team member, players would often compete against one another to see who could fly the highest without tipping over. Future considerations include adding a maximum height score or additional content to encourage this emergent play behavior.

2.2.5 Actors, Audience, and AR

Although the interactive game element of the production proved intuitive to most audience members, the team nonetheless spent multiple rehearsal sessions helping the actors gain expertise in using the Kinect motion sensors and manipulating the AR platform’s motion-based triggers. While this aspect of the rehearsal process nearly doubled the amount of time needed to prepare the cast for a 25-minute performance, the team needed ample time to teach the cast how to troubleshoot successfully during a performance. In any theatrical setting, the actors are best positioned to solve problems that occur on the stage. Knowing this, the creative team spent roughly 9 hours working with the actors to acclimate them to the technology’s various elements.

The AR experience generalized across multiple performers. Over the lifetime of the production, two different sets of performers filled the cast, and with minimal adjustments the team could adapt the AR production to the new users’ sizes and motion capture profiles. This generality extended to the diversity of audience members’ sizes and body types during the interactive scenes, including audience members who used wheelchairs.

The technical iteration process of getting feedback from the performers during tech rehearsal was particularly vital in the development stage of the project, highlighting the importance of user testing and feedback when designing interfaces and interactions, especially for performance and other dynamic technologies.

3 Discussion

Interactive performance research is a new and growing field of inquiry—this research project investigated both the technology behind the interaction and the human reaction to the performance. The aforementioned ALICE Project was solely focused on developing the initial technology behind the methodology. What the Moon Saw extended this inquiry into how to incorporate interactive technology into production, demonstrating that the video game engine environment can be a flexible and powerful tool for use in augmented reality performance. Using motion capture technology, both performers and audience members can dynamically interact with the stage environment. The ALICE Project shows what fantastic spectacles can be achieved with trained performers, while What the Moon Saw shows what is possible by simply walking onto the stage. AR and live motion capture allow for a relatively seamless onboarding process, making it possible for anyone to interact with the characters and the story. While examples of mixed reality, augmented reality, and interactive technology are not new to artistic installations, this production methodology’s scale and inclusive nature is unique.

3.1 Potentials for Nonlinear Storytelling

One of the potential contributions of this research to theatrical performance is the possibility of creating dynamic storytelling environments. In producing What the Moon Saw, the research team tested the use of AR technology to enable the audience to choose the direction of the next scene, similar to a choose-your-own-adventure storybook. This endeavor’s success demonstrated the viability of dynamically programming story/action options into a video game engine for theatrical performance purposes.

This technical advancement opens new possibilities to playwrights, theater directors, and designers for creating nonlinear storytelling experiences. Nonlinear storytelling is not a new concept to theater makers. The New York production of Sleep No More provides a recent example of nonlinear storytelling in theater. In this production, the creative team transformed an entire building into a nontraditional theater space. Actors performed their portions of the story simultaneously in different rooms of the building, and the individuals in the audience were permitted to travel throughout the theatrical environment. As a result, each member of the audience experienced the play in a unique way.

Sleep No More created an analog theatrical storytelling experience similar to how narratives unfold within the created worlds of video games. The performance methodology of What the Moon Saw created this same sense of agentic control over the story’s direction, but it did so in a communal way for the audience. In addition to creating a dynamic and communal experience for the audience, this application of AR in theatrical settings also enables theater makers to utilize traditional theater spaces. Therefore, the building’s winding corridors used in Sleep No More could be recreated using the technology applied in What the Moon Saw, allowing a similar story to be told in a much smaller space.

3.2 Interfacing for Immersion

Designing an interface for immersion requires as much consideration of how to acquire the data as of how to use it. Differences in the goals, environment, and interactions between these two projects led the team to take different sensing and data collection approaches. The predictable, controlled, and high-stakes aspects of the ALICE Project led the team to focus on high-quality data, redundant systems, and performer training. What the Moon Saw, however, was unpredictable, dynamic, and playful, which led the team to focus on scalability, simplicity, and emergent, spontaneous interaction design. Establishing working prototypes and periodically reevaluating the goals of the interface helped lead to functional results without overcomplicating either system.

Despite similarities to virtual reality or traditional video games, additional considerations must be addressed when designing for augmented reality in live theatrical performance. As all these systems can be designed in the Unity game engine, one might assume an entire AR story could also be experienced from an immersive VR perspective in a head-mounted display (HMD). The technical change in Unity from a projected AR display to an HMD is relatively straightforward, but the experience would not be so easily translated, as certain goals and assumptions conflict. For example, when designing for AR, the virtual environments can be designed to facilitate linear movement (e.g., scrolling backdrops). This could be done to avoid the camera accidentally clipping through virtual objects. However, when given the unconstrained perspective of VR, these environments designed to be viewed from a forced perspective look empty. For a fully immersive first-person point of view, it is essential to create scenes that support arbitrary viewing directions.

A secondary challenge of this approach lies in the full integration of physical and virtual worlds. As is exploited in redirected walking research, humans often rely on visual cues to walk in straight lines. When using augmented reality technologies, the user can still maintain awareness of their environment and receive visual feedback. In a fully virtual experience, however, any misalignment within a treadmill system’s limited width can cause the user to walk off the treadmill and risk injury. Furthermore, many HMD systems currently require physical connections to computation nodes, and these cords and cables can pose serious safety concerns, especially in multiuser spaces. And while the use of HMDs is incongruous with the definition of theater used by this team, these considerations nonetheless demonstrate that when designing an experience that relies on a strong merging of physical and virtual interactions, augmented reality provides many suitable safety, design, and execution options.

3.3 Accessibility

In thinking about the potential for AR technology to create immersive theatrical environments for audience members, it is also important to consider issues relating to audience access to theater in the first place. The research teams of the ALICE Project and of What the Moon Saw made deliberate choices to make theatrical experiences that did not rely on the use of secondary devices to mediate the AR experiences for the audience. While there are certainly creative possibilities that can be achieved through audience members using smartphones or tablets to facilitate an experience, some of which will be discussed later, theater makers should balance these design choices against the degree to which they might exclude some audience members from full participation.

The use of mobile devices by people in nearly every demographic has increased significantly in recent years, and an overwhelming majority of school-aged young people indeed have access to a smartphone or tablet [23]. However, it would be misguided to assume that the overwhelming majority of audience members would therefore each be able to have a personal device available to mediate an augmented theatrical performance. Some students who report having access to Web-enabled devices for homework also report that they share the device with other members of their family. Socioeconomic status can already present a barrier to experiencing certain theatrical performances because high ticket prices can make attending events like Broadway plays cost-prohibitive. While supplying devices to every audience member would be a possible solution, many theater companies would likely find such a financial investment equally cost-prohibitive.

In addition to considering ways to make an augmented theatrical performance accessible to people across the socioeconomic spectrum, it is also important to keep in mind audience onboarding. The ALICE Project featured no audience onboarding as audience members needed only to observe the spectacle in the same way they would a traditional theatrical performance. What the Moon Saw provided a nearly seamless onboarding because audience members appeared to find the interactive technology intuitive, and the performers modeled its use throughout. If theater makers and designers expect audience members to utilize new technologies in order to facilitate an AR experience, then they should make efforts to ensure that all audience members are comfortable doing so.

Beyond simply taking steps to ensure that audience members understand how to interact with the necessary technology, designers should also take steps to make experiences that are accessible to people with different abilities. For example, holding a mobile device or wearing a headset may not be possible for some patrons. Although it may not be possible to accommodate every audience member’s unique situation, theater makers considering AR technology in the design should weigh the options as they would in choosing whether or not to use strobe lighting or the sounds of gunfire. The team wanted to make audience participation in What the Moon Saw accessible to anyone who wanted to take part. To do so, the team tested the Kinect sensor’s ability to accurately track someone in a wheelchair. They also verified that the sensor could track the movements of someone with a missing limb and people of varying heights. The Kinect worked well in all of these situations, but this was not necessarily a given.

Finally, creating a live theatrical performance using AR for young audiences meant making the technology elements accessible to children. Some of the safety considerations for protecting both people and equipment have been discussed previously, but this research project provides a deeper lesson about design that can generalize to a wide array of AR applications in theater. Good design should always have the user experience in mind, and in this case, the user is the audience. Some children are short, so the team made sure that their motion could be tracked. Children are all at different stages in their motor development, so the team made sure to design the games accordingly. And in addition to the countless design decisions made ahead of time, the onstage performers and the showrunner were prepared to step in and help young audience members navigate the augmented theatrical world successfully.

3.4 Future Directions

This project paved the way for different types of future endeavors beyond augmented user interfaces. Consumer commodity computers, smart devices, and wearables continue to integrate more and more of the technology required to augment reality. High-resolution screens, positioning and telemetry sensors, and even depth cameras are becoming commonplace in sophisticated smartphones. It is reasonable to assume that this trend will eventually make augmented reality ubiquitous. How soon that will happen remains speculative, but we consider here what such ubiquity would enable in live theatrical performance. In addition to live motion capture, we also experimented with voice recognition. While only tested as an early-stage prototype, the use of voice recognition presents interesting possibilities for theater and other live performance events. Because scripts are well-defined (i.e., predetermined), voice recognition algorithms can be tailored toward prespecified input phrases (i.e., lines in the script). These phrases can then be used to advance the narrative structure of the projected display environment, as sketched below.
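As a rough illustration of this idea, the following minimal sketch matches recognized transcripts against a prespecified list of cue lines and advances the projected environment when the next expected line is heard. The cue text, similarity threshold, and advance_scene callback are hypothetical, and any off-the-shelf speech recognizer could supply the transcript strings; this is a sketch of the matching logic, not the prototype the team built.

```python
# Minimal sketch: matching recognized speech against prespecified cue lines.
# Assumes some speech recognizer supplies transcript strings; the cue list,
# threshold, and advance_scene() callback are hypothetical illustrations.
from difflib import SequenceMatcher

CUE_LINES = [
    "curiouser and curiouser",   # hypothetical cue text
    "we're all mad here",
    "off with their heads",
]

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score between two phrases."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

class CueTracker:
    """Advances the projected environment when the next scripted cue is heard."""
    def __init__(self, cues, threshold=0.8):
        self.cues = cues
        self.threshold = threshold
        self.index = 0  # next cue we are listening for

    def on_transcript(self, transcript: str, advance_scene) -> None:
        if self.index >= len(self.cues):
            return
        if similarity(transcript, self.cues[self.index]) >= self.threshold:
            advance_scene(self.index)  # e.g., trigger the next projection state
            self.index += 1

# Example usage with a fake transcript:
tracker = CueTracker(CUE_LINES)
tracker.on_transcript("Curiouser and curiouser!", lambda i: print(f"advance to cue {i}"))
```

Constraining recognition to the next expected line keeps the matching problem small and tolerant of noisy transcripts, which is what makes scripted performance a favorable setting for this technique.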

Further applications of voice recognition technology could make a theater experience more inclusive and accessible. It is common practice in opera productions to project surtitles above the stage so that audiences can read translations of the lyrics being sung. For example, an Italian opera performed in the United States might have English-language surtitles. Larger theaters sometimes use the same practice to make a production accessible to audience members who are deaf or hard of hearing.

The current practice of surtitle projection is limited in several ways. First, surtitles are typically prerendered, so if performers were to veer from the script, the surtitles would not reflect this. Second, surtitles generally feature only one language. Third, a theater venue must have architecture that supports the projection of surtitles without obscuring the audience's view or creating obstacles for performers. Speech recognized live and translated into any one of a variety of languages could instead generate captions sent to an audience member's mobile device, addressing all of these limitations. By creating what amounts to closed captioning for live performances, this technology would improve accessibility by expanding the linguistic options available to patrons. It would also allow performance art forms like improvised comedy, where translation of this kind has been impossible, to be accessed by more people. Finally, by sending captions to a handheld device, performance spaces that are not conducive to projecting surtitles become more accessible.
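To make the proposed pipeline concrete, the minimal sketch below (Python, using asyncio from the standard library) fans recognized lines out to per-device caption queues after a translation step. The translate stub, the CaptionBroadcaster class, and the seat identifier are hypothetical placeholders; a real deployment would substitute a streaming speech-to-text engine, a translation service, and a delivery channel such as WebSockets.

```python
# Minimal sketch of a live-captioning pipeline: recognized speech is translated
# and broadcast to subscribed audience devices. Recognizer and translator are
# stubbed; names and identifiers are illustrative assumptions.
import asyncio
from dataclasses import dataclass

@dataclass
class Caption:
    language: str
    text: str

async def translate(text: str, target_language: str) -> str:
    # Placeholder: a production system would call a translation service here.
    return f"[{target_language}] {text}"

class CaptionBroadcaster:
    """Fans out captions to per-device queues, one per audience member."""
    def __init__(self):
        self.subscribers = {}  # device_id -> (language, asyncio.Queue)

    def subscribe(self, device_id: str, language: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self.subscribers[device_id] = (language, queue)
        return queue

    async def publish(self, recognized_line: str) -> None:
        for language, queue in self.subscribers.values():
            translated = await translate(recognized_line, language)
            await queue.put(Caption(language, translated))

async def demo():
    broadcaster = CaptionBroadcaster()
    seat_a = broadcaster.subscribe("seat-A12", "es")  # hypothetical seat id
    await broadcaster.publish("To be, or not to be")
    print(await seat_a.get())

asyncio.run(demo())
```

Because each subscriber carries its own target language, the same recognized line can serve patrons reading in different languages at once, which is the key advantage over a single projected surtitle.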

Once generated, these captions would not need to be limited to words on a handheld device. Instead, they could be delivered to an audience member's device as audio narration. Audience members with impaired vision could use a text-to-speech delivery method, allowing them to listen to descriptions of the action of the performance. Additionally, action annotations could be generated automatically: using motion capture and artificial intelligence to recognize performer actions would allow for audio descriptions of the performers' visual actions. These annotations could be cross-referenced with the content of the script to produce meaningful descriptions of the actions within the context of the performance, again making the art form of theater more accessible.
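The sketch below illustrates, under stated assumptions, how a recognized action label might be cross-referenced with script context to produce a narrated description. The action labels, scene annotations, and speak placeholder are all hypothetical; a working system would pair a gesture classifier with an actual text-to-speech engine.

```python
# Minimal sketch: turning recognized performer actions into contextual audio
# descriptions. Labels, scene annotations, and speak() are hypothetical.
SCENE_CONTEXT = {
    "scene_3": {"kneel": "Alice kneels to peer through the tiny door."},
}

GENERIC_DESCRIPTIONS = {"kneel": "A performer kneels."}

def describe_action(action_label: str, scene_id: str) -> str:
    """Prefer a script-aware description; fall back to a generic one."""
    scene = SCENE_CONTEXT.get(scene_id, {})
    return scene.get(action_label, GENERIC_DESCRIPTIONS.get(action_label, ""))

def speak(text: str) -> None:
    # Placeholder for a text-to-speech call delivered to the patron's device.
    print(f"(narration) {text}")

speak(describe_action("kneel", "scene_3"))
```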

Furthermore, we see this framework as a potential opportunity to create training environments for professions that require extensive human interaction. A common challenge in preparing for professions such as teaching, counseling, or courtroom litigation is learning how to respond to the unexpected. While these professions generally rely on reflection to prepare for the future, practitioners rarely get a timely opportunity to try out a new tactic. Using AR, however, a user could download a scenario, set which characters will be human and which will be virtual, and play through the scene with projected agents and environments. Given the advances in voice recognition since the team's initial experimentation in 2014, and the motion capture technology developed through this research, users could interact with virtual agents capable of responding in real time to their speech and gestures. For example, a preservice teacher could have multiple opportunities to practice managing a classroom crisis and explore a variety of possible interventions.

An additional opportunity arising from the ALICE Project is improving the methods used for flying systems. For example, it would be possible to create a system that augments the actor's jumping abilities. Traditional systems rely on external operators, who must decide when to pull the actor into the air in the course of a jump. The actor can only react to these pulling forces, which can create potentially dangerous situations if the timing is off. Current high-end approaches give the actor control through a push-button held in the hand that starts the flying behavior, but actors can still miss the timing of these button pushes, causing jerky motion. We see potential in using the tracking system to aid these efforts: the system could detect jumping behavior and determine the optimum point at which to start the flying motion, as sketched below. This would not only create an effect that looks more natural to audience members but also provide greater safety for the actor.
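A minimal sketch of this idea, assuming a stream of tracked joint heights from a depth sensor such as the Kinect, is shown below. The frame rate, velocity threshold, and trigger_fly_rig callback are illustrative assumptions; a production system would need filtering, safety interlocks, and tuning against the actual rig.

```python
# Minimal sketch: detecting take-off from tracked joint height so the flying
# rig engages automatically instead of relying on a button press. Values and
# the trigger_fly_rig() callback are hypothetical.
FRAME_RATE_HZ = 30.0
TAKEOFF_VELOCITY_M_S = 0.8  # upward speed suggesting a jump has started

class JumpDetector:
    def __init__(self, trigger_fly_rig):
        self.previous_height = None
        self.trigger_fly_rig = trigger_fly_rig
        self.triggered = False

    def on_frame(self, joint_height_m: float) -> None:
        if self.previous_height is not None and not self.triggered:
            velocity = (joint_height_m - self.previous_height) * FRAME_RATE_HZ
            if velocity >= TAKEOFF_VELOCITY_M_S:
                self.triggered = True
                self.trigger_fly_rig()  # start the winch at the detected take-off
        self.previous_height = joint_height_m

# Example with synthetic frames: standing still, then a rapid rise.
detector = JumpDetector(lambda: print("fly rig engaged"))
for height in [1.00, 1.00, 1.01, 1.06, 1.12]:
    detector.on_frame(height)
```

Triggering from measured upward velocity ties the rig to the actor's actual take-off rather than to an operator's or the actor's own timing, which is what would make the resulting motion look and feel more natural.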

A final potential application of this approach is simulating other types of dangerous training situations. For example, the system could be adapted into a jet-pack simulator in which the user would feel the forces they would expect to feel in flight. While the current rendering techniques use a fixed perspective for the audience, changing this perspective based on the user's head-tracked position has been shown to be effective for other CAVE-like systems using Kinect devices [24]. Motion platforms have likewise proven effective for driving simulators, motivating these types of endeavors [25].

4 Conclusion

Designing augmented reality into theater and live performance shares many principles with conventional design. It requires the designer to understand the relationship between the virtual three-dimensional designed objects and the real physical environment. Artistically speaking, implementing AR technology in live performance adds a dramatic dimension to the storytelling environment, providing the audience with an immersive theatrical experience. Technically speaking, developing an AR environment requires the creative team to work collaboratively to situate the virtual space within the real space, to integrate virtual interactivity into the progression of the physical performance, and to devise a solution for controlling and manipulating the AR environment. Because of AR's digital nature, implementation still relies on extensive digital content creation through conventional 3D motion graphics applications and video game engines, much as in VR production.

Other examples of AR experiments include Gorillaz's concerts, Diesel's Liquid Space holographic fashion show, and the Royal Shakespeare Theater's The Tempest. The common artistic and technological solution in these productions is digital projection, based either on holographic projection technology or on projection mapping onto moving screens, which adds virtual objects to the real theatrical environment to create the dramatic illusion. Projection has remained the medium of choice for AR in theatrical performance because AR viewing devices such as the Microsoft HoloLens are not yet technically reliable or practically applicable; however, AR applications that must serve large audiences and large spaces need not be limited to wearable viewing devices, so long as the augmentation and the immersive environment can be created.

In conclusion, by combining the virtual and physical worlds, the described approach enables playwriting to be more innovative and imaginative, as many of the limitations of creating physical sets and props can be overcome. This approach enables a new performance methodology with exciting new options for theatrical storytelling, educational training, and interactive entertainment. The work described here demonstrates the possibilities for industry and performance advancements by setting aside prevailing notions of what can and cannot be done.