
1 Introduction

Augmented Reality (AR) is a technology that allows interactive three-dimensional virtual imagery to be overlaid on the real world. First developed over forty years ago [1], Augmented Reality has since been applied in many domains such as education [2], engineering [3] and entertainment [4]. For example, mechanics can see virtual instructions overlaid on real engines giving step-by-step maintenance guidance [5], and gamers can see virtual monsters appear over real playing cards and fight each other when the cards are placed side by side [6]. Azuma provides detailed reviews of current and past AR technology [7, 8].

Figure 1 shows a typical AR interface, in this case the user’s view through a head mounted display (HMD) while looking down a street. The physical world is the building in the background, and a virtual road, street lamps, and houses appear overlaid on the real world in front of it. This particular AR application lets the user enhance the landscape outside their building with the addition of a virtual road, houses and a set of street lamps that can be walked around. The virtual objects are registered to the physical world and appear at fixed locations in it. The Tinmith [9] AR wearable computing hardware is an example of a system that supports this form of AR (see Figure 2).

Fig. 1 User’s view of the Real World

Fig. 2 Tinmith Hardware

Figure 1 shows that a major benefit of AR is the viewing of information that is location based and registered to physical objects. Basic AR systems provide information about the physical world and let users view that information in place. For example, a classic use is to visualize a proposed architectural structure in the context of existing buildings or at a particular physical location. The ability to walk in and around the virtual structure lets users experience its size, shape, and feel from a first-person perspective and fosters a stronger emotional engagement.

Recently, mobile phones have become as powerful as the desktop computers from a decade earlier, and so mobile augmented reality has become possible. Modern smart phones combine fast CPUs with graphics hardware, large screens, high resolution cameras and sensors such as GPS, compass and gyroscopes. This makes them an ideal platform for Augmented Reality. Henrysson [10], Wagner [11] and others have shown how computer vision based AR applications can be delivered on mobile phones, while commercial systems such as Layar, Wikitude and Junaio use GPS and compass sensor data to support outdoor AR experiences.

Phones also have powerful communication hardware, both cellular and wireless networking, and can be used for collaboration. So, for the first time, consumers have in their hands hardware that can provide a collaborative AR experience [12]. A Mobile Collaborative Augmented Reality (MCAR) application is one that allows several people to share an AR experience using their mobile devices [13]. The AR content can be shared between face to face or remote users, either at the same time (synchronous collaboration) or at different times (asynchronous collaboration).

1.1 Core Mobile AR Technology

In order to deliver a MCAR experience there are several core pieces of technology that must be used, including:

  • Mobile Processor: Central Processing Unit (CPU) for processing user input, video images and running any application simulations.

  • Graphics Hardware: Graphical Processing Unit (GPU) system for generating virtual images.

  • Camera: Camera hardware for capturing live video images, to be used for AR tracking and/or for overlaying virtual imagery onto the video images.

  • Display Hardware: Either a handheld, head mounted, or projected display used to combine virtual images with images of the real world, creating the AR view.

  • Networking: Wireless or cellular networking support that will allow the mobile device to connect to remote data sources.

  • Sensor Hardware (optional): Additional GPS, compass or gyroscopic sensors that can be used to specify the user’s position or orientation in the real world.

Using this technology and the associated software modules, the position and orientation of the user’s viewpoint can be determined, and a virtual image created and overlaid on the user’s view of the real world. As users change their viewpoint, the AR system updates their virtual world view accordingly. Thus, the basic AR process is:

  1. Build a virtual world with a coordinate system identical to the real world.

  2. Determine the position and orientation of the user’s viewpoint.

  3. Place the virtual graphics camera at that position and orientation.

  4. Render an image of the physical world on the user’s display.

  5. Combine the virtual graphical overlay with the physical-world image.
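
To make these steps concrete, the following minimal Python sketch shows how they map onto a per-frame loop. The PoseTracker and Renderer classes, and the camera object with its grab() method, are hypothetical placeholders introduced for illustration, not the API of any particular AR toolkit.

    # Minimal per-frame AR loop illustrating the five steps above.
    # PoseTracker and Renderer are hypothetical placeholders, not the
    # API of any particular AR toolkit.

    class PoseTracker:
        """Step 2: estimate the viewpoint pose (e.g. from GPS and compass)."""
        def get_pose(self):
            return {"position": (0.0, 0.0, 1.7), "orientation": (0.0, 90.0, 0.0)}

    class Renderer:
        """Steps 1 and 3-5: hold the virtual world and compose the AR view."""
        def __init__(self, virtual_world):
            self.world = virtual_world            # step 1: world-aligned content

        def compose(self, video_frame, pose):
            # step 3: place the virtual camera at the user's pose
            # step 4: the live video frame shows the physical world
            # step 5: draw the virtual scene over the video frame
            return ("AR frame", video_frame, pose)

    def ar_frame_loop(camera, tracker, renderer):
        """Generate composited AR frames, one per captured video frame."""
        while True:
            frame = camera.grab()                 # capture the real world
            pose = tracker.get_pose()             # step 2
            yield renderer.compose(frame, pose)   # steps 3-5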

1.2 Content of the Chapter

Although the hardware is readily available, there are a number of research and technical challenges that must be addressed before shared AR experiences are commonplace. In this chapter we provide an overview of MCAR systems with a particular focus on the history leading up to the current systems, the typical technology used, and the important areas for future research. Later chapters in the book address specific topics in MCAR in more detail.

2 Mobile AR with Head Mounted Displays

The earliest mobile AR systems were based around head mounted displays rather than hand held mobile phones. Head Mounted Displays (HMDs) [14] were invented by Ivan Sutherland in the first AR system developed in 1965 [1]. Sutherland employed a physical optical system to combine the real world visual information with the virtual information. Today, a digital camera can be used to capture the visual information of the physical world, and modern graphics cards can combine both forms of visual information [15]. Using a HMD in conjunction with a head-position sensor and connected to a wearable computer, a user is able to see a large portable panoramic virtual information space surrounding them. A person can simply turn their head left, right, up, or down to reveal more information around them [16].

Figure 3 shows a conceptual image of a user within a wearable virtual information space, surrounded by pages of information. The combination of a head tracking sensor and HMD allows the information to be presented in any direction from the user. However, a person’s normal Field-of-View (FOV) is about 200 degrees [17], while typical commercial HMDs only have a FOV of between 30 and 60 degrees [18]. In spite of this, previous researchers such as Feiner and Shamash [19] and Reichlen [20] have demonstrated that a HMD linked to head movement can simulate a large “virtual” display.

Fig. 3 A Wearable Virtual Information Space (from [47])

A key feature of a wearable computer is the ability for a user to operate the computer while being mobile and free to move about the environment. When mobile, traditional desktop input devices such as keyboards and mice cannot be used, and so new user interfaces are required. Some currently available devices include: chord-based keyboards [21], forearm-mounted keyboards [22], track-ball and touch-pad mouse devices, gyroscopic and joystick-based mice, gesture detection of hand motions [23], vision tracking of hands [24], and voice recognition [25].

One particularly interesting control mechanism for navigating virtual information in a mobile AR interface is to use head movement. This should be intuitive, since it is how we normally explore the visual space around our bodies. The proprioceptive cues we get from the muscles involved in head motion should aid navigation and object location. A head movement interface is a “direct” manipulation interface. AR interfaces use this concept by registering information to the physical world, so a user looking at a physical object can see graphical information overlaid on it.
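
As a rough illustration of this idea, the short Python sketch below selects which panels of a surrounding information space fall inside a narrow HMD field of view given the user’s head yaw; the panel layout and the 40 degree FOV value are illustrative assumptions, not taken from any of the cited systems.

    # Sketch: selecting which panels of a 360-degree information space fall
    # inside a narrow HMD field of view, given the user's head yaw in degrees.
    # The panel layout and the FOV value are illustrative assumptions.

    def visible_panels(head_yaw_deg, panels, fov_deg=40.0):
        """Return the panels whose bearing lies within the HMD field of view."""
        visible = []
        for name, bearing in panels.items():
            # signed angular difference between head direction and panel bearing
            diff = (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0
            if abs(diff) <= fov_deg / 2.0:
                visible.append(name)
        return visible

    # Example: four panels placed around the user at fixed compass bearings.
    panels = {"email": 0.0, "map": 90.0, "notes": 180.0, "tasks": 270.0}
    print(visible_panels(85.0, panels))   # -> ['map']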

The first demonstration of a wearable AR system operating in an outdoor environment was the Touring Machine by Feiner et al. from Columbia University [26] (see Figure 4). This was based on a large backpack computer system with all the equipment attached to allow users to see virtual labels indicating the location of various buildings and features of the Columbia campus. Interaction with the system was through the use of a GPS and a head compass to control the view of the world, and by gazing at objects of interest. Further interaction was provided by a tablet computer with a web-based browser interface giving extra information. As such, the system had all the key technology components mentioned in the previous section. In the next section we describe more of the history of mobile and handheld AR systems.

Fig. 4 Touring Machine Hardware and User’s View

3 Mobile AR with Handheld Displays and Mobile Phones

In the previous section we described the core mobile AR technology and how the earliest prototype wearable AR system was developed using some of this technology. Now we give an expanded history of mobile AR systems, from backpack hardware to handheld devices.

Current MCAR systems have a rich history dating back to the mid-nineties and Feiner’s Touring Machine [26], described above. The Touring Machine was extended by Hollerer et al. for the placement of what they termed Situated Documentaries [27]. This system was able to show 3D building models overlaying the physical world, giving users the ability to see historical buildings that no longer existed on the Columbia University campus. Since that time, other researchers have explored the use of backpack and HMD based AR systems for outdoor gaming [28], navigation [29] and historical reconstructions [30], among other applications.

After several years of experimenting with backpack systems, handheld computers and personal digital assistants (PDAs) became powerful enough for mobile AR. Initially these were thin client applications, such as the AR-PDA project [31], in which the PDA was used to show AR content generated on a remote PC server and streamed wirelessly. Then in 2003 Wagner and Schmalstieg developed the first self contained PDA AR application [32], and The Invisible Train [33] was the first handheld collaborative AR application (see Figure 5). Unlike the backpack systems, handheld AR interfaces are unencumbering and ideal for lightweight social interactions.

Fig. 5 The Invisible Train

As AR applications began to appear on handheld devices, researchers also explored how to use mobile phones for Augmented Reality. Just like PDAs, the first mobile phones did not have enough processing power, so researchers initially explored thin client approaches with projects such as AR-Phone [34]. However, by 2004 phones were capable of simple image processing, and both Moehring [35] and Henrysson [10] developed marker based tracking libraries. This work enabled simple AR applications to be developed which ran entirely on the phone at 7-14 frames per second. Most recently, Wagner et al. [36] and Reitmayr et al. [37] have developed markerless tracking algorithms for mobile phone based AR systems (see Figure 6).

Fig. 6 Mobile Phone Markerless AR Tracking

The Touring Machine and other wearable systems used GPS and inertial compass hardware to detect the user’s position and orientation in the real world without relying on computer vision methods. The MARA project was the first that tried to provide the same functionality on a mobile phone [38]. An external sensor box containing a GPS and compass was attached to the phone, and Bluetooth was used to wirelessly send the position and orientation data to the phone (see Figure 7). This data was then used to overlay virtual information on the live camera view of the phone. More recently a number of mobile systems, such as Layar and Argon, have been developed that provide the same functionality.
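
The following Python sketch illustrates the kind of computation such sensor-based systems perform when placing a geo-located label on the camera view. It is a simplified approximation written for this chapter, not MARA’s actual code, and the screen width and field of view values are assumptions.

    import math

    # Illustrative sketch (not MARA's actual code) of how GPS and compass
    # readings can place a geo-located label on the phone's camera view.

    def bearing_deg(lat1, lon1, lat2, lon2):
        """Initial compass bearing from (lat1, lon1) towards (lat2, lon2)."""
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        y = math.sin(dlon) * math.cos(phi2)
        x = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
        return math.degrees(math.atan2(y, x)) % 360.0

    def screen_x(user_lat, user_lon, heading_deg, poi_lat, poi_lon,
                 screen_width=480, fov_deg=60.0):
        """Horizontal pixel position of a POI, or None if outside the camera FOV."""
        diff = (bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
                - heading_deg + 180.0) % 360.0 - 180.0
        if abs(diff) > fov_deg / 2.0:
            return None
        return (diff / fov_deg + 0.5) * screen_width

    # Example: user near a landmark, phone pointing roughly north-east.
    print(screen_x(52.52, 13.40, 45.0, 52.53, 13.41))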

Fig. 7 MARA Hardware and Interface

4 Collaborative AR Systems

One of the most interesting uses for Augmented Reality is enhancing face to face and remote collaboration. Current collaborative technology, such as video conferencing, often creates an artificial separation between the real world and shared digital content, forcing the user to shift among a variety of spaces or modes of operation [39]. For example, it is difficult with a desktop video conferencing system to share real documents or interact with on-screen 2D content while viewing the live video stream. Sellen summarizes several decades of telecommunications research by reporting that the main effect on communication is the presence of mediating technology rather than the type of technology used [40]. It is difficult for technology to provide remote participants with the same experience they would have in a face to face meeting. However, Augmented Reality can blend the physical and virtual worlds and overcome this limitation.

At the same time as the development of early mobile AR systems, Schmalstieg et al. [41], Billinghurst et al. [42] and Rekimoto [43] were exploring early collaborative AR interfaces. Billinghurst et al.’s Shared Space work showed how AR can be used to seamlessly enhance face to face collaboration [44] (see Figure 8), and his AR Conferencing work [42] showed how AR could be used to create the illusion that a remote collaborator is actually present in a local workspace, building a stronger sense of presence than traditional video conferencing. Schmalstieg et al.’s Studierstube [41] software architecture was ideally suited for building collaborative and distributed AR applications, and his team developed a number of interesting prototypes of collaborative AR systems. Finally, Rekimoto’s Transvision system explored how a tethered handheld display could provide shared object viewing in an AR setting [43].

Fig. 8 Using the Shared Space System for Face to Face Collaborative AR

The first mobile AR collaborative system was the work of Hollerer [45], who added remote collaboration capabilities to the Touring Machine system, allowing a wearable AR user to collaborate with a remote user at a desktop computer. Piekarski and Thomas [46] added similar remote collaboration capabilities to their Tinmith system, once again between a wearable AR user and a colleague at a desktop computer. In contrast, Reitmayr and Schmalstieg [13] developed an MCAR system that allowed multiple users with wearable AR systems to collaborate in spontaneous ways, either face to face or in remote settings, using a backpack configuration. Billinghurst et al. developed a wearable AR conferencing space in which users could be surrounded by virtual images of the people they are conferencing with and hear spatialized audio streams from their locations [47]. User studies found that the spatialized audio made it significantly easier to disambiguate multiple speakers and understand what they were saying. These projects showed that the same benefits that desktop AR interfaces provided for collaboration could also extend to the mobile platform.

Most recently, MCAR applications have been deployed on handheld systems and mobile phones. Wagner et al.’s Invisible Train [33] allowed several users with PDAs to collaborate face to face and see virtual trains running on a real train track. They could collaborate to keep the trains running for as long as possible without colliding with each other. Hakkarainen and Woodward’s “Symball” game [48] was a collaborative AR game in which players used their mobile phones to hit a virtual ball and play table tennis with each other, with a virtual representation of the bat and ball superimposed over the real world (see Figure 9). Players could either play face to face or remotely using internet connectivity, and a desktop player could also compete with a player on the mobile phone.

Fig. 9 Symball Application

5 Current MCAR Systems

As shown in the previous section, a number of research prototype MCAR systems have been developed. More recently, more sophisticated research and commercial systems have been created. In this section we describe several sample systems in more detail.

5.1 Junaio AR Browser

Since 2009 several companies have developed mobile phone AR browser applications. These use the GPS and compass sensors in smart phones to enable an AR overlay on the live camera view. Unlike stand-alone mobile AR experiences, AR browser applications provide a generic browser interface and connect back to a remote server to load geo-located points of interest (POI), which are shown using virtual cues. Applications such as Layar allow users to subscribe to channels of interest (e.g. homes for sale) to show the set of POI that are most relevant.

Junaio is a cross-platform commercial AR browser that supports asynchronous collaboration, running on both iPhone and Android mobile phones. Like other AR browsers, when users start it they can select a channel of interest and see virtual tags superimposed over the real world (see Figure 10). Users can switch between an AR view, a list view of points of interest, and a map view where POIs are shown on a Google map. The AR view also shows a radar display indicating where each POI is in relation to the user’s position.
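
The sketch below illustrates the kind of point-of-interest records an AR browser might fetch for a subscribed channel and how they could be filtered by distance for display; the field names and radius are hypothetical and do not reflect the actual Junaio or Layar server interfaces.

    import math

    # Sketch of the kind of point-of-interest (POI) records an AR browser might
    # fetch for a subscribed channel; the field names are hypothetical and do
    # not reflect the actual Junaio or Layar server API.

    pois = [
        {"title": "House for sale", "lat": -35.028, "lon": 138.572, "channel": "realestate"},
        {"title": "Open home",      "lat": -35.020, "lon": 138.560, "channel": "realestate"},
    ]

    def distance_m(lat1, lon1, lat2, lon2):
        """Approximate ground distance in metres (haversine formula)."""
        r = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def nearby(pois, user_lat, user_lon, channel, radius_m=1000.0):
        """POIs of the selected channel within the given radius, nearest first."""
        hits = [(distance_m(user_lat, user_lon, p["lat"], p["lon"]), p)
                for p in pois if p["channel"] == channel]
        return sorted((d, p["title"]) for d, p in hits if d <= radius_m)

    print(nearby(pois, -35.027, 138.571, "realestate"))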

Fig. 10 Junaio AR View

However, unlike most other AR browsers, Junaio also allows users to add their own content. Users are able to “Tag the World” by adding 3D models, text notes, or 2D images at their current location. For example, a user could take a picture of a party they were at and then tag their location with the picture and a text annotation. This picture and annotation is saved back to the Junaio server and can be seen by others who come to the same location. In this way Junaio supports mobile AR asynchronous collaboration. Virtual annotations can be made public so that anyone can see them, or private so they are only visible to the user’s friends.
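
A minimal sketch of how such geo-tagged annotations and their public/private visibility might be represented is shown below; the record layout and visibility rules are illustrative assumptions rather than the real Junaio service.

    # Sketch of geo-tagged annotations for asynchronous collaboration, in the
    # spirit of Junaio's "Tag the World"; the record layout and visibility
    # rules are illustrative assumptions, not the real Junaio service.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Annotation:
        author: str
        lat: float
        lon: float
        note: str
        public: bool = True
        friends: List[str] = field(default_factory=list)

        def visible_to(self, viewer: str) -> bool:
            return self.public or viewer == self.author or viewer in self.friends

    server: List[Annotation] = []   # stands in for the remote annotation store

    # A user tags their current location with a note about a party.
    server.append(Annotation("alice", -35.027, 138.571,
                             "Great party here last night!", public=False,
                             friends=["bob"]))

    # Later, other users at the same place ask which annotations they may see.
    print([a.note for a in server if a.visible_to("bob")])    # bob is a friend
    print([a.note for a in server if a.visible_to("carol")])  # carol is not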

The main limitation of Junaio is that the interface for adding AR content is limited: once a 3D model has been added to the real world, the user cannot move, orient or scale it. The user is also limited to using only the predefined Junaio 3D models. However, these are sufficient to add simple 3D tags to the real world, and the ability to take pictures and drop them in space is particularly useful for asynchronous collaboration.

5.2 The Hand of God

Modern command and control centers require support for temporally constrained collaborative efforts. We envision this technology to be intuitive and very straightforward to control. Picture a leader communicating with support people in the field who needs one of them to walk to a particular position on a map. One straightforward method would be for the leader to point to a location on the map, and for a virtual representation to be shown to the field operative. This is an example of technology supporting through-walls collaboration, with the leader providing meaningful information to the operative in the field.

The Hand of God (HOG) system was constructed to connect indoor experts and operatives out in the field [49]. Figure 11 illustrates an indoor expert utilizing the HOG by pointing to places on a map. The indoor and outdoor users also communicate over a supplementary audio channel. An outdoor field worker uses a Tinmith wearable computer [50] (see Figure 2) and sees a recreated 3D virtual model of the indoor expert’s hand, geo-referenced at the indicated location on the map, as depicted in Figure 11. The indoor expert is able to rapidly and naturally communicate with the outdoor field operative and give the outdoor user a visual waypoint to navigate to (see Figure 12). Physical props may be positioned on top of the HOG table, such as placing a signpost on a geo-referenced position (see Figure 13).

Fig. 11 An indoor expert employing the Hand of God interface

Fig. 12 Head mounted display view seen by the outdoor user

Fig. 13 Physical props as signposts for the outdoor user

5.3 AR Tennis

The AR Tennis application was designed to support face to face collaboration on an AR game with mobile phones. In this case, users could sit across the table from one another and use their real mobile phones to view a virtual tennis court superimposed over the real world between them [51]. Players could hit the ball to each other by moving their phone in front of the virtual ball (see Figure 14).

Fig. 14 AR Tennis

The application ran on Nokia N-95 phones using a Symbian port of the ARToolKit tracking library [10]. Players needed to place black square patterns on the table to support the AR tracking, and Bluetooth networking between the two phones was used to exchange game state information and the ball position. A simple physics engine was integrated into the application to allow the ball to bounce realistically over the net. The game also supported multimodal feedback: when a player’s phone hit the virtual ball, the sound of a ball being hit was played and the phone vibrated to create the illusion of hitting a real ball.
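
The sketch below illustrates the kind of game state two phones might exchange over the Bluetooth link and the multimodal feedback triggered on a hit. It is a simplified illustration written for this chapter, not the actual AR Tennis source, and the play_sound and vibrate calls stand in for platform-specific APIs.

    import json

    # Illustrative sketch (not the actual AR Tennis source) of the kind of
    # game state two phones might exchange over Bluetooth, and the multimodal
    # feedback triggered when the phone (acting as the racket) meets the ball.

    def encode_state(ball_pos, ball_vel, score):
        """Serialise the shared game state for the Bluetooth channel."""
        return json.dumps({"ball": ball_pos, "vel": ball_vel, "score": score})

    def decode_state(message):
        return json.loads(message)

    def check_hit(phone_pos, ball_pos, hit_radius=0.05):
        """True if the phone is close enough to the ball to count as a hit."""
        dist = sum((p - b) ** 2 for p, b in zip(phone_pos, ball_pos)) ** 0.5
        if dist <= hit_radius:
            play_sound("hit.wav")   # hypothetical audio call
            vibrate(100)            # hypothetical 100 ms vibration call
            return True
        return False

    def play_sound(name): print("play", name)
    def vibrate(ms): print("vibrate", ms, "ms")

    msg = encode_state([0.0, 0.1, 0.0], [0.0, -0.5, 0.0], [3, 2])
    print(check_hit([0.01, 0.08, 0.0], decode_state(msg)["ball"]))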

The AR Tennis application was used to investigate how AR changed the face to face gaming experience [51]. A user study compared people playing in an AR mode, in a graphics-only mode where the user did not see video of the real world on their screen, and in a non-face to face condition. Players were asked to collaborate to see how long they could keep a tennis volley going. Users overwhelmingly preferred the face to face AR condition because they felt they could more easily be aware of what the other player was doing and collaborate with them. They enjoyed being able to see the person they were playing with on the phone screen at the same time as the virtual court and ball.

6 Directions for Research

Although some MCAR systems have been developed, there is still a lot of research that must be conducted before such systems become commonplace. In particular there is important research that can be done in each of the following areas:

  • Interaction Techniques

  • Scaling Up to Large Numbers of Users

  • Evaluation Methods

  • New Devices

  • New Design Methods

Investigating interaction techniques is a notable area of research for MCAR. Controlling the information in a mobile AR interface will require the creation of new user interfaces and input devices. Current technologies fall short of the requirements that these devices must be intuitive, non-intrusive, and robust. Many traditional input devices such as mice and keyboards are not suitable for mobile work outdoors, as they require a flat, level surface to operate on.

The problem of registering virtual images with the user’s view of the physical world is a main focus of current AR research. However, there is little previous work in the area of user interfaces for controlling AR systems in a mobile setting. Two major issues for the development of these user interfaces are as follows: firstly, registration errors will make it difficult for a user to point at or select small details in the augmented view; and secondly, pointing and selecting at a distance are known problems in VR and AR applications, compounded by the fact that the user is outdoors with less than optimal tracking of their head and hands [52, 53].

Therefore, the investigation of new user interaction techniques is required. A key element of these new interactions is that AR systems have a number of coordinate systems (physical world, augmented world, body relative, and screen relative) within which the user must work. Areas of investigation include support for operations such as selecting small details in the augmentation, pointing and selecting at a distance, information overlays, text based messaging, and telepresence. While there have been some empirical user studies of existing commercial pointing devices for wearable computers (a handheld trackball, a wrist mounted touchpad, a handheld gyroscopic mouse and the Twiddler2 mouse [54]), new input devices are required.
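
To illustrate why working across these coordinate systems is non-trivial, the sketch below converts a world-fixed point into a body-relative frame and then projects it to screen coordinates with a simple pinhole model; the focal length and screen size are illustrative values, and real systems must also handle head pitch, roll and tracking error.

    import math

    # Sketch of moving a point between the coordinate frames mentioned above:
    # a world-fixed point is expressed relative to the user's body (given their
    # position and heading), then projected to screen coordinates with a simple
    # pinhole model. The focal length and screen size are illustrative values.

    def world_to_body(point, user_pos, heading_deg):
        """Rotate a world-frame point into a frame centred on the user."""
        dx, dy, dz = (p - u for p, u in zip(point, user_pos))
        h = math.radians(heading_deg)
        # rotate about the vertical axis so +y points where the user faces
        bx = dx * math.cos(h) - dy * math.sin(h)
        by = dx * math.sin(h) + dy * math.cos(h)
        return (bx, by, dz)

    def body_to_screen(body_point, focal_px=500.0, width=480, height=320):
        """Project a body-relative point onto the screen (pinhole camera)."""
        bx, by, bz = body_point
        if by <= 0:                      # behind the user: not drawable
            return None
        u = width / 2 + focal_px * bx / by
        v = height / 2 - focal_px * bz / by
        return (u, v)

    # A virtual label 10 m in front of a north-facing user, 1.5 m above eye level.
    body = world_to_body((0.0, 10.0, 1.5), (0.0, 0.0, 0.0), 0.0)
    print(body_to_screen(body))          # horizontally centred, above mid-screen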

Current mobile phone technologies provide an interesting platform to support user interaction for mobile collaborative augmented reality. While the phone can support the entire set of AR technology requirements, it could also serve as a convenient input device for HMD-style MCAR applications. The phone has buttons and a touch screen, and current phones also have accelerometer and gyroscopic sensors. These sensors allow gestures to be supported. Depth cameras are becoming popular and also provide an opportunity to support hand gestures in a more complete fashion.
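
As a simple example of a phone-sensor gesture that could drive such an interface, the sketch below detects a shake gesture from a window of accelerometer samples; the threshold and peak count are illustrative values only.

    # Sketch of a simple shake-gesture detector over a stream of accelerometer
    # samples (in units of g), the kind of phone-sensor gesture that could
    # drive an HMD-based MCAR interface. Threshold and window are illustrative.

    def detect_shake(samples, threshold_g=2.0, min_peaks=3):
        """Return True if enough high-acceleration peaks occur in the window."""
        peaks = 0
        for ax, ay, az in samples:
            magnitude = (ax * ax + ay * ay + az * az) ** 0.5
            if magnitude > threshold_g:
                peaks += 1
        return peaks >= min_peaks

    # A short burst of vigorous movement versus the phone lying still.
    shaking = [(2.5, 0.1, 0.2), (0.3, 2.8, 0.1), (0.1, 0.2, 3.0), (2.2, 0.1, 0.3)]
    still   = [(0.0, 0.0, 1.0)] * 4
    print(detect_shake(shaking), detect_shake(still))   # True False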

7 Conclusions

In this chapter we have described the history and development of mobile collaborative AR and presented a set of examples of recent MCAR systems. As can be seen, MCAR systems have progressed rapidly from heavy backpack systems to mobile phones and handheld devices. From the MCAR research and commercial systems that have been developed, there are a number of important lessons that can inform the design of future mobile collaborative AR applications. For example, it is important to design around the limitations of the technology. Current mobile phones typically have noisy GPS and compass sensors, limited processing and graphics power, and a small screen. This means that the quality of the AR experience they can provide is very different from high end PC based systems. Successful MCAR systems therefore do not rely on accurate tracking and complex graphics, but instead focus on how the AR cues can enhance the collaboration. For example, in the AR Tennis application the graphics were very basic, but the game was enjoyable because it encouraged collaboration between players.

One promising direction for future research is the concept of through-walls collaboration [55], which enables users out in the field, at the location where decisions have to be made, to work in real time with experts indoors. The users out in the field have personal knowledge and context of the current issue, while the indoor experts have access to additional reference materials, a global picture, and more advanced technology. MCAR can supply a suitable hardware and software platform for these kinds of systems.