
1 Introduction

Projection-based technology has received increasing attention over the last few years, in part because it provides effective means to overcome the inherent information-display limitations associated with small screens. The driving idea behind this technology is that we can create interactive displays that go well beyond the confines of the device itself, for example, to encompass virtually any external object present in a given physical space.

The use of projection-based technology is not completely new. Yet, the level of interactivity offered by such technology is, at the moment, somewhat pedestrian. Most applications work much like an ordinary projector connected to a laptop: the projected image is a clone of the image on the device controlling it, and user interaction is essentially limited to interaction with the device itself. This implies that the display space and the user interaction remain separate; information does not adapt to the shape of physical objects and users cannot interact with the projection itself. In the end, the technology is only used to extend the area on which content is displayed, not to pave the way to new types of collaborative interaction and shared experiences that let users manipulate information visualised directly on the object or subject of interest using familiar real-world interaction metaphors.

Typically, interaction with the projection device (e.g., proximity, tilt gestures) or on the device (e.g., tap, touch) directly influences the display space rather than the projection space. Examples include the use of projection to facilitate social interaction in a workplace by displaying photos associated with a person in physical proximity [26], or to provide a community poster board that shows content either created by users or automatically sampled from the workplace intranet [10]. More recently, projection has also been used to create new ways of experiencing daily activities: SubliMotion [43], for example, uses projection mapping to provide an unparalleled techno-gastronomic experience that goes beyond taste. In most cases, the main advantage of using projections is that the projected image can easily be shared among multiple users, while interaction with the projection itself is somewhat limited to translation, rotation and scaling, as the situation requires. Yet there are more exciting ways of using projected content that capitalise on a new wave of projection-based devices, such as smartphones with an integrated camera and embedded projector, or digital cameras that can project their photos. Thanks to their portability, these devices can make projection-based interaction far more ubiquitous than the fixed projector-based setups of the past.

In this work, we introduce a new projective augmented reality paradigm aimed mostly at supporting creativity processes. The augmented reality systems available today for smartphones do not provide seamless integration between reality, user experience and information. On the one hand, they lack a proper communication infrastructure. On the other hand, most applications require users to wear head-mounted displays or to hold up a smartphone or tablet and switch on the camera view to overlay graphical elements onto reality, which is only well suited to solitary, single-user experiences. Hence, in many cases, they act more like a display or a portal onto reality than an immersive reality itself. Our interaction techniques consider the case where users have access to handheld projectors, so that the interplay between users and projectors can yield a rich design space for multi-user interaction and ultimately pave a new way to augment the surrounding environment with scenarios for collaborative experience.

2 Motivation and Objectives

In the previous section, we described the driving idea behind projection-based technology. In this section we present our motivation and enumerate open issues and research challenges related to our approach. In the next section, we survey the relevant literature, both to see whether similar studies have been done and to define the framework against which to evaluate the relevance and impact of this study. Then, in Sect. 4, we describe the proposed methodology as well as the implementation details. In Sects. 5 and 6 we present several use case scenarios as well as an evaluation of the framework. Lastly, in Sect. 7, we wrap up the results of our work with conclusions and directions for future work.

Our motivation to exploit projection-based technology is driven by three factors. First, in our framework, called c-Space, we capitalise on the ongoing move of smartphones towards near-ubiquity to create near real-time automatic 3D reconstructions of spatio-temporal events, e.g. concerts, moving objects, etc. Hence, we seek new ways of interacting with replicas of physical objects or events that resemble real-world interaction metaphors as we know them. Second, it is undeniable that, aside from limited memory and processing power, very small display sizes are the major bottleneck of smartphones. Therefore, we want to investigate whether the interactivity between users and Personal Pico Projection (PPP) technology (e.g. smartphones with integrated projectors) can yield a rich design space for multi-user interaction, mitigating one of the key drawbacks of interacting with small displays. Lastly, we want to investigate to what extent this new way of interacting can support creativity and collaborative experiences, including, but not limited to, the creation of spatio-temporal annotations, the combination of real objects with digital content, and content sharing and reuse. Figure 1 summarises the c-Space framework, which aims to provide a disruptive technology that unleashes users’ creativity to create and use 4D content in a completely new way.

Fig. 1. Overview of the c-Space framework

The research challenges that we identified while creating dynamic projections with mobile setups based on time-of-flight (ToF) cameras and PPP technology are:

  1. RC1

    How can we exploit the real-world scene depth to create projections that adapt intelligently to different types of surfaces, so that the projected image is not perceived as distorted?

  2. RC2

    How can we seamlessly compose (i.e. efficiently define projection mappings) and share new interactions that bridge digital content and real-world objects, independently of the projector’s location and orientation?

  3. RC3

    How can affective computing and recommendation systems be integrated into a projective augmented reality system, so that we can adapt the content to the emotional state or needs of the user?

  4. RC4

    What mechanisms can be put in place to foster or promote collaboration and content sharing among users?

  5. RC5

    The use of projective interfaces exposes the end-user to the risk of projecting sensitive information by mistake, e.g. a phone number or contact list, which opens up new privacy challenges. Hence, what mechanisms can we implement to tackle the issue of sensitive information disclosure?

  6. RC6

    The invasiveness of projected content can lead to “visual pollution” or annoy other people in the vicinity. Additionally, more powerful projection devices, capable of projecting over long distances, can even be dangerous for other users. Hence, what is the social impact of PPP technology? How can we create a normative policy that regulates the use and power of PPP technology in scenarios such as streets, where drivers or passengers could be temporarily blinded by the projection?

  7. RC7

    Manually setting the projection focus raises a critical barrier to mobile content projection because users have to readjust the focus every time they move. Hence, how can we reduce the out-of-focus effect in projections when using non-laser projectors?

3 Related Work

In this section we review works that relate to our 3D projective augmented environments concept. Afterwards, we discuss in detail how our solution advances the state of the art.

We consider the existing literature to be clustered into three major categories: projector-based augmented spaces, multi-user handheld projector systems, and projector tracking.

3.1 Projector-Based Augmented Spaces

Traditional techniques to augment the world with additional information require the use of head-mounted displays [1] or a portable device serving as a “magic lens” [3]. Their weakness resides at the hardware level: the hardware is not designed for simultaneous multi-user interaction, in contrast to projection metaphors that target multi-user experiences by augmenting objects in the user environment without hampering any existing collaboration.

In 1999, Underkoffler [42] described for the first time a system that uses projection (the I/O bulb co-located projector) and video-capture techniques for distributing digitally generated graphics throughout a physical space. Later, Hereld et al. [17] described how to build projector-based tiled display systems that incorporate cameras into the environment to automate the calibration process. Afterwards, the authors of [32] investigated the use of a steerable projector to explore content projection on arbitrary indoor surfaces. In 2003, Raskar et al. introduced iLamps, a system that creates distortion-free projections on various surfaces [35]. RFIGLamps [34] extended iLamps with the possibility of creating object-adaptive projections; one of the proposed use case scenarios consisted in visually identifying products closest to their expiry date.

Prototypes that mostly target interactive tabletop experiences include Play Anywhere [48]; the work of Hang et al. [15], which takes advantage of projected displays to explore large-scale information; Bonfire, which uses several handheld projectors mounted on a laptop to extend the desktop experience to the tabletop [20]; Map Torchlight, which enables the augmentation of paper map content [38]; and Marauder’s Light, which can be used to project onto a paper map locations retrieved from Google Latitude [24].

In 2005, Blasko et al. [4] investigated possible interactions with a wrist-worn projection display; a short-throw projector was used in their lab setup to simulate the mobile projector. A few years later, Mistry et al. [27] introduced Wear Ur World, an application that relies on a portable projector, a mirror and a camera to demonstrate that mobile projection can be integrated into daily-life interactions; fiducial markers attached to the fingertips were used to improve the precision and speed of the computer vision process. More recent works include SecondLight [18], which can be used to interact with projected imagery on top of real-life objects in near real time, and OmniTouch [16], which enables interactive multi-touch applications on arbitrary surfaces by employing a wearable depth-sensing and projection system.

3.2 Multi-user Handheld Projector Systems

Modern handheld projectors can produce relatively large public displays, often considered an important requirement in many multi-user interaction scenarios, and the possibilities of multi-user interaction with handheld projectors have been an active research field. In 2005, Sugimoto et al. described an experimental system that explored the concept of overlapping two projection screens to initiate a file transfer between different devices [41]. In 2007, Cao et al. presented a wide range of multi-user interaction techniques for managing virtual workspaces that relied on motion capture systems for location tracking [6, 7]; for example, they designed interaction techniques to visualise content, define content ownership, perform content docking, and initiate transfers.

Multi-user games have also received some attention in recent years. Hosoi et al. introduced a multi-user game in which users have to guide a small robot by lining up projected pieces of track so the robot can move around [19]; the prototype used a fixed camera placed above the interaction area to enable the interaction between the handheld projectors. Another example of a game that uses projection metaphors is the multi-user jigsaw game proposed by Cao et al., where users have to pick up and fit pieces of a puzzle together [8]; in this case, the interaction between multiple handheld projectors was enabled by means of a professional motion capture system.

3.3 Projector Tracking and Interaction

2D barcode-style fiducial markers have been widely used for tracking due to their robust and fast performance. A well-known issue with this type of fiducial marker is their unnatural appearance, which is not meaningful to users. Additionally, integrating barcode-style markers into the design of interactive systems also raises resistance due to some of their properties, e.g. their fixed aesthetic and intolerance to changes in shape, material and colour.

Numerous techniques have been developed to hide or disguise fiducial markers from the user. Park et al. used invisible inks to create markers that are visible to IR cameras [31]. Grundhofer et al. investigated the use of temporal sequencing of markers with high-speed cameras and projectors [14]. In 2007, Saio et al. created custom marker patterns disguised to look like normal wallpaper [37]. The use of IR lasers to project structured-pattern-style markers was investigated in [21, 45]. Nakazato et al. used retro-reflective markers together with lights and IR cameras [28]. Other works include the projection of IR [9, 40, 44] or hybrid IR/visible-light markers [22].

The use of natural markers was also proposed as a solution to overcome some of the limitations of 2D barcode-style fiducial markers, e.g. their fixed aesthetic. For this reason, much work has focused on the development of natural-marker detection techniques that use the natural features of the object as a marker, thereby removing the need to incorporate structured marker patterns [5, 30]. The issue with natural markers is that they normally require a training step for each object to be recognised, and they are computationally more expensive than structured-marker detection techniques.

Sensor-based projection tracking designs have also been proposed in many works. Dao et al. proposed the use of fixed positions [12]. A technique presented in [36] works by making assumptions about the user’s arm position. In 2011, Willis et al. [47] described a system that used motion sensor input and an ultrasonic distance sensor for pointing-based games, which was used to study users’ pointing behaviour; a different version of the system investigated a camera-based approach, where a customised camera+projector prototype with infrared fiducial markers was used for tracking [46]. Additional vision-based device tracking methods include projector-based pointing interaction [2, 33] and interaction by casting shadows onto the projected image [11].

Most methods described in the literature either require a pre-calibrated infrastructure to be installed in the physical environment [7] or limit the interaction between participants and the projection [39]. Additionally, most systems were designed to project on flat surfaces, thereby ignoring the depth of real scenes, which leads to distortions. In our prototype, we use a vision-based approach to track user-defined AR setups – based on natural markers – that lets the user interact and spontaneously change location, as the projection automatically adjusts to changes in the projector position.

4 Methodology

In this section, we describe the methodology that was developed to solve the research challenges in Sect. 2. The explanation is provided in parallel to the description of the application workflow.

4.1 Overview

An important consideration that we must bear in mind when projecting images on a surface that is not perpendicular to the projector view is how to acquire information about scene depth, which influences the distance between two projection points. Scene depth can either be extracted automatically with computer vision techniques or, alternatively, captured with dedicated hardware such as depth cameras. Many of these techniques either require computationally intensive algorithms or require the end-user to perform additional calibration steps.

The solution that we propose in this manuscript to RC1 (How can we exploit the real-world scene depth to create projections that adapt intelligently to different types of surfaces, so that the projected image is not perceived as distorted?) is deeply interlinked with the solution that we propose to RC2 (How can we seamlessly compose (i.e. efficiently define projection mappings) and share new interactions that bridge digital content and real-world objects, independently of the projector’s location and orientation?). Our methodology does not require the computation of a depth matrix to guarantee that projections will not be distorted by the depth of the scene objects.

Scene depth is extracted automatically from the transformation matrix that is computed for each user movement, together with any user-defined information about the scene. In the next sections, we describe in detail how our optical-flow-based tracking technique achieves this in the following steps: first, we search for distinctive invariant features in the video stream; then we use a user-friendly interface to define where and how content is projected; afterwards we rely on optical-flow-based techniques to track the user movements; and finally we integrate new ways of interacting with PPP technology.

4.2 Invariant Feature Detection

In this section we explain how to detect distinctive features in images – the first step that has to be performed in order to track the position of the projector’s camera.

In order to detect distinctive features in images, we need to understand the role of unknown variables such as lens distortion, illumination, viewing angle, and so forth, in the image formation process [25]. For example, the difference in perspective between two frames becomes a significant factor when the camera baseline between the two views is large. The feature matching process requires extracting key features from images whose properties remain invariant under large differences in viewing angle and camera translation. Additionally, the features have to be discriminative if we want the scene recognition process to be robust.

In this work, we use the BRISK (Binary Robust Invariant Scalable Keypoints) algorithm [23] to detect distinctive scale- and rotation-invariant features. BRISK provides robustness and performance comparable to the well-known SURF algorithm, but at a much lower computational cost [23]. Candidate pairs between frames are selected according to their descriptor distance, which for binary BRISK descriptors is computed with a brute-force Hamming matcher. The correspondences between features in different frames can then be used to estimate the camera pose.
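
As an illustration of this step, the sketch below shows how BRISK detection and brute-force Hamming matching could be wired together with OpenCV’s Python bindings; the function name and the ratio-test threshold are our own choices for illustration, not part of the original implementation.

```python
# Minimal sketch (not the authors' implementation): BRISK detection and
# brute-force Hamming matching between two grayscale frames with OpenCV.
import cv2

def match_frames(gray_prev, gray_curr, ratio=0.8):
    """Detect BRISK features in two frames and return the distinctive matches."""
    brisk = cv2.BRISK_create()
    kp1, des1 = brisk.detectAndCompute(gray_prev, None)
    kp2, des2 = brisk.detectAndCompute(gray_curr, None)
    if des1 is None or des2 is None:
        return [], kp1, kp2

    # BRISK descriptors are binary, so the Hamming distance is the natural metric.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des1, des2, k=2)

    # Lowe-style ratio test keeps only sufficiently distinctive correspondences.
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return good, kp1, kp2
```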

4.3 Optical-flow-based Tracking

In this subsection we explain how pairs of BRISK descriptors can be used to track features in the real world without the use of fiducial markers.

Given our problem statement, several actions can be taken to reduce the computation time of the feature matching step. We know a priori the regions of interest, i.e. the areas around the control points that are used to define projection mappings, so we can use that information to reduce the search space for BRISK features before computing the list of pairs. The problem of tracking a shape between two consecutive frames is considered in the literature a small-baseline tracking problem, because the transformation from the image at time \(t\) to the image at time \(t + dt\) can be modelled with a translational model, given a small \(dt\).
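
A minimal sketch of this search-space reduction, assuming rectangular regions of interest around the control points (the margin value is an illustrative choice, not the one used in our prototype):

```python
# Sketch: detect BRISK features only inside rectangular regions around the
# shape's control points, using a binary mask passed to detectAndCompute.
import cv2
import numpy as np

def detect_around_control_points(gray, control_points, margin=40):
    """Build a mask around each control point and detect BRISK features only there."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for (x, y) in control_points:
        x0, y0 = max(0, int(x) - margin), max(0, int(y) - margin)
        x1, y1 = min(w, int(x) + margin), min(h, int(y) + margin)
        mask[y0:y1, x0:x1] = 255  # numpy indexing is [row, col] = [y, x]

    brisk = cv2.BRISK_create()
    return brisk.detectAndCompute(gray, mask)
```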

The core of our frame-to-frame feature tracking algorithm is the computation of the optical flow for the feature points that correspond to the control points of a user-defined shape, in which the translational model corresponds to the homography matrix between two frames. A homography is a projective transformation that relates a point in the camera space to a point in the world space.

The use of the homography matrix implies that, under special circumstances, a point in the reference image frame is related by a linear relation to the point that depicts the same information in a different image frame. These circumstances hold in the case of pure rotation or when the viewed scene is planar. In such cases, the 3\(\,\times \,\)4 matrix that represents the projective relation between a 3D point and its image on the camera reduces to a 3\(\,\times \,\)3 matrix.
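
In homogeneous coordinates, the planar case can be written compactly (generic notation for illustration, not symbols taken from our implementation):

\[
s \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad
H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix},
\]

where \((x, y)\) and \((x', y')\) are corresponding points in the two frames, \(s\) is an arbitrary scale factor, and \(H\) is the 3\(\,\times \,\)3 homography, defined up to scale and therefore having eight degrees of freedom.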

In our case, each shape is defined to cover only a planar surface; nevertheless, our system supports multiple shapes, which can then be used to apply content to non-planar surfaces. Thus, a homography can be used, provided that we have matching points of sufficient quantity and quality. Before we can compute the homography with OpenCV’s findHomography function, we first need to find the feature points of interest and extract their descriptors in order to find good matches. To find good matches more reliably, we compute the homography after removing outliers with the RANSAC algorithm [13]. We use RANSAC to remove features that lie on non-planar objects, thus maintaining the planarity condition even for images that contain more complex geometry than a single working plane; we can do so because of the constraint that shapes can only be projected onto planar structures. We also use RANSAC to impose the epipolar constraint between different images, which helps us to reject false matches.

The function that computes the homography requires at least four point correspondences as input; otherwise, we cannot map the points in the first image to the corresponding points in the second image. Afterwards, we apply the resulting homography (or its inverse, depending on the direction of the mapping) with OpenCV’s perspectiveTransform function to map points in the reference image to the equivalent points in the destination image. Note that the homography works well if the BRISK descriptors are well distributed inside the shape; otherwise it may be too unstable for practical applications. To avoid the propagation of errors, we do not compute the tracking between two consecutive frames; instead, we match the current frame against the oldest frame possible. For that, our algorithm keeps the oldest frame for which the homography was successfully computed and matches the current frame against it.
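
A condensed sketch of this tracking step, reusing the matching function from the previous subsection; the function name and the RANSAC reprojection threshold are illustrative assumptions, not values from our implementation:

```python
# Sketch: estimate the frame-to-frame homography with RANSAC and re-project
# the four control points of a user-defined shape into the current frame.
import cv2
import numpy as np

def track_shape(kp_ref, kp_cur, matches, shape_pts_ref, min_matches=4):
    """Return the shape's control points in the current frame (or None) plus H."""
    if len(matches) < min_matches:
        return None, None  # findHomography needs at least four correspondences

    src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_cur[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects correspondences that violate the planarity assumption.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None, None

    # Map the four reference control points into the current frame.
    pts = np.float32(shape_pts_ref).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H), H
```

In practice, kp_ref and shape_pts_ref would only be refreshed when tracking against the stored oldest frame fails, matching the error-propagation strategy described above.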

Fig. 2. Feature tracking using an invariant feature algorithm

To make the user experience smoother, we implemented a threshold filter for low-quality pairs and we use the smartphone’s sensors to estimate the new pose. Our system shows a notification to the user if the homography cannot be computed within a few iterations. Tracking can fail for two reasons: first, the initial projection surface is not suitable for tracking because we cannot extract enough key points; second, the camera baseline between the two views is too large or the surface is no longer visible. In the latter case the shape becomes invisible; once the projection surface is visible again in the image frame, tracking restarts and the shape is drawn again in the right spot. Figure 2 provides an overview of the tracking process.

4.4 Creating and Maintaining a Virtual Scene

In this section, we describe how user-defined projection mappings are created and how they are visualised on top of real-world objects. User-defined projection mappings, or simply “shapes”, have the following properties: they are always defined by four points, they have graphical content associated with them (e.g. a video, an image, or some interactive 3D content), and they can store a user-defined depth correction.
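
The shape abstraction can be summarised with a small data structure; the field names below are illustrative and not the actual class used in our implementation:

```python
# Illustrative data structure for a user-defined projection mapping ("shape").
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Shape:
    # Exactly four corner points, in the reference (editor) view.
    corners: List[Tuple[float, float]]
    # Associated graphical content: a video, an image, or interactive 3D content.
    content_uri: str
    # User-defined depth correction (slope of the target surface).
    depth_correction: float = 0.0

    def is_valid(self) -> bool:
        return len(self.corners) == 4
```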

In our application, users can drag shapes into the virtual scene and then map the vertices of a shape onto the object they want to augment, using simple drag-and-drop gestures (Fig. 3). Afterwards, users can define which content to associate with that shape and the slope of the surface – the scene depth.

The physical setup consists of a smartphone that renders the content that will augment the physical space and then sends it to the projector, mapped into the projector’s perspective. We decided on this specific setup because our goal is to test the concept of interaction with portable technology that is both ubiquitous and accessible to everyone. The use of smartphones is not fundamental, but it is extremely useful for creating and designing new augmented scenes; for example, users can use the smartphone as a lens during the process of creating a projection mapping, which facilitates the modelling process. An issue in our initial approach was that the field of view of the projection and the field of view of the camera attached to the projector (used to compute the pose of the projector) were not the same. To solve this issue, we project a marker image in each corner of the projected image and use these markers to track the field of view of the projector. In this way, we can automatically calibrate the projector and the camera attached to it, which is essential for converting smartphone views into the projection view.
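
A sketch of the resulting projector-camera registration, assuming a hypothetical detect_corner_markers helper that locates the four projected markers in the camera image and an assumed projector resolution:

```python
# Sketch: once the four projected corner markers are found in the camera image,
# a homography maps camera coordinates into projector coordinates.
import cv2
import numpy as np

PROJECTOR_W, PROJECTOR_H = 1280, 720  # assumed projector resolution

def camera_to_projector_homography(camera_frame):
    # Pixel positions of the four markers as seen by the camera (4 x 2 array).
    cam_pts = detect_corner_markers(camera_frame)  # hypothetical helper
    # Their known positions in the projector's own image plane.
    proj_pts = np.float32([[0, 0], [PROJECTOR_W, 0],
                           [PROJECTOR_W, PROJECTOR_H], [0, PROJECTOR_H]])
    H, _ = cv2.findHomography(np.float32(cam_pts), proj_pts)
    return H  # apply with cv2.perspectiveTransform(points, H)
```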

Fig. 3. Mobile editor interface

Another issue that we had to address was RC7 (How can we reduce the out-of-focus effect in projections when using non-laser projectors?). This limitation raises a critical barrier to mobile content projection, since users have to readjust the focus whenever the distance to the projection surface changes; the whole purpose of a fully automated interaction is lost if the projector does not use a laser to keep the image in focus. The problem can be solved in hardware, using a rangefinder and a closed-loop motion system consisting of a micro motor such as the piezoelectric SQUIGGLE and a non-contact position sensor such as TRACKER. Alternatively, the out-of-focus projection blur can be reduced with image-based methods like the one proposed by Oyamada [29], which is well suited to reducing image blur in non-perpendicular projections.

Our solution to RC5 (What mechanisms can we implement to tackle the issue of sensitive information disclosure?) is based on the fact that our system does not project a clone of the mobile screen. The projector is identified and used as a separate display, on which we do not render any user interface elements, as they are not needed; only the shapes defined by the user are projected, together with any other information related to the collaborative task.

In our framework, we have also integrated in-house affective computing and recommendation systems, addressing RC3 (How can affective computing and recommendation systems be integrated into a projective augmented reality system, so that we can adapt the content to the emotional state or needs of the user?). The affective computing system detects user emotions, which are then combined with the user’s preferences to filter content or to change the way the user interacts. This is especially relevant to us, as our system was originally designed for urban planning and advertising scenarios: in the same way that our mood affects the type of music we listen to, this system helps users to reach their goals faster, for instance by finding relevant or appealing products or by suggesting design alternatives.

To address RC6 (How can we create a normative policy that regulates the use and power of PPP technology in scenarios such as streets, where drivers or passengers could be temporarily blinded by the projection?), we propose a system based on image analysis. The system analyses the content of a frame in order to understand what kind of elements are present in it; architectural elements and other entities such as people and streets can thus be easily identified. To test our system, we defined a rule that interrupts projection if a street is detected. Figure 4 shows the image analysis results, in percentages, for a given image frame.

Fig. 4. Image context analysis using cloud services
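
The rule itself is simple to express. The sketch below assumes a hypothetical analyze_frame_tags wrapper around the cloud tagging service we use, together with an example tag blacklist and confidence threshold; none of these names or values come from the actual implementation:

```python
# Sketch of the normative-policy rule: stream a frame to an image-tagging
# service and interrupt projection when blacklisted tags are detected.
BLOCKED_TAGS = {"street", "road", "car", "person"}  # example policy

def projection_allowed(frame_jpeg_bytes) -> bool:
    # analyze_frame_tags is a hypothetical wrapper returning e.g. {"street": 0.91, ...}
    tags = analyze_frame_tags(frame_jpeg_bytes)
    return not any(tag in BLOCKED_TAGS and score > 0.5
                   for tag, score in tags.items())
```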

In the next section, we explain how we addressed RC4 (What mechanisms can be put in place to foster or promote collaboration and content sharing among users?).

5 Use Cases

Although our technology is applicable to a wide range of scenarios, we describe here only three use cases: an architecture and urban planning scenario, an augmented mobile advertisement scenario, and a cultural heritage tourism scenario.

5.1 Architecture and Urban Planning

It is fundamental in scenarios like architecture and urban planning to have a system for decision-making that provides an overview of information relevant to the analysis (context) together with more detailed information for the various sub-tasks of interest.

Fig. 5. Architecture and urban planning

Interaction with handheld projectors can be designed to effectively support this type of activity. For example, one projector can be held far from the projection area to create a low-resolution, coarse-granularity context view, while another handheld projector is held close to the focus region to display more detailed information, since the user can achieve higher pixel densities as the projection area shrinks. Hence, we obtain an interaction technique based on image resolution gradation that capitalises on the distance between the projection surface and the projector itself, and that enables the visualisation of multiple information granularities. The viewing experience is similar to that of a focus plus context display. Figure 5 shows a multi-granularity city map: the context region shows main streets only, while the focus region shows augmented urban information.
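
As a back-of-the-envelope illustration of this gradation (generic symbols, not measurements from our prototype):

\[
w(d) = \frac{d}{R}, \qquad \rho(d) = \frac{W}{w(d)} = \frac{W R}{d},
\]

where \(d\) is the distance to the projection surface, \(R\) the throw ratio of the projector, \(W\) its horizontal resolution in pixels, \(w(d)\) the projected image width, and \(\rho(d)\) the resulting linear pixel density. Halving the distance therefore doubles the linear density (and quadruples the pixels per unit area), which is exactly what the close-held focus projector exploits.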

Fig. 6. Snapping multiple objects

The solution we propose is more flexible than previous focus plus context solutions, where the resolution and position of both the focus and the context display are fixed; in our solution, users can dynamically move around the environment and manipulate the resolution of the projections.

5.2 Cultural Heritage Tourism

In the previous use case, we described an interaction technique based on directly blending multiple views. However, we can go a step further in terms of interaction with projected content by using the intersection of different projections as a trigger to quickly view information that involves multiple objects. For example, we can think of an interaction where multiple objects projected by different handheld projectors snap to each other when they are close enough. When snapped together, they either change their appearance to disclose additional information or trigger the visualisation of more information; to unsnap them, the user only needs to move the objects a small distance apart again (Fig. 6).

As an example, suppose there are two users projecting information: the first is projecting a map and the second is exploring the 3D model of a monument. Intersecting the projections of these two users results in the visualisation of a map with the 4D model pinpointed on it. Then a third user projects another object that has a location associated with it; the intersection with the previous projections draws a route between the location of that user’s object and the position of the monument now snapped to the map. The application supports the projection of multiple objects per projector, so the limits of this technology lie in the creativity of its users. Additionally, the linkage between objects can be used as an authentication mechanism, where data is only disclosed when two objects are projected together.

A side effect of using mobile devices to process the visual information that will be projected is that, without projectors, they can work as traditional AR tools. For example, we use the mobile application to overlay historical pictures. Figure 7 depicts a smartphone overlaying the real world with a historical image.

Fig. 7. Tracking system: overlap of a historical picture using the smartphone

5.3 Augmented Mobile Advertisement

The advertisement market can also benefit from our technology to reach out to its audiences. Our mobile prototype allows simple authoring of “augmented” advertisement content, which can then be used to generate interaction with the user within the “real” scene (Fig. 8). In the example below, a smartphone product is placed on a tabletop; in Fig. 9, the smartphone is the object being tracked by the projector’s camera. In this case, we use the occlusion of features as an action trigger: for example, if the user puts a hand over the natural features of the “more info” button, our system triggers the action associated with that button.

Fig. 8. How a product was built

Fig. 9. Interaction with menus based on feature occlusion
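
The occlusion-based trigger described above can be sketched as follows; the thresholds and the function name are illustrative assumptions, not the values used in our prototype:

```python
# Sketch: the object is still tracked overall, but the matched features inside
# the button region disappear, so we assume a hand is covering the button.
import cv2
import numpy as np

def button_occluded(matched_pts_cur, button_polygon_cur,
                    min_total=30, max_in_button=2):
    """matched_pts_cur: (x, y) matched feature positions in the current frame;
    button_polygon_cur: the button's outline, already warped into that frame."""
    if len(matched_pts_cur) < min_total:
        return False  # tracking itself is unreliable, do not trigger anything
    poly = np.float32(button_polygon_cur).reshape(-1, 1, 2)
    inside = [p for p in matched_pts_cur
              if cv2.pointPolygonTest(poly, (float(p[0]), float(p[1])), False) >= 0]
    return len(inside) <= max_in_button
```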

We have also tested projection-based metaphors as a new way of transferring content to a mobile device without requiring connectivity. This technique is especially relevant for tourists, who often depend on roaming connectivity, which can be expensive. In our setup, we used a projector to display an animated QR code and a smartphone to read the animation as a download. The transfer rate achieved is not suitable for general file transfers, but it works well for small amounts of information such as text and small images, for example information related to a product.
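
A minimal sketch of how such an animated QR stream could be generated, using the open-source qrcode Python package; the chunk size and the header format are assumptions for illustration, not the encoding used in our setup:

```python
# Sketch: split a payload into chunks, encode each chunk as one QR frame with a
# sequence header, and loop the frames on the projector.
import base64
import qrcode

def payload_to_qr_frames(payload: bytes, chunk_size: int = 256):
    encoded = base64.b64encode(payload).decode("ascii")
    chunks = [encoded[i:i + chunk_size] for i in range(0, len(encoded), chunk_size)]
    frames = []
    for idx, chunk in enumerate(chunks):
        # Header "idx/total:" lets the receiver reorder frames and detect completion.
        frames.append(qrcode.make(f"{idx}/{len(chunks)}:{chunk}"))
    return frames  # PIL images, displayed in a loop by the projecting device
```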

6 Evaluation

In this section, we evaluate the relevance and impact of this study. As part of a preliminary user study, we asked 9 individuals to experiment with our prototype. First, we demonstrated the features of the system to each participant and then invited them to try out the techniques described in the previous section. Each interaction session lasted about 30 min. During the experiment we observed how participants used the system, and afterwards we conducted individual post-study interviews.

All participants managed to grasp the basic concepts of the prototype quite quickly, and they showed no difficulty learning the projection-based interaction techniques that were proposed. As we expected, the feature reported as the most appealing was the ability to easily exchange and combine information in a shared workspace, in addition to the user-friendly approach used to set up a projection mapping.

There are, however, some technical aspects that can be improved. First, the image analysis algorithm that we use to enforce normative policies cannot be executed at a real-time frame rate: our implementation streams an image frame to an image analysis service that returns a set of tags describing the image, and the projection is automatically blocked if the projector is pointing, for example, at a car, people, or a street. In the future, we want to restrict the projection so that it never hits people in the face while still allowing projection around the person. This can be achieved with face tracking algorithms, which are already available in OpenCV.
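
A possible sketch of that future face-avoidance step, using one of OpenCV’s bundled Haar cascades; the padding value and the assumption that camera and projector frames are already registered (see the corner-marker calibration in Sect. 4.4) are ours:

```python
# Sketch: detect faces in the camera view and black out the corresponding
# regions of the frame sent to the projector.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mask_faces(projector_frame, camera_gray, pad=20):
    faces = face_cascade.detectMultiScale(camera_gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Assumes camera and projector coordinates are already registered.
        cv2.rectangle(projector_frame,
                      (x - pad, y - pad), (x + w + pad, y + h + pad),
                      (0, 0, 0), thickness=-1)  # filled black rectangle
    return projector_frame
```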

The computational cost of the tracking operations can also be improved; at the moment, the tracking algorithm processes only 17 frames per second. The method implemented for reducing out-of-focus blur helped us to achieve better results with projectors that do not support auto-focus. Yet the focus and the projection size still need to be calibrated manually, because the focus of the projector can only be adjusted by hand; this problem can be solved with a laser-based projector.

7 Conclusion

In this paper, we explored new perspectives on augmented reality systems built around new concepts of 3D projective mapping and interaction between multiple co-located users with handheld projectors. Interpersonal communication and collaboration may be supported more intuitively and efficiently than with current handheld devices, and informal user feedback indicated that our designs are promising. Our work is the first mobile authoring system for 3D projective mapping that uses computer vision tracking techniques to facilitate the design of live projections.

Current mobile projection technology has limitations in terms of light intensity, in addition to the fact that it can only provide image focus at a particular distance. Current handheld projectors have a luminance between 5 and 100 lumens, which we believe will increase considerably in the next few years – some low-cost fixed projectors can nowadays reach 2,500 lumens. This limitation implies that handheld projectors can only be used indoors or outdoors at night. For dynamic mobile projections with a variable distance between the projection surface and the projector, we advise the use of laser projectors, which seem better suited to projecting sharp images.

As future work, we are interested in empirically investigating how interaction between people may evolve with the use of handheld projectors and how the technology is used for creative purposes. We also plan to extensively explore other ways of interacting with handheld projectors, for example by integrating gamification strategies, which may change the way people currently think about them. Finally, we will investigate improvements in transferring data with QR codes through the visualisation of animated arrays, since projected spaces have the advantage of offering large surfaces.