
1 Introduction

In his 1997 definition of augmented reality (AR), Ron Azuma introduced AR as a variant of virtual reality (VR) that allows “the user to see the real world, with virtual objects superimposed upon or composited with the real world” [1]. An AR experience is commonly achieved through stereoscopic video/optical see-through head-mounted displays (HMDs) or mobile phones utilizing an inside-out or outside-in tracking method to appropriately register virtual objects to the physical world. Although the history of AR goes back to the 1960s [2], technological advances in recent years have resulted in a dramatic increase in AR research, public interest, and adoption [3].

While AR is the product of interdisciplinary research, human society is encountering a new wave of advancements through the convergence of other traditionally separate fields, termed the Fourth Industrial Revolution, which can be characterized by a fusion of technologies that merge and tightly integrate the physical, digital, and biological spheres [4]. In this new paradigm of convergence, physical and digital things will become increasingly intelligent and connected to each other through the Internet, and the boundary between them will blur and become seamless.

While both academia and the AR industry are experiencing dramatic technological advances, advanced artificial intelligence (AI) and ubiquitous computing empowered by the Internet of Things (IoT) have been actively merged with AR technology, and AR researchers and practitioners have identified such convergence as key to realizing more intelligent and interactive environments [5,6,7,8]. IoT generally involves a network of computing devices embedded in the physical environment in the form of ordinary everyday objects that intelligently sense, collect, communicate, and even interact with users and with each other. This can provide the basis for smart environments through collective big data analyses and context-based services, e.g., real-time analytics and automation [9,10,11,12]. AR users in such environments can perceive and understand the world more effectively and efficiently by extending their sensing abilities and intelligence through intelligent IoT devices, achieving better user experience and performance in given contexts/tasks, e.g., solving problems or searching for information [13, 14].

From a traditional standpoint, it might seem that AR and IoT have different objectives with seemingly unrelated concepts, but they can, in fact, be complementary to each other [15, 16]. In principle, spatially registered and visually augmented content in AR offers a direct and semi-tangible interface and is, thus, easy to comprehend and highly useful, particularly for everyday and/or anywhere usage [15]. Such AR interfaces enable users to visualize and interact with IoT-enabled smart objects and their associated data in more convenient and intuitive ways. The AR client, such as a mobile or head-mounted glasses-like device, is capable of instantly connecting to an IoT device. Through this connection, users can access/receive context-relevant object/environment-specific data, control information, and associated AR datasets for the targeted service; understand the state of (or how to operate) the IoT product from its current datasets; and interact with the physical object through direct control and natural interaction [15]. The interface can be visualized and operated in real time via augmented virtual content that is connected to IoT objects in the real world [17]. Conversely, for AR, IoT as an infrastructure for pervasive “anywhere” services offers an efficient way to make AR “scalable” to the same degree by handling the necessary data management (e.g., tracking data and content) in a distributed and object-centric fashion [5]. Thus, any IoT device can be accessed locally in a seamless manner, and the scalable interface allows for location-based geographical and AR services using AR clients [15]. Additionally, context-aware AR services are made possible by tapping into the refined environment information offered by the IoT infrastructure [5].

To further promote this idea, that is, the synergistic convergence of AR and IoT, this chapter describes the concept of transreality, which symbiotically connects the physical and the virtual worlds, incorporating elements achieved through the convergence of AR, AI, and ubiquitous computing (UbiComp). Further, we illustrate how such transreality environments equipped with AR-IoT devices can transform our activities, providing highly intelligent and intuitive interactions with the environment, such as in situ AR interfaces adopting natural communication metaphors and embodied interaction with AR agents and avatars facilitating a bidirectional relationship between entities in the real and the virtual worlds. The overarching goals and contributions of this chapter include the following:

  • Introduce the concept of AR-IoT-based transreality empowered by the convergence of AR, AI, and UbiComp (Sect. 32.3),

  • Provide a comprehensive knowledge base through a survey of the literature, which covers previous findings and recent trends of the convergence research empowering AR-IoT-based transreality interactions and environments (Sects. 32.2 and 32.3),

  • Propose an AR-IoT framework, identify key features/benefits, and present supporting evidence on its efficacy (Sect. 32.4),

  • Describe the transreality realm of interactions in the context of in situ virtual interfaces and embodied agents and avatars (Sect. 32.5),

  • Present exemplary use cases of AR-IoT while providing insights for potential future research directions related to pervasive and physically interactive AR (Sect. 32.5).

Through the remainder of this chapter, we first provide a fundamental overview of AR and IoT, covering broader technological concepts, e.g., mixed reality (MR), AI, and ubiquitous computing, which are traditionally considered separate technology thrusts but offer synergistic benefits.

Then, we introduce the concept of transreality – where humans, virtual objects, and physical objects interact with each other dynamically and intuitively through AR and smart objects, such as IoT-embedded devices [18]. To underline the importance of and trends toward this transreality, we survey the current state of AR research and services together with IoT and AI technologies while considering their scalability and integration into a unified framework as a control/interaction interface. We discuss the issue of scalability, which relates to the number of objects, object recognition, and data management, as well as tracking techniques that can support AR objects without significant latency, a crucial factor for AR system performance.

Given this concept of transreality, we present a high-level AR-IoT framework for a smart and interactive environment while addressing three aspects, namely data management, object-guided tracking, and interface design, which we believe are key components for the development of an efficient AR-IoT infrastructure [19].

We then explore how virtually embodied interactions, such as those facilitated by virtual avatars or agents in AR, could be used in AR-IoT-embedded transreality environments and how such interactions could influence the users’ perception and behavior in social contexts while manipulating physical or virtual objects around them. In addition, we illustrate use case scenarios that could incorporate and benefit from the AR-IoT framework and transreality environments while discussing how AR can offer an intuitive and natural method for communication with IoT objects, compared to other methods, such as using a graphical user interface (GUI) with no visual, contextual, or spatial registration.

Finally, we provide concluding remarks and a summary of this chapter while presenting insights and potential future research directions. Note that this chapter is written largely based on the authors’ prior research and other relevant research in this domain, in particular two previous publications led and written by the authors [19, 20].

2 Background

In this section, we provide a fundamental overview of traditionally independent technology thrusts, including AR/VR/MR, AI, ubiquitous computing, and the IoT paradigm, before we introduce the convergence of these technologies and the concept of transreality.

2.1 Augmented, Virtual, and Mixed Reality

In the fields of computer science and human-computer interaction (HCI), VR refers to a technology-mediated simulation that can create real or artificial experiences, where people can interact with a simulated computer-generated virtual environment. While VR-related concepts and technologies have been anticipated and discussed in science fiction books since well before the 1950s, the term “virtual reality” was coined and popularized by Jaron Lanier in the 1980s at a time when the technology was becoming available to researchers and end users [21, 22]. Anticipating future advances in this field, up to the point where human users may not be able to distinguish the virtual from the real, various terms such as simulated reality, hyperreality, and synthetic reality have been introduced and defined [23]. In such forms of reality, high-fidelity simulated multisensory feedback that is indistinguishable from natural sensations and perception may leave users unaware of whether the outside world is simulated or not [24].

Unlike VR, which strives toward complete immersion of users in a virtual environment, in AR, users can experience virtual content blended with real objects and environments as if the content existed as part of the real world. AR has existed in many forms since Sutherland presented the first prototype optical see-through HMD and discussed the “Ultimate Display” in the 1960s [2, 25]. Although AR/VR-related concepts evolved from different perspectives, the broader definition of MR is traced back to a seminal paper written by Milgram and Kishino in 1994 [26], which was the first academic paper to use the term “mixed reality” in the context of computer interfaces. They defined MR as “a particular subset of VR related technologies that involve the merging of real and virtual worlds somewhere along the ‘virtuality continuum’ which connects completely real environments to completely virtual ones.” As shown in Fig. 32.1, the “reality–virtuality continuum” spans from completely real to completely virtual environments and includes AR and augmented virtuality (AV). AR denotes experiences that superimpose virtual objects on the real environment, while AV can be thought of as the reverse, superimposing real objects on a virtual environment.

Fig. 32.1
figure 1

Reality-virtuality continuum by Milgram and Kishino [26]

Later, Mann [27] placed MR in the more generalized concept of “mediated reality,” including VR and other forms of “modulated” reality like diminished reality, which tries to remove objects visually from the perceptible real world (Fig. 32.2). Mann et al. [28] recently illustrated a variety of concepts related to “reality,” including “eXtended Reality (XR),” which covers the extreme ends of the virtuality continuum, i.e., reality and virtuality, which Milgram and Kishino did not include under the concept of MR. They presented a broader term, “Multimediated Reality (All R),” which is multidimensional, multisensory, multimodal, and multidisciplinary, by building a sophisticated taxonomy that covers different continua related to virtuality and reality from the perspective of interactive multimedia.

Fig. 32.2
figure 2

The concept of mediated reality by Mann [27]

Given the context of AR and IoT in this chapter, our focus is on AR/MR rather than immersive VR experiences. The most popular definition of AR is from Azuma in 1997 [1], where he defined AR with three characteristics: (i) it combines real and virtual objects in a real environment; (ii) it runs interactively in real time; and (iii) it registers (aligns) real and virtual objects with each other (in 3D). In a 2017 paper, Azuma predicted a brighter future for AR compared to VR in commercial markets because of its potential to improve the user’s understanding of and interaction with the real world and because it could replace all other display form factors, such as smartphones and tablets, via wearable devices like head-worn AR displays [29]. Recently, the term MR has become synonymous with a highly interactive version or expansion of AR [30]. In that sense, AR/MR involves not only the interaction between users and computers – e.g., virtual entities – but also the surrounding environment in the context of interaction.

Recent developments in AR/MR technology have provided powerful yet affordable computing devices for compelling AR/MR experiences, accompanied by unprecedented public interest in AR/MR [3, 31]. For example, the multiuser mobile AR game Pokémon Go was widely adopted, reaching 25 million active users in the United States and 40 million worldwide [32], and AR/VR startups raised over $3 billion in investment in 2017 [33], while major IT companies, such as Apple, Meta (Facebook), Google, and Microsoft, have been investing in and developing their own AR/MR platforms and technologies.

2.2 Artificial Intelligence and Ubiquitous Computing

As a separate technological thrust, AI denotes a field concerned with intelligence demonstrated by computing machines mimicking the natural intelligence of humans or animals; it has a long research history in computer science and cognitive science dating back to the mid-1950s [34]. The term “artificial intelligence” was first coined by John McCarthy in 1956 [34], who is considered one of the founders and leaders of AI, together with Marvin Minsky, Allen Newell, Arthur Samuel, and Herbert Simon. Recently, the field of AI has gained public attention and experienced significant technological achievements, thanks to novel computing devices, the increasing amounts of available data, and advanced data processing techniques, such as deep learning [35]. AI has become an essential part of the IT industry, and consumers are gaining more and more access to AI when performing tasks in their ordinary lives [36].

Ubiquitous computing (UbiComp), together with the related areas of pervasive computing and calm technology [37], describes the notion that computing occurs anytime and anywhere, so seamlessly that users do not distinctly realize it is happening. Mark Weiser, who is widely considered the father of UbiComp, claimed in his pioneering 1991 article, The Computer for the 21st Century, that “the most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it” [38]. For example, a door could intelligently recognize people who are authorized to enter the room and open automatically for them, and a refrigerator could identify missing food items and issue delivery orders to restock itself.

While AI and UbiComp relate to each other complementarily, the two technologies tend to have fundamentally different intentions. For example, UbiComp envisions computer technology as an invisible tool that is available throughout the physical environment while disappearing from the user’s consciousness, whereas AI is more focused on turning computer technology into social/intelligent agents [39]. Nowadays, these novel concepts, which the research pioneers introduced and proposed decades ago, have become reality and are increasingly absorbed into our daily lives through various technological advances and realizations. Commercialized intelligent virtual agents, IoT, and the concept of smart connected environments, which we describe in the rest of this section, are some of the examples/concepts that can benefit from convergence with AR technology.

2.2.1 Intelligent Virtual Agents

In the AI literature, computational intelligent agents are defined as any entity or device that can autonomously perceive its environment or context through sensor observations and take actions, via physical or virtual actuators, to maximize its chance of successfully achieving its goals [40].
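To make this definition concrete, the following minimal Python sketch illustrates the classic perceive-decide-act cycle that such an agent runs; the sensor, actuator, and policy names here are hypothetical placeholders for illustration and are not taken from any system cited in this chapter.

```python
from typing import Any, Callable, Dict

class IntelligentAgent:
    """Minimal perceive-decide-act loop illustrating the agent definition above."""

    def __init__(self, sensors: Dict[str, Callable[[], Any]],
                 actuators: Dict[str, Callable[[Any], None]],
                 policy: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self.sensors = sensors      # named callables that observe the environment
        self.actuators = actuators  # named callables that act on the environment
        self.policy = policy        # maps observations to actions (the "intelligence")

    def step(self) -> None:
        # 1. Perceive: gather observations from all sensors.
        observations = {name: read() for name, read in self.sensors.items()}
        # 2. Decide: choose actions expected to further the agent's goals.
        actions = self.policy(observations)
        # 3. Act: execute the chosen actions through physical/virtual actuators.
        for name, value in actions.items():
            if name in self.actuators:
                self.actuators[name](value)

# Example: a thermostat-like agent (hypothetical sensor/actuator functions).
agent = IntelligentAgent(
    sensors={"temperature": lambda: 17.5},
    actuators={"heater": lambda on: print(f"heater on={on}")},
    policy=lambda obs: {"heater": obs["temperature"] < 20.0},
)
agent.step()  # prints: heater on=True
```

In practice, the policy could range from a simple rule, as in this thermostat-like example, to a learned model, and the sensors and actuators could be physical devices or virtual entities.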

Not surprisingly, intelligent agents have been an important component of UbiComp systems in different forms, such as system-level decision-making and natural user interfaces like chatbots and virtual assistants [41]. A chatbot, with which users converse through text or text-to-speech, is a software application commonly used in online chat services. Although the concept of chatbots traces back to the 1950s, when Alan Turing proposed the Turing test as a criterion of intelligence, the term “ChatterBot” was originally coined by Michael Mauldin in 1994 [42], and since then, various chatbot systems have been proposed and examined in terms of the user’s perception and the benefits of the systems [43, 44].

The concept of intelligent virtual agents (IVAs) has become familiar to the public thanks to recently commercialized digital assistant systems, such as Amazon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana [45]. These products are embedded in various types of computing devices ranging from desktops to smartphones, even including cars and home appliances, and use sophisticated microphones and signal processing to capture human speech in various places, e.g., one or more rooms of one’s house. In this way, users can interact with IVAs in natural ways, e.g., verbal conversations, and the IVAs can perform a variety of tasks, such as playing music, answering basic questions, checking sports scores or weather, and more, via Internet-based services, while supporting human users [46]. Products like the Amazon Echo Show further add a screen for displaying basic content such as photos, videos, and weather forecasts.

Many IVA researchers focus on developing more effective agent systems and on understanding human perception of, and behavior with, such systems compared to conventional interaction methods, e.g., in terms of improving satisfaction [47] and task performance [48]. Recently, with respect to security and safety issues, research on reliability and trust in IVA systems has gained attention from both researchers and the public and has become an important research topic [49].

2.2.2 Internet of Things

The term Internet of Things (IoT) was first introduced by Kevin Ashton in 1999, and 10 years later, he reiterated his vision of IoT through the addition of various sensors to computers and things to collect information and understand their environment without the need for human input [50]. Over the years, increasing interest in the IoT paradigm resulted in a multitude of industrial and research efforts, and many researchers aimed at capturing its different definitions, applications, implementations, and challenges [51,52,53,54]. For instance, in a highly cited review of IoT, Atzori et al. [51] survey the various definitions and the different perspectives adopted in defining this paradigm, such as “things oriented” or “Internet oriented” points of view. From the “things-oriented” perspective, although the conception of IoT from the Auto-ID Labs [55] initially considered only radiofrequency identification (RFID) tags as things, the concept of things soon evolved to include a broader definition, such as the vision of the International Telecommunication Union that said, “from anytime, any place connectivity for anyone, we will now have connectivity for anything” [56]. Other descriptions, such as the one given by the European Commission, emphasized the intelligent nature of this connectivity by saying, “Things having identities and virtual personalities operating in smart spaces using intelligent interfaces to connect and communicate within social, environmental, and user contexts” [57]. On the other hand, the “Internet-oriented” point of view, such as that of the Internet Protocol for Smart Objects (IPSO) alliance [58], focuses mostly on the networking means for connecting smart things. More recently, Rayes and Salam proposed their definition of IoT, also including the role of people as Internet users: “IoT is the network of things, with device identification, embedded intelligence, and sensing and acting capabilities, connecting people and things over the Internet” [59].

Overall, with the increased ubiquity of IoT devices, advanced AI and IVA technologies in pervasive environments can enable not only more sophisticated interpretation of, and semantic understanding in, the given context but also appropriate, secure, and useful actions performed by pervasive computing modules, such as IoT devices. Deep learning solutions for better data extraction and enhanced security are just a few examples [60, 61]. Reportedly, the total number of connected IoT units will reach 83 billion by 2024, rising from 35 billion in 2020: a growth of 130% over the next four years [62]. Manufacturers in the consumer industry, e.g., Amazon and other companies, have networked their IVA devices with IoT and related smart home appliances, finding an important application field for IVAs and creating a novel research thrust with mutually beneficial overlap between various research fields. Many of the research topics pursued in the IoT field, such as privacy [63, 64], the relationship between edge and cloud processing [65], and network traffic optimization [66], will need to be reevaluated when IoT is deployed in the context of AR and IVAs. Furthermore, some IoT applications, such as smart healthcare [67,68,69], can benefit from the addition of AR and IVA techniques.

2.2.3 Smart Connected Environments

The potential of the aforementioned AI and UbiComp, particularly IVA and IoT technologies, and their products further extends to home automation and more general interactions in smart connected environments with the increasing presence of smart devices [70, 71]. The term smart implies the acquisition of context information and the automated reaction to the context [39, 72, 73]. The concepts of smart objects and environments trace back to the late 1960s [73].

Smart environments are spaces or worlds where various sensors and computers are integrated with everyday objects and connected to each other through networks, as an extension of UbiComp [74]. Smart objects normally have a physical representation in the real world [75, 76], while virtual entities could be designed to be smart in the digital world, mimicking the characteristics of physical objects, such as appearance and behavior [72]. Related research mostly deals with the smart objects’ interactions that take place as reactions to a context, which can be defined by other objects, the environment, and the users, for example, sensing human behaviors or activities [77], effective customized interfaces [78], security [79], and efficient communication techniques among the devices and the framework [80] in the smart environment.

The term “smart” in technology and connected environments became popular together with the emergence of IoT because the two are fundamentally complementary in research and in practical realizations [76, 81]. However, while UbiComp and IoT focus more on tools that integrate into the physical world and disappear from consciousness by adapting to the context, smart connected environments extend this understanding to intelligent and social interaction, covering the concepts of AI and agency. In other words, IoT and agents establish the framework for smart connected environments with smart, acting objects through connectivity [39].

When it comes to smart connected environments, smart homes are an important direction and driving force for research [70, 77, 79, 80, 82]. However, the scope of smart environments is growing beyond the scale of home environments to include urban areas and cities [83]. For example, the paradigm of Industry 4.0 entails the trend toward automation and data exchange in manufacturing technologies and processes [4] and factories incorporating smart machines that are augmented with wireless connectivity, sensors, and actuators, hence the term smart factories [84]. In smart buildings, information such as building operation and maintenance requirements can be continuously monitored through various embedded sensors [85]. Smart cities equipped with IoT devices and connected networks can help us find sustainable solutions to the issues that we encounter as the population grows, e.g., effective and efficient city management and public community services [86]. In such smart environments, where smart objects are embedded all around us and we become oblivious to their existence, AR and virtual avatars/agents can be an effective interface for interacting with the user’s local or even remotely connected environments.

3 The Augmented Reality Internet of Things in Transreality

In the previous section, we covered different technological concepts, which have traditionally developed as separate research thrusts – broadly AR, AI, and UbiComp. Here, we survey previous research focusing on the convergence of these technologies in depth and introduce the concept of transreality, where the synergistic benefits of AR and IoT technologies, empowered by this convergence research, are emphasized in the blended virtual and real world.

The National Science Foundation emphasized the importance of convergence of different fields of research, as no discipline alone can resolve the complex challenges facing the world today [87]. They also differentiated between “convergence research” and various types of multidisciplinary research, as convergence is captured from the earliest to the final stages of the research process, facilitated by teams formed on the basis of intellectual diversity. Most importantly, “convergence research” not only works toward solving complex problems through the integration of knowledge and technology but also contributes to the creation of novel research methodologies and avenues [88], which in our case is focused on the nexus of AR, AI, and UbiComp, and more specifically the convergence of AR and IoT.

3.1 Convergence of Augmented Reality with Artificial Intelligence and Ubiquitous Computing

Over the years, continuous research in the three fields of AR, AI, and UbiComp has led to individual growth in knowledge and advances in technology within each of the three fields. However, in line with the visions of “convergence research,” a trend of converging knowledge and technology is growing among the three fields, facilitating the development of novel research areas and questions. In this section, we present some of the previous research focused on the convergence between the three fields of AR, UbiComp, and AI.

3.1.1 Artificial Intelligence and Ubiquitous Computing

As addressed earlier in Sect. 32.2.2, AI and UbiComp are relatively close to each other and complementary in terms of research goals and directions. The intersection of AI, UbiComp, and communication was first captured in the term ambient intelligence by Eli Zelkha and Simon Birrell in 1998 [73, 76]. The main feature of ambient intelligence was to be human-centric, described by Ducatel et al. [89] as follows: “it is aware of the specific characteristics of human presence and personalities, takes care of needs and is capable of responding intelligently to spoken or gestured indications of desire, and even can engage in intelligent dialogue.” This facilitation of the user is achieved through observing and understanding the environment in a non-intrusive, transparent, and ethically acceptable way [90]. Ramos et al. [91] emphasized the importance of AI in the ambient intelligence vision, dividing the ambient intelligence system into two parts: the operational layer and the intelligent layer. While the operational layer covers the technologies that control and manage the data and actions, such as hardware sensors and actuators, communications, computer graphics, and ubiquitous computing, the intelligent layer includes a multitude of core AI capabilities, such as natural language processing, computer vision, and handling incompleteness and uncertainty, to enable use in a wide range of environments and deal with unprecedented events that might arise.

Over the years, various research efforts were conducted to realize the ambient intelligence vision and identify the important challenges of the field to facilitate various applications. One such challenge is context awareness of the system, which requires a full understanding of the environment and the user’s behavior and activities [92, 93]. Plotz et al. [92] pointed out the limited generalizability of activity recognition approaches necessary for context awareness when they rely on domain knowledge. They presented a domain-independent feature-learning framework, which outperformed classical heuristic approaches tested on four different datasets. Doctor et al. [94] proposed a type-2 fuzzy intelligent agent that would learn from the user through long-term interaction in order to perform on their behalf. In a 5-day experiment, they tested the capabilities of their agent using an intelligent dormitory called “iDorm” equipped with various sensors and actuators for sensing and affecting different aspects of the environment, such as light and temperature. They found that their approach was capable of dealing with environmental and personal uncertainty in a non-intrusive manner. Patterson et al. [95] designed a cognition assistant system for patients, mainly those with Alzheimer’s disease, for spatial and daily tasks. Their system was based on the detection of location, state, and activity. For instance, patients could be reminded of the required sequence of steps to complete a daily task based on their location and their motion.

Recently, UbiComp technology and environments have become more advanced and complex, presenting new challenges that require more robust solutions. Georgievski and Aiello pointed out the importance of AI and planning methods as a solution for continuously evolving UbiComp environments [96]. They surveyed 53 papers focused on AI planning within the UbiComp field and developed a framework classifying the literature and capturing the main elements in three dimensions of environments, planning, and interpretation. Based on their findings, they presented opportunities for future research, such as focusing on user preferences and the requirement of more detailed spatial information for specific scenarios, e.g., recognizing human posture during emergency scenarios.

UbiComp’s invisible nature and its increasing presence in our daily lives have brought about new challenges in terms of trust in the system and users’ sense of security [97]. To address this concern, D’Angelo et al. [97] introduced a trust-based architecture, where entities within the network make trust decisions for other entities using soft computing and data mining techniques. Although their model was capable of learning malicious tactics, they emphasized the importance of more research in this area, considering possible attacks on the network.

In recent years, fueled by public interest, a large body of research at the intersection of AI and UbiComp focused on IVAs, which are becoming more capable of understanding and communicating with humans verbally and nonverbally. Developing one of the earlier voice-based home assistant prototypes, Soda et al. [98] noted that voice-only interactions can be less informative and, borrowing from IVA literature, integrated different embodied virtual agents in their system. Their findings indicate that the presence of the embodied agents increases users’ willingness to interact with the system and enhances their experience by giving feedback about the users’ commands. With the increasing popularity of intelligent virtual assistants, such as Amazon Alexa, Google Assistant, Apple Siri, and Microsoft Cortana, many researchers conducted comparative studies between these consumer products. For instance, López et al. [99] compared the usability of the four intelligent assistants mentioned above in terms of the quality of services they provide. They pointed out opportunities for more studies to understand the usability of such devices in areas such as counseling or working in tandem with robots. To mitigate factors such as social isolation that can negatively influence the lives of older adults, Reis et al. [100, 101] proposed and tested a model that would support older adults in maintaining their social connections using consumer intelligent assistants. Druga et al. [102] investigated the impact of virtual/robotic intelligent agents, such as Amazon’s Alexa and Anki’s Cozmo, on children’s engagement and perceptions of the agent. Various ideas arose from their study, such as the importance of the agent’s mobility, having facial expressions, and mirroring the child’s interaction style. Knote et al. [103] conducted a systematic literature review to capture the theories and principles used in designing smart personal assistants and their application areas. Their findings resulted in five main principles: context awareness, self-evolution, multimodality, anthropomorphism, and platform integration and extensibility. Austerjost et al. [104] pointed out the opportunity of extending the use of smart assistants to spaces other than homes, such as laboratories. They designed a voice-based agent that was capable of interfacing with laboratory equipment and facilitating tasks, such as modifying the settings of a device or reading out values.

In recent years, researchers have utilized simulators to better design and test the features of smart home environments, where AI and UbiComp technologies are incorporated and merged; however, certain limitations still exist, such as unavailability of varied and realistic data from sensors. Lee and Cho et al. [105] and Helal et al. [106] designed simulators equipped with virtual sensors acting as real sensors, IVAs in the environment behaving as human occupants, and a configurable space with a 3D GUI.

IVAs that are designed to interface with IoT devices face many of the security concerns that exist in the basic concepts of UbiComp technologies, such as continuous and invisible monitoring of the user for the purpose of facilitating their needs. Chung and Igora et al. [107] analyzed the potential security vulnerabilities of such IVAs as a means to identify areas in need of security provision. Separately, information from many IVAs can potentially be used for crime investigation. Chung and Park et al. [108] developed a tool for digital forensics called CIFT aimed at understanding the use of Amazon Alexa in this application domain. To account for potential issues that may arise with the dominance of one type of IVA over the others, such as privacy and interoperability, Campagna et al. [109] developed an open IVA called Almond equipped with features and structures to address these issues.

Overall, past research in the intersection of AI and UbiComp covered a wide range of topics, from the development of ambient intelligence prototypes to the development of AI algorithms to better provide for the wide range of scenarios involving ubiquitous computers in areas such as action recognition, system validation, trust, and IVAs. The further adoption of AR interfaces in ambient intelligence spaces will bring about novel interactions and pose new questions for both the industry and academia.

3.1.2 Augmented Reality and Artificial Intelligence

AR, or more broadly MR, environments are capable of creating a multimodal experience for users with visual and auditory stimuli as the most common sources of information [20]. The inclusion of AI in AR experiences can foster improved spatial scalability and context awareness of AR applications.

One of the highly researched topics in this convergence area has been the development of embodied IVAs. To understand the differences in the realization of these embodied IVAs in AR, Holz et al. [110] suggested the terminology of mixed reality agents (MiRA) and presented a taxonomy based on the three dimensions of agency, corporeal presence, and interactive capacity. One of the earliest examples of AR agents, the ALIVE system, was first presented at SIGGRAPH 1993 [111]. The ALIVE system’s semi-intelligent agent comprised various features, such as internal needs and an activity hierarchy, and responded to the user, who was detected by a vision-based approach. Another early example of AR agents was Anabuki et al.’s Welbo [112], which was embodied as a robot and helped users design an AR living room. In their work, they emphasized the value of such agents’ abilities to interact with both virtual and physical entities. Also, to the best of our knowledge, the first academic paper that introduced an interactive virtual human in AR was Balcisoy and Thalmann’s 1997 application of real-time interactive drama with real and virtual actors [113].

Although an AR agent’s capability to interact with physical entities can enhance a user’s experience, it is a technologically difficult task as it requires very precise tracking and semantic understanding of the physical environment [114]. In this area, Barakonyi and Psik et al. [115] presented an animated agent framework that facilitated interactions in AR through sentient computing, forming an understanding of, and being influenced by, its physical surroundings. They discussed specific aspects that apply to IVAs interacting in AR, such as the ability to both respond to and influence the physical environment. For instance, in a multiplayer game called MonkeyBridge developed by Barakonyi and Weilguny et al. [116], embodied agents are able to observe changes in the physical and virtual pieces of the game and accordingly make decisions on how to reach the game’s target. Similarly, to facilitate IVAs’ interaction within dynamically varying physical environments, Checklov et al. [117] developed an approach using the simultaneous localization and mapping framework, allowing the agent to detect planar surfaces on novel objects. More recently, Azad et al. [118] described an MR playground, tested in VR, that would benefit from procedural content generation, enriching the game and taking advantage of the real objects and surfaces in the user’s environment, such as a Mario-style game where the furniture acts as a track for the player’s avatar.

One of the areas where IVAs, and more generally AI, have facilitated AR applications is education. Hantono et al. [119] reviewed the literature on the use of augmented reality agents in the education domain from 2005 to 2015. In their work, they identified some of the main challenges of the field, such as personalized customization of the information for each user, and designing mechanisms to reduce task load through the provision of appropriate instructions. In one of the first examples of integration of IVAs for educational use, Wagner et al. [120] designed an art history game and compared the player’s experience through different devices. Their findings indicated that the AR experience with a virtual character was rated as most enjoyable compared to other game content conditions: text-only, text and audio, 2D image, and non-AR 3D character. Outside of the context of embodied AR agents, Holstein et al. [121] designed a teacher awareness tool using AR glasses called Lumilo. Lumilo is paired with an intelligent tutoring system, through which the teacher can view information about the state of each student and the whole class, such as common errors made when solving problems.

Entertainment is another application area where the convergence of AI and AR has been explored and researched. Cavazza et al. [122] created an interactive storytelling prototype, where the virtual character’s role and the storyline evolve using hierarchical task networks, which are affected by inputs from the user. Dow et al. [123] also presented their interactive drama prototype called Facade, where players interacted with IVAs in the story through either AR or two desktop-based approaches. Their findings indicated that, as expected, the AR interface increased users’ sense of presence; however, this effect did not necessarily lead to higher user engagement, which was potentially caused by the type of interactive scenario and a lack of distance for players to easily engage in the game.

Continuous advances in the quality and usability of AR devices and tracking and AI algorithms show promise for the inclusion of intelligent AR interfaces and agents for facilitating users in a more robust and ubiquitous manner.

3.1.3 Augmented Reality and Ubiquitous Computing

The adoption of UbiComp with pervasive physical components opened new research ideas in many AR research areas, such as AR applications, user interfaces, and tracking. For example, the convergence of AR and UbiComp fostered novel approaches for creating multimodal AR user experiences with the environment, such as the ability to interact with physical and virtual objects through situated AR interfaces. In particular, the entertainment domain has been a very popular topic within the AR and UbiComp convergence field, bringing entertainment to users’ physical space and taking advantage of elements within that space. Cheok et al. [124] designed an AR game called Touch-Space and included elements of ubiquitous, tangible, and social computing in their game. For instance, a player’s location in the physical space could trigger embedded information about potential mines and treasures affecting the user’s score. Montola [125] conducted a survey of pervasive AR games explaining how such games are distributed through time, space, and social structures. He later described four game categories based on how the environment is utilized in the game, such as local games that are developed for the context of a certain location, like tourist games. Vogiazou et al. [126] implemented an AR tag game called citiTag, where players’ locations were transmitted over the network, allowing them to tag players from other teams or get tagged themselves. Their experimental findings reflected players’ engagement in the AR game as they borrowed elements from real-world experiences, such as bending the rules in creative ways or wanting to spend more time within the game. Interestingly, players got involved in testing the limits of the technology, for instance, by trying to understand the boundaries of the tracking system’s accuracy and using it to their benefit. This is in line with Chalmers and MacColl’s discussions of seamful design and the value of taking advantage of the seams that exist in UbiComp environments, such as the limitations in a device’s sensing abilities, suggesting that these seams leave room to later be utilized for different purposes [127].

Over the years, AR and UbiComp researchers have developed multiple user interfaces aimed at creating a richer and more interactive experience. Lee et al. [128] pointed out means for enriching AR interfaces with ubiquitous computers but also noted that there still exists a lack of haptic feedback from virtual objects. To enhance such interactions, they developed a system that takes advantage of mediated vibrotactile feedback on the user’s hands. Han et al. [129] presented an AR haptics interface called the HydroRing, taking into account the importance of ubiquitous interaction with the environment. Using HydroRing, one can sense vibration, pressure, and temperature from various physical and virtual objects, such as the unseen wiring behind a wall or menu items in an AR experience.

Tracking is another area of research that is necessary for creating a continuous and accurate placement of virtual entities in the physical world. Newman et al. [130] pointed out the current tracking limitations of ubiquitous AR approaches since they are mostly developed to be application dependent with limited generalizability. To address this limitation, they introduced their tracking architecture called the UBITRACK aimed at covering the needs of different applications within their proposed Milgram-Weiser continuum – considering the spectrum of real-virtual environments and monolithic and ubiquitous computing approaches. Singh and Mantri reviewed the different tracking methods used for AR applications and discussed their limitations, proposing a ubiquitous hybrid tracking approach taking advantage of both vision-based and sensor-based methods [131].

Similar to the convergence of UbiComp and AI, advances in IoT technology and its popularity resulted in a sudden increase of research and new interaction metaphors for AR interfaces [20]. The majority of the research in the AR and IoT convergence area viewed and utilized AR as an interface to communicate with IoT devices, in some cases comparing it with previously used traditional interaction approaches. This point of view was reflected by Gimenez and Pous [7] as well, noting, “AR has recently been touted as one of the ideal interfaces for IoT.” For instance, Garcia-Macias et al. [132] described the idea of sentient visors for the IoT paradigm, as it comprises both virtual and physical things, allowing identification of smart objects, relaying information about their services, and providing means to interact with them. They captured this idea in their mobile-based prototype called the UbiVisor and illustrated its application in a scenario where a plant pot was equipped with humidity and temperature sensors. Users could access the sensor information and receive situated notifications for watering the plant by pointing their mobile devices at it. Similarly for mobile-based interactions, Wirtz et al. [133] noted that issues such as the need to communicate with smart objects through mediating apps restrict the ubiquity of such interactions. They addressed both Internet connectivity and interaction limitations through an AR prototype, where the interaction paradigms were provided directly by each smart object through situated graphical user interfaces. In a similar approach, Heun et al. [134] pointed out that 2D remote interfaces can add unwanted complexity to the user experience in many scenarios involving interactions with smart objects. They proposed The Reality Editor, allowing users to control smart objects through situated UIs instead.

Several researchers conducted studies to understand the impact of 2D remote interfaces and their situated counterparts in AR environments [15, 135]. Liu et al. [135] emphasized the importance of receiving feedback from the situated UI, designing three task difficulty levels and comparing them through four different interfaces: text, 2D interface, situated AR overlay without feedback, and situated AR overlay with feedback. Their findings indicated the importance of feedback in interface design, as participants performed significantly better in the situated AR overlay with feedback condition compared to the others. Similarly, Jo et al. [15] compared situated interactions with traditional 2D interfaces, for instance, for turning the lights on and off. Interactions with the situated UI had several benefits, such as ease of use, naturalness, and speed, although there were drawbacks such as fatigue, which can be explained by the novelty of the interaction. In search of more universal interfaces for smart and actuated devices, Kasahara et al. [17] designed the exTouch system, where, by pointing their mobile devices at physical actuated objects, users were able to manipulate them through manipulation of the virtual counterpart.

Other researchers utilized AR glasses instead of mobile AR devices to facilitate the IoT interaction space. In a survey of smart glasses, Lee and Hui [136] described and categorized the capability of smart glasses in facilitating different modes of interaction, such as on-device touch or hands-free input. For instance, Zhang et al. [137] and Kollee et al. [138] presented head-based and gesture-based methods utilized through HMDs to interact with smart objects.

While smart home environments are often considered for applications of the convergence of AR and IoT, the benefits of this convergence have been actively explored in smart factory environments as well. Hao and Helo et al. [139] proposed using AR as a means to enhance the interaction between humans and machines in smart manufacturing environments. They presented a device maintenance scenario in which operators can benefit from situated support using smart glasses, such as viewing the device’s datasheets.

Interestingly, this convergence area’s main focus has been on the utilization of AR interaction metaphors for IoT devices, and less attention has been given to the utilization of IoT for enhancement of AR experiences. Although in various AR projects, researchers have developed Internet-connected prototypes for input/output purposes [140, 141], only a few have leveraged IoT devices and interaction schemes in the realization of their AR experiences [15, 142, 143]. This limitation might be because most AR experiences are envisioned and developed to be ego-centric with the necessary devices (i.e., sensors and displays) collocated with the user and not distributed in the environment. Also, the perception of potential performance complications by utilizing IoT standards might be another contributing factor in the slow adoption of IoT devices in the service of AR experiences [144]. However, in recent years, more research avenues have opened up for realizing ubiquitous and distributed AR experiences fueled by the increase in popularity and use of IoT devices among the general public.

We predict that over the next years, further advances in AR user interfaces and technologies toward everyday use cases in a smart home environment will give rise to the standardized integration of IoT devices into AR frameworks and the development of new IoT devices specifically for the purpose of enhancing AR experiences with improved sensing and multimodal sensations, e.g., driven by the gaming industry.

3.2 Transreality and The Augmented Reality Internet of Things

In the previous section, we covered current convergence research involving different combinations of AR, AI, and UbiComp technologies by describing the concepts and summarizing some prior literature. Here, we introduce the concept of transreality, in which both physical and virtual objects can sense and understand the user’s and other peripheral activities in the environment and dynamically and seamlessly interact with each other as an integration of the three research thrusts, and we describe other related concepts.

The term transreality often appears in a gaming context, e.g., transreality games, a type/mode of video game that combines playing the game in a virtual environment with game-related physical experiences in the real world [145]. Beyond the gaming context, such transreality could take advantage of various pervasive/ubiquitous, mobile, location-based, and AR/VR/MR technologies to extend its implications to our daily lives, aiming at more effective and efficient human-computer interactions. Martin and LaViola [18] introduced a Transreality Interaction Platform (TrIP) to “enable service ecosystems which situate virtual objects alongside virtualized physical objects and allow for novel ad-hoc interactions between humans, virtual, and physical objects in a transreality environment” and presented a proof-of-concept implementation while describing the system architecture, data model, and query language.

Similar to transreality, such concepts combining the real and virtual worlds in the domain of UbiComp and AR/MR/VR have been proposed and researched for more than a decade [146,147,148]. For example, Kim et al. defined ubiquitous virtual reality (U-VR) as “a concept of creating ubiquitous VR environments which make VR pervasive into our daily lives and ubiquitous by allowing VR to meet a new infrastructure, i.e. UbiComp” [146].

Paradiso and Landay also presented the concept of cross-reality – a ubiquitous MR environment that comes from the fusion of technologies, such as ubiquitous sensor/actuator networks and shared online virtual worlds [147]. Following this concept of real-virtual convergence (i.e., cross/dual reality), Lifton et al. [148, 149] proposed several prototypes bridging the physical and virtual worlds through networked sensors and actuators. One example is the ShadowLab, inspired by Second Life [150], where data/information from 35 sensors distributed on a floor of the MIT Media Lab were used to visualize features in the virtual environment, such as the amount of current drawn from outlets and detected activity levels (i.e., levels of sound, motion, and vibration).

Mirror world, originally introduced in David Gelernter’s 1991 book Mirror Worlds [151], is yet another similar paradigm, described by Ricci et al. [152] to capture the convergence of UbiComp, AI, and AR. In this paradigm, the new interaction space provided by AR and IoT can extend operations beyond limited environments, such as smart rooms, to much larger scales, for example, smart cities and communities. In the larger spaces supported by this paradigm, objects share counterparts in the real or the mirror world, are capable of sensing information from either world, and can be actuated by the inhabitants of those worlds (i.e., humans or agents).

Another related concept is hybrid reality, which Seo et al. described in a smart home context [153]. They employed hybrid reality particularly to overcome difficulties in the evaluation of smart homes and user experience, such as differences in the layout of smart homes and in the number of smart appliances. In this paradigm, real objects are superimposed with virtual ones, and the virtual environment allows for adding new sensors and actuators. This approach provides opportunities for both developers and potential residents of such spaces to design the features of the space based on the residents’ needs and to resolve any potential issues before building the physical space.

The concept of Digital Twins is also one of the realizations of this transreality trend. Although the concept was anticipated earlier, e.g., in the book Mirror Worlds [151], the term “digital twin” became widely acknowledged through publications by Michael Grieves, who applied the concept to industrial manufacturing [154]. A digital twin is basically an integration/connection between a physical product (or living/nonliving entity) and its digital/virtual replica in the virtual world. Through interactions with the virtual replica, users gain more flexibility to understand and control the physical world beyond the spatial and temporal limitations of the real world. The concept, which aims to create an exact virtual twin of the physical part, can be contrasted with other cross-reality concepts that generally focus on the mere synchronization of the physical world and a digital representation, such as an abstraction of some aspects of the physical world. Glaessgen and Stargel [155] pointed out the difficulty of appropriate maintenance and sustainable management of complex vehicle systems at NASA and the Air Force and proposed the digital twin paradigm, a high-fidelity simulation space containing a digital replica of the real object equipped with all the necessary features, such as the physical model, sensor information, and maintenance data captured from the real twin, to resolve these challenges. Tao et al. [156] also emphasized the importance of the seamless integration of the physical and virtual worlds to realize more effective digital twin environments and introduced a digital twin framework using VR/AR technologies.
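As a rough conceptual sketch of this idea (not a reproduction of any system cited above, and with all class and method names being hypothetical), a digital twin can be thought of as a virtual replica that mirrors sensor readings from its physical counterpart and forwards commands issued on the replica back to the physical device:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class DigitalTwin:
    """A virtual replica mirroring the state of a physical device (conceptual sketch)."""
    device_id: str
    state: Dict[str, Any] = field(default_factory=dict)               # mirrored sensor values
    command_channel: Callable[[str, Any], None] = lambda k, v: None   # sends commands to the physical twin

    def sync_from_physical(self, sensor_readings: Dict[str, Any]) -> None:
        # Update the virtual replica whenever new readings arrive from the real twin.
        self.state.update(sensor_readings)

    def actuate(self, command: str, value: Any) -> None:
        # A user (or simulation) interacting with the replica drives the physical device.
        self.command_channel(command, value)
        self.state[command] = value  # optimistic update; confirmed on the next sync

# Usage: mirror a pump's sensor data and send a control command through the twin.
twin = DigitalTwin("pump-01", command_channel=lambda c, v: print(f"send {c}={v} to pump-01"))
twin.sync_from_physical({"pressure_kpa": 312.4, "rpm": 1450})
twin.actuate("rpm", 1200)  # prints: send rpm=1200 to pump-01
```

In a full digital twin, the mirrored state would additionally feed physics models, analytics, and visualization (e.g., VR/AR views), rather than a plain dictionary as in this sketch.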

Despite the variations among the concepts addressed above, the essence of their goals and approaches is a space supporting ubiquitous communication between continuously connected physical and virtual entities in an intelligent manner, even for those focused on specific domains. For the realization of such transreality environments, where AR, UbiComp, and AI are synergistically converging, the integration of AR and IoT is essential and invaluable. Applin and Fischer [157] pointed out that the convergence of IoT and AR can better allow people to make changes to the world around them and discussed the cultural implications of these blended physical-virtual reality technologies from an anthropological perspective.

Given this timely interest and need, we describe a generalizable AR-IoT framework and interaction spaces in the following sections while emphasizing their importance for achieving more effective and efficient human-computer interactions in the future of transreality.

4 AR-IoT Framework and Interaction Design

Several AR-IoT frameworks have been proposed previously, considering both general-purpose AR-IoT services [6, 158] and specific scenarios, such as smart home environments [80, 153]. Here, we present an AR-IoT framework built around three key aspects briefly mentioned in Sect. 32.1, i.e., (1) distributed and object-centric AR data management, (2) AR-IoT object-guided tracking, and (3) context-based AR interaction and content interoperability, and describe the resulting benefits of the convergence of AR and IoT, which could introduce novel smart and physically interactive AR services. An example of such a high-level AR-IoT framework blended into smart and interactive mixed environments is shown in Fig. 32.3. Figure 32.4 presents a potential architecture design, which describes five steps of data flow in the framework: (1) AR-IoT service providers and manufacturers upload the data for the AR-IoT products to the cloud server, which can be used for the user’s interaction with the products, such as tracking features and the services that the products can offer; (2) the local edge server near the AR-IoT clients (or users) synchronizes with the cloud server the data for the AR-IoT products available in the local client environment, considering the user’s profile and context; (3) each AR-IoT product in the client environment also synchronizes its data with the edge server, so that the AR-IoT clients can identify and interact with the products; (4) the users interact with or control the AR-IoT products through natural and intuitive AR interfaces; and (5) the user experience, profile, and situational context are continuously updated in the edge server, so that the AR-IoT products can provide timely, appropriate, and relevant services to the users (see the sketch after Fig. 32.4).

Fig. 32.3
figure 3

AR-IoT framework in the transreality paradigm: smart IoT objects can store and provide the data necessary for smart and interactive AR experiences, making “anywhere” and “anything” AR interactions available to the users

Fig. 32.4
figure 4

An example of the AR-IoT architecture
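
To make the five-step data flow above more concrete, the following sketch shows one way the roles could be wired together in code. It is a minimal illustration, not the actual implementation; all class, method, and field names (e.g., ProductRecord, EdgeServer, ar_client_interact) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProductRecord:
    """AR-IoT product data uploaded by a provider/manufacturer (step 1)."""
    product_id: str
    tracking_features: str        # e.g., URI of serialized feature descriptors
    services: list                # e.g., ["power_toggle", "manual"]


@dataclass
class CloudServer:
    catalog: dict = field(default_factory=dict)

    def upload(self, record: ProductRecord):                  # step 1
        self.catalog[record.product_id] = record


@dataclass
class EdgeServer:
    cloud: CloudServer
    cache: dict = field(default_factory=dict)                 # product data near the clients
    state: dict = field(default_factory=dict)                 # latest state per product
    context: dict = field(default_factory=dict)               # user profile/context

    def sync_from_cloud(self, local_product_ids):             # step 2
        for pid in local_product_ids:
            if pid in self.cloud.catalog:
                self.cache[pid] = self.cloud.catalog[pid]

    def sync_product_state(self, product_id, product_state):  # step 3
        self.state[product_id] = product_state


def ar_client_interact(edge, user_id, product_id, command):   # steps 4 and 5
    record = edge.cache[product_id]
    if command not in record.services:
        raise ValueError("unsupported service")
    # ... render the AR interface and send the command over the local network ...
    edge.context[user_id] = {"last_product": product_id, "last_command": command}


# Usage with hypothetical identifiers
cloud = CloudServer()
cloud.upload(ProductRecord("lamp-03", "coap://lamp-03/features", ["power_toggle"]))
edge = EdgeServer(cloud)
edge.sync_from_cloud(["lamp-03"])
edge.sync_product_state("lamp-03", {"power": "off"})
ar_client_interact(edge, "user-01", "lamp-03", "power_toggle")
```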

The IoT objects will be accessible and interactable to the AR users in a distributed manner and can provide the appropriate and necessary data for the AR experience, such as tracking data or context-relevant service content [5]. A seamless AR experience requires robust techniques for tracking the physical environment and registering virtual content. The tracking process must run in real time, which normally demands high computational power, e.g., for extracting, describing, and matching 2D/3D visual features from the scene. This computational burden can be reduced using the AR-IoT framework, which stores such tracking information in the distributed IoT objects and provides it to the AR clients directly over the network. The virtual content provided to the users should also be context-appropriate. Here, the context could include the AR use case, for example, displaying information for consumer products such as appliance controls and instruction manuals, or the heterogeneous AR devices that the users are using, for instance, mobile interfaces, projection-based spatial AR (SAR), HMDs, and even voice or other audio effects. The AR-IoT framework should be adaptive and flexible enough to store such context-relevant content in the IoT objects and provide it to the users. Additionally, through the AR-IoT framework, the users can operate the IoT devices efficiently via AR interfaces associated with the smart environment, building on both the IoT communication capability and the graphical AR environment. Such an AR-IoT framework can be applied to a wide range of applications and services that connect everyday IoT objects and provide AR experiences to the users, such as AR manuals, training, control, and instructions. It will offer comprehensive interfaces for the users to access and interact with AR entities and IoT objects, engaging the users in the shared physical-virtual space [159]. Note that this approach can also incorporate a filtering feature, whereby the AR clients identify nearby IoT target objects and augment the candidate virtual entities while obtaining context-relevant datasets from the objects [15].
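
As a small illustration of the tracking-offload idea above, an AR client could prefer whatever tracking data an IoT object advertises and fall back to expensive on-device extraction only when nothing is provided. This is a hedged sketch; the function and field names are assumptions, not part of any published framework.

```python
def acquire_tracking_data(iot_object: dict, local_extractor):
    """Prefer tracking data served by the IoT object itself; fall back to
    on-device computation only when the object provides none."""
    advertised = iot_object.get("tracking_data")   # e.g., fetched over the local network
    if advertised is not None:
        return advertised                          # no heavy on-device computation needed
    return local_extractor()                       # expensive fallback on the AR client


# Hypothetical usage: the lamp ships its own natural-feature descriptors
lamp = {"id": "lamp-03", "tracking_data": {"type": "natural_features", "descriptors": b"..."}}
data = acquire_tracking_data(lamp, local_extractor=lambda: {"type": "slam", "map": None})
print(data["type"])   # -> natural_features
```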

An example interaction mechanism that employs the AR-IoT paradigm in human-environment interactions is shown in Fig. 32.5c, compared with conventional interactions with smart IoT objects and AR contents (Fig. 32.5a and 32.5b, respectively). We should note that a smart IoT object is designed to blend seamlessly into the physical environment as a ubiquitous computer, so the users interact with such smart objects based on their prior interaction experience and intuition in that environment, which sometimes conflicts with the actual affordances that the smart objects can provide. AR interaction can be more informative in that respect because the users can see additional information about the objects and the environment through augmented virtual content, which can be updated adaptively in real time. AR-IoT interaction takes advantage of both physical IoT objects and adaptive AR content, meaning that the users can control their surrounding environment more effectively and efficiently using all the affordances that IoT objects and AR content can offer [160]. Jo et al. [15] proposed an architecture for such an AR-IoT interaction combining an AR interface with the IoT for shopping services. They developed a proof-of-concept prototype of the AR framework, tested on IoT lamps, and an in situ interaction method to support direct control of the IoT object.

Fig. 32.5
figure 5

Illustration of different interaction paradigms with smart objects: (a) ubiquitous computer interaction, (b) AR interaction, and (c) AR-IoT interaction. The letter I is used to indicate smart IoT objects

In the following sections, we will describe the three key aspects of the AR-IoT in detail while drawing a clearer picture of what the IoT-enabled AR of the future will look like, how it will operate, and what it will be capable of providing.

4.1 Object-Centric AR-IoT Data Management

This section briefly describes AR-IoT frameworks and services along with dataset management for everyday objects. Additionally, we summarize several architectures, data processes, data structures, and content representations proposed in the published research for physical objects to interact with AR. AR-IoT services commonly need to manage generic/specific data and service content for their constituent objects or augmentation targets, for example, visual features for AR recognition and tracking, virtual content and information about the object itself, control interfaces, and organized additional content for operation. IoT objects that can communicate with the AR clients in the environment allow the users to access such data and content through the AR-IoT interaction. The communication and information exchange between the AR client and the IoT objects can occur directly between them as well as through a regional IoT server [5]. Herein, we review current approaches to managing such physical object data for use in AR, e.g., architecture and data handling, and discuss how they can be extended and scaled to the level of IoT [15].
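
The kind of per-object dataset discussed here could be captured in a simple schema such as the following Python sketch. The field names and URIs are illustrative assumptions rather than a standardized format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ARIoTObjectData:
    """Per-object AR dataset held by (or on behalf of) an IoT object."""
    object_id: str
    recognition: Dict[str, str]        # e.g., {"type": "natural_features", "uri": "..."}
    virtual_content: List[str]         # URIs of models/overlays for augmentation
    control_interface: Dict[str, str]  # service name -> command endpoint
    extra_content: Dict[str, str] = field(default_factory=dict)  # manuals, tutorials, ...


# Hypothetical record for a smart lamp
smart_lamp = ARIoTObjectData(
    object_id="lamp-03",
    recognition={"type": "natural_features", "uri": "coap://lamp-03/features"},
    virtual_content=["https://vendor.example.com/lamp-03/overlay.glb"],
    control_interface={"power_toggle": "coap://lamp-03/actuators/power"},
    extra_content={"manual": "https://vendor.example.com/lamp-03/manual.html"},
)
```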

Early work on interacting with real objects through AR interfaces focused on AR frameworks capable of ubiquitous communication between physical objects and the AR device [161]. Recent works have attempted to map the sensor-object relationship and to apply filtering approaches that reduce the search space to the nearby surroundings [5, 162]. Specifically, in one notable framework with respect to AR, Iglesias et al. [163] suggested an intelligent selection of resources based on the user’s attributes, user-object proximity, relative orientation, resource visibility, and the AR interaction connecting the object. They developed an AR-based object browser with context-aware representation of resources. Similarly, Ajanki et al. [164] constructed an AR interface with contextual information and defined context-sensitive virtual information about people, offices, and artifacts. They suggested a filtered AR concept to visualize selected information about teaching and research projects for visitors at a university.

Figure 32.6 shows a possible data management/process flow for AR-IoT frameworks based on physical objects, considering AR datasets with scalability. The AR service users can retrieve fiducial visual markers or natural features from the surrounding objects, for which the relationship between the physical and virtual objects must be defined in advance. The users, holding their AR devices, can view the filtered AR objects, which correspond to the nearby IoT-capable objects, based on their relative distance or direction from the user. Then, the client AR system directly receives the “feature” and “content” information for the object with the attached sensor. The AR users can experience an efficient transreality environment (e.g., an IoT control interface), in which virtual objects are mixed into the real scene, by donning an HMD or using a mobile phone with an attached camera module [5].
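
The distance/direction-based filtering step described above could be approximated as follows. This is a simplified geometric sketch; the positions, thresholds, and object records are hypothetical.

```python
import math

# Hypothetical nearby-object records, e.g., populated from beacon advertisements
nearby_objects = [
    {"id": "clock-01", "position": (1.2, 0.0, 2.5)},
    {"id": "lamp-03",  "position": (4.0, 0.0, -1.0)},
]


def filter_candidates(user_pos, view_dir, objs, max_dist=3.0, max_angle_deg=45.0):
    """Keep objects that are close enough and roughly in front of the user."""
    selected = []
    for obj in objs:
        dx = [o - u for o, u in zip(obj["position"], user_pos)]
        dist = math.sqrt(sum(c * c for c in dx))
        if dist == 0 or dist > max_dist:
            continue
        norm_v = math.sqrt(sum(c * c for c in view_dir))
        cos_angle = sum(a * b for a, b in zip(dx, view_dir)) / (dist * norm_v)
        if math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))) <= max_angle_deg:
            selected.append(obj["id"])
    return selected


print(filter_candidates((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), nearby_objects))  # -> ['clock-01']
```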

Fig. 32.6
figure 6

Distributed data management scheme for “anywhere” AR services and interactions with “anything,” i.e., physical IoT objects. The letter I is used to indicate IoT objects

The cloud as a computing resource has been investigated in many studies considering pervasive and ubiquitous AR experiences [165]. Using the cloud is beneficial for computation time because it offloads the heavy work of matching large quantities of features from mobile devices with limited computing capability. Recent research has focused on the process of registering and managing AR datasets with cloud computing, and is thus concerned with how to fetch tracking information and AR presentation datasets from the remote cloud server. For example, the users receive the collected AR attributes and tracking information, which are shared via the cloud server, to access and interact with the surrounding objects. Then, AR users equipped only with an AR browser program, without tracking information or AR presentation datasets stored on the device, can easily connect all of the things through the prebuilt relationship mapping between the physical objects and the relevant virtual contents.

A more recent trend is to make better use of the computing resources adjacent to the user’s surroundings rather than relying on computing services at the far end of the network, such as a remote server or the cloud [5]. It will still be difficult for cloud services to support scalability to the level of “everywhere.” An alternative is to connect to a single area server, which covers only a particular local area, e.g., a room or home, while managing a limited number of objects [165]. The adjacent computing approach can be used to solve problems such as bottlenecks and the detection of moving objects by a remote server. This approach is similar to the fog computing architecture in the sensor network domain, which emphasizes latency reduction with high-quality service and handles datasets at the network edge [166,167,168,169]. The AR users can connect directly to the objects in the surrounding area because the sensors attached to the objects can detect the user’s position within certain ranges in real time [166].
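
A toy illustration of this adjacent-computing idea is to probe a nearby edge gateway first and fall back to the remote cloud only when the edge cannot meet a latency budget. The endpoints, budget, and selection logic below are assumptions for illustration only.

```python
import time
import urllib.request

# Hypothetical endpoints: a nearby edge gateway and a remote cloud server
ENDPOINTS = {
    "edge":  "http://edge-gateway.local/ar/health",
    "cloud": "https://ar-cloud.example.com/ar/health",
}


def measure_latency(url: str, timeout: float = 1.0) -> float:
    """Round-trip time of a simple health probe; infinity if unreachable."""
    start = time.monotonic()
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return time.monotonic() - start
    except OSError:
        return float("inf")


def pick_compute_target(budget_s: float = 0.05):
    """Prefer the edge server when it meets the latency budget; else try the cloud."""
    latencies = {name: measure_latency(url) for name, url in ENDPOINTS.items()}
    for name in ("edge", "cloud"):
        if latencies[name] <= budget_s:
            return name, latencies[name]
    best = min(latencies, key=latencies.get)
    return best, latencies[best]
```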

Some research has proposed detailed content structures for resource management, another issue in providing AR datasets on Web-based platforms. In some ways, the contexts of Web-based platforms are similar to an AR environment with IoT services and content, where most IoT services are implemented as applications. One promising direction is to use the Web to support interactions with physical objects, as exemplified by Google’s Physical Web [170]. Here, the smart objects have their own URLs and can exhibit their own dynamic and cross-platform content, represented in standard languages such as HTML and JavaScript. We can envision a future where various types of IoT services, including even AR, will be available under a unified Web framework, that is, the “webization” of things. Ahn and coworkers, for example, presented a content structure as an extension to HTML5 for building webized mobile AR applications [171]. This allows a physical object to be referenced and associated with its virtual counterpart. Kim et al. [172] also presented an AR content structure for building mobile AR applications in HTML5, as on the Web. They used an extended presentation logic of HTML to apply the current Web architecture and a referencing method matching physical and virtual resources. As a similar AR data structure, Müller et al. [173] introduced a custom XML-based format to define AR manual structures for home appliances. Considering the concept of our AR-IoT framework, we present an example of associating a smart window (physical object) with virtual objects (augmentation) showing weather information in Fig. 32.7. We augment the smart window with sensor data fed from physical weather sensor stations while receiving the related information from the Web at the same time.

Fig. 32.7
figure 7

Webized AR content representation in which virtual data are associated with a Web-accessible physical resource [170]: (a) virtual and physical resources of webized AR content and (b) an example associating a physical sensor dataset [174]
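
In the spirit of the webized content representation in Fig. 32.7, the association between a Web-accessible physical resource and its virtual augmentation could be expressed as a small, self-describing record. The URLs, field names, and binding syntax below are hypothetical; the cited works [171, 172] define their own HTML5-based structures.

```python
import json

# Hypothetical webized AR content record: a Web-addressable physical sensor
# resource is referenced by the virtual augmentation attached to a smart window.
webized_content = {
    "physical_resource": "https://weather-station.example.org/sensors/outdoor",
    "target_object": "urn:ar-iot:smart-window-01",
    "augmentation": {
        "template": "weather_panel.html",                                # rendered by the AR browser
        "bindings": {"temperature": "$.temp_c", "humidity": "$.rh"},     # sensor fields -> UI slots
    },
}

print(json.dumps(webized_content, indent=2))
```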

In our AR-IoT framework for transreality, there are many different types of IoT devices in the user’s environment, so we should consider the characteristics of AR content according to the related IoT devices. This is similar to website components that have different configurations depending on the platform, such as mobile or desktop computing devices. Thus, for the IoT-enabled AR platform to be applied naturally anywhere, it should adaptively control the degree of AR content representation depending on the nature of the IoT device.
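
A minimal sketch of such adaptive control of the content representation is a lookup from the client device type to a representation level, with a graceful fallback for unknown devices. The device categories and labels are illustrative assumptions.

```python
# Hypothetical mapping from AR client capability to the degree of content representation
REPRESENTATION_BY_DEVICE = {
    "hmd":    "full_3d",        # spatially registered 3D overlay
    "mobile": "2d_overlay",     # screen-space panel on the phone camera view
    "sar":    "projected",      # projection-based spatial AR on surfaces
    "voice":  "audio_summary",  # no display: summarize the content verbally
}


def select_representation(device_type: str) -> str:
    """Degrade gracefully when the device type is unknown."""
    return REPRESENTATION_BY_DEVICE.get(device_type, "2d_overlay")


print(select_representation("hmd"))     # -> full_3d
print(select_representation("watch"))   # -> 2d_overlay (fallback)
```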

4.2 Scalable AR-IoT Recognition and Tracking

AR research primarily focuses on overlaying virtual 3D models, that is, on how to realistically integrate augmentations with the real world. The most important technical challenges for AR are fast, accurate, and stable recognition and spatial tracking and registration (or pose estimation) of objects in the environment. AR tracking technologies using the context of physical objects have gradually increased, not only for robust tracking quality but also for efficient data management and the scalability of the AR experience [130].

There are many research works on AR tracking methods that use computer vision algorithms and data from sensors distributed in the environment (see Fig. 32.8). Claros et al. proposed a fiducial marker-based AR medical system using a wireless sensor network (WSN) to monitor real-time information with biometric signals collected from patients [175]. The WSN was used to process semantic information collected from distributed sensors, and a marker ID was overlaid on the real world to visualize perceptual information (e.g., temperature and humidity). Mihara et al. implemented a light-emitting diode (LED) AR marker attached to a TV, reading LED blink patterns rather than fiducial markers [176]. Despite the ease of using fiducial markers or pattern IDs, one of their problems is that each physical object must have its own marker, which makes a scalable AR experience difficult. For example, if there are a large number of objects in the environment around the AR user, the same or an even larger number of markers will be needed, which could make the environment unnatural and cluttered. Thus, scaling such marker-based methods to millions of IoT objects would be even more difficult – either the level of accuracy or the capability of real-time response is likely to suffer. Placing and attaching markers to thousands of everyday objects is also not a practical solution. Additionally, because there is no single universal recognition and tracking method covering all types of objects, a multitude of algorithms must be used collectively. Therefore, in a typical situation, one cannot determine a priori which algorithm would be best suited to recognizing an object before it is recognized; all algorithms must be attempted exhaustively, which again results in significant latency.

Fig. 32.8
figure 8

Computer vision-based object recognition and tracking solutions for AR using (a) a fiducial marker, (b) natural features, and (c) a 3D model-based approach

Another popular tracking technique is the natural feature-based method, which identifies an object and computes its pose using primitive geometric visual features and properties of the object detected by a camera sensor [177]. Different feature-based techniques vary in their robustness and may be applicable to certain classes of objects. Such feature-based methods require a high volume of features (and data) and strong computing power to establish robust feature extraction and matching, and they often require a preliminary learning phase to handle difficult matching conditions, for example, angled views, dark lighting, and occlusions [178].
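
For reference, a typical natural feature pipeline of the kind described here can be sketched with OpenCV’s ORB detector and a brute-force Hamming matcher; the image file names and the distance threshold below are placeholders.

```python
import cv2

# Placeholder image paths: a reference photo of the product and a live camera frame
query = cv2.imread("product_reference.jpg", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)            # fast binary features
kp1, des1 = orb.detectAndCompute(query, None)   # keypoints + descriptors (reference)
kp2, des2 = orb.detectAndCompute(scene, None)   # keypoints + descriptors (scene)

# Brute-force Hamming matching with cross-check for robustness
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
good = [m for m in matches if m.distance < 50]  # simple distance threshold
print(f"{len(good)} good matches out of {len(matches)}")
```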

In case the target object has few visual features, e.g., textureless objects, a template image matching approach is often used; however, this method has some disadvantages for robust 3D tracking [164]. Different approaches to the problem of tracking textureless 3D objects have been developed for certain situations, such as under poor lighting conditions, during partial occlusions, and against cluttered backgrounds, but the quality of tracking still remains relatively low [178]. Some researchers have proposed 3D model-based tracking methods that attempt to recognize and track target objects by matching and fitting a 3D wireframe model to the edges extracted from a camera image [179]. However, such model-based approaches also have drawbacks; for example, they require a good initial solution, converge slowly, and it is not always clear which reference 3D wireframe model would be the most suitable. Despite such complications, they remain viable alternatives given the absence or lack of feature information in some specific scenarios.

Finally, for a robust and visually unobtrusive tracking method, the currently dominant AR tracking approach is based on the simultaneous localization and mapping (SLAM) technique, which creates and updates a map of an unknown space where the user is located while simultaneously tracking their location in it [3]. SLAM was originally proposed in the field of robotics but is now regarded as a strong alternative to traditional AR tracking approaches because it avoids the need for prior information, such as reference images or 3D models. SLAM has overcome many limitations of the previous tracking methods and has become more robust over the past ten years, although typical limitations remain, such as the high computational cost of tracking and mapping simultaneously and tracking loss caused by fast camera motion.

Our proposed AR-IoT framework with multiple network connections in the transreality environments will address this scalability issue in a distributed and object-centric manner.
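
As a concrete illustration of this object-guided approach, the IoT object itself can advertise which recognition/tracking method applies to it, so that the AR client dispatches directly on that hint instead of trying the marker, feature, model-based, and SLAM methods above exhaustively. The sketch below is illustrative; the method names and dispatch table are assumptions.

```python
def make_tracker(advertised: dict):
    """Pick a tracker based on what the IoT object advertises about itself."""
    method = advertised.get("method", "slam")
    if method == "fiducial_marker":
        return ("marker_tracker", advertised["marker_id"])
    if method == "natural_features":
        return ("feature_tracker", advertised["feature_uri"])
    if method == "model_based":
        return ("edge_model_tracker", advertised["model_uri"])
    return ("slam_tracker", None)   # environment-level fallback


# Hypothetical advertisements from two different objects
print(make_tracker({"method": "fiducial_marker", "marker_id": 17}))
print(make_tracker({"method": "natural_features", "feature_uri": "coap://lamp-03/features"}))
```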

4.3 Context-Based AR-IoT Interactions and Content Interoperability

The use of the AR-IoT framework can also make physical object attributes and interfaces more intuitive among the large number of objects and data in the environment by providing context-relevant visual clues that can guide possible physical-virtual interactions. The most prevalent IoT applications often involve object control interfaces in in situ or remote scenarios. IoT objects consist of sensors for incoming data, networking modules for wireless capability, and actuators to control the object’s functionality [179]. Specifically, actuators operate in response to the user’s interaction, such as setting object properties, e.g., turning lights on/off or opening/closing doors.
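
As a hedged example of issuing such an actuation command, an AR client could send a small JSON message to an endpoint exposed by the object when the user taps the augmented power button. The endpoint, payload format, and transport (plain HTTP here) are assumptions; many other protocols could carry the same command.

```python
import json
import urllib.request

# Hypothetical REST endpoint exposed by the lamp's IoT firmware
LAMP_ENDPOINT = "http://lamp-03.local/actuators/power"


def send_actuation(value: str) -> int:
    """Issue the command triggered by tapping the augmented 'power' button in AR."""
    payload = json.dumps({"value": value, "source": "ar-client-01"}).encode("utf-8")
    req = urllib.request.Request(
        LAMP_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=2.0) as resp:
        return resp.status   # e.g., 200 on success


# send_actuation("on")  # would switch the lamp on if such an endpoint existed
```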

Researchers have presented interaction approaches to control and manipulate the physical world through digital interfaces or virtual replicas. Jackson et al. showed an operation method for such object behaviors, e.g., controlling electronic devices or the states of physical objects, in a simulated environment for home automation [180]. A few research papers have also shown different mechanisms or visualizations of interconnected simulation results with sensors and/or actuators embedded in an everyday environment. For example, Lifton and Paradiso presented a dual reality system that generates an interplay between the simulated world and a set of sensors, such as an electrical power strip, in the real world [148]. In their simulated dual reality, the users can explore a variety of experiences interacting with both physical and virtual objects; this opportunity concerns not only mutually reflected sensor browsing but also interaction techniques with the interconnected sensors through sensor/actuator networking. Lu also proposed a bidirectional mapping technique for IoT-enhanced information visualization, presenting a system that realizes eco-feedback for energy saving [181]. When the user turns on a home appliance, such as a TV, in the real environment, for example, the attributes of the sensor deployed to detect the user’s activity are transmitted to the simulated world. The monitored virtual world can then generate a counterpart representation of the real-world event.

Within the scope of AR research and applications, some work has also focused on AR simulation for controlling object functions and measuring usability. For example, AR can be used to visualize simulations of applied control for previewing or training purposes [182]. Since AR can provide intuitive and immediate virtual information for in situ local objects and even for remote objects, e.g., via a remote-controlled camera, AR is an excellent visualization method for object control considering the user’s context [5]. With recent advances in AR technology, there have been a few attempts to merge the two, i.e., using AR as the control and simulation interface for IoT objects. Researchers have tried direct control of everyday objects in the AR environment. Rekimoto and Ayatsuka proposed a visual tagging system called CyberCode, which is based on an AR 2D barcode to identify and detect objects [183]. The system has an operation mode to manipulate physical objects, in which the user performs a natural interaction among the target objects, for example, “drag and drop” from one object to another after selecting the object to be manipulated. The target object then carries out the particular operation with the context-relevant information of the first selected object. Müller et al. also suggested an AR manual to convey step-by-step instructions [173]. In their AR manual system, they defined a user markup manual language (UMML) file to generate sequential operations with corresponding steps. Greg Tran presented an AR 3D architecture system based on contextual relationships with real geometry for interrelationships between digital and physical objects [184], describing “Mediating Mediums,” which explores the future of mixed reality and its relationship with physical form and environments, particularly architecture. The system could provide simulated geometries projected into videos of physical spaces. Consequently, in the near future, IoT objects will connect with each other, and AR users in the operation environments will be able to intuitively manipulate the context of objects, with immediate in situ support from intelligent AR-IoT technologies.

Moreover, it is important to note that the user experience and perception in AR-IoT interactions will depend greatly on the AR devices or displays. Kruijff et al. provided a classification of perceptual issues in AR and identified the predominant issues for specific devices (e.g., head-worn displays, handheld mobile devices, projector-camera systems) [185]. Most previous works mainly used smartphones to provide images that synthesize real and virtual environments, but they did not consider presenting the synthesized images directly to the human eye. More recently, AR devices with helmet-type HMDs (or head-worn displays) that overlay spatially registered virtual objects on a user’s view have been introduced and are becoming popular. These helmet-type AR devices are mainly divided into optical and video see-through HMDs, depending on whether the real scene is viewed directly by the user or via video input. For example, optical see-through HMDs present AR visual information to users using an additive light model approach, e.g., by projecting light onto a surface that is then reflected into the user’s eyes [186], whereas video see-through HMDs capture the real environment through camera modules and render the processed imagery with superimposed virtual content. As different types and form factors of AR devices are proposed and introduced, the interaction mechanisms between the users and the environment, or among the physical and virtual contents, should be adapted and diversified depending on the context in AR-IoT-enabled transreality. In that sense, we emphasize the importance of a framework for AR-IoT interactions that considers the object characteristics and the impact of various kinds of AR devices.

4.4 AR-IoT Framework Evaluation

To evaluate the proposed AR-IoT framework, we developed a proof-of-concept implementation of the framework and experimentally assessed the user’s satisfaction level and usability through a couple of preliminary user studies in a shopping service context [15]. The underlying assumption was that our proposed AR-IoT interaction approach would be useful and well received and would create a more effective user experience than the traditional GUI-based approach.

We developed a proof-of-concept prototype incorporating IoT-enabled smart clocks and lamps (see Fig. 32.9). We used a Raspberry Pi 3 Model B (RPi) and integrated beacons into the smart clocks and lamps to localize them. The RPi board is equipped with a quad-core 1.2-GHz 64-bit CPU, 1 GB of RAM, 100 Base Ethernet, four USB 2.0 ports, an HDMI port, and a MicroSD slot and has a small storage capacity and wireless Internet communication (BCM43438). The AR client was implemented on a smartphone that connects to the IoT products directly through the beacons. The proposed AR-IoT framework was applied to this prototype, meaning that, in addition to storing generic data and virtual content in the IoT objects, individual IoT objects of different types in the vicinity of the AR client can communicate the information the users need to recognize and track the objects, e.g., broadcasting the tracking features, algorithm type, and current physical state of the product, such as its location and distance to the client or to other companion reference objects. The datasets for AR features were extracted from images of the IoT objects for recognition and tracking, and the TCP/IP protocol was used to exchange data between the AR client and the IoT devices.
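
A sketch of what the object-side service in such a prototype might look like is given below, assuming a plain TCP/JSON exchange; the message format and field names are illustrative and not the exact protocol used in [15].

```python
import json
import socket

OBJECT_METADATA = {
    "object_id": "smart-lamp-01",   # hypothetical identifier
    "tracking": {"algorithm": "natural_features", "feature_file": "lamp_orb.bin"},
    "state": {"power": "off", "brightness": 0},
    "services": ["power_toggle", "brightness"],
}


def serve(host: str = "0.0.0.0", port: int = 5050) -> None:
    """Answer each AR client connection with the object's metadata as JSON."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        while True:
            conn, _addr = srv.accept()
            with conn:
                conn.sendall(json.dumps(OBJECT_METADATA).encode("utf-8"))


if __name__ == "__main__":
    serve()
```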

Fig. 32.9
figure 9

AR-IoT object prototypes for a proof-of-concept study: a smart clock and a smart lamp with Raspberry Pi embedded for IoT features

For the studies, we prepared a situated environment mimicking a shopping center, where participants received a list of available IoT-enabled smart objects nearby when they entered the environment and could see the AR-capable objects, filtered based on the distance between the participant and the object. The AR client system used by the participant received information about the smart objects, such as AR tracking data and control interfaces (e.g., buttons). Based on the communicated information, the object-relevant virtual contents and control interface were spatially overlaid on the target product in the AR-IoT interaction condition, whereas a traditional 2D GUI on the smartphone screen was used in the control condition. To develop the control UI menus and AR contents, we used Unity3D C# scripting and the PTC Vuforia AR tracking engine.

The first experiment looked at the users’ satisfaction level by comparing two types of interfaces: (a) a conventional Web-based interface and (b) an AR-based interface. Ten participants (age M = 36.0, SD = 7.5; female: 2) experienced both interfaces in a within-subject design and rated their general satisfaction with the interfaces on a 7-point Likert scale. The satisfaction scores were analyzed with the nonparametric Wilcoxon signed-rank test for paired samples, which revealed that the mean satisfaction score was significantly higher (Z = 2.871, p < 0.05) for the AR-based interface (M = 6.2, SD = 0.8) than for the conventional interface (M = 3.5, SD = 1.2).
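
For readers who wish to reproduce this style of analysis, the paired Wilcoxon test can be run as follows with SciPy; the scores below are hypothetical placeholders on the same 7-point scale, not the study data.

```python
from scipy.stats import wilcoxon

# Hypothetical 7-point Likert satisfaction scores for 10 participants
web_based = [4, 3, 5, 2, 4, 3, 3, 4, 2, 5]
ar_based  = [6, 7, 6, 5, 7, 6, 5, 7, 6, 7]

# Paired, nonparametric comparison of the two interface conditions
stat, p = wilcoxon(web_based, ar_based)
print(f"Wilcoxon statistic = {stat}, p = {p:.4f}")
```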

The second experiment investigated the usability of the AR-IoT interface. Similar to the first experiment, 16 participants (age M = 37.0, SD = 6.6; female: 4) were asked to switch IoT-enabled lamps on and off using (1) a conventional GUI on a smartphone touch screen and (2) the AR-based interface with virtual GUIs spatially registered to the target IoT device. Participants performed the light control tasks multiple times with different IoT lamps and answered a usability survey rating their perception of the interaction and their performance on a 7-point Likert scale for ease of use, naturalness, fatigue, speed, and simple preference (see Table 32.1). The results showed that the participants reported higher scores for the AR-based interface in all usability categories compared to the conventional GUI (see Fig. 32.10).

Fig. 32.10
figure 10

Usability evaluation results

Table 32.1 Usability questionnaire measuring the ease of use, naturalness, fatigue, task completion speed, and preference

We also evaluated the error rates in controlling the IoT devices with the two interfaces, e.g., the number of incorrect selections and operations of the device. The results showed that the participants made 32 errors on average (out of 80 trials) in the GUI-based condition, while they made only 3 errors on average in the AR-based condition (i.e., error rates of 40% versus about 4%), which suggests that the participants took advantage of the spatial and visual affordances of the AR interface for more intuitive and direct object control in the physical environment.

5 Opportunities of Embodied Interactions in AR-IoT Environments

While we described the concept of transreality and the convergence of AR-IoT and summarized the related research in the previous sections, we covered various examples of 2D/3D user interfaces – mostly for controlling the physical IoT objects or the environments. The most common interaction style is to use typical GUIs with Windows, Icons, Menus, and Pointers (WIMP) on 2D screens, e.g., PCs, smartphones, and tablets. With reliable and effective touchscreen techniques, the affordances of the interaction have increased in more “natural” ways, such as selecting the target object by tapping the screen or intuitive drag and drop [183, 187]. In addition, as VR/AR technology is considered a valid platform for such interactions with IoT objects, the consideration of spatial context, e.g., searching for or navigating to potential IoT objects in the environment and interacting with other adjacent objects, becomes more and more important [188]. Some examples of such AR-IoT interaction mechanisms are illustrated in Fig. 32.11.

Fig. 32.11
figure 11

Illustrations of various interactions with IoT objects: (a) an in situ/remote operation with traditional graphical user interface (GUI) button, (b) an AR interface that could recognize the target object and present appropriate GUIs, (c) a metaphorical natural interaction (e.g., drag and drop) to invoke an object function interacting in a virtual/augmented space to affect the physical world

In particular, as immersive wearable AR displays like smart glasses and see-through HMDs are grabbing the public’s attention and becoming more and more popular, a variety of 3D user interfaces and interaction techniques using ad hoc VR/AR controllers or natural body gestures, which have been researched in the VR/AR community for decades [189], can be applied to AR-IoT interactions considering the spatial environment context. Recent research by Lages and Bowman [190] presented an adaptive AR interface that can change its position and form depending on the environment while the user is walking. For example, a set of virtual windows in the AR environment could follow the user, rotate, and automatically align with a wall for a better user experience in the AR interactions. They conducted a user study investigating how such contextual adaptations contribute to system usability and user behavior and found that participants particularly appreciated the adaptive interface’s ability to automatically follow the user and position the information in their view. Such intelligent and context-adaptive AR interactions are one of the key elements for achieving more effective and efficient AR-IoT interactions among the users and the physical and virtual objects in the transreality environment.

While considering intelligent and context-adaptive AR interactions, we see unique opportunities for embodied interactions through virtual avatars and agents with visual appearances in AR-IoT environments. In the remainder of this section, we first present some of the affordances of such virtual avatar/agent-based embodied interactions in the AR-IoT transreality paradigm, give a few examples of prototypes that applied the idea of AR-IoT to their intelligent agents, present findings from fundamental research on embodied AR agents that can shape the realization of embodied agents in the AR-IoT transreality paradigm, and describe two use cases where intelligent embodied virtual agents can be employed to enhance the human-computer interaction experience in our daily lives and professional settings.

5.1 Embodied Interactions in AR-IoT Environments

As we described in Sect. 32.3 regarding the transreality paradigm, our daily lives of the future will become more complex, with large volumes of heterogeneous data and information from a variety of sources, such as millions of smart objects [59, 62]. The importance of pervasive AI and ubiquitous IoT objects will continue to grow, and consequently, research on the efficiency of communication with them and trust in the validity of the information they convey will be important. Embodied interactions in AR-IoT environments can go beyond users controlling IoT-enabled devices via 2D/3D interfaces made available through AR devices, extending to multimodal interactions through embodied virtual avatars and agents in AR and facilitating physical-virtual interactions in the AR-IoT transreality paradigm – in particular considering the efficiency of, and trust in, the interactions with the smart IoT objects and physical environments.

Nonverbal behavior plays a crucial role in regulating communication and providing critical social and contextual information [191]. For example, some nonverbal behaviors, such as pointing and gesturing, can directly convey information that would otherwise require significant verbal explanation. This is true both for seeing and for carrying out such behaviors. Socially intelligent nonverbal behaviors in embodied interactions can convey situational and information awareness, which can be critical for establishing common ground in communication. For example, if a user is looking around a scene, the information conveyed is assumed to be relevant to that scene, without any verbal explanation. Furthermore, information conveyed about the scene will likely carry more weight (increase trust), as the provider will be perceived as having direct awareness of the events/objects in the scene. Nonverbal communication can also be used to increase the transparency of the confidence in, and even the provenance of, the information, which in turn can increase trust and efficiency [192].

In the vision of pervasive connected AR environments (i.e., the transreality paradigm), humans and virtual entities are aware of, can seamlessly interact with, and can influence each other through different modalities, suggesting a bidirectional relationship between the physical and virtual worlds. For interactions to be seamless, virtual entities are envisioned to capture the features of their real counterparts, such as appearance and behavior, and to exhibit awareness and understanding of their environment. In such circumstances, the visual embodiment and nonverbal behaviors of the virtual entities increase the chances of successful communication among the users and the AI or smart objects/environments through richer and more efficient communication channels. Here, we describe some findings from our own past research efforts and other relevant works that emphasize the potential benefits of embodied interactions, assuming seamless physical-virtual interactivity is available in transreality with the convergence of AR-IoT.

5.2 Embodied AR-IoT Agent Prototypes

To our knowledge, only a few prototypes exist in which researchers utilized embodied AR agents as a means to facilitate interactions with smart objects. In the context of embodied IVAs capable of controlling smart objects, Amores et al. [193] introduced the idea of an embodied AR agent that can change the state of a smart lamp; for example, the agent walks toward the lamp and switches it on or off. Kim et al. [143] further extended this idea to investigate the influence of embodiment and locomotion behavior in more detail, varying these factors in a human-agent interaction scenario inspired by common use cases of voice-based assistants, such as Amazon Alexa. The embodied AR agent was presented to the users through an optical see-through HMD (see Fig. 32.12a), extending the contribution of Haesler et al. [194] by exploring a more diverse interaction scenario. In the interaction scenario studied by Kim et al. [143], some tasks involved the virtual agent controlling physical objects in the same room as the participant, while in others the embodied agent was asked to control physical objects or to gain/relay information to and from the physical environment (i.e., objects and people). Additionally, some tasks revolved around the participants’ privacy needs and their inclination to share private information with the agent. The task diversity in this scenario provided various opportunities to explore the effects of the agent’s embodiment and locomotion behavior, such as the embodied AR agent walking to a physical lamp in the room to turn it on, compared to conditions with no locomotion behavior and no embodiment. They found that an embodied virtual agent with the ability to be aware of its surrounding environment and physically influence it could positively impact the participants’ sense of privacy preservation and their confidence in the agent’s activities and abilities.

Fig. 32.12
figure 12

Virtual humans interacting with the physical objects: (a) a virtual human turning on a floor lamp using IoT-enabled light bulb and (b) a virtual human moving a physical token in a board game scenario

5.3 Embodied AR Agent Insights

Advances in AR technology have facilitated increasing research on the development and understanding of embodied AR agents. As embodied AR-IoT interactions evolve, such findings can shape how AR-IoT agents are realized and understood. One area that has received increasing attention is the embodied AR agent’s physical-virtual interactivity [195]. Lee et al. [196, 197] studied the effects of a virtual human’s ability to move a physical token on participants’ perceptions of their interaction in a tabletop game setup. They found that participants felt more co-present with the virtual human when she moved a real game token compared to moving a virtual one, and interestingly, this effect changed their overall perception of the virtual human’s ability with regard to moving/affecting other physical objects (see Fig. 32.12b).

To examine the effects of tactile physical-virtual interactivity in embodied interactions in AR/MR, Lee et al. [198] developed an AR physical-virtual table, where a virtual human and a participant occupied the virtual and physical ends of the table. In one condition, leaning on either side of the table caused a wobble on the other side, as if both the physical and virtual sides of the table were seamlessly connected, while in the other condition it did not. They found that participants felt more socially present with the virtual human when the table wobbled than when it did not. In another study, where participants shared a runway with a virtual human, Lee et al. [199] simulated the effects of the virtual human’s footsteps when walking and jumping using vibrotactile feedback (see Fig. 32.13). They found that participants experienced a higher sense of co-presence (i.e., sense of being together) with the virtual human and perceived it as more physical compared to conditions where this effect was absent. Such vibrotactile feedback can easily be achieved in smart environments with embedded IoT objects to improve the user experience of interacting with virtual entities in AR-IoT environments.

Fig. 32.13
figure 13

A vibrotactile platform for generating the sensation of a virtual human’s footsteps on the floor when walking and jumping

Similar to many real-world experiences, changes in the environment are sometimes more subtle and only affect the ambience of the main interaction. As physical entities, humans are used to detecting and responding to such changes, for example, airflow or environmental noise; thus, it is important to understand the influence of similar behaviors when portrayed by virtual entities. Kim et al. [200, 201] investigated how air blowing from a real fan, which moves virtual papers and attracts the attention of a virtual human, influences participants’ sense of co-presence with the virtual human in different AR setups. They found that participants felt more co-present with the virtual human when the virtual papers and curtains fluttered due to the real airflow and the virtual human exhibited awareness of these events, by holding onto the fluttering paper and looking toward the fan, compared to a condition where virtual entities were unaffected by the real fan.

Advances in AR technology and continuous research efforts, such as those mentioned above, are paving the way for embodied interaction of virtual humans in various fields of application, such as intelligent virtual assistants, caregivers, and collaborators, allowing for more engaging experiences.

Kim et al. [202] and Wang et al. [203] explored the influence of an agent’s embodiment in collaborative problem-solving scenarios. Comparing embodied virtual human assistance, voice assistance, and no assistance in a desert survival task, Kim et al. [202] found that receiving assistance, regardless of form (i.e., embodied or voice-only), enhanced participants’ performance. However, the embodied assistance provided a richer experience and a lower task load compared with voice assistance. Wang et al. [203] also varied the agent’s embodiment and appearance in a collaborative search task, where participants worked with the agent and asked for hints to find hidden objects. Their findings showed that participants gazed more at the humanoid agents than at the one that looked like a virtual Alexa. Healthcare is another important area that can benefit from intelligent agents and the connectivity of objects. Kim et al. [142] designed scenarios involving health-related and daily life activities, comparing real and virtual human assistants in embodied and voice-only forms. Their results indicated that the virtual counterparts are not yet at the point of creating an experience similar to a real human; however, both embodied interactions created a more engaging experience than their voice-only counterparts.

Although virtual humans have been researched more extensively than other form factors, they are not the only entities capable of influencing users’ perceptions. Virtual animals have also been researched in a number of domains, with encouraging findings that suggest their potential for roles involving physical-virtual connectivity (see Fig. 32.14).

Fig. 32.14
figure 14

Illustrations of example interactions with a virtual dog in AR. The users can see and interact with the virtual dog, for example, walking the dog together and playing fetch with it

Johnsen, Ahn, and colleagues [204] developed an AR setup where children interacted with a virtual pet dog, and their exercise levels influenced the state of the virtual dog and the tricks it could learn. For instance, the pet became more fit as the child exercised more. Compared to a goal-based interface without the virtual dog, children’s exercise levels increased significantly when interacting with the virtual dog. Norouzi et al. [205] gave participants a virtual pet dog and created a scenario where another person walked over the participant’s virtual dog. They varied the response of the virtual dog to the physical collision (i.e., not doing anything or falling over) and the awareness of the other person (i.e., aware or unaware of the dog). They found that when the virtual dog exhibited the falling-over behavior, participants felt more co-present with it and also gave lower affect scores to the other person regardless of their awareness level.

Overall, these findings indicate that a virtual character’s awareness of both the physical and the virtual world, and its ability to influence and be influenced by them, is vital for its effective realization. The convergence of AR-IoT in transreality will increase the potential of such physical-virtual interactions using prevalent smart IoT devices, and embodied interactions through/with virtual entities will benefit from such seamless mixed environments while offering unique opportunities to enhance social realism and influence. The findings addressed above can also serve as potential guidelines when designing embodied interactions in the transreality space.

5.4 Potential Use Cases

In this section, we describe a couple of use cases that capture the potential of AR-IoT convergence in our daily lives. These examples illustrate the integration of the ambient intelligence paradigm with the main aspects of the AR-IoT space that we addressed in Sect. 32.4: (1) distributed and object-centric data management, (2) IoT object-guided tracking, and (3) seamless interaction and content interoperability. Through this convergence, we describe the notion of an object-centric interconnection of physical and virtual things capable of exhibiting awareness of their environment, including the user, either singly or collectively, and accordingly facilitating AR interactions and experiences.

Scenario 1

Steve is planning to buy a new high-end gaming personal computer (PC) and has already narrowed down the important criteria for the choice of PC, e.g., GPU/CPU requirements, display resolution, etc. Some of the options were based on the suggestions of his intelligent AR assistant, which is pervasively embedded in his mobile phone, wearable glasses, and home. At the electronics store, Steve goes to one of the store’s guiding stations and gets virtual landmarks for directions to the devices on his list. To do so, his smart AR glasses directly connect to the guiding station, which opens up the virtual interaction space situated over the station itself. Steve asks his intelligent AR assistant for the device list they had devised earlier. The list virtually appears in his field of view through the AR glasses, and he interacts with it by dragging and dropping it onto the virtual window of the guiding station. Now he gets appropriate directions for every item on his list and virtual landmarks for each item, which efficiently guide him on where to go and what to look for. Similar to the interaction with the guiding station, Steve’s smart AR glasses can directly connect to, gather information from, and interact with objects of interest in the store without needing to connect to a main server, which is in line with the main components of the AR-IoT object-centric data management vision. For instance, situated with each item of interest, he can view the product information and control the device through AR interaction metaphors, such as playing a video from his own library to test the display quality by means of a virtual user interface and gestures. After some time, Steve has narrowed down his choice but is stuck between two options. His intelligent AR assistant appears and helps him make the best decision by visualizing a comparison table and suggesting certain items for him. Depending on the amount of data, it might be cumbersome for the user to read all the visualized information in AR. In such cases, the AR assistant with visual embodiment can remind the user of each device’s capability and also the preferences of the user while exhibiting appropriate gestural and facial expressions. The example above illustrates circumstances where the user is connected to each object separately or collectively through his intelligent AR assistant. Also, with the context of ambient intelligence in mind, the intelligent AR assistant already knows the preferences of the user, the form factor it should assume based on context, and the type of information the user might be interested in, based on the profile data collected from previous interactions with him.

Scenario 2

Nowadays, it is common practice for many professions, especially in healthcare, to train using physical and virtual simulators with varying degrees of fidelity. Alice is an instructor at a university and has recently purchased a high-fidelity physical patient simulator with a human-like appearance to train medical students. Although Alice has used a number of simulators before, she is not sure how to interact with this model. Her smart AR glasses can detect the smart IoT-enabled simulator and directly connect to it. This leads to the appearance of the simulator’s interaction metaphor, visualized in her field of view next to the physical simulator. To get familiarized with the system, Alice can choose her preferred interface among several options, for example, a traditional graphical user interface or a virtual twin (i.e., in line with the digital twin paradigm [206]) of the physical simulator, to educate herself. She chooses the virtual twin interface by gesturing toward that option. The virtual twin overlaid on the physical patient simulator describes all the different capabilities of the simulator step by step and encourages Alice to try each one. For instance, the virtual twin starts with the pulse capability of the simulator and asks her to touch the marked spot on the physical patient simulator’s wrist to sense the pulse and activate the virtual interface for adjusting its value and potential changes for different medical scenarios. Later, it guides Alice to the add-on capabilities of the physical simulator, such as smart instrumented moulage devices that can be added onto the physical simulator to replicate combat casualty care scenarios [207]. Moreover, she can add her students’ profiles to the database of the simulator, and the intelligent interface of the system keeps track of the students’ academic/training progress by observing the students’ interactions with the simulator. For example, the simulator and the user’s AR glasses in the AR-IoT environment monitor the students’ activities, such as which symptoms of the simulated patient they missed in certain training scenarios and whether they failed to treat the patient appropriately (e.g., not paying attention to the patient), which could lead to a wrong diagnosis, mistreatment, or an unsatisfactory patient experience.

6 Conclusions

While AR/MR technology is experiencing a renaissance of development and consumer interest, there are still many obstacles in the way of the widespread adoption of ubiquitous AR in our everyday life. Azuma outlined the important challenges to overcome for ubiquitous AR, e.g., precise tracking anywhere and anytime, a wide field of view for optical see-through near-eye displays, innovative interfaces, and semantic understanding of real-world objects [208]. Recent trends toward a merger of AR with UbiComp and advanced AI algorithms (i.e., the transreality paradigm) have the potential to resolve some of these challenges by using distributed smart objects for data management and interactions while making virtual content even more intelligent and interactive, up to the point where it may be perceived as a true social entity – such as a socially influential virtual human avatar or agent [142, 143, 209]. In such transreality environments, we expect more natural and seamless interaction between the virtual content and the physical environment. To meet these expectations, advanced AR technology that enables more dynamic physical-virtual interaction has to be devised, and rigorous user studies are needed to understand how and in what ways the surrounding physical-virtual environment contributes to human perception and interaction. The recent dramatic increase in evaluation research in AR supports this claim [3].

In this chapter, we described the concept of transreality, where the physical and virtual worlds are highly merged and connected to each other, and presented the AR-IoT framework, which can benefit from the realization of such transreality environments, while discussing the main components in this realm that allow for a scalable “anywhere” and “anything” interaction space in AR-IoT-embedded environments. We presented a brief history of AR, IoT, and related concepts while covering the broader scopes of UbiComp and AI in consideration of their synergies with pervasive AR technology. We further provided detailed descriptions of previous convergence research among them and of the literature on related paradigms that envisioned and, in some cases, implemented the notion of pervasive and intelligent AR interactions with both real and virtual things.

We particularly described the three main components of the AR-IoT framework in the transreality paradigm, i.e., (1) distributed and object-centric AR-IoT data management, (2) scalable AR-IoT object-guided tracking, and (3) context-based AR-IoT interaction and content interoperability, and described work that addressed the design and implementation of these components.

Finally, we emphasized the potential benefits of embodied interactions in AR using virtual avatars and agents to improve the overall user experience and exert social influence over the users. The use of the AR-IoT framework, which can bring seamless interactions among the physical IoT objects and AR content, provides synergistic impacts on the user experience and behavior in human-agent interactions in the transreality space. We presented research that exemplifies such benefits and technical requirements for the “anywhere” with “anything” interaction space and described use cases that capture the essence of the transreality paradigm applied to a wide range of applications.

Together with upcoming high-speed 5G technologies and advanced Cloud and Edge computing paradigms, the interconnection between the virtual and real worlds will be more dynamic and immediate [210, 211]. As our real world is more and more equipped with IoT devices, we are moving toward a “physical Web,” where information is tied to tangible physical objects, locations, and spaces [208]. Such smart devices will continuously monitor human activities and intelligently recognize the context for better user experience [212]. The AR-IoT framework provides a universal interface for us to access, retrieve, and absorb this information.

The notion of digital living, a paradigm of living without the bounds of place and time as presented by Negroponte in his book “Being Digital” [213], is being realized with this new convergence of technologies. We anticipate widespread AR in such digital living environments [214]; it is therefore timely and important to research and develop the synergistic convergence of AR with other relevant domain technologies, such as the AR-IoT framework, while also considering ethical issues like privacy and data security.