1 Introduction

Technology has redefined the way we retrieve and access information; it has become increasingly embedded in our lives and supports us in performing different tasks, extending our perception, memory, and processing capabilities. Augmented reality (AR) further enables this process by superimposing digital objects and information on physical reality. The growing trend to intertwine human beings with smart devices and systems has led to the emergence of an interdisciplinary research field called both Human Augmentation (HA) and Augmented Humanity (AH) [18, 24, 26]. Among the several definitions [18], Raisamo et al. describe human augmentation as “An interdisciplinary field that addresses methods, technologies and their applications for enhancing sensing, action and/or cognitive abilities of a human. This is achieved through sensing and actuation technologies, fusion and fission of information, and artificial intelligence (AI) methods” [24]. In addition to providing this insightful definition and an in-depth analysis of human augmentation and the related ethical risks, they observe that, despite the existence of several systems augmenting human senses, actions and cognition, the vision of human augmentation is not yet established. According to the authors, there is a lack of architectures and models that integrate individual contributions into a holistic approach that could serve as the basis for practical applications. In addition to these considerations, in our opinion, it is necessary to refer to a cognitive theory that can drive the design, implementation and evaluation of human augmentation systems. This is especially important considering that one of the intended outcomes of HA systems is to support and enhance the user’s cognitive processes effectively.
To the best of our knowledge, and from what we observed in the mixed reality (MR) literature, most current systems act mainly at the level of the senses and partially at the level of action augmentation, while the potential of cognitive augmentation is not sufficiently exploited. The primary purpose of MR systems is generally to support the user in performing a task by superimposing the necessary information on the physical world in real time; this information can then be accessed and explored through immediate and straightforward interactions, such as gestures or verbal interaction with virtual assistants. From an HA perspective, it is important to exploit the potential offered by new MR technologies and AI methodologies not only to extend the user’s perceptive and interactive experience, but also to naturally support and augment the user’s cognitive processes while performing a task.

Specifically, starting from a brief analysis of the main theories of cognition, we propose a holistic, enactive view of HA, where user capabilities and cognition can be augmented as the effect of an increase in sensorimotor possibilities in a mixed reality environment, and as a consequence of the introduction of AI modules. We discuss our enactive perspective in detail in Sect. 2, where we also propose a blueprint architecture for the design of HA systems. In particular, considering three different dimensions of augmentation, we examine existing systems that lie mainly on “planes of augmentation” or that cover the whole augmentation space. Finally, in Sect. 3, we address the importance of such a holistic approach, identifying some requirements that are necessary to obtain an effective HA.

2 An Enactive Human Augmentation Architecture

Our position is that HA systems should be designed and implemented with human cognition at their heart, in order to define the most appropriate technologies and methodologies to support and, in a sense, augment it. Modern cognitive theories consider cognition to be situated in the physical and social context. According to the approaches known as the 4Es, cognition is embodied in our bodies (Embodied Cognition), embedded in the physical and socio-cultural context (Embedded Cognition), enacted through the active perception of the environment (Enactive Cognition) or extended by using extracranial structures (Extended Cognition) [5]. Among these theories, we embrace the enactive approach to cognition. Enaction is the idea that individuals create their own experience through their actions; experience is the result of the reciprocal interaction between the organism’s sensorimotor capacities and its environment, by means of transformative and not merely informational interactions [10]. Cognitive processes belong to the relational domain of the living body coupled to its environment [28]. Therefore, if cognition depends on the kind of experience that results from having a body with various sensorimotor capabilities [25], we can assume that a system that increases sensorimotor capabilities introduces radical changes in the interaction between the user and the environment that, according to an enactive view, can affect the user’s cognitive processes. When we introduce new sensors and new modes of action, we are augmenting the user’s embodiment. A change in embodiment leads to different interactions with the environment and consequently to changes at the level of cognitive processes [14, 25].

Fig. 1. Enactive perspective of an HA system. In green, human cognition arises from the three dimensions “senses”, “motor” and “memory and processing”. In blue, an augmented human cognition results from the enhancement along the three dimensions. An augmentation of the three dimensions allows the user to perceive and interact with elements embedded in the augmented environment. (Color figure online)

A schema illustrating our vision of a human augmentation system is depicted in Fig. 1. In the schema, we consider human cognition as the result of the individual’s interaction with the environment through their sensorimotor, memory and processing capabilities. An augmentation can therefore be introduced along three dimensions, named “senses”, “motor” and “memory and processing”. Such an augmentation allows the individual to interact in an augmented way with a hybrid real-virtual environment and can lead to an augmented cognition. Figure 2 shows a blueprint architecture for the design of HA systems. In the proposed architecture, a human-system coupling along the dimensions mentioned above enables new sensorimotor patterns, leading to new possibilities for action and, therefore, for cognition and experience. Augmenting the sensory dimension means expanding the user’s sensory possibilities by introducing devices equipped with sensors, such as visors, glasses, and haptic sensors. The motor dimension can be augmented by extending the physical capabilities of the human with artificial limbs, external tools or devices that capture user actions and make them exploitable in a mixed environment, such as controllers, gloves, exoskeletons and tracking sensors. Human memory and processing capabilities can be supported and extended by exploiting the system’s processing and storage resources. The augmentation along this dimension can range from introducing simple modules and links to external resources to integrating advanced AI modules into the system. In this context, both symbolic modules (ontologies and reasoners) and sub-symbolic modules (deep approaches, data-driven modules for the creation of semantic spaces) can be considered to analyse, classify and interpret information related to the interaction between humans and the environment, such as information perceived through sensory modalities or concerning the performed actions.

Fig. 2. Blueprint architecture for the design of HA systems. A human-system coupling enables new sensorimotor patterns, leading to new possibilities for actions, cognition and experience. Memory and processing capabilities can be extended by exploiting the system AI modules.

In Sects. 2.1, 2.2 and 2.3 we analyse, using non-exhaustive examples, how a partial augmentation can be achieved by focusing mainly on a subset of the identified dimensions. To analyse these systems from our perspective and along the defined dimensions, we interpret them as lying mainly on augmentation planes. The planes are partial views of the entire envisioned HA space; in our interpretation, they mainly increase perceptual, action or multimodal interaction capabilities. In Sect. 2.4 we describe some systems that are more in line with our holistic vision, since they cover the whole space and consider the effects on cognitive processes. In the final discussion, we re-examine the proposed architecture, outlining possible approaches and risks to avoid in implementing our vision of HA.
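The relation between dimensions, planes and the whole HA space can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Dimension`, `PLANES` and `locate` are hypothetical and not part of the proposed architecture, and the pairing of dimensions with planes follows the descriptions given in Sects. 2.1 to 2.4.

```python
from enum import Enum, auto


class Dimension(Enum):
    """The three augmentation dimensions named in the text."""
    SENSES = auto()
    MOTOR = auto()
    MEMORY_AND_PROCESSING = auto()


# Hypothetical lookup table: each pair of dimensions spans one plane,
# following the pairings described in Sects. 2.1, 2.2 and 2.3.
PLANES = {
    frozenset({Dimension.SENSES, Dimension.MEMORY_AND_PROCESSING}):
        "Augmented Perception Plane",
    frozenset({Dimension.MOTOR, Dimension.MEMORY_AND_PROCESSING}):
        "Augmented Action Plane",
    frozenset({Dimension.SENSES, Dimension.MOTOR}):
        "Augmented Multimodal Plane",
}


def locate(augmented):
    """Return where a system sits, given the dimensions it mainly augments."""
    dims = frozenset(augmented)
    if not dims:
        return "no augmentation"
    if len(dims) == 1:
        (only,) = dims
        return f"single dimension: {only.name.lower()}"
    if len(dims) == 3:
        return "Human Augmentation Space"
    # Exactly two distinct dimensions: every such pair is a key of PLANES.
    return PLANES[dims]


if __name__ == "__main__":
    print(locate({Dimension.SENSES, Dimension.MEMORY_AND_PROCESSING}))
    print(locate(set(Dimension)))
```

The lookup is deliberately coarse: real systems augment each dimension to a different degree, so a plane indicates where a system mainly lies rather than a strict classification.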

2.1 Augmented Perception Plane

Augmentation at the perceptual level occurs by intervening mainly on the “senses” and “memory and processing” dimensions. The introduction of devices such as visors, glasses and haptic sensors leads to an augmentation of the “senses” dimension. Combining this dimension with methods for the analysis, representation and understanding of what is observed, heard or touched in the environment can enhance the user’s perception. Together, the two dimensions allow for analysis of and reasoning about the data perceived through the added sensors.

Systems that rely mainly on this plane are those introduced to ensure greater accessibility, often compensating for a sensory gap by providing a stimulus through other senses. This plane includes, for example, systems that assist the indoor and outdoor navigation of visually impaired people, as in the case of the ARIANNA framework [9]. The possibility of replacing one sensory modality with another is also essential when it is impossible to get close to something or, even more, to touch it, for reasons of personal safety or artefact protection. Some systems exploit the tactile channel through 3D printed models that can be felt to compensate for vision. An augmentation on the “memory and processing” dimension allows the introduction of augmented information, which can then be provided back to the user through audio, visual or tactile channels that are activated by touching a model or moving along a path [13].

The senses can also be enhanced through the use of augmented reality devices. In this case, virtual elements can be introduced into the observed scene, increasing perception and knowledge and allowing the user to obtain information about something that is not accessible or no longer exists. As an example, the MR game proposed in [8] recreates the setting of the archaeological excavations of the ancient Roman port of Naples. The game’s mission is to spatially explore the augmented archaeological site, deepening the understanding of the functions of past artefacts. The visualization of the excavation reconstruction is mediated by a sphere-shaped widget, which has a double function: it allows the user to explore the virtual environment and it mitigates the field of view (FOV) limitation of the device by adapting dynamically to it.

2.2 Augmented Action Plane

An augmentation mainly targeted at the “motor” and “memory and processing” dimensions defines an Augmented Action plane, enabling new action possibilities. Extending the physical capabilities of human beings with external tools or devices allows people to perform actions even in the case of impairments or reduced capacities due to age or illness. The use of specific devices allows for interaction in mixed reality. Technology becomes a support tool to maintain and promote wellness, ensuring that a limitation or deterioration of motor skills does not affect the individual’s social life. Socially assistive technologies can work alongside people to fill in and complement human abilities, enhancing their performance [21]. An example is the system proposed in [4], dedicated to patients needing rehabilitation after a stroke. Such patients show limitations in physical movements and, at the same time, require monitoring of cognitive activities. The patients’ motor abilities are mapped onto the dimensions of a virtual world, where they can perform simple cognitive games and exercises aimed at recovering the degrees of movement of the limbs under the supervision of a therapist. The adaptation of the game to the specific physical condition of the users augments their action possibilities, as if they had no physical limitations in the virtual world. Motor augmentation can also be obtained by including in the environment actuators that, mediated by the system, change the state of the environment; this can also be accomplished by teleoperation or remote presence [27].

Combining the “motor” dimension with the “memory and processing” dimension, where the latter is augmented by modules for analyzing verbal and non-verbal signals, leads to an extension of the possibilities for action and of their associated meanings. The system’s ability to interpret a gesture, a gaze, or a speech signal in order to elicit a particular effect (in the real or virtual world) extends the meanings that simple actions can convey and increases the chances of communicating a specific intention or of changing the state of the environment. For example, let us consider the spontaneous movement of the eyes: it can be analyzed by the system to infer information about users, such as their level of attention towards what they are looking at. In addition to this implicit augmentation mode, the user can employ this nonverbal communication modality to convey an intention explicitly. For this reason, gaze is proposed and studied as a mode of interaction in place of other modalities, for example to trigger the selection of objects [23]. Similarly, gestures can in a sense be augmented by introducing modules that assign them a meaning or allow their use beyond the boundaries of physical reality [16].

2.3 Augmented Multimodal Plane

Systems that focus more on the “senses” and “motor” dimensions are often designed to offer multimodal experiences to users. The multimodal spaces ADA, Pulse and Dune offer an immersive and interactive experience. ADA is an intelligent room [7] that locates and identifies people by using vision, audition and touch senses and reacts with sound effects, images and games screened on a 360° ring of LCD projectors. Dune [1] is a light landscape where sensors and microphones capture participants’ footsteps and sounds. The installation expresses a sort of mood: when it is alone it sleeps, while in the presence of people it reacts with lights and sounds.

An engaging multimodal experience is provided by Sirena Digitale [3], a hologram impersonating a Parthenopean mermaid who performs a repertoire of traditional Neapolitan songs. The system, accessible as a permanent exhibition in Napoli, allows for a multimodal fruition of cultural and musical heritage. Visitors do not passively listen to the songs proposed by the system; they can interact with it to choose a song and its language version. The interaction modality, based on sensors that track users’ hands, was designed to exploit a set of intuitive gestures. The installation also made it possible to experiment with haptic interaction, providing visitors with sensations consistent with what they were observing. For instance, the user could perceive the sensation of touching water while the hologram showed the mermaid immersed in the sea, and could receive feedback on the interaction with the interface. The system presents the processed cultural information to users through the hologram, augmenting their visual sense, and at the same time augments their sense of touch by means of the haptic device. Augmented multimodal experiences are often proposed for learning purposes. As an example, the Block Talks toolkit combines tangible computing and AR technologies to foster sentence construction abilities [12].

2.4 Human Augmentation Space

In our enactive vision of human augmentation, we include systems that address and combine all the aforementioned dimensions to support the user in performing a task, and that monitor how augmentation impacts the user’s cognitive processes. An example is the use of AR in industrial settings to enable users to perform a task more efficiently. AR has been shown to facilitate adequate working conditions, and many perceptually and cognitively demanding tasks can be done better, more easily and faster than with traditional methods. Ariansyah et al. [6] investigate the impact of different AR modalities, in terms of information mode and interaction modality, on user performance, workload and usability during a maintenance assembly task. The study found that the use of AR can reduce task-related workload but can, at the same time, induce non-task-related workload.

In the context of the E-Brewery project [2], a system for the management of industrial plants has been developed. The system allows the user to analyze the overall production trend through specific information panels and to intervene by modifying the settings of the industrial machines. The situated visualization of these panels makes it possible to show the information in its spatial and semantic context, exactly where it is needed. The objective is to reduce the cognitive load due to information search and parameter control. The system continuously monitors these parameters and notifies the employee of them directly in the headset. The system also aims to reduce interaction friction by employing some expedients, such as orienting the panels towards the user and allowing both close and far interaction with its components.

Another scenario is the design of systems that support users in daily activities, augmenting their ability to complete everyday tasks. The aim of the study proposed in [17] was to introduce an intelligent assistant for individuals with mild cognitive impairment. The system exploits a serious game, adaptive fuzzy decision-making methods and IoT to augment the users’ possibilities of interaction with the environment (enriched with AR objects) and to help them make decisions more independently. The user’s cognitive state is assessed according to the serious game scores.

An appealing context for human augmentation is the surgical environment. Technology is increasingly being employed in surgery [11]. Augmented reality, for example, is exploited for different purposes: to gain a deeper understanding of the patient’s condition and conduct simulations in the preoperative phase, for training or re-examination purposes, and also during the surgical procedure. Surgery represents a challenging scenario in which to study and experiment with human augmentation according to our enactive approach; the introduction of innovative devices and AI methodologies can strongly influence surgeons’ possibilities of perception and action, as well as their situational understanding and decision processes [14, 15, 22].

3 Concluding Remarks and Future Works

In this work we discussed an enactive perspective of HA. We have proposed a blueprint architecture for the design and implementation of systems that, by augmenting sensorimotor and mental dimensions in a synergistic way, can expand the normal capabilities of users. We conceive such an HA system as transparently intertwined with humans, i.e., the perception of the system should move to the background with respect to the opportunities it offers, leading instead to the impression of an immersive, unmediated presence. It is therefore necessary to understand how an increase along the different dimensions, and their combination, may impact the user. Indeed, although the different technologies used for human augmentation can bring various advantages, they may also produce adverse effects. For example, the information to process may become excessive, resulting in a “negative effect” on the “memory and processing” dimension. XR technologies can support perception, search, memory and mental processing, providing visual cues that are lacking in the physical environment [20]. In some cases they can release mental resources, allowing users to focus more on accomplishing the task. In other cases the effect can be quite the opposite, because the details introduced into the observed objects can produce an information overload and force the user to perform additional interactions to select the information of interest [19]. In addition, allowing users to expand their sensorimotor dimensions in an augmented environment may result in an increased mental workload, due to the limited ability of humans to handle multiple sources of attention simultaneously and to interpret both real and augmented information [20]. Another risk is that access to information may not be transparent and may introduce considerable friction. Some technologies could involve too many movements [29], leading to a “negative effect” on the “motor” dimension. Finally, the devices used should have an adequate resolution and FOV [19], to avoid a “negative effect” on the “senses” dimension.
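The risks listed above can be read as a checklist that ties each adverse effect to the dimension it degrades. The following minimal Python sketch illustrates this reading; the identifiers `RISK_TO_DIMENSION` and `negative_effects`, as well as the risk labels, are hypothetical names introduced only for illustration.

```python
# Hypothetical checklist: each risk discussed in the text is tied to the
# dimension on which it produces a "negative effect".
RISK_TO_DIMENSION = {
    "information_overload": "memory and processing",
    "excessive_movements": "motor",
    "inadequate_resolution_or_fov": "senses",
}


def negative_effects(observed_risks):
    """Group the observed risks by the dimension they degrade.

    Raises KeyError for a risk label that is not in the checklist.
    """
    effects = {}
    for risk in observed_risks:
        dimension = RISK_TO_DIMENSION[risk]
        effects.setdefault(dimension, []).append(risk)
    return effects


if __name__ == "__main__":
    # Example: an evaluation session that surfaced two of the three risks.
    print(negative_effects(["excessive_movements", "information_overload"]))
```

Such a table is only a starting point for evaluation: the text also notes effects, such as increased mental workload from divided attention, that arise from the combination of dimensions rather than from a single one.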

Starting from these considerations, in future work the proposed architecture will be refined and used to design HA systems that, by introducing XR solutions and AI methodologies, can effectively expand the possibilities of human beings, pursuing and monitoring two important outcomes:

  • Enactive augmentation outcome: an effective increase of user possibilities and abilities in the accomplishment of the task;

  • Usability outcome: a high usability deriving from a high level of human-system coupling, characterized by low friction and high transparency in information access.

As a concrete example, let us consider laparoscopic surgery practice as an application scenario. An HA system can augment the sensorimotor abilities of surgeons during a preoperative or an operative task and, as a consequence, from the enactive perspective, can influence their cognitive processes. This important aspect will drive the design and implementation of the HA system, in order to pursue an effective augmentation over the whole HA space. To implement the architecture modules for the specific scenario, it will be necessary to start from a cognitive analysis of surgical practices, the cognitive processes involved and the required knowledge structures. As an example, surgeons usually analyze 2D computed tomography and magnetic resonance images to study the specific situation of a patient and define the best surgical strategy. The possibility of visualizing a 3D model that reconstructs the patient’s organ reduces the need to mentally reconstruct the patient’s situation in 3D and to perform visual imagery processes. The possibility of manipulating such a 3D reconstruction of the organ enables so-called epistemic actions, i.e., actions performed to obtain information that is difficult to obtain mentally. Therefore, starting from a cognitive analysis of surgical practices, future work will identify the most appropriate technologies and methodologies, as well as evaluation procedures, to accomplish such an HA vision.