1 Introduction

Human presence can be considered essential to complete specific procedures through human knowledge and experience in the context of Industry 4.0, while also helping to address unplanned situations and accidents as they occur [34, 48, 51]. As Industry 4.0 takes shape, human operators experience increased complexity in their everyday practices, compelling them to understand a variety of manual operations, be highly flexible in very dynamic working environments, and learn from remote experts when additional know-how not available on-site is required [11, 30, 52, 53, 56]. Thus, ensuring the conditions to support remote collaboration, i.e., the process of joint and interdependent activities performed to achieve a common goal, is of paramount importance for the fourth industrial revolution [20, 30, 42, 43, 44, 45, 66, 80], in particular in the fields of training, assembly, quality control, repair and maintenance [40, 41, 56, 78].

One of the most promising innovation accelerators to support these needs is Augmented Reality (AR), considered a key pillar of Industry 4.0 that facilitates the digitization of the manufacturing sector and contributes to a higher level of efficiency by speeding up the entire production chain [13, 17, 19, 65, 66, 67, 68, 80, 82, 88]. Solutions using AR have been explored to provide distributed collaborators with a common ground environment, i.e., to serve as a basis for situation mapping (e.g., informing where to act and what to do) and for making assumptions and beliefs visible. AR allows responsive computer-generated information to be overlaid on top of real-world environments, combining the advantages of virtual environments with the possibility of seamless interaction with real-world objects and other collaborators [11, 12, 13, 15, 19, 30, 33, 54, 66, 77, 89]. It is expected that using AR will improve the efficiency and accuracy of the performed tasks by enhancing the perception of the shared understanding [15, 17, 41, 54], as well as collaboration times, knowledge retention, problem context and awareness [22, 39, 76, 82, 89].

However, there is still little research conducted on collaborative studies [8, 10, 18, 20]. In particular, recent research on remote collaboration has shown that implementation for industrial scenarios is challenging, since most of the research so far has been devoted to creating the enabling technology under controlled settings, for instance, adopting simple tasks as proofs-of-concept, mostly addressing what the technology can achieve rather than its level of integration as part of a solution for a specific problem [20, 65]. At the same time, there is a lack of insight into how human operators use current AR-based solutions and the type of challenges they face in real industrial environments [5, 26]. We especially lack an understanding of the motivations, needs, and barriers of the targeted users.

This landscape opens up the space for academia and industry alike to work side-by-side in obtaining an overview of existing challenges that need to be overcome. For example, creating AR-based solutions that meet the needs of human operators [3, 19, 26] calls for bringing domain experts into the proposal and validation of such technology, given the value of their knowledge about the problem and the workflows. This may in turn increase the adoption of such technologies by a larger audience, who might not be experts in AR technology.

Therefore, the effort to understand domain experts needs to be better integrated with the design and development processes [23, 34, 48, 51] by intertwining human expectations and practices, as well as spaces and digital artifacts, into cohesive interaction solutions for Industry 4.0 [23]. It is important to support research that places AR in close relation with the collaborative contexts it aims to address and reflects on the extent of its contributions. The design focus must evolve from technology deployment to devising how the technology can augment human capacities as individuals or members of a team [23]. Moreover, it is paramount to ensure that the research adds to the body of knowledge and provides enough context and evidence to enable a transparent account [85] and transferability [70], thus helping to address the wide scope of challenges concerning Industry 4.0 and to facilitate the digitization of the manufacturing sector.

Accomplishing these goals is not without its challenges. The integration of domain experts is not just a matter of inviting them for a discussion or asking them to evaluate prototypes. For an efficient and fruitful exchange of knowledge, and for it to guide the advances of AR technologies to support collaboration, a common ground for discussion needs to be established. In many situations, domain experts may not be technically savvy, and the possibilities and expectations for the technology may seem too abstract to understand. It is at this point that the wide range of work already performed to develop and assess AR technologies can play a pivotal role. In this regard, this article argues that the inclusion of domain experts is the step to take to advance collaborative AR solutions and proposes a methodology that takes advantage of our previous work on proposing and validating AR technologies to rapidly create tangible technological artefacts that are used to support the discussion. This is then applied to a real use case in remote maintenance with partners from the industry sector, and a step-by-step account is provided regarding how it contributed to understanding domain expert practices and needs, and what challenges technology has yet to surpass.

Overall, the work presented here includes the following main contributions:

  1. a proposal of a systematic approach to include domain experts and assess the extent to which AR-supported collaboration might be useful and contribute to remote maintenance;

  2. identification of a list of relevant aspects to assist in such scenarios, which can be the starting point for further evolving the design of new collaborative solutions and proposing AR-based prototypes;

  3. application of the proposed methods in practice and a discussion of how it worked and lessons learned, which may be applied to other remote settings.

This paper is organised as follows. First, in Sect. 2 we introduce related work on the use of AR for remote maintenance. Next, in Sect. 3, we describe the design and implementation of a participatory design process to foster the contributions of domain experts to such scenarios, resulting in user motivations, along with the context of collaboration from a concrete case study. These requirements are then key to the proposal of a first prototype for remote maintenance, as described in Sect. 4. Then, the methods and results from a user study to assess collaborative aspects of said prototype are reported in Sects. 5 and 6, respectively. Afterwards, the main insights are discussed in Sect. 7. Finally, concluding remarks and future research opportunities are drawn in Sect. 8.

2 Related work on AR for remote maintenance

Maintenance can be defined as an elaborate combination of activities that occur during the life cycle of an entity, to return it to a state where it can perform the required function. It aims to ensure equipment performance, reduce downtime and minimize disruption of production schedules. With the increasing complexity of industrial facilities due to the rise of Industry 4.0, maintenance processes play an extremely important role, improving competitiveness and contributing to sustainable development in industry. Maintenance is a core activity of the production life cycle, accounting for as much as 60 to 70% of its total costs [72]. Therefore, providing the right information to the right professional, with the right quality and in time, is critical to increase efficiency [2, 25, 90].

Unfortunately, some issues cannot be easily fixed by on-site technicians alone, and an in-depth analysis with experts is required. However, skilled specialists are usually in short supply due to the time required for these individuals to obtain such expertise. Moreover, this kind of intervention can be expensive, and it sometimes takes long periods of time for experts to reach the location where the maintenance tasks must be performed, which means bringing them on-site may not be a viable option. As such, remote collaboration using AR among off-site experts and on-site technicians is a prominent topic in current research [19, 64] for dealing with increasingly complex maintenance procedures.

For more than two decades, the field of Computer-Supported Cooperative Work (CSCW) has been concerned with designing solutions to support remote maintenance [31, 56], sometimes referred to as “collaborative maintenance” or “remote assistance” [77]. The most common solution is the use of video conference systems, which are widely available and easily accessible [41, 44]. Unfortunately, with this technology, collaborators are limited to passively watching video feeds with no means for interaction with the remote physical environment [29]. Such systems only allow assistance through verbal cues or hand gestures in response to a visual feed [43, 81]. Another constraint of video conference systems is the limited ability to reference areas of interest or specific objects in the environment: references can become ambiguous or vague, leading to confusion and error, since video conferencing is not suitable for conveying spatial information [8, 43, 47]. Because these systems do not support the same level of awareness as co-located collaboration, professionals tend to adopt time-consuming, complex verbal negotiations to communicate their intended directions and achieve a common goal [21, 43, 44].

As an alternative to video conferencing, AR has been investigated to combine knowledge between distributed professionals [20, 37, 50]. The concept of Collaborative AR can be described as an AR system where: “multiple users share the same augmented environment locally or remotely and which enables knowledge transfer between different users” [37].

AR-based solutions can be used in situations where know-how and additional information from professionals unavailable on-site is required [32, 56, 76, 86]. Remote professionals can add augmented visual communication cues to enhance a scene as it is captured by an on-site professional and provide real-time spatial information about objects, events and areas of interest [21, 32, 43, 46]. By creating a common ground environment, such solutions can provide a shared understanding, i.e., enhance alertness and awareness, improve the overall (level of) understanding of the working situation, as well as contribute to performing tasks faster and more accurately [11, 15, 20, 33, 43, 77].

A number of studies have explored different methods to improve mutual work understanding, task efficiency, and information sharing [44, 56]. Most focused on the use of virtual annotations to augment the scene, such as drawings, pointers, or pre-defined shapes (e.g., arrows, circles, and others) on 2D images or live video scenes, aiming to improve collaboration effectively [15, 43, 55]. Annotations are an essential interaction method in daily life, being used to summarize and highlight important elements of the physical environment or to add reminders, explanations or messages for others. A step further in the virtualization of annotations has been achieved thanks to the development of AR technology, as it is a powerful way of offering users more information about the real-world surrounding them [27].

An example was proposed by Masoni et al. [64] based on off-the-shelf mobile devices and a desktop computer to connect a remote expert with an unskilled worker performing maintenance procedures on an internal combustion engine of a car. The local user could capture a picture of the environment, use it as a visual marker and share it with the remote expert. Then, the skilled remote collaborator can annotate the received photo on the desktop computer, selecting what kind of feedback to send (based on common operations in maintenance: unscrew, screw, indications, warning, disassemble and assemble), sketches, and notes.

To complement, Aschauer et al. [4] proposed a solution using video stream sharing in similar devices as previous approaches, in which the freezing and unfreezing functions were integrated with the annotations features. Touching the video freezes the live stream and provides drawing features. Afterwards, the solution switches back to the live video view, which shows the annotations created in the 3D environment. Besides, voice and message chat were also available for communication between collaborators.

To provide on-site technicians with a hands-free approach while conducting maintenance procedures, annotations may also be visualized using a see-through HMD, as described in Madeira et al. [57]. After capturing the on-site technician context and sharing it with a remote expert for assistance, hand tracking can be used to manipulate the annotations, enabling the adjustment of their position and scale in the real-world according to the context, thus enriching the on-site professional experience and improving visualization of instructions.

Another approach consists of using 3D shared models related to the on-site worker context, i.e., taking advantage of pre-existing virtual objects, also known as virtual replicas (or digital twins in Industry 4.0), to provide assistance among distributed collaborators. For example, Oda et al. [75] present a solution for guiding an on-site worker during interventions in an aircraft combustion engine. The remote expert has access to a virtual replica of the physical object, which they can manipulate and annotate, thus providing situated instructions. This approach enables high accuracy, since it involves 3D representations, when compared with the traditional image-based 2D approaches. Nevertheless, it needs to be adapted to each new context, since 3D models must exist for each new situation, i.e., it requires each relevant physical object in the on-site worker environment to be modelled and tracked to visualize it in the expert’s virtual environment.

The use of 3D shared models has also been explored to assist in a robotics scenario, as described by Mourtzis et al. [72]. The solution focuses on a cloud implementation to facilitate communication between on-site technicians and remote experts by sharing maintenance instructions based on pre-existing 3D models. The remote expert uses CAD models of the products in the cloud database that allows the technicians to interact with the shared models. After the instructions are created, technicians are notified, download them and can proceed with the maintenance task.

Although current literature reports initial efforts towards the creation of AR-based prototypes, these efforts still rely on exploring how current technology can be directly applied. However, in order to properly support the challenges faced by human operators in such tasks, it is paramount to understand how AR technology can better assist them, which means they must be included in the design and development of new cohesive solutions for such scenarios. In this line of thought, domain experts may contribute by providing context and increasingly realistic requirements in order to challenge and assess the capacities of AR technology in responding to real collaborative scenarios. To this end, research efforts must include such systematic approaches. The work reported here helps to demonstrate the importance of applying such approaches, as described in the next sections.

3 Understanding AR for remote maintenance through a participatory process

In this section we present the methodology adopted to identify the needs from an Industrial context regarding remote collaboration. Then, we discuss the findings derived from the focus group in light of the relevant literature. We focused on maintenance due to its impact on work methodologies and benefited from an on-going collaboration with partners from the Industry sector.

3.1 Methodology

To understand how collaborative work is accomplished and how it can affect the design of solutions using AR, we propose a methodology comprising four steps. Step 1 requires identification of industrial needs characterised by a desire for knowledge sharing between experts and on-site technicians [28] (see Fig. 1a). In this context, we capitalized on a framework of tools and features for AR-supported collaboration, which resulted from the experience of our research group in creating and testing different technologies and methods over the years, mostly proposed in the scope of user-centered design approaches. This was harnessed to create storyboards on the possibilities of AR resources and their potential use in collaborative contexts. Building on an existing framework enabled a very low-resource approach to creating tangible concretizations of some of the concepts and features under discussion, allowing ideas to be materialized and providing a common language among all individuals involved in the discussion, i.e., researchers on AR and collaboration and experts in maintenance in remote scenarios. Step 2 requires adaptation and integration of the defined requirements into the maintenance prototype, thus providing it with collaborative capabilities (Fig. 1b). Step 3 implies the creation of the necessary architecture to support interaction with the shared context (Fig. 1c). Step 4 enables iterative refinement of the prototype through evaluation with different targeted audiences (Fig. 1d).

Fig. 1 Methodology adopted to bring domain experts into the design and understanding of how collaborative work is accomplished in an Industry context and how it may affect the design of collaborative solutions using AR: (a) a focus group was conducted to identify the needs from an Industrial context based on a framework in which tangible artifacts supporting the creation and discussion of storyboards were used; (b) this effort led to the creation of a set of requirements of relevant features suggested by the domain experts; (c) these requirements were fulfilled through the creation of a remote AR-based prototype for remote scenarios; (d) last, an evaluation was conducted following a set of tasks identified as relevant in maintenance contexts.

3.2 Focus group with domain experts

Among the available elicitation techniques, focus groups and interviews with stakeholders are considered among the most effective for knowledge transfer [14, 23]. In this context, we established a user-centered methodology through participatory design, i.e., by actively involving stakeholders in the design process [6, 36, 49], to understand how AR could be leveraged for remote collaboration and how it can address professionals’ expectations, ensuring our work meets their needs and is usable. We conducted a focus group with eight domain experts (Table 1) to collect qualitative data [23, 38, 79, 83, 84].

To prepare the focus group, we started by defining the main goals: gathering information on different aspects of remote collaboration to understand how these can affect the design of collaborative solutions using AR technology. First, we explored the collaborative realities of each participant and progressively introduced and addressed the subject of AR. We designed storyboards based on a framework resulting from previous research [Omitted for review] on different dimensions of collaboration using AR to substantiate proposals for known problems. The focus group discussion lasted approximately 2 hours. A moderator facilitated the discussion using a script, as illustrated in Table 2, and collected data using a mobile device to record audio and notes from the participants, who provided their informed consent. The recorded statements were transcribed and condensed to the essential information. Later, we studied and analysed the insights from the collected data to determine common themes [16, 36], or shared understandings that can be expected from this target public.

3.3 User motivations and context of collaboration

Table 1 Profile of the participants of the focus group session, including: project managers, technicians for remote support, UI designer, software tester and quality assurance engineers, as well as an associate professor and two PhD research fellows
Table 2 Example of questions used during the focus group to elicit discussion among the participants, which focused on understanding how collaboration is achieved
Fig. 2 Context of use obtained from a focus group session. Left: co-located collaboration using synchronous communication among a technical instructor and multiple on-site technicians. Right: remote collaboration using synchronous and asynchronous communication between an on-site technician and a remote expert

The focus group allowed us to identify user motivations and the context of collaboration (Fig. 2) to discern what drives the different stakeholders [71, 74]:

  • Technical Instructor. Motivation: Elicit performance of tasks between co-located workers, favoring the acquisition of knowledge, new skills, and the development of attitudes appropriate for the professional context. Technological literacy: medium to high.

  • On-site Technician. Motivation: Conduct maintenance and repairs on facility or domestic equipment. They often require know-how and additional information from professionals unavailable on-site. Technological literacy: low to medium.

  • Remote Expert. Motivation: Ensure assistance to on-site technicians using different mechanisms according to the complexity of the problem. Technological literacy: medium to high.

Two contexts of collaboration were identified: co-located and remote collaboration (Fig. 2). We focused on remote collaboration, this being our main interest and the scenario in which our partners had more experience and saw more potential applicability:

  • Co-located collaboration: Conducted in training situations between technicians and an instructor to promote the acquisition of new skills. Usually performed two months per year using text, images, videos, etc.

  • Remote collaboration: Allows a remote expert to assist on-site technicians facing unfamiliar problems that require additional know-how. The size of the workspace is constant and focused on a specific piece of equipment. In some cases, the technician must move around the physical environment (e.g., due to electrical connections). Three levels of task complexity were identified:

    • Simple issues: Collaborators use synchronous communication through voice calls on handheld devices to help with simple procedures, e.g., locating a certain component in a piece of equipment, which usually takes between 2 min and 2 hours;

    • Moderate issues: Collaborators share text and images in an asynchronous way, since the procedures require understanding the physical environment in more detail, e.g., installing a filter in new equipment. The remote expert needs to use a graphic editor tool on a computer to create annotations based on drawings. Later, the on-site technician receives these instructions via email. This type of collaboration is frequently used when voice calls are insufficient to reach a solution, taking between 10 min and 90 min;

    • Complex issues: Collaborators use synchronous sharing of text, images, video, and annotations, since the complexity of the procedures demands constant supervision and assistance, e.g., replacing an electronic board in existing equipment, ensuring all connections are properly handled. A commercial tool is used, in which the remote expert uses a computer, while the on-site technician uses a handheld device. Communication usually takes between 45 min and 120 min.
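
The three levels of task complexity reported by the participants can be summarised as a small data structure, which may help when reasoning about which communication channels a collaborative tool needs to cover. The following Python sketch is purely illustrative: the names `Mode`, `IssueTier`, `TIERS` and `tiers_supporting` are hypothetical and not part of any prototype described in this paper, and the channel labels are a simplified reading of the focus group findings above.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Timing of the communication between collaborators."""
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


@dataclass(frozen=True)
class IssueTier:
    """One level of task complexity, as reported in the focus group."""
    name: str
    mode: Mode
    channels: tuple  # channel labels are illustrative simplifications
    min_minutes: int
    max_minutes: int


# Durations are the ranges stated by the participants (2 hours = 120 min).
TIERS = (
    IssueTier("simple", Mode.SYNCHRONOUS, ("voice",), 2, 120),
    IssueTier("moderate", Mode.ASYNCHRONOUS, ("text", "images"), 10, 90),
    IssueTier(
        "complex",
        Mode.SYNCHRONOUS,
        ("text", "images", "video", "annotations"),
        45,
        120,
    ),
)


def tiers_supporting(required_channels):
    """Return the names of tiers whose channels cover every required channel."""
    required = set(required_channels)
    return [t.name for t in TIERS if required <= set(t.channels)]
```

For instance, calling `tiers_supporting({"images"})` returns the moderate and complex tiers, reflecting that image sharing only appears once voice calls are no longer sufficient.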

3.4 Reflections on AR for remote maintenance

Using AR technology in a maintenance context was introduced and discussed through storyboards and videos from our previous research, including mechanisms for visualization of components, presentation of step-by-step instructions, use of digital documentation, among others [omitted for revision]. This was considered extremely important to provide a visual overview of AR features that can be extended to remote collaboration.

Participants found it relevant to visualize situated AR-based content aligned with the real-world environment. They recognized AR can contribute to a better understanding of where to perform a given action. Displaying annotations on top of a region of interest was highlighted since currently they are unable to do so.

Two constraints were raised. First, the existence of large amounts of information in moderate and complex tasks, since these may involve several procedures that last for considerable amounts of time. The amount of information, combined with the lack of means to view it aligned with the regions of interest, creates confusion and periods of discussion between professionals while they try to understand which information was created by whom, as well as the order in which to consume it. Second, sharing step-by-step content could help minimize the problem of the amount of visual content, as well as serve as a basis for re-visiting annotations created for a specific problem at a later time. Therefore, when a similar problem occurs, existing content may be re-used, saving time and authoring effort. Such an approach would be useful to create a kind of AR documentation that might be used with or without a remote collaborator.

When questioned about the use of other types of content, e.g., 3D models, participants stated that their equipment line features more than 150 models, with thousands of individual components, which may hamper the process of making the models available. This large variety of models could affect the performance of a collaborative solution, in particular since their technicians face multiple contexts in which the internet connection may not be adequate to support sharing large amounts of data. Furthermore, they mentioned that the low technological literacy of their workforce could mean that training is required to understand how to handle such content. Given these limitations, they would give priority to a simpler, more generic solution for a wide range of scenarios.

Regarding hardware possibilities, participants emphasized that their technicians are constantly moving and therefore require easy-to-carry handheld devices that also enable the augmentation of annotations. Although this type of device requires the technician to place the device on a surface to perform a task, they reported that on-site technicians consider this more natural than a hands-free approach through Head-Mounted Displays (HMDs), which would require additional training and adaptation periods as well as a significant investment in hardware (HMDs, laptops, etc.). Regarding the remote experts, they consider a computer the ideal device, but also find it relevant to have a handheld device for situations in which they work abroad, e.g., being present inside factories or warehouses.

Another relevant finding was that the decision on whether AR is applied in remote scenarios depends on the complexity of the collaborative tasks. It would require a significant effort to create a solution, introduce it in the field (including training of professionals) and maintain it over time. In other words: “applying AR to remote maintenance needs to be worth the effort”.

3.5 Definition of requirements

From the feedback obtained, a set of requirements was outlined for the design of collaborative prototypes using AR, as illustrated in Table 3. The main purpose of sharing these requirements is to show the impact of the followed methodology in obtaining useful and detailed requisites that cover a wide range of features to support the collaborative effort.

Table 3 Requirements identified for the creation of AR-based solutions during the focus group

3.6 Discussion

The participatory process allowed us to identify insights that would otherwise possibly not be considered. Next, we discuss the main insights derived from the focus group in light of the relevant literature, which suggests that little research has been conducted on AR-based collaborative studies [8, 10, 18, 20, 65], in particular regarding remote collaboration in industrial scenarios.

One possible limitation to a broad adoption of remote AR-based solutions is associated with the maturity level of such solutions, since most prototypes focus only on assisting specific situations, leading to proofs-of-concept limited to exploring what current technology can achieve. As emphasized by the domain experts during the participatory process, this lack of adaptability to dynamic industrial scenarios may be one of the main reasons why existing remote AR-based solutions are not widely adopted by most companies. The current maturity can also be explained by the lack of inclusion of human operators, i.e., real industrial workers, in the design and development processes of such tools, which would allow their motivations, needs and barriers to be taken into account. Despite the existence of AR-based prototypes, cohesive remote supporting tools for Industry 4.0 can only be better prepared to provide assistance by following user-centered design methodologies, i.e., intertwining human expectations and practices, as well as knowing in advance the challenges these professionals face in real industrial environments. Therefore, the design focus must evolve from technology deployment to devising how the technology can augment human capacities.

In addition, domain experts suggest that although most studies reported in the literature have focused on AR-based solutions for remote synchronous scenarios, it is also important to address remote asynchronous scenarios, in which collaborative actions take place at different times, since their off-site experts may not always be available at the exact moment assistance is required. Asynchronous scenarios present several research opportunities to complement existing AR-based solutions, namely studying the retention of produced information and its consumption at a later time, how multiple annotations are related and coexist within an environment to support a concrete set of actions, as well as the temporal sorting and clustering of information [35].

Another important topic is the creation of step-by-step annotations and the re-use of such instructions later in other collaborative sessions, if an identical task demands it, which may also be interesting for other scenarios of collaboration due to its clear advantages in time and resources. Furthermore, these features help technicians with content authoring, which remains a significant barrier to the widespread use of AR in industrial scenarios [2, 9, 66]. The creation of step-by-step instructions may allow the generation of documentation captured in the maintenance context, which can replace traditional manuals without the need for content libraries or programming expertise, thus potentially improving the understanding of procedures.

Another essential point that stands out from the focus group regards the use of 3D shared models. While this type of solution provides higher levels of detail for team members, it was not considered the best alternative by the domain experts, mostly because it requires the existence of digital twins for every maintenance scenario, which may not always be possible for companies that cover hundreds, sometimes millions of products. Likewise, it also requires a higher technological literacy to handle the 3D aspects of said approach, which most workforces may not possess.

Equally important, despite the considerable speculation regarding the advantages of HMDs, the use of such devices requires careful analysis, since not all scenarios may benefit from their in-situ visualizations. The industrial partners present in the participatory process stated that in the past they had surveyed part of their workforce on this topic, reporting that the majority of their on-site technicians preferred to keep using traditional handheld devices, despite the obvious limitation of not offering a hands-free approach when conducting maintenance tasks. They argue that the price of HMDs and the training they require do not make them the most suitable solution for the larger workforces they represent.

Hence, combining the requirements derived from the participatory process with those arising from the literature provides a context that informs the contribution/significance of the work reported in this paper, by intertwining human expectations and practices and digital artifacts into cohesive interaction solutions for Industry 4.0 as described in the next section.

4 Prototype for AR remote maintenance

This section describes an effort towards the creation of an AR-based prototype for remote collaboration based on the aforementioned requirements. The prototype aims to support scenarios that may require know-how and additional information from professionals unavailable on-site, as is the case of maintenance scenarios. Therefore, it focuses on two types of users: on-site technicians and remote experts (Fig. 3).

Fig. 3

Example of the multi-platform capabilities of the proof-of-concept. The on-site technician is able to use a handheld device, while the remote expert can pick between using a computer, an interactive projector or a handheld device

Since on-site technicians are constantly moving, it seems adequate to equip them with easy-to-carry handheld devices, while also enabling augmentation of annotations. Regarding the remote expert, we support multiple types of devices, including computers, interactive projectors, or handheld devices.

Fig. 4

Prototype Overview. Goal: Allow an on-site technician to capture the real world and use mechanisms to annotate it. Then, the content is shared with a remote expert for them to analyze and provide instructions (using identical mechanisms as those aforementioned). Finally, the technician can view the real world augmented with the instructions and perform an intervention

Figure 4 presents an overview of the prototype, which implements a subset of the requirements, in the spirit of an iterative and user-centered approach. When facing unfamiliar problems, on-site technicians can point a handheld device at the situation that requires assistance and manually capture (freeze) its context. Then, using annotation mechanisms, they can edit the captured picture, creating layers of additional information to illustrate difficulties, identify specific areas of interest or indicate questions. Next, the enhanced picture is sent to the expert to provide a relevant illustration of the situation and enable the expert to suggest instructions accordingly, i.e., inform where to act and what to do, using similar annotation features plus some specific functions to facilitate the creation of content.

Afterwards, the on-site technician receives the enhanced picture showing the annotations from the remote expert. Technicians can place a handheld device nearby and follow the instructions in a hands-free setting. At any time the technician can pick up the device and perform an augmentation of the shared context, by re-aligning the annotations with the real world, thus receiving stabilized spatial information. Moreover, remote experts can receive video, manually freeze and annotate on the still video frame, rather than in a live feed, improving awareness and situation understanding. During this process, experts can generate content captured during real maintenance procedures and produce detailed documentation, which can be used for recording the procedure or reporting the current task progression. This process can be repeated iteratively until the task is successfully accomplished. Besides, audio communication is also available.

Fig. 5

Example of the prototype functions associated with the on-site technician (left: drawing and notifications; augmentation of content; visualizing remote expert screen) and the remote expert (right: sorting annotations; pointing through 3D gestures; creation of step-by-step instructions)

According to the team member role, the prototype provides a tailored set of functions, as illustrated in Fig. 5. The use of shared images provides contextual information. Therefore, we added two mechanisms, one to suggest capturing a specific region of interest, thus improving awareness, and another for re-adjusting the shared image (e.g., rotate, scale, move).

Both collaborators can draw in different colors on top of the shared images. This enables them to highlight a specific component to be replaced by drawing distinct areas of interest or sketching an arrow. In addition, it is possible to add notes, such as relevant instructions, important warnings or other contextual information.

Pointing can be extremely important to address several aspects of remote collaboration. To address this, the prototype allows pointing using 2D arrows, generating virtual arrows at a desired location of the captured/shared image. These can be selected and manipulated, i.e., changed in size, rotation and position. The remote expert can also point using 3D gestures, for example to illustrate how to perform an action (e.g., indicate where to plug a specific wire). This function is only available when using a computer and an external sensor (e.g., Leap Motion) for hand recognition.
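The manipulation of a 2D arrow described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the prototype's Unity/C# code; the class `ArrowAnnotation`, its fields and the helper functions are hypothetical names introduced only for this example, and the pixel-based coordinate system is an assumption.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ArrowAnnotation:
    """A 2D arrow placed on a captured/shared image (hypothetical model)."""
    x: float                 # tip position, in image pixels (assumed unit)
    y: float
    angle_deg: float = 0.0   # direction the arrow points, in degrees
    length: float = 50.0     # shaft length, in image pixels


def move(arrow, dx, dy):
    """Return a copy of the arrow translated by (dx, dy)."""
    return replace(arrow, x=arrow.x + dx, y=arrow.y + dy)


def rotate(arrow, delta_deg):
    """Return a copy rotated by delta_deg, kept within [0, 360)."""
    return replace(arrow, angle_deg=(arrow.angle_deg + delta_deg) % 360)


def scale(arrow, factor):
    """Return a copy whose length is multiplied by a positive factor."""
    if factor <= 0:
        raise ValueError("scale factor must be positive")
    return replace(arrow, length=arrow.length * factor)


def tail_point(arrow):
    """Compute the tail of the arrow from its tip, angle and length."""
    rad = math.radians(arrow.angle_deg)
    return (arrow.x - arrow.length * math.cos(rad),
            arrow.y - arrow.length * math.sin(rad))
```

Each operation returns a new arrow rather than mutating the original, a design choice that would make it straightforward to keep earlier versions of an annotation when content is exchanged between collaborators.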

Sorting annotations allows sequential generation of IDs, providing temporal information on how annotations should be analysed and consumed, facilitating understanding of problems involving several instructions.
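As an illustration of the sorting mechanism, the following sketch assigns sequential IDs as annotations are created and renumbers them when one is moved. It is written in Python and is not the prototype's implementation; `Annotation`, `SharedImage` and their fields are hypothetical names, and the example labels are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str          # e.g. "drawing", "arrow", "note" (illustrative values)
    label: str
    order_id: int = 0  # sequential ID indicating the reading order


@dataclass
class SharedImage:
    annotations: list = field(default_factory=list)

    def add(self, kind, label):
        """Append an annotation and give it the next sequential ID."""
        ann = Annotation(kind, label, order_id=len(self.annotations) + 1)
        self.annotations.append(ann)
        return ann

    def move(self, old_id, new_id):
        """Move the annotation with ID old_id to position new_id."""
        count = len(self.annotations)
        if not (1 <= old_id <= count and 1 <= new_id <= count):
            raise IndexError("annotation ID out of range")
        ann = self.annotations.pop(old_id - 1)
        self.annotations.insert(new_id - 1, ann)
        self._renumber()

    def _renumber(self):
        """Reassign IDs 1..n so they always match the list order."""
        for position, ann in enumerate(self.annotations, start=1):
            ann.order_id = position


image = SharedImage()
image.add("drawing", "contour around component")
image.add("arrow", "screw to remove")
image.add("note", "warning text")
image.move(3, 1)  # the note should now be read first
```

Keeping the IDs contiguous after every change means the receiving collaborator can always consume the annotations in ID order.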

The remote expert can create step-by-step instructions, particularly relevant in asynchronous collaboration scenarios, where team members may be unable to cooperate/communicate simultaneously.

Both collaborators can re-use previous annotations from other sessions/teams, since these can be an important source of documentation and can also reduce the response time. As such, besides being used for collaboration, annotations can also be leveraged to minimize the need for expert assistance if similar situations happen in the future. Specific annotation sequences created to address a maintenance task can be stored on the server. Thus, if the same malfunction comes up again, a possible solution can be re-used by instantly recalling the existing AR sequences.
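The storage and recall of annotation sequences can be sketched as a simple lookup keyed by malfunction. This is a minimal in-memory Python illustration, not the server-side logic of the prototype; `SequenceStore`, the normalization of keys and the example step descriptions are assumptions made for the sketch.

```python
class SequenceStore:
    """In-memory stand-in for server-side storage of annotation sequences."""

    def __init__(self):
        self._by_malfunction = {}

    @staticmethod
    def _key(malfunction):
        # Assumed normalization so that lookups ignore case and padding.
        return malfunction.strip().lower()

    def save(self, malfunction, steps):
        """Store an ordered list of step descriptions for a malfunction."""
        self._by_malfunction[self._key(malfunction)] = list(steps)

    def recall(self, malfunction):
        """Return a copy of the stored steps, or None if nothing matches."""
        steps = self._by_malfunction.get(self._key(malfunction))
        return list(steps) if steps is not None else None


store = SequenceStore()
store.save("Boiler fan failure", [
    "Identify the component to remove (red contour)",
    "Remove the three marked screws",
    "Replace the boiler fan",
])

# A later session facing the same malfunction can recall the sequence
# without contacting the remote expert.
steps = store.recall("boiler fan failure")
```

When `recall` returns `None`, no earlier sequence exists for that malfunction and the technician would fall back to requesting remote assistance.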

On-site technicians can visualize an augmentation of annotations using the pictures captured as a marker, i.e., situated instructions on the real-world environment as an additional layer of spatial information.

Notifications also exist, e.g., images, text, and sound to enable awareness between collaborators. This is especially important in synchronous collaboration, avoiding possible conflicts. A preview of the annotations is presented before an image is shared and a confirmation panel is displayed, allowing validation before sending.

Video streaming can be relevant when combined with other features, e.g., hand gestures, providing a richer source of situation understanding, allowing an on-site technician to view the hands of a remote expert, while he/she explains how to perform a given action.

The prototype was developed using the Unity 3D game engine, based on C# scripts. To place the virtual content in the real-world environment, we used the Vuforia library. Communication between the different devices was performed over Wi-Fi through specific calls to a PHP server responsible for storing and sharing the enhanced content accordingly.
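To make the exchange of enhanced content more concrete, the sketch below shows one way a captured picture and its annotations could be serialized before being sent to a server and restored on the other side. It is written in Python rather than C#, performs no network calls, and does not reflect the actual message format of the prototype; the field names (`session_id`, `sender_role`, `image_b64`, `annotations`) and both functions are hypothetical.

```python
import base64
import json


def pack_enhanced_picture(session_id, sender_role, image_bytes, annotations):
    """Serialize a picture and its annotations into a JSON string."""
    return json.dumps({
        "session_id": session_id,
        "sender_role": sender_role,  # e.g. "on-site" or "remote"
        # Binary image data is base64-encoded so it can travel inside JSON.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "annotations": annotations,
    })


def unpack_enhanced_picture(payload):
    """Restore the dictionary and decode the image back into bytes."""
    data = json.loads(payload)
    data["image"] = base64.b64decode(data.pop("image_b64"))
    return data


payload = pack_enhanced_picture(
    session_id="demo-session",
    sender_role="on-site",
    image_bytes=b"\x89PNG placeholder bytes",
    annotations=[{"order_id": 1, "kind": "arrow", "x": 120, "y": 80}],
)
restored = unpack_enhanced_picture(payload)
```

Bundling the picture and its annotations in a single message keeps the two in sync, which matters when the same content is later re-used as a marker for augmentation.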

5 User study

We conducted a user study to understand whether the prototype would be viable in a real remote setting, identify usability constraints, and understand participants' satisfaction. As a case study, we focused on a typical remote maintenance scenario, where an on-site technician (using a handheld device) had to perform a set of maintenance procedures on a piece of equipment, while requiring assistance from a remote expert (using a laptop computer). We defined a set of synchronous and asynchronous tasks with the assistance of our partners from the industry sector, based on an analysis of the most common procedures their professionals face.

5.1 Tasks

Participants would act first as on-site technician and then as remote expert, while an experimenter played the respective counterpart:

  • On-site: capture the equipment context and ask which component must be replaced and how. Then, perform the instructions provided using the augmented annotations displayed on top of the equipment. During this process, the experimenter (acting as the remote expert) would force multiple iterations, resulting in the need for collaboration to fulfil the task.

  • Remote: instruct the on-site participant on how to install a new filter and deal with several components in the process, while suggesting which tools to use from a large set of options. Also, create a step-by-step guide on how to replace a specific component of the boiler. During this process, the experimenter (the on-site technician in this case) would also force collaboration by asking how to handle multiple aspects associated with the task, including re-visiting parts of some instructions to force the re-use of annotations.

5.2 Procedure

Participants were instructed on the experimental setup and the tasks, and gave their informed consent. Afterwards, they were introduced to the prototype and given time for adaptation, i.e., a training period to freely interact with its functions. Then, the tasks were performed while being observed by an experimenter, who provided assistance if necessary. After finishing, participants answered a post-experiment questionnaire.

5.3 Participants

Nine participants (3 female) performed the tasks and completed the post-experience questionnaire (although a sample of just 5 users is anticipated to find approximately 80% of usability issues [73, 87]). For this stage of evaluation, we recruited participants from our University, encompassing faculty members and MSc and Ph.D. students, who had no prior experience with the defined case study but had experience in HCI and with collaborative tools (e.g., Skype, TeamViewer, etc.) in their daily activities, as well as in evaluating AR solutions.
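The sample-size argument can be made concrete with the commonly cited problem-discovery model, in which the expected proportion of usability issues found by n participants is 1 − (1 − p)^n. The sketch below assumes p = 0.31, the average per-participant detection rate often quoted with this model; whether [73, 87] rely on exactly this value is an assumption, and the function name is hypothetical.

```python
def proportion_found(n_users, p_detect=0.31):
    """Expected share of usability issues found by n_users participants.

    Uses the model 1 - (1 - p)^n, where p_detect is the assumed probability
    that a single participant reveals any given issue.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must be between 0 and 1")
    return 1.0 - (1.0 - p_detect) ** n_users


for n in (5, 9):
    print(f"{n} participants: {proportion_found(n):.1%} of issues expected")
```

Under this assumption, five participants are expected to reveal roughly 84% of the issues and nine participants roughly 96%, which is consistent with the claim that a small sample already uncovers most usability problems. The estimate is sensitive to p, which varies across systems and tasks.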

Table 4 Example of questions asked to the participants at the end of the study, during the post-task questionnaire
Fig. 6

Example of annotations created by the participants: how to install a new filter (left), suggest which tool to use (center) and identify which component must be unplugged (right)

5.4 Data collection

Two types of data were collected:

  • Task performance, comprising the time needed to complete all procedures (logged in seconds by the device) and the number of errors (logged by the device and an experimenter). Although we were not focused on comparing the usage of the design against any other experimental condition, we wanted to understand the time required to perform such tasks and assess errors caused by communication issues or by malfunctions in our prototype.

  • Participants’ opinion, gathered through a post-task questionnaire (demographic information and questions concerning collaborative aspects) and through notes from a post-task interview, to understand participants’ opinion of the collaborative process and to assess the ease of use of the prototype features, as well as preferences.

Some examples of open-answer questions are illustrated in Table 4. We decided to prioritize participant opinions at this stage and leave validated instruments, such as the System Usability Scale (SUS) or NASA-TLX, for future studies with more experienced participants. The data collection was conducted under the guidelines of the Declaration of Helsinki.

6 User study results

All participants were able to collaborate using the AR-based features of the prototype. On average, each test lasted 70 min (the tasks took 40 min to complete). Participants found it relevant to see AR-based annotations (Fig. 6) and recognized that they contributed to a better understanding of where to perform a given action, which facilitated communication and discussion. Moreover, they considered augmentation of content, drawing, creation of step-by-step instructions and re-use of annotations the most useful features. They also suggested integrating voice recognition into the prototype for command activation, a feature that the domain experts had discarded as a priority given the type of environments they usually face. This demonstrates that including domain experts in the design and development processes helps to focus on the functionalities necessary to achieve collaborative work in industrial environments.

Next, we present the main insights associated with each feature of the prototype. We chose to present issues and suggestions made by the participants, as well as possible solutions whenever they have already been implemented, highlighting the importance of using a user-centered approach to improve our prototype.

Participants enjoyed augmentation of content, e.g., annotations aligned with the real-world environment, recognizing that it contributed to a better understanding of where to act and what to do. Participants also pointed out that this feature requires the handheld device to be pointed at the boiler to visualize the AR content, which might not be practical when performing maintenance tasks that require the use of both hands.

The possibility to freeze the video stream was also well received, since it gives more control to the remote expert. Most participants stated that although video enables sharing each step of the creation of the annotations, simple enhanced images would be enough to solve most simpler collaborative problems. The only exception was the combination of video with 3D gestures for pointing, which is much more useful with video than with a still image.

Participants recognized they would use drawing often, as it is versatile enough to address most needs, and suggested offering different levels of line thickness. They also identified the need to display a preview of the annotations before sending them, which was already integrated into the prototype using a pop-up module (Fig. 7).

Fig. 7

Drawing: Interfaces before (left) and after (right) the inclusion of a mechanism to preview the annotations before being shared

Fig. 8

Pointing through Arrows: Interfaces before (left) and after (right) the inclusion of a mechanism to facilitate selection and manipulation of virtual content

The use of notes was considered useful to share important messages, especially under asynchronous communication conditions. Yet, participants highlighted that longer text might not be practical to write or read on handheld devices.

The sorting function was considered important, as it fixes a problem which could become more relevant when a significant number of annotations exists. The possibility to select and re-adjust the order of specific annotations was also considered relevant.

Pointing through arrows was considered relevant to identify specific regions of interest. Participants suggested enhancing the selection and manipulation of this type of annotation in order to facilitate the creation of content. This was already integrated into the prototype using a pop-up module with shortcuts (e.g., rotate clockwise/counter-clockwise, scale and delete) (Fig. 8). Participants also stated that the only reason they would use drawing instead of this feature would be to create personalized arrows. Besides, it could be useful to have predefined shortcuts to other common shapes (e.g., circles, rectangles, etc.).

Participants stated the step-by-step feature was useful and recognized its ability to generate a set of simpler annotations, instead of larger ones with more visual content (Fig. 9). Finally, re-visiting annotations created for a specific problem at a later time was considered interesting to help minimize the need for remote assistance in some cases.

Notifications were considered relevant to the team-members’ level of awareness during the collaboration process, in particular the use of sound to recall attention during asynchronous situations, where the on-site team-member may be doing something else while waiting for the feedback of the remote expert. However, participants considered that the central position of the pop-up module could occlude the annotations, which was already fixed by placing the content in a lower position in the interface.

Participants considered they would use the suggestion to change region of interest without help. They recognized the mechanism used to address this issue was well implemented, only missing a module to include text. This suggestion was already integrated into the prototype, following a similar mechanism as described before for the arrows, in this case for the selection and manipulation of the frame, while also including a module for text.

Fig. 9

Example of step-by-step instructions shared by the expert to assist in accomplishment of a maintenance task. Starting on the left, the on-site participant is provided with the identification of which component to remove through a red contour. Then, in the center, three arrows mark which screws must be removed. Finally, on the right, an order to do such activities is provided as well as identification to replace the boiler fan

7 Discussion

Remote maintenance relying on AR is complex, multidisciplinary and extremely relevant in Industry 4.0, since the expertise to solve a particular problem is often distributed among multiple remote professionals.

In this work, we set out to understand how collaborative work is accomplished and how it affects the design of AR-based solutions that mitigate the obstacles of remote scenarios. Table 5 summarizes the main insights of this work. It follows the structure used by Lopik et al. [54], who present a set of recommendations and issues grouped into known and emerging items regarding AR capabilities for Industry 4.0, since we consider it important that the community adopts more systematic methods to report insights from the analyses conducted.

Table 5 Summary of the main results and insights of the case study

Designing AR-based solutions that intertwine human expectations and practices is a multifaceted process which relies on iterative and multidisciplinary approaches. For example, the use of tangible artifacts to create a common language during the elicitation period with domain experts from different backgrounds proved to be an advantage. This may indicate that research groups can capitalize on their work to create common ground for discussion with partners from the industry sector, who may not be experts in AR concepts but thoroughly understand the needs and motivations of their workforce. In fact, it is obvious from the focus group that the industrial partners were willing to use AR-based tools for remote collaboration, since they already share enhanced pictures generated through visual editors and sent via e-mail, and were considering using more specific and robust software. It is also clear that the tools currently being used are very limited and that the best solutions use handheld devices and simple drawing applications that seem far behind what AR can provide to the workforce. This shows a real need for such technologies in these scenarios. Furthermore, the proposed methodology led us to obtain new insights on remote collaboration mediated by AR, define a list of important requirements, and develop a first prototype that can be brought into a more realistic environment for evaluation, discussion, and improvement, since it was evaluated and refined based on real-life tasks conducted in daily activities by the technicians of our industrial partners. This reinforces the validity of the prototype and methodology used, since the focus moved beyond the typical toy problems with Lego blocks or Tangram puzzles that have been used in the literature, which present rather low complexity and with which most participants are familiar.

The novel features elicited by the participatory process include sorting of annotations, creation of step-by-step instructions, re-visiting past actions, and the use of notifications. In its own way, each feature contributed to the team-members’ understanding of the task progress and to the creation of augmented content tailored to the characteristics of each task.

All in all, participants with different backgrounds were able to collaborate and fulfil the proposed real-life based tasks through the AR-based prototype, a few moments after they had been introduced to it and to the notion of enhanced stabilized annotations and other concepts that were not familiar to them at the time of the study.

Notwithstanding, the prototype also presents some limitations, since a picture/video stream is a 2D representation of a 3D scene. In complex operations, it might be necessary to change the point of view to better comprehend the problem context. The region of interest approach we propose is an attempt to tackle this difficulty. Yet, all these questions of selecting the point of view (especially if the remote expert needs to see a specific section or component) must be addressed in future research. How can the proper view be indicated remotely? Some work has used additional cameras or remotely controlled robotic arms, but such complex set-ups do not seem feasible in industrial settings. One possible alternative is the use of 3D models (as illustrated by [1, 7, 24, 58, 75]) that can be manipulated from both sides to show, for example, the perspective of interest or to indicate to the on-site technicians where to point the camera. However, these approaches also present drawbacks (number of 3D models required, etc.), which means a hybrid approach may be the way to go to ensure that more difficult tasks requiring movement around the environment can also be supported.

Also important is the use of HMDs for such scenarios, since some participants mentioned their absence, even though we have not adopted this approach. The use of HMDs seems a good option for the on-site collaborator. Nevertheless, their adoption usually does not take into account the computer literacy of the workforce (which can lead to rejection of the technology). At the same time, the need for a graphics-enabled computer, the wiring and lack of comfort of some solutions, and the rather low battery lifetime of others, together with the well-known interaction constraints (even an alternative such as voice recognition may not work properly in noisy scenarios like those found in industrial settings), do not seem compatible with real use scenarios in which technicians move from location to location. Currently, the way to go seems to be the use of mobile devices, although this compromise hinders to some extent the possibility of performing the maintenance while seeing the instructions, since on-site technicians have to hold the device to see the instructions and put it aside to perform the maintenance itself. In this context, the use of augmented pictures may be a good compromise, but the development of portable, self-contained AR-enabled HMDs that last full days of labour is a real need.

Besides, one of the main challenges identified during the user study was how to deliver contextualized information, i.e., how information can be shared without cluttering the users’ field of view and without interfering with their task. This problem was mentioned in some situations by the on-site participant, when the remote annotations appeared in an intrusive way, thus occluding/cluttering important parts of the environment. One possible solution, which our prototype already supports as a result of the focus group discussion, is the use of temporally situated data: step-by-step instructions created to explain a full stack of operations. As such, it is possible to display only the relevant information at each step, avoiding information overload. However, when real-time interaction is performed, a mechanism must be defined to allow either the remote user or the solution itself to select the best location and size to show the necessary information without occluding relevant parts of the view.

Another challenge is ownership of virtual content, i.e., how to present or discard information at a given moment according to the collaborators’ needs, aiming to support multiple team-members at once, e.g., one-to-many collaboration scenarios such as one on-site technician and three remote experts. This is particularly important to understand who provided a given set of recommendations, contributing to the team-members’ awareness and to problem resolution. As such, it may be possible to generate new knowledge from different sets of thoughts and claims. Likewise, when a technician faces a problem that cannot be solved with existing instructions, which may be somewhat outdated, they may request assistance from the individual(s) associated with such content, thus allowing said instructions to be updated.

Furthermore, an important topic that must also be addressed is associated with the characterization and evaluation of remote scenarios supported by AR. As recognized by the community, experimental evaluation is often limited or absent when we address such scenarios [8, 18, 20, 41, 54, 59,60,61,62,63, 69]. In fact, evaluation is even more relevant in industrial environments, to understand if AR-supported solutions can be useful in such contexts, and in what conditions. This is challenging, as evaluation needs to provide measures that, in the long-term, also help decision makers have an increased quantification of its impact on industrial processes to inform the adoption of these technologies. Therefore, with the increasing interest in remote maintenance, as well as other areas of remote collaboration application like inspection, quality control, repair or training, among others, it is imperative to design and develop better guidelines, methodologies and evaluation tools by/while trying to understand some of the following questions:

  • Can we use/adapt evaluation methods from other domains? For example, learn from areas such as Tele-rehabilitation, CSCW and Groupware, Human-Robot Interaction or Psychology.

  • What tasks are relevant to evaluate this type of solution, so that we can encompass the full complexity of the solution, its features and interaction capabilities?

  • What aspects/dimensions of collaboration should be considered to characterize the collaborative phenomenon and improve the work effort over time?

8 Final remarks and future work

This work gave us the opportunity to uncover insights on the real needs of the industry sector regarding collaborative scenarios through the involvement of domain experts. The process of using tangible artifacts with such experienced individuals to create a common language, acquire information about the collaborative context and comprehend the workforce needs confirms the need for traceability, offering useful qualitative feedback on how to support remote collaboration.

By merging these outcomes with methods from the literature into a prototype and evaluating it through a user study on remote maintenance, we were able to cover existing gaps in recent literature. For example, we provide a list of important requirements that can guide researchers and developers in creating AR-based solutions, while most works only report the technological advancements, completely isolated from the context of use and the user needs. Furthermore, we found that the AR-based prototype using manual stabilized annotations and video sharing provides means for remote experts to collaborate with on-site professionals in real-life tasks, regardless of their location and time. To elaborate, the importance of novel features such as sorting annotations, step-by-step instructions, re-visiting past actions and user notifications was made clear for industrial scenarios with rather long and sometimes complex tasks. In addition, we report and discuss the main insights of this work, based on a list of emergent recommendations and issues/challenges. This effort can lead the research community to a more (needed) critical analysis of how to address scenarios of remote collaboration through the use of AR technologies, as well as elicit novel opportunities that need further research to improve the collaborative process. Last, one of the more challenging areas we unveiled in this work was the necessity to develop a set of methods and processes to better understand and evaluate the collaborative process. While we must be prudent with generalizing our findings, we expect our results to be valuable for future reproduction in more realistic environments, including other domains of remote collaboration besides the industry sector, promoting evaluation, discussion, and refinement according to the needs of distributed team-members.

This work is being expanded by investigating the characterization and evaluation of remote collaboration supported by AR technologies, to better understand what should be taken into consideration when addressing the collaborative phenomenon, leading to a more effective work effort. Although the study presented was conducted in an environment purposely configured to be as realistic as possible, we recognise the need to perform field studies with domain experts to test our findings and validate our prototype in real settings. Later, we will also investigate how the use of different interaction devices and notification methods can affect the collaborative process in dynamic scenarios of remote maintenance.