1 Introduction

There is a great deal of interest in augmented reality (AR) applications in research and teaching, for entertainment, as well as for use in industry [1,2,3]. Besides stationary [4] and hand-held systems [5], head-mounted displays (HMDs) in particular are gaining increased attention. For professional use in industry it is necessary to select displays that meet the requirements of operators and users in terms of usability, ergonomics, functionality and robustness. The evaluation of AR HMDs is complex, covering the quality of the display (including aspects such as field of view, resolution, contrast, and visibility under different lighting conditions), handling, reliability, robustness and usability, as well as wearing comfort and ergonomics (weight, balance, battery run time) [6]. While some of this information is provided as technical data by manufacturers, it is usually insufficient to assess more complex aspects like comfort and ergonomics.

In this paper we report on a test platform that includes a suite of test applications for the systematic evaluation of AR HMDs in industrial settings. The test applications allow conducting user tests with standardized tasks, thus enabling easy comparison between different displays. To realistically assess aspects like wearing comfort and ergonomics, users have to work with a system in a realistic setting for extended periods – this in turn requires test tasks that are scalable in complexity and duration. The test suite supports the creation of customized test tasks that cover a large range of task complexity and task duration.

2 State of the Art

Researchers recognized at an early stage that the quality of the display is of critical importance for the usability of an AR system. Already in 1995 Rolland et al. [7] examined both the technological issues involved in AR display usability, as well as central perceptual and human factors such as depth perception, user acceptance, and safety. In 1997 Azuma [8] discussed the advantages and disadvantages of various display technologies and identified important display qualities such as resolution and contrast. Numerous evaluation studies of AR systems exist in the literature and many of these include the evaluation of different displays (e.g., [9] or [10]).

Existing studies can provide general guidance in the selection of an AR display, such as selecting a suitable display type for an application (e.g. HMD vs. Handheld vs. Projection [11]), but are often of limited use in the selection of a display for a specific development project in industry. A key factor is the rapid development of AR hardware in recent years and the fact that AR hardware has only recently reached the maturity and reliability for productive use in industrial applications. Published evaluation and test results often cover out-of-date hardware or prototype systems.

The aim of our platform is therefore to support the evaluation of HMDs by means of a standardized and partially automated procedure. The use of AR displays to assist in picking and assembly tasks has been investigated in research for a long time [12, 13] and is now established as a product in the market (e.g. [14]), which makes this application well suited for test purposes.

3 Evaluation Platform Requirements

As part of our project, a test platform for HMDs is being developed that consists of two parts: a hardware test bed that measures key parameters of HMDs in a largely automated way and an interaction part with standardized user tests. In this paper, we focus on the interaction part, which is currently being validated in user tests with common AR HMDs like the Epson Moverio, Microsoft Hololens and Vuzix glasses. The hardware test bed has recently been completed as a functional prototype and consists of a platform on which a mannequin head is mounted with two high-resolution cameras at the eye positions (Fig. 1).

Fig. 1. Hardware part of the test platform and moving directions of the platform

The platform is equipped with sensors and actuators. The sensors measure the mass and balance of the HMD, and the motion actuators move the head so that latency and lag can be measured. The cameras make it possible to automatically measure several important aspects such as field of view, contrast and occlusion.
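As an illustration of such an automated measurement, the following is a minimal sketch (C#) of how display contrast could be derived from a test pattern captured by the eye-position cameras; the class name and the assumption that the frame has already been converted to linear luminance values are ours, not part of the actual test bed software.

```csharp
// Hypothetical helper: Michelson contrast from a captured luminance image.
// Assumes the camera frame has been converted to a 2D array of linear luminance values.
public static class DisplayMetrics
{
    public static double MichelsonContrast(double[,] luminance)
    {
        double min = double.MaxValue, max = double.MinValue;
        foreach (double l in luminance)
        {
            if (l < min) min = l;
            if (l > max) max = l;
        }
        // Contrast = (Lmax - Lmin) / (Lmax + Lmin), yielding a value in [0, 1].
        return (max - min) / (max + min);
    }
}
```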

The interaction part of the evaluation platform was designed to enable user tests that measure important qualities of AR HMDs that are difficult or impossible to derive from the raw data provided by the hardware test bed, e.g. wearing comfort and user fatigue. Wearing comfort and fatigue can only be realistically assessed by human users after prolonged use of an HMD in a realistic setting. Other qualities like readability of information and recognizability of graphics in different places in the field of view are also more realistically assessed by user tests.

Requirements for the test tasks were derived from in-depth discussions with practitioners using AR in industry, research and university settings. From the results, the following requirements were established regarding the test tasks, the test environment and the implementation (Table 1).

Table 1. Overview of requirements for the test tasks

The same process was also used to establish a collection of variables (qualities) that should be determined in the tests with users (Table 2).

Table 2. Overview of variables for user tests

Based on these requirements, solutions were developed in an iterative user-centered design process to cover the tasks, tests, implementation and data collection, as presented in the following sections.

4 Test Tasks

Typical tasks that are currently supported by industrial AR applications are navigation, information visualization and assembly or maintenance guidance. While these are all suitable as potential realistic test tasks, our initial focus is on assembly and maintenance guidance tasks because these make it easier to address the additional test requirements of limited training, replication and limited setup. Navigation tasks typically require extended test areas with significant setup costs and are therefore difficult to replicate. Information visualization tasks can be difficult to scale, especially if they are to be performed by untrained users. While simple information visualization tasks can be performed without previous training, it is difficult to derive realistic and useful test scenarios that extend usage time to an hour or more. More complex visualization tasks are suitable for longer tests, but usually require previous training and domain expertise. Assembly and maintenance tasks can be designed to be performed by users with limited training and experience and are well suited for tests. In our first test application we have implemented several picking and assembly tasks. The assembly of objects makes for a clear and motivating scenario for test users, and the duration and complexity of a test can (within reasonable limits) be adjusted both through the size and complexity of the object to be assembled and through repetition of the assembly task itself. In future versions we plan to expand the scope of tasks to include information visualization/read-out tasks that can be combined with the assembly and maintenance tasks.
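As a minimal sketch (C#), a scalable picking and assembly task of this kind could be described by a simple data model in which complexity is adjusted through the number of steps and repetitions; the class and field names are illustrative and not taken from the actual test suite.

```csharp
using System.Collections.Generic;

// Illustrative data model for a configurable picking and assembly test task.
public class AssemblyStep
{
    public string PartId;        // key into the 3D part library
    public string SourceBin;     // picking location, e.g. "A3"
    public string TargetAnchor;  // placement position in the model
}

public class AssemblyTask
{
    public string Name;
    public int Repetitions = 1;  // repeat the assembly to scale test duration
    public List<AssemblyStep> Steps = new List<AssemblyStep>();

    // Rough complexity proxy: total number of placements a participant performs.
    public int TotalPlacements => Steps.Count * Repetitions;
}
```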

5 Tests

One potential issue with using industrial assembly tasks in user tests is that the tests have to take occupational health and safety issues into account, especially if (power) tools are involved. Regulations often require previous training, supervision and special insurance, which makes it difficult to recruit large and diverse user groups for testing. Since 2001, we have successfully used construction toys like fischertechnik [15], Lego [16] and Makeblock [17] in augmented reality demonstrators and test applications. A large and diverse group of users has used these construction toys in AR applications over the last three years, in which we conducted public outreach activities in the MobileGameLab. The MobileGameLab is a STEM laboratory in Bremen (Germany), where children, students, senior citizens and other groups can gather experience with emerging technologies like AR and also create their own applications and projects. Based on these experiences and requirements, we developed the initial set of picking and assembly tests using components from fischertechnik and Lego construction toys. As toys, these are certified as ‘safe-to-play-with’ for age groups 8 and up. They require no tools to assemble and allow for easy replication of tests across multiple instances and sites. For the creation of assembly test tasks, we have identified specific advantages and disadvantages of the different construction set systems (Table 3):

Table 3. Overview of the different construction set systems with their specific advantages and disadvantages

Due to these differences we currently use Makeblock only for mobile robotics projects and focus on the use of Lego and fischertechnik for the test platform. Fischertechnik seems to be best suited for a generic test platform, because a flexible subset of elements (suitable for placement in a typical assembly workplace) can cover a wide variety of models. Lego seems to be well suited for more specialized casual applications, e.g. use in demonstrators and fair presentations, where a specific set of blocks can be acquired for the specific assembly scenario.

6 Implementation

The use of construction toys allows for easy modification and extension of assembly tasks to adapt to specific requirements and test scenarios. The elements of both fischertechnik and Lego models can easily be reused across many tests and are affordable, especially if second-hand elements can be purchased in bulk. This allows us to create a library of 3D models that can then be used to create picking and assembly instructions. To support a wide variety of AR HMDs with different tracking and interaction modalities on different platforms and different operating systems, the test applications are implemented in Unity, because Unity support is available for all current mainstream AR HMDs. Unity also provides support for other platforms like mobile devices, which can often be of interest as an independent reference.

The Unity development environment and the large collection of tools available for it allow for fast and effective creation and modification of test tasks. For future versions we plan to further automate the creation of appropriate visualization and instruction elements from the 3D models.
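To illustrate the general approach, the following is a hedged sketch (Unity C#) of how a test application could step through picking and assembly instructions using prefabs from the 3D part library; it builds on the hypothetical task description sketched earlier, and the component and field names are assumptions rather than the actual implementation.

```csharp
using UnityEngine;

// Illustrative Unity component that steps through assembly instructions
// and shows the next part to be placed as a preview object.
public class AssemblyGuide : MonoBehaviour
{
    public AssemblyTask task;       // task definition (see earlier sketch)
    public Transform partLibrary;   // parent object holding one prefab child per part id

    private int currentStep = 0;
    private GameObject preview;

    public void ShowNextStep()
    {
        if (preview != null) Destroy(preview);
        if (task == null || currentStep >= task.Steps.Count) return;

        AssemblyStep step = task.Steps[currentStep];
        Transform prefab = partLibrary.Find(step.PartId);
        if (prefab != null)
        {
            // Instantiate a preview of the next part at the guide's position.
            preview = Instantiate(prefab.gameObject, transform);
            preview.name = step.PartId + "_preview";
        }
        currentStep++;
    }
}
```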

7 Data Collection

There is a large collection of variables (qualities) that can be of interest in the evaluation of an AR HMD. A small set of these can be captured directly in the test application, e.g. Task Completion Time. However, most variables are best measured by prompting users to provide feedback, e.g. user experience (UX) and technology acceptance, while others can be measured either through user questionnaires or sensor instrumentation, e.g. fatigue and user attention.

In the current version of the test environment we record those variables that can be derived directly from user interaction in the test software (number of user interactions, task completion time) and use questionnaires to capture the remaining variables. There is a set of widely used standardized questionnaires that address the variables of interest, e.g. IsoNorm and SUS to rate usability-related variables, AttrakDiff for UX-related variables and NASA-TLX to measure the task load perceived by the user. However, presenting users with a complete set of these questionnaires covering all variables of interest has proven to be impractical. Test participants would have to answer far too many different questions that sometimes overlap between the different questionnaires, and the differences in wording can cause additional confusion, prompting users to abandon the questionnaire and drop out of the test. We have therefore developed an initial questionnaire (Fig. 2; currently in German only) that aims to cover the variables of interest in a coherent way, with wording adapted for industrial AR applications and with the number of questions reduced to a practical minimum. For the future we aim to refine the questionnaire by enabling test designers to limit the number of questions depending on the variables of interest in a specific test setup, by validating the questionnaire's results against the established standard questionnaires, and by translating the questionnaire into other languages, starting with English.
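A minimal sketch (Unity C#) of how the interaction-derived variables could be captured during a test run is shown below; the field names and the CSV output path are illustrative assumptions, not the actual logging code.

```csharp
using System;
using System.IO;
using UnityEngine;

// Illustrative logger for variables derived directly from interaction:
// number of user interactions and task completion time.
public class TestLogger : MonoBehaviour
{
    public string participantId = "P01";

    private float taskStartTime;
    private int interactionCount;

    public void StartTask()
    {
        taskStartTime = Time.time;
        interactionCount = 0;
    }

    public void RegisterInteraction() => interactionCount++;

    public void FinishTask(string taskName)
    {
        float completionTime = Time.time - taskStartTime;
        string line = $"{participantId};{taskName};{interactionCount};{completionTime:F1}";
        // Append one CSV line per completed task (path is illustrative).
        File.AppendAllText(Path.Combine(Application.persistentDataPath, "results.csv"),
                           line + Environment.NewLine);
    }
}
```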

Fig. 2. Questionnaire (currently in German only)

8 Experience with the Test Setup in Different Configurations

Figures 3, 4 and 5 show different test setups, using a variety of AR glasses (and alternative techniques like projection) in a variety of settings. Our approach is flexible enough to enable user tests in all these settings with minimal adaptation. A simple ad-hoc setup (Fig. 4) allows conducting tests anywhere, without the need for additional infrastructure. Such an approach is especially useful in early exploratory stages of evaluation. A low-cost workspace setup provides a realistic simulation of an industrial workspace in a way that can be easily and cheaply replicated (Fig. 5). Such a setting is especially useful to conduct tests with large numbers of participants or at different locations. Tests in a real factory environment (Smartfactory OWL, Fig. 3) enable more realistic tests and are especially useful to assess the external validity of previous test results. Different setups have been used with multiple Lego and fischertechnik models in different tests and demonstrations, e.g. at the Hannover Industry Fairs in 2014, 2015, 2016 and 2017, in the SmartFactory OWL since 2016 and in the MobileGameLab since 2017, with a wide variety of users and test tasks.

Fig. 3. AR assembly system with AR glasses (left) and projection (right) in SmartFactoryOWL

Fig. 4. Minimalistic test setup with Microsoft Hololens

Fig. 5. Low-cost workspace with Microsoft Hololens

9 Observations and Outlook

Experiences in tests with users have shown that even inexperienced users can quickly understand how to carry out complex assembly tasks using AR visualization. The central objective in our test scenarios is currently a realistic use of the HMDs over a long period of time in order to enable the test users to evaluate aspects such as usability, wearing comfort and ergonomics of the HMD.

This goal has been achieved. We have conducted extensive tests with AR displays including the Microsoft HoloLens, Epson Moverio and Vuzix glasses, as well as custom-built displays.

The use of assembly tasks with construction toy systems as an application scenario has proven to be valuable, since it allows efficient creation of test tasks at different levels of complexity and cost-effective creation of test workspaces.

The visualization techniques used in the assembly tasks have so far not been a subject of investigation, and we have used simple techniques that are easy to implement and that have proven to be easy to understand in previous demonstrators. In the minimalistic setup (Fig. 4) we use a simple animation of augmented 3D ‘doppelganger’ objects of the building blocks to be moved. This presentation is easy to create and well understood, but can be tiring in longer test tasks, as the users see a lot of animation all the time. Test users have also commented that it would be useful if the last component that was assembled were highlighted.
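For illustration, the following is a rough sketch (Unity C#) of this kind of ‘doppelganger’ animation, extended with the highlighting of the last assembled component that test users asked for; the component name, fields and the chosen highlight colour are placeholders, not the actual visualization code.

```csharp
using UnityEngine;

// Illustrative animation: a ghost copy of the next part moves repeatedly
// from its picking position to its target position; the previously placed
// part is tinted so it stays recognisable.
public class DoppelgangerAnimator : MonoBehaviour
{
    public Transform source;         // picking position of the part
    public Transform target;         // placement position in the model
    public Renderer lastPlacedPart;  // part assembled in the previous step
    public float cycleSeconds = 2.0f;

    void Start()
    {
        if (lastPlacedPart != null)
            lastPlacedPart.material.color = Color.green; // highlight the last step
    }

    void Update()
    {
        // Loop the movement from source to target to indicate the placement.
        float t = Mathf.Repeat(Time.time / cycleSeconds, 1.0f);
        transform.position = Vector3.Lerp(source.position, target.position, t);
        transform.rotation = Quaternion.Slerp(source.rotation, target.rotation, t);
    }
}
```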

Rack-based test workspaces with boxes (Figs. 3 and 5) can use simpler information to indicate the required part (from a box number to a visual augmentation of the box), which is less tiring for users in longer test tasks. Such a setup also makes it possible to extend tests to wearable displays without camera and tracking functionality, since the clearly identifiable boxes also allow picking by direct instruction (e.g., box A3) and without AR visualization. For the future we plan to extend the test framework with a wider set of visualization options in order to experiment with the usability of visualization techniques in addition to the display hardware.