1 Introduction

The ongoing digitalization of industrial applications leads to rising requirements for machines, systems and assistance devices. To an ever greater extent, Cyber-Physical Systems (CPS), which connect the physical and the virtual world, are part of today’s production and logistics facilities. Not only are complex machines becoming more intelligent and connected; even simple objects, e.g. bins, are equipped with intelligence and become part of Cyber-Physical Production Systems (CPPS). Although automation in production and logistics facilities is increasing, there will always be processes, e.g. picking, in which the human worker with his or her cognitive abilities is more efficient than any robot [1]. Employees are still subject to various physical and mental demands [2]. Thus, companies aim to assist workers with new technologies while they handle goods, and thereby to integrate them better into complex CPPS [3]. It can be assumed that the cooperation between technical assistance systems and human beings within a “Social Networked Industry” will evoke a change in psychological and especially cognitive demands [4]. To achieve this, the interaction between humans and machines has to be designed in natural and intuitive ways, and CPS have to adapt to the human worker [3].

One type of CPS is the Automated Guided Vehicle (AGV), which transports goods in warehouses and manufacturing systems. There is a variety of AGVs on the market, which differ, e.g., in payload. Nearly all of them lack natural interaction possibilities and therefore do not address psychological aspects. Usually, fleet management systems control the behaviour of these vehicles. As a result, the vehicles themselves offer no, or only unidirectional and limited, interaction possibilities with human workers.

2 Background

For a bidirectional interaction between humans and machines, expressing emotions is one major challenge that has to be faced and managed in a user-centered design.

2.1 Facial Action Coding System (FACS)

For many years now, the connection between emotions and facial expressions has been a subject of investigation in the field of emotion psychology. From a scientific point of view, it is not always possible to interpret expressions correctly [5]. Whilst interpretations may be uncertain and faulty, describing human faces is a well-established matter. The “Facial Action Coding System” (FACS), developed by Ekman and Friesen in the late 1970s, is a method of describing a face with a short code. This code is based on the muscles of the human face, which are summarized into so-called “action units” (AUs). An AU code consists of the number of the corresponding muscle or muscle group, as well as a letter marking the intensity, ranging from “A” (lowest) to “E” (highest). The full code is therefore a combination of numbers and letters, which provides a precise description of the muscles in action and the resulting expression. For interpretation purposes, Ekman and Friesen provided a table to predict possible emotions from human faces described by the system. They also recommended using these assignments with caution due to possibly contradictory signals and the lack of evidence for the table. The emotions covered by this chart are constrained to very basic ones, such as surprise, fear or happiness [6, 7]. This system is used in Sect. 4.3 to analyze and develop the different faces of EMILI, and shows both the potential and the limits of the comparison.
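
To illustrate the coding scheme, a code such as “12B” splits mechanically into the AU number and the intensity letter. The following Python sketch is our own illustration, not part of the FACS standard; the small name table covers only the AUs discussed later in this paper, and compound notations (e.g. the “V” prefix for asymmetry) are ignored.

```python
import re

# Minimal parser for simple FACS codes such as "5E" or "12B":
# the number identifies the action unit (AU), the letter its
# intensity from "A" (lowest) to "E" (highest).
AU_NAMES = {5: "upper lid raiser", 12: "lip corner puller",
            15: "lip corner depressor", 43: "eye closure"}
INTENSITY = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

def parse_au(code: str) -> tuple[int, str, int]:
    """Split e.g. '12B' into (AU number, AU name, intensity level)."""
    match = re.fullmatch(r"(\d+)([A-E])", code)
    if not match:
        raise ValueError(f"not a simple FACS code: {code!r}")
    number = int(match.group(1))
    return number, AU_NAMES.get(number, "unknown AU"), INTENSITY[match.group(2)]

print(parse_au("12B"))  # -> (12, 'lip corner puller', 2)
```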

2.2 Heuristic Evaluation

A heuristic evaluation is a usability analysis that helps to determine usability problems in a user interface design. In 1990, Nielsen [8], in collaboration with Molich, developed ten heuristics, which function as rules of thumb rather than strict guidelines for usability. This means that the usability principles can be adapted to the individual design under examination. The heuristic evaluation is suited to the early stages of the iterative design process, because the evaluator does not necessarily need to use the system and can, for example, examine the interface on paper. The aim of the heuristic evaluation is to identify violations of the ten usability principles and to record those violations with explanations referring to the principles that have not been fulfilled. Further, the evaluator can give design advice on how to remedy the usability problems.

The heuristic evaluation is not intended to replace other usability testing methods, but it can identify major usability problems at the beginning of the iterative design process.

2.3 Circumplex Model

An approach by Russell [9] claims that emotional states can be arranged in a two-dimensional space along the axes arousal-sleep and displeasure-pleasure (Fig. 1). Russell states that emotions are not independent of one another but are related to each other. His view suggests that emotions do not stand for themselves, are organized in a circular model around the neutral origin (“zero”), and are directly connected to facial muscles.

Fig. 1. Circumplex model of affect [9]
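
As a concrete reading of the model, an emotional state can be located by its valence (displeasure-pleasure) and arousal (sleep-arousal) coordinates. The quadrant labels in the sketch below are coarse paraphrases of terms from the circumplex, not exact angular positions from [9].

```python
# Sketch: place an emotional state in the circumplex by its valence
# and arousal, both scaled to [-1, 1]; 0 marks the neutral origin.
def quadrant(valence: float, arousal: float) -> str:
    if valence >= 0 and arousal >= 0:
        return "excited/delighted"   # pleasure, high arousal
    if valence < 0 and arousal >= 0:
        return "distressed/afraid"   # displeasure, high arousal
    if valence < 0:
        return "depressed/bored"     # displeasure, low arousal
    return "relaxed/calm"            # pleasure, low arousal

print(quadrant(0.7, 0.5))    # -> excited/delighted
print(quadrant(-0.4, -0.6))  # -> depressed/bored
```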

The capability to show and transfer emotion is crucial for human-robot-interaction. Whereas speech transfers meaning and emotion, non-verbal communication relies on body gestures and facial expressions to express the emotional state. If a robot does not communicate or express any emotion, it may be perceived as “cold” by humans. Thus, it is necessary that robots convey their emotional status. Since robotics is developing rapidly and robot embodiments are becoming more humanoid (anthropomorphic), such robots can additionally use postures or gestures to express their emotional state, unlike purely screen-based robots.

It has been shown that arousal (the level of energy) and valence (negative or positive tendency) decline as the head moves downwards; consequently, moving the head up raises both dimensions. This finding shows that changes in head movement influence the interaction between human and robot and that head movement can transfer intuitive signals about the emotional state.

3 State of the Art in Human-Robot-Interaction

Robots in working environments are a common sight in the production industry – embedded in cages or secured from human interference by optical safety measures, they perform their tasks with adamant alacrity. However, working in close quarters in heterogeneous teams of humans and robots may raise new kinds of challenges, especially concerning interaction and communication.

3.1 Robot Movement Behavior

A direct interaction between humans and robots evokes several safety concerns, because AGVs must not harm humans [10, 11]. There are social as well as cognitive aspects when robots operate near humans [12]. Usually, AGVs are equipped with sensors, e.g. laser scanners, to detect objects in their path and avoid accidents. The applicable guidelines only describe the necessity of avoiding direct contact with humans, but not whether there has to be a certain distance at which the robot should stop. Physical safety has the highest priority when working with robots, but neglecting mental safety may decrease the acceptance of the robot [13]. Humans apply the same social rules to computers as they do to humans [14]; this might be true for robots as well. The distances people keep from other humans according to their familiarity with the approached person – so-called “proxemics” – are a well-researched field. The anthropologist Hall introduced four different zones: the intimate distance, which forms a circle up to 0.5 m around the human; the personal distance, ranging from 0.5 m to 1.2 m; the social distance, from 1.2 m up to 3.5 m; and the public distance, which is beyond 3.5 m [15, 16].
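
Hall’s four zones translate directly into a simple distance classifier. The following sketch encodes the thresholds cited above; the function name is ours.

```python
# Hall's proxemic zones [15, 16] as distance thresholds in meters.
def proxemic_zone(distance_m: float) -> str:
    if distance_m <= 0.5:
        return "intimate"
    if distance_m <= 1.2:
        return "personal"
    if distance_m <= 3.5:
        return "social"
    return "public"

# The average approach distances reported in the next paragraph:
print(proxemic_zone(0.39))  # -> intimate
print(proxemic_zone(0.52))  # -> personal
```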

Humans encounter robots differently; the reaction depends on both universal and individual properties. Takayama and Pantofaru evaluated individual characteristics, such as personality and familiarity, which contribute to the way people approach robots. In particular, people who have or have had pets approached the robot more closely, with an average distance of 0.39 m, while others chose an average distance of 0.52 m [17]. This result fits well into Hall’s model regarding intimate and personal distance. Their results support the theory that people use the same social rules with robots as they do with other humans.

As for universally applicable rules, Nakashima and Sato’s study shows that an AGV moving towards humans faster than 0.8 m/s triggers a great deal of fear in the user, while a speed of 0.2 m/s triggers none. They also found a proportional correlation between AGV movement speed and the distance kept by the user [18].

Butler and Agah found that an AGV movement speed slower than human walking speed was the most pleasant for people with little experience with robots. They also discovered that the height of a robot makes a great difference in the distance people keep from it [19]. Hiroi and Ito later confirmed these results, testing robot heights of 600 mm, 1200 mm and 1800 mm against the distances kept by the test subjects [20]. Therefore, robot height may also cause fear in humans.

3.2 Displaying Information in a Natural Way

Mehrabian found that 55% of affective information is transferred by nonverbal elements, e.g. facial expressions [21]. Emotions can be expressed using dimensional models (arousal, valence and stance) or categories such as anger and happiness.

In robotics, one has to distinguish robots by their tasks and goals. Robots like Pepper [22] are designed to be “social” robots that assist people. Pepper’s main aim as a social companion is to read the emotions of its human counterpart; it is used, for example, as a receptionist. It can recognize clients it has met before, engage in a conversation or organize meetings. With its humanlike shape and 20 degrees of freedom (DOF), it can move its head and arms. The design of Pepper’s face is limited to its eyes and mouth, which are static; here, the keys to social behavior are the lights around its eyeballs and the use of postures. Additionally, Pepper has a touch tablet on its chest that enhances human-robot-interaction by displaying images, videos and web pages.

The first designed robotic faces, as in the case of Feelix [23] and Kismet [24], were only partly humanlike because of the constraints of mechanical design and control. This is evident from the immediate and abrupt changes in facial expressions. Their design focuses on the components mouth, eyes and eyebrows and is based on the FACS. Feelix has two lips and two eyebrows to convey the six basic emotions. In comparison, Kismet has 15 DOF in its face, portraying a set of emotions created from a three-dimensional affect space with the components arousal, valence and stance.

The aim of Baxter’s LCD face was to include humanoid features that evoke the impression of a social robot without raising user expectations beyond its capabilities. This minimizes the danger of the uncanny valley. Further, its emotional status is expressed using color. In a study examining the effect of color integrated into its display, respondents felt less pleasant and less safe as the screen turned red.

Showing emotion as a robot is a challenging task, especially if expressions using body language are not feasible and humanlike facial expressions have to be used.

4 EMILI – A New Approach to Human-Robot-Interaction

This paper focuses on the system EMILI (“Ergonomic Mobile Interactive Load Carrier for Intralogistics”), which combines a simple small load carrier with an ergonomic automated guided vehicle. Besides offering the advantages of both system types, the focus lies on physical and psychological ergonomic aspects.

4.1 Concept and Design of EMILI

EMILI has a storage area and the dimensions of a small load carrier. Therefore, it can easily be integrated into existing facilities and processes. Its vehicle characteristics enable EMILI to drive on its own, so no conveyor technology is needed. EMILI adjusts its load handling device via a lift function according to the worker’s height (Fig. 2). This ensures that items are picked at heights ergonomic for the worker.

Fig. 2. Load handling device – upper lift (a) lowered, (b) raised

Since no external drop-off stations are needed, location-independent picking scenarios can be realized. Furthermore, the vehicle is able to interact bi-directionally with the worker. On the one hand, web-based software interfaces make it possible to control EMILI via an app or wearables, e.g. smart glasses. On the other hand, EMILI has an industrial-grade, robust, energy-efficient, bi-stable segmented e-paper display through which the human worker gets direct feedback (Fig. 3).

Fig. 3. Status faces – (a) “Sleeping”, (b) “I’m busy”, (c) “Sorry”

EMILI is controlled by the Robot Operating System (ROS) [25]. All software components are represented by so-called nodes, which makes it easy to integrate new functionalities. Interaction with machines and other systems in its environment is realized through a RESTful service interface: defined GET requests deliver information, e.g. the battery state, and POST requests start different functions, e.g. stop.
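
As a minimal sketch of such a RESTful interface, the snippet below issues a GET request for a status resource and a POST request to trigger a function. The base URL and route names are hypothetical; the paper does not publish EMILI’s actual endpoints.

```python
import requests

# Hypothetical base URL and routes; EMILI's real API may differ.
BASE_URL = "http://emili.local:8080/api"

def get_battery_state() -> dict:
    """GET request delivering status information (e.g. battery state)."""
    response = requests.get(f"{BASE_URL}/battery", timeout=2.0)
    response.raise_for_status()
    return response.json()  # e.g. {"charge_percent": 87}

def post_stop() -> None:
    """POST request starting a function (e.g. an immediate stop)."""
    response = requests.post(f"{BASE_URL}/stop", timeout=2.0)
    response.raise_for_status()

if __name__ == "__main__":
    print(get_battery_state())
    post_stop()
```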

4.2 Interaction Between EMILI and Humans

EMILI was designed to make interaction with human workers in warehouse logistics as intuitive as possible. In terms of safety, it was designed to enable collaboration with humans, which allows the working space to be shared without additional safety installations such as sensors. Hence, seamless process integration is feasible. Many communication channels were taken into account in the design phase: it is possible to interact with EMILI using apps on handheld devices or smart glasses, via its e-paper display, and through visual feedback expressed by colored LEDs. These modalities make it easy for the human to see the status of the robot and to take control of it. Using EMILI’s web interface, apps for smartphones, smart glasses or laptops can be created. Smart glasses, in the sense of Augmented Reality (AR) glasses, offer a wide range of visualization options and precise interaction. Virtual objects can be placed right into the worker’s field of view and overlaid on the real-world shape of EMILI. For example, in case of power problems, EMILI will indicate that a maintenance process should be started, and a person wearing AR glasses can obtain more information on the problem. Further, virtual arrows in the field of view can point to the exact screws to be loosened to get access to the battery, and virtual guidelines can help the worker to attach the correct power plugs in the right spots and apply the adequate voltage.

The integrated colored LED strips are mounted along the outline of EMILI. Each LED can be switched on and off and altered in color. Thus, whole sequences can be lit in a single color or in different colors, or only a few segments can be switched. This yields a variable display for indicating the mood (respectively the state) of EMILI, turn signals, or indicator lights for the load handling device.
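
A minimal sketch of such addressable segment control is given below. The Strip abstraction and the pixel count are hypothetical illustrations, not EMILI’s actual firmware interface.

```python
from dataclasses import dataclass

@dataclass
class Pixel:
    r: int = 0
    g: int = 0
    b: int = 0

class Strip:
    """Toy model of an addressable RGB strip along the vehicle outline."""
    def __init__(self, length: int):
        self.pixels = [Pixel() for _ in range(length)]

    def set_range(self, start: int, end: int, r: int, g: int, b: int):
        """Color one contiguous segment, e.g. one edge of the vehicle."""
        for p in self.pixels[start:end]:
            p.r, p.g, p.b = r, g, b

    def clear(self):
        self.set_range(0, len(self.pixels), 0, 0, 0)

# Example: light the first ten LEDs orange as a turn indicator.
strip = Strip(length=60)
strip.set_range(0, 10, 255, 120, 0)
```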

More intuitive is the communication via the display integrated in the front, which enables EMILI to express its overall state using an emoji-like avatar face, show current states using specific icons (Fig. 3), and display short textual phrases. These parts of the display can be enabled or disabled depending on changes of state and working progress; animations such as turn signals at both ends of the display can also be shown to indicate changes of direction. This enables the worker to accurately interpret the current system status without the need for external technology.
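
The description above suggests a display state composed of independently switchable parts. The following sketch models this composition; the type names and the face set are illustrative assumptions, not EMILI’s actual software.

```python
from dataclasses import dataclass
from enum import Enum

class Face(Enum):
    SLEEPING = "sleeping"
    BUSY = "busy"
    SORRY = "sorry"

@dataclass
class DisplayState:
    face: Face | None = None   # emoji-like avatar face segment
    icon: str | None = None    # state-specific icon segment
    text: str | None = None    # short textual phrase
    turn_left: bool = False    # animated turn signal, left end
    turn_right: bool = False   # animated turn signal, right end

# Example: announce a left turn while working on a (hypothetical) job.
state = DisplayState(face=Face.BUSY, text="Transport job", turn_left=True)
```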

Finally, simple voice communication is also possible, as known from Alexa, Google Assistant or Siri. Using a Bluetooth headset, the person interacting with EMILI can give simple commands like “EMILI, follow me!” or “EMILI, stop!”.

4.3 Display Analysis

To review how well EMILI’s faces are designed and to verify the reliability of the display design, we compared its faces to corresponding emoji from the Unicode standard and to examples from the FACS. EMILI mimics human facial expressions and therefore emotional states; this makes the purpose of its face similar to the function of emoji, which are designed to express human feelings [26, 27]. As stated before, human facial expressions cannot be interpreted correctly in every situation, and emoji likewise vary in the reliability of their interpretation [5, 28]. Emoji, as part of the Unicode standard, are fixed in their meaning, while their appearance is free to design. The comparison is based on the emotion EMILI should present and the corresponding emoji. In the following, three examples from our investigation are discussed.

The first expression is EMILI’s “I’m Busy” face (Fig. 4a), displayed by widely opened eyes and raised lip corners. The corresponding emoji is represented by the Unicode code point U+1F642, described as “slightly smiling face” (Fig. 4b). To recreate EMILI’s face with expressions given by the FACS, two human expressions need to be combined. The left photo (Fig. 4c) shows a neutral mouth and widely opened eyes; the code for this picture is 5E. AU 5 pulls up the upper eyelid, here with the maximum intensity E. The right photograph (Fig. 4d) shows a human face with eyes in a neutral position and lip corners pulled up slightly. The AU responsible for the lip movement is AU 12, the “lip corner puller”; the photo is rated 12B. According to the FACS, AU 12 occurs with the emotion “happy”, whilst AU 5 appears with anger, fear and surprise. Given that AU 12 is only present with the emotion “happy”, EMILI’s “I’m Busy” face shows a mixture of happiness and surprise.

Fig. 4. (a) EMILI’s “I’m Busy” face, (b) the “slightly smiling face” emoji, (c) and (d) samples from the FACS [6, 28]

The “Sleeping” face (Fig. 5a) is represented by closed eyes and a slightly smiling expression. The comparable emoji, with the Unicode code point U+1F634, is described as “sleeping face” (Fig. 5b). Comparing the two pictures, two differences are obvious – the mouth and the “z” letters above the emoji’s head. EMILI has a smiling closed mouth and lacks the “z” letters, while the emoji has a round, open mouth. In a study by Miller et al., the “sleeping face” emoji was one of the least misinterpreted, which means that a high similarity between this emoji and EMILI’s face could support a correct interpretation of the shown emotion. Choosing human faces from the FACS for the comparison, the left photo is the same as in the first comparison (Fig. 5c). The right one is a face with closed eyes, rated 43E (Fig. 5d). AU 43 is responsible for closing the upper eyelid and is not mentioned in the table of emotion predictions. Therefore, no appropriate emotion can be assigned.

Fig. 5. (a) EMILI’s “Sleeping” face, (b) the “sleeping face” emoji, (c) and (d) samples from the FACS [6, 29]

The analysis of the “Sorry?” face (Fig. 6a) was far more difficult than the previous ones. With eyes facing in different directions and one lowered lip corner, no correspondent exists in the repertoire of the FACS. EMILI’s face shows the emotion “confusion”; hence the corresponding emoji, described as “confused” with the code point U+1F615 (Fig. 6b), was chosen. For the comparison, a photo with a lowered lip corner rated 15B (Fig. 6c) and a human face with a subjectively confused look (Fig. 6d) were selected. AU 15, the “lip corner depressor”, is responsible for pulling the lips down. EMILI shares the one-sided lowered lip corner with the emoji, but the emoji has a different set of eyes. The left photo (Fig. 6c) is connected to the emotion “sadness”. The right photo (Fig. 6d) is rated 1B+2C+4C+5A+V11C. AU 4 pulls the eyebrows down, while AU 1 moves the inner eyebrow corners up. AU 11 moves the skin beneath the cheekbones; the “V” denotes an asymmetrical shaping.

Fig. 6. (a) EMILI’s “Sorry?” face, (b) the “confused” emoji, (c) and (d) two samples from the FACS [6, 30]

The combination of AUs 1, 2, 4 and 5 occurs in the context of “fear”, while AUs 1, 4 and 11 are typical of “sadness”. The combined look would therefore be a sad and fearful expression.

The investigation shows the difficulty of comparing human and artificial faces, as well as the limits of this method. Considering the overall investigation, the comparison was feasible for the three faces “I’m Busy”, “No, Sorry!” and “Sleeping”, whilst other faces lacked comparative possibilities. Using the FACS to compare artificial facial expressions with real ones may be limited, but it can still be advantageous to use the system to create emoji-like faces with a high accordance in interpretation. To counter misleading faces in the development of EMILI’s faces, an empirical study was designed to reduce possible misinterpretations.

4.4 Survey

Based on the heuristic evaluation, ambiguous icons and facial expressions for the different states were tested in an online survey. Besides mitigating the risk of a misleading design, another advantage of an online survey is the accumulation of a large sample size, so that the resulting statements reflect the opinion of a large population. The survey was distributed at the Fraunhofer Institute for Material Flow and Logistics, and 191 respondents took part anonymously. The confidence level was 95% and the margin of error 6%. For the survey, we designed two alternative interfaces to the current interface and, for comparison, used two scales of the User Experience Questionnaire.
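
For reference, the margin of error for a proportion is usually computed as below. The institute’s total population N is not reported; the finite-population correction shown is therefore only a plausible reconstruction of how a 6% figure can arise from n = 191 at 95% confidence.

```latex
% Worst-case margin of error (p = 0.5, z = 1.96, n = 191):
%   e = z * sqrt(p(1-p)/n) = 1.96 * sqrt(0.25/191) ~ 0.071
% With a finite-population correction for N surveyed employees
% (N ~ 600 is an assumption, not a reported figure):
%   e' = e * sqrt((N - n)/(N - 1)) ~ 0.071 * 0.83 ~ 0.059 ~ 6%
\[
  e = z\sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}
\]
```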

The focus was on the outer appearance, which was tested on two 7-point Likert scales: attractiveness and perspicuity. In the survey, participants had to assess EMILI’s facial expression designs (Fig. 7). Almost half of the respondents (48%) chose design (b) and roughly 40% chose design (a). These close results are reflected on the attractiveness scale: design (b) and design (a), with means of 4.6 and 4.5 respectively, are nearly equal, whereas design (c) reaches a mean score of 3.5. On the perspicuity scale, design (b) and design (a) do not differ much (means: 4.9 and 5.04); design (c) has the lowest score with a mean of 3.59. Further questions about the choice of design revealed that – although text hints support the icons and therefore EMILI’s state – participants did not connect the two.

Fig. 7. (a) Existing interface, (b) and (c) two alternative interfaces

The aim of the interface is to create an intuitive interaction between the user and EMILI. This means that EMILI has to be accepted by the user as a collaborative robot, and the user has to trust EMILI’s abilities and to know which state EMILI is currently in and which task it is working on. In consideration of the users’ demands and the results of the online survey, we chose design (b) and enhanced the interface. Further, EMILI’s “Sleeping”, “Sorry?” and “Error” faces caused ambiguity and led to misleading interpretations. Based on the recommendations of the respondents and oriented on the FACS, three new faces (Fig. 8) replaced the misleading ones.

Fig. 8. EMILI’s (a) “Sleeping” face, (b) “Error” face, (c) “Sorry?” face

4.5 Movement Analysis

It seems necessary for AGVs to keep specific speeds and certain distances to humans in order to avoid fear in working environments. The speed of an AGV may be faster than 0.2 m/s but should never exceed 0.8 m/s. Keeping a distance from other individuals is a crucial part of interaction between humans and is therefore a necessary trait for robots in Human-Robot-Interaction. This problem may be addressed with technical solutions, but the robot must also distinguish between non-human obstacles and humans; otherwise, it may get stuck between shelves or walls. This is possible with face-recognition technologies like those used in the robot Pepper [31]. However, keeping one fixed distance to humans may not be a universal solution – there are situations where working in close quarters is explicitly wanted. This evokes two further problems: on the one hand, the robot has to keep different distances in different situations, and on the other hand, it has to communicate its states to the surrounding humans in a comprehensible way. For the first challenge, different modes of operation may be the solution. In pick situations, where EMILI is used for example as a mobile container, a very short distance to humans is kept to ensure the proper execution of the task; the picker is used to EMILI and to working in close range with it, so no great distance is required. In transport situations, by contrast, EMILI must keep greater distances, since it may encounter individuals who are not as familiar with the robot as the picker.

Letting the robot keep the distance appropriate to its task might still not be enough; it also has to indicate which distance it is keeping. Otherwise, unfamiliar individuals cannot tell how closely the robot will approach in the situation at hand. EMILI is equipped with LED strips at every corner. Such signals have to be mounted on AGVs moving faster than 0.3 m/s and have to flash when the vehicle is about to move or during the movement itself [11]. So far there is no restriction to a certain color for this signal, which creates the opportunity to use these LEDs to communicate EMILI’s kept distances to humans via a simple color code. To communicate that it is keeping close distances, we propose, depending on the environment, to use red lights; for medium ranges, green; and for greater distances, blue. These measures could help make working environments safer, not only physically but also emotionally.
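
The proposed policy can be sketched as a small lookup from operating mode to minimum distance and signal color. The mode names, the exact distance values, and the color thresholds (borrowed from Hall’s zone boundaries) are assumptions for illustration; only the 0.3 m/s flashing rule [11] and the 0.2-0.8 m/s speed band are taken from the text above.

```python
from enum import Enum

class Mode(Enum):
    PICK = "pick"            # close-range work with a familiar picker
    TRANSPORT = "transport"  # driving among possibly unfamiliar people

# Hypothetical minimum distances per mode, in meters.
MIN_DISTANCE_M = {Mode.PICK: 0.4, Mode.TRANSPORT: 1.5}

def led_color(distance_m: float) -> str:
    """Proposed code: red = close, green = medium, blue = far."""
    if distance_m < 0.5:   # Hall's intimate zone boundary
        return "red"
    if distance_m < 1.2:   # Hall's personal zone boundary
        return "green"
    return "blue"

def must_flash(speed_m_s: float) -> bool:
    """Guideline [11]: flashing signals required above 0.3 m/s."""
    return speed_m_s > 0.3

speed = min(0.8, max(0.2, 0.6))  # clamp speed into the cited band
print(led_color(MIN_DISTANCE_M[Mode.TRANSPORT]), must_flash(speed))
```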

5 Summary

With EMILI, a system has been developed that simplifies the interaction between humans and robots and also gives an outlook on how future interaction can be realized. The system was designed from scratch with the goal of matching the worker’s interaction capabilities as closely as possible without overloading cognitive capacities. For that purpose, a range of interaction modalities was implemented using different technologies. In this paper, we focused on EMILI’s display, with which it can represent emotions. We compared its faces to known emoji and analyzed them using the FACS. Further, an online survey was conducted to identify ambiguous states. Those results led to a user-centered redesign of the display. With these results, and by emphasizing the movement behavior of robots in human workspaces, the challenges of digitization can be handled and collaboration fostered in times of Industry 4.0.

6 Outlook

The development of intuitive human-robot interaction has just begun. New interaction possibilities, e.g. a “Follow me” mode or speech recognition, will be implemented. Further, colors, their intensities, light sequences and motions will be used to emphasize EMILI’s different states. The newly developed display and EMILI’s implemented distances and velocities will be tested in a user study to improve human-robot-collaboration.