1 Introduction

A significant challenge in using robots for complex assembly tasks is reducing the time and resources needed to teach the robots how to perform the assembly in question. Expert roboticists can program new policies and skills within specialized domains such as manufacturing and lab experimentation, but this approach requires large amounts of time and resources that are not always available [1]. Learning from demonstration has been proposed as a potential solution to this problem [2]. Using a Human-Robot Interaction (HRI) interface, the teacher provides demonstrations of a desired task, which are then used to plan the robot actions that need to be performed in order to successfully complete the assembly task.

This paper focuses on the functionality of the HRI system that is used for the demonstration of the assembly by an inexperienced user and for the simulation of the assembly task in a virtual environment. Although Learning from Demonstration (LfD) has already been used as a technique to teach a robot new skills [3], to the best of our knowledge it has never been used for teaching robotic assembly tasks. The corresponding HRI interface for facilitating such teaching should be simple enough for a non-expert user to demonstrate new assembly tasks, while still enabling the user to supervise the assembly execution.

Another problem with the majority of existing HRI systems is that they require special architectures and complex interfaces to allow the user to interact with the robot [4], which adds further difficulty for the inexperienced user. To tackle this issue, we propose the use of a simple web interface that allows the user to interact freely with the HRI system, without the constraints of a specialized architecture. The user can control complex actions of the system and supervise the process through a lightweight graphical interface in a web browser using touch controls (on a tablet PC, for instance). This approach supports the creation of a user-friendly robot control interface for the demonstration and simulation of complicated assembly tasks. One of the advantages of the proposed system is that it allows a non-expert user to interact with a complex robotic system and teach assembly tasks that previously required special policies and skills from experts in the field.

The main contributions of this work can be summarized as follows:

  • A user-friendly and intuitive HRI interface for teaching new assembly tasks.

  • A portable, web-based interface integrated with ROS.

  • Assembly simulation functionality.

1.1 Related Work

Human-robot interaction is a rapidly evolving field with applications in almost all robotic tasks, including manufacturing, aviation, surgery, agriculture and education. Specialized robots under human teleoperation or teaching have proven successful in hazardous environments and medical applications, as have special tele-robots under human supervisory control for repetitive industrial tasks [5]. Research on how humans can safely and robustly interact with robots is still at an early stage, and there is much room for improvement in this field. Especially in factory assembly lines, much work has been done on teaching robots how to perform certain tasks, such as aircraft assembly, where techniques have been developed for observing human subjects performing a manipulation task [6]. Most observation-from-demonstration methods rely on computer vision with different detection techniques [7], while kinesthetic learning is also a valid method for teaching a robot numerous assembly tasks [8]. On the other hand, safety in the form of collision avoidance remains a continuing issue when the interaction involves autonomous robots operating in crowded environments [9].

Recently, there have been several projects addressing the control of robots through web interfaces. One such approach is the PR2 remote lab, which is used for shared development of learning from demonstration [2]. The web client of the project features many JavaScript (JS) widgets that connect to the ROS [10] middleware through a ROS JS library and websockets. The user can interact with the remote robot through predefined interfaces or small JS scripts. Another project addressed gesture-based remote HRI using a Kinect sensor, where a custom client/server web application is used to remotely control the robot through four hand gestures [11]. A web interface has also been used to capture demonstrations of mobile manipulation tasks provided by non-experts in the field [12]. That approach also employed a lightweight simulation environment, based on the Gazebo simulator [13], to reduce unnecessary computation and improve performance. Robot control through a web interface has also been used to give elderly users access to the multi-robot services of the Robot-Era project [14].

Moreover, there have been some attempts to build web frameworks for HRI experimentation. One such attempt is the Robot Management System (RMS), a novel framework for bringing robotic experiments to the web [15]. It provides a customizable browser-based interface for robot control, integrated support for testing of multiple study conditions, and support for both simulated and physical environments. The client interacts with the ROS algorithms through the RMS web server (HTTP) and with the physical and simulated environments through JSON requests.

The majority of the existing approaches that utilize web-based human-robot interfaces focus on creating remote laboratories for user interaction with robots, rather than on using web technologies for teaching robots new assembly tasks. In our work, we combine technologies that have been used to remotely control robots with technologies that have been used to teach robots from demonstration, in order to create a high-level HRI interface for teaching robots new assembly tasks.

2 Overall Architecture

Our system is built around the concept of a robot capable of learning and executing assembly tasks demonstrated by a human. The system will learn assembly tasks, such as insertion or folding, by observing the task being performed by a human instructor. Then, the system will analyze the task and generate a corresponding assembly program, while 3D-printable fingers tailored to gripping the parts at hand will be automatically designed. Aided by the human instructor, the robot will finally learn to perform the actual assembly task, relying on sensory feedback from vision, force and tactile sensing, as well as physical human-robot interaction.

All of the above relies heavily on smooth and robust user interaction with the system through the HRI interface. The HRI interface consists of a graphical user interface with which the user interacts through a web browser, allowing deployment on portable devices such as a tablet PC. In the back-end of the system, a web server responds to the user requests from the web browser and performs the required actions using a state machine and the Robot Operating System (ROS) to communicate with the robotic system. The communication between the browser and the web server is performed through JavaScript and PHP requests, while the communication between the web server and ROS is performed with the ROSLIBJS library [16]. In our implementation, the web server and the ROS environment are installed on separate workstations. However, it is possible to use a single machine for both, depending on its computing capabilities.
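As an illustration of this link, the following minimal sketch shows how a browser-side script can reach ROS over websockets through ROSLIBJS and a rosbridge server; the host name and the status topic are illustrative assumptions rather than the actual names used in our system.

```javascript
// Minimal sketch of the browser-to-ROS link via ROSLIBJS (assumes roslib.min.js
// is included with a <script> tag and a rosbridge websocket server runs on the
// ROS workstation; the host name and topic are illustrative assumptions).
var ros = new ROSLIB.Ros({
  url: 'ws://ros-workstation:9090'   // default rosbridge_websocket port
});

ros.on('connection', function () {
  console.log('Connected to rosbridge.');
});
ros.on('error', function (error) {
  console.log('Error connecting to rosbridge:', error);
});

// Example subscription to a status topic assumed to be published by the back-end.
var statusTopic = new ROSLIB.Topic({
  ros: ros,
  name: '/hri/status',               // hypothetical topic name
  messageType: 'std_msgs/String'
});
statusTopic.subscribe(function (message) {
  document.getElementById('status').textContent = message.data;
});
```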

The HRI interface is built using a server-client architecture for increased portability and centralized control. Different technologies on both the client and the server must work in unison to create an integrated environment. The client side must be intuitive and lightweight so that it can be deployed on a mobile device (a tablet PC in our case). The server side, on the other hand, should be able to handle the computationally demanding work of analysis and simulation, while handling the requests made by the user through the tablet. Finally, the client has to remain in constant communication with the server to enable the continuous user interaction that is required.

On the client side, we make use of modern web technologies to provide a good user experience with our interface. These include HTML5, used to build the user interface mainly with buttons and frames, and JavaScript (JS), used to provide dynamic functionality and reduce the number of pages the client has to load. We also use Ajax and JSON, along with ROSLIBJS, for asynchronous client-server communication, and Gzweb [17] to provide the user with tools for viewing and interacting with the 3D simulation.
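The sketch below illustrates the Ajax/JSON pattern between the tablet client and the back-end; the PHP endpoint, payload fields and callback are hypothetical placeholders used only to show the shape of such a request.

```javascript
// Hypothetical Ajax/JSON request from the tablet client to the PHP back-end.
function createAssemblyTask(name, type) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', 'create_task.php', true);      // hypothetical PHP endpoint
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.onload = function () {
    if (xhr.status === 200) {
      var task = JSON.parse(xhr.responseText);    // e.g. {"id": 7, "name": "folding_case"}
      loadTaskPage(task.id);                      // hypothetical UI transition
    }
  };
  xhr.send(JSON.stringify({ name: name, assembly_type: type }));
}
```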

On the server side, we make use of popular technologies with robustness in mind to create a capable and fully operational server. The utilized technologies include PHP, a common and easy-to-use programming language used for developing the application and calling system or ROS services, and MySQL, a common and lightweight database used for storing all the information needed by the HRI interface. Finally, we employ ROS with Gazebo as the environment for implementing the simulation, along with the many useful libraries that they provide (Sect. 3.4) (Fig. 1).

Fig. 1. System’s overall architecture

3 Implementation of the Interface

The developed HRI interface consists of different modules enabling the user to teach the assembly task to the robot and supervise the learning process. The user follows a predefined sequential procedure, and the graphical interface allows her to control the process. The main workflow of our system consists of three discrete stages (or phases) that the user follows in order to teach a new assembly to the robot: the teaching phase, the design phase and the training phase.
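On the client side, this sequential workflow can be captured by a very simple phase sequence; the sketch below is illustrative only, and the function it calls is a hypothetical placeholder (the actual state machine resides in the back-end, as described in Sect. 2).

```javascript
// Illustrative client-side view of the sequential workflow; the back-end state
// machine governs the actual transitions, and showPhasePage() is hypothetical.
var PHASES = ['teaching', 'design', 'training'];
var currentPhase = 0;

function nextPhase() {
  if (currentPhase < PHASES.length - 1) {
    currentPhase += 1;
    showPhasePage(PHASES[currentPhase]);   // swap the visible panel of the interface
  }
}
```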

3.1 Teaching Phase

The initial step of the teaching phase is the assembly task creation, where the user specifies a name describing the assembly task, selects the parts that the robot must assemble, and defines the assembly type, e.g. folding assembly. The models of the parts are uploaded (if they are not already in the system) and the system displays them in a 3D simulation environment using Gazebo (Sect. 3.4) (Fig. 2). The next step involves the detection of the uploaded parts in the work environment. This step is important for the demonstration process, since the parts have to be identified while the user performs the assembly task. For this purpose, an RGBD camera streams images through a ROS web video server node, so the video stream can be viewed by the user. The assembly parts are placed within the camera view and the system is asked to identify them. A 3D representation of each object is overlaid on the image at the position where it was recognized, providing feedback so that the user can confirm the identification (Fig. 3a). After the parts have been detected, the assembly task must be demonstrated by the user in front of the RGBD camera. The video data are streamed over the web, using a corresponding ROS node, for the user to watch on the HRI screen. The demonstration of the assembly task can be recorded and then reviewed by the user, who can choose to save or discard it (Fig. 3b). The saved sequences of the demonstration are used to extract the placement and movement of the parts during the assembly.
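Returning to the video streaming step above, the sketch below shows one way such a stream could be embedded in the interface; it assumes the standard web_video_server ROS node on its default port, and the host name and image topic are illustrative assumptions.

```javascript
// Embed the camera stream in the interface, assuming the web_video_server node
// is running on its default port; host and image topic are illustrative.
function showCameraStream() {
  var img = document.getElementById('camera-view');   // <img> element of the HRI page
  img.src = 'http://ros-workstation:8080/stream?topic=/camera/rgb/image_raw';
}
```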

Fig. 2. The HRI interface, displaying a 3D representation of the environment with the parts

Fig. 3. (a) Detection of the parts has been performed and the user is prompted to confirm the result, (b) The teacher’s hand is detected and assembly demonstration is recorded

The next step of the process is performed by the Key-frame Extraction module. After the assembly task has been demonstrated and the system has captured the frames of the process, it extracts the main frames that capture the movement of the parts. The user can choose to add or remove key-frames if the automatically generated results do not meet the assembly’s needs (Fig. 4a). Semantic information can also be added to every key-frame so that the system can use it as input for the Assembly Program Generator module. The labels employed for the corresponding states are the following: Initial position, Grasping, Picking up, Moving, Aligned, Contact, Assembled, and Retract hand. The interactions between the parts and the teacher’s hand are automatically identified to provide feedback to the teacher. After saving the key-frame selection, the system creates the trajectory of the parts’ movement during the assembly task.
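To make this concrete, the following hypothetical structure shows how a key-frame annotation could be represented on the client before being sent to the server; the field names are assumptions, while the label values follow the states listed above.

```javascript
// Hypothetical client-side representation of one key-frame annotation.
var SEMANTIC_LABELS = [
  'Initial position', 'Grasping', 'Picking up', 'Moving',
  'Aligned', 'Contact', 'Assembled', 'Retract hand'
];

var keyframeAnnotation = {
  frame_index: 142,          // index of the extracted key-frame in the recording
  label: SEMANTIC_LABELS[1], // semantic state chosen by the user ('Grasping')
  keep: true                 // false if the user removes this key-frame
};
```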

Fig. 4. (a) Key-frames are extracted by the system and the user provides feedback on the suggested sequence, (b) Gazebo simulation of the demonstrated assembly is generated based on the selected key-frames

Finally, the assembly process is simulated in the Gazebo 3D simulation environment. Using the trajectory of the parts’ movement obtained from the key-frame extraction process, the motion of the 3D objects can be animated so that the user can preview the actual physical assembly task (Fig. 4b). The computationally demanding processes of object detection, key-frame extraction and assembly simulation are executed on the system’s server and not on the client’s computer, making the graphical interface lightweight and easy to use for the non-expert user.
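Although in our system the animation is generated on the server, the sketch below illustrates one possible way to drive such a preview from the browser by publishing poses to the standard /gazebo/set_model_state topic exposed by gazebo_ros; the rosbridge host, model name and poses are illustrative assumptions.

```javascript
// Illustrative sketch: animate a part in the Gazebo preview by publishing
// interpolated poses; model name and rosbridge host are assumptions.
var ros = new ROSLIB.Ros({ url: 'ws://ros-workstation:9090' });

var modelState = new ROSLIB.Topic({
  ros: ros,
  name: '/gazebo/set_model_state',
  messageType: 'gazebo_msgs/ModelState'
});

function previewPose(pose) {           // pose: a geometry_msgs/Pose from the trajectory
  modelState.publish(new ROSLIB.Message({
    model_name: 'part_a',              // hypothetical model name of the assembly part
    pose: pose,
    reference_frame: 'world'
  }));
}
```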

3.2 Design Phase

In the design phase of the assembly, the system creates new fingers for the robot’s gripper as well as the grasp poses for the robot. First, a 3D model of the original finger of the robot’s gripper is displayed and the system generates new CAD models for the fingers based on the specifications of the parts that are going to be assembled. The design of the fingers takes place in an external system (Catia V5) and the CAD models are uploaded automatically to the HRI interface. A progress bar informs the user about the system’s progress in creating the fingers and planning the grasps. After the models are generated, the user can inspect them through a JavaScript 3D viewer and proceed to the next stage, where the system presents the best grasp poses generated for the robot’s gripper in the 3D environment of Gazebo. The user can view the most appropriate poses of the gripper, sorted with respect to the assembly part that has to be gripped (Fig. 5a). Poses that do not seem feasible or cannot be reached by the robot can be removed by the user. At this stage, the user can also request the printing of the fingers on a 3D printer so that they can be attached to the actual robot.
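As one possible realization of such a viewer, the hedged sketch below loads a finger model exported as STL using three.js and its STLLoader; the library choice, file path and container element are assumptions, not a description of the actual implementation.

```javascript
// Possible browser-side viewer for a generated finger CAD model (STL assumed;
// requires three.js and its examples STLLoader script included via <script> tags).
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(45, 4 / 3, 0.01, 10);
camera.position.set(0.2, 0.2, 0.2);
camera.lookAt(new THREE.Vector3(0, 0, 0));

var renderer = new THREE.WebGLRenderer();
renderer.setSize(640, 480);
document.getElementById('finger-viewer').appendChild(renderer.domElement);  // hypothetical container

scene.add(new THREE.AmbientLight(0xffffff, 0.6));
var light = new THREE.DirectionalLight(0xffffff, 0.8);
light.position.set(1, 1, 1);
scene.add(light);

new THREE.STLLoader().load('models/finger_left.stl', function (geometry) {  // hypothetical path
  var mesh = new THREE.Mesh(geometry, new THREE.MeshPhongMaterial({ color: 0x808080 }));
  scene.add(mesh);
  renderer.render(scene, camera);
});
```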

Fig. 5. (a) Display of the simulated grasp, (b) Execution of the assembly after part detection

3.3 Training Phase

In the training phase of the assembly, the appropriate Assembly Program has to be loaded for execution. After loading the program, the system informs the user that the parts have to be detected on the working table in order to proceed with the assembly. The detection module is similar to the one used in the teaching phase and overlays 3D representations of the objects at the positions where they are detected. After the detection is confirmed, the user can either request an assembly simulation in the Gazebo environment or proceed with the execution of the assembly. During execution, the robot motions are generated using the information extracted from the key-frames of the teaching phase. Due to the uncertainty about an object’s position after the robot picks it up, the system requests a re-detection of the parts at that point. The user has a clear view of the assembly execution through a video stream from the camera mounted on the table, so the robot movements can be inspected (Fig. 5b). The HRI interface also provides feedback about the assembly state, using ROS messages generated by the assembly program that controls the assembly process. When the assembly parts are in contact, the user has the option of stopping the execution and enabling physical HRI to move the robot’s arm to the desired position and avoid incorrect positioning. Finally, when the assembly is completed, the HRI interface prompts for a new object detection and, if needed, re-execution of the assembly.
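This state feedback can reach the interface in the same way as the other ROS data; the sketch below subscribes to a hypothetical state topic, where the topic name, message type and UI handlers are assumptions.

```javascript
// Illustrative subscription to assembly-state feedback from the assembly program.
var ros = new ROSLIB.Ros({ url: 'ws://ros-workstation:9090' });   // rosbridge host assumed

var assemblyState = new ROSLIB.Topic({
  ros: ros,
  name: '/assembly/state',           // hypothetical topic name
  messageType: 'std_msgs/String'
});

assemblyState.subscribe(function (message) {
  document.getElementById('assembly-state').textContent = message.data;
  if (message.data === 'Contact') {
    enableStopButton();              // hypothetical handler offering the physical-HRI option
  }
});
```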

3.4 Simulation

Simulation helps the instructor obtain a good understanding of the assembly process as perceived by the system. Simulating various parts of the procedure in a 3D virtual environment can help avoid errors, reduce training time, and increase safety during operation. To support the simulation of the assembly, we first and foremost use the Robot Operating System (ROS), which is widely used in modern robotic applications. ROS provides many tools, including libraries for algorithmic computation and PID control simulation. Gazebo covers the need to visualize the simulation in various parts of our interface and provides a 3D physics simulation environment. It is a simulation tool that works alongside ROS and provides tools and resources for 3D simulation of robot models (once the CAD files of the robot have been obtained) and other objects, such as the assembly parts. Since the developed graphical interface is going to be deployed on a tablet PC, the visualization of our simulation needs to be provided using web technologies. For this reason, we chose to use Gzweb, an alternative to the Gazebo visualization client (gzclient) that runs in a web browser and provides interfaces for Gazebo.

A significant amount of effort has been put into enabling the simulation of the robot movements using ROS and Gazebo. An invaluable tool for this purpose is ros_control, a ROS library that provides interfaces for simulating PID controllers through simple configurations and plugins. Using structures similar to those of the actual robot hardware, we can simulate joint movements using effort, velocity or force controllers and simulate the robot’s behavior under any environment and assembly conditions. Any specific actions required beyond the generic controllers, such as the gripping mechanism or the motion generation, were implemented using Gazebo plugins written in C++ and Python with the libraries provided by Gazebo. However, simulating a system as complex as a robot involves many limitations and undefined variables, such as friction values or motor transmissions, which introduce discrepancies between the actual and the simulated execution.
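As one concrete point of contact between the web layer and ros_control, the hedged sketch below switches a simulated controller through the standard controller_manager switch_controller service; in the actual system such calls may well be handled on the server side, and the controller names and rosbridge host are assumptions.

```javascript
// Illustrative controller switch over rosbridge using the standard
// controller_manager service; controller names and host are assumptions.
var ros = new ROSLIB.Ros({ url: 'ws://ros-workstation:9090' });

var switchController = new ROSLIB.Service({
  ros: ros,
  name: '/controller_manager/switch_controller',
  serviceType: 'controller_manager_msgs/SwitchController'
});

switchController.callService(new ROSLIB.ServiceRequest({
  start_controllers: ['arm_trajectory_controller'],   // hypothetical controller name
  stop_controllers: [],
  strictness: 1                                        // BEST_EFFORT
}), function (result) {
  console.log('Controller switch succeeded:', result.ok);
});
```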

3.5 Perception

During demonstration, visual cues containing information about how to execute an assembly task are extracted from the acquired image sequences. Recent developments allow the use of low-cost RGBD sensors, which include an additional channel that acquires depth images of the scene (apart from the three color channels), providing data in 2.5D. This information is regularly used by state-of-the-art methods for object detection and hand tracking in order to infer 3D pose. In our work, the 6-DoF object detection method proposed in [18] has been adopted for the detection of assembly parts that are placed inside the robot’s workspace by the human teacher. During demonstration, the RGBD sequences are recorded and, using these data, hand-object tracking is performed off-line. Hand tracking is based on the method proposed in [19], and the estimation of hand poses is performed using an articulated 3D model with 42 DoF.

Key-Frame Extraction.

The representation of the underlying scene and the spatio-temporal encoding of the teaching-by-demonstration process are achieved efficiently with key-frame extraction. Using the detection and tracking results from the perception module, the scene is segmented effectively. The pose (position and orientation) of the objects and the hands is used as input to the key-frame extraction module. For each extracted key-frame, the system exports XML files for the objects and each hand detected, containing all available semantic and technical information, such as semantics of the objects’ relations, poses of objects and teacher’s hands, contact points and forces, grasping states and hardware information [20].
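On the interface side, such a per-key-frame file could be consumed as in the hedged sketch below; the element names are assumptions, since the exact schema is defined in [20].

```javascript
// Hedged sketch of reading one exported key-frame XML file in the browser;
// element names are assumptions about the schema described in [20].
function parseKeyframeXml(xmlText) {
  var doc = new DOMParser().parseFromString(xmlText, 'text/xml');
  var getText = function (tag) {
    var node = doc.getElementsByTagName(tag)[0];
    return node ? node.textContent : null;
  };
  return {
    semanticLabel: getText('semantic_label'),  // e.g. "Grasping"
    objectPose: getText('object_pose'),        // position and orientation of the part
    handPose: getText('hand_pose'),            // pose of the teacher's hand
    graspState: getText('grasp_state')
  };
}
```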

4 Experiments and Evaluation

In this section, we present results from the experiments we performed with 13 inexperienced users who were asked to use the interface to teach the robot an assembly task. The subjects were given a brief introduction to the system’s functionalities and used a tablet PC to guide the learning process for the robot. At the end, a simulation of the robot’s movements was displayed. After completing the task, the subjects answered the questions in Table 1 on a five-point Likert scale in order to evaluate the usability of the interface.

Table 1. Questions that the subjects had to answer concerning the HRI interface

The answers had the following options: strongly disagree (=1), disagree, no opinion, agree, and strongly agree (=5). The average scores and standard deviations for each question are presented in Table 2. In interviews, the majority of the users evaluated the experience positively and stated that they would feel fairly confident using the interface if they were asked to teach a new assembly to the robot without any assistance. The average score over all the questions that the subjects had to answer was 4.31/5.

Table 2. Median, average, and St. Deviation of question scores on a five-point Likert scale

5 Conclusion and Future Work

In this paper, we presented the design and technical implementation of an advanced HRI interface that allows the teaching of new assembly tasks to collaborative robots. The interface is built as a web application and can be executed on tablet PCs. To this end, several advanced technologies are employed to offer a seamless experience to the inexperienced user who is asked to guide the robot’s Learning from Demonstration process. We evaluated the usability of the system by having inexperienced users teach an assembly to the robot and rate the experience, which yielded positive reviews. In the future, we aim to add new features to the interface, such as multiple demonstrations of the assembly from different cameras and a more generic interface that allows the construction of a broader range of assemblies.