1 Introduction

The high incidence of Work-Related Musculoskeletal Disorders (WMSDs) in the manufacturing and construction industries, caused by daily repetitive manual handling activities that involve lifting, lowering, pulling, pushing, carrying or moving loads [1–3], calls for adequate ergonomic evaluation and correct risk assessment. However, the correct application of ergonomic assessment tools requires a systematic and comprehensive approach to data collection [4, 5]. A tool that can enable this, and that also provides an opportunity for real-time feedback, is the Microsoft Kinect [6].

This paper investigates how the Microsoft Kinect v2, henceforth called Kinect, can be trained to detect manual handling activities on the shop floor in order to enable accurate data collection for ergonomic evaluation and risk assessment. It further describes how the Kinect is trained to capture data for automatic ergonomic assessment, with the eventual goal of providing real-time feedback to both a human and a robot-cooperation system. Such feedback will prompt the human to correct his or her posture and will provide commands that enable the robot-cooperation system to decide when the human needs its assistance. Our contribution ensures that only data related to the manufacturing activity of interest (such as lifting or lowering of loads) is recognized and recorded. This leads to more accurate ergonomic analysis as well as lower data and computational processing requirements during real-time monitoring. The Kinect v2 is a low-cost depth-sensing device, originally developed for gaming, that is used for human motion capture as well as human-computer interaction. It consists of a depth sensor, a built-in color camera, an infrared (IR) emitter and a microphone array, and is an upgrade from the previous v1 model [7]. It is able to sense the location, movement and voices of people and can track up to six people and 25 joints for each person.

1.1 Literature Review of Camera-Based Data Collection for Industrial Ergonomics Analyses

A literature survey has shown that real-time monitoring of workers is a key requirement for collecting human motion data on the shop floor for ergonomic analysis. This is because lifting and carrying by operators in factories, if not well monitored, can be detrimental to the workers' health [8]. When workers are monitored and guided during a manufacturing process, production errors are reduced overall. To achieve this, video capture systems are often used. In [9], a video camera was used to monitor the working postures of palm oil harvesters. Snapshots of awkward postures were taken, and the significant postures were captured for subsequent ergonomic assessment. A similar technique was used by [10, 11] to collect data for ergonomic risk assessment. It involved monitoring and photographing operators performing a lifting/lowering task on an auto parts shop floor, as well as technicians changing the brake shoes of freight wagons during railway maintenance. In another experiment investigating the risk of developing WMSDs among bicycle repair workers, [12] used still photography and video to collect the data needed for the risk assessment. These methods, though easy to use, are time consuming and unreliable [13], and they provide neither 3D information nor accurate joint information for workers in congested workplaces.

In order to find a more suitable and more convenient method for data collection on the shop floor, [5] compared the use of the Kinect for data collection with the use of observation methods during an ergonomic assessment of postural load using OWAS. The Kinect was found to offer many benefits: it is easy to use, requires less time for data processing and does not interfere with the work process, all at low cost. Consequently, this work focuses on the use of the Kinect sensor for more accurate capture of 3D data.

Experiments have shown that the Kinect is capable of generating the accurate kinematic information required to fill an ergonomic assessment grid such as the RULA grid. Plantard et al. [6] assessed the accuracy of the Kinect for this purpose using a large set of work poses at different Kinect positions, with the joint positions, joint angles and RULA scores as inputs. The results showed that the accuracy of the pose estimation is influenced by the Kinect position and that errors occurred when the human arm was aligned with the Kinect. They nevertheless concluded that the Kinect is a useful motion capture tool for ergonomic evaluation. A Kinect-based real-time ergonomic analysis limited to lifting operations has also been developed; this work integrated a static ergonomic model with the Kinect, and the system was found to measure the recommended weight limit and the strain on the worker's skeleton [8]. Prabhu et al. [14] demonstrated in their study that the Kinect can be used in a real-world setting. Paliyawan et al. and Uribe-Quevedo et al. [15, 16] used it to monitor the posture variation of seated human operators so as to detect any deviation from the correct posture. In [17], the Kinect was used to collect data for postural control assessment during a functional reach and standing balance task. In [18], it was used to monitor lifting operations by tracking the body joint angles in real time during the operation, with the aim of recommending correct and safe lifting techniques. To ensure continuous coverage of large spaces, [19] used multiple Kinect sensors integrated with the JACK human simulation software to track the skeletal data of operators performing a fastening operation. The Kinect data were scaled into JACK for subsequent real-time ergonomic evaluation using the Rapid Upper Limb Assessment (RULA) tool in JACK.

The use of ergonomic analysis with robot-cooperative systems for the purpose of reducing WMSDs has also been investigated in various forms in the literature. For example, [20] proposed the Human Centric Automation concept using robots and the Kinect. Their work involves using the Kinect to capture real-time motion data for automatic ergonomic assessment, so that the ergonomic scores obtained can be used as feedback to inform the system when robots are needed to assist. This leads to reduced physical workload, fewer production errors, a decreased risk of WMSDs and increased performance.

However, previous studies in which the Kinect was used to monitor and track human operators faced significant challenges because they used the Kinect v1. Unlike the Kinect v2, the Kinect v1 could not detect gestures associated with manual handling because it lacked the Visual Gesture Builder (VGB) and the Kinect Studio, which are special tools found in the Kinect for Windows Software Development Kit (SDK) 2.0. The importance of gesture detection cannot be overemphasized, as it helps to ensure more accurate data capture for ergonomic evaluations. This work describes in detail how the Kinect v2 can be trained to detect gestures applicable to manual handling activities. The goal is that, through gesture detection, real-time ergonomic analysis can be achieved with real-time feedback to both the human and the robot-cooperation system, so as to offer intelligent automation assistance during tasks that are detrimental to human posture, thereby reducing the overall data analysis effort as well as overall production errors.

2 Methodology

In this section, we discuss how the SDK tools for the Microsoft Kinect were used to detect gestures that trigger data recording for analysis. We focus on two gestures: load lifting and load lowering.

There are two main methods of detecting gestures with the Kinect: the heuristic approach, and the Machine Learning (ML) approach, which relies on data sources and on clips recorded with the Kinect Studio [21]. In this work, the ML approach is employed because the Visual Gesture Builder (VGB) applies ML techniques to the user's gestures using the recorded and tagged data.

For ML in VGB, data is usually recorded in clips using the Kinect Studio, which enables developers to record clips that are then imported into a VGB solution for training and testing of the gestures [22]. In this work, the AdaBoostTrigger detection technology, which produces discrete results, was used to train the gestures, because lifting and lowering are trained as discrete gestures.
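To illustrate the kind of discrete detector described above, the following is a minimal sketch of AdaBoost over one-feature threshold rules ("stumps"). It is not the VGB implementation, whose internals are not described in this paper. The two per-frame features (hand height and hand vertical velocity), the toy data and the 0.5 confidence threshold are all assumptions made for illustration.

```python
import math

def stump_predict(sample, feature, threshold, polarity):
    """Weak learner: vote +polarity if the feature reaches the threshold."""
    return polarity if sample[feature] >= threshold else -polarity

def train_adaboost(samples, labels, rounds=5):
    """Train AdaBoost on labels in {+1, -1}; returns a list of weighted stumps."""
    n = len(samples)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for feature in range(len(samples[0])):
            for threshold in sorted({s[feature] for s in samples}):
                for polarity in (1, -1):
                    preds = [stump_predict(s, feature, threshold, polarity)
                             for s in samples]
                    err = sum(w for w, p, y in zip(weights, preds, labels)
                              if p != y)
                    if best is None or err < best[0]:
                        best = (err, feature, threshold, polarity, preds)
        err, feature, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, polarity))
        if err <= 1e-10:  # a perfect stump: nothing left to boost
            break
        # Increase the weight of misclassified frames, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def confidence(model, sample):
    """Map the weighted vote to [0, 1]; 1 means all stumps vote 'gesture'."""
    score = sum(a * stump_predict(sample, f, t, p) for a, f, t, p in model)
    norm = sum(a for a, _, _, _ in model)
    return 0.5 * (1.0 + score / norm)

def detect(model, sample, threshold=0.5):
    """Discrete (true/false) result, as produced by a trigger-style detector."""
    return confidence(model, sample) > threshold

# Hypothetical frames: (hand height in m, hand vertical velocity in m/s).
samples = [(0.1, 0.4), (0.3, 0.5), (0.5, 0.3),
           (0.2, -0.4), (0.4, -0.3), (0.3, 0.0)]
labels = [1, 1, 1, -1, -1, -1]  # +1 = tagged as lifting, -1 = not lifting
model = train_adaboost(samples, labels)
```

Because the toy data can be separated by a single velocity threshold, training stops after one round; with real skeletal data many weak rules would normally be combined.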

The steps involved in creating, training and analyzing the gestures are as follows: (i) The skeletal data of the trainer is recorded with the Kinect Studio while the trainer lifts and lowers a load. The trainer is recorded while performing these operations at a particular position relative to the Kinect, called the central location. (ii) The processed (XEF) files, which are in clips, are imported into VGB by creating a new solution in which the clips are added to projects. When a project is created in the new solution, it is automatically split into two: one for the building/training data and the other for the testing/analysis data. (iii) The gestures are tagged in VGB. (iv) The gestures are then built and analyzed using VGB. (v) The trained gesture file, known as the .gbd file, is then used to write the code in the Discrete Gesture Basics tool, which is part of the Kinect for Windows SDK 2.0. (vi) The gestures are further tested at other locations in a different environment, in front of, beside and behind the central location, to establish whether gestures trained in one environment can be detected in another, and to establish the points beyond which the gestures can no longer be detected in the workplace.

2.1 Experimental Setup

The hardware components used in this experiment are the Kinect v2 sensor, a laptop, tables of the same height and a workpiece for lifting. The software components are the Color Basics, the Kinect Studio, the VGB Preview, the VGB Viewer (Preview) and the Discrete Gesture Basics, all of which are found in the Kinect for Windows SDK 2.0.

In order to investigate the effectiveness of the Kinect in detecting gestures at various distances and angles, measuring points in the environment were set up as depicted in Table 1 and Fig. 2. These points lie within the field of view of the sensor. At each point, the confidence level of the Kinect in detecting gestures was tested. In these experiments, a lifting gesture and a lowering gesture were carried out in a room to represent the actual lifting of a part by an operator in a manufacturing environment (Fig. 1).

Table 1 Detailed experimental design
Fig. 1 3D representation of the training environment

Fig. 2 Schematic representation of the experimental setup showing the various locations

3 Results

3.1 Gesture Training Results

Figure 3 depicts the training of the lifting and lowering gestures. During training, 30 lowering gestures were used, resulting in 2742 labelled examples with an average Root Mean Square (RMS) value of 0.243 over 253 frames, while 32 lifting gestures were used, resulting in 3079 labelled examples with an average RMS of 0.299 over 445 frames. Furthermore, an accuracy of 100 % was obtained, with an error of 0 %.
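As a reading aid for the accuracy and error figures, the sketch below computes a frame-level accuracy and its complementary error from tagged and predicted labels. How VGB computes its reported values is not described here, so this frame-level definition, the function name and the toy labels are assumptions for illustration only.

```python
def frame_accuracy(tagged, predicted):
    """Return (accuracy, error) as fractions of frames, where error = 1 - accuracy.

    tagged:    per-frame ground truth (True while the gesture is tagged)
    predicted: per-frame detector output for the same frames
    """
    if len(tagged) != len(predicted):
        raise ValueError("tagged and predicted must cover the same frames")
    if not tagged:
        raise ValueError("at least one frame is required")
    correct = sum(1 for t, p in zip(tagged, predicted) if t == p)
    accuracy = correct / len(tagged)
    return accuracy, 1.0 - accuracy

# Hypothetical six-frame clip in which frames 2 to 4 are tagged as a lift.
tagged = [False, False, True, True, True, False]
perfect = list(tagged)
one_miss = [False, False, True, True, False, False]

acc_perfect, err_perfect = frame_accuracy(tagged, perfect)
acc_miss, err_miss = frame_accuracy(tagged, one_miss)
```

Under this definition, a detector that agrees with every tagged frame has an accuracy of 1.0 (100 %) and an error of 0.0 (0 %), which is the situation reported for the training data.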

Fig. 3 Training of the lifting gesture (a) and training of the lowering gesture (b)

3.2 Testing of Trained Gestures Using the Visual Gesture Builder Viewer (Live Preview)

The gestures were tested at various locations in another environment, as listed in Table 1 and shown in Fig. 2, using the VGB Viewer. The results for the lifting gesture are shown in Fig. 4. They show that at \( P_{3} \) and \( P_{9} \), the confidence of the Kinect in detecting gestures was very low and the detection could be questionable. This suggests that the placement of the Kinect in the environment affects the accuracy of the gesture detection.

Fig. 4 Live previews at points 1 to 9

3.3 Coding the Gestures Using the Discrete Gesture Basics

The .gbd file generated after training and testing the gestures is used by the programmer as the basis for detecting both the lifting and lowering gestures with the Discrete Gesture Basics tool of the Kinect for Windows SDK 2.0, which is used for coding discrete gestures. Prototyping with VGB is very important, as it provides suitable thresholds as well as the .gbd files required for coding the gestures, and the classifiers it generates are also useful when coding with the heuristic approach. The program developed is then used to detect lifting and lowering gestures during real-time ergonomic evaluation.
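The idea of recording data only while a trained gesture is detected can be sketched as follows. This is an illustration in Python rather than the authors' program, and it does not use the Kinect SDK: the class name, the 0.5 confidence threshold and the simulated confidence stream are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GestureGatedRecorder:
    """Keep skeletal frames only while the gesture confidence is high enough."""
    threshold: float = 0.5                        # assumed detection threshold
    segments: list = field(default_factory=list)  # completed gesture segments
    _current: list = field(default_factory=list)  # segment being recorded

    def push(self, frame_index, confidence, joints):
        """Feed one frame; frames below the threshold are discarded."""
        if confidence >= self.threshold:
            self._current.append((frame_index, joints))
        elif self._current:
            # The gesture just ended: close the running segment.
            self.segments.append(self._current)
            self._current = []

    def close(self):
        """Flush a segment that is still open when the stream ends."""
        if self._current:
            self.segments.append(self._current)
            self._current = []

# Simulated per-frame confidences for a lifting gesture (hypothetical values).
stream = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1, 0.6, 0.9, 0.2]
recorder = GestureGatedRecorder()
for index, conf in enumerate(stream):
    # A placeholder stands in for the 25 tracked joint positions of the frame.
    recorder.push(index, conf, joints={"frame": index})
recorder.close()

recorded_indices = [[i for i, _ in seg] for seg in recorder.segments]
```

Only the frames inside the two detected gestures are kept, which is how gesture detection can reduce the volume of data passed on to the ergonomic analysis.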

4 Discussion

To collect motion data on the shop floor for ergonomic evaluation and correct risk assessment, an application has been developed that uses the motion sensing capability of the Microsoft Kinect sensor. The application detects manual handling gestures such as lifting and lowering on the shop floor. As reported in Section 3.1, the training accuracy for the gestures was 100 % and the error was 0 %.

Figure 4 depicts the live previews of the lifting gesture at the different locations listed in Table 1 and shown in Fig. 2. They were obtained when the gestures were tested at other locations in a different environment, in front of, beside and behind the central location, to establish whether gestures trained in one environment can be detected in another, and to establish the points beyond which the gestures can no longer be detected in the workplace. A closer look at these data shows that at points 3 and 9, the confidence of the Kinect in detecting the gesture is slightly lower than at the other points. One can therefore conclude that at angles below 60° and above 120°, the Kinect may not be able to detect any gesture.
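The angular limits stated above can be turned into a simple placement check. The sketch below assumes that the angle is measured in the floor plane from the sensor's left-right axis, so that 90° is directly in front of the Kinect; this convention, the coordinate names and the example positions are assumptions, since the measuring points of Table 1 are not reproduced here.

```python
import math

def bearing_deg(x_m, z_m):
    """Angle of a floor position seen from the sensor, in degrees.

    x_m: offset along the sensor's left-right axis (metres)
    z_m: distance in front of the sensor (metres, positive)
    With this convention a worker straight ahead is at 90 degrees.
    """
    return math.degrees(math.atan2(z_m, x_m))

def within_detection_window(x_m, z_m, min_deg=60.0, max_deg=120.0):
    """True if the position lies inside the 60 to 120 degree window."""
    return min_deg <= bearing_deg(x_m, z_m) <= max_deg

# Hypothetical worker positions (x, z) in metres.
straight_ahead = within_detection_window(0.0, 2.0)  # bearing of 90 degrees
far_to_side = within_detection_window(2.0, 1.0)     # bearing of about 27 degrees
slightly_left = within_detection_window(-1.0, 2.0)  # bearing of about 117 degrees
```

A check of this kind could be used when positioning the sensor so that the lifting and lowering stations fall inside the window in which detection was observed to be reliable.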

Finally, the trained gesture data are used to write the appropriate code in the Discrete Gesture Basics tool so that the Kinect can detect the gestures. Figure 5 shows the Kinect tracking humans while they lift and lower an object. Up to six workers can be tracked at the same time using this sensor.

Fig. 5 Lifting gesture (a) and lowering gesture (b), after coding with the Discrete Gesture Basics

This application is intended to be integrated with another data collection application developed by the authors. That program was developed using the Application Programming Interfaces (APIs) provided by the Kinect for Windows SDK 2.0, which include the Windows Runtime APIs, the .NET APIs and a set of native APIs; specifically, it is a Windows Presentation Foundation (WPF) application built on the .NET Framework 4.5 in Visual Studio 2013. It can track, measure and record the joint angles of any human as well as the 3D skeletal joint positions (X, Y and Z) in millimeters. The framework in Fig. 6 shows how the Kinect can use both the developed application and the written algorithm to extract the motion data of human operators for real-time ergonomic evaluation and correct risk assessment, with a view to offering intelligent automation assistance during tasks on the manufacturing shop floor that are detrimental to human posture.
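One common way to obtain a joint angle from three 3D joint positions is to take the angle between the two limb vectors that meet at the joint. The sketch below illustrates this calculation in Python; it is not the authors' WPF program, and the function name and the example coordinates (in millimeters) are hypothetical.

```python
import math

def joint_angle_deg(proximal, joint, distal):
    """Angle at `joint` between the segments joint->proximal and joint->distal.

    Each argument is an (X, Y, Z) position; any consistent unit works.
    """
    v1 = [p - j for p, j in zip(proximal, joint)]
    v2 = [d - j for d, j in zip(distal, joint)]
    n1 = math.sqrt(sum(c * c for c in v1))
    n2 = math.sqrt(sum(c * c for c in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("joint positions must be distinct")
    cos_theta = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    # Clamp against rounding errors before calling acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical shoulder, elbow and wrist positions in millimeters.
shoulder = (0.0, 300.0, 2000.0)
elbow = (0.0, 0.0, 2000.0)
wrist_bent = (300.0, 0.0, 2000.0)       # forearm at a right angle
wrist_straight = (0.0, -300.0, 2000.0)  # arm fully extended

elbow_bent = joint_angle_deg(shoulder, elbow, wrist_bent)
elbow_straight = joint_angle_deg(shoulder, elbow, wrist_straight)
```

Angles obtained this way for the elbow, shoulder, trunk and knee are the kind of input that posture-based assessment grids such as RULA require.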

Fig. 6 A framework for real-time data collection for ergonomic evaluation using Kinect

5 Conclusion

In the past, several methods were used to collect human motion data on the shop floor for ergonomic analysis. These include self-report methods such as interviews and questionnaires, observation methods such as video capture, and direct methods using wearable marker sensors. Markerless sensors such as the Microsoft Kinect, a low-cost depth-sensing gaming device used for human motion data capture, have recently been employed to capture data for ergonomic analysis. However, none of the technologies employed by previous researchers has considered the detection of gestures and how this can improve the accuracy of the data collection process.

This paper therefore presents an application developed to enable real-time human motion data capture on the shop floor for ergonomic evaluation and possible automation assistance through gesture detection, using the various tools in the Kinect for Windows SDK 2.0. In this work, an experiment was conducted in which the Kinect was trained to detect the manual handling gestures of workers. This can be beneficial on the manufacturing shop floor for monitoring workers in real time, with the overall aim of collecting their motion data for ergonomic evaluation and providing adequate feedback to both the human operators and the robot-cooperation system.

Finally, a framework, which shows how the developed application can be used to collect real-time human motion data on the shop floor towards ergonomic evaluation and intelligent automation assistance, is presented.

6 Future Work

In the future, we plan to complete the research by conducting the ergonomic analysis using an appropriate tool. We also plan to develop a Kinect-based feedback system that informs the human operators of any detrimental work postures and informs the robot-cooperation system when the robots are needed to assist, as shown in Fig. 6.