
1 Introduction

In order to enhance competitiveness and improve work efficiency, factory managers need to track workers' progress in real time. Traditionally, wearable devices (such as microphones, PDAs, and other devices) have been used, but they are cumbersome and ineffective compared with Kinect, which can capture the human skeleton. Skeleton data are easier to use for body gesture recognition than RGB data, and they are less prone to interference from the background (Fig. 1).

Fig. 1. Kinect v2 and skeleton frame

In this paper, we used Kinect to record the whole working process of several people as xef files, extracted the body skeleton from each recording as features, labeled one recording as the training template, and used the other people's data as test data (Fig. 2).
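The paper does not specify how skeleton frames are turned into feature vectors. As a hypothetical sketch, assume each frame has already been exported from the xef recording (reading xef files requires the Kinect tooling and is not shown) as 25 camera-space (x, y, z) joint positions, and that a feature vector is the list of joint positions relative to a root joint. The function names and the choice of joint 0 as the root are our assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) camera-space joint position

NUM_JOINTS = 25  # Kinect v2 tracks 25 body joints per skeleton
ROOT_JOINT = 0   # assumption: joint 0 (SpineBase in the Kinect v2 ordering) is the root


def frame_to_feature(joints: Sequence[Joint]) -> List[float]:
    """Flatten one skeleton frame into a 75-D vector of root-relative joint positions."""
    if len(joints) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joints)}")
    rx, ry, rz = joints[ROOT_JOINT]
    feature: List[float] = []
    for x, y, z in joints:
        # Subtracting the root position removes where the worker stands in the room,
        # so the feature reflects posture rather than location.
        feature.extend((x - rx, y - ry, z - rz))
    return feature


def recording_to_series(frames: Sequence[Sequence[Joint]]) -> List[List[float]]:
    """Turn a recording (a sequence of skeleton frames) into a time series of features."""
    return [frame_to_feature(f) for f in frames]


# Usage: a frame whose joints all coincide yields an all-zero feature vector.
series = recording_to_series([[(1.0, 2.0, 3.0)] * NUM_JOINTS])
print(len(series), len(series[0]))  # 1 75
```

Root-relative coordinates are only one simple normalization; scaling by body size could also be added when comparing workers of different heights.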

Fig. 2. System architecture

Fig. 3. Raw time series; arrows show the desirable points of alignment

Sect. 2 explains the Dynamic Time Warping algorithm (DTW) and our improved version, and shows how DTW can be employed to identify subsequences similar to a query in long data streams. Sect. 3 presents the evaluation results, and Sect. 4 concludes the paper.

2 Development of Operation Estimation Method

2.1 Introduction to the Dynamic Time Warping Algorithm (DTW)

The Dynamic Time Warping algorithm (DTW) is well known in many areas: handwriting and online signature matching [1, 2], sign language recognition [3] and gesture recognition [3, 4], data mining and time series clustering (time series database search) [5,6,7,8,9,10], computer vision and computer animation [11], surveillance [12], protein sequence alignment and chemical engineering [13], and music and signal processing [11, 14, 15].

The DTW algorithm owes its popularity to its efficiency as a time series similarity measure: it minimizes the effects of shifting and distortion in time by allowing an “elastic” transformation of the time series, so that similar shapes with different phases can be detected. Given two time series \( X = (x_1, x_2, \ldots, x_N),\ N \in \mathbb{N} \) and \( Y = (y_1, y_2, \ldots, y_M),\ M \in \mathbb{N} \), represented by sequences of values (or curves represented by sequences of vertices), DTW yields the optimal solution in \( O(MN) \) time, which can be improved further through techniques such as multi-scaling [14, 15]. The only restriction placed on the data sequences is that they should be sampled at equidistant points in time (this problem can be resolved by re-sampling). If the sequences take values from some feature space Φ, then in order to compare two different sequences X and Y whose elements lie in Φ, one needs a local distance measure, which is defined to be a function:

$$ d:\Phi \times \Phi \to \mathbb{R}_{\ge 0} \tag{1} $$

Intuitively, d(x, y) has a small value when the elements x and y are similar and a large value when they are different. Since dynamic programming lies at the core of DTW, this distance function is commonly called the “cost function”, and the task of optimally aligning the sequences becomes the task of arranging all sequence points so as to minimize the cost function (or distance). The algorithm starts by building the distance matrix \( C \in \mathbb{R}^{N \times M} \), which represents all pairwise distances between X and Y (Fig. 4). This distance matrix is called the local cost matrix for the alignment of the two sequences X and Y:

$$ C \in \mathbb{R}^{N \times M}: \quad c_{i,j} = \lVert x_i - y_j \rVert, \quad i \in [1:N],\ j \in [1:M] $$

Fig. 4. Time series alignment, cost matrix heat map.
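The local cost matrix can be sketched in a few lines of Python. This assumes that the norm ‖·‖ is the Euclidean norm (the text does not fix the norm) and that each x_i and y_j is a feature vector given as a list of floats; the function names are ours. Indices are 0-based in the code and 1-based in the formula.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def local_distance(x: Vector, y: Vector) -> float:
    """d(x, y) = ||x - y||, using the Euclidean norm (an assumption)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def local_cost_matrix(X: Sequence[Vector], Y: Sequence[Vector]) -> List[List[float]]:
    """Return C with C[i][j] = d(X[i], Y[j]); C has len(X) rows and len(Y) columns."""
    return [[local_distance(x, y) for y in Y] for x in X]


# Usage: two 1-D series of lengths N = 2 and M = 3.
C = local_cost_matrix([[0.0], [1.0]], [[0.0], [2.0], [3.0]])
print(C)  # [[0.0, 2.0, 3.0], [1.0, 1.0, 2.0]]
```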

Once the local cost matrix is built, the algorithm finds the alignment path that runs through the low-cost areas, the “valleys” of the cost matrix (Fig. 5). This alignment path (also called the warping path or warping function) defines the correspondence of an element \( x_i \in X \) to an element \( y_j \in Y \), subject to the boundary condition that assigns the first elements of X and Y to each other and the last elements of X and Y to each other (Fig. 6). Formally, the alignment path built by DTW is a sequence of points \( p = (p_1, p_2, \ldots, p_k) \) with \( p_l = (i_l, j_l) \in [1:N] \times [1:M] \) for \( l \in [1:k] \), where the boundary condition requires \( p_1 = (1, 1) \) and \( p_k = (N, M) \).
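The path search is usually implemented with dynamic programming. As a sketch of the classic formulation (the paper does not state its step pattern, so the steps (1,0), (0,1) and (1,1) are an assumption here), the accumulated cost is

$$ D(i,j) = c_{i,j} + \min\{ D(i-1,j-1),\ D(i-1,j),\ D(i,j-1) \}, \quad D(1,1) = c_{1,1} $$

and an optimal path is recovered by backtracking from (N, M) to (1, 1). The hypothetical function `dtw` below uses the Euclidean local cost and returns the total cost D(N, M) together with the path, using 0-based indices.

```python
import math
from typing import List, Sequence, Tuple


def dtw(X: Sequence[Sequence[float]],
        Y: Sequence[Sequence[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: return the total alignment cost and an optimal warping path (0-based)."""
    n, m = len(X), len(Y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # D[i][j] is the accumulated cost of the best path from (1, 1) to (i, j);
    # row 0 and column 0 are padding filled with infinity.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = math.dist(X[i - 1], Y[j - 1])  # local cost ||x_i - y_j||
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m): always step to the cheapest predecessor, which keeps
    # the path monotone and ends at (1, 1), satisfying the boundary condition.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


# Usage: Y repeats one value of X, so the path maps x_2 to two elements of Y.
cost, path = dtw([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
print(cost, path)  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

With a labeled template as X and a test recording as Y, each test frame j could take the label of a template frame i with (i, j) on the path; this is one plausible way to turn an alignment into a per-frame progress estimate, not necessarily the procedure used in this paper.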

Fig. 5. The optimal warping path aligning the time series from Fig. 3.

Fig. 6. Correct sequence and impossible sequence

2.2 The Improved DTW

We optimized the DTW algorithm according to the order of the human motion. Throughout the whole action routine, the sections always occur in a fixed sequence and never out of order; for example, section 3 of the routine can only occur after section 2 and before section 4. Therefore, estimation results that violate this section order can be identified as errors and corrected, which improved the evaluation accuracy by more than 10 percentage points (from 86% to 97.63%; see Sect. 3.2).
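The paper does not give the exact correction rule. A minimal hypothetical sketch, assuming the per-frame estimates are integer section labels and that the routine can only stay in the current section or advance to the next one, is to replace every estimate that breaks this order with the previous accepted section:

```python
from typing import List, Sequence


def enforce_section_order(labels: Sequence[int]) -> List[int]:
    """Hypothetical post-processing: force per-frame section estimates to follow the routine order.

    A label is accepted only if it equals the previous accepted section or is exactly
    one section later; any other label (a backward jump or a skipped section) is
    treated as an error and replaced by the previous accepted section.
    """
    if not labels:
        return []
    corrected = [labels[0]]  # assumption: the first frame's estimate is trusted
    for label in labels[1:]:
        prev = corrected[-1]
        if label in (prev, prev + 1):
            corrected.append(label)
        else:
            corrected.append(prev)
    return corrected


# Usage: the backward jump to 1 and the skip to 5 are treated as errors.
print(enforce_section_order([1, 1, 2, 1, 2, 3, 5, 3, 4]))  # [1, 1, 2, 2, 2, 3, 3, 3, 4]
```

A stricter variant could run DTW with the order built into the allowed path; the post-processing above is simply the easiest way to illustrate the constraint.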

3 Experiments and Results

3.1 Data and Evaluation Formula

Six people (Fig. 7) performed Chinese gymnastics (Fig. 8) to form the data set. Each section of the routine has n activities, labeled 1, 2, …, n.

Fig. 7. Collecting data

Fig. 8. Chinese gymnastics

The evaluation formula is:

$$ D(m) = \left| R(m) - C(m) \right| $$
$$ A = 1 - \frac{1}{M} \sum_{m=1}^{M} D(m) $$

where:

  (a) m: frame index, m = 1, …, M, where M is the number of frames
  (b) C(m): the estimated completion of work at frame m
  (c) R(m): the actual completion of work at frame m
  (d) A: accuracy
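The accuracy formula translates directly into code. This sketch assumes that R(m) and C(m) are given as equal-length lists with one value per frame and that completion is expressed on a scale where A stays meaningful (for example, a fraction in [0, 1]); the paper does not state the scale, and the function name is ours.

```python
from typing import Sequence


def accuracy(R: Sequence[float], C: Sequence[float]) -> float:
    """A = 1 - (1/M) * sum over m of D(m), with D(m) = |R(m) - C(m)|."""
    if len(R) != len(C) or len(R) == 0:
        raise ValueError("R and C must be non-empty and of equal length")
    M = len(R)
    total = sum(abs(r - c) for r, c in zip(R, C))  # sum of D(m) over all frames
    return 1.0 - total / M


# Usage: a perfect estimate gives A = 1; one frame off by 0.1 out of 3 lowers A.
print(accuracy([0.2, 0.8], [0.2, 0.8]))            # 1.0
print(accuracy([0.0, 0.5, 1.0], [0.0, 0.4, 1.0]))  # about 0.967
```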

3.2 Results

The average accuracy of the original DTW is 86% (Fig. 9). The blue line is the ground truth and the red line is the test result; the x-axis shows the length of the test job, and the y-axis shows the activities and their degree of completion.

Fig. 9. The DTW results (Color figure online)

The average accuracy of the improved DTW is 97.63% (Fig. 10).

Fig. 10. The improved DTW results

4 Conclusion and Future Work

In this paper, we improved the DTW algorithm to estimate the completion of the whole working process. In cross-testing with the gymnastics data of six people, the average accuracy exceeded 90%. The results show that when one person's data are used both as the template and as the test data, the accuracy is almost 99%. When another person's data are used as test data, or when Kinect loses track of the skeleton, the results are not yet good enough. We will address these problems in future research.