Abstract
This paper introduces a basic framework for a rehabilitation motion practice system that detects 3D motion trajectories with the Microsoft Kinect (MSK) sensor system, and proposes a cost-effective 3D motion matching algorithm. The rehabilitation motion practice system displays a reference 3D motion from a database, which the player (patient) tries to follow. The player's motion is traced by the MSK sensor system and then compared with the reference motion to evaluate how well the player follows it. In this system, the 3D motion matching algorithm is key to accurately evaluating the player's performance. Even though similarity measurement of 3D trajectories is one of the most important tasks in 3D motion analysis, existing methods are still limited. Recent research focuses on the full-length 3D trajectory data set. However, not every point on a trajectory plays the same role or carries the same meaning. We therefore developed a new cost-effective method that uses fewer features, called 'signatures', a flexible descriptor computed from the region of 'elbow points'. As a result, our method requires less computation than methods that use the full-length trajectory information. The similarity of trajectories is measured from the signatures using an alignment method such as dynamic time warping (DTW), continuous dynamic time warping (CDTW) or the longest common sub-sequence (LCSS) method. In the experimental studies, we applied the MSK sensor system to detect, trace and match the 3D motion of the human body. This application was designed as a system for guiding rehabilitation practice, which evaluates how well a motion practice was performed by comparing the patient's practice motion, traced by the MSK system, with a pre-defined reference motion in a database.
To evaluate the accuracy of our 3D motion matching algorithm, we compared our method with two other methods on the Australian Sign Language dataset. Our matching algorithm achieved recognition accuracy competitive with these methods while using only a subset of the trajectory points, and it can serve as a base framework for various 3D motion-based applications at low cost with high accuracy.
1 Introduction
3D motion analysis is currently receiving interest from researchers and developers in various fields because of its potential in many application domains. Zetu et al. combined a laser tracker and a magnetic tracker to improve process performance and efficiency by optimizing the actions of workers in a manufacturing environment [18]. In a similar study, Balteanu provided efficient monitoring of occupational hazards during the handling of heavy loads [3]. Other typical applications of 3D motion analysis are biomechanical analysis, sport motion analysis and physical rehabilitation practice, as introduced by Humm [7], Wang [15] and Fitzgerald [6], respectively. Other interesting examples of 3D trajectory analysis include watermarking for intelligent video authentication [11] and virtual head mouse implementation [10].
In recent years, thanks to advances in sensor technology and mobile computing, trajectory-based object motion analysis has gained significant interest from researchers. Location data of moving objects can now be collected accurately with less expensive devices. As a result, applications such as sign language and gesture recognition, the global positioning system (GPS), car navigation systems (CNS), animal mobility experiments, sports video trajectory analysis and automatic video surveillance have been implemented with new devices and algorithms.
The Microsoft Kinect (MSK) sensor system is equipped with an RGB camera, an infrared laser emitter and an infrared camera [9]. It can detect skeleton positions without any markers on the human body. It provides color images, 3D depth images and 3D skeleton data that can be used to build a human body skeleton, which can in turn be used for 3D motion analysis applications simply and at low cost. It extracts the 3D human body skeleton using fast human pose recognition functions built on top of the 3D measurements made by the infrared camera.
We applied the MSK sensor system to develop a basic framework for a rehabilitation practice system that evaluates how well a practice was performed by comparing the patient's practice motion, traced by the MSK system, with a pre-defined reference motion in a database. The practice system displays a pre-defined reference motion from a 3D motion database, and the rehabilitation patient follows it. During practice, the patient's motion is traced by the MSK sensor system and compared with the reference motion by our matching algorithm. The 3D motion matching algorithm, a form of trajectory-based object motion analysis, is therefore essential for accurately evaluating how well the patient followed the reference motion.
The main goal of trajectory-based object motion analysis is motion trajectory recognition, which is generally achieved by a matching algorithm that compares a new input trajectory with pre-determined motion trajectories in a database. The first generation of matching algorithms used only raw data to calculate the distance between two trajectories, which is ineffective: raw data of similar motions can look different because of factors such as scale and rotation. To overcome this problem, local features of a trajectory, called signatures, were defined for motion recognition [5,16,17]. The signature is more flexible than other shape descriptors such as B-splines, NURBS, wavelet transforms and Fourier descriptors, and trajectories represented by the signature and these descriptors are invariant to spatial transformations. However, computing distances between trajectories using the signature alone is not enough for accurate recognition of 3D motion. To improve performance, matching approaches have been used that disregard locally similar shapes shared by different motion trajectories, as well as outliers and noise.
‘Matching’ is an important process in motion recognition and classification; it has been studied for years and is widely used in many fields. Matching is achieved by alignment algorithms, and well-known, efficient ones in motion recognition are dynamic time warping (DTW), continuous dynamic time warping (CDTW) and the longest common sub-sequence (LCSS) [1,12,13].
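To make the idea of alignment concrete, the following is a minimal Python sketch of classic DTW between two sequences of feature vectors. It is a textbook O(nm) formulation, not the authors' code, and it assumes Euclidean distance as the local cost between aligned elements.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences.

    a, b: arrays of shape (n, d) and (m, d); 1-D input is treated as d = 1.
    The local cost (an assumption) is the Euclidean distance between elements.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # acc[i, j] = cost of the best warping path aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of a diagonal, vertical or horizontal step
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[n, m])
```

For example, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is 0.0 because warping absorbs the repeated sample, which is exactly the tolerance to speed differences that raw point-by-point distances lack.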
Recent research uses the full length of trajectory data for motion recognition [5,9,16,17]. However, many points of a trajectory have similar signatures because they lie on a straight line, so computing signatures for all of them is largely wasted effort. To eliminate this drawback, we developed a new method that computes signatures only in the region of ‘elbow points’, which reduces computation time. We also present a set of descriptors and a normalization process for invariant motion recognition. To evaluate the accuracy of our 3D motion trajectory matching algorithm, we compared our method with two other methods, PCA-based Gaussian mixture models (GMM) and global Gaussian mixture models [4], using the Australian Sign Language dataset [2].
2 3D motion practice with MSK sensor system
The Microsoft Kinect (MSK) sensor system used for our low-cost 3D motion analysis system is depicted in Fig. 1. It is a composite device consisting of an infrared depth sensor, an RGB camera and a multi-array microphone. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions. The RGB camera, similar to an ordinary consumer camera, captures color images synchronized with the depth images.
The MSK attracts researchers because it can track the skeletons of one or two people moving within the Kinect field of view. The MSK sensor system provides joint information for the human body, with 20 joint points for each skeleton frame, as presented in Fig. 2. Roughly speaking, each body part listed in the diagram of Fig. 2 corresponds to a joint, for which the sensor reports its position and other useful data. Connecting the joints according to the position information obtained from the MSK system builds the skeleton of the body.
All information about the skeleton joints from the MSK system is recorded and saved in a data file. Full depth data of the human body can also be saved for algorithms that require it. Figure 3 shows the visual image (in RGB mode) overlaid with the joints and the body skeleton. Note that all the information needed to generate this figure is obtained from the MSK sensor system, and is processed and saved by the 3D motion analysis system we developed.
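As an illustration of how recorded skeleton data can be turned into the per-joint 3D trajectories used later, the sketch below assumes, hypothetically, that a recording is held as a NumPy array of shape (frames, 20, 3) with x, y, z per joint. The joint order follows the Kinect for Windows SDK 1.x `JointType` enumeration; the array layout and the helper name `joint_trajectory` are assumptions, not the authors' file format.

```python
import numpy as np

# 20-joint skeleton in Kinect for Windows SDK 1.x JointType order.
# The (frames, 20, 3) storage layout used below is a hypothetical choice.
JOINTS = [
    "HipCenter", "Spine", "ShoulderCenter", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft",
    "ShoulderRight", "ElbowRight", "WristRight", "HandRight",
    "HipLeft", "KneeLeft", "AnkleLeft", "FootLeft",
    "HipRight", "KneeRight", "AnkleRight", "FootRight",
]


def joint_trajectory(frames, joint):
    """Return the (T, 3) trajectory of one named joint from a (T, 20, 3) recording."""
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 3 or frames.shape[1:] != (len(JOINTS), 3):
        raise ValueError("expected an array of shape (frames, 20, 3)")
    return frames[:, JOINTS.index(joint), :]
```

The reference and practice recordings can then be reduced to the same joint, for example `joint_trajectory(recording, "HandRight")`, before the trajectories are compared.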
When matching 3D motion trajectories in practice, the motion analysis system first retrieves 3D reference motion data saved in a 3D motion database and plays it on the screen, and the player (rehabilitation patient) tries to follow it. During this practice, the MSK captures the skeleton and joints of the player, and the system then overlays the two 3D motion skeletons. An example of practice corresponding to Fig. 3 is presented in Fig. 4. In these figures, the green skeleton is the reference motion and the red one is the practice motion. Besides this visual presentation, the system also runs the 3D trajectory matching algorithm on the two trajectories, as described in detail in the following sections.
3 Preprocessing for 3D trajectory data
Due to system noise, measurement noise or both, trajectory data may be inaccurate. Smoothing is an important step because it improves the computational stability of the signature by reducing the noise and vibration of the motion; however, the trajectory shape may also be affected by smoothing. To cope with noise, we consider the derivatives of a smoothed version of the data obtained with a smoothing kernel φ(t), i.e. $x^{(j)}(t) = (x * \varphi)^{(j)}(t)$. By the derivative theorem of convolution, $x^{(j)}(t) = (x * \varphi^{(j)})(t)$. In this paper, a B-spline B(t) is taken as the smoothing kernel φ(t). An odd-degree central B-spline of degree 2h − 1 with the integer knots −h, −h + 1, …, 0, …, h − 1, h is given by

$$B(t) = \frac{1}{(2h-1)!} \sum_{k=0}^{2h} (-1)^{k} \binom{2h}{k} \left(t + h - k\right)_{+}^{2h-1}$$

where the notation $f_{+}(s)$ means $f(s)$ if $f(s) \ge 0$ and $0$ otherwise. For a quintic B-spline, h = 3 [8].
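A minimal sketch of this smoothing step follows, assuming uniformly spaced samples with each coordinate processed separately. The kernel is evaluated with the truncated-power formula for a central B-spline, and derivatives of the smoothed signal are obtained by convolving with the sampled kernel derivative, as the derivative theorem of convolution allows. Sampling the kernel at integer offsets, repeating end samples at the boundaries, and the function names are our assumptions.

```python
import math

import numpy as np


def central_bspline(t, h=3, deriv=0):
    """Central B-spline of degree 2h-1 with integer knots -h..h (h=3: quintic),
    or its `deriv`-th derivative, via the truncated-power sum."""
    t = np.asarray(t, dtype=float)
    degree = 2 * h - 1
    if not 0 <= deriv < degree:
        raise ValueError("deriv must satisfy 0 <= deriv < 2h - 1")
    power = degree - deriv
    # d^j/dt^j (u)_+^n = n!/(n-j)! * (u)_+^(n-j)
    factor = math.factorial(degree) // math.factorial(power)
    total = np.zeros_like(t)
    for k in range(2 * h + 1):
        u = np.maximum(t + h - k, 0.0)  # truncated argument (t + h - k)_+
        total += (-1) ** k * math.comb(2 * h, k) * u ** power
    return factor * total / math.factorial(degree)


def smoothed_derivative(x, h=3, deriv=0):
    """deriv-th derivative of the B-spline-smoothed 1-D signal x, computed
    as the convolution of x with the sampled kernel derivative."""
    x = np.asarray(x, dtype=float)
    kernel = central_bspline(np.arange(-h, h + 1), h, deriv)
    padded = np.pad(x, h, mode="edge")  # repeat end samples to limit edge shrinkage
    return np.convolve(padded, kernel, mode="valid")
```

For a 3D trajectory `traj` of shape (N, 3), the smoothed velocity is `np.column_stack([smoothed_derivative(traj[:, i], deriv=1) for i in range(3)])`. With h = 3, the sampled kernel is (0, 1, 26, 66, 26, 1, 0)/120, which sums to one, so constant and linear segments pass through unchanged.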
Next, we normalize the location and scale of each 3D trajectory so that all trajectories are transformed to a common domain. Trajectory normalization makes the representation invariant to scale, rotation and translation, which improves the performance of the subsequent processes. We applied continuous principal component analysis (PCA) to the trajectory points [14], assuming that the covariance of the 3D trajectory coordinates has three distinct nonzero eigenvalues, so that the three principal axes are uniquely defined. The continuous PCA ensures invariance to translation, rotation, reflection and scale.
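A minimal sketch of this normalization step is shown below. It treats the trajectory as a polyline and evaluates the arc-length-weighted ("continuous") centroid and covariance exactly, segment by segment, in the spirit of continuous PCA [14]. The third-moment sign rule for reflections and scaling by the square root of the largest eigenvalue are our assumptions, since the text does not specify them.

```python
import numpy as np


def _outer(u, v):
    """Batched outer products: result[s] = u[s] v[s]^T."""
    return u[:, :, None] * v[:, None, :]


def continuous_pca_normalize(points):
    """Normalize a 3D polyline for translation, rotation, reflection and scale.

    Assumes three distinct nonzero eigenvalues, as stated in the text.
    """
    P = np.asarray(points, dtype=float)
    p, q = P[:-1], P[1:]
    lengths = np.linalg.norm(q - p, axis=1)
    total = lengths.sum()
    # centroid of the curve: length-weighted segment midpoints
    center = (lengths[:, None] * (p + q) / 2).sum(axis=0) / total
    a, d = p - center, q - p
    # exact integral over s in [0, 1] of (a + s d)(a + s d)^T on each segment
    per_seg = _outer(a, a) + (_outer(a, d) + _outer(d, a)) / 2 + _outer(d, d) / 3
    cov = (lengths[:, None, None] * per_seg).sum(axis=0) / total
    evals, evecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(evals)[::-1]             # largest variance first
    evals, evecs = evals[order], evecs[:, order]
    Y = (P - center) @ evecs                    # coordinates in principal axes
    # fix each axis direction by the sign of its third moment (assumption)
    signs = np.sign((Y ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return Y * signs / np.sqrt(evals[0])        # unit spread along main axis
```

Because every quantity transforms consistently under a similarity transform, a rotated, reflected, shifted and rescaled copy of a trajectory normalizes to the same coordinates as the original.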
4 Signature as trajectory descriptor
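For trajectory matching, we need a descriptor that describes the shape of the trajectory well. In our study, we use the signature as the descriptor. For a point t, the signature S(t) consists of five values κ(t), τ(t), h(t), e(t) and c(t). κ(t) is the curvature, which measures how much the curve turns, and τ(t) is the torsion, which measures how much it twists out of the tangent-normal plane. The other three values h(t), e(t) and c(t) are the Euclidean distances from the point t to the start point, the end point and the center point of the trajectory, respectively. Note that the center point is computed by the continuous PCA performed in the normalization process. Thus, for a motion trajectory in 3D space with N points Γ = {x(t), y(t), z(t) | t ∈ [1, N]}, the signature set D* for the entire trajectory is defined as

$$D^{*} = \{\, S(t) = (\kappa(t), \tau(t), h(t), e(t), c(t)) \mid t \in [1, N] \,\}$$

where, writing $\mathbf{r}(t) = (x(t), y(t), z(t))$ and $\mathbf{r}_{c}$ for the center point,

$$\kappa(t) = \frac{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert}{\lVert \mathbf{r}'(t) \rVert^{3}}, \qquad \tau(t) = \frac{\left(\mathbf{r}'(t) \times \mathbf{r}''(t)\right) \cdot \mathbf{r}'''(t)}{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert^{2}},$$

$$h(t) = \lVert \mathbf{r}(t) - \mathbf{r}(1) \rVert, \qquad e(t) = \lVert \mathbf{r}(t) - \mathbf{r}(N) \rVert, \qquad c(t) = \lVert \mathbf{r}(t) - \mathbf{r}_{c} \rVert.$$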
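The following sketch computes the five signature values for every point of a sampled trajectory. Two simplifications are assumptions: derivatives come from NumPy finite differences (`np.gradient`) rather than B-spline derivative kernels, and the center point defaults to the origin, where a PCA-normalized trajectory is centered. Curvature and torsion do not depend on the parameterization, so the sample index can serve as the parameter.

```python
import numpy as np


def trajectory_signature(points, center=None):
    """Per-point signature (kappa, tau, h, e, c) of a 3D trajectory.

    points: (N, 3) array, assumed already smoothed and normalized.
    center: center point; defaults to the origin (assumption).
    Returns an (N, 5) array. Torsion is set to 0 where curvature vanishes.
    """
    r = np.asarray(points, dtype=float)
    c0 = np.zeros(3) if center is None else np.asarray(center, dtype=float)
    d1 = np.gradient(r, axis=0)   # finite-difference derivatives w.r.t. index
    d2 = np.gradient(d1, axis=0)
    d3 = np.gradient(d2, axis=0)
    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=1)
    speed = np.linalg.norm(d1, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        kappa = np.where(speed > 0, cross_norm / speed ** 3, 0.0)
        tau = np.where(cross_norm > 0,
                       np.einsum("ij,ij->i", cross, d3) / cross_norm ** 2, 0.0)
    h = np.linalg.norm(r - r[0], axis=1)    # distance to the start point
    e = np.linalg.norm(r - r[-1], axis=1)   # distance to the end point
    c = np.linalg.norm(r - c0, axis=1)      # distance to the center point
    return np.column_stack([kappa, tau, h, e, c])
```

On a helix (cos t, sin t, 0.5t), the interior values approach the analytic curvature 0.8 and torsion 0.4; the first and last few samples are less accurate because `np.gradient` falls back to one-sided differences there.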
An elbow point is a point on the trajectory whose curvature κ(t) is larger than a threshold φ. If we know the coordinates of the elbow points and their sequential order, we can rebuild an approximate trajectory by connecting consecutive elbow points with straight line segments. Consequently, the elbow points carry enough information to align two trajectories for matching. Therefore, a reduced signature set containing only the elbow points can be written as

$$D' = \{\, S(t) \mid \kappa(t) > \varphi,\ t \in [1, N] \,\}$$
This reduced signature set, containing only the elbow points, has four to five times fewer elements (signatures, i.e., points) than D*. As a result, the computational burden of matching two trajectories can be reduced dramatically by using D′ instead of D*. The elbow points (black dots) and the three distances are illustrated in Fig. 5.
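A minimal sketch of elbow-point selection: compute a discrete curvature and keep the indices where it exceeds the threshold φ, preserving their order. The finite-difference curvature and the function names are assumptions for illustration; given a full per-point signature array, the reduced set D′ is simply its rows at these indices.

```python
import numpy as np


def curvature(points):
    """Discrete curvature |r' x r''| / |r'|^3 from finite differences."""
    r = np.asarray(points, dtype=float)
    d1 = np.gradient(r, axis=0)
    d2 = np.gradient(d1, axis=0)
    speed = np.linalg.norm(d1, axis=1)
    cross = np.linalg.norm(np.cross(d1, d2), axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(speed > 0, cross / speed ** 3, 0.0)


def elbow_indices(points, phi):
    """Indices t with kappa(t) > phi, in their original sequential order."""
    return np.flatnonzero(curvature(points) > phi)
```

On an L-shaped path made of two straight legs, only the three samples around the corner exceed φ = 1, since the straight parts have zero curvature. This shows why most points on straight stretches can be dropped.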
5 Signature alignments
As mentioned in the previous section, each trajectory is represented by a set of signatures; in our proposed method, the signatures are computed only at the elbow points. For each elbow point, five signature elements are obtained: κ(t), τ(t), h(t), e(t) and c(t). To match two trajectories, corresponding signatures must be correctly paired. Because of disturbing factors such as different numbers of signatures in the two trajectories, a matching approach must be able to handle them. There are many approaches for matching two sequences, such as the longest common sub-sequence (LCSS) and dynamic time warping (DTW) [12]. LCSS is a more adaptive and appropriate distance measure for trajectory data than DTW [5], so we chose LCSS for the matching process in our study.
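Given an integer δ and a real number 0 < ε < 1, let $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_m)$ be two signature sequences of lengths n and m, and let $\mathrm{Head}(A) = (a_1, \dots, a_{n-1})$ denote A without its last element. We define $\mathrm{LCSS}_{\delta,\varepsilon}(A, B)$ as follows:

$$\mathrm{LCSS}_{\delta,\varepsilon}(A,B)=\begin{cases}0, & \text{if } A \text{ or } B \text{ is empty},\\ 1+\mathrm{LCSS}_{\delta,\varepsilon}(\mathrm{Head}(A),\mathrm{Head}(B)), & \text{if } \lvert a_{n}-b_{m}\rvert<\varepsilon \text{ and } \lvert n-m\rvert\le\delta,\\ \max\bigl(\mathrm{LCSS}_{\delta,\varepsilon}(\mathrm{Head}(A),B),\ \mathrm{LCSS}_{\delta,\varepsilon}(A,\mathrm{Head}(B))\bigr), & \text{otherwise,}\end{cases}$$

where the condition $\lvert a_{n}-b_{m}\rvert<\varepsilon$ is applied to each signature component.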
The constant δ controls how far in time we can go to match a point of one trajectory to a point of the other trajectory, and the constant ε is the matching threshold. The similarity function S between two trajectories A and B of lengths n and m, given δ and ε, is defined as follows:

$$S(\delta,\varepsilon,A,B)=\frac{\mathrm{LCSS}_{\delta,\varepsilon}(A,B)}{\min(n,m)}$$
This LCSS model allows stretching and displacement in time, so we can detect similarities in movements that happen at different speeds, or at different times.
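The LCSS recursion of Vlachos et al. [13] can be evaluated bottom-up with dynamic programming. The sketch below does that for sequences of signature vectors and returns the normalized similarity LCSS/min(n, m). The componentwise ε test for multi-component signatures is our assumption, as are the function names.

```python
import numpy as np


def lcss(a, b, delta, eps):
    """LCSS_{delta,eps}(A, B) by dynamic programming.

    a, b: sequences of signature vectors (or scalars), lengths n and m.
    Elements match when every component differs by less than eps and
    their (1-based) indices differ by at most delta.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0
    a = np.asarray(a, dtype=float).reshape(n, -1)
    b = np.asarray(b, dtype=float).reshape(m, -1)
    table = np.zeros((n + 1, m + 1), dtype=int)  # table[i, j]: LCSS of prefixes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = np.all(np.abs(a[i - 1] - b[j - 1]) < eps)
            if close and abs(i - j) <= delta:
                table[i, j] = table[i - 1, j - 1] + 1
            else:
                table[i, j] = max(table[i - 1, j], table[i, j - 1])
    return int(table[n, m])


def lcss_similarity(a, b, delta, eps):
    """S(delta, eps, A, B) = LCSS / min(n, m), a value in [0, 1]."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("sequences must be non-empty")
    return lcss(a, b, delta, eps) / min(len(a), len(b))
```

For `A = [0, 1, 2, 3]` and `B = [5, 0, 1, 2, 3]` with ε = 0.5, δ = 1 lets every element of A match its one-step-shifted partner (similarity 1.0), whereas δ = 0 allows no matches. This illustrates how δ bounds the permitted time displacement.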
6 Evaluation of motion matching accuracy
The motion analysis system we developed was originally targeted at a rehabilitation practice system. This practice system displays a pre-defined reference motion from a 3D motion database on the screen, and the rehabilitation patient tries to follow the reference motion being played. During practice, the patient's motion is traced by the MSK sensor system and compared with the reference motion by our matching algorithm to evaluate how well the practice was performed. The 3D motion matching algorithm, a form of trajectory-based object motion analysis, is therefore key to accurately evaluating how well the patient followed the reference motion.
To evaluate the accuracy of our proposed method for matching 3D motion trajectories, we used the 3D trajectory information of the Australian Sign Language (ASL) data set obtained from the University of California at Irvine's Knowledge Discovery in Databases archive [2]. The ASL trajectory dataset consists of 95 sign classes (words), with 27 samples captured for each sign word. The x, y and z coordinates are extracted from the sign's feature sets to calculate the trajectory signature. The length of the samples varies. The experimental setup is exactly the same as that described in [4], where the data set consists of the sign words ‘Norway’, ‘alive’ and ‘crazy’, and each sign-word category has 69 trajectories.
Half of the trajectories from each category were used for training and the remainder for testing. Each test sample was classified by the k-nearest-neighbor rule (k = 5). The experiment was repeated 40 times, each time with randomly selected training and test sets, and the average recognition rate was 84.76 %. We also performed the experiment with the pose normalization method [5]. Our proposed method was compared with the PCA-based Gaussian mixture models (GMM) and global Gaussian mixture models [4], and the comparison results are reported in Table 1.
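The evaluation protocol just described (a per-class half split, a k-nearest-neighbor vote with k = 5, and repeated random runs) can be sketched as follows. The similarity function is a parameter, for example the LCSS similarity S. Function names, the random seed and tie handling through `Counter.most_common` are assumptions, not details taken from the paper.

```python
import random
from collections import Counter


def knn_classify(query, train, labels, similarity, k=5):
    """Label `query` by majority vote among the k most similar training items."""
    ranked = sorted(range(len(train)),
                    key=lambda i: similarity(query, train[i]), reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def repeated_half_split_accuracy(items, labels, similarity, k=5, runs=40, seed=0):
    """Mean accuracy over `runs` random splits; half of each class trains."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    accuracies = []
    for _ in range(runs):
        train_idx, test_idx = [], []
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            train_idx += shuffled[:half]
            test_idx += shuffled[half:]
        train = [items[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct = sum(
            knn_classify(items[i], train, train_labels, similarity, k) == labels[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Passing the similarity as a function keeps the protocol independent of the descriptor, so the same loop could compare full-length and elbow-point signatures.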
Note that our proposed method uses only a subset of the trajectory data, whereas the other methods use the full-length trajectory data. Even though the recognition rate of our proposed method does not exceed that of the PCA-based GMM method, the number of data points used for recognition is much smaller, which implies lower computational complexity. Therefore, our proposed method has an advantage over PCA-based GMM in terms of recognition speed.
7 Conclusion
In this paper, we introduced a rehabilitation motion practice system that detects 3D motion trajectories with the Microsoft Kinect (MSK) sensor, and proposed a 3D motion matching algorithm. The rehabilitation motion practice system displays a reference 3D motion from a database, which the player (patient) tries to follow. The player's motion is traced by the MSK sensor system and then compared with the reference motion to evaluate how well the player follows it. In this system, the 3D motion matching algorithm is key to accurately evaluating the player's performance.
For accurate evaluation of rehabilitation motion practice, we proposed a new 3D matching algorithm that uses only features, called ‘signatures’, obtained from ‘elbow points’, i.e., points whose curvature is larger than a specific threshold. We performed sign language recognition experiments [2] to show its effectiveness for 3D motion matching applications, comparing it with two existing methods, PCA-based Gaussian mixture models (GMM) and global Gaussian mixture models [4].
In the first step of our matching algorithm, all trajectories are smoothed and then normalized by continuous PCA [14], which makes them invariant to translation, rotation and scale. Once all trajectories are normalized, a set of signatures containing both local and global features of each trajectory is computed from the elbow points only. The longest common sub-sequence (LCSS) matching algorithm is then used to match the signatures from the elbow points of two trajectories. Comparing a trajectory with those in a database, that is, comparing one set of signatures with each set in the database, is computationally expensive when the database is large and the trajectories are long. Using only a subset of the trajectory points therefore makes the matching process simpler and faster.
Even though our method uses less trajectory information for matching, the sign word recognition results showed that it still maintains a high recognition rate compared with the existing methods. This implies that the features from the elbow points are sufficient for matching two trajectories. However, future work should investigate the sensitivity of the recognition results to the threshold value, which determines the number of elbow points. A practical application study with a large number of sign words should also be performed.
In addition, the experimental results suggest that our 3D motion analysis system can serve as a base framework for various 3D motion-based applications such as physical rehabilitation support, sport motion analysis and biomechanical applications. We are currently investigating various applications of the developed system.
References
Aach J, Church G (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics 17:495–508
Australian Sign Language Dataset (April 1999). http://kdd.ics.uci.edu/databases/auslan/auslan.html
Balteanu MS (2010) Professional risk management using computerized monitoring. In: Proceedings of the International Conference on Risk Management, Assessment and Mitigation (RIMA), pp 51–55
Bashir F, Khokhar A, Schonfeld D (2005) Automatic object trajectory-based motion recognition using Gaussian mixture models. In: IEEE International Conference on Multimedia and Expo, pp 1532–1535
Croitoru A, Agouris P, Stefanidis A (2005) 3D trajectory matching by pose normalization. In: Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems, pp 153–162
Fitzgerald D et al (2007) Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises. In: Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp 4870–4874
Humm JR et al (1994) A biomechanical analysis of ballet dancers on pointe. In: Proceedings of the 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp 374–375
Kehtarnavaz N, deFigueiredo JP (1988) A 3D contour segmentation scheme based on curvature and torsion. IEEE Transactions on Pattern Analysis and Machine Intelligence 10:707–713
Khoshelham K, Elberink SO (2012) Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors 12:1437–1454
Pallejà T, Guillamet A, Tresanchez M, Teixidó M, Viso AF, Rebate C, Palacín J (2013) Implementation of a robust absolute virtual head mouse combining face detection, template matching and optical flow algorithms. Telecommunication Systems 52:1479–1489
Singh R, Vatsa M, Singh SK, Upadhyay S (2009) Integrating SVM classification with SVD watermarking for intelligent video authentication. Telecommunication Systems 40:5–15
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 216–225
Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of the 18th International Conference on Data Engineering, pp 673–684
Vranic D, Saupe D (2001) 3D shape descriptor based on 3D Fourier transform. In: Proceedings of the EURASIP Conference on Digital Signal Processing for Multimedia Communications and Services (ECMCS 2001), pp 271–274
Wang R, Leow WK, Leong HW (2008) 3D-2D spatiotemporal registration for sport motion analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8
Wu S, Li YF (2009) Flexible signature descriptions for adaptive motion trajectory representation, perception and recognition. Pattern Recognition 42:194–214
Yang JY, Li YF (2010) A new descriptor for 3D trajectory recognition. In: Proceedings of the 2010 IEEE International Conference on Automation and Logistics (ICAL 2010), pp 37–42
Zetu D, Banerjee P, Thompson D (2000) Extended-range hybrid tracker and applications to motion and camera tracking in manufacturing systems. IEEE Transactions on Robotics and Automation 16:281–293
Acknowledgment
This study was financially supported by Chonnam National University 2011, and was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2013R1A2A2A04016782).
Cite this article
Pham, HT., Kim, JJ., Nguyen, T.L. et al. 3D motion matching algorithm using signature feature descriptor. Multimed Tools Appl 74, 1125–1136 (2015). https://doi.org/10.1007/s11042-014-2103-2