Abstract
Accurate human pose estimation is of vital importance for a variety of human-robot interaction applications, including cooperative task execution, imitation learning and robot-assisted rehabilitation. As robots move from controlled indoor environments to unstructured and outdoor environments, the ability to accurately measure human pose without fixed sensors becomes increasingly important. In this paper, we present a general framework for accurately estimating human pose from a variety of sensors, including body-worn inertial measurement units and cameras, that can be used in indoor and outdoor environments to accurately estimate human pose during arbitrary 3D movements. Using a kinematic model of the human body, the sensor data is fused to estimate the body joint angles and velocities using a constrained Extended Kalman Filter which automatically incorporates feasible joint limits. For periodic movement such as gait, performance can be further improved via online learning of the gait model, individualized to the user. The proposed approach can deal with intermittent data availability and measurement errors during highly dynamic movements.
This work was supported in part by Canada’s Natural Sciences and Engineering Research Council.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Accurate human pose estimation is of vital importance for a variety of human-robot interaction applications, including cooperative task execution, imitation learning and robot-assisted rehabilitation [1]. The gold standard for human motion data collection is marker-based motion capture, based on either optical or magnetic marker technology. A number of commercial solutions are available, e.g. [2, 3]. With optical motion capture systems, small reflective markers are attached to body landmarks, and observed with a set of cameras. The marker images and exact knowledge of the relative placements of the cameras are used to extract accurate 3D positions of each marker, achieving positioning sub millimeter positioning accuracy [2]. The marker positions coupled with a kinematic model of the human body and an association between each marker and the corresponding link in the model are then used to estimate the body pose. However, even with the high accuracy of marker positioning reported by the system manufacturers, the quality of pose estimation is frequently corrupted by three common measurement issues: (1) temporary marker occlusions, (2) unlabeled markers, and (3) mis-labeled markers.
Temporary marker occlusions occur when no camera has a direct line of sight to the marker, and can be caused by either occluding objects in the scene or by self occlusion. Unlabeled markers appear when a marker is measured by one or more cameras in the capture scene, but cannot be associated with any of the known markers in the kinematic model. Mis-labeled markers, also known as marker swapping, occur when markers are incorrectly associated to known markers in the kinematic model. This issue occurs frequently when many markers are in a small volume, for example during close contact between demonstrators or fine hand/finger movements when the hands and fingers are brought together.
These issues can be systematically addressed during off-line post processing, but for interactive applications, on-line pose estimation is required. Aristidou and Lasenby [4] proposed an approach for missing marker handling via marker position prediction, based on previously known marker positions and rigid body assumptions. Meyer et al. [5] proposed an approach for on-line marker labeling for full-body motion capture based on a custom initialization and a probabilistic iterative estimation procedure. Maycock et al. [6] propose an approach for hand movement tracking using unlabeled markers, based on a Global Nearest Neighbor (GNN) approach. Marker occlusions are handled using interpolation. Mandery et al. [7] propose an approach for pose estimation using unlabeled marker measurements using the smart sampling Kalman filter.
While marker-based systems provide high accuracy in indoor settings, they require extensive camera setup and calibration, as well as line-of-sight visibility between the cameras and the markers, implying a restricted capture space. For many practical applications, these requirements are too restrictive. Recently, alternatives to camera-based motion capture based on body-worn Inertial Measurement Units (IMUs) have been proposed [8]. While IMUs enable capture in arbitrary environments, they suffer from gyro drift and poorer pose estimation accuracy than marker-based systems. Kalman filter approaches coupled with a kinematic model of the body have frequently been applied for IMU-based pose estimation, to fuse the accelerometer and gyroscope measurements and deal with gyro drift [9, 10].
In this paper, we propose a general framework for human pose estimation based on the Extended Kalman Filter (EKF) and a skeleton model of the body. Using the motion model and state and observation covariances estimated by the EKF, we automatically eliminate missing or incorrectly measured markers, perform marker matching and pose estimation, handle joint limits and sensor measurement noise and bias. The proposed approach has been extensively validated with a variety of dynamic movements and demographics.
2 Proposed Approach
2.1 Problem Statement
We model the human body with an articulated rigid body skeleton, where adjacent bones are articulated via a set of N joint angles \(q_i, i \in 1...N\). The joint vector additionally includes the 6 dimensional pose of the body relative to the inertial frame. For the case of marker-based motion capture, our measurement consists of \(M_k\) marker Cartesian positions and velocities, the number of which may be different at each time step k.
where \(\mathbf {y_{M_k}}\) is the vector of Cartesian marker positions, and \(\mathbf {\dot{y}_{M_k}}\) is the vector of Cartesian marker velocities.
For IMU based motion capture, IMUs are attached to a set of body limbs, and our measurement consists of angular velocities and linear accelerations measured at each IMU:
where K is the number of IMUs, \(\mathbf {\omega _{K}}\) is the vector of angular velocity and \(\mathbf {a_{K}}\) is the vector of linear acceleration measurements in the local IMU frame.
Given the observations at frame k, our objective is to estimate the pose \(q_k\). Additionally, in the case of marker measurements, to deal with missing and incorrectly labeled or unlabeled markers, we must first associate the observed markers with the corresponding skeleton location, and discard incorrect markers.
2.2 The Extended Kalman Filter Formulation
The EKF state consists of the joint positions, velocities and accelerations, \(\mathbf {x} = [\mathbf {q}\, \mathbf {\dot{q}}\, \mathbf {\ddot{q}}]^T\). We assume a constant acceleration state evolution model, such that
where \(\mathbf {x}_{k}\) is the vector of joint angles, velocities and accelerations at time k, dt is the sampling time interval and \(\mathbf {w}\) is the process noise, assumed to be zero-mean Gaussian noise with covariance \(Q_k\).
The observations are related to the state via the non-linear forward kinematics, \(\mathbf {z} = h(\mathbf {x})\). To perform state estimation, the forward kinematics is linearized about the current operating point.
where J is the Jacobian of the measurement equation with respect to the state \(\mathbf {x}\), and \(\mathbf {v}\) is the observation noise, assumed to be zero-mean Gaussian noise with covariance \(R_k\).
The unconstrained EKF formulation described above can lead to joint angle estimates which are not physically feasible. To ensure that the estimated joint angles remain physically feasible, the joint angles are constrained to remain within joint limits, by restricting the Kalman gain to ensure that the updated state remains within the constraints [11, 12].
In IMU based estimation, another significant source of error is the gyroscope drift, particularly about rotation axes parallel with gravity, i.e., the yaw rotation in the world frame. With some apriori knowledge of the motion, we can estimate the mean yaw angle of links with respect to the torso or the world frame. Placing a virtual yaw sensor on a link and assuming it always takes this mean as measurement effectively prevents drift from accumulating [13]. The measurement noise covariance of the virtual sensor can be used as a tuning parameter to allow for accurate yaw motion estimation while reducing the effect of drift. Figure 1 shows the virtual yaw sensor drift correction for a gyroscope with \(0.01\,rad/s\) bias experiencing sinusoidal motion about the world z axis.
For the case of periodic movement, the assumption of constant acceleration can be removed by learning an individualized model of the periodic movement over time, to more accurately model the acceleration. This approach explicitly models the state as a parameterized sum of sinusoids, and learns the model parameters during online observation of the movement. The learned model enables drift free integration of the velocity and acceleration measurements, and improves pose estimation during periodic movement such as gait [13].
At each time step, the EKF estimates the measurement covariance \(P_k\), based on the Kalman filter update equations [14].
2.3 Incorrect and Missing Marker Detection
To allow for incorrect and missing markers during online measurement, at each time step, the probability distribution of the location of each model marker is predicted using the current state estimate, the observation model, and the measurement covariance. For each observed marker, the probability that the marker is generated by the model probability distribution is computed. The observed marker with the highest probability is associated with the model marker. This approach simultaneously deals with swapped and unlabeled markers.
To handle missing markers, the observation vector size varies at each time step, based on the number of markers observed. Missing markers therefore do not contribute to the measurement update; only the observable markers and the state evolution model are used to estimate the joint pose. Information from the observable markers and the estimates of the joint pose and velocity allow the filter to smoothly deal with temporary marker occlusions.
3 Experimental Results
The proposed approach based on marker data is evaluated with a dynamic motion capture dataset with significant occlusions, marker-swapping and unlabeled markers. Figure 2 illustrates an exemplar sequence, consisting of two actors – both outfitted with markers – interacting briefly followed by an acted fall, with significant marker occlusions during ground contact.
We do not assume that any labels are correct due to the possible swaps and frequent occlusions in the raw data, and the procedure described in Sect. 2.3 is applied to all marker measurements at each time step to perform labeling prior to pose estimation. In the initial frame, observed markers are first assigned to the actors – and their associated model markers – of the motion capture scene. An example sequence illustrating marker trajectories and the recovered pose is shown in Fig. 3.
The proposed approach is compared to off-line pose estimation and a Jacobian inverse based approach [15]. For off-line pose estimation, pre-processing is applied to fill-in missing marker data, remove spurious markers and correct any mislabeling, and the pose is then estimated using global optimization. The Root Mean Squared Error (RMSE) between the measured marker positions and their estimates based on the recovered joint angles and the forward kinematics are compared in Table 1.
As can be seen from Table 1, the proposed method significantly outperforms the Jacobian-based approach, and achieves performance comparable to the off-line method. In particular, performance for the right hand is significantly improved; the right hand is occluded for over ten seconds followed by mislabeling of the marker which the Jacobian inverse based approach cannot recover from. The proposed method detects the incorrect label and reassigns the marker appropriately.
Also noteworthy is the standard deviation of the error, shown in Table 1 within parentheses. The proposed approach generally minimizes this deviation, indicating smoother and more consistent generated motion as opposed to that of the Jacobian inverse approach. One possible reason for this can be the marker jumps found in the raw data. Marker jumps are where markers erroneously jump a few meters yet remain visible. Since the Jacobian inverse approach cannot effectively determine which data to reject, it generates visible discontinuities in the resulting joint angle velocities, shown in Fig. 4. In addition to rejecting these jumping markers the proposed approach may temporarily assign a nearby, unassigned marker as per Sect. 2.3.
Figure 5 illustrates the constrained estimation, taking into account joint limits for the elbow joint, which is limited by a 150\(^\circ \) range. The over-extension of the elbow joint is prevented during the occlusion time frames.
The proposed IMU only approach was compared to the marker based EKF. Participants were asked to march in place while wearing IMU sensors at the waist, thighs, and ankles. Three markers were placed on each IMU to estimate their orientation with respect to the limbs. Markers were also placed on the ankles, knees, and hips; these were used to estimate the joint centers and link lengths to generate the kinematic models as well as for ground truth inverse kinematics using the proposed marker EKF approach. Figure 6 shows the right knee joint angle estimation of the IMU EKF and Periodic-EKF [13] approaches compared to the marker based estimation, the proposed methods achieve an accuracy of 2.79 and 2.07 degrees RMSE respectively. IMU only estimation achieves accuracy comparable to that of camera based motion capture and can be utilized in environments for which camera based systems are not suitable.
4 Conclusions and Future Work
In this paper, a comprehensive framework for online pose estimation was developed based on the Extended Kalman filter. The proposed approach models the human body as an articulated skeleton, and estimates the body position and orientation in space, together with the joint positions and velocities via fusion of a motion model and noisy measurements. Imperfect motion capture measurements, including missing and mislabeled markers, as well as IMU measurements, can be incorporated into the same framework. The proposed method is evaluated on a variety of datasets and demonstrates improved performance over state-of-the-art systems.
References
Kulić, D., Venture, G., Yamane, K., Demircan, E., Mizuuchi, I., Mombaur, K.: Anthropomorphic movement analysis and synthesis: a survey of methods and applications. IEEE Trans. Robot. 32(4), 776–795 (2016)
Motion analysis. http://www.motionanalysis.com
Vicon. https://www.vicon.com/
Aristidou, A., Lasenby, J.: Real-time marker prediction and CoR estimation in optical motion capture. Vis. Comput. 29(1), 7–26 (2013)
Meyer, J., Kuderer, M., Müller, J., Burgard, W.: Online marker labeling for fully automatic skeleton tracking in optical motion capture. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 5652–5657. IEEE (2014)
Maycock, J., Rohlig, T., Schroder, M., Botsch, M., Ritter, H.: Fully automatic optical motion tracking using an inverse kinematics approach. In: 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pp. 461–466 (2015)
Steinbring, J., Mandery, C., Pfaff, F., Faion, F., Asfour, T., Hanebeck, U.D.: Real-time whole-body human motion tracking based on unlabeled markers. In: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (2016)
Roetenberg, D., Luinge, H., Slycke, P.: Xsens MVN: full 6DOF human motion tracking using miniature inertial sensors. Xsens Motion Technologies BV, Technical report (2009)
El-Gohary, M., McNames, J.: Shoulder and elbow joint angle tracking with inertial sensors. IEEE Trans. Biomed. Eng. 59(9), 2635–2641 (2012)
Lin, J.F., Kulić, D.: Human pose recovery using wireless inertial measurement units. Physiol. Measur. 33(12), 2099–2115 (2012)
Gupta, N., Hauser, R.: Kalman filtering with equality and inequality state constraints. arXiv preprint (2007). arXiv:0709.2791
Bonnet, V., Daune, G., Joukov, V., Dumas, R., Fraisse, P., Kulić, D., Seilles, A., Andary, S., Venture, G.: A constrained extended kalman filter for dynamically consistent inverse kinematics and inertial parameters identification. In: IEEE-RAS International Conference on Biomedical Robotics and Biomechatronics, pp. 952–957 (2016)
Joukov, V., Bonnet, V., Karg, M., Venture, G., Kulić, D.: Rhythmic EKF for pose estimation during gait. In: IEEE-RAS International Conference on Humanoid Robots, pp. 1167–1172 (2015)
Welch, G., Bishop, G.: An introduction to the kalman filter. Technical report TR 95–041, Department of Computer Science, University of North Carolina at Chapel Hill (2006)
Yamane, K., Nakamura, Y.: Natural motion animation through constraining and deconstraining at will. IEEE Trans. Visual. Comput. Graph. 9(3), 352–360 (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Joukov, V., D’Souza, R., Kulić, D. (2017). Human Pose Estimation from Imperfect Sensor Data via the Extended Kalman Filter. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds) 2016 International Symposium on Experimental Robotics. ISER 2016. Springer Proceedings in Advanced Robotics, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-50115-4_68
Download citation
DOI: https://doi.org/10.1007/978-3-319-50115-4_68
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50114-7
Online ISBN: 978-3-319-50115-4
eBook Packages: EngineeringEngineering (R0)