1 Introduction

Optical motion capture is a technology used to turn the observations of a moving subject (taken from a number of cameras) into 3d position and orientation information about that subject. It is commonly used to better analyse techniques for sports training and performance [28]; for observation of asymmetries and abnormalities in rehabilitation medicine [9]; in biomechanics labs (prosthetics, ergonomics); and for visualisation of virtual characters for films and computer games [48]. In general, to achieve accurate skeletal reconstruction of any legged body, 3 markers must be available on each limb segment at all times. However, even with many cameras, there are instances where occlusion of markers by elements of the scene leads to missing data. In order to unambiguously establish its position, each marker must be visible to at least two cameras in each frame. Although many methods have been developed to handle the missing marker problem, most of them are not applicable in real-time, are usually limited to short time period occlusions, and often require manual intervention. Real-time gap filling is very important in motion capture technology since the skeleton of the character can be produced simultaneously with tracking the motion; it is essential, especially in applications where interaction between the user and the computer is needed.

This paper investigates methodologies for real-time marker prediction, under multiple cases of occlusion, to drive centre of rotation (CoR) estimates, and then to automatically establish the skeleton model. A real-time integrated framework is presented, which predicts the occluded marker positions using a Variable Turn Model within an Unscented Kalman filter (UKF). The previous marker positions are used within the framework in addition to information related to the missing markers of the current frame, inferred from an approximate rigid body assumption. The predicted marker positions are then used to locate the joints. Without assuming any skeleton model, we take advantage of the fact that for markers on a given limb segment, the inter-marker distance is approximately constant. The proposed marker constraint methodology is simple, real-time implementable, and very efficient. Our method is automatic and scalable, without requiring any parameters to be set by the user. It considers all the cases of marker occlusion within a limb resulting in accurate predictions even in cases where all markers on a limb segment are missing for an extended period of time. The proposed approach is the first method that takes advantage of the special, but common, case where missing markers are visible to just one camera, reducing the marker estimation error significantly. With a continuous stream of accurate labelled 3d data, we can perform real-time CoR estimation; the CoR position is thereafter corrected via a real-time Inverse Kinematic technique which guarantees that the inter-joint pairwise distances remain constant over time. To the best of our knowledge, this is the first methodology that operates inter-joint constraints, using real-time Inverse Kinematic techniques, in order to maintain fixed bone lengths over time in optical motion capture systems. A skeletal reconstruction is thereby achieved, producing information which can be used for visual performance feedback. Figure 1 shows the outline organisation of our system. Experiments demonstrate that our methodology effectively recovers good estimates about the true positions of the missing markers and CoRs, agreeing with human intuition, even if all the markers on a limb are occluded for a long period of time. Our work has been tested against some of the most popular methods for compensating for interruptions caused by marker occlusion and the results confirm that our method generally outperforms others. The movements produced are smooth and are without abnormalities or oscillations, resulting in natural reconstructed motion.

Fig. 1
figure 1

An overview of the proposed method used to compensate for interruptions caused by occlusions and to drive real-time skeletal parameterisations

2 Related work

Many papers have focused on methods for joint localisation. Sphere fitting approaches are the most commonly used methods for calculating the CoR. This group of methods assumes that all markers remain a constant distance from the centre of rotation. Silaghi et al. in [59] use the Levenberg–Marquardt method to optimise the CoR location and the radii of the marker spheres, via a cost function which sums a per marker cost over all markers and all frames. Halvorsen et al., in [27], describe a closed form solution using the geometric properties of the sphere. In [24], Gamage and Lasenby also introduce a closed form solution, using a cost function of the squared differences in the squared distance from the CoR to a marker and the radius of the sphere associated with that marker. An alternative approach provided by Halvorsen, [26], gives a Bayesian analysis of the algorithm of [24], providing a first-order approximation of the effect of isotropic Gaussian noise upon the algorithm.

Another group of methods is that termed Transformational techniques; they assume that markers are rigidly attached to limb segments. Such an approach was implemented in [33, 51], and [22], where the limb orientation was obtained from sets of optical markers. In [12], a sequential algorithm was presented to locate the rotation centres of a human skeleton from marker data assuming that all markers are attached to a rigid body. The method is closed form, thus enabling real-time implementation.

Whilst several methods to estimate the location of the CoR have been proposed, the performance of most is unsatisfactory in the presence of unusual motions or of many contiguous occlusion-affected frames. Indeed, there are instances, even with expensive motion capture systems, where occlusion of markers by elements of the scene leads to missing data. While the discarding of frames with missing markers is a common technique, the omission of specific data could lead to the loss of useful information. Long-running occlusions leading to a large sequence of missing data can also cause complete failure of the system. Several methods have been proposed to predict the occluded markers in order to drive CoR estimation and skeletal reconstruction. Interpolation of the data using linear or nonlinear approaches is commonly used [58, 69]; this can produce accurate results, but it is a post-processing technique requiring data prior to and after the occlusion. Recently, Piazza et al. in [55] presented an extrapolation algorithm which assumes that the most common motion behaviors are circular or linear movements; however, this method can produce reliable predictions only for a limited number of occluded frames.

Rhijn and Mulder, [56], proposed a model-based optical tracking and model estimation system for composite interaction devices; however, it is an off-line procedure unsuitable for real-time applications. Dorfmüller in [20] used an extended Kalman filter (EKF) to predict the missing markers using previously available marker information, while Welch et al. in [68] used an EKF to resolve occlusions based on the skeletal model of the tracked person. Maidi et al. in [47] prevent intermittent tracking by taking measurements from an optical flow method to supply a hybrid approach based on Kalman filtering. For this reason, an additional marker is introduced for the optical flow tracker; nevertheless, it is impossible to track the object and maintain virtual graphics overlaying when both markers are not identified. Tak and Ko, [63], employed an Unscented Kalman Filter to ensure that the motion capture data remains in a feasible set. However, these methods require manual intervention or become ineffective in cases where markers are missing for an extended period of time. Aristidou et al. [1] also presents an EKF method using a constant velocity (CV) model with marker constraints from neighbouringFootnote 1 markers. However, the CV model limits its use to problems with constant marker velocity. These methods also do not take into consideration the fact that bones are rigid, thus the inter-joint pairwise distances should remain constant over time. Li et al. [41] propose DynaMMo, an approach that uses a Linear Dynamical System (LDS) to model motion capture data under sequences with missing values; in [42], the same authors introduce BoLeRo, a similar technique which also takes into consideration bone length constraints. The suggested algorithm has a ‘hard’ and ‘soft’ version of bone constraints assuming rigidity limitations of the distance between markers on a given segment limb. Although their algorithm results in smooth motion, the method is complex and expensive in terms of computational cost.

Herda et al., in [30] and [31], used a post-processing approach to increase the robustness of a motion capture system by using a sophisticated human model. The neighbouring markers that share kinematic relations with the occluded markers were used to help the estimation of the missing markers. However, the skeleton information must be known a priori in order to apply this method. Hornung and Sar-Dessai, [35], also developed a system to compensate for interruptions caused by occlusions by performing constant inter-distance signatures between neighbouring markers. Nevertheless, this approach may become ineffective when all or a significant number of markers are missing so that no information on that limb can be inferred from the available LEDs. Ringer and Lasenby, [57], also present an automatic method to identify indistinguishable markers based on cliques.Footnote 2 However, this requires an off-line procedure in order to determine marker cliques and parameters of the skeletal structure. Zordan and Van Der Horst in [72] mapped 3D marker position data to joint trajectories for a fixed limb-length skeleton, by attaching virtual springs and controllers, to follow their Cartesian positions. In general, these skeleton methods could work well for short time occlusions but fail to track the missing markers for large occluded sequences.

In [25], a style-based inverse kinematic method has been developed where a Gaussian Process Latent Variable Model (GPLVM) was used along with a pre-specified kinematic model. Wang et al., [66], presented a Gaussian process dynamical model (GPDM) in order to learn the human pose and motion models. They observed the motion using a chain of latent variables and nonlinear mapping from the latent space; the proposed learned model was also able to cope with marker occlusions. Although these are real-time processing methods, they require knowledge of skeleton information which severely restricts their use. Chai and Hodgins, [13], present a method that uses the neighbouring markers to estimate the missing marker in the current frame. They propose a local linear model from these neighbours and then reconstruct the full pose of the frame by conducting an optimisation in the space constrained by a pre-recorded database. Yu et al., [71], proposed an online motion capture labelling approach which also recovers missing markers. They cluster the markers into a number of rigid bodies based on the standard deviations of the marker-pair distances and if their fitting-rigid-bodies algorithm did not classify all the markers into rigid bodies, a missing marker auto-recovery method is applied assuming that the inter-marker distances are fixed over time. However, a training session is needed, the auto-recovery method for marker estimation does not take into account the limb segment rotation, no information about markers visible to a single camera is considered and the CoR estimation is not investigated under marker occlusions.

Park and Hodgins, [52], fill the missing data by learning a statistical model of the spatial relationship between each marker and its neighbours; they use a Principal Component Analysis (PCA) on each marker position and its neighbours throughout the entire motion. In [64], the authors modelled the human motion and filled the gaps in the data using a Conditional Restricted Boltzmann Machine (CRMB) with discrete hidden states. Their approach was trained using non-linear binary representations, conditioned on previous frames; at the same time, they took into consideration the correlation between joint angles, to produce more accurate results. Liu et al., [45, 46], presented a piecewise linear approach for estimating human motions from a pre-selected set of informative markers (principal markers). A pre-trained classifier identifies an appropriate local linear model for each frame. Missing markers are then recovered using available marker positions and the principal components of the associated model. In [36], the data were mapped onto a target motion by searching over patterns in existing databases. Recently, Courty and Cuzol in [16] presented an optimal data interpolation method through conditional stochastic simulation for automatic motion completion. However, this data-driven family of methods requires an off-line training procedure and the results are highly dependent on training data and limited to those models and movements the system has been trained on.

In this work, we prevent intermittent tracking using a UKF model which uses the velocity and acceleration derived from marker positions as the observation sequence, in addition to a constant rotation model. Inferred information from neighbouring markers has been utilised for a better approximation of the marker estimates. Having a continuous flow of data, we can drive real-time skeletal parameterisation. The CoR positions are thereafter corrected using an adapted version of the FABRIK Inverse Kinematics solver, which uses an inter-joint constant distance signature. So far, no existing method operates such bone length constraints to prevent rigid body violation. There are, however, methods where the chosen skeletal parameterisation is composed of rotations, thus no inter-joint kinematic restrictions are required [18, 30].

3 Calculating the centre of rotation

During capture, markers must be carefully placed on the body in order to obtain good results. Results using markers placed too close to the CoR are more susceptible to errors since a small error may cause large deviations in the estimated rotation, leading to erroneous calculation of the model parameters. The data discussed here are labelled from an active marker system (PhaseSpace [54]) where no tracking is necessary. The PhaseSpace impulse system identifies and tracks individual markers from their unique modulation and in this paper problems related to marker inversion are not therefore considered. In general, 3 markers per bone segment are required to estimate the CoR for joints with 3 Degrees of Freedom (DoF); for simpler problems having fewer DoF, such as knees and elbows, the CoR can be calculated with fewer markers [14]. In this paper, we consider the general case of joints with 3 DoF since no prior knowledge of the model or joint-type is assumed.

Locating the CoRs is a crucial step in acquiring a skeleton from raw motion capture data. To calculate the joints between two sets of markers, it is helpful to have the rotation of a limb at any given time. We can estimate the orientation of a limb at time k relative to a reference frame using the Procrustes formulation [34].

The location of the joints can be calculated using the approach in [12]. This takes advantage of the approximation that all markers on a segment are attached to a rigid body. Suppose the markers are placed on two segments (x and y) joined by a CoR. Let the CoR location in frame k be C k . The vectors from the CoR to markers in the reference frame are denoted by \(\mathbf{a}^{i}_{x}\) and \(\mathbf{a}^{j}_{y}\) for limbs x and y, respectively, where i and j are marker labels. The positions of the markers in frame k are given by:

$$ \mathbf{x}^{k}_{i}= \mathbf{C}^{k} + R^{k}_{x}\mathbf{a}^{i}_{x}\tilde {R}^{k}_{x} \qquad \mathbf{y}^{k}_{j}= \mathbf{C}^{k} + R^{k}_{y}\mathbf {a}^{j}_{y}\tilde{R}^{k}_{y}$$
(1)

where R x and R y are the rotors (quaternions) expressing the rotation of the joint limbs x and y, respectively. \(\tilde{R}\) is the quaternion conjugate of R. Let \(\mathbf{b}^{k}_{ij}\) be the vector from \(\mathbf{x}^{k}_{i}\) to \(\mathbf{y}^{k}_{j}\), that is

$$ \centering \mathbf{b}^{k}_{ij}= \mathbf{x}^{k}_{i} - \mathbf{y}^{k}_{j}= R^{k}_{x}\mathbf{a}^{i}_{x}\tilde{R}^{k}_{x} - R^{k}_{y}\mathbf {a}^{j}_{y}\tilde{R}^{k}_{y}$$
(2)

A cost function S can be constructed that has a global minimum at the correct values of \(\mathbf{a}^{i}_{x}\) and \(\mathbf{a}^{j}_{y}\) if the data is noise free, and returns a good estimate in the presence of moderate noise.

$$S = \sum_{k=1}^{m}\sum _{i=1}^{n_{x}}\sum_{j=1}^{n_{y}}\bigl[\mathbf {b}_{ij}^{k} - \bigl(R_{x}^{k}\mathbf{a}_{x}^{i}\tilde{R}_{x}^{k} -R_{y}^{k}\mathbf{a}_{y}^{j}\tilde{R}_{y}^{k} \bigr) \bigr]^{2}\ $$
(3)

where n x , n y are the number of markers on limbs x and y, respectively, and m is the number of frames used for the calculations. The minimum is given by the solution of the simultaneous linear equations, obtainable by differentiation:

(4)
(5)

where

$$\bar{\mathbf{b}}^{k} = \frac{1}{n_{x}n_{y}} \sum _{i = 1}^{n_{x}} \sum_{j = 1}^{n_{y}}\mathbf{b}^{k}_{ij} \quad \bar{\mathbf{a}}_{w} =\frac {1}{n_{w}}\sum_{i=1}^{n_{w}}\mathbf{a}^{i}_{w} \quad w = \{x,y\}$$

Having calculated the \(R^{k}_{w}\) and \(\bar{\mathbf{a}}_{w}\), we can locate the CoR. Figure 2 demonstrates an example of CoR estimation and skeletal reconstruction in real-time using the above method. However, due to occlusions, there are instances where not all marker positions are available. If all markers are available on one limb segment, w, the CoR may be estimated using only the current \(R^{k}_{w}\) and \(\bar{\mathbf{a}}_{w}\) as estimated in the previous frame, when all markers were visible, via (1). If there are markers occluded on both limb segments, a marker prediction methodology is needed.

Fig. 2
figure 2

An example of CoR estimation of a human body model using [12]. (a) The marker positions as returned by the motion capture system, (b) the calculated CoRs and the skeletal reconstruction

4 Marker prediction

The estimates for marker positions can be predicted using filtering. In this process, each single marker can be tracked individually and constraints for neighbouring markers can be incorporated. Most tracking problems require a dynamic model for accurate estimation of the trajectory of a maneuverable object. During the last decades, various mathematical models of target motions have been developed. Singer [60, 61] proposes a model which assumes that the target acceleration is a zero-mean first-order stationary Markov process. Based on Singer’s assumption, many papers have proposed a constant or variable acceleration model (e.g. [43]).

The nearly constant turn NCT and constant velocity CV models are based on a constant-speed condition and constant turn-rate assumption which restricts the variety of possible supported maneuvers. Within this work, a Variable Turn Model (VTM) [5] is implemented which uses the velocity and acceleration (it is assumed that the target’s velocity and acceleration are not constant over time) of the tracked object as state parameters of the Unscented Kalman filter in addition to rotation updates assuming that the rotation between two consecutive frames remains constant.

Unscented Kalman filter

Kalman filtering has been extensively used for real time estimation of linear dynamic systems. However, the traditional Kalman filter [40] is not suitable for use with non-linear dynamical systems, even if Gaussian approximations to the joint distribution of state x and measurement y are made. The Extended Kalman Filter (EKF) [38] is a minimum mean-square-error (MMSE) estimator which extends the scope of the Kalman filter to non-linear optimal filtering problems. It forms a Gaussian approximation to the joint distribution of state and measurement using a Taylor series-based transformation. Nevertheless, EKF implementation is complex (Jacobian and Hessian matrices with second-order filters are required), difficult to tune, and only reliable for systems that are almost linear on the timescale of the updates.

The Unscented Kalman Filter (UKF), [39], propagates mean and covariance information through nonlinear transformations providing more accurate results than the EKF, for a similar computational cost. Consider propagating an n x -dimensional random variable x and assume x has mean \(\bar{\mathbf{x}}\) and covariance P x . First, a set of 2n x +1 weighted samples or sigma points \(\mathcal{S}_{i} = \{W_{i},\mathcal {X}_{i} \}\) are deterministically chosen so that the true mean and covariance of the random variable x can be completely recovered from them. A set of scaled sigma points \(\mathcal{S} = \{\mathbf{W},\mathcal{X} \}\) can be calculated by setting:

$$\lambda= \alpha^{2}(n_{x}+\kappa) - n_{x}$$
(6)

and selecting the sigma point set by

$$\everymath{\displaystyle} \begin{array}{@{}l@{\quad}l}W^{(m)}_{0} = \lambda/ (n_{x} + \lambda ) & i = 0\\[2mm]W^{(c)}_{0} = \lambda/ (n_{x} + \lambda ) + \bigl(1 - \alpha^{2} +\beta\bigr) & i = 0\\[2mm]W^{(m)}_{i} = W^{(c)}_{i} = 1/ \{2 (n_{x} + \lambda ) \} & i = 1,\ldots,2n_{x}\end{array}$$

where α is a positive scaling parameter which controls the size of the sigma point distribution. κ≥0 is also a scaling parameter and W i is the weight associated with the \(i^{\mbox {\footnotesize th}}\) point such that \(\sum^{2n_{x}}_{i=0} W_{i} = 1\). Note that the (m) and (c) superscripts are just used for indexing the weight W i . Merwe et al., [49], proposed κ=0, to guarantee positive semidefiniteness of the covariance matrix and 0≤α≤1 to avoid sampling non-local effects when the non-linearities are strong. β≥0 is a weighting term which incorporates knowledge of the higher order moments of the distribution. β=2 is the optimal choice for a Gaussian prior.

Let the original state and noise variables at time k be \(\mathbf {x}^{\alpha}_{k} = [\mathbf{x}^{T}_{k} \ \mathbf{v}^{T}_{k} \ \mathbf {n}^{T}_{k}]\). The sigma point selection scheme is applied to this augmented state Random Variable (RV) to calculate the corresponding sigma matrix, \(\mathcal{X}^{\alpha}_{k}\). The mean \(\bar{\mathbf{x}}\) and covariance P of the Gaussian approximation is updated to the posterior distribution of the states as follows:

(7)

For k∈{1,…,∞} the sigma points are equal to

$$\mathcal{X}^{a}_{k-1} = \Bigl[\bar{\mathbf{x}}^{\alpha}_{k-1}\ \bar{\mathbf{x}}^{\alpha}_{k-1} \pm\sqrt{(n_{\alpha}+ \lambda) \mathbf {P}^{\alpha}_{k-1}} \,\Bigr]$$
(8)

and the time update is given by

where f(.) is the transition and h(.) the observation function. The measurement update equations are:

where x α=[x T v T n T]T, \(\mathcal{X}^{\alpha} = [(\mathcal{X}^{x})^{T} \ (\mathcal{X}^{\nu})^{T} \ (\mathcal{X}^{n})^{T}]^{T}\), λ is the composite scaling parameter, n α =n x +n ν +n n , Q is the process noise covariance, R is the measurement noise covariance, K is the Kalman gain, W i are the Unscented Transform weights, and Z k is the observation vector.

The transition function f(.) and the observation function h(.) are very important for implementing efficient UKF filtering. In this model, the transition function is taken to be a Variable Turn Model (VTM) with target velocity and acceleration the equivalents of the relevant tracked marker. However, a detailed look at real marker data indicates that the estimated velocities are invariably noisy. Many factors contribute to marker position noise, such as optical measurement noise, mis-calibration of the optical systems, reflections, motion of markers relative to the skin and motion of the skin relative to the rigid body (underlying bone). As a result, the target acceleration is mostly noisy. One method of measuring accelerations would be to attach accelerometers to the markers placed on the limb segments. Such a system, which synchronises measurements from the accelerometer and active markers was described and investigated in [6]. While the system shows some promise, we are faced with dealing with both the noise on the accelerometer readings and the fact that the reference frame of those readings must be determined. Given these difficulties, we decided to use isolated estimates of accelerations obtained from the position data. The target velocity was smoothed using a real-time median filter, thus allowing an adequate calculation of the acceleration which was therefore straightforwardly applied in the proposed model.

On the other hand, for the observation function we use a simple model which assumes that the rotation between two consecutive frames is constant. Thus, the time update of the observation state is equal to

$$\mathcal{Z}_{k|k-1} = R^{k-2,k-1}\mathcal{X}^{x}_{k|k-1}\tilde{R}^{k-2,k-1}$$
(9)

where R p,q is the rotor for the rotation between frames p and q, assuming that the rotation of the markers between two consecutive frames remains constant. It is important at this point to recall that the rotation between the previous and the current frame is re-calculated every frame just after the marker prediction using the estimated marker positions.

Early numerical implementations using Particle Filtering [21] showed that, although it could improve the performance of our state estimates, this improvement was marginal compared to the UKF-VTM and, therefore, not worth the added computational effort.

5 Applying marker constraints

The observation vector, \(\mathbf{Z}_{k}^{j}\), gives the observed position of the tracked marker j when available, otherwise it represents estimated position. The state vector represents true position and velocity as given above. To cope with cases where markers are missing for long periods of time, a tracker that uses information from both the previous frames and the current positions of neighbouring visible markers has been implemented, taking into account the rigidity of the body. Assuming that there are three markers on each limb and the inter-marker distance is constant over time, the observation vector can be updated as given below for 5 different scenarios. The proposed marker constraints from inferred neighbouring markers are simple, real-time implementable and do not assume prior knowledge other than fixed inter-marker distance (the model is rigid). This assumption is true in a noise free environment, however, since limb segments rapidly change direction, there is noise on marker motion relative to the skin and motion of the skin relative to the rigid body (underlying bone).

5.1 All markers are visible on a given limb

Where all markers are visible on a given limb, then

$$\mathbf{Z}_{k}^{j} = H\mathbf{x}^{k}_{j}+ \mathbf{v}^{k}_{j}$$
(10)

where \(\mathbf{x}^{k}_{j}\) is the current state of a tracked marker j on the limb, H is the observation model (in this case the identity) and \(\mathbf{v}^{k}_{j}\) is the observation noise. v is assumed to be zero mean multivariate normal with covariance R.

5.2 One missing marker on a limb segment

In the case where two markers are visible on the limb,

$$\mathbf{Z}_{k}^{1} = H\hat{\mathbf{x}}_{1}^{k}+ \mathbf{v}^{k}_{1}$$
(11)

where \(\hat{\mathbf{x}}_{1}^{k}\) is the estimated position of the occluded marker m 1 in frame k (assuming, in this example, that m 1 is the missing marker). \(\hat{\mathbf{x}}_{1}^{k}\) can be calculated as given below. Firstly, we calculate \(\mathbf {D}_{1,2}^{k-1}\) and \(\mathbf{D}_{1,3}^{k-1}\) which correspond to the vectors between marker m 1 and markers m 2, m 3 in frame k−1, respectively. These vectors are given by

$$\mathbf{D}_{i,j}^{k-1} = \mathbf{x}_{j}^{k-1}- \mathbf{x}_{i}^{k-1}$$
(12)

Thereafter, these vectors are rotated as

$$\hat{\mathbf{D}}_{i,j}^{k} = R^{k-2,k-1}\mathbf{D}_{i,j}^{k-1}\tilde {R}^{k-2,k-1}$$
(13)

assuming that the rotation between the current and the previous frame is the same as that between the previous 2 frames. Predicting the current rotation using previous rotations offers marginal improvement to the system performance, and such a small improvement means that it is not worth the additional computational cost [4]. One obvious way to proceed is to calculate the point \(\tilde{\mathbf{x}}_{1}^{k}\) which is an average of the estimated positions in frame k using the \(\hat{\mathbf{D}}\) vectors:

$$\tilde{\mathbf{x}}_{1}^{k} = \frac{ (\mathbf{x}_{2}^{k} - \hat {\mathbf{D}}_{1,2}^{k} ) + (\mathbf{x}_{3}^{k}- \hat{\mathbf {D}}_{1,3}^{k} )}{2} $$
(14)

where \(\mathbf{x}_{i}^{k}\) is the position of marker i in frame k. We now improve on this estimate by finding the solution of the intersection of the two spheres in frame k with centres \(\mathbf {x}_{2}^{k}\), \(\mathbf{x}_{3}^{k}\) and radii \(|\hat{\mathbf {D}}_{1,2}^{k}|\) and \(|\hat{\mathbf{D}}_{1,3}^{k}|\), respectively. \(\hat {\mathbf{x}}_{1}^{k}\) is assigned as the closest point on the circle of intersection to \(\tilde{\mathbf{x}}_{1}^{k}\). Figure 3 illustrates this process.

Fig. 3
figure 3

The observation vector in the case of 2 visible markers. The red dot, \(\tilde{\mathbf{x}}_{1}^{k}\), represents the average value as given in (14). The green dot\(\hat{\mathbf{x}}_{1}^{k}\), is the point on the intersection of the 2 spheres which is closest to \(\tilde{\mathbf{x}}_{1}^{k}\)

5.3 Two missing markers on a limb segment

In the case of only one marker visible (for example m 2) on a given limb, the observation vector is given as

$$\mathbf{Z}_{k}^{j} = H\hat{\mathbf{x}}_{j}^{k}+ \mathbf{v}^{k}_{j}$$
(15)

where \(\hat{\mathbf{x}}_{j}^{k}\) is the estimated position of the occluded marker m j (j=1,3) in frame k. \(\hat{\mathbf{x}}_{j}^{k}\) is given by

$$\hat{\mathbf{x}}_{j}^{k} = \mathbf{x}_{2}^{k}- \hat{\mathbf{D}}_{j,2}^{k} $$
(16)

where \(\mathbf{x}_{2}^{k}\) is the position of the visible marker m 2 on the limb in the current frame and \(\hat{\mathbf {D}}_{j,2}^{k}\) is as described above. This assumes that there is no marker rotation. Figure 4 demonstrates the above.

Fig. 4
figure 4

The observation vector in the case of only one visible marker. The red dots, \(\hat{\mathbf{x}}_{j}^{k}\), where j={1,3} represent the estimated position of the missing marker m j as given in (16)

5.4 All markers on a limb segment are missing

When all markers on a limb are occluded, we consider two possible subcases; the case where the other limb segment has some markers visible and the case where both limb segments have all of their markers occluded. If some markers on the other limb segment are visible (assume the y limb), the missing marker positions can be calculated using the CoR estimate, \(\hat{\mathbf{C}}_{k} = \mathbf{y}^{k}_{i} -R^{k}_{y}\bar{\mathbf{a}}_{y}\tilde{R}^{k}_{y}\) where i={1,2,3}. Our methodology takes advantage of the approach in [12] for CoR estimation and provides better approximations of the CoR using information from the adjacent limbs. The observation vector of the Unscented Kalman filter is then updated as

$$\mathbf{Z}_{k}^{j} = H\hat{\mathbf{x}}_{j}^{k}+ \mathbf{v}^{k}_{j}$$
(17)

where \(\hat{\mathbf{x}}_{j}^{k}\) is the estimated position of the occluded marker m j (j=1,2,3) in frame k. \(\hat{\mathbf {x}}_{j}^{k}\) is given by

$$\hat{\mathbf{x}}_{j}^{k} = \hat{\mathbf{C}}_{k} +\hat{\mathbf{D}}_{j,c}^{k}$$
(18)

where \(\hat{\mathbf{D}}_{j,c}^{k}\) is an estimate of the vector between marker m j and the CoR. This approach takes advantage of the fact that the distance between markers and the CoR is constant. This vector is approximated by \(\hat{\mathbf{D}}_{j,c}^{k} = R^{k-2,k-1}\mathbf {D}_{j,c}^{k-1}\tilde{R}^{k-2,k-1}\) where \(\mathbf{D}_{j,c}^{k-1} =\mathbf{x}_{j}^{k-1} - \mathbf{C}_{k-1}\). This assumes that the rotation of the markers between two consecutive frames remains constant. Figure 5 illustrates this procedure.

Fig. 5
figure 5

The estimation procedure when all markers on a single limb segment are occluded. The red dots represent the estimated position of the CoR, \(\hat{\mathbf{C}}^{k} = \mathbf {y}^{k}_{i} - R^{k}_{y}\bar{\mathbf{a}}_{y}\tilde{R}^{k}_{y}\) where i={1,2,3}, and the estimated marker positions on limb segment x, \(\hat{\mathbf{x}}_{j}^{k} = \hat{\mathbf{C}}_{k} + \hat{\mathbf {D}}_{j,c}^{k}\), where j={1,2,3}. \(\bar{\mathbf{a}}_{x}\) and \(\bar{\mathbf{a}}_{y}\) are updated using the predicted marker positions in the current frame

If both limb segments have all markers occluded, only information from previous frames can be used. The observation vector, \(\mathbf {Z}_{k}^{j}\), in this instance is calculated using a quaternion based method. This method also assumes that the segment rotation between two consecutive frames is constant. The observation vector can now be expressed as

$$\mathbf{Z}_{k}^{j} = H\hat{\mathbf{x}}^{k}_{j}+ \mathbf{v}^{k}_{j}$$
(19)

where \(\hat{\mathbf{x}}^{k}_{1}\) is equal to \(\hat{\mathbf{x}}^{k}_{1}= R^{k-2,k-1}\mathbf{x}^{k-1}_{1}\tilde{R}^{k-2,k-1}\). Assuming that the markers rotate constantly between two consecutive frames, the method performs better than using a simple variable velocity model since it ensures that markers do not move independently [4]. Markers on a limb segment move as a group of three and their inter-marker distance should remain constant over time. Note that we must ensure our rotor R satisfies \(R\tilde{R}=1\), and if this is not the case, normalisation must occur.

5.5 Markers visible in only one camera

Each marker can be reconstructed by the motion capture system if it is visible in at least two cameras. Indeed, looking at numerous real datasets, we have observed instances where the missing markers are not entirely occluded; information about position is often returned by a single camera. This information identifies a line, L 1, starting from the camera and passing through the position of the missing marker. By relaxing the constraints that the inter-marker distance is constant and accepting that the real position of the marker is on that line, we may be able to obtain a more accurate estimate of the position of the relevant marker. This position, \(\acute{\mathbf{x}}_{1}^{k}\), corresponds to the projection from the point \(\hat{\mathbf{x}}_{1}^{k}\) onto the line L 1, as in Fig. 6. This is applicable for the cases in which the motion capture system fully reconstructs one or two marker locations and another marker is visible in just one camera. If a limb segment has only one known and one partially visible marker, the system is more reliable when it first predicts the partially visible marker and then the entirely occluded marker.

Fig. 6
figure 6

The observation vector in the case of 2 visible markers and one marker visible only by one camera. The red dot, \(\acute{\mathbf{x}}_{1}^{k}\), is now used for the calculation of the observation vector, \(\mathbf{Z}_{k}^{1} = H\acute {\mathbf{x}}^{k}_{1} + \mathbf{v}^{k}_{1}\)

6 Bone length constraints using inverse kinematics

The UKF-VTM model for marker prediction returns good estimates of the marker positions, even if the markers are occluded for an extended period of time. It is, however, easily observable (see Fig. 7) that there are instances where the reconstructed markers may break the rigid body assumption (limbs may not have constant lengths over time), resulting in a violation of the model’s structure. This is more obvious in extreme cases where many markers from the same limb segment are occluded for an extended time period (see Fig. 15); the limb segment lengths drift (as the inter-joint distances were not fixed to be constant) even if the estimated marker positions do not significantly differ from their true positions. Therefore, it seems sensible to constrain the bone lengths to be constant over time, taking into account the rigidity of the body, as the skeleton methods do; this extension does not pre-suppose any knowledge of the model. For such unconstrained models we can preserve bone lengths using Inverse Kinematics techniques. Inverse Kinematics (IK) is defined as the problem of manipulating articulated figures in an interactive and intuitive fashion for the design and control of their posture. It was first introduced in the robotics area, but was then adapted in several other areas such as computer animation, ergonomics, and gaming. Ishigaki et al. [37] implement a real-time IK control interface for character animation which translates the performance into corresponding actions; that was achieved by integrating prerecorded motions with online performance and dynamic simulation. If the input motion does not match the conditions, a kinematic process is applied to match users’ motion with the example interactions. However, the proposed inter-joint constraint approach requires offline training, meaning that the results depend on the training data, and prior knowledge of the model is required. To the best of our knowledge, this is the first time that a real-time IK technique has been used for CoR correction in the presence of missing elements in optical motion capture.

Fig. 7
figure 7

The results produced using the UKF-VTM model. It is seen that the distances between hips and knees change, since it is not guaranteed that these distances are constant. True positions are coloured blue, while the predicted positions are red

6.1 Related inverse kinematics work

Several models have been proposed for solving the IK problem from many different areas of study. Most papers use variants of the Jacobian Inverse approach. The Jacobian solutions are linear approximations of the IK problem; they linearly model the end effectors’ movements relative to instantaneous system changes in link translation and joint angle. Several different methodologies have been presented for calculating or approximating the Jacobian inverse, such as the Jacobian Transpose, Singular Value Decomposition (SVD), Damped Least Squares (DLS), Selectively Damped Least Squares (SDLS), and several extensions [7, 8, 11, 50, 65, 70]. Jacobian inverse solutions produce smooth postures; however, most of these approaches suffer from high computational cost, complex matrix calculations, and singularity problems. An alternative approach is given by Pechev in [53] where the problem is solved from a control perspective. This approach is computationally more efficient than the pseudo-inverse based methods and does not suffer from singularity problems.

Another family of IK solvers is based on Newton methods. These algorithms seek target configurations which are posed as solutions to a minimisation problem, hence they return smooth motion without erratic discontinuities. The most well known methods are Broyden’s method, Powell’s method and the Broyden, Fletcher, Goldfarb and Shanno (BFGS) method [23]. However, the Newton methods are complex, difficult to implement and have high computational cost per iteration.

Courty and Arnaud, [15], proposed a Sequential Monte Carlo Method (SMCM) to incorporate kinematic constraints. Hecker et al., [29], utilised an iterative Inverse Kinematics solver (Particle IK) with various parameters for tuning the character skeleton behavior both statically and dynamically. Neither method suffers from matrix singularity problems and both perform reasonably well. Sumner et al., [62], and later Der et al., [17], proposed mesh-based IK solvers that learn the space of shapes from example meshes. However, this family of methods produces poses which require a pre-learning phase and are highly dependent on the training data.

A very popular IK solution is the Cyclic Coordinate Descent (CCD) method, first introduced by [67]. CCD is a heuristic iterative method with low computational cost for each joint per iteration, which can solve the IK problem without matrix manipulations. However, CCD can suffer from unrealistic animation, even if manipulator constraints have been added. It is designed to handle serial chains, thus it is difficult to extend to problems with multiple end effectors or target positions for internal joints.

A more detailed overview of IK techniques is given in [2]. Within this work, we incorporated FABRIK to solve the IK problem. FABRIK (Forward And Backward Reaching Inverse Kinematics) [3] is chosen since it is an iterative algorithm which uses points and lines to solve the IK problem. It divides the problem into 2 phases, a forward and backward reaching approach, and supports all the rotational joint limits and joint orientations by repositioning and re-orienting the target at each step. It does not suffer from singularity problems, it is fast and computationally efficient. FABRIK is simple, real-time implementable and can be easily extended to solve multiple end effector problems with internal joints.

6.2 Adjusting FABRIK for CoR correction

The tracking systems may return null data due to occlusions and that could cause interruptions in computation of pose and orientation of the target objects. The use of forecasted marker positions to compensate for these occlusion interruptions, and thus to drive real-time joint localisation, may introduce some error in the CoR estimation, breaking the rigid body assumption. Hence, inter-joint restrictions are incorporated using an adapted version of FABRIK for CoR correction.

6.2.1 Problems with serial chain models

FABRIK is a real-time IK solver which returns smooth postures in an iterative fashion. It uses the positions of the joints that have been estimated using predicted marker positions to find updates that meet the fixed inter-joint distance assumption. In the most general case, where the estimated CoRs are not positioned at the end of the chain, the solution can be achieved using a forward and backward iterative mode. The proposed method starts from a joint of the chain which was calculated using true marker positions and works forward, adjusting each estimated joint along the way until the next joint which was calculated using true data. Thereafter, it works backward in the same way, in order to complete a full iteration. This method, instead of using angle rotations, treats finding the joint locations as a problem of finding a point on a line; hence, time and computation can be saved.

A graphical representation of the algorithm in action is given in Fig. 8. Assume p 1,…,p n are the joint positions of a chain, where p 1 and p n are joint positions calculated using true data, and the joints inbetween are joints estimated using predicted marker positions. Set the distances between each joint to be \(d_{i} = \vert \mathbf{p}_{i+1} - \mathbf{p}_{i}\vert _{\mbox{\footnotesize avg}}\), for i=1,…,n−1, b=p 1 and t=p n ; these distances can be established by averaging over time from the frames where the markers, and the joints, are available. Then check whether the target is reachable or not; find the distance between p 1 and p n , dist, and if this distance is smaller than the total sum of all the inter-joint distances, \(\mathit{dist} <\sum_{1}^{n-1}d_{i}\), the target is within reach, otherwise, it is unreachable. If the target is within reach, a full iteration is performed in two stages. In the first stage, the algorithm estimates each joint position starting from p 1, moving forward to p n . Thus, initialise p 1=b and find the line, l 1, which passes through the joint positions p 1 and p 2. The new position of the 2nd joint, \(\mathbf{p}'_{2}\), lies on that line with distance d 1 from p 1. Similarly, the new position of the 3rd joint, \(\mathbf {p}'_{3}\), can be calculated using the line l 2, which passes through p 3 and \(\mathbf{p}'_{2}\), and has distance d 2 from \(\mathbf{p}'_{2}\). The algorithm continues until all new joint positions are calculated, including \(\mathbf{p}'_{n}\).

Fig. 8
figure 8

A simple example of FABRIK for the case where the estimated joints are positioned inbetween 2 true joint positions. (a) The initial position of the chain, where {p i } are true and \(\{\hat{\mathbf{p}}_{i}\}\) are estimated joint positions. (b) The first phase of the algorithm; the joint \(\mathbf{p}'_{i}\) has been adjusted as the point on line l i−1 with distance d i−1 from p i−1. (c) The second phase of the algorithm: let the new position of \(\mathbf{p}'_{5}\) be its initial position p 5; repeat the procedure starting this time from the other side of the chain. (d) The final posture; the algorithm is repeated for as many iterations as needed until the difference between the true and the returned positions of the joints p 1 and p 5 is less than a given tolerance

Having in mind that initially p n was the true position of the nth joint, a second stage of the algorithm is needed. A full iteration is completed when the same procedure is repeated but this time starting from the true position of the nth joint and moving backward to the 1st joint. Therefore, let the new position for the nth joint, \(\mathbf{p}''_{n}\), be its initial position t. Then, using the line l n−1 that passes through the points \(\mathbf {p}''_{n}\) and \(\mathbf{p}'_{n-1}\), we define the new position of the joint \(\mathbf{p}''_{n-1}\) as the point on that line with distance d n−1 from \(\mathbf{p}''_{n}\). This procedure is repeated for all the remaining joints, including p 1. The procedure is then repeated, for as many iterations as needed, until p 1 and p n are close enough (to be defined) to their initial, true positions. FABRIK always converges to any given chains/goal positions, when this is possible. If there are constraints which do not allow the chain to bend enough or if the target is not within the reachable area, there is a termination condition which compares the previous and the current position of the joint p 1, and if this distance is less than a specified tolerance, FABRIK terminates its operation. Several optimisations can be achieved using Conformal Geometric Algebra (CGA) [19, 32] to produce faster results and to converge to the final answer in fewer iterations; CGA has the advantage that basic entities, such as spheres, planes, or circles, are simply represented by algebraic objects. Therefore, a direct estimate of a missing joint, when it is between 2 true positions, can be achieved by intersecting 2 spheres with centre the true joint positions and radii the distances between the estimated and the true joints, respectively. Another simple optimisation is the direct construction of a line pointing toward p n , when the target is unreachable, adjusting the distances in such a way that each distance has changed uniformly. The adjusted FABRIK for CoR correction is illustrated in pseudo-code in Algorithm 1.

Algorithm 1
figure 9

A full iteration of FABRIK for CoR correction

There are, indeed, some special instances where FABRIK does not necessarily work in an iterative fashion. Such a special case is when the joints, which have been estimated using predicted markers, are located at the end of the chain. This is a simplified case of the general solution, since the answer is given directly in one single iteration. The new simplified algorithm is given here: assume p i is a true joint position. The joint update \(\mathbf{p}'_{i+1}\) is set as the point on line l i that passes through p i and p i+1 and has d i distance from p i . This procedure is repeated for all the remaining estimated joints until the end of the chain. A graphical representation of an implemented example is given in Fig. 9.

Fig. 9
figure 10

A simple case where the estimated joints are located at the end of the chain. In this example, the serial chain has 4 joints where p 1 and p 2 have been calculated using true marker positions and the \(\hat{\mathbf{p}}_{3}\) and \(\hat{\mathbf{p}}_{4}\) using predicted markers positions. Thus, set \(\mathbf{p}'_{3}\) to be the point on line l 2 which has distance d 2 from p 2 and \(\mathbf{p}'_{4}\) the point on line l 3 that has d 3 distance from joint \(\mathbf{p}'_{3}\)

6.2.2 Problems with tree models

In reality, most of the real legged models are comprised of several kinematic chains in a tree fashion. The proposed algorithm can be easily extended to process tree models with multiple serial chains. The solution is similar to the original extension of FABRIK for multiple end effectors, given in [3]. Hence, if the sub-baseFootnote 3 joint is a true position, FABRIK is used on each chain with estimated CoR positions individually, starting from the sub-base, in a forward and backward iterative fashion. If the sub-base is an estimate, the same procedure is applied but this time starting from the true joints of the connected chains and moving towards to the sub-base. This will produce as many different positions of the sub-base as the number of the connected chains. The new position of the sub-base will then be the centroid of all these positions. In the second stage, the normal algorithm is applied separately for each chain, starting now from the sub-base and moving outward to the starting joints. The method is repeated until all true joints have no significant change between their initial and updated positions.

6.2.3 Applying constraints

The main aim of this paper is to describe a general solution, where each single joint is treated as a ball and socket joint, allowing 3 degrees of freedom. In this way, we can ensure the generality of the proposed method, producing solutions without prior knowledge of the model. The proposed IK technique is used just to correct mis-positioning of joint estimates in the presence of missing data. The orientation of each joint is still calculated using the attached sensor LEDs (or their forecasted positions). In special cases where it is desirable to incorporate joint restrictions, especially for cases where joints are located at the end of the kinematic chains, FABRIK readjusts the target position and orientation, on each step, to satisfy the joint biomechanical limits. Note that FABRIK is able to support most of the joint types, as detailed in [3].

Obviously, the more information available regarding the model’s structure and joint constraints, the more accurate and efficient will be the results. This information can help to give a visually more realistic motion within a feasible set; however, it will limit the universal use of the proposed methodology and fails to satisfy our aim of no prior knowledge of the model.

Self-collision determination

Collision detection has been a fundamental problem in computer animation, physically-based modelling, geometric modelling, and robotics. Since the data used in these examples are captured from a markered optical motion capture system and since the 3d animated humanoids do not have a mesh that defines their external shape, self-collisions are not considered. Nevertheless, self-collisions can be handled using existing techniques, such as [10, 44].

Bone length constraints

The Inverse Kinematic technique used in this work ensures that, even if the marker prediction system fails to track the marker positions, the derived CoRs will be good estimates of their true positions. This happens since the IK procedure restricts the limb segment (bone) length to a constant value over time. The true length of each limb segment is calculated by averaging the inter-joint distances, from previous frames, when all joints have been estimated using true marker positions. Figure 10 shows an example of how the inter-joint distance varies over time; as expected, it is nearly constant. The predicted marker positions can cause instability in the inter-joint distance, which in some extreme cases might differ significantly from the mean bone length. The effectiveness of the proposed method is strongly related to the stability of the joint pairwise distance. Large variation of that distance means skeletal structure violation. In this manner, FABRIK guarantees that joints will still have a fixed distance between them throughout the occlusion period, allowing a better approximation of the CoR estimates. On real motion capture data, limb lengths might exhibit minor variations due to noise from skin movements at the point of marker attachment, and limb expansions, but in this work they are assumed to be rigid. Note that the limb length variation is further reduced when the data have been isolated using the proposed UKF-VTM method, as is briefly described in Sect. 7.

Fig. 10
figure 11

An example showing the inter-joint distance variation over time when true marker positions have been used, when the joints have been estimated using predicted marker positions, and after incorporating FABRIK for CoR correction (assuming that the distance is constant over time)

It is important to stress out that the proposed bone length constraint algorithm (FABRIK) is independent and can be easily fitted to most marker prediction methods for CoR correction.

7 Results and discussion

Experiments were carried out using a 24 camera PhaseSpace motion capture system capable of capturing data at 480 Hz [54]. The algorithm described in this work can process up to 300 limb segment pairs per second (using MATLAB). Our datasets comprise real data (i.e. captured data with natural occlusions or occlusions generated by artificial deletion) with more than 10,000 frames in each. It is technically impossible to evaluate the error when a benchmark does not exist to compare with. Thus, different numbers of markers per limb segment have been deleted randomly, in different parts of the human body, to enable error calculation. Nevertheless, the estimates of the missing markers and CoRs in instances where natural deletions exist have been evaluated visually based on the assumption that the character displays smooth and natural motion. The proposed methodology has been tested on various motions including dancing (14 segment body datasets), fast walking (7 segment leg datasets) and boxing (5 segment arm datasets). Using real data with occlusions generated by artificial deletion, we are able to calculate the error of the proposed methodologies; the error is measured as the average distance between the true and the estimated position (for marker and the CoRs) after many runs and different artificial deletions. As expected, the error varies between different instances of marker occlusion. As more markers become available, more information relative to the limb segment is available and thus, a higher accuracy is achieved. The magnitude of the error reported in the results is given in terms of real world distances. Within this work, our methodology (referred as UKF-VTM) was tested and compared against some of the most popular marker prediction approaches: an EKF model, such as [1], using similar marker constraints to the ones proposed here, a cubic spline interpolation and a real-time extrapolation method with marker position constraints, as reported in [55].

Using the VTM we took into consideration the velocity, direction, and acceleration changes of the markers’ trajectories over time. In addition, we considered the rotation of the limbs over time for a more accurate estimation of the marker positions. Certain drawbacks of the simple EKF were overcome by using a more evolved and sophisticated method; the use of a Variable Turn Model gives significant improvements in cases where the trajectories of the markers are variable and have abrupt fluctuations in speed and direction. The proposed system (UKF-VTM), without applying FABRIK for bone constraints, returns an average error (over 20 runs) of 1.296 cm in the case of one missing marker, 3.474 cm when 2 markers are missing, and 8.401 cm when all markers are occluded. The corresponding CoR estimation error is 1.082 cm in the case of one missing marker, 1.9867 cm in the case of two markers and 7.859 cm in the case where no markers are visible to the cameras. Table 1 lists and compares the results of each method implemented here under several occlusion scenarios. In general, the methods fail to track the missing marker paths when the marker constraints were not incorporated, especially in instances were the occlusion lasts longer than a time threshold. Table 1 also tabulates the worst case scenario (distance between true and predicted positions) of marker prediction and CoR estimation in addition to the standard deviation. The worst case scenarios usually appear at abrupt changes in velocity and direction of the missing marker during the occlusion period, where the UKF-VTM model requires some time to efficiently track the target.

Table 1 Average results (over 20 runs) under multiple cases of occlusion (large occlusions of 2,500 frames in total)

Obviously, the interpolation method has the lowest computation cost, but it is an off-line application. The UKF-VTM framework increased the processing time by 20% compared to the simple EKF, processing 315 limb segment pairs per second in MATLAB; which does, however, still allow real-time implementation. The UKF-VTM method performs better than the other methods looked at here, resulting in the most accurate results; our methodology efficiently recovers good estimates of the missing markers and accurate real-time CoR estimation.

A further error reduction was observed when the missing markers were partially visible to one camera. The error is significantly decreased, by 80% for marker prediction and 75% for CoR estimation, compared to the case where this information was ignored. Table 2 gives the prediction error for the case of one missing marker on each limb segment when the missing markers are both entirely occluded and visible in just one camera.

Table 2 Average results (over 20 runs) on real data with occlusions generated by deletions. Case of one missing marker on each limb segment for more than 1500 frames.

The difference between true and estimated positions is further reduced when the fixed inter-joint pairwise assumption was taken into consideration. The CoR error, in our methodology, is decreased on average (over 30 runs) by 11.9% in total, 2.96% when the estimated joints are located at the end of the chain and 12.7% when they are positioned between 2 true positions. FABRIK ensures that the inter-joint pairwise distances are constant over time, thus eliminating the error in the CoR estimation due to unnatural bone extensions. The error reduction is larger in cases where the estimated joints are between 2 true CoR positions, since both true joints, in an iterative fashion, constrain the inter-joint distance from the estimated joints. In the case where the estimated joints are positioned at the end of the chain, only one true joint contributes in the final solution. Table 3 presents the average error and achieved error reduction after applying IK to the case of a humanoid model with artificial deletions on 6 markers (over 30 markers in total) for more than 2,500 frames out of a total of 8,000. At the same time, the processing time was increased by only 5.27%, processing now approximately 300 limb segment pairs per second. FABRIK has been applied to all methodologies ensuring that the inter-joint distances will remain unchanged over time; the CoR error was reduced on average by 12.92% for the EKF, 45.33% for the extrapolation method, and 62.8% for the interpolation approach.

Table 3 Error reduction using Inverse Kinematics

Figure 11(a) shows examples of the true and predicted x-positions of an occluded marker over time, and its corresponding CoR (after incorporating FABRIK for CoR correction to all methods) for the case of a single occlusion. It is clear that the occluded marker can be tracked with high accuracy when it is visible in at least one camera and its CoR position can be reconstructed efficiently even if the occlusion period exceeds 1,000 frames. Note that the joint estimates differ from their true positions, even after the occlusion period, since the rotor R k expressing the rotation between the first and the current frame is an estimation using the predicted marker positions (see (4) and (5)). Nevertheless, the error of the CoR is reduced over time (when all markers are visible) because the new markers positions give a more accurate estimate of the rotor. Figure 11(b) also shows an example of the error variation for all methodologies due to occlusion for the case of 1 missing marker on each limb segment. Clearly, the hole-fitting achieved using the UKF-VTM method has the lowest error for both the average and the worst case scenario; this error is further decreased when the marker is visible to just one camera.

Fig. 11
figure 12

Example showing comparison results for each methodology for the case of 1 missing marker on each limb segment; occlusions were introduced to this dataset for extended time periods of up to 1,500 consecutive frames. The graph shows the forecasted trajectories when the occlusion was between frames 3,000 and 3,600. The upper row shows the results for marker predictions and the lower row for CoR estimates, after applying FABRIK. (a) The zoomed x-coordinate path over time (in this example note that the black and green curves for the markers are exactly overlayed), (b) the corresponding error variation over time

Figure 12 reinforces the above showing the cumulative distribution function (CDF) of the estimation error for the case of one missing marker on each limb segment. The x-axis shows the estimation error and the y-axis shows the probability, for y=a, of having an error less than or equal to a. Hence, for example, the probability that the estimation error is less than or equal to 1.2 cm is approximately 0.57 for the extrapolation and less than 0.4 for interpolation, while it is 0.62 for the case of the UKF-VTM. The median estimation error of the markers using the EKF model is approximately 1.12 cm (0.83 cm for the CoR), where the corresponding median error for the case of UKF-VTM is approximately 1.09 cm (0.75 cm for the CoR and 0.12 cm when markers are visible to just one camera).

Fig. 12
figure 13

Example showing the CDF of the estimation error (over 10 runs) for each methodology for the case of 1 missing marker on each limb segment: (a) the markers CDF and (b) the CoR CDF

The interpolation method produces smooth results since it uses previous and future positions. Our method, along with other approaches that use data only from previous frames, exhibits small oscillations or discontinuities between the last predicted position and the first frame when the marker again becomes visible. This is expected as no future information is used. We can avoid this phenomenon if, instead of using the true marker position as the final result, we continue to use the UKF framework. This way we will have smoother results with just a marginal increase in the marker and COR errors.

Extrapolation and interpolation return useful predictions for short-time occlusions but fail to track the marker positions when the occlusion is maintained for extended time periods, especially if markers change rapidly in direction and velocity. In particular, the path followed by the interpolation method does not reflect the actual state of the missing marker, but only connects the available positions using a cubic spline shape. In contrast, the UKF-VTM performs fairly well even if markers are missing for long time periods.

Figure 13 shows an example of using our algorithm on real data; in blue are the true positions of the markers and CoRs and in red the predicted positions. Figure 14 is another example that compares the results on the leg model; (a) presents the results when only the integrated UKF-VTM model is used, and (b) shows the results when bone length control was incorporated using FABRIK. Note the bone length violation in Fig.14(a), which is most obvious on the hips.

Fig. 13
figure 14

An example of marker prediction and CoR estimation using the proposed methodology. The true positions are coloured blue and the predicted positions red. (a) The artificially deleted data, (b) the predicted data

Fig. 14
figure 15

An example of implementation. Blue represents the true positions and red the predicted positions: (a) using only the integrated UKF-VTM framework, (b) using FABRIK for bone length control

Figure 15 shows another example of implementation under an extreme case with extended marker occlusions. The top picture of the figure shows the results when the fixed inter-joint distance was not imposed and the bottom picture when FABRIK was applied. Obviously, in the first case the skeletal structure was violated since the bones were not restricted to their original lengths; in the second case the results have been improved to a visually more natural shape.

Fig. 15
figure 16

An example of implementation under extreme cases with extended data occlusion. The top picture shows results using the integrated UKF-VTM; the lower picture shows the results when FABRIK was applied in order to maintain the fixed inter-joint distance assumption. The true positions are coloured in blue and the predicted in red

Experiments demonstrate that the proposed methodology can effectively track the occluded markers with high accuracy, even if they are occluded for extended periods of time, recovering in real-time good estimates of the true joint positions. FABRIK controls and corrects the CoR estimates decreasing the error between the estimated and true positions, thereby enabling real-time skeletal reconstruction. The resulting motion is natural and smooth, without oscillations and discontinuities, resembling that of true human movements.

The proposed approach, as well as predicting the coordinates of the invisible LEDs to prevent intermittent tracking, can also be used to isolate noisy data or bad inputs from individual motion capture cameras. As discussed in Sect. 4, motion capture data are noisy due to optical measurement problems, mis-calibration of the optical systems, skin movements, etc. After isolating the data using our UKF-VTM approach, smoother trajectories for both the marker and CoR coordinates has been perceived. It is also observed that the bone lengths are more constant, reducing the standard deviation from 4.2063 cm to 3.9185 cm, when the isolated data have been used.

8 Conclusion

This paper considers the problem of forecasting the occluded marker positions of optical motion capture systems to prevent intermittent tracking and automatically establish a skeleton model to which the markers are attached. An Unscented Kalman Filter framework was deployed for marker prediction using a Variable Turn Model as the observation sequence, in addition to a constant rotation assumption over time. Inferred information from neighbouring markers has been used to give observation states via an approximate rigid body assumption. The proposed marker constraint model is simple and real-time implementable. At the same time, the proposed system is the first method which takes advantage of the information returned by each single camera. The predicted data is used for real-time joint localisation and the joint positions are then re-positioned using an Inverse Kinematics technique by taking into account the fact that inter-joint distances are constant over time. The work is novel in that it incorporates an adjusted version of FABRIK, a real-time iterative IK solver, which ensures fixed bone lengths over time, thus resulting in a more feasible motion. Our methodology outperforms in accuracy the methods used for comparison in this work. It is able to maintain real-time marker predictions, thus enabling good estimates of the CoR, even in the presence of several marker occlusions on each limb segment. It is reliable in datasets with large sequences of null data, even if the limb rapidly changes velocity and direction. The proposed technique could also be used to isolate bad inputs from single cameras; it is a common phenomenon that even one error in a large number of marker inputs can result in several errors per second, pulling the marker position out of the expected path. Thus, this method can be utilised to eliminate pops and jumps from camera switching and errors as they fall outside the predicted positions.

Future work will introduce biomechanical constraints to restrict motions to those from a feasible set; however, prior knowledge of the model and joints will then be needed. Also, a hybrid system with low cost inertial measurement units could be used to validate the method proposed here.