
1 Introduction

3D body pose estimation, whose goal is to recover the poses of body parts and joints under naturally articulated movements, plays a key role and is a well-investigated problem in a variety of areas such as computer vision, human-computer interaction, and computer animation. In particular, in the works of Shotton et al. [1, 2], the random forest algorithm proposed by [3] is employed to predict body poses from a single depth image. The random forest is an ensemble learning method that has proven to be a fast and effective multi-class classifier for various tasks such as image classification, object tracking, facial expression recognition, and pose estimation [4,5,6]. The solution proposed by Shotton et al. [1] is embedded in the commercial product ‘Microsoft Kinect sensorTM’, a readily available off-the-shelf gaming system. Moreover, the depth comparison features proposed by [1] are widely used in many works [2, 7] as learning features for the random forest classifier. Within the framework of body pose estimation based on classified body parts [1], the accuracy and reliability of the body parts classification are important because they influence the subsequent learning process used to infer the positions of the 3D body joints. Furthermore, although the depth comparison features proposed by [1] are easy to compute and efficient in characterizing changes in body parts, they encode only local information about the body parts, not global information such as the deformation of the whole body or the skeletal structure of the body joints. Hence, the depth comparison features alone are insufficient to empower the discriminative ability of the classifier.

Fig. 1.

Systematic overview of our system and our ground-truth samples (normalized to the depth [0,1]): from the top row, forward walking (T2), hand waving1 (T3.a), hand waving2 (T3.b), sitting (T4), and upstanding (T5) motions.

Our approach for 3D body pose estimation from a single depth map is related to previous works [8, 9], which exploit a geodesic distance graph of the body depth image to localize the skeletal joints of the body.

In [10], a variety of objective functions based on geodesic distance transform features are investigated for identifying objects of interest in semantic image segmentation with random forests. Moreover, in the context of a decision forest, a joint objective function for pixel classification and shape regression is introduced in [11], which yields more spatially consistent predictions than the typical objective function that considers only the data labels.

Motivated by existing works [1, 8, 11], we propose a new feature descriptor based on a geodesic path over the body surface, referred to as GPS (Geodesic Path Sequence), which is derived by concatenating a sequence of characters corresponding to the vectors along deformation paths. For the body parts classification, we also incorporate the length of each GPS descriptor into a joint entropy-based objective function involving both the body part labels themselves and their geodesic structural information, leading to more accurate predictions. The geodesic descriptors reflect the geometry of the body surface well, which is expected to improve our body parts classification performance. In addition, we exploit a skeleton matching method based on the geodesic extrema of the body, thereby reducing misidentification problems for the joints and their bones in the skeletal configuration. Following the procedure in [1], we develop a ground-truth generator and cheaply create varied, realistic data by synthesizing an avatar 3D body model with poses of interest sampled from a large motion capture data set, which consists of five different motions: standing (T1), walking (T2), hand waving1 (T3.a), hand waving2 (T3.b), sitting (T4), and upstanding (T5) (see Fig. 1; samples similar to the standing (T1) set are included in the other sets). In this paper, our final goal is to predict an accurate skeletal configuration of the body pose rather than the standard anatomical positions of the body joints.

2 Geodesic Path Sequence Descriptor

We show how our GPS provides significant patterns with discriminative information across anatomically different body parts, through an empirical comparison of affinity matrices derived from two different types of features (i.e., our GPS and depth values). Then, we describe how to incorporate our GPS descriptors into the joint entropy-based objective when learning the random forest classifier.

In order to take the human body manifold structure into account, we exploit the geodesic distances and their paths among all points over the body surface and their barycenter point as feature descriptors for the random forest classifier. First, we construct an undirected weighted graph \(\mathcal {G} =({\mathcal {V}, \mathcal {E}})\) from the body point set \(\{\varvec{p}_{\varvec{x}}\} \subseteq \mathcal {V}\), where \(\mathcal {V}\) and \(\mathcal {E}\) denote a set of vertices and a set of edges with pairwise distances assigned as edge weights, and each \(\varvec{p}_{\varvec{x}_i}\) is a 3D position vector consisting of a 2D coordinate \(\varvec{x}_i\) and its depth \(d_D(\varvec{x}_i)\) in the body depth image. The set of edges is defined as:

$$\begin{aligned} \mathcal {E} = \{d_E(\varvec{p}_{\varvec{x}_i}, \varvec{p}_{\varvec{x}_j})\in \mathcal {V} \times \mathcal {V}~|&(||\varvec{p}_{\varvec{x}_i} - \varvec{p}_{\varvec{x}_j} ||_2 < \delta ) \nonumber \\ \wedge ~&(||\varvec{x}_i - \varvec{x}_j||_{\infty } \le 1) \}, \end{aligned}$$
(1)

Each edge \(d_E(\varvec{p}_{\varvec{x}_i}, \varvec{p}_{\varvec{x}_j})\in \mathcal {E}\) is assigned a weight \(w(d_E)\) whenever the 3D Euclidean distance between its endpoints is less than \(\delta \). The Dijkstra geodesic distance \(d_G\) is then computed along the shortest path \(\mathcal {P}\) between \(\varvec{p}_{\varvec{x}_p}\) and \(\varvec{p}_{\varvec{x}_q}\), which is defined as:

$$\begin{aligned} d_G(\varvec{p}_{\varvec{x}_p}, \varvec{p}_{\varvec{x}_q}) = \sum _{d_E\in \mathcal {P}(\varvec{p}_{\varvec{x}_p},\varvec{p}_{\varvec{x}_q})} {w(d_E)} \end{aligned}$$
(2)
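As a minimal sketch of how Eqs. (1) and (2) might be realized (the function names, the toy point chain, and the use of SciPy's Dijkstra routine are our own illustration, not the original implementation):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_map(points, neighbours, delta, source=0):
    """Body-centric geodesic map: Dijkstra distances d_G (Eq. (2)) from the
    barycenter (index `source`) over the surface graph of Eq. (1).

    points:     (N, 3) array of 3D body points p_x.
    neighbours: (i, j) index pairs that are 8-neighbours in the depth image.
    delta:      3D Euclidean distance threshold for a valid edge.
    """
    rows, cols, w = [], [], []
    for i, j in neighbours:
        d = float(np.linalg.norm(points[i] - points[j]))
        if d < delta:                       # edge weight w(d_E) = Euclidean length
            rows += [i, j]; cols += [j, i]; w += [d, d]
    graph = csr_matrix((w, (rows, cols)), shape=(len(points), len(points)))
    # dist[i] = d_G(p_i, p_source); pred recovers the shortest path P itself
    return dijkstra(graph, indices=source, return_predecessors=True)

# toy chain of four points spaced 0.05 apart along x
pts = np.array([[0.00, 0, 0], [0.05, 0, 0], [0.10, 0, 0], [0.15, 0, 0]])
dist, pred = geodesic_map(pts, [(0, 1), (1, 2), (2, 3)], delta=0.06)
# distances from the source are approximately 0, 0.05, 0.10, 0.15
```

The predecessor array returned by `dijkstra` also recovers the shortest path \(\mathcal {P}\) itself, from which the character sequence of Eq. (3) can be read off.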

The graph-based geodesic descriptors are invariant to large motion deformations and geometric transforms as long as the local connection relationships remain, and thus they reflect the local body structure well [8, 12]. We then generate a body-centric geodesic map by measuring the Dijkstra geodesic distances for all N points on the body, \(\{ d_G(\varvec{p}_{\varvec{x}_i}, \varvec{p}_{\varvec{x}_0}) \}_{i=1}^N\). Each Dijkstra geodesic distance, \(d_G(\varvec{p}_{\varvec{x}_i}, \varvec{p}_{\varvec{x}_0})\), is the sum of edge weights along a shortest path between a point, \({\varvec{x}_i}\), over the body surface and the barycenter of the body, \({\varvec{x}_0}\), under the assumption that points on anatomically similar body parts maintain a nearly constant geodesic distance. From the body-centric geodesic map, we finally define a descriptor based on the geodesic path, represented by accumulating a sequence of characters corresponding to the vectors along the body’s deformation path. The GPS for a point \(\varvec{x}_i\) is defined as:

$$\begin{aligned} \varvec{d}_{g}({\varvec{x}_i}) = [c_1, c_2, \cdots , c_i], \end{aligned}$$
(3)

where \(c_i\) is a character indicating the direction between \(\varvec{p}_{\varvec{x}_{i-1}}\) and \(\varvec{p}_{\varvec{x}_{i}}\).
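A small sketch of how the character string of Eq. (3) might be produced; the 8-letter alphabet and the 45-degree quantization are our own assumptions, since the paper does not specify the encoding:

```python
import numpy as np

# Hypothetical 8-letter alphabet: one character per 45-degree direction sector.
ALPHABET = "abcdefgh"

def direction_char(v):
    """Map a 2D step vector to one of 8 direction characters c_k."""
    ang = np.arctan2(v[1], v[0]) % (2 * np.pi)
    return ALPHABET[int(round(ang / (np.pi / 4))) % 8]

def gps_descriptor(path_xy):
    """Eq. (3): concatenate direction characters along a geodesic path.

    path_xy: (K, 2) array of 2D pixel coordinates from the barycenter x_0
             out to the point x_i, e.g. recovered from Dijkstra predecessors.
    """
    return "".join(direction_char(path_xy[k] - path_xy[k - 1])
                   for k in range(1, len(path_xy)))

path = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])  # right, right, up
print(gps_descriptor(path))  # "aac" with this alphabet (0° -> 'a', 90° -> 'c')
```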

Dynamic time warping (DTW) is a powerful algorithm for measuring the similarity between two time series by finding an optimal alignment. Here, we employ the FastDTW algorithm [13] in order to compute the similarity between two GPS descriptors in the binary test function of the random forest in linear time. Fig. 2 shows the affinities between inter- and intra-body-part data, aligned by part. All distance values for the affinities are normalized between 0 and 1. The more an affinity matrix exhibits a well-formed block-diagonal structure, the better the partitioning of different parts. As shown in Fig. 2, the simple depth comparison features empirically do not provide enough discriminative power in learning the classifier: when two points have similar depth values but are located on different parts of the body, these features are likely to lead to erroneous predictions in the classification problem. Meanwhile, our proposed GPS is robust to large motion deformations, and it discriminates effectively between different body parts.
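For illustration, a plain quadratic-time DTW over two GPS character strings (the paper uses the linear-time FastDTW approximation [13]; the unit mismatch cost is our assumption):

```python
import numpy as np

def dtw_distance(s, t):
    """Warp-path distance d_W between two GPS character sequences.

    A plain O(|s||t|) DTW with unit mismatch cost; FastDTW [13] computes a
    linear-time approximation of the same quantity.
    """
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else 1.0
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance("aacc", "aacc"))   # 0.0: identical sequences
print(dtw_distance("aaacc", "aacc"))  # 0.0: DTW absorbs the repeated 'a'
```

The second example shows why DTW suits GPS comparison: paths of different lengths over the same body part differ mainly by repeated characters, which the warping absorbs.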

Fig. 2.

From the left, each pair of affinity matrices shows the depth-based and GPS-based similarities between two different body parts for the data in the overview of Fig. 1.

3 Joint Entropy-Based Body Parts Classification

To formulate the body parts classification from a single depth image, we assume that a set of N training samples \(\mathcal {Q} = \{ ({\varvec{f}_{\theta }}_i, \varvec{l}_i ) \}_{i=1}^{N}\) is given. The input variable \({\varvec{f}_{\theta }}_i\) corresponds to a feature for an individual pixel \(\varvec{x}_i\). The output variable is a discrete label \(\varvec{l}_i \in \mathcal {C}\), where \(\mathcal {C}\) is a finite set of body part labels.

For a given pixel \(\varvec{x}\) in a depth image D, we propose a GPS comparison feature similar to the existing depth comparison feature [1], which is defined as:

$$\begin{aligned} \varvec{f}_{\theta }(D, \varvec{x}) = d_{W}\left( \varvec{d}_g\left( \varvec{x} + \frac{\varvec{i}}{ | \varvec{d}_g\left( \varvec{x}\right) |}\right) , \varvec{d}_g\left( \varvec{x} + \frac{\varvec{j}}{| \varvec{d}_g\left( \varvec{x}\right) |}\right) \right) , \end{aligned}$$
(4)

where \(d_W(\varvec{d}_g({\varvec{x}_{\theta }}_i),\varvec{d}_g({\varvec{x}_{\theta }}_j))\) is a warp path distance between the descriptors \(\varvec{d}_g({\varvec{x}_{\theta }}_i)\) and \(\varvec{d}_g({\varvec{x}_{\theta }}_j)\), and \(\theta = (\varvec{i}, \varvec{j})\) is a pair of offsets to the pixel \(\varvec{x}\); scale invariance with respect to depth is achieved by normalizing the offsets by the length of \(\varvec{d}_g(\varvec{x})\). Each tree node is trained over a set of splitting candidates \(\phi = \{(\theta , \tau )\}\), where \(\theta \) is a feature parameter and \(\tau \) a partition threshold. The split candidates \(\phi \) are randomly sampled from a uniform distribution. For each \(\phi \) (\(m=|\phi |\)), the subsets \(\mathcal {Q}_{L}\) and \(\mathcal {Q}_{R}\) partitioned from the original data set \(\mathcal {Q}\) are evaluated with our energy function at the current node. The partitioning is performed as follows:

$$\begin{aligned} \mathcal {Q}_{L}(\phi )= & {} \{(D, \varvec{x})~ |~ \varvec{f}_{\theta }(D, \varvec{x}) < \tau \} \nonumber \\ \mathcal {Q}_{R}(\phi )= & {} \mathcal {Q}~ \backslash ~ \mathcal {Q}_{L}(\phi ) \end{aligned}$$
(5)
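The partitioning of Eq. (5) amounts to a threshold test on precomputed feature responses; a sketch (the array-based representation is our own choice):

```python
import numpy as np

def split_node(features, tau):
    """Eq. (5): partition samples at a tree node by thresholding f_theta.

    features: (N,) array of GPS comparison feature responses f_theta(D, x),
              i.e. the warp-path distances of Eq. (4).
    tau:      partition threshold from a split candidate phi = (theta, tau).
    Returns index arrays (Q_L, Q_R).
    """
    left = np.flatnonzero(features < tau)
    right = np.flatnonzero(features >= tau)   # Q_R = Q \ Q_L
    return left, right

f = np.array([0.2, 0.9, 0.4, 1.3])
L, R = split_node(f, tau=0.5)
print(L, R)  # [0 2] [1 3]
```

During training, this split would be evaluated for each of the m random candidates and the one minimizing Eq. (7) kept.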

In the forest training procedure, the goal is to find the optimal splitting parameters of each node and build a binary partitioning tree that minimizes the objective function \(\mathcal {J}\), defined as follows:

$$\begin{aligned} \phi ^{*}={{\arg \min }}_{\phi }{\mathcal {J}(\mathcal {Q}, \phi )}. \end{aligned}$$
(6)

The optimal criterion \(\phi ^{*} = \{ {\theta }^{*}, \tau ^{*}\}\) defines the split parameters of the node and is later used for the prediction of new input data. The entropy is the expected value of the information contained in each message, and Shannon’s entropy is generally used for training forests. Our goal is now to learn the joint probability \(p_{t}(\varvec{l}, \varvec{g} | \varvec{f}_{\theta })\), where the new variable \(\varvec{g} \in \mathbb {R}^3\) is a continuous regression variable describing the relative 2D offset between the depth pixel \(\varvec{x}\) and the barycenter of the body \(\varvec{x}_0\), together with the geodesic distance \(d_G(\varvec{p}_{\varvec{x}}, \varvec{p}_{\varvec{x}_0})\). By using the chain rule, we rewrite the joint distribution as \( p_{t}(\varvec{l}, \varvec{g} | \varvec{f}_{\theta }) = p_{t}(\varvec{l} | \varvec{f}_{\theta })p_{t}(\varvec{g} | \varvec{f}_{\theta }, \varvec{l} )\), where we assume that \(p_{t}(\varvec{g} | \varvec{f}_{\theta }, \varvec{l} )\) is a multivariate normal distribution. That is, \(p_{t}(\varvec{g} |\varvec{f}_{\theta }, \varvec{l} ) \sim \mathcal {N}(\varvec{\mu }_{\varvec{g}| \varvec{l} }, \varvec{\varSigma }_{\varvec{g}| \varvec{l}} | \varvec{g}, \varvec{f}_{\theta }, \varvec{l} )\) is one distribution per class label \(\varvec{l}\). We then define the joint objective function \(\mathcal {J}\) as follows:

$$\begin{aligned} \mathcal {J}(\mathcal {Q}, \phi )= & {} \sum _{\scriptstyle {p\in \{L,R\}}}\sum _{\scriptstyle {\varvec{x} \in \mathcal {Q}_{p}}} {\frac{|\mathcal {Q}_{p}|}{|\mathcal {Q}|}} {\psi _{E}{(\varvec{l}, \varvec{g} ; \mathcal {Q}_{p})}} , \end{aligned}$$
(7)
$$\begin{aligned} \psi _{E}{(\varvec{l}, \varvec{g}; \mathcal {Q}_{p})}= & {} -\sum _{\varvec{l}\in \mathcal {C}}\int _{\varvec{g}\in \mathbb {R}^3}{p_{t}(\varvec{l},\varvec{g} |\varvec{f}_{\theta })} \text{ log }({p_{t}(\varvec{l},\varvec{g} | \varvec{f}_{\theta })})d\varvec{g}, \nonumber \\= & {} \underbrace{-\sum _{\varvec{l}\in \mathcal {C}} {p_{t}(\varvec{l} | \varvec{f}_{\theta })\text{ log }(p_{t}(\varvec{l} | \varvec{f}_{\theta }))}}_{\psi _{E}(\varvec{l}; \mathcal {Q}_{p})} \nonumber \\&\underbrace{+\sum _{\varvec{l}\in \mathcal {C}}{p_{t}(\varvec{l} | \varvec{f}_{\theta })} \Big (\frac{1}{2}\text{ log }((2\pi e)^3 |\varvec{\varSigma }_{\varvec{g} | \varvec{l} }| ) \Big )}_{\psi _{E}(\varvec{g}; \mathcal {Q}_{p} | \varvec{l})}, \end{aligned}$$
(8)

where \(|\varvec{\varSigma } |\) denotes the determinant of a matrix.
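A sketch of the joint entropy of Eq. (8) for one child set \(\mathcal {Q}_{p}\): the discrete Shannon term plus the class-weighted Gaussian differential entropy term. The maximum-likelihood covariance estimate, the small regularizer, and the function names are our own assumptions:

```python
import numpy as np

def joint_entropy(labels, g):
    """Joint entropy psi_E(l, g; Q_p) of Eq. (8) for one child set.

    labels: (N,) integer body-part labels l.
    g:      (N, 3) continuous targets (2D offset to the barycenter plus
            geodesic distance), one row per sample.
    """
    N = len(labels)
    H = 0.0
    for c in np.unique(labels):
        sel = labels == c
        p = sel.sum() / N                          # p_t(l | f_theta)
        H += -p * np.log(p)                        # discrete term psi_E(l; Q_p)
        cov = np.cov(g[sel].T, bias=True) + 1e-9 * np.eye(3)  # Sigma_{g|l}
        # Gaussian differential entropy term psi_E(g; Q_p | l)
        H += p * 0.5 * np.log((2 * np.pi * np.e) ** 3 * np.linalg.det(cov))
    return H

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
g = rng.normal(size=(100, 3))
print(joint_entropy(labels, g))
```

Tighter per-class clusters of \(\varvec{g}\) shrink the covariance determinant and hence the objective, which is what pushes splits toward spatially consistent predictions.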

Finally, the random forest models are obtained by optimizing the joint objective function of Eq. (8), which includes the conventional objective \({\psi _{E}(\varvec{l}; \mathcal {Q}_{p})}\) for the discrete label \(\varvec{l}\) as well as the objective \({\psi _{E}(\varvec{g}; \mathcal {Q}_{p} | \varvec{l})}\) for the continuous variable \(\varvec{g}\) given \(\varvec{f}_{\theta }\) and \(\varvec{l}\). Here, the posterior we are interested in concerns the body parts classification. The overall prediction of a forest with T trees is estimated by averaging the individual tree predictions, and the output is obtained by inferring:

$$\begin{aligned} \varvec{l}^{*} = {{\arg \max }}_{\scriptstyle {\varvec{l} \in \mathcal {C}}}p({\varvec{l} | \varvec{f}_{\theta }}) = {{\arg \max }}_{\scriptstyle {\varvec{l} \in \mathcal {C}}}\frac{1}{T}\sum _{t=1}^T{p_{t}(\varvec{l} | \varvec{f}_{\theta })}. \end{aligned}$$
(9)
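Eq. (9) in code form, assuming each tree already yields a class posterior (the array layout is our own choice):

```python
import numpy as np

def forest_predict(tree_posteriors):
    """Eq. (9): average the per-tree posteriors p_t(l | f_theta) over the
    T trees and take the arg-max class.

    tree_posteriors: (T, C) array, one class distribution per tree.
    """
    p = tree_posteriors.mean(axis=0)   # forest posterior p(l | f_theta)
    return int(np.argmax(p)), p

post = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.5, 0.4, 0.1]])
label, p = forest_predict(post)
print(label)  # 0  (mean posterior is about [0.43, 0.40, 0.17])
```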

4 Body Joints and Skeleton Identification

Given a body-centric geodesic map, following [8, 9, 12], the extreme points are computed by incrementally maximizing geodesic distances on the body surface. Based on the classified body parts and the geodesic paths between the body’s barycenter and its geodesic extrema (i.e., the end-nodes of the human skeletal graph), we localize and identify the joint candidates lying on the paths. The joint candidates are selected with \(\angle (\overrightarrow{\varvec{x}_{k-1}\varvec{x}_{k}},\overrightarrow{\varvec{x}_{k}\varvec{x}_{k+1}}) > \epsilon \), where \(\angle (\overrightarrow{\varvec{x}_{k-1}\varvec{x}_{k}},\overrightarrow{\varvec{x}_{k}\varvec{x}_{k+1}})\) is the angle between the two vectors \(\overrightarrow{\varvec{x}_{k-1}\varvec{x}_{k}}\) and \(\overrightarrow{\varvec{x}_{k}\varvec{x}_{k+1}}\), and the three points \((\varvec{x}_{k-1}, \varvec{x}_{k}, \varvec{x}_{k+1})\) around the point \({\varvec{x}_k}\) lie on the same GPS \(\varvec{d}_{g}({\varvec{x}_k})\). \(\epsilon \) is a threshold depending on the body type, empirically set to about 30 degrees in our experiments. After obtaining the joint candidate set \(\{\varvec{x}_i\}\), the representative label \(\varvec{l}^{'}\) is evaluated via Eq. (10) from the local window at each joint candidate, where the local patches are based on the already classified body parts. Fig. 3 illustrates the intermediate results generated at each step.
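A sketch of the angle test used to select joint candidates along one GPS (the function name and toy path are our own illustration):

```python
import numpy as np

def joint_candidates(path, eps_deg=30.0):
    """Keep x_k where the bend angle between (x_{k-1} -> x_k) and
    (x_k -> x_{k+1}) exceeds the threshold eps.

    path: (K, 2) ordered pixel coordinates along one geodesic path.
    """
    keep = []
    for k in range(1, len(path) - 1):
        u = path[k] - path[k - 1]
        v = path[k + 1] - path[k]
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        ang = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if ang > eps_deg:
            keep.append(k)
    return keep

# a straight segment followed by a 90-degree turn at index 2
path = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])
print(joint_candidates(path))  # [2]
```

Points where the geodesic path bends sharply are exactly where articulated joints (elbows, knees) tend to lie, which motivates this test.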

Fig. 3.

(a) Geodesic extrema (blue points in red regions); (b) joint candidates (red points) on five GPSs (black lines); (c) color-labeled patches on joint candidates; (d) labels set \(\{\varvec{l}^{'}\}\) classified into five sub-skeletons (i.e., each skeleton for four limbs and one trunk). (Color figure online)

$$\begin{aligned} \varvec{l}^{'}_{\varvec{x}_i} ={\mathop {\arg \max }\limits _{\varvec{l}_{\varvec{u}}^{*}}}\sum _{\scriptstyle {\varvec{u}\in \mathcal {W}_{\varvec{x}_i}}}\sum _{\scriptstyle {\varvec{l} \in \mathcal {C}}} {\delta (\varvec{l}_{\varvec{u}}^{*}, \varvec{l})}, \end{aligned}$$
(10)

where \(\varvec{l}_{\varvec{u}}^{*}\) is the label for the position \(\varvec{u} \in \mathbb {R}^2\) within the local window \(\mathcal {W}_{\varvec{x}_i}\) centered at \(\varvec{x}_i\), and \(\delta \) is the Kronecker delta function. Our main idea is to match two graphs by comparing the labeled sets of ordered points on the paths between the body center and the geodesic extrema of the skeletal configuration, under the assumption that the set of joint candidates contains the meaningful joints of the human skeletal structure. Here, the body center and the geodesic extrema labels are defined as 7 (center), 0 (head), 10 (right hand), 13 (left hand), 16 (right foot), and 19 (left foot), respectively. All joint candidates are identified and clustered as in Fig. 3(d) by matching with a given template graph as in Fig. 4(a). In order to match the sub-skeletons (i.e., each skeleton for the limbs and trunk) with the template graph, we consider a weighted bipartite graph as illustrated in Fig. 4(b), with two vertex sets, a set of sub-skeleton labels and a set of joint labels, where the weight of each edge is defined as a DTW distance between two consecutive joint labels. Given the bipartite graph, the matching is performed using the Hungarian method [14]. Finally, the skeletal graph with 15 labeled nodes is extracted, corresponding to the whole body skeleton (see Fig. 5(b)).
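The bipartite matching step can be sketched with SciPy's Hungarian solver, given a matrix of DTW costs between sub-skeleton label sequences (the toy cost matrix is illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_subskeletons(cost):
    """Match template sub-skeletons (s#) to candidate geodesic paths (c#)
    on the weighted bipartite graph via the Hungarian method [14].

    cost: square matrix of DTW distances between label sequences.
    """
    rows, cols = linear_sum_assignment(cost)   # minimum-cost perfect matching
    return dict(zip(rows.tolist(), cols.tolist()))

# hypothetical DTW cost matrix: the diagonal assignment is cheapest
cost = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.9],
                 [0.8, 0.9, 0.1]])
print(match_subskeletons(cost))  # {0: 0, 1: 1, 2: 2}
```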

Fig. 4.

(a) Template skeleton model consisting of four limb sub-skeletons and one trunk sub-skeleton; (b) bipartite graph with two vertex sets (s#: set of joint labels for each sub-skeleton in the template; c#: set of candidate joint labels for each geodesic path).

5 Numerical Experiments

We show the usefulness of our method through an empirical comparison between different objective functions based on different types of features (i.e., our GPS and the depth comparison features [1]). We applied our method to samples from our ground-truth data sets, consisting of five types of motions: forward/backward walking, hand waving1, hand waving2, sitting, and standing; each motion group has approximately 100 frames. As in a conventional leave-one-out training scheme, the sequences for each model are evaluated with the model trained on the other models. For quantitative evaluation of the estimated joint positions and skeleton accuracy, we present three different measurements: (a) we estimate the mean absolute error (MAE), Eq. (11), for the training error evaluation of the classified body parts; (b) the mean average precision (mAP) is evaluated by averaging the precision of the 15 estimated joints on each frame, which determines whether the position of an estimated joint is within a given threshold of the ground-truth (here, the threshold is fixed to \(max(\{|\varvec{s}_{i}|\}_{i=0}^{4})/10\)); (c) the third is a new measurement of the similarity between the estimated skeletons and the ground-truth skeleton based on their DTW score, referred to as mean average matching (mAM) and defined in Eq. (12).

$$\begin{aligned} MAE = \frac{1}{N} \sum _{\scriptstyle {i}}\sum _{\scriptstyle {\varvec{l} \in \mathcal {C}}} {| \varvec{l}_{i}^{*} - \varvec{l}_{i}^{G} |} \in [0, 1], \end{aligned}$$
(11)

where \(\varvec{l}_{i}^{*}\) and \(\varvec{l}_{i}^{G}\) represent the predicted label of data \(\varvec{x}_i\) and the corresponding ground-truth label, respectively.

$$\begin{aligned} AM(i) = \frac{1}{| \mathcal {F} |}\sum _{f \in \mathcal {F}}{\frac{d_{W} ( \varvec{d}_g(\varvec{s}_{fi}^{*}), \varvec{d}_g(\varvec{s}_{fi}^{G}))}{max(\{|\varvec{s}_{i}|\}_{i=0}^{4})}} \in [0, 1], \end{aligned}$$
(12)

where AM(i) is the average matching score between the estimated sub-skeleton \(\varvec{s}_{i}^{*}\) and the corresponding ground-truth sub-skeleton \(\varvec{s}_{i}^{G}\); one body skeleton has five sub-skeletons, \(\{\varvec{s}_{i}^{*}\}_{i=0}^{4}\), and the mean matching score is normalized by the maximum length of the five sub-skeletons. \(\mathcal {F}\) is the set of target frames.

Table 1. MAE, mAP, and mAM results for data sets used in Fig. 5(c, d) at depth level 30.
Fig. 5.

Performance comparison: (top) APs for each joint at depth level 30, based on ours (blue) and [1] (red) with Training (T4)+Testing (T2, T5) sets; (mid) the predicted body parts for the data in Fig. 1 and their skeletons with ground-truth (black lines); (bot) mAM values with different data sets, using the methods of [1] (a, c) and ours (b, d), respectively. (Color figure online)

Fig. 6.

Experimental results of estimated skeletons for the forward-walking samples in Fig. 1: from the left, body depth, color-labeled patches on the joint candidates, identified & clustered joint candidates, body skeleton overlaid on the depth, skeletons from our method, and (the last two) skeletons from [1] with ground-truth (black lines). (Color figure online)

In [1, 2], a local mode-finding approach based on mean-shift with a weighted Gaussian kernel is applied to each classified body part to infer the final positions of the 3D skeletal joints. However, as shown in Figs. 5(a) and 6, the local modes obtained from misclassified outlying parts such as the left hand (label 13) and the right foot (label 16) cause the skeleton estimation to fail. Meanwhile, as shown in Fig. 5(b) and the MAE in Table 1, our method provides well-classified body parts as well as well-matched body skeletons through our GPS-based joint entropy and skeletal matching methods. As shown in Fig. 5 and Table 1, although our method is slightly less accurate than [1] in mAP, it offers the advantage of matching the body skeleton more accurately. In our method, the position of a target joint to be predicted depends on its GPS, its geodesic distance, and the inclination angle between its neighboring joint candidate vectors on the GPS; because of this, the mAP measured against the anatomical joint positions is slightly lower. On the other hand, the mAM value confirms that our proposed method reflects not only the local features in the body depth data, but also the global structure of the skeletal configuration. Figure 6 shows a visual comparison of the predicted results.

6 Conclusions

We have presented a novel geodesic path sequence (GPS) descriptor, a joint entropy-based objective using the GPS, and a skeleton matching method for 3D body pose estimation based on body parts classification, making it possible to robustly predict the skeleton’s position under severe body deformations. We also incorporate the GPS descriptor into a joint entropy-based objective function for learning both class and structural information about the body parts. The useful aspects of our proposed method can be summarized as follows: (a) the GPS descriptor can be widely used in a variety of fields as a descriptor for deformable object representation; (b) the joint entropy objective function based on our GPS comparison features reflects the geodesic structural information over the body surface well, leading to more accurate predictions by the random forest classifier; (c) the skeleton matching & identification based on the geodesic extrema of the body enhances robustness to joint misidentification. An empirical comparison with the conventional solution, a single entropy-based objective with depth comparison features, confirmed the high performance of our method.