1 Introduction

Skeletons capture the essential topology of the underlying shapes and have been widely used in various applications, such as model segmentation, registration, animation, and retrieval. Many skeleton extraction methods have been proposed, including high-dimensional medial representations [31] and 1D curve representations [10]. Recently, owing to its simple topology and ease of manipulation, 1D curve skeleton extraction has become a popular research topic in the computer graphics community, for both closed polygonal meshes [2] and incomplete point clouds [15, 34]. However, extracting satisfactory skeletons from point clouds still leaves much room for improvement; robustness and accuracy are two key problems that need to be addressed. In this paper, we focus on 1D skeleton extraction from point clouds.

For point cloud models, especially raw data acquired via laser scanning, effectively extracting skeletons remains challenging for the following reasons. First, unlike closed polygonal meshes, there is no topology information between the sampled points; many geometry operators, such as the mesh-decomposition based method [18], require mesh connectivity and cannot be extended directly to point clouds. Second, scanned raw data usually exhibit missing regions, heavy noise, and outliers, so inferring the interior skeleton of the input model is a non-trivial problem. Finally, as skeleton extraction is an ill-posed problem, many existing algorithms require tedious model-specific parameter tuning to obtain high-quality results; a uniform parameter setting is therefore preferable for processing point cloud models.

In this paper, inspired by the distance field based skeleton extraction method [12] and the \(L_1\)-median skeleton extraction method [15], we propose a distance field guided \(L_1\)-median method to extract 1D curve skeletons from point clouds. Our method works in the following steps. We first voxelize the input point cloud to obtain a voxelized representation, which is effective even for models with large missing regions. From the voxelization, we compute a distance field for the point cloud, in which each voxel stores its nearest distance to the model boundary. Then, from the distance field, we extract an initial skeleton of the model using a multi-scale parameter controlled thinning method. Finally, we incorporate the initial skeleton into the \(L_1\)-median optimization and develop a distance field guided \(L_1\)-median to extract a complete skeleton from the point cloud. Figure 1 illustrates the overview of the proposed algorithm.

Our method has the following advantages. Compared with the distance field based method [12], our method produces cleaner results: distance field based methods [12] usually generate spotted local minima/maxima of the distance, which result in disconnected and incorrect skeletons. Compared with the \(L_1\)-median method [15], our method is more robust and efficient: with the distance field guiding the iterative skeleton generation, the sample points move toward the center of each region along the direction of increasing distance, avoiding zigzag movement. Finally, the distance field lets us specify a more suitable neighborhood size in the \(L_1\)-median skeleton optimization, which significantly alleviates trial-and-error tuning.

Our main contribution is a distance field guided \(L_1\)-median skeleton extraction method, which integrates the advantages of both the distance field and the \(L_1\)-median projection approaches. Our algorithm can be applied to raw scanned data and requires only minimal user interaction to extract topologically clean and accurate skeletons; that is, we only need to set a few appropriate parameters to obtain satisfactory skeletons.

Fig. 1

System overview. The input point cloud is first voxelized; then we estimate the distance field. With the distance field, we compute the initial skeleton. Finally, we refine the initial skeleton using the distance field guided \(L_{1}\) median

2 Related work

Many algorithms have been proposed for skeleton extraction from static models; we refer the reader to the survey by Cornea and Min [10] and the more recent work [16]. Several methods for extracting skeletons from surface sequences have also been proposed; please refer to [27]. In this section, we only review the most related methods: distance field based skeleton extraction, and 1D curve skeletons for meshes and point cloud models. For methods on the medial axis and other higher-dimensional medial representations, please refer to [31].

Skeleton extraction using distance field A variety of distance field methods have been developed for skeleton extraction [4, 5, 12, 13, 37]. The distance field can be computed using the fast marching method [29]. Most of these algorithms perform the following three steps: (1) finding the ridge points that locally center within the object, (2) pruning the insignificant extreme points, and (3) reconnecting the remaining extreme points. The main advantage of these methods is that the distance field computation is usually fast; the disadvantage is that pruning the insignificant extreme points is a non-trivial task, which may produce unclean skeletons. In this paper, inspired by the parameter controlled volume thinning method [12], we extract such a skeleton as the initial value for the subsequent skeleton extraction optimization.

Skeleton extraction for mesh models Various methods have been developed to extract skeletons from watertight surface meshes. Important works include the mesh decimation based method [21], the segmentation based method [18], field based approaches [8, 25], the geodesic distance based method [11], mean curvature flow based surface contraction [2, 9, 33], and a method coupling graph contraction and surface clustering [17]. However, these methods require mesh connectivity to obtain the skeletons. For example, Katz and Tal [18] applied graph cut on the mesh model to perform mesh decomposition, and graph cut requires mesh connectivity. Au et al. [2] extracted the skeleton from a mesh model by shrinking the mesh with constrained Laplacian smoothing, and the smoothing operator also requires mesh connectivity.

Skeleton extraction for point clouds Both [6] and [26] constructed Reeb graphs over point clouds to compute skeletons. Sharf et al. [30] applied a deformable model, which evolves multiple fronts inside the model, to capture the model's volumetric shape; they then tracked the fronts' centers and merged and filtered the insignificant branches to obtain the final curve skeleton. Tagliasacchi et al. [34] proposed a ROSA (rotational symmetry axis) method to extract skeletons from incomplete point clouds. This method assumes the input shape to be cylindrical; furthermore, ROSA requires normal information, which is difficult to estimate for sample points on raw scanned data. Cao et al. [7] extended the mean curvature flow based skeleton extraction method [2] to point cloud models. Livny et al. [24] extracted tree skeletal structures from point cloud models. Li et al. [20] employed arterial snakes to extract skeletons from incomplete point clouds, although this algorithm focuses on topology recovery. Verroust et al. [35] computed level sets of the distance map over a neighborhood graph to extract curve skeletons of tubular shapes such as blood vessels. Kustra et al. [19] computed refined skeletons from raw medial-surface point clouds. Skeletons can also be used to recover the intrinsic reflection symmetries of shapes: Zheng et al. [36] applied curve skeletons to perform intrinsic symmetrization of the input shape.

More recently, Huang et al. [15] introduced an \(L_1\)-medial projection operator to extract curve skeletons from 3D point clouds. This state-of-the-art algorithm can operate directly on raw scanned data of poor quality, without preprocessing, to produce compelling results. However, in the iterative contraction procedure, the neighborhood size setting is important for obtaining satisfactory results. Although a sophisticated adaptive neighborhood size tuning method is given in [15], for complex models, obtaining a desirable skeleton remains a non-trivial task even with model-specific parameter tuning.

3 System overview

The input to our method is an unorganized set of points \(Q=\{q_j\}_{j\in J}\subset R^3\), typically unoriented, unevenly distributed, and containing noise and outliers. The output is a 1D curve skeleton \(X=\{x_i\}_{i\in I}\subset R^3\) representing a one-dimensional local center of the shape underlying the input Q. The main steps of the algorithm are as follows.

Distance field computing The input raw scan is first uniformly voxelized using a multi-scale subdivision method. On the resulting voxelization, we apply the fast marching method to compute the distance field of the model.

Initial skeleton extraction From the computed distance field, we produce an initial skeleton for the input model using multi-scale parameter controlled thinning; this initial skeleton serves as the starting point for the subsequent skeleton optimization.

Skeleton refinement By incorporating the initial skeleton into the \(L_1\)-median optimization, we develop a distance field guided \(L_1\)-median to extract the complete skeleton from the point cloud.

In our system, the voxelized representation with its distance field provides the interior, exterior, and boundary information of the model, which guides the sample points toward the model center. With the guidance of the initial skeleton and the distance field, we improve the \(L_1\)-medial method [15], and the projected samples converge to the model center in an optimized way. Figure 1 gives the system overview of the proposed method.

Fig. 2

Point cloud voxelization. a Input point cloud. b–d Voxelization results with different scales

4 Distance field computing

Voxelizing a point cloud model is more difficult than voxelizing a closed mesh model. To achieve a high-resolution voxelization of the point cloud, we first specify the boundary voxels of the model, then identify the outside and inner voxels, and finally refine these voxels. Like most other methods, ours cannot cope with point clouds representing a non-orientable surface, such as the Klein bottle.

As a point cloud is usually not closed, if the voxels are too small, they cannot envelop the boundary of the point cloud. We therefore first build a bounding box for the input model and voxelize it with an initial voxel size \(s_0=d_{bb}/9\), where \(d_{bb}\) is the diagonal length of the bounding box. If a voxel contains one or more points of the model, it is a boundary voxel. We specify an exterior voxel as a seed and use a flooding method to identify the outside voxels. Given the boundary voxels and outside voxels, the remaining voxels in the volume are inner voxels.

We then divide the volume with a multi-scale refinement strategy. At each refinement scale, we uniformly divide each voxel into \(3 \times 3 \times 3\) smaller sub-voxels. Sub-voxels of interior and exterior voxels keep the status of their parent. For each sub-voxel of a boundary voxel, if it contains one or more sample points of the input model, it is labeled as a boundary voxel; if one of its 26 neighbors is an exterior voxel, it is labeled as an exterior voxel; otherwise, it is labeled as an interior voxel. We subdivide progressively in this way until the voxel resolution reaches a given threshold \(V^3\) (\(V \times V \times V\)), which is adjustable; V is set to 100 by default. Figure 2 shows the refinement procedure for voxelizing the dinosaur model.
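To make the classification concrete, the following is a minimal single-scale sketch in Python (numpy); the function names, the 6-connected flood fill, and the one-voxel padding are our choices, and the multi-scale \(3 \times 3 \times 3\) refinement described above is omitted for brevity.

```python
import numpy as np
from collections import deque

def classify_voxels(points, grid_res=100):
    """Label each voxel: 0 = interior, 1 = exterior, 2 = boundary."""
    lo = points.min(axis=0)
    cell = (points.max(axis=0) - lo).max() / grid_res
    n = grid_res + 2  # pad one empty layer so the corner seed is surely exterior
    labels = np.zeros((n, n, n), dtype=np.uint8)
    # Boundary voxels: every cell that contains at least one sample point.
    idx = np.clip(((points - lo) / cell).astype(int), 0, grid_res - 1) + 1
    labels[idx[:, 0], idx[:, 1], idx[:, 2]] = 2
    # Exterior voxels: 6-connected flood fill from an exterior corner seed.
    queue = deque([(0, 0, 0)])
    labels[0, 0, 0] = 1
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
            nb = (x + dx, y + dy, z + dz)
            if all(0 <= c < n for c in nb) and labels[nb] == 0:
                labels[nb] = 1
                queue.append(nb)
    # Whatever remains labeled 0 is an interior voxel.
    return labels, lo, cell
```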

For models with highly concave regions, mis-classification may occur for the sub-voxels of boundary voxels in these concave regions during the refinement procedure: when an exterior voxel is surrounded by neighboring voxels containing points, it is mis-classified as interior. Although this rarely happens, such mis-classification can be alleviated or even avoided by heuristically decreasing the initial voxel size of the multi-scale voxelization. Also note that even if some voxels are mis-classified, they have little impact on the final skeleton, since we extract the skeleton using a global optimization.

This scheme is effective even for models with noise, outliers, and missing areas. As illustrated in Fig. 2, we effectively voxelize the model components with large missing regions.

For raw inputs with heavy noise, outliers, and missing regions, we can pre-filter the input model using the locally optimal projection (LOP) operator [23]. Preprocessing the input model with LOP does not alter its topology, so we can still obtain a desirable skeleton that captures the essential topology of the model.

For models with sparsely sampled regions, we resort to upsampling. Many point set upsampling methods have been proposed, such as the moving least squares method (MLS) [1]. In our method, we employ a simple approach. We first detect regions with insufficient sampling density according to the local sampling density \(\rho _i\). We estimate \(\rho _i\) for each \(q_i\) by finding the sphere with minimum radius \(r_i\) centered at \(q_i\) that contains the k-nearest neighbors of \(q_i\); then \(\rho _i\) is defined as \(\rho _i=k/(r_{i})^{3}\). If \(\rho _i\) is below a given threshold, new sample points must be inserted. In our experiments, we insert the points using simple linear interpolation, which works well for our skeleton extraction purpose. Figure 8 shows a lion model containing 8K points and its upsampled version containing 30K points.
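The following sketch illustrates the density test and a midpoint-insertion reading of the linear interpolation, assuming scipy is available; the default threshold `rho_min` is our assumption (Sect. 7 gives the threshold we actually use).

```python
import numpy as np
from scipy.spatial import cKDTree

def upsample_sparse(points, k=5, rho_min=None):
    tree = cKDTree(points)
    dists, nbrs = tree.query(points, k=k + 1)   # column 0 is the point itself
    r = dists[:, -1]                            # radius enclosing k neighbors
    rho = k / r**3                              # rho_i = k / r_i^3
    if rho_min is None:
        # Assumed default; Sect. 7 derives the paper's threshold 5|J|/(27 d_bb^3).
        rho_min = np.median(rho) * 0.1
    new_pts = []
    for i in np.where(rho < rho_min)[0]:
        for j in nbrs[i, 1:]:                   # midpoints toward each neighbor
            new_pts.append(0.5 * (points[i] + points[j]))
    return np.vstack([points] + new_pts) if new_pts else points
```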

With the voxelized representation \(P=\{p_i\}_{i\in L}\) of the point cloud, where L indexes the voxels in P, we apply the fast marching method [29] to approximate the distance field. The distance field value \(\mathrm{DT}_p\) of each interior voxel p is the smallest distance from this voxel to the boundary of the volume. Note that to compute the distance field for a model with many disconnected components, we can estimate the distance field of each component separately to produce the initial skeletons. Figure 5 shows the color-coded distance fields of two models.
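As a concrete stand-in for the fast marching computation [29], the sketch below approximates the distance field with a Euclidean distance transform on the labeled grid from the earlier voxelization sketch; this substitution is ours, for illustration only, and is not the method of [29] itself.

```python
import numpy as np
from scipy import ndimage

def distance_field(labels, cell):
    # Distance from every voxel to the nearest boundary voxel (label 2),
    # converted to world units; the field is zeroed on exterior voxels (label 1).
    dt = ndimage.distance_transform_edt(labels != 2)
    dt[labels == 1] = 0.0
    return dt * cell
```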

Fig. 3

Point cloud voxelization and distance field computing. a Input model. b, c Voxelization results with different scales; observe the voxelization of the outliers. d Color-coded slice of the distance field

Fig. 4

Comparison of initial skeleton extraction methods. a Result of our method. b Result setting \({ TP} = 0.7\). c Result setting \({ TP} = 0.5\)

Note that for models with heavy noise/outliers, these noise/outlier points are contained in voxels under our voxelization, and those voxels are classified as boundary voxels, as illustrated in Fig. 3c. In the distance field estimation using the fast marching method, these voxels are assigned the initial value (zero in our experiments). As these voxels are scattered and disconnected from the model, the distance field does not propagate through them during fast marching. We can remove these voxels from the generated distance field, so they are not included in the subsequent skeleton extraction, as illustrated in Fig. 3d.

5 Initial skeleton extraction

Gagvani et al. [12] selected the ridge voxels with locally maximal distance values as candidate curve skeleton points. They decide whether a voxel is a ridge voxel by comparing the distance field value at the voxel with the average distance field value of its neighbors. That is, if a voxel meets the following condition, it is labeled as a skeleton point:

$$\begin{aligned} { MNT}_p &< { DT}_p-{ TP} \nonumber \\ { MNT}_p &= \sum ^{26}_{i=1}{ DT}_{p_i}/26,\quad p_i\in { VP} \end{aligned}$$
(1)

where \({ DT}_p\) is the distance between voxel p and its nearest boundary voxel, \({ VP}\) is the set of the 26 neighboring voxels of p, \(p_i\in { VP}\), and \({ TP}\) is the thinning parameter, which determines how close \({ MNT}_p\) must be to \({ DT}_p\) for p to be added to the skeleton; it controls the thickness of the skeleton.

When using this parameterized thinning method, users have to specify the value of TP, which is neither intuitive nor easy to estimate. An inappropriate TP setting makes the initial skeleton too thick or too thin; furthermore, noisy skeleton points may be introduced. To address this problem, we propose a multi-scale method to identify the initial skeleton points. We compute TP at three different scales (with neighborhood sizes of \(3 \times 3 \times 3\), \(5 \times 5 \times 5\), and \(7 \times 7 \times 7\), respectively):

$$\begin{aligned} { MNT}_p^1 &= \sum _{i=1}^{26} { DT}_{p_i}/26, \quad { MNT}_p^1 < { DT}_p - { TP}_1 \nonumber \\ { MNT}_p^2 &= \sum _{i=1}^{124} { DT}_{p_i}/124, \quad { MNT}_p^2 < { DT}_p - { TP}_2 \nonumber \\ { MNT}_p^3 &= \sum _{i=1}^{342} { DT}_{p_i}/342, \quad { MNT}_p^3 < { DT}_p - { TP}_3 \end{aligned}$$
(2)

For the 26 neighbors of each voxel p, we first compute their average distance field value \({ MNT}_p^1\). Then \({ TP}_{1}\) is defined as the scaled difference between the average \({ MNT}_p^1\) and the average \({ DT}_p\) over all voxels in the model, that is, \({ TP}_{1}=c\cdot \frac{\sum _{p\in L} { MNT}_{p}^{1}-\sum _{p\in L} { DT}_{p}}{|L|}\), where c is a coefficient between 0 and 1. We empirically set \(c=0.86\) and find it appropriate for all of our experiments. \({ TP}_{2}\) and \({ TP}_{3}\) are evaluated in a similar way for the 124 and 342 neighbors of each p, respectively. If voxel p meets all the conditions of Eq. (2), we label p as a skeleton point.
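The multi-scale test of Eq. (2) can be evaluated with box filters over the distance field grid. The sketch below is our reading of the procedure: the neighbor mean at each scale is obtained by subtracting the center voxel from the cube average, and each TP is averaged over the interior voxels (our interpretation of "all the voxels in the model").

```python
import numpy as np
from scipy import ndimage

def initial_skeleton(dt, inner, c=0.86):
    """Boolean mask of initial skeleton voxels; `inner` marks interior voxels."""
    keep = inner.copy()
    for size in (3, 5, 7):
        n_nb = size**3 - 1                       # 26, 124, 342 neighbors
        box_mean = ndimage.uniform_filter(dt, size=size)
        mnt = (box_mean * size**3 - dt) / n_nb   # neighbor mean, center excluded
        # TP at this scale: scaled difference of the two interior averages.
        tp = c * (mnt[inner].mean() - dt[inner].mean())
        keep &= mnt < dt - tp                    # one condition of Eq. (2)
    return keep
```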

As illustrated in Fig. 4, our method greatly improves the skeleton extraction results without tedious parameter setting. However, for complex models, as illustrated in Fig. 5c, the result is still not satisfactory: the approximate distance field contains many points with local minimal/maximal values. A uniform TP setting thus cannot handle all situations, and we need to refine the skeletons generated from the distance field.

6 Skeleton refinement

We incorporate the initial skeleton into the \(L_1\)-median optimization [15] to refine the skeleton. Recently, the \(L_1\)-median [32] has been widely utilized in point cloud processing [3, 14, 15, 22, 23, 28]. Huang et al. [15] exploited a local \(L_1\)-median to extract the skeleton of an unoriented raw point scan.

Given an unoriented set of points \(Q = {\left\{ {{q_j}} \right\} _{j \in J}}\subset {R^3}\), the \(L_1\)-medial skeleton can be obtained by using the optimal distribution of the projected points \(X = {\left\{ {{x_i}} \right\} _{i \in I}}\):

$$\begin{aligned}&\mathop {\arg \min }\limits _X \sum \limits _{i \in I} {\sum \limits _{j \in J} {\left\| {{x_i} - {q_j}} \right\| \theta \left( {\left\| {{x_i} - {q_j}} \right\| } \right) } } \nonumber \\&\quad + \sum \limits _{i \in I} {{\gamma _i}\sum \limits _{{i'}\in I\backslash \left\{ i \right\} } {\frac{{\theta \left( {\left\| {{x_i}-{x_{{i'}}}} \right\| } \right) }}{{{\sigma _i}\left\| {{x_i}-{x_{{i'}}}} \right\| }}} } \end{aligned}$$
(3)

where the first term is a localized \(L_1\) median of Q and the second term regularizes the local point distribution of X; I indexes the set of projected points X, and J indexes the set of input points Q. The weight function \(\theta \left( r\right) = {e^{{ - {r^2}} /{{\left( {{h /2}} \right) }^2}}}\) is a fast-decaying smooth function with support radius h. The parameter \(\sigma _i\), computed using weighted PCA, is applied to detect the formation of skeleton branches. \({\left\{ {{\gamma _i}} \right\} _{i \in I}}\) are balancing constants among X.
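For concreteness, the weight function and the weighted-PCA directionality measure can be sketched as follows; the definition of \(\sigma _i\) as the largest eigenvalue's share of the weighted covariance follows our reading of [15] and may differ in detail.

```python
import numpy as np

def theta(r, h):
    """Fast-decaying weight of Eq. (3) with support radius h."""
    return np.exp(-r**2 / (h / 2.0)**2)

def sigma(x_i, nbrs, h):
    """Directionality degree of x_i from weighted PCA of its neighbors;
    close to 1 when the neighborhood lies along a line (a forming branch)."""
    d = nbrs - x_i
    w = theta(np.linalg.norm(d, axis=1), h)
    cov = (w[:, None] * d).T @ d          # weighted covariance, 3 x 3
    lam = np.sort(np.linalg.eigvalsh(cov))
    return lam[-1] / (lam.sum() + 1e-12)  # largest eigenvalue share
```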

In this method [15], appropriate setting of the support radius h and selection of bridge points (bridge points connect skeleton branches, and connect skeleton branches with non-branch points) are vitally important for good results. With appropriate parameter settings and bridge point selection, this method produces desirable skeletons for models such as the one in Fig. 6a. However, setting appropriate parameters and selecting bridge points is a tedious task; as illustrated in Fig. 6b, for the same model in a different pose, the method may not work well.

Fig. 5

Left Color-coded slice of the distance field of the model. Middle and right The skeleton extracted from the distance field (different viewpoints)

The \(L_1\)-medial method tries to find the centers of components as skeleton points. As a complex point cloud is usually composed of many disconnected components, if the neighborhood with radius h contains points from only one component, the sample point \(x_i\) can move to the center of that part. However, if the neighborhood spans several components due to an inappropriate radius h, it is difficult for \(x_i\) to move to the center of the right part, which leads to wrong results. Furthermore, for a model whose components have different sizes (and whose parts within a component may also differ in size), it is difficult to set a uniform h for all components that produces high-quality results. Thus, an adaptive h is required to obtain good results. Although adaptive h setting techniques are given in [15], producing satisfactory results remains tedious.

Fig. 6

Left appropriate support radius h setting produces good results. Middle wrong bridge points lead to an incorrect skeleton extraction result. Right using voxelization information to select the bridge points, we get a much better result

6.1 Distance field guided \(L_1\)-medial

We apply the voxel classification and the distance field to solve the above problems. When iteratively projecting the sample points toward the centers of the components, we require them to move along the direction of increasing distance field values. For bridge point selection, the line connecting a branch endpoint and a bridge point must always lie inside the model.

The initial skeleton produced in the previous section lies in the central regions of the model components and provides useful information to guide the movement of the projected sample points. Based on this observation, we use the initial skeleton to guide the convergence of the sample points and present the following distance field guided \(L_{1}\) median sample projector:

$$\begin{aligned}&\mathop {\arg \min }\limits _X \sum \limits _{i \in I} {\sum \limits _{j \in J} {\left\| {{x_i} - {q_j}} \right\| \theta \left( {\left\| {{x_i} - {q_j}} \right\| } \right) } } \nonumber \\&\quad + \sum \limits _{i \in I} {{\theta _i}\left( {\frac{n}{N}} \right) \sum \limits _{k \in K} {\left\| {{x_i} - {s_k}} \right\| \theta \left( {\left\| {{x_i} - {s_k}} \right\| } \right) } } \nonumber \\&\quad + \sum \limits _{i \in I} {{\gamma _i}\sum \limits _{{i'} \in I\backslash \left\{ i \right\} } {\frac{{\theta \left( {\left\| {{x_i} - {x_{{i'}}}} \right\| } \right) }}{{{\sigma _i}\left\| {{x_i} - {x_{{i'}}}} \right\| }}} } \end{aligned}$$
(4)

where K indexes the initial skeleton points \(S = {\left\{ {{s_k}} \right\} _{k \in K}} \subset {R^3}\) and \({\theta _i}\left( r \right) = n{e^{ - {r^2}}}\). For the neighborhood of radius h centered at \(x_i\), N is the number of model points and n is the number of initial skeleton points contained in the neighborhood.

The distance field guided \(L_{1}\) median (Eq. 4) lets the initial skeleton exert a strong attraction on the projected sample points, pulling them toward the component centers. Setting the gradient of the energy (Eq. 4) to zero yields the following relation for each point location:

$$\begin{aligned}&\sum \limits _{j \in J} {\left( {x_i-q_j} \right) }{\alpha _{ij}} + {\theta _i}\sum \limits _{k \in K} {\left( {x_i-s_k} \right) }{\alpha _{ik}}\nonumber \\&\quad - {\gamma _i}\sum \limits _{{i'} \in I\backslash \left\{ i \right\} } {\frac{{x_i - x_{i'}}}{\sigma _i}}\beta _{ii'} = 0,\quad i \in I \end{aligned}$$
(5)

where \(\alpha _{ij}=\frac{\theta \left( {\left\| {x_i-q_j} \right\| } \right) }{\left\| {x_i-q_j} \right\| },\, j \in J\); \(\alpha _{ik}= \frac{\theta \left( {\left\| {x_i - s_k} \right\| } \right) }{\left\| {x_i-s_k} \right\| },\, k \in K\); \(\beta _{i{i'}} = \frac{\theta \left( {\left\| {{x_i} - {x_{{i'}}}} \right\| } \right) }{{{\left\| {x_i - x_{{i'}}} \right\| }^2}},\,\, {i'} \in I\backslash \left\{ i \right\} \).

We set

$$\begin{aligned} u &= \frac{\gamma _i \sum \nolimits _{{i'} \in I\backslash \left\{ i \right\} } \beta _{i{i'}}}{\sigma _i \sum \nolimits _{j \in J} \alpha _{ij}},\quad \forall i \in I \nonumber \\ v &= \frac{\theta _i \sum \nolimits _{k\in K} \alpha _{ik}}{\sum \nolimits _{j\in J} \alpha _{ij}},\quad \forall i\in I \end{aligned}$$
(6)

Then, substituting Eq. (6) into Eq. (5) and rearranging, we get

$$\begin{aligned}&\left( 1+v-u \right) {{x}_{i}}+u\frac{\sum \nolimits _{{{i}^{'}}\in I\backslash \left\{ i \right\} }{{{x}_{{{i}^{'}}}}{{\beta }_{i{{i}^{'}}}}}}{\sum \nolimits _{{{i}^{'}}\in I\backslash \left\{ i \right\} }{{{\beta }_{i{{i}^{'}}}}}} \nonumber \\&\quad =\frac{\sum \nolimits _{j\in J}{{{q}_{j}}{{\alpha }_{ij}}+{{\theta }_{i}}\sum \nolimits _{k\in K}{{{s}_{k}}{{\alpha }_{ik}}}}}{\sum \nolimits _{j\in J}{{{\alpha }_{ij}}}} \end{aligned}$$
(7)
Fig. 7

Skeleton extraction comparisons. a Input models, b voxelization results, c initial skeletons, d our final skeletons, e results of [15]

Eq. (7) can be considered a system of equations with X as the unknowns, i.e., \({ AX}={ BQ}+{ SP}\). As \(v\ge 0\) and \(u\ge 0\), if \(0\le u<(1+v)/2\), matrix A is strictly diagonally dominant and hence non-singular. The solution can be obtained by solving the system: \(X = {A^{- 1}}({ BQ+SP})\).

In our implementation, similar to [15], we apply a fixed-point iteration to solve the above system. Given the current iterate \({X^t} = \left\{ {x_i^t} \right\} ,t = 0,1,\ldots \), the next iterate is computed as follows, \(\forall i \in I\),

$$\begin{aligned} x_{i}^{t+1}= & {} (u-v)x_{i}^{t}+\frac{\sum \nolimits _{j\in J}{{{q}_{j}}\alpha _{ij}^{t}+{{\theta }_{i}}\sum \nolimits _{k\in K}{{{s}_{k}}\alpha _{ik}^{t}}}}{\sum \nolimits _{j\in J}{\alpha _{ij}^{t}}} \nonumber \\&-u\frac{\sum \nolimits _{{{i}^{'}}\in I\backslash \left\{ i \right\} }{x_{{{i}^{'}}}^{t}\beta _{i{{i}^{'}}}^{t}}}{\sum \nolimits _{{{i}^{'}}\in I\backslash \left\{ i \right\} }{\beta _{i{{i}^{'}}}^{t}}} \end{aligned}$$
(8)

where \(\alpha _{ij}^t = \frac{{\theta \left( {\left\| {x_i^t - {q_j}} \right\| } \right) }}{{\left\| {x_i^t - {q_j}} \right\| }},\, j \in J\); \(\alpha _{ik}^t = \frac{{\theta \left( {\left\| {x_i^t - {s_k}} \right\| } \right) }}{{\left\| {x_i^t - {s_k}} \right\| }},\, k \in K\); \(\beta _{i{i'}}^t = \frac{{\theta \left( {\left\| {x_i^t - x_{{i'}}^t} \right\| } \right) }}{{{{\left\| {x_i^t - x_{{i'}}^t} \right\| }^2}}},\, {i'} \in I\backslash \left\{ i \right\} \); \(\sigma _i^t = \sigma \left( {x_i^t} \right) \). As \(\sigma _{i}^{t}\) is defined as the directionality degree of \({{x}_{i}}\) within a neighborhood, \(\sigma _{i}^{t}\in \left( 0,1 \right] \). Since \(v\ge 0\) always holds, if we set u in \(\left[ 0,1/2 \right) \), matrix A is strictly diagonally dominant, and the sequence \(x_{i}^{0},x_{i}^{1},x_{i}^{2},\ldots \) converges to a position. In this paper, we set \(u=0.35\) for all the results.

In each iteration, the sample point \(x_i\) should move along the direction of increasing distance field values. Let \(D_i^k\) be the distance value of point \(x_i^k\) at the kth iteration. If, in the next iteration, \(x_i^{k+1}\) would move into a voxel with a smaller value, \(D_i^{k+1}<D_i^k\), we keep \(x_i\) fixed; that is, \(x_i\) does not move in this iteration.

Note that if \(0\le u<(1+v)/2\), matrix A is strictly diagonally dominant and non-singular, so under the fixed-point iteration \(x_{i}^{t}\) converges to a position. Moreover, without the above hard constraint, the moved sample points, and hence the generated skeleton, may penetrate outside the model. In our implementation, we found that the iterations always converge and the sample points always move to the model center. It should also be pointed out that with the above hard constraint, the iterations converge relatively slowly.
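Putting Eq. (8) and the hard constraint together, a single update of one sample point can be sketched as follows. `Q` and `S` are assumed to be pre-filtered to the h-neighborhood of \(x_i\), `dt_at` is a hypothetical lookup into the distance field, `theta` is the weight function sketched earlier, and `u` would be 0.35 as above.

```python
import numpy as np

def update_sample(x_i, X_others, Q, S, h, u, dt_at, eps=1e-8):
    """One Eq. (8) update of x_i; Q / S are the input and initial-skeleton
    points inside the h-neighborhood, X_others the remaining samples."""
    d_q = np.linalg.norm(Q - x_i, axis=1) + eps
    a_q = theta(d_q, h) / d_q                      # alpha_ij
    d_s = np.linalg.norm(S - x_i, axis=1) + eps
    a_s = theta(d_s, h) / d_s                      # alpha_ik
    d_x = np.linalg.norm(X_others - x_i, axis=1) + eps
    b = theta(d_x, h) / d_x**2                     # beta_ii'
    n, N = len(S), max(len(Q), 1)
    th_i = n * np.exp(-(n / N)**2)                 # theta_i(n / N)
    v = th_i * a_s.sum() / a_q.sum()
    x_new = ((u - v) * x_i                         # (u - v) x_i^t
             + (Q.T @ a_q + th_i * (S.T @ a_s)) / a_q.sum()
             - u * (X_others.T @ b) / b.sum())
    # Hard constraint: reject moves into a voxel with a smaller distance value.
    return x_new if dt_at(x_new) >= dt_at(x_i) else x_i
```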

Setting an appropriate support radius h is important for desirable results. In [15], Huang et al. set a uniform value in the initial iterations; in the following contraction iterations, they increase h gradually to contract the non-branch points to the component centers. In our method, the initial skeleton and the distance field provide useful cues for setting an adaptive radius h for the sample points. Let the sample point \(x_i^t\) at the tth iteration have support radius \(h_i^t\), and let its neighborhood contain n initial skeleton points \(s_j, j = 1,\ldots ,n\). We compute the center of these initial skeleton points in the neighborhood as:

$$\begin{aligned} center = \sum \nolimits _{j \in n} {{s_j}} \theta \left( {\left\| {x_i^t - {s_j}} \right\| } \right) /n \end{aligned}$$
(9)

If the distance between center and the sample point \(x_i^t\) is smaller than the voxel size, the next support radius \(h_i^{t+1}\) is set to the distance field value of the voxel containing center. If the distance is larger than one voxel, then \(h_i^{t+1}=(1+ratio)h_i^t\), where ratio is the growth ratio of the support radius. Note that in the initial iterations we use the same support radius defined in [15], which works well in our experiments.
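The radius update can be sketched as below; the text does not specify a value for `ratio`, so the default here is our assumption, and Eq. (9) is implemented exactly as written (the weighted sum is divided by n).

```python
import numpy as np

def next_radius(x_i, S_nb, h, voxel_size, dt_at, ratio=0.2):
    """Adaptive support radius for the next iteration (rule around Eq. (9))."""
    if len(S_nb) == 0:
        return (1 + ratio) * h
    w = theta(np.linalg.norm(S_nb - x_i, axis=1), h)
    center = (S_nb * w[:, None]).sum(axis=0) / len(S_nb)   # Eq. (9)
    if np.linalg.norm(center - x_i) < voxel_size:
        return dt_at(center)       # radius from the local distance field value
    return (1 + ratio) * h         # otherwise grow the radius
```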

Similar to [15], we use bridge points to produce a complete skeleton. With the distance field, we can select suitable bridge points to connect a skeleton branch to non-branch points when updating an existing branch, or to create joints connecting neighboring skeleton branches. When selecting a bridge point, if the connection curve between the branch endpoint e and the candidate bridge point penetrates outside the model, the candidate is not qualified. With the voxelized representation and the voxel type information (boundary, outside, and inner voxels), we can ensure that the skeleton does not penetrate outside the model by selecting appropriate bridge points: if the connection curve passes through outside voxels, it is not qualified, and we select a new bridge point. In Fig. 6, we compare the results with and without using the voxelization information for bridge point selection.
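A simple way to realize this validity test is to sample the connection segment and reject candidates that cross exterior voxels; the sketch below assumes the padded label grid from the voxelization sketch in Sect. 4, with a straight segment standing in for the connection curve.

```python
import numpy as np

def valid_bridge(e, b, labels, lo, cell, n_samples=64):
    """Reject a bridge candidate if the segment e-b crosses an exterior voxel."""
    for t in np.linspace(0.0, 1.0, n_samples):
        p = (1 - t) * e + t * b
        i, j, k = ((p - lo) / cell).astype(int) + 1   # +1: padded grid layer
        if labels[i, j, k] == 1:                      # segment leaves the model
            return False
    return True
```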

Fig. 8

Skeletons extracted from models before and after upsampling. a, e Skeletons from original models by [15]. b, f Skeletons from upsampled models by [15]. c, g Skeletons from original models by our method. d, h Skeletons from upsampled models by our method

7 Results and discussion

We validate the effectiveness of the proposed method by demonstrating the skeletons extracted from models of diverse shapes and structures. We also compare our method with the most closely related method [15].

The running time of the proposed method is mainly spent in four steps: voxelization, distance field computation, initial skeleton extraction, and final skeleton refinement. The first three steps are relatively fast; the most time-consuming step is the last. While the exact time consumption is model-specific, we evaluate the timing of our method statistically. For a model with 35,000 points, taking 1500 sample points in the projecting contraction and a voxelization resolution of \(100 \times 100 \times 100\), it takes on average 3 s for voxelization, 10 s for distance field computation, 3 s for initial skeleton extraction, and 24 s for skeleton refinement using the \(L_{1}\) median. Using the code provided by [15], producing results for the same model usually takes about 32 s. Our method costs more time due to the extra steps, which contribute to the refined skeletons. All tests are carried out on a PC with an Intel Core i7-4790K CPU and 16 GB RAM.

In Fig. 7, we extract skeletons from models with thin cylindrical components. Note that these models exhibit significantly heavy noise. In these examples, both our method and the algorithm of [15] obtain satisfactory results; however, our method requires less parameter tuning, that is, fewer trial-and-error operations, to reach the final results. Without distance field guidance, the projected samples of [15] tend to deviate from the component centers unless the parameters, especially the support radius h, are carefully tuned.

Impact of sampling density As pointed out above, before voxelization we need to detect the sparsely sampled regions of the input point cloud Q and upsample them. We define the local sampling density as \({{\rho }_{i}}={k}/{{{({{r}_{i}})}^{3}}}\). If \(\rho _i\) is below a given threshold, the region has insufficient sampling density and needs upsampling. In our implementation, we set \(k=5\) and set \(r_i\) to a default neighborhood size \({{r}_{i}}=3{{d}_{bb}}/\root 3 \of {|J|}\), where \(d_{bb}\) is the diagonal length of Q's bounding box and |J| is the number of points in Q. The required sampling density is therefore \(5|J|/27d_{bb}^3\). If, for the above neighborhood size, a neighborhood contains fewer than 3 neighbors, we increase the neighborhood size so that the interpolation-based upsampling can produce enough samples for voxelization. In practice, we usually make the sample points denser than the above threshold to facilitate the subsequent voxelization.

We run our algorithm on two models with sparsely sampled torsos and on their upsampled versions (see Fig. 8). When extracting skeletons from the sparse models containing 6381/4211 points, the results of both [15] and our method are unsatisfactory. For our method, insufficient sampling density leads to a disconnected distance field at high voxelization resolution, which causes artifacts in the final skeleton. After upsampling the models (to 33,920/22,802 points), our method obtains compelling results, which are much better than those of [15].

Comparison to mesh contraction We compare our method with the mesh contraction method of Au et al. [2] on the dog model provided by the authors, using the executable files they released; we run their method on the mesh model and ours on the corresponding point cloud. As illustrated in Fig. 9, the result of Au et al. has more skeleton branches around the head of the dog, and their skeleton retains the key features of the head. In comparison, our method retains only the most salient features of the model. We also compare our result with that of [15].

Comparison to \(L_{1}\)-medial skeleton In Fig. 10, we work on models with different characteristics and also compare with [15]. In these examples, each model has some components that are neither thin nor cylindrical. When performing iterative contraction, Huang et al. [15] have to gradually increase the support radius h to further contract non-branch points into skeletons. Setting h in this way is model-specific, and for complex models the sample points may not contract to the component centers. In our experiments, we found that without a prior on the component centers, the \(L_{1}\) median projection, which aims to automatically contract the points to the centers, sometimes does not work well. Furthermore, to obtain satisfactory results, appropriate bridge point selection is also a non-trivial problem besides the radius h setting. Without a shape prior, wrong bridge points may be selected, making the resulting skeleton penetrate outside the model. With the guidance of the distance field, our method provides much better results and requires fewer trial-and-error operations. In Fig. 10, we present comparison results on six models. Note that we use the code provided by the authors of [15] to produce all the results of that method, and for each result we use the optimal parameters and try to obtain the best results.

Fig. 9

Result comparisons. a Result of Au et al. [2], b result of [15], c our result

Fig. 10

Skeleton extraction comparisons. a Input models, b voxelization results, c initial skeletons, d our final skeletons, e results of [15]

Fig. 11

Failed example. a Input model viewed from 3 viewing directions: front, left side, and right side. b RBF reconstruction result of (a). c Skeleton extracted from (b) using our method. d Skeleton extraction result of [15] (result from the authors' paper)

Compared with [15], our method also has some disadvantages. Even for models with significant outliers, the method of [15] can run without any preprocessing for outlier removal, whereas we need to pre-filter such models using the locally optimal projection (LOP) [23] before effective voxelization is possible. It should be pointed out, however, that this preprocessing does not alter the topology of the input model, so it has no effect on the final skeleton generation.

Limitations Although our method obtains excellent results for models with complex shapes, it still has the following limitations. The proposed scheme works well for most input data; however, for scanned point clouds whose missing regions are too large, our method is challenged. Taking the deer model in Figure 1 of [15] as an example, our method performs worse or even fails. As illustrated in Fig. 11a, large missing regions are distributed over the model. Our method cannot voxelize this input, and thus fails to extract its skeleton. If we resample and complete the model using the RBF method (Fig. 11b), we can voxelize the model and extract the skeleton; however, as illustrated in Fig. 11c, the skeleton deviates from the part centers of the original model, since it is extracted from the upsampled model obtained by energy minimization (RBF). The method of [15] extracts a satisfactory result from the original input without upsampling (Fig. 11d). This example illustrates that our method usually fails to extract a satisfactory skeleton from a model with large missing surface regions.

Finally, compared with [15], our method has higher time complexity due to the distance field computation and consumes more memory for the voxelized representation.

8 Conclusion

In this paper, we present a distance field guided \(L_1\)-median method to extract skeletons from point clouds. Our method combines the advantages of both the distance field based method and the \(L_1\)-median based method, and in particular improves the robustness and effectiveness of the \(L_1\)-median skeleton method. We have extracted skeletons from a variety of point clouds to illustrate the effectiveness of the proposed method.

Many 1D skeleton extraction algorithms have been proposed, and each has its advantages and disadvantages. As skeleton extraction from complex point cloud data, especially from raw scanned data with missing regions, is an ill-posed problem, our method also exhibits some limitations. We thus believe our method can be considered a good complement to existing skeleton extraction methods.

Compared with extracting a skeleton from a single static model, extracting skeleton sequences from time-varying surface sequences is a more challenging task, as the extracted skeletons should be not only topologically clean but also temporally coherent. In the future, we would like to extend our method in this direction.