1 Introduction

3D point clouds can be rapidly acquired using 3D sensors or photogrammetric techniques. However, they are rarely used in this form in design and graphics applications. Observations from the computer-aided design and modeling literature [3, 4, 14, 16] suggest that designers often model shapes by constructing several non-overlapping patches placed seamlessly. The advantage of using several patches over a single continuous patch is that a much more diverse variety of geometric features and surface topologies can be created. The decomposition also allows easier interaction and editing. The goal of this work is to automate the time-consuming process of converting a 3D point cloud into a piecewise parametric surface representation as seen in Fig. 1.

An important question is how surface patches should be represented. Patch representations in CAD and graphics are based on well-accepted geometric properties: (a) continuity in their tangents, normals, and curvature, making patches appear smooth, (b) editability, such that they can easily be modified based on a few intuitive degrees of freedom (DoFs), e.g., control points or axes, and (c) flexibility, so that a wide variety of surface geometries can be captured. Towards this goal, we propose ParSeNet, a parametric surface fitting network architecture which produces a compact, editable representation of a point cloud as an assembly of geometric primitives, including open or closed B-spline patches.

Fig. 1.
figure 1

ParSeNet decomposes point clouds (top row) into collections of assembled parametric surface patches including B-spline patches (bottom row). On the right, a shape is edited using the inferred parametrization.

ParSeNet models a richer class of surfaces than prior work which only handles basic geometric primitives such as planes, cuboids and cylinders  [11, 12, 17, 24]. While such primitives are continuous and editable representations, they lack the richness and flexibility of spline patches which are widely used in shape design. ParSeNet includes a novel neural network (SplineNet) to estimate an open or closed B-spline model of a point cloud patch. It is part of a fitting module (Sect. 3.2) which can also fit other geometric primitive types. The fitting module receives input from a decomposition module, which partitions a point cloud into segments, after which the fitting module estimates shape parameters of a predicted primitive type for each segment (Sect. 3.1). The entire pipeline, shown in Fig. 2, is fully differentiable and trained end-to-end (Sect. 4). An optional geometric postprocessing step further refines the output.

Compared to purely analytical approaches, ParSeNet produces decompositions that are more consistent with high-level semantic priors, and are more robust to point density and noise. To train and test ParSeNet, we leverage a recent dataset of man-made parts  [8]. Extensive evaluations show that ParSeNet outperforms baselines (RANSAC and SPFN  [11]) by \(14.93\%\) and \(13.13\%\) respectively for segmenting a point cloud into patches, and by \(50\%\), and \(47.64\%\) relative error respectively for parametrizing each patch for surface reconstruction (Sect. 5).

To summarize, our contributions are:

  • The first proposed end-to-end differentiable approach for representing a raw 3D point cloud as an assembly of parametric primitives including spline patches.

  • Novel decomposition and primitive fitting modules, including SplineNet, a fully-differentiable network to fit a cubic B-spline patch to a set of points.

  • Evaluation of our framework vs prior analytical and learning-based methods.

2 Related Work

Our work builds upon related research on parametric surface representations and methods for primitive fitting. We briefly review relevant work in these areas. Of course, we also leverage extensive prior work on neural networks for general shape processing: see recent surveys on the subject  [1].

Parametric Surfaces. A parametric surface is a (typically diffeomorphic) mapping from a (typically compact) subset of \({\mathbb {R}}^2\) to \({\mathbb {R}}^3\). While most of the geometric primitives used in computer graphics (spheres, cuboids, meshes etc.) can be represented parametrically, the term most commonly refers to curved surfaces used in engineering CAD modelers, represented as spline patches  [3]. There are a variety of formulations – e.g. Bézier patches, B-spline patches, NURBS patches – with slightly different characteristics, but they all construct surfaces as weighted combinations of control parameters, typically the positions of a sparse grid of points which serve as editing handles. More specifically, a B-spline patch is a smoothly curved, bounded, parametric surface, whose shape is defined by a sparse grid of control points \(\mathbf {C}=\{ \mathbf {c}_{p,q} \}\). The surface point with parameters \((u, v) \in [u_{\min },u_{\max }] \times [v_{\min },v_{\max }]\) and basis functions  [3] \(b_p(u)\), \(b_q(v)\) is given by:

$$\begin{aligned} \mathbf {s}(u,v)= \sum \limits _{p=1}^P \sum \limits _{q=1}^Q b_p(u)b_q(v) \mathbf {c}_{p,q} \end{aligned}$$
(1)

Please refer to supplementary material for more details on B-spline patches.

Fitting Geometric Primitives. A variety of analytical (i.e. not learning-based) algorithms have been devised to approximate raw 3D data as a collection of geometric primitives: dominant themes include Hough transforms, RANSAC and clustering. The literature is too vast to cover here, we recommend the comprehensive survey of Kaiser et al. [7]. In the particular case of NURBS patch fitting, early approaches were based on user interaction or hand-tuned heuristics to extract patches from meshes or point clouds [2, 6, 10]. In the rest of this section, we briefly review recent methods that learn how to fit primitives to 3D data.

Several recent papers  [20, 22, 24, 27] also try to approximate 3D shapes as unions of cuboids or ellipsoids. Paschalidou et al. [12, 13] extended this to superquadrics. Sharma et al. [17, 18] developed a neural parser that represents a test shape as a collection of basic primitives (spheres, cubes, cylinders) combined with boolean operations. Tian et al. [23] handled more expressive construction rules (e.g. loops) and a wider set of primitives. Because of the choice of simple primitives, such models are naturally limited in how well they align to complex input objects, and offer less flexible and intuitive parametrization for user edits.

More relevantly to our goal of modeling arbitrary curved surfaces, Gao et al. [5] parametrize 3D point clouds as extrusions or surfaces of revolution, generated by B-spline cross-sections detected by a 2D network. This method requires translational/rotational symmetry, and does not apply to general curved patches. Li et al. [11] proposed a supervised method to fit primitives to 3D point clouds, first predicting per-point segment labels, primitive types and normals, and then using a differential module to estimate primitive parameters. While we also chain segmentation with fitting in an end-to-end way, we differ from Li et al. in two important ways. First, our differentiable metric-learning segmentation produces improved results (Table 1). Second, a major goal (and technical challenge) for us is to significantly improve expressivity and generality by incorporating B-spline patches: we achieve this with a novel differentiable spline-fitting network. In a complementary direction, Yumer et al. [26] developed a neural network for fitting a single NURBS patch to an unstructured point cloud. While the goal is similar to our spline-fitting network, it is not combined with a decomposition module that jointly learns how to express a shape with multiple patches covering different regions. Further, their fitting module has several non-trainable steps which are not obviously differentiable, and hence cannot be used in our pipeline.

Fig. 2.
figure 2

Overview of ParSeNet pipeline. (1) The decomposition module (Sect. 3.1) takes a 3D point cloud (with optional normals) and decomposes it into segments labeled by primitive type. (2) The fitting module (Sect. 3.2) predicts parameters of a primitive that best approximates each segment. It includes a novel SplineNet to fit B-spline patches. The two modules are jointly trained end-to-end. An optional postprocess module (Sect. 3.3) refines the output.

3 Method

The goal of our method is to reconstruct an input point cloud by predicting a set of parametric patches closely approximating its underlying surface. The first stage of our architecture is a neural decomposition module (Fig. 2) whose goal is to segment the input point cloud into regions, each labeled with a parametric patch type. Next, we incorporate a fitting module (Fig. 2) that predicts each patch’s shape parameters. Finally, an optional post-processing geometric optimization step refines the patches to better align their boundaries for a seamless surface.

The input to our pipeline is a set of points \(\mathbf {P}= \{ \mathbf {p}_i \}_{i=1}^N\), represented either as 3D positions \(\mathbf {p}_i = (x, y, z)\), or as 6D position + normal vectors \(\mathbf {p}_i = (x, y, z, n_x, n_y, n_z)\). The output is a set of surface patches \(\{\mathbf {s}_k\}\), reconstructing the input point cloud. The number of patches is automatically determined. Each patch is labeled with a type \(t_k\), one of: sphere, plane, cone, cylinder, open/closed B-spline patch. The architecture also outputs a real-valued vector for each patch defining its geometric parameters, e.g. center and radius for spheres, or B-spline control points and knots.

3.1 Decomposition Module

The first module (Fig. 2) decomposes the point cloud \(\mathbf {P}\) into a set of segments such that each segment can be reliably approximated by one of the above mentioned surface patch types. To this end, the module first embeds the input points into a representation space used to reveal such segments. As discussed in Sect. 4, the representations are learned using metric learning, such that points belonging to the same patch are embedded close to each other, forming a distinct cluster.

Embedding Network. To learn these point-wise representations, we incorporate edge convolution layers (EdgeConv) from DGCNN  [25]. Each EdgeConv layer performs a graph convolution to extract a representation of each point with an MLP on the input features of its neighborhood. The neighborhoods are dynamically defined via nearest neighbors in the input feature space. We stack 3 EdgeConv layers, each extracting a 256-D representation per point. A max-pooling layer is also used to extract a global 1024-D representation for the whole point cloud. The global representation is tiled and concatenated with the representations from all three EdgeConv layers to form intermediate point-wise \((1024 + 256)\)-D representations \(\mathbf {Q}=\{\mathbf {q}_i\}\) encoding both local and global shape information. We found that a global representation is useful for our task, since it captures the overall geometric shape structure, which is often correlated with the number and type of expected patches. This representation is then transformed through fully connected layers and ReLUs, and finally normalized to unit length to form the point-wise embedding \(\mathbf {Y}=\{\mathbf {y}_i\}_{i=1}^N\) (128-D) lying on the unit hypersphere.

Clustering. A mean-shift clustering procedure is applied on the point-wise embedding to discover segments. The advantage of mean-shift clustering over other alternatives (e.g., k-means or mixture models) is that it does not require the target number of clusters as input. Since different shapes may comprise different numbers of patches, we let mean-shift produce a cluster count tailored for each input. Like the pixel grouping of [9], we implement mean-shift iterations as differentiable recurrent functions, allowing back-propagation. Specifically, we initialize mean-shift by setting all points as seeds \(\mathbf {z}^{(0)}_i=\mathbf {y}_i, \forall y_i \in R^{128}\). Then, each mean-shift iteration t updates each point’s embedding on the unit hypersphere:

$$\begin{aligned} \mathbf {z}^{(t+1)}_i = \sum \limits _{j=1}^N \mathbf {y}_j g(\mathbf {z}_i^{(t)}, \mathbf {y}_j) / (\sum \limits _{j=1}^N g(\mathbf {z}_i^{(t)}, \mathbf {y}_j)) \end{aligned}$$
(2)

where the pairwise similarities \(g(\mathbf {z}_i^{(t)}, \mathbf {y}_j)\) are based on a von Mises-Fisher kernel with bandwidth \(\beta \): \(g(\mathbf {z}_i, \mathbf {y}_j) = \exp (\mathbf {z}_i^{T} \mathbf {y}_j / \beta ^2)\) (iteration index dropped for clarity). The embeddings are normalized to unit vectors after each iteration. The bandwidth for each input point cloud is set as the average distance of each point to its \(150^{th}\) neighboring point in the embedding space  [19]. The mean-shift iterations are repeated until convergence (this occurs around 50 iterations in our datasets). We extract the cluster centers using non-maximum suppression: starting with the point with highest density, we remove all points within a distance \(\beta \), then repeat. Points are assigned to segments based on their nearest cluster center. The point memberships are stored in a matrix \(\mathbf {W}\), where \(\mathbf {W}[i,k]=1\) means point i belongs to segment k, and 0 means otherwise. The memberships are passed to the fitting module to determine a parametric patch per segment. During training, we use soft memberships for differentiating this step (more details in Sect. 4.3).

Segment Classification. To classify each segment, we pass the per-point representation \(\mathbf {q}_i\), encoding local and global geometry, through fully connected layers and ReLUs, followed by a softmax for a per-point probability \(P(t_i = l)\), where l is a patch type (i.e., sphere, plane, cone, cylinder, open/closed B-spline patch). The segment’s patch type is determined through majority voting over all its points.

3.2 Fitting Module

The second module (Fig. 2) aims to fit a parametric patch to each predicted segment of the point cloud. To this end, depending on the segment type, the module estimates the shape parameters of the surface patch.

Basic Primitives. Following Li et al. [11], we estimate the shape of basic primitives with least-squares fitting. This includes center and radius for spheres; normal and offset for planes; center, direction and radius for cylinders; and apex, direction and angle for cones. We also follow their approach to define primitive boundaries.

B-Splines. Analytically parametrizing a set of points as a spline patch in the presence of noise, sparsity and non-uniform sampling, can be error-prone. Instead, predicting control points directly with a neural network can provide robust results. We propose a neural network SplineNet, that inputs points of a segment, and outputs a fixed size control-point grid. A stack of three EdgeConv layers produce point-wise representations concatenated with a global representation extracted from a max-pooling layer (as for decomposition, but weights are not shared). This equips each point i in a segment with a 1024-D representation . A segment’s representation is produced by max-pooling over its points, as identified through the membership matrix \(\mathbf {W}\) extracted previously:

(3)

Finally, two fully-connected layers with ReLUs transform to an initial set of \(20\times 20\) control points \(\mathbf {C}\) unrolled into a 1200-D output vector. For a segment with a small number of points, we upsample the input segment (with nearest neighbor interpolation) to 1600 points. This significantly improved performance for such segments (Table 2). For closed B-spline patches, we wrap the first row/column of control points. Note that the network parameters to produce open and closed B-splines are not shared. Figure 5 visualizes some predicted B-spline surfaces.

3.3 Post-processing Module

SplineNet produces an initial patch surface that approximates the points belonging to a segment. However, patches might not entirely cover the input point cloud, and boundaries between patches are not necessarily well-aligned. Further, the resolution of the initial control point grid (\(20 \times 20\)) can be further adjusted to match the desired surface resolution. As a post-processing step, we perform an optimization to produce B-spline surfaces that better cover the input point cloud, and refine the control points to achieve a prescribed fitting tolerance.

Optimization. We first create a grid of \(40\times 40\) points on the initial B-spline patch by uniformly sampling its UV parameter space. We tessellate them into quads. Then we perform a maximal matching between the quad vertices and the input points of the segment, using the Hungarian algorithm with L2 distance costs. We then perform an as-rigid-as-possible (ARAP) [21] deformation of the tessellated surface towards the matched input points. ARAP is an iterative, detail-preserving method to deform a mesh so that selected vertices (pivots) achieve targets position, while promoting locally rigid transformations in one-ring neighborhoods (instead of arbitrary ones causing shearing/stretc.hing). We use the boundary vertices of the patch as pivots so that they move close to their matched input points. Thus, we promote coverage of input points by the B-spline patches. After the deformation, the control points are re-estimated with least-squares [14].

Refinement of B-spline Control Points. After the above optimization, we again perform a maximal matching between the quad vertices and the input points of the segment. As a result, the input segment points acquire 2D parameter values in the patch’s UV parameter space, which can be used to re-fit any other grid of control points [14]. In our case, we iteratively upsample the control point grid by a factor of 2 until a fitting tolerance, measured via Chamfer distance, is achieved. If the tolerance is satisfied by the initial control point grid, we can similarly downsample it iteratively. In our experiments, we set the fitting tolerance to \(5 \times 10^{-4}\). In Fig. 5 we show the improvements from the post-processing step.

4 Training

To train the neural decomposition and fitting modules of our architecture, we use supervisory signals from a dataset of 3D shapes modeled through a combination of basic geometric primitives and B-splines. Below we describe the dataset, then we discuss the loss functions and the steps of our training procedure.

4.1 Dataset

The ABC dataset [8] provides a large source of 3D CAD models of mechanical objects whose file format stores surface patches and modeling operations that designers used to create them. Since our method is focused on predicting surface patches, and in particular B-spline patches, we selected models from this dataset that contain at least one B-spline surface patch. As a result, we ended up with a dataset of 32K models (24K, 4K, 4K train, test, validation sets respectively). We call this ABCPartsDataset. All shapes are centered in the origin and scaled so they lie inside unit cube. To train SplineNet, we also extract 32K closed and open B-spline surface patches each from ABC dataset and split them into 24K, 4K, 4K train, test, validation sets respectively. We call this SplineDataset. We report the average number of different patch types in supplementary material.

Preprocessing. Based on the provided metadata in ABCPartsDataset, each shape can be rendered based on the collection of surface patches and primitives it contains (Fig. 4). Since we assume that the inputs to our architecture are point clouds, we first sample each shape with 10K points randomly distributed on the shape surface. We also add noise in a uniform range \(\left[ -0.01, 0.01\right] \) along the normal direction. Normals are also perturbed with random noise in a uniform range of \(\left[ -3, 3\right] \) degrees from their original direction.

Fig. 3.
figure 3

Standardization: Examples of B-spline patches with a variable number of control points (shown in red), each standardized with \(20\times 20\) control points. Left: closed B-spline and Right: open B-spline. (Please zoom in.) (Color figure online)

4.2 Loss Functions

We now describe the different loss functions used to train our neural modules. The training procedure involving their combination is discussed in Sect. 4.3.

Embedding Loss. To discover clusters of points that correspond well to surface patches, we use a metric learning approach. The point-wise representations \(\mathbf {Z}\) produced by our decomposition module after mean-shift clustering are learned such that point pairs originating from the same surface patch are embedded close to each other to favor a cluster formation. In contrast, point pairs originating from different surface patches are pushed away from each other. Given a triplet of points (abc), we use the triplet loss to learn the embeddings:

$$\begin{aligned} \ell _{emb}(a, b, c) = \max \big (0, ~|| \mathbf {z}_a - \mathbf {z}_b||^2 - ||\mathbf {z}_a - \mathbf {z}_c||^2 + \tau \big ), \end{aligned}$$
(4)

where \(\tau \) the margin is set to 0.9. Given a triplet set \(\mathcal{T}_\mathcal {S}\) sampled from each point set \(\mathcal {S}\) from our dataset \(\mathcal {D}\), the embedding objective sums the loss over triplets:

$$\begin{aligned} L_{emb} = \sum _{\mathcal {S}\in \mathcal {D}} \frac{1}{|{\mathcal {T}}_\mathcal {S}|} \sum _{(a, b, c) \in {\mathcal {T}}_\mathcal {S}} \ell _{emb}(a, b, c). \end{aligned}$$
(5)

Segment Classification Loss. To promote correct segment classifications according to our supported types, we use the cross entropy loss: \( L_{class} = -\sum _{i \in \mathcal {S}} \log (p^{t}_{i})\) where \(p^{i}_{t}\) is the probability of the \(i^{th}\) point of shape \(\mathcal {S}\) belonging to its ground truth type t, computed from our segment classification network.

Control Point Regression Loss. This loss function is used to train SplineNet. As discussed in Sect. 3.2, SplineNet produces \(20 \times 20\) control points per B-spline patch. We include a supervisory signal for this control point grid prediction. One issue is that B-spline patches have a variable number of control points in our dataset. Hence we reparametrize each patch by first sampling \(M=3600\) points and estimating a new \(20\times 20\) reparametrization using least-squares fitting  [10, 14], as seen in the Fig. 3. In our experiments, we found that this standardization produces no practical loss in surface reconstructions in our dataset. Finally, our reconstruction loss should be invariant to flips or swaps of control points grid in u and v directions. Hence we define a loss that is invariant to such permutations:

(6)

where \(\mathcal {S}^{(b)}\) is the set of B-spline patches from shape \(\mathcal {S}\), \(\mathbf {C}_k\) is the predicted control point grid for patch \(\mathbf {s}_k\) (\(|\mathbf {C}_k|=400\) control points), \(\pi ( \hat{\mathbf {C}}_k )\) is permutations of the ground-truth control points from the set \(\varPi \) of 8 permutations for open and 160 permutations for closed B-spline.

Laplacian Loss. This loss is also specific to B-Splines using SplineNet. For each ground-truth B-spline patch, we uniformly sample ground truth surface, and measure the surface Laplacian capturing its second-order derivatives. We also uniformly sample the predicted patches and measure their Laplacians. We then establish Hungarian matching between sampled points in the ground-truth and predicted patches, and compare the Laplacians of the ground-truth points \(\hat{\mathbf {r}}_m\) and corresponding predicted ones \(\mathbf {r}_n\) to improve the agreement between their derivatives as follows:

$$\begin{aligned} L_{lap} = \sum _{\mathcal {S}\in \mathcal {D}} \frac{1}{|\mathcal {S}^{(b)}| \cdot M} \sum _{\mathbf {s}_k \in \mathcal {S}^{(b)}} \sum _{\mathbf {r}_n \in s_k } ||\mathcal {L}(\mathbf {r}_{n}) - \mathcal {L}(\hat{\mathbf {r}}_m) ||^2 \end{aligned}$$
(7)

where \(\mathcal {L}(\cdot )\) is the Laplace operator on patch points, and \(M=1600\) point samples.

Patch Distance Loss. This loss is applied to both basic primitive and B-splines patches. Inspired by [11], the loss measures average distances between predicted primitive patch \(s_k\) and uniformly sampled points from the ground truth patch as:

$$\begin{aligned} L_{dist} = \sum _{\mathcal {S}\in \mathcal {D}} \frac{1}{K_{\mathcal {S}}}\sum _{k=1}^{K_{\mathcal {S}}} \frac{1}{M_{\hat{\mathbf {s}}_k}}\sum _{n \in \hat{\mathbf {s}}_k} D^2( \mathbf {r}_n, \mathbf {s}_k), \end{aligned}$$
(8)

where \(K_\mathcal {S}\) is the number of predicted patches for shape \(\mathcal {S}\), \(M_{\hat{\mathbf {s}}_k}\) is number of sampled points \(\mathbf {r}_n\) from ground patch \(\hat{\mathbf {s}}_k\), \(D^2( \mathbf {r}_n, \mathbf {s}_k)\) is the squared distance from \(\mathbf {r}_n\) to the predicted primitive patch surface \(\mathbf {s}_k\). These distances can be computed analytically for basic primitives [11]. For B-splines, we use an approximation based on Chamfer distance between sample points.

4.3 Training Procedure

One possibility for training is to start it from scratch using a combination of all losses. Based on our experiments, we found that breaking the training procedure into the following steps leads to faster convergence and to better minima:

  • We first pre-train the networks of the decomposition module using ABCPartsDataset with the sum of embedding and classification losses: \(L_{emb}+L_{class}\). Both losses are necessary for point cloud decomposition and classification.

  • We then pre-train the SplineNet using SplineDataset for control point prediction exclusively on B-spline patches using \(L_{cp}+L_{lap}+L_{dist}\). We note that we experimented training the B-spline patch prediction only with the patch distance loss \(L_{dist}\) but had worse performance. Using both the \(L_{cp}\) and \(L_{lap}\) loss yielded better predictions as shown in Table 2.

  • We then jointly train the decomposition and fitting module end-to-end with all the losses. To allow backpropagation from the primitives and B-splines fitting to the embedding network, the mean shift clustering is implemented as a recurrent module (Eq. 2). For efficiency, we use 5 mean-shift iterations during training. It is also important to note that during training, Eq. 3 uses soft point-to-segment memberships, which enables backpropagation from the fitting module to the decomposition module and improves reconstructions. The soft memberships are computed based on the point embeddings \(\{\mathbf {z}_i\}\) (after the mean-shift iterations) and cluster center embedding \(\{\mathbf {z}_k\}\) as follows:

    $$\begin{aligned} \mathbf {W}[i,k] = \frac{\exp ( \mathbf {z}_k^T \mathbf {z}_i / \beta ^2 ) }{\sum _{k'} {\exp ( \mathbf {z}_{k'}^T \mathbf {z}_i ) / \beta ^2 ) } } \end{aligned}$$
    (9)

Please see supplementary material for more implementation details.

5 Experiments

Table 1. Primitive fitting on ABCPartsDataset . We compare ParSeNet with nearest neighbor (NN), RANSAC  [15], and SPFN  [11]. We show results with points (p) and points and normals (p+n) as input. The last two rows shows our method with end-to-end training and post-process optimization. We report ‘seg iou’ and ‘label iou’ metric for segmentation task. We report the residual error (res) on all, geometric and spline primitives, and the coverage metric for fitting.

Our experiments compare our approach to alternatives in three parts: (a) evaluation of the quality of segmentation and segment classification (Sect. 5.1), (b) evaluation of B-spline patch fitting, since it is a major contribution of our work (Sect. 5.2), and (c) evaluation of overall reconstruction quality (Sect. 5.3). We include evaluation metrics and results for each of the three parts next.

5.1 Segmentation and Labeling Evaluation

Evaluation Metrics. We use the following metrics for evaluating the point cloud segmentation and segment labeling based on the test set of ABCPartsDataset:

  • Segmentation mean IOU (“seg mIOU”): this metric measures the similarity of the predicted segments with ground truth segments. Given the ground-truth point-to-segment memberships \(\hat{\mathbf {W}}\) for an input point cloud, and the predicted ones \(\mathbf {W}\), we measure: \(\frac{1}{K}\sum _{k=1}^{K}IOU(\hat{\mathbf {W}}[:, k], h(\mathbf {W}[:, k]))\) where h represents a membership conversion into a one-hot vector, and K is the number of ground-truth segments.

  • Segment labeling IOU (“label mIOU”): this metric measures the classification accuracy of primitive type prediction averaged over segments: \(\frac{1}{K} \sum _{k=1}^{K}\mathcal {I}\left[ t_k = \hat{t}_k \right] \) where \(t_k\) and \(\hat{t}_k\) is the predicted and ground truth primitive type respectively for \(k^{th}\) segment and \(\mathcal {I}\) is an indicator function.

We use Hungarian matching to find correspondences between predicted segments and ground-truth segments.

Fig. 4.
figure 4

Given the input point clouds with normals of the first row, we show surfaces produced by SPFN [11] (second row), ParSeNet without post-processing optimization (third row), and full ParSeNet including optimization (fourth row). The last row shows the ground-truth surfaces from our ABCPartsDataset.

Comparisons. We first compare our method with a nearest neighbor (NN) baseline: for each test shape, we find its most similar shape from the training set using Chamfer distance. Then for each point on the test shape, we transfer the labels and primitive type from its closest point in \(\mathcal{R}^3\) on the retrieved shape.

We also compare against efficient RANSAC algorithm [15]. The algorithm only handles basic primitives (cylinder, cone, plane, sphere, and torus), and offers poor reconstruction of B-splines patches in our dataset. Efficient RANSAC requires per point normals, which we provide as the ground-truth normals. We run RANSAC 3 times and report the performance with best coverage.

We then compare against the supervised primitive fitting (SPFN) approach [11]. Their approach produces per point segment membership, and their network is trained to maximize relaxed IOU between predicted membership and ground truth membership, whereas our approach uses learned point embeddings and clustering with mean-shift clustering to extract segments. We train SPFN network using their provided code on our training set using their proposed losses. We note that we include B-splines patches in their supported types. We train their network in two input settings: (a) the network takes only point positions as input, (b) it takes point and normals as input. We train our ParSeNet on our training set in the same two settings using our loss functions.

The performance of the above methods are shown in Table 1. The lack of B-spline fitting hampers the performance of RANSAC. The SPFN method with points and normals as input performs better compared to using only points as input. Finally, ParSeNet with only points as input performs better than all other alternatives. We observe further gains when including point normals in the input. Training ParSeNet end-to-end gives \(13.13\%\) and \(8.66\%\) improvement in segmentation mIOU and label mIOU respectively over SPFN with points and normals as input. The better performance is also reflected in Fig. 4, where our method reconstructs patches that correspond to more reasonable segmentations compared to other methods. In the supplementary material we evaluate methods on the TraceParts dataset  [11], which contains only basic primitives (cylinder, cone, plane, sphere, torus). We outperform prior work also in this dataset.

Table 2. Ablation study for B-spline fitting. The error is measured using Chamfer Distance (CD is scaled by 100). The acronyms “cp”: control-points regression loss, “dist” means patch distance loss, and “lap” means Laplacian loss. We also include the effect of post-processing optimization “opt”. We report performance with and without upsampling (“ups”) for open and closed B-splines.
Fig. 5.
figure 5

Qualitative evaluation of B-spline fitting. From top to bottom: input point cloud, reconstructed surface by SplineNet, reconstructed surface by SplineNet with post-processing optimization, reconstruction by SplineNet with control point grid adjustment and finally ground truth surface. Effect of post process optimization is highlighted in red boxes. (Color figure online)

5.2 B-Spline Fitting Evaluation

Evaluation Metrics. We evaluate the quality of our predicted B-spline patches by computing the Chamfer distance between densely sampled points on the ground-truth B-spline patches and densely sampled points on predicted patches. Points are uniformly sampled based on the 2D parameter space of the patches. We use 2K samples. We use the test set of our SplineDataset for evaluation.

Ablation Study. We evaluate the training of SplineNet using various loss functions while giving 700 points per patch as input, in Table 2. All losses contribute to improvements in performance. Table 2 shows that upsampling is effective for closed splines. Figure 5 shows the effect of optimization to improve the alignment of patches and the adjustment of resolution in the control point grid. See supplementary material for more experiments on SplineNet ’s robustness.

5.3 Reconstruction Evaluation

Evaluation Metrics. Given a point cloud \(\mathbf {P}= \{ \mathbf {p}_i\}_{i=1}^N\), ground-truth patches \(\{\cup _{k=1}^{K}\hat{\mathbf {s}}_k\}\) and predicted patches \(\{\cup _{k=1}^{K}\mathbf {s}_k\}\) for a test shape in ABCPartsDataset, we evaluate the patch-based surface reconstruction using the following:

  • Residual error (“res”) measures the average distance of input points from the predicted primitives following [11]: \(L_{dist} = \sum _{k=1}^{K} \frac{1}{M_{k}}\sum _{n \in \hat{\mathbf {s}}_k } D(\mathbf {r}_n, \mathbf {s}_k)\) where K is the number of segments, \(M_{k}\) is number of sampled points \(\mathbf {r}_n\) from ground patch \(\hat{\mathbf {s}}_k\), \(D(\mathbf {r}_n, \mathbf {s}_k)\) is the distance of \(\mathbf {r}_n\) from predicted primitive patch \(\mathbf {s}_k\).

  • P-coverage (“P-cover”) measures the coverage of predicted surface by the input surface also following [11]: \(\frac{1}{N}\sum _{i=1}^{N}\mathbf {I}\left[ \min _{k=1}^{K}D(\mathbf {p}_i, \mathbf {s}_k) < \epsilon \right] \) (\(\epsilon =0.01\)).

We note that we use the matched segments after applying Hungarian matching algorithm, as in Sect. 5.1, to compute these metrics.

Comparisons. We report the performance of RANSAC for geometric primitive fitting tasks. Note that RANSAC produces a set of geometric primitives, along with their primitive type and parameters, which we use to compute the above metrics. Here we compare with the SPFN network [11] trained on our dataset using their proposed loss functions. We augment their per point primitive type prediction to also include open/closed B-spline type. Then for classified segments as B-splines, we use our SplineNet to fit B-splines. For segments classified as geometric primitives, we use their geometric primitive fitting algorithm.

Results. Table 1 reports the performance of our method, SPFN and RANSAC. The residual error and P-coverage follows the trend of segmentation metrics. Interestingly, our method outperforms SPFN even for geometric primitive predictions (even without considering B-splines and our adaptation). Using points and normals, along with joint end-to-end training, and post-processing optimization offers the best performance for our method by giving \(47.64\%\) and \(50\%\) reduction in relative error in comparison to SPFN and RANSAC respectively.

6 Conclusion

We presented a method to reconstruct point clouds by predicting geometric primitives and surface patches common in CAD design. Our method effectively marries 3D deep learning with CAD modeling practices. Our architecture predictions are editable and interpretable. Modelers can refine our results based on standard CAD modeling operations. In terms of limitations, our method often makes mistakes for small parts, mainly because clustering merges them with bigger patches. In high-curvature areas, due to sparse sampling, ParSeNet may produce more segments than ground-truth. Producing seamless boundaries is still a challenge due to noise and sparsity in our point sets. Generating training point clouds simulating realistic scan noise is another important future direction.