
1 Introduction

The goal of this paper is to classify a cloud of points into its semantic category, be it an airplane, a bathtub or a chair. Point cloud classification is challenging, as point clouds are sets and hence invariant to point permutations. Building on the pioneering PointNet by Qi et al. [15], multiple works have proposed deep learning solutions to point cloud classification [12, 16, 23, 29, 30, 36]. Given the progress in point cloud network architectures, as well as the importance of data augmentation for improving classification accuracy and robustness, we study how data augmentation can be naturally extended to support point cloud data, especially considering the often small size of point cloud datasets (e.g., ModelNet40 [31]). In this work, we propose point cloud data augmentation by interpolation of existing training point clouds.

Fig. 1.

Interpolation between point clouds. We show the interpolation between examples from different classes (airplane/chair, and monitor/bathtub) with multiple ratios \(\lambda \). The interpolants are learned to be classified as \((1-\lambda )\) the first class and \(\lambda \) the second class. The interpolation is not obtained by learning, but induced by solving for the optimal bijective correspondence, i.e., the assignment that minimizes the overall distance each point in one point cloud moves to its assigned point in the other point cloud.

To perform data augmentation by interpolation, we take inspiration from augmentation in the image domain. Several works have shown that generating new training examples, by interpolating images and their corresponding labels, leads to improved network regularization and generalization, e.g., [8, 24, 26, 34]. Such a mixup is feasible in the image domain, due to the regular structure of images and the one-to-one correspondences between pixels. However, this setup does not generalize to the point cloud domain, since there is no one-to-one correspondence or ordering between points. To that end, we seek a method that enables interpolation between permutation-invariant point sets.

In this work, we make three contributions. First, we introduce data augmentation for point clouds through interpolation and we define the augmentation as a shortest-path interpolation. Second, we propose PointMixup, an interpolation between point clouds that computes the optimal assignment as a path function between two point clouds, or between their latent representations. The proposed interpolation strategy therefore allows the use of the successful Mixup and Manifold Mixup [26] regularizers on point clouds. We prove that (i) our PointMixup indeed finds the shortest path between two point clouds; (ii) the assignment does not change for any pair of mixed point clouds, for any interpolation ratio; and (iii) our PointMixup is a linear interpolation, an important property since labels are also linearly interpolated. Figure 1 shows two pairs of point clouds, along with our interpolations. Third, we show the empirical benefits of our data augmentation across various tasks, including classification, few-shot learning, and semi-supervised learning. We furthermore show that our approach is agnostic to the network used for classification, while it also makes the networks more robust to noise and geometric transformations of the points.

2 Related Work

Deep Learning for Point Clouds. Point clouds are unordered sets and hence early works focus on analyzing equivalent symmetric functions which ensure permutation invariance [15, 17, 33]. The pioneering PointNet work by Qi et al. [15] presented the first deep network that operates directly on unordered point sets. It learns a global feature with shared multi-layer perceptrons and a max pooling operation to ensure permutation invariance. PointNet++ [16] extends this idea with a hierarchical structure, relying on heuristic farthest point sampling and grouping to build the hierarchy. Likewise, other recent methods learn hierarchical local features by grouping points in various manners [10, 12, 23, 29, 30, 32, 36]. Li et al. [12] propose to learn a transformation from the input points to simultaneously solve the weighting of input point features and the permutation of points into a latent and potentially canonical order. Xu et al. [32] extend 2D convolution to 3D point clouds by parameterizing a family of convolution filters. Wang et al. [29] propose to leverage neighborhood structures in both the point and feature spaces.

In this work, we aim to improve point cloud classification for any point-based approach. To that end, we propose a new model-agnostic data augmentation: a Mixup regularization for point clouds that can build on various architectures and reduces the generalization error in classification. A very recent work by Li et al. [11] also considers improving point cloud classification by augmentation. They rely on auto-augmentation and a complicated adversarial training procedure, whereas in this work we propose to augment point clouds by interpolation.

Interpolation-Based Regularization. Employing regularization approaches to improve the generalization performance of deep neural networks has become standard practice in deep learning. Recent works consider regularization by interpolating example and label pairs, commonly known as Mixup [8, 24, 34]. Manifold Mixup [26] extends Mixup by interpolating the hidden representations at multiple layers. Recently, efforts have been made to apply Mixup to various tasks such as object detection [35] and segmentation [7]. Different from existing works, which are predominantly employed in the image domain, we propose a new optimal-assignment Mixup paradigm for point clouds, in order to deal with their permutation-invariant nature.

Recently, Mixup [34] has also been investigated from a semi-supervised learning perspective [2, 3, 27]. MixMatch [3] guesses low-entropy labels for unlabelled augmented examples and mixes labelled and unlabelled data using Mixup [34]. Interpolation Consistency Training [27] utilizes a consistency constraint between the interpolation of unlabelled points and the interpolation of the predictions at those points. In this work, we show that our PointMixup can be integrated in such frameworks to enable semi-supervised learning for point clouds.

3 Point Cloud Augmentation by Interpolation

3.1 Problem Setting

In our setting, we are given a training set \(\{(S_m,c_m)\}_{m=1}^{M}\) consisting of M point clouds. \(S_m = \{p^{m}_n\}_{n=1}^{N} \in \mathcal {S}\) is a point cloud consisting of N points, \(p^{m}_n \in \mathbb {R}^3\) is the 3D point, \(\mathcal {S}\) is the set of such 3D point clouds with N elements. \(c_m \in \{0,1\}^C\) is the one-hot class label for a total of C classes. The goal is to train a function \(h: \mathcal {S} \mapsto [0,1]^C\) that learns to map a point cloud to a semantic label distribution. Throughout our work, we remain agnostic to the type of function h used for the mapping and we focus on data augmentation to generate new examples.

Data augmentation is an integral part of training deep neural networks, especially when the size of the training data is limited compared to the size of the model parameters. A popular data augmentation strategy is Mixup [34]. Mixup performs augmentation in the image domain by linearly interpolating pixels, as well as labels. Specifically, let \(I_1 \in \mathbb {R}^{W \times H \times 3}\) and \(I_2 \in \mathbb {R}^{W \times H \times 3}\) denote two images. Then a new image and its label are generated as:

$$\begin{aligned} I_{\text {mix}} (\lambda )&= (1 - \lambda )\cdot I_1 + \lambda \cdot I_2, \end{aligned}$$
(1)
$$\begin{aligned} c_{\text {mix}} (\lambda )&= (1-\lambda ) \cdot c_1 + \lambda \cdot c_2, \end{aligned}$$
(2)

where \(\lambda \in [0,1]\) denotes the mixup ratio. Usually \(\lambda \) is sampled from a beta distribution \(\lambda \sim \text {Beta}(\gamma , \gamma )\). Such a direct interpolation is feasible for images as the data is aligned. In point clouds, however, linear interpolation is not straightforward. The reason is that point clouds are sets of points in which the point elements are orderless and permutation-invariant. We must, therefore, seek a definition of interpolation on unordered sets.
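For reference, image-domain Mixup (Eqs. 1 and 2) amounts to a few lines of code. The following is a minimal sketch, assuming PyTorch tensors; the function and variable names are illustrative and not taken from any released implementation.

```python
# Minimal sketch of image-domain Mixup (Eqs. 1 and 2); names are illustrative.
import torch

def mixup_images(img1, img2, label1, label2, gamma):
    """Interpolate two images and their one-hot labels with a Beta-sampled ratio."""
    lam = torch.distributions.Beta(gamma, gamma).sample().item()  # lambda ~ Beta(gamma, gamma)
    img_mix = (1.0 - lam) * img1 + lam * img2        # Eq. 1: pixel-wise interpolation
    label_mix = (1.0 - lam) * label1 + lam * label2  # Eq. 2: label interpolation
    return img_mix, label_mix, lam
```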

3.2 Interpolation Between Point Clouds

Let \(S_1 \in \mathcal {S}\) and \(S_2 \in \mathcal {S}\) denote two training examples on which we seek to perform interpolation with ratio \(\lambda \) to generate new training examples. Given a pair of source examples \(S_1\) and \(S_2\), an interpolation function \(f_{S_1 \rightarrow S_2}: [0,1] \mapsto \mathcal {S}\) can be any continuous function, which forms a curve that joins \(S_1\) and \(S_2\) in a metric space \((\mathcal {S}, d)\) with a proper distance function d. This means that it is up to us to define what makes an interpolation good. We define the concept of shortest-path interpolation in the context of point clouds:

Definition 1 (Shortest-path interpolation)

In a metric space \((\mathcal {S}, d)\), a shortest-path interpolation \(f^*_{S_1 \rightarrow S_2}: [0,1] \mapsto \mathcal {S}\) is an interpolation between the given pair of source examples \(S_1 \in \mathcal {S}\) and \(S_2 \in \mathcal {S}\), such that for any \(\lambda \in [0,1]\), \( d(S_1, S^{(\lambda )}) + d(S^{(\lambda )},S_2) = d(S_1, S_2)\) holds, where \(S^{(\lambda )} = f^*_{S_1 \rightarrow S_2} (\lambda )\) is the interpolant.

We say that Definition 1 ensures the shortest path property because the triangle inequality holds for any properly defined distance d: \( d(S_1, S^{(\lambda )}) + d(S^{(\lambda )},S_2) \ge d(S_1, S_2)\). The intuition behind this definition is that the shortest path property ensures the uniqueness of the label distribution on the interpolated data. To put it otherwise, when computing interpolants from different sources, the interpolants generated by a shortest-path interpolation are more likely to be discriminative than those generated by a non-shortest-path interpolation (Fig. 2).

Fig. 2.

Intuition of shortest-path interpolation. The examples live in a metric space \((\mathcal {S}, d)\), shown as dots in the figure. The dashed lines are the interpolation paths between different pairs of examples. When the shortest-path property is ensured (left), the interpolation paths from different pairs of source examples are unlikely to intersect in a complicated metric space. In a non-shortest-path interpolation (right), the paths intertwine with each other with a much higher probability, making it hard to tell which pair of source examples the mixed data comes from.

To define an interpolation for point clouds, we must therefore first select a reasonable distance metric. We then opt for the shortest-path interpolation function based on the selected distance metric. For point clouds, a proper distance metric is the Earth Mover’s Distance (EMD), as it captures not only the geometry between two point clouds, but also local details and density distributions [1, 5, 13]. EMD measures the least amount of total displacement required for each of the points in the first point cloud, \(x_{i} \in S_1\), to match a corresponding point in the second point cloud, \(y_{j} \in S_2\). Formally, the EMD for point clouds solves the following assignment problem:

$$\begin{aligned} \phi ^{*} = \mathop {\arg \min }_{\phi \in \mathbf {\Phi }} \sum _{i} \Vert x_{i} - y_{\phi (i)}\Vert _{2}, \end{aligned}$$
(3)

where \( \mathbf {\Phi }=\{ \{1,\dots , N\} \mapsto \{1, \dots , N\} \}\) is the set of possible bijective assignments, which give one-to-one correspondences between points in the two point clouds. Given the optimal assignment \(\phi ^*\), the EMD is then defined as the average effort to move \(S_1\) points to \(S_2\):

$$\begin{aligned} d_\text {EMD} = \frac{1}{N} \sum _{i} \Vert x_{i} - y_{\phi ^*(i)}\Vert _{2}. \end{aligned}$$
(4)
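The assignment in Eq. 3 can be solved exactly with the Hungarian algorithm, although in practice we adapt the faster approximate solver of [13] (see Sect. 4.1). The following is a small reference sketch, assuming (N, 3) NumPy arrays; the helper names (optimal_assignment, emd) are illustrative and not from any released code.

```python
# Reference sketch of the optimal assignment (Eq. 3) and the EMD (Eq. 4).
# The exact Hungarian solver below is O(N^3); the paper instead adapts an
# approximate EMD solver [13], which is faster for N = 1024 points.
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment(S1, S2):
    """S1, S2: (N, 3) arrays. Returns phi such that S1[i] is matched to S2[phi[i]]."""
    cost = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=-1)  # (N, N) pairwise distances
    row_ind, col_ind = linear_sum_assignment(cost)  # bijection minimizing the summed distance
    phi = np.empty(len(S1), dtype=int)
    phi[row_ind] = col_ind
    return phi

def emd(S1, S2):
    """Average moving distance under the optimal assignment (Eq. 4)."""
    phi = optimal_assignment(S1, S2)
    return float(np.mean(np.linalg.norm(S1 - S2[phi], axis=-1)))
```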

3.3 PointMixup: Optimal Assignment Interpolation for Point Clouds

We propose an interpolation strategy that can be used for augmentation analogous to Mixup [34], but for point clouds. We refer to this proposed PointMixup as Optimal Assignment (OA) Interpolation, as it relies on the optimal assignment on the basis of the EMD to define the interpolation between clouds. Given the source pair of point clouds \(S_1 = \{ x_i \}_{i=1}^{N}\) and \(S_2 = \{ y_j \}_{j=1}^{N}\), the Optimal Assignment (OA) interpolation is a path function \(f^*_{S_1 \rightarrow S_2}: [0,1] \mapsto \mathcal {S}\). With \(\lambda \in [0,1]\),

$$\begin{aligned} f^*_{S_1 \rightarrow S_2} (\lambda )&= \{ u_i \}_{i=1}^{N}, \quad \text {where}\end{aligned}$$
(5)
$$\begin{aligned} u_i = (1-\lambda )&\cdot x_i + \lambda \cdot y_{\phi ^*(i)}, \end{aligned}$$
(6)

in which \(\phi ^*\) is the optimal assignment from \(S_1\) to \(S_2\) defined by Eq. 3. Then the interpolant \(S_\mathbf{OA} ^{S_1 \rightarrow S_2, (\lambda )}\) (or \(S_\mathbf{OA} ^{(\lambda )} \) when there is no confusion) generated by the OA interpolation path function \(f^*_{S_1 \rightarrow S_2} (\lambda )\) is the required augmented data for point cloud Mixup.

$$\begin{aligned} S_\mathbf{OA} ^{(\lambda )} = \{ (1-\lambda ) \cdot x_i + \lambda \cdot y_{\phi ^*(i)}\}_{i=1}^{N}. \end{aligned}$$
(7)

Viewing \(f^*_{S_1 \rightarrow S_2}\) as a path function in the metric space \((\mathcal {S}, d_\text {EMD})\), we expect it to be the shortest path joining \(S_1\) and \(S_2\), since the definition of the interpolation is induced from the EMD.
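In code, Eq. 7 reduces to a point-wise interpolation along the optimal assignment, with the labels mixed at the same ratio. The sketch below assumes the optimal_assignment() helper sketched in Sect. 3.2; names are illustrative.

```python
# Sketch of PointMixup (Eq. 7): mix two point clouds along the optimal assignment
# and mix their one-hot labels with the same ratio.
import numpy as np

def point_mixup(S1, S2, c1, c2, lam):
    """S1, S2: (N, 3) point clouds; c1, c2: one-hot labels; lam in [0, 1]."""
    phi = optimal_assignment(S1, S2)           # optimal assignment from S1 to S2 (Eq. 3)
    S_mix = (1.0 - lam) * S1 + lam * S2[phi]   # Eq. 7: point-wise interpolation
    c_mix = (1.0 - lam) * c1 + lam * c2        # label interpolation, as in Eq. 2
    return S_mix, c_mix
```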

3.4 Analysis

Intuitively we expect that PointMixup is a shortest-path linear interpolation. That is, the interpolation lies on the shortest path joining the source pairs, and the interpolation is linear with regard to \(\lambda \) in \((\mathcal {S}, d_\text {EMD})\), since the definition of the interpolation is derived from the EMD. However, it is non-trivial to show that the optimal assignment interpolation constitutes a shortest-path linear interpolation, because the optimal assignment between the mixed point cloud and either of the source point clouds is unknown. It is, therefore, not obvious whether there exists a shorter path between the mixed examples and the source examples. To this end, we provide an in-depth analysis.

To ensure the uniqueness of the label distribution for the mixed data, we need to show that the shortest path property w.r.t. the EMD is fulfilled. Moreover, we need to show that the proposed interpolation is linear w.r.t. the EMD, in order to ensure that the input interpolation has the same ratio as the label interpolation. In addition, we establish the assignment invariance property as a prerequisite for the proof of linearity. This property implies that there exists no shorter path between interpolants with different \(\lambda \), i.e., the shortest path between the interpolants is part of the shortest path between the source examples. Due to space limitations, we sketch the proof for each property. The complete proofs are available in the supplementary material.

We start with the shortest path property. Since the EMD for point clouds is a metric, the triangle inequality \(d_{EMD}(A, B)+ d_{EMD}(B,C) \ge d_{EMD} (A,C)\) holds (for which a formal proof can be found in [19]). Thus we formalize the shortest path property into the following proposition:

Property 1 (shortest path)

Given the source examples \(S_1\) and \(S_2\), \(\forall \lambda \in [0,1],\) \(d_\text {EMD}(S_1, S_\mathbf{OA }^{(\lambda )}) + d_\text {EMD}(S_\mathbf{OA }^{(\lambda )}, S_2 ) = d_\text {EMD}(S_1, S_2)\).

Sketch of Proof. From the definition of the EMD we can derive \(d_{\text {EMD}} (S_1, S_\mathbf{OA }^{(\lambda )}) + d_{\text {EMD}} (S_2, S_\mathbf{OA }^{(\lambda )}) \le d_{\text {EMD}} (S_1, S_2)\). Then from the triangle inequality of the EMD, only the equality remains.   \(\square \)
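Concretely, the inequality in the sketch follows because the identity assignment is a feasible (though not necessarily optimal) candidate assignment in both directions; the full argument is given in the supplementary material:

$$\begin{aligned} d_\text {EMD}(S_1, S_\mathbf{OA }^{(\lambda )})&\le \frac{1}{N} \sum _{i} \Vert x_i - u_i\Vert _{2} = \frac{\lambda }{N} \sum _{i} \Vert x_i - y_{\phi ^*(i)}\Vert _{2} = \lambda \cdot d_\text {EMD}(S_1, S_2), \\ d_\text {EMD}(S_\mathbf{OA }^{(\lambda )}, S_2)&\le \frac{1}{N} \sum _{i} \Vert u_i - y_{\phi ^*(i)}\Vert _{2} = (1-\lambda ) \cdot d_\text {EMD}(S_1, S_2), \end{aligned}$$

with \(u_i = (1-\lambda ) \cdot x_i + \lambda \cdot y_{\phi ^*(i)}\). Summing the two bounds gives the \(\le \) direction, and the triangle inequality gives \(\ge \), hence equality.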

We then introduce the assignment invariance property of the OA Mixup as an intermediate step for the proof of the linearity of OA Mixup. The property shows that the assignment does not change for any pair of mixed point clouds with different \(\lambda \). Moreover, the assignment invariance property implies that the shortest path between any two mixed point clouds is part of the shortest path between the two source point clouds.

Property 2 (assignment invariance)

Let \(S_\mathbf{OA }^{(\lambda _1)}\) and \(S_\mathbf{OA }^{(\lambda _2)}\) be two mixed point clouds from the same given source pair of examples \(S_1\) and \(S_2\), with mix ratios \(\lambda _1\) and \(\lambda _2\) such that \(0\le \lambda _1 < \lambda _2 \le 1\). Let the points in \(S_\mathbf{OA }^{(\lambda _1)}\) and \(S_\mathbf{OA }^{(\lambda _2)}\) be \(u_i = (1-\lambda _1) \cdot x_i + \lambda _1 \cdot y_{\phi ^*(i)}\) and \(v_k = (1-\lambda _2) \cdot x_k + \lambda _2 \cdot y_{\phi ^*(k)}\), where \(\phi ^*\) is the optimal assignment from \(S_1\) to \(S_2\). Then the identity assignment \(\phi _I\) is the optimal assignment from \(S_\mathbf{OA }^{(\lambda _1)}\) to \(S_\mathbf{OA }^{(\lambda _2)}\).

Sketch of Proof. We first prove that the identity mapping is the optimal assignment from \(S_1\) to \(S_\mathbf{OA }^{(\lambda _1)}\), from the definition of the EMD. Then we prove that \(\phi ^*\) is the optimal assignment from \(S_\mathbf{OA }^{(\lambda _1)}\) to \(S_2\). Finally, we prove that the identity mapping is the optimal assignment from \(S_\mathbf{OA }^{(\lambda _1)}\) to \(S_\mathbf{OA }^{(\lambda _2)}\), similarly to the proof of the first intermediate argument.   \(\square \)

Given the property of assignment invariance, the linearity follows:

Property 3 (linearity)

For any mix ratios \(\lambda _1\) and \(\lambda _2\) such that \(0\le \lambda _1 < \lambda _2 \le 1\), the mixed point clouds \(S_\mathbf{OA }^{(\lambda _1)}\) and \(S_\mathbf{OA }^{(\lambda _2)}\) satisfy \(d_\text {EMD}(S_\mathbf{OA }^{(\lambda _1)}, S_\mathbf{OA }^{(\lambda _2)}) = (\lambda _2 -\lambda _1) \cdot d_\text {EMD}(S_1, S_2)\).

Sketch of Proof. The proof follows directly from the fact that the identity mapping is the optimal assignment between \(S_\mathbf{OA }^{(\lambda _1)}\) and \(S_\mathbf{OA }^{(\lambda _2)}\).    \(\square \)

The linear property of our interpolation is important, as we jointly interpolate the point clouds and the labels. By ensuring that the point cloud interpolation is linear, we ensure that the input interpolation has the same ratio as the label interpolation.

On the basis of these properties, we conclude that PointMixup is a shortest-path linear interpolation between point clouds in \((\mathcal {S}, d_\text {EMD})\).

3.5 Manifold PointMixup: Interpolate Between Latent Point Features

In standard PointMixup, only the inputs, i.e., the XYZ point cloud coordinates, are mixed. The input XYZ coordinates carry low-level geometric information and are sensitive to disturbances and transformations, which in turn limits the robustness of PointMixup. Inspired by Manifold Mixup [26], we can also use the proposed interpolation solution to mix the latent representations in the hidden layers of point cloud networks, which are trained to capture salient and high-level information that is less sensitive to transformations. PointMixup can thus be applied, in the spirit of Manifold Mixup, to mix both the XYZ coordinates and different levels of latent point cloud features, maintaining their respective advantages, which is expected to be a stronger regularizer for improved performance and robustness.

We describe how to mix the latent representations. Following [26], at each batch we randomly select a layer l from a set of layers L, which includes the input layer, and perform PointMixup at that layer. In a point cloud network, the intermediate latent representation at layer l (before the global aggregation stage, such as the max pooling aggregation in PointNet [15] and PointNet++ [16]) is \(Z_{(l)} = \{(x_i, z_i^{(x)})\}_{i=1}^{N_z}\), in which \(x_i\) is the 3D point coordinate and \(z_i^{(x)}\) is the corresponding high-dimensional feature. Given that the latent representations of the two source examples are \(Z_{(l),1} = \{(x_i, z_i^{(x)})\}_{i=1}^{N_z}\) and \(Z_{(l),2} = \{(y_i, z_i^{(y)})\}_{i=1}^{N_z}\), the optimal assignment \(\phi ^*\) is obtained from the 3D point coordinates \(x_i\), and the mixed latent representation then becomes

$$\begin{aligned} Z_{(l),\mathbf{OA} }^{(\lambda )}&= \{(x^{\text {mix}}_i, z_i^{\text {mix}})\}, \quad \quad \text {where} \\ x^{\text {mix}}_i&= (1-\lambda ) \cdot x_i + \lambda \cdot y_{\phi ^*(i)}, \\ z_i^{\text {mix}}&= (1-\lambda ) \cdot z_i^{(x)} + \lambda \cdot z^{(y)}_{\phi ^*(i)}. \end{aligned}$$

Specifically, in PointNet++, three layers of representations are randomly selected to perform Manifold Mixup: the input, and the representations after the first and the second SA modules (see the appendix of [16]).
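A minimal sketch of this latent mixing is given below, reusing the optimal_assignment() helper from Sect. 3.2: the assignment is computed on the 3D coordinates only, and the per-point features are then mixed with the same assignment and ratio. Array shapes and names are illustrative assumptions.

```python
# Sketch of Manifold PointMixup at a hidden layer l: solve the assignment on the
# 3D coordinates, then mix coordinates and per-point features with the same ratio.
import numpy as np

def manifold_point_mixup(xyz1, feat1, xyz2, feat2, lam):
    """xyz*: (N_z, 3) point coordinates at layer l; feat*: (N_z, D) per-point features."""
    phi = optimal_assignment(xyz1, xyz2)               # assignment from coordinates only
    xyz_mix = (1.0 - lam) * xyz1 + lam * xyz2[phi]     # mixed coordinates
    feat_mix = (1.0 - lam) * feat1 + lam * feat2[phi]  # mixed latent features
    return xyz_mix, feat_mix
```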

4 Experiments

4.1 Setup

Datasets. We focus in our experiments on the ModelNet40 dataset [31]. This dataset contains 12,311 CAD models from 40 man-made object categories, split into 9,843 for training and 2,468 for testing. We furthermore perform experiments on the ScanObjectNN dataset [25]. This dataset consists of real-world point cloud objects, rather than sampled virtual point clouds. The dataset consists of 2,902 objects and 15 categories. We report on two variants of the dataset, a standard variant OBJ_ONLY and one with heavy perturbations from rigid transformations PB_T50_RS [25].

Following [12], we distinguish between settings where each dataset is pre-aligned and where it is unaligned, i.e., with horizontal rotation applied to training and test point cloud examples. For the unaligned setting, we randomly rotate the training point clouds along the up-axis. Then, before solving the optimal assignment, we perform a simple additional alignment step to fit and align the symmetry axes of the two point clouds to be mixed. This way, the point clouds are better aligned and we obtain more reasonable point correspondences. Last, we also perform experiments using only 20% of the training data (Fig. 3).

Fig. 3.

Baseline interpolation variants. Top: point cloud interpolation through random assignment. Bottom: interpolation through sampling.

Network Architectures. The main network architecture used throughout the paper is PointNet++ [16]. We also report results with PointNet [15] and DGCNN [29], to show that our approach is agnostic to the architecture that is employed. PointNet learns a permutation-invariant set function, which does not capture local structures induced by the metric space the points live in. PointNet++ is a hierarchical structure, which segments a point cloud into smaller clusters and applies PointNet locally. DGCNN performs hierarchical operations by selecting a local neighbor in the feature space instead of the point space, resulting in each point having different neighborhoods in different layers.

Experimental Details. We uniformly sample 1,024 points on the mesh faces according to the face area and normalize them to be contained in a unit sphere, which is a standard setting [12, 15, 16]. In case of mixing clouds with a different number of points, we can simply replicate random elements from each point set to reach the same cardinality. During training, we augment the point clouds on-the-fly with random jitter for each point using Gaussian noise with zero mean and 0.02 standard deviation. We implement our approach in PyTorch [14]. For network optimization, we use the Adam optimizer with an initial learning rate of \(10^{-3}\). The model is trained for 300 epochs with a batch size of 16. We follow previous work [26, 34] and draw \(\lambda \) from a beta distribution \(\lambda \sim \text {Beta} (\gamma , \gamma )\). We also perform Manifold Mixup [26] in our approach, through interpolation on the transformed and pooled points in intermediate network layers. In this work, we opt for the efficient algorithm and adapt the open-source implementation from [13] to solve the optimal assignment approximately. Training for 300 epochs takes around 17 h without augmentation and around 19 h with PointMixup or Manifold PointMixup on a single NVIDIA GTX 1080 Ti.
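Putting the pieces together, the on-the-fly augmentation step for a pair of training examples looks roughly as follows. This is a sketch that assumes the point_mixup() helper from Sect. 3.3; the jitter value follows the text above, \(\gamma \) follows the values reported in Sect. 4.2, and the remaining names are illustrative.

```python
# Sketch of the on-the-fly training augmentation: per-point Gaussian jitter,
# a Beta-sampled mixing ratio, and PointMixup of the pair.
import numpy as np

def augment_pair(S1, c1, S2, c2, gamma=1.0, jitter_sigma=0.02):
    """S1, S2: (N, 3) point clouds; c1, c2: one-hot labels."""
    S1 = S1 + np.random.normal(0.0, jitter_sigma, S1.shape)  # random jitter per point
    S2 = S2 + np.random.normal(0.0, jitter_sigma, S2.shape)
    lam = np.random.beta(gamma, gamma)                        # lambda ~ Beta(gamma, gamma)
    return point_mixup(S1, S2, c1, c2, lam)                   # mixed cloud and mixed label
```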

Baseline Interpolations. For our comparisons to baseline point cloud augmentations, we compare to two variants. The first variant is random assignment interpolation, where a random assignment \(\phi ^\mathbf{RA} \) is used, to connect points from both sets, yielding:

$$S_\mathbf{RA }^{(\lambda )} = \{(1-\lambda ) \cdot x_i + \lambda \cdot y_{\phi ^\mathbf{RA} (i)}\}.$$

The second variant is point sampling interpolation, where random draws without replacement of points from each set are made according to the sampling frequency \(\lambda \):

$$S_\mathbf{PS }^{(\lambda )} = S_1^{(1-\lambda )} \cup S_2^{(\lambda )},$$

where \(S_2^{(\lambda )}\) denotes a randomly sampled subset of \(S_2\) with \( \lfloor \lambda N \rfloor \) elements (\(\lfloor \cdot \rfloor \) is the floor function), and similarly \(S_1^{(1-\lambda )}\) is a randomly sampled subset of \(S_1\) with \(N - \lfloor \lambda N \rfloor \) elements, such that \(S_\mathbf{PS }^{(\lambda )}\) contains exactly N points. The intuition of the point sampling variant is that, for point clouds as unordered sets, one can move from one point cloud to another through a set operation that removes several random elements from \(S_1\) and replaces them with the same number of elements from \(S_2\).
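Both baselines are straightforward to sketch; the snippet below assumes (N, 3) NumPy arrays and illustrative function names, with the labels mixed at ratio \(\lambda \) in the same way as for PointMixup.

```python
# Sketches of the two baseline interpolations used for comparison.
import numpy as np

def random_assignment_mixup(S1, S2, lam):
    """Mix along a random bijective assignment instead of the optimal one."""
    phi_ra = np.random.permutation(len(S1))
    return (1.0 - lam) * S1 + lam * S2[phi_ra]

def point_sampling_mixup(S1, S2, lam):
    """Union of a (1 - lambda)-fraction of S1 and a lambda-fraction of S2 (N points in total)."""
    n2 = int(np.floor(lam * len(S1)))                        # floor(lambda * N) points from S2
    idx2 = np.random.choice(len(S2), n2, replace=False)
    idx1 = np.random.choice(len(S1), len(S1) - n2, replace=False)
    return np.concatenate([S1[idx1], S2[idx2]], axis=0)
```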

4.2 Point Cloud Classification Ablations

We perform four ablation studies to show the workings of our approach with respect to the interpolation ratio, the comparison to baseline interpolations and other regularizers, as well as the robustness to noise.

Fig. 4.

Effect of interpolation ratios. MM denotes Manifold Mixup.

Effect of Interpolation Ratio. The first ablation study focuses on the effect of the interpolation ratio in the data augmentation for point cloud classification. We perform this study on ModelNet40 using the PointNet++ architecture. The results are shown in Fig. 4 for the pre-aligned setting. We find that regardless of the interpolation ratio used, our approach provides a boost over the setting without augmentation by interpolation. PointMixup positively influences point cloud classification. The inclusion of manifold mixup adds a further boost to the scores. Throughout further experiments, we use \(\gamma =0.4\) for input mixup and \(\gamma =1.5\) for manifold mixup in the unaligned setting, and \(\gamma =1.0\) for input mixup and \(\gamma =2.0\) for manifold mixup in the pre-aligned setting.

Comparison to Baseline Interpolations. In the second ablation study, we investigate the effectiveness of our PointMixup compared to the two interpolation baselines. We again use ModelNet40 and PointNet++. We perform the evaluation on both the pre-aligned and unaligned dataset variants, where for both we also report results with a reduced training set. The results are shown in Table 1. Across both the alignment variants and dataset sizes, our PointMixup obtains favorable results. This result highlights the effectiveness of our approach, which abides by the shortest-path linear interpolation definition, while the baselines do not.

Table 1. Comparison of PointMixup to baseline interpolations on ModelNet40 using PointNet++. PointMixup compares favorably to excluding interpolation and to the baselines, highlighting the benefits of our shortest path interpolation solution.
Table 2. Evaluating our approach against other data augmentations (left) and its robustness to noise and transformations (right). We find that our approach with manifold mixup (MM) outperforms augmentations such as label smoothing and other variations of mixup. For the robustness evaluation, we find that our approach, with the strong regularization power of manifold mixup, provides more robustness to random noise and geometric transformations.

PointMixup with Other Regularizers. Third, we evaluate how well PointMixup works compared to multiple existing data regularizers and mixup variants, again on ModelNet40 and PointNet++. We investigate the following augmentations: (i) Mixup [34], (ii) Manifold Mixup [26], (iii) mixing the input only without target mixup, (iv) mixing the latent representation at a fixed layer (manifold mixup does so at random layers), and (v) label smoothing [22]. Training is performed on the reduced dataset to better highlight their differences. We show the results in Table 2 on the left. Our approach with manifold mixup obtains the highest scores. The label smoothing regularizer is outperformed, while we also obtain better scores than the mixup variants. We conclude that PointMixup forms an effective data augmentation for point clouds.

Robustness to Noise. By adding additional augmented training examples, we enrich the dataset. This enrichment comes with additional robustness with respect to noise in the point clouds. We evaluate the robustness by adding random noise perturbations on point location, scale, translation and different rotations. Note that for the evaluation of robustness against up-axis rotation, we use the models trained in the pre-aligned setting, in order to also test the performance against rotation along the up-axis as a novel transform. The results are in Table 2 on the right. Overall, our approach including manifold mixup provides more stability across all perturbations. For example, with additional noise (\(\sigma =0.05\)), we obtain an accuracy of 56.5, compared to 35.1 for the baseline. We observe similar trends for scaling (with a factor of two), with an accuracy of 72.9 versus 59.2. We conclude that PointMixup makes point cloud networks such as PointNet++ more stable to noise and rigid transformations.

Fig. 5.

Qualitative examples of PointMixup. We provide eight visualizations of our interpolation. The four examples on the left show interpolations for different configurations of cups and tables. The four examples on the right show interpolations for different chairs and cars.

Qualitative Analysis. In Fig. 5, we show eight examples of PointMixup for point cloud interpolation; four interpolations of cups and tables, and four interpolations of chairs and cars. Through our shortest path interpolation, we end up with new training examples that exhibit characteristics of both classes, making for sensible point clouds and mixed labels, which in turn indicates why PointMixup is beneficial for point cloud classification.

4.3 Evaluation on Other Networks and Datasets

With PointMixup, new point clouds are generated by interpolating existing point clouds. As such, we are agnostic to the type of network or dataset. To highlight this ability, we perform additional experiments on extra networks and an additional point cloud dataset.

Table 3. PointMixup on other networks (left) and another dataset (right). We find our approach is beneficial regardless of the network or dataset.

PointMixup on Other Network Architectures. We show the effect of PointMixup on two other networks, namely PointNet [15] and DGCNN [29]. The experiments are performed on ModelNet40. For PointNet, we perform the evaluation in the unaligned setting and for DGCNN in the pre-aligned setting, to remain consistent with the alignment choices made in the respective papers. The results are shown in Table 3 on the left. We find improvements when including PointMixup for both network architectures.

PointMixup on Real-World Point Clouds. We also investigate PointMixup on point clouds from real-world object scans, using ScanObjectNN [25], which collects objects from 3D scenes in SceneNN [9] and ScanNet [4]. Here, we rely on PointNet++ as the network. The results in Table 3 on the right show that we can adequately deal with real-world point cloud scans, hence we are not restricted to point clouds from virtual scans. This result is in line with the experiments on point cloud perturbations.

4.4 Beyond Standard Classification

The fewer training examples available, the stronger the need for additional examples through augmentation. Hence, we train PointNet++ on ModelNet40 in both a few-shot and semi-supervised setting.

Semi-supervised Learning. Semi-supervised learning learns from a dataset where only a small portion of the data is labeled. Here, we show how PointMixup directly enables semi-supervised learning for point clouds. We start from Interpolation Consistency Training [27], a state-of-the-art semi-supervised approach, which utilizes Mixup between unlabelled points. Here, we use our Mixup for point clouds within their semi-supervised approach. We evaluate on ModelNet40 using 400, 600, and 800 labeled point clouds. The results of semi-supervised learning are shown in Table 4 on the left. Compared to the supervised baseline, which only uses the available labelled examples, our mixup enables the use of additional unlabelled training examples, resulting in a clear boost in scores. With 800 labelled examples, the accuracy increases from 73.5% to 82.0%, highlighting the effectiveness of PointMixup in a semi-supervised setting.
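To make the integration concrete, the sketch below illustrates the consistency term when PointMixup replaces image Mixup in Interpolation Consistency Training: the prediction on the mixed unlabelled point clouds is pushed towards the mixture of the individual predictions. The model interface, teacher/student details and loss weighting follow [27] and are simplified here; all names are illustrative assumptions and reuse the optimal_assignment() helper from Sect. 3.2.

```python
# Hedged sketch of the ICT consistency term with PointMixup on unlabelled point clouds.
import torch
import torch.nn.functional as F

def ict_consistency_loss(model, S_u1, S_u2, lam):
    """S_u1, S_u2: (N, 3) unlabelled point clouds as CPU float tensors."""
    with torch.no_grad():
        p1 = model(S_u1.unsqueeze(0))                 # class distribution predicted on S_u1
        p2 = model(S_u2.unsqueeze(0))                 # class distribution predicted on S_u2
    phi = torch.as_tensor(optimal_assignment(S_u1.numpy(), S_u2.numpy()))
    S_mix = (1.0 - lam) * S_u1 + lam * S_u2[phi]      # PointMixup of the unlabelled inputs
    p_target = (1.0 - lam) * p1 + lam * p2            # mixture of the predictions
    return F.mse_loss(model(S_mix.unsqueeze(0)), p_target)
```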

Table 4. Evaluating PointMixup in the context of semi-supervised (left) and few-shot learning (right). When examples are scarce, as is the case for both settings, using our approach provides a boost to the scores.

Few-Shot Learning. Few-shot classification aims to learn a classifier that recognizes classes unseen during training from limited examples. We follow [6, 18, 20, 21, 28] and regard few-shot learning as a meta-learning problem, which learns how to learn from limited labeled data by training on a collection of tasks, i.e., episodes. In an N-way K-shot setting, in each task, N classes are selected and K examples for each class are given as a support set, while the query set consists of the examples to be predicted. We perform few-shot classification on ModelNet40, from which we select 20 classes for training, 10 for validation, and 10 for testing. We utilize PointMixup within ProtoNet [20] by constructing mixed examples from the support set and updating the model with the mixed examples before making predictions on the query set. We refer to the supplementary material for the details of our method and the settings. The results in Table 4 on the right show that incorporating our data augmentation provides a boost in scores, especially in the one-shot setting, where the accuracy increases from 72.3% to 77.2%.

5 Conclusion

This work proposes PointMixup for data augmentation on point clouds. Given the lack of data augmentation by interpolation on point clouds, we start by defining it as a shortest-path linear interpolation. We show how to obtain PointMixup between two point clouds by means of an optimal assignment interpolation between their point sets. As such, we arrive at a Mixup for point clouds, or for latent point cloud representations in the sense of Manifold Mixup, that can handle the permutation-invariant nature of point clouds. We first prove that PointMixup abides by our shortest-path linear interpolation definition. Then, we show through various experiments that PointMixup matters for point cloud classification. We show that our approach outperforms baseline interpolations and regularizers. Moreover, we highlight increased robustness to noise and geometric transformations, as well as general applicability to point-based networks and datasets. Lastly, we show the potential of our approach in both semi-supervised and few-shot settings. The generic nature of PointMixup allows for a comprehensive embedding in point cloud classification.