1 Introduction

The main goal of this paper is to provide the reader with an extensive comparison of existing methods of 3D object segmentation. Such algorithms are intensively developed in the computer graphics community, where segmentation is useful for classification but can also attach semantic meaning to an object. Robotic systems can exploit this information as well: for example, if a robot is able to segment the handle of a mug, it can use this knowledge to perform a successful grasp. Transferring the methods presented in this paper to the robotics community will therefore be beneficial, as it will ease the development of semantic robotics. Most of the reviewed methods are based on machine learning (ML) techniques, making 3D object segmentation a very promising field of application for this group of algorithms. The main contribution of this paper is an extensive comparison of existing methods of 3D object segmentation. The following section describes the dataset. Then, an overview of the tested methods is given, together with a short presentation of each algorithm. Next, the measures used for comparing the algorithms are described. Subsequently, the results are presented and discussed. Finally, concluding remarks are given.

2 Materials and Methods

2.1 Dataset Used for Testing

We start the presentation of the dataset used for the comparison with a short description of 3D shape representations. There are three main possibilities: voxels, point clouds and meshes. The first is used primarily in robotics, and its main advantage is the ability to represent the volume of an object. The second only provides information about points on the surface of the object, e.g. obtained from scanning. However, the topology of the surfaces is lost, as no connectivity data is available; two points close to each other in Euclidean space appear close in this representation even when they in fact belong to two different surfaces. The third representation addresses this situation. A mesh allows us to distinguish the inner and outer parts of the object and, for watertight objects, implicitly represents their volume. Thus, it combines the advantages of the two previous representations: we can represent real objects scanned with an appropriate device (as with point clouds) and, to some extent, account for the volume of the object (as with voxels).

Taking these properties into account, we chose the mesh as the most convenient representation for our tests and selected a dataset that provides such data, namely the Watertight Track of SHREC 2007 [1]. It contains a large number of categories (20). Additionally, the dataset provides a segmentation file for each object: the line number in this file corresponds to the vertex number in the object file, and the value in each line is the number of the segment to which that vertex belongs.
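Given this format (one integer per line, line i holding the segment id of vertex i), a minimal loader can be sketched as follows; the function name and the assumption that blank lines may be skipped are illustrative, not part of the dataset specification:

```python
def load_segmentation(path):
    """Read a SHREC-style segmentation file: line i holds the
    segment id of vertex i (here returned as a 0-based list)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```

For a file containing `0 0 1 1 2` on separate lines, the loader maps vertices 0-4 to segments `[0, 0, 1, 1, 2]`.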

2.2 Tested Methods

Using this dataset, 11 methods of 3D object segmentation were tested. In the following paragraphs, each of them is described briefly.

  • Learning 3D Mesh Segmentation and Labelling (LMS) [5] – segmentation is performed with Conditional Random Fields (CRFs) whose terms assess the conformity of neighbouring surfaces constituting an object. LMS is a supervised learning algorithm; it uses previously prepared templates with appropriate geometrical features.

  • Unsupervised 3D Shape Segmentation and Co-Segmentation via Deep Learning (Deep Learning) [11] – this algorithm initially divides the object into segments formed of its surfaces. For each segment, the centre of gravity is computed. Subsequently, the most similar surfaces are found using the Shape Diameter Function, which measures the diameter of the area in the neighbourhood of this centre point, together with the average geodesic distance to the other centres.

  • Mesh Segmentation with Concavity-aware Fields (Isoline Cuts) [7] – this algorithm assumes that segment borders most probably lie on the concave parts of the object. Such borders are found by solving a Laplacian that is sensitive to these areas.

  • Spectral 3D Mesh Segmentation with a Novel Single Segmentation Field (SSF Seg) [12] – the operation of this algorithm is based on hierarchical grouping of the surfaces constituting an object. At first, each surface is a separate segment; next, segments are joined based on an appropriate transformation of the eigenvectors of a matrix obtained with the Laplace operator.

  • Consistent Mesh Partitioning and Skeletonisation Using the Shape Diameter Function (Shape Diam) [9] – similarly to [11], this algorithm uses the Shape Diameter Function. First, the function is used to find the centre of gravity of each surface of the object. The surfaces are then joined into segments based on an appropriate energy function.

  • Normalized Cuts for 3D Mesh Analysis (Norm Cuts) [4] – is based on hierarchical grouping. It begins with the assumption that each surface is a separate segment. Next, the surfaces with the smallest joint circumference divided by their joint area are merged into a single segment. The stopping criterion is set by the user, who provides the number of segments the object should contain.

  • Randomised Cuts for 3D Mesh Analysis (Rand Cuts) [4] – a method presented in the same work as the previous one. Its operation is the opposite of Norm Cuts: it starts with the assumption that the whole object is a single segment, which is then divided into smaller segments until the appropriate number of segments is reached.

  • Mesh Segmentation Using Feature Point and Core Extraction (Core Extra) [6] – this is another hierarchical segmentation method. It does not require information about the number of segments: the algorithm processes the data as long as it is able to find characteristic points (high convexities or concavities) in a given segment.

  • Fast Mesh Segmentation Using Random Walks (Rand Walks) [8] – first, all surfaces of the object are assumed to be separate segments; next, based on random walks and the highest probability of reaching a neighbouring surface, segments are merged into larger ones. The stopping criterion is the number of segments provided by the user.

  • Hierarchical Mesh Segmentation Based on Fitting Primitives (Fit Prim) [2] – groups surfaces hierarchically by matching adjacent surfaces to arbitrarily given geometrical primitives (plane, sphere and cylinder). The segmentation starts with the assumption that all surfaces of the object are separate segments and ends when the number of segments set by the user is reached.

  • Metamorphosis of Polyhedral Surfaces Using Decomposition (K-Means) [10] – is based on computing centroids. For a given number k of segments, the algorithm initially selects k concentration centres producing k segments and, based on Euclidean distance, assigns each vertex to the closest centre. Next, the centres are recomputed, and the algorithm stops when the centre positions no longer change.

  • Human – given here as a reference. It shows how an inexperienced user segments the objects of the dataset without knowing the reference annotation.
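The centroid-based scheme described for the K-Means method above can be sketched with plain Lloyd iterations over vertex coordinates. This is a simplification for illustration only: the original work [10] operates on a mesh decomposition, whereas here raw 3D vertex positions are clustered by Euclidean distance, exactly as the update/assignment loop in the bullet describes:

```python
import random

def kmeans_vertices(vertices, k, max_iter=100, seed=0):
    """Lloyd's algorithm over 3D vertex positions.
    vertices: list of (x, y, z) tuples. Returns one segment id per vertex."""
    rng = random.Random(seed)
    centres = rng.sample(vertices, k)   # initial k concentration centres
    labels = [0] * len(vertices)
    for _ in range(max_iter):
        # Assignment step: each vertex goes to the closest centre
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((v[d] - centres[c][d]) ** 2 for d in range(3)))
            for v in vertices
        ]
        # Update step: recompute each centre as the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, l in zip(vertices, new_labels) if l == c]
            if members:
                new_centres.append(tuple(
                    sum(v[d] for v in members) / len(members) for d in range(3)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        # Stop when neither assignments nor centre positions change.
        if new_labels == labels and new_centres == centres:
            break
        labels, centres = new_labels, new_centres
    return labels
```

On two well-separated groups of vertices the loop converges in a few iterations regardless of which vertices are drawn as initial centres.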

3 Measures

Cut Discrepancy – a distance calculated along the segment borders relative to those in the reference file; the geodesic distance is used here. The distance of a point \(p_i\), lying on the border of a segment and belonging to the set of points \(C_1\), to the border of a reference segment represented by the set of points \(C_2\) is given by:

$$\begin{aligned} d_G(p_i,C_2) = \min \{d_G(p_i,p_j), \forall p_j \in C_2\}, \end{aligned}$$
(1)

where:

\(d_G(p_i,p_j)\) – geodesic distance between points \(p_i\) and \(p_j\).

Directional Cut Discrepancy is computed as the average distance of all the points from \(C_1\) to the border of the reference segment represented by the set of points \(C_2\):

$$\begin{aligned} DCD(S_1 \Rightarrow S_2) = \mathrm{mean}\{d_G(p_i,C_2), \forall p_i \in C_1\}, \end{aligned}$$
(2)

where:

\(S_1\) - segmentation provided by the algorithms,

\(S_2\) - reference segmentation.

Finally, the discrepancy of the borders is given by:

$$\begin{aligned} CD(S_1,S_2) = \frac{DCD(S_1 \Rightarrow S_2) + DCD(S_2 \Rightarrow S_1)}{avgRadius}, \end{aligned}$$
(3)

where:

avgRadius - average distance of points from the object’s Centre of Gravity [3].
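Equations (1)-(3) can be sketched as follows, substituting the Euclidean distance for the geodesic one so that the example stays self-contained; on densely sampled cut curves this is only a rough approximation of the measure actually used:

```python
import math

def point_to_set(p, cut):
    """Eq. (1), with Euclidean distance standing in for the geodesic one."""
    return min(math.dist(p, q) for q in cut)

def dcd(c1, c2):
    """Eq. (2): mean distance from every boundary point of C1 to C2."""
    return sum(point_to_set(p, c2) for p in c1) / len(c1)

def cut_discrepancy(c1, c2, avg_radius):
    """Eq. (3): symmetrised DCD, normalised by the average radius."""
    return (dcd(c1, c2) + dcd(c2, c1)) / avg_radius
```

For two parallel cuts one unit apart and `avg_radius = 2.0`, the discrepancy evaluates to 1.0.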

Hamming Distance – a measure of the divergence of two sets (the segmentation provided by the algorithm and the reference one); it is obtained as the sum of the differences between the two sets. Since the tested objects were of different sizes, the distance was normalised by the area of the object, as described in [3].

A directional form of a Hamming distance is given by:

$$\begin{aligned} D_H(S_1 \Rightarrow S_2) = \sum _i \Vert S_{2i} \backslash S_{1it}\Vert \end{aligned}$$
(4)

where:

\(\backslash \) - segments difference operator,

\(\Vert x\Vert \) - size of x (the overall area of the surfaces constituting x),

\(S_1\) - segmentation provided by the algorithms,

\(S_2\) - reference segmentation,

i - given segment,

\(it = \arg \max _k\Vert S_{2i} \bigcap S_{1k}\Vert \), i.e. the segment of \(S_1\) with the largest overlap with \(S_{2i}\).

The Hamming distance is the average of the directional measure taken in both directions, scaled by the object area:

$$\begin{aligned} HD(S_1,S_2) = \frac{D_H(S_1 \Rightarrow S_2) + D_H(S_2 \Rightarrow S_1)}{2\Vert S\Vert } \end{aligned}$$
(5)

where:

S - overall surface of the object.

In the tests provided in Sect. 4, the results for each direction are given:

$$\begin{aligned} R_m(S_1,S_2) = \frac{D_H(S_1 \Rightarrow S_2)}{\Vert S\Vert }, \end{aligned}$$
(6)
$$\begin{aligned} R_f(S_1,S_2) = \frac{D_H(S_2 \Rightarrow S_1)}{\Vert S\Vert }. \end{aligned}$$
(7)
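Equations (4)-(7) can be sketched as follows, representing each segmentation as a dict mapping a segment id to its set of faces and taking all face areas as 1 for simplicity (a real implementation would weight every face by its triangle area):

```python
def d_h(s1, s2):
    """Eq. (4): D_H(S1 => S2) = sum_i ||S2_i \\ S1_it||, where it is the
    segment of S1 with the largest overlap with S2_i.
    s1, s2: dicts of segment id -> set of face indices (unit areas)."""
    total = 0
    for seg2 in s2.values():
        best = max(s1.values(), key=lambda seg1: len(seg2 & seg1))
        total += len(seg2 - best)
    return total

def hamming_distance(s1, s2, n_faces):
    """Eq. (5), with ||S|| taken as the total face count."""
    return (d_h(s1, s2) + d_h(s2, s1)) / (2 * n_faces)

def r_m(s1, s2, n_faces):
    """Eq. (6): directional measure D_H(S1 => S2) / ||S||."""
    return d_h(s1, s2) / n_faces

def r_f(s1, s2, n_faces):
    """Eq. (7): directional measure D_H(S2 => S1) / ||S||."""
    return d_h(s2, s1) / n_faces
```

For example, with `s1 = {0: {0, 1}, 1: {2, 3}}` and a reference `s2 = {0: {0, 1, 2}, 1: {3}}` over four faces, each direction contributes 1 stray face, giving `hamming_distance = 0.25`.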

Rand Index – measures whether corresponding surfaces of the object belong to the same segment in the computed and the reference segmentation. For \(S_1\), \(S_2\) representing the computed and the reference segmentation respectively, \(s_i^1\), \(s_i^2\) denoting the segment number of the i-th surface in \(S_1\) and \(S_2\), and the coefficients:

$$\begin{aligned} C_{ij} = 1 \Longleftrightarrow s_i^1 = s_j^1, \end{aligned}$$
(8)
$$\begin{aligned} P_{ij} = 1 \Longleftrightarrow s_i^2 = s_j^2, \end{aligned}$$
(9)

Rand Index could be computed as follows:

$$\begin{aligned} RI(S_1,S_2) = \binom{N}{2}^{-1} \sum _{i,j,i<j} [C_{ij}P_{ij} + (1 - C_{ij})(1 - P_{ij})], \end{aligned}$$
(10)

where:

N - is the overall number of surfaces within the object.

Because the coefficients \(C_{ij}\) and \(P_{ij}\) are equal to 1 when the compared surfaces belong to the same segment, RI is greater for better segmentations, contrary to the previous measures. For this reason the results in Sect. 4 are reported as \(1-RI(S_1,S_2)\) [3].
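A direct O(N²) sketch of Eqs. (8)-(10) over per-face segment labels is given below; the dissimilarity \(1-RI\) reported in Sect. 4 is returned by the second helper:

```python
from math import comb

def rand_index(labels1, labels2):
    """Eq. (10): fraction of face pairs on which the two segmentations
    agree, i.e. both put the pair in one segment or both separate it."""
    n = len(labels1)
    agree = sum(
        (labels1[i] == labels1[j]) == (labels2[i] == labels2[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return agree / comb(n, 2)

def rand_dissimilarity(labels1, labels2):
    """1 - RI, so that lower is better, as for the other measures."""
    return 1.0 - rand_index(labels1, labels2)
```

Identical labelings give `rand_index = 1.0` and hence a dissimilarity of 0.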

Consistency Error – measures similarities and discrepancies between the computed and the reference segments. It consists of two results – a global (GCE) and a local (LCE) one – calculated as follows. Assuming \(S_1\), \(S_2\) to be the computed and the reference segmentation respectively, \(\backslash \) the difference operator, \(\Vert x\Vert \) the size of x and \(f_i\) the i-th surface of the object, the local error for a single surface is given by:

$$\begin{aligned} E(S_1,S_2,f_i) = \frac{\Vert R(S_1,f_i) \backslash R(S_2,f_i)\Vert }{\Vert R(S_1,f_i)\Vert } \end{aligned}$$
(11)

where:

\(R(S,f_i)\) - segment (set of connected surfaces) inside segmentation S, which contains surface \(f_i\).

Finally:

$$\begin{aligned} GCE(S_1,S_2) = \frac{1}{N} \min \{\sum _i E(S_1,S_2,f_i), \sum _i E(S_2,S_1,f_i)\}, \end{aligned}$$
(12)
$$\begin{aligned} LCE(S_1,S_2) = \frac{1}{N} \sum _i \min \{E(S_1,S_2,f_i), E(S_2,S_1,f_i)\}, \end{aligned}$$
(13)

where:

N - is the overall number of surfaces within the object [3].
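Equations (11)-(13) in a compact sketch over per-face labels, again with unit face areas so that \(\Vert R\Vert \) reduces to a face count:

```python
def region(labels, i):
    """R(S, f_i): the set of faces sharing face i's segment label."""
    return {j for j, l in enumerate(labels) if l == labels[i]}

def local_refinement_error(l1, l2, i):
    """Eq. (11): E(S1, S2, f_i)."""
    r1, r2 = region(l1, i), region(l2, i)
    return len(r1 - r2) / len(r1)

def gce(l1, l2):
    """Eq. (12): global consistency error (one direction for all faces)."""
    n = len(l1)
    return min(
        sum(local_refinement_error(l1, l2, i) for i in range(n)),
        sum(local_refinement_error(l2, l1, i) for i in range(n)),
    ) / n

def lce(l1, l2):
    """Eq. (13): local consistency error (direction chosen per face,
    so LCE <= GCE by construction)."""
    n = len(l1)
    return sum(
        min(local_refinement_error(l1, l2, i),
            local_refinement_error(l2, l1, i))
        for i in range(n)
    ) / n
```

Note that both errors vanish when one segmentation is a refinement of the other, e.g. for `[0, 0, 1, 1]` versus `[0, 0, 0, 0]`.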

4 Results

Using the measures described in Sect. 3, a comparison of the 11 approaches presented in Sect. 2.2 (plus the human annotator) is given. In most cases, the performance of the algorithms was tested using the code provided by the research groups that developed each method; only for [7, 11, 12] were the results taken directly from the papers. First, we focus on the Cut Discrepancy measure. The results of the comparison are shown in Fig. 1. As can be observed from this figure, the best results are achieved by the LMS method.

Fig. 1.

Average Cut Discrepancy (CD) for each algorithm.

Second, we tested the algorithms against the Hamming Distance measure. The results of the comparison are shown in Fig. 2. For each algorithm three values are given: the directional measures \(R_m\) and \(R_f\) and their average. As in the previous case, the best results are achieved by the LMS method.

Fig. 2.

Average Hamming Distance (HD) for each algorithm together with directional distances \(R_m\) and \(R_f\).

Third, the Rand Index measure was applied. The results of the comparison are shown in Fig. 3. Once more, the best results were achieved by the LMS method.

Fig. 3.

Average Rand Index (RI) for each algorithm.

For the Consistency Error the best results were again obtained by the LMS algorithm. The results of the comparison are shown in Fig. 4, where both the global and the local error can be seen.

Fig. 4.

Average global (G-) and local (L-) Consistency Error (CE) for each algorithm.

4.1 Discussion of the Results

The previous section may give the impression that the LMS algorithm is the best of all the presented algorithms. However, this is not always the case. All the figures presented so far show the average error, but a closer look at the class-by-class comparison given in Table 1 shows that the Deep Learning method performs better for selected classes. LMS and Deep Learning are each the best of all algorithms for seven object classes.

Table 1. Rand Index (RI) for each class separately. The best result for each class is indicated in bold typeface.

5 Conclusions

In this paper, we provided an extensive comparison of existing methods of 3D object segmentation. The results clearly show that the method called LMS currently outperforms the other approaches, and for certain measures it even surpasses an inexperienced human annotator. However, in a class-by-class comparison, comparable results are achieved by the Deep Learning method. This review forms a basis for indicating future ways of improving existing approaches and also serves as a guide for applying these algorithms in areas such as service robotics, where segmentation may help to assign semantic meaning to the parts of an object.