64.1 Introduction

3D mesh segmentation benefits various algorithms in Digital Geometry Processing, such as modeling, simplification, texture synthesis, collision detection, skeleton extraction, compression, 3D shape retrieval, and the establishment of correspondences in morphing and deformation [1].

In recent years, a number of segmentation algorithms have been proposed to cut a 3D mesh into simple sub-meshes, either a few components with higher-level shape semantics or lower-level disk-like patches. Many of them aim at a meaningful segmentation, but little work has addressed the qualitative evaluation of segmentation quality, and even less the quantitative evaluation [2–4].

The main contributions of this work are as follows: we investigate a novel evaluation metric, UMA, for quantitatively comparing mesh segmentations, and discuss its properties; and, to improve the accuracy of our three metrics, we build a subset of a 3D mesh segmentation benchmark as our experimental dataset.

64.2 Related Work

Most previous work evaluated segmentations only visually; little has been done on the qualitative and quantitative evaluation of segmentation quality.

Automatic segmentation algorithms are frequently evaluated in a qualitative way: segmentation images of the same set of meshes are exhibited side by side, and the results that look good are pointed out. The qualitative criteria include: robustness and complexity; the number of control parameters; hierarchical and multi-scale properties; whether over-segmentation is produced; the smoothness of the segmentation boundary; whether the segmentation results have clear visual geometric semantics (i.e., are meaningful); and whether the segment boundaries lie on concave seams and valleys, as predicted by the minima rule.

Quantitative evaluation of 3D segmentation first appeared in 2009. To address the key problem of quantitative evaluation, Benhabiles et al. [2] proposed a ground-truth corpus and two objective dissimilarity measures, which provide a quantitative comparison between segmentation algorithms. The limitations of this work are the small number of models and categories, and the meaningfulness of the manual segmentations.

Compared with the work of Benhabiles et al. [2], the dataset of Chen et al. [3] is larger and more densely sampled. To provide a basis for quantitative comparisons between human-generated segmentations and automatic segmentations [5–10] of the same mesh, they propose four quantitative evaluation metrics. In 2010, Benhabiles et al. [4] presented an extensive experimental comparison of existing similarity metrics addressing the quality assessment problem of mesh segmentation and introduced a new metric; based on their own corpus and the corpus of Chen et al. [3], they obtained better performance, but their work suffers from the inaccuracy of the probability that two vertices belong to the same segment in the ground-truth set.

All of these works provide a benchmark based on a ground-truth corpus of human-segmented 3D meshes. Using a set of similarity metrics, they evaluate a segmentation algorithm by quantifying the consistency between the reference segmentations of the ground-truth corpus and those obtained by the algorithm on the same models.

These benchmarks have several problems. First, the manual segmentations produced by different volunteers show significant subjectivity and randomness. Second, although the Consistency Error metrics proposed independently by Chen et al. and Benhabiles et al. are similar, they produce significantly different evaluations for the same segmentation algorithm. Third, the four evaluation criteria proposed by Chen et al. also produce significantly different evaluations for the same segmentation algorithm. Finally, all six quantitative metrics give evaluations that differ considerably from human visual perception.

In this paper, we first analyze the serious inconsistencies, errors, and other problems of the datasets produced by manual interactive segmentation in [2–4], and build a subset of Chen's dataset with higher consistency and accuracy. Finally, we perform a quantitative comparison of seven automatic mesh segmentation algorithms according to the new metric UMA and our other two metrics, FME and AME [11].

64.3 Ultimate Measurement Accuracy

The ultimate goal of 3D mesh segmentation is to obtain a precise measurement of the target feature, so we can evaluate the quality of a segmentation by measuring and calculating the target feature. The features frequently used to measure the target precisely include target area, perimeter, circularity, eccentricity, shape parameters, and Ultimate Measurement Accuracy (UMA).

64.3.1 Shape Feature Form Factor

First, we define the sub-mesh shape feature Form Factor (FF) as follows. Let S be a 3D boundary mesh, let \( S_{1} \) be the set of sub-meshes generated by manual interactive segmentation, i.e., \( S_{1} = \left\{ S_{1}^{1}, S_{1}^{2}, \ldots, S_{1}^{m} \right\} \), and let \( S_{2} \) be the segmentation produced by the given automatic algorithm, i.e., \( S_{2} = \left\{ S_{2}^{1}, S_{2}^{2}, \ldots, S_{2}^{n} \right\} \), where m and n are the numbers of segments. Here we require that m equals n. For each sub-mesh \( S_{1}^{i} \) in \( S_{1} \), denote its matching sub-mesh in \( S_{2} \) by \( S_{2}^{j} \), i.e., there is a correspondence \( j = p(i) \). We define the Form Factor of an arbitrary sub-mesh \( S_{1}^{i} \) in \( S_{1} \) as follows:

$$ FF(S_{1}^{i}) = \frac{\sum_{a = 1}^{N_{e}} L(S_{1}^{i}, e_{a})}{\sum_{b = 1}^{N_{f}} A(S_{1}^{i}, f_{b})} $$
(64.1)

where \( e_{a} \) and \( f_{b} \) are an arbitrary boundary edge and an arbitrary face of the sub-mesh \( S_{1}^{i} \) respectively, \( L(S_{1}^{i}, e_{a}) \) is the length of edge \( e_{a} \), \( A(S_{1}^{i}, f_{b}) \) is the area of face \( f_{b} \), and \( N_{e} \) and \( N_{f} \) are the numbers of boundary edges and faces of \( S_{1}^{i} \) respectively. Similarly, the Form Factor of an arbitrary sub-mesh \( S_{2}^{j} \) in \( S_{2} \) is defined as follows:

$$ FF(S_{2}^{j}) = \frac{\sum_{a = 1}^{N_{e}} L(S_{2}^{j}, e_{a})}{\sum_{b = 1}^{N_{f}} A(S_{2}^{j}, f_{b})} $$
(64.2)

where \( e_{a} \), \( f_{b} \), L, A, \( N_{e} \), and \( N_{f} \) are defined as before. We thus obtain the Form Factors of all the sub-meshes produced by manual interactive segmentation and by the automatic segmentation algorithm.
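To make the definition concrete, the following is a minimal sketch of how the Form Factor of a single sub-mesh could be computed. The triangle-mesh representation (a vertex array plus face indices and a boundary-edge list) is an illustrative assumption of ours, not a format prescribed by the benchmark.

```python
import numpy as np

def form_factor(vertices, faces, boundary_edges):
    """Form Factor of one sub-mesh, per Eqs. (64.1)/(64.2):
    (sum of boundary edge lengths) / (sum of face areas).

    vertices:       (V, 3) float array of vertex positions
    faces:          (N_f, 3) int array, vertex indices of each triangle
    boundary_edges: (N_e, 2) int array, vertex indices of each boundary edge
    """
    # Numerator: total length of the sub-mesh's boundary edges.
    ends = vertices[boundary_edges]                        # (N_e, 2, 3)
    perimeter = np.linalg.norm(ends[:, 0] - ends[:, 1], axis=1).sum()

    # Denominator: total area of the sub-mesh's triangular faces.
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1).sum()

    return perimeter / area
```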

64.3.2 Ultimate Measurement Accuracy

With the manual interactive segmentation \( S_{1} \) as the criterion, the UMA of the segmentation \( S_{2} \) produced by the given algorithm is defined as follows:

$$ UMA(S_{1}, S_{2}) = \frac{1}{m}\sum_{i = 1,\, j = p(i)}^{m} \frac{FF(S_{1}^{i}) - FF(S_{2}^{j})}{\left\| S_{1}^{i} \right\|} $$
(64.3)

where \( \left\| S_{1}^{i} \right\| \) is the area of \( S_{1}^{i} \). The smaller the value of \( UMA(S_{1}, S_{2}) \), the closer the automatic segmentation is to the manual interactive segmentation and the better the performance of the segmentation algorithm, and vice versa.
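Under the same illustrative mesh representation, Eq. (64.3) translates directly into code. This sketch reuses the form_factor function above and assumes each segmentation is given as a list of sub-meshes together with the correspondence \( j = p(i) \); the signed difference is kept exactly as written in Eq. (64.3).

```python
import numpy as np  # form_factor from the previous sketch is assumed in scope

def uma_pair(manual_segments, auto_segments, p):
    """UMA(S1, S2) per Eq. (64.3): mean Form Factor difference of matched
    sub-meshes, each normalized by the manual sub-mesh's area.

    manual_segments, auto_segments: lists of (vertices, faces, boundary_edges)
    p: p[i] is the index of the auto sub-mesh matched to manual sub-mesh i
    """
    m = len(manual_segments)
    total = 0.0
    for i in range(m):
        v1, f1, b1 = manual_segments[i]
        v2, f2, b2 = auto_segments[p[i]]
        # ||S1^i||: total face area of the manual sub-mesh.
        q0, q1, q2 = (v1[f1[:, k]] for k in range(3))
        area = 0.5 * np.linalg.norm(np.cross(q1 - q0, q2 - q0), axis=1).sum()
        total += (form_factor(v1, f1, b1) - form_factor(v2, f2, b2)) / area
    return total / m
```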

For a given automatic segmentation algorithm and each 3D boundary mesh S in the benchmark dataset, let \( S_{1}^{i} \) be one of its manual interactive segmentations and \( S_{2} = \left\{ S_{2}^{1}, S_{2}^{2}, \ldots, S_{2}^{n} \right\} \) the segmentation produced by the given automatic algorithm, and denote the resulting Ultimate Measurement Accuracy by UMA(\( S_{1}^{i} \), \( S_{2} \)). Then, over the set of manual interactive segmentations \( \{ S_{1}^{i} \} \) of S, the per-mesh score of the algorithm is defined as \( UMA_{S} = \sum_{i} UMA\left( S_{1}^{i}, S_{2} \right)/n \), where n is the number of manual interactive segmentations of S.

For a dataset of N different meshes, we define the overall quantitative evaluation metric as \( {\text{UMA}} = \sum_{j} UMA_{S_{j}} /N \), where \( UMA_{S_{j}} \) is the per-mesh score on the j-th mesh. The smaller the UMA, the closer the automatic segmentation results are to the manual interactive segmentations and the better the performance of the segmentation algorithm.
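The two averaging steps, first over the manual segmentations of one mesh and then over the N meshes of the dataset, reduce to a few lines. The nested layout of the dataset below is again our own illustrative assumption; uma_pair is the sketch given above.

```python
def dataset_uma(dataset):
    """Overall UMA: mean over meshes of the mean over that mesh's
    manual segmentations, per Sect. 64.3.2.

    dataset: one entry per mesh, each a tuple
             (manuals, auto, correspondences), where manuals is a list of
             manual segmentations and correspondences[i] matches manuals[i]
             to the sub-meshes of the automatic segmentation auto.
    """
    per_mesh = []
    for manuals, auto, correspondences in dataset:
        # UMA_S: average over the n manual segmentations of this mesh.
        scores = [uma_pair(man, auto, p) for man, p in zip(manuals, correspondences)]
        per_mesh.append(sum(scores) / len(scores))
    # UMA: average over the N meshes of the dataset.
    return sum(per_mesh) / len(per_mesh)
```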

64.4 Results and Analysis

64.4.1 Experimental Environment

Our experiments are performed on a subset of the benchmark dataset proposed by Chen et al. [3], from which we reject three kinds of degenerate segmentations (see the filter sketch below): first, segmentations with three or fewer sub-meshes; second, segmentations in which the ratio of some sub-mesh's face count to the total face count of the original mesh is at most 0.01; third, segmentations without a clearly meaningful shape. The resulting subset contains 3,697 manually generated segmentations of 380 mesh models spread evenly among 19 object categories, and has stronger consistency and higher quality.
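The first two rejection rules are mechanical and can be expressed as a simple filter; the third (meaningfulness) requires human judgment. The per-segmentation record used here is hypothetical.

```python
def keep_segmentation(segment_face_counts, total_faces):
    """Return True if a manual segmentation passes the first two
    rejection rules of Sect. 64.4.1.

    segment_face_counts: list with the number of faces in each sub-mesh
    total_faces:         total face count of the original mesh
    """
    # Rule 1: reject segmentations with three or fewer sub-meshes.
    if len(segment_face_counts) <= 3:
        return False
    # Rule 2: reject if any sub-mesh contains at most 1% of the faces.
    if min(segment_face_counts) / total_faces <= 0.01:
        return False
    return True
```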

Our quantitative evaluations cover seven frequently used automatic segmentation algorithms [5–10]: Randomized Cuts, Shape Diameter Function, Normalized Cuts, Core Extraction, Random Walks, Fitting Primitives, and K-Means.

64.4.2 Properties Comparison of UMA, FME and AME

In this section, we perform three experiments to illustrate some properties of the UMA metric in comparison with FME and AME; the results are shown in Figs. 64.1, 64.2 and 64.3.

Fig. 64.1 Robustness to random and degenerate cases

Fig. 64.2 Sensitivity to hierarchical refinement

Fig. 64.3 Tolerance to the imprecision of cut boundaries

The first property is robustness to degenerate segmentations. We generated k-means segmentations with random cluster centers, including degenerate segmentations with only 2 or 3 segments or with more than 50 segments, as well as random segmentations whose segment counts are similar to those of the ground truths. The scores of the three metrics were computed for these segmentations and averaged over the entire dataset; Fig. 64.1 presents the results. Since these segmentations are random or degenerate, and thus totally different from the manual ground truths, the scores of all three metrics are low, but they remain stable across the segmentations.

The second property is sensitivity to hierarchical refinement. We compared 12 refined versions with the ground truths; Fig. 64.2 illustrates that all three metrics show only a slight variation (they are not fully invariant to refinement) but remain stable.

The third property is tolerance to imprecise boundaries. We produced five segmentations of the bi-torus model by manually cutting it into two segments, each version differing slightly from the others in boundary position. The results in Fig. 64.3 show that segmentations (a) and (b) are more imprecise than segmentations (c–e); the scores indeed reflect this difference, and all three metrics show only slight variation and a good tolerance to the imprecision of cut boundaries.

64.4.3 Comparisons of Our Metric and Rand Index

Based on the subset of 3,697 manually generated segmentations and the UMA, FME, AME, and Rand Index [3] metrics, we also performed experiments comparing the ranks of the seven algorithms for each object category. The results in Table 64.1 show that, according to the UMA ranking, Randomized Cuts, Shape Diameter Function, and Normalized Cuts occupy the top three positions, essentially matching the Rand Index ranking; the ranks of the other four algorithms are similar as well. This indicates that the UMA and Rand Index metrics perform almost identically, while FME and AME differ slightly from them.

Table 64.1 Comparison of UMA, FME, AME and Rand Index

The charts of UMA, FME, AME, and Rand Index in Fig. 64.4 show that the four evaluation metrics are remarkably consistent with one another. Although the UMA metric is boundary-based while the other three (FME, AME, and Rand Index) focus on region dissimilarities, all four metrics suggest almost the same relative performance of the seven algorithms, and their rankings are essentially the same. Because our experiments are based on a selection from Chen's manual interactive segmentation dataset with higher consistency, the resulting ranking of the algorithms differs significantly from that reported in Chen et al. [3].

Fig. 64.4 Comparison of FME, AME, UMA and Rand Index metrics

64.5 Conclusion

This paper presents a novel metric, UMA, for quantitatively evaluating 3D mesh segmentation algorithms. Experiments show that UMA is robust to degenerate segmentations, only slightly sensitive to hierarchical refinement, and stable under imprecision of cut boundaries. Because we have excluded manual segmentations with obvious errors and serious inconsistencies and use a dataset of larger size, the UMA metric gives a more precise quantitative evaluation than other metrics.