
3.1 Introduction

Medial representations, introduced by Blum [2], simultaneously capture properties of an object’s outline and its interior. Abstractions of medial representations into graphs have become popular in the computer vision literature and have successfully been applied to view-based object recognition [12, 14]. Recent extensions and applications include alterations of medial graphs to capture salient object parts [8] and the use of medial fragments for perceptual grouping to form object part hypotheses directly from images [11].

Motivated by the success of medial representations, this chapter revisits a quantity related to medial axis computations—the limiting behavior of the average outward flux (AOF) of the gradient of the Euclidean distance function to the object’s boundary as the region through which it is computed is shrunk [4]. We exploit the property that at skeletal points the AOF reveals the object angle and thus can be viewed as a scalar descriptor from which the complete boundary can be reconstructed. We then introduce a novel measure of salience for a skeletal point by combining the AOF with a check on uniqueness of the inscribed medial disk to the host skeletal branch. The simplified skeletons are used to derive a directed graph-based representation of the object which we term the flux graph. Our experiments show that flux graphs are a good deal simpler than competing skeletal graphs such as shock graphs, by a number of standard complexity measures, with little loss in representational power. Furthermore, they yield competitive performance in object recognition experiments.

We begin by discussing mathematical properties of the geometry of the medial axis of an object and by introducing the appropriate notation.

Definition 3.1

Assume an n-dimensional object denoted by Ω with its boundary given by \(\partial \varOmega \subset\mathbb{R}^{n}\). A closed disk \(D \subset \mathbb{R}^{n}\) is a maximal inscribed disk in Ω if D⊆Ω but for any disk D′ such that D⊂D′, the relationship D′⊆Ω does not hold.

Definition 3.2

The Blum medial locus or skeleton, denoted by Sk(Ω), is the locus of centers of all maximal inscribed disks in Ω.

As illustrated in Fig. 3.1, a skeletal point is characterized by its location p, the maximal inscribed disk radius r, the direction of the unit tangent vector T, and the object angle θ given by \(\arccos (-\frac{dr}{ds} )\), where s is the arc length along a branch of the medial axis. The projection Π(p) is the set of closest points on the boundary ∂Ω to p, i.e., \(\varPi(\mathbf{p}) \triangleq \{ \mathbf{q} \in\partial \varOmega : \|\mathbf{p} - \mathbf{q}\| = \min_{\mathbf{q}' \in \partial \varOmega} \|\mathbf{p}-\mathbf{q}'\| \}\). For a skeletal point p the projection set Π(p) is the set of points on the boundary touched by the maximal inscribed disk centered at p (the points \(\mathbf{b^{\pm1}}\) in Fig. 3.1). According to the “Maxwell set” definition of the medial locus [10], each skeletal point p∈Sk(Ω) must have at least two closest boundary points (|Π(p)|≥2).

Fig. 3.1

Local geometry of a maximal inscribed disk centered at the skeletal point p with radius r and with object angle θ. The maximal inscribed disk touches the boundary at two points \(\mathbf{b^{\pm1}}\) (\(\varPi(\mathbf{p}) = \{\mathbf{b^{+1}},\mathbf {b^{-1}}\}\)) (adapted from [13])

Topologically, Sk(Ω) consists of a set of branches that join to each other at branch points to form the complete skeleton. A skeletal branch, denoted by χ, is a set of contiguous regular points from the skeleton that lie between a pair of junction points, a pair of end points, or an end point and a junction point. As shown by Dimitrov et al. [4], these three classes of points can be analyzed by considering the behavior of the average outward flux of the gradient of the Euclidean distance function to the boundary of a 2D object, given by \(\frac{\int_{\partial R} \langle\dot {\mathbf{q}},\mathbf{N}\rangle ds}{\int_{\partial R} ds}\), as the region R is shrunk to a circular neighborhood of a point. Here \(\dot{\mathbf{q}} = \nabla\mathbf {D}\), with D the Euclidean distance function to the object’s boundary, and N is the outward normal to ∂R. In particular, writing \(\mathbf{F}_{\varepsilon}(\mathbf{p})\) for the outward flux through a circle of radius ε centered at p:

  1. p is a regular point if the maximal inscribed disk at p touches the boundary at two corresponding boundary points such that |Π(p)|=2. The computed AOF at a regular point p is given by \(\lim_{\varepsilon\rightarrow 0}\frac{\mathbf{F}_{\varepsilon}(\mathbf{p})}{2\pi\varepsilon} = - \frac{2}{\pi }\sin\theta\).

  2. p is an end point if there exists δ (0<δ<r) such that for any ε (0<ε<δ) the circle centered at p with radius ε intersects Sk(Ω) at just a single point (r is the radius of the maximal inscribed disk at p). The computed AOF at an end point p is given by \(\lim_{\varepsilon\rightarrow0}\frac {\mathbf{F}_{\varepsilon}(\mathbf{p})}{2\pi\varepsilon} = - \frac{1}{\pi}(\sin\theta_{\mathbf{p}} - \theta_{\mathbf{p}})\).

  3. p is a junction point if Π(p) has three or more corresponding closest boundary points. Generically a junction point has degree 3. All other branch points are unstable. The computed AOF at a junction point p is given by \(\lim_{\varepsilon\rightarrow0}\frac{\mathbf{F}_{\varepsilon}(\mathbf{p})}{2\pi \varepsilon} = - \frac{1}{\pi}\sum_{i=1}^{n}\sin\theta_{i}\).

These different classes of skeletal points are shown in Fig. 3.2.

Fig. 3.2

Different types of skeletal points are illustrated using segments of the skeleton Sk(Ω) of a given shape Ω. Left: A regular skeletal point. Middle: An end point. Right: A junction point. (Adapted from [4])
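The three limiting AOF relations listed above can be written down directly. The following is only a minimal sketch: the function names are hypothetical, angles are assumed to be in radians, and the junction case simply takes one angle per term of the sum as stated in the text.

```python
import math


def aof_regular(theta):
    """Limiting AOF at a regular point: -(2/pi) * sin(theta)."""
    return -(2.0 / math.pi) * math.sin(theta)


def aof_end(theta_p):
    """Limiting AOF at an end point: -(1/pi) * (sin(theta_p) - theta_p)."""
    return -(1.0 / math.pi) * (math.sin(theta_p) - theta_p)


def aof_junction(thetas):
    """Limiting AOF at a junction point: -(1/pi) * sum_i sin(theta_i)."""
    return -(1.0 / math.pi) * sum(math.sin(t) for t in thetas)


if __name__ == "__main__":
    # A regular point with object angle pi/2 attains the extreme value -2/pi.
    print(aof_regular(math.pi / 2))
    print(aof_end(math.pi / 3))
    print(aof_junction([math.pi / 3, math.pi / 3, math.pi / 3]))
```

Because the regular-point value depends on θ only through sin θ, it can be inverted to estimate the object angle, which is the property exploited for boundary reconstruction in Sect. 3.2.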

We now enumerate the main contributions of this chapter. First, previous approaches to compute flux-based skeletons and use them for boundary representation are not entirely complete. Section 3.2 addresses these limitations and presents a method that gives more complete boundary reconstruction results. Second, a new method for skeletal simplification which in turn leads to a simplified graph representation is presented in Sect. 3.3. Underlying this simplification is a measure of saliency that combines a notion of uniqueness of the inscribed medial disk to the host branch with the limiting AOF value.

3.2 Full Boundary Reconstruction

According to the Maxwell set definition of the medial axis, each point on the skeleton has two or more corresponding boundary points. Therefore, given a mapping between boundary points to skeletal points, it is possible to invert that mapping to reconstruct the boundary purely from skeletal points and their properties. Dimitrov et al. [4] attempted to do this by exploiting the relationship between regular points of the medial axis and the object angle. In this section, we will review the basic algorithm for doing this and then extend it to obtain a more complete boundary reconstruction by adding the cases of end points and junction points.

3.2.1 Boundary Representation Through Regular Points with First-Order Approximation of the Tangent Vector

Taking a regular point p on the skeleton, Dimitrov et al. outlined the reverse transform to obtain the corresponding boundary points by \(\mathbf{b^{\pm1}} = \mathbf{p} + r \operatorname{Rot}(\pm \theta)\mathbf{T}_{\mathbf{p}}\), where Rot(·) denotes a planar rotation by the given angle. To reconstruct \(\mathbf{b^{\pm1}}\) from a regular point on a parametrized skeleton, the following parameters of a skeletal point must be computed numerically: the coordinates of the point p, the radius value r, the object angle θ, and the unit tangent vector \(\mathbf{T}_{\mathbf{p}}\). During the skeletonization process, a parametrized discrete skeleton is computed where each skeletal point includes its position p, the radius r at that point, and the limiting AOF value. For the object angle θ, a numerical estimate is obtained by inverting the relationship for regular skeletal points, \(\frac{\mathbf{F}_{\varepsilon}(\mathbf{p})}{2\pi\varepsilon} \approx -\frac{2}{\pi}\sin\theta\), which gives \(\theta= \arcsin \bigl(-\frac{\mathbf{F}_{\varepsilon}(\mathbf{p})}{4\varepsilon} \bigr)\). Finally, the tangent vector is estimated as the direction of the line that connects the prior (discrete) skeletal point \(\mathbf{p}_{-1}\) to the subsequent (discrete) skeletal point \(\mathbf{p}_{+1}\), i.e., \(\mathbf {T}_{\mathbf{p}} = \frac{\mathbf{p}_{+1}-\mathbf{p}_{-1}}{\|\mathbf {p}_{+1}-\mathbf{p}_{-1}\|}\). Figure 3.3 shows results from these skeletonization and boundary reconstruction algorithms, using the original implementations.

Fig. 3.3

Top row: Outlines of binary images of a dog, a profile and a hand object, along with their derived skeletons using flux based skeletonization. Bottom row: Reconstructed boundary points (filled black disks) overlayed on the original outlines, using the method of Dimitrov et al. [4]
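The regular-point reconstruction of Sect. 3.2.1 can be sketched in a few lines. This is an illustrative sketch and not the original implementation: the function names are hypothetical, Rot is assumed to be a counter-clockwise rotation, and the clipping of the arcsin argument is an added safeguard against numerical noise.

```python
import numpy as np


def rot(angle):
    """2x2 counter-clockwise rotation matrix for an angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])


def object_angle_from_flux(flux, eps):
    """theta = arcsin(-F_eps / (4 * eps)); clipping is an added safeguard."""
    return float(np.arcsin(np.clip(-flux / (4.0 * eps), -1.0, 1.0)))


def two_point_tangent(p_prev, p_next):
    """Unit tangent from the neighbours p_{-1} and p_{+1}."""
    d = np.asarray(p_next, dtype=float) - np.asarray(p_prev, dtype=float)
    return d / np.linalg.norm(d)


def boundary_points(p, r, theta, tangent):
    """Return (b_plus, b_minus) = p + r * Rot(+/-theta) * T_p."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(tangent, dtype=float)
    return p + r * rot(theta) @ t, p + r * rot(-theta) @ t


if __name__ == "__main__":
    # Toy case: a horizontal strip of half-width 1 with its skeleton on the x-axis.
    theta = object_angle_from_flux(flux=-2.0, eps=0.5)  # arcsin(1) = pi/2
    t = two_point_tangent((-1.0, 0.0), (1.0, 0.0))
    print(boundary_points((0.0, 0.0), 1.0, theta, t))
```

In the toy case the two reconstructed points lie directly above and below the skeletal point, at distance r, as expected for a strip of constant width.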

3.2.2 Full Boundary Reconstruction

As is evident from the results in Fig. 3.3, the reconstruction of regular points, though promising, does not provide a complete representation of the boundary. In this subsection, we extend this approach by considering all types of skeletal points and providing a better numerical approximation of the parameters required for reconstruction. To achieve this aim, three limitations of the boundary reconstruction method are considered and addressed:

  1. Sensitivity of first-order approximation of tangent estimation: The two-point stencil computation of the tangent vector is very sensitive to discretization effects along the skeleton, and can often fail at regular points. To mitigate these numerical errors, we deploy higher-order methods for approximating the unit tangent. For those medial loci for which the two-point method fails, we use a four-point (discrete) stencil approximation [1] given by \(\mathbf{T}_{\mathbf{p}} = \frac{2}{3} \bigl(\frac{\mathbf{p}_{+1}-\mathbf{p}_{-1}}{\|\mathbf{p}_{+1}-\mathbf{p}_{-1}\|} \bigr) + \frac{1}{3} \bigl(\frac{\mathbf{p}_{+2}-\mathbf{p}_{-2}}{\|\mathbf{p}_{+2}-\mathbf{p}_{-2}\|} \bigr)\), where \(\mathbf{p}_{+2}\) is the skeletal point subsequent to \(\mathbf{p}_{+1}\) and \(\mathbf{p}_{-2}\) is the skeletal point previous to \(\mathbf{p}_{-1}\). Using this second-order approximation of the tangent results in a number of newly reconstructed boundary points (see Fig. 3.4, top row).

    Fig. 3.4

    Top row: Along with the reconstructed points in Fig. 3.3 shown with black disks, newly reconstructed points resulting from the improved tangent estimation are shown with blue disks. Second row: Along with the reconstructed points in Fig. 3.3 shown with black disks, newly reconstructed boundary circular segments corresponding to end points are shown with green disks. Third row: Along with the reconstructed points in Fig. 3.3 shown with black disks, newly reconstructed boundary points corresponding to junction points are shown with violet disks. Bottom row: Along with reconstructed points in Fig. 3.3 shown with black disks, all the additional reconstructed boundary points are shown in orange

  2. Boundary points that map to an end point: The boundary reconstruction method by Dimitrov et al. [3] does not explicitly consider the other two types of skeletal points (end points and junction points). This decision results in a number of circular segments missing from the boundary, which map to the end points. We present a numerical approach to recover such missing boundary points. Assume p is an end point such as the one shown in Fig. 3.2. Then there is a circular arc segment of the boundary corresponding to this skeletal point. The osculating disk at p touches the boundary along that circular segment, and the limiting tangent vector to the skeleton at that point bisects the angle that subtends the circular arc. Let γ represent the curve of that circular arc segment, then

    $$\gamma : I \rightarrow \varOmega \tag{3.1}$$
    $$\gamma ( \theta ) = \mathbf{p}+r \operatorname {Rot}( \theta )\, \mathbf{T}_{\mathbf{p}} \tag{3.2}$$

    where I is the interval \(I=[-\theta_{\mathbf{p}},\theta_{\mathbf{p}}]\). The coordinates of the point p and the radius value r are computed during the skeletonization process. Beyond p and r, computing γ requires numerical estimates of the object angle \(\theta_{\mathbf{p}}\) and the unit tangent vector \(\mathbf{T}_{\mathbf{p}}\). To compute the object angle, we use the end point equation \(\frac {\mathbf{F}_{\varepsilon} ( \mathbf{p} ) }{2\pi\varepsilon}=-\frac{1}{\pi} ( \sin\theta_{\mathbf{p}}-\theta_{\mathbf{p}} ) \). For the tangent vector \(\mathbf{T}_{\mathbf{p}}\), we simply use the tangent estimate of the (discrete) skeletal point prior to the end point, i.e., \(\mathbf{T}_{\mathbf{p}} = \mathbf{T}_{\mathbf{p}_{-1}}\). Figure 3.4 (second row) shows boundary reconstruction results with the newly found circular boundary segments corresponding to end points shown in green.

  3. Boundary points that map to a junction point: Junction points are also not included in the initial boundary reconstruction method by Dimitrov et al. [3]. We compute the corresponding boundary points of a junction point the same way that we compute the corresponding boundary points of a regular point, with the difference that the tangent vectors near junction points are approximated by those at the prior points on the skeleton. The rest of the procedure is the same as that for computing boundary points for a regular point. Figure 3.4 (third row) shows the improvement with the newly found boundary points corresponding to junction points shown in violet.
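The four-point tangent stencil from the first item of the list above can be sketched as follows. The function name and the indexing of the skeleton as an ordered array of points are assumptions made for illustration; the final renormalization is an added step, since a weighted sum of two unit vectors is not in general a unit vector.

```python
import numpy as np


def _unit(v):
    """Normalize a vector, rejecting zero-length input."""
    n = np.linalg.norm(v)
    if n == 0.0:
        raise ValueError("coincident skeletal points: tangent undefined")
    return v / n


def four_point_tangent(points, i):
    """Tangent at points[i] as (2/3) * T(+1, -1) + (1/3) * T(+2, -2)."""
    pts = np.asarray(points, dtype=float)
    if i - 2 < 0 or i + 2 >= len(pts):
        raise IndexError("the stencil needs two neighbours on each side")
    t_near = _unit(pts[i + 1] - pts[i - 1])
    t_far = _unit(pts[i + 2] - pts[i - 2])
    # Renormalization is an assumption of this sketch, not stated in the text.
    return _unit((2.0 / 3.0) * t_near + (1.0 / 3.0) * t_far)


if __name__ == "__main__":
    branch = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, -0.1), (4.0, 0.0)]
    print(four_point_tangent(branch, 2))
```

Averaging over the wider stencil damps the pixel-level jitter of individual neighbours, which is the motivation given in the text for preferring it where the two-point estimate fails.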

The contribution of this approach to reconstructing boundary points is threefold: improved approximation of tangents for many regular points of the skeleton, the computation of circular segments that correspond to end points of the skeleton, and the computation of extra boundary points from junction points. In Fig. 3.4 (bottom row), the additional boundary points recovered by these steps are shown in orange, which together with the original reconstructed points demonstrate a far more complete representation of the boundary (compare with Fig. 3.3). The remaining gaps between the reconstructed boundary points can be attributed to the fact that they are mappings of discretely sampled skeletal points.
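The end point case requires solving the end point equation for the object angle before the arc of Eq. (3.2) can be sampled. The sketch below takes that equation exactly as stated in the text and solves it by bisection, which is a choice made here for illustration and not necessarily the authors' method. The function names, the sampling density n, and the assumption that the supplied AOF value lies in [0, 1] under this sign convention are all hypothetical.

```python
import math


def end_point_angle(aof, tol=1e-12):
    """Solve -(1/pi) * (sin(t) - t) = aof for t in [0, pi] by bisection.

    The left-hand side increases monotonically from 0 to 1 on [0, pi].
    """
    if not 0.0 <= aof <= 1.0:
        raise ValueError("aof must lie in [0, 1] for this relation")
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (mid - math.sin(mid)) / math.pi < aof:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def end_point_arc(p, r, theta_p, tangent, n=9):
    """Sample gamma(t) = p + r * Rot(t) * T_p for t in [-theta_p, theta_p]."""
    if n < 2:
        raise ValueError("need at least two samples")
    tx, ty = tangent
    samples = []
    for k in range(n):
        a = -theta_p + 2.0 * theta_p * k / (n - 1)
        c, s = math.cos(a), math.sin(a)
        samples.append((p[0] + r * (c * tx - s * ty),
                        p[1] + r * (s * tx + c * ty)))
    return samples


if __name__ == "__main__":
    theta_p = end_point_angle((1.0 - math.sin(1.0)) / math.pi)  # close to 1 rad
    print(theta_p)
    print(end_point_arc((0.0, 0.0), 2.0, math.pi / 2, (1.0, 0.0), n=3))
```

The middle sample of the arc always lies on the ray from p along the tangent, reflecting the statement that the limiting tangent bisects the angle subtending the arc.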

3.3 Salient Parts of the Medial Axis

We now build on the previous results to obtain a novel measure of saliency for medial axis points that combines two criteria: (1) The object angle, which by the characterization of [4] is obtained directly from the computation of the AOF and (2) A notion of uniqueness of the maximal inscribed disk at a skeletal point to the host branch.

Definition 3.3

A unique skeletal point has the property that the maximal inscribed disk centered at it does not intersect the maximal inscribed disk associated with any skeletal point on any other branch.

Whereas the object angle has often been used as a criterion for saliency [13], the second notion is novel. The intuition here is that unique skeletal points are salient because without them a significant portion of the object’s area would not be represented. Examples of unique and non-unique skeletal points are shown in Fig. 3.5.

Fig. 3.5

A part of the dog shape is shown with maximal inscribed disks corresponding to unique and non-unique skeletal points. The maximal inscribed disk centered at \(\mathbf{p}_{1} \in \chi_{1}\) does not intersect any maximal inscribed disk from branches other than \(\chi_{1}\), so \(\mathbf{p}_{1}\) is a unique skeletal point. In contrast, \(\mathbf{p}_{2} \in \chi_{1}\) is not a unique skeletal point because the maximal inscribed disk centered at \(\mathbf{p}_{2}\) intersects the maximal inscribed disk centered at \(\mathbf{p}_{3} \in \chi_{2}\)
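Definition 3.3 translates into a direct disk-intersection test. The sketch below is a brute-force reading of the definition: the `SkeletalPoint` record and function names are hypothetical, and treating disks that merely touch as non-intersecting (strict inequality) is an assumption.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class SkeletalPoint(NamedTuple):
    center: Tuple[float, float]
    radius: float
    branch: int  # identifier of the host branch


def disks_intersect(a: SkeletalPoint, b: SkeletalPoint) -> bool:
    """Two disks overlap when their centers are closer than the radius sum."""
    return math.dist(a.center, b.center) < a.radius + b.radius


def is_unique(point: SkeletalPoint, skeleton: Sequence[SkeletalPoint]) -> bool:
    """True if the disk at `point` meets no disk from any other branch."""
    return not any(
        other.branch != point.branch and disks_intersect(point, other)
        for other in skeleton
    )


if __name__ == "__main__":
    p1 = SkeletalPoint((0.0, 0.0), 1.0, branch=1)
    p2 = SkeletalPoint((4.0, 0.0), 1.0, branch=1)
    p3 = SkeletalPoint((5.5, 0.0), 1.0, branch=2)
    skeleton = [p1, p2, p3]
    print(is_unique(p1, skeleton), is_unique(p2, skeleton))
```

This check is quadratic in the number of skeletal points; a spatial index could reduce the cost, but that is outside what the text describes.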

As explained in Sect. 3.1, the limiting average outward flux at a regular skeletal point p is given by \(\lim_{\varepsilon\rightarrow0}\frac{\mathbf{F}_{\varepsilon}(\mathbf{p})}{2\pi \varepsilon} = - \frac{2}{\pi}\sin\theta\). This equation determines a relationship between the AOF and the object angle θ. The larger the magnitude of the AOF, the larger the object angle and the more likely the shape silhouette is to be elongated locally. Since elongated parts admit a simple and stable medial axis structure, skeletal points with high AOF magnitude are salient.

3.3.1 Simplifying the Skeleton

We combine these two measures of saliency to simplify flux-based skeletons using the following procedure: a skeletal point is retained when it is unique or when its normalized AOF is greater than a certain threshold. In our experiments, we use the threshold τ=0.9045 for the AOF, which means that all non-unique skeletal points with object angle θ greater than about 60° will be retained in the simplified skeleton. Figure 3.6 illustrates the result of applying this simplification procedure on the dog shape.

Fig. 3.6

Left: The skeletal points found to be unique are shown in black on the medial axis of a dog example. Middle: Normalized flux values of a skeleton are shown in a range starting from white (minimum AOF) and ending in black (maximum AOF). Right: Several salient segments labeled as \(t_{i}\) are shown as the result of simplifying the medial axis by retaining only those skeletal points that are unique or have AOF above a threshold
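The retention rule of Sect. 3.3.1 amounts to a logical OR of the two saliency criteria. The sketch below assumes that the uniqueness flags and the normalized AOF values (taken here as non-negative magnitudes) have already been computed for each skeletal point; the function name is hypothetical and the default threshold is the value quoted in the text.

```python
def simplify_skeleton(unique_flags, normalized_aof, tau=0.9045):
    """Return indices of skeletal points that are unique or exceed tau."""
    if len(unique_flags) != len(normalized_aof):
        raise ValueError("one flag and one AOF value are needed per point")
    return [
        i
        for i, (unique, aof) in enumerate(zip(unique_flags, normalized_aof))
        if unique or aof > tau
    ]


if __name__ == "__main__":
    flags = [True, False, False, True]
    aof = [0.10, 0.95, 0.50, 0.99]
    print(simplify_skeleton(flags, aof))
```

Points that fail both tests are dropped, which is what fragments the skeleton into the salient segments that are later merged into graph nodes.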

3.4 Flux Graphs

Our main motivation for simplifying the flux-based skeleton is to extract a graph representation which is simpler than but otherwise as complete and effective as popular existing approaches such as the shock graph [14] and the bone graph [8]. We propose a “Flux Graph” that uses the simplification process to describe a shape as a set of connected parts while preserving the topology of the original skeleton.

3.4.1 Nodes and Edges

The simplification process can result in a number of skeletal fragments, as illustrated by the example in Fig. 3.6. Not all of these fragments describe distinct parts; rather, those that share a significant portion of their volumes (obtained as the union of the associated medial disks) and are in close proximity to one another can be combined via a merging process. The segments which remain at the end of the merging process are treated as the nodes of a flux graph. The results of merging fragmented parts associated with the simplified skeleton of the dog shape are shown in Fig. 3.7 (left). The set of edges between nodes is then determined based on their connectivity on the original medial axis. To direct the edges, we compare the average radii of the inscribed disks associated with two adjacent nodes. The node with the larger average radius is chosen as the parent and the other as the child. The resulting directed flux graph for the dog shape is shown in Fig. 3.7 (right).

Fig. 3.7

The flux graph of the dog shape. Left: The set of nodes is shown with the distinct parts depicted in different colors, each representing a union of medial disks. Right: The directed flux graph. The dummy node ♯ carries no geometrical information but serves as a parent to all the top level nodes
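The edge-direction rule of Sect. 3.4.1 can be illustrated with a small sketch. It assumes the merging step has already produced the nodes and that adjacency on the original medial axis is given as undirected pairs. The function name, the node labels, and the tie-breaking choice for equal average radii are hypothetical.

```python
def direct_edges(node_radii, adjacency):
    """Orient each undirected edge from larger to smaller average radius.

    node_radii: dict mapping node -> list of inscribed-disk radii.
    adjacency:  iterable of (node_a, node_b) pairs.
    Returns a list of (parent, child) pairs.
    """
    mean_radius = {n: sum(r) / len(r) for n, r in node_radii.items()}
    directed = []
    for a, b in adjacency:
        # Ties go to the first-listed node; this is an arbitrary choice.
        if mean_radius[a] >= mean_radius[b]:
            directed.append((a, b))
        else:
            directed.append((b, a))
    return directed


if __name__ == "__main__":
    radii = {"torso": [5.0, 6.0, 5.5], "leg": [1.0, 1.5], "tail": [0.5, 0.7]}
    print(direct_edges(radii, [("leg", "torso"), ("torso", "tail")]))
```

Under this rule thick parts such as a torso become ancestors of thinner attached parts, giving the coarse-to-fine hierarchy shown in Fig. 3.7 (right).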

3.4.2 Qualitative Stability with Viewpoint Changes

We provide a qualitative demonstration that flux graphs remain stable under small changes in viewpoint, while providing an intuitive part structure. We consider a view of the dog (Fig. 3.8 (top row, middle)) and adjacent views obtained by rotating around it in clockwise and anti-clockwise directions. For each view the top row depicts the parts represented by each node of the flux graph, the second row the flux graph and the bottom row the shock graph. Changes to the flux graph typically occur when new parts, such as the tail, come into view (or disappear), but the overall graph structure is much simpler than that of the shock graph. This is essentially because the shock graph utilizes and hence represents the entire skeleton, without any simplification. The experimental results in Table 3.1, which shows averages over 1664 view-based silhouettes of objects used in [8], demonstrate that the flux graph representation is essentially complete, reconstructing 99 % of an object’s area. This will be discussed in further detail in Sect. 3.5.4.

Fig. 3.8

Top row: A view of a dog (middle) with adjacent views obtained by rotating around it in the clockwise and anti-clockwise directions. For each view, the parts reconstructed by each node of the flux graph are shown as a colored union of disks. Middle row: The flux graph corresponding to the view in the top row. Bottom row: The shock graph corresponding to the view in the top row

Table 3.1 Efficiency of flux graphs over shock graphs. The measures in the first six columns are obtained by taking the ratios of the average values of these complexity measures for flux graphs and shock graphs, subtracting these ratios from 1, and then averaging over all the 1664 silhouettes in the database. The last column indicates the percentage of area of the original object reconstructed by flux graphs

3.5 Flux Graphs for Matching

A skeletal graph abstraction can be used as a tool in many visual shape problems including view-based object recognition. We now examine the potential of using flux graphs for matching, in comparison against the well established shock graphs. To carry out a comparative experiment against shock graphs, we used the same graph matching setup and database used for shock graphs in [5, 14].

3.5.1 Topological and Geometrical Similarity

Given two flux graphs, which are directed acyclic graphs (DAGs), a bipartite graph is constructed between their nodes in a hierarchical manner. Each edge is weighted based on the structural similarity between nodes; the weight is the normalized length of the difference between the topological signature vectors (TSVs) of the two nodes, as introduced in [14]. The best matching is the maximum weighted bipartite matching, i.e., the one for which the sum of the weights of the selected edges is maximized. In a DAG representation, the TSV is defined as the vector of eigenvalue sums derived from the corresponding adjacency matrix for the sub-DAG of the considered node. The matching algorithm used is a greedy algorithm [5], which has the benefit of finding a largest maximal matching in polynomial time. The similarity is computed by matching a query with a model node and then normalizing by the number of matched nodes according to the order of the model graph.

3.5.2 The DAG Matcher

To match a query shape with other shapes, we must develop a DAG matcher. The DAG matcher receives two DAGs as input and computes a value representing their similarity, as well as a list of corresponding nodes in the two DAGs. This analysis considers both topological structure (Γ) and geometric information (Δ) associated with a flux graph’s vertices. Each of these two measures returns a value normalized to the interval [0, 1]. The final similarity score is a weighted combination of the two, \(S(G_{1},G_{2})=\omega\,\Gamma(G_{1},G_{2})+(1-\omega)\,\Delta(G_{1},G_{2})\), where \(S(G_{1},G_{2})\) represents the similarity between DAGs derived from two given shapes, and ω is a tuning weight in the interval [0, 1]. At the end of the process, a list of corresponding nodes and a similarity measure are obtained.
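The final score is a convex combination of the two normalized measures. The short sketch below only illustrates that combination; how Γ and Δ themselves are computed is not reproduced here, and the function name and input values are hypothetical.

```python
def combined_similarity(gamma, delta, omega):
    """S = omega * Gamma + (1 - omega) * Delta, with all inputs in [0, 1]."""
    for name, value in (("gamma", gamma), ("delta", delta), ("omega", omega)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return omega * gamma + (1.0 - omega) * delta


if __name__ == "__main__":
    # Toy values: strong topological agreement, weaker geometric agreement.
    print(combined_similarity(gamma=0.8, delta=0.4, omega=0.25))
```

Because both inputs and the weight lie in [0, 1], the score also lies in [0, 1], with ω = 1 using topology alone and ω = 0 using geometry alone.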

3.5.3 The Dataset and Experimental Results

The matching problem we consider is to recognize unseen 2-D query views of 3-D objects by matching a query view against all the available silhouettes (reviewed in Sect. 3.5). We compare results of these experiments with those obtained using shock graphs in [6, 8].

The dataset used for our experiments is the same dataset used for experiments carried out for Bone Graphs in [7, 9] and Shock Graphs [5] and has 13 3-D models. Perspective projection of each 3-D object is computed onto the image plane where each model is centered in a uniformly tessellated view sphere. With 128 uniformly sampled views per object, the data set contains a total of 1664 2-D projected views.

3.5.4 Flux Graphs versus Shock Graphs

We begin by demonstrating that by a number of complexity measures the flux graph is simpler and hence more efficient than the shock graph, while essentially providing a complete reconstruction of the original object. To do this, in Table 3.1, for each of the 1664 views we compare: the count of graph vertices, the count of graph edges, the cumulative sum of the number of nodes at each depth multiplied by the depth, the depth, the total number of skeletal points on the graph, and the average of the TSV (topological signature vector) values. The numbers reported in the table reflect the efficiency gained by using flux graphs over shock graphs, e.g., flux graphs have 50 % fewer nodes, 56 % fewer edges and 24 % fewer skeletal points. The last column shows the fraction of the area of the original object reconstructed by flux graphs (99 %), indicating that there is essentially no loss in representational power.
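One reading of the efficiency figures is a per-silhouette reduction, one minus the ratio of the flux graph value to the shock graph value, averaged over all silhouettes. The sketch below follows that reading, which is an interpretation of the caption of Table 3.1 and not a confirmed description of the authors' computation. The function name and the toy counts are hypothetical and are not data from the chapter.

```python
def mean_reduction(flux_values, shock_values):
    """Average of 1 - flux/shock over paired per-silhouette measurements."""
    if len(flux_values) != len(shock_values) or not flux_values:
        raise ValueError("need equally many, non-empty measurements")
    reductions = [1.0 - f / s for f, s in zip(flux_values, shock_values)]
    return sum(reductions) / len(reductions)


if __name__ == "__main__":
    # Toy node counts for three silhouettes (illustrative only).
    flux_nodes = [5, 4, 6]
    shock_nodes = [10, 8, 12]
    print(mean_reduction(flux_nodes, shock_nodes))
```

A value of 0.5 for the node count would correspond to the "50 % fewer nodes" style of statement used in the text.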

3.5.5 Matching 2-D Views of 3-D Models

We now evaluate the flux graph against the shock graph in a set of view-based object recognition experiments. This comparison follows the matching framework of [7]. The recognition task is performed as follows: (a) each view is removed in turn from the database (1664 2-D view-based shapes) and compared to all remaining views; (b) if the class of the closest matching view is the same as that of the query, the recognition is counted as correct. In the next set of trials, in each step 25 % of the total views are removed randomly from the database. The same experiment is then carried out with further subsampled databases. Figure 3.9 plots the recognition success rates for both shock graphs and flux graphs, averaged over all views of all objects in the database. See [6] for a more detailed explanation of the experimental set up. We also note that the results reported in [6, 8] show that the use of bone graphs, which require a more elaborate construction process, outperforms shock graphs in this experiment.

Fig. 3.9

Using the experimental set up of [6, 8], we compare the use of flux graphs versus shock graphs in a view-based object recognition experiment involving a total of 128 views of each of 13 3-D graphical objects (1664 silhouettes in total). The flux graphs, which are considerably simpler, provide recognition results that are a few percentage points below those of shock graphs
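The leave-one-out protocol of Sect. 3.5.5 can be sketched independently of the graph matcher, assuming a precomputed matrix of pairwise similarity scores. The function name, the matrix layout, and the toy data are hypothetical; ties are resolved by taking the first maximum.

```python
def leave_one_out_accuracy(similarity, labels):
    """Fraction of views whose most similar other view shares their class.

    similarity[q][j] is the score between query view q and database view j;
    the query itself is excluded from its own candidate set.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two views")
    correct = 0
    for q in range(n):
        candidates = [j for j in range(n) if j != q]
        best = max(candidates, key=lambda j: similarity[q][j])
        if labels[best] == labels[q]:
            correct += 1
    return correct / n


if __name__ == "__main__":
    labels = ["dog", "dog", "hand", "hand"]
    scores = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.5, 1.0, 0.4],
        [0.1, 0.2, 0.4, 1.0],
    ]
    print(leave_one_out_accuracy(scores, labels))
```

The subsampled trials described in the text would apply the same procedure after randomly discarding a fraction of the database views.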

Flux graphs offer the advantage of efficiency in terms of fewer nodes, edges, depth levels and skeletal points than shock graphs, while still allowing for intuitive hierarchical part-to-part correspondences. However, in terms of the quantitative results, shock graphs outperform flux graphs slightly in this experiment. This could be in part because the matcher used has been tuned to shock graphs and their detailed features and has not been changed in any way to exploit the simplicity of flux graphs. A particular issue is that the geometric node similarity measure used in the matcher [6] implicitly assumes that a node contains a continuous locus of skeletal points. This assumption fails for flux graph nodes that arise from the simplification process we have outlined because the underlying skeletal segments may be fragmented. The greedy matching approach may also suffer from some limitations, and alternate hierarchical matching algorithms could be explored.

3.6 Conclusion

We have presented a novel skeletal shape representation that can be used to faithfully reconstruct the original object’s boundary from medial entities. The comprehensive recovery of the object’s boundary supports the integrity of using the average outward flux at skeletal points for shape analysis. In addition, a complete representation suggests a way of directly relating medial quantities to boundary features. This is useful because the medial features are easier to handle, to store and to compare with other represented objects than the shape boundaries themselves.

We have suggested the use of the uniqueness of an inscribed disk to the host skeletal branch as a novel measure of saliency. Combining this measure with the limiting AOF leads to simplified skeletons which can be abstracted as graphs that are simpler than popular skeletal graphs in the literature such as shock graphs. In contrast with methods that carry out ligature analysis for simplification based on the limited number of configurations of the placement of ligature and non-ligature parts, such as the bone graph in [8], our investigation has the advantage that the notion of saliency is defined for each skeletal point separately. The flux graph representation has been evaluated using a matching framework designed for shock graphs ([8, 9]) to recognize 2D views of 3D objects and the results show that flux graphs are almost as good as shock graphs for matching. However, more work could be done to improve the robustness of the merging process of fragments left by our simplification method, which is presently based on a heuristic.

To advance the use of flux graphs for matching, a number of directions could be explored, including the use of appropriate node similarity measures, the incorporation of a notion of types for nodes (those resulting from simplification, and those not) and the use of alternate hierarchical matching algorithms. The qualitative simplicity and stability of flux graphs with changes in viewpoint suggest their potential for view-based partitioning of the view sphere and view abstraction.