13.1 Shape Inference

Changes in the properties of matter with position in space, on a scale small enough relative to our own that they can be treated as discontinuities, are ubiquitous in our world. The discontinuities define surfaces, which have geometric properties, which we call ‘shape’. The omnipresence of such surfaces, and the distinctive physical properties of the matter that they surround, means that shape is frequently informative about matters of importance to us and to other biological systems, and so inferences involving shape become useful. In particular, because the reflective properties of matter often change along with the properties that define shape surfaces, measurements of light intensity, whether by retina or CCD, can be used to make inferences about shape. Image formation can be approximated in geometric terms, meaning that inferences about two-dimensional shape become relevant too.

What is required to make such inferences? To solve an inference problem, we should construct a probability distribution describing our knowledge of the unknown quantity of interest given the known information. For inferences involving shape, this will involve probability distributions P(R|K), where R is, in general, an element of a suitable set of subsets of a space possessing sufficient geometric structure to render the idea of shape meaningful. In any specific case, R will parameterize a set of propositions whose probabilities we wish to calculate, for example, “region R in the image domain contains entity X” (where X is ‘human being’, ‘Ian’, ‘road network’, ‘car’, etc.). The quantity K denotes all the knowledge we have of the situation (or anyway all the information that we choose, or are able, to express). In particular, this will include all the information we have about the shape of R, perhaps arising from knowledge of X.

In order to make inferences involving shape, then, we need to understand how to construct distributions P(R|K) for given knowledge K: that is, how to encode information about shape into the probability distribution. We will look at the properties probability distributions must have to be regarded as shape models, and then at how these properties have typically been implemented in machine vision. After looking at the drawbacks of this ‘classical’ approach, we then discuss an alternative, inspired by classes of shapes arising in certain image processing problems. In the resulting framework, shape becomes an emergent property of interactions in a network of simple nodes. It may therefore be of some biological relevance. We conclude with a discussion of this approach, and of what remains to be done to turn it into a complete shape modelling framework.

13.2 Modelling Shape

The key task, then, is to construct probability distributions P(R|K) on a suitable space of shapes \(\mathcal {R}\). We will focus attention on a space \(\mathcal {R}(\varOmega )\), where \(\varOmega \subset {\mathbb {R}}^{2}\) is a relevant domain (often the support of an image), although much of the discussion applies to other spaces and other dimensions. ‘Regions’ are open sets, but we do not specify them further; we assume they possess whatever properties are needed to render the models well defined. Note that the space \(\mathcal {R}(\varOmega )\) is complicated: regions can have arbitrarily many connected components; connected components can contain holes; and these in their turn can contain connected components; and so on. Figure 13.1 shows an element of this space.

Fig. 13.1

An example of an element of the space of ‘regions’, \(\mathcal {R}(\varOmega )\), showing the complicated nature of such elements

Now we need a category of mathematical objects to represent the elements of \(\mathcal {R}\), a ‘representation space’ \(\mathcal {S}\). Many such spaces have been used in the literature. Some are isomorphic to \(\mathcal {R}\) (indicator functions, distance functions [16]). This seems good, but leaves the complexity of \(\mathcal {R}\) intact: \(\mathcal {S}\) still contains elements with infinitely many connected components, for example. For others (‘many-to-one’), there is a non-injective map from \(\mathcal {R}\) to \(\mathcal {S}\) (landmark points [6, 12], various Fourier descriptors [20], medial axis [2, 8]). Such representations are often low-dimensional, and can be intrinsically invariant to transformations. However, a region cannot be reconstructed from its representation, and so there is no probability distribution on shapes; we do not consider them further. Finally, there are representations (‘one-to-many’) for which there is a non-injective map \(\zeta : \mathcal {S}\to \mathcal {R}\) (parameterized closed curves, phase field). The advantage is that \(\mathcal {S}\) may have a much simpler structure than \(\mathcal {R}\); however, we have to think about the distribution induced on \(\mathcal {R}\) by a distribution on \(\mathcal {S}\):

$$ P(R) = \int_{\zeta ^{-1}(R)} d\! S\,\rho (S) , $$
(13.1)

where \(\zeta ^{-1}(R) = \{S\in \mathcal {S}: \zeta (S) = R\}\) is the set of representations of R. Often one uses a saddle point approximation: \(P(R) \propto P(S_{R})\), where \(S_{R} = \arg \max _{S\in \zeta ^{-1}(R)} \rho (S)\), with ρ(⋅) a suitable density for P(S).

Next, we have to construct probability distributions on \(\mathcal {R}\) expressing shape information. Do all distributions on \(\mathcal {R}\) count? The only precise answer is yes, but this is not very useful. The standard Ising model on \({\mathbb {Z}}^{2}\) can be viewed as a probability distribution on regions in \({\mathbb {R}}^{2}\) by associating the indicator function of a square with each vertex, but while this distribution undoubtedly contains information about region geometry, because regions with greater boundary length have lower probability, it cannot really be called a ‘shape model’. This can be seen by looking at distant parts of the sample in Fig. 13.2: high probability regions do not have any properties in common that we would normally call ‘shape’.
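To make this concrete, the following is a minimal sketch (not from the chapter; parameter values are illustrative) of Gibbs sampling from the standard Ising model, the kind of procedure that produces samples like the left-hand side of Fig. 13.2. Each node’s conditional distribution depends only on its four nearest neighbors, so the model penalizes boundary length but imposes no long-range structure.

```python
# Minimal sketch: Gibbs sampling of the standard Ising model on a finite
# Z^2 lattice (free boundaries). The +1 phase defines a 'region'; samples
# penalize boundary length but share no common 'shape'.
import numpy as np

def ising_gibbs(n=64, beta=0.7, sweeps=200, seed=0):
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                nb = 0                      # sum of nearest-neighbor spins
                if i > 0:     nb += s[i - 1, j]
                if i < n - 1: nb += s[i + 1, j]
                if j > 0:     nb += s[i, j - 1]
                if j < n - 1: nb += s[i, j + 1]
                # Conditional P(s_ij = +1 | neighbors) for energy -beta s_i s_j
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                s[i, j] = 1 if rng.random() < p_plus else -1
    return s
```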

Fig. 13.2

Predicting one part of a boundary from another. Left: a sample from the Ising model, where accurate prediction is not possible. Right: a case where accurate prediction is possible

Why do we say that the Ising model does not contain shape information? The reason is that the set of high probability regions is too large, that is, the entropy is too high. In order to reduce the entropy, we have to create more dependence in the distribution. Indeed, it is clear that what we normally refer to as ‘shape’ involves the ability to make quite precise inferences about the overall region given only partial information about its boundary. For example, most people could make a good estimate of the part of the object boundary that is concealed in the image on the right-hand side of Fig. 13.2, given the part that is revealed: the conditional entropy is small. The same is not true of the left-hand side. In other words, the probability distribution induced by our knowledge of the object’s identity and behavior contains strong, long-range dependencies between parts of the region boundary. Such dependencies, then, are key to the construction of non-trivial shape models. We now turn to how to build probability distributions that incorporate such dependencies.

13.3 The Classical Approach

We will first look at what we will call the ‘classical approach’ to shape modelling. The focus here is on defining a ‘dissimilarity measure’ between points in \(\mathcal {S}\). Usually, although not always, this measure is a metric [3, 4, 10, 14, 15, 18, 19]. Once defined, the metric can be used to define a probability distribution on \(\mathcal {S}\) as a function of distance to a ‘template’ shape \(S_{0}\). The most common form of distribution is ‘pseudo-Gaussian’, taking the form:

$$ P(S | S_{0}) \propto d\! S\,e^{-\frac{1}{2} d^{2}(S, S_{0})} , $$
(13.2)

where \(S_{0}\) is the ‘template’, and dS is an underlying measure on \(\mathcal {S}\). (Mixtures of such distributions over a set of templates have also been used [5].)

The distribution (13.2) encourages S to be close to the region \(S_{0}\), but this is rarely what is required. Typically, there will be uncertainty about the position, orientation, and perhaps scale of the shape. To incorporate such uncertainty in the classical approach, one must create mixture models over these transformations. Let G be the transformation group acting on \(\mathcal {S}\), with the action denoted gS. Then the distribution one is really interested in is

$$ P(S | S_{0}) = \int_{G} P(S | g, S_{0})P(g | S_{0}) \propto d\! S\int _{G} d\! g\,\rho(g)e^{-\frac{1}{2} d^{2}(S, gS_{0})} , $$
(13.3)

with \(P(g | S_{0}) = d\! g\,\rho (g)\), where dg is an invariant measure on G. Often complete invariance to G is needed. This requires dS to be G-invariant, ρ≡1, and G to act by isometries on \(\mathcal {S}\): \(d^{2}(gS, S_{0}) = d^{2}(S, g^{-1}S_{0})\). In practice, the integral in Eq. (13.3) is rarely evaluated. Rather, a saddle point approximation is made in which \(g^{\ast } = \arg \min _{g\in G} d^{2}(S, gS_{0})\) is substituted, giving \(P({S | S_{0}}) \propto d\! S\,e^{-\frac{1}{2} d^{2}(S, g^{\ast}S_{0})}\), that is, pose is estimated. Although easier than performing the integral, this still requires significant computational effort.
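As an illustration of this pose-estimation step, the sketch below computes \(g^{\ast }\) in closed form for the special case of matched landmark points under rotations and translations (the Kabsch algorithm); the landmark representation and this choice of G are assumptions of the example, not the chapter’s general setting.

```python
# Hedged sketch: saddle-point pose estimate g* = argmin_g d^2(S, g S0) for
# matched 2-d landmark points, with G = rotations + translations (Kabsch).
import numpy as np

def estimate_pose(S, S0):
    """Rotation R and translation t minimizing ||S - (S0 @ R.T + t)||^2."""
    cS, cS0 = S.mean(axis=0), S0.mean(axis=0)
    H = (S0 - cS0).T @ (S - cS)               # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cS - cS0 @ R.T

def d2_aligned(S, S0):
    """d^2(S, g* S0): squared distance after optimal alignment."""
    R, t = estimate_pose(S, S0)
    return np.sum((S - (S0 @ R.T + t)) ** 2)
```

With \(g^{\ast }\) in hand, the saddle-point distribution is simply the pseudo-Gaussian evaluated at the aligned template.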

How do the long-range dependencies necessary for nontrivial shape modelling arise in the classical approach based on templates? The answer is that the template itself, or rather its parameters, such as the group elements just discussed, act as hidden variables. Once they are integrated out, they introduce long-range dependencies between boundary points. A trivial example, in one dimension, is the following. In 1-d, regions are unions of intervals; we consider only connected regions. Let the template region be an interval of length 1, with center at c. Let the probability of a region [x,y] be \(P({x, y | c}) = \delta (x - (c - \tfrac{1}{2})) \delta (y - (c + \tfrac{1}{2}))\). Thus, given c, x and y are independent. If we now add a uniform prior on c (suppose \(c, x, y \in S^{1}\), so that this is normalized), we can integrate out c to obtain P(x,y)=δ(y−x−1). Thus, not knowing c, x and y are dependent: in this singular case, x determines y completely. In the general case, integration over a group as above, or integration over other unknown template parameters, plays exactly the same role as in this simple example, introducing the long-range dependencies that contain nontrivial shape information.
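The induced dependence is easy to check numerically. The sketch below (hypothetical; a bounded interval stands in for \(S^{1}\)) samples the center c from its prior and confirms that the joint distribution of the endpoints concentrates on y−x=1:

```python
# Hypothetical check of the 1-d example: given c, x and y are (trivially)
# independent; marginalizing over a uniform c makes them fully dependent.
import numpy as np

rng = np.random.default_rng(1)
c = rng.uniform(0.0, 10.0, size=100_000)   # uniform prior on the center
x, y = c - 0.5, c + 0.5                    # interval endpoints
print(np.allclose(y - x, 1.0))             # True: x determines y exactly
print(np.corrcoef(x, y)[0, 1])             # ~1.0: long-range dependence
```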

13.3.1 Drawbacks of the Classical Approach

While the classical approach to creating long-range dependencies using templates and a metric is useful and efficient in many applications, it does not apply, or is inefficient, in many important cases. In particular, the use of templates and metrics means that high probability shapes only occur ‘close’ to one or more points in the space of shapes. There are entities, however, for which our knowledge of their shape cannot be expressed in terms of small variations around a template shape or shapes. In particular, when the entity involved has an extent or a topology that is in some way unconstrained, the use of templates fails to allow sufficient variability.

Perhaps the most commonly occurring example is when multiple instances of an entity can be present: see Fig. 13.3 left. In this case, although each entity might be well described by a template and small variations, the whole may have any number of connected components, and hence is not amenable to a template/metric description. Although in principle this situation can be dealt with by using object point processes [13], in practice the large number of degrees of freedom per object, together with the necessity to estimate transformation group parameters for each instance, mean that such methods are very inefficient.

Fig. 13.3

Left: multiple instances of an entity (‘gas of near-circles’). Right: a ‘network’ region

Another example is provided by ‘network’ regions: see Fig. 13.3 right. The set of network regions can be divided into topologically distinct subsets classified by the graph of which they are a fattened version. It is thus clear that such shapes cannot be described as variations around a finite number of templates.

To overcome these drawbacks, a new modelling framework is needed that allows the incorporation of strong constraints on region shape, without necessarily constraining region topology, and that provides intrinsic invariance. To achieve this, the long-range dependencies necessary for shape modelling will be encoded in the distribution in a new way. This turns out to be of interest in its own right, independently of the examples that inspired it.

13.4 Nonlocal Interactions

In this section, we will look at an alternative method for introducing the long-range dependencies needed in order to encode non-trivial shape information. Rather than using a template to introduce such dependencies, explicit nonlocal interactions between region boundary points will be introduced. These interactions generate long-range dependencies strong enough to constrain region shape, but because no template is used, they need not constrain region extent or topology. In addition, the models will be intrinsically invariant to Euclidean transformations, meaning that these transformations do not have to be estimated for each instance of an entity. Multiple instances thus become easier to handle. We first describe these models in terms of the ‘contour representation’, but then go on to reformulate them in terms of nonlocal interactions in networks of simple real- and binary-valued nodes.

The contour representation represents \(R\in \mathcal {R}\) by its boundary ∂R, which consists of a set of oriented, closed curves. (The dark lines bounding the region in Fig. 13.1 show an example of a region boundary; the boundary orientation is not shown in the figure.) The contour representation space Γ is thus the space of multiple, oriented, closed curves, subject to certain constraints, to which we return later. In fact, it is often convenient to write probability distributions in terms of circle embeddings \(S^{1}\to {\mathbb {R}}^{2}\): making a distribution invariant to the action of \(\operatorname {Diff}(S^{1})\) then ensures that it is well-defined on Γ, and hence on \(\mathcal {R}\).

We now introduce a class of models, expressed in the contour representation and known as ‘higher-order active contours’ [17], that encode nontrivial shape information via explicit nonlocal interactions.

13.4.1 Higher-Order Active Contours

The simplest Euclidean invariant model one can place on Γ is

$$ E_{C,0}( \partial R) = \lambda _{C}L( \partial R) + \alpha _{C}A( \partial R) , $$
(13.4)

where L and A are region boundary length and region area respectively; and \(\lambda _{C}, \alpha _{C}\in {\mathbb {R}}_{\geq 0}\). This model, or minor variants of it, has been much used as a region model in the literature, starting with [11]. Indeed, it is essentially the Ising model in a constant external field, expressed in the contour representation. As such, although this model contains important information about the (low-resolution) smoothness of region boundaries, it contains no real shape information. Indeed, both L and A can be expressed as single integrals over ∂R involving only tangent vectors, meaning that only ‘infinitesimally nearest neighbor’ points on the boundary interact: the model does not contain the long-range dependencies necessary to incorporate nontrivial shape information.
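For a polygonal discretization of the boundary, both terms of Eq. (13.4) reduce to simple sums; a minimal sketch (counter-clockwise vertex ordering assumed):

```python
# Sketch: E_{C,0} = lambda_C L + alpha_C A on a closed polygon P (N x 2),
# with vertices ordered counter-clockwise.
import numpy as np

def boundary_length(P):
    return np.sum(np.linalg.norm(np.roll(P, -1, axis=0) - P, axis=1))

def region_area(P):
    x, y = P[:, 0], P[:, 1]                # shoelace formula (signed area)
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

def E_C0(P, lam=1.0, alpha=0.5):
    return lam * boundary_length(P) + alpha * region_area(P)
```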

To incorporate nonlocal interactions, and hence long-range dependencies, one must move from single integrals to multiple integrals, thereby incorporating information from more than one contour point at a time. Two integrals is the simplest case: pairs of points on the boundary then interact. The idea is illustrated in Fig. 13.4. One possibility among many for such a ‘higher-order active contour’ energy is

$$ E_{C,\text {NL}}( \partial R) = -\beta _{C}\iint_{S^{1}\times S^{1}} dt\,dt'\,n\cdot {\mathbf {G}}\bigl(\gamma , \gamma '\bigr) \cdot n' , $$
(13.5)

where γ is an embedding of a circle (or of multiple circles, if the region has more than one connected component or is multiply-connected), whose image is ∂R; t,t′ are coordinates on \(S^{1}\); n indicates the un-normalized normal vector field to γ (i.e., \(\dot{\gamma }\) rotated by π/2); (un)primed quantities are evaluated at (t)t′; and \({\mathbf {G}}: {\mathbb {R}}^{2}\times {\mathbb {R}}^{2}\to T^{\ast }{\mathbb {R}}^{2} \boxtimes T^{\ast} {\mathbb {R}}^{2}\), where ⊠ indicates the outer tensor product, is a bitensor field. Note that it is easy to make \(E_{C,\text {NL}}\) intrinsically Euclidean invariant, for example by taking \({\mathbf {G}}(x, x') = \varPsi (\vert x - x'\vert ){\mathbb {I}}\).
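A direct discretization of Eq. (13.5) for a single closed polygonal contour might look as follows; the Gaussian form of Ψ and its range d are assumptions for illustration, not the interaction function used in [9, 17]:

```python
# Hedged sketch: discretized E_{C,NL} with G(x, x') = Psi(|x - x'|) I.
# Vertex spacing is assumed roughly uniform and absorbed into beta.
import numpy as np

def E_C_NL(P, beta=1.0, d=10.0):
    """P: (N, 2) ordered vertices of a closed contour."""
    tangent = (np.roll(P, -1, axis=0) - np.roll(P, 1, axis=0)) / 2.0
    normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)  # rotate pi/2
    r = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    psi = np.exp(-(r / d) ** 2)            # assumed interaction function Psi
    # double integral ~ sum over all pairs of n . Psi(|x - x'|) . n'
    return -beta * np.einsum('ia,ij,ja->', normal, psi, normal)
```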

Fig. 13.4

Left: the ‘nearest-neighbor’ interactions induced by first derivatives. Right: nonlocal interactions

Summing the nonlocal term (13.5) with Eq. (13.4) gives an energy \(E_{C} = E_{C,0} + E_{C,\text {NL}}\) with interesting properties, documented in [9, 17]. In particular, for certain parameter ranges, calculable via stability analyses [7, 9], \(E_{C}\) has local minima corresponding to ‘network’ regions or to ‘gas of near-circles’ regions. Examples of such local minima of \(E_{C}\), generated by gradient descent, are shown in Fig. 13.5. Network regions consist of a number of branches joining together at junctions, and can be thought of as ‘fattened embedded graphs’, as in Fig. 13.3. ‘Gas of near-circles’ regions consist of any number of connected components, each of which has infinitely many degrees of freedom, but which with high probability is ‘close’ to being a circle of a given radius. Note that ‘gas of near-circles’ regions represent multiple instances of a shape, with different interaction functions Ψ favoring different perturbations of the circle, and hence different shapes. Intrinsic Euclidean invariance of \(E_{C}\) means that no pose estimation is required.

Fig. 13.5

Local minima under \(E_{C}\). Left: network regions; right: gas of near-circles regions

Higher-order active contours demonstrate that non-trivial shape information can be encoded using explicit nonlocal interactions between boundary points, and they have been used successfully in a number of image processing applications [9, 17]. Nevertheless, the contour representation in which they are expressed suffers from a number of drawbacks arising from the fact that not all sets of oriented, closed curves are boundaries: constraints are needed to prevent (self-)intersections; curve orientations, which describe ‘inside’ and ‘outside’, have to be mutually consistent; and the space Γ is not connected, having one component for each topologically distinct set of regions (connected region components, holes, nested regions, …). These difficulties can all be alleviated by changing the shape representation. In the process, we will see that shape information can be encoded in terms of nonlocal interactions in a network of simple binary- or real-valued nodes.

13.4.2 Reformulation as a Network of Nodes

A region can be represented by a binary-valued function \(\phi : \varOmega \to \{-1, 1\}\), with \(R = \zeta (\phi ) = \{x\in \varOmega : \phi (x) = 1\}\). It turns out to be convenient to relax this to a real-valued ‘phase field’ function (we will use the same symbol) controlled by an energy that encourages it to take on the values ±1. Perhaps the simplest such energy is the Ginzburg-Landau energy:

$$ E_{0}(\phi ) = \int_{\varOmega } dx\, \biggl\{ \frac{D}{2}\, \partial \phi \cdot \partial \phi + \lambda \biggl( \frac{\phi ^{4}}{4} - \frac{\phi ^{2}}{2} \biggr) + \alpha \biggl( \phi - \frac{\phi ^{3}}{3} \biggr) \biggr\} , $$
(13.6)

where all ϕ are evaluated at x. The ultralocal part of the integrand has minima at ±1. This means that if D=0, \(\phi _{R} = \arg \min _{\phi :\,\zeta (\phi ) = R} E_{0}(\phi )\) will take the value +1 inside, and −1 outside R, that is, it will be binary. It is then easy to see that \(E( \phi _{R}) = \tfrac{2}{3}\alpha A(R)\) up to an additive constant. Nonzero D has the effect of smoothing the discontinuity, and also measures the boundary length. Indeed, it can be shown that

$$ E_{0}( \phi _{R}) \simeq \lambda _{C}L(R) + \alpha _{C}A(R) , $$
(13.7)

where \(\lambda _{C}\) and \(\alpha _{C}\) are functions of λ, α, and D. The energy \(E_{0}\) is thus \(E_{C,0}\) reformulated in terms of the phase field representation.
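On a square grid with spacing h, the energy of Eq. (13.6) can be evaluated directly; a sketch, with np.gradient standing in for one’s preferred discrete derivative:

```python
# Sketch: discretized Ginzburg-Landau energy E_0 (Eq. 13.6) on a grid.
import numpy as np

def E0(phi, D=1.0, lam=1.0, alpha=0.5, h=1.0):
    gx, gy = np.gradient(phi, h)
    grad2 = gx ** 2 + gy ** 2
    V = lam * (phi ** 4 / 4 - phi ** 2 / 2) + alpha * (phi - phi ** 3 / 3)
    return np.sum(0.5 * D * grad2 + V) * h ** 2
```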

It is now natural to ask whether it is possible to create a phase field energy that is equivalent to \(E_{C} = E_{C,0} + E_{C,\text {NL}}\). This is indeed the case. The equivalent of the nonlocal energy \(E_{C,\text {NL}}\) in Eq. (13.5) is

$$ E_{\text {NL}}(\phi ) = -\frac{\beta }{2} \iint_{\varOmega \times \varOmega } dx\,dx'\, \partial \phi (x)\cdot {\mathbf {G}}\bigl(x, x'\bigr)\cdot \partial \phi \bigl(x'\bigr) . $$
(13.8)

It can then be shown that

$$ E( \phi _{R}) = E_{0}( \phi _{R}) + E_{\text {NL}}( \phi _{R}) \simeq E_{C,0}(R) + E_{C,\text {NL}}(R) = E_{C}( R) . $$
(13.9)

The phase field representation is a one-to-many representation. Equation (13.9) shows that, in a saddle-point approximation to Eq. (13.1), the phase field model E is equivalent to the contour model \(E_{C}\). (The same is true in the Gaussian approximation.) In particular, for different parameter ranges, the phase field model has local energy minima corresponding to networks and a gas of near-circles; the ranges can be found by translating the results of the stability analyses performed in the contour representation to the phase field representation, and are well verified numerically. This is useful because the phase field representation turns out to have multiple advantages. First, unlike the contour representation, there are no difficult constraints to implement: ϕ lives in a linear space Φ. Second, regions of arbitrary topological complexity are all represented in a single, connected space, so that no special methods are needed to deal with multiple connected components, handles, etc. Coupled with the intrinsic Euclidean invariance of the energy, this means that multiple instances of an entity are modeled at essentially no extra cost. Third, in the contour representation, γ appears as an argument to G, which makes the nonlocal term complicated. In the phase field representation, the nonlocal term is quadratic and, when translation invariant, is diagonal in the Fourier basis. This greatly simplifies any computations involving it.
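As a sketch of this third point: for a translation-invariant \({\mathbf {G}}(x, x') = \varPsi (\vert x - x'\vert ){\mathbb {I}}\), the nonlocal term of Eq. (13.8) becomes a pointwise multiplication in the Fourier basis. The code below assumes periodic boundary conditions and unit grid spacing:

```python
# Sketch: evaluating the quadratic nonlocal term in the Fourier basis,
# where a translation-invariant interaction is diagonal.
import numpy as np

def E_NL_fourier(phi, psi_kernel, beta=1.0):
    """phi, psi_kernel: (n, n) arrays; psi_kernel centred at index (0, 0)."""
    n = phi.shape[0]
    k = 2j * np.pi * np.fft.fftfreq(n)     # spectral derivative multipliers
    phi_k = np.fft.fft2(phi)
    dphi_x = k[None, :] * phi_k            # d/dx (axis 1) in Fourier space
    dphi_y = k[:, None] * phi_k            # d/dy (axis 0) in Fourier space
    psi_k = np.fft.fft2(psi_kernel)
    # -beta/2 sum_k Psi(k) |grad phi(k)|^2, with Parseval normalization
    val = np.sum(psi_k * (np.abs(dphi_x) ** 2 + np.abs(dphi_y) ** 2))
    return -0.5 * beta * val.real / n ** 2
```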

The continuum form of the phase field energy is easy to manipulate, but computations, whether in a machine or biological visual system, will inevitably involve discretization of some sort. If we spatially discretize the energy E of Eqs. (13.6) and (13.8), the result is a Markov random field ψ, consisting of a set of real-valued nodes, interacting (see Fig. 13.6): with themselves, via the potential (red); with their nearest neighbors, via the derivative term (green); and with the large number of nodes that lie within the support of the nonlocal interaction (blue). We thus see that nontrivial shape information can be encoded as nonlocal interactions in a network of real-valued nodes.

Fig. 13.6

Diagram of the network interactions in the spatially-discretized phase field model

We can simplify things even further, at the cost of losing some geometric accuracy, as follows. On most of its domain, the phase field takes values very close to the set {−1,1}. This suggests replacing ψ by a field taking values only in the set {−1,1}, that is, by a binary-valued Markov random field ω. By definition, the distribution for ω is given in terms of that for ψ by \(P(\omega ) = \int _{\varPsi } d\psi \,P({\omega | \psi })P(\psi )\). Binarization means that \(P({\omega | \psi }) = \delta (\omega , \operatorname {sgn}(\psi ))\). In the saddle point approximation, the energy U of the binarized field is given by \(U(\omega ) = E(\psi _{\omega })\), where \(\psi _{\omega } = \arg\min_{\psi :\,\operatorname {sgn}{\psi } = \omega } E(\psi )\). Computing \(\psi _{\omega }\) is a difficult task in itself. A crude but practically effective approximation gives rise to the energy [1]:

$$ U(\omega ) = \frac{D_{b}}{2} \sum_{i,j:\,i\sim j} ( \omega _{i} - \omega _{j})^{2} + \alpha _{b}\sum _{i}\omega _{i} + \frac{\beta _{b}}{2}\sum _{i, j} \omega _{i}F_{ij}\omega _{j} , $$
(13.10)

where \(\alpha _{b}= \frac{2\alpha}{3}\), \(\beta _{b}= \beta \), \(D_{b}= \frac{D}{4}\), and F is related to \(\partial ^{2}{\mathbf {G}}\). Gibbs sampling from this distribution, with appropriate temperature and parameter ranges (again derived from stability analyses performed in the contour representation), shows convergence to gas of near-circles or network regions. These then fluctuate, but remain stable, under further sampling. As with the other two representations, the probability distribution can have local maxima at these shape families. Thus, the same nontrivial shape information can be encoded as nonlocal interactions in a network of binary nodes.
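Because \(\omega _{i}^{2} = 1\), the energy (13.10) is affine in each node, so the conditional distributions needed for Gibbs sampling follow from a ‘local field’ \(h_{i}\). A hedged sketch, assuming a symmetric F and the ordered-pair convention in the first sum of Eq. (13.10):

```python
# Sketch: one Gibbs sweep for the binary MRF energy U (Eq. 13.10).
# Since omega_i^2 = 1, U(omega_i = s) = const + s * h_i for a local field h_i.
import numpy as np

def gibbs_sweep(omega, F, D_b=1.0, alpha_b=0.2, beta_b=1.0, T=1.0, rng=None):
    """omega: flat array of +/-1 over an n x n grid; F: (n^2, n^2) symmetric."""
    rng = rng or np.random.default_rng()
    n = int(np.sqrt(omega.size))
    w = omega.reshape(n, n)
    for i in range(omega.size):
        r, c = divmod(i, n)
        nb = 0                              # sum of nearest-neighbor nodes
        if r > 0:     nb += w[r - 1, c]
        if r < n - 1: nb += w[r + 1, c]
        if c > 0:     nb += w[r, c - 1]
        if c < n - 1: nb += w[r, c + 1]
        h = -2.0 * D_b * nb + alpha_b + beta_b * (F[i] @ omega - F[i, i] * omega[i])
        p_plus = 1.0 / (1.0 + np.exp(2.0 * h / T))
        omega[i] = 1 if rng.random() < p_plus else -1
    return omega
```

Repeated sweeps at an appropriate temperature exhibit the behavior described above: convergence to, and stable fluctuation around, gas-of-near-circles or network configurations.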

13.4.3 Nonlocality via Local Interactions

Nonlocal interactions have so far been introduced explicitly. However, explicit nonlocality may not be plausible in some contexts, for example the biological, so it is important to note that nonlocality can arise from purely local energies. We introduce a vector field \(v: {\mathbb {R}}^{2}\to {\mathbb {R}}^{2}\), and define a joint distribution with Gibbs energy

$$ \hat{E}(\phi , v) = E_{0}(\phi ) + a \langle v | \partial \phi \rangle+ \frac{1}{2} \langle v \vert F \vert v \rangle , $$
(13.11)

where 〈⋅|⋅〉 is the \(L^{2}\) inner product on \(\varOmega \); F is a positive operator; and \(a\in {\mathbb {R}}\). Notice that v couples to the gradient of ϕ, that is, to the boundary. At the same time, v is spatially correlated (e.g., smoothed) by the interaction represented by F. It therefore induces an interaction between points on the boundary. To find this interaction, we marginalize over v. The resulting Gibbs energy is

$$ \tilde{E}(\phi ) = E_{0}(\phi ) - \frac{1}{2}\beta \langle \partial \phi \vert F^{-1} \vert \partial \phi \rangle , $$
(13.12)

where \(\beta = a^{2}\). This has the same form as the nonlocal phase field energy E defined in Eqs. (13.7), (13.8), and (13.9). A similar procedure works for the binary MRF model. Thus, rather than encoding shape information via nonlocal interactions, it can instead be encoded by allowing the network to have several ‘layers’.
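The elimination of v can be checked in finite dimensions: minimizing the quadratic form over v (which, for a Gaussian, agrees with marginalization up to an additive constant) reproduces the \(-\frac{1}{2}\beta \langle \partial \phi \vert F^{-1} \vert \partial \phi \rangle \) term. A toy numerical check, with hypothetical matrices:

```python
# Check: min_v [ a <v|b> + 1/2 <v|F|v> ] = -1/2 a^2 <b|F^{-1}|b>,
# with b standing in for the boundary field (the gradient of phi).
import numpy as np

rng = np.random.default_rng(2)
n, a = 8, 0.7
M = rng.standard_normal((n, n))
F = M @ M.T + n * np.eye(n)             # positive-definite 'operator'
b = rng.standard_normal(n)

v_star = -a * np.linalg.solve(F, b)     # stationary point in v
direct = a * v_star @ b + 0.5 * v_star @ F @ v_star
closed = -0.5 * a ** 2 * b @ np.linalg.solve(F, b)
print(np.isclose(direct, closed))       # True
```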

13.5 Discussion

Thus, we reach the end of the story. Shape information can be encoded via nonlocal interactions in a network of binary- or real-valued nodes. In turn, these nonlocal interactions can be re-encoded as local interactions in a network with multiple ‘layers’. Control of these interactions then allows different shape families to be modeled. Shape, therefore, does not have to be described by exogenous templates, or constructed from arbitrary building blocks. Instead, it can arise naturally, as an emergent property of the connections in a network of simple nodes.

The fact that shape information is encoded as interactions in a network means that shape processing can be inherently parallel. The domain can be separated into subdomains that can be processed simultaneously, with some communication overheads. Note that to search for a shape in multiple subdomains using a template would be equivalent to searching for multiple instances, requiring separate pose estimations for each subdomain.

For image processing applications, the nodes are usually generated by spatial discretization onto a square lattice, but any discretization is possible, for example, using a hexagonal lattice. The nodes need not even correspond to spatial elements: one of the strengths of the phase field representation is that it can be written in any basis, and so a discretization could be generated by imposing a frequency cut-off in Fourier space, or a scale cut-off in wavelet space. The local and non-local terms (but not the potential) are diagonal in the Fourier basis, while the potential term is diagonal in the spatial point basis. This suggests that a wavelet basis, which is intermediate between these two extremes, might simplify the interactions in the model. This would also have the advantage of providing a naturally multiscale representation of shape.

Despite all its promising aspects, however, the framework cannot yet be called a complete shape modelling method. Only simple shapes have been modeled so far, and the natural question is whether one can model more complex shapes.

One possible direction is suggested by observations in numerical experiments. Shapes have been seen that were neither circles nor bars, but instead were star-shaped: circles plus a sinusoidal perturbation of their radius. These appeared to be stable. While it is possible that this was an artefact of the numerical method, it is also possible that the chosen parameter values produced a new type of local minimum. A simple explanation is as follows. To second order in a small perturbation of a circle, only two behaviors are possible for each Fourier component of the perturbation: stable or unstable, corresponding to positive or negative second-order coefficient in the expansion. To fourth order, however, more complex behaviors can occur. In particular, if the second-order coefficient is negative but the fourth-order coefficient is positive, then although the zero amplitude state is unstable, there will be some finite amplitude that is stable. Now imagine that all Fourier components are stable quadratically except for one, which is unstable quadratically but stabilized by a fourth-order term. The circle itself is now a saddle-point of the energy, while a circle perturbed by a sinusoid of the correct frequency and amplitude is a local minimum. This is illustrated in Fig. 13.7.
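The mechanism can be written out for a single Fourier component of amplitude A; the coefficients \(c_{2} < 0\) and \(c_{4} > 0\) below are schematic, not computed from \(E_{C}\):

```latex
% Schematic quartic energy for one Fourier mode of amplitude A:
\begin{align*}
  e(A)  &= \tfrac{1}{2} c_{2} A^{2} + \tfrac{1}{4} c_{4} A^{4} , \\
  e'(A) &= c_{2} A + c_{4} A^{3} = 0
         \quad\Rightarrow\quad A^{\ast } = \sqrt{-c_{2}/c_{4}} , \\
  e''(A^{\ast }) &= c_{2} + 3 c_{4} \bigl(A^{\ast }\bigr)^{2} = -2 c_{2} > 0 .
\end{align*}
```

The zero-amplitude state is unstable along this mode (\(e''(0) = c_{2} < 0\)), while the finite amplitude \(A^{\ast }\) is a stable equilibrium, as sketched in Fig. 13.7.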

Fig. 13.7

A Fourier component, quadratically unstable at zero but stabilized at a finite value by quartic behavior, added to an otherwise stable circle

This picture suggests that, by adjusting the interaction function of the model, one might be able to assign different stable amplitudes to each Fourier component. Were this possible, it would mean that any star domain could be modeled.

An alternative approach to the modelling of more complex shapes involves the introduction of higher-order interactions. There are two issues with such interactions. The first is learning the interactions necessary to model a given family of shapes. In order for an energy to model such a family, it should have local minima at the appropriate points in \(\mathcal {R}\), and this involves difficult analysis. It could perhaps be achieved using standard statistical estimation techniques verified a posteriori for local minimality, or by placing constraints on the parameters during estimation. The first seems wasteful, while the second is complex, and its theoretical basis is not clear. The second issue is algorithmic complexity. Simply evaluating higher-order terms is expensive, and although there are algorithms available for certain types of higher-order term in the binary MRF case, it is not likely that they would apply to the types of term needed. Nevertheless, some promising progress is being made in these areas, and there is good reason to hope that the picture of shape as an emergent property of interactions between network nodes can be fully realized.