1 Introduction

Energy minimization is of strong practical and theoretical importance to computer vision. An energy expresses our criteria for a good solution—low energies are good, high energies are bad—independent of any algorithm. Algorithms are however hugely important in practice. Even for low-level vision problems we are confronted by energies that are computationally hard (often NP-hard) to minimize. As a consequence, a significant portion of computer vision research is dedicated to identifying energies that are useful and yet reasonably tractable. Our work is of precisely this nature.

Computer vision is full of ‘labeling’ problems cast as energy minimization. For example, the data to be labeled could be pixels, interest points, point correspondences, or mesh data such as from a range scanner. Depending on the application, the labels could be either semantic (object classes, types of tissue) or describe geometry/appearance (depth, orientation, shape, texture).

There are many labeling problems for which the labels naturally form groups. In computer vision, a recent trend is the use of ‘context’ to resolve ambiguities in object recognition (e.g. Choi et al. 2010; Ladický et al. 2010a; Zhou et al. 2011). The idea is that certain groups of labels are self-consistent because they tend to appear together, e.g. the {car,road,sky} labels all belong to the “outdoors” context, while {table,chair,wall} all belong to the “indoors” context. In computer graphics, one may wish to automatically classify the faces of a 3D mesh into semantic parts for the benefit of artists and animators (Kalogerakis et al. 2010). The part labels arm, tail, and wheel naturally belong to different groups based on their context (humanoid, quadruped, vehicle). In operations research, facility location can be cast as a labeling problem, and hierarchical variants have been studied (Svitkina and Tardos 2006; Sahin and Süral 2007). All of these disparate labeling problems are similar from an optimization point of view.

When labels are explicitly grouped in a hierarchy, the costs in the energy are naturally structured. In this work, we characterize a class of energies as having hierarchical costs. If an energy satisfies our “h-metric” and “h-subset” conditions, then we can often minimize it much more effectively. We provide a novel hierarchical fusion (h-fusion) algorithm to minimize our class of energies. Our algorithm generalizes the well-known α-expansion algorithm (Boykov et al. 2001) yet we provide better empirical performance and a better approximation bound. The improved theoretical guarantees are important because, in practice, α-expansion can easily get stuck in poor local minima for this useful class of energies; to the best of our knowledge, our h-fusion algorithm is state of the art. Like the original fusion algorithm (Lempitsky et al. 2010) ours is highly parallelizable.

With respect to our energy itself, the most relevant work is the “label costs” of Delong et al. (2012). Our notion of hierarchical costs is a special case of their energy, yet it is important to explicitly characterize this subclass because, as we show, it permits a much better minimization algorithm. With respect to our algorithm, by far the most closely related work is the r-hst metrics of Kumar and Koller (2009). We review both these works in some detail.

2 Review of Related Work

First we review energies with “label costs” as described in Delong et al. (2012). Let the set \(\mathcal{P}\) index the data that needs to be labeled, and let \(\mathcal{L}\) be the set of possible labels. A labeling is any complete assignment \(f = (f_{p})_{p \in \mathcal{P}}\) where variable \(f_{p} \in\mathcal{L}\) designates the label assigned at index p. For example if \(\mathcal{P}= \{p,q\}\) and \(\mathcal{L}=\{\ell_{1},\ell_{2}\}\) then a valid labeling might be \(f=(\ell_{2},\ell_{1})\) where \(f_{p} = \ell_{2}, f_{q} = \ell_{1}\).

We seek joint assignment f that minimizes an energy balancing three types of criteria

$$ E(f) = D(f) + V(f) + H(f). $$
(1)

The D term encodes the individual preference of each variable \(f_{p}\), whereas the V term encourages pairs of variables to take similar values. These are typically expressed as

$$ D(f) = \sum_{p \in \mathcal{P}} D_p(f_p), \qquad V(f) = \sum_{pq \in \mathcal{N}} w_{pq}\, V(f_p, f_q) $$

where the neighborhood set \(\mathcal{N}\) identifies pairs of variables \(f_{p}\) and \(f_{q}\) that interact, and each weight \(w_{pq} \geq 0\) scales the strength of V for that particular pair. Energies with these terms are quite standard in vision, particularly in models based on Markov random fields (MRFs) and when performing maximum a posteriori (MAP-MRF) inference (Li 1994).

The H term encourages the labeling f to use as few unique labels as is necessary. For example, do not explain the data with six labels if five would suffice just as well. The general form that we use in this work can be expressed as

$$ H(f) = \sum_{L \in \mathcal{H}} H(L)\, \delta_L(f) $$

where \(\mathcal{H}\) is any collection of subsets of \(\mathcal{L}\), each H(L) is the label cost of subset \(L \subseteq \mathcal{L}\), and the label cost indicator \(\delta_{L}(\cdot)\) is defined as

$$ \delta_L(f) = \left\{ \begin{array}{l@{\quad}l} 1 & \text{if } \exists p : f_p \in L \\[3pt] 0 & \text{otherwise.} \end{array} \right. $$

Intuitively, H(L) is the shared cost to be paid if f uses any labels drawn from label group L. The cost is shared because it is paid at most once, regardless of how many labels from L appear in f. In many applications there is no reason to group the labels, and so setting \(\mathcal{H}= \{ \{\ell\} \}_{\ell\in\mathcal{L}}\) restricts us to individual per-label costs \(H(\{\ell\})\) or, with slight abuse of notation, setting \(\mathcal{H}= \mathcal{L}\) and writing each per-label cost as \(H(\ell)\). Label costs are an important special case of ‘high-arity’ potentials recently studied in computer vision (e.g. Werner 2008; Woodford et al. 2009).
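To make the three terms concrete, here is a minimal sketch that evaluates E(f)=D(f)+V(f)+H(f) on a toy instance; all names and numeric values are illustrative only.

```python
def energy(f, D, V, w, N, H):
    """f[p] is the label at index p; D[p][l] is the data cost; V(a,b) the
    smoothness cost scaled by w[(p,q)] over neighbor pairs N; H maps a
    frozenset of labels L to its cost, paid once if f uses any label in L."""
    e = sum(D[p][f[p]] for p in range(len(f)))              # D(f)
    e += sum(w[(p, q)] * V(f[p], f[q]) for (p, q) in N)     # V(f)
    used = set(f)
    e += sum(cost for L, cost in H.items() if used & L)     # H(f): delta_L(f)
    return e

# Two variables, two labels, and a single label cost on the group {0, 1}:
D = [[0, 2], [2, 0]]
V = lambda a, b: 0 if a == b else 1
w = {(0, 1): 1}
N = [(0, 1)]
H = {frozenset({0, 1}): 5}
print(energy([0, 1], D, V, w, N, H))   # 0 + 0 + 1 + 5 = 6
```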

From an energy standpoint there are a number of works that combine data costs D with V and/or H. Table 1 lists a small selection of such works in computer vision. As can be seen from the V column, many techniques are only applicable if V(⋅,⋅) satisfies certain assumptions.

Table 1 A selection of relevant energy-based problem formulations in computer vision. All are a special case of energy (1), with the exception of Ladický et al. (2010a) which in principle allows for a wider class of energies that encourage ‘parsimony’

Definition 1

A smoothness cost V is semi-metric if it satisfies

$$ V(\alpha,\beta) = V(\beta,\alpha) \geq 0 \quad\text{and}\quad V(\alpha,\beta) = 0 \;\Leftrightarrow\; \alpha = \beta \quad \text{for all } \alpha,\beta \in \mathcal{L} $$

and V is a metric if it additionally satisfies the triangle inequality

$$ V(\alpha,\beta) \leq V(\alpha,\gamma) + V(\gamma,\beta) \quad \text{for all } \alpha,\beta,\gamma \in \mathcal{L}. $$

These assumptions were originally outlined by Boykov et al. (2001) as sufficient conditions for their α-expansion and αβ-swap algorithms. These conditions arise because of the inherent limitations of graph cut methods (Boykov and Jolly 2001; Kolmogorov and Zabih 2004). Because our algorithmic approach is also based on graph cuts, we shall see a similar kind of limitation arise in Sect. 4.1.

We note that, with the exception of Zhu and Yuille (1996), all the works listed in Table 1 are of a discrete nature where \(\mathcal{P}\) is a finite set. A number of variational formulations of E have recently been developed with continuous analogues of D+V (e.g. Pock et al. 2008, 2009; Olsson et al. 2009) and of D+V+H (Yuan and Boykov 2010). Our main ideas also apply to such continuous formulations, but we focus on the discrete setting.

Energies of the form (1) are NP-hard to minimize in all but a few special cases. Even D+V is NP-hard to minimize for \(|\mathcal{L}|\geq3\) by reduction from 3-Terminal-Cut (Boykov et al. 2001); in fact this case is max-SNP-hard (Cunningham and Tang 1999), meaning there is some ϵ>0 for which no polynomial-time (1+ϵ)-approximation algorithm can exist (i.e. no polynomial-time approximation scheme (PTAS)). The D+H case is NP-hard by straightforward reduction from Set-Cover using only per-label costs \(H(\ell)\). A hardness result for approximating Set-Cover by Feige (1998) implies that D+H cannot be approximated within a ratio of \((1-\epsilon)\ln|\mathcal{P}|\) in polynomial time unless \(\mathrm{NP} \subseteq \mathrm{DTIME}[n^{O(\log\log n)}]\), i.e. NP would have to be only slightly super-polynomial, which is currently deemed unlikely. This observation will help to put the approximation bound for our algorithm into perspective.

Observation 1

Feige’s hardness result is evidence that no polynomial-time algorithm can minimize energy (1) within a constant ratio of the optimum.

From an algorithmic standpoint, the most similar work to ours is a recent paper by Kumar and Koller (2009). Their aim is to efficiently minimize energies of the form D+V. They consider the class of r-hierarchically well-separated tree (r-hst) metrics (Bartal 1998) which are a special case of metrics defined above. We discuss r-hst metrics in Sect. 8, but for now it is enough to understand that they are a special case of metrics and that each has an associated constant r>1. Kumar and Koller provide an algorithm that, for a particular r-hst metric, provides a \(\frac{2r}{r-1}\)-approximation to the globally optimal labeling. Although this coefficient is very large for r≈1, the approximation only depends on V (not on \(|\mathcal{P}|\) or \(|\mathcal{L}|\)). In some cases this ratio is better than the well-known bound for the α-expansion algorithm (Boykov et al. 2001) and the \(O(\lg|\mathcal{L}|\lg\lg|\mathcal{L}|)\) bound for linear programming relaxation (Kleinberg and Tardos 2002).

Kumar and Koller describe their algorithmic process as hierarchical graph cuts. This does not refer to computing a graph cut in a hierarchical manner, but rather to minimizing an energy D+V via a hierarchical sequence of standard graph cuts. They show that the r-hst metric assumption on V is sufficient to apply their algorithm. We aim to minimize energies of the form D+V+H and, motivated by the difficult H term, have independently developed an algorithm we call hierarchical fusion (h-fusion). However, at the highest conceptual level our algorithm is the same as theirs—only our class of energies and our sequence of subproblems differ; we solve each subproblem with the extended α-expansion of Delong et al. (2012). The h-fusion process will be explained in Sect. 5.

Also worth mentioning is a recent work by Felzenszwalb et al. (2010) concerning energies of the form E=D+V. By making the strong assumption that both D and V are tree metrics, they can compute a global optimum. However, most applications do not satisfy the metric assumption on data costs D. Note that our work makes no such assumption. We discuss tree metrics in Sect. 4.1.

Our contributions

This work is about characterizing a class of energies that will be useful in vision, along with a fast and effective algorithm for dealing with large-scale problems. Specifically, for energies of the form D+V+H,

(a) we define h-metric smoothness costs V, a wider class than tree metrics yet still sufficient for our h-fusion algorithm to apply,

(b) we define h-subset label costs H, a sufficient condition to apply h-fusion with high-order label costs,

(c) we prove that the approximation bound of h-fusion is much better than that of α-expansion in important cases, and

(d) we provide worst-case examples to show that our theoretical bound is tight in some reasonable sense.

Contribution (a) is about a more general characterization of V than that used by Kumar and Koller (2009) and by Felzenszwalb et al. (2010), while (b–d) are completely novel results on how to handle label costs in this framework.

The remainder of this paper is organized as follows. Section 3 reviews the α-expansion algorithm in some detail. Section 4 then introduces our notion of hierarchical costs, a useful subclass of energy (1). Section 5 describes our h-fusion algorithm, and Sect. 6 derives its approximation bound. Section 7 gives some experiments to suggest how our energies and algorithm work. Finally, Sect. 8 discusses other applications, relations to facility location, and extensions.

3 Review of α-Expansion

The algorithm we introduce in this paper uses the well-known α-expansion algorithm (Boykov et al. 2001) as a key subroutine. The algorithm was designed for energies of the form D+V, though we employ an extension to D+V+H by Delong et al. (2012). Our approximation bound is therefore intricately linked with α-expansion and its limitations, so we review the algorithm here. Readers familiar with α-expansion may skip ahead to Sect. 4.

3.1 How α-Expansion Works

The α-expansion algorithm performs local search using a powerful class of ‘moves’. Given an initial labeling \(\hat{f}\) and some particular label \(\alpha\in\mathcal{L}\), an α-expansion move gives each variable the following binary choice: either keep the current label \(\hat{f}_{p}\), or switch to label α. Let \(\mathcal{M}^{\alpha}(\hat{f})\) denote the set of all moves (labelings) that can be generated this way, in other words

$$ \mathcal{M}^{\alpha}(\hat{f}) = \bigl\{ f : f_p \in\{\hat{f}_p\} \cup\{\alpha\} \bigr\}. $$
(2)

All variables are simultaneously allowed to keep their current label or to switch, so there are an exponential number of possible moves. For each choice of α we must efficiently find the best possible move. In practice, this sub-problem is solved by casting it as a graph cut (Greig et al. 1989) and using combinatorial algorithms to compute the optimal binary configuration (e.g. Goldberg and Tarjan 1988; Boykov and Kolmogorov 2004; Strandmark and Kahl 2010). Figure 1 illustrates the steps behind a single expansion move. Because a graph cut finds the best move from an exponential number of possibilities, the α-expansion algorithm is a very large-scale neighborhood search (VLSN) technique (Ahuja et al. 2002) and is very competitive in practice (Szeliski et al. 2006).

Fig. 1

In an α-expansion move, each location p is given a binary choice: keep its current label \(\hat{f}_{p}\) or switch to label α. Here we use 1 to indicate a variable that decided to take label α. Because variables interact through V, the binary decisions are inter-dependent and so the optimal move is calculated by a graph cut

With respect to some current labeling \(\hat{f}\), the full set of possible expansion moves is \(\mathcal{M}(\hat{f}) = \bigcup_{\alpha \in \mathcal{L}} \mathcal{M}^{\alpha}(\hat{f})\). The α-expansion algorithm simply performs local search over the full search neighborhood \(\mathcal{M}(\hat{f})\). Perhaps surprisingly, local search with expansion moves will terminate with a labeling \(\hat{f}\) that is within a constant factor of the globally optimal labeling \(f^{*}\) (see Sect. 3.3). The α-expansion algorithm is generally implemented as shown below.

Algorithm 1

α-Expansion(E)—local search with expansion moves
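Since the pseudocode figure for Algorithm 1 is not reproduced above, the following minimal sketch shows the standard structure of the local search; `best_expansion_move` is a hypothetical helper standing in for the graph-cut step (the “line 4” referred to below).

```python
def alpha_expansion(E, labels, f_init, best_expansion_move):
    """Local search with expansion moves (sketch of Algorithm 1)."""
    f = f_init
    improved = True
    while improved:                      # loop until no label yields a better move
        improved = False
        for alpha in labels:             # one 'cycle' expands on each label once
            # argmin over M^alpha(f), computed by a graph cut ("line 4")
            f_new = best_expansion_move(E, f, alpha)
            if E(f_new) < E(f):
                f, improved = f_new, True
    return f
```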

3.2 Graph Cuts and the Limits of α-Expansion

From Table 1 we see that α-expansion is applicable if V is a metric (Definition 1). We briefly review how this limitation arises, as it will be relevant to our new h-fusion algorithm.

The main subproblem for α-expansion (line 4) is to find, for a particular label \(\alpha\in\mathcal{L}\), the best move \(f \in \mathcal{M}^{\alpha}(\hat{f})\). The best move is the one with minimal E(f). Expansion moves are fundamentally binary so we can encode a move as a function f(x) of binary vector \(\mathbf{x}=(x_{p})_{p \in \mathcal{P}}\) where

$$ f(\mathbf{x})_p = \left\{ \begin{array}{l@{\quad}l} \hat{f}_p & \text{if } x_p = 0 \\[3pt] \alpha& \text{if } x_p = 1. \\ \end{array} \right. $$

To solve the subproblem on line 4, we construct a binary energy E′(x)=D′(x)+V′(x) with specially designed data terms \(D'_{p}\) and smoothness terms \(V'_{pq}\):

$$ \begin{array}{l@{\qquad}l} D'_p(0) := D_p\bigl(\hat{f}_p\bigr) & V'_{pq}(0,0) := V\bigl (\hat{f}_p,\hat{f}_q\bigr)\\[3pt] D'_p(1) := D_p(\alpha) & V'_{pq}(0,1) := V\bigl(\hat{f}_p,\alpha \bigr)\\[3pt] & V'_{pq}(1,0) := V\bigl(\alpha,\hat{f}_q\bigr)\\ [3pt] & V'_{pq}(1,1) := V(\alpha,\alpha)\\ \end{array} $$
(3)

where D and V are the terms of the original multi-label energy E(f). Since E′(x)=E(f(x)), minimizing E′ finds an optimal expansion move w.r.t. E.
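For concreteness, construction (3) can be tabulated directly; this is a minimal sketch, not tied to any particular graph-cut library.

```python
def expansion_binary_terms(D, V, f_hat, alpha, N):
    """Tabulate the binary terms of construction (3) for one expansion move.
    D[p][l] is the multi-label data cost, V(a, b) the smoothness cost,
    f_hat the current labeling, N the neighbor pairs."""
    D_prime = {p: (D[p][f_hat[p]], D[p][alpha])        # (D'_p(0), D'_p(1))
               for p in range(len(f_hat))}
    V_prime = {(p, q): [[V(f_hat[p], f_hat[q]), V(f_hat[p], alpha)],
                        [V(alpha, f_hat[q]),    V(alpha, alpha)]]
               for (p, q) in N}
    return D_prime, V_prime
```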

A binary energy of form E′=D′+V′ can be minimized efficiently by a graph cut if E′(x) is a submodular function (Boros and Hammer 2002; Kolmogorov and Zabih 2004). Energy E′ is submodular if and only if it satisfies

$$ V'_{pq}(0,0) + V'_{pq}(1,1) \leq V'_{pq}(0,1) + V'_{pq}(1,0) $$
(4)

for all \(pq \in\mathcal{N}\). Now, for which multi-label energies does the expansion energy (3) result in submodular E′? The submodularity condition (4) holds if and only if

$$ V\bigl(\hat{f}_p,\hat{f}_q\bigr) + V(\alpha,\alpha) \leq V\bigl(\hat{f}_p,\alpha\bigr) + V\bigl(\alpha,\hat{f}_q\bigr) $$
(5)

for all neighbors pq and all possible values of \(\hat{f}_{p}\) and \(\hat{f}_{q}\). Since we must assume \(\hat{f}_{p}\) and \(\hat{f}_{q}\) could take any label in \(\mathcal{L}\) we arrive at the following due to Kolmogorov and Zabih (2004), Boykov and Veksler (2006).

Observation 2

The α-expansion algorithm is applicable to energies of the form D+V if and only if for all \(\alpha,\beta,\gamma\in\mathcal{L}\)

$$ V(\alpha,\alpha) + V(\beta,\gamma) \leq V(\alpha,\gamma) + V(\beta,\alpha) $$
(6)

The metric assumption is sufficient for (6) to hold, and so α-expansion is applicable if V is a metric.
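Condition (6) is easy to check numerically for a given cost function; a minimal sketch:

```python
def expansion_applicable(V, labels):
    """Check condition (6): V(a,a) + V(b,c) <= V(a,c) + V(b,a) for all labels.
    True means alpha-expansion's binary subproblems are submodular."""
    return all(V(a, a) + V(b, c) <= V(a, c) + V(b, a)
               for a in labels for b in labels for c in labels)

potts = lambda a, b: 0 if a == b else 1
assert expansion_applicable(potts, range(5))   # Potts is metric, so (6) holds
```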

Rother et al. (2005) showed that, by assuming a non-arbitrary initial labeling, α-expansion can be applied to a wider class of energies: for each \(\beta,\gamma\in\mathcal{L}\), either V must satisfy (6) for all \(\alpha\in\mathcal{L}\), or V(β,γ)=∞. Unfortunately, α-expansion offers no approximation guarantees for this extended class (Theorem 1 below). In this paper we define a class of smoothness costs V called h-metrics, and it too can be extended to non-metric infinities this way. However, we aim to quantify approximation bounds for our algorithm so, for simplicity, we will not include such infinities in our definition of h-metrics.

Delong et al. (2012) showed that α-expansion is also applicable to energies with label costs as long as H(L)≥0 for each label subset \(L \subset\mathcal{L}\). The expansion step requires a binary energy of the form E′(x)=D′(x)+V′(x)+H′(x) where H′ defines very high-order potentials over x, unlike V′ which defines only ‘quadratic’ (pairwise) potentials. We will use their construction in our main subroutine.

3.3 Approximation Bounds of α-Expansion

Local search with expansion moves is guaranteed to terminate at a local minimum \(\hat{f}\) that is within a constant factor of the global optimum \(f^{*}\) (Veksler 1999; Boykov et al. 2001). The actual bound is \(E(\hat{f}) \leq2c E(f^{*})\) where c≥1 is some constant that depends on V. If c is small, then we can expect α-expansion to do at least a reasonable job. If c is large, the bound is meaningless and we have even more reason to try other algorithms (e.g. Kolmogorov 2006).

Understanding the approximation bound of α-expansion will be helpful for understanding our generalized bound in Sect. 6. The following holds for any energy E=D+V with \(D_{p}(\cdot) \geq 0\) and metric V(⋅,⋅).

Theorem 1

(Veksler 1999)

If \(f^{*}\) is a global minimum of E=D+V, and \(\hat{f}\) is a local minimum w.r.t. expansion moves, then

$$ E(\hat{f}) \leq2 c E\bigl(f^*\bigr) \quad \text{\textit{where} } c = \frac{ \max_{\alpha\neq\beta\in\mathcal{L}} V(\alpha, \beta) }{ \min_{\gamma\neq\zeta\in\mathcal{L}} V(\gamma, \zeta) }. $$
(7)

In other words, α-expansion is a 2c-approximation algorithm where c≥1 depends on the ratio of largest to smallest costs in V. Below are some V(⋅,⋅) terms commonly used in vision, shown in matrix form for \(|\mathcal{L}|=5\).

[Matrices omitted: Potts, truncated-linear, and linear potentials in \(5 \times 5\) matrix form.]

Underneath we see the coefficient c corresponding to each case. The simplest potential (Potts) penalizes \(f_{p} \neq f_{q}\) equally, and gives the best approximation bound. When the range of values is large, e.g. for the ‘linear’ penalty \(|f_{p} - f_{q}|\), the bound (7) gets worse. Our h-fusion algorithm beats the α-expansion bound for a wide class of ‘hierarchical’ costs.
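For instance, with illustrative values of our own choosing (the original matrices are omitted above), the Potts and linear potentials for \(|\mathcal{L}|=5\) give

$$ V_{\mathrm{Potts}}(\alpha,\beta) = \begin{cases} 0 & \text{if } \alpha=\beta \\ 1 & \text{otherwise} \end{cases} \;\Rightarrow\; c = \tfrac{1}{1} = 1, \qquad V_{\mathrm{lin}}(\alpha,\beta) = |\alpha-\beta| \;\Rightarrow\; c = \tfrac{4}{1} = 4. $$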

Delong et al. (2012) showed that incorporating label costs H(⋅) into α-expansion can worsen the above bound. If arbitrary label costs H(L)≥0 are assumed on arbitrary subsets \(L \subset\mathcal{L}\) then the bound is as follows.

Theorem 2

(Delong et al. 2012)

If \(f^{*}\) is a global minimum of E=D+V+H, and \(\hat{f}\) is a local minimum w.r.t. α-expansion, then the following bound is tight:

$$ E(\hat{f}) \leq(2 c + c_{2}) E\bigl( f^{*}\bigr) + \sum_{L \subset\mathcal{L}} H(L) $$
(8)

where \(c_{2}=\max_{L:H(L)>0}(|L|-1)\).

This tells us that, if arbitrary label costs are assumed, standard α-expansion is no longer a constant-ratio approximation algorithm (recall Observation 1) and furthermore the bound gets worse (\(c_{2}\) gets larger) if costs are defined on large subsets L.

4 Energies with Hierarchical Costs

We wish to minimize an energy of the general form (1), but we assume the labels are grouped in some kind of hierarchy. Depending on the application, the grouping will likely be either semantic (hierarchy of object labels) or geometric (families of geometric models). One option for minimizing such energies is to ignore the grouping and simply apply α-expansion. However, Theorem 2 suggests caution, because α-expansion finds poor local minima when, for example, strong high-order label costs are involved.

We express the label grouping structure as follows. The leaves of the tree are the actual labels \(\mathcal{L}\). The root of the tree we denote by r. For trees of non-trivial structure, we call the extra nodes pseudo-labels and denote them by the set \(\mathcal{S}\), so the set of all intermediate nodes is \(\mathcal{S}\cup\{r\}\). The structure of the tree itself can be defined by a child-to-parent map π(⋅) where for each node \(i \in\mathcal{L}\cup\mathcal{S}\) a parent \(\pi(i) \in\mathcal{S}\cup\{r\}\) is defined. We use \(\mathcal{T}= \mathcal{L}\cup\mathcal{S}\cup\{r\}\) to denote all tree nodes. Figure 2 shows a simple label hierarchy.

Fig. 2

A hierarchical grouping of label set \(\mathcal{L}=\{\ell_{1},\ldots,\ell_{6}\}\) into groups \(\mathcal{S}=\{A,B\}\) where, for example, the parent \(\pi(\ell_{4})=B\). At right is a possible labeling f and the pseudo-labeling \((\pi\circ f)\) it induces

Merely declaring the labels to be ‘grouped’ does not in itself change energy E(f) (we still have \(f_{p} \in\mathcal{L}\)) nor is the standard α-expansion algorithm ‘aware’ of a label hierarchy. However, in Sects. 4.1–4.2 we describe energies for which a ‘good’ tree can be defined so that our h-fusion algorithm (Sect. 5) is provably better than α-expansion. In fact if one defines a flat tree (\(\mathcal{S}= \{\,\}\)) then our algorithm and approximation bounds all reduce to those of Boykov et al. (2001) and Delong et al. (2012) as a special case.

Sections 4.1 and 4.2 develop key definitions: the class of h-metrics for smoothness costs V, and the class of h-subsets for label costs H. These definitions are directly motivated by our h-fusion algorithm and how its computation is organized. If an energy satisfies our h-metric and h-subset assumptions, then it can be minimized by h-fusion (Sect. 5).

4.1 Hierarchical Smoothness Costs (h-Potts, h-Metrics)

The following notation will be helpful for discussing a tree defined by parent function π. We use \(\pi^{n}(i)\) to denote n applications of π, as in \(\pi(\cdots\pi(i))\). The set of children of a node j is denoted by

$$ \mathcal{I}(j) = \bigl\{ i \in\mathcal{T}: \pi(i) = j \bigr\}. $$

The set of all nodes in the subtree rooted at j is denoted by

$$ \mathrm{subtree}(j) = \bigl\{ i \in\mathcal{T}: \pi^n(i) = j \text { for some } n \ge0 \bigr\}. $$

The set of labels belonging to the subtree of node i is

$$ \mathcal{L}_i = \bigl\{ \ell\in\mathcal{L}: \ell\in\mathrm {subtree}(i) \bigr\}. $$

A hierarchical grouping of labels \(\mathcal{L}\) induces a grouping of the smoothness cost values inside each V(⋅,⋅) potential. Looking at the tree structure in Fig. 2 we can say that labels \(\ell_{1}\) and \(\ell_{2}\) are both in group A whereas \(\ell_{1}\) and \(\ell_{4}\) are from different groups; thus \(V(\ell_{1},\ell_{2})\) can be interpreted as a “within-group cost” and \(V(\ell_{1},\ell_{4})\) as a “between-group cost.” An example is illustrated below, where regions of the \(|\mathcal{L}| \times |\mathcal{L}|\) matrix are delineated.

[Matrix omitted: the \(|\mathcal{L}| \times |\mathcal{L}|\) cost matrix partitioned into within-group blocks (A with A, B with B) and between-group blocks.]

Individual costs inside each block can vary, though it is helpful to consider the simple case when the cost within each block is constant. So, we now define hierarchical Potts, a natural class of hierarchical smoothness costs V parameterized by node-based ‘transition’ costs \(\{w_{i}\}_{i \in \mathcal{S}\cup\{r\}}\). In what follows, lca(α,β) denotes the lowest common ancestor of nodes α and β with respect to the tree structure π.

Definition 2

(Delong et al. 2011)

Given tree structure π, for each node \(i \in\mathcal{T}\) let \(w_{i} \geq 0\) be its transition cost so that \(V(\alpha,\beta)=w_{\mathrm{lca}(\alpha,\beta)}\) for all \(\alpha,\beta\in\mathcal{L}\) and \(w_{i}=0\) for each leaf \(i \in\mathcal{L}\). We then say that (V,π) forms a hierarchical Potts (h-Potts) potential.

For example, Delong et al. (2011) use a two-level tree where \(w_{r}\) is the transition cost between ‘super-labels’ and each \(w_{i}\) for \(i \in\mathcal{S}\) is the transition cost between ‘sub-labels’ in group i. They show that if \(w_{i} \leq 2w_{r}\) then V is metric and standard α-expansion can be applied. For our h-fusion algorithm to apply, a simple sufficient condition is that \(w_{i} \leq w_{\pi(i)}\) for all \(i \in\mathcal{L}\cup\mathcal{S}\).
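A minimal sketch of an h-Potts potential per Definition 2, assuming a hypothetical child-to-parent map `pi` and per-node transition costs `w`; the tree mirrors Fig. 2.

```python
def lca(a, b, pi):
    """Lowest common ancestor of leaves a, b under child-to-parent map pi."""
    ancestors = set()
    while a in pi:
        ancestors.add(a)
        a = pi[a]
    ancestors.add(a)                 # the root
    while b not in ancestors:
        b = pi[b]
    return b

def h_potts(a, b, pi, w):
    """Definition 2: V(a,b) = w[lca(a,b)], with w zero on leaves."""
    return w[lca(a, b, pi)]

# Fig. 2 tree: leaves l1..l6, groups A = {l1,l2,l3}, B = {l4,l5,l6}, root r.
pi = {'l1': 'A', 'l2': 'A', 'l3': 'A', 'l4': 'B', 'l5': 'B', 'l6': 'B',
      'A': 'r', 'B': 'r'}
w = {'l1': 0, 'l2': 0, 'l3': 0, 'l4': 0, 'l5': 0, 'l6': 0,
     'A': 1, 'B': 1, 'r': 2}
assert h_potts('l1', 'l2', pi, w) == 1   # within group A
assert h_potts('l1', 'l4', pi, w) == 2   # between groups
assert h_potts('l3', 'l3', pi, w) == 0   # same label: leaf cost is 0
```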

Now we define h-metrics, a class of hierarchical smoothness costs where V is not necessarily parameterized by w i . As we shall see in Sect. 5, the h-metric assumption is necessary for our specialized algorithm.

Definition 3

We say that pair (V,π) forms a hierarchical metric (h-metric) if π is irreducible and for every \(i \in \mathcal{L}\cup\mathcal{S}\)

$$ V(\alpha_1,\alpha_2) + V(\beta,\gamma) \leq V(\alpha_1,\gamma) + V(\beta,\alpha_2) \quad \text{for all } \alpha_1,\alpha_2 \in \mathcal{L}_i \text{ and } \beta,\gamma \in \mathcal{L}_{\pi(i)} \setminus \mathcal{L}_i. $$
(9)

Note that for a flat tree (\(\mathcal{S}= \{\,\}\)) each set \(\mathcal{L}_{i}=\{i\}\) so we always have \(\alpha_{1}=\alpha_{2}\). It is easy to show, then, that the h-metric constraint (9) on a flat tree reduces to the standard α-expansion constraint (6). Figure 3 gives a concrete example of an h-metric and shows how (9) constrains the costs that V can encode.
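Inequality (9) can be checked mechanically for a candidate pair (V,π); a minimal sketch, assuming a hypothetical `subtree_leaves` helper that returns \(\mathcal{L}_{i}\):

```python
def is_h_metric(V, non_root_nodes, pi, subtree_leaves):
    """Check (9) for every i in L ∪ S: for all a1, a2 in L_i and
    b, g in L_{pi(i)} minus L_i, require
    V(a1,a2) + V(b,g) <= V(a1,g) + V(b,a2)."""
    for i in non_root_nodes:
        Li = subtree_leaves(i)
        others = subtree_leaves(pi[i]) - Li     # labels under i's siblings
        for a1 in Li:
            for a2 in Li:
                for b in others:
                    for g in others:
                        if V(a1, a2) + V(b, g) > V(a1, g) + V(b, a2):
                            return False
    return True
```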

Fig. 3

The costs in (a) define an h-metric for the tree structure in Fig. 2. Inequality (9) is satisfied for every tuple \((\alpha_{1},\alpha_{2},\beta,\gamma)\) required; the two tuples emphasized at center are \((\ell_{2},\ell_{2},\ell_{4},\ell_{6})\) where (9) becomes 0+4≤3+3, and \((\ell_{3},\ell_{1},\ell_{6},\ell_{4})\) where it becomes 2+4≤3+3. On the right, (b) shows a totally different h-metric that is also h-Potts with \(w_{i} < w_{\pi(i)}\)

For a fixed tree structure, the relationship between h-Potts, h-metrics, tree metrics and r-hst metrics is shown by the set inclusion diagram below (see Appendix A for proof). Our h-fusion algorithm will be applicable to the shaded cases.

[Diagram omitted: set-inclusion relationships among h-Potts potentials, tree metrics, r-hst metrics, and h-metrics (see Appendix A).]

4.2 Hierarchical Label Costs (h-Subsets)

We have already defined a notion of ‘hierarchical’ smoothness costs (h-metrics), and we now do the same for label costs. As we shall see, if an energy E(f) has hierarchical costs with respect to some tree, then E(f) can be minimized by our h-fusion algorithm on that tree (Sect. 5).

Definition 4

Given an irreducible tree structure π, we define its hierarchical label subsets (h-subsets) as

$$ \mathcal{H}= \{ L \subseteq\mathcal{L}: L \cap\mathcal{L}_i = \emptyset\vee L \subset\mathcal{L}_i \vee L \supseteq \mathcal{L}_i\ \forall i \in\mathcal{T}\} $$

or, equivalently

$$ \mathcal{H}= \biggl\{ L \subseteq\mathcal{L}: L=\bigcup_{i \in I} \mathcal{L}_i \text{ for some siblings } I \subset\mathcal{I}(j) \biggr\}. $$

For example, assume we have \(\mathcal{L}=\{\ell_{1},\ldots,\ell_{6}\}\) grouped in the structure shown below:

[Tree omitted: root r with three children 1, 2, 3 grouping \(\{\ell_{1},\ell_{2}\}\), \(\{\ell_{3},\ell_{4}\}\), and \(\{\ell_{5},\ell_{6}\}\) respectively.]

The hierarchical label subsets are

$$ \mathcal{H}= \bigl\{ \{\ell_1\},\ldots,\{\ell_6\},\ \{\ell_1,\ell_2\},\ \{\ell_3,\ell_4\},\ \{\ell_5,\ell_6\},\ \{\ell_1,\ell_2,\ell_3,\ell_4\},\ \{\ell_1,\ell_2,\ell_5,\ell_6\},\ \{\ell_3,\ell_4,\ell_5,\ell_6\},\ \mathcal{L} \bigr\}. $$

Note that sets like \(\{\ell_{2},\ell_{3}\}\) and \(\{\ell_{1},\ell_{2},\ell_{3}\}\) are not in \(\mathcal{H}\) because they cannot be generated from a union of siblings in the particular hierarchy chosen.
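Equivalently, the h-subsets can be enumerated as unions of sibling subtrees; a minimal sketch, assuming hypothetical `children` and `subtree_leaves` helpers:

```python
from itertools import combinations

def h_subsets(internal_nodes, children, subtree_leaves):
    """Enumerate the h-subsets of Definition 4: every union of the
    leaf-sets of siblings I taken from I(j), over all internal nodes j."""
    H = set()
    for j in internal_nodes:
        kids = children[j]
        for k in range(1, len(kids) + 1):
            for sibs in combinations(kids, k):
                H.add(frozenset().union(*(subtree_leaves(i) for i in sibs)))
    return H
```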

Definition 5

Given a tree π we say that (H,π) form hierarchical label costs if \(H(L)>0 \Rightarrow L \in\mathcal{H}\), i.e. if label costs appear only on the h-subsets.

Note that for a flat tree, the set \(\mathcal{H}= 2^{\mathcal{L}}\) and so all subsets are considered ‘hierarchical’ in this degenerate case.

5 Hierarchical Fusion Algorithm (h-Fusion)

Recall that the α-expansion algorithm (Sect. 3, Boykov et al. 2001) minimizes a multi-label energy by constructing a sequence of binary energies. Our h-fusion algorithm constructs a hierarchical sequence of multi-label energies, each of which is solved by running α-expansion as a subroutine. These intermediate multi-label energies are designed to ‘stitch’ or to ‘fuse’ labelings that were computed earlier in the sequence. As we shall see, this procedure provides better optimality guarantees than α-expansion for a wide class of energies, particularly those with strong label costs.

We name our algorithm h-fusion after the binary fusion moves of Lempitsky et al. (2010). They propose an iterative algorithm to minimize energies of the form E=D+V. Given two candidate labelings \(\hat{f}^{A}\) and \(\hat{f}^{B}\), they try to find a lower-energy labeling by ‘fusing’ the best parts of each. This is highly analogous to optimized crossover operations (Aggarwal et al. 1997), a successful technique in genetic algorithms (discussed in Sect. 8). Figure 4 illustrates how new labelings are generated this way in our hierarchy. The key insight of Lempitsky et al. (2010) is that all fusion moves can be generated by a binary vector \(\mathbf{x}=(x_{p})_{p \in\mathcal{P}}\) where the move itself is \(f = \mathbf{x}\hat{f}^{A} + \bar{\mathbf{x}} \hat{f}^{B}\). Note the similarity with the inner loop of α-expansion: an expansion move from current labeling \(\hat{f}\) can now be thought of as the binary fusion \(f = \mathbf{x}\boldsymbol{\alpha}+ \bar{\mathbf{x}} \hat{f}\) with constant labeling α=(α,…,α). They propose a tennis-tournament strategy for selecting pairs of labelings to fuse, and they stop after the energy fails to decrease for some number of attempts. However, they cannot always find an optimal fusion because they fuse arbitrary labelings; the resulting binary energy is non-submodular and therefore NP-hard in general (see Sect. 3.2). They must rely on minimization methods like QPBO (Kolmogorov and Rother 2007; Rother et al. 2007) with no approximation guarantees.

Fig. 4

Given labelings \(\hat{f}^{A}\) and \(\hat{f}^{B}\), the set of all possible fusions \(\mathcal{M}(\hat{f}^{A},\hat{f}^{B})\) includes the two ‘stitched’ labelings shown at bottom. Our move space \(\mathcal{M}(\hat{f}^{1},\ldots,\hat{f}^{k})\) fuses any number of labelings

In contrast, we construct multi-label subproblems and approximately minimize each one with α-expansion. Ours is therefore a graph cut approach that does not rely on QPBO, message passing (Kolmogorov 2006), or other LP relaxations (Werner 2008). Furthermore we,

  • handle energies with label costs (E=D+V+H),

  • characterize the subclass of energies (h-metrics, h-subsets) for which h-fusion is applicable, and

  • prove approximation bounds that generalize and improve upon those of α-expansion.

To the best of our knowledge, we are the first to incorporate high-arity costs (label costs) into a fusion-based algorithm.

At each step of our h-fusion algorithm, the main subproblem is to find \(\operatorname*{\arg\hspace{-.1em}\min}_{f \in\mathcal{M}(\hat{f}^{1},\ldots,\hat{f}^{k})} E(f)\) where each \(\hat{f}^{i}\) is a fixed labeling with \(\hat{f}^{i}_{p} \in \mathcal{L}_{i}\) and the set of all possible fusions is

$$ \mathcal{M}\bigl(\hat{f}^1,\ldots,\hat{f}^k\bigr) = \bigl\{ f : f_p \in \bigl\{\hat{f}^1_p,\ldots,\hat{f}^k_p\bigr\} \bigr\}. $$
(10)

Crucially, in our framework no two labelings \(\hat{f}^{i}\) and \(\hat{f}^{i'}\) can contain the same labels, unlike the ‘tennis-tournament’ scheme. Given a tree structure π on label set \(\mathcal{L}\), our algorithm generates labelings in a bottom-up fashion with respect to π. Figure 5 shows the order of sub-problems on a simple tree structure. For each internal node j a multi-label fusion energy is constructed and approximately minimized by α-expansion. The local minima computed at one level of the tree are subsequently fused at the next-higher level. Recursive code is given below, where calling h-Fusion(r) on root node r builds the final labeling in bottom-up manner.

Fig. 5

The h-fusion steps on a tree. Subproblems 1, 2, 3 independently generate labelings using mutually exclusive labels. Subsequent steps fuse these labelings in a bottom-up manner

Algorithm 2

h -Fusion(j)—for node j, outputs labeling \(\hat{f}^{j}\) with \(\hat{f}^{j}_{p} \in \mathcal{L}_{j}\)
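Since the pseudocode figure for Algorithm 2 is not reproduced above, the recursion it describes looks roughly as follows; `construct_fusion_energy` and `minimize_with_expansion` are hypothetical stand-ins for the subroutines discussed in the text.

```python
def h_fusion(j, children, P):
    """Return a labeling f^j with f^j_p in L_j (sketch of Algorithm 2)."""
    kids = children.get(j, [])
    if not kids:                            # j is a leaf: constant labeling f_p = j
        return {p: j for p in P}
    # Recursively solve each child subproblem (mutually exclusive labels).
    f_hat = {i: h_fusion(i, children, P) for i in kids}
    # Build the fusion energy E' over child indices and minimize it with
    # the extended alpha-expansion ("line 5" referred to in the text).
    E_prime = construct_fusion_energy(j, f_hat)
    g = minimize_with_expansion(E_prime, kids)
    return {p: f_hat[g[p]][p] for p in P}   # stitch: f(g)_p = f^{g_p}_p
```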

The key question for h-Fusion is how to set up E′ (line 5) so that it encodes our original energy E over all possible fusions, i.e. over all labelings in \(\mathcal{M}(\{ \hat{f}^{i}\}_{i \in\mathcal{I}(j)})\). Given a set of labelings \(\{ \hat{f}^{i} \}_{i \in\mathcal{I}(j)}\) there is a one-to-one correspondence between mappings \(g : \mathcal{P} \rightarrow\mathcal{I}(j)\) and labelings \(f \in \mathcal{M}(\{\hat{f}^{i}\}_{i \in\mathcal{I}(j)})\). We let f(g) be the labeling \(f_{p} = \hat{f}_{p}^{g_{p}}\) corresponding to g. We can then design an unconstrained energy E′ such that E′(g)=E(f(g)) for all g.

Again, we define E′ to be an energy over child indices. It will involve the familiar terms

$$ E'(g) = D'(g) + V'(g) + H'(g) $$
(11)

where D′ takes the usual form and V′ takes the form

$$ V'(g) = \sum_{pq \in\mathcal{N}}V'_{pq}(g_p,g_q). $$

As we shall see, the label costs H′ of our fusion energy must take the form of local label costs (Delong et al. 2012)

$$ H'(g) = \sum_{I} H_{P_I}'(I)\, \delta_{I}(g_{P_I}) $$

where cost \(H_{P_{I}}'(I)\) is only applied for a particular subset of variables \(P_{I} \subseteq\mathcal{P}\) associated with child indices \(I \subseteq\mathcal{I}(j)\), i.e. \(\delta_{I}(g_{P_I})=1\) if and only if \(g_{p} \in I\) for some \(p \in P_{I}\). In other words, we need E′ to encode costs of the form “pay if g uses this child here.” The precise costs encoded by D′,V′ and H′ are shown in ConstructFusionEnergy below.

Algorithm 3

ConstructFusionEnergy(j)
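Algorithm 3's figure is likewise not reproduced; based on the description that follows (lines 1–2 build D′ and V′, lines 4–6 localize each label cost), a plausible sketch with hypothetical helper names is:

```python
def construct_fusion_energy(j, f_hat, D, V, N, H, subtree_leaves, P):
    """Build E'(g) = D'(g) + V'(g) + H'(g) over child indices g_p in I(j)."""
    kids = list(f_hat)
    # Lines 1-2: data and smoothness terms over child indices.
    D_prime = {p: {i: D[p][f_hat[i][p]] for i in kids} for p in P}
    V_prime = {(p, q): {(i, k): V(f_hat[i][p], f_hat[k][q])
                        for i in kids for k in kids}
               for (p, q) in N}
    # Lines 4-6: each label cost H(L) becomes a local label cost H'_{P_I}(I).
    local_label_costs = []
    for L, cost in H.items():
        I = [i for i in kids if subtree_leaves(i) & L]          # line 4
        P_I = {p for p in P for i in I if f_hat[i][p] in L}     # line 5
        if I:
            local_label_costs.append((tuple(I), frozenset(P_I), cost))  # line 6
    return D_prime, V_prime, local_label_costs
```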

The correctness of D′ and V′ is self-evident; the algorithm of Kumar and Koller (2009) includes lines 1–2 but on a more restrictive class of metrics. However, our work was mainly motivated by label costs. It is not obvious how lines 4–6 encode original label cost H(L) as a local label cost \(H_{P_{I}}'(I)\). We now verify its correctness, first by simple example and then by proving it in general.

Example 1

Suppose E′ must fuse the two child labelings \(\hat{f}^{1}\) and \(\hat{f}^{2}\) shown below.

[Labelings omitted: child labeling \(\hat{f}^{1}\) uses labels from \(\{\ell_{1},\ell_{2}\}\) and child labeling \(\hat{f}^{2}\) uses labels from \(\{\ell_{3},\ell_{4}\}\).]

Assume \(\mathcal{H}= \{\{\ell_{1}\},\{\ell_{2}\},\{\ell_{3}\},\{\ell_{4}\},\{\ell_{1},\ell_{2}\},\{\ell_{3},\ell_{4}\} \}\) and H(L)>0 for each \(L \in\mathcal{H}\). Each \(g_{p} \in\{1,2\}\) and so E′(g) can account for H(L) by setting

$$ \begin{array}{l@{\qquad}l} H_{\mathcal{P}}'(1) := H(\{\ell_1,\ell_2\}) & H_{P_1}'(1) := H(\{\ell_1\}),\ P_1 = \{p : \hat{f}^1_p = \ell_1\} \\[3pt] H_{\mathcal{P}}'(2) := H(\{\ell_3,\ell_4\}) & H_{P_2}'(1) := H(\{\ell_2\}),\ P_2 = \{p : \hat{f}^1_p = \ell_2\} \\[3pt] & H_{P_3}'(2) := H(\{\ell_3\}),\ P_3 = \{p : \hat{f}^2_p = \ell_3\} \\[3pt] & H_{P_4}'(2) := H(\{\ell_4\}),\ P_4 = \{p : \hat{f}^2_p = \ell_4\} \end{array} $$

Costs \(H_{\mathcal{P}}'(i)\) (left-hand column) are global label costs in the fusion energy. The other costs \(H_{P}'(i)\) (right-hand column) are localized label costs that encode the original (global) per-label costs \(H(\ell)\). In this example, the fusion below at left should pay all label costs except \(H(\{\ell_{3}\})\), whereas the fusion at right should pay all costs except \(H(\{\ell_{2}\})\).

[Fusions omitted: two example fused labelings; the left one uses no label from \(\{\ell_{3}\}\), the right one uses no label from \(\{\ell_{2}\}\).]

Theorem 3

If energy E has hierarchical label costs (H,π) then E′(g)=E(f(g)) for all \(g : \mathcal{P} \rightarrow\mathcal{I}(j)\).

Proof

The correctness of D′ and V′ is self-evident so we focus on proving that \(H_{P_{I}}'(I)\) correctly encodes H(L) for some subset of labels \(L \in\mathcal{H}\). This reduces to showing that δ L (f(g))=δ I (g P ) where indices I and pixels \(P=P_{I}\) are as defined on lines 4 and 5 of ConstructFusionEnergy. In other words, we must prove that

$$ \exists p \in\mathcal{P}: \hat{f}^{g_p}_p \in L \quad\Longleftrightarrow\quad \exists p \in P : g_p \in I $$
(12)

where we can assume \(\hat{f}^{g_{p}}_{p} \in\mathcal{L}_{g_{p}}\) due to the way h-fusion works.

Because we assume hierarchical label costs, each \(L \in\mathcal{H}\) belongs to one of four cases derived from Definition 4.

1. If \(L \cap\mathcal{L}_{i} = \emptyset\) for all \(i \in \mathcal{I}(j)\), then any \(\hat{f}^{g_{p}}_{p} \notin L\), so the cost H(L) is not applied in subtree j. By definition I=∅, ensuring \(\delta_{L}(f(g))=\delta_{\emptyset}(g_{P})=0\), which is correct.

2. If \(L \subset\mathcal{L}_{i}\) for some \(i \in\mathcal{I}(j)\), then by definition I={i} and \(P = \{ p : \hat{f}^{i}_{p} \in L \}\). Since \(g_{p} \neq i \Rightarrow\hat{f}^{g_{p}}_{p} \notin L\), the labeling f(g) contains a label in L if and only if \(g_{p}=i\) for some p∈P. Therefore \(\delta_{L}(f(g))=\delta_{i}(g_{P})\) holds in this case.

3. If \(\mathcal{L}_{i} \subseteq L \subset\mathcal{L}_{j}\) for some \(i \in\mathcal{I}(j)\), then clearly \(P = \mathcal{P}\). Since \(L \in\mathcal{H}\) we must also have \(L = \bigcup_{i \in I} \mathcal{L}_{i}\), and so f(g) uses a label in L if and only if g uses a label in I. Therefore \(\delta_{L}(f(g))=\delta_{I}(g)\) holds in this case.

4. If \(L \supseteq\mathcal{L}_{j}\) then \(I = \mathcal{I}(j)\) and \(P=\mathcal{P}\), so H(L) can be added to E′ as a constant or simply ignored.  □

Looking at the proof of Theorem 3 we can see that the structure of h-subsets is especially needed for the third case to hold. If we allow a subset \(L \notin\mathcal{H}\), then α-expansion could not be applied to the resulting E′ because the internal binary steps would be non-submodular and potentially NP-hard. The purpose of Example 2 is to demonstrate why H(L)>0 for arbitrary L can be problematic.

Example 2

Suppose E′ must fuse the two child labelings \(\hat{f}^{1}\) and \(\hat{f}^{2}\) shown below.

[Labelings omitted: child labeling \(\hat{f}^{1}\) takes label \(\ell_{1}\) on region A, and \(\hat{f}^{2}\) takes label \(\ell_{4}\) on region B.]

Further suppose that \(H(\{\ell_{1},\ell_{4}\})>0\) in the original energy. This cost should be paid in E′(g) if and only if any \(g_{p}\) variable in region A is assigned child index 1 or any \(g_{p}\) in region B is assigned child index 2. However, this potential cannot be encoded as a label cost of any sort. Furthermore, encoding \(H(\{\ell_{1},\ell_{4}\})\) would result in a non-submodular fusion energy. Let E′(i,j) denote the label cost of assigning index i to pixels A and index j to pixels B, then

$$ \begin{array}{l@{\qquad}l} E'(1,1) = H(\{\ell_1,\ell_4\}) & E'(1,2) = H(\{\ell_1,\ell_4\}) \\[3pt] E'(2,1) = 0 & E'(2,2) = H(\{\ell_1,\ell_4\}). \end{array} $$

That E′(i,j) is not submodular follows from inequality (4) if one assumes |A|=|B|=1, and so α-expansion is not applicable inside h-fusion. In fact, because this example has only two labels, we can conclude that the αβ-swap algorithm (Boykov et al. 2001) is also inapplicable to E′.

Finally, we establish that h-metrics give a precise characterization of smoothness costs V that h-fusion can handle.

Theorem 4

The h-fusion algorithm is applicable to V using tree π if and only if (V,π) forms an h-metric.

Proof

Recall from (6) that α-expansion is applicable if and only if V satisfies, for all \(\alpha,\beta,\gamma\in\mathcal{L}\),

$$ V(\alpha,\alpha) + V (\beta,\gamma) \leq V(\alpha,\gamma) + V(\beta,\alpha). $$
(13)

Actually, if (13) holds for \(\beta,\gamma\in \mathcal{L}\setminus\{\alpha\}\) then it trivially holds for all \(\beta,\gamma\in\mathcal{L}\). This observation matters for h-fusion.

In the h-fusion case, each local fusion metric \(V'_{pq}\) on line 2 of ConstructFusionEnergy must satisfy this constraint and so α-expansion can fuse a collection of labelings \(\{\hat{f}^{i}\}_{i \in\mathcal{I}(j)}\) if and only if, for all \(i \in\mathcal{I}(j), \; i',i'' \in\mathcal{I}(j)\setminus\{i\}\),

$$ V\bigl(\hat{f}^i_p, \hat{f}^{i}_q\bigr) + V\bigl( \hat{f}^{i'}_p,\hat{f}^{i''}_q \bigr) \leq V\bigl(\hat{f}^i_p, \hat{f}^{i''}_q\bigr) + V\bigl( \hat{f}^{i'}_p,\hat{f}^{i}_q \bigr). $$
(14)

Note that \(\hat{f}^{i}_{p}\) and \(\hat{f}^{i}_{q}\) could each be any label in \(\mathcal{L}_{i}\) and are not necessarily identical, unlike the α-expansion case. The constraints on the original metric V for h-fusion will therefore be more restrictive than for α-expansion. Since inequality (14) must hold for all possible labelings \(\hat{f}^{i},\hat{f}^{i'}\), and \(\hat{f} ^{i''}\) then it is equivalent to

$$ V(\alpha_1,\alpha_2) + V(\beta,\gamma) \leq V(\alpha_1,\gamma) + V(\beta,\alpha_2) \quad \text{for all } \alpha_1,\alpha_2 \in \mathcal{L}_i \text{ and } \beta,\gamma \in \mathcal{L}_j \setminus \mathcal{L}_i. $$
(15)

Since j=π(i), inequalities (15) are identical to (9). □

6 Approximation Bounds of h-Fusion

Since α-expansion has an approximation bound and we use it as our main subroutine, it is natural to ask if h-fusion has some bound of its own. If we use a flat tree π and assume E=D+V, then h-fusion reduces to classical α-expansion and we directly inherit the 2c-approximation where

$$ c = \frac{\max_{\alpha,\beta\in\mathcal{L}} V(\alpha,\beta)}{\min_{\gamma\neq\zeta\in\mathcal{L}} V(\gamma,\zeta)}. $$
(16)

Our goal is to derive a generalized bound for h-fusion with arbitrary tree π and arbitrary label costs (i.e. \(\mathcal{H}= 2^{\mathcal{L}}\) in (1)). Like the α-expansion bound in Theorem 2, the quality of our new bound will involve some c and c 2 that depend on the particular energy. As we shall see, these two coefficients can be much smaller for our algorithm. We begin by defining some useful quantities for expressing the h-fusion bound.

Definition 6

Given smoothness costs V and a particular tree π, we define two quantities for each node i:

$$ V^{\max}_i = \max_{\alpha,\beta \in \mathcal{L}_i} V(\alpha,\beta), \qquad V^{\min}_i = \min_{\substack{i' \neq i'' \in \mathcal{I}(i)\\ \alpha \in \mathcal{L}_{i'},\ \beta \in \mathcal{L}_{i''}}} V(\alpha,\beta). $$

In other words, \(V^{\max}_{i}\) is the maximum cost for any pair of labels in the subtree of node i, and \(V^{\min}_{i}\) is the minimum cost for two labels from different subtrees descended from i. For example, in Fig. 3a we have \(V^{\max}_{A} = 2, V^{\min}_{A}=1\) and \(V^{\max}_{B}=4, V^{\min}_{B}=2\). For the root node r, this example has \(V^{\max}_{r}=4\) and \(V^{\min}_{r} = 3\).

Definition 7

Given an h-metric (V,π) where V is also semi-metric, we define the additional quantities

$$ c_j = \frac{V^{\max}_j}{V^{\min}_j} \ \text{ for each } j \in \mathcal{S}\cup\{r\}, \qquad c = \max_{j \in \mathcal{S}\cup\{r\}} c_j. $$

Observation 3

If π defines a flat tree, then c in Definition 7 reduces to quantity (16) from the α-expansion bound.

The ratio c is most important because it bounds the worst-case approximation error. As we will show, when c is large for standard α-expansion, choosing a non-flat tree can result in a much smaller constant for h-fusion and thereby a better bound. The easiest way to understand how V and π affect c is by looking at a few numeric examples. Figure 6 examines specific values of c for various pairs of smoothness cost matrices V and trees π. These examples suggest that for each choice of V there exists an optimal choice of π to give the best approximation bound. Since for every metric V we can use a flat tree, we can always find a tree for which h-fusion's bound is at least as good as α-expansion's.
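Definitions 6 and 7 translate directly into code; a minimal sketch (assuming hypothetical `children` and `subtree_leaves` helpers) that computes c for a given tree:

```python
from itertools import combinations

def bound_coefficient(V, internal_nodes, children, subtree_leaves):
    """Compute c = max_j Vmax_j / Vmin_j over internal nodes j (Definition 7)."""
    c = 1.0
    for j in internal_nodes:
        leaves = subtree_leaves(j)
        vmax = max(V(a, b) for a in leaves for b in leaves)   # Vmax_j
        vmin = min(V(a, b)                                    # Vmin_j
                   for i1, i2 in combinations(children[j], 2)
                   for a in subtree_leaves(i1)
                   for b in subtree_leaves(i2))
        c = max(c, vmax / vmin)
    return c
```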

Fig. 6

The top row shows three example smoothness cost matrices. The first is a standard (flat) Potts potential with penalty \(V_{1}(\ell,\ell')=3\) for any \(\ell \neq \ell'\) and so c=1 for a flat tree \(\pi_{a}\). Metrics \(V_{2}\) and \(V_{3}\) have varying penalties and so a flat tree yields c=3 and c=8 respectively. However, by applying h-fusion on tree structures \(\pi_{b}\) and \(\pi_{c}\) respectively (bottom row), we can achieve better c for these particular metrics. The table at right shows other values of c and demonstrates that the choice of tree is important for achieving a good bound

We now consider label costs in h-fusion and generalize the related coefficient \(c_{2}\). The cardinality of set \(I \subset\mathcal{I}(j)\) on line 4 of ConstructFusionEnergy is an important quantity affecting \(c_{2}\) for h-fusion. In general, the smaller |I| the better the bound. (Note that we use ⊂ to mean ⊊ throughout this paper.)

Definition 8

We define \(c_{2}\) to be the maximum cardinality of any index set I (ConstructFusionEnergy, line 4) that is a strict subset of \(\mathcal{I}(j)\), minus 1. That is, the constant

$$ c_2 = \max_{\substack{L : H(L)>0 \\ I(L) \subset \mathcal{I}(j)}} \bigl( |I(L)| - 1 \bigr) $$

where \(I(L) \subset\mathcal{I}(j)\) for some j and \(\bigcup_{i \in I(L)} \mathcal{L}_{i} = L\).

The fact that I(L) will always be the union of some siblings in the tree follows from the assumption that L is an h-subset.

Observation 4

If π defines a flat tree, then \(c_{2}\) in Definition 8 reduces to the same quantity for α-expansion in Theorem 2.

To see how h-fusion can beat α-expansion at minimizing energies with label costs, consider the following worst-case example for α-expansion.

Example 3

Suppose variables \(\mathcal{P}= \{p_{1},\ldots,p_{n}\}\) and labels \(\mathcal{L}=\{\ell_{1},\ldots,\ell_{n+1}\}\). The data costs \(D_{p}(\cdot)\) are shown in the table below, where a>0, and smoothness costs are zero. We also assume a label cost \(H(\{\ell_{1},\ldots,\ell_{n}\})=a\), as illustrated.

[Table and tree omitted: each \(p_{i}\) has data cost 0 for \(\ell_{i}\), cost a for \(\ell_{n+1}\), and prohibitively high cost for all other labels; the tree groups \(\ell_{1},\ldots,\ell_{n}\) under one pseudo-label, with \(\ell_{n+1}\) as the root's other child.]

The labeling \(\hat{f}=(\ell_{n+1},\ldots,\ell_{n+1})\) is a local minimum for α-expansion because no individual variable wants to pay the (shared) label cost in order to switch its label. However the globally optimal labeling is \(f^{*} =(\ell_{1},\ldots,\ell_{n})\) and, since \(E(\hat{f}) = n E(f^{*})\), the α-expansion solution can be made arbitrarily bad. The h-fusion algorithm will find \(f^{*}\) if we use the tree shown above, at right. Notice that \(c_{2}=0\) for this tree, whereas \(c_{2}=n-1\) for a flat tree (α-expansion).

For an example of non-trivial \(c_{2}\), consider again the six-label tree structure:

[Tree omitted: root with children 1, 2, 3 grouping \(\{\ell_{1},\ell_{2}\}\), \(\{\ell_{3},\ell_{4}\}\), and \(\{\ell_{5},\ell_{6}\}\).]

If the only label cost H(L)>0 is on subset \(L=\{\ell_{5},\ell_{6}\}\) then I(L)={3} and so \(c_{2}=|I(L)|-1=0\). If instead we have a label cost on \(L=\{\ell_{1},\ell_{2},\ell_{5},\ell_{6}\}\) then I(L)={1,3} yielding coefficient \(c_{2}=1\). Again, the bound of h-fusion is stronger for energies where \(c_{2}\) is small.

Using our definitions of c and \(c_{2}\) we state the main theorem of this work: an improvement upon the bound of Delong et al. (2012). For the purposes of the bound we assume \(D_{p}(\cdot) \geq 0\) and that V is semi-metric.

Theorem 5

If \(f^{*}\) is a global minimum of E=D+V+H with h-metric (V,π) and hierarchical label costs (H,π), then the solution \(\hat{f}\) computed by h-fusion is bounded by

$$ E(\hat{f}) \leq(2 c +c_{2})E\bigl( f^{*}\bigr) + \sum_{L \in\mathcal{H}} H(L) $$
(17)

where c and \(c_{2}\) are constants given in Definitions 7 and 8 respectively (possibly much smaller than in Theorem 2).

Proof

See Appendix B. □

In the presence of arbitrary label costs, this is still not a constant-ratio approximation bound, but we can construct a worst-case example to show that our bound is indeed tight (see Delong 2011).

7 Application: Hierarchical Color Segmentation

We use hierarchical color segmentation as a simple, illustrative example because it allows us to visualize the effects of hierarchical smoothness and label costs. Given an image we wish to group pixels with similar color. We treat segmentation as labeling where each label represents a color; the labels essentially re-colorize the image. However, we explicitly divide the possible colors into groups, and seek a pixel labeling that relies only on a few groups of colors. For example, a natural way to group colors is by hue, and the goal is then to re-color the image using as few hues as possible while staying reasonably faithful to the original image.

The tree below illustrates one possible grouping (hierarchy) of color labels. Each leaf corresponds to a specific color label (e.g. dark red, red, bright red, …) while each subtree corresponds to a group of labels (e.g. reds, blues, greens, grays, …)

[Tree omitted: the color-label hierarchy; each leaf is a specific color label (e.g. dark red, red, bright red, …) and each subtree is a group of labels (reds, blues, greens, grays, …).]

In order to limit the number of hues used in re-colorization we introduce group costs in addition to regular label costs. A cost is associated with each group of colors L and is represented by a label subset cost H(L)>0. It is paid whenever any of the colors in the group is used in the labeling. For the smoothness costs V we use hierarchical Potts smoothness terms between and within color groups to encourage smooth re-colorization. If \(I_{p}\) is an image pixel, the data cost \(D_{p}(\ell)\) is proportional to the squared distance between \(I_{p}\) and the color represented by label \(\ell\).
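A minimal sketch of this data term; the proportionality constant `lam` is an assumption, not a value from the experiments.

```python
import numpy as np

def recolor_data_costs(image, label_colors, lam=1.0):
    """image: (H, W, 3) float array; label_colors: (K, 3) array with one RGB
    color per label. Returns (H, W, K) data costs D_p(l) proportional to the
    squared distance between pixel I_p and the color of label l."""
    diff = image[:, :, None, :] - label_colors[None, None, :, :]
    return lam * np.sum(diff ** 2, axis=-1)
```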

We thereby formulate the re-colorization problem as a hierarchical energy. We compare α-expansion and h-fusion when applied to this energy. In all the re-colorization experiments our color hierarchy consists of 121 groups of colors, each containing 20 different shades varying from dark to bright. This results in 2420 labels in total. We then demonstrate qualitative (the resulting re-colorizations) and quantitative (running time and energy value) comparisons. In all experiments we set \(w_{i}=1\) and \(w_{r}=2\). Each invocation of α-expansion performed two cycles only (a cycle expands on each label exactly once). This limitation was applied to all instances of α-expansion within h-fusion as well. (Allowing α-expansion to converge takes much longer but only decreases the energy by <0.01 % for both algorithms.)

Consider the input image shown in Fig. 7 top. It has a smooth gradient of color varying from black to yellow through the shades of green and red. Both α-expansion and h-fusion algorithms result in re-colorizations with 4 groups of colors each, namely shades of green, yellow, orange and red (see bottom left and right). However, it can be seen from the result of α-expansion (bottom right) that the optimization got stuck in a local minimum. By expanding on a wrong group of colors first (wrong hue), α-expansion was unable to match the bright portion of the image well. At this point expanding on any one label of a better matched hue did not justify adding the extra group costs associated with this new hue. In the case of h-fusion the algorithm is able to replace one group of hues with another at once and therefore attain a lower energy (see Fig. 7 bottom-left).

Fig. 7

Synthetic example. Qualitative and quantitative comparison between α-expansion and h-fusion. The blue line corresponds to the energy attained by α-expansion as a function of time. The pink line with diamonds corresponds to the energies of child-labelings optimized sequentially in the first step of h-fusion. Child-labelings are then stitched together in a fusion step. The energy of the final fusion is shown by the red line. The green line represents the energy of h-fusion if all child-labelings are computed in parallel (Color figure online)

The plot in Fig. 7 provides a quantitative comparison between α-expansion and h-fusion in terms of running time and energy values. The blue line corresponds to the energy value attained by α-expansion as a function of time. The h-fusion algorithm begins by optimizing a set of sub-energies corresponding to child-labelings. Each child-labeling is restricted to one sub-tree of labels and essentially re-colors the image with the colors from that group only. (For example one child-labeling re-colors the image with the shades of red, another with the shades of green, …) The sub-energies are independent and can be optimized either sequentially or in parallel. When sub-energies are optimized sequentially we represent each sub-energy with a pink diamond and plot them as a function of cumulative time. After all sub-energies are optimized, the h-fusion algorithm fuses the resulting child-labelings by running α-expansion (starting from the child-labeling with the minimal energy; again we limit h-fusion to two cycles only). The energy of h-fusion is represented by the red line and attains a lower energy than regular α-expansion.

Unlike α-expansion, the running time of h-fusion can be dramatically improved by minimizing sub-energies in parallel. This is illustrated by the green line in the plot of Fig. 7. In our specific application the parallel version of h-fusion is faster by a factor of 10–15 compared to sequential h-fusion. At any time in our experiments, the energy curve of parallel h-fusion is dramatically below that of α-expansion, and terminates 20–30 times faster. In theory this speed-up factor should grow linearly with the number of siblings at each level of the label hierarchy. The speed-up is due to the fact that running one expansion cycle with h-fusion is more efficient than with regular α-expansion. This is because the number of unique possible labels in h-fusion corresponds to the number of groups in the hierarchy (121 in our case) while the number of unique labels for α-expansion corresponds to the number of leaves in the hierarchy (2420 colors in our case).

Figure 9 shows similar results for additional input images shown in Fig. 8. For all the experiments sequential h-fusion (pink-red line) attained lower energy and in shorter time than α-expansion (blue line), with even more significant speedup in the case of parallel optimization of the sub-energies in h-fusion (green line).

Fig. 8

Input images used for the experiments shown in Fig. 9

Fig. 9

Qualitative and quantitative comparison between α-expansion and h-fusion on input images shown in Fig. 8. Again, blue, red-pink and green lines correspond to α-expansion, sequential h-fusion and parallel h-fusion respectively. Note that the re-colorizations obtained with h-fusion are more faithful to the original images than those obtained with α-expansion. For example the Lego piece in the image of a girl is much brighter and, in the parrot image, the parrot’s chest and the tree branch are colored more faithfully (Color figure online)

8 Discussion

The main results of this paper are a characterization of hierarchical costs (h-metrics and h-subsets), the h-fusion algorithm itself, and a significant improvement on the approximation bound of α-expansion. These results are theoretical, but we foresee a number of applications for such energies.

Applications of Hierarchical Costs

We presented hierarchical color segmentation as the simplest possible example that illustrates (a) the nature of energies with hierarchical costs, and (b) the qualitative and quantitative benefits of h-fusion for such energies. However, computer vision is full of problems for which hierarchical costs are natural.

The most obvious is using hierarchical context (e.g. Choi et al. 2010) for image segmentation, where in theory we could group the labels into some appropriate context as depicted below.

[Diagram omitted: object-class labels grouped by context, e.g. {car, road, sky} under an “outdoors” group and {table, chair, wall} under an “indoors” group.]

This is a very rudimentary form of context but can be integrated with segmentation via an energy with hierarchical V and H terms.

In vision it is also common to assign labels that have geometric meaning, such as depths (e.g. Boykov et al. 2001; Ladický et al. 2010b), homographies or motions (e.g. Birchfield and Tomasi 1999; Isack and Boykov 2012). For example, Isack and Boykov (2012) start with a set of observations (points, matches, etc.) and use random sampling to generate hundreds of candidate geometric models, much the way ransac does (Fischler and Bolles 1981). They formulate the model fitting problem as a labeling problem where each label represents a candidate model. They find a labeling that corresponds to a good configuration of models, and do this by minimizing an energy of the form E=D+V+H. However, there are many situations where geometric models fall into a natural hierarchy. Figure 10 is a hypothetical example to illustrate this point. Analogous hierarchical relationships exist between, for example, a fundamental matrix (a rigid motion) and the family of homographies (families of correspondences) compatible with that fundamental matrix (Hartley and Zisserman 2003).

Fig. 10

Depiction of how hierarchical line fitting might work with an energy of the form E=D+H. Each label \(\ell\) corresponds to a possible line (e.g. from random sampling), and each point wants to be labeled by a nearby line. Label cost \(H(\ell)\) discourages line \(\ell\) from being used unless there are enough supporting points—otherwise the points take the outlier label (constant penalty per point). However, if we group lines by orientation, we could add costs H(L) where L is a family of lines, encouraging solutions that use a few families of parallel lines

Furthermore, hierarchical costs can be useful for detecting patterns, for compression, and for learning a database of inter-dependent patches from images (Gorelick et al. 2011). Outside vision, Sefer and Kingsford (2011) showed that the r-hst metrics of Kumar and Koller are effective at identifying protein function; our work could extend their results.

Relation to r-hst Metrics

Recall that, at a high level, the h-fusion process shown in Fig. 5 is the same as that used by Kumar and Koller (2009). Given a metric V, they find the set of r-hst metrics that best approximates V and try to minimize an energy of the form E=D+V using a bottom-up fusion process. The main idea of an r-hst metric is as follows. Assume we are given a tree with distances d(i,j) defined on each edge from child i to parent j. Assume that the distance from j to all its children is uniform, i.e. d(i,j)=d(i′,j) for all \(i,i' \in\mathcal{I}(j)\). Further assume that we know the parent-to-child distance gets cheaper by a factor of r as we descend the tree, i.e. \(\frac{d(i,j)}{d(k,i)} \ge r\) for any child k of i and some constant r>1. The total distance between two leaf nodes α and β is the cumulative sum of edge distances along the path from α to β in the tree. If the ‘costs’ of a pairwise potential V(α,β) correspond to such a distance function for all α,β, then V is said to be an r-hst metric.

Our concept of an h-metric is expressed directly in terms of constraints on V(⋅,⋅), not on edges or distances traversed in the tree. Furthermore, r-hst metrics are a strict subset of h-metrics (see Appendix A).

Generalizing Facility Location

In the optimization and operations research communities, uncapacitated facility location (UFL) is a well-studied problem (e.g. Shmoys et al. 1998). UFL assigns a ‘facility’ to serve each client such that the cost to clients and the cost of opening facilities is jointly minimized. UFL is connected to our energy because if we let \(\mathcal{L}\) denote the facilities and \(\mathcal{P}\) denote the clients then every problem instance can be expressed as minimizing an energy of the form

$$ E(f) = \sum_{p \in\mathcal{P}} D_p(f_p) + \sum_{\ell\in\mathcal{L}} H( \ell)\delta_\ell(f). $$
(18)
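Concretely, a UFL instance maps onto energy (18) as in this minimal sketch; the names are illustrative.

```python
def ufl_energy(f, client_cost, open_cost):
    """Energy (18): f[p] is the facility serving client p, client_cost[p][l]
    is D_p(l), and open_cost[l] is H(l), paid once per facility used."""
    service = sum(client_cost[p][f[p]] for p in range(len(f)))
    opening = sum(open_cost[l] for l in set(f))
    return service + opening
```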

In vision, the UFL objective has recently been applied to motion segmentation by Li (2007) and by Lazic et al. (2009), but goes all the way back to Torr and Murray (1994).

There exist variants of UFL that allow for a hierarchy of facilities, e.g. Svitkina and Tardos (2006) and Sahin and Süral (2007). This generalization allows for more realistic modeling of complex interdependencies between facilities themselves. Some of these works derive constant-factor approximation bounds for hierarchical facility location, e.g. Kantor and Peleg (2009), but all such works assume metric client costs where the costs D p (⋅) are computed as distances from a particular center. Without this assumption, Feige’s hardness result still holds. Strategies for optimizing hierarchical UFL include linear programming relaxation, primal-dual algorithms and, very recently, message passing algorithms (Givoni et al. 2011).

We can encode a kind of hierarchical facility cost with our framework as follows. Suppose facilities \(\ell_{1}\) and \(\ell_{2}\) require the services of facility \(\ell_{3}\), which costs 50 to open. A label cost \(H(\{\ell_{1},\ell_{2},\ell_{3}\}):=50\) correctly accounts for the shared dependency of \(\ell_{1}\) and \(\ell_{2}\) on \(\ell_{3}\). If we furthermore have a facility \(\ell_{4}\) that depends on both \(\ell_{3}\) and some facility \(\ell_{5}\) (cost 80), then our label costs should instead be \(H(\{\ell_{1},\ell_{2},\ell_{3},\ell_{4}\}):=50\) and \(H(\{\ell_{4},\ell_{5}\}):=80\).

Furthermore, our h-fusion algorithm can handle smoothness costs V, which to the best of our knowledge are novel for UFL. In the UFL setting, V(f p ,f q ) can encode an explicit preference that clients p and q be serviced by the same facility. When clients are social, there are many scenarios where such a preference makes sense. When client costs D are metric (e.g. Euclidean distance) then this preference is implicitly encoded in D. However, when the client costs are not metric, such as clients connected by an irregular network despite being physically close, then our smoothness costs V may be useful for modeling such problems.

Improving the Bound

Recall from Observation 1 that for the Set-Cover problem the best we can hope for is a \(\ln|\mathcal{P}|\)-approximation. Yet one can formulate Set-Cover using an energy of the form (18), so minimizing energy E=D+V+H is at least as hard. However, Hochbaum (1982) gave a simple greedy algorithm for Set-Cover and proved that it yields precisely a \(\ln|\mathcal{P}|\)-approximation, the best possible according to Feige (1998). If label costs are arbitrary in (18), then α-expansion's bound is also arbitrarily bad. So, there is a huge gap between what α-expansion can achieve on (18) versus what Hochbaum's greedy algorithm can guarantee. For energies of the form E=D+H, it may be possible to extend Hochbaum's algorithm and use it as a subroutine within h-fusion (rather than using α-expansion). One may ask if h-fusion could inherit a better approximation bound in that case. We also do not know if her approach can be applied in the presence of smoothness costs V.

Relation to Genetic Algorithms

Within our framework, the inner α-expansion subroutine is performing a sequence of fusion moves as proposed by Lempitsky et al. (2010). We point out that a binary fusion move is essentially an optimized crossover operation, already used to some success in genetic algorithms (Aggarwal et al. 1997; Meyers and Orlin 2007). A standard concern for genetic algorithms is how to maintain population diversity so that, when two chromosomes (labelings) are crossed, there is a chance that the descendant will be better. Our h-fusion process forces a kind of population diversity based on a tree: the labelings in our multi-label fusion each contain labels from different subtrees. It is interesting that this structured diversity gives a provably better approximation bound in our case.