1 Introduction

In this work, we investigate combinations of watershed hierarchies through their saliency maps. Other approaches can be found in the literature, such as [6, 14].

Figure 1 provides an example of two different watershed hierarchies, an area-based one and a dynamics-based one, and of their combination; each watershed hierarchy is built from successive filterings of an initial watershed segmentation. Hierarchies are represented by their saliency maps [1, 3, 4, 7, 11], i.e., edge-weighted graphs that encode the hierarchical contours. We observe that the sky is oversegmented at high levels in the first hierarchy and that the beach ground is oversegmented in the second hierarchy, but that these defects are balanced in their combination. This observation is general: no single hierarchy is optimal for a whole image. We therefore expect combinations of hierarchies to perform better, as illustrated for instance in [4], where the authors have shown that a simple combination of area- and dynamics-based watershed hierarchies produces better visual results than the individual hierarchies.

Fig. 1. First row, from left to right: original image, saliency map and the \(50^{th}\) highest level set of the area-based hierarchy. Second row, from left to right: saliency maps and the \(50^{th}\) highest level sets of the dynamics-based hierarchy and of the combination of the area- and dynamics-based hierarchies.

The main contributions of this paper are:

  • Investigation of combinations of hierarchies through their saliency maps using supremum, infimum, linear combination and concatenation functions; and

  • Evaluation of supervised and unsupervised combinations of watershed hierarchies.

The plan of the paper is the following. We first review basic notions on hierarchies and saliency maps and present several ways for combining hierarchies (Sect. 2). Then we describe our assessment methodology in Sect. 3. The evaluation results and improvement of combinations compared to the individual watershed hierarchies are described in Sect. 4.

2 Combination of Hierarchies

In this section, we give the notations needed to define combinations of hierarchies through saliency maps and the types of combinations investigated here.

2.1 Hierarchy of Quasi Flat Zones and Saliency Map

This section presents the formal definitions of graphs, hierarchy of partitions, quasi-flat zones hierarchy and saliency map.

A graph is a pair \(G=(V, E)\), where V is a finite set and E is a set of unordered pairs of distinct elements of V, i.e., \(E \subseteq \{\{x,y\} \subseteq V, x \ne y\}\). Each element of V is called a vertex or a point (of G), and each element of E is called an edge (of G).

Let \(G= (V,E)\) be a graph and let X be a subset of V. A sequence \(\pi =\langle x_0, \dots , x_n\rangle \) of elements of X is a path (in X) from \(x_0\) to \(x_n\) if, for any i in \(\{1, \ldots , n\}\), \(\{x_{i-1},x_i\}\) is an edge of G. The subset X of V is said to be connected if, for any x and y in X, there exists a path in X from x to y. A subset X of V is a connected component of G if X is connected and if, for any connected subset Y of V such that \(X \subseteq Y\), we have \(X = Y\). The set of connected components of a graph G is denoted by \(\mathbf {C}(G)\).

Let V be a set. A partition of V is a set \(\mathbf {P}\) of non empty disjoint subsets of V whose union is V. If \(\mathbf {P}\) is a partition of V, any element of \(\mathbf {P}\) is called a region of \(\mathbf {P}\). Given a graph \(G=(V, E)\), the set \(\mathbf {C}(G)\) of all connected components of G is a partition of V.
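As an illustration of the last statement, the following minimal Python sketch computes \(\mathbf {C}(G)\) with a union-find structure and returns it as a partition of V. The function name `connected_components` and the representation of edges as tuples (instead of unordered pairs) are assumptions made for this example only.

```python
def connected_components(vertices, edges):
    """Return the partition C(G) of `vertices` as a set of frozensets."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the representative of v, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the two components joined by each edge.
    for x, y in edges:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y

    # Group the vertices by representative: each group is a region.
    regions = {}
    for v in vertices:
        regions.setdefault(find(v), set()).add(v)
    return {frozenset(region) for region in regions.values()}


V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (3, 4)]
P = connected_components(V, E)
```

On this toy graph, `P` contains the two regions {0, 1, 2} and {3, 4}, which are nonempty, disjoint, and cover V.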

Let V be a set and let \(\mathbf {P}_1\) and \(\mathbf {P}_2\) be two partitions of V. We say that \(\mathbf {P}_1\) is a refinement of \(\mathbf {P}_2\) if every element of \(\mathbf {P}_1\) is included in an element of \(\mathbf {P}_2\). A hierarchy (of partitions) is a sequence \({\mathcal {H}} =(\mathbf {P}_0, \dots , \mathbf {P}_n)\) of partitions of V such that \(\mathbf {P}_{i-1}\) is a refinement of \(\mathbf {P}_{i}\), for any i in \(\{1, \dots , n\}\) and such that \(\mathbf {P}_n = \{V\}\). A partition of a hierarchy \({\mathcal {H}}\) is called a level set of the hierarchy.

Let G be a graph. If w is a map from the edge set of G to the set \({\mathbb {R}}^+\) of positive real numbers, then the pair \((G, w)\) is called an (edge-)weighted graph. If \((G, w)\) is an edge-weighted graph, for any edge u of G, the value w(u) is called the weight of u (for w).

Important Notation. In the sequel of this article, we consider a weighted graph \((G, w)\). We assume that the vertex set of G is connected. We also denote by \({\mathbb {W}}\) the range of w, i.e., the set \(\{w(u) \; | \;u \in E\}\) and by \({\mathbb {W}}^\bullet \) the set \({\mathbb {W}} \cup \{k+1\}\), where k is the greatest value of \({\mathbb {W}}\).

Let \(\lambda \) be any element in \({\mathbb {R}}\). We denote by \(G_\lambda \) the graph \((V,E_\lambda )\) such that \(E_{\lambda }=\{e \in E\ | \ w(e)<\lambda \}\). The set \(\mathbf {C}(G_\lambda )\) of all connected components of \(G_\lambda \) is called the \(\lambda \)-level partition of G. The sequence

$$\begin{aligned} \mathcal {QFZ}(w) = (\mathbf {C}(G_\lambda ) \; | \;\lambda \in {\mathbb {W}}^\bullet ) \end{aligned}$$
(1)

is a hierarchy called the Quasi-Flat Zones hierarchy of w.
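The construction of Eq. (1) can be sketched directly from the definitions: for each \(\lambda \) in \({\mathbb {W}}^\bullet \), keep the edges of weight strictly lower than \(\lambda \) and take the connected components. The code below is a naive illustration, not an efficient algorithm; the names `quasi_flat_zones` and `connected_components`, and the dictionary mapping edge tuples to weights, are assumptions of this example.

```python
def connected_components(vertices, edges):
    """Partition of `vertices` into the connected components of (vertices, edges)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y in edges:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y

    regions = {}
    for v in vertices:
        regions.setdefault(find(v), set()).add(v)
    return {frozenset(region) for region in regions.values()}


def quasi_flat_zones(vertices, weights):
    """Return the list of (lambda, C(G_lambda)) for lambda in increasing order.

    `weights` maps each edge (x, y) to w(e); lambda ranges over the
    values of w plus k + 1, where k is the greatest weight.
    """
    k = max(weights.values())
    thresholds = sorted(set(weights.values()) | {k + 1})
    hierarchy = []
    for lam in thresholds:
        # E_lambda keeps the edges whose weight is strictly lower than lambda.
        kept = [e for e, value in weights.items() if value < lam]
        hierarchy.append((lam, connected_components(vertices, kept)))
    return hierarchy


V = [0, 1, 2, 3]
w = {(0, 1): 1, (1, 2): 3, (2, 3): 2}
H = quasi_flat_zones(V, w)
```

On this path graph, the four level partitions go from the singletons (\(\lambda = 1\)) to \(\{V\}\) (\(\lambda = 4\)), each one being a refinement of the next.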

The saliency map of a hierarchy \({\mathcal {H}} = (\mathbf {P}_0, \ldots , \mathbf {P}_n)\) is a map from E to \(\{0,\dots , n\}\), denoted by \(\varPhi ({\mathcal {H}})\), such that, for any edge \(e=\{x, y\}\) in E, we have \(\varPhi ({\mathcal {H}})(e) \) is the greatest value i in \(\{0, \dots , n\}\) such that x and y do not belong to the same region of \(\mathbf {P}_i\).
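To make this definition concrete, the following sketch computes \(\varPhi ({\mathcal {H}})\) for a hierarchy given as a Python list of partitions. It assumes that \(\mathbf {P}_0\) separates the endpoints of every edge (for instance, \(\mathbf {P}_0\) is the partition into singletons), since otherwise no valid index i exists; the function name `saliency_map` and the data layout are illustrative choices.

```python
def saliency_map(hierarchy, edges):
    """Map each edge {x, y} to the greatest i such that x and y lie in
    different regions of P_i, for hierarchy = [P_0, ..., P_n]."""
    # For every level, label each vertex with the index of its region.
    labels = []
    for partition in hierarchy:
        label = {}
        for index, region in enumerate(partition):
            for v in region:
                label[v] = index
        labels.append(label)

    phi = {}
    for x, y in edges:
        separated = [i for i, label in enumerate(labels) if label[x] != label[y]]
        if not separated:
            # Assumption of this sketch: P_0 must cut every edge.
            raise ValueError(f"P_0 does not separate the endpoints of edge {(x, y)}")
        phi[(x, y)] = max(separated)
    return phi


E = [(0, 1), (1, 2), (2, 3)]
hierarchy = [
    [{0}, {1}, {2}, {3}],   # P_0
    [{0, 1}, {2}, {3}],     # P_1
    [{0, 1}, {2, 3}],       # P_2
    [{0, 1, 2, 3}],         # P_3 = {V}
]
phi = saliency_map(hierarchy, E)
```

Here the edge {1, 2} receives the value 2 because its endpoints are merged only in \(\mathbf {P}_3\), whereas {0, 1} receives 0 because its endpoints are already merged in \(\mathbf {P}_1\).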

In [4], the authors provide a bijection between saliency maps and hierarchies based on quasi-flat zones hierarchies. Hence, a hierarchy is equivalently represented by its saliency map, a property that is particularly useful in the remaining part of this article.

Note also that, for visualization purposes, when the graph G is associated to a digital image [3, 4, 7, 11], saliency maps can be visualized with images, called ultrametric contour maps in [1], in which the contours brightness is proportional to their saliency values.

2.2 Generic Scheme of Combination of Hierarchies

Combining partitions and, a fortiori, hierarchies is not straightforward. This problem has been tackled in [2, 4, 7] thanks to the use of saliency maps and we follow the same approach as used in those papers. More precisely, in order to combine two hierarchies \({\mathcal {H}}_1\) and \({\mathcal {H}}_2\), built from the same fine partition, we proceed in three steps [2, 4, 7]: first the saliency maps of \({\mathcal {H}}_1\) and \({\mathcal {H}}_2\) are considered, then the two saliency maps are combined to obtain new weights on the edges of G, and, finally, the combination of hierarchies is the quasi-flat zones hierarchy of the new weight function.

Let \({\mathcal {F}}\) be the set of all maps from E into \({\mathbb {R}}^+\) and let n be any positive integer. Any map c from \({\mathcal {F}}^n\) into \({\mathcal {F}}\) is called a combining n-weight function.

Given a sequence of hierarchies \(({\mathcal {H}}_1, \dots , {\mathcal {H}}_n)\) and a combining n-weight function c, the combination of \(({\mathcal {H}}_1, \dots , {\mathcal {H}}_n)\) by c is the hierarchy \({\mathcal {H}}_c({\mathcal {H}}_1, \dots , {\mathcal {H}}_n)\) defined by:

$$\begin{aligned} {\mathcal {H}}_{c} ({\mathcal {H}}_1, \dots , {\mathcal {H}}_n) = \mathcal {QFZ}(c(\varPhi ({\mathcal {H}}_1), \dots , \varPhi ({\mathcal {H}}_n))). \end{aligned}$$
(2)

2.3 Combining n-weight Functions

We consider three classical functions in the instantiation of the combining n-weight function (supremum, infimum and linear combination) and we propose a new type of combination called concatenation of hierarchies.

The supremum, infimum and linear combination functions are respectively denoted by \(\curlyvee \), \(\curlywedge \) and \(\boxplus _{\varTheta }\). Given a sequence \((w_1, \dots , w_n)\) of n saliency maps and a sequence \(\varTheta =(\alpha _1, \dots , \alpha _{n-1})\) of \(n-1\) values in \({\mathbb {R}}\) such that \((\alpha _1+\dots + \alpha _{n-1}) \le 1\) and \(\alpha _i \ge 0\) for \(i \in \{1, \dots , n-1\}\), the linear combination of \((w_1, \dots , w_n)\) parametrized by \(\varTheta \) is the sum \(\alpha _1 w_1 + \dots + \alpha _{n-1} w_{n-1} + (1-\alpha _1 - \dots - \alpha _{n-1})w_n\). We denote by A the case where the linear combination is equal to the average. One example of combination of hierarchies by infimum is shown in Fig. 3.
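The three classical functions act edge by edge on the saliency maps. A minimal sketch is given below, assuming that each saliency map is stored as a dictionary from edges to values and that all maps share the same edge set; the function names are chosen for this example.

```python
def supremum(*maps):
    """Edge-wise maximum of the saliency maps."""
    return {e: max(w[e] for w in maps) for e in maps[0]}


def infimum(*maps):
    """Edge-wise minimum of the saliency maps."""
    return {e: min(w[e] for w in maps) for e in maps[0]}


def linear_combination(maps, alphas):
    """Linear combination of n maps with n - 1 parameters.

    The last map receives the remaining weight 1 - sum(alphas).
    """
    if len(alphas) != len(maps) - 1:
        raise ValueError("expected n - 1 parameters for n maps")
    if any(a < 0 for a in alphas) or sum(alphas) > 1:
        raise ValueError("parameters must be non-negative and sum to at most 1")
    coefficients = list(alphas) + [1 - sum(alphas)]
    return {e: sum(c * w[e] for c, w in zip(coefficients, maps)) for e in maps[0]}


def average(*maps):
    """Case A: linear combination with equal coefficients."""
    n = len(maps)
    return linear_combination(maps, [1 / n] * (n - 1))


w1 = {(0, 1): 0, (1, 2): 2, (2, 3): 1}
w2 = {(0, 1): 2, (1, 2): 1, (2, 3): 0}
sup = supremum(w1, w2)
inf = infimum(w1, w2)
avg = average(w1, w2)
lin = linear_combination([w1, w2], [0.25])
```

In each case, the resulting edge weights are then passed to the quasi-flat zones construction of Eq. (2) to obtain the combined hierarchy.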

The concatenation of hierarchies is based on merging different level sets of each hierarchy. This type of combination can be useful, for example, when one hierarchy \({\mathcal {H}}_1\) succeeds at describing the small details of an image at lower level sets, but fails at filtering the small regions to capture the main large objects at higher level sets. Therefore, it can be interesting to concatenate \({\mathcal {H}}_1\) with another hierarchy \({\mathcal {H}}_2\) whose high level sets describe well the important regions in the image. This general idea is represented in Fig. 2.

Fig. 2. Concatenation of low levels of \({\mathcal {H}}_1\) with high levels of \({\mathcal {H}}_2\).

Given two weight maps \(w_1\) and \(w_2\) and a threshold value \(\lambda \), the concatenation of \(w_1\) and \(w_2\) consists in: (i) setting to zero all weights of \(w_2\) lower than \(\lambda \); (ii) setting to \(\lambda \) all weights of \(w_1\) greater than \(\lambda \); and (iii) computing the supremum of the two maps obtained at steps (i) and (ii). More generally, given a sequence \((w_1, \dots , w_n)\) of n weight maps and a series \((\lambda _{1}, \dots , \lambda _{n-1})\) of \(n-1\) threshold values in \({\mathbb {R}}\) such that \(\lambda _{1}< \lambda _{2}< \dots <\lambda _{n-1}\), we define the concatenation of \((w_1, \dots , w_n)\) parametrized by \((\lambda _{1}, \dots , \lambda _{n-1})\), thanks to the combining n-weight function \(\uplus _{\varTheta }\), by:

$$\begin{aligned} \forall e \in E, \uplus _{\varTheta }(w_1, \dots , w_n)(e) = \max \{T(w_1(e), 0, \lambda _{1}), \dots , T(w_n(e), \lambda _{n-1}, \infty )\} \end{aligned}$$
(3)

where, given a, b and \(c \in {\mathbb {R}}\), we have \(T(a, b, c)\) equals 0 if a is lower than b and equals \(\min (a, c)\) otherwise. Consequently, given a sequence of hierarchies \(({\mathcal {H}}_1, \dots , {\mathcal {H}}_n)\) and threshold values \(\varTheta = (\lambda _{1}, \dots , \lambda _{n-1})\), the concatenation of \(({\mathcal {H}}_1, \dots , {\mathcal {H}}_n)\) with parameter \(\varTheta \) is \({\mathcal {H}}_{\uplus _{\varTheta }} ({\mathcal {H}}_1, \dots , {\mathcal {H}}_n)\). One example of concatenation of two hierarchies is shown in Fig. 3.
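Equation (3) can be sketched as follows. The sketch assumes that the i-th map is clipped between \(\lambda _{i-1}\) and \(\lambda _{i}\), with 0 below the first map and \(\infty \) above the last one; this is the reading suggested by the first and last terms of Eq. (3), the intermediate terms being elided there. The function names and the dictionary representation of weight maps are illustrative.

```python
import math


def T(a, b, c):
    """Return 0 if a is lower than b, and min(a, c) otherwise."""
    return 0 if a < b else min(a, c)


def concatenation(maps, thresholds):
    """Concatenate n weight maps using n - 1 strictly increasing thresholds."""
    if len(thresholds) != len(maps) - 1:
        raise ValueError("expected n - 1 thresholds for n maps")
    if any(s >= t for s, t in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Assumed reading of Eq. (3): map i is restricted to [lower[i], upper[i]].
    lower = [0] + list(thresholds)
    upper = list(thresholds) + [math.inf]
    return {
        e: max(T(w[e], lo, hi) for w, lo, hi in zip(maps, lower, upper))
        for e in maps[0]
    }


w1 = {(0, 1): 1, (1, 2): 4, (2, 3): 2}
w2 = {(0, 1): 3, (1, 2): 1, (2, 3): 5}
combined = concatenation([w1, w2], [2])
```

With the threshold \(\lambda = 2\), the weight 4 of \(w_1\) is lowered to 2 and the weight 1 of \(w_2\) is set to zero, so that the edge {1, 2} receives the value 2; the other two edges keep the values 3 and 5 coming from \(w_2\).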

Fig. 3. Illustration of combination by infimum and concatenation of a pair of hierarchies. First row from left to right: \({\mathcal {H}}_1\), \(\varPhi ({\mathcal {H}}_1)\), \({\mathcal {H}}_{2}\) and \(\varPhi ({\mathcal {H}}_2)\). Second row from left to right: \(\curlywedge (\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\), \(\mathcal {QFZ}(\curlywedge (\varPhi ({\mathcal {H}}_1),\varPhi ({\mathcal {H}}_2)))\), \(\uplus _{(2)}(\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\) and \(\mathcal {QFZ}(\uplus _{(2)}(\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2)))\).

3 Assessment Methodology and Set-Up of Experiments

In this section, we present the assessment methodology used to evaluate hierarchies of segmentations.

3.1 Assessment Methodology

In order to assess the performance of the different combinations, we use the supervised assessment strategy developed in [12]. This framework evaluates the possibility of extracting a good segmentation from a hierarchy with respect to a given ground-truth, the quality of the extracted segmentation being measured using the Bidirectional Consistency Error (BCE) [8]. In order to account for the hierarchical aspect of the representations, the score of a segmentation is measured against its level of fragmentation, i.e., the ratio between the number of regions in the proposed segmentation and the number of regions in the ground-truth segmentation.

Two ways of extracting segmentations from a hierarchy are considered. (1) We compute the cut that maximizes the BCE score for each fragmentation level, leading to the Fragmentation-Optimal Cut score curve (FOC). (2) We compute the BCE score of each level set of the hierarchy, leading to the Fragmentation-Horizontal Cut score curve (FHC). A large difference between the FOC and FHC curves, called here fragmentation curves, suggests that the optimization algorithm has selected regions from various levels of the hierarchy to find the optimal cut: the regions of the ground-truth segmentations are thus spread at different levels in the hierarchy. The normalized area under those curves, denoted respectively by AUC-FOC and AUC-FHC, provides an overall performance summary over a large range of fragmentation levels. Since the importance of having high AUC-FOC and AUC-FHC scores varies according to the application, we consider the average of both scores to quantitatively compare hierarchies. The average of AUC-FOC and AUC-FHC will be denoted here by AUC-FOHC.
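The summary score can be illustrated with a short sketch. It assumes that both curves are sampled at the same fragmentation levels, that the area is computed with the trapezoidal rule, and that it is normalized by the width of the fragmentation range; the exact normalization used in [12] is not restated here, so these choices, the function names and the sample values are illustrative only.

```python
def normalized_auc(fragmentation, scores):
    """Trapezoidal area under the curve, divided by the fragmentation range."""
    points = sorted(zip(fragmentation, scores))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area / (points[-1][0] - points[0][0])


def auc_fohc(fragmentation, foc_scores, fhc_scores):
    """Average of the normalized areas under the FOC and FHC curves."""
    auc_foc = normalized_auc(fragmentation, foc_scores)
    auc_fhc = normalized_auc(fragmentation, fhc_scores)
    return (auc_foc + auc_fhc) / 2


# Toy values, not taken from the experiments.
levels = [0.0, 0.5, 1.0, 2.0]
foc = [0.0, 0.5, 1.0, 1.0]
fhc = [0.0, 0.25, 0.5, 1.0]
score = auc_fohc(levels, foc, fhc)
```

On these toy curves, the normalized areas are 0.75 for FOC and 0.5 for FHC, so that the averaged score is 0.625.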

3.2 Set-Up of Experiments

We describe here the set-up of evaluation of combinations of hierarchies.

In this work, hierarchical watersheds are considered. More precisely, the successive level sets of the considered hierarchies correspond to watershed segmentations of filtered versions of the weight map w, the higher level sets of the hierarchies being associated to the higher level of filterings [3]. The successive filtering levels are given by ranking the minima according to extinction values associated with regional attributes: area [10], dynamics [9], volume [10], topological height [15], number of minima, number of descendants and diagonal of bounding box [15]. To shorten the notations, we denote those attributes by Area, Dyn, Vol, Height, Min, Desc and DBB.

The evaluations were performed on the 200 test images of the Berkeley Segmentation Dataset and Benchmark 500 (BSDS500) [1].

The hierarchies of segmentation are computed from the image gradients obtained from the Structured Edge (SE) detector [5], which achieved a high contour detection rate on BSDS500.

4 Assessment of Combinations of Hierarchies

In this section, we present the experiments with combinations of watershed hierarchies. We then compare the combinations of hierarchies with the individual hierarchies and with two other techniques [1, 14].

4.1 Baseline

Our baseline is the AUC-FOHC scores of individual watershed hierarchies presented in Table 1. The scores were computed over the test set of BSDS500.

Table 1. AUC-FOC, AUC-FHC and AUC-FOHC scores of individual hierarchies computed over the test set of BSDS500.

4.2 Evaluation of Parameter-Free Combinations

The evaluation of parameter-free combinations consisted in computing the AUC-FOHC scores of all combinations of pairs of watershed hierarchies over the test set of BSDS500 using \(\curlyvee \), \(\curlywedge \) and A functions. For each pair of watershed hierarchies, \({\mathcal {H}}_1\) and \({\mathcal {H}}_2\), we applied the following combining n-weight functions to their saliency maps: \(\curlyvee (\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\), \(\curlywedge (\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\) and \(A(\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\).

The highest scores achieved by parameter-free combinations are shown in Table 2. We can observe that, for most pairs of watershed hierarchies, the combination with A presents the highest score. In addition, the highest scores overall are obtained with combinations using A.

Table 2. Combining n-weight functions and highest AUC-FOHC scores obtained from \(c(\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\). For each pair of hierarchies, we have the global combination function which provided the highest AUC-FOHC score and the score obtained from this combination.

4.3 Evaluation of Unsupervised Concatenation of Hierarchies

We present here the evaluation of concatenation of pairs of watershed hierarchies.

To determine the parameter that should be used in the concatenation of each pair of watershed hierarchies, we analyze their fragmentation curves. For each pair of watershed hierarchies, we check which one presents the highest AUC-FOC and AUC-FHC scores for low and high fragmented segmentations. If one of the hierarchies presents the highest scores for both low and high fragmented segmentations, we do not expect to obtain better results from their concatenation. For example, in the curves of Fig. 4, we see that the area-based watershed hierarchy outperforms the dynamics-based one only for fragmentation levels larger than 0 and smaller than approximately 0.65. Therefore, we conclude that the high level sets of the area-based hierarchy, which are less fragmented, describe an image better than those of the dynamics-based hierarchy, and that the opposite is true for lower level sets. Hence, the parameters are tuned to concatenate the high levels of the area-based hierarchy to the low levels of the dynamics-based hierarchy.

In general, the gap between the FOC and FHC curves is smaller for concatenations than for their individual counterparts, as can be seen in Fig. 4. This means that the segmentations extracted from the level sets of concatenations are closer to the optimal cuts for each fragmentation level. Also, half of the concatenations tested here presented higher AUC-FOHC scores than the individual watershed hierarchies, as shown in Table 3.

Fig. 4. Fragmentation curves of non-horizontal and horizontal cuts of the concatenation of area and dynamics based watershed hierarchies. The \(10^{th}\) highest levels of area were concatenated to the lower level sets of the dynamics based hierarchy.

Table 3. AUC-FOC, AUC-FHC and AUC-FOHC scores of \(\uplus _{\varTheta }(\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\), where different values of \(\varTheta \) were used for each concatenation. The AUC-FOHC scores in bold are the ones which are higher than the AUC-FOHC scores of individual \({\mathcal {H}}_1\) and \({\mathcal {H}}_2\) hierarchies.

4.4 Evaluation of Supervised Linear Combinations

We present here the evaluation of linear combinations of pairs of watershed hierarchies using learned parameters.

For each pair of watershed hierarchies, we determined the linear combination parameter \(\alpha \) that optimizes the AUC-FOHC score on the 300 images of the training set of BSDS500.

From the highest scores reached for each combination, we can see that not all combinations produce relevant results, mainly the ones which include neither dynamics nor topological height based watershed hierarchies. This is shown in Table 4, which contains the best-fitting parameter for each linear combination and the corresponding AUC-FOHC score.

Table 4. Parameters \(\alpha \) and AUC-FOHC scores of each linear combination \(\boxplus _{(\alpha )}(\varPhi ({\mathcal {H}}_1), \varPhi ({\mathcal {H}}_2))\). The AUC-FOHC scores in bold are the highest scores achieved with linear combination of hierarchies.

One example comparing segmentations extracted from the individual hierarchies based on number of descendants and topological height and their linear combination using the learned parameters is shown in Fig. 5. The linear combination computed for this single image presents a higher AUC-FHC score than the individual hierarchies (0.604 versus 0.465 and 0.551) and a slightly higher AUC-FOC score (0.802 versus 0.801 and 0.708). Based on the AUC-FHC score, we expect this combination to have better horizontal cuts than the individual hierarchies. We can see that the segmentation extracted from the combination better separates the main regions in this image: sky, mountains and the two sea regions.

Fig. 5. From left to right: original image, saliency map and the \(5^{th}\) highest level set of three hierarchies: number of descendants and topological height based hierarchies, and their linear combination using learned parameters.

4.5 Comparison with Other Techniques

In order to have a more complete evaluation, we have performed a comparison with a state-of-the-art method [14], called Multiscale Combinatorial Grouping (MCG), and a well-known technique [1], named Ultrametric Contour Map (UCM), including different assessment measures: Precision-Recall (PR) for boundaries [1] and the Marked Segmentation [13] (Fig. 6).

The PR for boundaries score is also assessed on BSDS500 and it evaluates the matching between the boundaries of a given segmentation and the ground-truth segmentation. The PR curves are built from the F-measure scores of the level sets of the hierarchies and are summed up in two measures: Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS).

The Marked Segmentation aims to measure the difficulty of extracting a good segmentation in a hierarchy given sets of background and foreground markers. The markers are generated through erosion, skeletonization and frame of the ground truth, and the score of each segmentation is given by the F-Measure computed over the Weizmann and Grabcut datasets. Each pair of background-foreground markers is denoted by the methods used to compute them. Figure 6 shows the Marked Segmentation results for three pairs of background and foreground markers: Er-Er, Fr-Sk and Sk-Sk, in which Er, Fr and Sk stand for Erosion, Frame and Skeleton, respectively. The box plots show the quartile distribution of scores on both datasets. The median score of those three pairs of markers is denoted by ODM. Therefore, the best hierarchies in terms of Marked Segmentation correspond to the ones with the highest ODM scores and the most compressed box plots.

MCG explores hierarchies of images at different resolutions based on combined local contours cues. Single segmentations at different resolutions are aligned and combined into a single saliency map. The UCM described in [1] is obtained from the watershed transform of the output of a high quality contour detector.

Our best combination does not achieve the PR and fragmentation scores presented by MCG and UCM, but it outperforms UCM in terms of Fragmentation Curves for non-horizontal cuts and presents competitive Marked Segmentation results compared to MCG and UCM.

Fig. 6. Comparison of our linear combination of area and topological height with MCG and UCM: Precision-recall (PR) for boundaries, Fragmentation curves of non-horizontal cuts (plain curve) and horizontal cuts (dashed curve), and Marked Segmentation of three pairs of markers.

5 Discussion and Conclusion

This paper shows the potential of combination of hierarchies through the evaluation of supervised and unsupervised combinations of watershed hierarchies. We evaluated combinations of pairs of watershed hierarchies using classical functions (infimum, supremum and linear combination) and a newly proposed method called concatenation of hierarchies.

All combinations described here proved useful for at least one pair of watershed hierarchies. However, not all pairs of hierarchies presented significant results when compared to the individual hierarchies. For example, the parameter-free combination of area and volume based hierarchies with the highest score did not present a score superior to the individual volume based hierarchy. Also, the best-fitting parameter \(\alpha \) for the linear combination of area and volume based hierarchies is equal to zero. This means that none of the combinations of area and volume tested here was superior to both individual hierarchies.

We observed that the combinations with the highest scores contained either dynamics or topological height based hierarchies, but not both. Both dynamics and topological height are related to depths. Therefore, it is not surprising that the combination of hierarchies based on those attributes does not bring new interesting results. This is also valid for other similar attributes, such as area and diagonal of bounding box.

Among all linear combinations with learned parameters, the ones with best performance presented scores close to the combination by average, with the learned parameter \(\alpha \) ranging between 0.39 and 0.51. So, the combination of hierarchies by average seems to be a valuable and simple choice.

The framework presented in this article can be used to combine other types of hierarchies, but we have not investigated whether the results could be interesting. The main point is that the contours of watershed hierarchies overlap and this ensures that none of their combinations will present duplicated contours for the same boundary of the input image, which could happen using other hierarchies.

Since we have explored only a few types of combinations, there is still room to find new functions able to produce relevant results. The evaluation of combinations performed here also invites us to go a step further in this topic, for example, by learning how to choose the optimal combination parameters for each image.