
1 Introduction

The task of assigning a distinct label to object and background voxels in 3D medical images, named segmentation, is usually challenging due to poorly defined object boundaries, non-standard intensity distributions, field inhomogeneity, noise, partial volume effects, and the interplay among these factors [19]. As a consequence, repairs of automatic segmentations from third-party software (e.g., FreeSurfer [4], SPM2 [8], CLASP) are often necessary. Interactive segmentation from scratch is certainly an undesirable alternative, and manual corrections may be time-consuming, tedious, and subject to variations among distinct users. Ideally, one should be able to fix a segmentation interactively without destroying its correct parts.

Fig. 1. (a) The given presegmentation. (b) Seed set computed by ISBI2011 [21] has many non-uniformly distributed seeds, and (c) its attempt to fix the segmentation by placing new background markers (red dots) fails. Proposed editing method: (d) Supervoxels by IFT-SLIC to find the seed set. (e) Supervoxels better conforming to the presegmentation are obtained by changing the cost function to \(f^\prime _D\). (f) The union of supervoxels from seeds contained in the presegmentation gives us a starting point to perform corrections. (g) A corrected result is obtained by adding a new background seed (red dot) and running DIFT. (Color figure online)

In the past, we proposed a first solution to resume an input segmentation [21, 22] for interactive correction by differential image foresting transforms (DIFTs) [7]. The method interprets a 3D medical image as a graph, whose nodes are the voxels and whose arcs are defined by their 6-neighbors, and estimates a minimum set of seed nodes such that the image foresting transform (IFT) algorithm [5] can generate the exact representation of the input segmentation as an optimum-path forest rooted at those seeds. In this forest, the object is defined by the union of trees rooted at its internal seeds. The IFT algorithm executes in time proportional to the number of nodes (i.e., linear time) and, subsequently, the user may add and/or remove seeds (and their optimum-path trees) to correct the segmentation in sublinear time by using the DIFT algorithm [7]. A drawback is the possibly high number of seeds near the object boundaries, which makes user interaction difficult (Figs. 1a–c).

In this work, we propose an approximate solution in which the seed set consists of more regularly separated nodes, thereby facilitating the interactive correction (Figs. 1d–g). The method starts by estimating uniformly spaced seeds in a region of interest around the input object mask and applies a sequence of DIFTs followed by seed recomputation to conform the boundaries of the supervoxels (i.e., the optimum-path trees) to the object’s boundaries, as much as possible. Then, the user can add and/or remove seed nodes at each subsequent iteration of the DIFT algorithm to fix segmentation.

The first step is automatic and is based on a recent approach for supervoxel segmentation, named IFT-SLIC [2], but here with a different choice of parameter (path-value function) in the DIFT algorithm, so that the result approximates the desired segmentation. IFT-SLIC was inspired by the Simple Linear Iterative Clustering (SLIC) method, which defines supervoxels along a few iterations of the k-means clustering algorithm. In IFT-SLIC, however, the supervoxels are naturally obtained as single connected components and the object is defined by connected paths from its internal seeds. This makes the interactive editing of the segmentation possible.

In the following, we first describe related works on segmentation editing and basic concepts. Then, we detail the proposed method and our experimental results and conclusions.

2 Related Works

In spite of the vast literature on segmentation, only a few works have dealt with the editing issue, usually considering qualitative and highly subjective empirical evaluations [13].

Grady and Funka-Lea [10] apply Random Walk (RW) [9] to correct presegmented images, optimized with downsampling, which loses important high-frequency information, such as small objects, and negatively affects the result. Harrison et al. [12] combine discriminative classification and energy minimization with RW for contour-based correction, using GPU training. Their method inherits the disadvantages of contour-based segmentation, such as sensitivity to seed placement and the lack of texture and region information. It also depends on the training set size to propagate the labels to other slices, which affects its accuracy.

Jackowski et al. [14] approximate a digital volume representing the segmented object by a Rational of Gaussians (RaG) parametric surface, allowing the user to change the surface by its control points. Advantages are compression for fast transmission, sub-voxel correction and inclusion of graphical effects without a voxelized appearance. But editing non-compact objects by control points is not trivial. Valenzuela et al. [29] use Bézier-based surfaces. The user can modify the curve in one slice, and it propagates to the rest in 3D.

Yang and Choe [30] use Graph Cut (GC) [3], with an energy function composed of the presegmentation and new user inputs. The method assumes that the presegmentation is almost correct, which restricts the user's field of action. It also inherits the disadvantages of GC. The graph weights are based on the Euclidean distance, which is not effective for non-compact objects, such as veins and arteries. To remove parts of the presegmentation, the user must always place both background and foreground seeds, which is unnecessary. Moreover, the evaluation conducted did not include a user effort analysis. Karimov et al. [16] developed software that suggests correction candidates, based on the extraction of region skeletons, which should be similar to the ground truth, and on histogram similarity analysis. Complex images can affect the number of candidates.

Li et al. [17] proposed a user interface to tackle the interactive segmentation problem consisting of two steps. In the first step, the user selects seeds over the foreground and the background for region-based segmentation using Graph Cut on a superpixel-graph, derived from a watershed segmentation of the image. Then, the resulting segmentation boundary is turned into a polygon that can be interactively edited. Changes to the polygon structure serve as soft constraints for local corrections using GC on a pixel-graph. This method was designed for 2D images and the extension to 3D is not straightforward.

Miranda et al. [21, 22] proposed an editing solution based on the IFT, with experimental analysis on three-dimensional MR-T1 images. Contrary to previous methods, it can be applied to multidimensional images and to objects with arbitrary shapes, with low running time and without any special hardware support. It first solves the reverse segmentation problem, with a strong theoretical background, reducing the required number of seeds by employing a conservative force [21]. The corrections can then be performed in sublinear time by differential IFT (DIFT) [7]. It is restricted to the max-arc path-cost function over a gradient image, which is usually not the best option to deal with blurred transitions.

Spina et al. [27] proposed a solution with robot users [11], which simulate user interaction by placing brush strokes automatically to iteratively perform the segmentation task until it results in the given presegmentation. It can correct the result of any existing delineation method [27]. However, it considered a robot user tailored to IFT-based segmentation, since the end goal was to learn the spatial distribution of seeds added to reproduce ground truth training masks, in order to output a statistical seed model of an object of interest to aid in its interactive segmentation. Hence, the authors were more interested in consistent seed positioning than in high accuracy for editing.

Our proposed method is also based on the IFT framework, but it was designed to circumvent the main problems of [21], such as its high number of seeds and non-uniform seed distribution, in order to give more freedom to the user to perform corrections, and using a better path-cost function. The flexibility of the path-cost function of the IFT-SLIC makes it a more general framework than other similar methods that attempt to produce SLIC-like superpixels from watershed segmentation [18, 26].

Lastly, it is worth noting that using boundary-tracking methods in a slice-by-slice fashion to fix 3D segmentations, such as live wire [6], intelligent scissors [25], Riverbed [23], and G-Wire [15], may demand considerable user interaction, proportional to the number of slices with errors, which can be infeasible in many cases. The Live Markers paradigm [28] might mitigate this problem, when coupled with our proposed method, by allowing the propagation of those corrections in 3D, given that the user-selected boundary segments are converted to seeds for competition in a 3D graph.

3 Background

A multidimensional and multispectral digital image is a mapping \(\mathbf {I} :\mathcal {I} \rightarrow \mathbb {Z}^m\), where \(\mathcal {I} \subset \mathbb {Z}^n\) is the image domain and \(\mathbb {Z}^m\) is a space of m bands (e.g., color channels). An adjacency relation \({{{\mathcal {A}}}}\) is a binary relation on \({{{\mathcal {I}}}}\). We use \(t\in {{{\mathcal {A}}}}(s)\) and \((s,t)\in {{{\mathcal {A}}}}\) to indicate that t is adjacent to s. By setting an adjacency relation, \( \mathbf {I} \) can be represented as a weighted digraph \( G = (V, E, w) \), where \( V = \mathcal {I} \) represents the set of nodes, \( E = {{{\mathcal {A}}}}\) is the set of arcs and \( w: E\rightarrow \mathbb {R} \) assigns a weight to each arc. In this work, we are interested in the 6-neighborhood relation \({{{\mathcal {A}}}}\) for 3D images.
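As an illustration of the 6-neighborhood adjacency on a 3D image domain, the following minimal sketch enumerates the voxels adjacent to a given voxel. The names `OFFSETS_6` and `neighbors6` are hypothetical and the image domain is assumed to be a box described by its shape; this is not code from the original work.

```python
from typing import Iterator, Tuple

Voxel = Tuple[int, int, int]

# The six axis-aligned unit offsets that define the 6-neighborhood in 3D.
OFFSETS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbors6(s: Voxel, shape: Tuple[int, int, int]) -> Iterator[Voxel]:
    """Yield every voxel t adjacent to s that lies inside the image domain."""
    for dx, dy, dz in OFFSETS_6:
        t = (s[0] + dx, s[1] + dy, s[2] + dz)
        # Discard neighbors that fall outside the (assumed box-shaped) domain.
        if all(0 <= c < n for c, n in zip(t, shape)):
            yield t
```

An interior voxel has six neighbors, while a corner voxel has only three.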

A path \(\pi _{s\leadsto t}\) is a sequence of distinct nodes \( {\langle }v_1 = s, v_2, \dots , v_n = t {\rangle }\), with origin s and terminus t, where \((v_i, v_{i+1}) \in {{{\mathcal {A}}}}\) for \(i=1,2,\ldots ,n-1\). \( \pi _{t}\) represents a path with terminus t from any origin. We use \( \pi _{t} = \pi _{s}\cdot {\langle }s,t{\rangle }\) to denote the concatenation of a path \(\pi _{s}\) with an arc \((s,t)\). \(\pi _t = {\langle }t{\rangle }\) is a trivial path. \( \varPi _t(G) \) is the set of all distinct paths with terminus t, \( \varPi _{s\leadsto t}(G) \) limits \( \varPi _t(G) \) to paths with origin s, and \( \varPi (G) \) is the set of all distinct paths: \( \varPi _{s\leadsto t}(G)\subseteq \varPi _t(G)\subseteq \varPi (G) \). A connectivity function \(f:\varPi (G)\rightarrow \mathbb {R}\) assigns a scalar value to any path \(\pi \) in the graph G. A path \( \pi ^*_{t} \) is optimum if \( f(\pi ^*_{t}) \le f(\pi _{t}), \forall \pi _{t} \in \varPi _{t}(G)\).

A predecessors map is a function \(Pr:V\rightarrow V\cup \{nil\}\) where for \(Pr(t) = s\) we have \(t \in {{{\mathcal {A}}}}(s)\) or \(s = nil\). For any pixel \(t\in V\), a predecessors map Pr with no cycles defines a path \(\pi ^{Pr}_t\) recursively as \({\langle }t {\rangle }\) if \(Pr(t) = nil\), and \(\pi ^{Pr}_s\cdot {\langle }s,t{\rangle }\) if \(Pr(t)=s\ne nil\). Hence, a predecessors map with no cycles defines a spanning forest, where all nodes are connected to a set of root nodes \({\mathcal {R}}(Pr) = \{v \in V: Pr(v) = nil \}\).
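The recursive definition of \(\pi ^{Pr}_t\) can be unrolled into a simple loop. The sketch below assumes the predecessors map is stored as a Python dictionary with `None` playing the role of nil and that it has no cycles; `path_to` and `roots` are hypothetical helper names.

```python
from typing import Dict, Hashable, List, Optional, Set

PredMap = Dict[Hashable, Optional[Hashable]]


def path_to(t: Hashable, pred: PredMap) -> List[Hashable]:
    """Return the path from the root of t's tree down to t (assumes no cycles)."""
    path = [t]
    # Walk backwards through the predecessors until a root (None) is reached.
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    path.reverse()  # origin first, terminus last
    return path


def roots(pred: PredMap) -> Set[Hashable]:
    """Return the root set R(Pr): all nodes whose predecessor is nil."""
    return {v for v, p in pred.items() if p is None}
```

For example, with `pred = {"a": None, "b": "a", "c": "b"}`, the path recovered for `"c"` starts at the root `"a"` and passes through `"b"`.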

3.1 Image Foresting Transform (IFT)

The Optimal-Path Spanning Forest Problem (OPSFP) consists of finding a spanning forest Pr such that \(\pi ^{Pr}_t\) is an optimal path, for all \(t\in V\), according to a connectivity function f. The IFT solves the OPSFP by extending Dijkstra's shortest-path algorithm to multiple sources and more general connectivity functions [5]. It follows a dynamic programming approach, storing the best connectivity values found so far in a map \(C:V\rightarrow \mathbb {R}\), which converges to \(C(s) = \min _{\pi _s\in \varPi _s(G)} f(\pi _s) \) in the case of smooth connectivity functions [5].

In the context of binary interactive segmentation, we usually restrict the optimal paths to paths starting in a set of seed pixels \({{{\mathcal {S}}}} = {{{\mathcal {S}}}}_0 \cup {{{\mathcal {S}}}}_1\), where \({{{\mathcal {S}}}}_0\) and \({{{\mathcal {S}}}}_1\) denote the sets of background and object seeds, respectively. The segmented object is defined by the union of all pixels t that are reached by optimal paths \(\pi ^{Pr}_t\) rooted at \({{{\mathcal {S}}}}_1\). Seeds can be added and/or removed to perform corrections to intermediate results by re-executing the algorithm. Falcão et al. [7] propose the Differential IFT (DIFT) to compute sequences of IFTs in a differential way, which takes sublinear time for subsequent IFT executions in the same session.
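To make the seeded competition concrete, here is a minimal sketch of an IFT-style propagation using the max-arc path-cost function (a smooth function, mentioned in Sect. 2) and a binary heap. It is only an illustration under stated assumptions: the function name `ift_max_arc`, the callback-based graph interface, and the dictionary-based maps are hypothetical, and the differential update performed by DIFT is not implemented.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable


def ift_max_arc(
    nodes: Iterable[Hashable],
    adjacent: Callable[[Hashable], Iterable[Hashable]],
    weight: Callable[[Hashable, Hashable], float],
    seeds: Dict[Hashable, int],
):
    """Dijkstra-like propagation from labeled seeds with the max-arc cost."""
    nodes = list(nodes)
    cost = {v: float("inf") for v in nodes}   # connectivity map C
    pred = {v: None for v in nodes}           # predecessors map Pr
    label = {v: None for v in nodes}          # label inherited from the root
    heap = []
    counter = 0  # tie-breaker so that heap entries never compare nodes
    for s, lab in seeds.items():
        cost[s], label[s] = 0.0, lab          # trivial paths at seeds cost 0
        heapq.heappush(heap, (0.0, counter, s))
        counter += 1
    done = set()
    while heap:
        c, _, s = heapq.heappop(heap)
        if s in done or c > cost[s]:
            continue                          # stale heap entry
        done.add(s)
        for t in adjacent(s):
            if t in done:
                continue
            # Max-arc extension: the path cost is its largest arc weight.
            new_cost = max(cost[s], weight(s, t))
            if new_cost < cost[t]:
                cost[t], pred[t], label[t] = new_cost, s, label[s]
                heapq.heappush(heap, (new_cost, counter, t))
                counter += 1
    return cost, pred, label
```

With seeds labeled 0 (background) and 1 (object), the segmented object is the set of nodes whose label is 1. Note that a binary heap gives an \(O(n \log n)\) bound; a linear-time behavior would need a bucket-based queue with integer costs, which is not shown here.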

4 IFT-SLIC for Segmentation Editing

A label map \( L: V \rightarrow \{1,\dots ,c\}\) defines a partition set \({\mathcal {P}}_L = \{P_1,P_2,\dots ,P_c\},\) where \(\bigcup _{i=1}^{c}P_i = V \). A set of supervoxels is a partition set composed of regions that share common structural information, such as intensity, proximity, and texture, and that have uniform size and shape.

The IFT-SLIC [2] combines the benefits of IFT and SLIC [1] to provide a more regular and powerful supervoxel generation. It uses a non-smooth connectivity function \(f_D\) (Eq. 1), which is based on the path-cost function \(f_{\sum |\bigtriangleup I|}\) from [20]. \(f_D\) uses the sum of the color distances relative to its root node and the sum of Euclidean distances for encoding the boundary adherence and proximity (compactness), respectively, with a parameter \(\alpha \) controlling their trade-off.

Firstly, k equidistant seeds are sampled following a regular grid. Then, two iterative steps are applied: assignment, where the nodes are labeled to the closest cluster, according to \(f_D\) by computing the IFT using 6-neighborhood, and update, where the seed positions and their attribute vectors are moved to their mean values within their respective labeled regions. The assignment and update steps are repeated for a total of 10 iterations. The method outputs a spanning forest in a predecessors map Pr, where each tree defines a supervoxel and its cluster center corresponds to its root r.
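The seed sampling and the update step can be sketched as follows. The exact grid construction is not given in the text, so this sketch assumes a SLIC-style grid with spacing equal to the cube root of the volume per seed and seeds placed at cell centers; it also assumes that updated seeds are snapped to the nearest voxel. The names `grid_seeds` and `centroid` are hypothetical.

```python
from typing import List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def grid_seeds(shape: Tuple[int, int, int], k: int) -> List[Voxel]:
    """Sample approximately k seeds on a regular grid (assumed SLIC-style)."""
    volume = shape[0] * shape[1] * shape[2]
    # Assumed spacing: cube root of the average volume per seed.
    step = max(1, round((volume / k) ** (1.0 / 3.0)))
    axes = [range(step // 2, n, step) for n in shape]
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]


def centroid(voxels: Sequence[Voxel]) -> Voxel:
    """Update step: move a seed to the mean position of its labeled region."""
    n = len(voxels)
    # Snapping to integer coordinates is an assumption of this sketch.
    return tuple(round(sum(v[i] for v in voxels) / n) for i in range(3))
```

The number of seeds returned only approximates k, since the grid spacing is rounded to an integer.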

$$\begin{aligned} \begin{aligned} f_D(\pi _t=\langle t \rangle )&= {\left\{ \begin{array}{ll}0,&{}\text { if } t\in {\mathcal {S}}\\ +\infty ,&{} \text {otherwise}\end{array}\right. } \\ f_D(\pi _{r\leadsto s}\cdot \langle s,t \rangle )&= f_D(\pi _{r\leadsto s})\text { } \\&+ \underbrace{(||\mathbf {I}(t) - \mathbf {I}_r||\cdot \alpha )^\beta }_{\text {Boundary Adherence}} + \underbrace{d_{euc}(s,t)}_{\text {Compactness}} \end{aligned} \end{aligned}$$
(1)

where \(\mathbf {I}_r\) is the mean attribute vector associated to the seed r, and we use in this work \(\alpha =0.04\) and \(\beta = 12\), which are values within the range of recommended values in [2].
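A minimal sketch of the arc extension in Eq. 1 is given below, assuming attribute vectors and voxel coordinates are passed as tuples of numbers. The function name `extend_f_d` and its argument names are hypothetical; only the formula and the values \(\alpha =0.04\) and \(\beta = 12\) come from the text.

```python
import math
from typing import Sequence


def extend_f_d(
    path_cost: float,
    attr_t: Sequence[float],
    attr_root: Sequence[float],
    s: Sequence[float],
    t: Sequence[float],
    alpha: float = 0.04,
    beta: float = 12.0,
) -> float:
    """Cost of extending a path rooted at r (ending at s) by the arc (s, t)."""
    # Boundary adherence: distance between I(t) and the root's mean vector I_r.
    adherence = (math.dist(attr_t, attr_root) * alpha) ** beta
    # Compactness: Euclidean length of the arc (1 for 6-neighbors).
    compactness = math.dist(s, t)
    return path_cost + adherence + compactness
```

With these parameter values, an attribute distance of 25 contributes about 1 to the cost, whereas a distance of 50 contributes about \(2^{12} = 4096\), which shows how steeply the adherence term penalizes paths that cross intensity transitions.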

In order to resume a previously given presegmentation by IFT, we need to devise a seed set that reproduces it for the given image. In our proposed method, a first idea is to consider the seeds (cluster centers) obtained by IFT-SLIC for the given image, to obtain a more efficient solution. IFT-SLIC results in a seed set \({\mathcal {S}} = \{ s_1, s_2, \dots , s_k \}\), where k is the number of supervoxels. If the presegmentation object is small relative to the whole graph, we can use its bounding box (with a proper extension margin) and compute the seeds by IFT-SLIC only inside it, in order to reduce the running time. The number of seeds should be proportional to the bounding box size, to keep the seed density constant for different object sizes.

The seeds by IFT-SLIC are then divided into two subsets \({\mathcal {S}}_0\) and \({\mathcal {S}}_1\), according to their values in the binary mask B of the presegmentation (\(s_i \in {\mathcal {S}}_0\) if \(B(s_i) = 0\), and \(s_i \in {\mathcal {S}}_1\) otherwise). The union of all supervoxels from seeds in \({\mathcal {S}}_1\) gives us an initial approximation of the presegmentation, denoted as the initial supervoxel segmentation, which does not perfectly resemble the presegmentation (Fig. 1d). To further boost the results, we improve the final supervoxel segmentation by changing the connectivity function to \(f^\prime _D\) (Fig. 1e) as follows:
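The seed split and the resulting initial supervoxel segmentation can be sketched as below. The sketch assumes the binary mask B and the root of each voxel's supervoxel are available as dictionaries keyed by voxel; `split_seeds`, `initial_object`, and `root_of` are hypothetical names.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def split_seeds(
    seeds: Iterable[Hashable], mask: Dict[Hashable, int]
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split seeds into background (B = 0) and object (B != 0) subsets."""
    seeds = list(seeds)
    s0 = [s for s in seeds if mask[s] == 0]
    s1 = [s for s in seeds if mask[s] != 0]
    return s0, s1


def initial_object(
    root_of: Dict[Hashable, Hashable], s1: Iterable[Hashable]
) -> Set[Hashable]:
    """Union of all supervoxels whose root seed belongs to S1."""
    object_roots = set(s1)
    return {v for v, r in root_of.items() if r in object_roots}
```

A voxel therefore ends up in the initial object whenever the seed of its supervoxel lies inside the presegmentation, even if the voxel itself does not, which is why this first approximation does not match the mask exactly.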

$$\begin{aligned} \begin{aligned} f^\prime _D(\pi _t=\langle t \rangle )&= f_D(\pi _t=\langle t \rangle ) \\ f^\prime _D(\pi _{r\leadsto s}\cdot \langle s,t \rangle )&= f^\prime _D(\pi _{r\leadsto s})\text { } + \underbrace{d_{euc}(s,t)}_{\text {Compactness}} \\&+ \underbrace{(||\mathbf {I}(t) - \mathbf {I}_r||\cdot \alpha \cdot \gamma ^{B(r,t)} + \gamma \cdot B(r,t))^\beta }_{\text {Boundary Adherence}} \end{aligned} \end{aligned}$$
(2)

where \(B(r,t) = |B(r)-B(t)|\), that is, \(B(r,t)\) captures the transitions in the binary mask B of the presegmentation, and \(\gamma \) plays the same role as the liberal and conservative forces used in [21]. For higher values of \(\gamma \), the final supervoxel segmentation better resembles the presegmentation, conserving its fine details. Thus, higher values of \(\gamma \) allow us to reduce the number of supervoxels k, giving more freedom to the user to perform corrections. So we used \(k = vol/(200 \cdot \gamma )\), where vol is the number of object voxels in the presegmentation.
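The boundary adherence term of Eq. 2 and the choice of k can be sketched as follows. The function names are hypothetical, and rounding k to an integer (with a minimum of 1) is an assumption of this sketch, since the text only gives the ratio \(vol/(200 \cdot \gamma )\).

```python
def adherence_prime(
    color_dist: float,
    b_root: int,
    b_t: int,
    alpha: float = 0.04,
    beta: float = 12.0,
    gamma: float = 3.0,
) -> float:
    """Boundary adherence term of f'_D for a voxel t and its root r."""
    transition = abs(b_root - b_t)  # B(r, t): 1 if the mask changes, else 0
    return (color_dist * alpha * gamma ** transition + gamma * transition) ** beta


def num_supervoxels(vol: int, gamma: float) -> int:
    """k = vol / (200 * gamma); rounding to an integer is assumed here."""
    return max(1, round(vol / (200 * gamma)))
```

When r and t lie on the same side of the mask, the term reduces to the adherence term of Eq. 1. When they lie on opposite sides, the term is at least \(\gamma ^\beta \) (531441 for \(\gamma = 3\) and \(\beta = 12\)) even for identical intensities, so paths that cross the presegmentation border are strongly discouraged.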

The final supervoxel segmentation can then be used as a starting point, so that the user can insert and/or remove seeds from \({{{\mathcal {S}}}}_0\) and \({{{\mathcal {S}}}}_1\) in order to correct the segmentation in a differential way, by using DIFT [7] with function \(f^\prime _D\) (Figs. 1f–g). Therefore, the corrections take sublinear time.

5 Experimental Results

In this section, we conduct experiments to measure the user involvement in the editing process of the wrong parts of the presegmentation in real 3T MRI-T1 images of the brain, of size \(240\times 240\times 180\) voxels, with severe inhomogeneity problems. We also quantify the number of estimated seeds, where lower values indicate more flexibility for posterior user corrections. We compare our proposed method with the best solution so far by IFT, denoted as ISBI2011 [21]. In all cases, the corrective actions were conducted by a robot user [11], in order to get impartial results, with a spherical brush size of 5 voxels, using an Intel Core i3 laptop with 4 GB of memory.

Table 1 shows the results of the first experiment (data set D1, composed of ten MRI volumes) to correct the wrong parts of automatic segmentation of the cerebral hemispheres, where the errors are related mainly to the bad positioning of the fuzzy model [24] during the automatic segmentation (Figs. 2a–b). The mean execution time to obtain the initial seeds was 24.0 s for the proposed method and 13.5 s for ISBI2011 [21]. The mean Dice value for the initial supervoxel segmentation using the seeds by IFT-SLIC increased from 89.75% to 99.96% when changing the path-cost function to \(f^\prime _D\) for \(\gamma =3\), and from 88.64% to 99.98% for \(\gamma =4\). We noted that lower values of \(\gamma \) (\(\gamma < 3\)) can lead to a loss of presegmentation details. The proposed method reduced the number of markers required for corrective actions by 68.2% and reduced the total number of initial seeds by 4.3% for \(\gamma =3\). For \(\gamma =4\), we had a reduction of 60.8% for corrective actions and 29.2% for the number of initial seeds.
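For reference, the Dice value between a supervoxel segmentation and the presegmentation can be computed as in the sketch below, assuming the usual definition \(2|A\cap B|/(|A|+|B|)\) and object masks represented as sets of voxels. The function name `dice` and the convention for two empty masks are choices of this sketch.

```python
from typing import Hashable, Iterable


def dice(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Dice coefficient between two object masks given as voxel collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))
```

Multiplying the result by 100 gives a percentage comparable to the values reported above.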

Table 1. Data set D1: number of markers (nm) required for corrective actions and number of computed initial seeds (ns) per voxels in parts per thousand.
Table 2. Data set D2: number of markers (nm) required for corrective actions and number of computed initial seeds (ns) per voxels in parts per thousand.

In the second experiment (data set D2, composed of ten MRI volumes, in Table 2), we considered a more challenging scenario. We conducted experiments to fix the segmentation of the cortical surface of the brain, where several pronounced errors were intentionally introduced by manual editing along the 3D surface (Figs. 2c–d). The mean Dice value for the initial supervoxel segmentation using the seeds by IFT-SLIC increased from 93.08% to 99.95% when changing the path-cost function to \(f^\prime _D\) for \(\gamma =3\), and from 92.48% to 99.95% for \(\gamma =4\). The proposed method reduced the number of markers required for corrective actions by 45% (39.9%) and reduced the total number of initial seeds by 79.4% (84%) for \(\gamma =3\) (\(\gamma =4\)).

Fig. 2. 3D renditions of presegmentations with errors (first column) and respective ground truths (second column), with their main differences highlighted in another color. A sample image from each data set composed of ten 3D volumes: (a–b) Data set D1. (c–d) Data set D2 with severe errors. (Color figure online)

6 Conclusions

From the experiments, we can conclude that the proposed method can substantially reduce the number of markers required for corrective actions in both scenarios, with a strong reduction of initial seeds in the second case. Our method has a better seed distribution over the image than ISBI2011, due to the regularly sized supervoxels. This avoids the negative effect of seed concentrations in specific regions, which makes the corrections in these areas behave like manual segmentation. Moreover, it can be easily extended to multi-class segmentation. DIFT runs only within modified trees, and thus in sublinear time. As future work, we will investigate other path-cost functions and applications to other image modalities.