
1 Introduction

Segmentation problems in medical imaging often require the precise identification of multiple boundaries/regions in volumetric images. Graph-based segmentation approaches are becoming increasingly popular for the automated delineation of medical structures/objects of interest within three-dimensional images, in part due to the ability of many of the more recent approaches (such as graph cuts [1–6] and the Layered Optimal Graph Image Segmentation of Multiple Objects and Surfaces (LOGISMOS) approach [7–11]) to efficiently produce globally optimal three-dimensional segmentations in a single pass (and correspondingly not get stuck in local optima). In addition, graph-based approaches such as LOGISMOS enable the simultaneous optimal detection of multiple surfaces in volumetric images, which is important in many medical image segmentation applications.

In this chapter, after presenting an introduction to a more traditional 2D segmentation technique in Sect. 2 (which is still very useful in its own right for 2D boundary delineation problems), we present an in-depth overview of two state-of-the-art graph-based methods for segmenting three-dimensional structures in medical images: graph cuts (Sect. 3) and the LOGISMOS approach (Sect. 4). In each case, an overview of the underlying optimization problem is presented first (i.e., the formulation of an energy/cost function and the specified constraints), followed by the graph-based representation of the optimization problem which enables the globally optimal solution to be found in polynomial time. In particular, in the 2D case, a 2D boundary segmentation optimization problem is transformed into that of finding a minimum-cost path in a graph. In the graph-cuts approach, a 3D object/background labeling problem is transformed into that of finding a minimum s–t cut in a graph, and in the LOGISMOS approach, a single or multiple 3D surface segmentation problem is first transformed into that of finding a minimum-cost closure in a graph (which is further transformed into finding a minimum s–t cut in a graph). For each approach, we also present example applications and extensions.

2 2D Boundary Delineation as a Minimum-Cost Path Problem

2.1 The 2D Single Boundary Optimization Problem

For our first example graph-based segmentation approach, we will describe how one can formulate a single 2D boundary segmentation problem as a minimum-cost path problem. While many of the high-level concepts discussed in this section (such as the formulation of an optimization problem with constraints) and the transformation to a well-known graph problem are similar to those discussed in later sections, the graph-based path problem representation in the 2D case is often much more intuitive to understand. For simplicity, we will only consider the case in which we desire to find an optimal boundary that is defined from the leftmost column of an image to the rightmost column. In particular, we assume that exactly one boundary point should exist in each column of the image so that the desired boundary can be written as a function f(x) mapping each column (x-position) to a desired y-value within the image. We also define a smoothness constraint that requires that the boundary’s position between neighboring columns does not change by more than a given constant: | f(x + 1) − f(x) | ≤ Δ for 0 ≤ x ≤ N_x − 2, where N_x is the number of columns of the image. Figure 1 provides an example of a feasible boundary and two non-feasible boundaries in a small image when Δ = 1. While such feasibility constraints may initially seem quite limiting, it is important to remember that many common transforms, such as a Cartesian-to-polar transform, may be used to enable the desired boundaries to satisfy these constraints. For example, use of polar coordinates can make the boundaries of roughly circular objects, such as the boundaries of the heart, the boundaries of vessel walls, and the boundaries of the optic disc/cup, satisfy such constraints (Figs. 2b, c and 3).
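The feasibility check itself is straightforward; the sketch below (with illustrative boundary values, not the actual boundaries of Fig. 1) verifies the smoothness constraint for a boundary stored as one y-value per column:

```python
def is_feasible(f, delta):
    """Check the smoothness constraint |f(x+1) - f(x)| <= delta
    for a boundary given as a list with one y-value per column."""
    return all(abs(f[x + 1] - f[x]) <= delta for x in range(len(f) - 1))

# Column-to-column changes of at most 1 unit satisfy delta = 1
print(is_feasible([3, 3, 2, 2, 3, 3], delta=1))  # True
# A jump of 2 units between neighboring columns violates delta = 1
print(is_feasible([3, 3, 1, 3, 3, 3], delta=1))  # False
```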

Fig. 1

Three candidate boundaries (in red) for the minimum-cost-path-based approach shown on a small image grid. While all three examples seem to contain reasonable boundaries to separate the top and bottom of the image, only (a) would be a feasible boundary using a smoothness constraint, Δ, of 1. (b) Is not feasible because some of the columns have more than one boundary point defined. (c) Is not feasible because the change in boundary position from column 6 to column 7 and from column 7 to column 8 has a magnitude of 2 and the smoothness constraint requires a maximum change of 1 unit. However, (c) would be feasible if Δ were defined to be 2

Fig. 2

Example applications for the 2D single boundary optimization problem. (a) Single retinal boundary (in red) on spectral-domain optical coherence tomography slice (no transformation necessary). (b) Optic cup boundary (in blue) on spectral-domain optical coherence tomography projection image (Cartesian-to-polar transform necessary). (c) Left ventricular boundary (in yellow) on magnetic resonance imaging slice (Cartesian-to-polar transform necessary)

Fig. 3

Example use of a Cartesian-to-polar transform to enable the desired boundary to satisfy the feasibility constraints required for the 2D single boundary optimization problem (e.g., that one boundary point exists in each column of the image and that the boundary is sufficiently smooth). The boundary is found from a cost image defined in polar coordinates using a minimum-cost path approach, with the final boundary being obtained via a polar-to-Cartesian transform
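As a rough sketch of such a transform, the following nearest-neighbor resampling (a simplification; in practice one would typically interpolate, e.g., with scipy.ndimage.map_coordinates) maps a Cartesian image onto a (radius × angle) grid so that a roughly circular boundary becomes one radial value per column:

```python
import numpy as np

def cartesian_to_polar(image, center, n_angles, n_radii):
    """Resample a 2D image onto a (radius x angle) grid by
    nearest-neighbor lookup, so a roughly circular boundary around
    'center' becomes one radial value per angle (one per column)."""
    h, w = image.shape
    cy, cx = center
    polar = np.zeros((n_radii, n_angles), dtype=image.dtype)
    for j, theta in enumerate(np.linspace(0, 2 * np.pi, n_angles,
                                          endpoint=False)):
        for i in range(n_radii):  # radius grows with row index
            y = int(round(cy + i * np.sin(theta)))
            x = int(round(cx + i * np.cos(theta)))
            if 0 <= y < h and 0 <= x < w:
                polar[i, j] = image[y, x]
    return polar
```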

From the original image I(x, y), we also assume that we can define a cost image, c(x, y), reflecting each pixel’s unlikeliness of belonging to the boundary. The cost E(f) of a feasible boundary f(x) can correspondingly be defined as follows:

$$\displaystyle{ E(f) =\sum _{ x=0}^{N_{x}-1}c(x,f(x)). }$$
(1)

Thus, the optimization problem we desire to solve is to find the minimum-cost (according to Eq. (1)) boundary f(x) subject to the feasibility constraints.

2.2 Graph Representation of the 2D Single Boundary Optimization Problem

The 2D single boundary optimization problem as defined in Sect. 2.1 can be readily transformed into a minimum-cost path problem within a node-weighted graph (Fig. 4). Each pixel in the cost image, c(x, y), becomes a node with weight given by its cost value. Because of this direct correspondence between pixels and nodes, we can label each node with its corresponding (x, y) position. Directed edges are added for each “feasible” transition from a boundary point in one column to the next column. In particular, given a smoothness constraint Δ, for each node (x, y), a directed edge is added to nodes (x + 1, y − Δ), (x + 1, y − Δ + 1), …, (x + 1, y + Δ) (as long as these destination nodes exist). If a destination node does not exist (as on the image boundaries), no directed edge is added. Finding the minimum-cost path in such a node-weighted graph directly corresponds to finding the minimum-cost boundary in the original image.

Fig. 4

Example construction of graph for finding minimum-cost boundary according to Eq. (1). (a) Small example cost image c(x, y). (b) Node-weighted graph representation when the smoothness constraint, Δ, is 1. Edges reflect allowed transitions between boundary points. The minimum-cost path from any node in the leftmost column to any node in the rightmost column will correspond to the minimum-cost 2D boundary in the image. (c) Highlighted (in yellow) minimum-cost path in graph

While a number of approaches exist for finding minimum-cost paths in general graphs, the simple structure of this graph (including its acyclic nature) makes a dynamic programming approach a good choice for finding the minimum-cost path. In fact, in implementing this approach, one does not even need to explicitly define the nodes/edges and can simply use an implicit definition of the underlying graph (e.g., given a node position, you can use a simple formula to determine the neighboring nodes). Figure 5 illustrates the basic concepts of a dynamic programming approach. When Δ = 1, you can define the following recursive definition for the cost of the minimum-cost path p(x, y) from any node in the first column (x = 0) to the node at position (x, y):

$$\displaystyle{ p(x,y) = \left\{\begin{array}{@{}l@{\quad}l@{}} c(x,y) \quad & x = 0\ \text{(base case)} \\ \min[p(x-1,y),\,p(x-1,y+1)] + c(x,y) \quad & x > 0;\ y = 0 \\ \min[p(x-1,y-1),\,p(x-1,y),\,p(x-1,y+1)] + c(x,y) \quad & x > 0;\ 0 < y < N_{y}-1 \\ \min[p(x-1,y-1),\,p(x-1,y)] + c(x,y) \quad & x > 0;\ y = N_{y}-1 \end{array}\right. }$$
(2)

Here, N_y is the number of rows in the image. A similar recursive definition can be defined for other values of Δ. Note that the nodes/edges are defined “implicitly” by directly writing the recursive formula for the path costs.

Because of this recursive definition, the path costs can be efficiently found by computing the path-cost values p(x, y) one column at a time from x = 0 to x = N_x − 1. Conceptually, in computing each path-cost value, one also keeps track of the predecessor “node” that provided the minimum (although this is not absolutely necessary, as determining this node can be done in constant time by simply re-examining all of the values). Finding the actual minimum-cost path involves simply finding the smallest value in the last column and then backtracking through the predecessor nodes until reaching a node in the first column.
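The column-by-column computation and backtracking described above can be sketched as follows (a minimal implementation, assuming a NumPy cost array indexed as cost[x, y] with columns along the first axis):

```python
import numpy as np

def min_cost_boundary(cost, delta=1):
    """Dynamic programming for the minimum-cost boundary of Eq. (1)
    under the smoothness constraint |f(x+1) - f(x)| <= delta.
    cost is indexed as cost[x, y] (columns x, rows y)."""
    n_cols, n_rows = cost.shape
    p = np.full((n_cols, n_rows), np.inf)   # path costs
    pred = np.zeros((n_cols, n_rows), dtype=int)
    p[0] = cost[0]                          # base case: first column
    for x in range(1, n_cols):
        for y in range(n_rows):
            lo, hi = max(0, y - delta), min(n_rows, y + delta + 1)
            best = lo + int(np.argmin(p[x - 1, lo:hi]))  # best predecessor
            p[x, y] = p[x - 1, best] + cost[x, y]
            pred[x, y] = best
    # backtrack from the cheapest node in the last column
    f = [int(np.argmin(p[-1]))]
    for x in range(n_cols - 1, 0, -1):
        f.append(int(pred[x, f[-1]]))
    return f[::-1], float(p[-1].min())
```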

Fig. 5

Use of dynamic programming to find minimum-cost path corresponding to the optimal boundary when Δ = 1. (a) Cost image, c(x, y) with highlighted column 2 (note that the leftmost column is column 0). (b) Computation of minimum-cost path costs for all of the nodes in column 3. For each node, the minimum-cost path value is computed by adding the cost image value to the path-cost value of the predecessor [for a non-boundary case and Δ = 1, the predecessors of node (x, y) are nodes (x − 1, y − 1), (x − 1, y) and (x − 1, y + 1)] with the minimum path-cost value. Arrows are shown to indicate each selected predecessor node. (c) Highlighted minimum-cost path. The minimum-cost path is found by selecting the node in the last column of p(x, y) with the smallest value and then backtracking (i.e., following the predecessor arrows) until reaching a node in the first column

If the underlying boundary is roughly circular such that a Cartesian-to-polar transform was used in creating the cost image (meaning that boundaries define one radial value per angle), it is also useful to require that the found boundary be “closed” (i.e., a smoothness constraint exists from the last column to the first column). With a dynamic programming approach, one option for enforcing this is to separately find the minimum-cost path from each of the starting nodes in the first column to the set of nodes in the last column that would obey the constraint by setting the node weights of all other nodes in the first column and the weights of all non-feasible nodes in the last column to ∞. The overall minimum-cost path can then be found as the smallest of these N_y minimum-cost paths.

3 3D Object Segmentation Using Graph Cuts

3.1 The Graph-Cut Optimization Problem (A Labeling Problem)

In this section, we describe the graph-cut approach of Boykov et al. [1–6] for segmenting a 3D object within a 3D image. While the 2D single boundary segmentation problem discussed in Sect. 2 and the single and multiple surface segmentation problems to be discussed in Sect. 4 are inherently formulated to focus on the detection of an object boundary or set of object boundaries (with the number of boundaries specified a priori), the optimization problem associated with the graph-cut approach is inherently formulated as a labeling problem (i.e., determining the set of pixels/voxels that should be labeled as object). One example of a corresponding difference is that the “object” to be segmented may end up disconnected under the standard formulation of the graph-cut approach, whereas the feasibility constraints prevent this in the surface-based methods. (In some applications, preventing such disconnections is desirable; in others, it is not.)

More specifically, in the graph-cut approach, if \(\mathcal{P}\) is the set of voxels in the image, we assume that each voxel \(p \in \mathcal{P}\) has costs R_p(“obj”) and R_p(“bkg”) associated with being labeled as an object voxel or a background voxel, respectively. We furthermore assume that each neighboring pair of voxels \(\{p,q\} \in \mathcal{N}\), where \(\mathcal{N}\) is the set of pairs of neighboring voxels, has a cost B_{p,q} associated with the pair of voxels having different labels. Our goal is to find the optimal labeling \(A = (A_{1},A_{2},\ldots,A_{\vert \mathcal{P}\vert })\) that assigns each voxel p a label A_p (either “bkg” or “obj”) so as to minimize the following “energy” or “cost” function:

$$\displaystyle{ E(A) =\lambda \cdot R(A) + B(A)\;, }$$
(3)

where R(A) is the regional term given by

$$\displaystyle{ R(A) =\sum _{p\in \mathcal{P}}R_{p}(A_{p})\;, }$$
(4)

and B(A) is the boundary term given by

$$\displaystyle{ B(A) =\sum _{\{p,q\}\in \mathcal{N}}B_{p,q} \cdot \delta _{A_{p}\neq A_{q}}\;, }$$
(5)

where \(\delta _{A_{p}\neq A_{q}}\) is 0 if the label A_p is equal to that of A_q and 1 otherwise. The parameter λ provides the relative weighting between the regional cost terms and the boundary cost terms.

Figure 6 provides example regional (i.e., object/background) and boundary costs associated with a small toy image, and Fig. 7 provides two example labelings and their associated costs based on Eq. (3).
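As a small illustration, the energy of Eq. (3) can be evaluated directly for a candidate labeling. The sketch below assumes a 2D grid with a 4-neighborhood and, for simplicity, a uniform boundary cost (in general, B_{p,q} varies per pair):

```python
def labeling_energy(labels, r_obj, r_bkg, b_cost, lam=1.0):
    """Evaluate the graph-cut energy of Eq. (3) on a 2D grid with a
    4-neighborhood. labels/r_obj/r_bkg are lists of lists; b_cost is
    a uniform discontinuity cost (a simplifying assumption)."""
    h, w = len(labels), len(labels[0])
    # Regional term R(A): cost of each pixel's assigned label
    region = sum(
        r_obj[i][j] if labels[i][j] == "obj" else r_bkg[i][j]
        for i in range(h) for j in range(w)
    )
    # Boundary term B(A): penalize unequal right/down neighbor pairs
    boundary = sum(
        b_cost
        for i in range(h) for j in range(w)
        for di, dj in ((0, 1), (1, 0))
        if i + di < h and j + dj < w and labels[i][j] != labels[i + di][j + dj]
    )
    return lam * region + boundary
```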

Fig. 6

Example toy image and object (R_p(“obj”)), background (R_p(“bkg”)), and boundary (B_{p,q}) costs for use in a graph-cuts approach

Fig. 7

Two example labelings and associated total costs from toy example from Fig. 6. For simplicity, the weighting λ in Eq. (3) between region and boundary costs is set to 1

3.2 Direct Representation of the Graph-Cut Optimization Problem as a Minimum s–t Cut Problem

The graph-cut optimization problem discussed in Sect. 3.1 can be readily formulated as a minimum s–t cut problem in an edge-weighted graph [2, 6]. In an s–t cut formulation, you have an edge-weighted graph with two terminal nodes: s and t. A cut (S, T) of the graph is a partition of the nodes V into two disjoint sets, S and T (T = V ∖ S), with the requirement that s ∈ S and t ∈ T. The cost of a cut is defined as the summation of the weights of the edges from nodes in S to nodes in T (intuitively, the edges that are “cut” in the separation) [12]. The goal in a minimum s–t cut problem is to find the cut with the minimum cost. Once the problem is formulated as a minimum s–t cut problem, many algorithms exist for finding the globally optimal solution in low-order polynomial time [4].

Fig. 8

Example graph representation and minimum cut from regional and boundary costs shown in Figs. 6 and 7. Graph example adapted from examples presented by Boykov et al. [2, 6]

In transforming the graph-cut optimization problem into a minimum s–t cut problem, first, a node p is added for every voxel in the image, and edges are added between each pair of nodes corresponding to neighboring voxels with weight B_{p,q} (these edges between neighbors are often referred to as “n-links” as their weights correspond to the neighbor discontinuity costs). Next, two extra “terminal” nodes, s (the “obj” terminal) and t (the “bkg” terminal), are added to the graph, and for each voxel in the image (corresponding to node p in the graph), two edges are added (often called “t-links” for “terminal”): {s, p} and {p, t}. The weight of each edge {s, p} between “obj” terminal s and node p is given by the background cost value for the associated voxel (times a constant): λ ⋅ R_p(“bkg”). The weight of each edge {p, t} between node p and “bkg” terminal t is given by the object cost value for the associated voxel (times a constant): λ ⋅ R_p(“obj”). When determining the minimum s–t cut of this constructed graph, the nodes remaining connected to the “obj” terminal s correspond to the voxels belonging to the segmented object (i.e., the voxels that should be given the label “obj”). Figure 8 illustrates the graph and associated minimum cut for the toy example shown in Figs. 6 and 7. The labeling (segmentation) associated with the cut is also shown. The edges cut include the edges from the nodes in the final object set (with label “obj”) to the background terminal t (with each p in this set contributing λ ⋅ R_p(“obj”) to the cut cost), the edges from object terminal s to the nodes in the final background set (with label “bkg”, with each p in this set contributing λ ⋅ R_p(“bkg”) to the cut cost), and each of the edges between a node in the object set and a node in the background set (with each such edge {p, q} contributing B_{p,q} to the cut cost).
Thus, with the above graph construction, the value of the minimum cut corresponds to the cost of the optimal labeling according to Eq. (3).

Note that the above formulation assumes an undirected edge-weighted graph, but many s–t cut algorithms are inherently designed for directed edge-weighted graphs. However, the conversion to the corresponding directed graph is straightforward. In particular, each edge {s, p} between terminal s and node p becomes directed with s as the tail and p as the head. Similarly, each edge {p, t} between node p and terminal t becomes directed with p as the tail and t as the head. Each edge {p, q} between neighbors p and q becomes two edges (each with the same weight): one from node p to node q and one from node q to node p.

When using graph-cut approaches, it is also common to specify a set of “seed” object and background voxels (i.e., voxels you wish to constrain to be labeled as object and background, respectively). Let \(\mathcal{O}\) be the set of object seeds and \(\mathcal{B}\) be the set of background seeds. You can enforce these constraints in the minimum s–t cut graph representation by modifying the weight of the edge from \(s\) to every node in \(\mathcal{O}\) to be infinity (intuitively, ensuring the edges from the object terminal to the nodes that correspond to seed object voxels will not be cut) and modifying the weight of the edge from every node in \(\mathcal{B}\) to t to be infinity (intuitively, ensuring the edges from nodes that correspond to seed background voxels to the background terminal will not be cut). It is also common to modify the weight of edges {s, p} where \(p \in \mathcal{B}\) and {p, t} where \(p \in \mathcal{O}\) to be zero, but this is not theoretically necessary (assuming the original weights were also not infinity).
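The construction above can be sketched with the networkx library (an implementation choice for illustration; dedicated max-flow codes such as the Boykov–Kolmogorov algorithm are typically used in practice). Seed constraints are omitted for brevity, and a uniform B_{p,q} is assumed:

```python
import networkx as nx

def graph_cut_segment(r_obj, r_bkg, b_cost, lam=1.0):
    """Binary graph-cut segmentation of a 2D grid (4-neighborhood)
    via a minimum s-t cut, following the t-link/n-link construction
    described above. b_cost is a uniform B_{p,q} (an assumption)."""
    h, w = len(r_obj), len(r_obj[0])
    g = nx.DiGraph()
    for i in range(h):
        for j in range(w):
            p = (i, j)
            # t-links: {s, p} weighted by lam * R_p("bkg"),
            #          {p, t} weighted by lam * R_p("obj")
            g.add_edge("s", p, capacity=lam * r_bkg[i][j])
            g.add_edge(p, "t", capacity=lam * r_obj[i][j])
            # n-links to right/down neighbors, in both directions
            for di, dj in ((0, 1), (1, 0)):
                if i + di < h and j + dj < w:
                    q = (i + di, j + dj)
                    g.add_edge(p, q, capacity=b_cost)
                    g.add_edge(q, p, capacity=b_cost)
    cut_value, (source_side, _) = nx.minimum_cut(g, "s", "t")
    # nodes remaining connected to s are the "obj"-labeled voxels
    return {p for p in source_side if p != "s"}, cut_value
```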

3.3 Example Applications and Extensions

3.3.1 Multi-Label (Multi-Region) Extensions

A multi-label extension of the original graph cuts energy function in Eq. (3) follows:

$$\displaystyle{ E(A) =\lambda \cdot \sum _{p\in \mathcal{P}}R_{p}(A_{p}) +\sum _{\{p,q\}\in \mathcal{N}}B_{p,q}(A_{p},A_{q}) \cdot \delta _{A_{p}\neq A_{q}}\;, }$$
(6)

where \(A \in \{\mathrm{obj}_{1},\mathrm{obj}_{2},\ldots,\mathrm{obj}_{N}\}^{\vert \mathcal{P}\vert }\) is a multi-object image labeling. Unfortunately, solving this particular multi-label problem is NP-hard in the general case [3, 6]. Thus, approximation algorithms, such as the α β-swap algorithm and the α-expansion algorithm, have been proposed for efficiently obtaining a close-to-optimal solution [3]. At a high level, both the α β-swap algorithm and the α-expansion algorithm are examples of a local search approach (Fig. 9), where, after assigning an arbitrary initial labeling as the current candidate solution, a best “neighboring” solution is generated and saved as the current candidate solution if it has a lower cost based on Eq. (6). The generation of “neighboring” candidate solutions (and saving them as the current solution if they have a lower cost) is repeated until convergence. The key clever idea behind both approaches is in the efficient generation of the best “neighboring” solution. In the α β-swap algorithm, a neighboring candidate solution is one in which, after an arbitrary selection of any two labels α and β, every pixel with label α or β maintains its current label or switches (“swaps”) to the other label (Fig. 10). In the α-expansion algorithm, a neighboring candidate solution is one in which, after arbitrary selection of a label α, every pixel maintains its current label or switches to the label α (Fig. 11). With either definition of a neighboring solution, determination of the best neighboring solution can be obtained by solving a single graph cut problem, as only a binary labeling choice needs to be made at each voxel location (i.e., α versus β for the α β-subset of voxels in the α β-swap algorithm and α versus “original label” for all voxels in the α-expansion algorithm).
In practice, the α-expansion algorithm is often more effective than the α β-swap algorithm, but care must be taken to ensure that submodularity requirements of the cost function are satisfied [3, 13].
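To make the move space concrete, the sketch below enumerates all α-expansion moves on a tiny 1D problem by brute force; this exhaustive search stands in for the single graph cut used in practice (which is what makes the move tractable on real images), and the uniform Potts-style boundary cost is a simplifying assumption:

```python
from itertools import product

def energy(labels, unary, b_cost):
    """Eq. (6)-style energy on a 1D pixel row: unary[i][label] plus a
    uniform boundary cost between unequal neighboring labels."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(b_cost for a, b in zip(labels, labels[1:]) if a != b)
    return e

def best_alpha_expansion(labels, alpha, unary, b_cost):
    """Evaluate every alpha-expansion move (each pixel keeps its label
    or switches to alpha) and return the cheapest resulting labeling.
    Brute force over the binary per-pixel choices; illustrative only."""
    best, best_e = list(labels), energy(labels, unary, b_cost)
    for keep in product([True, False], repeat=len(labels)):
        cand = [l if k else alpha for k, l in zip(keep, labels)]
        e = energy(cand, unary, b_cost)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e
```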

Fig. 9

The α β-swap algorithm and the α-expansion algorithm are two local search approaches for efficiently determining a close-to-optimal solution to the multi-label graph cuts problem with the energy function of Eq. (6) [3]. In both cases, determination of the best “neighboring” solution can be obtained by solving a single graph cuts problem

Fig. 10

Two example α β-swap moves from initial input labeling on the left. In this example, α and β were arbitrarily selected as obj2 and obj4, respectively. Valid neighboring solutions are thus labelings where any pixels originally labeled as obj2 keep their original label or change to label obj4 and any pixels originally labeled as obj4 keep their original label or change to label obj2. (Pixels with labels other than obj2 and obj4 must keep their original label.) The binary nature of these moves means that the best move can be determined with a single graph cut

Fig. 11

Two example α-expansion moves from the initial input labeling on the left. In this example, α was arbitrarily selected as obj4. Valid neighboring solutions are thus labelings where each pixel keeps its original label or changes to label obj4 (thus “expanding” the set of pixels labeled obj4). The binary nature of these moves means that the best move can be determined with a single graph cut

While the approximation algorithms discussed above may work well for solving the multi-label optimization problem in certain application domains, by recognizing special cases (e.g., through additional geometric constraints) of the multi-label/multi-region labeling problem, certain multi-label/multi-region problems can actually be solved optimally. (Of course, the binary graph cuts formulation using the energy term of Eq. (3) is an obvious example of a special case.) For example, through the addition of specific geometric interaction constraints (in part motivated by the multi-surface graph-based approach discussed in Sect. 4) such as “region C contains region A,” “region D contains region B,” and “region C excludes region D,” Delong and Boykov [13] have proposed a formulation of the multi-region graph cuts approach for which an optimal solution can be found with a single graph cut.

3.3.2 Co-segmentation of Tumors in PET–CT Images

Positron emission tomography–computed tomography (PET–CT) has revolutionized modern cancer imaging. By adding precise anatomic localization to functional imaging, integrated PET–CT currently provides the most accurate information available on tumor extent and distribution for various common cancers, and it plays an increasingly indispensable role in cancer treatment planning, therapy response assessment, and tumor staging. This section presents an optimal co-segmentation method for tumor delineation from PET–CT scans that extends the graph-cut method [14].

Modeling PET–CT Co-segmentation. The basic idea is to formulate the simultaneous segmentation of tumor from PET–CT scans as minimizing the Markov Random Field (MRF) energy function, which consists of a data fidelity term and a boundary smoothness term. The data fidelity term measures how well voxels fit into the object or background model. The boundary term penalizes the discontinuity between the object and background. More precisely, the MRF energy functions commonly used for segmentation on the individual PET and CT images (denoted by \(\mathcal{I}_{\mathrm{PET}}\) and \(\mathcal{I}_{\mathrm{CT}}\), respectively) are as follows:

$$\displaystyle\begin{array}{rcl} \mathcal{E}_{\mathrm{PET}}({f}^{P})& =& \sum _{ p\in \mathcal{I}_{\mathrm{PET}}}\mathcal{D}_{p}^{P}(f_{ p}^{P}) +\lambda _{ 1}\sum _{(p,q)\in \mathcal{N}_{P}}\mathcal{V}_{\mathit{pq}}^{P}(f_{ p}^{P},f_{ q}^{P}){}\end{array}$$
(7)
$$\displaystyle\begin{array}{rcl} \mathcal{E}_{\mathrm{CT}}({f}^{C})& =& \sum _{ p\in \mathcal{I}_{\mathrm{CT}}}\mathcal{D}_{p}^{C}(f_{ p}^{C}) +\lambda _{ 2}\sum _{(p,q)\in \mathcal{N}_{C}}\mathcal{V}_{\mathit{pq}}^{C}(f_{ p}^{C},f_{ q}^{C}){}\end{array}$$
(8)

\(\mathcal{D}_{p}^{P}(f_{p}^{P})\) is the individual penalty for assigning a voxel p to “object” or “background” in the PET image \(\mathcal{I}_{\mathrm{PET}}\); \(\mathcal{V}_{\mathit{pq}}^{P}(f_{p}^{P},f_{q}^{P})\) is the penalty for assigning labels \(f_{p}^{P}\) and \(f_{q}^{P}\) to two neighboring voxels p and q according to the neighborhood setting \(\mathcal{N}_{P}\). \(\mathcal{D}_{p}^{C}(f_{p}^{C})\) and \(\mathcal{V}_{\mathit{pq}}^{C}(f_{p}^{C},f_{q}^{C})\) have the same meaning for \(\mathcal{I}_{\mathrm{CT}}\). Most neighboring voxels are expected to have the same label (“object” or “background”). Thus, one may expect no penalty if neighboring voxels have the same label and a penalty w_{pq} otherwise. We can then apply the graph cut method in Sect. 3.2 to segment the target tumor from PET and CT, respectively, by minimizing the energy functions \(\mathcal{E}_{\mathrm{PET}}({f}^{P})\) and \(\mathcal{E}_{\mathrm{CT}}({f}^{C})\); this, however, does not utilize the information from the other modality.

To co-segment the tumor from both PET and CT scans, an additional PET–CT context term \(\mathcal{E}_{\mathrm{PET}-\mathrm{CT}}\) is introduced to the energy function, which penalizes the segmentation difference between two image datasets. Without loss of generality (WLOG), assume that the PET and CT images are well registered. Let (p, p′) denote a pair of corresponding voxels in \(\mathcal{I}_{\mathrm{PET}}\) and \(\mathcal{I}_{\mathrm{CT}}\). The label difference is penalized with \(\delta _{pp^{\prime}}(f_{p}^{P},f_{p^{\prime}}^{C})\) for p and p′, with

$$\displaystyle\begin{array}{rcl} & & \delta _{pp^{\prime}}(f_{p}^{P},f_{ p^{\prime}}^{C}) = \left \{\begin{array}{l l} g(p,p^{\prime})&\mathrm{if}\ f_{p}^{P}\neq f_{ p^{\prime}}^{C} \\ 0 &\mathrm{if}\ f_{p}^{P} = f_{p^{\prime}}^{C}\\ \end{array},\right.{}\end{array}$$
(9)

where g(p, p′) > 0 is employed to penalize the disagreement between labels of corresponding voxels p and p′. The PET–CT context term then takes the form:

$$\displaystyle{ \mathcal{E}_{\mathrm{PET}-\mathrm{CT}}({f}^{P},{f}^{C}) =\sum _{ (p,p^{\prime})\mbox{ with }p\in \mathcal{I}_{\mathrm{PET}},p^{\prime}\in \mathcal{I}_{\mathrm{CT}}}\delta _{pp^{\prime}}(f_{p}^{P},f_{ p^{\prime}}^{C}) }$$
(10)

Note that the PET–CT context constraint is soft, with \(\delta _{pp^{\prime}}(f_{p}^{P},f_{p^{\prime}}^{C}) < +\infty \). It is not assumed that the tumor contour in CT is identical to that in PET. Hence, the uncertainties of imaging and registration are accommodated in this model. The energy function of the co-segmentation is then defined as follows:

$$\displaystyle{ \mathcal{E}_{\mathrm{cs}}({f}^{P},{f}^{C}) = \mathcal{E}_{\mathrm{ PET}}({f}^{P}) + \mathcal{E}_{\mathrm{ CT}}({f}^{C}) + \mathcal{E}_{\mathrm{ PET}-\mathrm{CT}}({f}^{P},{f}^{C}) }$$
(11)
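For concreteness, the combined energy of Eq. (11) can be evaluated directly for candidate labelings. The sketch below uses registered 1D "images", uniform boundary costs, and a uniform context penalty g_pen (all simplifying assumptions; in general the penalties vary per voxel pair):

```python
def coseg_energy(f_pet, f_ct, d_pet, d_ct, b_pet, b_ct, g_pen,
                 lam1=1.0, lam2=1.0):
    """Co-segmentation energy of Eq. (11) on registered 1D rows:
    per-modality MRF energies (Eqs. (7)-(8)) plus the context term
    of Eq. (10) with a uniform penalty g_pen."""
    def mrf(f, d, b, lam):
        data = sum(d[i][l] for i, l in enumerate(f))          # data term
        smooth = lam * sum(b for a, c in zip(f, f[1:]) if a != c)
        return data + smooth
    # context term: penalize label disagreement between modalities
    context = sum(g_pen for a, c in zip(f_pet, f_ct) if a != c)
    return (mrf(f_pet, d_pet, b_pet, lam1)
            + mrf(f_ct, d_ct, b_ct, lam2) + context)
```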

Most previous co-segmentation algorithms in computer vision [15–18] work on a pair of images with similar (or nearly identical) foregrounds and unrelated backgrounds and explicitly make use of histogram matching, which makes the models computationally intractable. Batra et al. recently applied the graph cut method for co-segmentation [19], in which a common appearance model is assumed across all the segmented images. However, the PET and CT images may not share such a common appearance model. The co-segmentation scheme here provides a more flexible PET–CT context term \(\mathcal{E}_{\mathrm{PET}-\mathrm{CT}}\) to make use of the dual-modality information.

Fig. 12

Graph construction of G with two subgraphs G CT and G PET for the co-segmentation of PET–CT images [14]. Three types of arcs are introduced. The orange arcs (n-links) encode the boundary terms; the brown arcs (t-links) encode data terms; and the green arcs (d-links) enforce the PET–CT context term in the co-segmentation energy function

Optimization. The co-segmentation problem is solved by extending the graph cut method in Sect. 3.2 with the computation of a minimum-cost s–t cut in a transformed graph G, which admits a globally optimal solution in low-order polynomial time.

The graph G consists of two node-disjoint subgraphs G_PET and G_CT, each used to search for the tumor in the PET image \(\mathcal{I}_{\mathrm{PET}}\) and the CT image \(\mathcal{I}_{\mathrm{CT}}\), respectively. The construction of each of G_PET and G_CT follows the graph cut method in Sect. 3.2 to encode the energy terms \(\mathcal{E}_{\mathrm{PET}}\) and \(\mathcal{E}_{\mathrm{CT}}\), respectively. To enforce the PET–CT context term \(\mathcal{E}_{\mathrm{PET}-\mathrm{CT}}\), additional inter-subgraph arcs are introduced between G_PET and G_CT. For every pair of corresponding voxels (p, p′) with \(p \in \mathcal{I}_{\mathrm{CT}}\) and \(p^{\prime} \in \mathcal{I}_{\mathrm{PET}}\), two directed arcs are added between the corresponding nodes of the two subgraphs in opposite directions. The weight of each arc is assigned to penalize the labeling difference of the two voxels p and p′ (i.e., \(\delta _{pp^{\prime}}(f_{p}^{P},f_{p^{\prime}}^{C})\) with \(f_{p}^{P}\not =f_{p^{\prime}}^{C}\)). Figure 12 illustrates the construction of the graph G, with green arcs encoding the PET–CT context term \(\mathcal{E}_{\mathrm{PET}-\mathrm{CT}}\).

As shown in [14], the minimum-cost s–t cut \({\mathcal{C}}^{{\ast}} = ({A}^{{\ast}},{\overline{A}}^{{\ast}})\) in G defines an optimal delineation of the tumor in both PET and CT images. The target tumor volume on the CT image includes those voxels whose corresponding nodes in G_CT belong to the source set \({A}^{{\ast}}\). Similarly, the segmented tumor volume on the PET image is given by those voxels whose associated nodes in G_PET belong to the source set \({A}^{{\ast}}\).

Fig. 13

Typical tumor segmentation in the transverse (left), coronal (middle), and sagittal (right) views [14]. (ac) 2D slices of a 3D CT image with the reference standard (red) and outlines of spherical initialization (green and yellow). (df) Proposed co-segmentation results in the CT image. (gi) Co-segmentation results in the PET image

Typical lung tumor segmentations in three views (transverse, coronal, and sagittal) are shown in Fig. 13. The co-segmentation method demonstrated accurate PET and CT segmentation results [14].

4 Optimal Layered Graph Search for Image Segmentation

This section presents the theory of the optimal layered graph search method [7–11], also sometimes called the LOGISMOS approach, which is applicable to the multi-surface segmentation of terrain-like surfaces (represented by orthogonal n-D graphs), tubular objects (cyclic regular and irregular graphs), objects with complex shapes, and multiple partially interacting objects. WLOG, we focus on terrain-like surfaces and 4-neighborhoods to discuss the optimal layered graph search method. However, the same principles used for this illustration are directly applicable to arbitrarily irregular surfaces.

4.1 Optimal Surface Detection Problems

Let \(\mathcal{I}\) be a given 3D volumetric image of size n = X × Y × Z. For each (x, y) pair, 0 ≤ x < X and 0 ≤ y < Y, the voxels with different z-coordinates, that is, the voxel subset \(\{\mathcal{I}(x,y,z)\ \vert \ 0 \leq z < Z\}\), form a column parallel to the z-axis, denoted by Col(x, y). Two columns are neighboring if their (x, y)-coordinates satisfy some neighborhood condition. For example, under the 4-neighboring setting, the column Col(x, y) is neighboring to Col(x′, y′) if | x − x′ | + | y − y′ | = 1. Henceforth, a model of the 4-neighboring setting is used; this simple model can be easily extended to other neighboring settings. Each of the target terrain-like surfaces contains one and only one voxel in each column of \(\mathcal{I}\) (Fig. 14). The feasibility of the target surfaces is governed by the surface smoothness and separation constraints. The surface smoothness constraint is specified by two smoothness parameters, Δ x and Δ y , which define the maximum allowed change in the z-coordinate of a surface along each unit distance change in the x and y dimensions, respectively. If \(\mathcal{I}(x,y,z^{\prime})\) and \(\mathcal{I}(x + 1,y,z^{\prime\prime})\) (resp., \(\mathcal{I}(x,y + 1,z^{\prime\prime})\)) are two (neighboring) voxels on a feasible surface, then \(\vert z^{\prime} - z^{\prime\prime}\vert \leq \varDelta _{\mathbf{x}}\) (resp., | z′ − z″ | ≤ Δ y ) (Fig. 15a). In multiple surface detection, for each pair of the target surfaces S and S′, we use two parameters, δ l ≥ 0 and δ u ≥ 0, to represent the surface separation constraint, which defines the relative positioning and the distance range of the two surfaces. That is, if \(\mathcal{I}(x,y,z) \in S\) and \(\mathcal{I}(x,y,z^{\prime}) \in S^{\prime}\), we have \({\delta }^{l} \leq z^{\prime} - z {\leq \delta }^{u}\) for every (x, y) pair (Fig. 15b). 
A set of λ surfaces \(\mathcal{S} =\{ S_{1},S_{2},\ldots,S_{\lambda }\}\) are considered feasible if each individual surface in the set satisfies the given surface-specific smoothness constraints and if each pair of surfaces satisfies the surface separation constraints.
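The feasibility conditions above can be checked directly. The sketch below (a toy illustration in Python with NumPy, under the assumed convention that each surface S_i is stored as an X × Y integer array of z-coordinates) tests the smoothness constraints and the pairwise separation constraints:

```python
import numpy as np

def is_feasible(surfaces, delta_x, delta_y, separations):
    """Check the smoothness and separation constraints of Sect. 4.1.

    surfaces: list of integer arrays of shape (X, Y); surfaces[i][x, y]
        is the z-coordinate of the ith surface on column Col(x, y).
    delta_x, delta_y: smoothness parameters (max |z' - z''| per unit step).
    separations: dict mapping a pair (i, j) to (dl, du), requiring
        dl <= S_j(x, y) - S_i(x, y) <= du on every column.
    """
    for S in surfaces:
        if np.max(np.abs(np.diff(S, axis=0)), initial=0) > delta_x:
            return False  # smoothness violated along x
        if np.max(np.abs(np.diff(S, axis=1)), initial=0) > delta_y:
            return False  # smoothness violated along y
    for (i, j), (dl, du) in separations.items():
        d = surfaces[j] - surfaces[i]
        if d.min() < dl or d.max() > du:
            return False  # separation violated between S_i and S_j
    return True
```

For example, two stacked surfaces with a constant distance of 2 are feasible under Δ = 1 and a separation range of [1, 3], while a surface whose height jumps by 2 between adjacent columns is not.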

Fig. 14

Optimal surface detection. (a) An example 3D retinal optical coherence tomography (OCT) image with surfaces of interest specified. (b) The schematic surfaces and the orientation

Fig. 15

The surface feasibility. (a) The surface smoothness constraint. (b) The surface separation constraint

Each voxel \(\mathcal{I}(x,y,z)\) may be considered as having an edge-based real-valued cost b i (x, y, z) (called an on-surface cost) associated with it for each target surface S i . As shown in Fig. 14b, the λ surfaces form λ + 1 regions \(\{R_{0},R_{1},\ldots,R_{\lambda }\}\). For each region R i (i = 0, 1, …, λ), every voxel \(\mathcal{I}(x,y,z)\) is assigned a real-valued region-based cost c i (x, y, z) (called an in-region cost). The edge-based cost of each voxel in \(\mathcal{I}\) is inversely related to the likelihood that it may appear on a desired surface, while the region-based costs c i (⋅ ) (i = 0, 1, …, λ) measure the inverse likelihood of a given voxel preserving the expected regional properties (e.g., homogeneity) of the partition \(\{R_{0},R_{1},\ldots,R_{\lambda }\}\). Then, the total energy \(\mathcal{E}(\mathcal{S})\) induced by the λ surfaces in \(\mathcal{S}\) is defined as (Fig. 16)

Fig. 16

Example schematic cost of two surfaces for the optimal surface detection problem. The two surfaces divide the volume into three regions

$$\displaystyle{ \mathcal{E}(\mathcal{S}) =\sum _{i=1}^{\lambda }b_{i}(S_{i}) +\sum _{i=0}^{\lambda }c_{i}(R_{i}) =\sum _{i=1}^{\lambda }\sum _{\mathcal{I}(x,y,z)\in S_{i}}b_{i}(x,y,z) +\sum _{i=0}^{\lambda }\sum _{\mathcal{I}(x,y,z)\in R_{i}}c_{i}(x,y,z). }$$
(12)

The optimal surface detection (OSD) problem on medical images is: Given a 3D image \(\mathcal{I}\) and an integer λ > 0, find a feasible set of λ surfaces \(\mathcal{S} =\{ S_{1},S_{2},\ldots,S_{\lambda }\}\), such that the total cost \(\mathcal{E}(\mathcal{S})\) is minimized.
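To make the objective concrete, the energy of Eq. (12) can be evaluated for a candidate set of surfaces. The NumPy sketch below is a toy evaluator; the array shapes and the region-labeling convention (region R_r holds the voxels lying strictly above exactly r of the surfaces) are assumptions for illustration:

```python
import numpy as np

def total_energy(surfaces, b, c):
    """Evaluate the energy of Eq. (12).

    surfaces: list of lam arrays of shape (X, Y), ordered bottom to top.
    b: on-surface costs b_i(x, y, z), shape (lam, X, Y, Z).
    c: in-region costs c_i(x, y, z), shape (lam + 1, X, Y, Z).
    """
    lam = len(surfaces)
    X, Y, Z = b.shape[1:]
    xs, ys = np.arange(X)[:, None], np.arange(Y)[None, :]
    # on-surface part: pick b_i along each surface
    E = sum(b[i, xs, ys, S].sum() for i, S in enumerate(surfaces))
    # region label of every voxel = number of surfaces strictly below it
    z = np.arange(Z)
    region = np.zeros((X, Y, Z), dtype=int)
    for S in surfaces:
        region += z[None, None, :] > S[:, :, None]
    # in-region part: sum c_r over the voxels labeled r
    for r in range(lam + 1):
        E += c[r][region == r].sum()
    return E
```

The OSD problem then asks for the feasible surface set minimizing this quantity.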

The OSD problem (with λ = 1) was first proposed for cardiac border detection by Thedens et al. [20, 21]. Over the past decades, variations of the OSD problem have been studied extensively in theoretical computer science, computer vision, and operations research. The most closely related problems include the metric labeling problem in theoretical computer science, also called the MRF optimization problem in computer vision, which introduces an additional pairwise penalty energy into the objective function. The metric labeling problem captures a broad range of classification problems in which the quality of a labeling depends on the pairwise relations between the underlying set of objects, such as image restoration [3, 22], image segmentation [23–26], visual correspondence [27, 28], and deformable registration [29]. After being introduced by Kleinberg and Tardos [30], it has been studied extensively in theoretical computer science [30–34]. The best known approximation ratio for the problem is O(log L) (where L is the number of labels) [30, 33], and no \(\varOmega (\sqrt{\log L})\)-approximation exists unless NP has quasi-polynomial time algorithms [32]. Given the applied nature of the problem, researchers in image processing and computer vision have also developed a variety of good heuristics that use classical combinatorial optimization techniques, such as network flow and local search (e.g., [22–24, 27, 35]), to solve special cases of the metric labeling problem. This section focuses on polynomial-time solutions to the OSD problem.

4.2 Overview of the OSD Algorithms

The basic idea for solving the OSD problem is to formulate it as computing a minimum-cost closed set in a directed graph with arbitrary node weights. A closed set \(\mathcal{C}\) in a directed graph with real-valued node weights is a subset of graph nodes such that there is no arc from a node in \(\mathcal{C}\) to a node in the complement of \(\mathcal{C}\). The cost of a closed set \(\mathcal{C}\) is simply the total sum of the weights of all nodes in \(\mathcal{C}\). It is well known that a minimum-cost closed set can be computed in low-order polynomial time by solving a minimum st cut problem [25, 36]. The solution to the OSD problem is built upon novel observations that capture the self-closure structures of the OSD problem, which relate the target problem to the minimum-cost closed set problem. The OSD algorithm uses the following three major steps:

  • Graph construction. Construct a node-weighted directed graph G = (V, E), which consists of λ node-disjoint subgraphs \(G_{i} = (V _{i},E_{i})\). Each subgraph G i is used for identifying the ith surface in \(\mathcal{I}\). The construction hinges on the self-closure structures of the OSD problem. The surface smoothness and separation constraints are enforced with graph arcs, while the optimality of the solution is encoded with node weights.

  • Computing a minimum-cost closed set. Compute a minimum-cost nonempty closed set \({\mathcal{C}}^{{\ast}}\) in G, which is solvable by computing a minimum s–t cut in an arc-weighted directed graph transformed from G.

  • Surface recovery. The set of λ optimal surfaces is recovered from the minimum-cost closed set \({\mathcal{C}}^{{\ast}}\) with each surface being specified by the envelope of \({\mathcal{C}}^{{\ast}}\cap V _{i}\).
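The reduction from a minimum-cost closed set to a minimum s–t cut [25, 36] can be sketched in a few lines. The toy implementation below (not the chapter's implementation) uses a small Edmonds–Karp max-flow so that it is self-contained; negative-weight nodes become source arcs, positive-weight nodes become sink arcs, and closure arcs get infinite capacity:

```python
from collections import defaultdict, deque

def min_cost_closed_set(node_weights, arcs):
    """Minimum-cost closed set via a single s-t min cut (Picard-style).

    node_weights: dict node -> real weight.
    arcs: iterable of (u, v) meaning "if u is in the set, v must be too".
    Returns a minimum-cost closed set (possibly empty).
    """
    INF = float("inf")
    s, t = "__s__", "__t__"
    cap = defaultdict(lambda: defaultdict(float))
    for v, w in node_weights.items():
        if w < 0:
            cap[s][v] += -w      # gain from including v
        elif w > 0:
            cap[v][t] += w       # price of including v
    for u, v in arcs:
        cap[u][v] = INF          # closure arc: u in set forces v in set

    def bfs():
        # shortest residual s -> t path, returned as a parent map
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    while (parent := bfs()) is not None:
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        f = min(cap[u][v] for u, v in path)   # bottleneck capacity
        for u, v in path:
            cap[u][v] -= f
            cap[v][u] += f
    # source side of the min cut (minus s) is the minimum-cost closed set
    reachable, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 0 and v not in reachable:
                reachable.add(v)
                q.append(v)
    return reachable - {s}
```

For instance, with weights {a: −5, b: 3, c: −2} and the closure arc a → b, including a forces b, and the cheapest closed set is {a, b, c} with cost −4.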

4.3 Optimal Single Surface Detection

This section presents the algorithm for solving the optimal single surface detection (OSSD) problem (in which λ = 1).

To facilitate the discussion of transforming the OSD to a minimum-cost closed set problem, we introduce the concept of bottom-most neighbors and the intra-layer self-closure structure of the OSSD problem. For a voxel \(\mathcal{I}(x,y,z)\) and each neighboring column Col(x′, y′) of Col(x, y), the bottom-most neighbor of \(\mathcal{I}(x,y,z)\) on Col(x′, y′) is the voxel on Col(x′, y′) with the smallest z-coordinate that can appear together with \(\mathcal{I}(x,y,z)\) on the same feasible surface in \(\mathcal{I}\) (Fig. 17a). Note that the bottom-most neighbor of \(\mathcal{I}(x,y,z)\) on Col(x + 1, y) (resp., Col(x, y + 1)) is the voxel \(\mathcal{I}(x + 1,y,\max \{z -\varDelta _{\mathbf{x}},0\})\) (resp., \(\mathcal{I}(x,y + 1,\max \{z -\varDelta _{\mathbf{y}},0\})\)). For a feasible surface S, denote by S(x, y) the z-coordinate of the voxel \(\mathcal{I}(x,y,z)\) on the surface S. We say that a voxel \(\mathcal{I}(x,y,z)\) is below a surface S if S(x, y) > z, and denote by BL(S) all the voxels of \(\mathcal{I}\) that are on or below S. A key observation is that for any feasible surface S in \(\mathcal{I}\) , the bottom-most neighbors of every voxel in BL(S) are also contained in BL(S) (Fig. 17b). This intra-layer self-closure property relates our target problem to the minimum closed set problem. In our approach, instead of finding an optimal surface \({S}^{{\ast}}\) directly, we seek in \(\mathcal{I}\) a voxel set \(\mathrm{BL}({S}^{{\ast}})\), which uniquely defines the surface \({S}^{{\ast}}\).

Fig. 17

Modeling single surface detection. An example 2D slice from a 3D image is used. The surface smoothness parameter Δ x  = 1. (a) Dashed line arrows point to the bottom-most neighbors of a voxel. (b) Illustration of the intra-layer self-closure structure. The surface S is feasible while S′ is not. The red line arrow indicates the violation of the smoothness constraint for S′. (c) The constructed graph enforces the surface geometry. The minimum-cost closed set in (d), consisting of all purple nodes, defines the optimal surface in (e)

4.3.1 Graph Construction

The construction of the directed node-weighted graph G = (V, E) hinges on the intra-layer self-closure structure of the OSSD problem. Every node v(x, y, z) ∈ V represents one and only one voxel \(\mathcal{I}(x,y,z) \in \mathcal{I}\). G can thus be viewed as a geometric graph defined on a 3D grid. Arcs are added to G to make sure that each closed set includes all the nodes associated with the corresponding surface voxels plus all those below the surface. This is done by adding two types of arcs: intracolumn arcs and intercolumn arcs. The intracolumn arcs ensure that all nodes below a given node (within one column) are also included in the closed set. The intercolumn arcs ensure that the smoothness constraints are satisfied. As an example in Fig. 17c, we consider the added arcs for one node v not involved in boundary conditions to avoid cluttering the exposition of our key ideas. It will be associated with two intracolumn arcs: one directed towards the node below it in the column and one from the node above it (red arcs in Fig. 17c). Two intercolumn arcs will also exist for each neighboring column in the x-direction (y-direction): one directed to the bottom-most neighbor of v on the neighboring column and one from the node on the neighboring column whose bottom-most neighbor on the column of v is node v (green arcs in Fig. 17c).
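The arc construction just described can be sketched as follows (a toy enumeration; nodes are named by hypothetical (x, y, z) triples, and boundary effects are handled by the max(z − Δ, 0) clamping from the bottom-most-neighbor definition):

```python
def ossd_arcs(X, Y, Z, delta_x, delta_y):
    """Intracolumn and intercolumn arcs of the OSSD graph (4-neighborhood).

    Intracolumn arcs point from each node to the node below it in the same
    column; intercolumn arcs point from each node to its bottom-most
    neighbor on every neighboring column.
    """
    arcs = []
    for x in range(X):
        for y in range(Y):
            for z in range(1, Z):
                arcs.append(((x, y, z), (x, y, z - 1)))  # intracolumn
            for nx_, ny_, d in ((x + 1, y, delta_x), (x - 1, y, delta_x),
                                (x, y + 1, delta_y), (x, y - 1, delta_y)):
                if 0 <= nx_ < X and 0 <= ny_ < Y:
                    for z in range(Z):
                        # intercolumn arc to the bottom-most neighbor
                        arcs.append(((x, y, z), (nx_, ny_, max(z - d, 0))))
    return arcs
```

Enumerating both neighbor directions produces, for each node, the arc to its bottom-most neighbor and the symmetric incoming arc mentioned in the text.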

To finish the construction of the graph G, we need to encode the energy function of Eq. (12) by assigning appropriate weights to the nodes in G. For each node v(x, y, z) ∈ V corresponding to the voxel \(\mathcal{I}(x,y,z)\), its weight w(x, y, z) is defined as follows:

$$\displaystyle{ w(x,y,z) = \left \{\begin{array}{ll} b(x,y,z) + [c_{0}(x,y,z) - c_{1}(x,y,z)] &\text{if }z = 0, \\ {}[b(x,y,z) - b(x,y,z - 1)] + [c_{0}(x,y,z) - c_{1}(x,y,z)]&\text{if }z > 0. \end{array} \right. }$$
(13)
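The weight assignment of Eq. (13) is a telescoping scheme: summing the weights of a column's nodes from z = 0 up to height z recovers b(x, y, z) plus accumulated in-region differences, which is what makes a closed set's cost differ from \(\mathcal{E}(S)\) only by a constant. A NumPy sketch (the (X, Y, Z) array layout is an assumption):

```python
import numpy as np

def ossd_node_weights(b, c0, c1):
    """Node weights of Eq. (13); b, c0, c1 are cost arrays of shape (X, Y, Z)."""
    w = np.empty_like(b, dtype=float)
    # bottom node of each column carries the full on-surface cost
    w[:, :, 0] = b[:, :, 0] + (c0[:, :, 0] - c1[:, :, 0])
    # higher nodes carry differences, so the column sum telescopes
    w[:, :, 1:] = (b[:, :, 1:] - b[:, :, :-1]) + (c0[:, :, 1:] - c1[:, :, 1:])
    return w
```

With zero in-region costs, the cumulative sum of w along z reproduces b exactly, illustrating the telescoping.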

The following theorem is the key for solving the OSSD problem as computing a minimum-cost nonempty closed set in G.

Theorem 1 ([7, 8]).

(1) Any finite nonempty closed set \(\mathcal{C}\) in G specifies a feasible surface S in \(\mathcal{I}\) whose total cost \(\mathcal{E}(S)\) differs from the cost of \(\mathcal{C}\) by a constant. (2) Any feasible surface S in \(\mathcal{I}\) corresponds to a nonempty closed set \(\mathcal{C}\) in G whose cost differs from \(\mathcal{E}(S)\) by a constant.

4.3.2 Computing a Minimum-Cost Nonempty Closed Set

It is well known that a minimum-cost closed set in a directed node-weighted graph can be computed by solving a minimum st cut problem [25, 36]. However, if there is no “negative” cost closed set, the minimum-cost closed set is empty, which offers little useful information for recovering the target surface. In fact, we need to find a minimum-cost “nonempty” closed set.

Based on the construction of G, the bottom-most nodes of all columns form a closed set \(\mathcal{C}_{0}\), that is, \(\mathcal{C}_{0} =\{ v(x,y,0)\ \vert \ 0 \leq x < X,0 \leq y < Y \}\). Note that \(\mathcal{C}_{0}\) is a subset of any nonempty closed set in G. If the minimum-cost closed set in G is empty, the cost of every nonempty closed set in G is non-negative. We perform the following transformation to reduce the cost of each nonempty closed set by a constant and to ensure that at least one closed set has a negative cost after the transformation. Let C be the total cost of the nodes in \(\mathcal{C}_{0}\); if C ≥ 0, choose arbitrarily a node \(v(x^{\prime},y^{\prime},0) \in \mathcal{C}_{0}\) and assign a new weight w(x′, y′, 0) − C − 1 to v(x′, y′, 0). After this transformation, the total cost of the closed set \(\mathcal{C}_{0}\) is negative. Since \(\mathcal{C}_{0}\) is a subset of any nonempty closed set in G, the cost of every nonempty closed set is reduced by (C + 1). Hence, the minimum-cost nonempty closed set in G is unchanged by the transformation, and it is not empty. Thus, we can simply apply the algorithm in [25, 36] to find the minimum-cost closed set \({\mathcal{C}}^{{\ast}}\) in G after performing the transformation. \({\mathcal{C}}^{{\ast}}\) is then a minimum-cost nonempty closed set in G before the transformation.
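The transformation itself is only a few lines. A sketch (the dict-based weight container is a hypothetical convention for illustration):

```python
def force_nonempty(node_weights, bottom_nodes):
    """Shift weights so the minimum-cost closed set is guaranteed nonempty.

    bottom_nodes: the set C_0 of bottom-most nodes, contained in every
    nonempty closed set. If its total cost C is non-negative, subtract
    C + 1 from one bottom node: every nonempty closed set's cost drops by
    exactly C + 1, so the minimizer is unchanged but now has negative cost.
    """
    w = dict(node_weights)
    C = sum(w[v] for v in bottom_nodes)
    if C >= 0:
        v = next(iter(bottom_nodes))  # any bottom node works
        w[v] -= C + 1
    return w
```

Because every nonempty closed set contains all of C_0, the shift affects all of them uniformly, so the argmin is preserved.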

4.3.3 Surface Recovery

From a minimum-cost nonempty closed set \({\mathcal{C}}^{{\ast}}\), an optimal surface \({S}^{{\ast}}\) in \(\mathcal{I}\) can be defined as follows. For each (x, y)-pair, let \({\mathcal{C}}^{{\ast}}(x,y) \subset {\mathcal{C}}^{{\ast}}\) be the set of nodes of \({\mathcal{C}}^{{\ast}}\) on the column Col(x, y) of G. Due to the construction of G, we have \(\mathcal{C}_{0} \subset {\mathcal{C}}^{{\ast}}\). Thus, \({\mathcal{C}}^{{\ast}}(x,y)\) is not empty. Let z x, y denote the largest z-coordinate of the nodes in \({\mathcal{C}}^{{\ast}}(x,y)\). Then, the optimal surface \({S}^{{\ast}}\) is defined by \({S}^{{\ast}}(x,y)\) = z x, y for every (x, y)-pair. Figure 17e shows an example of an optimal surface defined by the minimum-cost closed set in Fig. 17d.

4.4 Optimal Multiple Surface Detection

This section presents the algorithm for solving the Optimal Multiple Surface Detection (OMSD) problem, i.e., simultaneous detection of λ > 1 interrelated surfaces in a 3D image \(\mathcal{I}\) such that the total cost of the λ surfaces is minimized.

In simultaneously detecting multiple distinct but interrelated surfaces, the optimality is not only determined by the inherent costs and smoothness properties of the individual surfaces, but is also confined by their interrelations (i.e., the surface separation constraints). Obviously, computing each of the λ surfaces individually using the algorithm in Sect. 4.3 does not work well: the solution thus obtained might even be infeasible. In addition, the integration of the region-based costs makes the problem more complicated.

To efficiently solve the OMSD problem, the intrinsic common characteristics of the smoothness and the separation constraints are exploited to extend the OSSD technique to multiple surface detection. Assume that \(\mathcal{S} =\{ S_{1},S_{2},\ldots,S_{\lambda }\}\) is a feasible set of λ surfaces in \(\mathcal{I}\) with S i+1 being “on top” of S i . For each pair of the sought surfaces, S i and S i+1, two parameters \(\delta _{i}^{l} \geq 0\) and \(\delta _{i}^{u} \geq 0\) are used to specify the surface separation constraint (note that the separation constraint can be defined between any pair of the surfaces in \(\mathcal{S}\)). First, consider each individual surface \(S_{i} \in \mathcal{S}\). Recall that BL(S i ) denotes the subset of all voxels of \(\mathcal{I}\) that are on or below S i . As in Sect. 4.3, we observe that each BL(S i ) also has the intra-layer self-closure structure. However, the OMSD problem is more involved since the λ surfaces in \(\mathcal{S}\) are interrelated. The following observation reveals the common essential structure between the smoothness and the separation constraints, leading to a further study of the closure structure between BL(S i ) and BL(S i+1). One may view the 3D image \(\mathcal{I}\) as a set of X 2D slices embedded in the \(\mathbf{yz}\)-plane. Thus, a feasible surface S of \(\mathcal{I}\) is partitioned into X z-monotone curves, one in each 2D slice. Observe that each feasible z-monotone curve is subject to the smoothness constraint in the corresponding slice, and any pair of adjacent z-monotone curves must satisfy analogous separation constraints. This observation suggests that the surface separation constraint in a d-D image may be viewed as the surface smoothness constraint in the (d + 1)-D image consisting of the stack of a sequence of λ d-D images. Hence, we can map the detection of λ optimal surfaces in d-D to the problem of finding a single optimal surface in (d + 1)-D. 
To distinguish the self-closure structures, define below the upstream and downstream voxels of any voxel \(\mathcal{I}(x,y,z)\) in \(\mathcal{I}\) for the given surface separation parameters \(\delta _{i}^{l}\) and \(\delta _{i}^{u}\): if \(\mathcal{I}(x,y,z) \in S_{i}\), then the ith upstream (resp., downstream) voxel of \(\mathcal{I}(x,y,z)\) is the voxel on column Col(x, y) with the smallest z-coordinate that can be on S i+1 (resp., S i−1). Together with the intra-layer self-closure structures, the following inter-layer self-closure structure is the key for solving the OMSD problem: Given any set \(\mathcal{S}\) of λ feasible surfaces in \(\mathcal{I}\) , the ith upstream (resp., downstream) voxel of each voxel in BL(S i ) is in BL(S i+1 ) (resp., BL(S i−1 )), for every 1 ≤ i < λ (resp., 1 < i ≤λ). Both the intra-layer and inter-layer self-closure structures together bridge the OMSD problem and the minimum closed set problem.

Fig. 18

Modeling multiple surface detection. An example 2D slice from a 3D image is used. Two surfaces are sought and the distance in between ranges from δ l = 1 to δ u = 2. The surface smoothness parameter Δ x  = 1. (a) Illustration of the inter-layer self-closure structure. For visualization purposes, the slice is duplicated and the two sought surfaces are visualized in the two separate slices. The red dashed line arrows point to the upstream voxels and the blue dashed line arrows point to the downstream voxels. S 1 and S 2 are two feasible surfaces, but S′ 1 and S′ 2 are not. The red line arrow with a mark “X” indicates the violation of the surface separation constraint. (b) The constructed graph to enforce the surface separation constraints (arcs are only shown for the first and the last columns). The envelopes of the minimum-cost closed set in (c), consisting of all shaded nodes, define the optimal surfaces in (d)

4.4.1 Graph Construction

The graph G = (V, E) constructed for solving the OMSD problem consists of λ node-disjoint subgraphs \(\{G_{i} = (V _{i},E_{i})\ \vert \ i = 1,2,\ldots,\lambda \}\). Each G i is constructed as in Sect. 4.3 to reflect the intra-layer self-closure structure of each surface and is used for searching for the ith surface S i . Every node \(v_{i}(x,y,z) \in V _{i}\) represents one and only one voxel \(\mathcal{I}(x,y,z)\). The intracolumn and intercolumn arcs are added in E i to reflect the intra-layer self-closure structure of surface S i , which enforce the monotonicity and the smoothness constraint of each sought surface. The separation constraints between any two surfaces S i and S j are enforced in G by a set of arcs E s , connecting the corresponding subgraphs G i and G j , in a way that reflects the inter-layer self-closure structure. Suppose that for the two sought surfaces S i and S j , the prior knowledge puts S i below S j (Fig. 18a, in which i = 1 and j = 2). Assume that the distance between S i and S j ranges from \(\delta _{\mathit{ij}}^{l} = 1\) to \(\delta _{\mathit{ij}}^{u} = 2\). For convenience, denote by Col i (x, y) (Col j (x, y)) the column of nodes in G i (G j ) corresponding to the column Col(x, y) of voxels in \(\mathcal{I}\). The separation constraints between S i and S j are incorporated into G in the following way. For each node v i (x, y, z) on Col i (x, y) with \(z < Z -\delta _{\mathit{ij}}^{l}\), as illustrated in Fig. 18b, an arc is put in E s from v i (x, y, z) to \(v_{j}(x,y,z +\delta _{ \mathit{ij}}^{l})\) in G j (red arcs in Fig. 18b). Note that \(\mathcal{I}(x,y,z +\delta _{ \mathit{ij}}^{l})\) is the upstream voxel of \(\mathcal{I}(x,y,z)\) (red dashed line arrows in Fig. 18a). On the other hand, each node v j (x, y, z) on Col j (x, y) with \(z \geq \delta _{\mathit{ij}}^{l}\) has an arc in E s to \(v_{i}(x,y,{z}^{o}) \in \mathrm{Col}_{i}(x,y)\) with \({z}^{o} =\max \{ 0,z -\delta _{\mathit{ij}}^{u}\}\) (blue arcs in Fig. 18b). 
\(\mathcal{I}(x,y,{z}^{o})\) is the downstream voxel of \(\mathcal{I}(x,y,z)\) (blue dashed line arrows in Fig. 18a). The node set of G is the union of the node sets of the λ subgraphs (i.e., \(V = \cup _{i=1}^{\lambda }V _{i}\)), and the arc set of G is the union of the arc sets of all the subgraphs plus E s (i.e., \(E = \cup _{i=1}^{\lambda }E_{i} \cup E_{s}\)).
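The inter-subgraph arcs of E_s can be enumerated as follows (a toy sketch; nodes are named by hypothetical (surface, x, y, z) tuples, with S_j assumed to lie above S_i):

```python
def separation_arcs(X, Y, Z, dl, du, i, j):
    """Arcs of E_s enforcing dl <= S_j - S_i <= du, with S_j above S_i.

    "Red" arcs go from G_i up to the upstream node in G_j; "blue" arcs go
    from G_j down to the downstream node in G_i (clamped at z = 0).
    """
    arcs = []
    for x in range(X):
        for y in range(Y):
            for z in range(Z - dl):
                arcs.append(((i, x, y, z), (j, x, y, z + dl)))          # red
            for z in range(dl, Z):
                arcs.append(((j, x, y, z), (i, x, y, max(0, z - du))))  # blue
    return arcs
```

Together with the intracolumn and intercolumn arcs of each subgraph, these arcs make every closed set of the combined graph correspond to a feasible surface pair.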

To encode the objective function \(\mathcal{E}(\mathcal{S})\) in the graph model, we assign the node weight for each node v i (x, y, z) as follows:

$$\displaystyle{ w_{i}(x,y,z) = \left \{\begin{array}{ll} b_{i}(x,y,z) + \left [c_{i-1}(x,y,z) - c_{i}(x,y,z)\right ] &\mbox{ if $z = 0$,} \\ \left [b_{i}(x,y,z) - b_{i}(x,y,z - 1)\right ] + \left [c_{i-1}(x,y,z) - c_{i}(x,y,z)\right ]&\mbox{ if $z > 0$.} \end{array} \right. }$$
(14)
Fig. 19

Schematic showing how the assignment of node weights encodes the total in-region cost of the objective function \(\mathcal{E}(\mathcal{S})\)

Figure 19 illustrates the intuition behind the node weight assignment to encode the objective function, in which two surfaces are simultaneously detected. Readers are referred to [8] for the formal proof.

4.4.2 Computing a Minimum-Cost Nonempty Closed Set

Note that our goal is to compute a minimum-cost nonempty closed set in G, which can be used to define an optimal set of λ surfaces in \(\mathcal{I}\). However, the graph G constructed so far does not yet work for this purpose. In the graph construction above for G i and G j , a node v i (x, y, z) with \(z \geq Z -\delta _{\mathit{ij}}^{l}\) has no arc to any node on Col j (x, y), and there is no arc from a node v j (x, y, z) with \(z <\delta _{ \mathit{ij}}^{l}\) to any node on Col i (x, y). Those nodes are called deficient nodes. The voxels corresponding to the deficient nodes cannot be on any corresponding feasible surfaces. We need to remove those deficient nodes from G. Otherwise, if a closed set \(\mathcal{C}\) in G includes a deficient node as the topmost node on a column that is in \(\mathcal{C}\), which is possible, then it does not define a set of λ feasible surfaces in \(\mathcal{I}\). However, the removal of deficient nodes may in turn cause other nodes in G to become deficient (i.e., the voxels associated with those nodes cannot be on any corresponding feasible surfaces). We can apply the deficient node pruning scheme in [8] to remove all the deficient nodes as well as those recursively caused by their removal, resulting in a graph G′ without any deficient nodes. Note that if any column \(\mathrm{Col}_{i}(x,y) = \emptyset \) in the resulting graph G′, then there is no feasible solution to the OMSD problem. We thus assume in the rest of this section that the OMSD problem admits feasible solutions. For each (x, y)-pair and every i = 1, 2, …, λ, let \(z_{i}^{\mathrm{bot}}(x,y)\) and \(z_{i}^{\mathrm{top}}(x,y)\) be the smallest and largest z-coordinates of the nodes on the column Col i (x, y) of G′, respectively. The deficient node pruning process may remove some useful arcs, whose ending nodes then need to be redirected. 
For any node v i (x, y, z) with \(z < z_{i}^{\mathrm{bot}}(x,y)\) in G, if it has an incoming arc, the ending node of the arc needs to be changed to the node \(v_{i}(x,y,z_{i}^{\mathrm{bot}}(x,y))\) in G′.

Denote by Z 0 the set of the bottom-most nodes (i.e., those whose z-coordinate is \(z = z_{i}^{\mathrm{bot}}(x,y)\)) of every column Col i (x, y) in G′. In fact, Z 0 is a closed set in G′, and for any nonempty closed set \(\mathcal{C}^{\prime}\) in G′, Z 0 is a subset of \(\mathcal{C}^{\prime}\) [8]. Thus, the same transformation as that for the OSSD problem can be applied to compute a minimum-cost nonempty closed set \({\mathcal{C}^{\prime}}^{{\ast}}\) in G′.

4.4.3 Surface Recovery

From the minimum-cost nonempty closed set \({\mathcal{C}^{\prime}}^{{\ast}}\) in G′, we define an optimal set \({\mathcal{S}}^{{\ast}}\) of λ surfaces, \({\mathcal{S}}^{{\ast}} =\{ S_{1}^{{\ast}},S_{2}^{{\ast}},\ldots,S_{\lambda }^{{\ast}}\}\), as follows. Let \({\mathcal{C}}^{{\ast}}\) denote the corresponding node set of \({\mathcal{C}^{\prime}}^{{\ast}}\) in G. Recall that we search for each surface \(S_{i}^{{\ast}}\) in the subgraph \(G_{i} = (V _{i},E_{i})\). For each i = 1, 2, …, λ, let \(\mathcal{C}_{i} = {\mathcal{C}}^{{\ast}}\cap V _{i}\). For any (x, y)-pair, denote by \(\mathcal{C}_{i}(x,y)\) the set of nodes of \(\mathcal{C}_{i}\) that are on Col i (x, y) of G i . Then, the voxel \(\mathcal{I}(x,y,z)\) corresponding to the “top” node in \(\mathcal{C}_{i}(x,y)\) (i.e., the node whose z-coordinate is the largest among those in \(\mathcal{C}_{i}(x,y)\)) is on the ith optimal surface \(S_{i}^{{\ast}}\). We thus obtain an optimal set \({\mathcal{S}}^{{\ast}}\) of λ surfaces from the minimum-cost nonempty closed set \({\mathcal{C}^{\prime}}^{{\ast}}\).

Theorem 2 ([8]).

The OMSD problem is solvable by computing a minimum-cost closed set in a derived graph.

4.5 Optimal Surface Detection with Shape and Context Priors

Up to this point in the OSD framework, only node weights are employed in the graph to represent the desired segmentation properties, and the desired surface smoothness is hard-wired as the connectedness of neighboring columns. This representation limits the ability to incorporate a broader variety of prior knowledge. The connectedness of one voxel to the voxels of its neighboring columns is of essentially equal importance in the OSD framework discussed above, which prevents us from fully utilizing image edge information and from taking full advantage of shape priors. To make full use of prior information, an arc-weighted graph representation has been proposed, which utilizes the weights of both graph nodes and arcs to incorporate a wider spectrum of constraints [37]. Two additional pairwise terms are added to the energy function, which encode the shape and the context prior information using a set of convex functions. For optimization, the new pairwise terms are enforced by adding specific weighted arcs to the graph. A globally optimal solution, corresponding to the optimal surfaces, is then computed by solving a single maximum flow problem in the graph.

4.5.1 Incorporation of Shape and Context Priors

To simplify the notation, we use p(x, y) or simply p to denote a column Col(x, y) in \(\mathcal{I}\). For surface S i , let S i (p) denote the z-coordinate of the voxel of the column p on S i .

The shape prior of a surface is incorporated by enforcing the surface height changes between pairs of neighboring columns. More specifically, for any pair of neighboring columns p and q, the shape change of surface S i between p and q can be measured by \(h_{\mathit{pq}}^{i} = S_{i}(p) - S_{i}(q)\). Assume that \(\overline{h}_{\mathit{pq}}^{i}\) represents the learned shape change model. The deviation of the shape changes from the learned model (that is, \(\vert h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i}\vert \)) is penalized by a convex function \(f_{s}(h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i})\). In addition, we can enforce the possible ranges of the shape change deviations with \(\vert h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i}\vert \leq \varDelta _{\mathit{pq}}^{i}\) (namely, a hard shape constraint), where \(\varDelta _{\mathit{pq}}^{i}\) is a given shape change deviation parameter between columns p and q for surface S i .

For a set of surfaces, the context prior is enforced by penalizing the surface distance changes between two adjacent surfaces. Let S i and S j be two adjacent surfaces in the set of λ target surfaces, and without loss of generality, assume that S i is on the top of S j . The surface distance between S i and S j on column p is defined as \(d_{p}^{\mathit{ij}} = S_{i}(p) - S_{j}(p)\). Denote by \(\overline{d}_{p}^{\mathit{ij}}\) the learned prior surface distance model. As for the shape prior constraint, the deviation of the surface distances from the learned model (i.e., \(d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}}\)) is penalized by a convex function \(f_{c}(d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}})\). We may also enforce the possible range of the surface distance deviation on each column p, for instance, \(\vert d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}}\vert \leq \delta _{p}^{\mathit{ij}}\), which is called a hard context constraint.

Two additional terms are added to the energy function \(\mathcal{E}(\mathcal{S})\) of Eq. (12) to incorporate the shape and context priors, with

$$\displaystyle\begin{array}{rcl} \mathcal{E}(\mathcal{S})& =& \sum _{i=1}^{\lambda }\sum _{ \mathcal{I}(x,y,z)\in S_{i}}b_{i}(x,y,z) +\sum _{ i=0}^{\lambda }\sum _{ \mathcal{I}(x,y,z)\in R_{i}}c_{i}(x,y,z) + \\ & & \underbrace{\mathop{\sum _{i=1}^{\lambda }\sum _{(p,q)\in \mathcal{N}_{s}}f_{s}(h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i})}}\limits _{\mbox{ shape prior penalty term}} +\underbrace{\mathop{ \sum _{p}\sum _{(i,j)\in \mathcal{N}_{c}}f_{c}(d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}})}}\limits _{\mbox{ context prior penalty term}},{}\end{array}$$
(15)

where \(\mathcal{N}_{s}\) denotes the pairwise neighboring relation of all columns in the image \(\mathcal{I}\) and \(\mathcal{N}_{c}\) specifies the set of interacting surface pairs for which the surface context constraint needs to be enforced. The problem is to find an optimal set \(\mathcal{S}\) of λ surfaces, such that (1) each surface satisfies the hard shape constraint; (2) each pair of interacting surfaces satisfies the hard context constraint; and (3) the energy \(\mathcal{E}(\mathcal{S})\) in Eq. (15) is minimized. We call this problem the OMSD problem with both shape and context priors (OMSD-P).

The basic idea for solving the OMSD-P problem is to transform it into computing a so-called minimum-cost s-excess set in a directed graph. Instead of allowing no arc to leave the sought node set, as in the minimum-cost closed set problem, the minimum s-excess problem [7, 25, 38] charges a penalty for each arc leaving the set (i.e., the tail of the arc is in the set while the head is not). More precisely, given a directed graph G′ = (V ′, E′), with each node v′ ∈ V ′ having an arbitrary weight w′(v′) and each arc e′ = (u′, v′) ∈ E′ having a non-negative cost c′(u′, v′), the minimum-cost s-excess problem seeks a node subset \(\mathcal{X}^{\prime}\subseteq V ^{\prime}\) such that the cost \(\gamma (\mathcal{X}^{\prime})\) of \(\mathcal{X}^{\prime}\), with \(\gamma (\mathcal{X}^{\prime}) =\sum _{v^{\prime}\in \mathcal{X}^{\prime}}w^{\prime}(v^{\prime}) +\sum _{{ (u^{\prime},v^{\prime})\in E^{\prime} \atop u^{\prime}\in \mathcal{X}^{\prime},v^{\prime}\in V^{\prime}-\mathcal{X}^{\prime}} }c^{\prime}(u^{\prime},v^{\prime})\), is minimized. Our goal is to construct a graph weighted on both arcs and nodes so that: (1) the feasibility of the sought surfaces is enforced by the graph structure; (2) the total on-surface and in-region cost of the surfaces is encoded as the total node weight of the s-excess set; and (3) the total shape-prior penalty and context-prior penalty are encoded as the total cost of the cut induced by the s-excess set. The minimum-cost s-excess problem can be solved with a maximum flow algorithm [25].
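The s-excess objective γ can be written down directly. The sketch below is a toy evaluator of the cost of a candidate node set, not a solver (dict-based containers are an assumed convention):

```python
def s_excess_cost(chosen, node_weights, arc_costs):
    """gamma(X') for the minimum-cost s-excess problem: the total node
    weight of the chosen set plus the cost of every arc leaving it
    (tail inside the set, head outside)."""
    node_part = sum(node_weights[v] for v in chosen)
    cut_part = sum(c for (u, v), c in arc_costs.items()
                   if u in chosen and v not in chosen)
    return node_part + cut_part
```

Note that with infinite arc costs, any set with an outgoing arc becomes infinitely expensive, and the problem degenerates back to the minimum-cost closed set problem.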

4.5.2 Arc-Weighted Graph Construction

We construct a graph G = (V, E) consisting of λ node-disjoint subgraphs \(\{G_{i} = (V _{i},E_{i})\ \vert \ i = 1,2,\ldots,\lambda \}\) to transform the OMSD-P problem into computing a minimum-cost s-excess set in G. Note that the hard shape and context constraints in the OMSD-P problem are essentially the same as the surface smoothness and separation constraints in the OMSD problem. Each subgraph \(G_i\) is constructed as in Sect. 4.4 for the OMSD problem. The hard shape constraint is enforced between any two neighboring columns in \(G_i\), and the hard context constraint is encoded between the corresponding columns in \(G_i\) and \(G_j\) for any two interacting surfaces \(S_i\) and \(S_j\). To translate to the minimum-cost s-excess problem, we assign a \(+\infty\) weight to each arc added so far to the graph G. We next put additional arcs in G to incorporate both the shape prior penalty term and the context prior penalty term.

Let p(x, y) and q(x′, y′) denote any two neighboring columns in \(\mathcal{I}\). For the shape prior penalty term, we want to “distribute” the shape prior penalty \(f_{s}(h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i})\) over arcs between the two adjacent columns \(\mathrm{Col}_i(p)\) and \(\mathrm{Col}_i(q)\) in \(G_i\). Two intertwined questions need to be answered: (1) how to put arcs between the two columns; and (2) how to assign a non-negative cost to each arc (negative arc weights make the minimum-cost s-excess problem computationally intractable)? Fortunately, the convexity of \(f_s(\cdot)\) can be used to resolve both problems. Define the (discrete equivalent of the) second derivative of \(f_s(\cdot)\) as

$$\displaystyle{[f_{s}(\kappa )]^{\prime\prime} = [f_{s}(\kappa +1) - f_{s}(\kappa )] - [f_{s}(\kappa ) - f_{s}(\kappa -1)].}$$

The first derivative of \(f_s(\cdot)\), \([f_{s}(\kappa )]^{\prime} = f_{s}(\kappa +1) - f_{s}(\kappa )\), will be used to guide the introduction of new arcs between \(\mathrm{Col}_i(p)\) and \(\mathrm{Col}_i(q)\). Consider each \(\kappa = h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i}\) with \(-\varDelta _{\mathit{pq}}^{i} <\kappa <\varDelta _{ \mathit{pq}}^{i}\). We distinguish the following three cases for all possible z (0 ≤ z < Z). Note that we do not show the boundary conditions, to avoid cluttering the exposition of the key ideas.

  • If \([f_s(\kappa)]^{\prime} \geq 0\), an arc with a weight of \([f_s(\kappa)]^{\prime\prime}\) is added from \(v_i(x, y, z)\) to \(v_{i}(x^{\prime},y^{\prime},z -\overline{h}_{\mathit{pq}}^{i}-\kappa )\).

  • If \([f_s(\kappa)]^{\prime} \leq 0\), an arc with a weight of \([f_s(\kappa)]^{\prime\prime}\) is added from \(v_i(x^{\prime}, y^{\prime}, z)\) to \(v_{i}(x,y,z + \overline{h}_{\mathit{pq}}^{i}+\kappa )\).

  • Assume that \(f_s(\kappa)\) attains its minimum at \(\kappa_0\). We put in an arc from \(v_i(x, y, z)\) to \(v_{i}(x^{\prime},y^{\prime},z -\overline{h}_{\mathit{pq}}^{i} -\kappa _{0})\) whose weight is \([f_{s}(\kappa _{0}^{+})]^{\prime\prime}:= f_{s}(\kappa _{0} + 1) - f_{s}(\kappa _{0})\). In addition, an arc with a weight of \([f_{s}(\kappa _{0}^{-})]^{\prime\prime}:= f_{s}(\kappa _{0} - 1) - f_{s}(\kappa _{0})\) is introduced from \(v_i(x^{\prime}, y^{\prime}, z)\) to \(v_{i}(x,y,z + \overline{h}_{\mathit{pq}}^{i} +\kappa _{0})\).

Figure 20 shows an example graph construction for incorporating the shape prior penalty term.
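The three cases above can be made concrete with a short sketch. The helper names below (`d1`, `d2`, `shape_prior_arcs`) are ours, and boundary conditions are ignored just as in the text; the point is that convexity of \(f_s\) makes every generated arc weight non-negative.

```python
def d1(f, k):
    """Discrete first derivative: [f(k)]' = f(k+1) - f(k)."""
    return f(k + 1) - f(k)

def d2(f, k):
    """Discrete second derivative: [f(k+1)-f(k)] - [f(k)-f(k-1)]."""
    return (f(k + 1) - f(k)) - (f(k) - f(k - 1))

def shape_prior_arcs(f, delta, kappa0=0):
    """Enumerate arc directions and weights between two adjacent columns
    for every kappa with -delta < kappa < delta, following the three
    cases in the text ('pq' = arc from Col(p) to Col(q), 'qp' = reverse)."""
    arcs = []
    for kappa in range(-delta + 1, delta):
        if kappa == kappa0:                      # the minimum of f
            arcs.append(('pq', kappa0, f(kappa0 + 1) - f(kappa0)))
            arcs.append(('qp', kappa0, f(kappa0 - 1) - f(kappa0)))
        elif d1(f, kappa) >= 0:
            arcs.append(('pq', kappa, d2(f, kappa)))
        else:
            arcs.append(('qp', kappa, d2(f, kappa)))
    return arcs
```

For the quadratic prior f(κ) = κ² with Δ = 3, all six generated weights are non-negative, as convexity guarantees.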

The context prior penalty term is enforced in a similar way, by introducing weighted arcs between corresponding subgraphs. Suppose \(S_i\) and \(S_j\) are two interacting surfaces. For each column p(x, y) in \(\mathcal{I}\), by making use of the (discrete equivalent of the) second derivative of \(f_c(\cdot)\), the context prior penalty \(f_{c}(d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}})\) is distributed over the arcs between the corresponding columns \(\mathrm{Col}_i(p)\) and \(\mathrm{Col}_j(p)\) in the subgraphs \(G_i\) and \(G_j\), respectively. An example construction is shown in Fig. 21.

Fig. 20
figure 20

Arc-weighted graph construction for the incorporation of the shape prior penalty on surface \(S_i\) between neighboring columns p and q [37]. The intra-column arcs are shown in orange with \(+\infty\) weight. The hard shape constraint \(\vert h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i}\vert \leq \varDelta _{\mathit{pq}}^{i}\) is enforced by green arcs. Here we suppose \(\overline{h}_{\mathit{pq}}^{i} = 0\) and \(\varDelta _{\mathit{pq}}^{i} = 2\). The shape prior penalty is incorporated by arcs with dashed lines (brown, purple, yellow, and gray). Here we assume that \([f_s(0)]^{\prime} = 0\) and \(f_s(0) = 0\). The target surface \(S_i\) cuts arcs with weight \([f_s(1)]^{\prime\prime}\) (brown) and \([f_{s}({0}^{+})]^{\prime\prime} = f_{s}(1) - f_{s}(0)\) (yellow). The total weight is equal to \(f_s(2)\)

Fig. 21
figure 21

Arc-weighted graph construction for the incorporation of the context prior constraints between subgraphs \(G_i\) (red) and \(G_j\) (blue) on column p [37]. The hard context constraint \(\vert d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}}\vert \leq \delta _{p}^{\mathit{ij}}\) is incorporated by green arcs. Here \(\overline{d}_{p}^{\mathit{ij}} = 1\) and \(\delta _{p}^{\mathit{ij}} = 1\). The context prior penalty is enforced by gray and purple arcs. Assume that \([f_c(0)]^{\prime} = 0\) and \(f_c(0) = 0\). The pseudo-surface S (connecting \(S_i\) and \(S_j\)) cuts arcs with weight \([f_c(0^{+})]^{\prime\prime}\) (gray). The total weight is equal to \(f_c(1)\)

4.5.3 Computing a Minimum-Cost s-Excess Set

From the construction of the graph G and the argument for the OMSD problem in Sect. 4.4, to show that a minimum-cost s-excess set can be used to define an optimal set of λ surfaces for the OMSD-P problem, we need to demonstrate that: (1) the total weight of the arcs cut by a feasible surface \(S_i\) between two neighboring columns \(\mathrm{Col}_i(p)\) and \(\mathrm{Col}_i(q)\) equals the shape prior penalty \(f_{s}(h_{\mathit{pq}}^{i} -\overline{h}_{\mathit{pq}}^{i})\); and (2) the total weight of the arcs cut by the pseudo-surface of two interacting surfaces \(S_i\) and \(S_j\) (connecting the node \(v_{i}(p) \in \mathrm{Col}_{i}(p)\) on the surface \(S_i\) and the corresponding node \(v_{j}(p) \in \mathrm{Col}_{j}(p)\) on the surface \(S_j\) for all the columns p) equals the context prior penalty \(f_{c}(d_{p}^{\mathit{ij}} -\overline{d}_{p}^{\mathit{ij}})\). The reader is referred to [7] for the detailed proof.

A minimum s-excess set \({\mathcal{X}}^{{\ast}}\) in G can then be computed by using a minimum s–t cut algorithm [7, 25]. If the minimum s-excess set in G thus obtained is empty, we can apply to G the same transformation as in Sect. 4.4 to compute a minimum nonempty s-excess set in G.

4.5.4 Surface Recovery

The minimum s-excess set \({\mathcal{X}}^{{\ast}}\) can be used to specify an optimal set of λ surfaces as in Sect. 4.4.

4.6 Layered Graph Search for Segmentation of Objects with Complex Shapes

This section generalizes the OSD framework for solving the problem of detecting boundary surfaces for objects with complex topologies. Several key issues need to be resolved for the success of this generalization: (i) how to obtain relevant information about the target object boundaries; (ii) how to capture such information by a graph; and (iii) how to search the graph for the optimal surfaces of the target objects. The following five steps constitute the general strategy to address these three key issues.

Step 1: Pre-segmentation.:

Image segmentation frequently starts with the localization of the object of interest (to distinguish the object from other objects in the image). Given an input 3D/4D image, a pre-segmentation obtains an approximation to the (unknown) surfaces for the target object boundaries. Such approximate surface detection methods are generally available, e.g., using parametric deformable models (cf. [39–42]), geometric deformable models (cf. [43–45]), and other similar approaches. The pre-segmentation gives useful information about the topological structures of the target objects.

Step 2: Mesh Generation.:

From the resulting approximate surfaces, a mesh is constructed. The mesh is used to specify the structure of a graph \(G_B\), called the base graph. \(G_B\) defines the neighboring relations among voxels on the sought (optimal) surfaces. Simple surface geometries (e.g., terrain-like, tubular, or spherical surfaces) may not need a pre-segmentation and allow a direct definition of a mesh.

Fig. 22
figure 22

Illustrating image resampling with column interferences [46]. Interferences caused by inappropriate column lengths introduce disordered structures in the constructed graph

Step 3: Image Resampling.:

For each voxel v on the sought surfaces, a vector of voxels is created that is expected to contain v. This is done by resampling the input image along a ray intersecting every vertex u of the mesh (one ray per mesh vertex). The voxel vectors produced by the resampling form a new image. A major challenge in image resampling is how to avoid column interference (see Fig. 22). Several methods based on medial axes [46, 47] and on electric lines of force (ELF) [11] have been developed to resolve this problem. Steps 1–3 address issue (i) from the (i)–(iii) list presented at the beginning of this section.
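The per-vertex ray resampling of Step 3 can be sketched as follows. The naive normal-direction scheme below is our own illustration (function name and parameters are hypothetical) and deliberately ignores the interference problem that the medial-axis and ELF methods address.

```python
def sample_columns(vertices, normals, K=5, spacing=1.0, offset=-2.0):
    """Build one column of K sample positions per mesh vertex by stepping
    along the vertex normal (offset centers the column on the mesh).
    This naive scheme can produce the column interference of Fig. 22;
    medial-axis or ELF-based resampling is used in practice to avoid it."""
    columns = []
    for (vx, vy, vz), (nx, ny, nz) in zip(vertices, normals):
        col = [(vx + (offset + k * spacing) * nx,
                vy + (offset + k * spacing) * ny,
                vz + (offset + k * spacing) * nz)
               for k in range(K)]
        columns.append(col)
    return columns
```

With K = 5, spacing 1, and offset −2, each column straddles its vertex symmetrically, so the sought surface can move up to two voxels inward or outward from the pre-segmentation.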

Step 4: Graph Construction.:

A weighted directed graph G is built using the vectors of voxels (columns) in the resampled image. Each column corresponds to a list of nodes in G. G is a geometric graph, since it is naturally embedded in a d-D space (d ≥ 3). The neighboring relations among voxels on the sought surfaces are represented by the adjacency relations among the columns of G, as specified by the edges in the base graph \(G_B\). Each column contains exactly one voxel on the sought surface. The edges of G enforce constraints on the sought surfaces, such as smoothness and inter-surface separation. The node costs of G can encode edge-based and region-based cost functions. This step addresses issue (ii).

Step 5: Graph Search.:

The OSD scheme ensures that the sought optimal surfaces correspond to a structure of interest in the weighted directed graph G (as proven in [7–9]). The sought optimal surfaces are obtained by searching for a minimum-cost closed set in G. This step addresses issue (iii).

4.7 Example Applications and Extensions

4.7.1 Segmentation of Retinal Layers in Optical Coherence Tomography Volumes

Ophthalmology has been transformed by the relatively recent (since 2007) commercial availability of volumetric spectral-domain optical coherence tomography (SD-OCT) images of the eye. The retinal layers are one important structure within SD-OCT images of the back of the eye, as these layers often change in the presence of blinding diseases such as glaucoma, diabetic retinopathy, and age-related macular degeneration. For example, the retinal nerve fiber layer and the ganglion cell layer are known to thin in glaucoma. The need to segment multiple layered surfaces within SD-OCT volumes makes the graph search approach discussed in Sect. 4.4 an excellent choice for this task [10]. For example, Fig. 23 illustrates the use of the graph search approach (with both on-surface and in-region cost terms) for the simultaneous segmentation of seven major surfaces within an SD-OCT slice of the macular region. While the segmentation result of Fig. 23 was obtained through the simultaneous segmentation of all seven surfaces using the graph search approach, in practice the surface segmentation often occurs in groups, with the “easier” bounding surfaces being simultaneously segmented first (e.g., surfaces \(f_1\), \(f_6\), and \(f_7\) in Fig. 23), followed by the simultaneous segmentation of the more difficult interior surfaces [10, 48]. In addition, a multi-resolution approach is often used for efficiency purposes [49, 50], with up to 11 surfaces commonly being segmented, depending on the ophthalmic application.
The cost functions used for segmenting the retinal layers have mostly been designed “by hand” [10, 51] (e.g., specifying an edge-based filter for bright-to-dark and dark-to-bright on-surface edge-based terms), possibly combined with learning the parameters that specify the relative importance of the in-region and on-surface cost terms as in [10], and/or designed via a machine-learning approach for a more completely automated design of the cost function terms [52].

Fig. 23
figure 23

Example simultaneous segmentation of seven layered surfaces from an SD-OCT slice using the multisurface graph-theoretic approach [7, 9, 10]. Given the input image, seven on-surface cost images and eight in-region cost images are first automatically generated. Given these cost images, along with feasibility constraints, the graph-based approach finds the optimal set of surfaces. While 2D images are shown for illustrative purposes, in practice the cost functions are generated and the layer segmentations are obtained in 3D

In specifying the feasibility constraints for the simultaneous segmentation of the retinal layers, instead of specifying global smoothness and surface-interaction constraints (as described in the early versions of the graph search approach), one can specify varying constraints such that each location (i.e., each pair of neighboring columns for smoothness constraints and each column for surface-interaction constraints) has its own (potentially different) values for the constraint [10]. The particular values of these location-specific constraints can be learned from a training set [10] to enable, for example, greater changes in surface position to be allowed near the expected location of the fovea of the macula and the cup of the optic nerve head. Soft smoothness constraints can also be imposed through the addition of arc-weighted edges [37], as discussed in Sect. 4.5.
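One simple way to derive such location-specific constraint values from training data can be sketched as follows; the min/max-plus-margin rule and the function name are illustrative only, not the learning procedure of [10].

```python
def learn_column_constraints(training_diffs, margin=1):
    """Given the surface-height differences h_pq observed for one pair of
    neighboring columns across training scans, return the expected
    difference h_bar and a hard bound Delta covering all training cases
    plus a safety margin (an illustrative rule, not the method of [10])."""
    h_bar = round(sum(training_diffs) / len(training_diffs))
    delta = max(abs(d - h_bar) for d in training_diffs) + margin
    return h_bar, delta
```

Column pairs near the fovea or the optic nerve head cup, where surface position changes sharply across training scans, automatically receive a larger bound than column pairs in flat retinal regions.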

4.7.2 Segmentation of Prostate and Bladder in Computed Tomography Scans

This section shows how to apply the OSD framework to the segmentation of the prostate and bladder in CT images [53–55]. The segmentation of pelvic structures is particularly difficult: it involves soft tissues that present a large variability in shape and size, have similar intensities, and mutually influence each other's position and shape. To overcome these difficulties, multiple constraints (i.e., the shape prior and the context prior) are incorporated into the segmentation process using the method in Sect. 4.6. The workflow is shown in Fig. 24.

Fig. 24
figure 24

Workflow for simultaneous segmentation of the bladder and the prostate

Initial Model. First, a pre-segmentation step is performed to construct an initial model for each target object, which captures the basic topological information. A 3D geodesic active contour method [56] is applied for pre-segmentation of the bladder, with three user-defined points required as initial input. The prostate shows a much better coherency in shape than the bladder; hence, the mean shape of the prostate is computed from the training set of manual contours. An approximate bounding box of interest for the prostate is then interactively defined, and the obtained mean shape is roughly fitted into the previously unseen CT image using rigid transformations to serve as the initial model of the prostate. Note that the model only serves to provide basic topological structural information; thus, accurate segmentation is not required at this stage. Overlap between the models of the bladder and the prostate is also allowed; it is resolved in the graph optimization step. From the pre-segmentation results, two triangulated meshes \(M_{1}(V _{1},E_{1})\) and \(M_{2}(V _{2},E_{2})\) are constructed using an isosurfacing algorithm (Fig. 25), where \(V_i\) (\(i \in \{1, 2\}\)) denotes the vertex set of \(M_i\) and \(E_i\) denotes the edge set of \(M_i\).

Fig. 25
figure 25

(a) Triangulated meshes for the bladder (yellow) and the prostate (blue) based on initial models. (b) Corresponding graph construction. An example 2D slice is presented. p(v) represents the column with respect to the vertex v on the mesh. Dots represent nodes \(n_{i} \in G_{i}\). Two subgraphs G 1 and G 2 are constructed for the segmentation of the bladder and the prostate, respectively. Note that in the interacting region (dashed box), for each column \(p(v_{1}) \in G_{1}\), there exists a corresponding column \(p(v_{2}) \in G_{2}\) with the same position. The inter-surface arcs (purple) between corresponding columns enforce the surface context constraints in the interacting region

Graph Construction. The weighted graph \(G_i(N, A)\) is built from the triangulated mesh \(M_i\) as follows. For each vertex \(v \in V_i\), a column of K nodes \(n(v, k)\), denoted by \(p(v)\), is created in \(G_i\) (Fig. 25b). The positions of the nodes reflect the positions of the corresponding voxels in the image domain. The length of the column is set according to the required search range, and the number of nodes K on each column is determined by the required resolution. The direction of the column is set as the triangle normal. The nodes on the same column are connected by intra-column arcs from \(n(v, k)\) to \(n(v, k-1)\) with \(+\infty\) weight, as described in Sect. 4.3. The intra-column arcs ensure that the surface contains exactly one node in each column.
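The intra-column arcs just described can be generated mechanically; representing nodes as (vertex, k) pairs is our own sketch.

```python
INF = float('inf')

def intra_column_arcs(vertex, K):
    """Arcs n(v, k) -> n(v, k-1) with +infinity weight along one column
    of K nodes. Any finite-cost cut can then sever the column only once,
    so the recovered surface contains exactly one node per column."""
    return [((vertex, k), (vertex, k - 1), INF) for k in range(1, K)]
```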

Each column also has a set of neighbors, i.e., if (v, u) ∈ \(E_i\), then \(p(v)\) and \(p(u)\) are neighboring columns. The shape prior penalty serves to keep the topology of the original shape model. Specifically, for any pair of neighboring columns p and q, the shape change is defined by \(h_{\mathit{pq}}^{i} = S_{i}(p) - S_{i}(q)\). Note that the graph is constructed based on the mesh of the initial shape model; thus, the original shape prior \(\overline{h}_{\mathit{pq}}^{i}\) is equal to 0. The shape prior penalty of surface \(S_i\) is set as \(f_{s}(h_{\mathit{pq}}^{i})\), where \(f_s(\cdot)\) is a convex function penalizing the shape changes of \(S_i\) on p and q. For bladder and prostate segmentation, the penalty function takes the form \(f_{s}(h_{\mathit{pq}}^{i}) =\beta \cdot {(h_{\mathit{pq}}^{i})}^{2}\), where β is a parameter learned from the training data. The shape prior is enforced by introducing additional arcs as in Sect. 4.5.

To avoid overlap of the two target surfaces, a “partially interacting area” is defined according to the distance between the two meshes; it indicates where the two target surfaces may mutually interact. To model the context relation, for each column \(p(v_{1}) \in G_{1}\) in the partially interacting area, there exists a corresponding column \(p(v_{2}) \in G_{2}\) with the same position in \(\mathcal{I}\), and the target surfaces \(S_1\) and \(S_2\) both cut those columns, as shown in Fig. 25b. For implementation, a one-to-one correspondence between the two surface meshes needs to be computed on the partially interacting region. We project the pre-segmented prostate surface mesh on the interacting region onto the mesh of the pre-segmented bladder boundary surface, and then use the projected mesh patch to replace the original bladder surface mesh on the interacting region. A one-to-one mesh correspondence on the interacting region is thus established, since the two new meshes in that area have exactly the same topological structure. In this way, the two graphs share the same node positions in the partially interacting area. The non-overlapping constraint is enforced in this area as the hard context prior constraint by adding arcs between corresponding columns using the approach proposed in Sect. 4.5. No soft context prior penalties are introduced in our energy.

The optimal set \(\mathcal{S}\) of two surfaces corresponding to the bladder and the prostate can then be found by minimizing the following energy through the constructed graph:

$$\displaystyle{ \mathcal{E}(\mathcal{S}) =\sum _{i=1}^{2}\mathcal{E}_{\mathrm{boundary}}(S_{i}) +\sum _{i=0}^{2}\mathcal{E}_{\mathrm{region}}(R_{i}) +\sum _{i=1}^{2}\mathcal{E}_{\mathrm{shape}}(S_{i}) }$$
(16)

The boundary energy term serves as an external force, which drives the mesh towards the best fit to the image data. The shape energy term functions as an internal force, which keeps the shape of the original model and restricts the flexibility of the mesh. To incorporate the learned regional information, an additional region energy term is added to our energy function. Specifically, the two surfaces for the bladder and the prostate naturally divide the volume into three regions, denoted by \(R_0\), \(R_1\), and \(R_2\), which correspond to the region enclosed by the bladder surface \(S_1\), the region between \(S_1\) and the prostate surface \(S_2\) at the partially interacting area, and the region enclosed by \(S_2\), respectively. Our region energy term \(E_{\mathrm{region}}(R_{i})\) reflects the region properties of all voxels inside \(R_i\). An example cost function design for this application can be found in [55].

Fig. 26
figure 26

Typical slices of the simultaneous segmentation result of the bladder (yellow) and the prostate (blue) in 3D CT images. (a and b) Transverse view. (c) Sagittal view. (d) Coronal view. (e) 3D representation of the segmentation result

Illustrative results in three views are displayed in Fig. 26a–d, and a 3D representation is shown in Fig. 26e. In all views, the proposed algorithm produces a very good delineation of both the bladder and the prostate in 3D. The shape prior constraints succeed in keeping the original topological structure of the target organs, and no overlap of the bladder and the prostate is found, owing to the enforcement of the context constraints.

4.7.3 Robust Delineation of Pulmonary Tumors Using Surface-Region Context

This section presents a segmentation method that integrates the graph cut method with the OSD method for segmenting target objects of arbitrary shape that mutually interact with terrain-like surfaces [57]. This scenario is common in medical imaging. For instance, lung tumors may touch the lung parenchyma or lie close to the diaphragm (Fig. 27a), and fluid and fluid-associated abnormalities may appear in between the retinal layers (Fig. 27b). The delineation of such a target object can be very challenging due to the similar intensity profiles of the surrounding tissues. In that setting, however, the boundary surfaces of the structures adjacent to the target object can serve as valuable prior information to help separate the target object from those structures, which makes it promising to segment those boundary surfaces simultaneously with the target object. Computational feasibility is achieved by integrating the graph cut method of Sect. 3 and the OSD method of Sect. 4 into a single optimization process that still admits globally optimal solutions, yielding a more powerful tool for medical image analysis.

Fig. 27
figure 27

Examples of surface and region interactions [57]. (a) Example slices of a lung tumor (red) in megavoltage cone-beam CT. The tumor is attached/adjacent to the diaphragm (green). (b) Example slices of a fluid-filled region (green) in retinal optical coherence tomography (OCT). The fluid region is surrounded by intraretinal layers

We illustrate the method using lung tumor segmentation on mega-voltage cone-beam CT (MVCBCT) images. Without loss of generality, assume that the CBCT image \(\mathcal{I}\) is oriented such that the target surface S intersects each column Col(p) of p(x, y) in \(\mathcal{I}\), and that the target tumor is below the surface (Fig. 28). The same principles are directly applicable to multiple pairs of surfaces and regions with interactions between the two types of targets. The method can also handle the case when the tumor region is above the target surface [57].

Fig. 28
figure 28

A sketched example of a 2D slice from a 3D image [57]. The terrain-like surface S is shown in blue and the tumor region R of arbitrary shape is shown in brown. Col(p) and Col(q) indicate two neighboring columns in the image, and S(p) = 2 and S(q) = 3

The neighborhood setting of those columns is specified by \(\mathcal{N}_{s}\). Let S(p) denote the height (z-coordinate) of the surface S on the column Col(p). The OSD method is adopted to compute the surface S. Each voxel \(v(x,y,z) \in \mathcal{I}\) is associated with an on-surface cost b(x, y, z), which is inversely related to the likelihood that the desired surface S indeed contains the voxel. We can also incorporate the shape prior of the target surface S as in Sect. 4.5. The optimal surface S can be computed by minimizing the following energy function:

$$\displaystyle{ \mathcal{E}_{s}(S) =\sum _{v(x,y,z)\in S}b(x,y,z) +\sum _{(p,q)\in \mathcal{N}_{s}}h_{\mathit{pq}}(S(p) - S(q)), }$$
(17)

where \(h_{\mathit{pq}}(\cdot)\) is a convex function.

The graph cut method in Sect. 3 is employed to segment the target tumor region R. Let \(f_v\) denote the label of each voxel \(v \in \mathcal{I}\): \(f_v = 1\) means that v belongs to the target tumor region R and \(f_v = 0\) means that v is in the background. The MRF energy \(\mathcal{E}_{c}\) for segmenting the tumor region R can be expressed as follows:

$$\displaystyle{ \mathcal{E}_{c}(f) =\sum _{v\in \mathcal{I}}\mathcal{D}_{v}(f_{v}) +\sum _{(v_{i},v_{j})\in \mathcal{N}_{c}}\mathcal{V}_{i,j}(f_{v_{i}},f_{v_{j}}), }$$
(18)

where \(\mathcal{N}_{c}\) defines the neighboring setting between voxels, and \(\mathcal{D}_{v}(\cdot )\) and \(\mathcal{V}_{i,j}(\cdot,\cdot )\) are the data term and the boundary term in the graph cut method (Sect. 3), respectively.

Fig. 29
figure 29

The surface-region interaction constraint in which the region R tends to be positioned “lower” than the terrain-like surface S [57]. For voxels v ∈ R with v ∈ Col(p) and S(p) − z(v) < d (yellow voxels), a penalty \(\gamma_v\) is enforced; d is set to 1

The geometric interaction relation between the boundary surface S and the target tumor region R enforces that R is at least d > 0 voxels “below” S. Denote by \(\gamma_v\) the penalty of a tumor voxel v ∈ R that violates the interaction constraint. We introduce a surface-region interaction penalty term \(\mathcal{E}_{\mathrm{surf-tumor}}\), which is the total penalty of those tumor voxels that violate the interaction constraint (Fig. 29). More specifically, let z(v) denote the height (z-coordinate) of voxel \(v \in \mathcal{I}\), where v is on the column Col(p) of p(x, y) (that is, the x- and y-coordinates of v are x and y, respectively). Then, if v ∈ R and S(p) − z(v) < d, a penalty \(\gamma_v\) is enforced. Hence,

$$\displaystyle{ \mathcal{E}_{\mathrm{surf-tumor}}(S,f) =\sum _{p(x,y)\in \mathcal{I}}\sum _{{ v\in \mathrm{Col}(p) \atop S(p)-z(v)<d} }\gamma _{v} \cdot f_{v}. }$$
(19)

To enforce the boundary surface prior, we co-segment the tumor region as well as the boundary surface by minimizing the energy function \(\mathcal{E}(S,f)\), with

$$\displaystyle{ \mathcal{E}(S,f) = \mathcal{E}_{s}(S) + \mathcal{E}_{c}(f) + \mathcal{E}_{\mathrm{surf-tumor}}(S,f). }$$
(20)
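The three energy terms of Eqs. (17)–(19) and their sum in Eq. (20) can be evaluated directly on a toy instance. The sketch below only scores a candidate pair (S, f) on a tiny 2-column image; the optimization itself requires the graph construction described next, and all function names are ours.

```python
def energy_surface(b, S, h, neigh_s):
    """E_s(S) of Eq. (17): on-surface costs plus convex pairwise penalties."""
    return (sum(b[(p, S[p])] for p in S)
            + sum(h(S[p] - S[q]) for p, q in neigh_s))

def energy_region(D, V, f, neigh_c):
    """E_c(f) of Eq. (18): unary data terms plus pairwise boundary terms."""
    return (sum(D[v][f[v]] for v in f)
            + sum(V(f[u], f[w]) for u, w in neigh_c))

def energy_interaction(S, f, gamma, d):
    """E_surf-tumor of Eq. (19): penalty gamma for each tumor voxel
    v = (p, z) that is not at least d voxels below the surface height S(p)."""
    return sum(gamma for (p, z), label in f.items()
               if label == 1 and S[p] - z < d)

def total_energy(b, S, h, D, V, f, neigh_s, neigh_c, gamma, d):
    """E(S, f) of Eq. (20) = E_s(S) + E_c(f) + E_surf-tumor(S, f)."""
    return (energy_surface(b, S, h, neigh_s)
            + energy_region(D, V, f, neigh_c)
            + energy_interaction(S, f, gamma, d))
```

Here columns are indexed by p, voxels by (p, z), S maps each column to its surface height, and f maps each voxel to its tumor/background label.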

To optimize the energy function \(\mathcal{E}(S,f)\), two subgraphs, \(G_{c} = (N_{c},A_{c})\) and \(G_{s} = (N_{s},A_{s})\), are constructed to encode the terms \(\mathcal{E}_{c}(f)\) and \(\mathcal{E}_{s}(S)\), respectively, using the approaches in Sects. 3 and 4.5. For each voxel \(v(x,y,z) \in \mathcal{I}\), denote by \(n_c(x, y, z)\) (\(n_s(x, y, z)\)) the node in \(G_c\) (\(G_s\)) corresponding to v. The surface-region penalty term \(\mathcal{E}_{\mathrm{surf-tumor}}\) is incorporated by adding directed arcs between the corresponding nodes of the two subgraphs, weighted by the penalty for violating the interaction constraint. More specifically, for each voxel v(x, y, z), a directed arc from \(n_c(x, y, z)\) to \(n_s(x, y, z + d)\), whose weight is \(\gamma_v\), is added between the two subgraphs \(G_c\) and \(G_s\) (Fig. 30). Note that we do not consider the boundary conditions here, to avoid cluttering the exposition of the key ideas.
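The inter-subgraph arc generation can be sketched as follows; the tagged node naming is ours, and out-of-range targets at the tops of columns are simply skipped, matching the text's omission of boundary conditions.

```python
def interaction_arcs(voxels, gamma, d, Z):
    """For each voxel (x, y, z), add a directed arc from the graph cut
    node n_c(x, y, z) to the OSD node n_s(x, y, z + d) with weight
    gamma[(x, y, z)]; cutting this arc charges the penalty for a tumor
    voxel lying less than d voxels below the surface."""
    return [(('c', x, y, z), ('s', x, y, z + d), gamma[(x, y, z)])
            for (x, y, z) in voxels if z + d < Z]
```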

Once the graph is constructed, a globally optimal solution can be found by solving a single maximum flow problem, which minimizes the total energy \(\mathcal{E}(S,f)\) in low-order polynomial time [57].

Fig. 30
figure 30

The arcs introduced in the constructed graph to enforce the surface-region interaction constraint. The node column \(\mathrm{Col}_s(p)\) (\(\mathrm{Col}_c(p)\)) in the subgraph \(G_s\) (\(G_c\)) corresponds to the column Col(p) of voxels in the input image. The green arcs are introduced to enforce the interaction constraint. The two arcs with a red cross indicate the penalties incurred when the depicted surface violates the interaction constraint

Fig. 31
figure 31

A typical tumor segmentation result [57]. (a) One 2D slice of a 3D MVCBCT image with outlines of the spherical initialization. (b) Manual segmentation of the lung tumor (the independent standard). (c) Simultaneous region-and-surface segmentation of the diaphragm (green) and the lung tumor (blue) using the reported approach, showing excellent segmentation performance; the Dice similarity coefficient (DSC) is 0.878. (d) A 3D representation of the diaphragm (green) and the tumor (blue)

Fig. 32
figure 32

Performance comparison on a difficult case. (a) Independent standard obtained by manual segmentation, shown in one 2D slice of the 3D volume. (b) Tumor segmentation failure resulting from the conventional graph cut method; DSC = 0.70. (c) Tumor segmentation obtained using the method that simultaneously segments the tumor (blue) and the lung boundary surface (shown in yellow and green in two orthogonal directions); DSC = 0.84

Figure 31 shows a typical tumor segmentation result on a pulmonary MVCBCT dataset. Figure 32 demonstrates a segmentation result on a difficult case, in which the tumor is closely adjacent to the lung boundary surface from two directions.