
Introduction

Image segmentation and registration are key tasks in image-guided therapy. End-to-end systems for image-guided therapy in use today perform segmentation and registration, as well as navigation and visualization. Segmentation involves identifying meaningful regions and structures within an image, such as normal anatomical tissue, pathology, or resection, for the purpose of planning, guiding, and measuring the outcome of a therapeutic procedure. Registration focuses on identifying a spatial mapping between two images of the same underlying tissue or patient, acquired from different imaging modalities or at different time points, fusing complementary information sources for planning and intra-procedural guidance. Intra-procedural navigation allows the movement of the patient and instruments during the procedure to be shown on the images, and the visualization updates the enhanced reality display to be consistent with the view of the patient that is visible to the physician. State-of-the-art image-guided therapy systems provide functionality to perform semiautomatic segmentation, a rigid registration with six degrees of freedom (and are in the early stages of providing limited nonrigid registration methods) to align the pre-procedural and intra-procedural imagery, and use off-the-shelf tracking hardware that relies on either optical or electromagnetic sensors to track the motion of the patient during the intervention. XPlan [1] was one of the earliest image-guided surgery systems developed for research at Brigham and Women’s Hospital and MIT. It used semiautomatic segmentation of pathology and normal tissue, a rigid registration technique, and tracked patient motion using an optical tracker. The enhanced reality visualizations shown in Fig. 5.1 were generated using XPlan. In the 15 years since the development of XPlan, research in segmentation, registration, navigation, and visualization methods for image-guided surgery has continued in academic as well as industry laboratories. Several commercial image-guided surgery systems (including Brainlab, Medtronic StealthStation, Philips Traxtal, Veran) provide semiautomated segmentation solutions along with registration and navigation functionality.

Fig. 5.1

Three-dimensional visualization used by neurosurgeons to examine the relationship of pathology to familiar landmarks. These visualizations are used for preoperative surgical planning as well as intraoperative guidance and verification [1]

The focus of this chapter is on segmentation and registration in image-guided therapy. In practice, the tasks of registration and segmentation are complementary and inextricably linked. Registration is facilitated by segmented images and relies, at a minimum, on basic segmentation in the form of pixel labels or image intensities. Likewise, segmentation is facilitated by registration to other images, e.g., an atlas with expert segmentation labels [2], in which complementary information can be used to drive segmentation. This chapter aims to provide an overview of segmentation and registration individually, in the context of image-guided therapy, including fundamental methods and validation techniques. Joint investigations of segmentation and registration are not explicitly discussed; however, the interested reader is referred to the literature [3].

Registration

Image-guided therapy involves acquiring images of a patient using different modalities at different points in time, from pre-procedural diagnosis and planning, to intra-procedural guidance and visualization, and finally to post-procedural assessment. The goal of registration is to align different images of the same underlying tissue or patient into a common reference frame, for the purpose of visualization and navigational guidance over the course of therapy.

Registration can be accomplished by a variety of means. Manual registration can be performed, where a human manipulates images via a software interface. Registration can be performed visually; for instance, in the context of neurosurgery, the surgeon can mentally fuse intraoperatively acquired ultrasound (US) with a magnetic resonance (MR) volume acquired preoperatively and used in surgical planning, in order to carry out the procedure. Tracking devices external to the image can also be used; again in the context of neurosurgery, the 3D position and orientation of the US probe can be determined by the stereo camera of a neuronavigation system and related to the preoperative MR via a 3D rigid transform for display.

Automatic image registration involves the use of computational algorithms to establish image alignment based on image intensity data. A large body of research literature has been dedicated to the study of automatic image registration. This section aims to highlight general considerations and techniques that apply to the context of image-guided therapy and to describe several concrete examples.

General Approach

Automatic registration aims to align images into a common coordinate system or reference frame. Here we assume that there are two images to be registered, where one image is chosen as a fixed image or model I and the other is considered a moving image J, and the goal of registration is to identify a spatial transform T mapping locations in J to I. Registration is driven by a measure of image similarity that quantifies the degree to which image intensities in I and J agree, given a mapping T. In general, T is non-unique in the sense that there may be many mappings that are equivalent in terms of image similarity. Furthermore, registration must typically process large amounts of image data, and it is generally intractable to consider all possible mappings between images. Additional constraints are thus placed on permissible mappings T, in order to bias registration towards solutions that agree with prior expectations and to reduce computational complexity.
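
To make these ingredients concrete, the following minimal sketch (illustrative only, assuming NumPy/SciPy and a simple linear transform convention) evaluates a registration objective: the moving image J is resampled under a candidate transform T and compared to the fixed image I with a sum-of-squared-differences similarity.

```python
import numpy as np
from scipy.ndimage import affine_transform  # resamples an image under a linear map

def registration_objective(I, J, T_matrix, T_offset):
    """Score a candidate linear transform T = (T_matrix, T_offset).

    Convention assumed for this sketch: a voxel location x in the fixed image I
    is mapped to T_matrix @ x + T_offset in the moving image J, so J is pulled
    back onto the grid of I before comparison.
    """
    J_warped = affine_transform(J, T_matrix, offset=T_offset, order=1)
    return -np.sum((I - J_warped) ** 2)  # negative SSD: higher means more similar
```

Registration then amounts to searching over the parameters of T for the value that maximizes such a score, subject to the constraints discussed next.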

Transformation Model

The transform model determines the space of permissible spatial mappings T between I and J and should be chosen to reflect the true underlying mapping as closely as possible. The general context of image-guided therapy typically involves intra-subject registration, i.e., the images to be aligned are acquired from the same physical tissue, and T can thus be designed to follow a physically justified model mapping tissues from one image to the next. Furthermore, in a number of contexts, tissue may be present in one image but absent in another, for instance, due to surgical resection or the introduction of instruments. In such contexts it is important to model the event that a valid mapping may not exist in certain regions of the image.

In the case where little shape change has occurred between image acquisitions, a low-parameter linear transform model, e.g., a rigid transform, is sufficient; an example is images of the head prior to major resection in an image-guided neurosurgical context. In general, however, soft tissues tend to deform over the course of a therapeutic procedure, and nonlinear transform models may be required. A biomechanical tissue model can be used that takes into account the physical properties of the underlying tissue. For example, the image can be represented by a finite element model (FEM) consisting of a mesh in which interactions of mesh nodes are governed by physical properties such as elasticity [4, 5]. The FEM can be modified to account for physical phenomena such as gravity and resection [6]. Tissue properties may be difficult to specify in a biomechanical model, and various classes of alternative nonlinear transforms can be used instead, for example, deformation fields governed by splines [7] or local translations [8] (Fig. 5.2).
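
As a hedged illustration of the two ends of this spectrum, the sketch below (NumPy/SciPy, hypothetical array shapes) builds a 6-DOF rigid transform from Euler angles and a translation, and applies a generic dense displacement field of the kind produced by spline- or translation-based nonlinear models.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.spatial.transform import Rotation

def rigid_transform(angles_deg, translation):
    """6-DOF rigid model: a rotation (Euler angles, degrees) and a translation."""
    R = Rotation.from_euler('xyz', angles_deg, degrees=True).as_matrix()
    return R, np.asarray(translation, dtype=float)

def warp_with_displacement(J, displacement):
    """Nonlinear model: sample the moving image J at x + u(x), where u is a
    dense displacement field of shape (3,) + J.shape."""
    grid = np.indices(J.shape).astype(float)
    return map_coordinates(J, grid + displacement, order=1)
```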

Fig. 5.2

Preoperative (left) and intraoperative (right) MR images of a brain with a tumor. The transform model might include global rotation, nonlinear deformation in the neighborhood of the tumor, and occlusion in the area of resection

Patient Motion: Tissue deformation caused by patient motion is a confounding factor. Motion due to respiration is important in abdominal and thoracic procedures; it can be reduced via breath holding or addressed by maintaining a temporal model of tissue movement over the breathing cycle [9]. Cardiac motion during a procedure can be accounted for by acquiring and registering pre- and intra-procedural images at the same point in the cardiac cycle [10].

2D-3D: Registration may also be required in cases where image intensities are sampled in different spatial dimensions. Modalities such as computed tomography (CT) and MR sample volumetric data in three spatial dimensions; other technologies such as X-ray, fluoroscopy, or B-mode US sample in two spatial dimensions. To register images of different spatial dimensionality, the transform must account for the image formation model, for instance, projection and attenuation in X-ray or fluoroscopy [1]. In the case of a 2D projection-based imaging modality such as X-ray, registration typically attempts to determine a rigid transform, for example, between 2D intra-procedural fluoroscopic and pre-procedural 3D CT cardiac images [10], or between 2D portal images and pre-procedural 3D CT for patient positioning in the context of radiation therapy [11]. In the case of 2D US data, the image represents a slice through a 3D volume, and a navigation system is typically used to track the position of the US probe and slice plane relative to the 3D volume. Registration can then be accomplished either by aligning 2D slice data directly with the 3D volume [12], or by compounding US slices into a 3D US volumetric image and performing 3D-3D registration [13].
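
The sketch below illustrates the extra ingredient in 2D-3D registration: a simple pinhole projection stands in for the X-ray/fluoroscopy image formation model (the focal length and point-based similarity are assumptions of this example, not of the cited methods), and registration would minimize the 2D reprojection error over the rigid parameters.

```python
import numpy as np

def project_points(points_3d, R, t, focal_length):
    """Rigidly transform 3D points (N x 3) and project them with a pinhole model."""
    p_cam = points_3d @ R.T + t                          # 6-DOF rigid part
    return focal_length * p_cam[:, :2] / p_cam[:, 2:3]   # perspective projection to 2D

def reprojection_error(points_3d, points_2d, R, t, focal_length):
    """Sum of squared 2D distances between projected 3D points and 2D observations."""
    return np.sum((project_points(points_3d, R, t, focal_length) - points_2d) ** 2)
```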

Occlusion: Registration in the context of image-guided therapy must often consider the scenario where tissue or objects may be occluded, or not present in all images being registered. For example, tissue may not be present in all images due to resection, may not be recognizable or visible due to properties of different image modalities, or the images may contain instruments.

Occlusion raises a challenge for registration in that there may be regions in which a valid image-to-image mapping does not exist. Registration techniques may be unaffected by minor occlusions and produce reasonable results [14]; however, when occlusion is significant, it is important to address the issue directly. An occlusion model can be explicitly incorporated, delineating image regions of anomalous image intensity where T may be undefined over the course of registration [15]. Registration can also focus on identifying local image regions where correspondence is most probable: local block-matching strategies register local regions independently throughout the image and then identify a transform supported by the majority of regions [8]. Informative image regions can be identified by saliency operators and used to improve registration [12].

In the case of therapeutic planning, prior information may be available as to how the images may be affected during the course of the procedure, for instance, regions in which resection may occur and the location and appearance of instruments. Such information can be incorporated into the registration process in order to more effectively model occlusion, for instance, resection and retraction of brain tissue in the case of neurosurgery [6, 16].

Image Similarity

Registration is driven by a measure of image similarity, given a transform T. For effective image similarity measurement, the images being registered must exhibit contrast from homologous structures. Similarity may be computed between image intensity data or between measurements or features derived from intensity data, such as hierarchical features [17], probabilistic intensity class labels [12], or combined intensity and gradient information [18]. In data defined purely by geometrical features, e.g., points or surfaces derived from anatomical structure in images, individual features bear no distinguishing information. Similarity is thus evaluated based on the geometry of feature sets, e.g., the sum of nearest-neighbor distances (i.e., the Procrustes measure) [19].

In the general case where data consist of intensity measurements, similarity measures can be understood by the assumptions made regarding the joint relationship between the data [20]. Measures such as the sum of squared differences (SSD) or correlation assume a linear relationship between the data and are useful and computationally efficient in the case of intra-modality registration. The correlation ratio [21] assumes a functional relationship between the data and can be used where the data exhibit nonlinear contrast differences. Mutual information (MI) [22] and normalized mutual information (NMI) [23] assume only a statistical relationship between the data and can be used in the general case of multimodal image alignment. MI is powerful but computationally intensive; alternatively, the joint relationship between image data can be learned from training images [24], after which point similarity can be computed via maximum likelihood methods. Figure 5.3 illustrates the joint relationship between corresponding intensity measurements in proton density (PD) and T1 MR images; note the nonlinear nature of the relationship between intensities.
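
The following sketch (NumPy, illustrative implementations rather than any particular reference implementation) contrasts three of these measures: SSD, which assumes a linear intensity relationship; normalized cross-correlation; and mutual information estimated from the joint intensity histogram, which assumes only a statistical relationship and is therefore suited to multimodal data.

```python
import numpy as np

def ssd(I, J):
    """Mean squared intensity difference; intra-modality measure."""
    return np.mean((I - J) ** 2)

def ncc(I, J):
    """Normalized cross-correlation; assumes a linear intensity relationship."""
    a, b = I - I.mean(), J - J.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def mutual_information(I, J, bins=32):
    """Mutual information from the joint intensity histogram of I and J."""
    h, _, _ = np.histogram2d(I.ravel(), J.ravel(), bins=bins)
    p = h / h.sum()                                   # joint distribution
    px = p.sum(axis=1, keepdims=True)                 # marginal of I
    py = p.sum(axis=0, keepdims=True)                 # marginal of J
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))
```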

Fig. 5.3

The left and center images are proton density (PD) and T1 MR images of a brain; squares indicate regions of interest (ROIs). The rightmost image shows the joint intensity histogram generated from corresponding intensity samples within the ROIs

Optimization

Image registration typically involves identifying a value of T that maximizes or optimizes a fitness function f(I,J,T) that includes terms for image similarity and the transform model. A variety of optimization methods can be brought to bear in this respect, depending on the formulation of f(I,J,T), and these are generally described as global or local in nature. Global optimization identifies the transform T that results in the global extremum of f(I,J,T). Such optimization is typically only feasible for transform models consisting of a small number of parameters, for instance, image translations, for which an exhaustive search can be performed over the range of possible T. Such a global search is generally computationally prohibitive, and instead local optimization methods are typically used. Local optimization operates on the principle that f(I,J,T) is a smooth function of T, and that a path can be followed iteratively from an initial transform to a local maximum. A number of local optimization techniques exist [25]. Gradient ascent strategies such as Newton’s method or the conjugate gradient method iteratively update T according to the gradient of f(I,J,T) until a maximum is reached. The simplex method and Powell’s method modify T without requiring gradient computation and may be useful in cases where the gradient of f(I,J,T) is difficult to calculate. The expectation maximization (EM) algorithm [26] is used in a maximum likelihood framework, iteratively alternating between computing expected values of hidden variables and updating parameter estimates to increase the likelihood.
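
A toy example of local optimization, assuming the ncc helper from the similarity sketch above: gradient ascent over a 2D translation, with the gradient of the fitness function approximated by central finite differences. This is illustrative only; practical systems use the more sophisticated optimizers cited in the text.

```python
import numpy as np
from scipy.ndimage import shift

def register_translation(I, J, steps=200, step_size=0.1, eps=0.5):
    """Gradient ascent on NCC over a 2D translation t, starting from t = 0."""
    def fitness(t):
        return ncc(I, shift(J, t, order=1))   # ncc as in the earlier sketch (assumed)

    t = np.zeros(2)
    for _ in range(steps):
        # central finite-difference estimate of the gradient of the fitness
        grad = np.array([(fitness(t + eps * e) - fitness(t - eps * e)) / (2 * eps)
                         for e in np.eye(2)])
        t += step_size * grad
    return t
```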

Local optimization techniques are prone to converging to suboptimal local maxima, and a variety of methods have been proposed to avoid this difficulty. Simulated annealing and genetic algorithms randomly perturb the solution during a search and are helpful in escaping suboptimal local maxima. Coarse-to-fine or multi-resolution [27] optimization strategies operate by first registering large-scale image structure at low image resolution and then using these results to guide finer-scale alignment at higher image resolutions. Multi-resolution optimization is useful both in speeding up registration and in avoiding suboptimal local maxima.

Validation

Validation of registration methods is an important consideration; here we focus on validation of automatic registration in terms of accuracy, precision, and robustness. Various other validation considerations exist; Maintz et al. [28] list resource requirements, algorithm complexity, assumption verification, and suitability for clinical use. These considerations may vary from one clinical context to the next, and we refer the reader to [28] for further reading.

In the context of image-guided therapy, it is important to know the accuracy and precision with which registration is capable of correctly aligning or localizing therapeutic targets, for instance, a tumor in image-guided radiation therapy or surgery. Accuracy is typically evaluated retrospectively by comparing the mapping T obtained by registration against ground truth. Ground truth can consist of locations of fiducial points, surfaces, or regions defined manually or automatically in the images registered.

Accuracy with respect to ground truth regions is typically evaluated by measures of overlap, for instance, the Jaccard index [29] or Dice’s coefficient [30], which are also used in image segmentation. The Jaccard and Dice measures are useful in that they measure registration accuracy in a manner independent of the number of image measurements or voxels. The drawback of overlap measures is that, while they quantify the ability of registration to align homologous regions, they do not effectively quantify the accuracy of the spatial mapping. For instance, perfect overlap can generally be achieved by multiple, possibly physically implausible, spatial mappings between ground truth regions.

Ground truth can also be defined in terms of boundaries or surfaces in the image, for instance, where the goal is to correctly register tumor margins in preoperative and intraoperative surgical images [31]. The Hausdorff distance [32] measures the maximum discrepancy between two point sets and can be used to evaluate the agreement between registered boundaries. By the same non-uniqueness argument made for overlap measures, boundary-based discrepancy measures do not necessarily reflect the correctness of the registration mapping T.
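
For reference, the measures discussed in the two preceding paragraphs can be computed as in the short sketch below (NumPy/SciPy; binary masks for the region overlap measures and N x d point arrays for the Hausdorff distance).

```python
import numpy as np
from scipy.spatial.distance import cdist

def dice(A, B):
    """Dice's coefficient between two binary masks."""
    A, B = A.astype(bool), B.astype(bool)
    return 2.0 * np.logical_and(A, B).sum() / (A.sum() + B.sum())

def jaccard(A, B):
    """Jaccard index (intersection over union) between two binary masks."""
    A, B = A.astype(bool), B.astype(bool)
    return np.logical_and(A, B).sum() / np.logical_or(A, B).sum()

def hausdorff(P, Q):
    """Hausdorff distance: maximum discrepancy between point sets P and Q."""
    D = cdist(P, Q)
    return max(D.min(axis=1).max(), D.min(axis=0).max())
```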

Validation of registration accuracy has perhaps been most studied in the context of quantifying the discrepancy between ground truth point landmarks or fiducials. Unlike region or boundary labels, fiducial points make it possible to estimate the accuracy of the spatial mapping T directly. The quantities of interest in quantifying accuracy are the fiducial localization error (FLE) and the fiducial registration error (FRE) [33]. FLE is the intrinsic error in localizing fiducial landmarks in a single image, e.g., markers or distinctive anatomical structures, and FRE is the error in landmark alignment following registration. In certain cases fiducial landmarks associated directly with the target of interest are not available, e.g., a low-contrast tumor; in these cases the target registration error (TRE) can be estimated as a function of the FLE and the fiducial configuration. The case of point-to-point rigid registration has been intensely studied [33, 34]; nonrigid registration accuracy is also quantified in terms of fiducial landmark error [35].
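
As a worked illustration of these error measures, the sketch below fits a least-squares rigid transform to corresponding fiducials (a standard SVD/Kabsch-style solution, used here as an assumption rather than the method of any cited work) and then reports FRE on the fiducials and TRE at a separate target point.

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares rigid transform (R, t) mapping fiducials P onto Q (N x 3)."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(axis=0) - R @ P.mean(axis=0)
    return R, t

def fre(P, Q, R, t):
    """Fiducial registration error: RMS residual of the registered fiducials."""
    return np.sqrt(np.mean(np.sum((P @ R.T + t - Q) ** 2, axis=1)))

def tre(target_P, target_Q, R, t):
    """Target registration error at a point not used to estimate (R, t)."""
    return np.linalg.norm(R @ target_P + t - target_Q)
```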

While registration accuracy has been addressed in a relatively large body of literature, other important validation considerations include precision and robustness, which have been less studied. Precision refers to the repeatability with which registration obtains the same results across similar conditions. Precision is difficult to calculate directly, but estimates can be obtained via sampling techniques, for instance, performing registration trials by artificially deforming intensities in one image and inspecting the covariance of results [2]. Robustness refers to the ability of registration to converge to a reasonable result despite the presence of outliers or incorrect initialization. The literature contains few measures of robustness; in general the “capture radius” is often cited [36], i.e., the range of deviation about the true mapping within which optimization can converge to a correct solution.

Segmentation

Segmentation of digital medical imagery is a labeling problem in which the goal is to assign to each voxel in an input gray-level image a unique label that represents an anatomical structure. This labeled image is referred to as the “segmentation” of the input image or the “segmented” image. An example input grayscale image, a magnetic resonance imaging (MRI) scan of a human head, is shown in Fig. 5.4. A labeling or segmentation of one such cross-sectional image into skin, brain tissue, ventricles, and tumor, and three-dimensional visualizations created from such segmentations of a complete scan, are shown in Fig. 5.5.

Fig. 5.4

A set of two-dimensional sagittal cross-sectional slices comprising a 3D brain MRI scan. This is the view of an MRI scan that is printed on film and mounted on a light-box, and it is traditionally available to clinicians to aid them in making a diagnosis and in planning therapy

Fig. 5.5

Top row left: a 3D rendering for visualization of the white matter from an MRI scan of a normal subject. Right: a three-dimensional rendering of the brain surface of a patient, overlaid with pathology which is shown in green and red. Bottom left: a segmented MRI slice showing skin in pink, brain tissue in white, ventricles in blue, and tumor in green. Right: a three-dimensional rendering of segmented skin surface, brain tissue, major blood vessels, and tumor from a diagnostic MRI scan

Segmentation Nomenclature

Based on the nature of the inputs required and outputs generated by a segmentation method, it may be categorized into one or more of the following classes.

Binary or Multilabel: A segmentation method is binary if its output contains two labels and multilabel if it contains more than two.

Supervised vs. Unsupervised: If a set of labeled images or “training data” is available to the segmentation method to “learn” the differences between the underlying classes, the method is considered supervised; an unsupervised method deduces the characteristics of the classes without training data.

Hard vs. Soft Segmentation: A hard segmentation method is one in which the output of the segmentation is a unique label at each location in the image. A soft segmentation method is one in which the output is the probability of each class at each location in the image.

Intensity-Based vs. Model-Based Segmentation: Purely intensity-based segmentation methods, also referred to as classification methods, assign labels based solely on gray-level values in the image. Model-based methods combine the image gray levels with explicit or implicit models of the underlying structures that are to be segmented.
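
The short sketch below ties several of these terms together under simple assumptions: a purely intensity-based classifier whose Gaussian class parameters would come from training data (supervised) or be estimated, e.g., by EM (unsupervised), and whose output is both a soft segmentation (per-voxel class posteriors) and a hard segmentation (their argmax).

```python
import numpy as np
from scipy.stats import norm

def intensity_classify(image, means, stds, priors):
    """Per-voxel Gaussian intensity classification.

    Returns the soft segmentation (class posteriors at every voxel) and the
    hard segmentation (the most probable label at every voxel).
    """
    likelihoods = np.stack(
        [prior * norm.pdf(image, loc=m, scale=s)
         for m, s, prior in zip(means, stds, priors)], axis=-1)
    posteriors = likelihoods / likelihoods.sum(axis=-1, keepdims=True)  # soft
    labels = np.argmax(posteriors, axis=-1)                             # hard
    return posteriors, labels
```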

In sections “Graph-Based Segmentation”, “Particle Filters”, and “Interactive Segmentation” we summarize approaches to segmentation that have gained momentum in the last decade and that we believe will continue to be applied to image-guided therapy. (As mentioned earlier in this chapter, joint investigations of segmentation and registration are a very interesting development of the last decade as well, and we refer the interested reader to our other recent publications on that topic [3].) In sections “Brain Tumor: Glioma Segmentation” and “Segmentation for Prostate Interventions” we then describe how image segmentation has been used specifically to guide neurosurgery and prostate interventions.

Graph-Based Segmentation

The formulation of image segmentation as a graph-partitioning problem was introduced in [37] and in our view has been an influential approach that has led to several successful developments in the last decade [38–40].

In a graph representation, each pixel in the image is a node in the graph, and a subset of neighboring pixels is connected by edges. Weights on the edges measure how dissimilar the connected nodes are. Graph-based segmentation methods partition an image by selecting a subset of these edges from the graph. While graph-based segmentation methods were developed successfully in the last decade, the underlying theory traces back to Markov random fields (MRFs), which had previously been used as a basis for image representation and segmentation. Solving MRFs is in general an NP-hard problem and is typically approached via iterative algorithms. However, using a graph representation, globally optimal segmentations can be obtained in polynomial time by solving the min-cut/max-flow problem using the Ford-Fulkerson algorithm [41]. Notable extensions in the last decade include solving the graph cut in N dimensions [42], an iterative and interactive approach where the edges are re-weighted according to user input [43], and a formulation that restricts the graph cut by using additional user-defined seed points to set up fixed nodes in the graph [44, 45].
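
A minimal binary graph-cut sketch is given below, assuming NetworkX for the s-t minimum cut and a small 2D image: pixels become nodes, 4-neighbor edges carry capacities that decrease with intensity difference, and user-provided foreground/background seeds are tied to the source and sink terminals. The parameter names and the Gaussian edge weighting are assumptions of this example, not of the cited formulations.

```python
import numpy as np
import networkx as nx

def graph_cut_binary(image, fg_seeds, bg_seeds, sigma=0.1, seed_capacity=1e6):
    """Binary segmentation of a 2D image via an s-t minimum cut."""
    G = nx.DiGraph()
    H, W = image.shape
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):            # 4-connected neighbors
                v, u = y + dy, x + dx
                if v < H and u < W:
                    w = np.exp(-(float(image[y, x]) - float(image[v, u])) ** 2
                               / (2 * sigma ** 2))
                    G.add_edge((y, x), (v, u), capacity=w)
                    G.add_edge((v, u), (y, x), capacity=w)
    for p in fg_seeds:                                 # tie seeds to the terminals
        G.add_edge('s', tuple(p), capacity=seed_capacity)
    for p in bg_seeds:
        G.add_edge(tuple(p), 't', capacity=seed_capacity)
    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    mask = np.zeros(image.shape, dtype=bool)
    for node in source_side:
        if node != 's':
            mask[node] = True                          # pixels on the source side
    return mask
```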

Particle Filters

A problem closely related to image segmentation in the field of computer vision is tracking moving and deforming objects. A typical current example is the automatic tracking of persons in a security camera video stream. In medical image processing, the lung is a moving and deforming organ, which makes automatic delineation over time a challenging task. Tracking the lung is especially relevant in image-guided radiation therapy, where modeling complex tumor motion is important in order to achieve optimal dose delivery and to avoid or minimize irradiation of the surrounding healthy tissue. In contrast to segmentation and registration of static images, tracking involves estimating the global motion of the object and its local deformations as a function of time. Particle filters – also known as sequential Monte Carlo methods – use swarms of points (so-called particles) to approximate posterior densities. The approximation is realized by assigning a weight to each particle, yielding a discrete distribution in which particle probabilities are proportional to the particle weights [46]. Particle filters were introduced by Gordon et al. in 1993 in order to implement recursive Bayesian filters [47]. Since then, several algorithms based on this original approach have been developed that differ primarily in the way particle swarms evolve and adapt to input data [48]. For example, Rathi et al. [49] formulate a particle filtering algorithm in the geometric active contour framework that can be used for tracking moving and deforming objects. They propose a scheme that combines the advantages of particle filtering and geometric active contours, realized via level set models, for dynamic tracking. A probabilistic algorithm for simultaneously estimating the pose of a mobile robot and the positions of nearby people in a previously mapped environment, called the conditional particle filter, was introduced by Montemerlo [50]. The method tracks a large distribution of people locations conditioned upon a smaller distribution of robot poses over time.
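
A minimal sequential Monte Carlo sketch follows, tracking a one-dimensional position from noisy measurements; the random-walk motion model, Gaussian observation model, and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def particle_filter(observations, n_particles=500, motion_std=1.0, obs_std=2.0):
    """Track a 1D position with a particle swarm: predict, weight, resample."""
    rng = np.random.default_rng(0)
    particles = rng.normal(observations[0], obs_std, n_particles)  # initialize near first measurement
    estimates = []
    for z in observations:
        particles += rng.normal(0.0, motion_std, n_particles)            # predict (random walk)
        weights = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)        # observation likelihood
        weights /= weights.sum()
        particles = particles[rng.choice(n_particles, n_particles, p=weights)]  # resample
        estimates.append(particles.mean())                               # posterior mean estimate
    return np.array(estimates)
```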

Interactive Segmentation

The active research focus on automated segmentation methods in the last decade has also revealed the need for interactive methods that can efficiently bootstrap from information provided by an expert user, for example, several mouse clicks. Vezhnevets and Konouchine [51] give an overview of methods for generic image editing and for editing medical images. In particular, they present an algorithm for interactive multilabel segmentation of N-dimensional images. The segmentation process is iterative and works with a small number of user-labeled pixels, automatically segmenting the rest of the image by a cellular automaton. The algorithm has been implemented in the Slicer software platform (see http://www.slicer.org/) and can be freely downloaded [52]. A simpler interactive technique, the Magic Wand [51], is a common selection tool in many current image editors. The tool gathers color statistics from a user-specified image point (or region) and then segments (connected) image regions with pixels whose color properties fall within some given tolerance of the gathered statistics.
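
As an illustration of the Magic Wand idea, the sketch below grows a connected region from a single user-selected seed pixel, accepting neighbors whose intensity lies within a tolerance of the seed intensity; the 4-connectivity and the use of the bare seed intensity (rather than richer local statistics) are simplifying assumptions.

```python
import numpy as np
from collections import deque

def magic_wand(image, seed, tolerance):
    """Flood-fill selection from a seed pixel in a 2D image."""
    H, W = image.shape
    ref = float(image[seed])                      # statistics gathered at the seed
    mask = np.zeros((H, W), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (0 <= ny < H and 0 <= nx < W and not mask[ny, nx]
                    and abs(float(image[ny, nx]) - ref) <= tolerance):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```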

Brain Tumor: Glioma Segmentation

Manual segmentation of brain tumors is an important but time-consuming process that new (semi-)automatic segmentation algorithms aim to overcome [53]. Gliomas are the most common primary brain tumors, evolving from the cerebral supportive cells. The World Health Organization (WHO) grading system for astrocytomas distinguishes grades I–IV, where grade I tumors tend to be the least aggressive [54]. Seventy percent of gliomas belong to the group of malignant gliomas (anaplastic astrocytoma, WHO grade III; glioblastoma multiforme (GBM), WHO grade IV). The grade IV tumor is named glioblastoma multiforme according to its histopathological appearance. The glioblastoma multiforme is the most frequent malignant primary brain tumor and one of the most malignant human neoplasms. Due to its biological behavior, surgery alone cannot cure this disease. Thus, current interdisciplinary therapeutic management combines maximum safe resection, percutaneous radiation, and in most cases chemotherapy. Despite new radiation strategies and the development of oral alkylating substances (e.g., temozolomide), median survival is still only approximately 15 months [55]. Although in former years the role of surgery was controversial, current literature shows maximum safe surgical resection to be a positive predictor for extended patient survival [56]. Microsurgical resection is currently optimized by the technical development of neuronavigation incorporating functional data sets such as diffusion tensor imaging (DTI), functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), magnetic resonance spectroscopy (MRS), or positron emission tomography (PET). An early postoperative contrast-enhanced MRI quantifies the extent of tumor removal at the site of origin. From then on, the patient undergoes frequent MRI scans during the course of adjuvant therapy. Especially in the case of a residual tumor, the tumor volume has to be rigidly registered across scans so that new tumor growth is not missed.

For (semi-)automatic glioma segmentation in general (World Health Organization grades I–IV), several MRI-based algorithms have been proposed. Szwarc et al. [57] present a segmentation approach that uses fuzzy clustering techniques. In a first step, the contrast-enhanced T1-weighted images are processed with the Kernelized Weighted C-Means (KWCM) method, yielding a mask that is superimposed on the perfusion maps. In a final step, the mean perfusion intensity value is computed for both hemispheres, and it is checked whether the ratio of these is less than a user-defined threshold. In their evaluation, the authors used six MR studies of three subjects, where the Dice Similarity Coefficient (DSC) [58] ranged from 67.21 to 75.63 %. Angelini et al. [59] presented an extensive overview of deterministic and statistical approaches. The majority are region-based approaches; more recent ones are based on deformable models and include edge information. Gibbs et al. [60] introduced a combination of region growing and morphological edge detection for segmenting enhancing tumors in T1-weighted MRI data. Based on a manually specified initial sample of tumor signal and surrounding tissue, an initial segmentation is performed using pixel thresholding, morphological opening and closing, and fitting to an edge map. The method was evaluated with one phantom data set and ten clinical data sets. The average tumor segmentation time was ten minutes, and the tumors used were not classified exactly. An interactive method for segmentation of full-enhancing, ring-enhancing, and non-enhancing tumors was proposed by Letteboer et al. [61] and evaluated on 20 clinical cases. Several morphological filter operations were applied to the MRI volume to separate the data into homogeneous areas, based on manual tracing of an initial slice. Droske et al. [62] presented a deformable model, using a level set formulation, that divides the MRI data into regions of similar image properties for tumor segmentation, using intensity-based pixel probabilities for tumor tissue. This model-based segmentation was performed on 12 patient images. Clark et al. [63] introduced knowledge-based automated segmentation in order to partition glioblastomas from multispectral data. After a training phase with fuzzy C-means classification and clustering analysis and a brain mask computation, an initial tumor segmentation from vectorial histogram thresholding is postprocessed to eliminate non-tumor pixels. The presented system was trained on three volumetric data sets and tested on 13 unseen volumetric data sets. Segmentation based on outlier detection in T2-weighted MR data was proposed by Prastawa et al. [64]. The image is registered to a normal brain atlas to detect the abnormal tumor region, after which the tumor and edema are isolated using statistical clustering of differing voxels and a deformable model. The method was applied to three sets of real data, where automatic segmentation required approximately 90 min per case. Sieg et al. [65] introduced an approach to segment contrast-enhanced, intracranial tumors and anatomical structures from registered, multispectral MR data. In this approach, multilayer feedforward neural networks with backpropagation are trained, and pixel-oriented classification is applied for segmentation. Tests were performed on 22 data sets, and computation times were not provided. Egger et al. presented a graph-based approach evaluated on 50 manually segmented GBMs [66], which has also been used for pituitary adenoma segmentation [67].
The approach utilizes a user-defined seed point to set up a directed 3D graph, where the nodes of the graph are obtained by sampling along rays that are sent through the surface points of a polyhedron. After the graph has been constructed, the minimal-cost closed set on the graph is computed via a polynomial-time s-t cut, yielding an optimal segmentation of the object (Fig. 5.6). This approach has since been adapted for bladder [68] and vertebrae [69] segmentation.

Fig. 5.6

Segmentation result of a glioblastoma multiforme (GBM) in a magnetic resonance imaging (MRI) scan

Segmentation for Prostate Interventions

The prostate is a gland in the male reproductive system involved in the production and storage of seminal fluid, weighing twenty grams and measuring three centimeters in length in the typical adult male [70]. Prostate cancer is a form of cancer that develops in the prostate. There are cases of aggressive prostate cancers, but most prostate cancers are slow growing [71]. Brachytherapy is a form of radiotherapy, or radiation treatment, that can be offered as a treatment option to eligible patients with certain types of prostate cancer. Two different brachytherapy techniques exist: high-dose-rate (HDR) [72] and low-dose-rate (LDR) brachytherapy [73]. LDR brachytherapy is most commonly used in prostate cancer treatment and may also be referred to as “seed implantation” or “pinhole surgery.”

Precise detection of the prostate boundary in medical images is crucial for diagnosis and classification of prostate cancer. Precise boundary detection is also necessary in clinical applications, for example, accurate placement of needles during biopsy, accurate prostate volume measurement from multiple frames, and construction of anatomical models. In comparison with other medical imaging modalities such as CT and MRI, prostate transrectal ultrasound (TRUS) images are an attractive alternative due to low cost and real-time acquisition capability. The prostate can be manually delineated in TRUS images to calculate its size and volume, and this information can be used to support the diagnosis in typical TRUS screening [74]. In comparison to manual delineation, automated prostate delineation techniques are faster and can achieve higher accuracy and precision with minimal manual input from the physician [75]. Several algorithms for (semi-)automatic prostate segmentation for prostate brachytherapy have been proposed in the literature [76]. Mahdavi et al. [77] provide an overview of such algorithms, most of which can be categorized as either incorporating prior information about the expected shape of the prostate or relying solely on image data. They find that not constraining the solution to certain geometrical classes may result in more robust segmentation of irregular prostate shapes and of abnormal surface deformations that may be caused by cancer. Contrast enhancement, image smoothing, and Canny’s edge detector [75], multi-resolution discrete wavelet pyramids for coarse-to-fine segmentation [78], and the use of phase symmetry for outlining dominant edges and linking edge segments to generate a final contour [79] are examples of image-based segmentation methods used in the literature.

Segmentation methods that rely on image information only have the disadvantage that they are more sensitive to factors such as image quality and noise and thus generally require more effort to achieve reasonable results. In contrast to other medical image modalities, ultrasound images of the prostate have poorer image quality and contrast. For this reason, the majority of general segmentation methods that do not incorporate prior shape information also perform poorly on ultrasound images of the prostate. For medical image segmentation, deformable models such as active contour models (ACM) and snake models have been widely used. Briefly, these models can be viewed as curves in 2D [80] or surfaces in 3D [81], in which the deformation is influenced by internal and external forces. Internal forces generally constrain the curve to be smooth, while external forces often rely on edges in the image. Such methods are generally intolerant of the large amount of noise in ultrasound images and typically require incorporating additional constraints, for example, limiting the deformation by a given shape or allowing significant user interaction [76]. The active shape model (ASM) is a segmentation method that uses shape models that deform within constraints, where the constraints and the initial shape model are derived statistically from a training set [82]. Among others, prostate boundary segmentation from ultrasound images with ASM has been used by Hodge et al. [83] and Shen et al. [84]. Ellipses, ellipsoids, superellipses, and other similar shapes are commonly used in prostate segmentation, as the prostate conforms well to these shapes and their reduced parameter space results in fast segmentation algorithms. Kachouie et al. [85] present an approach in which an ellipse is initially placed at the center of the prostate; the velocity of the evolving ellipse is then guided by the first and second moments of the Gaussian probability density function fitted to the image histogram. However, the authors note that further work is needed for robustness to poorer-quality images. Ellipses that can be deformed by adjusting their aspect ratio, squareness, tapering, and bending are called deformable superellipses and have been used by Gong et al. [86]. Tutar et al. [87] presented a semiautomatic 3D segmentation technique that fits the best surface to a set of images under shape constraints, where the constraints are derived by modeling the prostate shape using spherical harmonics. Mahdavi et al. [77] introduced a semiautomatic prostate segmentation method for prostate brachytherapy, based on anatomical evidence and the requirements of the treatment procedure. In this work, a warped and tapered ellipsoid was found to be a suitable model for the a priori 3D shape of the prostate. By transforming the acquired endorectal transverse images of the prostate into ellipses, the shape-fitting problem was cast as a convex problem.
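
As a small illustration of the low-parameter shape priors mentioned above, the sketch below fits a general conic to boundary points by linear least squares; with well-distributed points on a prostate-like contour the fitted conic is an ellipse. This is a generic fitting sketch, not the procedure of any of the cited methods.

```python
import numpy as np

def fit_conic(points):
    """Least-squares fit of a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1 to 2D points (N x 2)."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x**2, x * y, y**2, x, y])
    coeffs, *_ = np.linalg.lstsq(A, np.ones(len(x)), rcond=None)
    return coeffs  # (a, b, c, d, e); an ellipse when b^2 - 4ac < 0

# Example: noisy samples on a hypothetical elliptical contour
t = np.linspace(0, 2 * np.pi, 50)
pts = np.column_stack([30 + 20 * np.cos(t), 40 + 12 * np.sin(t)])
pts += np.random.default_rng(1).normal(0, 0.5, pts.shape)
print(fit_conic(pts))
```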

Conclusion

In this chapter we have provided some background on and accomplishments of computational research in the interdependent tasks of segmentation and registration for image-guided therapy. Excellent additional discourse is available to the reader in a recent textbook [88].