Introduction

As medical image acquisition technology has advanced, significant investment has been made towards adapting classification techniques for neuroanatomy. Early work appropriated NASA satellite image processing software for statistical classification of head tissues in 2-D MR images (Vannier et al. 1985). A proliferation of techniques ensued with increasing sophistication in both core methodology and degree of refinement for specific problems. The chronology of progress in segmentation may be tracked through both technical reviews (Bezdek et al. 1993; Pal and Pal 1993; Clarke et al. 1995; Pham et al. 2000; Viergever et al. 2001; Suri et al. 2002; Duncan et al. 2004; Balafar et al. 2010) and evaluation studies (e.g. Cuadra et al. 2005; Zaidi et al. 2006; Klauschen et al. 2009; de Boer et al. 2010).

The problem of accurately delineating the white matter, grey matter and cerebrospinal fluid (and subdivisions) of the human brain continuously spurs technical development in segmentation. Following Vannier et al. (1985), many researchers adopted statistical methods for n-tissue anatomical brain segmentation. The Expectation-Maximization (EM) framework is natural (Dempster et al. 1977) given the “missing data” aspect of this problem. The work described in Wells et al. (1996) was one of the first to use EM for finding a locally optimal solution by iterating between bias field estimation and tissue segmentation. A core component of this work was explicit modeling of the tissue intensity values as normal distributions (Cline et al. 1990) for both 2-D univariate simulated data and T1 coronal images, which continues to find utility in contemporary developments. A secondary component was an extended non-parametric probability model, also influenced by earlier work (Kikinis et al. 1992), where Parzen windowing is used to model the tissue intensity distribution omitting consideration of the underlying bias field. Although technically not an EM-based algorithm, the robustness of the latter has motivated its continued use even more recently (e.g. Weisenfeld and Warfield 2009).

Subsequent development included the use of Markov Random Field (MRF) modeling (Geman and Geman 1984) to regularize the classification results (Held et al. 1997), with later work adding heuristics concerning neuroanatomy to prevent over-regularization and the resulting loss of fine structural details (Leemput et al. 1999a, b). A more formalized integration of generic MRF spatial priors was employed in the work of Zhang et al. (2001), commonly referred to as FAST (FMRIB’s Automated Segmentation Tool), which is in widespread use given its public availability and good performance. More recently, a uniform distribution of local MRFs within the brain volume and their subsequent integration into a global solution has been proposed, obviating the need for an explicit bias correction solution (Scherrer et al. 2009).

Several initialization strategies have been proposed to overcome the characteristic susceptibility of EM algorithms to local optima. Common low-level initialization steps include uniform probability assignment (Wells et al. 1996), Otsu thresholding (Zhang et al. 2001), and K-means clustering (Pappas 1992). More sophisticated low-level initialization schemes include that of Greenspan et al. (2006), in which a dense spatial distribution of Gaussians is used to capture the complex neuroanatomical layout, with subsequent processing used to conjoin subsets of such Gaussians belonging to the same tissue classes. Recently, researchers have begun to rely on spatial prior probability maps of anatomical structures of interest to encode domain knowledge (Leemput et al. 1999b; Marroquin et al. 2002; Ashburner and Friston 2005). These spatial prior probability maps may also provide an initial segmentation. Related technological developments model partial volume effects for increased accuracy in brain segmentation (Ruan et al. 2000; Ballester et al. 2002; Leemput et al. 2003).

A general trend towards more integrative neuroanatomical image processing led to the work described in Ashburner and Friston (2005), which is publicly available within SPM5, a large-scale Matlab module in which registration, segmentation, and bias field correction can be simultaneously modeled within a single optimization scheme. The roots of this very popular software package trace back to early work by Karl Friston which laid the basis for statistical parametric mapping (Friston et al. 1990). Similar integrative brain processing was provided in Pohl et al. (2006), in which segmentation and registration parameters were optimized simultaneously while casting the inhomogeneity model parameters of Wells et al. (1996) as nuisance variables. Continued work involved recursive parcellation of the brain volume by considering sub-structures in a hierarchical manner (Pohl et al. 2007). An implementation is provided in 3D Slicer (Pieper et al. 2006), an open source medical image computation and visualization package with developmental contributions from multiple agencies including both private and academic institutions.

Related neuroanatomical research concerns the selection of geometric features of the cortex (e.g. Goualher et al. 1999) which aims at understanding the functional-anatomical relationship of the human brain. Recent endeavors produce a dense cortical labeling in which every point of the cortex is classified, i.e. a cortical parcellation (Fischl et al. 2004; Heckemann et al. 2006; Destrieux et al. 2010). Various techniques have been proposed to reduce the manual effort required to densely label a high-resolution neuroimage; one example is the popular software package known as Freesurfer (Dale et al. 1999; Fischl et al. 1999; Fischl et al. 2004). In contrast to the volumetric approach detailed in this work, Freesurfer is primarily a surface-based technique in which the brain structures such as the grey-white matter interface and pial surfaces are processed, analyzed, and displayed as tessellated surfaces (Dale et al. 1999; Fischl et al. 1999). Advantages of surface representations include the ability to map processed neuroanatomy to simple geometric primitives such as spheres or planes and the ease of including topological constraints in the analysis workflow. These types of methods, including Klein’s Mindboggle (Klein and Hirsch 2005), would usually follow an initial segmentation by a volumetric method such as Atropos.

Researchers in aging often focus on accurately segmenting the T1 MRI of elderly controls and subjects suffering from neurodegeneration, for instance, via SIENA (Smith et al. 2007). A recent evaluation study compared kNN segmentation, SPM Unified Segmentation and SIENA and found different performance characteristics under different evaluation criteria (de Bresser et al. 2011). Klauschen et al. (2009) had similar findings when comparing SPM5, FSL and FreeSurfer. These studies suggest that no single method performs best under every measurement and, along with the No Free Lunch theorem (Wolpert and Macready 1997), highlight the need for segmentation tools that are tunable for different problems and research goals.

Our open source segmentation tool, which we have dubbed Atropos (Footnote 1), efficiently and flexibly implements an n-tissue paradigm for voxel-based image segmentation. Atropos allows users to harness its generalized EM algorithm for standard tissue classification of the brain into gray matter, white matter and cerebrospinal fluid even in cases of multivariate image data, which is relevant when more than one view of anatomy aids segmentation, as in neonatal brain tissue classification (e.g. Prastawa et al. 2005; Weisenfeld and Warfield 2009). Atropos equally allows incarnations that use EM to simultaneously maximize the posterior probabilities of many classes with minimal random access memory requirements, for instance, when parcellating the brain into hemispheres, cortical regions and deep brain structures such as amygdala, hippocampus and thalamus. Atropos contains features of its predecessors for performing n-tissue segmentation including imposition of prior information in the form of MRFs and template-based spatial prior probability maps as well as weighted combinations of these terms. We also borrow an idea from Boykov and Kolmogorov (2004) and use sparse spatial priors to provide initialization and boundary conditions for Atropos EM segmentation in a semi-interactive manner. In short, Atropos seeks to provide a segmentation toolbox that may be modified, tuned and refined for different use scenarios.

Coupled with the registration (Avants et al. 2011) and template building (Avants et al. 2010b) already included in ANTs, Atropos is a versatile and powerful software tool which touches multiple aspects of our brain processing pipeline. We use Atropos to address brain extraction (Avants et al. 2010a), grey matter/white matter/cerebrospinal fluid segmentation, label fusion/propagation and cortical parcellation. We also allow Atropos to interact with the recently developed N4 bias correction software (Tustison et al. 2010a) in an adaptive manner. To further highlight the value of this open source contribution, we performed a search of software attributes on NITRC and found that, as of November 2010, no stand-alone EM methods were listed. We also evaluate Atropos performance on two brain MRI segmentation objectives. First, we evaluate three-tissue classification. Second, we test our ability to parcellate the brain into 69 neuroanatomical regions to illustrate the practical value of the low-memory implementation within this paper. Although Atropos may be applied to multivariate data from arbitrary modalities, we limit our evaluation to tissue classification in T1 neuroimaging in part due to the abundance of “gold-standard” data for this modality. Consistent with our advocacy of open science (not to mention the facilitation of analysis due to accessibility) we also only use publicly available data sets. For this reason, all results in this paper are reproducible with the caveat that users may require some guidance from the authors or other users in the community.

Organization of this work is as follows: we first describe the theory behind the various components of Atropos while acknowledging that more theoretical discussion is available elsewhere. This is followed by a thorough discussion of implementation which, though often overlooked, is of immense practical utility. We then report results on the BrainWeb and Hammers datasets. Finally, we provide a discussion of our results and our open source contribution in the context of the remainder of this paper and of previous and future work.

Theoretical Foundations for Atropos Segmentation

Atropos encodes a family of Bayesian segmentation techniques that may be configured in an application-specific manner. The theory underlying Atropos dates back 20+ years and is representative of some of the most innovative work in the field. Although we summarize some of the theoretical work in this section, we recommend that the interested reader consult the deep literature in this field for additional perspective and proofs behind the major concepts.

Bayes’ theorem provides a powerful mechanism for making inductive inferences assuming the availability of quantities defining the relevant conditional probabilities, specifically the likelihood and prior probability terms. Bayesian paradigms for brain image segmentation employ a user selected observation model defining the likelihood term and one or more prior probability terms. The product of likelihood(s) and prior(s) is proportional to the posterior probability. The likelihood term has previously been defined both parametrically (e.g. a Gaussian model) and non-parametrically (e.g. Parzen windowing of the sample histogram). The prior term, as given in the literature, has often been MRF-based, template-based, or both. An image segmentation solution in this context is an assignment of one label to each voxel (Footnote 2) such that the posterior probability is maximized. The next sections introduce notation and provide a formal description of three essential components in Bayesian segmentation, viz.

  • the likelihood or observation model(s),

  • the prior probability quantities derived from a generalized MRF and template-based prior terms, and

  • the optimization framework for maximizing the posterior probability.

These components are common across most EM segmentation algorithms.

Notation

Assume a field, \(\mathcal{F}\), whose values are known at discrete locations, i.e. sites, within a regular voxel lattice that makes up an image domain, \(\mathcal{I}\). Note that \(\mathcal{F}\) can be a scalar field in the case of unimodal data (e.g. T1 image only) or a vector field in the case of multimodal data (e.g. T1, T2, and proton density images). A specific set of observed values, denoted by \(\mathbf{y}\), is indexed at N discrete locations in \(\mathcal{I}\) by \(i \in \{1, 2, \ldots, N\}\). This random field, \(Y = \{y_1, y_2, \ldots, y_N\}\), serves as a discrete representation of an observed image’s intensities. A labeling of this image, also known as a hard segmentation, assigns to each site in \(\mathcal{I}\) one of K labels from the finite set \(\mathcal{L} = \{l_1, l_2, \ldots, l_K\}\). Also considered a random field, this discrete labeling is \(X = \{x_1, x_2, \ldots, x_N\}\) where each \(x_i \in \mathcal{L}\). We use \(\mathbf{x}\) to denote a specific set of labels in \(\mathcal{I}\) and a valid, though not necessarily optimal, solution to the segmentation problem.

Segmentation Objective Function

Atropos optimizes a class of user selectable objective functions each of which may be represented in a generic Bayesian framework, as described by Sanjay-Gopal and Hebert (1998). This framework requires likelihood models and prior models which enter into Bayes’ formula,

$$\label{eq:bayes} p(\mathbf{x}|\mathbf{y})=\underbrace{ p(\mathbf{y}|\mathbf{x})}_{\text{Likelihood(s)}} \underbrace{ p(\mathbf{x})}_{\text{Prior(s)}}\frac{1}{p(\mathbf{y})} $$
(1)

where the normalization term, \(1/p(\mathbf{y})\), is a constant that does not affect the optimization (Sanjay-Gopal and Hebert 1998). Given choices for likelihood models and prior probabilities, the Bayesian segmentation solution is the labeling \(\hat{\mathbf{x}}\) which maximizes the posterior probability, i.e.

$$ \hat{\mathbf{x}} = \arg\max\limits_{\mathbf{x}} \left\{p(\mathbf{y}|\mathbf{x})p(\mathbf{x})\right\}. $$
(2)

Similar to its predecessors, Atropos employs the EM framework (Dempster et al. 1977) to find maximum likelihood solutions to this problem. The following sections detail the Atropos EM along with choices for the likelihood and prior terms.

Likelihood or Observation Models

To each of the K labels corresponds a single probabilistic model describing the variation of \(\mathcal{F}\) over \(\mathcal{I}\). We denote this set of K likelihood models as \(\Phi = \{p_1, p_2, \ldots, p_K\}\). Using the standard notation, Pr(S = s) = p(s), Pr(S = s|T = t) = p(s|t), we can define these voxelwise probabilities, \(\mathrm{Pr}_k(Y_i = y_i | X_i = l_k) = p_k(y_i|l_k)\), in either parametric or non-parametric terms. Given its simplicity and good performance, in the parametric case, \(p_k\) is typically defined as a normal distribution, i.e.

$$ \begin{aligned}\label{eq:param} p_k\left(y_i|l_k\right) &= G\left(\mu_k;\sigma_k\right) \\ &= \frac{1}{\sqrt{2\pi \sigma_k^2}}\exp\left( \frac{ -(y_i - \mu_k)^2 }{2\sigma_k^2} \right) \end{aligned} $$
(3)

where the parameters \(\mu_k\) and \(\sigma_k^2\) respectively represent the mean and variance of the kth model. When \(y_i\) is a vector quantity, we replace the Euclidean distance by Mahalanobis distance and define multivariate Gaussian parameters via a mean vector, \(\boldsymbol{\mu}_k\), and covariance matrix, \(\boldsymbol{\Sigma}_k\).
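To make Eq. 3 concrete, the following Python sketch evaluates the univariate Gaussian likelihood of one voxel intensity under three tissue models. It is an illustration only and not the Atropos (C++/ITK) implementation; the function name `gaussian_likelihood` and the tissue means and variances are hypothetical values chosen for the example.

```python
import math

def gaussian_likelihood(y, mu, sigma2):
    """Eq. 3: normal density with mean mu and variance sigma2, evaluated at intensity y."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Hypothetical (mean, variance) pairs for three tissue classes; illustrative values only.
tissue_params = {"csf": (30.0, 100.0), "gm": (80.0, 64.0), "wm": (120.0, 49.0)}

y_i = 85.0  # intensity observed at one voxel
likelihoods = {name: gaussian_likelihood(y_i, mu, s2)
               for name, (mu, s2) in tissue_params.items()}

# Class whose observation model gives the largest p_k(y_i | l_k).
most_likely = max(likelihoods, key=likelihoods.get)
```

For vector-valued intensities, the squared difference in the exponent would be replaced by the squared Mahalanobis distance and the normalization would involve the determinant of the covariance matrix.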

A common technique for the non-parametric variant is to define \(p_k\) using Parzen windowing of the sample observation histogram of \(\mathbf{y}\), i.e.

$$\label{eq:nonparam} p_k\left(y_i|l_k\right) = \frac{1}{N_B} \sum\limits_{j=1}^{N_B} \frac{1}{\sqrt{2\pi \sigma_j^2}}\exp\left( \frac{ -(y_i - c_j)^2 }{2\sigma_j^2} \right) $$
(4)

where \(N_B\) is the number of bins used to define the histogram of the sample observations (in Atropos the default is \(N_B = 32\)), \(c_j\) is the center of the jth bin in the histogram, and \(\sigma_j\) is the width of the jth of the \(N_B\) Gaussian kernels. For multi-modal data in which the number of components of \(y_i\) is greater than one, a Parzen window function is constructed for each component. The likelihood value is determined by the joint probability given by their product.
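A smoothed-histogram density in the spirit of Eq. 4 can be sketched in Python as follows. Two choices here are assumptions and are not taken from the text: each kernel is weighted by the relative frequency of its bin (Eq. 4 as written uses a uniform \(1/N_B\) weight), and a single kernel width equal to one bin width is used for all bins. The function name `parzen_density` and the synthetic samples are hypothetical.

```python
import math

def parzen_density(samples, n_bins=32, sigma=None):
    """Build a smoothed-histogram density from the intensity samples of one class."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must span a non-zero intensity range")
    width = (hi - lo) / n_bins
    if sigma is None:
        sigma = width  # assumption: kernel width equal to one bin width
    counts = [0] * n_bins
    for s in samples:
        j = min(int((s - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[j] += 1
    centers = [lo + (j + 0.5) * width for j in range(n_bins)]  # c_j in Eq. 4
    weights = [c / len(samples) for c in counts]  # assumption: relative bin frequency
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)

    def density(y):
        # Sum of Gaussian kernels centered on the bin centers.
        return sum(w * norm * math.exp(-((y - c) ** 2) / (2.0 * sigma ** 2))
                   for w, c in zip(weights, centers))

    return density

# Synthetic grey matter intensities spread over 70..90 (illustrative only).
gm_samples = [70 + (i % 21) for i in range(210)]
p_gm = parzen_density(gm_samples)
```

Because the bin weights sum to one and each kernel is a normalized Gaussian, the returned density integrates to one, which makes it directly comparable with the parametric model of Eq. 3.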

Atropos segmentation likelihood estimates are based on the classical finite mixture model (FMM). FMM assumes independence between voxels to calculate the probability associated with the entire set of observations, y. Spatial interdependency between voxels is modeled by the prior probabilities discussed in the next section. Marginalizing over the set of possible labels, \(\mathcal{L}\), leads to the following probabilistic formulation

$$\label{eq:likelihood} p(\mathbf{y}|\mathbf{x}) = \prod\limits_{i=1}^N \left( \sum\limits_{k=1}^K \gamma_k p_k(y_i|l_k) \right) $$
(5)

where \(\gamma_k\) is the mixing parameter (Ashburner and Friston 2005).
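Eq. 5 is a product over all N voxels, which underflows quickly in floating point, so a practical evaluation usually works with its logarithm. The sketch below assumes Gaussian observation models (Eq. 3); the helper names and the toy intensities are hypothetical and are not part of Atropos.

```python
import math

def gaussian(y, mu, sigma2):
    """Univariate normal density (Eq. 3)."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_log_likelihood(intensities, gammas, mus, sigma2s):
    """Logarithm of Eq. 5: sum over voxels of the log of the mixture density."""
    total = 0.0
    for y in intensities:
        mixture = sum(g * gaussian(y, m, s2) for g, m, s2 in zip(gammas, mus, sigma2s))
        total += math.log(mixture)
    return total

# Toy intensities clustered around three tissue means (illustrative only).
intensities = [29.0, 31.0, 79.0, 82.0, 119.0, 121.0]
gammas = [1.0 / 3.0] * 3
sigma2s = [25.0] * 3

well_matched = mixture_log_likelihood(intensities, gammas, [30.0, 80.0, 120.0], sigma2s)
poorly_matched = mixture_log_likelihood(intensities, gammas, [0.0, 10.0, 200.0], sigma2s)
```

Parameters that match the data give the larger value, which is the quantity that EM increases from one iteration to the next.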

Prior Probability Models

By modeling \(\mathcal{F}\) via the set of observation models Φ, this so called finite-mixture model could be used to produce a labeling or segmentation (e.g. Wells et al. 1996). However, as pointed out by Zhang et al. (2001), exclusive use of the intensity profile produces a less than optimal solution because spatial contextual considerations are ignored. This has been remedied by the introduction of a host of prior probability models including those characterized by use of MRF theory and template-based information. For example, in the works of Leemput et al. (1999b) and Weisenfeld and Warfield (2009), the original global prior term given in Wells et al. (1996) is replaced by the product of the template-based and the MRF-based prior terms. In addition to their descriptions below, we discuss a third possible prior/objective combination in the form of a (sparse) prior labeling which fixes specific points of the segmentation and uses EM to propagate this information elsewhere in the image.

Generalized MRF Prior

One may incorporate spatial coherence into the segmentation by favoring labeling configurations in which voxel neighborhoods tend towards homogeneity. This intuition is formally described by MRF theory in which spatial interactions in voxel neighborhoods can be modeled (Li 2001).

We assume the random field introduced earlier, X, is an MRF characterized by a neighborhood system, \(\mathcal{N}_i\), on the lattice, \(\mathcal{I}\), composed of the neighboring sites of i. This neighborhood system is both noninclusive, i.e. \(i \notin \mathcal{N}_i\), and reciprocating, i.e. \(i \in \mathcal{N}_j \Leftrightarrow j \in \mathcal{N}_i\). As an MRF, X also satisfies the positivity and locality conditions, i.e.,

$$ \label{eq:mrf} p(\mathbf{x}) > 0, \,\, \forall\mathbf{x} $$
(6)

where \(\mathbf{x}\) is any particular labeling configuration on X (in other words, any labeling configuration on X is a priori possible). The MRF locality condition is then,

$$ \label{eq:cond} p\left(x_i | x_{\mathcal{I}-\{i\}}\right) = p\left(x_i | x_{\mathcal{N}_i}\right) $$
(7)

where \(x_{\mathcal{I}-\{i\}}\) is the labeling of the entire image lattice except at site i and \(x_{\mathcal{N}_i}\) is the labeling of \(\mathcal{N}_i\). This locality property enforces solely local considerations based on the neighborhood system in calculating the probability of the particular configuration, x. Following these two assumptions, the Hammersley–Clifford theorem provides the basis for treating the MRF distribution (cf. Eq. 6) as a Gibbs distribution (Besag 1974; Geman and Geman 1984), i.e.

$$ \label{eq:gibbs} p(\mathbf{x}) = Z^{-1} \exp\left(-U(\mathbf{x})\right) $$
(8)

with Z a normalization factor known as the partition function and \(U(\mathbf{x})\) the energy function, which can take several forms (Li 2001). In Atropos, as is the case with many other segmentation algorithms of the same family, we choose \(U(\mathbf{x})\) such that it is composed only of a sum over pairwise interactions between neighboring sites across the image (Footnote 3), i.e.

$$\label{eq:U} U(\mathbf{x}) = \beta \sum\limits_{i = 1}^N \sum\limits_{j \in \mathcal{N}_i} V_{ij}( x_i, x_j ) $$
(9)

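where \(V_{ij}\) is typically defined through an indicator, \(\delta_{ij}\), based on the classical Ising potential (also known as a Potts model) (Besag 1974). As used here, \(\delta_{ij}\) is the complement of the usual Kronecker delta on the labels: it vanishes when \(x_i = x_j\), so that homogeneous neighborhoods have lower energy,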

$$ \begin{aligned} V_{ij}(x_i, x_j) &= \delta_{ij} \\ &= \left\{ \begin{array}{ll} 0 & \text{if } x_i = x_j \\ 1 & \text{otherwise} \end{array} \right. \end{aligned} $$
(10)

and β is a granularity term which weights the contribution of the MRF prior on the segmentation solution. Since Atropos allows for non-uniform neighborhood systems and systems in which not just the immediate face-connected neighbors are considered, we use the modified function also used in Noe and Gee (2001), which weights the interaction term by the Euclidean distance, \(d_{ij}\), between interacting sites i and j such that

$$ V_{ij} = \frac{\delta_{ij}}{d_{ij}} $$
(11)

so that sites in the neighborhood closer to i are weighted more heavily than distant sites.
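The contribution of a single site to the energy of Eq. 9, with the distance-weighted potential of Eq. 11, can be sketched as follows for a 2-D label image. Unit voxel spacing and a square neighborhood are assumed, and the function name `local_mrf_energy` is hypothetical; this is not the Atropos code.

```python
import math

def local_mrf_energy(labels, row, col, candidate, radius=1, beta=1.0):
    """Energy of assigning `candidate` at (row, col): beta times the sum of 1/d_ij
    over neighbors whose label differs from the candidate (Eqs. 9-11)."""
    n_rows, n_cols = len(labels), len(labels[0])
    energy = 0.0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # a site is not a member of its own neighborhood
            r, c = row + dr, col + dc
            if 0 <= r < n_rows and 0 <= c < n_cols and labels[r][c] != candidate:
                energy += 1.0 / math.hypot(dr, dc)  # closer neighbors weigh more
    return beta * energy

# A 3 x 3 patch that is homogeneous except for one corner voxel.
patch = [[2, 1, 1],
         [1, 1, 1],
         [1, 1, 1]]
energy_label_1 = local_mrf_energy(patch, 1, 1, candidate=1)  # one diagonal neighbor differs
energy_label_2 = local_mrf_energy(patch, 1, 1, candidate=2)  # seven neighbors differ
```

Since the prior is proportional to \(\exp(-U)\) (Eq. 8), the candidate with the lower local energy, here label 1, is favored by the MRF term.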

Template-based Priors

A number of researchers have used templates to both ensure spatial coherence and incorporate prior knowledge in segmentation. A common technique is to select labeled subjects from a population from which a template is constructed (e.g. Avants et al. 2010b, which is also available in ANTs). Each labeling can then be warped to the template where the synthesis of warped labeled regions produces a prior probability map or prior label map encoding the spatial distribution of labeled anatomy which can be harnessed in joint segmentation/registration or Atropos/ANTs hybrids involving unlabeled subjects.

We employ the strategy given in Ashburner and Friston (2005) in which the stationary mixing proportions, \(\mathrm{Pr}(x_i = l_k) = \gamma_k\) (cf. Eq. 5), describing the prior probability that label \(l_k\) corresponds to a particular voxel, regardless of intensity, are replaced by the following spatially varying mixing proportions,

$$ \mathrm{Pr}(x_i = l_k) = \frac{\gamma_k t_{ik}}{\sum_{j=1}^K\gamma_j t_{ij}}. $$
(12)

Here \(t_{ik}\) is the prior probability value for label \(l_k\) at site i, which was mapped, typically by image registration, to the local image from a template data set. The user may also choose mixing proportions equal to

$$ \mathrm{Pr}(x_i = l_k) = \frac{t_{ik}}{\sum_{j=1}^K t_{ij}} $$
(13)

via the command line interface to the posterior formulation.
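The two choices of spatially varying mixing proportions (Eqs. 12 and 13) differ only in whether the global parameters \(\gamma_k\) enter the product. A minimal Python sketch for a single site is given below; the function name `spatial_mixing`, the numerical values, and the uniform fallback for sites without template support are assumptions made for illustration.

```python
def spatial_mixing(gammas, template_probs, use_gammas=True):
    """Mixing proportions at one site from template priors t_ik.

    use_gammas=True follows Eq. 12; use_gammas=False follows Eq. 13.
    """
    weights = [(g if use_gammas else 1.0) * t for g, t in zip(gammas, template_probs)]
    total = sum(weights)
    if total == 0.0:
        # Assumption: fall back to a uniform prior where the template gives no support.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

gammas = [0.2, 0.5, 0.3]  # hypothetical global mixing parameters
t_i = [0.1, 0.6, 0.3]     # hypothetical template prior values at site i

with_gammas = spatial_mixing(gammas, t_i)                       # Eq. 12
template_only = spatial_mixing(gammas, t_i, use_gammas=False)   # Eq. 13
```

In both cases the result is normalized over the K classes, so it can be used directly as the prior term at that site.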

Supervised Semi-interactive Segmentation

Brain segmentation methods have relied on user interaction for many years (Lim and Pfefferbaum 1989; Julin et al. 1997; Freeborough et al. 1997; Yushkevich et al. 2006). Atropos is capable of benefitting from user knowledge via an initialization and optimization that depends upon a spatially varying prior label image passed as input. Rapid, sparse labeling, with visualization provided by ITK-SNAP (www.itksnap.org), enables an interaction and execution processing loop that can be critical to solving segmentation problems with challenging clinical data in which automated approaches fail. This part of Atropos design is inspired by the interactive graph cuts pioneered by Boykov and Jolly (2001), which have spawned many follow-up applications. The Atropos prior label image prespecifies the segmentation results at a subset of the spatial domain by fixing the priors and likelihood (and, thus, the posterior) at a subset of \(\mathcal{I}\) to be 1 for the known label and 0 for each other label at the same site. The user input therefore not only initializes the optimization, but also gives boundary conditions that influence the EM solution outside of the known sites. While the graph-based min-cut max-flow solution is globally optimal for two labels, only locally optimal optimizers are available for 3 or more classes. Thus, in most practical applications, EM is a reasonable and efficient alternative to Boykov’s solution. Furthermore, one may automate the initialization process. We provide this capability to allow the user to implement an interactive editing and segmentation loop. The user may run Atropos with sparse manual label guidance, evaluate the results, update the manual labels and repeat until achieving the desired outcome. This processing loop may be performed easily with, e.g., ITK-SNAP.
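The effect of a sparse prior label image on the posteriors can be sketched as a simple clamping step. The convention that 0 marks an unlabeled site and 1..K mark user-supplied labels is an assumption of this example, as is the function name `clamp_posteriors`; the sketch only illustrates the idea of fixing known sites and is not the Atropos implementation.

```python
def clamp_posteriors(posteriors, prior_labels):
    """Fix the posterior at user-labeled sites to 1 for the known label and 0 otherwise.

    posteriors: one list of K class probabilities per site.
    prior_labels: one integer per site; 0 means unlabeled (assumed convention).
    """
    clamped = []
    for post, label in zip(posteriors, prior_labels):
        if label == 0:
            clamped.append(list(post))  # unlabeled site: keep the EM estimate
        else:
            clamped.append([1.0 if k == label - 1 else 0.0 for k in range(len(post))])
    return clamped

posteriors = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
prior_labels = [0, 3, 0]  # the user marked only the second site, as class 3
clamped = clamp_posteriors(posteriors, prior_labels)
```

Because clamped sites keep a fixed label, they act as boundary conditions: through the MRF term and the parameter updates, they influence the labels estimated at neighboring unlabeled sites.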

Optimization

Atropos uses expectation maximization to find a locally optimal solution for the user selected version of the Bayesian segmentation problem (cf. Eq. 1). After initial estimation of the likelihood model parameters, EM iterates between calculation of the missing optimal labels \(\hat{\mathbf{x}}\) and subsequent re-estimation of the model parameters by maximizing the expectation of the complete data log-likelihood (cf. Eq. 5). The expectation maximization procedure is derived in various publications including Zhang et al. (2001), which yields the optimal mean and variance (or covariance) but sets the mixing parameter \(\gamma_k\) as a constant. The Atropos implementation estimates \(\gamma_k\) at each iteration, similar to Ashburner and Friston (2005) (Footnote 4). When spatial coherence constraints are included as an MRF prior in Atropos, the optimal segmentation solution becomes intractable (Footnote 5). Although many optimization techniques exist, each with its characteristic advantages and disadvantages in terms of computational complexity and accuracy (see the introduction in Marroquin et al. (2002) for a concise summary of the myriad optimization possibilities), Atropos uses the well-known Iterated Conditional Modes (ICM) (Besag 1986), which is greedy, computationally efficient and provides good performance. The EM employed in Atropos may therefore be written as a series of steps:

Initialization

In all cases, the user defines the number of classes to segment. The simplest initialization is by the classic K-means or Otsu thresholding algorithms with only the number of classes specified by the user. Otherwise, the user must provide prior information for each class in the form of either a single n-ary prior label image or a series of prior probability images, one for each class. The initialization also provides starter parameters.
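As an illustration of a K-means initialization that also yields starter parameters, the sketch below clusters scalar intensities and then derives a mixing proportion, mean and variance per class. The evenly spaced starting centers, the function names and the toy data are assumptions of this example; the Atropos K-means initialization may differ in its details.

```python
def kmeans_1d(values, k, n_iter=50):
    """Cluster scalar intensities into k groups; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Assumption: start from centers evenly spaced over the intensity range.
    centers = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels

def starter_parameters(values, labels, k):
    """Per-class (mixing proportion, mean, variance); assumes no class is empty."""
    params = []
    for j in range(k):
        members = [v for v, lab in zip(values, labels) if lab == j]
        mu = sum(members) / len(members)
        var = sum((v - mu) ** 2 for v in members) / len(members)
        params.append((len(members) / len(values), mu, var))
    return params

values = [28.0, 30.0, 32.0, 78.0, 80.0, 82.0, 118.0, 120.0, 122.0]
centers, labels = kmeans_1d(values, k=3)
params = starter_parameters(values, labels, k=3)
```

The resulting triples play the role of the initial \(\gamma_k\), \(\mu_k\) and \(\sigma_k^2\) from which the first label update can proceed.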

Label Update (E-Step)

Given the initialization and fixed model parameters, Atropos is capable of updating the current label estimates using either a synchronous or asynchronous scheme. The former is characterized by iterating through the image and determining which label maximizes the posterior probability without updating any labels until all voxels in the mask have been visited, at which point all the voxel labels are updated simultaneously (hence the descriptor “synchronous”). This option is specified with --icm [0]. However, unlike asynchronous schemes characteristic of ICM, synchronous updates lack convergence guarantees. To determine the labeling which maximizes the posterior probability for the asynchronous option, an “ICM code” image is created once for all iterations by iterating through the image and assigning an ICM code label to each voxel in the mask such that each MRF neighborhood has a non-repeating code label set. Thus each masked voxel in the ICM code image is assigned a value in the range {1,...,C} where C is the maximum code label. Such an image can be created and viewed with Atropos by assigning a valid filename in the --icm [1] set of options. An example adult brain slice and the associated code image are given in Fig. 1 for an MRF neighborhood of 5 × 5 pixels. This produces a maximum code label of ‘13’. For each iteration, one has the option to permute the set {1,...,C} which prescribes the order in which the voxel labels are updated asynchronously. After the first pass through the set of code labels, additional passes can further increase the posterior probability until convergence (in ∼5 iterations). One can specify a maximum number of these “ICM iterations” on the command line. For our example in Fig. 1, this means that for each ICM iteration, we would iterate through the image 13 times only updating those segmentation labels associated with the current ICM code.

Fig. 1

An adult brain image slice is shown with its ICM code image corresponding to a 5×5 MRF neighborhood. To the right of the ICM code image, we focus on a single neighborhood with a center voxel associated with the ICM code label of ‘10’. Each center voxel in a specified neighborhood exhibits a unique ICM code label which does not appear elsewhere in its neighborhood. When performing the segmentation labeling update for ICM, we iterate through the set of ICM code labels and, for each code label, we iterate through the image and update only those voxels associated with the current code label
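The property required of an ICM code image, namely that no voxel shares its code with another voxel inside its own neighborhood, can be illustrated with a simple periodic tiling. This tiling is a substitute chosen for brevity and is not the assignment used by Atropos: for a 5 × 5 neighborhood it produces 9 codes, whereas the code image in Fig. 1 uses 13. The sketch also ignores the mask, and both function names are hypothetical.

```python
def icm_code_image(n_rows, n_cols, radius):
    """Assign codes 1..(radius + 1)**2 by tiling; voxels with equal codes are
    at least radius + 1 apart along some axis, hence never mutual neighbors."""
    period = radius + 1
    return [[(r % period) * period + (c % period) + 1 for c in range(n_cols)]
            for r in range(n_rows)]

def neighbors_share_code(codes, radius):
    """Return True if any voxel shares its code with a voxel in its neighborhood."""
    n_rows, n_cols = len(codes), len(codes[0])
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        if codes[rr][cc] == codes[r][c]:
                            return True
    return False

codes = icm_code_image(10, 12, radius=2)  # radius 2 corresponds to a 5 x 5 neighborhood
n_codes = max(max(row) for row in codes)
```

With such an image, one asynchronous ICM iteration visits the codes in turn (optionally in permuted order) and, for each code, updates only the voxels carrying it, so that no two voxels updated in the same pass influence each other through the MRF term.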

Parameter Update (M-Step)

Note that the posteriors computed in the previous iteration are used to estimate the parameters at the current iteration. We use a common and elementary estimate of the mixing parameters:

$$ \gamma_k \leftarrow \frac{1}{N} \sum\limits_{i=1}^N p_k(l_k|y_i). $$
(14)

We update the parametric model parameters by computing, for each of K labels, the mean,

$$ \mu_k \leftarrow \frac{ \sum_{i=1}^N y_i p_k(l_k|y_i)}{ \sum_{i=1}^N p_k(l_k|y_i) } $$
(15)

and variance,

$$ \sigma^2_k \leftarrow \frac{ \sum_{i=1}^N (y_i - \mu_k )^T p_k(l_k|y_i) (y_i - \mu_k )}{ \sum_{i=1}^N p_k(l_k|y_i) }. $$
(16)

The latter two quantities are modified, respectively, in the case of multivariate data as follows:

$$ \boldsymbol{\mu}_k \leftarrow \frac{ \sum_{i=1}^N \mathbf{y}_i p_k(l_k|\mathbf{y}_i)}{ \sum_{i=1}^N p_k(l_k|\mathbf{y}_i) } $$
(17)

and the kth covariance matrix, \(\boldsymbol{\Sigma}_k\), is calculated from

$$ \boldsymbol{\Sigma}_{k} \leftarrow \frac{ \sum_{i=1}^N p_k(l_k|\mathbf{y}_i) ( \mathbf{y}_{i} - \boldsymbol{\mu}_{k} )^{\mathrm{T}} ( \mathbf{y}_{i} - \boldsymbol{\mu}_{k} )}{1 - \sum_{i=1}^N p^2_k(l_k|\mathbf{y}_{i}) }. $$
(18)

This type of update is known as soft EM. Hard EM, in contrast, only uses sites containing label \(l_k\) to update the parameters for the kth model. A similar pattern is used in non-parametric cases.
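The univariate updates of Eqs. 14 to 16, and the difference between soft and hard EM, can be sketched in a few lines of Python. The function names `m_step` and `harden` and the two-voxel example are hypothetical; the sketch assumes every class receives non-zero total posterior weight.

```python
def m_step(intensities, posteriors):
    """Soft-EM parameter update for scalar data (Eqs. 14-16).

    posteriors[i][k] holds p_k(l_k | y_i); returns (gammas, mus, sigma2s).
    """
    n, k = len(intensities), len(posteriors[0])
    gammas, mus, sigma2s = [], [], []
    for c in range(k):
        w = [posteriors[i][c] for i in range(n)]
        w_sum = sum(w)  # assumed non-zero for every class
        mu = sum(wi * y for wi, y in zip(w, intensities)) / w_sum
        sigma2 = sum(wi * (y - mu) ** 2 for wi, y in zip(w, intensities)) / w_sum
        gammas.append(w_sum / n)
        mus.append(mu)
        sigma2s.append(sigma2)
    return gammas, mus, sigma2s

def harden(posteriors):
    """Replace each posterior vector by a one-hot vector at its maximum (hard EM)."""
    hard = []
    for p in posteriors:
        best = p.index(max(p))
        hard.append([1.0 if c == best else 0.0 for c in range(len(p))])
    return hard

intensities = [0.0, 10.0]
soft = [[0.75, 0.25], [0.25, 0.75]]
soft_params = m_step(intensities, soft)
hard_params = m_step(intensities, harden(soft))
```

In the soft update both voxels contribute to both classes in proportion to their posteriors, whereas after hardening each class is estimated only from the voxels assigned to it.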

EM will iterate toward a local maximum. We track convergence by summing up the maximum posterior probability at each site over the segmentation domain. The E-step, above, depends upon the selected coding strategy (Besag 1986). Atropos may use either a classical, sequential checkerboard update or a synchronous update of the labels, the latter of which is commonly used in practice. Synchronous update does not guarantee convergence but we employ it by default due to its intrinsic parallelism and speed. The user may alternatively select checkerboard update if theoretical convergence guarantees are desired. However, we have not identified performance differences, relative to ground truth, that convince us of the absolute superiority of one approach over the other.
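Putting the pieces together, a bare-bones EM loop with the convergence measure described above (the sum over sites of the maximum posterior) might look as follows. This sketch deliberately omits the MRF and template priors, uses scalar Gaussian models only, and adds a small variance floor to avoid degenerate classes; those simplifications, the tolerance, and all names are assumptions of the example rather than Atropos behavior.

```python
import math

def gaussian(y, mu, sigma2):
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def e_step(ys, gammas, mus, sigma2s):
    """Posterior p(l_k | y_i), proportional to gamma_k * p_k(y_i | l_k)."""
    posteriors = []
    for y in ys:
        joint = [g * gaussian(y, m, s2) for g, m, s2 in zip(gammas, mus, sigma2s)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors

def m_step(ys, posteriors, floor=1e-6):
    """Soft parameter update (Eqs. 14-16) with an assumed variance floor."""
    n, k = len(ys), len(posteriors[0])
    gammas, mus, sigma2s = [], [], []
    for c in range(k):
        w = [p[c] for p in posteriors]
        w_sum = sum(w)
        mu = sum(wi * y for wi, y in zip(w, ys)) / w_sum
        var = sum(wi * (y - mu) ** 2 for wi, y in zip(w, ys)) / w_sum
        gammas.append(w_sum / n)
        mus.append(mu)
        sigma2s.append(max(var, floor))
    return gammas, mus, sigma2s

def run_em(ys, gammas, mus, sigma2s, max_iter=100, tol=1e-6):
    previous = None
    for iteration in range(1, max_iter + 1):
        posteriors = e_step(ys, gammas, mus, sigma2s)
        score = sum(max(p) for p in posteriors)  # convergence measure
        gammas, mus, sigma2s = m_step(ys, posteriors)
        if previous is not None and abs(score - previous) < tol:
            break
        previous = score
    return gammas, mus, sigma2s, score, iteration

ys = [18.0, 19.0, 20.0, 21.0, 22.0, 58.0, 59.0, 60.0, 61.0, 62.0]
gammas, mus, sigma2s, score, n_iterations = run_em(
    ys, gammas=[0.5, 0.5], mus=[10.0, 70.0], sigma2s=[100.0, 100.0])
```

On this well-separated toy data the loop settles within a few iterations; with overlapping classes or spatial priors the same structure applies, but the label update would additionally include the prior terms described earlier.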

Implementation

Organization of the implementation section roughly follows that of the theory section.

The Atropos User Interface

As with other classes that comprise ANTs, Atropos uses the Insight Toolkit as a developmental foundation. This allows us to take advantage of the mature portions of ITK (e.g. image IO) and ensures the integrity of the ancillary processes such as those facilitated by the underlying statistical framework. Although Atropos is publicly distributed with the rest of the ANTs package, we plan to contribute its core elements to the Insight Toolkit where it can be vetted and improved by other interested researchers.

An overview of Atropos components can be gleaned, in part, from the flowchart depicted in Fig. 2. Given a set of input images and a mask image, each input image is preprocessed using N4 to correct for intensity inhomogeneity. For our brain processing pipeline, the mask is usually obtained from the standard skull-stripping preprocessing step which also uses Atropos. Initialization can be performed in various ways, ranging from standard clustering techniques, such as K-means, to prior-based images. This initialization is used to provide the initial estimate of the parameters of the likelihood model for each class. These likelihoods combine with the current labeling to generate the current estimate of the posterior probabilities at each voxel for each class. At each iteration, one can also integrate N4 by using the current posterior probability estimation of the white matter to update the estimate of the bias field.

Fig. 2

Flowchart illustrating Atropos usage typically beginning with bias correction via N4. Initialization provides an estimate before the iterative optimization in which the likelihood models for each class are tabulated from the current estimate followed by a recalculation of the posterior probabilities associated with each class. The multiple options associated with the different algorithmic components are indicated by the colored rounded rectangles connected to their respective core Atropos processes via curved dashed lines

To provide a more intuitive interface without the overhead costs of a graphical user interface, a set of unique command line parsing classes was developed which can also provide insight into the functionality of Atropos. The short version of the command line help menu is given in Listing 1 which is invoked by typing ‘Atropos -h’ at the command prompt. Both short and long option flags are available and each option has its own set of possible values and parameters introduced in a more formal way in both the previous discussion and related papers cited in the introduction. Here we describe these options from the unique perspective of implementation.

Listing 1

Atropos short command line menu which is invoked using the ‘-h’ option. The expanded menu, which provides details regarding the possible parameters and usage options, is elicited using the ‘--help’ option

Initializing the Atropos Objective

Atropos has a number of parameters defined within Listing 1 and will function on 2, 3 or 4 dimensional data. However, the majority of the time, users will be concerned with a smaller set of input parameters. Here, we list the recommended input and an example definition for each parameter:

Listing 2

N4 short command line menu which is invoked using the ‘-h’ option. The expanded menu, which provides details regarding the possible parameters and usage options, is elicited using the ‘--help’ option

Input images to be segmented

If more than one input image is passed, then a multivariate model will be instantiated. E.g. -a Image.nii.gz for one image and -a Image1.nii.gz -a Image2.nii.gz for multiple images.

Input image mask

This binary image defines the spatial segmentation domain. Voxels outside the masked region are designated with the label 0. E.g. -x mask.nii.gz.

Convergence criteria

The algorithm terminates if it reaches the maximum number of iterations or produces a change less than the minimum threshold change in the posterior. E.g. -c [5,1.e-5].
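The two stopping criteria can be sketched as a simple loop. This is an illustration only: `step` is a hypothetical callable standing in for one EM iteration, and we assume the change is measured as the absolute difference between successive values of the convergence measure, which the text does not specify.

```python
def run_until_converged(step, max_iterations=5, min_change=1e-5):
    """Call `step()` repeatedly and stop on either criterion.

    `step` is a hypothetical callable that performs one iteration and
    returns the current value of the convergence measure.
    Returns the list of values observed, in order.
    """
    history = []
    for _ in range(max_iterations):
        value = step()
        # Assumed rule: stop once the change between successive values
        # falls below the minimum threshold.
        converged = bool(history) and abs(value - history[-1]) < min_change
        history.append(value)
        if converged:
            break
    return history


# Stand-in for EM: a fixed sequence of measure values (illustrative only).
values = iter([10.0, 12.0, 12.5, 12.500001, 12.6])
history = run_until_converged(lambda: next(values), max_iterations=5, min_change=1e-5)
```

With the values above the loop stops after the fourth iteration because the change drops below the threshold; with a threshold of 0, as in -c [5,0], only the iteration limit would apply.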

MRF prior

The key parameter to increase or decrease the spatial smoothness of the label map is β. A useful range of β values is 0 to 0.5 where we usually use 0.05, 0.1 or 0.2 in brain segmentation. E.g. -m [0.1, 1x1x1] would define β = 0.1 with a MRF radius of one voxel in each of three dimensions.

Initialization

The initialization options include (where the first parameter defines K, here 3 for each below),

  • -i Kmeans[3] standard K-means initialization for three classes,

  • -i PriorLabelImage[3,label_image.nii.gz] and

  • -i PriorProbabilityImages[3,label_prob%02d.nii.gz,w] where w = 0 (use the prior probability images only for initialization) or w > 0.0 (use the prior probability images throughout the optimization). If one chooses 0 < w < 1.0 then one will increase (from zero) the weight on the priors. These images, like the PriorLabelImage, should be defined with the same domain as the input images to be segmented.

Posterior formulation

The user may choose to estimate the mixture proportions (or not) by setting -p Socrates[1] or -p Socrates[0]. Fixed label boundary conditions may be employed by selecting the PriorLabelImage initialization and -p Plato[0].

Output

Atropos will output the hard segmentation and the probability image for each model. E.g. -o [segmentation.nii.gz,seg_prob%02d.nii.gz] will write out the hard segmentation in the first output parameter and a probability image for each class named, here, seg_prob01.nii.gz, seg_prob02.nii.gz, etc.
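The file naming pattern follows the usual printf convention, so the expected output names can be listed ahead of time. The sketch below is only a convenience for scripting around Atropos; the helper name is hypothetical and the 1-based numbering is taken from the example names given above.

```python
def expand_pattern(pattern, number_of_classes):
    """Expand a printf-style pattern into one file name per class.

    Numbering starts at 1, matching seg_prob01.nii.gz, seg_prob02.nii.gz, ...
    """
    return [pattern % k for k in range(1, number_of_classes + 1)]


names = expand_pattern("seg_prob%02d.nii.gz", 3)
```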

Higher dimensions than 4 are possible although we have not yet encountered such an application-specific need. Multiple images (assumed to be of the same dimension, size, origin, etc.), will automatically enable multivariate likelihoods. In that case, the first image specified on the command line is used to initialize the Random, Otsu, or K-means labeling with the latter initialization refined by incorporating the additional intensity images, i.e. an initial univariate K-means clustering is determined from the first intensity image which, along with the other images, provides the starting multivariate cluster centers for a follow-up multivariate K-means labeling. More details on each of the key implementation options are given below.

Likelihood Implementation

As mentioned previously in the introduction, different groups have opted for different likelihood models which have included either parametric (Gaussian) or non-parametric variations. However, these approaches are similar in that they require a list sample of intensity data from the input image(s) and a list of weighting values for each observation of the list sample from which the model is constructed. In general, one may query model probabilities by passing a given pixel’s single intensity (for univariate segmentation) or multiple intensities (for multivariate segmentation) to the modeling function, regardless of whether the function is parametric or non-parametric. These similarities permitted the creation of a generic plug-in architecture where classes describing both parametric and non-parametric observational models are all derived from an abstract list sample function class. Three likelihood classes have been developed, one parametric and two non-parametric, and are available for usage although one of the non-parametric classes is still in experimental development. The plug-in architecture even permits mixing likelihood models with different classes during the same run for a hybrid parametric/non-parametric model although this possibility has yet to be fully explored.

If the Gaussian likelihood model is chosen, the list sample of intensity values and corresponding weights comprised of the posterior probabilities are used to estimate the Gaussian model parameters, i.e. the mean and variance. For the non-parametric model, the list sample and posteriors are used in a Parzen windowing scheme on a weighted histogram to estimate the observational model (Awate et al. 2006).
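For the univariate Gaussian case, the estimation step amounts to a weighted mean and variance. The following is a minimal sketch rather than the Atropos implementation; it assumes the variance is normalized by the sum of the weights, and the function name is hypothetical.

```python
def weighted_gaussian_parameters(samples, weights):
    """Estimate mean and variance of a list sample with per-sample weights.

    `samples` are intensities; `weights` are the posterior probabilities
    of one class at the corresponding voxels.
    Assumption: variance is normalized by the sum of the weights.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(w * x for x, w in zip(samples, weights)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(samples, weights)) / total
    return mean, variance


# Illustrative data: the last voxel has zero posterior for this class,
# so it does not influence the estimate.
intensities = [1.0, 2.0, 3.0, 10.0]
posterior_for_class = [1.0, 1.0, 1.0, 0.0]
mean, variance = weighted_gaussian_parameters(intensities, posterior_for_class)
```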

Prior Probability Models

Label Regularity

Consistent with our previous discussion, we offer both an MRF-based prior probability for modeling spatial coherence and the possibility of specifying a set of prior probability maps or a prior label map with the latter extendable to creating a dense labeling. To invoke the MRF ‘-m/--mrf’ option, one specifies the smoothing factor (or the granularity parameter, β, given in Eq. 9) and the radius (in voxels) of the neighborhood system using the vector notation ‘1x1x1’ for a neighborhood radius of 1 in all 3 dimensions. This radius is defined such that voxels including but not limited to those that are face-connected will influence the MRF.
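To make the radius notation concrete, the sketch below enumerates the voxel offsets covered by a box-shaped neighborhood of a given radius. It assumes the neighborhood is the full box around the center voxel, which is consistent with the statement that more than the face-connected voxels contribute; the helper names are hypothetical.

```python
from itertools import product


def parse_radius(text):
    """Parse the vector notation, e.g. '1x1x1' -> (1, 1, 1)."""
    return tuple(int(r) for r in text.split("x"))


def neighborhood_offsets(radius):
    """All integer offsets within the box of the given radius, center excluded."""
    ranges = [range(-r, r + 1) for r in radius]
    return [offset for offset in product(*ranges) if any(offset)]


offsets = neighborhood_offsets(parse_radius("1x1x1"))
# Face-connected neighbors differ from the center in exactly one coordinate by 1.
face_connected = [o for o in offsets if sum(abs(c) for c in o) == 1]
```

Under this assumption a radius of ‘1x1x1’ gives 26 neighbors, of which 6 are face-connected, and ‘1x1’ in 2-D gives 8.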

Registration and Probability Maps

Image registration enables one to transfer information between spatial domains which may aid in both segmentation and bias correction. We rely heavily on template-building strategies (Avants et al. 2010a, b) which are also offered in ANTs. Since aligned prior probability images and prior label maps are often associated with such templates, Atropos can be initialized with these data with their influence regulated by a prior probability weighting term. Although prior label maps can be specified as a single multi-label image, prior probability data are often represented as multiple scalar images with a single image corresponding to a particular label. For relatively small classifications, such as the standard 3-tissue segmentation (i.e. white matter, gray matter, and cerebrospinal fluid), this does not typically present computational complexities using modern hardware. However, when considering dense cortical parcellations where the number of labels can range upwards of 74 per hemisphere (Destrieux et al. 2010), the memory load can be prohibitive if all label images are loaded into run-time memory simultaneously. A major part of minimizing memory usage in Atropos, which corresponds to the boolean ‘-u/--minimize-memory-usage’ option, is the sparse representation of each of the prior probability images. Motivated by the observation that these spatial prior probability maps tend to be highly localized for large quantities of cortical labels, a threshold is specified on the command line (default = 0.0) and only those probability values which exceed that threshold are stored in the sparse representation. During the course of optimization, the prior probability image for a given label is reconstructed on the fly as needed. For instance, the NIREP (www.nirep.org) evaluation images are on the order of 300 × 300 × 256 with 32 cortical labels. Our novel memory minimizing image representation typically shrinks run-time memory usage from a peak of 10+ GB to approximately 1.5 GB and enables these datasets to be used for training/prior-based cortical parcellation.
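The idea behind the sparse representation can be sketched in a few lines. This is a simplified illustration on a flattened image, not the data structure used in Atropos; the function names and the dictionary-based storage are assumptions.

```python
def to_sparse(probabilities, threshold=0.0):
    """Keep only (voxel index, value) pairs whose value exceeds the threshold."""
    return {i: p for i, p in enumerate(probabilities) if p > threshold}


def to_dense(sparse, size):
    """Reconstruct the full prior probability image; unstored voxels are zero."""
    dense = [0.0] * size
    for i, p in sparse.items():
        dense[i] = p
    return dense


# A highly localized prior for one label (illustrative values, flattened image).
prior = [0.0, 0.0, 0.9, 0.4, 0.0, 0.0, 0.0, 0.05]
sparse = to_sparse(prior, threshold=0.0)
restored = to_dense(sparse, len(prior))
```

With the default threshold of 0.0 the reconstruction is exact; a larger threshold trades a small loss of low-probability values for further memory savings.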

Integrating N4 Bias Correction

Assumptions about bias correction may be thought of as another prior model. As such, the typical segmentation processing pipeline begins with an intensity normalization/bias correction step using a method such as the recently developed N4 algorithm (Tustison et al. 2010a). N4 extends the popular nonparametric nonuniform intensity normalization (N3) algorithm (Sled et al. 1998) in two principal ways:

  • We replace the least squares B-spline fitting with a parallelizable alternative (which we also made publicly available in the Insight Toolkit)— the advantages being that 1) computation is much faster and 2) smoothing is not susceptible to outliers as is characteristic with standard least squares fitting algorithms.

  • The fitting algorithm permits a multi-resolution approach. Whereas standard N3 practice is to select a single resolution at which bias correction occurs, the N4 framework permits a base resolution to be chosen with correction then occurring at multiple resolution levels, each level having twice the resolution of the previous level.

Specifically, with respect to segmentation, there exists a third advantage with N4 over N3 in that the former permits the specification of a probabilistic mask as opposed to a binary mask. Recent demonstrations suggest improved white matter segmentation produces better gain field estimates using N3 (Boyes et al. 2008). Thus, when performing 3-tissue segmentation, we may opt to use, for instance, the posterior probability map of white matter at the current iteration as a weighted mask for input to N4. This is done by setting the ‘--weight-image’ option on the N4 command line call (see Listing 2) to the posterior probability image corresponding to the white matter produced as output in the Atropos call, i.e. ‘Atropos --output’. N4 was recently added to the Insight Toolkit repository (Footnote 6) where it is built and tested on multiple platforms nightly. The evaluation section will illustrate inclusion of Atropos, N4 and ANTs in a brain processing pipeline.

Running the Atropos Optimization

The Atropos algorithm is cross-platform and compiles on, at minimum, modern OSX, Windows and Linux-based operating systems. The user interface may be reached through the operating system’s user terminal. Because of its portability and low-level efficiency, Atropos can easily be called from within other packages, such as Matlab or Slicer, or, alternatively, integrated at compile time as a library. A typical call to the algorithm, illustrated here with ANTs example data, is: Atropos -d 2 -a r16slice.nii.gz -i kmeans[3] -c [5,0] -x mask.nii.gz -m [0.2,1x1] -o [r16_seg.nii.gz,r16_prob_%02d.nii.gz]. In this case, Atropos will output the segmentation image, the per-class probability images and a listing of the parameters used to set up the algorithm. A useful feature is that one may re-initialize the Atropos EM via the -i PriorProbabilityImages[...] option. This feature allows one to compute an initial segmentation via K-means, alter the output probabilities by externally computed functions (e.g. Gaussian smoothing, image similarity or edge maps) and re-estimate the segmentation with the modified priors. Finally, the functionality that is available to parametric models is equally available to the non-parametric models enabled by Atropos.
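When calling Atropos from a script, it can help to assemble the call as an argument list so that the square brackets are not interpreted by a shell. The sketch below only builds the command from the example above; the helper name is hypothetical, and it uses no options beyond those already shown in the text.

```python
import shlex


def build_atropos_command(dimension, images, mask, initialization,
                          convergence, mrf, output):
    """Assemble the example Atropos call as a list of arguments.

    Each image adds one '-a' option, so passing several images mirrors
    the multivariate usage described earlier.
    """
    command = ["Atropos", "-d", str(dimension)]
    for image in images:
        command += ["-a", image]
    command += ["-i", initialization, "-c", convergence,
                "-x", mask, "-m", mrf, "-o", output]
    return command


command = build_atropos_command(
    dimension=2,
    images=["r16slice.nii.gz"],
    mask="mask.nii.gz",
    initialization="kmeans[3]",
    convergence="[5,0]",
    mrf="[0.2,1x1]",
    output="[r16_seg.nii.gz,r16_prob_%02d.nii.gz]",
)
# Shell-quoted form, useful for logging what will be run.
print(shlex.join(command))
```

If the Atropos executable is installed and on the search path, the list can be handed to `subprocess.run(command, check=True)`; that step is omitted here because it depends on the local installation and data.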

Evaluation

Atropos encodes a family of segmentation techniques that may be instantiated for different applications but here we evaluate only two of the many possibilities. First, we perform an evaluation on the BrainWeb dataset using both the standard T1 image with multiple bias and noise levels and also the BrainWeb20 data (Aubert-Broche et al. 2006; Battaglini et al. 2008). In combination, these data allow one to vary not only noise and bias but also the underlying anatomy. Second, we evaluate the use of Atropos in improving whole-brain parcellation and exercise its ability to efficiently solve many-class expectation maximization problems. We choose this evaluation problem in part to illustrate the flexibility of Atropos and also the benefits of the novel, efficient implementation that allows many-class problems to be solved with low memory usage (<2 GB for a 69-class model on 1 mm³ brain data).

BrainWeb Evaluation

The BrainWeb data is freely available at http://mouldy.bic.mni.mcgill.ca/BrainWeb/. We employ both the individual subject data and the BrainWeb20 data in this evaluation.

Single-subject Evaluation

We use the single-subject data with 3% noise and three levels of bias referred to as 0, 20 and 40% RF inhomogeneity. We study the effect of the MRF prior term and initialization on the Dice overlap between ground truth and the segmentation result for each tissue. We test both K-means and prior label image initialization with MRF β ∈ {0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30} at each bias field. We also feed the white matter probability map derived from K-means into N4 to guide the bias correction (Footnote 7). Segmentation is then repeated, with the same parameters, but with the N4-corrected image as input. The resulting algorithm is similar to those that fix segmentation parameters while estimating bias and fix bias while estimating segmentation parameters. Thus, with this simple evaluation, we are able to compare the impact of bias on the combination of N4 and Atropos and also the validity of our prior label image initialization. Results of these evaluation scenarios, in terms of Dice overlap, are shown in Fig. 3. Because overlap ratios with N4 bias correction approximate those of the zero bias data, we may conclude that simple N4 pre-processing is adequate to correct even the 40% RF bias level. An example of this procedure, using BrainWeb data with 40% RF bias, is in Fig. 4. We supply the information necessary to repeat the results in this figure in the script entitled ‘atroposBwebRF40FigureExample.sh’ which is available in the ANTs Atropos documentation folder as of SVN commit 711. The script may be easily modified to run the whole evaluation. Figure 4 shows the results of simultaneously using proton density and T1-weighted BrainWeb data to perform the segmentation. This multivariate input data outperforms the univariate T1-weighted data alone.
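For reference, the Dice overlap for a given tissue label is twice the number of voxels carrying that label in both images divided by the sum of the label's voxel counts in each image. A minimal sketch on flattened label images follows; the function name and the toy data are illustrative and are not part of the evaluation scripts.

```python
def dice_overlap(segmentation, ground_truth, label):
    """Dice overlap of one label between two equally sized label images."""
    in_seg = sum(1 for s in segmentation if s == label)
    in_truth = sum(1 for g in ground_truth if g == label)
    in_both = sum(1 for s, g in zip(segmentation, ground_truth)
                  if s == label and g == label)
    if in_seg + in_truth == 0:
        raise ValueError("label is absent from both images")
    return 2.0 * in_both / (in_seg + in_truth)


# Flattened toy label images; 0 is background, 1-3 are tissue labels.
segmentation = [1, 1, 2, 2, 3, 3, 3, 0]
ground_truth = [1, 2, 2, 2, 3, 3, 1, 0]
per_tissue = [dice_overlap(segmentation, ground_truth, k) for k in (1, 2, 3)]
average = sum(per_tissue) / len(per_tissue)
```

Averaging the per-tissue values, as in the last line, corresponds to the average 3-tissue Dice overlap quoted for Fig. 4.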

Fig. 3

BrainWeb single-subject results for each tissue. The results show that N4 bias correction, combined with Atropos, results in a minimal effect of bias, even at the 40% level. The optimal β for the MRF term appears to be between 0.1 and 0.2. The legend is in the same position in each graph, allowing a visual comparison of the results. As one may see, the N4-assisted overlap values are consistent across bias field/RF inhomogeneity

Fig. 4

We combine N4 and Atropos by simple sequential processing and apply to BrainWeb T1-weighted single-subject data with 40% RF bias and 3% noise. The β for the MRF term is, here, 0.2. Slice 71 of the input data is in a. The initial K-means (K = 3) segmentation is in panel b. We use the brain mask to guide N4 bias correction and produce the image in c. We repeat the K-means segmentation, but with the N4-corrected image as input and produce the segmentation in d. The average 3-tissue Dice overlap of result b is 0.906 while the average overlap for d is 0.954. Arrows highlight a region of large before-after segmentation discrepancy. In e we see the BrainWeb proton density image with no inhomogeneity and 3% noise. Its segmentation is in f with average 3-tissue Dice overlap of 0.895. In g we use both proton density data and T1 data as multiple modality input to Atropos. The segmentation of this two-modality input data, using a multivariate Gaussian model, produces average 3-tissue Dice overlap of 0.958, which exceeds the univariate solution. An arrow highlights one region where there is small, visually recognizable improvement in sulcal segmentation relative to the result from T1 data alone. A second area of improvement is the putamen segmentation. The ground truth segmentation is in h. The multivariate segmentation result, in combination with the low PD segmentation performance, suggests PD and T1 provide complementary information that may improve 3-tissue segmentation and serves to validate the multivariate Atropos implementation. In this case, the benefit is likely to derive from the fact that the PD image has no bias

20-subject Evaluation

The single-subject BrainWeb study in the previous section tested the basic Atropos options and the benefit of N4 for segmentation in the presence of bias. The 20 subject BrainWeb data allows us to use 2-fold cross-validation to test our ability to segment different individuals reliably. In this study, we divide the 20 subjects equally into training and testing groups. We then exploit the ground-truth labeling of the training data to build both a group template (Avants et al. 2010b) and also prior probability maps for each of the three major tissues in the cerebrum. Each prior probability map is gained by deforming the ground truth labels from each of the 10 training subjects to their template and averaging component by component. We then deform the template—and priors—to the ten testing subjects and run Atropos with not only KMeans[3] initialization but also PriorProbabilityImages[3,priors%02d.nii.gz,w] where w ∈ {0.0,0.5}. We then switch the roles of testing and training sets to gain 3-tissue segmentation for each of the twenty subjects. When w = 0, the priors are only used in initializing the model parameters but not during subsequent EM iterations. When w = 0.5, the priors are maintained in the product with the likelihood during all EM iterations. Results, in terms of bar plots for Dice overlap mean and standard deviation, are shown in Fig. 5.

Fig. 5

BrainWeb 20-subject results for each tissue as a function of MRF-β parameter where MRF-β is in {0,0.05,0.1,0.15,0.2} and increases left to right. The results show that the PriorProbabilityImages with w = 0.5 (far right) gives the best performance for all tissues

The Hammers Dataset Evaluation

We evaluate the ability to improve multi-template labeling results by converting the group labels to probability maps and using them to drive many-class EM segmentation. The ground truth labels cover 69 classes and much of the brain. Some unlabeled regions remain which we assign to label 69 such that all brain parenchyma contains a unique label. Following Avants et al. (2010a), the initialization of our evaluation applies the script ants_multitemplate_labeling.sh (available in ANTs) to the 19 Hammers evaluation datasets located at http://www.brain-development.org/ and currently under the adult atlases section (Hammers et al. 2003; Heckemann et al. 2006). These initial majority voting results are competitive with prior work (Heckemann et al. 2006, 2010) and serve as a baseline against which we compare.

We first convert each of the 69 labels within the original evaluation dataset to an individual image. The remaining steps, summarized briefly, are the same for each of the 19 subjects. We select one subject as an unlabeled target. The other 18 datasets are then mapped (as in the script above) to that subject. We then deform, individually, the 69 × 18 label images to the unlabeled subject. The label probability map is gained by averaging the 18 deformed images associated with each label. We repeat this for each subject. The following parameters are the most relevant to this discussion: -i PriorProbabilityImages[69,label_prob%02d.nii.gz,0.5] -m [0.2,1x1x1] -c [5,0] -p Socrates[1]. Results, in terms of Dice overlap, are shown in Fig. 6.
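The averaging step, and the majority voting baseline it is compared against, can be sketched on flattened toy images as follows. This is an illustration under simplifying assumptions: the images are taken to be already deformed into the target space, the function names are hypothetical, and ties in the vote are broken by order of appearance, which need not match the ANTs script.

```python
from collections import Counter


def average_images(images):
    """Voxelwise mean of equally sized images (one per mapped subject)."""
    n = len(images)
    return [sum(values) / n for values in zip(*images)]


def majority_vote(label_images):
    """Most frequent label at each voxel across the mapped label images.

    Assumption: ties go to the label seen first among the inputs.
    """
    return [Counter(values).most_common(1)[0][0] for values in zip(*label_images)]


# Deformed binary images of ONE label from three mapped subjects (toy data).
deformed_label_1 = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
probability_label_1 = average_images(deformed_label_1)

# Deformed multi-label images from the same three subjects (toy data).
deformed_labels = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [1, 1, 2, 3],
]
voted = majority_vote(deformed_labels)
```

The averaged image plays the role of one of the label_prob%02d.nii.gz inputs, while the voted image corresponds to the baseline labeling.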

Fig. 6

The figure compares the Dice overlap results from Atropos versus the raw results from majority voting for each of 68 neuroanatomical regions and, in addition, the unlabeled portions of the brain from the Hammers evaluation dataset. We evaluated Atropos via N-fold cross-validation and employed PriorProbabilityImages for each class where probabilities are gained by averaging mapped subject labels. The color coding highlights those regions that have the highest (yellow) and lowest (pink) improvement. The significance of the improvement, measured by pairwise T-test, is also shown as is a trinary coding of that improvement as: + significant improvement, − performance reduction, ~ no change

Reproducibility of this Evaluation

Data

The BrainWeb data is freely available. We used single-subject BrainWeb data as is but added a metaformat data header to the raw binary files. An example copy of this header is contained in atroposBwebRF40FigureExample.sh. The 20-subject data, however, required excluding non-cerebrum tissue classes. The Hammers data was also used as is (http://www.brain-development.org/).

Software

The ANTs software is available at http://www.picsl.upenn.edu/ANTs with download and compilation instructions at http://www.picsl.upenn.edu/ANTS/download.php. SVN release 711 was used for the examples and evaluations performed in this paper. Some components of ANTs depend on the Insight ToolKit. The most critical dependency, for Atropos, is the ITK statistics framework used to implement the univariate and multivariate parametric models. We linked to the git version of ITK current as of Dec. 1, 2010. See http://www.itk.org/Wiki/ITK/Git/Download for instructions on git ITK.

Scripts

The complete script for the single-subject BrainWeb study is based on generalizing atroposBwebRF40FigureExample.sh, which is available in the ANTs toolkit (SVN release 711 or greater) and which reproduces the Fig. 4 results. The template-based normalization procedure for the BrainWeb 20 and the Hammers evaluation data is based on freely available scripts included with ANTs, ants_multitemplate_labeling.sh and buildtemplateparallel.sh. A release version of ANTs, with a final version of Atropos, will be prepared with the final version of this manuscript.

Discussion

We introduced Atropos, described its theory and implementation details, and documented its performance in a variety of use cases. We also showed evidence that the openly available N4 bias correction can easily be used with Atropos to improve segmentation. Furthermore, we used multiple subject BrainWeb data to build dataset-specific priors that provided the most consistent segmentation performance across tissues. Finally, we used majority voting to initialize an Atropos EM solution to a 69-class brain parcellation problem. Significant improvements were gained in multiple brain regions, in particular in temporal lobe cortex, the hippocampi and amygdalae and the lateral ventricles. This work, in summary, demonstrates the applicability of Atropos in both basic and extended use cases.

Performance on BrainWeb Data

Atropos results are competitive with the state of the art. For instance, Ashburner and Friston (2005) (SPM5) evaluated on 0% RF bias field, 3% noise BrainWeb single subject data, finding 0.932 (GM) and 0.961 (WM) Dice overlap. Results on 40% RF bias were 0.934 (GM) and 0.961 (WM). SPM5 exhibits insensitivity to bias similar to our own best results on the 40% RF bias, 3% noise case (MRF-β = 0.2, K-means + N4), for which Dice overlap is 0.951 for GM and 0.963 for WM. Nakamura and Fisher (2009) gave GM Dice overlap results (BrainWeb single subject, 3% noise) of 0.962 (0% RF bias), 0.964 (20% RF bias) and 0.956 (40% RF bias) which are slightly higher than either SPM5 or Atropos results. However, Nakamura and Fisher (2009) do not report WM or CSF numbers for comparison. Topology-preserving methods also perform well. Shiee et al. (2010) achieved, for 3% noise, 20% RF bias BrainWeb single subject data, Dice overlap of 0.912 (GM), 0.927 (WM) and 0.900 (CSF). These are excellent numbers given the topological constraint applied to the segmentation. Bazin and Pham (2007) proposed TOADS and, estimating from the paper’s graph, showed that the average Dice overlap accuracy for 3% noise for various RF was 0.930–0.950 (GM), 0.950–0.960 (WM), and 0.920–0.940 (CSF). Perhaps the most recent balanced evaluation was performed in Klauschen et al. (2009), which reports confusion matrix numbers, rather than Dice overlap. Because the absolute true numbers of GM and WM voxels for BrainWeb are known, we can convert the confusion matrix to Dice overlap. In that case, the SPM5 Dice overlap for BrainWeb single-subject data is 0.885 (GM) and 0.909 (WM), while FreeSurfer and FSL’s accuracy is lower. The best GM Dice overlap result for the 20 subject BrainWeb data is obtained by SPM5: 0.930; the best WM Dice overlap is from FSL: 0.950. We note that Klauschen et al. (2009) used a comprehensive evaluation where quality of brain extraction also contributed to the outcome. Thus, the results must be interpreted slightly differently than those from other papers. Finally, in our evaluation of 20-subject BrainWeb data, the prior probability models performed best of all the models used. Compared to the K-means based segmentation, the prior based segmentation performance also peaked at lower values of the MRF-β term (0.0 and 0.05). This is reasonable in that the spatial priors themselves impose a degree of regularity on the segmentation, as in SPM5.
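The conversion from a confusion matrix to Dice overlap mentioned above can be sketched as follows. We assume here a matrix of absolute voxel counts whose rows index the true class and whose columns index the assigned class; the layout, the function name and the numbers are illustrative and are not taken from Klauschen et al. (2009).

```python
def dice_from_confusion(confusion, index):
    """Dice overlap of one class from a confusion matrix of voxel counts.

    Assumed layout: confusion[i][j] is the number of voxels whose true
    class is i and whose assigned class is j.
    """
    true_positive = confusion[index][index]
    truth_total = sum(confusion[index])                       # row sum
    assigned_total = sum(row[index] for row in confusion)     # column sum
    return 2.0 * true_positive / (truth_total + assigned_total)


# Toy two-class example with made-up counts.
confusion = [
    [90, 10],
    [5, 95],
]
dice_class_0 = dice_from_confusion(confusion, 0)
dice_class_1 = dice_from_confusion(confusion, 1)
```

If a matrix is instead reported as row-wise fractions, each row would first be multiplied by the known true voxel count of its class, which is why the absolute numbers are needed.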

Performance on Hammers Data

Our prior work (Avants et al. 2011) showed that the majority vote initialization provided to Atropos by ANTs template mapping is competitive with Heckemann et al. (2006). Overall, the Atropos EM extension improved these results further. However, in a few regions of the mid-brain, the Atropos EM segmentation performed significantly worse. This is not surprising, in that Atropos EM assumes that signal from the likelihood and MRF term is valuable in improving the segmentation. This assumption held for amygdala and lateral ventricles among other areas. However, in pallidum and corpus callosum (the most significant areas with loss of performance), this is not true. We believe the explanation is that the intensity varies within these structures and that a more complex intensity model (or finer parcellation) would be needed here. An alternative solution would be to use boundary conditions for these structures, as in the PriorLabelImage Atropos initialization option.

Clinically-related Evaluation

While specifying performance on BrainWeb is highly valuable, clinical validation is a second important aspect of segmentation evaluation. For instance, Freeborough and Fox (1997), Westlye et al. (2009), Sánchez-Benavides et al. (2010), Chou et al. (2009) and de Bresser et al. (2011) are only a few of the papers that evaluate segmentation performance with respect to a known neurobiological outcome measure. Atropos is currently used in clinical studies and a number of clinically focused, application-specific evaluations are ongoing and will constitute future work. One early example of a clinically-focused Atropos neuroimaging application is in Avants et al. (2010c). A second successful application area is that of ventilation-based segmentation of hyperpolarized helium-3 MRI (Tustison et al. 2010b) which also used the open source Glamorous Glue algorithm to impose topology constraints (Tustison et al. 2010c). Thus, future work may incorporate topology more closely into the Atropos methodology.

A more general advantage which extends beyond the scope of the experimental evaluation section of this paper is the flexibility of Atropos. This includes not only n-tissue segmentation and dense volumetric cortical parcellation, as reported in this work, but Atropos is also used in conjunction with our ANTs registration tools for robust brain extraction which has reported good performance in comparison with other popular, publicly available brain extraction tools (Avants et al. 2010a).

Conclusion

The Atropos software is freely available to the public. We release this code not only to make it available to clinical researchers but with the hope that other researchers in segmentation will provide feedback about the implementation decisions that we made. EM segmentation is non-trivial and there are numerous design alternatives available not only in the models selected but also in the ICM coding, alternatives to ICM and the method in which prior and likelihood are combined. Due to the flexibility of Atropos, we also hope that some of its capabilities, though not evaluated here, are explored by the segmentation or clinical community.

Information Sharing Statement

Atropos software is available in ANTs http://www.picsl.upenn.edu/ANTs which depends on ITK http://www.itk.org/Wiki/ITK/Git/Download. The data used in this work is available in the ANTs software repository, BrainWeb http://mouldy.bic.mni.mcgill.ca/BrainWeb/ and at www.brain-development.org. We employed itk-SNAP for visualization www.itksnap.org.