1 Introduction

Image synthesis in MRI is a process in which the intensities of acquired MRI data are transformed to enhance data quality or to render the data more suitable as input for further image processing. Image synthesis has been gaining traction in the medical image processing community in recent years [6, 15] as a useful pre-processing tool for segmentation and registration. It is especially useful in MR brain imaging, where a staggering variety of pulse sequences, such as Magnetization Prepared Rapid Gradient Echo (MPRAGE), Dual Spin Echo (DSE), and FLAIR, are used to interrogate the various aspects of neuroanatomy. The versatility of MRI is a boon for diagnosticians but can prove a handicap when performing analysis using image processing: automated algorithms are not always robust to variations in their input [13]. In large datasets, images are sometimes missing or corrupted during acquisition and cannot be used for further processing. Image synthesis can supplement such datasets by creating artificial facsimiles of the missing images from the available ones. An additional source of variability is the differing image quality between pulse sequences for the same subject. An MPRAGE sequence can be acquired quickly at a resolution of 1 mm\(^3\) or finer, which is not possible for FLAIR. Image synthesis can be used to enhance the resolution of existing low resolution FLAIRs using the corresponding high resolution MPRAGE images, leading to improved tissue segmentation.

Previous work on image synthesis has proceeded along two lines: (1) registration-based and (2) example-based. Registration-based approaches [3, 12] register the training/atlas images to the given subject image and perform intensity fusion (in the case of multiple training/atlas pairs) to produce the final synthesis. These approaches are heavily dependent on the quality of registration, which is generally not accurate enough in the cortex and in abnormal tissue regions. Example-based approaches involve learning an intensity transformation from known training data pairs/atlas images. A variety of example-based approaches [5, 8, 15] have been proposed. These methods treat synthesis as a regression problem and estimate the synthetic image voxel-by-voxel from the available images. The voxel intensities in the synthetic image are assumed to be independent of each other, which is not entirely valid, as intensities in a typical MR image are spatially correlated and vary smoothly from voxel to voxel.

In this work, we frame image synthesis as an inference problem in a probabilistic discriminative framework. Specifically, we model the posterior distribution \(\mathrm {p}(\mathbf{y}|\mathbf{x})\), where \(\mathbf{x}\) is the collection of known images and \(\mathbf{y}\) is the synthetic image we want to estimate, as a Gaussian conditional random field (CRF) [10]. Markov random field (MRF) approaches are a robust, popular way to model images. However, in a typical MRF the observed data \(\mathbf{x}\) are assumed to be independent given the underlying latent variable \(\mathbf{y}\), which is a limiting assumption for typical images. A CRF, by directly modeling the posterior distribution, allows us to sidestep this problem. CRFs have been used in discrete labeling and segmentation problems [9]. A continuous-valued CRF, modeled as a Gaussian CRF, was first described in [17]. Efficient parameter learning and inference procedures for Gaussian CRFs were explored in the regression tree fields framework [7]. We also model the posterior distribution as a Gaussian CRF, the parameters of which are stored in the leaves of a single regression tree. We learn these parameters by maximizing a pseudo-likelihood objective function given training data. Given a subject image, we build the Gaussian distribution parameters from the learned tree and its leaf parameters. The prediction of the synthetic subject image is a maximum a posteriori (MAP) estimate of this distribution and is computed efficiently using conjugate gradients.

We refer to our method as Synthesis with Conditional Random Field Tree or SyCRAFT. We applied SyCRAFT to synthesize \(T_2\)-weighted (\(T_2\)-w) images from \(T_1\)-weighted (\(T_1\)-w) images and showed superior quality of synthesis compared to state-of-the-art methods. We also applied our method to synthesize FLAIRs from corresponding \(T_1\)-w, \(T_2\)-w, and \(P_D\)-weighted (\(P_D\)-w) images and showed that tissue segmentation on synthetic images is comparable to that achieved using real images. Finally, we used our method in an example-based super-resolution framework to estimate a super-resolution FLAIR image and showed improved tissue segmentation. In Sect. 2 we describe our method in detail, followed by experiments and results in Sect. 3 and a concluding discussion in Sect. 4.

2 Method

2.1 Model

We start with the definition of a CRF, initially proposed in [10]. A CRF is defined over a graph \(G = (V, E)\), where V and E are the sets of vertices and edges, respectively, of G. In the image synthesis context, the voxels i of the image domain form the vertex set V. A pair of voxels \((i,j)\), \(i, j \in V\), that are neighbors according to a predefined neighborhood system forms an edge in E. Let \(\mathbf{x}= \{\mathbf{x}_1,\ldots ,\mathbf{x}_m\}\) be the observed data; specifically, \(\mathbf{x}\) represents the collection of available images from m pulse sequences from which we want to synthesize a new image. Let \(\mathbf{y}\) be the continuous-valued random variable over V representing the synthetic image we want to predict. In a CRF framework, \(\mathrm {p}(\mathbf{y}| \mathbf{x})\) is modeled and learned from training data of known pairs \((\mathbf{x}, \mathbf{y})\). Let \(\mathbf{y}= \{y_i, i \in V\}\). Then \((\mathbf{y}, \mathbf{x})\) is a CRF if, conditioned on \(\mathbf{x}\), the \(y_i\) exhibit the Markov property, i.e. \(\mathrm {p}(y_i | \mathbf{x}, \mathbf{y}_{V\setminus i} ) = \mathrm {p}(y_i | \mathbf{x}, \mathbf{y}_{\mathcal {N}_i} )\), where \(\mathcal {N}_i = \{ j \mid (i,j) \in E\}\) is the neighborhood of i.

Assuming \(\mathrm {p}(\mathbf{y}| \mathbf{x}) > 0, \forall \mathbf{y}\), from the Hammersley-Clifford theorem, we can express the conditional probability as a Gibbs distribution. The factorization of \(\mathrm {p}(\mathbf{y}| \mathbf{x})\) in terms of association potentials and interaction potentials is given as,

$$\begin{aligned} \mathrm {p}(\mathbf{y}| \mathbf{x}) = \frac{1}{Z}\exp [-\{\sum _{i\in V} E_{\mathcal {A}}(y_i, \mathbf{x}; \theta ) + \lambda \sum _{i \in V} \sum _{j \in \mathcal {N}_i} E_{\mathcal {I}}(y_i, y_j,\mathbf{x};\theta )\}]. \end{aligned}$$
(1)

\(E_{\mathcal {A}}(y_i, \mathbf{x}; \theta )\) is called an association potential, defined using the parameter set \(\theta \), \(E_{\mathcal {I}}(y_i, y_j,\mathbf{x};\theta )\) is called an interaction potential, \(\lambda \) is a weighting factor, and Z is the partition function. If \(E_{\mathcal {A}}\) and \(E_{\mathcal {I}}\) are defined as quadratic functions of \(\mathbf{y}\), we can express this distribution as a multivariate Gaussian, as below,

$$\begin{aligned} \mathrm {p}(\mathbf{y}| \mathbf{x})&= \frac{1}{ (2\pi )^{\frac{|V|}{2}} |\varvec{\varSigma }(\mathbf{x})|^{\frac{1}{2}} }\exp (-\frac{1}{2}(\mathbf{y}-\varvec{\mu }(\mathbf{x}))^T\varvec{\varSigma }(\mathbf{x})^{-1}(\mathbf{y}-\varvec{\mu }(\mathbf{x})))\nonumber \\&= \frac{1}{Z}\exp (-(\frac{1}{2}\mathbf{y}^T\mathbf{A}(\mathbf{x})\mathbf{y}- \mathbf{b}(\mathbf{x})^{T}\mathbf{y})). \end{aligned}$$
(2)

The parameters \(\mathbf{A}(\mathbf{x})\) and \(\mathbf{b}(\mathbf{x})\) depend on the association and interaction potential definitions. In most classification tasks involving CRFs, the association potential is defined as the local class probability provided by a generic classifier or regressor [9]. Image synthesis being a regression task, we choose to model and extract both the association and interaction potentials from a single regressor, in our case a regression tree. We define a quadratic association potential as

$$\begin{aligned} E_{\mathcal {A}}(y_i, \mathbf{x}; \theta ) = \frac{1}{2}(a_{L(i)}y_i^2) - b_{L(i)}y_i, \end{aligned}$$
(3)

where \(\{a_{L(i)}, b_{L(i)}\} \subset \theta \) are the parameters stored at the leaf L(i). L(i) is the leaf in which the feature vector \(\mathbf{f}_i(\mathbf{x})\), extracted for voxel i from the observed data \(\mathbf{x}\), lands after being passed through successive nodes of a learned regression tree \(\varPsi \). The features and the regression tree construction are described in Sect. 2.2.
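To make the leaf lookup concrete, here is a minimal sketch that uses scikit-learn's DecisionTreeRegressor as a stand-in for \(\varPsi \); the library, tree settings, and feature dimensions are illustrative assumptions, not the implementation used in the paper.

```python
# Hypothetical sketch of the leaf lookup L(i), using scikit-learn's
# regression tree as a stand-in for the learned tree Psi.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
F = rng.random((1000, 59))   # f_i(x): e.g., a 27-voxel patch + 32-D context
y = rng.random(1000)         # target intensities y_i (training only)

psi = DecisionTreeRegressor(max_depth=12, min_samples_leaf=50).fit(F, y)

# apply() returns the index of the leaf each feature vector lands in;
# these indices key into the per-leaf parameter sets theta_l.
leaf_ids = psi.apply(F)
```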

The interaction potential usually acts as a smoothing term, but can also be designed more generally. We define an interaction potential for each type of neighbor. A ‘neighbor type’ \(r \in \{1,\ldots ,n\}\), where \(n = |\mathcal {N}_i|\), is given by the relative position of one of the n neighbors of a voxel i. For example, a 2D neighborhood system with four neighbors (up, down, left, right) has four types of neighbors, and hence four types of edges. The complete set of edges E can thus be divided into non-intersecting subsets \(\{E_1,\ldots , E_r, \ldots ,E_n\}\) of edges of different types (a concrete enumeration for a 26-connected neighborhood is sketched after Eq. 4). Let the voxel j be a neighbor of i of type r, that is, \((i,j) \in E_r\). Let the corresponding feature vectors \(\mathbf{f}_i(\mathbf{x})\) and \(\mathbf{f}_j(\mathbf{x})\) land in leaves L(i) and L(j) of the trained tree \(\varPsi \), respectively. The interaction potential is modeled as

$$\begin{aligned} E_{\mathcal {I}}(y_i, y_j, \mathbf{x};\theta ) = \frac{1}{2}({\alpha _{L(i)}}_r y_i^2 + {\beta _{L(i)}}_r y_i y_j + {\gamma _{L(i)}}_r y_j^2) - {\omega _{L(i)}}_{1r} y_i - {\omega _{L(i)}}_{2r} y_j . \end{aligned}$$
(4)
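As a concrete illustration of the edge-type construction above, the following sketch (our own encoding, not taken from the paper) enumerates the 26 offsets of a 26-connected 3D neighborhood, one type r per offset, and computes the symmetric type \(\widetilde{r}\) that reappears in Sect. 2.3.

```python
# Sketch: edge types of a 26-connected 3D neighborhood. Each relative
# offset defines one type r; edges of type r connect i to i + offsets[r].
import numpy as np
from itertools import product

offsets = [np.array(d) for d in product((-1, 0, 1), repeat=3)
           if d != (0, 0, 0)]                     # 26 types, r = 0..25

def symmetric_type(r):
    """Type r~ of the opposite offset (e.g. 'right' <-> 'left')."""
    return next(s for s, o in enumerate(offsets)
                if (o == -offsets[r]).all())
```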

Let the set of leaves of the regression tree \(\varPsi \) be \(\mathcal {L}_{\varPsi }\). Each leaf \(l \in \mathcal {L}_{\varPsi }\) stores the parameter set \(\theta _{l} = \{a_l, b_l, {\alpha _{l}}_1,{\beta _{l}}_1, {\gamma _{l}}_{1},{\omega _{l}}_{11}, {\omega _{l}}_{21}, \ldots , {\alpha _{l}}_n,{\beta _{l}}_n,{\gamma _{l}}_{n},{\omega _{l}}_{1n}, {\omega _{l}}_{2n} \}\). The complete set of parameters is thus \(\theta = \{\theta _{l} \mid l \in \mathcal {L}_{\varPsi }\}\). Our approach bears similarity to the regression tree fields concept introduced in [7], where the authors create a separate regression tree for each neighbor type. With a single association potential and a typical 3D neighborhood of 26 neighbors, they would thus need 27 separate trees to learn the model parameters. Training a large number of trees on large training sets makes the regression tree fields approach computationally expensive; it was not feasible in our application, with its large 3D images, larger neighborhoods, and high dimensional feature vectors. We can, however, train multiple trees using bagging and average the resulting ensemble of models for an improved prediction. A sketch of the per-leaf parameter layout is given below; the training of a single regression tree is described in the next section.
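This is a minimal sketch of the parameter set \(\theta _l\) stored at a single leaf, assuming the 26-neighborhood above; the container types and default values are illustrative assumptions.

```python
# Illustrative layout (not the authors' code) of the parameter set
# theta_l stored at a single leaf, for n neighbor types.
from dataclasses import dataclass, field
import numpy as np

N_TYPES = 26  # n, neighbor types in a 26-connected 3D neighborhood

def _ones():  return np.ones(N_TYPES)
def _zeros(): return np.zeros(N_TYPES)

@dataclass
class LeafParams:
    a: float = 1.0                                       # association, quadratic
    b: float = 0.0                                       # association, linear
    alpha:  np.ndarray = field(default_factory=_ones)    # alpha_{l,r}
    beta:   np.ndarray = field(default_factory=_zeros)   # beta_{l,r}
    gamma:  np.ndarray = field(default_factory=_ones)    # gamma_{l,r}
    omega1: np.ndarray = field(default_factory=_zeros)   # omega_{l,1r}
    omega2: np.ndarray = field(default_factory=_zeros)   # omega_{l,2r}

# One parameter set per leaf of the trained tree Psi (128 leaves, say).
theta = {leaf: LeafParams() for leaf in range(128)}
```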

2.2 Learning a Regression Tree

As mentioned before, let \(\mathbf{x}= \{\mathbf{x}_1, \mathbf{x}_2,\ldots , \mathbf{x}_m\}\) be a collection of co-registered images, generated by modalities \(\varPhi _1, \ldots , \varPhi _m\), respectively. The image synthesis task entails predicting the image \(\mathbf{y}\) of a target modality \(\varPhi _t\). The training data thus consist of known co-registered pairs \(\{\mathbf{x}, \mathbf{y}\}\). At each voxel location i, we extract features \(\mathbf{f}_i(\mathbf{x})\) derived from \(\mathbf{x}\). For our experiments we use two types of features: (1) small, local patches, and (2) context descriptors. A small 3D patch, denoted \(\mathbf{p}_i(\mathbf{x}) = [\mathbf{p}_i(\mathbf{x}_1), \ldots , \mathbf{p}_i(\mathbf{x}_m)]\) and typically of size \(3\times 3\times 3\), provides local intensity information.

We construct the context descriptors as follows. The brain images are rigidly aligned to the MNI coordinate system [4], with the center of the brain approximately at the center of the image. Thus, for each voxel i we can compute the unit vector \(\mathbf{u}\) pointing from i to the origin. We define 8 directions by rotating the component of \(\mathbf{u}\) in the axial plane by angles \(\{0,\frac{\pi }{4},\ldots ,\frac{7\pi }{4}\}\). In each of these directions, we record the average intensities of cubic regions of cube-widths \(\{w_1, w_2,w_3,w_4\}\) placed at four different radii \(\{r_1, r_2, r_3, r_4\}\), respectively. This yields a 32-dimensional descriptor of the spatial context surrounding voxel i. In our experiments we used \(w_1 = 3, w_2 = 5, w_3 = 7, w_4 = 9\) and \(r_1 = 4, r_2 = 8, r_3 = 16, r_4 = 32\); these values were chosen empirically. We denote this context descriptor by \(\mathbf{c}_i(\mathbf{x})\). The final feature vector is thus \(\mathbf{f}_i(\mathbf{x}) = [\mathbf{p}_i(\mathbf{x}),\mathbf{c}_i(\mathbf{x})]\). \(\mathbf{f}_i(\mathbf{x})\) is paired with the voxel intensity \(y_i\) at i in the target modality image \(\mathbf{y}\) to create training data pairs \((\mathbf{f}_i(\mathbf{x}), y_i)\). We train the regression tree \(\varPsi \) on these data using the algorithm described in [2]. Once the tree is constructed, we initialize \(\theta _l\) at each of the leaves \(l \in \mathcal {L}_\varPsi \); \(\theta _l\) is then estimated by a pseudo-likelihood maximization approach.
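The descriptor computation can be sketched as follows; the axis conventions (third array axis taken as through-plane) and the boundary clipping are our assumptions.

```python
# A sketch of the 32-D context descriptor c_i(x) for one voxel. The image
# is assumed to be MNI-aligned with the brain center at the array center.
import numpy as np

WIDTHS = (3, 5, 7, 9)    # cube-widths w1..w4
RADII  = (4, 8, 16, 32)  # radii r1..r4, in voxels

def context_descriptor(img, voxel):
    center = np.asarray(img.shape, dtype=float) / 2.0
    u = center - np.asarray(voxel, dtype=float)   # vector voxel -> origin
    base = np.arctan2(u[1], u[0])                 # axial-plane component of u
    feats = []
    for k in range(8):                            # rotations by k*pi/4
        ang = base + k * np.pi / 4.0
        d = np.array([np.cos(ang), np.sin(ang), 0.0])
        for w, r in zip(WIDTHS, RADII):
            c = np.rint(np.asarray(voxel) + r * d).astype(int)
            lo = np.clip(c - w // 2, 0, np.asarray(img.shape) - 1)
            hi = np.clip(c + w // 2 + 1, 1, np.asarray(img.shape))
            feats.append(img[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]].mean())
    return np.asarray(feats)                      # 8 directions x 4 radii = 32
```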

2.3 Parameter Learning

An ideal approach to learning the parameters would be maximum likelihood estimation using the distribution in Eq. 2. However, as noted in [7], estimation of the mean parameters \(\varvec{\varSigma }\) and \(\varvec{\mu }\) requires calculation of \(\mathbf{A}^{-1}\) (see Eq. 2). The size of \(\mathbf{A}\) is \(|V| \times |V|\), where |V| is the number of voxels in \(\mathbf{y}\); for large 3D images, |V| is of the order of 10\(^{6}\), which makes the computation practically infeasible. We therefore follow [7] and implement pseudo-likelihood maximization-based parameter learning.

Pseudo-likelihood is defined as the product of local conditional likelihoods,

$$\begin{aligned} \hat{\theta }_{\mathrm {MPLE}} = \mathrm { arg max }_{\theta } \prod _{i \in V} \mathrm {p}(y_i \mid \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta ). \end{aligned}$$
(5)

The local conditional likelihood can be expanded as

$$\begin{aligned} \mathrm {p}(y_i \mid \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta )&= \frac{\mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta )}{\int _{\mathbb {R}} \mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta ) dy_i },\nonumber \\ -\log \mathrm {p}(y_i \mid \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta )&= - \log \mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta ) + \log Z_i , \end{aligned}$$
(6)

where \(Z_i = {\int _{\mathbb {R}} \mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta ) dy_i }\). Using the CRF definition in Eq. 1, we can write \(- \log \mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta )\) as

$$\begin{aligned} - \log \mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta )&= E_{\mathcal {A}}(y_i, \mathbf{x};\theta ) + \lambda \sum _{j \in \mathcal {N}_i} E_{\mathcal {I}}(y_i, y_j,\mathbf{x};\theta )\nonumber \\&= \frac{1}{2}a_{Ci} y_i^2 - b_{Ci}y_i , \end{aligned}$$
(7)

where \(a_{Ci}\) (Eq. 8) and \(b_{Ci}\) (Eq. 9) are found by matching the quadratic and linear terms in \(y_i\); terms constant in \(y_i\) are absorbed into the normalization. Equations 8 and 9 show the contribution of the interaction potentials induced by the neighbors of voxel i. Here \(\widetilde{r}\) denotes the edge type that is symmetric to type r; for example, if edges of type r connect a voxel i to its right neighbor, then \(\widetilde{r}\) denotes the type connecting a voxel to its left neighbor.

$$\begin{aligned} a_{Ci} = a_{L(i)} + \lambda \sum _{r=1}^{n}\Big (\sum _{j | (i,j) \in E_r} {\alpha _{L(i)}}_r + \sum _{h | (h,i) \in E_{\widetilde{r}}} {\gamma _{L(h)}}_{\widetilde{r}}\Big ) \end{aligned}$$
(8)
$$\begin{aligned} b_{Ci}= & {} b_{L(i)} + \lambda \sum _{r=1}^{n}\Big (\sum _{j | (i,j) \in E_r} {\omega _{L(i)}}_{1r} + \sum _{h | (h,i) \in E_{\widetilde{r}}} {\omega _{L(h)}}_{2\widetilde{r}}\nonumber \\&-\frac{1}{2}\sum _{j | (i,j) \in E_r} {\beta _{L(i)}}_r y_j - \frac{1}{2}\sum _{h | (h,i) \in E_{\widetilde{r}}} {\beta _{L(h)}}_{\widetilde{r}}y_h\Big ). \end{aligned}$$
(9)
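In code, the accumulation in Eqs. 8 and 9 for a single voxel might look like the sketch below. It assumes the LeafParams container and offset-based edge types from the earlier sketches, with edges \((i,j) \in E_r\) running from i to \(j = i + \mathrm{offsets}[r]\), so that the same neighbor j is also the first endpoint of an edge \((j,i)\) of the symmetric type \(\widetilde{r}\).

```python
# Sketch of Eqs. 8-9: canonical parameters (a_Ci, b_Ci) for one voxel i.
LAM = 0.1  # weighting factor lambda

def canonical_params(i, y, leaf_of, theta, neighbors, sym):
    """neighbors(i) yields (r, j) pairs; theta maps leaf -> LeafParams."""
    t_i = theta[leaf_of[i]]
    a, b = t_i.a, t_i.b
    for r, j in neighbors(i):
        # i is the first endpoint of (i, j) in E_r ...
        a += LAM * t_i.alpha[r]
        b += LAM * (t_i.omega1[r] - 0.5 * t_i.beta[r] * y[j])
        # ... and the second endpoint of (j, i) in E_{r~}.
        rs = sym(r)
        t_j = theta[leaf_of[j]]
        a += LAM * t_j.gamma[rs]
        b += LAM * (t_j.omega2[rs] - 0.5 * t_j.beta[rs] * y[j])
    return a, b
```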

The term \(\log Z_i\) in Eq. 6, where \(Z_i\) is an integral of exponentials, is known as the log partition function. To optimize objective functions containing log partition terms, we express \(\log Z_i\) in its variational representation using the mean parameters \(\varvec{\mu }_i = [\mu _i,\sigma _i]\) [18]. The parameter set \(\varvec{\theta }_{Ci} = \{b_{Ci},a_{Ci} \}\) that defines this exponential-family distribution is known as the canonical parameter set. The conjugate dual function is defined as follows,

$$\begin{aligned} Z^{*}_i(\mu _i, \sigma _i) = \sup _{\varvec{\theta }_{Ci}} \langle \varvec{\theta }_{Ci}, \varvec{\mu }_i \rangle - \log Z_i(\varvec{\theta }_{Ci}), \end{aligned}$$
(10)

where \(\langle \cdot ,\cdot \rangle \) denotes the inner product. Substituting \(\varvec{\theta }_{Ci}\) and the expression for \(- \log \mathrm {p}(y_i, \mathbf{y}_{\mathcal {N}_i}, \mathbf{x}; \theta )\) from Eq. 7, we get the negative pseudo-likelihood contributed by voxel i to be,

$$\begin{aligned} \mathrm {NPL}_i(\theta ) = b_{Ci}(\mu _i - y_i) + \frac{1}{2}a_{Ci}(y_i^2 - \sigma _i) + \frac{1}{2}\log (\sigma _i - \mu _i^2) + \frac{1}{2}\log (2\pi e), \end{aligned}$$
(11)

where the mean parameters are given by \(\mu _i = \frac{b_{Ci}}{a_{Ci}}\) and \(\sigma _{i} = \frac{1}{a_{Ci}} + \mu _i^2\).
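For concreteness, the per-voxel term of Eq. 11 can be evaluated as below; this is a direct transcription of the formula and assumes \(a_{Ci} > 0\).

```python
# Sketch of the per-voxel negative pseudo-likelihood, Eq. 11, using the
# mean parameters mu_i = b_Ci/a_Ci and sigma_i = 1/a_Ci + mu_i^2.
import numpy as np

def npl_i(a_ci, b_ci, y_i):
    mu = b_ci / a_ci
    sigma = 1.0 / a_ci + mu ** 2
    return (b_ci * (mu - y_i)
            + 0.5 * a_ci * (y_i ** 2 - sigma)
            + 0.5 * np.log(sigma - mu ** 2)      # equals -0.5*log(a_ci)
            + 0.5 * np.log(2 * np.pi * np.e))
```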

Equation 11 is similar to the corresponding objective in [7], as the overall model is a Gaussian CRF. The canonical parameters \(a_{Ci}\) and \(b_{Ci}\) are linear in \(\theta \), and the negative pseudo-likelihood is convex in the canonical parameters, so the objective is convex in \(\theta \) [7, 18]. We minimize \(\sum _{i \in V} \mathrm {NPL}_i(\theta )\) using gradient descent. The weighting factor \(\lambda = 0.1\) was chosen empirically in our experiments. The regression tree fields approach [7] used a constrained, projected gradient descent on the parameters to ensure positive definiteness of the final precision matrix (\(\mathbf{A}(\mathbf{x})\) in Eq. 2). We observed that unconstrained optimization in our model and applications generated a positive definite \(\mathbf{A}(\mathbf{x})\). Training in our experiments takes about 20–30 min with \(\sim \)10\(^6\) samples of dimensionality of the order of 10\(^2\) and a neighborhood size of 26, on a 12-core 3.42 GHz machine.

2.4 Inference

Given a co-registered test image set \(\hat{\mathbf{x}} = \{\hat{\mathbf{x}}_1,\ldots ,\hat{\mathbf{x}}_m\}\), we first extract features \(\mathbf{f}_i(\hat{\mathbf{x}})\) at all voxel locations i. Next, we pass each \(\mathbf{f}_i(\hat{\mathbf{x}})\) through the learned regression tree \(\varPsi \) to determine its leaf node L(i). Using the learned parameters at these leaves, we construct the matrix \(\mathbf{A}(\hat{\mathbf{x}})\) and the vector \(\mathbf{b}(\hat{\mathbf{x}})\) (see Eq. 2): the diagonal and off-diagonal elements of \(\mathbf{A}(\hat{\mathbf{x}})\) are populated by matching the quadratic terms of Eq. 2, and \(\mathbf{b}(\hat{\mathbf{x}})\) by matching the linear terms. Because the distribution in Eq. 2 is Gaussian, the MAP estimate for \(\mathrm {p}(\mathbf{y}|\hat{\mathbf{x}})\) coincides with the conditional expectation \(\mathrm {E} [ \mathbf{y}| \hat{\mathbf{x}}]\), namely the mean of the multivariate Gaussian. The expression for the mean, and hence the estimate \(\hat{\mathbf{y}}\), is given by,

$$\begin{aligned} \hat{\mathbf{y}} = \mathbf{A}(\hat{\mathbf{x}})^{-1} \mathbf{b}(\hat{\mathbf{x}}). \end{aligned}$$
(12)

\(\mathbf{A}(\hat{\mathbf{x}})\) is a large (\(\sim \)10\(^6\times \)10\(^6\)), sparse (\(\sim \)27\(\times \)10\(^6\) non-zero entries), symmetric positive definite matrix. Thus, we use an iterative preconditioned conjugate gradient method to solve the linear system in Eq. 12. The estimate \(\hat{\mathbf{y}}\) is our synthetic image. Estimates from multiple models (five in our experiments) trained via bagging can also be averaged to produce the final result.
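Here is a hedged sketch of this inference step, assuming a SciPy-based implementation (the paper does not specify one): assemble the sparse precision matrix from COO entries and solve Eq. 12 with Jacobi-preconditioned conjugate gradients.

```python
# Sketch of the MAP inference step (Eq. 12) with a sparse solver.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

def map_estimate(rows, cols, vals, b):
    """rows/cols/vals: COO entries of A(x-hat), obtained by matching the
    quadratic terms of Eq. 2 (duplicate entries are summed); b: the
    linear terms."""
    n = b.shape[0]
    A = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    d_inv = 1.0 / A.diagonal()                     # Jacobi preconditioner
    M = LinearOperator((n, n), matvec=lambda v: d_inv * v)
    y_hat, info = cg(A, b, M=M, maxiter=500)
    if info != 0:
        raise RuntimeError("CG did not converge")
    return y_hat
```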

Fig. 1. Shown are (a) the input MPRAGE image, (b) the true \(T_2\)-w image, and the synthesis results from the MPRAGE for each of (c) FUSION, (d) MIMECS, and (e) SyCRAFT (our method). The lesion (green circle) and the cortex (yellow circle) in the true image are synthesized by MIMECS and SyCRAFT, but not by FUSION (Colour figure online).

3 Results

3.1 Synthesis of \(T_2\)-w Images from \(T_1\)-w Images

In this experiment, we used MPRAGE images from the publicly available multimodal reproducibility (MMRR) dataset [11] and synthesized the \(T_2\)-w images of the DSE sequence. The dataset consists of 21 subjects, each with two imaging sessions acquired within an hour of each other, for a total of 42 MPRAGE images. We used the data of five subjects (ten images) for training and synthesized the remaining 32 images. We compared SyCRAFT to MIMECS [15] and to multi-atlas registration and intensity fusion (FUSION) [3]. We used five subjects as the atlases for FUSION, with parameters \(\beta = 0.5\) and \(\kappa = 4\) (fuse the four best patch matches).

Table 1. Mean and standard deviation (Std. Dev.) of the PSNR, UQI, and SSIM values for synthesis of \(T_2\)-w images from 32 MPRAGE scans.

We used PSNR (peak signal to noise ratio), the universal quality index (UQI) [19], and structural similarity (SSIM) [20] as metrics. UQI and SSIM take into account image degradation as perceived by the human visual system. Both take values in \([-1, 1]\), with 1 implying that the images are identical. SyCRAFT performs significantly better than both methods on all metrics except PSNR. Figure 1 shows the results for all three methods along with the true \(T_2\)-w image. The FUSION results (Fig. 1(c)) have the highest PSNR, but produce anatomically incorrect images, especially in the presence of abnormal tissue anatomy (lesions, for example) and in the cortex. Overall, SyCRAFT produces an image that is visually closest to the true \(T_2\)-w image (Table 1).
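For reference, the three metrics can be computed as in the following sketch; scikit-image is an assumed implementation for PSNR and SSIM, and the UQI shown is the global variant of the index in [19] (the original is computed over sliding windows and averaged).

```python
# Sketch of the evaluation metrics for a pair of 3D volumes.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(truth, synth):
    rng = truth.max() - truth.min()
    psnr = peak_signal_noise_ratio(truth, synth, data_range=rng)
    ssim = structural_similarity(truth, synth, data_range=rng)
    # Global UQI: 4*cov*mx*my / ((vx + vy)*(mx^2 + my^2)).
    mx, my = truth.mean(), synth.mean()
    vx, vy = truth.var(), synth.var()
    cov = ((truth - mx) * (synth - my)).mean()
    uqi = 4 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
    return psnr, uqi, ssim
```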

Fig. 2. Subject input images along with the SyCRAFT FLAIR and true FLAIR images.

3.2 Synthesis for FLAIR Images

In this experiment, given atlas \(P_D\)-w, \(T_2\)-w, \(T_1\)-w, and FLAIR images, we trained SyCRAFT and applied it to subject \(P_D\)-w, \(T_2\)-w, and \(T_1\)-w images to predict the subject's synthetic FLAIR image. We used our in-house multiple sclerosis (MS) patient dataset of 49 subject images, training on four subjects and testing on the remaining 45. We computed the average PSNR (20.81, std = 1.19), UQI (0.81, std = 0.03), and SSIM (0.78, std = 0.03) over these 45 subjects. These values indicate that the synthetic FLAIRs are structurally and visually similar to their corresponding real FLAIRs. Figure 2 shows the input images and the synthetic FLAIR image alongside the real FLAIR image.

Fig. 3. LesionTOADS segmentations for real and synthetic FLAIRs.

Fig. 4. A Bland-Altman plot of lesion volumes for synthetic FLAIRs vs. real FLAIRs.

Next, we investigated the segmentations obtained from these synthetic FLAIRs. We would like the segmentation algorithm, LesionTOADS [16], to behave identically for real and synthetic images. LesionTOADS uses a \(T_1\)-w image and a corresponding FLAIR to generate a multi-class, topologically correct segmentation in the presence of lesions. We compared the overlap of segmentations obtained using synthetic FLAIRs to those obtained using real FLAIRs in terms of Dice coefficients, averaged over the 45 subjects: white matter (WM) 0.97 (std = 0.01), gray matter (GM) 0.99 (std = 0.01), cerebrospinal fluid (CSF) 0.97 (std = 0.01), and white matter lesions (WML) 0.52 (std = 0.17). Figure 3 shows the segmentations by LesionTOADS on real and synthetic FLAIRs. The overlap is very good for WM, GM, and CSF; however, it is low for the WML class. Because lesions are small and diffuse, even a small difference in the segmented boundary can produce a low Dice coefficient. We therefore also examined the overall lesion volumes reported by the algorithm for real and synthetic FLAIRs. To assess how the lesion volumes differ between synthetic and real images, we created a Bland-Altman [1] plot, shown in Fig. 4. Let \(\mathrm {RFlv}\) denote the lesion volumes given by LesionTOADS using real FLAIRs as input, and \(\mathrm {SFlv}\) those using synthetic FLAIRs. A Bland-Altman plot is a scatter plot of the differences \(\mathrm {RFlv}-\mathrm {SFlv}\) (y axis) vs. the means \((\mathrm {RFlv}+\mathrm {SFlv})/2\) (x axis). The measurements are considered interchangeable if 0 lies within the limits of agreement, i.e., the mean difference \(\pm 1.96\sigma \), where \(\sigma \) is the standard deviation of the differences. There is a small bias between \(\mathrm {RFlv}\) and \(\mathrm {SFlv}\) (mean = \(0.88\times 10^3\)); however, 0 does lie within the prescribed limits, and hence, based on this plot, we can say that these two measurements are interchangeable.
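This is a minimal sketch of such a plot under the standard difference-vs-mean convention, assuming a matplotlib implementation:

```python
# Bland-Altman sketch: difference vs. mean of two volume measurements,
# with the bias and the 1.96-sigma limits of agreement.
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(rflv, sflv):
    rflv, sflv = np.asarray(rflv, float), np.asarray(sflv, float)
    diff, mean = rflv - sflv, (rflv + sflv) / 2.0
    bias, sd = diff.mean(), diff.std(ddof=1)
    plt.scatter(mean, diff)
    for level in (bias, bias - 1.96 * sd, bias + 1.96 * sd):
        plt.axhline(level, linestyle="--")   # bias and limits of agreement
    plt.xlabel("mean lesion volume")
    plt.ylabel("difference (RFlv - SFlv)")
    plt.show()
```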

3.3 Super-Resolution of FLAIR

Next, we applied SyCRAFT to synthesize super-resolution (SR) FLAIRs using corresponding high resolution (HR) MPRAGEs and low resolution (LR) FLAIRs. During a clinical or research scan, not all pulse sequences are acquired at the same resolution. Sequences like the \(T_1\)-w MPRAGE can be acquired quickly and are therefore easy to image at high resolution, usually \(1\,\mathrm {mm}^3\) isotropic or finer. However, sequences like DSE and FLAIR have long repetition times (\(\mathrm {TR}\)) and inversion times (\(\mathrm {TI}\)), which make high resolution acquisitions prohibitively long; they are therefore acquired at a low (2–\(5\,\mathrm {mm}\)) through-plane resolution.

Fig. 5. Coronal slices of the LR, HR, and SR FLAIRs along with their corresponding LesionTOADS segmentations. It is evident that using an LR FLAIR affects the segmentation of the lesions and even the cortex.

Our approach can be described as an example-based super-resolution [14] technique. Example-based methods leverage the high resolution information extracted from an HR image (an MPRAGE, for example) in conjunction with an LR input image (the corresponding FLAIR) to generate an SR version of the LR image. We used HR (\(1\times 1\times 1\) mm\(^3\)) MPRAGE and FLAIR data, and downsampled the HR FLAIR to create an LR (\(1\times 1\times 4\) mm\(^3\)) FLAIR. The atlas data consisted of an HR MPRAGE and an LR FLAIR, and we trained SyCRAFT to predict the HR FLAIR. Given a test HR MPRAGE and LR FLAIR, we applied SyCRAFT to synthesize an SR FLAIR. We ran the LesionTOADS [16] segmentation algorithm on three scenarios for each subject: (a) HR MPRAGE + LR FLAIR, (b) HR MPRAGE + SR FLAIR, and (c) HR MPRAGE + HR FLAIR. The last case acts as the ground truth for how the segmentation algorithm should behave on best-case data. We aim to show that tissue segmentation using the SR FLAIR is closer to that achieved using the HR FLAIR than is segmentation using the LR FLAIR. Figure 5(d) shows the super-resolution result; the LR FLAIR image is shown in Fig. 5(b) and the HR FLAIR image in Fig. 5(c). The corresponding LesionTOADS segmentations are shown in Fig. 5(e, f, g), respectively. The lesion boundaries as well as the cortex are overestimated when the LR FLAIR is used. Shown in Fig. 6 are the lesion volumes on 13 subjects for each of the three scenarios.
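As an illustration of how the LR training FLAIRs can be produced, the sketch below block-averages the HR FLAIR along the through-plane axis and re-inflates it to the HR grid so that features can be paired voxel-by-voxel; SciPy, linear re-interpolation, and the divisibility of the slice count by the downsampling factor are all assumptions.

```python
# Sketch: simulating an LR (1x1x4 mm^3) FLAIR from an HR (1x1x1 mm^3) one.
import numpy as np
from scipy.ndimage import zoom

def simulate_lr_flair(hr_flair, factor=4):
    x, y, z = hr_flair.shape
    assert z % factor == 0, "slice count must divide evenly by the factor"
    # Average 4 mm slabs along the through-plane axis ...
    lr = hr_flair.reshape(x, y, z // factor, factor).mean(axis=3)
    # ... then re-inflate to the HR grid with linear interpolation.
    return zoom(lr, (1, 1, factor), order=1)
```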

Fig. 6. Lesion volumes obtained by LesionTOADS on HR FLAIR+HR MPRAGE (black), LR FLAIR+HR MPRAGE (blue), and SR FLAIR+HR MPRAGE (red). Note that the black plot is closer to the red plot than to the blue plot for all but one of the subjects (Colour figure online).

4 Conclusion

We have described an image synthesis framework, SyCRAFT, that poses synthesis as a learning and inference problem on a Gaussian CRF. The parameters of the Gaussian CRF are built from parameters stored at the leaves of a single regression tree, and parameter learning is done by maximizing a pseudo-likelihood objective function. Our approach is extremely flexible in the features it can use to create the initial regression tree. It is also general enough to accommodate larger neighborhoods and long-range relationships among voxels. Adding more neighbors introduces additional parameters, but these are stored in the same tree; no additional trees are needed. Our approach is also computationally efficient, training from millions of samples in 20–30 min, with inference taking less than five minutes. We compared SyCRAFT to competitive image synthesis algorithms and showed that its image quality is superior. We also demonstrated the practical benefits of our algorithm by synthesizing FLAIRs and validated the synthesis by showing tissue segmentation equivalent to that obtained using real FLAIRs. This shows that our image synthesis algorithm can be used in realistic scenarios where imaging data are missing and need to be replaced by a feasible alternative. Finally, we applied our algorithm to enhance the resolution of low resolution FLAIRs and showed improved tissue segmentation as a result.