
1 Introduction

Registration of pre-operative and post-recurrence brain MRI plays a significant role in discovering accurate imaging markers and elucidating imaging signatures of aggressively infiltrating tissue, which are crucial to the diagnosis and treatment planning of intracranial tumors, especially brain gliomas [11, 26]. To better understand the location, extent and biological activity of the tumor after resection, the pre-operative and follow-up structural brain MRI scans of a patient first need to be aligned accurately. However, deformable registration between the pre-operative and follow-up scans, whether post-resection or post-recurrence, is challenging due to possible large deformations and absent correspondences caused by the tumor's mass effect [7], resection cavities, tumor recurrence and tissue relaxation in the follow-up scans.

Conventional registration methods mostly deal with the absent-correspondence issue by (1) excluding pathological regions from the similarity measure [3, 5], (2) replacing the pathological images with a quasi-normal appearance [9, 18, 30] or (3) jointly registering and segmenting [4, 27]. Excluding the pathological regions often requires a manual delineation [3] or an initial seed [4, 17] of the tumor regions in the brain scans, which are prohibitively labour-intensive to acquire. Replacing the pathological image with a quasi-normal appearance, alternatively, avoids the prerequisite of a prior pathological segmentation. However, modeling the tumor-to-quasi-normal appearance with a statistical model [9, 18] often requires extra image scans, i.e., scans from a healthy population. Moreover, existing quasi-normal approaches require accurate registration to a common atlas space for the quasi-normal reconstruction, yet accurate alignment of images suffering from mass effect is very hard to achieve without that reconstruction. Therefore, the registration and reconstruction problems in quasi-normal approaches must be interleaved in a costly iterative optimization process. Alternatively, an unsupervised approach [27] accommodating resection and retraction of tissue was proposed for registering pre-operative and intra-operative brain images. Their method alternates between registering the brain scans using the demons algorithm with an anisotropic diffusion smoother and segmenting the resection with a level-set method in regions of high image-intensity disagreement. Chitphakdithai et al. [4] extended this idea to a simultaneous registration and resection-estimation approach using the expectation-maximization algorithm and a prior on post-resection image intensities. Nevertheless, these methods rely on costly iterative optimization, which can take up to \(\sim 3.5\) h per case [17].

While recent deep learning-based deformable registration (DLDR) methods have achieved remarkable registration speed and accuracy [2, 6, 10, 13, 15, 23,24,25], these algorithms cannot accurately register pre-operative and post-recurrence images due to the absent-correspondence problem. A learning-based registration method for images with pathology was presented in [8], which dealt with missing correspondence by jointly estimating the vector-momentum parameterized stationary velocity field (vSVF) and a quasi-normal image to drive the registration. Nevertheless, the reconstruction of the quasi-normal image requires explicit tumor segmentation in the training phase. Moreover, the large deformation caused by the tumor's mass effect is difficult to model without resorting to complex multi-stage warping pipelines.

In this paper, we present an unsupervised joint registration and segmentation learning framework for pre-operative and post-recurrence registration, in which a large-deformation image registration network and a forward-backward consistency constraint are leveraged to estimate the regions with valid and absent correspondence along with the dense deformation fields in a bidirectional manner. Instead of using a manual delineation or image-intensity disagreement to segment the pathological regions, our method leverages the forward-backward consistency of the bidirectional deformation fields to explicitly locate regions with absent correspondence and excludes them from the similarity measure in an unsupervised manner. We present extensive experiments on a pre-operative and post-recurrence brain MR dataset, demonstrating that our method achieves high registration accuracy in brain MR scans with pathology.

Fig. 1.

Overview of the proposed method (Left) and the semantic representation of the forward-backward consistency constraint (Right). Our method jointly estimates the bidirectional deformation fields and locates regions with absent correspondence (denoted as mask). The regions with absent correspondence are excluded in the similarity measure during training. For brevity, the magnitude loss of the masks is omitted in the figure.

2 Methods

Our goal is to establish a dense non-linear correspondence between the pre-operative scan and the post-recurrence scan of the same subject, where regions without valid correspondence are excluded in the similarity measure during optimization. Our method builds on the previous DLDR method [24] and extends it to accommodate the absent correspondence issue in the pre-operative and post-recurrence scans.

2.1 Bidirectional Deformable Image Registration

Let B and F be the pre-operative (baseline) scan and the post-recurrence (follow-up) scan, respectively, defined over an n-D mutual spatial domain \(\varOmega \subseteq \mathbb {R}^n\). In this paper, we focus on 3D deformable registration, i.e., \(n = 3\) and \(\varOmega \subseteq \mathbb {R}^3\), and assume that B and F are affinely aligned to a common space.

Figure 1 depicts an overview of our method. We parametrize the deformable registration problem as a bidirectional registration problem \(\boldsymbol{u}_{bf} = f_\theta (B, F)\) and \(\boldsymbol{u}_{fb} = f_\theta (F, B)\) with a CNN, where \(\theta \) is a set of learning parameters and \(\boldsymbol{u}_{bf}\) is the displacement field that transforms B to align with F, i.e., \(B(x+\boldsymbol{u}_{bf}(x))\) and F(x) denote similar anatomical locations for each voxel \(x\in \varOmega \) (except voxels with absent correspondence). The proposed method works with any CNN-based DLDR method. To accommodate the large deformation and variation of anatomical structures caused by the tumor's mass effect, we instantiate \(f_\theta \) with the conditional deep Laplacian pyramid image registration network (cLapIRN) [24], which is capable of large deformation and rapid tuning of the smoothness-regularization hyperparameter in a wide range of applications [12]. Despite its multi-resolution optimization strategy, vanilla cLapIRN cannot accurately register images with absent correspondence, i.e., correspondence missing due to tumor resection and recurrence, edema and cavities. Therefore, instead of measuring the similarity of B and F at every voxel \(x \in \varOmega \), our method estimates the regions with absent correspondence in both the B and F domains using the bidirectional displacement fields and the forward-backward consistency constraint, and measures the similarity only in regions with valid correspondence during optimization.

2.2 Forward-Backward Consistency Constraint

Conventionally, regions with absent correspondence can be detected by comparing the appearance or image intensities of the warped scan to the target scan or an atlas [18, 27]. However, corresponding regions in the pre-operative and post-recurrence scans may have different intensity profiles, which makes such approaches less robust in practice. Therefore, we depart from approaches with an appearance prior and instead extend the forward-backward consistency [19, 21, 28, 29]: we design a forward-backward consistency constraint to locate regions with absent correspondence in the baseline and follow-up scans. The forward-backward (inverse consistency) error \(\delta _{bf}\) from B to F is defined as:

$$\begin{aligned} \delta _{bf}(x) = |\boldsymbol{u}_{bf}(x) + \boldsymbol{u}_{fb}(x+\boldsymbol{u}_{bf}(x))|_2. \end{aligned}$$
(1)
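Assuming displacement fields given in voxel units on a regular grid, the error of Eq. (1) can be sketched as below. This is an illustrative NumPy sketch, not the paper's implementation: the nearest-voxel lookup stands in for the trilinear interpolation a full implementation would use, and all names are ours.

```python
import numpy as np

def fb_error(u_bf, u_fb):
    """Forward-backward error delta_bf of Eq. (1), sketched in NumPy.

    u_bf, u_fb : arrays of shape (D, H, W, 3), displacements in voxel units.
    The backward displacement is sampled at the warped position x + u_bf(x);
    nearest-voxel lookup is used here for brevity.
    """
    D, H, W, _ = u_bf.shape
    # voxel coordinates x
    grid = np.stack(np.meshgrid(np.arange(D), np.arange(H), np.arange(W),
                                indexing="ij"), axis=-1).astype(float)
    warped = grid + u_bf                       # x + u_bf(x)
    # nearest in-bounds voxel of the warped position
    idx = np.clip(np.rint(warped).astype(int), 0,
                  np.array([D - 1, H - 1, W - 1]))
    u_fb_at_warped = u_fb[idx[..., 0], idx[..., 1], idx[..., 2]]
    # |u_bf(x) + u_fb(x + u_bf(x))|_2
    return np.linalg.norm(u_bf + u_fb_at_warped, axis=-1)
```

If `u_fb` is the exact inverse of `u_bf`, the error vanishes; any residual indicates inaccurate estimation or absent correspondence.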

We estimate the regions with absent correspondence by checking the consistency of the forward and backward displacement fields. For any voxel x, if there is a significant violation of inverse consistency in x, i.e., \(\delta _{bf}(x)>\tau _{bf}\), the voxel x is either without valid correspondence or the displacement field is not accurately estimated. \(\tau _{bf}\) is the pre-defined threshold and is defined as follows:

$$\begin{aligned} \tau _{bf} = \sum _{x\in \{x | F(x)>0\}} \frac{1}{N_f} \big (|\boldsymbol{u}_{bf}(x) + \boldsymbol{u}_{fb}(x+\boldsymbol{u}_{bf}(x))|_2\big ) + \alpha , \end{aligned}$$
(2)

where \(N_f\) denotes the number of foreground voxels in F, the first term (the mean forward-backward error over the foreground) grants a tolerance interval that allows estimation errors to grow with the overall complexity of the registration, and \(\alpha \) is a constant. We then create a binary mask \(\boldsymbol{m}_{bf}\) to mark voxels with absent correspondence as follows:

$$\begin{aligned} \boldsymbol{m}_{bf}(x)= {\left\{ \begin{array}{ll} 1,&{} \text {if } (\boldsymbol{A} \star \delta _{bf})(x) \ge \tau _{bf}\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(3)

where \(\boldsymbol{A}\) denotes an averaging filter of size \((2p+1)^3\) and \(\star \) denotes a convolution operator with zero-padding p. Since the estimated registration fields fluctuate during learning, we apply the averaging filter to the estimated forward-backward error to stabilize the estimation of the binary mask and to alleviate the effect of outliers on the mask estimation. The mask \(\boldsymbol{m}_{fb}\) in the backward direction is defined symmetrically, with \(\boldsymbol{u}_{fb}\) and \(\boldsymbol{u}_{bf}\) exchanged. We set \(\alpha =0.015\) and \(p=4\) in all our experiments; these values were determined by measuring the forward-backward error of the pathological regions under a vanilla cLapIRN model.
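Eqs. (2) and (3) can be sketched together as follows. This is a minimal sketch under our own assumptions: the averaging filter \(\boldsymbol{A}\) with zero padding is realized with SciPy's `uniform_filter` in `constant` mode, and `fg_mask` stands in for the foreground set \(\{x \mid F(x)>0\}\).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def absent_correspondence_mask(delta_bf, fg_mask, alpha=0.015, p=4):
    """Threshold tau_bf (Eq. 2) and binary mask m_bf (Eq. 3), sketched.

    delta_bf : (D, H, W) forward-backward error map.
    fg_mask  : boolean foreground mask of the target scan F.
    """
    # Eq. (2): mean forward-backward error over the foreground, plus alpha
    tau = delta_bf[fg_mask].mean() + alpha
    # (A * delta_bf)(x): averaging filter of size (2p+1)^3, zero-padded
    smoothed = uniform_filter(delta_bf, size=2 * p + 1, mode="constant")
    # Eq. (3): mark voxels whose smoothed error exceeds the threshold
    return (smoothed >= tau).astype(np.uint8)
```

The smoothing makes the mask robust to isolated voxels with spuriously large error, as discussed above.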

2.3 Inverse Consistency

Since the identification of regions with absent correspondence depends heavily on the inverse consistency error in our method, we further enforce inverse consistency on the regions with valid correspondence. Mathematically, the inverse consistency loss \(\mathcal {L}_{\text {inv}}\) is defined as:

$$\begin{aligned} \mathcal {L}_{\text {inv}}=\sum _{x\in \varOmega } (\delta _{bf}(x)(1-\boldsymbol{m}_{bf}(x)) + \delta _{fb}(x)(1-\boldsymbol{m}_{fb}(x))), \end{aligned}$$
(4)

where the inverse consistency error \(\delta \) is restricted to the regions with valid correspondence via elementwise multiplication with \((1-\boldsymbol{m})\).
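Given precomputed error maps and masks, Eq. (4) reduces to a masked sum; a minimal sketch (names ours):

```python
import numpy as np

def inverse_consistency_loss(delta_bf, delta_fb, m_bf, m_fb):
    """L_inv of Eq. (4): forward-backward errors summed over voxels with
    valid correspondence (mask value 0); masked voxels contribute nothing."""
    return float((delta_bf * (1 - m_bf)).sum() +
                 (delta_fb * (1 - m_fb)).sum())
```

Voxels marked as absent correspondence are excluded, so the network is not penalized for being inconsistent where no valid mapping exists.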

Fig. 2.

Example axial T1ce MR slices of resulting warped images (B to F) from the baseline methods and our proposed method. Registration artefacts are highlighted with yellow arrows. The forward-backward errors (\(\delta _{fb}\) and \(\delta _{bf}\)) of our method are shown next to our result. The estimated regions with absent correspondence from our method are overlaid with the baseline and follow-up scans (in red). (Color figure online)

2.4 Objective Function

Let \(\phi _{bf} = Id + \boldsymbol{u}_{bf}\) and \(\phi _{fb} = Id + \boldsymbol{u}_{fb}\) be the deformation fields, where Id is the identity transform. The objective of our method is to compute the optimal deformation fields that minimize the dissimilarity of \(B(\phi _{bf})\) and F as well as of B and \(F(\phi _{fb})\) in regions with valid correspondence. Specifically, we adopt the negative local cross-correlation (NCC) with masks to exclude regions without valid correspondence from the similarity measure, as shown in Eq. 5.

$$\begin{aligned} \mathcal {L}_{\text {s}}=-\text {NCC}(F, B(\phi _{bf}), (1-\boldsymbol{m}_{bf})) -\text {NCC}(B, F(\phi _{fb}), (1-\boldsymbol{m}_{fb})). \end{aligned}$$
(5)
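For intuition, one term of Eq. (5) can be sketched as a masked cross-correlation. Note that this sketch computes a *global* NCC restricted to the valid voxels for brevity; the paper's similarity is a local (windowed) NCC, which applies the same formula per window. All names are illustrative.

```python
import numpy as np

def masked_ncc(fixed, warped, valid, eps=1e-8):
    """Masked normalized cross-correlation, a global simplification of the
    local NCC term in Eq. (5). `valid` is (1 - m), i.e., 1 where the
    correspondence is valid and 0 where it is absent."""
    f = fixed[valid > 0]
    w = warped[valid > 0]
    f = f - f.mean()                       # center intensities
    w = w - w.mean()
    return float((f * w).sum() /
                 (np.linalg.norm(f) * np.linalg.norm(w) + eps))
```

A perfectly registered pair (up to affine intensity changes) yields a value near 1; the loss negates it, and masked regions simply drop out of the statistic.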

To encourage smooth solutions and penalize implausible ones, we adopt a diffusion regularizer:

$$\begin{aligned} \mathcal {L}_{\text {r}}=||\nabla \boldsymbol{u}_{bf}||^2_2+||\nabla \boldsymbol{u}_{fb}||^2_2. \end{aligned}$$
(6)
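Eq. (6) can be discretized with finite differences; the sketch below uses forward differences, one common choice among several (names ours):

```python
import numpy as np

def diffusion_regularizer(u_bf, u_fb):
    """L_r of Eq. (6): sum of squared spatial gradients of both displacement
    fields, discretized with forward finite differences.

    u_bf, u_fb : arrays of shape (D, H, W, 3).
    """
    def grad_sq(u):
        total = 0.0
        for axis in range(3):              # spatial axes only
            d = np.diff(u, axis=axis)      # forward difference along one axis
            total += (d ** 2).sum()
        return total
    return float(grad_sq(u_bf) + grad_sq(u_fb))
```

A constant (translation-only) field incurs zero penalty, while rapidly varying fields are penalized quadratically.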

The complete loss function is therefore:

$$\begin{aligned} \mathcal {L} = (1-\lambda _{reg})\mathcal {L}_{\text {s}} + \lambda _{reg}\mathcal {L}_{\text {r}} + \lambda _{inv}\mathcal {L}_{\text {inv}} + \frac{\lambda _{m}}{N}(|\boldsymbol{m}_{bf}|_1+|\boldsymbol{m}_{fb}|_1), \end{aligned}$$
(7)

where \(\lambda _{reg}\), \(\lambda _{inv}\) and \(\lambda _{m}\) are hyperparameters that balance the loss terms, N denotes the number of voxels in the mutual spatial domain \(\varOmega \), and the last term avoids the trivial solution in which all voxels are marked in \(\boldsymbol{m}_{bf}\) and \(\boldsymbol{m}_{fb}\). During training, we follow the conditional registration framework in [24] to sample \(\lambda _{reg} \in [0,1]\), and we set \(\lambda _{reg} = 0.3\) in the inference phase. Formally, the optimal learning parameters \(\theta ^*\) are estimated by minimizing the complete loss \(\mathcal {L}\) over a training dataset D, as follows:

$$\begin{aligned} \begin{aligned} \theta ^* =&\mathop {\mathrm {arg\,min}}\limits _{\theta } \Big [ \mathbb {E}_{(B,F) \in D} \; \mathcal {L}\big (B, F, \boldsymbol{u}_{bf}, \boldsymbol{u}_{fb}, \boldsymbol{m}_{bf}, \boldsymbol{m}_{fb}\big ) \Big ]. \end{aligned} \end{aligned}$$
(8)
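Assuming the individual terms have already been computed, Eq. (7) is a weighted sum; a minimal sketch with the hyperparameter values from Sect. 3 (\(\lambda_{reg}=0.3\) at inference, \(\lambda_{inv}=0.5\), \(\lambda_{m}=0.01\)) as illustrative defaults:

```python
import numpy as np

def total_loss(L_s, L_r, L_inv, m_bf, m_fb, N,
               lam_reg=0.3, lam_inv=0.5, lam_m=0.01):
    """Complete objective of Eq. (7). The final term is the mean L1 magnitude
    of the two masks, discouraging the trivial all-masked solution."""
    mask_mag = (np.abs(m_bf).sum() + np.abs(m_fb).sum()) / N
    return (1 - lam_reg) * L_s + lam_reg * L_r \
        + lam_inv * L_inv + lam_m * mask_mag
```

Because \(\lambda_{reg}\) interpolates between the similarity and smoothness terms, sampling it during training (as in the conditional framework of [24]) lets a single network cover a range of regularization strengths.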
Fig. 3.

Boxplots of the average target registration error (TRE) near the tumor (left) and far from the tumor (right). The mean (\(\mu \)) and standard deviation (\(\sigma \)) are shown next to the 75\(^{th}\) percentile of each box.

3 Experiments

Data and Pre-processing. We evaluate our method on the brain tumor MR registration task using the 3D clinical dataset from the BraTS-Reg challenge [1], which consists of 160 pairs of pre-operative and follow-up brain MR scans of glioma patients acquired at different timepoints. Each timepoint contains native T1, contrast-enhanced T1-weighted (T1ce), T2-weighted and FLAIR MRI. 140 pairs of scans are associated with 6 to 50 manual landmarks in both scans, and 20 pairs have landmarks in the follow-up scan only. All scans have undergone standard processing, including skull stripping, affine spatial normalization and resampling to \(1 \ \text {mm}^3\) isotropic resolution. We use DeepMedic [14] to segment the tumor core in each pre-operative scan; the tumor segmentation map is used for cost function masking in the baseline methods. For the learning-based methods, we further resample the scans to size \(160\,\times \,160\,\times \,80\) with \(1.5\,\times \,1.5\,\times \,1.94 \ \text {mm}^3\) voxel spacing in the training phase and upsample the solutions to \(1 \ \text {mm}^3\) isotropic resolution with trilinear interpolation for evaluation. We perform 5-fold cross-validation, dividing the 140 pairs of scans into 5 folds of equal size. For each fold, we join the other 4 folds and the additional 20 pairs of scans to form the training and validation sets, and use the held-out fold as the test set, yielding splits of 122, 10 and 28 cases for training, validation and testing, respectively.

Implementation. Our proposed method and the baseline methods are implemented with PyTorch 1.9 and deployed on the same machine, equipped with an Nvidia Titan RTX GPU and an Intel Core i7-4790 CPU. We build our method on top of the official implementation of the 3-level cLapIRN with default parameters, available at [22]. We set \(\lambda _{reg}\), \(\lambda _{inv}\) and \(\lambda _{m}\) to 0.3, 0.5 and 0.01, respectively. We use the Adam optimizer with a fixed learning rate of 0.0001. All learning-based methods are trained from scratch.

Measurement. We register each pre-operative scan to the corresponding follow-up scan of the same patient, propagate the landmarks of the follow-up scan using the resulting deformation field, and measure the mean target registration error (TRE) of the paired landmarks as the Euclidean distance in millimetres. Using the tumor segmentation maps and morphological dilation, we divide the landmarks into two sets: (1) landmarks within 30 mm of the tumor region (Near tumor) and (2) landmarks beyond 30 mm of the tumor region (Far from tumor). We further measure the robustness of the registration, following [1] in defining the robustness for a pair of scans as the relative number of successfully registered landmarks, i.e., 1 if the distance between target and warped positions is reduced after registration for every landmark and 0 if it is reduced for none. As the local deformation at voxel p is invertible if and only if the Jacobian determinant at p (\(|J_\phi |(p)\)) is larger than zero, we also measure the percentage of voxels with a Jacobian determinant smaller than or equal to 0 (denoted as \(\%|J_\phi |_{\le 0}\)). Finally, we measure the elapsed time in seconds for each case in the inference phase (\(\text {T}_\text {test}\)).
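The invertibility metric \(\%|J_\phi |_{\le 0}\) can be sketched as below; forward finite differences are one common discretization of the Jacobian (boundary voxels are dropped for simplicity), and the function name is ours.

```python
import numpy as np

def pct_nonpositive_jacobian(u):
    """Percentage of voxels where det(J_phi) <= 0 for phi = Id + u.

    u : displacement field of shape (D, H, W, 3) in voxel units.
    """
    D, H, W, _ = u.shape
    J = np.empty((D - 1, H - 1, W - 1, 3, 3))
    for i in range(3):                 # displacement component u_i
        for j in range(3):             # derivative direction x_j
            d = np.diff(u[..., i], axis=j)          # du_i / dx_j
            J[..., i, j] = d[:D - 1, :H - 1, :W - 1]
            if i == j:
                J[..., i, j] += 1.0    # add the identity: J_phi = I + grad u
    det = np.linalg.det(J)             # per-voxel Jacobian determinant
    return float((det <= 0).mean() * 100.0)
```

A zero displacement field yields 0% (the identity map never folds), while a field that reverses orientation along any axis is flagged at every voxel.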

Baseline Methods. We compare our method (denoted as DIRAC) with a conventional approach (denoted as Elastix [16]) and two state-of-the-art DLDR methods (denoted as VM [2] and cLapIRN [24]). For Elastix, we use the official implementation in the SimpleElastix library [20], which includes a 3-level iterative optimization scheme. For VM and cLapIRN, we use their official implementations with the best parameters reported in their papers. We also report results for each method with cost function masking using the tumor core segmentation map (denoted with the postfix -CM). Note that, in contrast to the conventional methods, cost function masking in the learning-based methods excludes the similarity measure of the tumor region during the training phase only; the tumor segmentation is hidden during the inference phase. All DLDR methods are trained from scratch with T1ce MR scans as input, except for our variant (denoted as DIRAC-D), which employs both the T1ce and T2-weighted scans of each case as input.

Table 1. Quantitative results of the pre-operative and follow-up brain MR registration. Results are provided as mean ± (standard deviation). Initial: affine spatial normalization only. A runtime highlighted with an asterisk (\(^*\)) denotes runtime with CPU only; to our knowledge, Elastix does not have a GPU implementation. \(\uparrow \): higher is better; \(\downarrow \): lower is better.

Results and Discussions. Figure 3 shows box-and-whisker plots of the average TRE across the 140 subjects, for landmarks within 30 mm of the tumor boundary (group 1, left) and for the remaining landmarks (group 2, right). Among deformable registration methods with a single MR modality as input, our method DIRAC has the lowest mean registration error of 3.31 mm and 1.91 mm in groups 1 and 2, respectively, improving significantly over our baseline method cLapIRN by 0.42 mm (−11%) and 0.17 mm (−8%). Among the alternative methods, those with cost function masking (-CM) show significant improvement over their baselines in group 1, while the gain in group 2 is less pronounced, suggesting that implicitly or explicitly enforcing smooth deformations inside the masked tumor regions benefits registration near the tumor. Table 1 gives a comprehensive summary of registration error, robustness, local invertibility and runtime across the 140 subjects. In contrast to the alternative methods using cost function masking, our methods (DIRAC and DIRAC-D) achieve the best overall results in a fully unsupervised manner without sacrificing the runtime advantage of learning-based methods. Comparing DIRAC and DIRAC-D, our variant DIRAC-D, which leverages an additional MR modality, slightly improves the registration error by 1.5% and 2.6% in groups 1 and 2, respectively. Figure 2 shows qualitative examples of the registration results for each method and the regions with absent correspondence estimated by our method. The results demonstrate that our method accurately locates regions without valid correspondence, e.g., the tumor and cerebral edema in the baseline scan of subject 2, and that explicitly excluding these regions from the similarity measure during training further reduces artefacts in patient-specific registration.

4 Conclusion

We have proposed an unsupervised deformable registration method for pre-operative and post-recurrence brain MR registration that is capable of joint registration and segmentation of regions with absent correspondence. We introduce a novel forward-backward consistency constraint and a pathology-aware symmetric loss function. Compared to existing deep learning-based methods, our method addresses the absent-correspondence issue in patient-specific registration and shows significant improvement in registration accuracy near the tumor regions. Compared to conventional methods, our method inherits the runtime advantage of deep learning-based approaches and requires no manual interaction or supervision, demonstrating strong potential for fully automated patient-specific registration.