Introduction

Fig. 1

Main components for respiratory organ motion prediction based on a population model, a 3D image acquired before therapy and partial observations from tracking during therapy; see “Methods” section

To enable focused ultrasound (FUS) therapy of the liver during free-breathing, accurate information about the position of the structures of interest during therapy is essential. Tracking only the tumor is not sufficient for FUS, because absorbing and reflecting structures (e.g., bones, gas) can cause thermal injury to neighboring tissue [18, 27, 32]. Observing the motion of all these structures would require real-time 4D image acquisition and processing, which is currently not possible. Hence prior knowledge about the expected motion is required to complement the partial observations acquired during therapy.

Subject-specific [3, 7, 11, 13, 14, 16, 36] and population-based [2, 6, 9, 12, 17, 19, 20, 34] respiratory motion models have been proposed for this purpose [15, 29]. Population models are built by gathering 4D data from a number of subjects. The common structures of the different subjects are then registered and the correlations in their motion are learned. Figure 1 illustrates how such a population model is used. The model is individualized by spatially mapping it to the subject using a 3D image. During therapy, the individualized model predicts the motion of unobserved regions from partial motion observations (obtained by tracking structures on an MR or US image slice). In contrast, a subject-specific model is adapted to a subject using 4D data of this subject. For 4D-CT, only very few breathing cycles are observed to avoid excessive radiation. More information can be gathered with 4D-MRI, which allows variations in breathing states to be observed. Yet long 4D-MRIs (\(>\)20 min), which would also capture non-periodic long-term motion phenomena (drift) [34], are impractical. Hence subject-specific models based on short 4D-MRIs need to be combined with a population drift model or regularly updated with 3D observations.

Of these various motion models [15, 29], only one has been validated in vivo under realistic conditions for abdominal organs during respiration [20], and this validation was based on US tracking. However, such an MR/US approach requires an MR-compatible US device, a lengthy setup and synchronization of the US imaging with the FUS transmissions. Motion has previously been tracked on MR slices acquired for MR thermometry [21–23, 27, 28] or on other types of MR slices [5, 10]. Yet, with the exception of [5, 10, 21, 22], the experiments mostly involved simple periodic motion of phantoms. Only a few of these MR tracking studies included temporal prediction [5, 27], and none included spatio-temporal prediction. This study is the first to evaluate in vivo the spatio-temporal prediction accuracy of a motion model driven by tracking structures on thermometry magnitude MRIs during free-breathing. Furthermore, the sensitivity of the predictions to deformations induced by a FUS probe is analyzed. This is an important step toward translating MRgFUS of the liver into the clinic, which is the aim of the TRANS-FUSIMO project. The project builds on the integrated model-based software developed during FUSIMO [26]. The motion model is based on [25, 31]. Initial results of this validation, for a spatio-temporal model built from 12 subjects, were included in [26] without much detail.

Fig. 2

Example images (a–d) without and (e–h) with dummy FUS probe. a, c, e, g Zoomed-in breath-hold slice closest to the dynamic slice in b, d, f, h. (a, b) V3, lateral, (c, d) V3, medial, (e, f) V8, lateral, (g, h) V8, medial

Material

Images for validation study

For the validation study, 14 volunteers were scanned in Dundee. The MR sequence was first optimized on 4 volunteers and then fixed for the remaining 10 volunteers (V1–V10), five of whom were scanned with a dummy FUS probe in place. All volunteers were imaged in supine position on a GE Signa 1.5T HDX Echospeed MR scanner using an 8-channel cardiac array.

  • 3D breath-hold MRIs were used for mapping the population motion model to the subject’s therapy position. A 3D FIESTA sequence with 48 sagittal slices of 4 mm thickness was used. Its scan parameters were \(\hbox {TE}=1.3\) ms, \(\hbox {TR}=3.3\) ms, flip angle \(80^\circ \), FOV 40 cm and \(256\times 256\) acquisition matrix, see Fig. 2a, c. As the slice acquisition time was 1.75 s, the acquisition was divided into 4 parts of 12 slices to reduce breath-holding time to 21 s.

  • Dynamic two-slice MR sequences were used for motion tracking and for capturing motion observations for evaluation. These were based on echo-planar imaging (EPI), since the phase EPI images are used for MR thermometry during FUS therapy, while the magnitude EPI images provide anatomical information for vessel tracking. The sequence was tuned for good vessel contrast (see Fig. 2b, d) and minimum acquisition time (72 ms per slice). Its scan parameters were \(\hbox {TE}=23.4\) ms, \(\hbox {TR}=144\) ms, flip angle \(50^\circ \), FOV 260 mm and \(128\times 90\) acquisition matrix. The 2 sagittal slice locations were planned on a coronal image of the liver to ensure a reasonable gap between the two slices (range [33.2, 44.1] mm), in order to avoid interference between them and to test the spatial prediction ability. The appearance of the liver vessels on the selected slices was visually inspected. The sequence alternated between the two slices, and each slice location was acquired 300 times, giving a scan time of 46 s. This was repeated 6 times, resulting in a total scan time of 4.6 min. The volunteers were told to breathe normally throughout the scan.

  • Images with dummy FUS probe To study the influence of the FUS probe position on the liver motion during respiration, MRIs were obtained for five volunteers with a dummy probe in place using the same MR protocols, see Fig. 2e–h. Initially, the dummy probe was too uncomfortable for the volunteers to tolerate. Therefore, the probe was fitted with a new membrane, which allowed for better filling and thus provided an almost cushion-like effect. Better support padding was also used. The volunteers were then able to tolerate the probe, but still found it rather uncomfortable.
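The acquisition timings listed above are internally consistent. The minimal Python sketch below simply re-derives them from the reported parameters; the function and key names are hypothetical, and the 46 s run time is taken as reported rather than derived.

```python
def acquisition_timing(n_slices=48, n_blocks=4, slice_time_s=1.75,
                       epi_slice_ms=72, n_epi_slices=2,
                       run_time_s=46, n_runs=6):
    """Hypothetical helper: re-derive breath-hold and dynamic scan timings."""
    slices_per_block = n_slices // n_blocks            # 48 / 4 = 12 slices per breath-hold
    breath_hold_s = slices_per_block * slice_time_s    # 12 x 1.75 s = 21 s
    # The two EPI slice locations are acquired in turn, so each location is
    # revisited every n_epi_slices x 72 ms, which matches the reported TR of 144 ms.
    revisit_ms = n_epi_slices * epi_slice_ms
    total_dynamic_min = n_runs * run_time_s / 60.0     # 6 x 46 s = 4.6 min
    return {
        "slices_per_block": slices_per_block,
        "breath_hold_s": breath_hold_s,
        "revisit_ms": revisit_ms,
        "total_dynamic_min": total_dynamic_min,
    }

timing = acquisition_timing()
print(timing)
```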

Images for motion model creation

For the motion model, 4D-MRIs were acquired in Zurich for 16 healthy volunteers. An interleaved sagittal 2D sequence was used, in which slices covering the liver were alternated with a navigator slice placed at the center of the right liver lobe [33]. After many breathing cycles had been captured, the slices were retrospectively sorted based on the liver position on the navigator to form 3D volumes. MRIs were acquired on a 1.5T Philips Achieva whole-body MR system using a balanced steady-state free precession sequence, SENSE factor 1.7 and half-scan (\(\hbox {flip angle}=70^\circ \), \(\hbox {TR}=3.1\) ms, \(\hbox {TE}=1.5\) ms). The images had a spatial resolution of \(1.33\times 1.33\times 4{-}5\,\hbox {mm}^3\) and a temporal resolution of 2.6–2.8 Hz. For the first 12 volunteers, the right liver lobe was imaged using a 4-channel cardiac array coil. For the remaining 4 volunteers, the whole liver was imaged with the same MR protocol and a 32-channel cardiac array coil.

The validation images and 25 % of the 4D-MRIs were not used in [2, 19, 20, 33, 34].

Methods

The motion prediction concept is illustrated in Fig. 1. It consists of 6 steps, described next; steps S4 to S6 are repeated throughout therapy.

S1:

Individualization of motion model The offline-created population motion model [25, 31] (described in the “Appendix” for completeness) is individualized by mapping it to the subject’s liver as captured on a 3D breath-hold MRI in therapy position, see “Inter-subject correspondences” section.

S2:

Registration of 3D breath-hold to 2D dynamic slices To relate the MR observations to the motion model, spatial correspondence between the 3D FIESTA breath-hold image and the EPI reference slices needs to be established. The 3D images were registered to the slices by using 3D affine transformations and minimizing the difference in normalized gradient magnitude within the liver to cope with differences in image appearance (Fig. 2). This similarity measure was favoured over mutual information to reduce runtime.
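The normalized-gradient-magnitude criterion can be illustrated with a minimal 2D NumPy sketch under stated assumptions; it is not the authors' implementation. The gradient magnitude is normalized by its mean inside the liver mask (our assumption), the dissimilarity is the mean squared difference inside the mask, and an exhaustive search over integer 2D translations (via `np.roll`) stands in for the 3D affine optimization. Because of the normalization, a global intensity scaling and offset of the moving image leaves the measure unchanged, which shows why such a criterion tolerates appearance differences better than raw intensity differences.

```python
import numpy as np

def normalized_gradient_magnitude(img, mask):
    """Gradient magnitude scaled by its mean inside the mask (assumed normalization)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    gm = np.hypot(gx, gy)
    scale = gm[mask].mean()
    return gm / scale if scale > 0 else gm

def ngm_dissimilarity(fixed, moving, mask):
    """Mean squared difference of normalized gradient magnitudes within the mask."""
    a = normalized_gradient_magnitude(fixed, mask)
    b = normalized_gradient_magnitude(moving, mask)
    return float(np.mean((a[mask] - b[mask]) ** 2))

def best_translation(fixed, moving, mask, max_shift=4):
    """Exhaustive integer-translation search (stand-in for 3D affine optimization)."""
    best, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            cost = ngm_dissimilarity(fixed, shifted, mask)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost

# Toy example: a blob that is shifted and has a different intensity scale and offset.
yy, xx = np.mgrid[0:48, 0:48]
fixed = np.exp(-((yy - 20) ** 2 + (xx - 22) ** 2) / (2 * 3.0 ** 2))
moving = 3.0 * np.roll(fixed, (2, -3), axis=(0, 1)) + 10.0
mask = np.zeros(fixed.shape, dtype=bool)
mask[8:40, 8:40] = True          # hypothetical "liver" region
shift, cost = best_translation(fixed, moving, mask)
```

The search recovers the shift that undoes the displacement even though the moving image is three times brighter and offset by 10 intensity units.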

S3:

Automatic detection of liver vessels Bright blood vessels, which are used as landmarks, were automatically detected on the reference slices with an optimized algorithm based on their shape, size and brightness. Usually 10–15 landmarks are detected.
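The detection step is described only by its criteria (shape, size and brightness). As a hedged illustration, the sketch below implements a much simpler, hypothetical stand-in that keeps only the brightness criterion plus a crude size prior: a pixel is accepted if it exceeds an intensity threshold and is strictly brighter than all neighbors within `radius` pixels, and at most `max_count` candidates (brightest first) are returned. The function and parameter names and the defaults are assumptions, and no shape analysis is performed.

```python
import numpy as np

def detect_bright_spots(img, min_intensity, radius=2, max_count=15):
    """Hypothetical stand-in for vessel detection: bright, strict local maxima."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, radius, mode="constant", constant_values=-np.inf)
    is_max = img >= min_intensity                      # brightness criterion
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[radius + dr:radius + dr + h, radius + dc:radius + dc + w]
            is_max &= img > neighbor                   # strictly brighter than each neighbor
    rows, cols = np.nonzero(is_max)
    order = np.argsort(img[rows, cols])[::-1][:max_count]   # brightest first
    return [(int(rows[i]), int(cols[i])) for i in order]

# Synthetic slice with two bright "vessel cross sections".
yy, xx = np.mgrid[0:32, 0:40]
img = (1.0 * np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / (2 * 1.5 ** 2))
       + 0.8 * np.exp(-((yy - 20) ** 2 + (xx - 25) ** 2) / (2 * 1.5 ** 2)))
found = detect_bright_spots(img, min_intensity=0.5)
```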

S4:

MR tracking The bright blood vessels in the liver were tracked with subpixel resolution on the EPI images using a tracking method inspired by [35]. After the vessels have been detected in step S3, the location of each landmark over time is determined using an autocorrelation algorithm. Due to pulsatile flow in the large arteries, not all vessels can always be tracked reliably. An algorithm assesses the reliability of each landmark and reports it by a binary flag (“valid”, “invalid”). The motion of the “invalid” landmarks was set to the mean motion of the other landmarks.
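The tracking step can be illustrated with a generic template-matching sketch. Note that the text describes an autocorrelation-based method inspired by [35]; the code below instead uses plain normalized cross-correlation (NCC) over a small search window, a parabolic fit for sub-pixel refinement and an NCC threshold (`min_score`, hypothetical) as the reliability flag. All function names are hypothetical. The last helper reproduces the stated rule of replacing the motion of “invalid” landmarks by the mean motion of the valid ones.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def _parabola_offset(m1, m0, p1):
    """Sub-pixel vertex offset of a parabola through (-1, m1), (0, m0), (1, p1)."""
    d = m1 - 2.0 * m0 + p1
    if not np.isfinite(d) or d >= 0:      # no proper maximum: keep integer position
        return 0.0
    return 0.5 * (m1 - p1) / d

def track_landmark(ref, img, pos, half=3, search=4, min_score=0.8):
    """Hypothetical NCC tracker: returns (row, col, valid) of the landmark in img."""
    r, c = pos
    tmpl = ref[r - half:r + half + 1, c - half:c + half + 1]
    n = 2 * search + 1
    scores = np.full((n, n), -np.inf)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if rr - half < 0 or cc - half < 0:
                continue                  # patch would leave the image
            patch = img[rr - half:rr + half + 1, cc - half:cc + half + 1]
            if patch.shape == tmpl.shape:
                scores[dr + search, dc + search] = ncc(tmpl, patch)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    sub_r = _parabola_offset(scores[i - 1, j], scores[i, j], scores[i + 1, j]) if 0 < i < n - 1 else 0.0
    sub_c = _parabola_offset(scores[i, j - 1], scores[i, j], scores[i, j + 1]) if 0 < j < n - 1 else 0.0
    return r + i - search + sub_r, c + j - search + sub_c, bool(scores[i, j] >= min_score)

def fill_invalid(displacements, valid):
    """Replace displacements of invalid landmarks by the mean of the valid ones."""
    d = np.asarray(displacements, dtype=float).copy()
    v = np.asarray(valid, dtype=bool)
    if v.any():
        d[~v] = d[v].mean(axis=0)
    return d

def gaussian_blob(shape, center, sigma=2.0):
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2 * sigma ** 2))

ref = gaussian_blob((32, 32), (16, 16))
moved = np.roll(ref, (2, 1), axis=(0, 1))            # integer displacement
r, c, valid = track_landmark(ref, moved, (16, 16))
r2, c2, valid2 = track_landmark(ref, gaussian_blob((32, 32), (16.4, 16.0)), (16, 16))
filled = fill_invalid([[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]], [True, True, False])
```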

S5:

Temporal prediction To compensate for the latency \({\varDelta }\) of the therapy system, motion needs to be predicted for a future time point. This was done by first temporally extrapolating the tracking results (\(\mathbf {s}_t \rightarrow \mathbf {s}_{t+{\varDelta }}\)) and then using \(\mathbf {s}_{t+{\varDelta }}\) as input for the spatial prediction model. On an independent dataset, we evaluated the temporal prediction methods from [30] (adaptive linear prediction (LIN), second-order prediction (POLY2), support vector regression (SVR), kernel density estimation (KDE) and the median of these 4 methods (MED)) for an input sampling time (150 ms) and latencies \({\varDelta }\) (75 ms, 150 ms) similar to those of the EPI image sequence. The best-performing method was then selected for the motion model validation.
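To make the temporal extrapolation step concrete, the sketch below implements a generic normalized least-mean-squares (NLMS) linear predictor that forecasts a 1D motion trace `horizon` samples ahead and compares it with the no-prediction baseline (NON) by RMS error. This is a stand-in under explicit assumptions: it is not necessarily the LIN method of [30], the latency is assumed to be an integer number of samples, the trace is assumed to be roughly zero-mean (e.g., detrended), and the order, step size and toy breathing trace are illustrative. The weights start as the NON predictor, so early forecasts fall back to "no motion".

```python
import numpy as np

def nlms_forecast(signal, horizon, order=4, mu=0.5, eps=1e-6):
    """Causal NLMS forecast: preds[t] is predicted from samples up to t - horizon."""
    s = np.asarray(signal, dtype=float)
    n = len(s)
    w = np.zeros(order)
    w[0] = 1.0                        # start as the "no motion" (NON) predictor
    preds = np.full(n, np.nan)
    for t in range(order - 1, n):
        # 1) Learn from the newest example whose target s[t] has now been observed.
        u = t - horizon
        if u >= order - 1:
            x_old = s[u - order + 1:u + 1][::-1]       # most recent sample first
            err = s[t] - w @ x_old
            w += mu * err * x_old / (eps + x_old @ x_old)
        # 2) Forecast the sample that lies `horizon` steps in the future.
        if t + horizon < n:
            x = s[t - order + 1:t + 1][::-1]
            preds[t + horizon] = w @ x
    return preds

def rms(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Toy trace (assumed): 4 s breathing period, 5 mm amplitude, sampled every 150 ms.
t = np.arange(2000) * 0.150
trace = 5.0 * np.sin(2 * np.pi * t / 4.0)
h = 1                                                 # latency of one sample
pred_lin = nlms_forecast(trace, h)
pred_non = np.roll(trace, h)                          # NON: s[t - h] predicts s[t]
eval_idx = np.arange(1000, 2000)                      # skip the adaptation phase
rms_lin = rms(pred_lin[eval_idx], trace[eval_idx])
rms_non = rms(pred_non[eval_idx], trace[eval_idx])
```

Evaluating only after an adaptation phase mirrors the remark in the Results that LIN can only be applied once enough samples have been observed.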

S6:

Spatio-temporal prediction The extrapolated tracking results \(\mathbf {s}_{t+{\varDelta }}\) are then used as partial observations (surrogates) to predict the liver motion \({\varDelta } \mathbf {p}_{t+{\varDelta }}\) via the population liver motion model [25, 31] using Eqs. (3, 4).
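The exact spatial prediction (Eqs. 3, 4) is given in the Appendix and is not reproduced here. As a hedged illustration of the general idea of predicting unobserved motion from partial observations with a statistical model, the sketch below uses a generic Tikhonov-regularized PCA reconstruction: given a mean motion vector `mu`, an orthonormal basis `U` and the indices `obs` of the observed (surrogate) components, it estimates the mode coefficients from the observations and returns the full motion vector. The function name is hypothetical, the nearest-model selection and history terms of [25, 31] are omitted, and `eta` simply trades off fitting the observations against staying close to the mean.

```python
import numpy as np

def predict_from_partial(mu, U, obs, s, eta):
    """Generic Tikhonov-regularized PCA reconstruction (not Eqs. 3, 4 of the paper).

    mu  : (D,) mean motion vector
    U   : (D, k) orthonormal PCA basis (columns are modes)
    obs : indices of the observed components of the motion vector
    s   : (len(obs),) observed values (surrogates)
    eta : regularization weight (>= 0)
    """
    U_o = U[obs]                                      # basis rows that are observed
    r = s - mu[obs]                                   # observations relative to the mean
    k = U.shape[1]
    c = np.linalg.solve(U_o.T @ U_o + eta * np.eye(k), U_o.T @ r)
    return mu + U @ c                                 # full predicted motion vector

# Toy model: 30-dimensional motion vectors generated by 3 modes.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(30, 3)))
mu = rng.normal(size=30)
p_true = mu + U @ np.array([2.0, -1.0, 0.5])
obs = np.arange(12)                                   # only 12 of 30 components observed
p_hat = predict_from_partial(mu, U, obs, p_true[obs], eta=1e-8)
p_shrunk = predict_from_partial(mu, U, obs, p_true[obs], eta=1e12)
```

With a small `eta` the unobserved components are recovered from the observed ones; a very large `eta` shrinks the prediction to the mean motion.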

Validation strategy

The key for the motion model validation is the acquisition of an interleaved two-slice EPI sequence, such that one slice can be used for MR tracking and the other slice for evaluation of the prediction accuracy.

Liver vessel positions \(\mathbf {t}_{j,0}, j=1,\ldots ,J\) and \(\mathbf {v}_{k,0}, k=1,\ldots ,K\) were automatically detected (step S3) in the reference images of the tracking and validation EPI sequences, respectively. Tracking of \(\mathbf {t}_{j,0}\) (S4) provided the position \(\mathbf {t}_{j,t}\) at time t, and spatio-temporal prediction (S5, S6) estimated \(\mathbf {v}_{k,t+{\varDelta }}\).

All vessel locations \(\mathbf {v}_{k,0}\) were annotated by one observer (C. T.) on a randomly selected subset of 5 % of all time frames. Unreliable annotations were marked as “invalid”. Landmarks were annotated with subpixel resolution, without access to tracking results and before prediction results were available. The mean (95 %) intra-observer annotation accuracy, determined by repeating 20 % of the annotations, was 0.6 (2.1) mm.

The prediction error per landmark and image frame was quantified by the Euclidean distance \(E_{k,t+{\varDelta }}=|| \mathbf {v}_{k,t+{\varDelta }} - \mathbf {g}_{k,t+{\varDelta }}||\) for “valid” manual annotations \(\mathbf {g}_{k,t+{\varDelta }}\). The error statistics were summarized by first determining the mean and the 95th percentile of \(E_{k,t+{\varDelta }}\) over all annotated image frames \(t+{\varDelta }\) of vessel k (denoted as \(\bar{E}_{k}\) and \(E^{95}_{k}\), respectively) and then calculating the mean of \(\bar{E}_{k}\) and \(E^{95}_{k}\) over all vessels \(k=1,\ldots ,K\) and volunteers. The process was repeated after reversing the roles of the tracking and validation slices.
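The two-stage summary (per-vessel statistics first, then averaging over vessels) can be written compactly. The sketch below assumes positions stored as arrays of shape `(n_vessels, n_frames, 2)` in mm, with frames lacking a “valid” annotation set to NaN in the ground truth; vessels from all volunteers are simply stacked along the first axis (NaN-padded to a common frame count). The function name and layout are hypothetical, and NumPy's default linear interpolation is assumed for the 95th percentile.

```python
import numpy as np

def error_summary(pred, gt):
    """Mean of per-vessel mean errors and mean of per-vessel 95th percentiles."""
    err = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=-1)
    has_valid = ~np.all(np.isnan(err), axis=1)        # drop vessels without annotations
    err = err[has_valid]
    mean_k = np.nanmean(err, axis=1)                  # \bar{E}_k
    p95_k = np.nanpercentile(err, 95, axis=1)         # E^{95}_k
    return float(mean_k.mean()), float(p95_k.mean())

# Two vessels, four frames; the last frame of vessel 0 has no valid annotation.
gt = np.zeros((2, 4, 2))
gt[0, 3, :] = np.nan
pred = np.zeros((2, 4, 2))
pred[0, :, 0] = [1.0, 2.0, 3.0, 0.0]                  # errors 1, 2, 3 (and NaN)
pred[1, :, 1] = 2.0                                   # constant error of 2 mm
mean_err, p95_err = error_summary(pred, gt)
```

Here vessel 0 has a mean of 2 mm and a 95th percentile of 2.9 mm, and vessel 1 has 2 mm for both, so the summary is 2.0 mm (2.45 mm).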

Fig. 3

Example of 3D–2D registration result for (left) lateral and (right) medial EPI slice of V3. a, d Gradient magnitude (GM) of slice from 3D image a before and d after registration. b, e GM of EPI slice within liver region. c, f Overlay of a, b and d, e

Fig. 4

Detected vessels (magenta) on EPI MR reference slice. a V3, lateral. b V3, medial. c V8, lateral. d V8, medial

Results

Individualization of motion model

The liver in the breath-hold image was manually segmented, anatomical landmarks were selected, and the liver was uniformly meshed to establish inter-subject correspondences, see “Inter-subject correspondences” section. For a few images, following structures across slices was difficult because insufficient breath-hold repeatability caused misalignments.

Registration of 3D breath-hold to 2D dynamic image

Figure 2 shows the slice of the 3D FIESTA breath-hold image closest to the EPI dynamic slice. Large differences in image appearance are evident. The effect of registering the breath-hold image to the dynamic slice with a 3D affine transformation can be seen in Fig. 3. Registration improves the alignment of image features, although some misalignments remain visible. After registration, the reference and tracked vessel positions are spatially transferred to the breath-hold image and hence mapped to the motion model.

Automatic detection of liver vessels

Figure 4 shows example reference slices and the automatically detected vessels. Almost all detected locations clearly lie at vessel cross sections.

MR tracking

Example MR tracking results are shown in Fig. 5. The motion is larger in the SI direction (mean standard deviation (SD): 3.2 mm) than in the AP direction (mean SD: 2.0 mm), and the breathing pattern can be irregular. The motion patterns of the various vessels are highly correlated within each slice (\(R>0.94\) lateral and \(R>0.82\) medial) and across slices.
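The correlation values are reported as per-slice lower bounds. One plausible reading, which is our assumption, is the minimum pairwise Pearson correlation between the SI traces of the vessels on a slice. It can be computed as below; the function name is hypothetical and the toy traces are synthetic.

```python
import numpy as np

def min_pairwise_correlation(traces):
    """Smallest off-diagonal Pearson correlation between rows (one row per vessel)."""
    R = np.corrcoef(np.asarray(traces, dtype=float))
    off_diagonal = R[~np.eye(R.shape[0], dtype=bool)]
    return float(off_diagonal.min())

# Synthetic SI traces of three vessels sharing one breathing pattern (two periods).
t = np.linspace(0.0, 4 * np.pi, 400, endpoint=False)
traces = np.vstack([np.sin(t), 2.0 * np.sin(t) + 1.0, np.sin(t) + 0.1 * np.cos(t)])
r_min = min_pairwise_correlation(traces)
```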

The manual annotations used for the motion model validation were also used for assessing the MR tracking performance. Overall, MR tracking reduced the mean (95 %) motion from 4.9 mm (14.2 mm) to a residual error of 1.1 mm (2.4 mm). Similar tracking performance was achieved for the two EPI slices [lateral: 1.1 mm (2.3 mm), medial: 1.2 mm (2.5 mm)] and for the volunteers with and without the dummy FUS probe [1.1 mm (2.4 mm) for both]. More vessels were marked as “invalid” for the medial than for the lateral slice (35 vs. 26 %) and for images with than without the dummy FUS probe (33 vs. 26 %). The variation in respiratory motion across volunteers can be seen in Fig. 6a. MR tracking reduced this to a similar mean accuracy for all volunteers (range 0.9–1.4 mm), see Fig. 6b.

Fig. 5

MR tracking results showing superior–inferior (SI) and anterior–posterior (AP) position of the automatically detected vessels over time. Black dots mark low confidence results. a V3, lateral, SI. b V3, lateral, AP. c V3, medial, SI. d V3, medial, AP

Fig. 6

Boxplots showing distribution of initial motion, MR tracking error, spatio-temporal prediction error for 216-ms latency. The mean values are marked by green stars. a Motion. b MR tracking. c Spatio-temporal prediction

Table 1 Mean RMS error (in mm) of temporal prediction for a sampling interval of 150 ms and latency \({\varDelta } \in \{75,150\}\) ms

Temporal prediction

The temporal prediction results for 25 resampled US liver vessel motion traces [30] are listed in Table 1 and compared to assuming that no motion occurs during the latency \({\varDelta }\) (NON). The adaptive linear filter (LIN) performs best for these relatively short latencies. As before, subject-specific optimization of the parameters (over the first 10 breathing cycles) is not required, since the median parameters provide a similar performance for the best two methods when applied to the remaining data. Applying the LIN method to the motion traces extracted by MR tracking resulted in a mean root-mean-square (RMS) error of 0.4 (1.1) mm for a latency of 72 (216) ms, compared with 0.6 (1.6) mm without prediction (NON). Note that LIN could only be applied once a sufficient number of samples (180) had been observed.

Table 2 Mean and 95th percentile error (in mm) w.r.t. reference position over time and then averaged for all vessel locations

Spatio-temporal prediction

In accordance with [31], the model parameters were set to 99 % cumulative PCA energy, \(K=5\) nearest models, history length \(O=300\) and regularization weight \(\eta =\sigma _N^2/K\) with \(\sigma _N=1\) mm. Each model was trained on \(T=2000\) time steps regularly sampled from up to 850 breathing cycles. The latency \({\varDelta }\) was 72 or 216 ms, as the two EPI slices were acquired within 144 ms. Prediction performance was assessed with and without 3D affine registration, with and without temporal prediction, and with either the mean translation or the exemplar model.
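The 99 % cumulative PCA energy criterion determines how many modes are kept. A minimal sketch, assuming that energy is measured by the squared singular values of the mean-centered training data (i.e., the explained variance), is shown below; the function name and the toy data are hypothetical.

```python
import numpy as np

def n_components_for_energy(singular_values, energy=0.99):
    """Smallest number of modes whose cumulative explained variance reaches `energy`."""
    var = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(cumulative, energy) + 1)   # first index with cumulative >= energy
    return min(k, len(var))

# Toy data: 10-D vectors driven by two latent modes (variance ratio 9:1) plus tiny noise.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
X = latent @ Q.T + 1e-3 * rng.normal(size=(200, 10))
S = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
k = n_components_for_energy(S, 0.99)
```

Since the first mode explains only about 90 % of the variance, both latent modes are retained at the 99 % threshold.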

Table 2 summarizes the prediction accuracy over all sequences of the 10 volunteers. Predictions were compared to MR tracking results (311863 samples) or manual annotations (4934 samples). Results improved statistically significantly with temporal prediction, but were similar with and without registration. Using the mean translation was on average 4.0 (8.4) % worse for \({\varDelta }=72\) (216) ms. The reported runtime is the average time for predicting the liver position, including 3D affine registration but excluding MR tracking. The variation in prediction accuracy across volunteers is shown in Fig. 6. A tendency toward increased breathing motion (5.7 vs. 4.6 mm) was observed with the dummy FUS probe in place (V6–V10). Motion prediction reduced this difference (2.0 vs. 1.8 mm).

Summary and discussion

This study shows that the liver motion during free-breathing can be predicted with a mean (95 %) accuracy of 1.9 mm (4.4 mm) for a latency of 216 ms by the population model, individualized using a 3D breath-hold image and driven by liver vessel motion tracked on 2D MR thermometry images. This performance is similar to that of the only other existing realistic in vivo validation, which is based on US tracking (2.4 mm mean 3D accuracy for \({\varDelta }=200\) ms) [20]. That study also used a population-based statistical motion model, but with partial observations from US tracking, a neural network for temporal prediction, a single PCA model for spatial prediction and simultaneous 4D-MR and US acquisition for evaluation, and it did not investigate the influence of abdominal deformations.

Using the observed mean translation for spatial prediction performed similarly, owing to the proximity of the MR tracking and validation slices. However, this approach will not extrapolate well to greater distances (the 95 % error increased by 1 mm for the whole liver in leave-one-subject-out experiments similar to [31]) and cannot make the most of additional observations. The errors of the individual system components are clearly not additive: the mean errors of MR tracking (1.1 mm) and temporal prediction (0.9 mm) alone already add up to 2.0 mm, which exceeds the overall mean error of 1.9 mm. On average, applying the FUS probe increased breathing frequency and magnitude, probably due to discomfort and the restriction of anterior motion.

The study was limited by not capturing out-of-plane motion, which is expected to be small, and by the lack of ground truth for the image registration. Acquiring an EPI slice during the same breath-hold would avoid this registration. The misalignments between breath-holds likely had a small impact, as displacement fields are smooth and the EPI slices lay within the same acquisition block in 50 % of cases.

The main aspects expected to improve the prediction include tracking vessels on both MR slices (as no validation slice is needed in clinical use), individualization based on 3D motion observations from breath-holds [31] and automatic registration by acquiring the tracking reference slice within the same breath-hold. In conclusion, a spatio-temporal model of the liver motion during free-breathing, driven by MR motion observations, was validated in vivo and provided an encouraging mean accuracy close to the initial clinical requirement of 2 mm.