Introduction

Every year, an estimated 3.2 million epidurals [9] and 190,000 facet joint injections [6] are performed for back pain in North America, with commensurate numbers elsewhere. Epidural and spinal anesthesia are used in many different types of surgery and have been shown to have benefits over general anesthesia [23]. The purpose of such regional spine anesthesia and analgesia procedures is to help alleviate pain in a desired region. Specifically, epidural steroids are commonly used for treatment of sciatica [9], while intradural spinal anesthesia is commonly used in surgery on legs [23]. Epidural injections are also used by more than 50% of women during labor and delivery. Facet denervation, on the other hand, is an effective treatment for acute and chronic back pain, a problem experienced at least once by approximately 80% of the adult population [24]. These lumbar spine procedures are particularly challenging due to the proximity of the targets to nerve tissue, the deep location of the targets, the small and narrow channel between the articular processes of the joint, and the oblique entry angles [6, 9]. For example, in traditional epidural and spinal anesthesia, needle insertion is performed blindly using manual palpation and the loss-of-resistance technique [15]. Inaccurate needle placement can cause complications such as accidental dural puncture (2.5% of all cases [1], and 3–5% in procedures performed by less experienced operators [19]) leading to post-puncture headache (86%) [27]. These procedures are more challenging in patients who are obese, have abnormally curved (scoliotic) spines, or have had previous back surgery [33]. Therefore, careful needle placement is critical to ensure safe and effective therapy [33]. The current standard-of-care for guiding facet joint injections is fluoroscopy, which provides real-time contrast images of the anatomy [28]. However, this imaging modality exposes both the patient and the surgeon to ionizing radiation.
Moreover, fluoroscopy-guided procedures require specialized facilities, which can be expensive [33].

Ultrasound (US)-based guidance is an alternative approach for spinal anesthesia and analgesia. This modality is non-ionizing, accessible and inexpensive [33]. The feasibility of US guidance has been shown by many studies [11, 16, 18, 30, 33]. Nevertheless, US-based methods are still not widely used in clinics due to the difficulties in interpreting spinal sonographic images. To improve the interpretation of US images, researchers have proposed to fuse them with: (i) anatomical information from preoperative computed tomography (CT) or magnetic resonance imaging (MRI); and (ii) statistical anatomical models.

Needle guidance using fusion of intraprocedure US with easy-to-interpret preprocedure CT data has been attempted before [10, 17, 31]. Prior works on fusion mostly focus on the vertebrae as single solid objects or as individual bones [7, 25, 32]. Biomechanical models for multi-vertebrae registration have also been proposed [12]. Nevertheless, CT is ionizing and harmful, making it an undesirable modality for guidance. MRI is the investigation of choice for the clinical diagnosis of most spinal disorders, and hence, often also the available preoperative modality. The visibility of soft tissue and nerves in MR imaging makes it a good candidate for guiding lumbar spine procedures. Bø et al. [4] attempted US-MRI augmentation for spine surgery by rigidly registering the bone responses in US to the pre-segmented MR images and manually refining the results. In general, the issue with multimodal registration methods is their requirement of an accurate segmentation of the vertebrae in the preprocedure data. Segmentation of the vertebrae is challenging, especially around anesthesia targets, and lacks reliability and accuracy [25].

Fig. 1

Example snapshots of preprocedure spinal MR (a) and panoramic intraprocedure US (b). The two modalities have complementary features

An alternative approach for augmenting spinal US images is to fuse them with statistical anatomical models. A number of works, including several from our group [8, 14, 21, 22], have developed statistical models of the spine. Boisvert et al. [5] constructed a statistical pose model for registering to radiographic images. Khallaghi et al. [14] focused on creating statistical shape models for each vertebra. Rasoulian et al. introduced a multi-body shape \(+\) pose (s \(+\) p) statistical model of the lumbar spine for segmenting CT images [21]. Using this model, s \(+\) p registration techniques were proposed for 2D magnetically tracked [22] and 3D tracker-less US [8]. Though promising, model-based techniques still face the problem of interpreting spinal US. As a result, it is difficult for clinicians to decide whether or not the model-based registration results are reliable for guidance. Furthermore, due to noise and the sparse anatomical information contained in US images, registration errors obtained with these methods vary in magnitude and location.

Recently, we proposed a framework for joint registration of a statistical s \(+\) p model of the spine to preoperative CT and intraoperative US [3]. This method took advantage of the complementary features in CT and US. The underlying assumption was that for a given subject, the shape of the lumbar spine was the same in both modalities, whereas the pose changed. The joint method allowed for simultaneous visualization of the model on both modalities which, in turn, made it possible to locally assess the registration results. Generally, in the s \(+\) p model, the relative positions of the vertebrae, as well as their sizes, are both encoded in the pose parameter. As a result, the s \(+\) p model does not guarantee that scale (i.e., vertebrae sizes) would be identical in both CT and US after pose optimization. Furthermore, scale estimation would also affect the pose parameters. This limitation is particularly critical when registering data of the same patient between the supine position (CT and MRI) and the sitting position (US) preferred for epidurals and spinal injections.

Given the significant potential that joint registration offers for clinical utility of US guidance, and that MR is the preferred modality for spine imaging, we propose—for the first time—to augment US images with anatomical data from MR through a joint registration framework. Fusion of US and MR images of the spine is challenging, as these modalities depict complementary information of the anatomy. US images of the spine provide useful information regarding posterior bone surfaces, such as laminae (LAs) and spinous processes (SPs) (Fig. 1b). MR images, on the other hand, offer a clearer depiction of more anterior structures such as vertebral bodies (VBs), while LAs are unclear and blurry due to the large slice thickness of clinical MR (Fig. 1a). The joint model-based method benefits from the prior statistical information of the statistical model, as well as patient-specific information in both modalities. Moreover, to address the problem of dependency of scale and pose, we derive a new shape \(+\) pose \(+\) scale (s \(+\) p \(+\) s) model. The s \(+\) p \(+\) s model decouples the scale parameter from the pose, allowing the pose and scale to be estimated individually. Using the s \(+\) p \(+\) s model in the joint registration framework guarantees identical shape and scale of the vertebrae for both MR and US modalities. Moreover, the s \(+\) p \(+\) s formulation can also lead to more accurate pose estimation, as the pose is no longer constrained by the scale.

Material and methods

Statistical s \(+\) p \(+\) s Model

Previously [3, 8, 21, 22], we demonstrated that a statistical s \(+\) p model could be used for US augmentation. A shortcoming of this model, however, is that the relative position of the vertebrae and their scale (i.e., size) are both encoded in one parameter, namely the pose. In order to capture pose and scale variations more accurately, we propose to create an s \(+\) p \(+\) s model which removes the dependence of pose and scale on one another. In addition to allowing for a less constrained pose fitting, separating scale from pose makes it possible for the scale parameter to be jointly estimated in fusion of two or more modalities. Joint computation of scale guarantees equally sized vertebrae in patient-specific models in all modalities used in the fusion.

In the s \(+\) p model, pose can be formulated as the matrix \(T_{n,l}\), which rigidly transforms the n-th point on the model to the l-th point on the target,

$$\begin{aligned} T_{n,l} = \begin{bmatrix} k_{n,l} R_{n,l}&\quad t_{n,l}\\ 0&\quad 1 \end{bmatrix}, \end{aligned}$$
(1)

where \(k_{n,l}\), \(R_{n,l}\) and \(t_{n,l}\) are the scaling, rotation and translation components of the transformation, respectively. In order to decouple scale from the pose transformations, we can rewrite \(T_{n,l}\) as the product of two independent transforms \(T_{n,l}=T_{n,l}^1 \times T_{n,l}^2\). Here, \(T_{n,l}^1\) is the scaling transform and \(T_{n,l}^2\) indicates the rotation and translation transforms. That is,

$$\begin{aligned} T_{n,l}^1= & {} \begin{bmatrix} k_{n,l}&\quad 0&\quad 0&\quad 0\\ 0&\quad k_{n,l}&\quad 0&\quad 0\\ 0&\quad 0&\quad k_{n,l}&\quad 0\\ 0&\quad 0&\quad 0&\quad 1 \end{bmatrix}, \end{aligned}$$
(2)
$$\begin{aligned} T_{n,l}^2= & {} \begin{bmatrix} R_{n,l}&\quad t_{n,l}\\ 0&\quad 1 \end{bmatrix}. \end{aligned}$$
(3)
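The decomposition in Eqs. (1)–(3) can be sanity-checked numerically. Below is a minimal sketch assuming isotropic scaling; note that for the product \(T^1 T^2\) to reproduce \(T\) exactly, the translation stored in \(T^2\) must be \(t/k\):

```python
import numpy as np

def similarity_transform(k, R, t):
    """Build the 4x4 transform of Eq. (1): upper-left block k*R, translation t."""
    T = np.eye(4)
    T[:3, :3] = k * R
    T[:3, 3] = t
    return T

def decompose(T):
    """Split T into a pure scaling T1 (Eq. 2) and a rigid T2 (Eq. 3)
    such that T = T1 @ T2. The translation placed in T2 is t/k, so the
    product reproduces the original translation t."""
    A = T[:3, :3]
    k = np.cbrt(np.linalg.det(A))   # isotropic scale, from det(k*R) = k^3
    T1 = np.diag([k, k, k, 1.0])
    T2 = np.eye(4)
    T2[:3, :3] = A / k              # pure rotation
    T2[:3, 3] = T[:3, 3] / k
    return T1, T2
```

Recovering the scale from the determinant works because \(\det(kR)=k^3\) for a proper rotation \(R\).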

Hence, an instance of the s \(+\) p \(+\) s model can be created as follows:

$$\begin{aligned} {\Phi }(\theta ^{s},\theta ^{k},\theta ^{p}) = {\Phi }^p_l\big ({\Phi }^k_l({\Phi }^s_l (\theta ^{s});\theta ^{k}) ; \theta ^{p} \big ), \end{aligned}$$
(4)

where \((\theta ^{s}\), \(\theta ^{p}\), \(\theta ^{k})\) are the shape, pose, and scale coefficients, respectively, and \(\mathrm {\Phi }^k_l(.;\theta ^{k})\) is the scaling transformation with coefficients \(\theta ^{k}\). Figure 2 shows the first modes of variation for the s \(+\) p (Fig. 2a) and the s \(+\) p \(+\) s model (Fig. 2b). Variations of the transformations of the model from \(-3\sqrt{\lambda }\) to \(+3\sqrt{\lambda }\) are depicted. The \(\lambda \)’s are the eigenvalues calculated for the shape, pose and scale eigenvectors obtained at training. It can be seen that unlike the s \(+\) p model, pose variations in the s \(+\) p \(+\) s model do not involve scaling. The shape deformations, however, are identical for both models. We constructed our s \(+\) p \(+\) s model from 32 segmented CT volumes of the lumbar spine [21]. The training patient population had no overlap with those in the proposed joint registration framework in the next section.
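Under an assumed structure for the model (a mean point cloud plus linear shape eigenvectors, and per-vertebra similarity transforms), an instance per Eq. (4) could be generated roughly as follows; all names here (`shape_modes`, `scales`, `rigids`, `labels`) are illustrative, not the paper's implementation:

```python
import numpy as np

def instantiate(mean_pts, shape_modes, theta_s, scales, rigids, labels):
    """Sketch of Eq. (4): shape deformation, then per-vertebra scaling,
    then per-vertebra rigid pose. mean_pts is (N, 3); shape_modes is
    (M, N, 3); labels[n] gives the vertebra index of point n."""
    # shape: mean plus a linear combination of shape eigenvectors
    pts = mean_pts + np.tensordot(theta_s, shape_modes, axes=1)
    out = np.empty_like(pts)
    for l in np.unique(labels):
        idx = labels == l
        cog = pts[idx].mean(axis=0)
        # scale about the vertebra's own centre of gravity ...
        scaled = cog + scales[l] * (pts[idx] - cog)
        # ... then apply the (modality-specific) rigid pose
        R, t = rigids[l]
        out[idx] = scaled @ R.T + t
    return out
```

With zero shape coefficients, unit scales and identity poses, the instance reduces to the mean model, which is a useful sanity check.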

Fig. 2

First modes of variation for a s \(+\) p, and b s \(+\) p \(+\) s models. a Pose involves the relative position of the vertebrae and their size. b Pose variations do not include changes in the vertebrae size. Shape variations, mainly seen at SPs, are identical

Joint registration of MR and US

The joint registration problem consists of optimizing for the s \(+\) p \(+\) s model parameters \(\theta ^{s}\), \(\theta ^{p}\), \(\theta ^{k}\), such that the net distance between the model points and the two sets of target points is minimized. This is done using an expectation–maximization (EM) approach and the objective function below.

$$\begin{aligned} \mathop {\text {argmin}}\limits _{\theta ^{s},\theta ^{k},\theta ^{p}_\mathrm{md}} Q= & {} \sum _{\mathrm{md}\in \mathrm{MD}}\sum _{l=1}^{L} \sum _{m,n=1}^{M_\mathrm{md},N_{l}} P({\mathbf {t}_n}^l|\mathbf {z}_{m,\mathrm{md}})||\mathbf {z}_{m,\mathrm{md}}\nonumber \\&-\,\Phi \left( \mathbf {t}_{n}^{l},\theta ^{s},\theta ^{k},\theta ^{p}_\mathrm{md}\right) ||^2\nonumber \\&+\, \text {Regularization terms} \end{aligned}$$
(5)

In the expectation step, soft correspondences are assigned between model points \(\mathbf {t}_{n}^{l}\) and target points \(\mathbf {z}_{m,\mathrm{md}}\). This yields a set of conditional probabilities \(P({\mathbf {t}_n}^l|\mathbf {z}_{m,\mathrm{md}})\). Here, md denotes the target imaging modality where \(\mathrm{md}\in \mathrm{MD}=\{\mathrm {MR},\,\mathrm {US}\}\) and \(\mathbf {t}_{n}^{l}\) is the n-th point on the model’s l-th object, i.e., vertebra (\(L=5\) lumbar vertebrae). In the maximization step, the correspondences are used to compute the distance between \(\mathbf {z}_{m,\mathrm{md}}\) and the model’s points transformed with \(\mathrm {\Phi }\). \(\mathrm {\Phi }(\mathbf {t}_{n}^{l},\theta ^{s},\theta ^{k},\theta ^{p}_\mathrm{md})\) describes the transformation of model points \(\mathbf {t}_{n}^{l}\) under the common shape \(\theta ^{s}\), common scale \(\theta ^{k}\) and modality-specific pose coefficients \(\theta ^{p}_\mathrm{md}\).
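The expectation step can be sketched as Gaussian mixture responsibilities with a uniform outlier term, in the spirit of coherent point drift; the variance \(\sigma^2\) and outlier weight \(w\) below are assumed parameters, not the paper's exact formulation:

```python
import numpy as np

def e_step(model_pts, target_pts, sigma2, w=0.1):
    """Soft correspondences P(t_n | z_m): each target point z_m distributes
    probability over all model points t_n, with a uniform outlier component
    weighted by w (a sketch, not the paper's exact formulation)."""
    N, M = model_pts.shape[0], target_pts.shape[0]
    # squared distances, shape (M, N): rows index targets, columns model points
    d2 = np.sum((target_pts[:, None, :] - model_pts[None, :, :]) ** 2, axis=2)
    g = np.exp(-d2 / (2.0 * sigma2))
    # constant outlier term; exponent 1.5 comes from the 3-D Gaussian normalizer
    c = (w / (1.0 - w)) * N * (2.0 * np.pi * sigma2) ** 1.5 / M
    return g / (g.sum(axis=1, keepdims=True) + c)   # responsibilities P[m, n]
```

Each row sums to less than one; the deficit is the probability mass assigned to the outlier class.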

Table 1 Description of the MR \(+\) US dataset used in this paper

Optimal model coefficients can be computed by differentiating Q in Eq. (5) with respect to \(\theta ^{s},\theta ^{p}_\mathrm{md},\theta ^{k}\). This yields:

$$\begin{aligned} \frac{\partial Q}{\partial \theta ^{s}}= & {} \sum _{\mathrm{md}\in \mathrm{MD} }\sum _{l=1}^{L} \sum _{m,n=1}^{M_\mathrm{md},N_{l}} P({\mathbf {t}_n}^l|\mathbf {z}_{m,\mathrm{md}})\Big [{\mathrm {\Phi } (\mathbf {t}_{n}^{l})}^\top \nonumber \\&-\,{\mathbf {z}_{m,\mathrm{md}}}^\top \; \! \frac{\partial \mathrm {\Phi }(\mathbf {t}_{n}^{l})}{\partial \theta ^{s}}\Big ]+R^{s}, \end{aligned}$$
(6)
$$\begin{aligned} \frac{\partial Q}{\partial \theta ^{p}_\mathrm{md}}= & {} \sum _{l=1}^{L} \sum _{m,n=1}^{M_\mathrm{md},N_{l}} P({\mathbf {t}_n}^l|\mathbf {z}_{m,\mathrm{md}})\Big [{\mathrm {\Phi } (\mathbf {t}_{n}^{l})}^\top \nonumber \\&-\,{\mathbf {z}_{m,\mathrm{md}}}^\top \; \frac{\partial \mathrm {\Phi }(\mathbf {t}_{n}^{l})}{\partial \theta ^{p}_\mathrm{md}}\Big ]+ R^{p}_\mathrm{md},\qquad \end{aligned}$$
(7)
$$\begin{aligned} \frac{\partial Q}{\partial \theta ^{k}}= & {} \sum _{\mathrm{md}\in \mathrm{MD}}\sum _{l=1}^{L} \sum _{m,n=1}^{M_\mathrm{md},N_{l}} P({\mathbf {t}_n}^l|\mathbf {z}_{m,\mathrm{md}})\Big [{\mathrm {\Phi }(\mathbf {t}_{n}^{l})}^\top \nonumber \\&-\,{\mathbf {z}_{m,\mathrm{md}}}^\top \; \frac{\partial \mathrm {\Phi }(\mathbf {t}_{n}^{l})}{\partial \theta ^{k}}\Big ]+R^{k}. \end{aligned}$$
(8)

The formulations for the partial derivatives \(\frac{\partial \mathrm {\Phi }(\mathbf {t}_{n}^{l})}{\partial \theta ^{s}}\) and \(\frac{\partial \mathrm {\Phi }(\mathbf {t}_{n}^{l})}{\partial \theta ^{p}}\) have been reported previously [20]. The partial derivative associated with the scale parameter \(\frac{\partial \mathrm {\Phi }(\mathbf {t}_{n}^{l})}{\partial \theta ^{k}}\) can be computed similarly to that of the pose.
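As a toy illustration of differentiating one distance term of Eq. (5) with respect to a scale coefficient (a one-parameter stand-in for \(\partial \Phi /\partial \theta ^{k}\), not the full model derivative), the analytic gradient can be verified against a finite difference:

```python
import numpy as np

def term(theta_k, x, z):
    """One toy data term of Eq. (5): squared distance after scaling x by theta_k."""
    return np.sum((z - theta_k * x) ** 2)

def dterm_dk(theta_k, x, z):
    """Analytic derivative of the term with respect to the scale coefficient:
    d/dk sum_i (z_i - k x_i)^2 = sum_i -2 x_i (z_i - k x_i)."""
    return np.sum(-2.0 * x * (z - theta_k * x))
```

The same finite-difference check applies, term by term, to the shape and pose derivatives of Eqs. (6)–(8).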

Data

The data used in this study consist of nine volumetric US data and corresponding multi-slice MR scans acquired at Hotel Dieu Hospital, Kingston ON, Canada. Institutional ethics approval was obtained and all subjects provided written consent to participate. A summary of this dataset is provided in Table 1.

The MR data consist of clinical images acquired from nine subjects for diagnosis and treatment purposes. T1-weighted images are used, as the outlines of the lumbar bones are more easily detectable in these images and automatic bone detection can be performed more accurately. Additionally, the dark appearance of the dural tube in T1-weighted images makes them good candidates for registration. The sagittal field-of-view (FOV) of the MR data is limited, as the main focus was on imaging the vertebral bodies, which are close to the median plane. Hence, transverse processes (TPs), which lie on more lateral planes, are not captured. The absence of TPs in the MR images makes the registration task even more challenging.

The US images were obtained from the same subjects only for research purposes. Subjects were placed in a sitting position, and a sagittal zig–zag scanning method was used [22]. The sitting position limited the axial range of the scans, as the transducer was not moved too far toward the sacrum for patient comfort. As a result, the volumetric US images have a limited axial FOV.

Extraction of target point clouds

The proposed framework involves registration of point clouds on the statistical model to those obtained from the bone surfaces of the two target images. In order to extract target point clouds, each modality is individually preprocessed. For the MR volumes, we correct for intensity bias in the images using a modified fuzzy C-means algorithm [2]. The parameters for this step are set according to Suzani et al. [29]. An anisotropic diffusion filter is then applied to remove speckle; this helps reduce the number of points that will be falsely detected as edges. Canny edge detection is then applied to the processed MR volumes. To further refine the extracted edges and remove false positive points, morphological erosion is performed. The final edge points make up the target point cloud for the bones in the MR image. An example of a sagittal MR slice and the bone point cloud extracted from it are shown in Fig. 3a.

The US volume is preprocessed by applying a phase-based bone enhancement technique [13] on individual native 2D slices. This results in a stack of bone probability maps from which a reconstructed 3D probability map can be obtained. The intensity value of each voxel on this map indicates its probability of representing bone surface. Finally, a threshold is applied to the probability map; voxels above this threshold, along with their corresponding probability values, make up the US point cloud used in the registration. The parameters of this processing step were kept the same for all patient data. Figure 3b illustrates a US image and the result of the bone detection.
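The MR point-cloud extraction can be sketched with stock scipy operations. This is a simplified stand-in for the paper's pipeline: Gaussian smoothing replaces anisotropic diffusion, a gradient-magnitude threshold replaces Canny, and the heuristic threshold is an assumption:

```python
import numpy as np
from scipy import ndimage

def extract_edge_points(slice2d, smooth_sigma=1.5, grad_thresh=None, erode_iter=1):
    """Simplified stand-in for the paper's MR pipeline: smooth, detect strong
    gradients, optionally erode the edge mask to prune false positives, and
    return the surviving pixel coordinates as a point cloud."""
    sm = ndimage.gaussian_filter(np.asarray(slice2d, dtype=float), smooth_sigma)
    mag = np.hypot(ndimage.sobel(sm, axis=0), ndimage.sobel(sm, axis=1))
    if grad_thresh is None:
        grad_thresh = mag.mean() + 2.0 * mag.std()   # heuristic threshold
    mask = mag > grad_thresh
    if erode_iter:
        mask = ndimage.binary_erosion(mask, iterations=erode_iter)
    return np.argwhere(mask)   # (row, col) coordinates of candidate bone edges
```

Applied slice by slice, the resulting 2-D coordinates (plus the slice index) form the 3-D target point cloud.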

Fig. 3

Example snapshots of MR and US image and the extracted bone point clouds. a An original MR slice (left) and the bone point cloud obtained by a Canny edge detector (right). b Depicts a slice of a 3D panoramic US (left) and the corresponding bone-enhanced image (right)

Model initialization

Prior to registration, an initialization step is performed. The geometrical initialization involves manual selection of the centers of gravity (COGs) of L3 in the US and MR images. This single-click step is feasible and beneficial as it can be done in just seconds by the clinician and helps prevent convergence to a solution far off from the optimal one [26]. Once the COGs of the L3 vertebrae have been located, two instances of the model are created and roughly aligned on the target in the MR and US, respectively. This step is followed by rigid registration for further correction. US echoes cast from the sacrum or the US shadows caused at the ribs attached to T12 are used as locational cues to determine the vertebral levels, and hence, the position of L3 in the US images. The initial shape and scale coefficients of the model are set to the mean shape and scale of the s \(+\) p \(+\) s model, respectively. The pose coefficients, on the other hand, are initialized based on the modality. The pose of the MR model instance starts off with coefficient mean values. This is because MR images have been acquired in the supine position, similar to the CT data used for constructing the statistical model. For the US model instance, however, we use a set of initial pose coefficients that creates a less curved instance of the lumbar vertebrae, which better resembles the lumbar spine in the sitting position.
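The single-click step amounts to translating the whole model so that its L3 centre of gravity coincides with the clicked point. A minimal sketch, where the label convention (L1..L5 as 0..4) is hypothetical:

```python
import numpy as np

def initialize_by_l3(model_pts, labels, clicked_cog, l3_label=2):
    """Translate the whole model so the COG of its L3 vertebra lands on the
    manually clicked L3 centre. labels[n] is the vertebra index of point n
    (L1..L5 assumed labelled 0..4, so L3 is 2)."""
    cog = model_pts[labels == l3_label].mean(axis=0)
    return model_pts + (np.asarray(clicked_cog) - cog)
```

Because the shift is applied to every point, the relative geometry of the model is unchanged; only its position is corrected before rigid refinement.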

Fig. 4

Example snapshots of the registration results using the s \(+\) p \(+\) s model on MR (a) and US (b) image spaces. The joint registration method (blue) performs better than the model-to-US technique (red). The yellow annotations on US depict the gold standard segmentation obtained by the sonographer

Fig. 5

Box plots showing the break-down of TRE values (mm) obtained from different registration methods (model-to-US, joint and model-to-MR) using the two models (s \(+\) p and s \(+\) p \(+\) s) at different a regions, and b vertebrae levels. Significant improvements were observed at AP, SP, TP regions, as well as at vertebrae L1 and L3 using the s \(+\) p \(+\) s model in the joint registration approach

Experiments and validation

Two series of experiments are carried out to study the effectiveness of (i) the joint framework and (ii) the s \(+\) p \(+\) s model. In the first experiment, a comparison is made between the registration results of the proposed joint method, a model-to-MR and a model-to-US approach. Our proposed method computes the shape and scale coefficients jointly from US and MR. In contrast, the shape and scale coefficients are calculated only from MR in the model-to-MR method, and only from US in the model-to-US one. All three methods assume modality-specific pose coefficients. In the second experiment, for each registration method in (i), the performance of the s \(+\) p \(+\) s model is compared against that of the s \(+\) p model.

In both experiments, the registration accuracy is evaluated for both MR and US as target registration error (TRE). TRE is calculated between the gold standard landmarks and their corresponding points on the registered model surfaces. The MR ground truth consists of landmarks selected on SPs, articular processes (APs), LAs and VBs. The landmark selection was performed manually on sagittal MR slices using MITK segmentation tools. US gold standard landmarks were selected by an expert sonographer on the SPs and LAs on sagittal slices, as well as TPs and APs on sagittal and axial slices. In the absence of a better alternative, the ground truth-to-target correspondences are assigned using a nearest-neighbor approach.
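Nearest-neighbor TRE can be computed with a KD-tree; a minimal sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def tre_nearest(gold_landmarks, model_surface_pts):
    """TRE under nearest-neighbour correspondences: the distance (in mm, if
    the inputs are in mm) from each gold standard landmark to the closest
    point on the registered model surface."""
    dists, _ = cKDTree(model_surface_pts).query(gold_landmarks)
    return dists   # one error value per landmark
```

Note that this underestimates the true point-to-point error whenever the nearest surface point is not the anatomically corresponding one, which is the limitation discussed later in the paper.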

Results

Figure 4 shows examples of the results of the joint and the model-to-US registration methods in the space of the two modalities. Figure 5 depicts the distribution of TRE values obtained using the model-to-MR, joint and model-to-US methods. A break-down of the TRE results is given for both models, i.e., the s \(+\) p and s \(+\) p \(+\) s, at different vertebral regions (Fig. 5a), as well as at the different levels (Fig. 5b). Statistical significance was assessed using the Mann-Whitney U-test on the error distributions. The statistical analysis revealed that compared to model-to-US registration using the s \(+\) p model (light pink), the s \(+\) p \(+\) s model combined with the joint method (dark blue) can lead to a significant improvement in TRE values. Specifically, significant TRE improvements were achieved at AP (\(p=0.038\), mean improvement of 1.5 mm) and SP regions (\(p<0.01\), mean improvement of 3.9 mm). Also, TREs were significantly reduced at L1 (\(p=0.029\), improvement of 2.9 mm in the standard deviation) and L3 (\(p=0.044\), improvement of 1.1 mm in the standard deviation) vertebrae. On average, preprocessing of images for point extraction required 6.5 min with unoptimized MATLAB (Mathworks, Natick, MA) code, while a run-time of 1.6 min was recorded for registration.
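The significance testing described above can be reproduced with scipy's Mann-Whitney U implementation; the arrays in the usage below are illustrative, not the paper's data:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def compare_tre(tre_a, tre_b, alpha=0.05):
    """Two-sided Mann-Whitney U test between two TRE distributions;
    returns the p-value and whether the difference is significant."""
    _, p = mannwhitneyu(tre_a, tre_b, alternative='two-sided')
    return p, p < alpha
```

The Mann-Whitney test is appropriate here because TRE distributions are typically skewed, so a rank-based test avoids the normality assumption of a t-test.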

Discussion and conclusion

A novel s \(+\) p \(+\) s model of the lumbar spine was presented and used for joint registration of preoperative MR and intraoperative US images to augment US in guiding anesthesia. MR is the clinically preferred imaging modality for the spine. MR-US registration is, however, challenging due to low resolution of the US images, large MR slice thickness and limited FOVs in both modalities. Statistical analysis was carried out to investigate the effect of separating scale from pose in statistical models, i.e., to compare s \(+\) p and s \(+\) p \(+\) s models. Similarly, statistical tests were used to study the advantages of the joint registration method over model-to-MR and model-to-US.

A consistent and statistically significant improvement was seen using the joint registration method. It was also shown that the s \(+\) p \(+\) s model combined with the joint registration framework outperformed all other combinations. Specifically, in APs, SPs and TPs, TRE decreased significantly using joint registration. This suggests that the proposed joint framework is preferable to the US-only approach for facet joint injections, where APs are the most critical regions. Errors at critical regions for epidurals, i.e., LAs, show much smaller changes across the different registration methods and models. Nevertheless, using the s \(+\) p \(+\) s, the joint method still achieves lower TREs compared to the others. The required clinical accuracy for facet joint injections and epidurals is estimated at 5 mm (at APs) and 3 mm (at LA), respectively [26]. Hence, the accuracy of the presented framework is sufficient for facet joint injections. These improvements achieved using the proposed technique are clinically important, as they can increase the confidence of both clinician and patient in the success of the procedure, and reduce the variability of the outcome between experienced and novice clinicians. Nevertheless, further improvement of maximum errors is needed for epidural procedures.

The TRE values obtained for this dataset are larger than those we reported for CT to US registration (CT \(+\) US) [3]. This can be due to several factors. Firstly, accurate and complete bone surface extraction is challenging in MR images. Compared to the CT data, the preprocedure MR images have thicker slices (3.8–4.4 mm). Hence, the best expected accuracy for surface extraction in the sagittal direction can be as high as 2.2 mm, i.e., half the slice thickness. This is already within the reported range of error in the CT \(+\) US dataset. Furthermore, due to the visibility of different tissues in MR images, facet joints and SPs are not as clearly visible as in CT. This yields less accurate bone edge detection in MR. Also, the fields of view of both MR and US in this dataset are limited, with the clinical MR images not including the TPs and the US images not capturing the full lumbar spine.

Another important source of high TRE, especially in the US space, is the pose of the subjects at image acquisition time. US images in this dataset were obtained in the sitting position, compared to supine acquisition in MR and CT. We used a less curved pose to initialize the model for registration to the US images. Nevertheless, the observed errors at L4 and L5 (the vertebrae affected most in the sitting position) were high. Since our current models are based on CT images only, it is possible that they do not contain the pose deformation involved in the sitting position at all. In addition, the initial poses were manually selected as a straight pose from the possible variations available for each of the s \(+\) p and s \(+\) p \(+\) s models. In order to obtain a more accurate initial pose, a US training dataset can be used to determine optimal pose parameters. These parameters can then be applied as prior knowledge to initialize registration, resulting in higher accuracy and faster convergence. An important observation made from the data is that the variations of the pose of the sacrum are often quite noticeable. Hence, we believe that including the sacrum in the statistical model can also potentially help achieve a better pose convergence.

Finally, assigning correspondences between the registered model and the gold standard is error prone. Currently, we use a nearest-neighbor approach since manual correspondence assignment is not feasible. In [3], each vertebra of the model was rigidly registered to the corresponding vertebra on the gold standard. We are not able to perform this for our current dataset, as full segmentation of the vertebrae is challenging. The possibility of assigning incorrect correspondences between the model and the gold standard is therefore higher.

Different parts of the vertebrae have different visibility in US and MR images. In MR images, the outlines of VBs and LAs are clear, making their edges reliable points for registration. This is unlike SPs, which are harder to detect in thick MR slices. High-frequency details of tissue surrounding the SPs and APs can lead to falsely detected edge points in MR images. Finally, TPs are not visible at all due to the limited sagittal FOV. US images provide strong bone responses at LAs, TPs and SPs, relatively weaker responses at APs, and almost no useful information from VBs. To account for this variable visibility across different regions in each modality, we investigated the effectiveness of a weighted joint registration technique. In this method, predefined weights are assigned to each instance of the model depending on the visibility of the different regions in the given modality. Each point belonging to a given region (SP, LA, AP, TP and VB) is assigned the weight specific to that region. Intuitively, based on the assumptions about the presence and correctness of information at each region, the weights determine which regions of each modality the objective function should favor. Nonetheless, this weighted joint method did not lead to any statistically significant improvements in TREs.
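The weighting scheme can be sketched as a per-region multiplier on each squared residual in the data term; the weight values below are purely illustrative, not the ones evaluated in the paper:

```python
import numpy as np

# Hypothetical visibility weights per region and modality (illustrative only)
WEIGHTS = {
    'US': {'SP': 1.0, 'LA': 1.0, 'TP': 1.0, 'AP': 0.5, 'VB': 0.0},
    'MR': {'SP': 0.5, 'LA': 1.0, 'TP': 0.0, 'AP': 0.5, 'VB': 1.0},
}

def weighted_residual(z, phi_t, p_corr, regions, modality):
    """One weighted data term: the squared residual of each matched point is
    multiplied by the weight of its anatomical region in the given modality,
    so regions invisible in that modality contribute nothing."""
    w = np.array([WEIGHTS[modality][r] for r in regions])
    return float(np.sum(p_corr * w * np.sum((z - phi_t) ** 2, axis=1)))
```

Here `z` and `phi_t` are matched target and transformed model points, `p_corr` holds the correspondence probabilities, and `regions` labels each point with its anatomical region.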

An important limitation of the current statistical model is its relatively small training set. Hence, it may fail to register to the spine of subjects with shapes or poses unlike those of the training set. Our modeling approach assumes that the relative variations of vertebrae are captured in the training data. As a result, should presence of pathology affect a single vertebra locally, the model may not be able to capture such variations effectively. This problem can be addressed by increasing the number of subjects in the training set. Another possibility is to employ a two-step registration technique where the multi-vertebrae model is initially registered to the data as shown here. The registered model is then used as initialization for registration of smaller models locally to one or more vertebrae of interest. Nevertheless, the advantage of the proposed framework is that, even for difficult-to-register cases, the clinicians will still be able to look at the patient-specific model in the preoperative space and determine the adequacy of registration.

In conclusion, we demonstrated that using complementary information from statistical models, preoperative MR and intraoperative US can contribute to augmentation of spinal US. We also showed that separating scale from pose can lead to improved registration results. Future work involves increasing the number of training and test cases, investigating two-step (global and local) registration, and a faster implementation. The proposed method is generic and can be extended to other orthopedic applications such as the wrist and pelvis.