
1 Introduction

Accurate and robust segmentation of spinal and vertebral structures from medical images is a challenging task due to the relatively high degree of anatomical complexity (i.e. each vertebra consists of the vertebral body, pedicles, laminae and spinous process) and due to the articulation of vertebrae with each other. The task is further hampered by insufficient image spatial resolution, intensity inhomogeneity and low signal-to-noise ratio. Since skeletal structures have high contrast when observed in computed tomography (CT) images, CT is commonly the modality of choice for assessing three-dimensional (3D) skeletal structures, such as the spine and vertebrae.

In recent years, several automated and semi-automated methods focusing on the vertebra segmentation problem have been developed for CT images. Kim and Kim [8] proposed a fully automated method that constructs 3D fences to separate vertebrae from valley-emphasized Gaussian images, and then the region growing algorithm is applied within 3D fences to obtain the final segmentation. Klinder et al. [9] progressively adapted tube-shaped segments to extract the spine curve, performed vertebra detection on curved-planar reformatted images using the generalized Hough transform, identified vertebrae by rigid registration of appearance models to the detected candidates, and obtained the final segmentation by adapting shape-constrained models of the individual vertebrae. Kadoury et al. [6, 7] built an articulated shape manifold from a training database by embedding the data into a low-dimensional sub-space, and applied the Markov random field optimization to infer between the unseen target shape and shape manifold. Lim et al. [10] incorporated local geometrical features using the Willmore flow and prior shape knowledge by kernel density estimation into a level set segmentation framework. Ma and Lu [12] introduced a learning-based bone structure edge detection algorithm and hierarchical coarse-to-fine deformable surface-based segmentation that relied on response maps of a trained edge detector. Rasoulian et al. [14] developed a statistical multi-vertebrae model of shape and pose, and proposed a novel iterative expectation maximization registration technique to align the model to CT images of the spine. Ibragimov et al. [5] presented a segmentation framework, in which a novel landmark-based shape representation of vertebrae was combined with a landmark detection framework based on game theory.

In this paper, we describe an improved framework for vertebra segmentation that is based on the shape-constrained deformable model [9, 15]. Our framework is initialized by the results of a novel vertebra detection and alignment algorithm [4], and the segmentation of each vertebra is then obtained by a mesh deformation technique that moves mesh vertices to their optimal locations while preserving the underlying vertebral shape. The performance of the proposed segmentation framework was evaluated on vertebrae from CT images of the lumbar spine, and the obtained results with the mean error below \(0.5\) mm indicate that accurate segmentation of vertebrae was achieved.

2 Methodology

2.1 Mean Shape Model of the Lumbar Spine

Let set \(\fancyscript{T}\) contain 3D images of the lumbar spine, where each image is assigned a series of binary masks representing reference segmentations of each individual lumbar vertebra from level L1 to L5. To extract a shape model of each vertebra from each image in \(\fancyscript{T}\), the marching cubes algorithm [11] is applied to each corresponding binary mask, resulting in a 3D face-vertex mesh consisting of vertices with triangle connectivity information. The dependency of the number of vertices in each mesh on the size of the image voxel and of the observed vertebra is removed by isotropic remeshing [1]. In order to establish pointwise correspondences among vertices of the same vertebral level, the nonrigid transformation among sets of vertices is recovered using the state-of-the-art coherent point drift algorithm [13], which has been shown to outperform other methods for point set registration. Finally, the generalized Procrustes alignment [3] is used to remove translation, rotation and scaling from corresponding meshes, yielding the mean shape model of each vertebra, represented by a 3D face-vertex mesh \(\fancyscript{M}=\big \{\fancyscript{V},\fancyscript{F}\big \}\) of \(\left| \fancyscript{V}\right| \) vertices and \(\left| \fancyscript{F}\right| \) faces (i.e. triangles). The mean shape model of the whole lumbar spine, i.e. a chain of mean shape models of individual vertebrae, is further used for spine detection, while the mean shape models of individual vertebrae are used for vertebra detection and segmentation in an unknown 3D image \(I\).
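The final step of this pipeline, the generalized Procrustes alignment, can be illustrated with a short sketch. The following Python code is not the authors' implementation (which was in Matlab); function names and the fixed number of iterations are our illustrative assumptions. It aligns pointwise-corresponding vertex arrays of shape \((\left|\fancyscript{V}\right|, 3)\) to their evolving average and returns the mean shape:

```python
import numpy as np

def procrustes_align(X, Y):
    """Similarity transform (s, R, t) that maps Y onto X in the
    least-squares sense (Umeyama's closed-form solution)."""
    muX, muY = X.mean(0), Y.mean(0)
    X0, Y0 = X - muX, Y - muY
    U, S, Vt = np.linalg.svd(X0.T @ Y0)
    d = np.ones(X.shape[1])
    if np.linalg.det(U @ Vt) < 0:       # avoid reflections
        d[-1] = -1.0
    R = (U * d) @ Vt
    s = (S * d).sum() / (Y0 ** 2).sum()
    t = muX - s * R @ muY
    return s, R, t

def generalized_procrustes(shapes, n_iter=10):
    """Iteratively align corresponding vertex sets to their average
    and return the resulting mean shape (centered, unit norm)."""
    mean = shapes[0] - shapes[0].mean(0)
    mean = mean / np.linalg.norm(mean)
    for _ in range(n_iter):
        aligned = []
        for Y in shapes:
            s, R, t = procrustes_align(mean, Y)
            aligned.append(s * Y @ R.T + t)
        mean = np.mean(aligned, axis=0)
        mean = mean - mean.mean(0)
        mean = mean / np.linalg.norm(mean)  # fix the scale of the mean shape
    return mean
```

Normalizing the mean shape at each iteration fixes the free scale of the similarity alignment, a standard convention for generalized Procrustes analysis.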

2.2 Vertebra Detection

The detection of vertebrae in an unknown 3D image \(I\) was performed by a novel optimization scheme that is based on interpolation theory [4]. The optimization scheme consists of three steps: spine detection, vertebra detection and vertebra alignment. To detect the spine in image \(I\), the pose of mesh \(\fancyscript{M}\) representing the mean shape model of the lumbar spine (i.e. a chain of meshes representing individual vertebrae from L1 to L5) is optimized against three translations (i.e. coordinates \(x\), \(y\) and \(z\) representing sagittal, coronal and axial anatomical directions, respectively). The global maximum of the resulting interpolation represents the location of the spine in the 3D image, and is further used to initialize the detection of individual vertebrae. To detect individual vertebrae, the pose of mesh \(\fancyscript{M}\), now representing the mean shape model of the observed lumbar vertebra, is optimized against three translations; however, in this case all local maxima of the resulting interpolation are extracted, corresponding to the locations of the observed and neighboring vertebrae. The correct location of each vertebra is determined by the optimal path that passes through a set of locations, where each location belongs to a local maximum at a different vertebral level. Finally, a more accurate alignment of the mean shape model of each observed vertebra is performed by optimizing the pose of each model against three translations, one scaling (i.e. factor \(s\)) and three rotations (i.e. angles \(\varphi _x\), \(\varphi _y\) and \(\varphi _z\) about coordinate axes \(x\), \(y\) and \(z\), respectively). The resulting alignment represents the final vertebra detection result. A detailed description of the interpolation-based optimization scheme can be found in [4].
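The selection of one local maximum per vertebral level can be cast as a shortest-path problem. The sketch below is a hypothetical dynamic-programming formulation, not taken from [4]: the cost penalizing implausible inter-vertebral spacing (parameters `spacing` and `w`) is our illustrative assumption, and interpolation scores of the maxima could be added to the cost in the same way:

```python
import numpy as np

def optimal_vertebra_path(candidates, spacing=35.0, w=1.0):
    """Pick one candidate location per vertebral level (L1..L5) so that
    consecutive picks are roughly one inter-vertebral spacing apart.
    candidates: list of (n_i, 3) arrays of local-maxima locations (in mm).
    Returns the index of the chosen candidate at each level."""
    n_levels = len(candidates)
    cost = [np.zeros(len(c)) for c in candidates]
    back = [np.zeros(len(c), dtype=int) for c in candidates]
    for lvl in range(1, n_levels):
        for j, p in enumerate(candidates[lvl]):
            d = np.linalg.norm(candidates[lvl - 1] - p, axis=1)
            step = w * (d - spacing) ** 2    # penalize implausible spacing
            total = cost[lvl - 1] + step
            back[lvl][j] = int(np.argmin(total))
            cost[lvl][j] = total.min()
    path = [int(np.argmin(cost[-1]))]
    for lvl in range(n_levels - 1, 0, -1):
        path.append(int(back[lvl][path[-1]]))
    return path[::-1]
```

Because each level only looks back at the previous one, the optimal path through all levels is found in time linear in the number of levels and quadratic in the candidates per level.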

2.3 Vertebra Segmentation

After the interpolation-based alignment [4] of the mean shape model of each lumbar vertebra to the unknown image \(I\), segmentation of each lumbar vertebra is performed by a mesh deformation technique that moves mesh vertices to their optimal locations while preserving the underlying vertebral shape [9, 15]. In this iterative procedure, the following two steps are executed in each iteration: image object detection for mesh face centroids, represented by the centers of mass of mesh faces \(\fancyscript{F} \in \fancyscript{M}\), followed by reconfiguration of mesh vertices \(\fancyscript{V} \in \fancyscript{M}\). By combining the robust initialization resulting from vertebra detection (Sect. 2.2) with modifications to the mesh deformation technique, we improve the accuracy of the resulting vertebra segmentation.

2.3.1 Object Detection

By displacing each mesh face centroid \(\mathbf {c}_i\); \(i=1,2,\ldots ,\left| \fancyscript{F}\right| \) along its corresponding mesh face normal \(\mathbf {n}(\mathbf {c}_i)\), a new candidate mesh face centroid \(\mathbf {c}_i^{*}\) is found in each \(k\)th iteration:

$$\begin{aligned} \mathbf {c}_i^{*} = \mathbf {c}_i + \delta \, j_i^{*} \, \mathbf {n}(\mathbf {c}_i), \end{aligned}$$
(1)

where \(\delta \) is the length of the unit displacement, and \(j_i^{*}\) is an element from set \(\fancyscript{J}\); \(j_i^{*} \in \fancyscript{J}\). Set \(\fancyscript{J}\) represents the search profile along \(\mathbf {n}(\mathbf {c}_i)\), called the sampling parcel (Fig. 1):

$$\begin{aligned} \fancyscript{J} = \Big \{ -j, -j+1, \ldots , j-1, j\Big \}; \quad j = J - k + 1, \end{aligned}$$
(2)

which is of size \(2J+1\) at initial iteration \(k=1\) and \(2(J-K+1)+1\) at final iteration \(k=K\). The element \(j_i^{*}\) that defines the location of \(\mathbf {c}_i^{*}\) is determined by detecting vertebra boundaries:

$$\begin{aligned} j_i^{*} = \underset{j \in \fancyscript{J}}{\arg \max } \Big \{F\big (\mathbf {c}_i, \mathbf {c}_i + \delta \, j \, \mathbf {n}(\mathbf {c}_i)\big ) - D \, \delta ^2 \, j^2 \Big \}, \end{aligned}$$
(3)

where \(\mathbf {c}_i'=\mathbf {c}_i+\delta \,j\,\mathbf {n}(\mathbf {c}_i)\) is the candidate location for \(\mathbf {c}_i^{*}\) (Eq. 1), and parameter \(D\) controls the tradeoff between the response of the boundary detection operator \(F\) (Eq. 4) and the distance from \(\mathbf {c}_i\) to \(\mathbf {c}_i'\). In comparison to the original approach [9, 15], we propose an improved boundary detection operator \(F\) that is based on image intensity gradients, weighted by an image appearance operator:
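Equations (1)–(3) amount to a one-dimensional search along each face normal. A minimal sketch, assuming the boundary-detection operator \(F\) is supplied as a callable and using the parameter names and default values of the text:

```python
import numpy as np

def detect_candidate(c, n, F, k, J=25, delta=0.3, D=0.6):
    """One object-detection step (Eqs. 1-3): scan the sampling parcel along
    the face normal n at centroid c and return the best displaced centroid.
    F(c, c_prime) is the boundary-detection response (Eq. 4)."""
    # parcel of size 2(J - k + 1) + 1 shrinks with the iteration index k
    j_range = np.arange(-(J - k + 1), J - k + 2)
    best_j, best_val = 0, -np.inf
    for j in j_range:
        c_prime = c + delta * j * n
        val = F(c, c_prime) - D * delta ** 2 * j ** 2  # response minus distance penalty
        if val > best_val:
            best_j, best_val = j, val
    return c + delta * best_j * n, best_j
```

For instance, a synthetic \(F\) peaking at a boundary \(1.5\) mm along the normal makes the search select \(j_i^{*}=5\) (since \(\delta = 0.3\) mm), illustrating how the distance penalty \(D\,\delta ^2 j^2\) discourages, but does not forbid, distant candidates.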

$$\begin{aligned} F(\mathbf {c}_i, \mathbf {c}_i') = \frac{g_{\text {max}}\left( g_{\text {max}} + \left||\mathbf {g}_W(\mathbf {c}_i')\right||\right) }{g_{\text {max}}^2 + \left||\mathbf {g}_W(\mathbf {c}_i')\right||^2}\left\langle \mathbf {n}(\mathbf {c}_i),\mathbf {g}_W(\mathbf {c}_i')\right\rangle , \end{aligned}$$
(4)

where \(\left||\cdot \right||\) denotes the vector norm, \(\left\langle \cdot ,\cdot \right\rangle \) denotes the dot product, \(g_{\text {max}}\) is the estimated mean amplitude of intensity gradients at vertebra boundaries, which is used to suppress excessively strong weighted gradients that may occur when the gradient magnitude at the boundary of a neighboring object (e.g. pedicle screws) is considerably larger than at the boundary of the object of interest, and \(\mathbf {g}_W\) is the image appearance operator at candidate mesh centroid location \(\mathbf {c}_i'\):

$$\begin{aligned} \mathbf {g}_W(\mathbf {c}_i') = \big (1 + C(\mathbf {c}_i')\big ) \, \mathbf {g}(\mathbf {c}_i'), \end{aligned}$$
(5)

where \(\mathbf {g}(\mathbf {c}_i')\) is the intensity gradient at \(\mathbf {c}_i'\) and \(C(\mathbf {c}_i') \in [0,1]\) is the continuous response to the Canny edge operator [2]. By adding additional weights to the image intensity gradients, vertebra boundary points are more likely to be detected. In contrast to the original technique [9, 15], the size of the sampling parcel \(J\) (Eq. 2) is reduced in each iteration \(k\) and the image intensity gradients \(\mathbf {g}\) (Eq. 5) are additionally weighted, both to improve the accuracy of the resulting segmentation.
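The boundary-detection operator of Eqs. (4) and (5) can be sketched directly. In the sketch below, the Canny response \(C(\mathbf {c}_i')\) is assumed to be precomputed and passed in as a scalar in \([0,1]\); function and argument names are illustrative:

```python
import numpy as np

def boundary_response(n, g, canny_response, g_max=100.0):
    """Boundary-detection operator F (Eqs. 4-5): project the Canny-weighted
    intensity gradient onto the face normal, damped when the gradient
    magnitude deviates from the expected boundary amplitude g_max.
    n: unit face normal; g: intensity gradient at the candidate location;
    canny_response: continuous Canny edge response C(c') in [0, 1]."""
    g_w = (1.0 + canny_response) * np.asarray(g, dtype=float)  # Eq. 5
    mag = np.linalg.norm(g_w)
    damp = g_max * (g_max + mag) / (g_max ** 2 + mag ** 2)
    return damp * float(np.dot(n, g_w))                        # Eq. 4
```

Note the asymptotic behavior of the damping factor: when \(\left||\mathbf {g}_W\right|| = g_{\text {max}}\) the response equals the projected gradient itself, while for much stronger gradients (e.g. metal implants) the response saturates near \(g_{\text {max}}\), which is exactly the suppression described above.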

Fig. 1

Each \(i\)th face centroid \(\mathbf {c}_i\) of the 3D face-vertex mesh of the observed vertebra is displaced by \(\delta \,j_i\) along the sampling parcel in the direction of its face normal \(\mathbf {n}(\mathbf {c}_i)\). In an iterative framework, where the length of the sampling parcel is gradually reduced, each centroid moves to the location \(\mathbf {c}_i^{*}\) that best corresponds to vertebra boundaries (dashed curve)

2.3.2 Mesh Reconfiguration

Once the new candidate mesh face centroids \(\mathbf {c}_i^{*}\) are detected, mesh \(\fancyscript{M}=\{\fancyscript{V},\fancyscript{F}\}\) is reconfigured in each \(k\)th iteration by minimizing the weighted sum \(E\) of energy terms:

$$\begin{aligned} \underset{\fancyscript{M}}{\min }\big \{E\big \} = \underset{\fancyscript{M}}{\min }\big \{E_{\text {ext}} + \alpha E_{\text {int}}\big \}, \end{aligned}$$
(6)

where \(\alpha \) is the weighting parameter. The external energy \(E_{\text {ext}}\) attracts mesh \(\fancyscript{M}\) to new face centroids \(\mathbf {c}_i^*\), \(i=1,2,\ldots ,\left| \fancyscript{F}\right| \) (Eq. 1), that are located on vertebra boundaries:

$$\begin{aligned} E_{\text {ext}} = \sum _{i=1}^{\left| \fancyscript{F}\right| }w_i^*\left\langle \mathbf {c}_i^{*} - \mathbf {c}_i,\frac{\mathbf {g}_W(\mathbf {c}_i^{*})}{\left||\mathbf {g}_W(\mathbf {c}_i^{*})\right||}\right\rangle ^2, \end{aligned}$$
(7)

where \(\left| \fancyscript{F}\right| \) is the number of mesh faces, \(\mathbf {g}_W\) is the image appearance operator (Eq. 5), and \(w_i^*\); \(i=1,2,\ldots ,\left| \fancyscript{F}\right| \), are weights that are defined according to the obtained \(j_i^*\) (Eq. 3) to give a greater influence to more promising centroid locations:

$$\begin{aligned} w_i^*= \max \Big \{0, F(\mathbf {c}_i, \mathbf {c}_i^{*}) - D \, \delta ^2 \, {j_i^{*}}^2 \Big \}. \end{aligned}$$
(8)

The internal energy \(E_{\text {int}}\) restricts the flexibility of mesh \(\fancyscript{M}\) by penalizing the deviation between deformation vertices \(\fancyscript{V}\) and mean vertices \(\fancyscript{V}^m\):

$$\begin{aligned} E_{\text {int}} = \sum _{i=1}^{\left| \fancyscript{V}\right| }\sum _{j\in \fancyscript{N}_i}\left||\Big (\mathbf {v}_i-\mathbf {v}_j\Big )-\Big (sR\left( \mathbf {v}_i^m-\mathbf {v}_j^m\right) +\mathbf {t}\Big )\right||^2, \end{aligned}$$
(9)

where \(\mathbf {v}_i\) and \(\mathbf {v}_i^m\) are vertices from sets \(\fancyscript{V}\) and \(\fancyscript{V}^m\), respectively, \(\fancyscript{M}^m=\{\fancyscript{V}^m,\fancyscript{F}^m\}\) represents the mean shape model of the observed lumbar vertebra (Sect. 2.1), and \(\fancyscript{N}_i\) is the set of vertices neighboring to \(\mathbf {v}_i\) (or \(\mathbf {v}_i^m\), since the topology is preserved). The scaling factor \(s\), rotation matrix \(R\) and translation vector \(\mathbf {t}\) that align mesh vertices \(\mathbf {v}_i\) to the mean vertices \(\mathbf {v}_i^m\) are determined prior to calculation of Eq. (9) by using Procrustes superimposition [3].
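The mesh-reconfiguration step can be sketched as a single energy minimization over the vertex coordinates. The sketch below is a simplified, dense re-implementation of Eqs. (6)–(9), not the authors' Matlab implementation (which likely exploits sparsity and analytic gradients); the conjugate gradient method is used, as stated in Sect. 3.1:

```python
import numpy as np
from scipy.optimize import minimize

def reconfigure_mesh(V0, faces, targets, weights, normals_g, Vm, neighbors,
                     s, R, t, alpha=33.0):
    """One mesh-reconfiguration step (Eq. 6): move vertices V so that face
    centroids approach the detected targets c_i* along the normalized
    weighted-gradient directions (Eq. 7), while edge vectors stay close to
    the similarity-transformed mean-shape edges (Eq. 9)."""
    shape = V0.shape

    def energy(x):
        V = x.reshape(shape)
        C = V[faces].mean(axis=1)                 # face centroids
        # external energy (Eq. 7): weighted squared projections
        proj = np.einsum('ij,ij->i', targets - C, normals_g)
        E_ext = np.sum(weights * proj ** 2)
        # internal energy (Eq. 9): deviation from mean-shape edge vectors
        E_int = 0.0
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                e = (V[i] - V[j]) - (s * R @ (Vm[i] - Vm[j]) + t)
                E_int += e @ e
        return E_ext + alpha * E_int              # Eq. 6

    res = minimize(energy, V0.ravel(), method='CG')
    return res.x.reshape(shape)
```

Because the internal energy compares edge vectors rather than absolute positions, a rigid translation of the whole mesh is free, so the external term alone pulls the mesh onto the detected boundary points.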

3 Experiments and Results

The performance of the described vertebra segmentation framework was tested on a database of \(10\) CT images of the lumbar spine (i.e. a total of \(50\) lumbar vertebrae, with axial in-plane pixel size of 0.3–0.8 mm and cross-sectional thickness of 0.7–1.5 mm) by applying a leave-one-out evaluation scheme. A reference segmentation binary mask was available for each vertebra in the database. The framework was implemented in Matlab, and executed on a personal computer with Intel Core i5 processor at \(3.2\) GHz and \(16\) GB of memory without a graphics processing unit.

3.1 Experimental Details

The mean shape model of the lumbar spine (Sect. 2.1) was obtained by applying the marching cubes algorithm [11] to binary masks representing reference segmentations of each vertebra, resulting in 3D face-vertex meshes \(\fancyscript{M}=\big \{\fancyscript{V},\fancyscript{F}\big \}\) of genus \(1\) (i.e. the number of holes is \(1\), as expected for lumbar vertebrae). The corresponding number of vertices \(\left| \fancyscript{V}\right| =\) 31,542–161,790 (the number of faces was \(\left| \fancyscript{F}\right| = 2\left| \fancyscript{V}\right| \)) was further reduced to \(\left| \fancyscript{V}\right| =\) 3,228–5,642 by isotropic remeshing with a mean edge length of \(2.25\) mm [1]. After establishing correspondences among meshes of the same vertebral level by the coherent point drift algorithm [13] and applying the generalized Procrustes alignment [3], the mean shape model of each lumbar vertebra was obtained. The initialization of vertebra segmentation was obtained from the results of interpolation-based vertebra detection [4], which relies on spline approximation on an equidistant grid. Segmentation of each vertebra (Sect. 2.3) consisted of \(25\) iterations (from \(k=1\) to \(k=25\)) of object detection and mesh reconfiguration. The corresponding parameters, which were adopted from the original mesh deformation technique [9, 15] and not further tuned to search for a possibly better segmentation performance on the tested database, were set to \(J = 25\) (Eq. 2), \(D = 0.6\) mm\(^{-2}\) (Eqs. 3 and 8), \(\delta = 0.3\) mm (Eqs. 1, 3 and 8), \(g_{\text {max}} = 100\) HU (Eq. 4) and \(\alpha = 33\) (Eq. 6). The minimization of the sum of energy terms (Eq. 6) was performed by the conjugate gradient method.

Table 1 Lumbar vertebra segmentation results in terms of mean symmetric absolute surface distance (MSD), symmetric root-mean-square surface distance (RMSSD), maximal symmetric absolute surface distance (MaxSD) and Dice coefficient (DICE), reported as mean \(\pm \) standard deviation

3.2 Results

The performance of the proposed framework was evaluated by the mean symmetric absolute surface distance (MSD), symmetric root-mean-square surface distance (RMSSD), maximal symmetric absolute surface distance (MaxSD) and Dice coefficient (DICE), computed between the resulting 3D meshes and corresponding reference segmentation binary masks. Detailed results for the segmentation of individual vertebral levels are presented in Table 1, separately for the original mesh deformation framework [9, 15] and for the proposed framework that is based on a robust initialization and additional modifications with the aim to improve the framework performance. The overall vertebral segmentation performance (mean \(\pm \) standard deviation) was \(\text {MSD} = 0.43 \pm 0.14\) mm, \(\text {RMSSD} = 0.83 \pm 0.33\) mm, \(\text {MaxSD} = 7.32 \pm 3.23\) mm and \(\text {DICE} = 93.76 \pm 1.61\,\%\) for the proposed framework, compared to \(\text {MSD} = 0.55 \pm 0.21\) mm, \(\text {RMSSD} = 1.10 \pm 0.47\) mm, \(\text {MaxSD} = 9.65 \pm 4.37\) mm and \(\text {DICE} = 92.19 \pm 2.19\,\%\) for the original framework. The detection of all five lumbar vertebrae (i.e. levels from L1 to L5) took on average around 220 s, while the segmentation of each individual vertebra took on average around 1 min. An example of the resulting segmentation for a selected CT lumbar spine image is shown in Fig. 2.
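For reference, the evaluation metrics above can be computed from surface point samples and binary masks as follows. This is a generic sketch: the surface sampling density and the exact mesh-to-mask conversion used in the evaluation are not specified here, and function names are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_distances(A, B):
    """Symmetric surface distances between two point sets sampled from the
    segmented and reference surfaces: returns (MSD, RMSSD, MaxSD) in the
    units of the input coordinates."""
    dAB = cKDTree(B).query(A)[0]   # each point of A to its nearest point of B
    dBA = cKDTree(A).query(B)[0]   # and vice versa, for symmetry
    d = np.concatenate([dAB, dBA])
    return d.mean(), np.sqrt((d ** 2).mean()), d.max()

def dice(mask_a, mask_b):
    """Dice coefficient between two binary segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())
```

MSD averages the symmetric nearest-neighbor distances, RMSSD additionally penalizes large deviations, and MaxSD reports the worst-case (Hausdorff-like) deviation, which explains why MaxSD values in Table 1 are an order of magnitude larger than MSD.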

Fig. 2

An example of vertebra segmentation initialization (in blue) and vertebra segmentation results (in yellow) in comparison to reference segmentation (in red) for a selected CT lumbar spine image, shown in selected a mid-sagittal, b mid-coronal and c mid-axial cross-sections (Color figure online)

4 Discussion and Conclusion

In this paper, we combined robust detection of the object of interest [4] with an improved shape-constrained deformable model to segment vertebrae from CT images of the lumbar spine. Vertebra segmentation from 3D spine images has already been addressed in several studies [6–10, 12, 14]. The best performance in terms of accuracy was reported by Klinder et al. [9], who achieved a mean point-to-surface error (i.e. one surface is represented by a set of surface points and the other by a surface mesh model) of \(0.76\) mm by applying the original shape-constrained deformable model technique, combined with progressive adaptation of tube-shaped segments to extract the spine curve, vertebra detection on curved-planar reformatted images using the generalized Hough transform, and vertebra identification by rigid registration of appearance models to the detected candidates. On the other hand, Kadoury et al. [6, 7] reported the highest Dice coefficient of \(92.5\,\%\), which was obtained by building an articulated shape manifold from a training database and embedding the data into a low-dimensional sub-space, followed by the Markov random field optimization to infer between the unseen target shape and shape manifold. Although the overall results of the proposed method, \(0.43\,{\pm }\,0.14\) mm in terms of MSD and \(93.76\,{\pm }\,1.61\,\%\) in terms of the Dice coefficient, cannot be directly compared to the results reported by the existing studies because of different evaluation methodologies, data collection techniques and databases, we can conclude that the proposed automated spine and vertebra detection and segmentation framework produces accurate results. Moreover, to objectively quantify the effects of the performed modifications of the original shape-constrained deformable model, we also report vertebra segmentation results for the original technique obtained on the same database of CT lumbar spine images.
From Table 1 it can be observed that the proposed framework achieved an improvement of around \(20\,\%\) in terms of MSD over the original framework. It can therefore be concluded that the performed modifications improved the accuracy of vertebra segmentation, and that, when combined with a robust initialization, the proposed framework can accurately segment vertebrae from CT images of the lumbar spine.