5.1 Introduction

The field of medical image computing and computer-assisted interventions has played an increasingly important role in the diagnosis and treatment of spinal diseases during the past 20 years. Accurate segmentation of individual vertebrae from CT images is important for many clinical applications. After segmentation, it is possible to determine the shape and condition of individual vertebrae. Segmentation can also assist early diagnosis, surgical planning, and the localization of spinal pathologies such as degenerative disorders, deformations, trauma, tumors, and fractures. Most computer-assisted diagnosis and planning systems are based on manual segmentation performed by physicians. Manual segmentation, however, is time-consuming, and its results are difficult to reproduce because image interpretation may vary significantly across interpreters.

Vertebra segmentation is challenging because of the overall morphology of the vertebral column. Although the shape of the individual vertebrae changes significantly along the spine, most neighboring vertebrae look very similar and are difficult to distinguish. In recent years, a number of spine segmentation algorithms for CT images have been proposed. The proposed methods range from unsupervised image processing approaches, such as level set [14] and graph cut methods [1], to geometrical model-based methods such as statistical anatomical models or probabilistic atlas-based methods [8, 10, 12, 16, 17] and, more recently, to machine learning and deep learning-based methods [4, 5, 15].

In this chapter, we propose a two-stage method consisting of a localization stage and a segmentation stage. The localization stage aims to identify each lumbar vertebra, while the segmentation stage handles the problem of labeling each lumbar vertebra in a given 3D image. Previously, we developed a method to fully automatically localize landmarks for each lumbar vertebra in CT images with context features and reported a mean localization error of 3.2 mm [6]. In this chapter, we focus on the segmentation stage, where the landmarks detected in the localization stage are used to initialize our segmentation method.

To this end, we propose to use affinely registered multi-object model-based multi-atlases as shape prior for grid cut segmentation of lumbar vertebrae from a given target CT image. More specifically, our segmentation method consists of two steps: affine atlas-target registration-based label fusion and bone-sheetness assisted multi-label grid cut. The initial segmentation obtained from the first step will be used as the shape prior for the second step.

The chapter is organized as follows. In the next section, we will describe the method. Section 5.3 will present the experimental results, followed by discussions and conclusions in Sect. 5.4.

5.2 Method

Figure 5.1 presents a schematic overview of the complete workflow of our proposed approach. Without loss of generality, we assume that for the lth (l ∈ {1, 2, 3, 4, 5}) lumbar vertebra, there exists a set of \(N_{l}\) atlases with manually labeled segmentations and manually extracted landmarks. In the following, the details of each step are presented.

Fig. 5.1 The flowchart of our proposed segmentation method. See text for details

5.2.1 Affine Atlas-Target Registration-Based Label Fusion

Given an unseen lumbar spinal CT image, we assume that a set of landmarks has already been detected for each lumbar vertebra. The following steps are conducted separately for each lumbar vertebra.

Using the detected anatomical landmarks, a paired-point scaled rigid registration is performed to align all \(N_{l}\) atlases of the lth lumbar vertebra to the target image space. We then select the \(N_{l,s}\) of these \(N_{l}\) atlases with the smallest paired-point registration errors for the atlas affine registration step described below.
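The landmark-based alignment and atlas selection can be illustrated with a minimal sketch. It assumes a closed-form least-squares (Umeyama-style) solution for the scaled rigid transform and uses the root-mean-square landmark residual as the registration error; the chapter does not state which solver or error measure was used, and the function names `scaled_rigid_fit`, `registration_error`, and `select_atlases` are hypothetical.

```python
import numpy as np

def scaled_rigid_fit(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)              # 3x3 cross-covariance of the point sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # enforce a proper rotation (no reflection)
    R = U @ S @ Vt
    var_s = (sc ** 2).sum() / len(src)      # variance of the source landmarks
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def registration_error(src, dst):
    """RMS landmark residual after the fit (assumed error measure)."""
    s, R, t = scaled_rigid_fit(src, dst)
    mapped = s * (np.asarray(src, dtype=float) @ R.T) + t
    diff = mapped - np.asarray(dst, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def select_atlases(atlas_landmarks, target_landmarks, n_select):
    """Indices of the n_select atlases with the smallest landmark residuals."""
    errors = [registration_error(lm, target_landmarks) for lm in atlas_landmarks]
    order = np.argsort(errors)
    return [int(i) for i in order[:n_select]], errors
```

Here `atlas_landmarks` would hold one landmark array per atlas of a given vertebra, and `n_select` plays the role of \(N_{l,s}\).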

Each selected atlas consists of a CT volume and a manual segmentation of the corresponding lumbar vertebra. For every selected atlas, we perform a pair-wise atlas-target affine registration using the intensity-based registration toolbox “Elastix” [11]. Using the obtained 3D affine transformation, we can align the associated manual segmentation of the selected atlas to the target image space. Then the probability of labeling a voxel x in the target image space as part of the lth lumbar vertebra is computed with average voting:

$$\displaystyle{ p_{l,x} = \frac{1} {N_{l,s}}\sum \nolimits _{i=1}^{N_{l,s} }A_{i}(x) }$$
(5.1)

where \(A_{i}(x) \in \{0, 1\}\) is the label of the ith atlas at voxel x after alignment to the target image space.

A simple thresholding is then conducted to get an initial binary segmentation of the lth lumbar vertebra:

$$\displaystyle{ L_{l}(x) = \left \{\begin{array}{ll} 0; & p_{l,x} < T \\ 1; & p_{l,x} \geq T \end{array} \right. }$$
(5.2)

where T is the threshold and is empirically selected as 0.35.

The above steps are conducted for all five lumbar vertebrae.
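As a small illustration of Eqs. (5.1) and (5.2), the following sketch fuses a stack of already-aligned binary atlas masks by average voting and thresholds the result. The function name `fuse_labels` and the NumPy array layout are assumptions made for this example only.

```python
import numpy as np

def fuse_labels(aligned_masks, threshold=0.35):
    """Average voting (Eq. 5.1) followed by thresholding (Eq. 5.2).

    aligned_masks: sequence of binary arrays, one per selected atlas,
    all resampled into the target image space.
    """
    masks = np.stack([np.asarray(m, dtype=float) for m in aligned_masks])
    prob = masks.mean(axis=0)                       # p_{l,x}
    initial = (prob >= threshold).astype(np.uint8)  # L_l(x)
    return prob, initial
```

With five selected atlases and T = 0.35, a voxel is included in the initial segmentation as soon as two of the five atlases label it as bone, since 2/5 = 0.4 ≥ 0.35 while 1/5 = 0.2 < 0.35.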

5.2.2 Bone-Sheetness Assisted Grid Cut

The initial segmentation obtained in the last step is usually not accurate enough because only affine atlas-target registrations are used. To further improve the segmentation accuracy, we propose to use a bone-sheetness assisted multi-label grid cut that takes the initial segmentation as the shape prior.

Grid cut is a fast multi-core max-flow/min-cut solver optimized for grid-like graphs [9]. The task of multi-label grid cut is to assign an appropriate label L(x) to every voxel x in the image space Ω of the target image I. In our case, labels L(x) ∈ {0, 1, 2, 3, 4, 5} are employed for the purpose of labeling the target image into six different regions including the background region (BK, L(x) = 0) and the five lumbar vertebral regions (for the lth lumbar vertebra \(L_{l}\), L(x) = l). After segmentation, the target image will be partitioned into six sub-image regions, i.e., \(\varOmega =\{ \varOmega _{BK} \cup \varOmega _{l_{1}} \cup \varOmega _{l_{2}} \cup \varOmega _{l_{3}} \cup \varOmega _{l_{4}} \cup \varOmega _{l_{5}}\}\).

Grid cut, similar to graph cut [3], is an energy minimization segmentation framework based on combinatorial graph theory. The typical energy function of a multi-label grid cut E(L) is defined as

$$\displaystyle\begin{array}{rcl} E(L)& =& \sum _{x\in \varOmega }R_{x}(L(x)) \\ & \quad +& \lambda \sum _{(x,y)\in \mathcal{N}}B_{x,y}(L(x),L(y)){}\end{array}$$
(5.3)

where \(R_{x}(L(x))\) is the pixel-wised term which gives the cost of assigning label L(x) ∈ {0, 1, 2, 3, 4, 5} to voxel x, \(B_{x,y}(L(x), L(y))\) is the pair-wised term which gives the cost of assigning labels to voxels x and y in a user-defined neighborhood system \(\mathit{\mathcal{N}}\), and λ adjusts the balance between the pixel-wised term and the pair-wised term.

In general, grid cut methods define the energy based on intensity information. However, weak bone boundaries, narrow inter-bone space, and low intensities in the trabecular bone make image intensity alone a relatively poor feature to discriminate adjacent joint structures [13]. This can be addressed by applying image enhancement with a sheetness filter to generate a new feature image (sheetness score map) [7]. For each voxel in the target image space Ω, a sheetness score BS is computed from the eigenvalues \(\vert \lambda _{1}\vert \leq \vert \lambda _{2}\vert \leq \vert \lambda _{3}\vert\) of the local Hessian matrix at scale σ as

$$\displaystyle\begin{array}{rcl} BS_{x}(\sigma )& =& \exp \left (\frac{-R_{\mathrm{sheet}}^{2}} {2\alpha ^{2}} \right )\left (1 -\exp \left (\frac{-R_{\mathrm{blob}}^{2}} {2\gamma ^{2}} \right )\right ) \\ & & \quad \left (1 -\exp \left (\frac{-R_{\mathrm{noise}}^{2}} {2\xi ^{2}} \right )\right ) {}\end{array}$$
(5.4)

where α, γ, and ξ are parameters [7], and \(R_{\mathrm{sheet}} = \frac{\vert \lambda _{2}\vert } {\vert \lambda _{3}\vert }\), \(R_{\mathrm{blob}} = \frac{\vert 2\lambda _{3}-\lambda _{2}-\lambda _{1}\vert } {\vert \lambda _{3}\vert }\), \(R_{\mathrm{noise}} = \sqrt{\lambda _{1 }^{2 } +\lambda _{ 2 }^{2 } +\lambda _{ 3 }^{2}}\).
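To make Eq. (5.4) concrete, the sketch below evaluates the sheetness score for one voxel from three given Hessian eigenvalues, exactly as the formula is written above. The default values of `alpha`, `gamma`, and `xi` are placeholders chosen for illustration, not the values used in the chapter or in [7], and the function name `sheetness_score` is hypothetical.

```python
import math

def sheetness_score(eigenvalues, alpha=0.5, gamma=0.5, xi=1.0):
    """Sheetness score of Eq. (5.4) for one voxel.

    eigenvalues: the three eigenvalues of the local Hessian (any order).
    alpha, gamma, xi: illustrative placeholder parameters.
    """
    # Order by magnitude so that |l1| <= |l2| <= |l3|.
    l1, l2, l3 = sorted(eigenvalues, key=abs)
    if l3 == 0.0:
        return 0.0  # perfectly flat neighborhood: no structure at all
    r_sheet = abs(l2) / abs(l3)
    r_blob = abs(2.0 * l3 - l2 - l1) / abs(l3)
    r_noise = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return (math.exp(-r_sheet ** 2 / (2.0 * alpha ** 2))
            * (1.0 - math.exp(-r_blob ** 2 / (2.0 * gamma ** 2)))
            * (1.0 - math.exp(-r_noise ** 2 / (2.0 * xi ** 2))))
```

A plate-like neighborhood with one dominant eigenvalue, e.g. `(0.0, 0.0, -10.0)`, yields a score close to 1, whereas an isotropic blob such as `(-10.0, -10.0, -10.0)` yields 0 because \(R_{\mathrm{blob}}\) vanishes.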

For every voxel x, we have the computed sheetness score \(BS_{x} \in [0, 1]\), where a larger score indicates a higher likelihood that the voxel belongs to a bone region. With the computed sheetness score map and the initial segmentation, we define each term of the energy function as described below:

Pixel-wised term

Based on the initial segmentation obtained in the last step, the target image space Ω can be separated into six sub-image regions, i.e., \(\varOmega =\{\varOmega _{ BK}^{{\prime}}\cup \varOmega _{l_{1}}^{{\prime}}\cup \varOmega _{l_{2}}^{{\prime}}\cup \varOmega _{l_{3}}^{{\prime}}\cup \varOmega _{l_{4}}^{{\prime}}\cup \varOmega _{l_{5}}^{{\prime}}\}\), where each sub-image region is obtained from the corresponding initial segmentation. By further employing the computed sheetness score map and the Hounsfield units (HU) of different tissues, the exclusion regions for each structure can be defined as

$$\displaystyle\begin{array}{rcl} \left \{\begin{array}{@{}l@{\quad }l@{}} E_{\neg L_{1}} \quad &=\{ x\notin \varOmega _{l_{1}}^{{\prime}}\text{ and }\mathcal{I}(x) \leq -50\,\text{HU}\} \\ E_{\neg L_{2}} \quad &=\{ x\notin \varOmega _{l_{2}}^{{\prime}}\text{ and }\mathcal{I}(x) \leq -50\,\text{HU}\} \\ E_{\neg L_{3}} \quad &=\{ x\notin \varOmega _{l_{3}}^{{\prime}}\text{ and }\mathcal{I}(x) \leq -50\,\text{HU}\} \\ E_{\neg L_{4}} \quad &=\{ x\notin \varOmega _{l_{4}}^{{\prime}}\text{ and }\mathcal{I}(x) \leq -50\,\text{HU}\} \\ E_{\neg L_{5}} \quad &=\{ x\notin \varOmega _{l_{5}}^{{\prime}}\text{ and }\mathcal{I}(x) \leq -50\,\text{HU}\} \\ E_{\neg BK}\quad &=\{ x\notin \varOmega _{BK}^{{\prime}}\text{ and }\mathcal{I}(x) \geq 200\,\text{HU}\text{ and }BS_{x} > 0\}\end{array} \right.& &{}\end{array}$$
(5.5)

where −50 HU and 200 HU are selected following [13]. The pixel-wised term \(R_{x}(L(x))\) is then defined as

$$\displaystyle\begin{array}{rcl} R_{x}(L(x)) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1\quad &\text{ if }L(x) = 0\text{ and }x \in E_{\neg BK} \\ 1\quad &\text{ if }L(x) = 1\text{ and }x \in E_{\neg L_{1}} \\ 1\quad &\text{ if }L(x) = 2\text{ and }x \in E_{\neg L_{2}} \\ 1\quad &\text{ if }L(x) = 3\text{ and }x \in E_{\neg L_{3}} \\ 1\quad &\text{ if }L(x) = 4\text{ and }x \in E_{\neg L_{4}} \\ 1\quad &\text{ if }L(x) = 5\text{ and }x \in E_{\neg L_{5}} \\ 0\quad &\text{otherwise} \end{array} \right.& &{}\end{array}$$
(5.6)
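The exclusion regions of Eq. (5.5) and the resulting costs of Eq. (5.6) can be sketched as a per-label cost volume. The function name `unary_costs`, the array layout (one cost array per label), and the argument names are assumptions for this illustration; only the thresholds of −50 HU and 200 HU come from the text.

```python
import numpy as np

def unary_costs(initial_labels, intensity_hu, sheetness, n_labels=6,
                soft_hu=-50.0, bone_hu=200.0):
    """Cost volume R_x(label) following Eqs. (5.5) and (5.6).

    initial_labels: integer array from the label fusion step (0 = background).
    intensity_hu:   CT intensities in Hounsfield units, same shape.
    sheetness:      sheetness score map BS, same shape.
    Returns an array of shape (n_labels, *image_shape) with entries in {0, 1}.
    """
    init = np.asarray(initial_labels)
    hu = np.asarray(intensity_hu, dtype=float)
    bs = np.asarray(sheetness, dtype=float)
    costs = np.zeros((n_labels,) + init.shape)
    # Background is penalized where the prior says "vertebra" and the voxel
    # looks like bone (high intensity and non-zero sheetness).
    costs[0] = (init != 0) & (hu >= bone_hu) & (bs > 0)
    # Vertebra l is penalized outside its prior region where the intensity
    # is too low to be bone.
    for label in range(1, n_labels):
        costs[label] = (init != label) & (hu <= soft_hu)
    return costs
```

Voxels that fall into none of the exclusion regions receive zero cost for every label, so their final label is decided by the pair-wised term alone.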

Pair-wised term

As the sheetness filter enhances the bone boundaries, we employ the computed sheetness score map to define the pair-wised term:

$$\displaystyle\begin{array}{rcl} & & B_{x,y}(L(x),L(y)) \propto \\ & &\quad \exp \left (-\frac{\vert BS_{x} - BS_{y}\vert } {\sigma _{s}} \right ) \cdot \delta (L(x),L(y)){}\end{array}$$
(5.7)

where \(\sigma _{s}\) is a constant scaling parameter and

$$\displaystyle\begin{array}{rcl} \delta (L(x),L(y)) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1\quad &\text{ if }L(x)\neq L(y)\\ 0\quad &\text{ otherwise } \end{array} \right.& &{}\end{array}$$
(5.8)
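For illustration, the energy of Eq. (5.3) with the pair-wised term of Eqs. (5.7) and (5.8) can be evaluated for a given labeling as sketched below. This only computes the energy and does not perform the grid cut optimization itself. The sketch assumes axis-aligned nearest neighbors as the neighborhood system (the text leaves it user-defined), counts each neighbor pair once, takes the proportionality constant in Eq. (5.7) as 1, and uses placeholder values for `lam` and `sigma_s`; the function names are hypothetical.

```python
import numpy as np

def _neighbor_pairs(arr, axis):
    """Two shifted views of arr whose matching elements are neighbors along axis."""
    lo = [slice(None)] * arr.ndim
    hi = [slice(None)] * arr.ndim
    lo[axis] = slice(None, -1)
    hi[axis] = slice(1, None)
    return arr[tuple(lo)], arr[tuple(hi)]

def grid_energy(labels, unary, sheetness, lam=1.0, sigma_s=0.2):
    """E(L) of Eq. (5.3) for one labeling.

    labels:    integer label array L(x).
    unary:     cost volume of shape (n_labels, *labels.shape), i.e. R_x(label).
    sheetness: sheetness score map BS with the same shape as labels.
    """
    labels = np.asarray(labels, dtype=np.intp)
    unary = np.asarray(unary, dtype=float)
    bs = np.asarray(sheetness, dtype=float)
    # Pixel-wised term: cost of the chosen label at every voxel.
    region = np.take_along_axis(unary, labels[np.newaxis, ...], axis=0).sum()
    # Pair-wised term: penalize label changes, less so across sheetness jumps.
    boundary = 0.0
    for axis in range(labels.ndim):
        la, lb = _neighbor_pairs(labels, axis)
        sa, sb = _neighbor_pairs(bs, axis)
        weight = np.exp(-np.abs(sa - sb) / sigma_s)
        boundary += float((weight * (la != lb)).sum())
    return float(region + lam * boundary)
```

Because the weight decays with the sheetness difference between neighbors, a label boundary placed where the sheetness map changes sharply is cheaper than one cutting through a homogeneous region.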

5.3 Experimental Results

We evaluated our method on 21 clinical lumbar spinal CT datasets with associated manual segmentations. The size of the data ranges from 512 × 512 × 318 voxels to 512 × 512 × 433 voxels. The voxel spacing ranges from 0.43 × 0.43 × 0.7 mm³ to 0.29 × 0.29 × 0.7 mm³. We conducted a leave-one-out (LOO) cross-validation study to evaluate the performance of the present method. More specifically, each time we took 1 of the 21 CT datasets as the test data and the remaining 20 as the atlases, and we chose \(N_{l,s}\) to be 5. The process was repeated 21 times, with each CT dataset used exactly once as the test data. Each time, the segmentation results for the test data were compared with the associated ground-truth manual segmentation. For each vertebra in a test CT dataset, we evaluated the average symmetric surface distance (ASSD), the Dice coefficient (DC), precision, and recall.
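The overlap-based measures can be computed from a predicted and a ground-truth binary mask as in the following sketch, which uses the standard definitions of the Dice coefficient, precision, and recall. The function name `overlap_metrics` and the convention of returning 1.0 for empty masks are assumptions; the ASSD is omitted because it additionally requires surface extraction and distance computations.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Dice coefficient, precision, and recall of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # voxels labeled bone in both
    fp = np.count_nonzero(pred & ~truth)   # predicted bone, actually background
    fn = np.count_nonzero(~pred & truth)   # missed bone voxels
    dice = 2.0 * tp / (2.0 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, precision, recall
```

For a per-vertebra evaluation, `pred` and `truth` would be the masks of a single label, e.g. `final_labels == l` and `manual_labels == l`, where both array names are hypothetical.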

Table 5.1 presents the segmentation results of the cross-validation study, where the results on each individual vertebra as well as on the entire lumbar region are presented. Our approach achieves a mean DC of 93.9 ± 1.0% and a mean ASSD of 0.41 ± 0.08 mm on the entire lumbar region. In each fold, it took about 12 min to finish segmentation of all five lumbar vertebrae of one test image. Figure 5.2 shows a segmentation example.

Table 5.1 Segmentation results of the leave-one-out cross validation on 21 clinical spinal CT data
Fig. 5.2 A lumbar vertebrae segmentation example. Top row: sagittal view. Bottom row: axial view. For both rows, from left to right: the input image, the probability map, the initial segmentation obtained from the affine atlas-target registration-based label fusion, and the final results

Figure 5.3 shows color-coded error distributions of two cases. For the case shown in the top row, our automatic segmentation achieved a mean error of 0.47 mm and a 95th percentile error of 1.25 mm. For the case shown in the bottom row, the mean segmentation error was 0.42 mm and the 95th percentile error was 1.13 mm.

Fig. 5.3 Color-coded error distributions of two segmented lumbar vertebral segments in comparison to the corresponding ground truth. In each row, the left column shows the error bar; the middle column shows the segmented lumbar vertebral segment with color-coded error distributions; and the right column shows the ground truth model

5.4 Discussions and Conclusions

Previous atlas-based methods [17], which require nonrigid registration between the atlases and the target image, may not work here considering the weak bone boundaries and the narrow inter-bone space between neighboring vertebrae. In this chapter, we only need to affinely register the atlases with the target image; an accurate segmentation is then obtained by the bone-sheetness assisted multi-label grid cut, which has the additional advantage of automatically separating the five lumbar vertebrae from each other.

In conclusion, we proposed a method for automatic segmentation of lumbar vertebrae from clinical CT images. The results obtained from the LOO experiment demonstrated the efficacy of the proposed approach.