Abstract
We describe a method to automatically detect and label the vertebrae in human lumbar spine MRI scans. Our contribution is to show that marrying two strong algorithms (the DPM object detector of Felzenszwalb et al. [1], and inference using dynamic programming on chains), together with appropriate modelling, results in a simple, computationally cheap procedure that achieves state-of-the-art performance. The training of the algorithm is principled, and heuristics are not required. The method is evaluated quantitatively on a dataset of 371 MRI scans, and it is shown to cope with pathologies such as scoliosis, joined vertebrae, deformed vertebrae and disks, and imaging artefacts. We also demonstrate that the same method is applicable (without retraining) to CT scans.
Funded by EPSRC.
Financial support was provided by ERC grant VisRec no. 228180.
1 Introduction
The task dealt with in this paper is the following: given an MR scan of the lumbar spine, localize and label all the vertebrae present in the image. The motivation for this work is that spine appearance, shape and geometry measurements are necessary for abnormality detection locally at each disk [2–7] and vertebra [8, 9] (such as herniation), as well as globally for the whole spine (such as spinal scoliosis).
In more detail, the input 3D image is a (sparsely spaced) stack of 2D sagittal images, and the output consists of labelled tight bounding boxes around all the vertebrae in the image. Each bounding box is specified by its position, orientation, and scale. An example of the detection and labelling for a typical normal scan is shown in Fig. 1.
This detection task is challenging for a number of reasons, including: (1) the repetitive nature of the vertebrae, (2) varying image resolution, imaging protocols, and artefacts, and (3) large anatomical and pathological variation, particularly in the lumbar spine. Various examples of challenging cases in our dataset are highlighted in Fig. 2. The anatomical and pathological variation can affect both the local vertebra/disk appearance (e.g. degraded disks, Fig. 2h) and the global layout of the spine (e.g. scoliosis, Fig. 2c).
Contributions. Our method brings together two strong algorithms, the Deformable Part Model of Felzenszwalb et al. [1] based on Histogram of Oriented Gradients (HOG) image descriptors [10] and efficient inference on graphical models [11, 12], making the algorithm accurate, robust, and efficient on challenging spine datasets. Unlike related solutions in the literature, the algorithm is also tolerant to varying MR acquisition protocols, image resolutions, patient positions, and slice spacings. It localizes all the vertebrae present in a scan, and labels them correctly as long as the sacrum is present in the scan. Importantly, the method is applicable to standard MRI protocols.
The method has two distinct stages. First, vertebrae candidates are detected by a sliding window detector searching over position, scale, and angle (Sect. 2.1). Second, a graphical model is fitted to the set of candidate detections to find the optimal spine layout and labelling, based on the unary SVM score of the detection for each part and a spatial cost between each pair of connected parts (Sect. 2.2). The HOG descriptor captures the near-rectangular shape of the vertebrae. We detect vertebrae rather than disks because vertebra shape is more consistent than disk shape (lumbar spine studies often target disk deformations) and is therefore more suitable for modelling with HOG. Disk locations can easily be found after detecting vertebrae.
The closest previous work to ours is that of Oktay and Akgul [13]. They detect disks and vertebrae in the lumbar spine using a Pyramid HOG descriptor; however, they only detect six disks and vertebrae with their graphical model, require the existence of both T1 and T2 scans to first detect the spinal cord, and use a separate HOG template for each vertebra. In contrast, we demonstrate that one generic vertebra detector suffices for all vertebrae, and we require only the T2 scan. Furthermore, they use only the mid-sagittal slice, making their method applicable only to cases where all the spine parts appear in that slice, whereas we search for vertebrae in all the 2D images of the 3D stack (not restricted to the mid-sagittal slice).
Ghosh et al. [14] also use HOG features [10]; however, they do not label the vertebrae and make strong use of heuristics and of information from complementary axial scans. They detect disks rather than vertebrae. Zhan et al. [15] present a robust hierarchical algorithm to detect and label arbitrary numbers of vertebrae and disks in nearly arbitrary field-of-view scans, as long as one of four ‘anchor’ vertebrae (C2, T1, L1, S) is present. They first detect an ‘anchor’ vertebra, and then the other ‘bundle’ vertebrae connected to it in a graphical model. Although the method works very well within its domain, it requires isotropic 2.1 mm resolution scans, which limits its applicability severely. Our method is not limited to this domain and, in particular, does not require the high isotropic resolution.
A further extensive body of literature on spine localization and labelling exists. In almost all the papers, the algorithms work in two stages. First, some anatomical parts characteristic of the spine are detected (vertebrae [16–18]/disks [14, 19–21]/both [13, 15]). Second, a spine layout model is fitted to the candidates to determine the best hypothesis for the spine layout. The spatial configuration of the spine parts, and in some cases also their individual characteristics [15, 18, 22], are taken into account to both label the disks and/or vertebra, and localize the spine.
2 Method
We present a method to localize and label vertebrae in lumbar MR images using two HOG-based detectors and a graphical model. First, given a stack of sagittal MR slices, vertebrae and sacrum candidates are detected using latent SVM on HOG in each slice as described in Sect. 2.1. Next, after local non-maxima suppression, the vertebrae candidates corresponding to the spine are picked and labelled by fitting a graphical model, as explained in Sect. 2.2.
2.1 Spine Part Detection
The spine part (vertebra) detection is implemented using two detectors, each a latent SVM on Histogram of Oriented Gradients (HOG) descriptors [10], built with the Felzenszwalb VOC Challenge object detection framework [23]. We learn one generic 2D detector for vertebral bodies (VBs), trained on all VBs in all the training images, and another, more specific, 2D detector for the sacrum, trained on the VBs of the first two links of the sacrum. Both models are visualized along with a set of training samples in Fig. 3.
Training. Both the generic vertebrae body (VB) detector and the sacrum detector are trained using the Felzenszwalb detection framework [23]. The positive training examples for the VB detector are tight bounding boxes around the vertebral bodies of T10...L5 vertebrae with the bounding box sides parallel to the vertebral facets as shown in Fig. 3a. The positive training examples for the sacrum detector are tight bounding boxes around the first two links of the sacrum, with one side parallel to the posterior side of the sacrum as shown in Fig. 3b. The bounding boxes for both the VB and the sacrum are defined by fitting a minimum bounding rectangle to landmarks on them—four for the VB and eight for the sacrum. Each training sample is extracted from the slice cutting through the middle of the respective vertebral body.
For the VB detector, four HOG templates are trained, each of a different aspect ratio. The HOG templates are each 6 cells high, and 6, 7, 8, and 9 cells wide, corresponding to aspect ratios between 1 and 1.5. The HOG cell size for the VB model is \(8 \times 8\) pixels. The HOG template for the sacrum detector is 9 cells high by 5 cells wide, with \(8 \times 8\) pixel HOG cell size. The HOG feature vectors are 31-dimensional, with 18 contrast-sensitive and 9 contrast-insensitive orientation bins, and 4 texture feature bins.
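As an illustration of the cell-level orientation binning behind these templates, the sketch below computes per-cell, contrast-sensitive gradient histograms in NumPy. This is a simplified stand-in, not the full 31-dimensional Felzenszwalb feature, which additionally folds in contrast-insensitive bins, block normalization, and texture components; the function name is ours.

```python
import numpy as np

def cell_orientation_histograms(img, cell=8, bins=18):
    """Per-cell gradient orientation histograms (contrast-sensitive, 0..360 deg).

    Simplified sketch of HOG cell binning: gradients are binned into `bins`
    orientation bins over the full circle, weighted by gradient magnitude,
    and accumulated over non-overlapping `cell` x `cell` pixel cells.
    """
    img = np.asarray(img, dtype=np.float64)
    gy, gx = np.gradient(img)                      # gradients along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)    # orientation in [0, 2*pi)
    bin_idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)

    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell, (i + 1) * cell),
                  slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(), minlength=bins)
    return hist
```

For a \(48 \times 48\) pixel vertebra patch with \(8 \times 8\) cells this yields the 6-cell-high grid used by the VB templates.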
The HOG templates capture the rectangular shape of the vertebrae, with variations due to deformation, and the trapezoid shape of the first two links of the sacrum. The vertebrae show wide size and resolution variation and are all scaled and warped to match one of the aspect ratios at training. The model is learned iteratively in several steps, with new positive samples mined by running the detector on the positive samples, collecting the strongest detections as new positives, and training a new detector using the new positives.
The negative samples for the vertebrae detector are first picked randomly from mid-slices with a hand-drawn black polygon covering all the vertebral bodies. Next, an iterative learning procedure is employed to pick hard negatives as false positive detections on the negative training images as detailed in [23].
Testing. During the candidate detection step at test time, a previously unseen sagittal scan is taken as input, and tight bounding boxes around vertebrae candidates are returned as output. The candidate search is performed sequentially in all slices of the scan. The VB and sacrum detectors are run on each slice, searching over position, scale, and angle, with the scan rotated from \(-20^{\circ }\) to \(20^{\circ }\) in \(10^{\circ }\) increments. A feature pyramid is calculated for each angle, with HOG cells placed densely next to each other. The feature pyramid has 10 levels per doubling of resolution (10 levels per octave), with the image resampled from 2\(\times \) the original size at the finest scale down to 0.5\(\times \) at the coarsest. All detections at all positions, scales, and orientations are collected and transformed into the original test image coordinate system as shown in Fig. 4.
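The pyramid's scale factors follow directly from the 10-levels-per-octave spacing described above (2\(\times \) down to 0.5\(\times \) spans two octaves, i.e. 21 levels); a minimal sketch:

```python
# Scale factors for a feature pyramid with 10 levels per octave, spanning
# 2x (finest) down to 0.5x (coarsest) of the original image size, as in the
# text above. Two octaves at 10 levels each gives 21 scales in total.
levels_per_octave = 10
scales = [2.0 * 2 ** (-k / levels_per_octave)
          for k in range(2 * levels_per_octave + 1)]
```

Here `scales[0]` is the finest level (2.0), `scales[10]` the original resolution (1.0), and `scales[20]` the coarsest (0.5).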
A greedy non-maxima suppression algorithm is employed to remove most of the false positive detections in each slice as follows. First, the top-scoring bounding box is retained, and all bounding boxes overlapping it more than 50 % are discarded. Next, the second-highest scoring remaining bounding box is retained, and the discarding and retention process continues until all the remaining bounding boxes have at most 50 % overlap.
Next, the remaining bounding boxes from all the slices are collected and projected onto a single slice, and the non-maxima suppression process is repeated to end up with bounding boxes across all the slices that have at most 50 % overlap. These bounding boxes are next passed as input to the Graphical Model as described in Sect. 2.2 in order to eliminate any remaining false positives, and to label the vertebrae.
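Both suppression stages above use the same greedy routine. A sketch on axis-aligned boxes follows; note that the paper's detections are oriented, and the exact overlap measure is not specified in the text, so intersection-over-union is an assumption here.

```python
import numpy as np

def greedy_nms(boxes, scores, thresh=0.5):
    """Greedy non-maxima suppression on axis-aligned boxes (x1, y1, x2, y2).

    Keeps the highest-scoring box, discards boxes overlapping it by more
    than `thresh` (intersection-over-union), and repeats on the remainder.
    Returns indices of the retained boxes.
    """
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]        # indices, best score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= thresh]    # drop heavily overlapping boxes
    return keep
```

The cross-slice stage simply applies the same routine to the projected boxes from all slices.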
2.2 Graphical Model for Spine Layout
We train a parts-based graphical model [12] connecting the vertebrae in a chain. The graphical model takes as input the detections after non-maxima suppression described in the previous subsection, and gives as output the placement and labels of all vertebrae in the image. The method deals with multiple slices by ignoring the slice index in inference. The spine layout is given as a configuration \(L = (l_1,l_2,\ldots ,l_{n-1},l_{n})\) where \(l_i\) are the vertebra locations, with \(l_1\) corresponding to C1 and \(l_n=l_{25}\) to the sacrum. The optimal configuration \(L^*\) of the graphical model is

\[ L^* = \mathop {\arg \min }_{L} \left( \sum _{i=1}^{n} m_i(l_i) + \sum _{i=1}^{n-1} d_{i,i+1}(l_i, l_{i+1}) \right) \]
where \(l_i\) and \(l_j\) denote the vertebra locations \(l_i = (x_i, y_i, height_i, width_i, \theta _i)\), given by position \((x_i, y_i)\), size \((height_i, width_i)\), and orientation \(\theta _i\). The best model fit minimizes the sum of the unary appearance mismatch terms \(m_i\) from the part detector outputs and the spatial deformation costs \(d_{ij}\) for connected pairs \(ij\) of parts, placed at \(l_i\) and \(l_j\) respectively. The last appearance term \(m_{25}\) comes from the sacrum detector, and the rest come from the generic vertebra detector. The spatial deformation cost is a sum of box functions on the \(x\) and \(y\) coordinates, the ratio of the vertebra areas \(A_i\) and \(A_j\), and the angle between the vertebrae:

\[ d_{ij}(l_i, l_j) = S\left( \frac{A_i}{A_j} \right) + T(x_i - x_j) + U(y_i - y_j) + V(\theta _i - \theta _j) \]

Here \(S\), \(T\), \(U\), and \(V\) are box functions on area ratio, \(x\) and \(y\) position, and angle \(\theta \) that take a low constant value if their arguments are within a favourable range and a higher constant value otherwise.
To speed up the fitting process, a Viterbi message passing scheme from [12] for fast inference in \(O(nh^2)\) time is employed where \(n\) is the number of parts and \(h\) the number of candidates per part. Typically, there are around \(h=100\) candidate positions per part, and the full inference takes around 0.1 s per MR volume. The full detection process from input to output typically takes less than a minute.
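The min-sum (Viterbi) recursion on a chain can be sketched as follows. The candidate costs and the pairwise function here are illustrative stand-ins for the detector scores and box-function costs; this is a sketch of the standard \(O(nh^2)\) recursion, not the paper's implementation.

```python
import numpy as np

def chain_viterbi(unary, pairwise):
    """Minimize sum_i unary[i][s_i] + sum_i pairwise[i](s_i, s_{i+1}) on a chain.

    unary:    list of n arrays; unary[i][s] is the cost of candidate s for part i.
    pairwise: list of n-1 functions; pairwise[i](s, t) is the deformation cost
              between candidate s of part i and candidate t of part i+1.
    Standard min-sum message passing, O(n * h^2) for h candidates per part.
    Returns (optimal candidate indices, optimal total cost).
    """
    n = len(unary)
    cost = np.asarray(unary[0], dtype=float)       # best cost ending at part 0
    back = []                                      # backpointers per part
    for i in range(1, n):
        h_prev, h_cur = len(unary[i - 1]), len(unary[i])
        pair = np.array([[pairwise[i - 1](s, t) for t in range(h_cur)]
                         for s in range(h_prev)])
        total = cost[:, None] + pair               # h_prev x h_cur
        back.append(np.argmin(total, axis=0))
        cost = np.min(total, axis=0) + np.asarray(unary[i], dtype=float)
    # Backtrack the optimal assignment from the last part.
    states = [int(np.argmin(cost))]
    for bp in reversed(back):
        states.append(int(bp[states[-1]]))
    return states[::-1], float(np.min(cost))
```

With \(h \approx 100\) candidates per part this recursion is what keeps full inference at around 0.1 s per volume.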
Training. The edges for the box functions \(S\), \(T\), \(U\), and \(V\) are found as the minimum and maximum argument values of those functions on the training set (e.g. the minimum and maximum \(x\)-distance between L1 and L2 for \(T\), etc.).
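This training rule can be sketched in a few lines; the distance values below are made up, and the `margin` slack parameter is a hypothetical addition not described in the paper.

```python
def learn_box_function(train_values, low=0.0, high=1e3, margin=0.0):
    """Learn a box (step) function as described in the text: the favourable
    range is the [min, max] of the argument values seen in training.

    Returns f(v): `low` inside the learnt range, `high` outside it.
    `margin` is a hypothetical slack on the range edges (not in the paper).
    """
    lo = min(train_values) - margin
    hi = max(train_values) + margin
    return lambda v: low if lo <= v <= hi else high

# E.g. the x-distances between L1 and L2 observed in training (made-up values)
# would define the box function T for that edge of the chain:
T = learn_box_function([4.0, 5.5, 6.2, 7.0])
```

The same construction yields \(S\), \(U\), and \(V\) from the training area ratios, \(y\)-distances, and angle differences respectively.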
Testing. At test time, the whole model is fitted to the retained states, with an extra “hidden” state, positioned outside the field of view, for each part; the penalty value for the “hidden” state is learnt at training time [24]. At each position in the scan, the highest-scoring detection across all slices is retained for graphical model fitting, as explained in Sect. 2.1. The retained states may lie in different slices.
3 Experiments
3.1 Data, Annotation and Evaluation
The dataset consists of 371 T2-weighted lumbar MRI scans, acquired under various protocols. The scans contain normal and various abnormal cases, as illustrated in Fig. 2. The dataset is split into 80 training and 291 testing images. The scans have isotropic in-slice resolution varying from 0.34 to 1.64 mm (mean 0.78 mm, median 0.84 mm), and slice spacing varying from 3 to 5 mm (4 mm in almost all scans). The scans range in field of view, containing 7–23 vertebrae starting from the sacrum, with a median of 10 per scan.
Annotation. The scans were hand-annotated with two types of ground truth as illustrated in Fig. 5: (i) All the vertebrae centres in all the scans are marked with a point (“\(+\)” in Fig. 5), and labelled with the vertebrae name; and (ii) all the training scans plus some test scans are annotated with a tight bounding box around each vertebra (Fig. 5A3, B3). The tight bounding boxes were defined by points (“\(\times \)” in Fig. 5) along the vertebrae boundaries as shown.
Evaluation protocol. The detections are evaluated against the vertebra-centre and sacrum-centre ground truth points. A positive detection for the sacrum is counted if a detected sacrum bounding box contains the sacrum ground truth point and does not contain any vertebra centre ground truth points. A positive detection for a vertebra is counted if the detected vertebra bounding box contains one and only one ground truth point for a vertebral body, including the sacrum. This evaluation protocol ensures that the cases where a detection is much larger than the vertebral body, covering several vertebrae, are not counted as positive.
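The vertebra counting rule can be sketched as follows, using axis-aligned containment for simplicity (the paper's boxes are oriented); the function name is ours.

```python
def is_positive_vertebra_detection(box, gt_points):
    """Evaluation rule sketched from the protocol above: a detected vertebra
    box counts as a positive only if it contains exactly one ground-truth
    centre point, so boxes spanning several vertebrae are rejected.

    box:       (x1, y1, x2, y2) axis-aligned bounding box
    gt_points: list of (x, y) vertebral-body / sacrum centre points
    """
    x1, y1, x2, y2 = box
    inside = sum(1 for (x, y) in gt_points
                 if x1 <= x <= x2 and y1 <= y <= y2)
    return inside == 1
```

A detection covering two vertebra centres, or none, is thus a false positive under this protocol.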
3.2 Results
The algorithm is evaluated on a set of 291 lumbar spine test images with a variable number of vertebrae visible. Given an input scan, both the sacrum and vertebra detectors are run on the scan, searching over position, scale, and angle. The position search is dense; the scale varies from 0.25 to 2 times the original image size (scale = 1) in small increments (10 scales per doubling of image size). The angle search runs from \(-20^\circ \) to \(20^\circ \) for vertebrae and \(-60^\circ \) to \(0^\circ \) for the sacrum, in \(10^\circ \) increments.
Example outputs are shown in Fig. 6, and statistical results on localization error over the test set are plotted in Fig. 7 and tabulated in Fig. 8 by vertebrae type.
We achieve an 84.1 % correct identification rate overall, and 86.9 % for the lumbar vertebrae. The mean detection error between the ground truth centre of the vertebra and the centre of the detected bounding box is 3.3 mm, with standard deviation 3.2 mm. If the assigned labels are allowed to be shifted by \(+\)/\(-\) one, the identification rates are 92.9 % and 94.7 % respectively.
Independent sacrum detection (without the graphical model) with local non-maxima suppression achieves 98.1 % recall at 48 % precision. Independent generic vertebra detection (without the graphical model) achieves 97.1 % recall at 9.1 % precision.
Our method works well on very challenging examples with various anomalies illustrated earlier in Fig. 2. The identification results compare favourably to other approaches in the literature, although direct comparison is not possible since the algorithms have been evaluated on different datasets. Glocker et al. [18] report an 81 % identification rate with median localization error below 6 mm on CT images. Zhan et al. [15] detect disks and vertebrae in isotropic MRI scans with a 97.7 % “perfect” labelling rate as assessed by a medic, but do not report detection errors. Pekar et al. [19] report an 83 % correct labelling rate on 30 lumbar MRI scans. Our method is able to correctly localize the centre of the vertebrae outside the mid-sagittal slice in scoliotic cases such as image f in Fig. 6.
Application to CT images. Although the method has been designed principally for MRI, it is directly applicable to CT images, as shown in Fig. 9. No retraining is required for detection on CT, owing to the good generalization of the HOG detectors.
4 Conclusion
We have presented a HOG-based algorithm to localize vertebrae in lumbar MRI scans of the spine that is simple, accurate and efficient. We demonstrate robustness to severe deformations due to diseases, image artefacts, and a wide range of resolution, patient position, and acquisition protocols on a challenging clinical dataset. It is straightforward to extend the method to completely general FOVs if required, by taking other anatomical context into account [18].
References
1. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: Proceedings of CVPR (2008)
2. Pfirrmann, C.W.A., Metzdorf, A., Zanetti, M., Hodler, J., Boos, N.: Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine 26(17), 1873–1878 (2001)
3. Fardon, D.F., Milette, P.C.: Nomenclature and classification of lumbar disc pathology. Spine 26(5), E93–E113 (2001)
4. Alomari, R.S., Corso, J.J., Chaudhary, V., Dhillon, G.: Desiccation diagnosis in lumbar discs from clinical MRI with a probabilistic model. In: IEEE International Symposium on Biomedical Imaging (ISBI), pp. 546–549 (2009)
5. Alomari, R.S., Corso, J.J., Chaudhary, V., Dhillon, G.: Computer-aided diagnosis of lumbar disc pathology from clinical lower spine MRI. Int. J. Comput. Assist. Radiol. Surg. 5(3), 287–293 (2010)
6. Ghosh, S., Alomari, R.S., Chaudhary, V., Dhillon, G.: Computer-aided diagnosis for lumbar MRI using heterogeneous classifiers. In: IEEE International Symposium on Biomedical Imaging (ISBI) (2011)
7. Michopoulou, S., Costaridou, L., Vlychou, M., Speller, R., Todd-Pokropek, A.: Texture-based quantification of lumbar intervertebral disc degeneration from conventional T2-weighted MRI. Acta Radiol. 52(1), 91–98 (2011)
8. Ghosh, S., Alomari, R.S., Chaudhary, V., Dhillon, G.: Automatic lumbar vertebra segmentation from clinical CT for wedge compression fracture diagnosis. In: Proceedings of SPIE 7963, Medical Imaging 2011: Computer-Aided Diagnosis (2011)
9. Wels, M., Kelm, B.M., Tsymbal, A., Hammon, M., Soza, G., Sühling, M., Cavallaro, A., Comaniciu, D.: Multi-stage osteolytic spinal bone lesion detection from CT data with internal sensitivity control. In: Proceedings of SPIE 8315, Medical Imaging 2012: Computer-Aided Diagnosis (2012)
10. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of CVPR, vol. 2, pp. 886–893 (2005)
11. Fischler, M., Elschlager, R.: The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)
12. Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. IJCV 61(1), 55–79 (2005)
13. Oktay, A.B., Akgul, Y.S.: Simultaneous localization of lumbar vertebrae and intervertebral discs with SVM-based MRF. IEEE Trans. Med. Imaging 1179–1182 (2013)
14. Ghosh, S., Malgireddy, M.R., Chaudhary, V., Dhillon, G.: A new approach to automatic disc localization in clinical lumbar MRI: combining machine learning with heuristics. In: IEEE International Symposium on Biomedical Imaging (ISBI) (2012)
15. Zhan, Y., Maneesh, D., Harder, M., Zhou, X.S.: Robust MR spine detection using hierarchical learning and local articulated model. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS 7510, pp. 141–148 (2012)
16. Chwialkowski, M.P., Shile, P.E., Pfeifer, D., Parkey, R.W., Peshock, R.M.: Automated localization and identification of lower spinal anatomy in magnetic resonance images. Comput. Biomed. Res. 24(2) (1989)
17. Aslan, M.S., Ali, A., Rara, H., Farag, A.A.: An automated vertebra identification and segmentation in CT images. In: Proceedings of IEEE 17th International Conference on Image Processing (ICIP) (2010)
18. Glocker, B., Feulner, J., Criminisi, A., Haynor, D.R., Konukoglu, E.: Automatic localization and identification of vertebrae in arbitrary field-of-view CT scans. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2012)
19. Pekar, V., Bystrov, D., Heese, H.S., Dries, S.P.M., Schmidt, S., Grewer, R., Harder, C.J.D., Bergmans, R.C., Simonetti, A.W., Muiswinkel, A.M.V.: Automated planning of scan geometries in spine MRI scans. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), vol. 10, pp. 601–608 (2007)
20. Alomari, R.S., Corso, J.J., Chaudhary, V.: Labeling of lumbar discs using both pixel- and object-level features with a two-level probabilistic model. IEEE Trans. Med. Imaging 30(1), 1–10 (2011)
21. Kelm, B.M., Wels, M., Zhou, K.S., Seifert, S., Suehling, M., Zheng, Y., Comaniciu, D.: Spine detection in CT and MR using iterated marginal space learning. Med. Image Anal. (2012)
22. Klinder, T., Ostermann, J., Ehm, M., Franz, A., Kneser, R., Lorenz, C.: Automated model-based vertebra detection, identification, and segmentation in CT images. Med. Image Anal. 13(3), 471–482 (2009)
23. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE PAMI 32(9), 1627–1645 (2010)
24. Potesil, V., Lootus, M., El-Labban, A., Kadir, T.: Landmark localization in images with variable field of view. In: IEEE International Symposium on Biomedical Imaging (ISBI) (2013)
Copyright information
© 2014 Springer International Publishing Switzerland
Lootus, M., Kadir, T., Zisserman, A. (2014). Vertebrae Detection and Labelling in Lumbar MR Images. In: Yao, J., Klinder, T., Li, S. (eds) Computational Methods and Clinical Applications for Spine Imaging. Lecture Notes in Computational Vision and Biomechanics, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07269-2_19