1 Introduction

Localization, labeling, and segmentation of the vertebrae and the intervertebral discs are essential tasks that have been attracting an increasing number of research groups worldwide. The accuracy and robustness of these imaging tasks are crucial for subsequent abnormality diagnosis. Moreover, accurate results of these tasks are critical for radiologists to perform an accurate diagnosis from various imaging modalities including X-ray radiography, Computed Tomography (CT) scans, and Magnetic Resonance Imaging (MRI). Furthermore, surgeons demand accurate reporting of these results when overlaid on a computer guided surgery system or a computer assisted surgery system.

Whilst the localization task is to locate an anatomical structure (e.g. locating the intervertebral discs by a point within or a bounding box around the discs), the segmentation task is to provide a fine contour that accurately delineates that structure (e.g. a contour around the vertebra). Labeling, on the other hand, is to identify the anatomical nomenclature of each structure (e.g. labeling each of the five lumbar vertebrae as L1, L2, L3, L4 and L5). Figure 1 shows an example of localization and labeling for the six intervertebral discs connected to the five lumbar vertebrae on a sagittal MRI [5].

Fig. 1

Localization and labeling of a sagittal lumbar T2-weighted MRI. The lumbar area is the second area to the last of the vertebral column. It is the main part of the vertebral column responsible for bearing most of the body weight. The lowest lumbar vertebra is L5 and the highest is L1. Intervertebral discs are labeled based on the enclosing vertebrae [5]

The essential structures within the vertebral column that have been attracting researchers for localization, labeling, and segmentation are the intervertebral discs, the vertebrae and the Dural Sac.

The lumbar vertebrae are the five vertebrae between the rib cage and the pelvis, which are designated as L1–L5, starting at the top. The intervertebral discs are fibrocartilaginous cushions that are named after the vertebral bodies that sandwich a particular disc, e.g., the disc in between L1 and L2 is named L1-L2. In clinical practice, the radiologist reports the diagnosis at each disc level and at each vertebra level. Hence, the first requirement of any lumbar Computer Aided Diagnosis (CAD) system is to localize and label the lumbar discs and vertebrae as shown in Fig. 1 [5]. Specifically, localization refers to providing centroids or bounding boxes for each of the lumbar discs, and labeling refers to identifying each localized disc as one of the six lumbar discs (T12-L1, L1-L2, L2-L3, L3-L4, L4-L5, L5-S1). While some researchers have discussed methods to provide a point within each lumbar disc [5, 76], there are also methods [32] that provide a bounding box around every visible disc in clinical lumbar MRIs as illustrated in Fig. 2.

Fig. 2

This figure illustrates the results of an automatic lumbar disc localization method [32] which detects all the visible discs in a lumbar MRI. If more than six discs are detected, the lowermost six discs are identified as the lumbar discs. The red boxes are the bounding boxes provided for each of the visible discs in the clinical MRI. The red stars show the automatically detected disc centers, while the green stars show the true centers
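
The labeling rule described above (keep the lowermost six detected discs and name them T12-L1 through L5-S1) can be sketched in a few lines of Python. This is a minimal illustration and not the method of [32]: the function name `label_lumbar_discs`, the (row, col) coordinate convention, and the assumption that the lowest detected disc is L5-S1 are hypothetical choices made for the example.

```python
# Hypothetical helper: names and data layout are illustrative assumptions.
LUMBAR_DISC_LABELS = ["T12-L1", "L1-L2", "L2-L3", "L3-L4", "L4-L5", "L5-S1"]


def label_lumbar_discs(centers):
    """Assign lumbar labels to detected disc centers.

    `centers` is a list of (row, col) image coordinates, where a larger
    row value means a lower position in a sagittal slice. If more than
    six discs are detected, only the lowermost six are kept.
    """
    # Sort from the top of the image to the bottom.
    ordered = sorted(centers, key=lambda rc: rc[0])
    # Keep the lowermost six (or all of them if fewer were detected).
    lumbar = ordered[-len(LUMBAR_DISC_LABELS):]
    # Align labels from the bottom, assuming the lowest disc is L5-S1.
    first_label = len(LUMBAR_DISC_LABELS) - len(lumbar)
    return dict(zip(LUMBAR_DISC_LABELS[first_label:], lumbar))


# Eight detected discs (made-up coordinates), given in arbitrary order.
detected = [(50, 100), (10, 100), (90, 100), (130, 100),
            (170, 100), (210, 100), (250, 100), (290, 100)]
labeled = label_lumbar_discs(detected)
for name, center in labeled.items():
    print(name, center)
```

Anchoring the labels at the bottom only makes sense when the sacrum is inside the field of view; a real system would need to verify that before trusting the names.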

Another important tissue structure in the lumbar spine is the Dural Sac. It is the membranous sac that encases the spinal cord within the bony structure of the vertebral column. The human spinal cord extends from the foramen magnum and continues through to the conus medullaris near the second lumbar vertebra, terminating in a fibrous extension known as the filum terminale. The Dural Sac usually ends at the vertebral level of the second sacral vertebra. Intensity inhomogeneity within the sac due to varying amounts of white and gray matter makes the segmentation of the Dural Sac and the spinal cord very challenging. Moreover, automatic segmentation in clinical MRIs is even harder due to variations in appearance and a lack of bright spinal fluid in cases with certain abnormalities such as stenosis.

After localization and labeling comes the challenging task of tissue segmentation. Segmentation of discs is quite difficult due to extreme variability in shape, size and appearance of intervertebral discs in lumbar MRI. Moreover, discs with abnormalities can be very fuzzy and difficult to segment manually, leading to significant inter-observer variability. At the same time, segmentation of intervertebral discs is a very important part of lumbar CAD systems in order to diagnose and quantify abnormalities such as herniation, desiccation and degeneration.

Requirements for CAD systems of the lumbar region are unique since we need to segment the Dural Sac and localize, label and segment the lumbar intervertebral discs before we can initiate the diagnosis. Figure 3 shows an illustration of automated segmentation [34] of the discs, vertebrae and the Dural Sac of a clinical MRI using two methods, the first using a probability map and HOG features, while the second method uses neighborhood label information as well, in a Gibbs Sampling approach.

Fig. 3

Illustration of automated lumbar tissue segmentation [34]: a shows the original mid-sagittal MRI, b–e show the manual segmentation (ground truth), f–h show the label maps for the dural sac, disc and vertebra respectively using method 1 (probability map + HOG features), while i and j show the dural sac and disc segmentation after morphological post-processing. k–m show the label maps generated at the end of iteration number 1, 6 and 200 respectively using method 2 (probability map + HOG features + neighborhood labels via Gibbs sampling), while n and o show the dural sac and disc segmentation after morphological post-processing

2 The Vertebral Column

This section presents the anatomy of the vertebral column in general, with a focus on the lumbar area. It also provides the standardized nomenclature of the various abnormalities in the vertebral column as endorsed by the North American Spine Society (NASS), the American Society of Spine Radiology (ASSR), and the American Society of Neuroradiology (ASNR) [24].

The vertebral column, also known as the backbone or the spinal column, is typically made up of 33 individual bones called vertebrae (singular: vertebra) that interlock with each other. These vertebrae are classified into five regions from top to bottom: Cervical (7), Thoracic (12), Lumbar (5), Sacral (5), and Coccyx (4). Among these 33 vertebrae, only the top 24 are movable, which is why clinicians often state that the vertebral column consists of only 26 vertebrae, counting the Sacral vertebrae as one and the Coccyx as one. In each of these regions, the vertebrae have unique features that allow certain functionality [88].

There are five distinct regions in the vertebral column. The topmost region is the Cervical region, which consists of seven vertebrae anatomically named from top to bottom as C1 (also called Atlas), C2 (also called Axis), C3, C4, C5, C6, and C7. The main function of this region is to support the weight of the head, which normally weighs about 10 pounds (4.5 kg). The cervical region has the greatest range of motion due to the first two specialized vertebrae that connect to the skull.

The Thoracic region comes next and consists of twelve vertebrae named from top to bottom as T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, and T12. The major role of the thoracic spine is to protect the organs that lie in the chest by supporting the rib cage. The motion is limited due to the nature of the chest.

The Lumbar region comes next and has the largest vertebrae. This region is responsible for the overall flexibility of the back as well as for bearing the weight of the body. Five vertebrae exist in this area, named (from top to bottom) L1, L2, L3, L4, and L5. The spinal cord typically stops at the L1-L2 area, below which the nerves hang down inside the Pachymeninx (Dura Mater), the tough and inflexible outermost of the three layers of the meninges surrounding the brain and spinal cord. The Sacral and Coccyx regions have less functionality and are barely movable. The sacral vertebrae are fused into one bone (named S or S1) that provides attachment for the Ilium (hip) bones and protects the pelvic organs, while the fused vertebrae of the coccyx region are not recognized as having a major function.

2.1 Lumbar Spine Region

Since our focus is on the lumbar region, we present more details on the anatomy of the lumbar region as well as details of the MRI. There are two common anatomic terms that relate to the low back: anterior and posterior. Anterior refers to the front of the spine while posterior refers to the back of the spine, as shown in Figs. 4 and 5. The section of the spine that makes up the low back is called the lumbar spine. The lumbar spine includes these main structures: intervertebral discs, vertebrae, and other structures.

Fig. 4

Sagittal T2-SPIR-weighted MRI showing anterior and posterior terms [5]

Fig. 5

Axial T1-weighted MRI showing anterior and posterior terms [5]

2.1.1 Intervertebral Discs

Intervertebral discs are unique structures that absorb shocks between adjacent vertebrae. They act as the ligaments that connect the vertebrae together and as the pivot point that allows spinal mobility through bending and rotating. They make up about one fourth of the spinal column length [88].

An intervertebral disc is composed of two parts: a strong outer ring called the Annulus Fibrosus and a soft gel-like interior called the Nucleus Pulposus. The nucleus pulposus consists of 80–85 % water in normal cases. With aging, the disc dehydrates, limiting its ability to absorb shocks. The outer ring weakens as well and starts to develop tears that cause various abnormalities. The bottom up view is the anterior while top down is the posterior direction. Figures 6 and 7 show T1- and T2-weighted MRI for the same lumbar disc from our dataset, respectively.

Fig. 6

Sagittal T1 MRI for L1-L2 disc [5]

Fig. 7

Sagittal T2 MRI for L1-L2 disc [5]

2.1.2 Vertebrae

A vertebra (plural: vertebrae) is a bone with a specific structure that supports and protects the spinal cord. A typical vertebra consists of two segments: anterior (front) and posterior (back). The anterior part of the vertebra is the body, while the posterior part, also known as the vertebral (neural) arch, includes the vertebral foramen, a pair of pedicles and a pair of laminae, and supports seven processes [four articular, two transverse, and one spinous (aka neural spine)].

2.1.3 Other Structures

In addition to the main structures within the vertebral column, there are a few other structures, including the following.

Nerves: The spinal cord, which is made up of millions of nerve fibers, hangs inside a bony ring that runs through the vertebral column. The spinal cord extends down to the L2 vertebra. Below L2, a bundle of nerves named the Cauda Equina hangs down in what is known as the Thecal Sac. Two large nerves branch off the spinal cord, one from each side, passing through the neural foramina of each vertebra. These spinal nerves group together to form the main nerves that go to the organs and limbs. The nerves of the lumbar spine (Cauda Equina) go to the pelvic organs and lower limbs [88].

Connective tissues: They are the fibrous connections that hold the cells of the body together. The ligaments are strong connective tissues that attach bones together. There are many long ligaments that connect on the front and back sections of the vertebrae. The anterior longitudinal ligament runs lengthwise down the front of the vertebral bodies. Two other ligaments run full-length within the spinal canal. The posterior longitudinal ligament attaches on the back of the vertebral bodies. The Ligamentum Flavum is a long elastic band that connects to the front surface of the lamina bones (just behind the spinal cord). Thick ligaments also connect the bones of the lumbar spine to the sacrum (the bone below L5) and pelvis.

Muscles in the lower back are arranged in layers. The superficial layer, the closest to the skin, is covered by a thick tissue called Fascia. The middle layer, called the Erector Spinae, has strap-shaped muscles that run up and down over the lower ribs, chest, and low back. They join in the lumbar spine to form a thick tendon that binds the bones of the low back, pelvis, and sacrum. The deepest layer of muscles attaches along the back surface of the spine bones, connecting the low back, pelvis, and sacrum. These deepest muscles coordinate their actions with the muscles of the abdomen to help hold the spine steady during activity [95].

A spinal segment is a notion that includes two vertebrae separated by an intervertebral disc, the nerves that leave the spinal column at each vertebra, and the small facet joints that link each level of the spinal column. The intervertebral disc separates the two vertebral bodies of the spinal segment. The disc normally works like a shock absorber. It protects the spine against the daily pull of gravity. It also protects the spine during heavy activities that put strong force on the spine, such as jumping, running, and lifting. The spinal segment is connected by the two facet joints mentioned above. When the facet joints of the lumbar spine move together, they bend and turn the low back [95].

3 Popular Lumbar Imaging Modalities

Most image based research literature focuses on X-ray radiography, Dual-energy X-ray Absorptiometry (DEXA or DXA), CT, and MRI. X-ray radiography and DEXA are cheaper and widely popular modalities as an initial diagnostic tool. Hence, the availability of the data provided researchers with great opportunities to investigate labeling, localization, and even diagnosis problems.

On the other hand, MRI (Fig. 8) and CT (Fig. 9) are more expensive and less available for researchers. Hence, fewer researchers obtained access to such data and were able to investigate localization, segmentation, and diagnosis problems on the various anatomical structures. A few efforts utilized other modalities such as ultrasound, especially for fetal spine detection and abnormality detection.

Fig. 8

This figure illustrates a sample image from each of the five popular clinical MRI protocols—T1 weighted sagittal, T2 weighted sagittal, T2 weighted axial, T2-SPIR sagittal and Myelo [6]

Fig. 9

This figure shows samples of a lumbar CT scan. The first two images show sagittal and coronal slices from a CT volume, while the last image shows a 3D reconstruction [2]

Both X-ray and DEXA (aka DXA) radiography consist of only one 2D slice that shows the area of interest. On the other hand, CT scans show a full 3D volume for the area of interest. Clinical CT spans the whole area in a slice-by-slice fashion that can be directly used to produce a full 3D volume. Usually CT consists of a set of axial slices with a specific thickness depending on the available technology.

Moreover, clinical MRI consists of a few protocols that vary depending on the available technology. The current standard in MRI in North America for the low back is 3 T MRI. Most current MRI radiology centers produce: (1) T1-Weighted sagittal (T1 W-sagittal), (2) T2-Weighted sagittal (T2 W-sagittal), (3) T2-Weighted axial (T2 W-axial) for a set of selected discs, (4) T2-SPIR sagittal and (5) Myelo MR images (Fig. 8). While the sagittal views span the side-to-side dimension of the body, the axial views are acquisitions of each intervertebral disc within the area of interest. Each disc has a set of axial slices that are aligned with the dimension of the major axis of the disc. The MRI acquisition technician manually plans the acquisition to make sure that each disc volume is acquired correctly and that all acquired protocols are manually co-registered. The patient is not allowed to move during the whole acquisition period.

Many clinical MRI protocols exist that have trade-offs in diagnosis of various backbone abnormalities. The technician has four main parameters to tune before MRI acquisition that control the appearance (intensity) of the resulting image: (1) proton density, (2) longitudinal relaxation time (T1), (3) transverse relaxation time (T2), and (4) the flow. The proton density refers to the concentration of protons in the tissue in the form of water and macromolecules (proteins, fat, etc.). Both T1 and T2 relaxation times define the way that the protons revert back to their resting states after the initial RF pulse. The most common effect of the flow is the loss of signal from rapidly flowing arterial blood.

Two pulse sequences are widely used for MR imaging: T1- and T2-weighted spin-echo sequences. The T1-weighted sequence uses a short Repetition Time (TR) and a short Echo Time (TE) (TR ≤ 1,000 ms, TE ≤ 30 ms). The T2-weighted sequence uses a long TR and a long TE (TR ≥ 2,000 ms, TE ≥ 80 ms). Moreover, two major techniques are used for suppression of fat signals in MRI: Short Tau Inversion Recovery (STIR) and selective partial inversion recovery (SPIR). STIR sequences suppress the fat signal by using an initial 180° radiofrequency pulse to invert the longitudinal magnetization. Image acquisition is then performed with the inversion time equal to the known null point for fat (approximately 0.69 × T1) [92]. SPIR is a more recent fat-suppression technique that is based on the use of frequency-specific pulse sequences [93]. Only the fat magnetization is inverted, leaving water resonances as is. This technique is useful for suppressing the signal of any specific tissue given the known frequency of that tissue. However, the SPIR technique is extremely sensitive to magnetic field inhomogeneity. SPIR is used with both T1- and T2-weighted MRI [44].
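
To make the role of TR, TE and the STIR null point more concrete, the following Python sketch uses the textbook spin-echo signal approximation, PD × (1 − exp(−TR/T1)) × exp(−TE/T2), and the inversion-recovery null condition TI = ln(2) × T1 (which matches the factor of roughly 0.69 quoted above and assumes TR is much longer than T1). The tissue parameters and the specific TR/TE values are illustrative assumptions, not measurements from the cited studies.

```python
import math


def spin_echo_signal(pd, t1_ms, t2_ms, tr_ms, te_ms):
    """Approximate spin-echo signal: PD * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return pd * (1.0 - math.exp(-tr_ms / t1_ms)) * math.exp(-te_ms / t2_ms)


def stir_null_ti(t1_ms):
    """Inversion time that nulls a tissue with the given T1 (TR >> T1 assumed)."""
    return math.log(2.0) * t1_ms


# Assumed, illustrative tissue parameters: (proton density, T1 ms, T2 ms).
FAT = (1.0, 260.0, 80.0)
CSF = (1.0, 4000.0, 2000.0)

t1w = dict(tr_ms=500.0, te_ms=15.0)    # short TR, short TE
t2w = dict(tr_ms=3000.0, te_ms=100.0)  # long TR, long TE

for name, params in (("T1-weighted", t1w), ("T2-weighted", t2w)):
    fat = spin_echo_signal(*FAT, **params)
    csf = spin_echo_signal(*CSF, **params)
    print(f"{name}: fat={fat:.3f}, CSF={csf:.3f}")

print(f"STIR null TI for fat: {stir_null_ti(FAT[1]):.1f} ms")
```

Under these assumed values, fat comes out brighter than fluid with the short TR/TE pair and darker with the long TR/TE pair, which is the qualitative contrast behavior the two weightings are chosen for.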

Furthermore, another important MRI sequence in common clinical use is MR Myelography [57] (Myelo is a New Latin word, from Greek muelos, which means spinal cord). In this method, the background signal is suppressed by using heavily T2-weighted fast spin-echo pulse sequences, and the fat signal is obliterated by pre-saturation. The resulting slices are then projected into a composite image using a standard maximum intensity projection (MIP) algorithm [57].
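
The maximum intensity projection step can be illustrated with a short NumPy sketch. It assumes the slices are stacked into a 3D array of shape (slices, rows, cols); the function name and the toy values are hypothetical, and the projection axis in a real myelogram depends on the acquisition geometry.

```python
import numpy as np


def maximum_intensity_projection(volume, axis=0):
    """Collapse a 3D volume to 2D by keeping the brightest voxel along `axis`."""
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume (slices, rows, cols)")
    return volume.max(axis=axis)


# Toy volume: 3 slices of 2x2 pixels; the values are made up.
volume = np.array([
    [[0, 5], [1, 0]],
    [[2, 1], [0, 9]],
    [[7, 0], [3, 4]],
])
mip = maximum_intensity_projection(volume, axis=0)
print(mip)
```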

It is worth mentioning that inter-observer variability exists in lumbar diagnosis, as it does in many diagnosis tasks from various imaging modalities including X-ray radiographs, MRI, CT, Single-Photon Emission Computed Tomography (SPECT), and High Resolution (HR). However, MRI shows high inter-observer reliability compared to plain radiographs in lumbar area diagnosis (e.g., [62]). Mulconrey et al. [70] showed that abnormality detection for degenerative disc and Spondylolisthesis with MRI has κ = 0.773 and κ = 0.728, respectively, which indicates high inter-observer reliability; this reliability is considered perfect when 0.8 ≤ κ ≤ 1.
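
For readers unfamiliar with the κ statistic, the sketch below computes Cohen's kappa for two raters, one common way of measuring agreement beyond chance. The source does not state which kappa variant was used in [70], so this choice, the function name, and the toy ratings are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Made-up ratings for ten discs ("abn" = abnormal, "norm" = normal).
reader_1 = ["abn"] * 5 + ["norm"] * 5
reader_2 = ["abn"] * 4 + ["norm"] * 5 + ["abn"]
print(round(cohen_kappa(reader_1, reader_2), 3))
```

In this toy case the readers agree on 8 of 10 discs while chance agreement is 0.5, so kappa is noticeably lower than the raw agreement rate.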

4 Challenges

Automatic detection of abnormalities from MRI or CT scans has been studied by researchers for quite some time. The challenges are manifold, ranging from variations in scanner specifications, parameter settings and modalities to differences in body structure and composition and, last but not least, the task of segmentation, which is a big challenge in computer vision.

In general, the segmentation of CT and MRI scans is difficult for three main reasons.

  1. Partial Volume Effect: This is a scenario where multiple tissues contribute to a single pixel, which blurs intensity across boundaries as illustrated in Fig. 10.

    Fig. 10

    This figure shows an illustration of the partial volume effect in an imaginary scan consisting of two different kinds of tissues. While the first image shows the expected image, the second one shows the actual image with fuzzy boundaries due to the partial volume effect [32]

  2. Intensity Inhomogeneity: This is defined as non-anatomic intensity variations of the same tissue over the image, and may be caused by the imaging instrumentation (RF non-uniformity, static field inhomogeneity) or by patient movement, as seen in Fig. 11.

    Fig. 11

    This figure illustrates intensity inhomogeneity in a lumbar MRI [32]

  3. Intensity Similarity: Very often two or more tissues have the same intensities in MRI scans, as illustrated in Fig. 12.

    Fig. 12

    This figure shows intensity similarity of different tissues in a T1-weighted lumbar MRI [32]

All these factors contribute to the fact that segmentation of a lumbar MRI is a very challenging task.
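
The partial volume effect listed as the first reason above can be mimicked with a small NumPy sketch: an idealized two-tissue image is averaged over coarse acquisition voxels, and any voxel that straddles the tissue boundary receives an intermediate intensity. The block-averaging model, the function name, and the intensities are simplifying assumptions, not a scanner simulation.

```python
import numpy as np


def simulate_partial_volume(image, factor):
    """Average non-overlapping factor x factor blocks to mimic coarse voxels."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by factor")
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Two idealized tissues with intensities 0 and 100. The boundary sits at
# column 3, which does not align with the 2-pixel acquisition grid.
ideal = np.zeros((4, 6))
ideal[:, 3:] = 100.0
acquired = simulate_partial_volume(ideal, factor=2)
print(acquired)
```

The middle column of the coarse image mixes both tissues, which is the kind of fuzzy boundary a segmentation method has to cope with.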

Real world clinical MRIs are even more challenging since patients very often suffer from one or more lumbar abnormalities such as vertebral fractures, Spondylolysis, Spondylolisthesis, Scoliosis, intervertebral disc abnormalities (degeneration, desiccation, herniation, bulge and annular tears) and spinal Stenosis. In addition, there is lumbar variability due to patient age, height and structure leading to diverse images. Figure 13 shows a sample set of clinical lumbar MRIs showing some of the variability [5].

Fig. 13

Variability in disc appearances, shapes, locations, and sizes in different abnormal cases. a Shows variability in appearance of discs. The lower two discs (L4-L5 and L5-S1) have less intensity levels due to abnormalities (Herniation, Stenosis and Desiccation). b Shows variability in shape of discs with close intensity levels due to abnormalities in the lower two discs (Herniation and Stenosis). c Shows clear difference at the lowest disc level (L5-S1) as well as the difference in bending of the lumbar vertebral column which results in variability in location. d Shows variability in location of discs from other figures, sizes of discs, and the missing disc at L4-L5. e Shows variability in disc sizes between the upper four discs and the lowest disc L5-S1. Ages of these patients are 35, 36, 29, 47, and 27, respectively from a to e. All images have been edited by cropping and contrast enhancement for better visualization

5 Advances in Localization, Labeling, and Segmentation

There are three main steps for the proper diagnosis in medical imaging: (1) Localization and labeling of anatomic structures, (2) Segmentation and (3) Diagnosis and quantification of abnormalities. There has been an extensive amount of work done in the area of vertebral body localization and segmentation from X-ray radiographs and CT scans in the past two decades. On the other hand, localization of soft tissues in MRI and diagnosis of disc abnormalities is comparatively more recent and has been of central focus for low back research in the last decade.

In this section, we review in detail the current literature in the context of spinal tissue localization, segmentation and abnormality diagnosis. We classify the literature based on the medical imaging modality.

5.1 X-ray Radiography

Since X-ray radiographs use ionizing radiation and show better detailing of the bony tissues, there has been plenty of research in the direction of vertebrae segmentation from X-ray scans. Moreover, the availability of X-ray radiograph data helped boost the amount of related research.

5.1.1 Vertebrae

More than two decades ago, Hedlund and Gallagher [39] performed vertebral morphometry on lateral thoracic and lumbar X-ray radiographs of 153 women with a preliminary diagnosis of Spinal Osteoporosis. Measurements included anterior and posterior vertebral height, width, area, wedge angle, percent reduction of Anterior to Posterior Height (PRH) and Percent Difference in Anterior Height between adjoining vertebrae (PDAH). They showed that among individuals with mild Osteoporosis (0–2 fractures) PDAH identified 86 % of the fractures and 95 % of the individuals with fractures.
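
As a rough illustration of two of the height-based indices named above, the sketch below computes PRH and PDAH from vertebral heights. The exact definitions used in [39] are not given in this text, so both formulas (reduction relative to the posterior height, and difference relative to the adjoining vertebra's anterior height) are assumptions, as are the function names and sample measurements.

```python
def percent_reduction_height(anterior_mm, posterior_mm):
    """Assumed PRH: percent reduction of anterior relative to posterior height."""
    return 100.0 * (posterior_mm - anterior_mm) / posterior_mm


def percent_difference_anterior_height(anterior_mm, adjacent_anterior_mm):
    """Assumed PDAH: percent difference in anterior height versus a neighbor."""
    return 100.0 * (adjacent_anterior_mm - anterior_mm) / adjacent_anterior_mm


# Made-up measurements in millimetres for one vertebra and its neighbor.
prh = percent_reduction_height(anterior_mm=20.0, posterior_mm=25.0)
pdah = percent_difference_anterior_height(anterior_mm=18.0, adjacent_anterior_mm=24.0)
print(f"PRH = {prh:.1f} %, PDAH = {pdah:.1f} %")
```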

Manual selection of anatomical points for vertebral abnormality diagnosis is time-consuming, imprecise and subjective. To obtain a more objective and accurate description of the vertebral body shape, semi-automatic methods were proposed that were based on statistical models of vertebral bodies in the sagittal view. Very early on, in 1993, a computerized quantifying technique for vertebral morphometry on lateral radiographs of the spine was proposed by Nicholson et al. [73]. Although fracture detection was improved by expanding the description of the vertebral body shape from six points to a contour, the amount of traumatic spinal injury or latent vertebral fracture was often underestimated. The main reason for the wrong diagnosis was the limited measurement possibility caused by the lack of depth perception in X-ray radiographs. Later in 1997, Smyth et al. [87] described how Active Shape Models (ASM) could be used to locate both normal and fractured vertebrae from Dual-energy X-ray Absorptiometry (DXA) images of the spine. However, three initialization points have to be manually selected. To overcome the lack of depth perception in X-ray images, Benameur et al. [8] projected a three-dimensional (3D) statistical shape model of a vertebra onto a pair of orthogonal 2D X-ray radiographs. They validated this method on 57 scoliotic vertebrae images. However, the proposed segmentation was highly dependent on model initialization.

Various efforts that target the diagnosis of certain vertebra conditions involved localization and segmentation. In 2000, Long and Thoma [60] investigated the segmentation of C2 and C3 vertebrae from the cervical area using an ASM as a first step for building an image-based retrieval system for a dataset consisting of 7,000 lumbar X-ray radiographs and 10,000 cervical spine X-ray radiographs. They built the Web-based Medical Information Retrieval System (WebMIRS) based on the National Health and Nutrition Examination Surveys (NHANES). Later, Cherukuri et al. [16] proposed image processing techniques for computing size-invariant, convex hull-based features to highlight anterior Osteophytes. Feature evaluation of 714 lumbar spine vertebrae using a multi-layer perceptron yielded normal and abnormal average correct discrimination of 90.5 and 86.6 %, respectively. Although the main purpose of this work was diagnosis, a great portion of the effort went into identifying the vertebra.

Another work was presented by de Bruijne and Nielsen [11] who used Shape Particle Filter [23] and k-nearest neighbor pixel-level classification for a semi-automatic segmentation of the lumbar vertebrae. Around the same time, Kaminsky et al. [47] presented a standardized protocol that combined newly developed interactive tools (rotation transformation, warped dissection plane) with standard segmentation tools to provide both a fast and accurate 3D spine segmentation procedure. Howe et al. [42] proposed a multi-level segmentation technique for vertebrae from cervical and lumbar X-ray radiographs using an ASM and a generalized Hough transform. Their validation is based on a leave-one-out test with an error of 2 mm for 57 % of the cervical cases and less than 4 mm for 68 % of the lumbar cases. However, their method needs manual intervention for initialization.

Later, Crimi et al. [22] presented a Bayesian approach and used prior information to estimate the covariance matrix from a small number of samples in a high dimensional shape to segment the vertebrae from X-ray radiographs. Moreover, Zewail et al. [103] segmented the vertebrae from X-ray radiographs using a contourlet-based salient point matching and a localized multi-scale shape prior. They tested their work on 100 X-ray radiographs and obtained an average segmentation error of 1.2 mm.

More recently, Lecron et al. [59] presented a fully automated vertebra detection method. They used an edge polygonal approximation to detect vertebral edges and a SIFT descriptor to train an SVM model. They achieved a corner detection rate of 90.4 % and a vertebra detection rate from 81.6 to 86.5 % on 250 cervical cases.

5.1.2 Intervertebral Discs and Dural Sac

Although X-ray radiographs are mainly used for bone visualization, a few efforts have analyzed the intervertebral discs and the spinal canal. Chamarthy et al. [14] introduced image analysis techniques, including scale-invariant, distance transform-based features, to characterize the disc space narrowing (with four grades, zero to three) in cervical vertebrae. For a data set of 294 vertebrae X-ray images, experimental results yielded average correct grade assignment of greater than 82.10 %. However, the testing images had to be manually labeled with the boundary points.

Later on, Koompairojn et al. [56] described a fully automatic Spinal Stenosis diagnosis system via vertebral morphometry [73], using an Active Appearance Model (AAM) for segmentation and a Bayesian framework for classification. Experimental results on 86 lumbar spine X-ray images from the NHANES II database showed accuracy ranging from 75 to 80 %. Moreover, Stanley et al. [89] investigated new size-invariant features (claw and traction) for the detection of anterior Osteophytes for efficient Content Based Image Retrieval (CBIR). Using a K-means clustering and nearest neighbor classification approach, average correct classification rates of 85.80, 86.04 and 84.44 % were obtained for claw, traction and anterior Osteophytes, respectively, on 390 cervical vertebrae.

5.2 Computed Tomography (CT)

The CT imaging technique has become indispensable for the diagnosis of spine abnormalities by providing a detailed 3D representation of the anatomy. Compared to X-ray radiography and MR imaging, CT has proved to have higher sensitivity and specificity in the visualization of bone structures.

5.2.1 Vertebrae

A number of automatic and semi-automatic methods for segmentation of vertebrae and vertebral structures in CT have been proposed [27, 50, 52, 64, 80, 84, 91, 96, 98, 102] over the last decade. Some researchers proposed techniques that segment each vertebra separately [50, 61, 64, 84, 98], which might lead to mis-segmentation due to the absence of a clear boundary between vertebrae. To overcome this issue, other authors proposed techniques for simultaneous segmentation of all vertebrae [52, 80].

Early last decade, Hahn [36] proposed a fully automated approach to evaluate rotation of the cervical vertebrae in 3D using a multidimensional Powell minimization algorithm for spiral CT scans. Later, in 2004, the same research group [37] presented a method for determination of the planes separating the individual vertebrae of the spine from CT volumes using a Balloon-based model. This model requires careful initialization, similar to 2D active contours (snakes), and is highly dependent on the edge detector.

Meanwhile, Ghebreab and Smeulders [27] presented a combination of Strings [26] and Necklaces [28] to model the spine in the lumbar area using both a priori knowledge about natural variation and anatomical saliency in the visual appearance of the spine. The Strings model focuses on learning the most relevant biological variation in the visual appearance of the spine as a whole, and Necklaces aims at exploiting inhomogeneities in multiple continuous shape and gray-level features of vertebrae. However, they tested their method on only six CT cases with minimal spinal and vertebral deformations. Furthermore, manual intervention is used for initialization of their model. On the other hand, Vrtovec et al. [96] detected the spine curve from CT using a polynomial model to provide a Curved Planar Reformation (CPR) of the 3D spinal column. They fit the spinal curve to a set of points extracted from a distance map that emphasized the vertebral bodies and tested the method on five cases, including one Scoliotic case, achieving mean positional errors between 2 and 6 mm.
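
The idea of describing the spine curve with a polynomial model can be sketched as an ordinary least-squares fit with NumPy. This is a simplified stand-in and not the procedure of [96]: it assumes the candidate points are already available as (x, y, z) coordinates, parameterizes the curve by the axial coordinate z, and uses a hypothetical function name and synthetic points.

```python
import numpy as np


def fit_spine_curve(points, degree=3):
    """Fit x(z) and y(z) polynomials through 3D points given as (x, y, z)."""
    pts = np.asarray(points, dtype=float)
    z = pts[:, 2]
    coeff_x = np.polyfit(z, pts[:, 0], degree)
    coeff_y = np.polyfit(z, pts[:, 1], degree)

    def curve(z_query):
        """Return the fitted (x, y, z) position for each queried z."""
        z_query = np.asarray(z_query, dtype=float)
        return np.stack([np.polyval(coeff_x, z_query),
                         np.polyval(coeff_y, z_query),
                         z_query], axis=-1)

    return curve


# Synthetic points lying on a known curve: x = 0.01 z^2, y = 5 + 0.2 z.
z = np.linspace(0.0, 100.0, 11)
points = np.stack([0.01 * z**2, 5.0 + 0.2 * z, z], axis=1)
curve = fit_spine_curve(points, degree=2)
print(curve([50.0]))
```

Sampling the fitted curve at regular z positions gives the path along which a curved planar reformation could resample the volume.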

Furthermore, Yao et al. [102] presented a systematic algorithm for segmenting the spinal column from chest and/or abdominal CT scans, without labeling of vertebrae. Their method is based on thresholding, Watershed, and directed graph search, in addition to modeling the vertebral bony tissue as a four-part model. They showed correct segmentation of 69 of the 71 cases. Meanwhile, Mastmeyer et al. [64] segmented the lumbar vertebral bodies in CT images by combining viscous deformable models with the geometrical shape of the vertebral body, starting from a point in the center of each vertebral body. Tan et al. [91] presented a level set-based segmentation algorithm for the vertebrae and validated their work on synthetic 3D vertebrae volumes. After parameter selection, they tested the algorithm on 50 vertebrae (from ten subjects), obtaining a 90 % success rate. Later, Shen et al. [84] presented a segmentation technique for vertebrae from 3D CT scans using prior knowledge with a set of high level features to form a surface model. However, they did not perform labeling. They tested their model on 150 vertebrae, comparing the results against segmentations by two experts. In the same year, Klinder et al. [51] presented a two-scale framework for modeling and segmenting the spine from thoracic CT scans, achieving a segmentation accuracy of 1.0 mm on average for ten thoracic CT volumes. By applying statistical models of shape, gradient and appearance of spinal structures in 3D, the same research group [52] detected, identified and segmented the vertebrae in CT volumes. However, their identification algorithm is based on a vertebra Active Appearance Model for spatial registration and matching, which is computationally very expensive (20–30 min per case). Their framework was tested on 64 CT images including pathologies like Scoliosis, Kyphosis and collapsed vertebrae.

Later, Kim and Kim [50] automatically segmented the vertebrae by a region growing algorithm inside a volume limited by a 3D fence that was obtained from a deformable model. They obtained an 80 % success rate on a 50-patient dataset. More recently, Ma and Lu [61] proposed a method for segmentation and identification of thoracic vertebrae in CT images by training an edge detector to bone structures via steerable gradient features and using a deformable surface model in a two-stage coarse-to-fine scheme. They achieved a point-to-surface error of 0.95 ± 0.91 mm on 40 volumes.

Segmentation methods that are not based on deformable models [50, 91, 102] generally do not provide any quantitative information on vertebral deformations for CAD systems, while segmentation based on deformable models [52, 61, 64] is mathematically too abstract for describing deformations in clinical practice. In contrast, Štern et al. [98] proposed a parametric method for quantitative description of vertebral body deformations. The deformations are evaluated from the parameters of a 3D super-quadratic model, which is initialized as an elliptical cylinder and then gradually deformed by introducing transformations that yield a more detailed representation of the vertebral body shape. Their method was validated on 75 CT and 75 MRI vertebrae extracted from nine normal and ten abnormal subjects, showing success rates of 94.5 and 88.6 %, respectively.

Meanwhile, Kadoury et al. [46] proposed a method for inferring articulated spine models from pre-operative X-ray to intra-operative CT images. This approach automatically segments the entire spinal column with annotated landmarks by modeling complex, non-linear patterns of prior deformations from a Riemannian manifold embedding, showing an accuracy of 0.7 ± 1.8 mm for thoracic vertebrae and 2.1 ± 2.5 mm for lumbar vertebrae based on the localization of surgical landmarks. Recently, Rasoulian et al. [80] developed a statistical multi-vertebrae shape and pose model and proposed a registration-based technique to segment the CT images of the spine using a reduced number of registration parameters. Validation on lumbar vertebrae of 32 subjects shows a mean error of less than 2 mm, which, the authors argue, is sufficient for many spinal needle injection procedures, such as facet joint injections.

In another recent approach that avoids an explicit parametric model of appearance, Glocker et al. [35] proposed a vertebrae localization and identification algorithm which builds upon supervised classification forests. They overcome the tedious requirement for dense annotations by a semi-automatic labeling strategy. Extensive evaluation on a dataset of 224 spine CT scans of patients with pathologies (including high-grade Scoliosis, Kyphosis, and presence of surgical implants) shows a mean localization error of 12.4 mm and 70 % identification rates on pathological spines, which outperforms a parametric approach using Regression Forests and Hidden Markov Models (HMM).

One major effort in vertebrae segmentation was part of a semi-automated vertebra fracture detection system [2]. To segment the vertebrae, they started with the CT volume and selected the middle slice as the starting point. Training their model involves two main steps: (1) intervertebral disc localization (which leads to vertebra localization, as illustrated below), and (2) building an ASM for each vertebra level.

For the first training task, they trained the model proposed in [5] by having a radiologist place a point inside each of the six discs enclosing the five lumbar vertebrae. They then saved these data with the corresponding images to train the model for the disc localization step (a point inside each disc).

The second training task is the selection of a fixed set of points (16 points) for each vertebra. They then produced a separate model for each vertebra level and prepared the training data required for an ASM (x- and y-coordinates and the image itself). Figure 14 shows a sample image with the 16 points on the edges of each vertebra as selected by the expert radiologist.

Fig. 14
figure 14

Preparation of training data [2]. An expert radiologist manually selects a set of 16 points based on a predefined model for locating these points

The steps for the segmentation of the lumbar vertebrae from CT are explained below in three sub steps: vertebrae localization, vertebra point distribution modeling by the ASM, and vertebra boundary delineation by a Gradient Vector Flow (GVF-)snake [1, 2].

The vertebrae localization step provides a point inside each vertebra. This step utilized an earlier work on disc localization from clinical MRI [5]. After producing a point inside each disc, they took the midpoint between each pair of adjacent disc points and used it as the vertebra localization point, as shown in Fig. 15.
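The midpoint rule described above can be sketched in a few lines. This is a minimal illustration only: the function name and the pixel coordinates are hypothetical and do not come from [2] or [5].

```python
import numpy as np

def vertebra_points_from_discs(disc_points):
    """Return the midpoint between each pair of consecutive disc points."""
    discs = np.asarray(disc_points, dtype=float)
    if discs.ndim != 2 or discs.shape[0] < 2:
        raise ValueError("need at least two disc points of shape (n, 2)")
    # Average each disc point with the next one along the spine.
    return 0.5 * (discs[:-1] + discs[1:])

# Hypothetical (x, y) pixel coordinates of six disc points, ordered top to bottom.
discs = [(120, 40), (124, 80), (130, 122), (138, 166), (148, 210), (160, 252)]
vertebrae = vertebra_points_from_discs(discs)
```

Six disc points yield five midpoints, one per lumbar vertebra, which matches the six-discs-around-five-vertebrae setup used for training.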

Fig. 15
figure 15

Automated vertebrae localization. Filled circles are disc labels from Alomari et al. [5]. Crosses are the average location between each two disc labels

The next step is to model the vertebra point distribution by an ASM [20]. In this work, they produced a separate model for each vertebra level. A radiologist prepared the training data by manually marking 16 landmark points for each vertebra, as shown in Fig. 14. These points are named k1 to k16. Similar to [20], they initially calculated the mean shape \( \bar{x} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} x_{i} \) where N is the size of the training data. Then each vertebra shape \( x_{i} \), where \( i \in \left\{ {1, \ldots ,N} \right\}, \) is iteratively aligned to the mean shape \( \bar{x} \) using generalized Procrustes analysis to remove translation, rotation, and isotropic scaling from the shape.

Then, they modeled the remaining variance around the mean shape for each vertebra with principal component analysis (PCA), retaining the eigenvectors of the covariance matrix that account for 98 % of the remaining point-position variance, following the standard method for deriving the ASM’s linear shape representation.
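The alignment and PCA steps can be illustrated with the following sketch. It is not the implementation used in [2]: the function names, the iteration count, the unit-norm normalization of the mean, and the synthetic 16-point shapes are all assumptions made for illustration.

```python
import numpy as np

def align_to(shape, ref):
    """Similarity-align one centered shape (k, 2) to a centered reference (k, 2)."""
    # Orthogonal Procrustes: rotation from the SVD of the cross-covariance.
    u, sing, vt = np.linalg.svd(shape.T @ ref)
    d = np.sign(np.linalg.det(u @ vt))      # guard against reflections
    flip = np.diag([1.0, d])
    rot = u @ flip @ vt
    scale = (sing * np.diag(flip)).sum() / (shape ** 2).sum()
    return scale * shape @ rot

def build_shape_model(shapes, var_kept=0.98, n_iter=10):
    """Generalized Procrustes alignment followed by PCA on the aligned shapes."""
    shapes = np.asarray(shapes, dtype=float)
    shapes = shapes - shapes.mean(axis=1, keepdims=True)   # remove translation
    mean = shapes[0] / np.linalg.norm(shapes[0])
    for _ in range(n_iter):
        aligned = np.stack([align_to(s, mean) for s in shapes])
        mean = aligned.mean(axis=0)
        mean = mean / np.linalg.norm(mean)                 # fix the mean's scale
    X = aligned.reshape(len(aligned), -1)
    mean_vec = X.mean(axis=0)
    _, sing, vt = np.linalg.svd(X - mean_vec, full_matrices=False)
    eigvals = sing ** 2 / (len(X) - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_modes = int(np.searchsorted(ratio, var_kept) + 1)    # smallest count reaching var_kept
    return mean_vec, vt[:n_modes].T, eigvals[:n_modes]

# Synthetic training set (an assumption): noisy, rotated, scaled, shifted ellipses.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 16, endpoint=False)
base = np.column_stack([1.5 * np.cos(t), np.sin(t)])
shapes = []
for _ in range(20):
    ang = rng.uniform(-0.3, 0.3)
    rot = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
    noisy = base + 0.02 * rng.normal(size=base.shape)
    shapes.append(rng.uniform(0.8, 1.2) * noisy @ rot + rng.uniform(-5, 5, size=2))

mean_vec, modes, variances = build_shape_model(shapes)
```

A new shape is then approximated as the mean vector plus a weighted sum of the retained modes, which is the linear shape representation the ASM search operates on.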

However, they did not use the original CT image for training the ASM of each vertebra. Rather, they first applied the range filter R to the image to obtain a better edge enhancement for vertebrae. R is the range filter operator, which replaces each pixel with the range (maximum minus minimum) of the intensities in its 3 × 3 window. This operator R has high values in abrupt-change regions and small values in smooth regions, as shown in Fig. 16.
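A 3 × 3 range filter of this kind can be sketched with NumPy alone. The function name and the edge-replicating border handling are assumptions; the source does not state how image borders were treated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def range_filter(image, size=3):
    """Replace each pixel with (max - min) over its size x size neighborhood."""
    img = np.asarray(image, dtype=float)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")          # assumed border handling
    windows = sliding_window_view(padded, (size, size))
    return windows.max(axis=(2, 3)) - windows.min(axis=(2, 3))

# Toy image: a vertical step edge between columns 2 and 3.
toy = np.zeros((5, 5))
toy[:, 3:] = 100.0
edges = range_filter(toy)
```

On the toy image the response is 100 in the two columns that straddle the step and 0 in the flat regions, which is the high-at-edges, low-in-smooth-areas behavior described above.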

Fig. 16
figure 16

Range filter 3 × 3 window on the CT image [1]

To apply the ASM for detection of the point distribution of the vertebra body boundary, they placed the mean shape \( \bar{x} \) around the vertebra point produced by the localization step (cross inside each vertebra). They then allowed the ASM to converge to obtain the boundary, and fed this boundary to the GVF-snake in the next step.

The ASM can capture the rough boundary of the vertebra as a point distribution model. However, fine, detailed delineation of the vertebra body needs a further refinement model. They [1] selected the GVF-snake proposed by Xu and Prince [101] because it has been shown to move toward desired image properties such as edges, including concavities. The GVF-snake is the parametric curve that solves:

$$ {\mathbf{x}}_{t} (s,t) =\alpha {\mathbf{x^{\prime\prime}}}(s,t) - \beta {\mathbf{x^{\prime\prime\prime\prime}}}(s,t) + {\mathbf{v}} $$
(1)

where \( \alpha \) and \( \beta \) are weighting parameters that control the contour’s tension and rigidity, respectively. \( {\mathbf{x^{\prime\prime}}} \) and \( {\mathbf{x^{\prime\prime\prime\prime}}} \) are the second and fourth derivatives, respectively, of \( {\mathbf{x}} \) with respect to s. \( {\mathbf{v}}\left( {x, y} \right) \) is the Gradient Vector Flow (GVF) field, \( s \in \left[ {0, 1} \right] \) is the curve parameter, and t is a time component that turns the static curve x(s) into the dynamic snake x(s, t).
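To make Eq. (1) concrete, the sketch below advances a closed contour by explicit Euler steps using periodic finite differences. It is an illustration under stated assumptions, not the scheme of [1] or [101]: the grid spacing in s is absorbed into \( \alpha \) and \( \beta \), the parameter values are arbitrary, and the external field is a hypothetical stand-in that pulls points toward a circle of radius 5 rather than a true GVF field computed from an edge map. Practical snake implementations commonly use a semi-implicit update instead.

```python
import numpy as np

def snake_step(contour, force, alpha=0.1, beta=0.05, dt=0.2):
    """One explicit Euler step of x_t = alpha*x'' - beta*x'''' + v on a closed contour."""
    x = np.asarray(contour, dtype=float)
    # Periodic central differences along the contour index (unit spacing assumed).
    d2 = np.roll(x, -1, axis=0) - 2 * x + np.roll(x, 1, axis=0)
    d4 = (np.roll(x, -2, axis=0) - 4 * np.roll(x, -1, axis=0) + 6 * x
          - 4 * np.roll(x, 1, axis=0) + np.roll(x, 2, axis=0))
    return x + dt * (alpha * d2 - beta * d4 + force(x))

def toward_circle(points, radius=5.0):
    """Hypothetical external field: radial pull toward a circle (not a real GVF)."""
    r = np.linalg.norm(points, axis=1, keepdims=True)
    return (radius - r) * points / r

# Start from a circle of radius 8 and let the contour evolve.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
contour = 8.0 * np.column_stack([np.cos(theta), np.sin(theta)])
for _ in range(200):
    contour = snake_step(contour, toward_circle)
final_radius = np.linalg.norm(contour, axis=1).mean()
```

The contour settles slightly inside radius 5 because the tension and rigidity terms both pull a convex closed curve inward, balancing against the external field.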

The GVF-snake requires an edge map, that is, a binary image highlighting the desired features (edges) of the image. Most researchers apply the Canny edge detector or the Sobel operator to the original image. Here, they supplied the GVF-snake with a Canny edge map computed on the range-filtered image I.

They then applied the GVF-snake by initializing its contour to the contour produced by the ASM, that is the points k1 to k16. Figure 15 shows the same example after the convergence of the GVF-snake.

Figures 17 and 18 show four cases selected from the data set to show the robustness of the final contour despite the various abnormalities at various lumbar levels. They performed a qualitative evaluation in which a radiologist carefully examined each vertebra contour visually and approved the automated segmentation contour for all cases.

Fig. 17
figure 17

Final contour for two cases. Images are contrast-enhanced for visual convenience [1]

Fig. 18
figure 18

Final contour for two cases. Left severely abnormal L4 vertebra. Images are contrast-enhanced for visual convenience [1]

5.2.2 Spinal Cord and Canal

Many research groups have focused on the segmentation of the spinal cord and the spinal canal in CT. Early on, Karangelis and Zimeras [48] introduced a semi-automatic 3D method for segmenting the spinal cord and tested it on 14 CT volumes. On each slice image, they used a boundary tracking method along with linear interpolation in the z-direction. However, its dependence on proper selection of the seed point and the threshold limits its applicability. Meanwhile, Archip et al. [7] presented a top-down knowledge-based technique that identified the spinal cord in CT images. This approach used an Anatomical Structures Map and a task-oriented architecture plan solver. They claimed that the method was flexible enough to handle inter-patient variation and transparent to the radiologist, ensuring that experts can take control of undesirable image analysis results. On 23 cases, the spinal canal was localized with an accuracy of 92 %, the spinal cord with an accuracy of 85 % and the lamina with an accuracy of 72 %. A couple of years later, Burnett et al. [12] developed a semi-automatic algorithm for spinal canal segmentation of CT scans. The spinal canal was partially delineated by wavelet-based edge detection and fitted to a deformable model. Later, the template was aligned manually to fit more accurately to the spinal canal. Experiments on 557 axial images showed that automatic delineation of the spinal canal was successful on 91 % of the images, unsuccessful on 2 %, and required further editing on the remaining 7 %. Around the same time, Nyúl et al. [75] proposed a semi-automatic method using 2D snakes for segmenting the spinal cord in a slice-by-slice manner, testing it on 27 CT images of the Thoracic region. The 3D volume is then generated by interpolation. However, snakes [49] are highly sensitive to initialization, which is usually performed manually.

On the other hand, because CT scans are better than X-rays and MRIs in terms of bony structure visualization, there have been considerable efforts toward building CAD systems for detection of various abnormalities such as Syndesmophytes (abnormal bone structures at the vertebral end plates) [90], spine Metastases [38] and vertebral fractures [2, 29]. Most of these efforts include localization, labeling, or segmentation work.

In the middle of the last decade, Tan et al. [90] provided a quantitative measure of the Syndesmophytes using high resolution CT images. They first segmented the whole vertebra using a cascade of successive level sets, and then used curvature information to segment and quantify Syndesmophytes, achieving a Pearson correlation of 0.898 between the manual (medical expert) and the automated diagnosis, which is a high positive correlation.

More recently, Hammon et al. [38] proposed a method of automatic detection of Lytic and Blastic Thoracolumbar spine Metastases (malignant tumors) from 3D CT images. They first detected the vertebral bodies using iterative marginal space learning and then used a cascade detector consisting of three random forest-based discriminative models to detect Metastases. Evaluation on 20 patients with 42 Lytic and on 30 patients with 172 Blastic Metastases (where the CAD system was trained using CT images of 114 subjects with 102 Lytic and 308 Blastic spinal Metastases) showed a sensitivity of 88 % for Lytic and 83 % for Blastic Metastases.

In vertebral fracture detection, Ghosh et al. [29] developed an unsupervised and non-parametric approach for vertebral segmentation using Hough lines and morphological operations. They also proposed a set of clinically motivated features including vertebral height features for automatic fracture detection using a Support Vector Machine (SVM). On 50 clinical cases they showed a segmentation error of 1.5 mm and a wedge fracture detection accuracy of 97 %. More recently, Al-helo et al. [2] proposed another method using ASM and a GVF-snake for vertebra segmentation and clinically motivated features for wedge fracture detection resulting in 98 % accuracy (specificity of 87.5 % and sensitivity over 99 %) using an unsupervised learner.
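Because accuracy, sensitivity, and specificity are quoted side by side throughout this section, it may help to recall how they relate. The sketch below computes all three from confusion-matrix counts; the counts are hypothetical and are not taken from [2] or [29].

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # fraction of abnormal cases that were detected
    specificity = tn / (tn + fp)   # fraction of normal cases correctly cleared
    return accuracy, sensitivity, specificity

# Hypothetical, class-imbalanced counts chosen only to illustrate the arithmetic.
acc, sens, spec = classification_metrics(tp=90, fp=2, tn=6, fn=2)
```

With these counts the accuracy is 0.96 while the specificity is only 0.75, showing how a dataset dominated by positive cases can yield a high accuracy alongside a noticeably lower specificity.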

5.3 Magnetic Resonance Imaging (MRI)

While CT proved to have higher sensitivity and specificity in the visualization of the bone structures, MRI provides superior contrast in visualizing the soft tissue that surrounds the vertebrae, and it does not subject the patient to the ionizing radiation of X-ray radiography and CT. It is important to highlight that research efforts in the literature have not focused on distinct problems: many papers overlap, targeting localization, labeling, segmentation, and even diagnosis. We provide an approximate categorization of the literature below based on the target problem.

5.3.1 Localization and Labeling

As early as 1989, Chwialkowski et al. [19] studied the localization of discs, vertebrae and spinal cord in one MRI case using intensity profiles and edge detectors. A decade later, Booth et al. [10] used an algorithm based on symmetry, active contours and edge detection to identify the vertebral body edges from cross-sectional vertebral MRI. However, the unavailability of data prevented these efforts from robust validation. Later in the last decade, Vrtovec et al. [97] detected the spine curve from MRI using a polynomial model to provide a Curved Planar Reformation (CPR) of the 3D spinal column. Their optimization framework is based on the automatic image analysis of MR spine images that exploits some basic anatomical properties of the spine. They tested the method on 21 axial MR scans of the spine from twelve subjects, achieving mean errors of 2.5 mm and 1.7° for the position of the 3D spine curve and axial rotation of vertebrae, respectively.

In the middle of the last decade, Peng et al. [78] performed vertebra and disc labeling on five whole spine MRIs by extracting intensity profiles of discs and using a convolution operation to match a template of the disc. Later, Masaki et al. [63] proposed a method for automated geometry planning based on intensity and a Hough transform to localize the spine and the discs. They only used ten normal MRI cases for validation. The dependency on static values (thresholds) limits the capability of segmentation methods when they are tested on different datasets. Furthermore, performing many sequential steps to achieve the segmentation task increases the error rate due to propagation of the error from each step to the next. Another study by Weiss et al. [100] proposed a semi-automatic technique for disc labeling. The upper and lower halves of the spine are separately labeled after histogram processing, filters and the use of threshold values. They tested their algorithm on fifty MRI cases.

In surgery planning, Pekar et al. [77] developed a labeling method for the whole spine. Initially, a set of disc candidates is located by a filter using eigenvalue analysis of the Hessian matrix. Then, using prior structural knowledge of the spine, they picked the disc centers from the candidates. After that, labeling takes place starting from the first spine point and moving upward/downward. They also used a distance constraint for locating the next disc; if the constraint is not met, a new point is introduced and that disc is considered missing due to abnormality. They used 15 subjects for validation, producing 60 image volumes for lumbar and cervical areas with two poses for each subject.

Bhole et al. [9] presented a method for automatic detection and labeling of lumbar vertebrae and discs from clinical MRI by combining tissue property and geometric information from T1-Weighted (T1W) sagittal, T2-Weighted (T2W) sagittal and T2W axial MRI protocols. They achieved 98.8 % accuracy for disc labeling on 67 sagittal images. However, they relied on specific threshold values extracted from the dataset, which prevents the extension of their method to another dataset with different parameter settings.

Schmidt et al. [82] introduced a probabilistic inference method using a part-based model achieving up to 97 % disc detection rate on 30 cases. In another similar approach, Oktay and Akgul [76] proposed a method using Pyramidal Histogram of Oriented Gradients (PHOG) based on SVM and a probabilistic graphical model and achieved 95 % accuracy on forty cases.

Localization and labeling have been better understood in the literature. The author’s research group developed and tested a myriad of techniques. Koh et al. [54] proposed joint attention and active contour models to segment the low back spine and subsequently label discs in later research efforts. However, the initial contour is highly sensitive to the inhomogeneous MRI signal intensity. Furthermore, Alomari et al. [5] proposed a novel probabilistic model of the lumbar discs. This model adequately insulates the localization variables from the pixel intensities while at the same time modeling the exact disc geometry rather than solely pixel-level labels. Next, we present more details about this highly cited work.

Let \( {\mathcal{D}} = \{ d_{0} ,d_{1},\ldots, d_{6}\} \) be the set of disc variables with each \( d_{i} = \left( {x_{i} , y_{i} } \right)^{\text{T}} ,i \in \left\{ {1, \ldots ,6} \right\} \) representing the disc center (it could also include disc angle, boundary, etc.); \( d_{0} \) is a label for non-disc pixels. Inferring \( {\mathcal{D}} \) from an image is the ultimate goal, but doing so directly is avoided due to its large computational complexity. A set of auxiliary variables is therefore introduced, called disc-label variables and denoted by \( {\mathcal{L}} =\{ l_{i},\;\forall i\in\varLambda\} \). Each disc-label variable can take a value of {−1, +1} for non-disc or disc, respectively. The disc-labels make it possible to separate the disc variables from the image intensities, i.e., the disc-label variables capture the local pixel-level intensity models while the disc variables capture the high-level geometric and contextual models of the full set of discs. This approach is simpler and more robust than the model by Schmidt et al. [82], which has a particular label for each disc. Since the disc-labels are auxiliary variables, the approach marginalizes over all possible disc-labelings, giving the following optimization function:

$$ {\mathcal{D}}^{*} =\arg \mathop {\hbox{max} }\limits_{{\mathcal{D}}} \sum\limits_{{\mathcal{L}}} P({\mathcal{L}},{\mathcal{D}}|{{{\mathtt{I}}}}) $$
(2)
$$ = \arg \mathop {\hbox{max} }\limits_{{\mathcal{D}}} \sum\limits_{{\mathcal{L}}} \frac{{P({{{\mathtt{I}}}}|{\mathcal{D}},{\mathcal{L}})P({\mathcal{D}},{\mathcal{L}})}}{{P({{{\mathtt{I}}}})}} $$
(3)
$$ = \arg \mathop {\hbox{max} }\limits_{{\mathcal{D}}} \sum\limits_{{\mathcal{L}}} P({{{\mathtt{I}}}},{\mathcal{L}})P({\mathcal{L}}|{\mathcal{D}})P({\mathcal{D}}) $$
(4)

where the second equality follows from the multi-level nature of the model (the disc variables are assumed independent of the intensities). Note that the summation is over a very large set of possible assignments \( \left( {2^{|\varLambda |} } \right) \). The authors then model each factor as a Gibbs distribution:

$$ P({{{\mathtt{I}}}},{\mathcal{L}}) = \frac{1}{Z}\exp [ - \beta_{1} \sum\limits_{s \in \varLambda } U_{\text{I}} (l_{s} ,{{{\mathtt{I}}}}(s))] $$
(5)
$$ P({\mathcal{L}}|{\mathcal{D}}) = \frac{1}{Z}\exp [ - \beta_{2} \sum\limits_{s \in \varLambda } U_{{\text{D}}} (l_{s} ,{\mathcal{D}})] $$
(6)
$$ P({\mathcal{D}}) = \frac{1}{Z}\exp [ - \beta_{3} \sum\limits_{{d_{i} \in {\mathcal{D}}}} U_{\text{L}} (d_{i} ) - \beta_{4} \sum\limits_{(i \sim j)} V_{\text{D}} (d_{i} ,d_{j} )] $$
(7)

where \( \beta_{k} \ge 0, k \in \left\{ {1, \ldots , 4} \right\} \) are tunable parameters and \( Z\left[ \cdot \right] \) are the partition functions. The \( ( \cdot \sim \cdot ) \) notation denotes the set of neighboring elements on the disc chain. The potentials \( U_{\text{I}} \) and \( U_{\text{D}} \) model the pixel (low)-level intensity and spatial models, respectively. The potentials \( U_{\text{L}} \) and \( V_{\text{D}} \) model the object (high)-level location and context, respectively.

Exact inference is infeasible for this model because of the dependencies of \( {\mathcal{D}} \) on all \( {\mathcal{L}} \), even though \( {\mathcal{D}} \) is a Markov chain. They used the generalized Expectation Maximization (gEM) algorithm to optimize Eq. (4). Whereas an EM algorithm requires maximization in the M step, a generalized EM algorithm only requires an improvement over the current state. This particular method has a high disc localization rate. However, the spatial information assumes higher locality in the low spine area within the MRI. Moreover, it does not incorporate other aspects such as angles within the disc chain and, most importantly, it does not take into consideration patient meta-data such as weight, height, and history, all of which affect a patient’s low back structures.
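The difference between EM and generalized EM noted above can be illustrated on a toy problem that is unrelated to the disc model of [5]. The sketch below fits the two means of a 1D Gaussian mixture (equal weights and unit variances are assumed known); instead of jumping to the M-step maximizer, each iteration moves only part of the way toward it, which still improves the expected complete-data log-likelihood. All names, data, and settings here are illustrative assumptions.

```python
import numpy as np

def log_likelihood(x, mu):
    """Log-likelihood of a two-component mixture with weights 0.5 and unit variances."""
    comp = (-0.5 * (x[:, None] - mu[None, :]) ** 2
            - 0.5 * np.log(2 * np.pi) + np.log(0.5))
    return np.logaddexp(comp[:, 0], comp[:, 1]).sum()

def gem_fit(x, mu0, step=0.3, n_iter=50):
    """Generalized EM: partial M-steps that improve, but do not maximize, Q."""
    mu = np.asarray(mu0, dtype=float).copy()
    history = [log_likelihood(x, mu)]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        logits = -0.5 * (x[:, None] - mu[None, :]) ** 2
        resp = np.exp(logits - logits.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Full M-step maximizer (responsibility-weighted means).
        mu_star = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        # Generalized M-step: move only a fraction of the way toward it.
        mu = mu + step * (mu_star - mu)
        history.append(log_likelihood(x, mu))
    return mu, np.array(history)

# Synthetic data (an assumption): two clusters centered at -2 and 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
mu, history = gem_fit(x, mu0=[-0.5, 0.5])
```

Because the expected complete-data log-likelihood is a concave quadratic in the means, any partial move toward its maximizer increases it, so the observed log-likelihood never decreases from one iteration to the next.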

While most of the literature in localization provides disc centroids [5, 9, 76, 82], Ghosh et al. [32] presented an approach using heuristics and machine learning methods to provide tight bounding boxes for each disc, achieving 99 % localization accuracy on 53 cases. This method can bypass complicated segmentation algorithms and directly feed the detected disc region to a CAD system that extracts relevant features and automatically provides diagnostic results [30, 31].

5.3.2 Segmentation

Few research efforts have been conducted on segmentation of vertebrae from MRI because bones are better outlined in CT scans. In 2004, Carballido-Gamio et al. [13] discussed the segmentation of vertebral bodies from sagittal T1-Weighted (T1W) MRI using normalized cuts [85] with the Nyström approximation method [25]. The T1W MRI were first preprocessed by the Anisotropic Diffusion algorithm [79], which smooths the image without distorting the edges. However, they tested their work on only six subjects for the lumbar area. Five years later, Huang et al. [43] proposed a statistical learning approach based on an improved AdaBoost algorithm for efficient vertebra detection from MRI, with a success rate of 98 % on fewer than 25 cases.

As early as 1997, Roberts et al. [81] proposed a method based on the watershed algorithm to segment the five lumbar level discs from MRI. However, they required major user intervention by carefully selecting an ROI. Their work studies the relation between patient age and disc height. They concluded that disc height increases with age and that, along the spine, it increases starting from the L1-L2 level and decreases at the L5-S level. Later on, Hoad and Martel [40] presented a technique to segment the bone and soft tissues from MRI. However, their method requires sensitive initialization by the user to locate four points on each vertebra. Wachter et al. [99] used various image segmentation techniques including shape models, Hough transforms, and edge detectors to segment the 3D spine and discs in the cervical area from full 3D MRI. They did not report the number of validation cases, stating only that several T1W and T2W cases were used. A couple of years later, Chevrefils et al. [17] proposed a method to segment the discs based on Watershed and many image processing techniques including opening and erosion. This method, however, encounters an over-segmentation issue. To overcome this problem, the same group [18] also presented a framework for automatic segmentation of intervertebral discs of Scoliotic spines from 2D and 3D MRI. Twenty-two texture features (18 statistical and four spectral) were extracted from every closed region obtained from their earlier segmentation procedure [17], followed by PCA and clustering, which resulted in an overall accuracy of 85 %, specificity of 83 % and sensitivity of 87 % on 505 images derived from only three patients.

A Hough transform based approach was presented by Shi et al. [86] which showed success on 48 out of 50 cases but no quantitative evaluation was discussed. Moreover, the first disc has to be hand labeled for initialization. Another approach was proposed by Michopoulou et al. [68] based on three variations of atlas based segmentation. However, they start from a manually input point for the center of each disc. Evaluation on 42 normal and 78 degenerated discs showed best performance by the atlas-robust-fuzzy C-Means approach which combines prior anatomical knowledge with fuzzy clustering techniques.

More recently, Neubert et al. [71] presented a method for the 3D segmentation of Vertebral Bodies (VBs) and Intervertebral Discs (IVDs) from the thoracolumbar region using statistical shape analysis and registration of gray-level intensity profiles. Validation on a dataset of high resolution 3D MR SPACE scans from 28 asymptomatic volunteers resulted in Dice values of 0.89 and 0.88 (lumbar and thoracic IVDs, respectively). Furthermore, Law et al. [58] proposed an unsupervised disc segmentation method that employs an Anisotropic Oriented Flux detection scheme to distinguish the discs from the neighboring structures with similar intensity, recognize ambiguous disc boundaries, and handle the shape and intensity variation of the discs. However, they require two user-provided points for initialization. Evaluation on mid-sagittal slices of 69 cases (110 normal vertebrae) showed an average Dice similarity coefficient of 0.92.
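For reference, the Dice similarity coefficient reported by several of these studies is twice the overlap of two masks divided by the sum of their sizes. The sketch below computes it for binary masks; the function name, the convention of returning 1.0 for two empty masks, and the toy masks are assumptions made for illustration.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity: 2*|A and B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0          # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy masks: two 6 x 6 squares offset by one pixel in each direction.
manual = np.zeros((10, 10), dtype=bool)
manual[2:8, 2:8] = True
automatic = np.zeros((10, 10), dtype=bool)
automatic[3:9, 3:9] = True
score = dice_coefficient(manual, automatic)
```

Here each mask covers 36 pixels and they share 25, so the score is 50/72, roughly 0.69; identical masks give 1 and disjoint masks give 0.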

Most of the methods presented to date for the segmentation of the spinal cord from MRI have been semi-automatic [21, 41, 65, 74]. They include various approaches such as B-spline active surface optimization [21], watershed segmentation [74] and deformable models [65]. Horsfield et al. [41] proposed a semi-automatic method utilizing a constrained active surface model of the cord surface to assess Multiple Sclerosis.

In the past few years there have been a few research efforts towards fully automated spinal cord segmentation. Koh et al. [53] developed an approach using a Gradient Vector Flow field which achieved a similarity index of 0.7 on 52 cases. They estimated the spinal cord using the magnitude of the gradient vector flow edge map, followed by a connected component analysis to remove holes in the segmentation. The same research group [54] proposed an unsupervised and fully automatic method using an active contour model based on saliency maps, achieving a Dice Similarity Index of 0.71 on 60 cases. Similarly, Mukherjee et al. [69] applied an active contour approach, which evolved an image gradient based, open-ended contour using dynamic programming-based energy-minimization. Evaluation on cat MRI scans showed a mean positive correlation of 0.94. More recently, Chen et al. [15] proposed a deformable atlas-based registration combined with a topology preserving classification to robustly segment the spinal cord and the CerebroSpinal Fluid (CSF).

In a knowledge-based approach to reconstruct the tissues of the cervical spine, Seifert et al. [83] used the Hough transform and knowledge about spine curvature to find initial seed points for discs, which are then refined by clustering, taking the center of gravity of each cluster as the disc center. Disc centers are then used to segment the soft tissues (spinal cord, trachea and discs) from nine cervical MRIs, resulting in 91 % accuracy. However, due to the use of a number of rules and heuristics, it is not clear if this approach will work for pathological cases.

In most of the previous work, segmentation of the Dural Sac, vertebrae and intervertebral discs has been handled separately, which might lead to overlapping tissue regions. Moreover, some techniques depend on shape models, giving rise to errors in case of high variability in appearance. Recently, Ghosh et al. [34] used a Gibbs sampling approach to simultaneously label all tissues in the lumbar MRI. This method uses both neighborhood intensity information and label information for each update. Experimental results on 53 cases showed an average Similarity Index of 0.77 and 0.66 for the Dural Sac and Intervertebral discs, respectively. Within the same research group, Alomari et al. [6] presented a coordinated joint model to accurately segment the lumbar discs from clinical MRIs in addition to their diagnosis work.

On the other hand, due to better discrimination of soft tissues in MRI, there has been a growing interest in the research community for automatic diagnosis of InterVertebral Discs (IVD) abnormalities such as Herniation, Degeneration, Desiccation, as well as Spinal Stenosis and Spinal Scoliosis from 2D and 3D MRIs. Most of these efforts include steps for localization and segmentation of the target structure.

Early in the last decade, Tsai et al. [94] detected Herniation from 3D MRI and CT volumes of the discs by using geometric features such as shape, size and location. However, the method is computationally expensive and is better suited to visualization.

Clinical MRIs are, however, mostly 2D due to the high cost and acquisition time involved. Michopoulou et al. [67] presented the classification of the Intervertebral Discs (IVDs) into normal or degenerated, by using fuzzy C-Means to perform semi-automatic atlas-based disc segmentation and then used a Bayesian classifier. They achieved 86–88 % accuracy on 34 cases. They also reported 94 % accuracy using texture features [66] for 50 manually segmented discs.

A reasonable amount of research from the same research group [3, 4, 31, 30, 55], using real clinical MRIs from large datasets together with diagnostic reports, has also been reported. Alomari et al. [4] presented a fully automated herniation detection system using a GVF-snake for an initial disc contour and then trained a Bayesian classifier on the resulting shape features. They achieved 92.5 % accuracy on 65 clinical MRI cases but a lower sensitivity of 86.4 %. Alomari et al. [3] also presented a desiccation diagnosis system for lumbar discs from clinical MRI using a probabilistic model, achieving over 96 % accuracy. Ghosh et al. [31, 30] presented a comprehensive comparison of features, dimensionality reduction techniques and classifiers for herniation detection, resulting in high specificity and sensitivity. However, these were evaluated on only 35 clinical cases. Koh et al. [55] developed a computer-aided diagnosis framework for the lumbar spine with a two-level classification scheme using heterogeneous classifiers. They used clinical MR image data from 70 subjects in T1- and T2-weighted sagittal views to evaluate the system, achieving 99 % herniation detection accuracy along with a speedup factor of 30 compared with a radiologist's diagnosis.
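Because the studies above report accuracy, sensitivity and specificity side by side, it may help to recall how the three are derived from the same set of predictions. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the ten toy labels are hypothetical and do not reproduce any of the cited results.

```python
def diagnostic_metrics(predicted, actual):
    """Accuracy, sensitivity and specificity for binary labels (1 = abnormal)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # abnormal, detected
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # normal, cleared
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # normal, flagged
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # abnormal, missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical example: 4 abnormal and 6 normal discs.
actual_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(predicted_labels, actual_labels)
print(metrics)  # accuracy 0.8, sensitivity 0.75, specificity 5/6
```

In this toy case, accuracy is higher than sensitivity because normal discs outnumber abnormal ones, which is one reason a high accuracy alone does not show how many abnormal discs a system misses.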

Jäger et al. [45] presented a complete system for computer-aided assessment of anomalies in 3-D MR images of the scoliotic spine, which provides an orthogonal view onto every vertebra. First, the spinal cord is segmented using a manual seed point and an iterative process in which the segmentation is updated by an energy-based scheme derived from Markov random field (MRF) theory. Then the vertebrae are labeled using an intensity profile. Finally, using a parametric approximation, multi-planar reformattings (MPRs) are computed that are orthogonal to the backbone at every position of the spinal cord. Evaluation on 20 clinical 3-D MRI SPACE datasets resulted in a mean angle difference of less than six degrees.
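The description above does not specify which energy or optimizer is used in [45]. As a generic illustration of an MRF-style iterative label update, the sketch below applies iterated conditional modes (ICM), a swapped-in textbook technique, to a small 2-D image. The squared-difference data term, the Potts smoothness weight `beta`, the class means and the toy image are all assumptions for this example.

```python
def icm_segment(image, means=(0.0, 1.0), beta=0.5, n_sweeps=5):
    """Label each pixel by greedily minimizing a local MRF energy (ICM).

    Energy of label k at a pixel = (intensity - means[k])**2
                                   + beta * (number of 4-neighbours with another label).
    """
    rows, cols = len(image), len(image[0])
    classes = range(len(means))
    # Initial guess: nearest class mean, ignoring the neighbours.
    labels = [[min(classes, key=lambda k: (image[r][c] - means[k]) ** 2)
               for c in range(cols)] for r in range(rows)]
    for _ in range(n_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]

                def energy(k):
                    data = (image[r][c] - means[k]) ** 2
                    smooth = beta * sum(1 for n in neighbours if n != k)
                    return data + smooth

                best = min(classes, key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # stop once a full sweep changes nothing
            break
    return labels


# Hypothetical image: a bright 3x3 structure with one dim pixel in its center.
toy_image = [
    [0.1, 0.0, 0.1, 0.0, 0.1],
    [0.0, 0.9, 1.0, 0.9, 0.0],
    [0.1, 1.0, 0.4, 1.0, 0.1],
    [0.0, 0.9, 1.0, 0.9, 0.0],
    [0.1, 0.0, 0.1, 0.0, 0.1],
]
for row in icm_segment(toy_image):
    print(row)
```

Taken on its own, the center pixel (0.4) is closer to the dark class, but the smoothness term pulls it into the bright structure because all four of its neighbours carry the bright label. This is the kind of neighbourhood coupling that MRF-based schemes exploit.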

Along with proposing a method for the 3D segmentation of vertebral bodies and IVDs, Neubert et al. [71] showed that the shape parameters describing the extracted 3D volumes of lumbar IVDs allowed successful identification (100 % sensitivity, 98.3 % specificity) of IVDs with early degenerative changes. They also noted that the 28 subjects used were asymptomatic, and that the shape features seemed to work well for early detection of degeneration. Recently, the same group [72] evaluated the performance of 3D shape parameters, intensity features, and planar measurements of lumbar IVDs to detect degeneration in 28 asymptomatic and 11 symptomatic patients, concluding that intensity features are the most relevant in symptomatic patients.

In another exploratory work, Ghosh et al. [33] showed the utility of axial lumbar MRI for automatic diagnosis of abnormal discs using a Convolutional Neural Network for dynamic feature extraction and classification. They achieved 80.81 % accuracy (specificity of 85.29 % and sensitivity of 75.56 %) on 86 clinical cases (391 discs) using only one axial slice for each disc.

6 Summary

We provided a detailed description of the challenges and the current status of work towards a fully automated lumbar diagnostic system. Not only is there variability in scans due to varying modalities and parameter settings, there is also extreme inter-patient variability due to patient structure, age, gender and abnormalities. In addition, medical scans suffer from problems such as partial volume effects and intensity inhomogeneity, which make segmentation, labeling and diagnosis from medical imaging scans very challenging. While CT uses harmful radiation, it is cheaper than MRI. However, MRI provides better soft tissue detail and is a preferred modality for diagnosing the underlying causes of back pain. There have been significant efforts in the past few decades towards automatic labeling, segmentation and diagnosis from vertebral column CT and MRI scans. Approaches suggested in the current literature use various image processing, machine learning and computer vision techniques. However, in the direction of automatic diagnosis using real clinical MRI data, work has been rather limited due to the unavailability of data and the fact that clinical data are relatively more challenging.