Introduction

Lumbar disc herniation is a highly prevalent degenerative disease and one of the most frequent causes of low back pain [1]. Loss of intervertebral disc height leads to bulging of the annulus fibrosus into the spinal canal or the neural foramen. These degenerative alterations can lead to complications such as spinal cord or nerve root compression, resulting in pain or, in the worst case, paralysis [2]. Fast and reliable diagnosis and treatment are necessary to avoid poor outcomes [3].

Magnetic resonance imaging (MRI) is the imaging examination of choice for diagnosing lumbar disc herniation [4]. However, MRI availability is limited in small hospitals and outside business hours. In some patients, absolute or relative contraindications such as cochlear implants, pacemakers, claustrophobia or the inability to lie still because of pain may limit the use of MRI. Computed tomography (CT) is an alternative non-invasive imaging technique with high availability and a shorter examination time [5]. However, diagnosing lumbar disc herniation on CT is more challenging for radiologists than on MRI because of the low contrast between the intervertebral disc and the cerebrospinal fluid. According to a 2012 meta-analysis, whose most recent included study dated from 1993, the pooled summary estimates of sensitivity and specificity were moderate, at approximately 75% [5]. Another study from 1989, using surgery as the reference, reported an accuracy, false negative rate, false positive rate, sensitivity and specificity of 73.6%, 40.2%, 13.8%, 59.8% and 86.2%, respectively, for CT, and of 76.5%, 35.7%, 13.5%, 64.3% and 86.5% for MRI [6].

Since these studies, CT technology has evolved, particularly with respect to image quality. Notable advances include multidetector technology with helical image acquisition and iterative reconstruction (IR) of raw data. However, the diagnostic accuracy of iteratively reconstructed multidetector CT (MDCT) of the lumbar spine has not yet been evaluated.

The purpose of this study was to evaluate the accuracy of contemporary MDCT in the diagnosis of lumbar disc herniation, with MRI as the standard of reference. We hypothesized that MDCT performs similarly to routine MRI.

Material and methods

Patient population

Approval by the institutional ethics review board was waived for this retrospective analysis of CT and MRI data sets. All patients provided written informed consent before the clinically indicated examinations. All consecutive adult patients who underwent diagnostic MRI and CT of the lumbar spine within 4 weeks of each other between 1 January 2010 and 31 December 2014 were initially included. Of these, only patients referred with low back pain were retained. Patients were excluded if they had undergone bone- or metal-optimized CT, had discontinued the MRI examination, had undergone surgery between the examinations, or had images that were non-diagnostic because of severe metal or motion artefacts. The flow chart of the selection process is shown in Fig. 1.

Fig. 1

Patient selection flow chart. A total of 796 adult patients who received both CT and MRI of the lumbar spine within 4 weeks during a period of 4 years were initially included. After application of the exclusion criteria (reason for referral other than low back pain, examination with a bone- or metal-optimized protocol, examination on scanners other than the Somatom Definition Flash CT and Magnetom Aera MR, non-diagnostic images, discontinued examinations, surgery between the CT and MR examinations), 127 patients remained in the study population: 60 patients with FBP-reconstructed CT images and 67 patients with IR-reconstructed CT images

Computed tomography of lumbar spine

All patients underwent unenhanced CT of the clinically relevant vertebral levels on a Somatom Definition Flash 128-row dual-source multidetector CT scanner (Siemens Healthcare, Erlangen, Germany) in the supine position with helical acquisition. In accordance with the recommendations of the European Guidelines on Quality Criteria for Computed Tomography, the institutional standard protocol used a peak tube voltage of 140 kV, a tube current-time product of 50–260 mAs, a detector collimation of 38.4 mm and a pitch factor of 0.8. Raw data sets of 0.625 mm slice thickness were reconstructed with either filtered back projection (FBP) or the vendor’s statistical iterative reconstruction (IR) algorithm at strength 3. Five presets are available, with noise reduction increasing with ascending strength; the strength setting is not associated with the number of iteration loops. Higher IR strengths increase noise reduction but also cause blurring [7, 8]. Strength 3 was implemented in the institutional standard protocol, as recommended by the manufacturer, to achieve a good balance between noise reduction and image sharpness [9]. Multiplanar reformations (MPR) were reconstructed with a slice thickness of 2 mm.
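As a worked illustration of the pitch factor: under the usual IEC definition, pitch is the table feed per 360° gantry rotation divided by the total nominal beam collimation, so the protocol values above (pitch 0.8, collimation 38.4 mm) imply a table feed of about 30.7 mm per rotation. The Python sketch below assumes that definition; the function names and the 250 mm scan range are hypothetical and chosen only for illustration, and overranging is ignored.

```python
def table_feed_per_rotation(pitch: float, collimation_mm: float) -> float:
    """Table travel (mm) per 360-degree gantry rotation.

    Assumes the IEC definition: pitch = table feed per rotation / total nominal collimation.
    """
    return pitch * collimation_mm


def rotations_for_range(scan_length_mm: float, pitch: float, collimation_mm: float) -> float:
    """Approximate number of rotations needed to cover a scan range (overranging ignored)."""
    return scan_length_mm / table_feed_per_rotation(pitch, collimation_mm)


if __name__ == "__main__":
    # Protocol values stated in the text: pitch 0.8, detector collimation 38.4 mm.
    feed = table_feed_per_rotation(pitch=0.8, collimation_mm=38.4)
    print(f"Table feed: {feed:.2f} mm per rotation")  # Table feed: 30.72 mm per rotation
    # Hypothetical 250 mm scan range, for illustration only.
    print(f"Rotations for 250 mm: {rotations_for_range(250.0, 0.8, 38.4):.1f}")
```

A pitch below 1 means that the table advances by less than the collimated beam width per rotation; at 0.8, each rotation advances the table by 80% of it.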

Documentation of radiation dose

Dose parameters were recorded as volume computed tomography dose index (CTDIvol [mGy]) and dose–length product (DLP [mGy cm]). Effective dose [mSv] was calculated by multiplying the DLP by the abdominal conversion factor of 0.0153 mSv/(mGy cm) proposed by Deak et al. in accordance with ICRP Publication 103 [10, 11].
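The effective dose calculation above is a single multiplication, E = k × DLP. The Python sketch below is illustrative only; the function and constant names are hypothetical, and k = 0.0153 mSv/(mGy cm) is the abdominal conversion factor cited in the text. Because the conversion is linear, inverting it also recovers the mean DLP implied by a reported mean effective dose.

```python
# Abdominal conversion factor cited in the text (Deak et al., ICRP Publication 103).
ABDOMEN_K_FACTOR = 0.0153  # mSv per mGy cm


def effective_dose_msv(dlp_mgy_cm: float, k: float = ABDOMEN_K_FACTOR) -> float:
    """Estimated effective dose (mSv) = conversion factor k * dose-length product."""
    if dlp_mgy_cm < 0:
        raise ValueError("DLP must be non-negative")
    return k * dlp_mgy_cm


def dlp_from_effective_dose(effective_dose: float, k: float = ABDOMEN_K_FACTOR) -> float:
    """Invert the linear relation to recover the DLP (mGy cm) behind an effective dose."""
    return effective_dose / k


if __name__ == "__main__":
    # Hypothetical DLP of 500 mGy cm, for illustration only.
    print(f"{effective_dose_msv(500.0):.2f} mSv")  # 7.65 mSv
    # Mean DLP implied by the mean effective dose reported for the FBP group (8.22 mSv).
    print(f"{dlp_from_effective_dose(8.22):.0f} mGy cm")  # 537 mGy cm
```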

Magnetic resonance imaging of lumbar spine

The lumbar spine was examined in the supine position on a 1.5-T Magnetom Aera scanner (Siemens Healthcare, Erlangen, Germany) using the integrated 32-channel spine coil.

The standard institutional protocol included the following sequences:

  • Sagittal and transverse T2w TSE (TR/TE 5830/108 ms, slice thickness 3.5 mm)

  • Sagittal and transverse T1w TSE (TR/TE 791/14 ms, slice thickness 3.5 mm)

  • Sagittal short tau inversion recovery (STIR) (TR/TE 7560/63 ms, slice thickness 3.5 mm)

  • Contrast-enhanced fat-saturated T1w TSE (TR/TE 775/12 ms, slice thickness 3.5 mm), added depending on the clinical indication

Quantitative evaluation of image quality

For the evaluation of CT image quality, mean attenuation values (MAV [HU]) and standard deviations (SD) were measured using Syngo.Via VB10 (Siemens Healthcare). Circular regions of interest (ROI) were placed on axial images and sagittal MPR in normal, homogeneous-appearing ventral annulus fibrosus (AF), in cerebrospinal fluid (CSF) between the third lumbar and first sacral vertebrae, in a nerve root (NR) and in air. In CT, ROI were placed in nerve roots because the spinal cord is difficult to discriminate from the cerebrospinal fluid. The signal-to-noise ratio (SNR) was calculated as the quotient of the MAV and the corresponding SD measured in the ROI. The contrast-to-noise ratio (CNR), an indicator of how well two anatomical structures can be distinguished, was calculated as the difference between the MAV of a tissue of interest and the MAV of another tissue, divided by the image noise [12]. In this study, CNR was calculated for the differentiation of AF/CSF and NR/CSF relative to the measured SD of air.
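The SNR and CNR definitions above can be written as two short functions. This is a minimal Python sketch under stated assumptions: the `ROI` container and function names are hypothetical, the ROI values in the example are invented for illustration (they are not measurements from this study), and taking the absolute value of the mean difference is an assumption so that the CNR does not depend on the order in which the two tissues are passed.

```python
from dataclasses import dataclass


@dataclass
class ROI:
    """Summary statistics of one region of interest (hypothetical container)."""
    mean: float  # MAV in HU (CT) or mean signal intensity (MRI)
    sd: float    # standard deviation measured within the ROI


def snr(roi: ROI) -> float:
    """Signal-to-noise ratio: ROI mean divided by the SD of the same ROI."""
    return roi.mean / roi.sd


def cnr(tissue_a: ROI, tissue_b: ROI, air: ROI) -> float:
    """Contrast-to-noise ratio: difference of two ROI means divided by the SD of air.

    The absolute difference is an assumption so the result is order-independent.
    """
    return abs(tissue_a.mean - tissue_b.mean) / air.sd


if __name__ == "__main__":
    # Hypothetical ROI values, for illustration only.
    af, csf, air = ROI(80.0, 20.0), ROI(10.0, 15.0), ROI(-1000.0, 14.0)
    print(f"SNR(AF) = {snr(af):.1f}")                # SNR(AF) = 4.0
    print(f"CNR(AF/CSF) = {cnr(af, csf, air):.1f}")  # CNR(AF/CSF) = 5.0
```

Dividing by the SD of air rather than of either tissue gives both CNR values (AF/CSF and NR/CSF) a common noise reference.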

For the evaluation of MRI image quality, mean signal intensity (MSI) and SD were measured. Circular ROI were placed on sagittal images in the AF and CSF at the same locations as in the corresponding CT, in the spinal cord (SC) and in air. In MRI, ROI were placed in the spinal cord rather than in a nerve root to avoid partial volume effects. SNR and CNR were calculated as in CT, using SC values instead of NR values. An example of ROI placement is shown in Fig. 2.

Fig. 2

Example placement of regions of interest. A 36-year-old female patient with paraparesis and suspected intraspinal tumour or haematoma. Measurements were performed on soft-tissue-windowed sagittal CT images and on sagittal T2-weighted MR images, respectively. ROI were placed in the annulus fibrosus (1), cerebrospinal fluid (2), air (3), and nerve roots in CT or the spinal cord in MRI (4)

Qualitative evaluation of image quality and diagnosis of lumbar disc herniation and neural foramen stenosis

Two radiologists with 4 and 10 years of experience in musculoskeletal imaging independently evaluated the depiction of anatomical structures, overall image quality, and the degree of lumbar disc herniation and neural foramen stenosis in MR and CT images. Patient data were anonymized to prevent bias from identifying corresponding images. The readers focused particularly on the depiction of the intervertebral disc, neural foramen, nerve root and perithecal fat. Image quality was graded on a five-point scale, as shown in Table 2.

Table 1 Patient population data

Lumbar disc herniation was graded using the nominal scale proposed by Saleem et al. [2] (Table 2). Neural foramen stenosis was graded on an ordinal scale according to Wildermuth et al. [13] with modification [14] (Table 2). Clinical relevance was rated on a three-point scale. For lumbar disc herniation: 0 = normal (grade 0), 1 = protrusion of low clinical relevance (grades 1–3) and 2 = prolapse of high clinical relevance (grades 4 and 5). For neural foramen stenosis: 0 = normal (grade 0), 1 = neural foramen narrowing of low clinical relevance (grades 1 and 2) and 2 = nerve root compression of high clinical relevance (grades 3 and 4). Diagnostic confidence was graded on a five-point Likert scale (Table 2).
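The mapping from detailed grades to the three-point clinical relevance score is a simple lookup. The Python sketch below assumes integer grades (0–5 for disc herniation and 0–4 for neural foramen stenosis, as implied by the grade ranges above); the function names are hypothetical.

```python
def disc_herniation_relevance(grade: int) -> int:
    """Map a disc herniation grade (0-5) to clinical relevance (0-2)."""
    if not 0 <= grade <= 5:
        raise ValueError(f"disc herniation grade must be 0-5, got {grade}")
    if grade == 0:
        return 0  # normal
    return 1 if grade <= 3 else 2  # 1-3: protrusion (low), 4-5: prolapse (high)


def foramen_stenosis_relevance(grade: int) -> int:
    """Map a neural foramen stenosis grade (0-4) to clinical relevance (0-2)."""
    if not 0 <= grade <= 4:
        raise ValueError(f"neural foramen stenosis grade must be 0-4, got {grade}")
    if grade == 0:
        return 0  # normal
    return 1 if grade <= 2 else 2  # 1-2: narrowing (low), 3-4: nerve root compression (high)


if __name__ == "__main__":
    print([disc_herniation_relevance(g) for g in range(6)])   # [0, 1, 1, 1, 2, 2]
    print([foramen_stenosis_relevance(g) for g in range(5)])  # [0, 1, 1, 2, 2]
```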

Statistical analysis

All statistical analyses were carried out with IBM SPSS Statistics (IBM Corporation, Endicott, NY, USA).

The results of the quantitative evaluation of image quality were displayed as bar charts and compared using ANOVA. A p value of less than 0.05 was considered significant.

Table 2 Scales used for qualitative assessment and diagnosis

The results of the qualitative evaluation of image quality, amount of noise and depiction of anatomical structures were displayed as bar charts and compared using the Mann–Whitney U test. A p value of less than 0.05 was considered significant.
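For ordinal scores with many ties, such as the five-point ratings compared here, the Mann–Whitney U test ranks the pooled scores and compares the rank sums of the two groups. The Python sketch below uses only the standard library, midranks for ties and the tie-corrected normal approximation for a two-sided p value, without continuity correction. It is an illustration, not the SPSS implementation used in the study; statistical packages may report slightly different p values (for example, exact or continuity-corrected values). The function name and example scores are hypothetical.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = sorted(list(x) + list(y))
    counts = Counter(pooled)

    # Midranks: tied values share the average of the ranks they occupy.
    first_rank = {}
    for rank, value in enumerate(pooled, start=1):
        first_rank.setdefault(value, rank)
    midrank = {v: first_rank[v] + (counts[v] - 1) / 2 for v in counts}

    r1 = sum(midrank[v] for v in x)  # rank sum of group x
    u1 = r1 - n1 * (n1 + 1) / 2      # U statistic for group x

    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all observations are tied; the test is undefined")

    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return u1, z, p


if __name__ == "__main__":
    # Hypothetical image quality scores, for illustration only.
    fbp_scores = [3, 4, 4, 3, 4, 5, 3, 4]
    ir_scores = [5, 5, 4, 5, 5, 5, 4, 5]
    u, z, p = mann_whitney_u(fbp_scores, ir_scores)
    print(f"U = {u:.1f}, z = {z:.2f}, p = {p:.4f}")
```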

The clinical relevance scores in CT were compared with those in MRI as the standard of reference. Sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV) and accuracy (AC) were calculated for the detection of disc herniation and neuroforaminal stenosis using 4 × 4 contingency tables.

Inter-reader agreement was assessed with Cohen’s kappa (κ), with values defined as poor (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80) and excellent (0.81–1.00) [15]. Agreement between CT and MRI was evaluated with intraclass correlation coefficients (ICC), with values defined as poor (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80) and excellent (0.81–1.00).
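Unweighted Cohen’s kappa compares the observed agreement between two readers with the agreement expected by chance from each reader’s marginal rating frequencies, κ = (p_o − p_e)/(1 − p_e). The Python sketch below is a minimal illustration; the function names and toy ratings are hypothetical, and labelling negative values as "poor" is an assumption, since the categories above start at 0.00.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty sequences of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of both raters' marginal proportions, summed over categories.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_label(value):
    """Category boundaries as defined in the text; negative values treated as poor (assumption)."""
    if value <= 0.20:
        return "poor"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "good"
    return "excellent"


if __name__ == "__main__":
    reader_1 = [0, 1, 1, 0]  # hypothetical ratings
    reader_2 = [0, 1, 0, 0]
    k = cohens_kappa(reader_1, reader_2)
    print(k, agreement_label(k))  # 0.5 moderate
```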

Results

Of the 796 patients, 127 were finally included in the study population: examinations were reconstructed with FBP in 60 patients (24 male, 36 female) and with IR in 67 patients (38 male, 29 female). Patients in the FBP group were aged 27 to 87 years (average 65 years), and those in the IR group 33 to 90 years (average 63 years). The average maximum abdominal diameter measured on the CT scout images was 28.2 cm in the FBP group and 28.0 cm in the IR group. All patient population data are summarized in Table 1.

The per-patient prevalence of lumbar disc herniation was 85% in the FBP group and 85.7% in the IR group. The prevalence of disc protrusion and prolapse per patient and per disc is shown in Table 1.

The average CTDIvol was 21.96 mGy (average effective dose 8.22 mSv) in the FBP group and 25.68 mGy (average effective dose 10.62 mSv) in the IR group.

Image quality

Example FBP, IR and MR images are shown in Fig. 3. The results of the qualitative assessment of general image quality, amount of noise and depiction of anatomical structures are shown in Fig. 4. Average scores for general image quality were 3.85 for CT and 4.56 for MRI in the FBP group, and 4.72 for CT and 4.67 for MRI in the IR group. IR images received significantly higher scores than FBP images for general image quality and depiction of anatomical structures (all p < 0.01), and significantly lower scores for the amount of noise (p < 0.01). A Cohen’s kappa of 0.82 indicated excellent inter-reader agreement.

Fig. 3

Example images of FBP and IR CT images and corresponding MR images. A 65-year-old male patient with acute paraplegia and history of renal cell carcinoma. Included patient examinations were reconstructed with either FBP or IR. In comparison with FBP, IR shows less noise and visually sharper edges of anatomical structures, especially of the intervertebral disc. In MRI small disc protrusions of L4/5 and L5/S1 are confirmed

Fig. 4

Qualitative assessment of image quality, noise and depiction of anatomical structures. IR images received significantly higher scores than FBP images for image quality (median FBP 4.0, IR 5.0) and for depiction of the intervertebral disc (median FBP 3.5, IR 5.0), neural foramen (median FBP 4.0, IR 5.0), nerve root (median FBP 4.0, IR 5.0) and perithecal fat (median FBP 4.0, IR 5.0). Conversely, IR images received significantly lower noise scores than FBP images (median FBP 1.0, IR 0.0). The corresponding MR images of the FBP and IR groups received similar scores for image quality, noise and depiction of anatomical structures

The results of the quantitative assessment of image quality are shown in Tables 3 and 4 and in Fig. 5. SNR values were significantly higher for IR than for FBP, whereas CNR values did not differ significantly between IR and FBP.

Table 3 Average Hounsfield units and signal intensity of anatomical structures
Table 4 Signal-to-noise and contrast-to-noise ratios
Fig. 5

Signal-to-noise and contrast-to-noise ratios. Signal-to-noise ratios (SNR) were calculated for the annulus fibrosus (AF), cerebrospinal fluid (CSF) and nerve root (NR); contrast-to-noise ratios (CNR) were calculated for annulus fibrosus versus cerebrospinal fluid (AF/CSF) and nerve root versus cerebrospinal fluid (NR/CSF). For AF, CSF and NR, SNR was significantly higher with IR than with FBP, whereas CNR of AF/CSF and NR/CSF did not differ significantly between IR and FBP. The absolute values are shown in Table 4

Diagnostic accuracy

The sensitivity, specificity, positive and negative predictive values and accuracy are shown in Tables 5 and 6. In cases with large amounts of metallic material or deforming bony pathologies, diagnosing lumbar disc herniation and neuroforaminal stenosis was challenging for the readers. For the diagnosis of lumbar disc herniation and neuroforaminal stenosis, the readers were absolutely confident with all MR and IR images (average scores greater than 4.8) but only probably confident with FBP images (average score 4.3 for neuroforaminal stenosis and 4.1 for lumbar disc herniation) because of the higher noise level. Two examples of underestimated lumbar disc prolapse and protrusion are shown in Fig. 6a, b.

Table 5 Diagnostic accuracy per patient
Table 6 Diagnostic accuracy per height
Fig. 6

a Disc extrusion underestimated as broad-based protrusion in CT. A 73-year-old male patient with history of disc herniation L5/S1 and acute micturition disturbance and low back pain. Disc extrusion was underestimated as broad-based protrusion in CT because of low contrast between disc and cerebrospinal fluid. b Broad-based disc protrusion underestimated as subligamentous bulging in CT. A 70-year-old male patient with low back pain and paraesthesia of lower limbs. Broad-based disc protrusion with spinal canal stenosis was underestimated in CT because of dorsal spondylophytes

For the detection of lumbar disc herniation in general, MDCT found 583 true positives (TP), 448 true negatives (TN), 33 false positives (FP) and 55 false negatives (FN) per height, resulting in excellent SE, SP, PPV and AC (>90%), a high NPV (>80%) and good agreement with MRI (ICC 0.8).

For the detection of lumbar disc prolapses, MDCT found 63 TP, 1031 TN, 4 FP and 21 FN per height, resulting in excellent SP, PPV, NPV and AC (>90%), a moderately high SE (>70%) and moderate agreement with MRI (ICC 0.57).

For the detection of lumbar disc protrusions, MDCT found 595 TP, 499 TN, 18 FP and 7 FN per height, resulting in excellent SE, SP, PPV, NPV and AC (>90%) and good agreement with MRI (ICC 0.73).

For the detection of nerve root compressions, MDCT found 44 TP, 775 TN, 245 FP and 16 FN per height, resulting in an excellent NPV (>90%), moderately high SE, SP and AC (>70%), a low PPV (<20%) and good agreement with MRI (ICC 0.78).
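Each rate above follows directly from the four per-height counts. The Python sketch below recomputes them from the counts reported in this section; the `Counts` container and function names are hypothetical, and the formulas are the standard definitions (SE = TP/(TP+FN), SP = TN/(TN+FP), PPV = TP/(TP+FP), NPV = TN/(TN+FN), AC = (TP+TN)/total).

```python
from typing import NamedTuple


class Counts(NamedTuple):
    """Per-height counts of CT findings against MRI (hypothetical container)."""
    tp: int
    tn: int
    fp: int
    fn: int


def diagnostic_metrics(c: Counts) -> dict:
    """Standard 2 x 2 diagnostic accuracy measures."""
    return {
        "SE": c.tp / (c.tp + c.fn),
        "SP": c.tn / (c.tn + c.fp),
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
        "AC": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
    }


# Counts as reported in the text (TP, TN, FP, FN per height).
REPORTED = {
    "disc herniation (any)": Counts(583, 448, 33, 55),
    "disc prolapse": Counts(63, 1031, 4, 21),
    "disc protrusion": Counts(595, 499, 18, 7),
    "nerve root compression": Counts(44, 775, 245, 16),
}

if __name__ == "__main__":
    for finding, counts in REPORTED.items():
        metrics = diagnostic_metrics(counts)
        print(finding.ljust(24), " ".join(f"{k}={v:.1%}" for k, v in metrics.items()))
```

With these counts, the sketch yields, for example, an SE of 75.0% for disc prolapses and a PPV of 15.2% for nerve root compressions, consistent with the ranges stated above. It also makes the trade-off visible: with 245 false positives against 44 true positives, nerve root compression has a low PPV despite an NPV of 98.0%.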

Discussion

This study aimed to evaluate the diagnostic accuracy of contemporary CT. It revealed excellent specificity, accuracy and precision of MDCT for the detection of lumbar disc herniation. This is important because, compared with MRI, CT is highly available and often the only modality available out of hours. Because MRI is often performed only as a secondary examination, the diagnosis has frequently already been made on the basis of CT. MRI has higher operating costs and longer examination times than CT, which may cause additional costs and delay treatment. Our results suggest that MDCT is an accurate and precise alternative to MRI for confirming lumbar disc herniation. However, because the sensitivity of MDCT for disc prolapse was only moderately high, complementary MRI may be justified when CT is negative but clinical suspicion of a disc prolapse is strong, and when CT shows no lumbar disc herniation but differential diagnoses causing symptoms of spinal canal stenosis or nerve root compression, such as tumours or inflammatory diseases, are suspected. Severe spinal canal stenosis may lead to compression and oedematous alteration of the nerve roots. CT is known to be inferior to MRI in depicting the spinal cord and cauda equina. However, cauda equina syndrome (CES) is primarily a clinical diagnosis, and delayed treatment can result in increased morbidity [16, 17]. Although CES has a low incidence of 2–6% in patients with lumbar disc herniation [16], lumbar disc herniation is its most common cause [18]. Therefore, when immediate MRI is unavailable, CT may be sufficient for rapid identification of the affected level for surgical planning.

Another important aspect of diagnostic performance is the readers’ diagnostic confidence. High diagnostic confidence is important for a report that can guide the choice of therapeutic strategy. In this study, the readers were more confident with IR than with FBP images, and their confidence with IR images was similar to that with MR images. The lower diagnostic confidence with FBP images may be explained by the inferior image quality of FBP compared with IR. This inferiority is reflected in the significantly higher SNR values of IR than FBP in the quantitative assessment, and in the significantly worse scores given to FBP than to IR in the qualitative assessment of anatomical depiction, image quality and image noise. CNR values did not differ significantly between FBP and IR images; however, the lower noise level of IR may have influenced the subjective evaluation of depiction.

The diagnostic accuracy of CT in the detection of lumbar spine pathologies was regarded as relatively low (approximately 75%) in a recent meta-analysis [5]. However, to our knowledge, the most recent study investigating conventional CT was published in 1993.

Advances in CT technology, such as helical multidetector systems, may explain the difference between the results of van Rijn et al. [5] and those of this study. Hu et al. observed improved image quality and faster image acquisition with MDCT in a human phantom model [19], and Chawla concluded that MDCT improves the quality and speed of the assessment of bony spinal injuries [20].

In terms of image quality, the results of this study are concordant with those of similar studies of the bony cervical spine [21] and other organs [22–24].

For the detection of neuroforaminal stenosis, MDCT showed a low positive predictive value, moderately high sensitivity, specificity and accuracy, and an excellent negative predictive value. These results suggest that MDCT may be an appropriate tool for excluding neuroforaminal stenosis. However, the low positive predictive value of MDCT may be partly explained by the tendency of MRI to underestimate nerve root compressions [25].

In this study, MRI was chosen as the standard of reference. As a cross-sectional imaging technique and a frequently indicated examination, MRI offers good comparability with CT; however, because no definite gold standard is available, the diagnostic accuracy of MRI itself is uncertain, which may introduce a systematic error. For this reason, this study cannot be directly compared with the results of van Rijn et al., who reviewed studies using surgery as the standard of reference; however, it follows their recommendation for future research to evaluate accuracy at the patient level in an unselected population of patients with low back pain [5].

A limitation of this study is that the FBP and IR groups did not comprise the same individuals, because the CT examination protocol was changed from FBP to IR in 2011 while all other examination parameters were retained. The prevalence of disc protrusion, prolapse and neuroforaminal stenosis and all MRI results were comparable between the groups. Future studies of diagnostic accuracy should ideally evaluate images reconstructed from identical CT data sets.

Another limitation is that the IR algorithm used in this study (SAFIRE, Siemens Healthcare, Erlangen, Germany) is vendor-specific and available only for Somatom Definition CT scanners. IR algorithms of other vendors for other CT scanners were not available for this study.

As a further development of IR, model-based iterative reconstruction (MBIR) additionally models information such as the system optics of the CT scanner and the interaction of X-rays with the human body. MBIR has been shown to further increase image quality in cranial, cervical, thoracic, abdominal and angiographic CT imaging [22, 26–31] and has good potential for dose-reduced CT imaging [28, 32, 33]. An MBIR algorithm was not yet available for this study. However, with the increasing availability of MBIR-equipped CT scanners, further studies assessing the diagnostic accuracy of new image reconstruction technologies, or of dose-reduced imaging, in the detection of spinal pathologies are encouraged, to offset CT’s disadvantage of radiation exposure compared with MRI.

In summary, MDCT showed excellent accuracy and precision in the detection of lumbar disc herniation and neuroforaminal stenosis. The improvement in image quality achieved with IR increases the readers’ diagnostic confidence in the diagnosis of lumbar disc herniation and neuroforaminal stenosis. Insofar as the results are comparable with those of previous studies, further development of CT imaging techniques has improved diagnostic accuracy.