Introduction

Low back pain is one of the most common causes of disability for individuals of working age in developed countries [1, 32]. There are many causes of low back pain, and it is generally believed that degenerative disc disease (DDD) is one of the most prevalent [7, 27, 38]. Although the mechanisms by which DDD may cause low back pain are not clear, the severity of DDD is associated with onset of symptoms [29]. Degeneration of the intervertebral disc (IVD) is associated with progressive changes in material properties, matrix composition, and morphology. One accepted method for grading IVD degeneration severity is pathomorphological [42]. This method has been shown to be both reliable and repeatable. Biochemical methods have also been established [2]. These two methods have been shown to correlate well with each other and are often used in research where cadaveric tissue is analyzed. However, because they have limited clinical utility, a non-invasive system for grading severity of DDD would be useful.

Changes in IVD morphology and matrix composition are observable with various imaging modalities. It has been suggested that vertebral endplate changes may occur even prior to disc degeneration [4, 35, 36], and endplate changes such as sclerosis, Schmorl’s nodes, endplate shape alterations and calcifications can be observed on plain radiographs and on MR [13, 17, 37].

During the first phase of IVD degeneration, loss of proteoglycans and collagen type II has been observed [2]. Because the MRI signal depends on proton density, water content and the chemical environment of the protons [28], MR can detect the dehydration associated with proteoglycan denaturation as a loss of signal intensity on T2-weighted images. This method is commonly accepted, and the signal loss is mostly classified visually [41, 43]. In addition to direct changes in the disc, Modic noticed that signal intensity changes in the vertebral-body marrow adjacent to the endplate are also strongly associated with clinical symptoms related to DDD of the same level [23].

With progressive matrix alterations, changes in disc morphology also become visible in radiographs. One such change is a reduction in disc height, which can be quantified by comparison with a database of age-, body height- and gender-corrected normal values [11]. Another change is osteophyte formation, which is understood to be a compensation mechanism to distribute the increasing axial and shear forces (due to instability) on a larger articulating surface. Macnab et al. differentiated osteophytes of the spine into two types: traction and claw spurs. The first is thought to be the result of abnormal shear loads and is proposed as a sign of instability, while the second is proposed to arise from compressive loads and is a benign, age-related finding. The displacement of the intervertebral disc may lead to traction at the site of osseous attachment (Sharpey fibers) of the annulus fibrosus or short perivertebral ligaments to the vertebral surface. Osteophytes develop at this location, several millimeters from the disco-vertebral junction and are well depicted in radiographs [20, 33].

Nevertheless, even with such changes, MRI is believed to provide a more detailed picture of disc morphology, including the endplates. The axial deformation of the disc, termed disc extension beyond the interspace (DEBIT), has been classified by Jensen et al. into five categories: (1) intact, (2) bulge, (3) protrusion, (4) extrusion and (5) sequestration [16]. Viikkari graded the shape of the nucleus into four degrees: (1) round/oval, (2) extension into inner annulus, (3) extension into outer annulus and (4) extension beyond outer annulus, using sagittal MR slices [43]. Yu divided annular tears into three degrees: (I) concentric tears: fluid-filled space between the annular lamellae; (II) radial tears: rupture of all annular layers; and (III) transversal tears, plus rupture of Sharpey fibers. Types II and III are visible on MRI as high-intensity zones on T2-weighted images [44].

Because many of these individual parameters have been associated with severity of IVD degeneration, they are currently being examined and combined to develop a non-invasive grading scheme for imaging. Yet most of these studies use clinical data and cannot be compared to established grades of IVD degeneration [31]. Hence, the aim of this study was to compare various imaging parameters used to describe IVD degeneration with gross-morphological grades of IVD degeneration and biochemical IVD-matrix composition, and to create grading methods based only on significant parameters.

Methods

Specimen

Thirty-nine human lumbar discs (L1–S1) were collected from eight cadavers (age range: 19–86 years, average 54 years) and stored at 4°C, wrapped in saline-soaked gauze to prevent dehydration. Intersegmental ligaments and the paralumbar musculature were left intact for better contrast during MRI examination. The spines were prepared within 48 h postmortem to minimize water-content changes and enzymatic degeneration of the disc matrix. Donors with any pathology affecting the spine were excluded from this study. (Table 1)

Table 1 Spine donor data

Imaging

Prior to freezing, MRI scans were obtained with a GE 1.5-T unit (General Electric Medical Systems, Milwaukee, WI) and using a spine array coil (5×11 in.). The following spin-echo sequences were used:

  • Axial localizer (spoiled gradient)

  • Sagittal T1 (TE minimum full/TR 400)

  • Sagittal T2 (TE 100/TR 4000), sagittal proton density (TE 10–20/TR 2000)

  • Axial T1 (TE minimum full/TR 400) (thickness 4 mm/spacing 0.4 mm, matrix 512×512, FOV 26 cm)

Three observers (one radiologist, S.E.A., and two orthopedic surgeons, P.F.H. and L.M.B.) examined all images and assessed the following eight parameters:

  1. T2-intensity loss

  2. Modic changes

  3. Endplate cartilage loss

  4. DEBIT score

  5. Annular tears

  6. Osteophytes

  7. Nucleus pulposus (NP) shape

  8. Endplate integrity

Assessment was performed according to recommendations of other authors [16, 23, 26, 41, 43, 44]. (Table 2, Fig. 1)

Table 2 Magnetic resonance imaging parameters and scores. For axial and sagittal images: T2-intensity loss, Modic changes, osteophytes and nucleus pulposus (NP) shape correlated most significantly with degeneration. Modic type 1: decreased signal intensity on T1-weighted spin-echo images and increased signal intensity on T2-weighted images; Modic type 2: increased signal intensity on T1-weighted images and isointense or slightly increased signal intensity on T2-weighted images; DEBIT: disc extension beyond the interspace
Fig. 1 a Grade 1: segment L4–L5 of a 19-year-old female. The nucleus is gelatinous and round, or oval-shaped, on MR with a high T2-signal intensity. The annulus shows discrete fibrous lamellas. The endplate is intact, with a uniformly thick hyaline cartilage layer. The vertebral body still has rounded margins, with no Schmorl’s nodes or sclerosis; b grade 2: segment L2–L3 of a 53-year-old female. In the nucleus, peripherally, white fibrous tissue becomes visible. The annulus presents mucinous material between lamellas. The cartilage layer of the endplate becomes irregular; c grade 3: segment L3–L4 of a 53-year-old female. Consolidated fibrous tissue in the nucleus area and loss of the annular nuclear demarcation. On radiographs, reduction of the disc height and early osteophyte formation are visible. Frequently, rim calcification and sclerosis are also visible. On MRI, T2-signal intensity is reduced; the nucleus flattens and can extend into the inner annulus; isolated defects of the endplate and concentric tears of the annulus appear; d grade 4: segment L3–L4 of an 86-year-old female. Horizontal clefts parallel to the endplate are visible macroscopically and on MRI. Osteophytes, small Schmorl’s nodes and intranuclear calcification are common findings. The nucleus can extend into the outer annulus; e grade 5: segment L4–L5 of a 64-year-old male. Disc height is markedly reduced. The nucleus has disappeared totally (absent T2 signal), and big, sometimes-bridging, osteophytes appear. Diffuse and severe sclerosis and calcifications dominate

While the specimens were frozen, anteroposterior (A-P) and lateral radiographs were taken (Faxitron 804, Field Emission, McMinnville, OR, USA). The following settings were used: 60 kV for lateral view and 65 kV for A-P projection, exposure time 5 min, film-focus distance 53 cm with a 0.5 cm Al filter. The three observers classified the following parameters into three or four categories: (1) disc height, (2) endplate sclerosis, (3) osteophytes, (4) Schmorl’s nodes, (5) intradiscal calcifications and (6) endplate shape [11, 14, 17]. (Table 3, Fig. 1)

Table 3 Parameters (and scores) on plain anteroposterior and lateral radiographs; disc height, osteophytes and calcifications correlated significantly with degeneration

Morphometry

After thawing, functional spine units (FSU) consisting of both endplates and intervertebral disc were isolated and the posterior elements removed. A double-bladed cutting device was used to sagittally section the disc into three parts: (1) half, (2) 1 cm thick para-midsagittal slab and (3) the remainder. Digital photos were taken of the mid-sagittal section surface (Minolta RD175 digital camera, Minolta, Japan) and graded using a five-category grading scheme for assessing the gross morphology of the IVD [42]. (Table 4, Fig. 1)

Table 4 The Thompson grading scheme [42] for gross morphology of the human lumbar intervertebral disc

Biochemistry

Samples for biochemical analysis (water and proteoglycan content) were harvested from the nucleus pulposus of the mid-sagittal slab. Water content was calculated by weighing the samples before and after drying (110°C, 5 days) [3]. For proteoglycan (PG) content, samples were digested in proteinase K before measuring the glycosaminoglycan content with a dimethylmethylene blue dye-binding colorimetric assay at 530 nm (Perkin Elmer HTS 7000, Shelton, CT, USA) [10].
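The gravimetric water-content calculation described above can be illustrated with a minimal sketch. It assumes that water content is expressed as a percentage of the wet (pre-drying) mass, which is a common convention but is not stated in the text; the function name and the sample masses are hypothetical.

```python
def water_content_percent(wet_mass_mg: float, dry_mass_mg: float) -> float:
    """Water content as a percentage of wet mass (assumed convention).

    wet_mass_mg: sample mass before drying
    dry_mass_mg: sample mass after drying
    """
    if wet_mass_mg <= 0:
        raise ValueError("wet mass must be positive")
    if not 0 <= dry_mass_mg <= wet_mass_mg:
        raise ValueError("dry mass must lie between 0 and the wet mass")
    # Mass lost during drying is attributed entirely to water.
    return 100.0 * (wet_mass_mg - dry_mass_mg) / wet_mass_mg


# Hypothetical nucleus pulposus sample: 50 mg before and 10 mg after drying.
example_water_content = water_content_percent(50.0, 10.0)
```

If water content were instead reported relative to dry mass, only the denominator would change.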

Statistical analysis

General linear regression models were used to assess the correlation between radiographic or MRI grades, i.e., the sum of scores for selected parameters, and morphological grade. To determine the most significant parameters to be selected for each composite imaging score, a backward elimination was applied: parameters that were not significant at the 10% level were omitted until no further parameter could be rejected. This resulted in a composite score for X-ray (CRS) and MRI (CMS) using only the most significant parameters of the MRI and radiography results. Furthermore, a combined imaging score (CIS) using both CRS and CMS was determined, using the same backward-elimination technique. In order to assess the discriminating potential between distinct grades of the three scoring systems, both parametric (one-way ANOVA with Tukey’s post hoc testing) and non-parametric (Kruskal–Wallis with Mann–Whitney post hoc testing) statistical analyses were performed.
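A composite score, as defined above, is simply the sum of the scores of the selected parameters for one disc. The sketch below illustrates this with the parameters that this study reports as retained for the CRS and CMS. The dictionary keys, the function name and the example scores are hypothetical, and the sketch does not reproduce the backward elimination itself.

```python
from typing import Mapping, Sequence

# Parameters retained for each composite score according to the Results;
# the key spellings are illustrative assumptions.
CRS_PARAMETERS = ("disc_height_loss", "osteophytes", "calcifications")
CMS_PARAMETERS = ("t2_intensity_loss", "modic_changes", "osteophytes", "np_shape")


def composite_score(scores: Mapping[str, int], selected: Sequence[str]) -> int:
    """Sum the per-parameter scores of the selected parameters for one disc."""
    missing = [name for name in selected if name not in scores]
    if missing:
        raise KeyError(f"missing parameter scores: {missing}")
    return sum(scores[name] for name in selected)


# Hypothetical radiographic scores for a single disc (all six parameters).
xray_scores = {
    "disc_height_loss": 2,
    "endplate_sclerosis": 1,
    "osteophytes": 1,
    "schmorl_nodes": 0,
    "calcifications": 1,
    "endplate_shape": 0,
}
crs = composite_score(xray_scores, CRS_PARAMETERS)  # 2 + 1 + 1
```

Parameters that were eliminated (here, for example, endplate sclerosis) are still recorded but do not contribute to the sum.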

Simple linear regression with analysis of variance was used to examine the correlation of nuclear proteoglycan and H2O content with the MRI imaging parameters. One-way ANOVA with Tukey’s post hoc testing was used to assess differences between grades. For all statistical analyses, a significance level of p<0.05 was used.
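As an illustration of the simple linear regression mentioned above, the following sketch fits an ordinary least-squares line and reports the coefficient of determination. It is a generic textbook formulation, not the statistical software used in this study, and the T2 grades and water-content values are invented for demonstration.

```python
from typing import Sequence, Tuple


def simple_linear_regression(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) of the least-squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each have non-zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For one predictor, R^2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: T2-intensity loss grade vs water content (%).
t2_grade = [0, 1, 2]
water = [80.0, 75.0, 70.0]
slope, intercept, r_squared = simple_linear_regression(t2_grade, water)
```

A negative slope in such a fit would correspond to lower water content at higher grades of T2-intensity loss.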

Cohen’s kappa statistic is often used to test for interobserver reliability between a pair of observers. When there are more than two observers, this statistic is calculated for all possible pairs. However, in this study, due to the high cell-level variation, i.e., the number of grades and number of cases, it was not possible to calculate the kappa statistics. As an alternative, a reliability analysis using Cronbach’s alpha model was used to analyze interobserver agreement between all three observers simultaneously.
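For readers unfamiliar with Cronbach’s alpha, the sketch below shows the standard formula with the observers treated as "items" and the discs as "cases". The function name and the ratings are hypothetical, and the sketch is not the software routine used in this study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k observers rating the same n discs.

    ratings[i][j] is the score given by observer i to disc j.
    alpha = k / (k - 1) * (1 - sum of per-observer variances / variance of totals)
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two observers")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every observer must rate the same discs (at least 2)")
    # Sample variance of each observer's scores across discs.
    sum_item_variances = sum(variance(r) for r in ratings)
    # Sample variance of the per-disc totals summed over observers.
    totals = [sum(column) for column in zip(*ratings)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical composite scores from three observers for four discs.
observer_scores = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [2.0, 2.0, 3.0, 4.0],
]
alpha = cronbach_alpha(observer_scores)
```

Alpha equals 1 when all observers give identical scores and decreases as their ratings diverge.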

Results

X-Ray

Regression of the radiological grade with the morphological grade showed that disc height loss, osteophytes and intradiscal calcifications were most significantly correlated with the morphological grade, accounting for 80% of the total variation, or 96% of the variation within our specimen population. Out of these three parameters, disc height loss correlated with the morphological grade much more significantly (p<0.001) than the other two parameters (osteophytes p<0.014 and calcifications p<0.010). The correlation of this composite radiographic score (CRS) with the morphological grade of degeneration was, again, highly significant (p<0.001), with a Pearson correlation coefficient of 0.77 (Fig. 2). Furthermore, discs of different morphological grades had significantly different composite radiographic scores (p<0.001 for both parametric and non-parametric tests). However, both parametric and non-parametric post hoc testing showed that this discrimination was only significant between discs of grades 1 or 2 vs grade 3 or higher, and grade 3 vs grades 4 or 5.

Fig. 2 Composite radiographic score (CRS) vs morphological degeneration. Data points represent the mean of the sum of parameters: disc height loss, osteophytes and intradiscal calcifications, with error bars representing one standard error of the mean

MRI

Of all MRI score parameters, only T2-intensity loss (p<0.001), Modic changes (p=0.019) and osteophytes (p=0.016) were significantly correlated with morphological grade, accounting for 83% of the total variation in the data, or 98% of the variation within our specimen population. Parametric and non-parametric analyses revealed that an MRI score with only these three parameters did not discriminate well between different morphological grades of degeneration, with significantly different MRI scores only between grades 1, 2 or 3 vs grades 4 or 5 (p<0.001). However, a composite MRI score (CMS) also including NP shape was still significantly correlated with morphological grades of degeneration (p<0.001), and this score exhibited significant differences between the higher grades of degeneration, i.e., grades 1, 2 and 3 were not different from each other but were significantly different from grades 4 (p<0.001) and 5 (p<0.001). Grades 4 and 5 were also significantly different from each other (p=0.01) (Fig. 3).

Fig. 3 Composite MRI score (CMS) vs morphological degeneration. Data points represent the mean of the sum of parameters: Modic changes, osteophytes, nucleus shape and T2-intensity loss, with error bars representing one standard error of the mean

MR findings and biochemical assessments

T2-intensity loss correlated significantly (p<0.001) with both water and PG content (Fig. 4). However, the water and PG content of all three T2-intensity loss levels did not significantly differ from each other. There was only a significant difference in the water (p<0.011) and PG (p<0.003) content of normal T2-intensity discs as compared with those with either moderate or marked T2-intensity loss, but not between the two severities. These results should be interpreted with care, as the number of discs with marked T2-intensity loss was small (n≤4), especially for water content, for which a number of outliers had to be rejected due to desiccation during preparation, caused by the extremely dry ambient conditions in Davos (1,560 m altitude).

Fig. 4 Mean proteoglycan content of the nucleus pulposus vs T2-signal intensity loss, with error bars representing one standard error of the mean

Scores

The composite radiographic (CRS) and composite MRI (CMS) scores were merged to see if this combined imaging score (CIS) would enhance the correlation with the morphological grade. Regression showed that the combined imaging score was better correlated to the morphological grade (p<0.001, R²=0.775) than either the composite radiographic score (p<0.001, R²=0.642) or the composite MRI score (p<0.001, R²=0.696) alone. Parametric and non-parametric analyses showed that the combined imaging score is not much better than the composite radiographic score at distinguishing individual morphological grades; i.e., only grades 1 vs 2 and grades 4 vs 5 were not significantly different (Fig. 5). Again, this result may simply be due to the limited numbers of disc samples available in grades 1 and 5. In a backward elimination of the regression for the combined imaging score, the parameters Modic changes (MRI), calcifications (X-ray) and T2-intensity loss (MRI) accounted for 87% of the variability, but this reduced score did not enhance discrimination between grades. Finally, the interobserver-agreement analysis using Cronbach’s alpha model of reliability showed a high correlation for all three observers, i.e., low interobserver error. The intra-class correlation coefficients (alpha values) ranged from 0.95 for CRS and CMS to 0.97 for CIS and are considered highly significant. Interestingly, the interobserver error was markedly smaller between the two surgeons than it was for a comparison with the radiologist (Table 5, Table 6).

Fig. 5 Composite imaging score (CIS) vs morphological degeneration. Data points represent the mean of the sum of parameters: disc height loss (X-ray), osteophytes (X-ray), calcifications (X-ray), T2-intensity loss (MRI), Modic changes (MRI) and nucleus shape (MRI), with error bars representing one standard error of the mean

Table 5 Disc degeneration in relation to radiography/MRI scores
Table 6 Analysis of interobserver agreement on intervertebral disc degeneration grading, using Cronbach’s alpha model of reliability (CRS composite X-ray score, CMS composite MRI score, CIS combined imaging score)

Discussion

In this study, we investigated eight parameters of degeneration for MR and six for conventional radiographs, and correlated these findings with the biochemical and morphological degree of IVD degeneration. The frequency and distribution of the pathological findings in our data population was comparable to the results of earlier studies used as references [15, 22, 26, 41, 43, 44]. By elimination of less important/significant parameters, we created an easy-to-use grading scheme for radiograph, MR and combined imaging that allows accurate and non-invasive classification of IVD degeneration.

There have been other proposed classifications for IVD degeneration in plain radiography as well as in MRI. Pfirrmann et al. have proposed an MRI score based on homogeneity of the NP, distinction of NP as compared with annulus fibrosus, T2-signal intensity and disc height on fast spin-echo MRI [31]. They have shown that this classification system, tested on actual patient images, has good reliability. However, accuracy, e.g., with respect to morphological grade or symptoms, was not evaluated. In comparison, our results also show that T2-signal intensity, nuclear calcifications, NP shape (extensions into annulus) and disc height are important. By contrast, however, the other parameters that most significantly correlated with morphological grades of degeneration were specific to the vertebral body, i.e., Modic changes, osteophytes and calcifications of the rim. Hence, neglecting the adjacent osseous structure may limit our interpretation of images when assessing IVD degeneration severity.

Surprisingly, some pathological MR findings such as annular tears, DEBIT, Schmorl’s nodes, sclerosis and endplate integrity in our data did not correlate as well with the morphological appearance, used as the gold standard. The reasons may be manifold, but one reason is that, in our sample population of donors without a history of spinal disorders, these findings were rare or difficult to detect. Annular tears were detected in six out of 39 examined discs (five concentric, one radial), which is the expected frequency for asymptomatic individuals [15]. Osti showed that normal MRI does not exclude significant changes in the peripheral structure of the intervertebral disc and that discography is more accurate [25]. With nine bulges, three protrusions and one extrusion out of 39 discs, DEBIT was noted at a slightly lower prevalence than expected for our donor population compared with the literature [15]. Sclerosis was observed in only six out of 78 vertebral endplates (four moderate, two severe). Katz described five different types of endplate sclerosis, which did not always show the classic radiographic band-like pattern along the vertebral endplate. The atypical patterns may be difficult to recognize as degenerative in origin, since infection, Paget’s disease, metabolic disorders and metastasis can manifest as radiographical sclerosis [17]. Additional T2-weighted MRI allows differentiation between discogenic (degenerative) and infection-induced sclerosis [40]. Endplate cartilage loss, which is associated with aging and degeneration, was only seen in five of 78 samples examined by MR [34]. Finally, Schmorl’s nodes are also difficult to depict on plain lateral radiographs as corroborated by our results (one of 78 endplates on radiographs vs 13 of 78 on MRs, 7.7%) and others (0–47%) [9, 13, 21]. However, their value for disc degeneration remains unclear and controversial. Whereas in some older studies a relation of the appearance of Schmorl’s nodes to degeneration was observed [14], more recent work with additional MRI as well as our results cannot confirm this correlation [9, 13, 30]. The radiological pattern of Schmorl’s nodes seems multifactorial in origin, and degeneration, among other factors such as predisposition and axial traumas, is only one possible pathomechanism. Thus, before parameters such as annular tears, Schmorl’s nodes, sclerosis and endplate integrity can be used as indicators of IVD degeneration, further studies are needed to evaluate and differentiate their importance for degeneration.

The difficulties we faced in distinguishing extreme grades (Thompson grades 1/2 and 4/5) may be due to the low number of samples available at these grades. On the other hand, the clinically relevant bandwidth, e.g., the onset of discogenic symptoms, is between grades 2 and 4 (mild-to-severe degeneration) [27, 38]. Surprisingly, the composite radiographic score (and the combined imaging score) was better at differentiating degeneration grades than the composite MRI score alone (compare Figs. 2, 3 and 5). This may indicate that changes of the vertebral structure occur in an early phase of degeneration. Still, these radiographically defined parameters can also be quantified by MRI, and MR data should be used whenever available. The use of quantitative MR for T2-signal intensity analysis would probably improve the discrimination potential of the composite MRI score; on the other hand, we aimed to reflect routine clinical conditions [3].

The results of the biochemical evaluation were within the range of former studies, and proteoglycan and water content correlated as expected to the T2-signal intensity loss [2]. Again, the poor visual three-grade classification for T2-signal intensity limited the detection of early signal loss. The use of a computed signal-intensity measurement with a region-of-interest capability would be an alternative for a more accurate classification [26].

The clinical relevance of our scores remains open. Many studies reveal a high rate of abnormal findings in MR and radiographs in asymptomatic people [5, 15, 26, 39]. The parameters that showed the strongest correlation with morphological degeneration, and were therefore included in the scores, are associated with low back pain [6, 8, 12, 18, 19, 24, 26]. Thus, a correlation of the scores with clinical symptoms is likely but needs to be examined in further studies.

Conclusions

Selected imaging parameters and a newly created, repeatable scoring scheme were found to correlate with disc degeneration as determined morphologically. Surprisingly, in our sample population, radiographic parameters were better able to distinguish different stages of degeneration, whereas MRI could only detect advanced stages of disc degeneration. We conclude that conventional radiography remains a cost-effective, non-invasive in vivo grading method to detect early disc degeneration and, combined with MRI, correlates best with morphometrical and biochemical assessment of disc degeneration.