Introduction

Intervertebral disc (IVD) degeneration is a universal and natural process. As we age, our discs lose elasticity and begin to show signs of degeneration. Proteoglycans, which are water-retaining molecules, diminish within the degenerating discs. With the decrease in water content, the discs lose their ability to resist compression and torque. Eventually the discs begin to lose height and structural integrity. The cascade of events that unfolds as the discs degenerate can progress to a pathologic state leading to chronic pain [1].

Physicians routinely use magnetic resonance imaging (MRI) to assist in the clinical diagnosis of disc degeneration [2, 3]. Several MRI-based metrics have been developed to facilitate communication between clinicians concerning the degree of degeneration in patients. The current gold standard is the 5-grade system developed by Pfirrmann et al. [3], who demonstrated that discs could be rated by independent raters with reasonably high confidence and reproducibility. Although others have tried to expand and improve upon this technique [4], the grading remains qualitative in nature and subject to bias. Most discs in Pfirrmann’s study were rated in agreement, but some ratings differed by one grade and, in one rating, by two grades. Such disagreements could have a large impact on the choice of treatment. Moreover, such generalized grading schemes might miss the fact that two discs rated with the same grade could actually be in different phases of pathologic change. Therefore, it is imperative to discover biomarkers that supplement the Pfirrmann grading system.

Pfirrmann classification includes visual assessment of disc height and other features seen in T2-weighted (T2W) images. A disc that maintained a nominal height would be given a lower score, provided other aspects of the visual inspection complemented that finding. Several other studies have also documented an association between disc height loss and disc degeneration [3, 5–7]. However, there has not been any comprehensive analysis of the link between loss of structural integrity and various other physiological and morphological changes.

To address the need for such imaging biomarkers, we proposed a new method to quantify disc height loss and studied its association with degeneration. We also investigated the associations between disc height loss and changes in apparent diffusion coefficient (ADC) and voxel intensities in T2W images, because there is strong evidence that disc height loss, restricted water diffusion in the nucleus pulposus (NP) and dehydration are good indicators of degeneration and that they can be probed indirectly using MRI [5–12].

Materials and methods

Subjects and MRI protocol

Fifty-one adult participants took part in this study. The study was approved by the IRB and written consent was obtained from all participants. The demographic characteristics of the study cohort are given in Table 1.

Table 1 Demographic characteristic of the study cohort

Images of the lumbar spine were acquired using a 3T GE Discovery MR750 (Waukesha, WI, USA) MRI system. All images were acquired with a CTL-spine coil, FOV = 310 mm and 16 sagittal slices with 3 mm thickness. A conventional T2W MRI was acquired using a Fast Spin-Echo sequence with TR/TE = 4500 ms/104 ms, ETL = 24 and 1 mm in-plane resolution. Diffusion weighted images (DWI) were acquired using a single-shot EPI with TR/TE = 2100 ms/75 ms, NEX = 12, 2.4 mm in-plane resolution and b = 0, 600 s/mm².

T2W images were reviewed by two expert radiologists and a trained medical student for Pfirrmann classification of the lumbar IVDs. If there was a disagreement between the raters, the median score was assigned to the disc.

Multi-parametric quantitative MRI of disc degeneration

ADC calculation

ADC is a measure of bulk diffusion of water molecules and there is compelling evidence in published data indicating its use for assessment of disc degeneration [8–11, 13]. It is an indirect measure of some pathologic changes in the NP such as fibrosis, cross-linking, and dehydration.

The ADC values were calculated in each voxel from the motion-corrected DWI images using Eq. 1:

$$S_{1} \, = \,S_{0} \cdot \exp ( - b_{1} \cdot {\text{ADC}})$$
(1)

where S1 and S0 are the signal intensities measured with and without the diffusion weighting, respectively. Regions of interest (ROI) that encompass the whole NP were drawn on the S0 image and voxel-averaged ADC values were calculated for each IVD.
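As an illustration of Eq. 1, the per-voxel ADC can be obtained by solving for ADC, i.e. ADC = −ln(S1/S0)/b1, and then averaging over the ROI. The following is a minimal Python sketch, not the processing pipeline used in this study. The function names and the voxel intensities are hypothetical, b = 600 s/mm² is taken from the protocol above, and the choice to skip non-positive voxels is an assumption made here to keep the logarithm defined.

```python
import math


def adc_from_signals(s0, s1, b=600.0):
    """Invert Eq. 1: ADC = -ln(S1 / S0) / b (mm^2/s when b is in s/mm^2)."""
    if s0 <= 0 or s1 <= 0:
        # Assumption: background/noise voxels are skipped (logarithm undefined).
        return float("nan")
    return -math.log(s1 / s0) / b


def mean_roi_adc(s0_voxels, s1_voxels, b=600.0):
    """Voxel-averaged ADC over an ROI, given paired S0 and S1 intensities."""
    values = [adc_from_signals(s0, s1, b) for s0, s1 in zip(s0_voxels, s1_voxels)]
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical NP voxels, simulated with a true ADC of 1.5e-3 mm^2/s.
s0 = [1000.0, 980.0, 1020.0]
s1 = [s * math.exp(-600.0 * 1.5e-3) for s in s0]
print(mean_roi_adc(s0, s1))
```

Because the example signals are simulated from a known ADC, the printed value should recover that ADC up to floating-point error.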

Normalized disc signal intensity in T2W images

T2W images reflect the progression of degeneration very well because the voxel intensities show good correlation with water and proteoglycan content in the IVDs [13, 14]. It is also used as one of the features in Pfirrmann classification. Therefore, we derived a semi-quantitative measure from the T2W images. First, the mean voxel intensity in each NP was calculated. Then another ROI was drawn in a uniform region of the gray matter of the spinal cord and the mean intensity was calculated. Then, the mean intensity in each disc was divided by the mean intensity in the spinal cord gray matter to obtain a metric that can be compared across subjects. Based on earlier reports [15], we expected that the inter-subject variations in T2W signal in spinal cord gray matter would be minimal in subjects with no spinal cord anomalies.
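The normalization described above amounts to a ratio of two ROI means. A minimal Python sketch is given below. The function name and the intensity values are hypothetical, and the ROI voxels are assumed to have been extracted already.

```python
from statistics import mean


def normalized_t2_intensity(np_voxels, cord_voxels):
    """Mean NP intensity divided by mean spinal cord gray matter intensity."""
    cord_mean = mean(cord_voxels)
    if cord_mean == 0:
        raise ValueError("reference ROI has zero mean intensity")
    return mean(np_voxels) / cord_mean


# Hypothetical T2W voxel intensities (arbitrary scanner units).
ratio = normalized_t2_intensity([100, 120], [400, 480])
print(ratio)
```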

Disc height index

Here we propose a new method to assess disc height loss with degeneration. Part of this method was modeled after a technique that was originally proposed by Lu et al. [16] and later refined by Masuda et al. [17]. They introduced a measurement called disc height index (DHI), which helps minimize the inter-subject variations in overall size of the vertebral column, enabling inter-subject analyses. The proximal (PV) and distal (DV) vertebral body height and IVD height (DH) were measured from the anterior (1), middle (2), and posterior (3) portions of each respective disc level on T2W images (a sketch is shown in Fig. 1). The measurements were performed on the midsagittal slice. The corners of the vertebral bodies and the midpoints of the endplates were marked using an approach similar to the one described by Inoue et al. [18] and Quint et al. [19]. Then, the measurement lines were drawn between those landmarks as depicted in Fig. 1 and distance measurements were taken. DHI was determined using Eq. 2:

Fig. 1 A sketch of measurements taken from the vertebral bodies and IVDs for the DHI calculation given in Eq. 2

$${\text{DHI}}\, = \,2\,*\,\frac{{{\text{DH}}1 \, + \, {\text{DH}}2 \, + \, {\text{DH}}3}}{{\left( {{\text{PV}}1\, + \, {\text{PV}}2 \, + \, {\text{PV}}3} \right)\, + \, \left( {{\text{DV}}1\, + \, {\text{DV}}2\, + \, {\text{DV}}3} \right)}}.$$
(2)
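Eq. 2 can be written as a short function. The sketch below is illustrative only. The function name and the measurement values (in mm) are hypothetical, and each argument is assumed to hold the anterior, middle and posterior measurements in that order.

```python
def disc_height_index(dh, pv, dv):
    """Eq. 2: DHI = 2 * (DH1 + DH2 + DH3) / ((PV1 + PV2 + PV3) + (DV1 + DV2 + DV3))."""
    if not (len(dh) == len(pv) == len(dv) == 3):
        raise ValueError("expected anterior, middle and posterior measurements")
    return 2.0 * sum(dh) / (sum(pv) + sum(dv))


# Hypothetical measurements in mm (anterior, middle, posterior).
dh = [10.0, 12.0, 8.0]   # disc heights
pv = [28.0, 27.0, 29.0]  # proximal vertebral body heights
dv = [29.0, 28.0, 29.0]  # distal vertebral body heights
dhi = disc_height_index(dh, pv, dv)
print(round(dhi, 4))
```

Since the numerator and denominator are both lengths, DHI is dimensionless, which is what allows comparison across subjects of different overall spine size.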

Although the DHI calculation given in Eq. 2 and similar methods proposed by Haefeli et al. [7] or Inoue et al. [18] are useful, they do not indicate how much a degenerating disc has collapsed with respect to a healthy one. Therefore, we first established normative data for each lumbar IVD by picking discs with no discernible degeneration (grade 1 or 2) from the subjects. The DHI scores of all healthy discs at a particular level (e.g., L1/L2) were averaged so that each level was assigned a mean and standard deviation representing the healthy population (Table 2). Then, each disc of every subject was given a DHIdiff score based on how much its DHI deviated (in units of standard deviations, σ) from the mean DHI of the corresponding healthy disc. The discs selected for this normative data included those from both sexes, different races and participants with different heights (155–193 cm, mean = 173 cm) and BMI (18.7–41.5, mean = 26.3). Therefore, the standard deviation represents the most probable variations due to these main factors. Deviations beyond this standard deviation might represent a pathologic condition.
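The normative step and the DHIdiff score can be sketched as follows. This is a hedged illustration, not the authors' implementation. The function names and DHI values are hypothetical, the sample standard deviation (n − 1) is assumed, and the sign convention (negative values mean a disc thinner than the healthy mean) is an assumption made here.

```python
from statistics import mean, stdev


def healthy_norms(dhi_by_level):
    """Map each lumbar level to (mean, sample SD) of DHI over its healthy discs."""
    return {level: (mean(v), stdev(v)) for level, v in dhi_by_level.items()}


def dhi_diff(dhi, level, norms):
    """Deviation of a disc's DHI from the healthy mean, in units of sigma.

    Assumed sign convention: negative means thinner than the healthy mean.
    """
    mu, sigma = norms[level]
    return (dhi - mu) / sigma


# Hypothetical DHI values of grade 1-2 discs at one level.
norms = healthy_norms({"L1/L2": [0.30, 0.32, 0.34]})
score = dhi_diff(0.29, "L1/L2", norms)
print(round(score, 2))
```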

Table 2 Number (N), mean and standard deviation (σ) of DHI of healthy discs for each lumbar level

Statistical analysis

SPSS software version 21 (IBM, Armonk, NY, USA) was used for statistical analysis. A significance level of 0.05 was set for all analyses. First, one-way ANOVA was conducted to compare DHI scores across Pfirrmann grades. Then, pairwise comparisons were done using the independent-samples t test, assuming unequal variances.

Then, the associations between DHI scores and ADC values or normalized T2 intensity values were analyzed. Seven discs that did not have ADC values were excluded from this analysis (one subject did not undergo the DWI scan and the NP of two discs was not visible in the S0 images due to severe degeneration). To study the associations between DHI loss, Pfirrmann grades and ADC or T2 changes, the discs were binned based on their DHI scores (with 0.5σ increments) and the average ADC, normalized T2W intensity and Pfirrmann grade were calculated for each bin. Then, the differences between group means were compared using the one-way ANOVA test for ADC and T2 and the Kruskal–Wallis test for Pfirrmann grades.
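The binning step and the unequal-variance t test can be sketched in plain Python. The analysis in this study was run in SPSS, so the code below is only an illustration under stated assumptions: the function names and data are hypothetical, bins are assumed to be half-open intervals labeled by their lower edge, and only the Welch t statistic and its Welch–Satterthwaite degrees of freedom are computed (obtaining a p value would additionally require the t distribution).

```python
import math
from collections import defaultdict
from statistics import mean, variance


def bin_label(dhi_diff, width=0.5):
    """Lower edge (in sigma) of the bin of the given width containing dhi_diff."""
    return math.floor(dhi_diff / width) * width


def bin_means(records, width=0.5):
    """records: iterable of (dhi_diff, value). Returns {bin lower edge: mean value}."""
    bins = defaultdict(list)
    for d, v in records:
        bins[bin_label(d, width)].append(v)
    return {edge: mean(vals) for edge, vals in sorted(bins.items())}


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical (DHIdiff, ADC in mm^2/s) pairs.
records = [(-0.2, 1.5e-3), (-0.4, 1.6e-3), (-1.7, 1.1e-3), (-1.9, 1.0e-3)]
print(bin_means(records))
print(welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```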

Results

DHI and Pfirrmann

Pfirrmann classification suggests that grade 2 and 3 discs should not have any discernible disc height loss, while grade 4 and 5 should show significant collapse. This visual assessment was first validated using a box plot of DHIdiff against Pfirrmann grades (Fig. 2). Height loss was not significant in grade 2 discs, but noticeable loss could be seen with grade 3 discs. Those discs had a median DHIdiff of about one σ. Further decline can be seen with grade 4 and 5 discs. One-way ANOVA analysis demonstrated significant differences in DHIdiff with increasing Pfirrmann grades (p < 0.001). Results of pairwise comparisons are shown in Fig. 2. Here, data from grade 1 and 2 discs were combined and compared to discs with higher grades since they are considered as healthy discs and their DHI values were not significantly different in our analysis (p = 0.564). Significant loss of disc height with increasing degree of degeneration was evident after grade 2.

Fig. 2 Box plot of DHIdiff (in units of standard deviation) plotted against Pfirrmann grades. Each box plot shows the median, 1st and 3rd quartiles and minimum and maximum values of DHIdiff for each grade. One-way ANOVA analysis demonstrated significant differences in DHIdiff with increasing Pfirrmann grades (p < 0.001)

The relation between Pfirrmann grades and disc height loss was further investigated by calculating the average Pfirrmann grades in each DHI bin. The resulting bar graph shows that the average Pfirrmann grades were relatively low (around 2.2) until DHI loss was more than 1.5σ (Fig. 3). At this threshold, a noticeable increase in average grade was observed. Pairwise comparisons of the first 6 bins (DHI > μ − 1.5σ) were done using the Kruskal–Wallis test and no significant differences were found (p = 0.223). Then, discs with DHI > μ − 1.5σ and DHI ≤ μ − 1.5σ were separated into two groups and compared using the same test, and a significant difference was found (p < 0.001).

Fig. 3 Mean Pfirrmann grades for discs with different levels of DHI loss. For instance, DHI < μ − σ means that the discs in that bin are at least one standard deviation thinner than the mean (μ) of healthy discs. A significant difference in Pfirrmann grades was found between the discs with DHI > μ − 1.5σ and DHI ≤ μ − 1.5σ (p < 0.001)

DHI and T2 intensity

Normalized T2W voxel intensity remained consistent at just under 0.23 (arbitrary units) until the DHIdiff was about 1.5σ (one-way ANOVA, p = 0.959). After this cutoff point, however, there is a substantial drop in intensity (Fig. 4). A t test comparing discs above and below this cutoff demonstrated that the T2W signal drop after this threshold was highly significant (p < 0.001, without assuming equal variances).

Fig. 4 Mean normalized T2W voxel intensities (in arbitrary units) for discs with different levels of DHI loss (in units of σ). The T2W signal drop after 1.5σ DHI loss was highly significant (p < 0.001)

DHI and ADC

A trend similar to T2W intensity changes was also observed for ADC values (Fig. 5). The ADC values in discs remained around 1.5 × 10⁻³ mm²/s for DHI > μ − 1.5σ (one-way ANOVA, p = 0.976). However, after the 1.5σ mark, a significant drop in ADC values was noted (p = 0.001).

Fig. 5 Mean ADC in disc NP for discs with different levels of DHI loss. Similar to the trends observed for T2W voxel intensities, a significant drop in ADC values was noted after 1.5σ DHI loss (p = 0.001)

A summary of descriptive statistics for the quantitative MRI measures is given in Table 3.

Table 3 Mean ± standard deviation for Pfirrmann grades, normalized T2 intensity and ADC (in μm²/s) within each DHI bin

Multi-parametric assessment of disc degeneration

To illustrate the potential of these MRI biomarkers in the clinical evaluation of disc degeneration, data from a few subjects were investigated individually. In those subjects, some discs were rated by the radiologists as normal or mildly degenerated (Pfirrmann grade 2 or 3), but the DHIdiff, ADC and normalized T2W voxel intensity values suggested that these discs had already begun to undergo significant pathologic changes. Data from one of those subjects are presented in Fig. 6 and Table 4. For instance, the L3/L4 disc was rated grade 3, but it had a lower ADC value, almost the same disc height loss and a slightly higher T2W signal compared to the L4/L5 disc, which was graded as 4. Therefore, the imaging metrics indicated that the L3/L4 disc might be closer to a grade 4 disc than to a typical grade 3 disc (e.g., the L2/L3 disc in the same subject).

Fig. 6 T2W image of lumbar spine. Pfirrmann grades are labeled next to each disc

Table 4 MRI measurements from intervertebral discs shown in Fig. 6

Discussion

DHI and DHIdiff are novel biomarkers that could be instrumental in the evaluation of IVD degeneration. Accurate diagnoses depend on metrics that are easily reproducible, clear to interpret, and fast to perform. We propose that DHI fulfills these criteria. These new metrics assess height loss in vivo and correlate well with several pathological processes probed by other MRI techniques. We observed a threshold of about 1.5σ, beyond which indicators of degenerative processes accelerated. Therefore, the DHI, in combination with other MRI biomarkers, has the potential to help clinicians assess the condition of degenerating discs more accurately.

Although Pfirrmann grading has greatly improved and standardized the classification of degenerating discs, it is a subjective assessment and introduces an unknown amount of bias. If Pfirrmann classification were supplemented by unbiased quantitative metrics, physicians would be equipped with more accurate data to diagnose the condition of the discs and plan the best treatment regimen.

Of note, our study found discrepancies between Pfirrmann grades and the level of degeneration indicated by ADC, T2W voxel intensity, and DHI. In clinical practice, a patient would not benefit if treatment were planned on the basis of a low Pfirrmann rating while the discs were actually in a more advanced stage of degeneration. DHI is a potentially useful biomarker, and our data suggest a cutoff point beyond which discs undergo major degenerative changes.

Addition of other quantitative metrics might further improve the assessment of degenerating discs. Strong evidence supports the use of ADC and T2W voxel intensity in the assessment of disc health [8–11, 13, 14, 20]. Each metric reveals additional information about the pathological processes occurring inside the discs. Therefore, these measures add meaningful insight into the phase of ongoing degenerative processes.

The precedent for diagnostic data using standard deviations and a cutoff for disease processes is already established with dual-energy X-ray absorptiometry (DXA) and osteoporosis. The mean bone density for a healthy 30-year-old is determined, and a t score in units of standard deviations indicates whether the bones are more or less dense than those of a healthy 30-year-old. A t score between −1.0σ and −2.5σ signifies osteopenia and a score of −2.5σ or lower signifies osteoporosis [14, 21]. In a similar fashion, a DHI loss of 1.5σ might be used as a threshold for significant aberrations in degenerating discs. If the feasibility of this approach is demonstrated with clinical outcomes in future studies, physicians would be equipped with a clear cutoff for the disease process instead of ambiguous scores and grades.

In this study, care was taken to minimize potential sources of error and bias. Since Pfirrmann classification was used as the gold standard, discs were examined by three experienced raters and the median score of the raters was taken. ROI drawings could be another source of bias and error; nevertheless, the technician was carefully trained to encompass the anatomy of interest and the drawings were verified by an independent operator. We preferred to use the six vertebral measurements approach proposed by Masuda et al. [17] instead of the single vertebral measurement presented by Inoue et al. [18] because the average of six vertebral measurements might reduce small errors made in the placement of measurement lines (Fig. 1). A limitation of the study presented here is the difficulty in obtaining reliable measurements in severely degenerated discs. DHI loss could probably be estimated; however, T2W voxel intensity and ADC measurements could be compromised because it becomes harder to distinguish the compartments of the disc.

IVD degeneration is a process that affects all people [1]. Understanding the degeneration processes and specific conditions that might lead to low back pain would benefit millions of people suffering from this ailment. Combined information from imaging biomarkers could be used to identify the characteristics of disc degeneration better and plan effective treatments.