Introduction

Magnetic resonance imaging (MRI) is currently the best non-invasive examination for the diagnosis of lumbar spinal stenosis (LSS), because it has excellent soft tissue contrast and clearly visualizes compression of and damage to the dural sac and nerve roots, especially in LSS caused by lumbar disc herniation [1,2,3,4]. However, conventional MRI (cMRI) is performed in the supine position with no load on the lumbar spine, which does not reflect the true condition of the lumbar spine in the upright position. In some patients, LSS is exacerbated in the upright position, so cMRI may not be adequate for diagnosis [5,6,7,8].

Axial loading MRI (alMRI) can simulate the loading of the lumbar spine in the upright position. Several clinical studies have confirmed that alMRI can improve the accuracy of the diagnosis of LSS and thus change the clinical treatment strategy [9,10,11,12]. However, the only device used for alMRI is the DynaWell L-Spine (Dynawell Diagnostics Inc.), which still has some drawbacks, such as unstable force application, complex operation, and high cost [13, 14]. The purpose of this study was to adopt a new device with a pneumatic shoulder-hip compression mode and to investigate its applicability in alMRI and its diagnostic value for LSS.

Materials and methods

Patients

This study was approved by the institutional review board of the 305 Hospital of PLA, Beijing, and informed consent was provided by all participants. A total of 87 consecutive patients (36 women and 51 men, aged 29–67 years; mean age, 49 years) were included between January 2018 and December 2019. All patients were symptomatic, presenting with neurogenic intermittent claudication and/or radiating leg pain and/or numbness. Patients with only unilateral radicular symptoms were not included. Patient height was limited to less than 190 cm (the set limit of the axial loading device). The exclusion criteria were previous spinal surgery, severe osteoporosis, severe cardiopulmonary dysfunction, history of spinal fracture, spinal malignant tumor, lower limb disease, and claustrophobia.

Axial loading device

This new lumbar alMRI device consists of wearable components and pressure components, which can achieve the lumbar load by exerting pressure on the shoulders and hips. The pressure is applied in the pneumatic mode. Through the inflation of the control system, the air is transmitted from the vent pipe to the cylinder, and the connecting belt is tightened after the cylinder is pressurized so that the pressure can be applied stably and measured accurately (Fig. 1).

Fig. 1

The new axial loading MRI device consists of wearable components and pressure components (A). The patient lies on the examination bed with the wearable components (B) and the pressure was applied in pneumatic mode by the control system (C)

The patients lay on the examination bed with the wearable components and then the pressure was applied. By adjusting the pressure adjustment knob, the strength was controlled at 40–50% of the body weight, simulating the load weight of the lumbar when the body is upright, and the compression time was 5 min [11, 15, 16]. Then, the alMRI scanning of the lumbar was performed. A small cushion was placed underneath the lumbar spine to achieve lordosis during alMRI, simulating the upright position.
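As a rough illustration of the load setting described above, the 40–50% range can be worked out from body weight. This is a minimal sketch, not the device's control software: the function name `axial_load_range` and the conversion to newtons using standard gravity are assumptions added for illustration only.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2; assumption, used only to express the load in newtons


def axial_load_range(body_weight_kg, low_fraction=0.40, high_fraction=0.50):
    """Return the target axial load as ((kg_low, kg_high), (N_low, N_high)).

    The 40-50% fractions come from the text; everything else is illustrative.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    kg = (body_weight_kg * low_fraction, body_weight_kg * high_fraction)
    newtons = (kg[0] * STANDARD_GRAVITY, kg[1] * STANDARD_GRAVITY)
    return kg, newtons


# Hypothetical 70 kg patient
kg_range, n_range = axial_load_range(70.0)
print(f"{kg_range[0]:.1f}-{kg_range[1]:.1f} kg "
      f"({n_range[0]:.0f}-{n_range[1]:.0f} N)")
```

For a hypothetical 70 kg patient, this corresponds to a target load of 28–35 kg.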

Imaging technique

All MRI examinations were performed on a 1.5-T system (Signa Optima, GE Healthcare) using a surface coil. Sagittal T2-weighted fast recovery fast spin-echo (FRFSE) and T1-weighted fast spin-echo (FSE), as well as axial T2-weighted FRFSE sequences, were performed. The repetition time/echo time (TR/TE) was 450–525/9–16 for T1-weighted images and 2600–3800/110–130 for T2-weighted images. The slice thickness was 4 mm, and the field of view (FOV) was 320 × 320 for sagittal images and 200 × 200 for axial images. The imaging matrix was 320 × 256 for sagittal images and 320 × 220 for axial images. All participants underwent conventional MRI scanning first, followed by alMRI scanning. Fifteen of the 87 patients underwent a second test–retest alMRI 1 week after the first two examinations to evaluate the repeatability of the alMRIs. All MRI examinations were performed between 6:00 pm and 9:00 pm to exclude the effects of diurnal variations on the lumbar spine.

Image interpretation and measurement

All images were read by two radiologists (L.L., L.W., with 12 and 5 years of experience in reading lumbar MRI, respectively) using an AW workstation (version 4.6, GE Healthcare). The MRI images were sent to the radiologists with all details of the participants’ information withheld and the examiners were blinded to whether they had been obtained with or without axial loading.

The dural sac cross-sectional area (DSCA), sagittal vertebral canal diameter (SVCD), disc height (DH), and ligamentum flavum thickness (LFT) were measured at L3-L4, L4-L5, and L5-S1. The SVCD was measured on the transverse axial image as the distance from the midpoint of the posterior edge of the intervertebral disc to the base of the spinous process. The DH was calculated as the average of the heights at the anterior and posterior edges of the intervertebral space. The LFT was measured on the transverse axial image at the thickest part of the ligamentum flavum. If the LFT differed between the left and right sides, the thicker side was measured. Absolute stenosis was defined as a DSCA of less than 75 mm2, and relative stenosis as a DSCA of less than 100 mm2 [17, 18]. A significant reduction in DSCA (> 15 mm2) after axial loading was also considered a meaningful finding [19, 20]. Disc bulge, disc herniation, ligamentum flavum hypertrophy, intraspinal synovial cysts, and foraminal stenosis at the three intervertebral spaces were compared as valuable diagnostic information. Terminology and definitions were based on the current literature [21,22,23,24].
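The DSCA criteria above can be expressed as a short sketch. The function names and the example values are hypothetical, and treating "absolute" and "relative" as mutually exclusive grades (relative meaning 75 mm2 up to, but not including, 100 mm2) is an assumption made here for illustration; the thresholds themselves are taken from the text.

```python
ABSOLUTE_STENOSIS_MM2 = 75.0       # DSCA below this: absolute stenosis
RELATIVE_STENOSIS_MM2 = 100.0      # DSCA below this: relative stenosis
SIGNIFICANT_REDUCTION_MM2 = 15.0   # reduction above this after loading is meaningful


def grade_stenosis(dsca_mm2):
    """Grade one intervertebral level from its dural sac cross-sectional area."""
    if dsca_mm2 < ABSOLUTE_STENOSIS_MM2:
        return "absolute"
    if dsca_mm2 < RELATIVE_STENOSIS_MM2:
        return "relative"
    return "none"


def assess_level(dsca_cmri_mm2, dsca_almri_mm2):
    """Compare the same level before (cMRI) and after (alMRI) axial loading."""
    reduction = dsca_cmri_mm2 - dsca_almri_mm2
    return {
        "grade_cmri": grade_stenosis(dsca_cmri_mm2),
        "grade_almri": grade_stenosis(dsca_almri_mm2),
        "reduction_mm2": reduction,
        "significant_reduction": reduction > SIGNIFICANT_REDUCTION_MM2,
    }


# Hypothetical level: 105 mm2 on cMRI, 72 mm2 on alMRI
print(assess_level(105.0, 72.0))
```

In this hypothetical case the level is not stenotic on cMRI but meets the absolute stenosis criterion after loading, with a reduction that exceeds 15 mm2.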

All quantitative data were measured three times by each reader and averaged, with each measurement taken 1 week apart. The final quantitative data used were the averages of the values measured by the two readers. If the quantitative values differed by more than 10%, or if the results for the qualitative indicators were inconsistent, agreement was reached after consultation between the two readers.
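The averaging scheme can be sketched as follows. The text does not state how the 10% difference was computed, so this sketch assumes it is the absolute difference between the two readers' averages relative to their mean; that definition, the function names, and the example values are all assumptions.

```python
from statistics import mean


def needs_consensus(reader1_value, reader2_value, tolerance=0.10):
    """Flag a measurement when the two readers differ by more than `tolerance`.

    Assumption: the difference is expressed relative to the mean of the two
    reader values; the source does not define the reference value.
    """
    reference = (reader1_value + reader2_value) / 2.0
    return abs(reader1_value - reader2_value) / reference > tolerance


def final_value(reader1_repeats, reader2_repeats):
    """Average each reader's repeated measurements, then average the readers."""
    r1 = mean(reader1_repeats)
    r2 = mean(reader2_repeats)
    return (r1 + r2) / 2.0, needs_consensus(r1, r2)


# Hypothetical DSCA measurements (mm2), three repeats per reader
value, flagged = final_value([90.0, 92.0, 91.0], [93.0, 95.0, 94.0])
print(f"final = {value:.1f} mm2, consensus needed: {flagged}")
```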

Image quality assessment

Image quality was assessed by the two readers at the same time as each image interpretation and measurement, using a 5-point Likert scale: score 5, the main structures were perfectly displayed, no artifacts or imaging distortion, no effect on diagnosis; score 4, well-displayed structures, few artifacts, slight image distortion, little effect on diagnosis; score 3, relatively well-displayed structures, a few artifacts, slight image distortion, a certain effect on diagnosis; score 2, moderately displayed structures, moderate artifacts and image distortion, significant effect on diagnosis; score 1, severe artifacts, image distortion, or poor signal intensity, unavailable for diagnosis.

Examinee comfort assessment

After the alMRI, patients were asked to assess the level of comfort of the two examinations on a 5-point Likert scale: score 5, no discomfort during the examination; score 4, only mild discomfort during the examination; score 3, certain discomfort, but the MRI could be undertaken with ease; score 2, obvious discomfort, but the MRI could be completed; score 1, intolerable discomfort, and the MRI could not be performed. We also interviewed patients with score 3 and below about the reasons for their discomfort.

Repeatability and observer reliability

The test–retest repeatability evaluation of alMRI was completed by L.L. using data of 15 patients who underwent the second alMRI. The observer reliability assessment was completed by L.L. and L.W. using data from the first cMRI and alMRI of all 87 patients. The data used for the comparison included all quantitative data and image quality scores.

Statistical analysis

The statistical analyses were performed using the software SPSS (version 25, IBM Corporation). The cohort was characterized using means and standard deviations to describe continuous variables and proportions to describe categorical variables. Unadjusted bivariate analyses were completed using paired t-tests for continuous variables. Pearson’s correlation coefficient test was used for the correlation analyses between DSCA and SVCD, DSCA and DH, and DSCA and LFT, respectively. Image quality and examinee comfort were compared using the Wilcoxon signed-rank test. The test–retest repeatability of alMRIs and observer reliability were calculated by the intraclass correlation coefficient (ICC). Absolute agreement, two-way random effects, and single-measure models were adopted. Statistical significance was defined as p < 0.01. A Pearson’s correlation coefficient value (r) < 0.4, between 0.4 and 0.69, between 0.7 and 0.9, and exceeding 0.9 represented poor, moderate, strong, and very strong correlations, respectively. The ICC values < 0.4, between 0.4 and 0.54, between 0.55 and 0.69, between 0.70 and 0.84, and exceeding 0.85 represented poor, weak, moderate, good, and excellent agreement, respectively [25].
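For readers unfamiliar with the ICC model named above, the two-way random effects, absolute agreement, single-measure coefficient (often written ICC(2,1) or ICC(A,1)) can be computed from the ANOVA mean squares. The sketch below is a plain-Python illustration of that standard formula, not the SPSS procedure used in the study; the function name `icc_absolute_single` and the example ratings are hypothetical.

```python
def icc_absolute_single(ratings):
    """ICC for two-way random effects, absolute agreement, single measures.

    `ratings` is a list of rows, one row per subject and one column per
    rater (or per session for test-retest data).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least two raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Undefined (division by zero) if every rating in the table is identical.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical DSCA values (mm2) from two readers for five levels
example = [[92.0, 94.0], [75.0, 73.0], [110.0, 108.0], [60.0, 63.0], [101.0, 99.0]]
print(round(icc_absolute_single(example), 3))
```

Because the rater mean square enters the denominator, a systematic offset between readers lowers this coefficient, which is the practical difference between absolute agreement and consistency.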

Results

Study population

A total of 87 symptomatic patients completed the examinations. The baseline characteristics and clinical symptoms are summarized in Table 1.

Table 1 Baseline characteristics and clinical symptoms of the patients

Comparison of quantitative parameters

The mean values of DSCA, SVCD, DH, and LFT for 261 intervertebral spaces of 87 patients are listed in Table 2. All data demonstrated statistically significant changes after axial loading (p < 0.01). The DSCA, SVCD, DH, and LFT at the L4-5 level all showed the greatest change after axial loading, with declines of 7.7%, 4.6%, and 2.7% and an increase of 6.9%, respectively. The second and third declines in DSCA, SVCD, and DH were at L5-S1 and L3-4, with 9.8%, 6.0%, and 4.8%, and 4.6%, 2.2%, and 3.1%, respectively. The second and third increases in LFT were at L3-4 and L5-S1, with 5.7% and 1.7%, respectively.

Table 2 Quantitative parameters between cMRI and alMRI

Correlations between changes of quantitative parameters

Among the 261 intervertebral spaces, Pearson’s correlation coefficients between changes of DSCA and SVCD, DSCA and DH, and DSCA and LFT were strong (r = 0.80, p < 0.001), strong (r = 0.72, p < 0.001), and poor (r = 0.37, p < 0.001), respectively (Fig. 2).
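The correlation categories defined in the statistical analysis can be applied mechanically. The sketch below computes Pearson's r in plain Python and maps it onto those categories; the function names are hypothetical, and because the stated bands leave small gaps (for example between 0.69 and 0.7), the cut-offs used here (below 0.4, below 0.7, up to 0.9, above 0.9) are an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_strength(r):
    """Map |r| onto the categories used in this study (cut-offs assumed)."""
    magnitude = abs(r)
    if magnitude < 0.4:
        return "poor"
    if magnitude < 0.7:
        return "moderate"
    if magnitude <= 0.9:
        return "strong"
    return "very strong"


# The three coefficients reported in this section
for r in (0.80, 0.72, 0.37):
    print(r, correlation_strength(r))
```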

Fig. 2

The Pearson’s correlation coefficients between changes of DSCA and changes of SVCD, DH, and LFT were strong (r = 0.80, p < 0.001), strong (r = 0.72, p < 0.001), and poor (r = 0.37, p < 0.001), respectively

Comparison of qualitative indicators

Across the 261 intervertebral spaces, the total count of the 8 qualitative indicators increased from 501 to 669 after axial loading, an increase of 168 (33.5%) (Table 3). The DSCA was significantly reduced (> 15 mm2) after axial loading in a total of 29 intervertebral spaces in 21 patients. Relative stenosis existed in at least one intervertebral space before loading in all 87 patients, and the number of such spaces increased from 128 to 155 after loading, an increase of 27 (21.1%). Nineteen patients (21.8%, 19/87) developed absolute stenosis (DSCA < 75 mm2) after axial loading, and 10 of them (11.5%, 10/87) also had a significant reduction in DSCA (> 15 mm2) (Figs. 3 and 4).
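The percentages in this paragraph follow directly from the reported counts, as the short check below shows. The helper names are hypothetical; the counts are those given in the text.

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return 100.0 * (after - before) / before


def percent_of(part, whole):
    """Share of `part` in `whole`, in percent."""
    if whole <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / whole


print(round(percent_increase(501, 669), 1))  # qualitative indicators
print(round(percent_increase(128, 155), 1))  # spaces with relative stenosis
print(round(percent_of(19, 87), 1))          # patients developing absolute stenosis
print(round(percent_of(10, 87), 1))          # of whom also had > 15 mm2 reduction
```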

Table 3 Qualitative indicators between cMRI and alMRI

Fig. 3

A 67-year-old woman. Sagittal T2-weighted fast recovery fast spin-echo (FRFSE) of conventional MRI (A) and axial loading MRI (B); disc bulge was found more severe at L4-5 after axial loading (white arrow). Axial T2-weighted FRFSE of conventional MRI (C) and axial loading MRI (D); the DSCA reduced from 92.35 to 47.23 mm2 after axial loading

Fig. 4

A 42-year-old man. Sagittal T2-weighted fast recovery fast spin-echo (FRFSE) of conventional MRI (A) and axial loading MRI (B), axial T2-weighted FRFSE of conventional MRI (C) and axial loading MRI (D); disc bulge appears mildly aggravated (white arrow) and the ligamentum flavum is significantly thickened (arrowhead) at L4-5 after axial loading. The DSCA reduced from 107.12 to 73.93 mm2 and the ligamentum flavum thickness increased from 3.29 to 4.18 mm after axial loading

Image quality assessment

In the image quality assessments, reader 1 assigned 5 points to 74 cases and 4 points to 13 cases for conventional MRI, and 5 points to 72 cases, 4 points to 14 cases, and 3 points to 1 case for alMRI; reader 2 assigned 5 points to 72 cases and 4 points to 15 cases for conventional MRI, and 5 points to 69 cases, 4 points to 17 cases, and 3 points to 1 case for alMRI. Both readers gave the score of 3 to the same patient, and both concluded that this patient’s images had some motion artifacts. For both readers, there was no statistically significant difference in image quality between the two examinations (Table 4).

Table 4 Image quality and examinee comfort assessment between cMRI and alMRI

Examinee comfort assessment

For conventional MRI, patients’ comfort scores were 5 points in 44 cases, 4 points in 33 cases, and 3 points in 10 cases; for alMRI, they were 5 points in 39 cases, 4 points in 38 cases, 3 points in 9 cases, and 2 points in 1 case. There was no statistically significant difference in patients’ comfort scores between the two examinations (Table 4). No examination was discontinued because of severe discomfort. The 10 patients who scored 3 points for conventional MRI complained of noise during the examination and/or mild claustrophobia. Among the 10 patients who scored 3 points or less after axial loading, in addition to the discomfort caused by MRI itself mentioned above, 6 patients complained of worsening symptoms after compression, and the 1 patient who scored 2 points felt that the pain and numbness in both legs were significantly worse after compression but was still able to complete the examination.

Repeatability and observer reliability assessment

The test–retest repeatability ICCs for DSCA, SVCD, DH, LFT, and image quality were 0.963, 0.932, 0.917, 0.745, and 1, respectively, suggesting good to excellent reliability. The inter-observer ICCs for DSCA, SVCD, DH, LFT, and image quality were 0.934, 0.901, 0.861, 0.605, and 0.928, respectively, suggesting moderate to excellent reliability, and the intra-observer ICCs were 0.972, 0.952, 0.913, 0.715, and 1, respectively, suggesting good to excellent reliability (Table 5).
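The verbal labels attached to these ICCs follow the bands given in the statistical analysis. A minimal sketch of that mapping is shown below; the function name is hypothetical, and using the lower bound of each band as the cut-off (0.4, 0.55, 0.70, 0.85) is an assumption, since the stated bands leave small gaps between them.

```python
def icc_agreement(icc):
    """Map an ICC value onto the agreement categories used in this study."""
    if icc < 0.4:
        return "poor"
    if icc < 0.55:
        return "weak"
    if icc < 0.70:
        return "moderate"
    if icc < 0.85:
        return "good"
    return "excellent"


# Inter-observer ICCs reported above, in the order they are listed
reported = {"DSCA": 0.934, "SVCD": 0.901, "DH": 0.861, "LFT": 0.605, "image quality": 0.928}
for name, value in reported.items():
    print(f"{name}: {value} -> {icc_agreement(value)}")
```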

Table 5 Observer and test–retest ICCs for quantitative parameters and image quality

Discussion

The axial loading device represented by the DynaWell L-Spine currently in clinical use achieved lumbar loading by applying pressure to the shoulders and feet. However, the pressure may vary during the examination due to mild movement of the hip and knee joints, such as slight flexion of the knees. Furthermore, the device uses a manual rotating knob to adjust the tension of the connecting belt mechanically. If the patient is uncomfortable or needs to readjust the pressure during the examination process, the technician is required to return to their bedside to assist [13, 14, 26,27,28,29].

The new device used in this study applies pressure through the shoulders and hips, so the pressure does not change with movement of the patient’s knee and hip joints, which improves the stability of the load on the lumbar spine. In addition, the new device uses a pneumatic mode to control the pressure, which can be adjusted remotely from the operating room; this allows safer and faster adjustment than bedside operation when a patient cannot tolerate the applied pressure.

In this study, all 87 patients successfully completed alMRI, and no patient discontinued the examination because of intolerance. There was no statistical difference in the image quality and comfort assessments between conventional MRI and alMRI. The discomfort associated with the axial load device was mainly the aggravation of the patient’s spinal stenosis–related symptoms after compression, a phenomenon also presented in the DynaWell device [30, 31]. Interestingly, most patients who felt discomfort were more concerned with the discomfort of the MRI itself rather than the axial load device, such as noise and claustrophobia [32, 33].

We used quantitative and qualitative data that can best reflect changes in the lumbar spinal canal, and comprehensively investigated the differences in imaging before and after axial loading in patients with spinal stenosis. Such a systematic study has not been seen in previous papers.

We found statistically significant differences in all four quantitative parameters before and after axial loading. DSCA is the most commonly used imaging parameter in the study of LSS and most directly reflects the severity of stenosis. Absolute spinal stenosis is generally defined by a DSCA threshold of less than 75 or 70 mm2 [34, 35]. In this study, 19 patients (19/87, 22%) showed absolute stenosis in the spinal canal after axial loading, which is in line with a study using the DynaWell device, in which 13 patients (13/54, 25%) were found to have absolute stenosis after axial loading [30]. A significant reduction in DSCA after axial loading was considered a meaningful change, and previous studies have found that patients with this sign had more severe clinical symptoms [36, 37]. In this study, of the 19 patients who developed absolute spinal stenosis after axial loading, 10 also had a significant reduction in DSCA. Such patients would most likely have their treatment strategy influenced by the results of alMRI.

SVCD is also a common parameter for judging spinal canal stenosis and is usually affected by disc bulge and herniation, especially central herniation [17]. A study of axial loading in patients with spinal stenosis showed a strong correlation between changes in DSCA and SVCD before and after axial loading, which is consistent with our findings [19]. DH reflects direct changes in the lumbar disc and decreases in close relationship to disc degeneration, bulge, and herniation [38, 39]. In a study of DH of the lumbar spine after axial loading, a statistically significant reduction in DH after loading was found only for L4-5, but that study was conducted in young volunteers [40]. In this study, changes in both SVCD and DH were strongly correlated with changes in DSCA, which leads us to hypothesize that increased disc bulge or herniation after axial loading was the main cause of the aggravated spinal stenosis.

A thickened ligamentum flavum is also a cause of spinal stenosis, mainly due to ligamentous fibrous hyperplasia and flexion. The changes in LFT we measured before and after axial loading were statistically significant, but they were only poorly correlated with changes in DSCA; therefore, we hypothesized that ligamentum flavum thickening was a less important cause of spinal stenosis than intervertebral disc lesions. The inter- and intra-observer ICCs for LFT (0.605 and 0.715) represented moderate and good reliability, respectively, lower than those of the other parameters. We believe that the reason for this was the large measurement error due to the small value of the LFT itself and the frequent blurring of the boundary between the ligamentum flavum and the vertebral arch. In addition, when the thickness of the ligamentum flavum was uneven, determining its thickness was more difficult.

For the various qualitative indicators observable on imaging, their appearance or increased severity after loading was considered meaningful information for clinical diagnosis [30, 41]. The eight qualitative indicators we studied increased by 33.5% (168/501) after axial loading, and these indicators will have an impact on clinical decision-making, which reflects the significance of alMRI for the diagnosis of spinal stenosis [42].

Our study had some limitations. First, it was a retrospective study; the time interval between the collection of clinical symptoms and MRI was not limited, which may have introduced bias; and the study lacked information on the relevance of imaging to clinical decision-making. Second, we did not distinguish between central and lateral (intervertebral foramen and lateral recess) stenosis, which could weaken the correlation of the corresponding data; for example, lateral stenosis may lead to a poor correlation of SVCD with DSCA. In this study, however, the changes in SVCD and DSCA remained strongly correlated, presumably because the two types of stenosis are closely related and often coexist, especially in patients with significant lumbar degenerative changes [43].

The new device is suitable for performing alMRI, with good patient tolerance of the examination and satisfactory image quality. Axial loading MRI with the new device can reveal load-induced aggravation of LSS, providing more valuable information for the diagnosis of LSS and reducing missed diagnoses.