Introduction

Deep learning (DL) applications in radiology have significantly evolved through recently developed convolutional neural networks that facilitate automated classification, segmentation, and disease detection [1,2,3]. Uses of DL techniques in spine imaging include automating lumbar vertebrae numbering, classifying disc herniations, and grading spinal stenosis [4,5,6]. Additionally, DL applications in image reconstruction have been shown to improve image quality via noise and artifact reduction, and super-resolution reconstruction [7], particularly for knee MRI [8] and coronary CT angiography [9].

DL image reconstruction of 2D MRI has only recently been introduced on some MRI vendor platforms [10, 11]. DL image reconstructions for 3D MRI remain largely unexplored. Clinical lumbar spine (L-spine) protocols typically include multiplanar (sagittal, axial, and sometimes coronal), two-dimensional (2D) T2-weighted fast spin echo (T2w-FSE) acquisitions with total acquisition times nominally reaching 25 min [12,13,14]. In comparison, a single 3D T2w-FSE isotropic sequence can generate the same planes as 2D imaging via multiplanar reformations but with significantly shorter total acquisition time (6–8 min) and without interslice gaps, avoiding partial volume averaging effects [12, 13, 15]. The 3D acquisition also enables creation of “true” cross-sectional views orthogonal to the obliquely oriented neural foramina, which may facilitate more accurate assessment of neural foraminal stenosis, particularly in patients with scoliosis [12, 13]. However, due in part to signal-to-noise ratio (SNR) limits, the spatial resolution achievable with 3D imaging using standard reconstruction techniques has to date remained inferior to that of 2D T2w-FSE [16].

Applying DL reconstruction to improve 3D image quality may potentially mitigate resolution and SNR drawbacks but requires comparisons of diagnostic performance prior to routine use. Observer performance in radiology is an important source of variability and has been previously investigated in interpretation of L-spine MRI, showing reasonable performance for characterization of degenerative findings such as spinal stenosis, spondylolisthesis, disc degeneration, facet arthrosis, and Modic endplate changes [17,18,19,20,21].

This study’s objective was to evaluate the application of a DLRecon algorithm [22, 23] to enhance 3D T2w-FSE L-spine MRI. We hypothesized that in addition to overall decreased acquisition time compared to a 2D protocol, DLRecon 3D T2w-FSE would also demonstrate overall improved image quality and similar interobserver agreement compared to both standard-of-care (SOC) 3D reconstructed and SOC 2D images.

Materials and methods

This retrospective study was approved by our institutional review board and informed subject consent was waived.

Image acquisition

An L-spine MRI protocol was developed to include 3D and 2D T2-weighted FSE sequences optimized for spatial resolution, contrast, and scan time (Table 1). The new protocol was utilized for clinical imaging for 1 month (between August 17 and September 17, 2020) on all L-spine exams performed on either of two identical 3-T MRI scanners (Signa Premier, GE Healthcare) using combined 60-channel posterior table and 30-channel anterior arrays (GE AIR© Coil). The sequences evaluated included (1) isotropic 3D T2w-FSE (CUBE); (2) 2D T2w-FSE axial (parallel to L4-5 disc space); and (3) 2D T2w-FSE sagittal (Table 1). A vendor-supplied prototype of AIR Recon DL 3D [22,23,24,25] was used to reconstruct 3D SOC exams offline. The prototype reconstruction performs denoising, deringing, and interpolation in all three directions with the goal of providing a sharp, clean image volume that can be effectively reformatted in any orientation. The network was trained with a supervised learning approach on pairs of pristine and typical datasets. Similar to the 2D version of AIR Recon DL [22,23,24,25], training data contained diverse content and feature contrasts. DLRecon was performed on a separate Unix workstation (Intel Xeon 12-core, 2.10 GHz, 64 GB RAM), with an average reconstruction time of 221 min.

Table 1 Lumbar spine MRI parameters

Case inclusion

A total of 48 subjects were scanned and identified for analysis. Checks were performed to confirm offline storage of the 3D complex k-space data and consistency of scan parameters. Exclusion criteria were absence of 2D sequences or 3D k-space data, history of previous L-spine surgery, and presence of transitional lumbosacral anatomy, giving n = 35 subjects for final analysis (Fig. 1). This cohort size met the required sample size calculation based on an a priori power analysis for a hypothesized κ of 0.6 for a precision level of 0.15 (95% CI 0.45–0.75) for each variable of interest, based on reported prevalence from standardized grading scales used in the literature [20, 26,27,28,29].

Fig. 1 Subject inclusion/exclusion diagram

Observers

3D DLRecon, 3D SOC, and 2D SOC images for these 35 patients were anonymized and then randomized for blinded evaluation by four readers: three musculoskeletal fellowship-trained attending radiologists (D.N.M., 23 years of experience; M.S., 4 years of experience; Y.E., 11 years of experience) and a musculoskeletal radiology fellow (S.S.). Prior to independent grading, each observer attended a training session led by a separate musculoskeletal radiologist (D.B.S. with 8 years of dedicated MRI experience) and was provided a handbook containing grading scales for each imaging variable of interest validated by current literature with pictorial examples.

Image evaluation

Images were independently reviewed on dedicated picture archiving and communication system workstations (SECTRA IDS7, Sectra AB) with diagnostic quality monitors. Overall image quality was subjectively assessed on a 3-point grading scale (poor, average, or excellent). Motion artifact was subjectively assessed on a 4-point grading scale (absent, mild, moderate, or severe).

Central stenosis was determined via cross-sectional area measurement of the thecal sac at the L4-5 mid-disc level, categorized as absent (≥ 130 mm²), mild (< 130 mm² and ≥ 100 mm²), moderate (< 100 mm² and ≥ 65 mm²), or severe (< 65 mm²) [30]. Foraminal stenosis at L3-4, L4-5, and L5-S1 was graded using multiplanar reformations (MPRs) to achieve orthogonal orientation to the foramen with level of severity based on degree of effacement of the fat in a 4-quadrant division of the neural foramina: none, mild, mild-to-moderate, moderate, moderate-to-severe, severe (0, 1, 1.5, 2–2.5, 3, 4 quadrants effaced, respectively) [31]. Disc degeneration at L3-4, L4-5, and L5-S1 was graded on a 4-point ordinal scale modified from Pfirrmann et al [20] to combine grades 1 and 2 due to the low prevalence of grade 1 [26]. Facet joint synovial cysts at L3-4 and L4-5 were documented only if ventral in location and causing impingement of intra- or extrathecal nerve roots. Annular fissures at L3-4 and L4-5 were considered present if demonstrating higher signal intensity than the nucleus pulposus on T2w imaging, and their laterality (right, left) and position (central, paracentral, subarticular, foraminal, extraforaminal) were recorded. Finally, the presence and position of a disc bulge or disc herniation and the latter’s type (protrusion, extrusion, sequestration) at L3-4 and L4-5 were recorded.
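The categorical cut-offs described above can be written down compactly. The following is a minimal sketch, not the study's own tooling: the function and constant names are hypothetical, and it assumes that only the quadrant values listed in the text (0, 1, 1.5, 2, 2.5, 3, 4) are valid foraminal scores.

```python
# Thecal sac cross-sectional area thresholds (mm^2) as stated in the text.
def grade_central_stenosis(area_mm2: float) -> str:
    """Map a thecal sac cross-sectional area to a central stenosis category."""
    if area_mm2 < 0:
        raise ValueError("area must be non-negative")
    if area_mm2 >= 130:
        return "absent"
    if area_mm2 >= 100:
        return "mild"
    if area_mm2 >= 65:
        return "moderate"
    return "severe"


# Number of effaced fat quadrants -> foraminal stenosis grade, as listed in the text.
# Assumption: values not listed in the text (e.g. 0.5 or 3.5) are treated as invalid.
FORAMINAL_GRADES = {
    0: "none",
    1: "mild",
    1.5: "mild-to-moderate",
    2: "moderate",
    2.5: "moderate",
    3: "moderate-to-severe",
    4: "severe",
}


def grade_foraminal_stenosis(quadrants_effaced: float) -> str:
    """Map the number of effaced quadrants to one of the six severity levels."""
    try:
        return FORAMINAL_GRADES[quadrants_effaced]
    except KeyError:
        raise ValueError(f"unsupported quadrant count: {quadrants_effaced}") from None
```

Writing the thresholds this way makes the boundary handling explicit: an area of exactly 130 mm² is graded absent and exactly 65 mm² is graded moderate, matching the inequalities given above.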

Statistical analysis

Analysis of interobserver reliability within image sequences was conducted using Conger’s kappa (κ) or intraclass correlation coefficient (central stenosis only). Mann-Whitney U tests were employed for comparison of summary measures between image sequences. Tukey’s post hoc adjustment was applied to account for multiple comparisons. All analyses were performed using SAS version 9.4 (SAS Institute Inc.). p values of < 0.05 were deemed statistically significant. κ values indicated agreement on the basis of values suggested by Altman [32] and adapted from the method of Landis and Koch [33]: 0.81–1.00 indicated very good agreement; 0.61–0.80, good agreement; 0.41–0.60, moderate agreement; 0.21–0.40, fair agreement; and 0.20 or lower, poor agreement.
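For readers less familiar with multi-rater agreement statistics, the sketch below illustrates an unweighted Conger’s κ and the agreement bands listed above. It is an illustration under stated assumptions rather than the SAS procedure used in the study: it assumes every reader grades every subject, applies no ordinal weighting, and uses hypothetical function names (`conger_kappa`, `altman_band`). Chance agreement is computed from each reader’s own marginal distribution and averaged over all reader pairs, so with two readers the statistic reduces to Cohen’s κ.

```python
from collections import Counter
from itertools import combinations


def conger_kappa(ratings):
    """Unweighted Conger's kappa.

    ratings: list of rows, one per subject; each row holds one categorical
    grade per rater (all raters grade all subjects).
    """
    if not ratings:
        raise ValueError("ratings must contain at least one subject")
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    pairs = list(combinations(range(n_raters), 2))

    # Observed agreement: fraction of subjects on which a rater pair agrees,
    # averaged over all rater pairs.
    p_obs = sum(
        sum(row[g] == row[h] for row in ratings) / n_subjects for g, h in pairs
    ) / len(pairs)

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories and averaged over all rater pairs.
    marginals = [Counter(row[g] for row in ratings) for g in range(n_raters)]
    categories = set().union(*marginals)
    p_exp = sum(
        sum(marginals[g][k] * marginals[h][k] for k in categories) / n_subjects**2
        for g, h in pairs
    ) / len(pairs)

    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


def altman_band(kappa):
    """Label a kappa value using the bands stated in the text."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

As a usage example, `altman_band(conger_kappa(rows))` labels a table of grades in which each row is one subject and each column one reader. Values falling between the published band edges (for example 0.805) are assigned to the lower band here, which is a choice of this sketch.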

Results

Of the 48 initial patient scans, 9 were excluded due to previous surgery, and 4 due to transitional lumbosacral anatomy, yielding a final number of n = 35 (Fig. 1). Images from 35 patients (mean age 58 ± 19 years, 26 female and 9 male) were evaluated by all 4 readers. Clinical indications for undergoing lumbar spine MRI included lower back pain with radiculopathy (n = 22), lower back pain without radiculopathy (n = 9), scoliosis (n = 2), lower limb paraparesis (n = 1), and evaluation of fracture (n = 1) (Fig. 1).

Median image quality was significantly higher for 3D DLRecon (2.0 out of a maximum score of 2) versus 3D SOC (1.0 out of 2), p value < 0.001, and for 3D DLRecon versus both 2D axial (1.0 out of 2), p value < 0.001 and 2D sagittal (1.0 out of 2), p value < 0.001 (Table 2). 3D SOC showed significantly lower image quality (1.0 out of a maximum score of 2) when compared to 2D axial (also 1.0 out of 2), p value = 0.005, but not when compared to 2D sagittal, p value = 0.822. Evaluation of motion artifact showed significantly lower median motion artifact for 3D DLRecon (0.0 out of a maximum score of 3) versus 3D SOC (1.0 out of 3), p value = 0.001, and also versus 2D axial (1.0 out of 3), p value < 0.001 and 2D sagittal (1.0 out of 3), p value < 0.001 (Table 2). Motion artifact in 3D SOC was similar to 2D axial (p value = 0.493) and 2D sagittal (p value = 0.123).
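Because two sequences can share the same median score and still differ significantly, it may help to see how the Mann-Whitney U statistic compares whole score distributions. The sketch below is a simplified illustration, not the SAS analysis used in the study: the function name is hypothetical, the p value uses a normal approximation with tie correction and no continuity correction, and that approximation is rough for small samples.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) for two independent samples of ordinal scores.

    The p value uses the normal approximation with a tie correction and
    no continuity correction (an assumption of this sketch).
    """
    n_x, n_y = len(x), len(y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both samples must be non-empty")

    # U_x counts the (x, y) pairs in which the x score is higher; ties count one half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n_x * n_y - u_x

    # Variance of U under the null hypothesis, corrected for tied scores.
    n = n_x + n_y
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    variance = n_x * n_y / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        # All scores identical: no evidence of a difference.
        return u_x, u_y, 1.0

    z = (u_x - n_x * n_y / 2.0) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, u_y, p
```

With heavily tied 3-point or 4-point scores, the tie correction matters: it shrinks the variance, and omitting it would make the test more conservative than intended.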

Table 2 Comparison of median and interquartile range [IQR] image quality (grades: 0–2) and motion artifact scores (grades: 0–3)

Interobserver agreement for each variable of interest with corresponding 95% confidence intervals is shown in Table 3. There was very good agreement for central stenosis (κ = 0.97–0.98) for all sequences. Interobserver agreement of neural foraminal stenosis was moderate to good (κ = 0.55–0.76) for 3D DLRecon, comparable to both 3D SOC (κ = 0.56–0.73) and 2D (κ = 0.58–0.71); good to very good for grading disc degeneration (κ = 0.79–0.82) for 3D DLRecon, comparable to both 3D SOC (κ = 0.81–0.84) and 2D (κ = 0.81–0.87); fair to moderate for grading disc herniation (κ = 0.28–0.69) for 3D DLRecon, comparable to 3D SOC (κ = 0.30–0.56), but with a one grade difference versus 2D (κ = 0.17–0.62); fair to moderate for annular fissures (κ = 0.26–0.65) for 3D DLRecon, comparable to 3D SOC (κ = 0.20–0.57), but with a one grade difference versus 2D (κ = 0.14–0.53); and fair to good for facet joint cysts (κ = 0.22–0.80) for 3D DLRecon, comparable to 3D SOC (κ = 0.22–0.67), but different from 2D (κ = 0–1). κ values for image quality ranged from fair for 3D DLRecon (0.26) and 3D SOC (0.28) to poor for 2D (0.18–0.23), whereas motion artifact κ values were moderate for 3D DLRecon (0.44), fair for 3D SOC (0.32), and poor for 2D (0.04–0.19).

Table 3 Interobserver agreement (Conger’s κ) for clinical variables of interest

Comparison of the mean grading values for each variable of interest between all 3 sequences did not demonstrate any statistically significant difference for central or foraminal stenosis (p > 0.05) (Table 4). Some occurrences of significant differences were noted for disc herniations for 3D DLRecon vs. 2D (right L4-5), annular fissures for 3D DLRecon vs. 2D (right L3-4, right L4-5) and 3D SOC vs. 2D (right L3-4), and facet synovial cysts for 3D DLRecon and 3D SOC vs. 2D (right L3-4) and 3D DLRecon vs. 3D SOC (right L3-4).

Table 4 Comparison of median grading values and interquartile range for clinical variables of interest between each sequence

Discussion

Isotropic 3D MRI provides overall time savings and the ability to produce high-quality MPRs, but the associated SNR and spatial resolution loss, given equivalent scan times, are known trade-offs compared to 2D imaging [16]. In our study, we applied a 3D DLRecon algorithm to 3D T2w-FSE MRI to improve its image quality and evaluated its performance against SOC 3D and 2D imaging (Fig. 2).

Fig. 2 Lumbar spine MRI of a 41-year-old female with lower back pain comparing 3D T2-weighted FSE with DLRecon (deep learning reconstruction) sagittal (A) and axial (B) images, and without (standard-of-care, SOC) sagittal (C) and axial (D) images, along with 2D T2-weighted FSE sagittal (E) and axial (F) images

3D DLRecon had higher image quality than 3D SOC (p < 0.001) and 2D imaging (p < 0.001), despite only having fair agreement between observers; this supports the hypothesis that DLRecon may help address the image quality shortcomings of 3D SOC-reconstructed FSE images (Fig. 3). 3D SOC images were inferior to 2D axial, but not to 2D sagittal, which may be related to 3D view ordering that preferentially results in more FSE-related blurring in the axial plane. DLRecon also showed significantly less motion artifact when compared to 2D, despite having only fair interobserver agreement and increased chance of motion [12] as the scan time of 3D acquisition was approximately 2–3 times longer compared to the 2D acquisition (Fig. 4). This may be explained by how motion artifacts more favorably manifest in 3D compared to 2D imaging [34], possibly due to slice-selection effects in 2D and differences in view-ordering. DLRecon also demonstrated significantly less motion artifact compared to SOC images, which may be attributed to the reduced noise and increased sharpness with DLRecon.

Fig. 3 Image quality grading scale and representative sagittal T2-weighted FSE images from 48-year-old (A) and 41-year-old (B, C) patients showing an image graded as excellent for image quality (A), average for image quality (B), and poor for image quality (C)

Fig. 4 Motion artifact grading scale and representative sagittal T2-weighted FSE images from a 63-year-old female graded as demonstrating no motion artifact (A), from a 32-year-old female graded as mild motion artifact (B), from a 60-year-old graded as moderate motion artifact (C), and from a 64-year-old male graded as severe motion artifact (D)

Overall, interobserver agreement for the variables of interest in our study was comparable between all 3 sequences and when compared to previous studies of interobserver variability from the literature. In evaluating central and neural foraminal stenosis, moderate to very good agreement and no statistical difference were noted between 2D and 3D. Notably, the very good interobserver agreement for central stenosis on both 2D and 3D (κ = 0.97–0.98) was higher than the κ = 0.809 previously reported by Lee et al [12]; this result may be explained by either the shorter echo train length in our study (50 vs. 145), which improves image sharpness, or differences in grading systems, as we performed a quantitative measurement of thecal sac area instead of the qualitative grading method previously validated by Schizas et al [18].

Neural foraminal stenosis for 3D DLRecon demonstrated moderate to good agreement (κ = 0.55–0.76), which is slightly lower than, but still comparable to, what has been reported in the literature for 3D MRI (κ = 0.809) [12, 35]. This may be due to the 6 levels of discrimination used in our grading scale [31], which is also used in clinical practice at the authors’ institution, versus a previously reported scale of 4 grades [35]. We did not observe improved κ for foraminal stenosis on 3D versus 2D MRI, as other studies have found as well [10, 11]. The L-spine neural foramina may, in general, not demonstrate enough obliquity, compared to the cervical spine, to fully benefit from the use of 3D MPRs. By excluding cases with prior surgery, we may have also limited the proportion of patients with more complex anatomy, including those with significant scoliosis, in whom 3D MPR could have demonstrated superiority to conventional 2D planes (Fig. 5).

Fig. 5 Comparison of 3D deep learning reconstructed (DLRecon) images of the lumbar spine versus 2D images in a 53-year-old female with severe scoliosis. Axial (A), sagittal (B), and coronal (C) 3D DLRecon T2-weighted FSE multiplanar reformations (MPRs) facilitate the creation of optimal planes orthogonal to the exiting left L3 nerve root at the level of the L3-4 neural foramen in this patient with severe scoliosis. 2D T2-weighted FSE axial (D) and sagittal (E) acquisitions centered at the level of the L3-4 neural foramen in this patient with severe scoliosis demonstrate suboptimal evaluation of the exiting left L3 nerve root due to the inability to make use of MPRs

We acknowledge several limitations of this retrospective study. There was some variability in 2D imaging parameters (predominantly TR) due to heterogeneity in scan plane obliquity and specific absorption rate calculations, but 3D acquisition parameters were held constant. Imaging was performed only of the lumbar spine at a single field strength (3 T), and therefore protocols may not be generalizable to imaging at 1.5 T or to the cervical or thoracic spine without further optimization. While powered for this analysis, the study sample size was relatively small (n = 35) and patients with previous lumbar spine surgery were excluded, which limits the applicability of our findings to patients with spinal hardware who may not be indicated for imaging at 3 T (vs. 1.5 T). Additionally, the DLRecon algorithm was not applied to 2D images, which could have improved image quality scores of 2D images but was not done as DLRecon is still not widely available across all vendors and institutions. Finally, the rather long current offline reconstruction time of the DLRecon algorithm (221 min) does not currently allow for practical use in an emergent clinical setting.

In summary, applying a 3D DLRecon image algorithm to isotropic 3D MRI of the L-spine improved image quality when compared to SOC 3D and 2D L-spine MRI without compromising interobserver agreement for evaluation of clinically relevant pathology. Our institution plans to routinely adopt the 3D DLRecon sequence as soon as the final version of the software becomes commercialized, which will hopefully also further improve reconstruction time. The resulting improved image quality and MPRs could provide additional diagnostic benefits over standard 2D imaging in other anatomic regions, such as the cervical spine, in which neural foramina are more obliquely oriented.