Introduction

Low bone mineral density (BMD) may result in osteoporosis or osteopenia and is associated with an increased risk of fractures, including vertebral body fractures and hip fractures [1,2,3]. Hip fractures are associated with increased morbidity and mortality in the elderly [4,5,6]; therefore, identifying patients at risk for hip fractures is important. A previous spine fracture increases the risk of a subsequent hip fracture [7], so identifying patients with spine fractures is also clinically important. In clinical practice, BMD is most commonly measured using dual-energy X-ray absorptiometry (DEXA) imaging [8]. Screening DEXA studies are recommended by the US Preventive Services Task Force (USPSTF), the World Health Organization (WHO), and the International Society for Clinical Densitometry (ISCD) for the clinical evaluation of postmenopausal women aged 65 years or older and of individuals at high risk for low BMD [9,10,11]. Approximately three million DEXA studies were performed in the USA in 2006 alone [12,13].

Incidental findings are common on imaging studies, including DEXA studies [14]. When detected by the radiologist, these findings may change clinical management [14,15,16,17]. The most common incidental findings include osteoblastic lesions, Paget's disease of bone, and lumbar spine fractures [14,15,16,17,18,19]. Spine fractures may result in decreased vertebral body height and, in some cases, focally increased BMD [14,18]. Lumbar spine fractures are particularly important because they change the patient's clinical management, prompting additional imaging of the spine and possibly surgical consultation. There is therefore a clinical need to identify patients with lumbar spine fractures.

A prior report showed that approximately 16% of DEXA studies have incidental findings, including fractures, and that none of these incidental findings were commented on by the interpreting radiologist [14]. We hypothesized that machine learning algorithms could identify lumbar spine fractures from the numerical DEXA data output alone, without analysis of image pixel data. The goal of this study was to use a support vector machine (SVM) classifier to automatically detect L1–L4 lumbar spine fractures using ancillary data from routine posterior–anterior (PA) DEXA studies, without requiring additional DEXA imaging or radiation.

Subjects and Methods

The study was approved by the local Institutional Review Board (IRB), which waived the requirement for signed informed consent.

We identified 307 patients who underwent a DEXA study at a tertiary care academic healthcare center between January 1, 2010, and April 1, 2018. Of these 307 patients, 108 (35.2%) had at least one fracture of an L1, L2, L3, or L4 vertebral body. Patients were classified as having a fracture if imaging of the lumbar spine (computed tomography (CT), magnetic resonance imaging (MRI), or radiographs) obtained before the DEXA study showed the lumbar spine fracture (Fig. 1). These fractures were initially diagnosed by an independent radiologist based on a vertebral body height loss of at least 20%; another musculoskeletal radiologist then reviewed the initial radiology report and confirmed the findings. Control patients had no lumbar spine fracture on imaging of the lumbar spine (CT, MRI, or radiographs) obtained after the DEXA study was performed.
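The fracture definition above relies on a 20% vertebral body height-loss threshold. A minimal Python sketch of that criterion follows. The text does not state which reference height was used (for example, an adjacent intact vertebra or the posterior height of the same vertebra), so `reference_height` and both helper functions are hypothetical illustrations, not the radiologists' procedure.

```python
# Hypothetical helpers illustrating the ">= 20% vertebral body height loss"
# criterion; the choice of reference height is an assumption.

def percent_height_loss(measured_height: float, reference_height: float) -> float:
    """Percent loss of vertebral body height relative to a reference height."""
    if reference_height <= 0:
        raise ValueError("reference_height must be positive")
    return 100.0 * (reference_height - measured_height) / reference_height


def meets_fracture_threshold(measured_height: float,
                             reference_height: float,
                             threshold_pct: float = 20.0) -> bool:
    """True if the height loss reaches the threshold (20% by default)."""
    return percent_height_loss(measured_height, reference_height) >= threshold_pct


# Made-up heights in millimetres, for illustration only.
print(percent_height_loss(24.0, 30.0))       # 20.0
print(meets_fracture_threshold(24.0, 30.0))  # True
print(meets_fracture_threshold(27.0, 30.0))  # False
```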

Fig. 1

a Sagittal image from a computed tomography (CT) scan of the abdomen and pelvis; the red arrow shows a compression fracture deformity of the superior endplate of L1. b Coronal image from a CT scan of the abdomen and pelvis; the red arrow shows a compression fracture deformity of the superior endplate of L1. c PA (frontal) view of the lumbar spine from a DEXA image demonstrates that the compression deformity of L1 is not grossly evident

DEXA Imaging

All patients were imaged on the same General Electric (GE) Lunar Prodigy Advance DEXA system (General Electric Healthcare, Chicago, IL) to minimize intermachine variability [20]. Quality assurance on this machine was performed as recommended by the ISCD [21].

The quantitative measures used in the analysis (53 variables in total) were the patient's age, sex, height, and weight, plus the following ancillary DEXA data output: for each lumbar vertebral body (L1, L2, L3, and L4), the BMD, T-score, Z-score, area, width, and height; and for each hip region (femoral neck, Ward's triangle, trochanter, shaft, and total hip), the BMD, T-score, Z-score, bone mineral content (BMC), and area.
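To make the layout of the input vector concrete, the following hypothetical sketch enumerates the 53 variable names listed above. The naming scheme is an assumption, and in practice sex would have to be encoded numerically (for example as 0/1) before being passed to an SVM.

```python
# Hypothetical enumeration of the 53 input variables described in the text.
DEMOGRAPHICS = ["age", "sex", "height", "weight"]
LUMBAR_LEVELS = ["L1", "L2", "L3", "L4"]
LUMBAR_MEASURES = ["BMD", "T-score", "Z-score", "area", "width", "height"]
HIP_REGIONS = ["neck", "Ward's", "trochanter", "shaft", "total hip"]
HIP_MEASURES = ["BMD", "T-score", "Z-score", "BMC", "area"]


def feature_names() -> list[str]:
    """Return demographic, lumbar (4 x 6) and hip (5 x 5) variable names."""
    names = list(DEMOGRAPHICS)
    names += [f"{lvl} {m}" for lvl in LUMBAR_LEVELS for m in LUMBAR_MEASURES]
    names += [f"{reg} {m}" for reg in HIP_REGIONS for m in HIP_MEASURES]
    return names


names = feature_names()
print(len(names))  # 53
```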

Statistical Methods

Qualitative and quantitative variables were compared between the fracture and control groups.

The proportion of males among control patients was compared with the proportion of males among patients with spine fractures using Fisher's exact test. Quantitative variables, including age, height, and weight, were compared between control patients and patients with spine fractures using t tests with unequal variances (Welch's t test). Pearson's correlation coefficients were used to evaluate the associations of lumbar spine BMD and of lumbar vertebral body height, each measured at L1, L2, L3, and L4, with patient age, height, and weight.
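For readers who want to reproduce these comparisons outside R, the tests map onto `scipy.stats` as sketched below (`fisher_exact`, `ttest_ind` with `equal_var=False`, and `pearsonr`). This is an illustrative Python sketch on randomly generated toy data; it is not the authors' code, and the numbers are not the study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical toy data for illustration only; these are NOT the study data.
n_fx, n_ctl = 30, 50
age_fx = rng.normal(72.0, 9.0, n_fx)    # ages of patients with fractures
age_ctl = rng.normal(66.0, 9.0, n_ctl)  # ages of control patients

# 2x2 table: rows = (fracture, control), columns = (male, female).
sex_table = [[10, 20],
             [15, 35]]
odds_ratio, p_sex = stats.fisher_exact(sex_table, alternative="two-sided")

# t test with unequal variances (Welch's t test), e.g. for age.
t_age, p_age = stats.ttest_ind(age_fx, age_ctl, equal_var=False)

# Pearson correlation, e.g. between patient weight and L1 BMD.
weight = rng.normal(75.0, 15.0, n_fx + n_ctl)
bmd_l1 = 0.6 + 0.004 * weight + rng.normal(0.0, 0.08, n_fx + n_ctl)
r_w, p_w = stats.pearsonr(weight, bmd_l1)

print(f"Fisher P = {p_sex:.3f}; Welch P = {p_age:.3g}; r = {r_w:.2f} (P = {p_w:.3g})")
```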

The dataset was divided into a training dataset (80%) and a test dataset (20%). The training dataset had 86 (35.0%) patients with at least one fracture of the L1–L4 lumbar spine and 160 (65.0%) control patients. The test dataset had the remaining 22 (36.1%) patients with at least one fracture of the L1–L4 lumbar spine and 39 (63.9%) control patients. Overall, 65.4% (161/246) of the patients in the training dataset and 67.2% (41/61) of the patients in the test dataset were female. Males had significantly higher BMD than females at L1 (95% CI (0.03, 0.13), P = 0.001), L2 (95% CI (0.04, 0.15), P = 5.4 × 10⁻⁴), L3 (95% CI (0.03, 0.15), P = 0.003), and L4 (95% CI (0.03, 0.16), P = 0.003), and larger vertebral body heights at L1 (95% CI (0.12, 0.27), P = 1.11 × 10⁻⁶), L2 (95% CI (0.12, 0.30), P = 2.86 × 10⁻⁶), L3 (95% CI (0.13, 0.31), P = 1.82 × 10⁻⁶), and L4 (95% CI (0.17, 0.33), P = 2.59 × 10⁻⁹). Patients' clinical and demographic characteristics are shown in Table 1.
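The 80/20 partition can be sketched as a patient-level random split. The text does not say whether the split was random, stratified, or seeded, so the simple unstratified shuffle below (and the function name `split_patients`) is an assumption; it only reproduces the reported set sizes of 246 and 61, assuming one DEXA study per patient.

```python
import random


def split_patients(patient_ids, n_test, seed=0):
    """Hypothetical patient-level random split into (train, test) lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    return ids[n_test:], ids[:n_test]


# 307 patients, 61 held out for testing (about 20%).
train_ids, test_ids = split_patients(range(307), n_test=61)
print(len(train_ids), len(test_ids))  # 246 61
```

Splitting at the patient level keeps any one patient's data out of both sets at once, which avoids leakage between training and testing.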

Table 1 Summary clinical and demographic data for patients in study

An SVM was trained to discriminate patients with fractures from control patients in the training dataset; the support vectors are the training vectors that define the decision boundary between the two classes. SVMs are supervised learning models commonly used for pattern recognition, classification, and regression analysis [22]. We used C-classification with four different kernels (linear, cubic polynomial, radial basis function (RBF), and sigmoid) [22,23,24,25] and 10-fold cross-validation.
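The analysis itself used the e1071 package in R. As a hedged Python analogue, scikit-learn's `SVC` (which, like e1071, is built on LIBSVM) can be configured with the same four kernels and evaluated with 10-fold cross-validation. The cost parameter `C`, the kernel widths, and the use of stratified folds are assumptions because they are not reported here; features are standardized because e1071's `svm()` scales inputs by default. The data below are synthetic.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Four kernels; C and gamma values are assumptions, not reported settings.
# For "poly", gamma=1 and coef0=1 with degree=3 give (x.y + 1)^3.
KERNELS = {
    "linear": SVC(kernel="linear", C=1.0),
    "cubic polynomial": SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1.0),
    "rbf": SVC(kernel="rbf", gamma="scale", C=1.0),
    "sigmoid": SVC(kernel="sigmoid", gamma="scale", coef0=0.0, C=1.0),
}


def cv_accuracy_by_kernel(X, y, n_splits=10, seed=0):
    """Mean 10-fold cross-validated accuracy for each kernel."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, svc in KERNELS.items():
        # Standardize features inside each fold, then fit the SVM.
        model = make_pipeline(StandardScaler(), svc)
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = float(scores.mean())
    return results


# Hypothetical synthetic data with 53 features, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(246, 53))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=246) > 0.4).astype(int)
results = cv_accuracy_by_kernel(X, y)
print(results)
```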

$$ \mathrm{Linear}\ \mathrm{kernel},\ K\left(x,y\right)=x\cdot y $$
(1)
$$ \mathrm{Cubic}\ \mathrm{polynomial}\ \mathrm{kernel},\ K\left(x,y\right)={\left(x\cdot y+1\right)}^3 $$
(2)
$$ \mathrm{Radial}\ \mathrm{basis}\ \mathrm{function},\ K\left(x,y\right)=\exp\left(-\frac{{\left\Vert x-y\right\Vert}^2}{2{\sigma}^2}\right) $$
(3)
$$ \mathrm{Sigmoid}\ \mathrm{kernel},\ K\left(x,y\right)=\tanh\left(\upsilon \left(x\cdot y\right)+c\right) $$
(4)
where x and y are feature vectors, σ is the width of the RBF kernel, and υ and c are the scale and offset parameters of the sigmoid kernel.
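The four kernels can also be written directly from Eqs. (1)–(4). In this NumPy sketch the default parameter values (σ = 1, υ = 1, c = 0) are arbitrary choices for illustration. In scikit-learn's parameterization the RBF width corresponds to `gamma = 1 / (2 * sigma**2)`, and the sigmoid kernel's `gamma` and `coef0` play the roles of υ and c.

```python
import numpy as np


def linear_kernel(x, y):
    """Eq. (1): K(x, y) = x . y"""
    return float(np.dot(x, y))


def cubic_poly_kernel(x, y):
    """Eq. (2): K(x, y) = (x . y + 1)^3"""
    return float((np.dot(x, y) + 1.0) ** 3)


def rbf_kernel(x, y, sigma=1.0):
    """Eq. (3): K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def sigmoid_kernel(x, y, upsilon=1.0, c=0.0):
    """Eq. (4): K(x, y) = tanh(upsilon (x . y) + c)"""
    return float(np.tanh(upsilon * np.dot(x, y) + c))


x = np.array([1.0, 2.0])
y = np.array([0.0, 1.0])
print(linear_kernel(x, y))      # 2.0
print(cubic_poly_kernel(x, y))  # 27.0
print(rbf_kernel(x, x))         # 1.0
```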

The resulting classifiers were then used to classify each patient's DEXA study in the training and test datasets into one of two categories: lumbar spine fracture (F) or control (N). Receiver operating characteristic (ROC) curves were compared using DeLong's test [26].
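DeLong's test compares two correlated AUCs computed on the same patients. In R this is what `pROC::roc.test(..., method = "delong")` performs. The following is a from-scratch NumPy sketch of the same test, built from the structural components of the Mann-Whitney kernel; it is written for illustration and has not been validated against pROC, and the scores and labels in the usage example are made up.

```python
import math

import numpy as np


def _psi(pos, neg):
    """Mann-Whitney kernel matrix: 1 if pos > neg, 0.5 if tied, 0 otherwise."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return (pos > neg) + 0.5 * (pos == neg)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated ROC curves on the same cases.

    labels: 1 = fracture (F), 0 = control (N). Returns (auc_a, auc_b, p_value).
    """
    labels = np.asarray(labels)
    aucs, v10, v01 = [], [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        psi = _psi(s[labels == 1], s[labels == 0])
        aucs.append(float(psi.mean()))  # AUC = Mann-Whitney statistic
        v10.append(psi.mean(axis=1))    # structural components, positives
        v01.append(psi.mean(axis=0))    # structural components, negatives
    m, n = len(v10[0]), len(v01[0])
    s10 = np.cov(np.vstack(v10))  # 2x2 covariance across positives (ddof=1)
    s01 = np.cov(np.vstack(v01))  # 2x2 covariance across negatives (ddof=1)
    cov = s10 / m + s01 / n
    var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var <= 1e-12:  # numerically zero variance, e.g. identical scores
        return aucs[0], aucs[1], 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], p_value


# Made-up classifier scores for 3 fracture and 4 control studies.
labels = [1, 1, 1, 0, 0, 0, 0]
a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.4, 0.1]
b = [0.6, 0.3, 0.7, 0.5, 0.2, 0.4, 0.1]
auc_a, auc_b, p = delong_test(a, b, labels)
print(auc_a, auc_b, p)
```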

Statistical analyses were performed using R v3.4 (https://www.r-project.org/). The e1071 package was used for the SVMs, and the pROC package [27] was used for DeLong's comparison of the ROC curves. All statistical tests were two-sided, and P values < 0.05 were considered statistically significant.

Results

Patients with lumbar spine fractures were on average older (P = 0.006) and weighed less (P = 6.42 × 10⁻⁹) than control patients. There was no significant correlation between patient age and lumbar spine BMD. Vertebral body height at each level decreased with increasing patient age (P < 0.01 at each level) and was positively associated with patient height (P < 0.001 at each level) and weight (P < 0.001). Lumbar spine BMD was positively associated with patient weight (P < 0.001 at each level) (Table 2).

Table 2 Association between demographic factors and lumbar spine BMD

The SVM classifier with the linear kernel had the highest accuracy of the kernels evaluated in the training dataset, with an accuracy of 93.5% (230/246) and an AUC of 0.9258 (Table 3) for discriminating patients with lumbar spine fractures from control patients. In the training dataset, the sensitivity and specificity of the SVM with the linear kernel using DEXA ancillary data were 89.5% (77/86) and 95.6% (153/160), respectively. In the test dataset, the SVM classifier with the linear kernel had an accuracy of 91.8% (56/61) and an AUC of 0.8963. The SVM classifier with the RBF kernel performed better than the SVM with the linear kernel for detecting patients with L1–L4 lumbar spine fractures in the test dataset, but the difference was not statistically significant (DeLong's test P = 0.317). The SVM classifier with the RBF kernel detected L1–L4 lumbar spine vertebral body fractures significantly better than expected by chance (P < 0.001, 95% CI (0.84, 0.98)). In the test dataset, the sensitivity and specificity of the SVM classifier with the linear kernel were 81.8% (18/22) and 97.4% (38/39), respectively, comparable to those of the SVM classifier with the RBF kernel.
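The reported sensitivity, specificity, and accuracy follow directly from the confusion-matrix counts. This short Python check recomputes them for the linear-kernel SVM; the false-negative and false-positive counts are derived by subtraction from the reported totals (for example, 86 - 77 = 9 missed fractures in the training dataset). The `ConfusionCounts` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # fractures correctly classified as F
    fn: int  # fractures misclassified as N
    tn: int  # controls correctly classified as N
    fp: int  # controls misclassified as F

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Counts for the linear-kernel SVM, derived from the reported fractions.
train = ConfusionCounts(tp=77, fn=9, tn=153, fp=7)
test = ConfusionCounts(tp=18, fn=4, tn=38, fp=1)

for name, c in [("train", train), ("test", test)]:
    print(name,
          f"sens {100 * c.sensitivity():.1f}%",
          f"spec {100 * c.specificity():.1f}%",
          f"acc {100 * c.accuracy():.1f}%")
```

The printed training specificity, 153/160, is 95.6%.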

Table 3 Impact of different SVM kernels for identifying L1–L4 lumbar spine vertebral body fractures on DEXA studies

For discriminating patients with lumbar spine fractures from control patients in the test dataset, the ROC curve of the SVM classifier with the RBF kernel had a significantly higher AUC than that of the SVM classifier with the cubic polynomial kernel (DeLong's test P = 0.013). There was no significant difference between the AUC of the SVM classifier with the RBF kernel and that of the SVM classifier with the linear kernel (DeLong's test P = 0.317) or the sigmoid kernel (DeLong's test P = 0.543). The SVM classifier with the linear kernel had a significantly higher AUC than the SVM classifier with the cubic polynomial kernel (DeLong's test P = 0.034) (Fig. 2). The AUC of the SVM with the sigmoid kernel was not significantly different from that of the SVM with the cubic polynomial kernel (DeLong's test P = 0.181) or the SVM with the linear kernel (DeLong's test P = 0.729).

Fig. 2

Comparison of receiver operating characteristic (ROC) curves using different SVM kernels for discriminating between patients with lumbar spine fractures and control patients in the test dataset. The dotted line is the line of no discrimination

Discussion

The results show that patients with L1–L4 lumbar spine fractures can be identified from the ancillary data of a screening posterior–anterior lumbar spine DEXA study. In the test dataset, the SVM with the linear kernel detected lumbar spine fractures with a sensitivity of 81.8% and a specificity of 97.4%. All kernels evaluated performed reasonably well except the cubic polynomial kernel.

These findings have important potential clinical implications. First, none of the 108 fractures was prospectively identified by the diagnostic radiologist interpreting the DEXA study, yet the optimized SVM detected 81.8% (18/22) of the fractures in the test dataset. Second, applying the SVM required no additional imaging or radiation and used routine PA DEXA ancillary data already generated by the manufacturer's DEXA software. This distinguishes the approach from DEXA vertebral fracture assessment (VFA), which requires a lateral projection of the thoracolumbar spine to assess for spine fractures and therefore exposes the patient to additional radiation.

DEXA screening studies are often used to identify patients at risk for hip, spine, and wrist fractures due to low BMD; however, as this analysis shows, the ancillary data from a DEXA screening study can also identify patients with L1–L4 lumbar spine fractures. This is clinically significant because detection of a spine fracture changes clinical management and often prompts further imaging, including MRI to assess for potential spinal cord compromise. In addition, a lumbar spine fracture in a patient with osteoporosis establishes a diagnosis of severe osteoporosis.

To our knowledge, this is the first study to use ancillary data from PA DEXA output to predict lumbar spine fractures. Our other findings are consistent with previously published studies. Males have been shown to have higher BMD than females [28,29], which our data support. BMD was strongly associated with patient height and weight [30,31,32]. Prior reports have shown that vertebral body fractures can be detected by radiologists reviewing images from DEXA studies [14]. DEXA studies have also been used to identify patients at risk for vertebral body fractures related to low BMD [1,2,3]. Our data show that patients with fractured L1–L4 vertebral bodies had lower BMD than control patients, consistent with prior published reports [1,2,3].

The study has several limitations. First, it is a retrospective study based on clinical data from a single academic institution and is therefore subject to ascertainment bias. Second, the study sample was enriched for patients with lumbar spine fractures and therefore does not necessarily reflect the true prevalence in the general screening population. Third, the study was restricted to patients whose DEXA studies were performed on a single scanner made by a single manufacturer. This was done to eliminate systematic differences between DEXA scanners from different manufacturers due to differences in calibration [33,34]. We expect these results to generalize to other DEXA systems, although the approach would probably be most accurate when the training and test datasets come from scanners made by the same manufacturer.

While approximately 16% of DEXA studies have been reported to have incidental findings [14], 35.2% (108/307) of the patients in this study had lumbar spine fractures, which may affect the performance of the SVM in clinical practice. The sample was somewhat enriched for lumbar spine fractures to allow the SVM algorithm to better learn to predict these rare findings. The small sample size also limits the ability of the algorithm to predict the rare incidental findings that can be detected on DEXA studies. Missed fractures appeared to be primarily due either to Schmorl's nodes or endplate deformities that could not be detected on the frontal projection routinely obtained for DEXA studies, or to inaccurate tracing of the vertebral bodies by the technologist when measuring vertebral body height and area. Nevertheless, to our knowledge, this study is the first to show that fractures can be identified using machine learning algorithms without analysis of image pixel data, and that such algorithms could help radiologists identify these often-missed fractures that change clinical management. Further research is required to validate these findings in larger studies.
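To illustrate why the enriched prevalence matters, the sketch below applies Bayes' rule to the test-set sensitivity (18/22) and specificity (38/39) of the linear-kernel SVM, assuming both stay fixed as prevalence changes. The 5% prevalence is a hypothetical value chosen only for illustration, not an estimate of fracture prevalence in screening populations. Under these assumptions the positive predictive value (PPV) falls from about 0.95 at the study prevalence (108/307) to about 0.63.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENS = 18 / 22  # test-set sensitivity, linear kernel
SPEC = 38 / 39  # test-set specificity, linear kernel

study_prev = 108 / 307     # fracture prevalence in this enriched sample
hypothetical_prev = 0.05   # HYPOTHETICAL lower prevalence, illustration only

for label, prev in [("study sample", study_prev), ("hypothetical", hypothetical_prev)]:
    print(f"{label}: prevalence {prev:.3f}, PPV {ppv(SENS, SPEC, prev):.2f}")
```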

In summary, SVM classifiers can use quantitative ancillary data routinely obtained from DEXA studies to identify L1–L4 lumbar spine fractures. Machine learning algorithms can be used as an adjunct to identify incidental findings and assist radiologists in the interpretation of DEXA studies.