Introduction

Vertebral fractures are the hallmark of osteoporosis and the most prevalent osteoporotic fracture. Moreover, the presence of a vertebral fracture is among the strongest risk factors for subsequent vertebral and non-vertebral fractures [1–3]. Vertebral fractures cause direct functional impairment as well as future disabling fractures and increased mortality [4–6]. Despite their high prevalence and significant impact, underdiagnosis and undertreatment of vertebral fractures is a worldwide problem [7]. Less than one third of individuals with vertebral fracture receive medication or proper treatment [8–11]. In particular, mild fractures are often overlooked or unrecognized on routine lateral chest or lateral spine radiographs [12, 13]. New methods are urgently needed to improve the accuracy and efficiency of identifying vertebral fractures so that proper therapeutic intervention can be initiated.

There are three general approaches for identifying prevalent vertebral fractures from lateral spinal radiographs, lateral dual-energy X-ray absorptiometry (DXA), or lateral scout views from quantitative computed tomography (QCT) imaging: (1) visual semi-quantitative (SQ) grading developed by Genant et al. [14], (2) algorithm-based approach for the qualitative identification of vertebral fracture (ABQ) developed by Jiang et al. [15], and (3) quantitative vertebral morphometry measurement (QM), which was first developed in the 1960s [16]. Although the SQ and ABQ methods are currently considered the gold standard methods for vertebral fracture assessment, they have limitations, including the requirement for highly experienced readers and modest reproducibility between readers, particularly for fractures of mild severity [17, 18]. In comparison, the QM method relies on direct measurements of vertebral dimensions obtained by a trained reader placing six points on each vertebral body; its disadvantages include the substantial time required to place these points and potential variability in point placement between readers. Thus, a combination of SQ and QM approaches may enhance the strengths of each technique while minimizing their limitations, possibly improving the identification of vertebral fractures [19, 20].

Recent development of shape-based statistical modeling technology for semi-automated quantitative morphometry may make QM measurements more feasible. One of these algorithms (SpineAnalyzer, Optasia Medical, Cheadle, UK) not only allows rapid, semi-automated placement of the standard six vertebral morphometry points, but also identifies 95 points to carefully delineate the shape of each vertebral body [21]. Thus, it facilitates QM methods by eliminating the manual annotation of six points on each vertebra, and it also has the potential to provide novel information about vertebral shape. A previous study using lateral radiographs reported excellent accuracy and reproducibility for vertebral morphometry measurements made with this semi-automated algorithm. In particular, the mean accuracy error of vertebral height measurements was 1.06 ± 1.2 mm, which corresponds to 3.4% of vertebral height on average, and the mean precision error, reflecting inter-observer variability, was 0.61 ± 0.73 mm, corresponding to 2.3% of vertebral height [21]. However, this study was limited in that it did not report inter- and intra-reader reliability of vertebral heights, height ratios, and vertebral fracture classifications based on the morphometry measurements.

Our long-term goal is to enhance assessment of vertebral fractures by determining the clinical utility of these novel semi-automated quantitative morphometry measurements. Specifically, in this study we determined intra- and inter-reader reliability of semi-automated vertebral morphometry measurements and fracture assessment using lateral scout views from QCT. In addition, we compared the time required to complete the morphometry analysis in subjects with and without fractures.

Methods

Subjects

The participants used in this study are identical to those used by Samelson et al. [22] to determine the reliability of vertebral fracture assessment from lateral QCT scout views using Genant's semi-quantitative algorithm [14]. Specifically, 100 participants were selected from the community-based Framingham Heart Study Offspring and Third Generation Multi-detector computed tomography (MDCT) study [23]. QCT scans of the chest and abdomen were acquired in 3,529 participants in the MDCT study for assessment of coronary and aortic calcium [24]. Subjects selected for the current study included 50 men and 50 women, ranging in age from 50 to 87 years. To ensure an adequate number of individuals with vertebral fractures in our reliability study, a clinical investigator experienced in evaluating lateral spine images (DPK) reviewed the scout films of persons age 70 years and older to identify at least 16 individuals with suspected vertebral fracture. In these 16 individuals, there were 30 vertebral fractures, including 12 mild (SQ 1), 14 moderate (SQ 2), and 4 severe (SQ 3) fractures. Seven individuals had one vertebral fracture, one had two vertebral fractures, three had three vertebral fractures, and three had four vertebral fractures. We then selected a convenience sample of an additional 84 subjects to obtain 100 individuals.

Computed tomography scans

QCT scans were acquired using an eight-slice multi-detector QCT scanner (Lightspeed Ultra, General Electric Medical Systems, Milwaukee, WI), as previously described [24]. The scout views used in this study consisted of frontal and lateral low energy 2D scanograms extending from the upper thoracic (T4) to sacral (S1) vertebral levels (Fig. 1).

Fig. 1 Method used to conduct semi-automated quantitative morphometry measurements (SpineAnalyzer, Optasia Medical, Cheadle, UK). a CT lateral scout view; b operator identification of individual vertebra from T4 to L4; c automated placement of standard quantitative morphometry points (large circles) and 95 points for shape definition (smaller dots); and d shape contours of lumbar area

Semi-automated quantitative vertebral morphometry

Semi-automated quantitative vertebral morphometry was performed using a model-based shape recognition technology that provides standard six-point morphometry, plus detailed annotation to define the shape of each vertebra from T4 to L4 (Fig. 1). DICOM images were loaded and displayed, and vertebrae of interest from T4 to L4 were labeled by the operator by manually placing points in the approximate center of each vertebra. Using these points, the algorithm then automatically identifies vertebral body margins, draws contours, and places points for standard six-point morphometry. The operator reviews the images and, if necessary, manually adjusts the point placement. The program computes vertebral heights, height ratios, and deformities indicative of vertebral fracture. The semi-automated measurements can be made on lateral images from various sources, including radiographs, DXA-based VFA, and lateral CT scout views [25, 26].

Study design

To determine inter- and intra-reader reliability, vertebral morphometry measurements were performed by two non-radiologist readers at two different time points (>2 weeks apart) using the semi-automated algorithm. Thus, four evaluations were independently performed for each subject's scout view image. Readers were blinded to subject identification number and age. In addition, we determined reproducibility of vertebral heights and height ratios using the unadjusted morphometry point placement (i.e., no operator intervention). In this case, one non-radiologist reader performed vertebral morphometry at two time points and did not apply any manual intervention to vertebral contours or point placement at either time point.

Vertebral morphometry measurements and assessment of vertebral fracture

We measured posterior (h_P), anterior (h_A), and mid-vertebral height (h_M). We used height ratios and deformity percentages from six morphometry points to classify fractures as wedge, biconcave, and crush fractures, the latter being based on Black et al. [27]. The equations for vertebral height ratios and deformity percentages are given below, where h_A is the anterior height for the vertebral body at the current level, h_P is the posterior height for the vertebral body at the current level, h_M is the mid height for the vertebral body at the current level, h_{P,i-1} is the posterior height for the vertebral body at the level below, h_{P,i+1} is the posterior height for the vertebral body at the level above, h_{A,i-1} is the anterior height for the vertebral body at the level below, and h_{A,i+1} is the anterior height for the vertebral body at the level above.

$$ \begin{array}{l}
\text{wedge ratio}\ (r_W) = h_A / h_P \\
\text{biconcave ratio}\ (r_B) = h_M / h_P \\
\text{crush ratio}\ (r_C) = \min\left[ \max\left( h_{P,i}/h_{P,i-1},\ h_{A,i}/h_{A,i-1} \right),\ \max\left( h_{P,i}/h_{P,i+1},\ h_{A,i}/h_{A,i+1} \right) \right] \\
\text{wedge deformity}\ (d_W) = 100 \times \left[ 1 - r_W \right] \\
\text{biconcave deformity}\ (d_B) = 100 \times \left[ 1 - r_B \right] \\
\text{crush deformity}\ (d_C) = 100 \times \left[ 1 - r_C \right]
\end{array} $$

Vertebral fractures were identified based on deformity percentages derived from morphometry alone, using Genant's semi-quantitative scale [14] as a guide: grade 0 (<20% deformity), grade 1 (≥20% but <25% deformity), grade 2 (≥25% but <40% deformity), and grade 3 (≥40% deformity).
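The ratio, deformity, and grading definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the software's implementation: the function names and the example heights are hypothetical, the handling of the first and last vertebral levels (using only the one available neighbor for the crush ratio) is an assumption, and grading a vertebra by its largest deformity is also an assumption, since the text does not state how the three deformities are combined.

```python
from typing import Sequence, Tuple


def height_ratios(h_a: Sequence[float], h_m: Sequence[float],
                  h_p: Sequence[float], i: int) -> Tuple[float, float, float]:
    """Return (r_W, r_B, r_C) for the vertebra at index i.

    The heights are parallel sequences ordered along the spine. The crush
    ratio compares level i with each adjacent level; at the ends of the
    measured range only one neighbor exists (assumption: use it alone).
    """
    r_w = h_a[i] / h_p[i]          # wedge ratio
    r_b = h_m[i] / h_p[i]          # biconcave ratio
    neighbor_terms = []
    for j in (i - 1, i + 1):
        if 0 <= j < len(h_p):
            neighbor_terms.append(max(h_p[i] / h_p[j], h_a[i] / h_a[j]))
    r_c = min(neighbor_terms)      # crush ratio (needs at least 2 levels)
    return r_w, r_b, r_c


def deformity_percent(ratio: float) -> float:
    """Deformity as defined above: 100 x (1 - ratio)."""
    return 100.0 * (1.0 - ratio)


def sq_grade(deformity: float) -> int:
    """Map a deformity percentage to grade 0-3 using the stated cut-offs."""
    if deformity >= 40.0:
        return 3
    if deformity >= 25.0:
        return 2
    if deformity >= 20.0:
        return 1
    return 0


# Hypothetical heights (mm) for three adjacent levels; the middle one is wedged.
anterior = [20.0, 15.0, 21.0]
middle = [19.5, 17.0, 20.5]
posterior = [21.0, 20.0, 22.0]

ratios = height_ratios(anterior, middle, posterior, 1)
deformities = [deformity_percent(r) for r in ratios]
grade = sq_grade(max(deformities))   # assumption: grade by largest deformity
print(ratios, deformities, grade)
```

Because the grade depends on fixed cut-offs, two reads of the same vertebra that differ only slightly around 20%, 25%, or 40% deformity receive different grades.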

The time to perform morphometry measurements was measured by one non-radiologist reader in 25 randomly selected subjects for both the unadjusted morphometry point placement and the semi-automated approach, in which the operator adjusts morphometry point placement as needed.

Statistical analysis

We computed intraclass correlations (ICCs) and the root mean squared coefficient of variation (RMS CV) for the inter- and intra-reader differences in the vertebral heights and height ratios (wedge, biconcave, and crush) for the semi-automated approach. We also computed intra-reader reliability for the two trials of unadjusted measurements, and for the comparison of unadjusted vs. semi-automated (i.e., manually adjusted) morphometry point placement. To compute RMS CV, we used the method recommended by Glüer et al. [28]. To determine reliability of vertebral fracture assessment based on deformity percentages, we estimated agreement corrected for chance using a simple kappa (k) statistic and associated 95% confidence interval (CI) [29]. We computed k for intra-reader agreement between time 1 and time 2 separately for each of the two readers, and k for inter-reader agreement between reader A and reader B separately for each of the two time points. We computed the k values for two dichotomous definitions of prevalent vertebral fracture: (1) grades 1–3 (mild, moderate, and severe) versus grade 0 (normal), and (2) grades 2–3 (moderate and severe) versus grades 0–1 (normal and mild). Unreadable vertebrae (n = 35) were classified as normal (grade 0).
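As an illustration of the RMS CV calculation for paired reads, the sketch below takes the per-vertebra coefficient of variation from two measurements and averages it in the root-mean-square sense. This is our reading of the general approach recommended by Glüer et al. [28], not code from the study; the function name and the example values are hypothetical.

```python
import math
from typing import Sequence


def rms_cv_percent(first: Sequence[float], second: Sequence[float]) -> float:
    """RMS coefficient of variation (%) for paired repeat measurements.

    For each pair, the sample SD of two values is |x1 - x2| / sqrt(2) and
    the CV is SD / mean. The per-pair CVs are combined as a root mean square
    rather than an arithmetic mean.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_cvs = []
    for x1, x2 in zip(first, second):
        mean = (x1 + x2) / 2.0
        sd = abs(x1 - x2) / math.sqrt(2.0)
        squared_cvs.append((sd / mean) ** 2)
    return 100.0 * math.sqrt(sum(squared_cvs) / len(squared_cvs))


# Hypothetical anterior heights (mm) for the same vertebrae read twice.
read_1 = [20.0, 22.5, 18.0]
read_2 = [20.6, 22.1, 18.9]
print(round(rms_cv_percent(read_1, read_2), 2))
```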

Analyses were performed on a vertebra-specific level. We considered k > 0.75 as excellent agreement, 0.40–0.75 as fair to good agreement, and <0.40 as poor agreement beyond that expected by chance as characterized by Fleiss [30]. We also conducted a stratified analysis to evaluate potential differences in agreement by spinal region, categorized as T4-6, T7-9, T10-12, and L1-4, as previously reported [22, 31]. We used Student's t test to compare the time to complete morphometric analysis in subjects with at least one prevalent mild vertebral fracture vs. those with no fracture, and in subjects with at least one prevalent moderate or severe fracture vs. those with no fracture. We compared the prevalence of vertebral fracture identification by both readers and both time points by vertebral level. All statistical analyses were conducted using SPSS (SPSS Inc., Chicago, IL, USA), and SAS (SAS Institute Inc., Cary, NC, USA).
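For readers unfamiliar with the kappa statistic, the following sketch computes a simple (unweighted) kappa for a dichotomous fracture call and labels it with the Fleiss categories quoted above. It is illustrative only: the function names and the example reads are hypothetical, and the confidence interval reported in the study is not reproduced here.

```python
from typing import Sequence


def cohen_kappa(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Simple kappa for two sets of yes/no fracture calls on the same vertebrae."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("need two non-empty sequences of equal length")
    # Observed agreement: both positive or both negative.
    agree = sum(1 for a, b in zip(calls_a, calls_b) if bool(a) == bool(b))
    p_observed = agree / n
    # Chance agreement from each read's marginal fracture rate.
    p_a = sum(1 for a in calls_a if a) / n
    p_b = sum(1 for b in calls_b if b) / n
    p_expected = p_a * p_b + (1.0 - p_a) * (1.0 - p_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def fleiss_category(kappa: float) -> str:
    """Label kappa using the cut-offs stated in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical calls for ten vertebrae (True = deformity at or above threshold).
reader_a = [True, True, False, False, False, False, False, False, False, False]
reader_b = [True, False, False, False, False, False, False, False, False, False]
k = cohen_kappa(reader_a, reader_b)
print(round(k, 3), fleiss_category(k))
```

In this made-up example the raw agreement is 90%, yet kappa is much lower because most vertebrae are unfractured and agreement on normal vertebrae is largely expected by chance.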

Results

Demographic data of participants

Among 100 participants, 4 women were excluded due to poor image quality. The average BMI of the four excluded women was 42.0 ± 7.4 kg/m² (range, 34.5–52.1), suggesting that the poor image quality was likely due to obesity. Two vertebrae could not be analyzed by either reader at either time point and were excluded from the analysis. In addition, 35 vertebrae were unreadable by at least one reader at one time point. Altogether, 4,949 vertebrae (99.1%) out of a total of 4,992 were analyzable using the semi-automated algorithm. Demographic data and age distribution for the 50 men and 46 women included in the reliability study are shown in Table 1. The participants' mean age was 70.3 ± 8.9 years, ranging from 50 to 87 years.
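The vertebra counts above can be reconciled with a short calculation. The breakdown is our reconstruction rather than something stated explicitly in the text: it assumes 13 levels per subject (T4 to L4), four reads per vertebra (two readers, two time points), that the two fully unreadable vertebrae account for four reads each, and that the 35 additional unreadable vertebrae each account for one read.

```python
# Reconstruction of the reported totals; the breakdown is an assumption.
subjects = 96                      # 50 men + 46 women
levels_per_subject = 9 + 4         # T4-T12 plus L1-L4
reads_per_vertebra = 2 * 2         # two readers x two time points

total_reads = subjects * levels_per_subject * reads_per_vertebra

# Two vertebrae unreadable in all four reads, plus 35 single unreadable reads.
unreadable_reads = 2 * reads_per_vertebra + 35

analyzable_reads = total_reads - unreadable_reads
analyzable_percent = round(100.0 * analyzable_reads / total_reads, 1)

print(total_reads, analyzable_reads, analyzable_percent)
```

Under these assumptions the calculation reproduces the reported 4,992 total, 4,949 analyzable, and 99.1%.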

Table 1 Demographic data of study subjects (mean ± SD)

Reproducibility of vertebral morphometry measurements using the algorithm without manual contour adjustment

To evaluate the reproducibility of the morphometry algorithm without manual contour adjustments, we computed the ICCs for the morphometry done at two time points, and also compared this to ICCs for unadjusted morphometry points vs. semi-automated (i.e., manual intervention at the operator's discretion) (reader A, time 1) in the same 96 participants. The ICCs for vertebral height measurements for unadjusted time 1 vs. unadjusted time 2 ranged from 0.97 to 0.98 (Table 2) and were similar to ICCs for vertebral heights comparing the unadjusted to the semi-automated approach (ICC = 0.96 to 0.97) (Table 2). In both cases, ICCs showed that the vertebral height measurements were highly reliable, but the vertebral height ratios were slightly less reliable than the vertebral heights. Moreover, the unadjusted algorithm for morphometry point placement was less reliable in subjects with fractures, providing rationale for the approach in which the operator adjusts morphometry points after visual inspection of the images (Figs. 2 and 3).

Table 2 ICCs of vertebral heights and height ratios, unadjusted-time 1 vs. unadjusted-time 2 and unadjusted vs. semi-automated approach
Fig. 2 Unadjusted algorithm which needed no manual adjustment. a Original lateral scout view from CT scan, b automated placement of morphometry points, and c automated placement of contour lines

Fig. 3 Unadjusted algorithm which needed manual adjustment. a Original lateral scout view from CT scan, b automated placement of morphometry points, and c automated placement of contour lines, showing poor placement for fractured vertebra at T10; and d contour lines after manual adjustment of points on T10

Time to conduct semi-automated vertebral morphometry measurements

The time needed to conduct semi-automated vertebral morphometry measurements averaged 5.4 ± 1.7 min (range, 3.2–9.1 min) per subject. In comparison, the average time needed to place morphometry points using the algorithm with no operator adjustments of vertebral contours was 45.6 ± 4.5 s (range, 38.7–56.8 s). There was no difference in analysis time for the semi-automated approach between subjects with no fracture and those with only mild vertebral fracture. However, the time needed to perform semi-automated morphometry measurements was approximately 2 min longer in subjects with moderate or severe fracture compared to those with no fracture (6.7 ± 1.6 min vs. 4.6 ± 1.3 min, p = 0.002). The time required to complete the semi-automated morphometry analysis was independent of sex, age, height, weight, and BMI (data not shown).

Reproducibility of vertebral morphometry by the semi-automated algorithm (i.e., manual adjustment of point placement by operator)

Intra- and inter-reader ICCs for anterior, mid, and posterior vertebral heights for all vertebral levels combined were excellent, ranging from 0.96 to 0.98 (Table 3). ICCs were also excellent at distinct spinal regions (T4-9, T10-12, and L1-4), ranging from 0.87 to 0.96. Intra- and inter-reader RMS CV values ranged from 2.5% to 3.9% and 3.3% to 4.4%, respectively.

Table 3 Intra- and inter-reader reliability of vertebral height measurements by semi-automated morphometry

For vertebral height ratios, intra-reader ICCs for wedge ratio were good to excellent for all vertebrae together and for distinct spinal regions (T4-9, T10-12, and L1-4), with ICCs ranging from 0.62 to 0.83 (Table 4). However, biconcave and crush ratio measurements were less reliable, with ICCs ranging from 0.36 to 0.73 for biconcave ratio and 0.48 to 0.63 for crush ratio. Likewise, inter-reader ICCs for wedge ratio were good to excellent for all vertebrae together and for distinct spinal regions, ranging from 0.59 to 0.80, but the inter-reader reliability of biconcave and crush ratio measurements was only fair to good, with ICCs ranging from 0.38 to 0.72 for biconcave ratio and 0.42 to 0.62 for crush ratio. Intra- and inter-reader RMS CV for the various height ratios ranged from 3.6% to 5.8% and 4.1% to 5.4%, respectively (Table 4). However, whereas the reliability of vertebral heights in the T4-6 region was comparable to that in other regions (Table 3), the intra- and inter-reader reliability of vertebral height ratios was worse at T4-6 than in other spinal regions (Table 4).

Table 4 Intra- and inter-reader reliability of vertebral height ratios by semi-automated morphometry

Reliability of morphometric vertebral deformities by the semi-automated algorithm

Based on morphometry measurements alone, readers A and B identified 52 and 59 subjects at time 1, and 51 and 46 subjects at time 2 with at least one prevalent vertebral fracture, respectively (Table 5). The total number of vertebrae classified as fractured ranged from 6.2% to 8.7%, including 91 and 108 by readers A and B at time 1, respectively, and 94 and 77 at time 2 by readers A and B, respectively, with the majority of fractures (53–63%) being mild (SQ 1, i.e., morphometric deformity ≥20%, but <25%) (Table 5). The proportion of wedge fractures (73.2%) was much higher than that of biconcave (15.7%) or crush (11.1%) fractures. The distribution of prevalent fractures by spinal location was bimodal, with peak frequencies occurring at T7-8 and T11-12 (Fig. 4). Whereas the spatial distribution of wedge fractures mimicked the total bimodal distribution, biconcave fractures were concentrated in the lower thoracic and lumbar vertebrae, and crush fractures were distributed throughout the vertebral levels. In terms of fracture distribution by deformity type at specific vertebral levels, wedge fractures comprised most of the prevalent fractures in the thoracic and thoracolumbar area, whereas biconcave and crush fractures comprised most of the prevalent fractures in the L2-4 region (data not shown).

Table 5 Number of subjects and vertebrae according to semi-quantitative grade, determined solely from quantitative morphometry measurements
Fig. 4 Prevalence of vertebral fracture (deformity ≥ 20%) by vertebral level. The highest prevalence was seen at T7-8 and T11-12. Reader A1, reader A-time 1; reader A2, reader A-time 2; reader B1, reader B-time 1; reader B2, reader B-time 2

Examining all vertebral levels together, we found good intra- (k = 0.59 to 0.69) and inter-reader agreement (k = 0.67) for vertebral fracture defined by a deformity of ≥20% (≥SQ 1) (Table 6). When vertebral fracture was instead defined by a deformity of ≥25% (i.e., ≥SQ 2), intra-reader agreement was higher, with k of 0.58 to 0.77, but inter-reader agreement was slightly lower, with k of 0.61 to 0.64 (Table 6). Inter- and intra-reader agreement was fair at T4-6, but was good to excellent at other spinal regions (T7-9, T10-12, and L1-4) (Table 6).

Table 6 Intra- and inter-reader agreement for vertebral fracture classification, kappa (k) statistics (95% CI)

Discussion

In this study, we determined intra- and inter-reader reliability of semi-automated vertebral morphometry measurements and morphometry-based fractures using lateral CT scout views. The semi-automated algorithm provided excellent intra- and inter-reader reliability for vertebral height measurements, along with good to fair reliability for vertebral height ratios. Reliability for vertebral fracture assessment based solely on quantitative morphometry was also good and was comparable to previous reports for SQ vertebral fracture grading by radiologists. Furthermore, the average time to complete the semi-automated morphometry analysis was approximately 9 min 40 s less than previously reported for manual morphometry analysis [32].

Our evaluation of the reproducibility of measurements derived from the unadjusted morphometry points showed that the ICCs for the two unadjusted analyses were not equal to 1, demonstrating some variation in the algorithm's point placement caused by variation in where the operator places the initial seed point in the middle of the vertebral body. The ICC for unadjusted analyses would only be equal to 1 if the initialization points were placed at identical locations every time. The ICCs from the unadjusted morphometry points were worse in subjects with fractures than in those without fractures, providing strong rationale for operator review of point placement, and adjustment as needed.

Although we found excellent intra- and inter-reader reliability for semi-automated vertebral height measurements, inter- and intra-reader reliability of vertebral height ratios and the associated agreement for vertebral fracture classification were good to fair. Poorer reliability of height ratios is anticipated, since the ratio reflects the error in each of the individual height measurements used to compute the ratio. Furthermore, our classification scheme for vertebral fractures relied on thresholds for deformity percentages derived from height ratios, meaning that a very small (and clinically insignificant) variation in deformity from 24.8% to 25.1% would lead to two different fracture classifications. Such a phenomenon explains in part why k statistics for vertebral fracture were worse than agreement for height measurements, and also why the k scores did not improve when the fracture definition was changed to include only moderate and severe fractures (i.e., deformity ≥25%), an improvement that is customarily seen in studies that utilize semi-quantitative visual assessment of fractures [22, 31].
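The expectation that ratios are noisier than the heights they are built from can be illustrated with first-order error propagation. The sketch below assumes that the errors in the anterior and posterior heights are small and independent, which this study does not establish; σ denotes the standard deviation of a measurement and CV the corresponding coefficient of variation.

$$ \left( \frac{\sigma_{r_W}}{r_W} \right)^2 \approx \left( \frac{\sigma_{h_A}}{h_A} \right)^2 + \left( \frac{\sigma_{h_P}}{h_P} \right)^2 \quad \Rightarrow \quad CV_{r_W} \approx \sqrt{CV_{h_A}^2 + CV_{h_P}^2} $$

For example, taking an illustrative CV of 3% for each height, the wedge ratio would have a CV of about √2 × 3% ≈ 4.2% under these assumptions.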

In general, we observed better reliability for wedge and biconcave fractures than for crush fractures. This may be because intra-vertebral height ratios are used to define wedge and biconcave fractures. In contrast, inter-vertebral height ratios are used to determine crush fractures, and this may have contributed to the inferior reliability for this fracture type. In addition, the relatively low prevalence of crush fractures may have contributed to their lower intra- and inter-reader reliability.

In comparing our results to previous reports, it is important to note that the reliability of vertebral heights and height ratios varies according to the type of imaging modality, the spinal location, and subject characteristics (e.g., osteoporotic vs. normal) [33, 34]. Specifically, the reliability of morphometric measurements from radiographs is generally better than that of morphometric measurements from lateral dual-energy X-ray absorptiometry scans due to the superior spatial resolution provided by radiographs [33, 34]. The reliability of quantitative morphometry measurements is better in persons who do not have osteoporosis than in those with low bone density or vertebral fractures [33, 34]. In the current study, we used QCT lateral scout views from a mixed sample of normal and osteoporotic subjects and found reliability for vertebral height measurements to be comparable to prior reports [34, 35]. Intra- and inter-reader reliability of vertebral height ratios from the current study was slightly worse than that reported for lateral radiographs but comparable to that reported for lateral dual-energy X-ray absorptiometry scans [34]. Prior reports did not systematically evaluate whether reproducibility of vertebral height measurements depends on spinal region; however, at least one study [33] reported results similar to ours, with slightly worse reproducibility of height measurements in the upper thoracic regions than elsewhere. The poorer reproducibility is likely due to the difficulty of visualizing vertebrae in the upper thoracic region.

In comparing our results for reliability for vertebral fracture identification to those previously reported by Samelson et al. [22] for vertebral fracture assessment by standard SQ readings, we found that despite the use of different techniques for fracture identification (i.e., quantitative morphometry by non-radiologists vs. visual semi-quantitative scoring by trained radiologists), the two studies had comparable inter- and intra-reader reliability for vertebral fracture assessment. The distribution of vertebral fractures was also similar for the two approaches, with bimodal peaks at the mid-thoracic and thoracolumbar levels. However, vertebral fracture assessment solely by quantitative morphometry identified nearly twice as many fractures (SQ ≥ 1) as the SQ method. The quantitative morphometry measurements may have been more sensitive than SQ reading for mild fractures. However, it is more likely that the increased number of fractures identified by purely quantitative morphometry is because the deformity-based classification scheme does not exclude non-fracture deformities of congenital, developmental, and degenerative origins. Ultimately, it may be best to combine QM methods with visual SQ grading to optimize sensitivity and accuracy of fracture determination. Alternatively, it may be possible to combine the morphometry data with ‘machine learning’ algorithms to develop computer-aided diagnosis for vertebral fractures, as proposed by Roberts et al. [36]. Altogether, the availability of semi-automated, reproducible, accurate vertebral morphometry data provides a great opportunity for future development of novel algorithms to improve detection of vertebral fractures.

Our study has a few limitations that must be considered when interpreting the results. Our reproducibility data are based on repeat analyses of the same scans, rather than analysis of duplicate scan acquisitions. Our vertebral morphometry measures and fracture assessments were based on lateral CT scout views, which are currently not used in routine practice for assessment of vertebral fractures. However, QCT is being used more commonly in general and in osteoporosis research studies. CT scans of the trunk that are acquired for other clinical reasons may be useful for assessment of vertebral fractures. In particular, midline sagittal reformations of CT scan data are highly useful for identifying vertebral fractures [37, 38], and indeed, our own fracture detection would likely have been enhanced if we had used this approach. In addition, vertebral fracture identification by semi-automated morphometry has not yet been directly compared to traditional visual semi-quantitative methods. These data are needed to define the optimal clinical use for the automated morphometry measurements.

The current study also had a number of strengths. In particular, non-radiologist readers performed the semi-automated vertebral morphometry measurements, thereby establishing the practical utility of semi-automated vertebral morphometry by technical staff and/or physicians who may not have advanced training in radiology. Also, we assessed both intra- and inter-reader reliability of semi-automated vertebral morphometry using a sample enriched with subjects that had fractures, thereby providing reliability estimates in the population in which the technique would be expected to be used clinically (i.e., those suspected of vertebral fracture). However, the subjects selected for this study may not represent the general population, as there were more subjects with vertebral fracture than would be expected for an age- and sex-matched sample of the general population.

In summary, underreporting of vertebral fractures is common worldwide [12, 13]. This is problematic, since both clinical and morphometric fractures are associated with significant morbidity and are strong risk factors for future fracture. Moreover, numerous treatments that reduce future fracture risk, even in subjects with prevalent fractures, are available. Thus, new methods that facilitate and enhance the detection of vertebral fractures may improve clinical management of patients with osteoporosis. The results from this study suggest that semi-automated vertebral morphometry is a feasible, reliable, and quick method that may complement current methods for identifying vertebral fractures and also promote development of novel automated algorithms to enhance vertebral fracture identification.