Introduction

Elderly outpatients who complain of severe low back pain almost always undergo plain X-ray (X-P), from which the underlying condition is expected to be diagnosed. In such cases, pre-existing vertebral disk degeneration or osteoporosis can often conceal a latent incident spinal fracture, and the resulting delay in diagnosis can make it difficult to prevent post-fracture sequelae or other problems [1]. Moreover, it has been reported that the presence or location of an incident fracture is difficult to diagnose accurately from X-P images alone and that X-P screening is not effective for low back pain [2, 3]. Meanwhile, many reports have stated that magnetic resonance imaging (MRI) is highly accurate for the definitive diagnosis of incident spinal fracture, and it continues to be used as the more useful tool [4–6]. However, because of limitations in equipment and the need to consider the economics of medical treatment, MRI cannot be used with all patients. In the present cross-sectional study, therefore, with the MRI diagnosis taken to be the correct diagnosis, we conducted a multifaceted analysis of the diagnostic accuracy of several orthopedic surgeons and radiologists who based their diagnoses on X-P images, in order to identify ways to improve diagnostic accuracy.

Materials & methods

Participants

The subjects were patients aged 50 years or older who were examined at the authors’ hospitals between May 1999 and January 2004 and who had undergone MRI within 4 weeks of the initial examination. The non-incident fracture group consisted of patients without incident vertebral fractures, while the incident fracture group consisted of patients with incident vertebral fragility fractures caused by weak external force, such as that sustained in a fall from a standing position. One hundred twenty-three patients met these criteria. After the exclusion of three patients with a history of primary or metastatic bone tumor, infectious disease, hematological disorder, or compression fracture within the previous year, any of which would leave spots of high signal intensity on MRI images, the final number of subjects was 120, of whom 112 were women and eight were men, with ages ranging from 50 to 96 years (mean age: 75.6 years).

Measurements

Five orthopedists and two radiologists from our hospital interpreted anteroposterior (A-P) and lateral thoracolumbar X-Ps taken during the initial examination. They did not question the patients or have access to physiological findings, and the images were arranged by a third party with the patients’ IDs and names concealed. The correct diagnoses were taken to be those of two radiologists not involved in the treatment of the patients who, in consultation with each other, reached the same conclusion based on MRI [1.5T, T1-weighted images (SE: TR/TE = 400/15 ms); T2-weighted images (SE: TR/TE = 2500/120 ms)]. In this study, the MRI-based definition of an incident fracture also included a bone bruise without deformity. Differences in the ability of the five orthopedists to interpret spinal X-P images were investigated in advance. The subjects of this preliminary investigation were 89 healthy community residents who underwent thoracolumbar spine X-P as part of a long-term longitudinal epidemiological study at our hospital. Each orthopedist classified the vertebral spines (Nathan’s classification [7]) on an A-P thoracolumbar image, after which intraclass correlation coefficients were calculated using SAS (Statistical Analysis Software, Cary N.C.) ver. 8.2, and the level of agreement was assessed. The results revealed no significant difference in the ability of the orthopedists to interpret radiographs, with an intraclass correlation coefficient (ICC) of 0.739 [95% CI for ICC: 0.679–0.799]. Accordingly, assuming that there was no difference in the ability to interpret spinal X-P images, we analyzed the correct diagnosis rate for the presence and location of incident spinal fractures and the correct diagnosis rate according to the morphological classifications (classifications of Genant et al. [8] and Yoshida [9]) of the vertebral body with the incident fracture. For the analysis of factors affecting correct diagnosis, the subjects were divided into three groups: (1) non-incident fracture group with and without prevalent fractures (non-incident fracture group); (2) incident fracture group without prevalent fractures; (3) incident fracture group with prevalent fractures. Bone mineral density (BMD) was measured using dual energy X-ray absorptiometry (DPX; Lunar, GE Healthcare, UK) in bones of the entire body, the lumbar vertebrae, and the femoral neck. The density for the lumbar vertebrae (L2–4) was adopted for the present study.

Statistical analysis

SAS ver. 8.2 was used for the accumulation and analysis of data. In comparing the correct diagnosis rate for fractured vertebral body morphology, adjustments were made using the Cochran-Mantel-Haenszel method for variations in age, body weight, lumbar spine bone mineral density, and examiner ability, and analysis was conducted with ANOVA, Tukey’s multiple comparison test, and logistic regression analysis.

Results

Number of patients and fractured vertebrae

Of the 120 patients, 67 were diagnosed with incident fractures (with or without prevalent fractures) in a total of 95 vertebrae, including single incident fractures in 50 patients and two or more incident fractures in 17 patients. The remaining 53 patients had no incident fracture, with or without prevalent fractures. The incident fracture group without prevalent fractures consisted of 24 patients and 28 vertebrae, and the incident fracture group with prevalent fractures consisted of 43 patients and 67 vertebrae. Significant differences between the groups were seen in age, height, weight, and lumbar vertebrae BMD (p<0.0001) (Table 1).

Table 1 Baseline data (means ± standard deviation)

A breakdown of correct and incorrect diagnoses

A correct diagnosis was made in 51.5% of cases overall. A breakdown shows that when no incident fracture was seen on MRI, the correct response of non-incident fracture (true negative) was made in 37.7% of cases, while the correct diagnosis of incident fracture (true positive) was made in 13.8% of cases. The location of the incident fracture was mistaken in 17.2% of the cases. Responses of non-incident fracture on X-P images in cases with incident fracture (false negative) occurred in 24.8% of the cases, while responses of incident fracture on X-P images in cases of non-incident fracture (false positive) occurred in 6.5% of the cases (Table 2).
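As a consistency check, the five outcome categories above can be recombined into the overall rate and into a rate within the cases that showed no incident fracture on MRI. The following Python sketch is illustrative only and is not part of the original analysis; the variable names are hypothetical, and the rates are simply recomputed from the rounded percentages reported in the text.

```python
# Outcome percentages as reported in the text (share of all readings).
outcomes = {
    "true_negative": 37.7,   # no incident fracture on MRI, none reported on X-P
    "true_positive": 13.8,   # incident fracture correctly identified on X-P
    "wrong_location": 17.2,  # incident fracture reported at the wrong level
    "false_negative": 24.8,  # incident fracture on MRI, none reported on X-P
    "false_positive": 6.5,   # no incident fracture on MRI, one reported on X-P
}

# The five categories should account for every reading.
total = round(sum(outcomes.values()), 1)

# Overall correct diagnosis rate: true negatives plus true positives.
overall_correct = round(outcomes["true_negative"] + outcomes["true_positive"], 1)

# Correct-reading rate within cases without incident fracture on MRI.
no_fracture_share = outcomes["true_negative"] + outcomes["false_positive"]
correct_no_fracture = round(100 * outcomes["true_negative"] / no_fracture_share, 1)

print(total, overall_correct, correct_no_fracture)
```

Under these assumptions the categories sum to 100%, the overall rate reproduces the reported 51.5%, and the recomputed rate for cases without incident fracture comes to 85.3%.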

Table 2 A breakdown of correct and incorrect diagnoses

The overall rate of correct diagnosis

Non-incident fracture group

We next compared the correct diagnosis rate of the five examiners in each of the three groups. In the non-incident fracture group, the overall correct diagnosis rate was high, at 85.3% (73.6–92.5%), with no significant variation between the five examiners (p=0.486).

Incident fracture group without prevalent fractures

The overall correct diagnosis rate for the incident fracture group without prevalent fractures was 39.3% (21–58.3%), and significant variation was seen between the five examiners (p = 0.04).

Incident fracture group with prevalent fractures

The overall correct diagnosis rate in the incident fracture group with prevalent fractures was low, at 16.8% (9.3–21%), and no significant difference was seen between the five examiners (p=0.432).

Thus, the correct diagnosis rate decreased significantly in the order of the non-incident fracture group, the incident fracture group without prevalent fractures, and the incident fracture group with prevalent fractures. However, a second investigation after adjustment for differences in age, weight, and lumbar vertebrae BMD revealed significant variation between examiners in all three groups (Fig. 1).

Fig. 1

Diagnosis rate by the five examiners. a Significant variation was seen in the non-incident fracture group after adjustment. b Significant variation was seen in the incident fracture group without prevalent fractures. c Significant variation was seen in the incident fracture group with prevalent fractures after adjustment. d Average diagnosis rate. e Inter-examiner kappa scores, which indicated moderate agreement

Inter-examiner kappa scores

The median kappa score across all pairs of examiners was 0.65 [0.51 (min.) to 0.81 (max.)]. The median kappa score between orthopedists was 0.65 (0.51–0.72), while the kappa score between the two radiologists was 0.69. The median kappa score between orthopedists and radiologists was 0.63 (0.54–0.81) (Fig. 1).
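The text does not state which kappa statistic was used. Assuming Cohen's kappa computed for each pair of examiners and then summarized by its median and range, the calculation could be sketched in Python as follows. The examiner names and readings are hypothetical and serve only to make the example runnable; they are not study data.

```python
from itertools import combinations
from statistics import median


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    labels = set(ratings_a) | set(ratings_b)
    # Observed agreement: share of cases on which both raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical readings (1 = incident fracture reported, 0 = none reported).
readings = {
    "examiner_1": [1, 1, 0, 0, 1, 0, 1, 0],
    "examiner_2": [1, 0, 0, 0, 1, 0, 1, 0],
    "examiner_3": [1, 1, 0, 1, 1, 0, 0, 0],
}

# One kappa value per pair of examiners, then median and range.
pairwise = {
    (a, b): cohen_kappa(readings[a], readings[b])
    for a, b in combinations(readings, 2)
}
print(median(pairwise.values()), min(pairwise.values()), max(pairwise.values()))
```

Summarizing pairwise values by median and range mirrors how the scores are reported above; a multi-rater statistic such as Fleiss' kappa would be an alternative design, but the source gives no indication that it was used.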

The rate of correct diagnosis based on the number of prevalent fractures

The next variable investigated was the correct diagnosis rate by number of prevalent fractures in the incident fracture group with prevalent fractures. No correlation was found between the correct diagnosis rate and the number of complicating prevalent fractures when the subjects were divided into either six groups according to the number of prevalent fractures (one fracture to six or more fractures) or two groups (one fracture vs. two or more fractures) (p=0.139, 0.284, respectively; Fig. 2).

Fig. 2

No correlation was found between correct diagnosis rate and the number of complicating prevalent fractures. a Diagnosis rate when divided into one of six groups. b Diagnosis rate when divided into one of two groups

The rate of correct diagnosis by morphological classification

The primary osteoporosis diagnostic criteria

We then looked at the correct diagnosis rate for incident fractures by morphological classification of the vertebral body in the incident fracture groups with and without prevalent fractures. The morphological classifications used were the primary osteoporosis diagnostic criteria of Genant et al. [8] and Yoshida’s classification [9] (Fig. 3). Using the primary osteoporosis diagnostic criteria of Genant, the correct diagnosis rate was high for wedge-type fractures in the combined results for the incident fracture groups with and without prevalent fractures (fracture group) (p<0.0001). Similar results were obtained even after adjustment had been made for variation between the examiners. However, this significant difference disappeared after age, body weight, and lumbar BMD had been adjusted for. The same results were obtained in the incident fracture group with prevalent fractures, but in this case a significant difference was seen after correction in the incident fracture group without prevalent fractures (p=0.0455) (Table 3).

Fig. 3

The morphological classifications used were the primary osteoporosis diagnostic criteria of Genant [8] and Yoshida’s classification [9]. Yoshida’s classification applies to incident fractures and distinguishes four types: protruding type, in which the disrupted anterior bone cortex protrudes anteriorly; indented type, in which the disrupted anterior bone cortex indents posteriorly; end plate slippage type, in which the anterior edge of the disrupted end plate is displaced anteriorly; and end plate compression type, in which the center of the disrupted end plate is indented and depressed

Table 3 Diagnosis rate according to the morphological classifications

Yoshida’s classification

When Yoshida’s classification was applied, the correct diagnosis rate was high for indented and protruding types of fractures (p<0.0001). The correct diagnosis rate was significantly higher in the incident fracture group without prevalent fractures when there were morphological changes (wedge, indented, and protruding types) in the anterior bone cortex. Conversely, the correct diagnosis rate was low in the incident fracture group with prevalent fractures, in end plate compression and slippage type fractures with no morphological changes in the anterior bone cortex, and in “miscellaneous” cases that belonged to no category and had almost no morphological change.

Odds ratios of factors affecting the rate of correct diagnosis

Odds ratios (ORs) were investigated for factors that would affect the correct diagnosis rate, including age, body weight, lumbar vertebrae BMD, and examiner ability. In an overall investigation, age (OR=0.660), body weight (OR=2.082), and examiner ability (p=0.0205) affected the correct diagnosis rate. A younger age and greater body weight resulted in higher correct diagnosis rates, and results were also affected by the examiner’s ability. None of these factors had an effect in the non-incident fracture group. Significant variation is seen in examiner’s ability in Fig. 1, but not to the extent that results were affected (p=0.0709). In the fracture groups, both body weight (OR=2.206) and examiner ability (p=0.0039) affected the results. This was also seen in the incident fracture group without prevalent fractures alone, but in the incident fracture group with prevalent fractures alone only lumbar BMD had an effect (OR=1.574) (Table 4).
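To make the reported odds ratios easier to read, the short Python sketch below converts an odds ratio into the corresponding logistic regression coefficient and shows how it would shift a baseline probability of correct diagnosis. The baseline probability of 0.30 and the function names are hypothetical, and the source does not state the unit of each predictor (for example, per year or per category), so the printed values are illustrative and are not results of the study.

```python
import math


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must lie strictly between 0 and 1")
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient (log-odds change) for an odds ratio."""
    return math.log(odds_ratio)


baseline = 0.30  # hypothetical baseline probability of a correct diagnosis
for name, odds_ratio in [("age", 0.660), ("body weight", 2.082)]:
    print(
        name,
        round(coefficient_from_odds_ratio(odds_ratio), 3),
        round(apply_odds_ratio(baseline, odds_ratio), 3),
    )
```

An odds ratio below 1, as reported for age, corresponds to a negative coefficient and lowers the probability of a correct diagnosis, whereas an odds ratio above 1, as reported for body weight, raises it.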

Table 4 Odds ratios of factors that would affect the correct diagnosis rate

Discussion

The prevalence rate of spinal fracture is thought to be 117 people per 100,000 in the population [10], and the lifetime risk of spinal fracture in women over the age of 50 rises to about 40% [11]. Vertebral body fractures result in pain and functional restrictions, and provoke a marked decrease in quality of life [12, 13]. Therefore, early prevention of spinal fractures and accurate diagnosis and treatment are crucial. There are various reports on the diagnosis of incident spinal fracture [14], but a diagnostic gold standard has yet to be established. Nearly all institutions first take X-P images for patients presenting with lumbar pain. However, it is difficult to determine from X-P images taken at the time of injury the presence and location of incident fragility fractures in elderly patients with osteoporosis; it is even more difficult when the patient has prevalent fractures. Furthermore, incident fractures are defined as those vertebral bodies that show distinct morphologic changes or osteosclerotic change on follow-up X-P images. Consequently, incident fractures usually cannot be detected at the early stage of diagnosis.

With respect to the effectiveness of X-Ps for lumbar pain disease in general, David et al. reported that 17.8% of patients in an emergency department received unnecessary lumbar X-Ps [15], while Khoo et al. reported that 90.5% of AP views on X-Ps had no benefit and were effective only in assessing the sacroiliac joint [16]. Thus, establishing a diagnosis for lumbar pain is difficult with X-P alone, and most cases require MRI. Many reports attest to the high diagnostic accuracy of MRI, and it continues to be the more useful tool in diagnosing spinal fracture [4–6]. On MRI, an acute fracture associated with hemorrhage and edema increases the focal water content and thus increases the signal on T2-weighted images. With an osteoporotic fracture, the hemorrhage will be organized and the edema will decrease, giving a low to intermediate signal intensity on T2-weighted images. It has already been reported that some femoral neck fractures cannot be judged on X-P images and that MRI diagnosis is useful in cases of occult fracture. Pandey et al. reported that fractures are not discovered on X-P images and that even on MRI images, 30% show no fracture [17], while Rizzo et al. reported that occult fractures were detected on MRI in 36 of 62 patients (58%) [18].

With respect to spinal disease as well, Nakano et al. investigated the diagnostic accuracy of MRI for incident vertebral fractures. They took vertebral bodies showing signs of crush and bone sclerosis on follow-up X-P images to indicate true incident fractures and reported that the diagnostic sensitivity and specificity of MRI were 99.0% and 98.7%, respectively [19, 20]. They also reported that based on diagnosis with MRI it was possible to diagnose with precision a fracture in the early period of onset. In addition, Kanchiku et al. reported that the diagnostic rate of the fractured vertebral body was 98% by MRI, which was higher than the 87% for plain radiography (p=0.006); in patients for whom no posterior wall injury was seen on X-P imaging at the time of the injury, intraspinal protrusion of the posterior wall of the vertebral body was diagnosed in 37% using MRI [21]. Eugene et al. reported that twice as many spinal diseases were detected when using MRI as when diagnosis was made from X-P imaging [2]. Thus, MRI is considered to be reliable in the diagnosis of incident fragility fracture. However, this high diagnostic accuracy also gives rise to some problems. Rupp et al. reported that in distinguishing between tumor and compression fracture on MRI images, compression fracture can only be diagnosed in those patients that have completely maintained normal marrow within the vertebrae and that it is difficult to make a distinction, due to changes in contrast effect and intensity, over multiple vertebrae or invasion to the posterior vertebral body wall [22]. In addition, Cuenod et al. reported that at 2 months after a spinal fracture is sustained, changes in brightness on MRI images have completely returned to normal in only 13% of the cases [23], indicating the possibility that old fractures can be mistaken for incident fractures. Equipment limitations at some institutions and economic problems make it impossible to conduct MRI with all patients. Jefferey et al. compared MRI in the acute phase of lumbar pain with X-P over the clinical course and concluded that no cost benefit was achieved [24]. Thus, several problems are also encountered with the use of MRI in diagnosis.

Based on all of the points raised above, we re-examined X-P diagnosis and investigated whether the correct diagnosis rate with X-P in the initial examination could be improved. To our knowledge, this type of comparison has not been carried out to date; however, a search of the literature revealed that various data sets are available on diagnosis rates for incident fractures with X-P. In a comparison of local and central readings, Pierre et al. reported a correct diagnosis rate of 95% in the non-fracture group and 66% in the fracture group [25]. Hachiya et al. reported a correct diagnosis rate of 43%, false positives in 41% of the cases, and false negatives in 16% [26]. Nakano et al. reported a correct diagnosis rate of 51.5% [27], while Kanchiku et al. reported a high correct diagnosis rate of 87% [21]. However, factors such as unspecified measurement conditions, a small number of examiners, or non-uniform skill levels of examiners in these studies make them inadequate for the establishment of a correct diagnosis rate.

In the present study, a strict reference diagnosis was established by radiologists, the ability of five orthopedists to interpret X-Ps was determined in advance to be uniform, and three groups were compared. The results of this analysis showed the correct diagnosis rate to be 51.5%, which did not differ greatly from the reports of previous investigators. However, the mean correct diagnosis rate for the incident vertebral fracture group was 24.8%, and it was even lower (16.8%) in the group with prevalent fractures. The correct diagnosis rate decreased in the order of the non-incident fracture group (highest), the incident fracture group without prevalent fractures, and the incident fracture group with prevalent fractures (lowest), a result which demonstrates anew the difficulty of diagnosing the location of fractures in the daily clinical setting. Moreover, after correcting for various factors, we found significant inter-examiner variation in all groups. This seems to indicate that the ability of an examiner to interpret radiographs is reflected in the correct diagnosis rate. In an examination based on the number of prevalent fractures, the correct diagnosis rate did not drop as the number of prevalent fractures increased, and no correlation was found. This finding that the number of prevalent fractures does not exert an effect is intriguing. It suggests that, even with prevalent fractures over multiple vertebrae, incident fractures can be detected with diligence.

The previously mentioned criteria of Genant et al. were used in the analysis by morphological classification [8]. These criteria are commonly used in the diagnosis of osteoporotic vertebral body fractures. However, 45.5% of the cases in our study did not fit any type in these classifications, bringing some doubt to the judgments that have been made to date. We therefore conducted the investigation using these criteria in conjunction with Yoshida’s classifications [9]. A high correct diagnosis rate was obtained for wedge type fractures with the diagnostic criteria for primary osteoporosis, and for protruding and indented type fractures with Yoshida’s criteria; however, the correct diagnosis rate was low with the remaining types of fractures. Thus, a key to raising the correct diagnosis rate for incident fragility fractures may be to focus sufficient attention on morphological changes in the anterior bone cortex when diagnosing from X-P images.

In this investigation of factors influencing the correct diagnosis rate of osteoporotic vertebral body fractures, we found age, body weight, and examiner ability had an overall effect. The negative correlation seen with age, in which the correct diagnosis rate decreased as age increased, and the decrease in the correct diagnosis rate with lower body weight are understandable, but the finding that BMD did not exert an effect was intriguing. Moreover, the finding that the ability of the examiner to interpret radiographs was reflected in the correct diagnosis rate indicates the importance of continuing efforts to improve ability.

Several limitations remain to be addressed in future studies: the present study was retrospective, and the readings were made without questioning the patients or referring to pathological findings. Based on the results presented here, an investigation of how repeat readings change the correct diagnosis rate should also be made. In any case, the finding that the correct diagnosis rate was low, even when the readings were made by orthopedists experienced in reading radiographs, should be taken into consideration in the routine diagnosis of incident spinal fragility fractures with X-Ps only, and may be important in identifying keys for the development of new diagnostic criteria and more accurate diagnoses. The present study indicates the importance not only of improving the ability of examiners to interpret radiographs but also of paying attention to morphological changes in the anterior bone cortex during examinations.