Introduction

Vertebral fractures occur in approximately 20% of postmenopausal women [1], but two-thirds of them do not come to clinical attention [2] and are usually considered asymptomatic. However, the risk of new vertebral fractures in women with one prevalent fracture is twice that of women without prevalent fractures, and for women with three or four prevalent fractures the risk is almost six times higher [3]. Prevalent vertebral fractures are also a strong risk factor for subsequent peripheral fractures, including hip fracture [4]. Height loss, kyphosis, chronic back pain and back-related functional disability are the usual consequences of vertebral fractures [5]. Social isolation and depression have also been reported in patients with vertebral fractures [6]. Altogether, the consequences of vertebral fractures have an important impact on patients’ health-related quality of life [7, 8]: the greater the number and severity of fractures, the worse the quality of life [7, 8]. Recent fractures have more impact than older fractures and lead to substantial health care utilization [9]. Moreover, patients with multiple fractures or clinical vertebral fractures are at increased risk of mortality [10, 11]. The number of elderly people at risk for osteoporosis is expected to increase dramatically in the coming decades. Taken together, these data indicate that accurate identification of vertebral fractures and appropriate treatment are needed to reduce the impact of this disease on patients and the health care system.

Identification of vertebral fractures may seem straightforward. However, in a clinical trial in which patients were to be included at seven centers on the basis of the presence of a vertebral fracture, 25% of patients were requalified as having no fracture by central reading [12]. In a clinical setting, vertebral fracture was recorded as a discharge diagnosis in only 1 out of 12 hospitalized older women who had radiographic evidence of a fracture [13]. Chest radiographs are a potential tool for revealing thoracic vertebral fractures, yet only half of moderate to severe fractures were mentioned in discharge reports in the emergency department of a tertiary care hospital [14].

The objective of this study was to assess the accuracy of the spinal radiographic diagnosis of vertebral fractures in a routine rheumatology outpatient clinic setting. We compared the results of local interpretation by a rheumatologist with those of a subsequent central reading.

Patients and methods

Study participants

Study subjects were ambulatory postmenopausal women aged 60 to 80 years visiting a rheumatologist. The patients had clinical symptoms (thoracic and/or lumbar spine pain, kyphosis, height loss, etc.) that, according to the rheumatologist, were potential signs of vertebral fracture. The patients were not receiving any anti-osteoporotic treatment at the start of the trial and gave informed consent.

Evaluation of vertebral fractures

Each investigator was asked to prescribe spine X-rays according to standardized procedures for image acquisition, including imaging screen technique, film size, exposure time, kilovolt peak, collimation of the X-ray beam, patient positioning, focus-film distance (100 cm) and patients’ breathing techniques. Three lateral X-rays (thoracic and lumbar radiographs and an image of the thoraco-lumbar junction) and antero-posterior radiographs of the spine were obtained. At each site, the rheumatologist was instructed to evaluate each radiograph for the presence of vertebral fracture from T4 to L5. For each vertebra, a binary assessment (fracture yes/no) was used. The spinal radiographs of patients with at least one fracture according to this evaluation were then selected and sent to a single central reading facility (CEMO, Cochin Hospital, Paris) for confirmation of the quality of the X-rays and evaluation for vertebral fracture by a single rheumatologist trained in the use of the semi-quantitative method of vertebral fracture assessment described by Genant [15]. Criteria for good image quality included superimposition of vertebral endplates, blurred rib contours and appropriate exposure enabling clear visibility of vertebral contours along the entire spine. Vertebral deformities unrelated to fracture, such as those associated with Scheuermann’s disease and osteoarthritis, were excluded from the grading. A semi-quantitative visual assessment of each vertebra from T4 to L4 was performed as follows: grade 0, normal; grade 1, a decrease of 20 to 25% in the height of any vertebra; grade 2, a decrease of 25 to 40%; grade 3, a decrease of 40% or more [15]. For the L5 vertebra, a binary analysis was performed.
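The grading thresholds above can be written as a small lookup, shown here only as an illustrative sketch. In the study the assessment was visual rather than measured, so the numeric input is an assumption made for illustration. The function names `genant_grade` and `is_fractured` are hypothetical, and assigning a loss of exactly 25% to grade 2 is a choice made here because the text leaves that boundary open.

```python
def genant_grade(height_loss_pct: float) -> int:
    """Map a percentage loss of vertebral height to a semi-quantitative grade.

    Thresholds follow the text: grade 1 = 20 to 25%, grade 2 = 25 to 40%,
    grade 3 = 40% or more. Boundary handling (exactly 25% -> grade 2) is an
    assumption of this sketch, not something the source specifies.
    """
    if height_loss_pct < 0:
        raise ValueError("height loss cannot be negative")
    if height_loss_pct < 20:
        return 0  # normal
    if height_loss_pct < 25:
        return 1  # mild
    if height_loss_pct < 40:
        return 2  # moderate
    return 3      # severe


def is_fractured(grade: int) -> bool:
    """A vertebra counts as fractured when its grade is 1 or higher."""
    return grade >= 1


if __name__ == "__main__":
    for loss in (10, 22, 30, 45):
        grade = genant_grade(loss)
        print(loss, grade, is_fractured(grade))
```

The half-open intervals keep every percentage in exactly one grade, which avoids double counting at the 25% and 40% cut-offs.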

Statistical methods

Baseline characteristics are expressed as the mean ± standard deviation for quantitative variables and as absolute and relative (%) frequencies for qualitative variables. The concordance of diagnosis was assessed at two levels. At the vertebral level, a vertebra was qualified as fractured by the semi-quantitative assessment if it had a grade ≥1. The diagnostic concordance between the investigator and the central reader was evaluated using kappa scores (±95% confidence interval) for each vertebral level. At the patient level, a patient was qualified as fractured if she had at least one vertebral fracture. For central reading, the diagnosis was considered not possible when at least one vertebra was illegible and all remaining vertebrae were non-fractured.
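The kappa score with its 95% confidence interval can be sketched as follows for one vertebral level. This is a minimal illustration that assumes binary readings (fracture yes/no) from both readers and a simple large-sample standard error. The source does not state which variance formula or software was used, and the name `cohen_kappa` and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cohen_kappa(local: Sequence[bool],
                central: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (kappa, lower, upper) for two binary readings of the same vertebrae.

    The 95% interval uses the approximation
    SE = sqrt(p_o * (1 - p_o) / (n * (1 - p_e)^2)), an assumption of this sketch.
    """
    if len(local) != len(central) or len(local) == 0:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(local)
    # Observed agreement: both readers say yes, or both say no.
    agree = sum(1 for l, c in zip(local, central) if l == c)
    p_o = agree / n
    # Agreement expected by chance from each reader's marginal rate.
    p_local = sum(1 for l in local if l) / n
    p_central = sum(1 for c in central if c) / n
    p_e = p_local * p_central + (1 - p_local) * (1 - p_central)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - 1.96 * se, kappa + 1.96 * se


if __name__ == "__main__":
    # Made-up readings for ten vertebrae, for illustration only.
    local_reading = [True, True, True, True, False, False, False, False, False, False]
    central_reading = [True, True, True, False, True, False, False, False, False, False]
    print(cohen_kappa(local_reading, central_reading))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most vertebrae are non-fractured and two readers would often agree on "no fracture" even at random.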

Results

The study was conducted with 294 rheumatologists, who were asked to interpret the X-rays as in usual care, without particular training. Among the 824 patients included, 629 were considered by the investigators as having at least one osteoporotic vertebral fracture. The diagnosis of fracture by central reading using the semi-quantitative assessment could be made in only 588 of these patients (93.5%). In the remaining 41 patients, the diagnosis was not possible because no fracture was observed but at least one vertebra was illegible. The baseline characteristics of the 629 patients are summarized in Table 1.
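The patient-level rule that produces the "diagnosis not possible" category can be sketched as below. This is an illustration under stated assumptions: an illegible vertebra is represented as `None`, every legible vertebra carries a grade (with the binary L5 reading coded as 0 or 1), and the function name `patient_diagnosis` is hypothetical.

```python
from typing import Optional, Sequence


def patient_diagnosis(grades: Sequence[Optional[int]]) -> str:
    """Classify one patient from the central per-vertebra grades (T4 to L5).

    None marks an illegible vertebra (a representation assumed for this sketch).
    """
    # One legible fracture is enough, even if other vertebrae are illegible.
    if any(g is not None and g >= 1 for g in grades):
        return "fractured"
    # No fracture seen, but at least one vertebra could not be read.
    if any(g is None for g in grades):
        return "not possible"
    return "not fractured"


if __name__ == "__main__":
    print(patient_diagnosis([0, 0, 2, None]))  # fractured
    print(patient_diagnosis([0, 0, 0, None]))  # not possible
    print(patient_diagnosis([0, 0, 0, 0]))     # not fractured
```

Checking for a fracture first reflects the rule that an illegible vertebra only blocks the diagnosis when nothing else is fractured.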

Table 1 Characteristics of the patients

Concordance of diagnosis at the vertebral level

In order to evaluate the concordance of diagnosis at the vertebral level between local and central interpretations, we first considered all the vertebrae (from T4 to L5) of the 629 patients. Among them, 7,878 vertebrae could be evaluated with the semi-quantitative assessment. They are the basis of this analysis.

The kappa scores varied from 0.20 (0.03–0.37) to 0.77 (0.72–0.82). They increased from T4 downward and exceeded 0.6 for the nine vertebrae from T8 to L4 (Fig. 1).

Fig. 1 Diagnosis agreement on vertebral fracture as a function of the vertebral level, according to kappa score

According to the centralized analysis, 1,536 vertebrae were fractured. Among them, 396 were considered as non-fractured by the investigator (a false-negative rate of 25.8% at the vertebral level). Among these 396 false-negative fractures, 268 (67.7%) were grade 1, 74 (18.7%) were grade 2 and 54 (13.6%) were grade 3. Sixty-two errors (15.7%) were related to an obvious numbering discrepancy (25.8% grade 1, 38.7% grade 2 and 35.5% grade 3). The T12 and L1 vertebrae of the thoraco-lumbar junction accounted for 17.7 and 14.5%, respectively, of these discrepancies. When numbering discrepancies were removed, the false-negative rate for fractured vertebrae was still 21.7%. The thoracolumbar distribution of all detected vertebral fractures is shown in Fig. 2.
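The percentages in this paragraph follow directly from the reported counts, as the short check below shows. It uses only figures given in the text; the variable names are illustrative.

```python
# Counts reported in the text (central reading taken as the reference).
fractured_central = 1536                   # vertebrae fractured per central reading
missed = 396                               # of those, read as non-fractured locally
missed_by_grade = {1: 268, 2: 74, 3: 54}
numbering_errors = 62                      # misses attributed to numbering

# False-negative rate, with and without the numbering discrepancies.
fnr = missed / fractured_central
fnr_without_numbering = (missed - numbering_errors) / fractured_central

# Share of each grade among the missed fractures.
grade_share = {grade: count / missed for grade, count in missed_by_grade.items()}

print(f"False-negative rate: {fnr:.1%}")                               # 25.8%
print(f"Excluding numbering errors: {fnr_without_numbering:.1%}")      # 21.7%
for grade, share in grade_share.items():
    print(f"Grade {grade}: {share:.1%}")                               # 67.7%, 18.7%, 13.6%
```

The three grade counts sum to 396, so the grade shares account for all of the missed fractures.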

Fig. 2 Distribution of the number of fractures and false negative rate (FNR) by vertebral level

Most fractures occurred in the lower thoracic or upper lumbar spine, but the fractures most often under-diagnosed were in the upper thoracic spine. The proportion of under-diagnosed fractures ranged from 20.4% (T12) to 80.8% (T4) in the thoracic spine and from 14.8% (L1) to 25% (L3) in the lumbar spine.

Among the 6,342 non-fractured vertebrae according to the centralized analysis, 397 were considered as fractured by the investigator (a false-positive rate of 6.3% at the vertebral level). The majority of these discrepancies were located between T4 and T6.
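Combining the counts for fractured and non-fractured vertebrae gives the full two-by-two table of local versus central readings. The sketch below rebuilds it from the reported figures, taking the central reading as the reference. The sensitivity, specificity and overall agreement it prints are derived here for illustration and are not values reported in the study.

```python
# Reported counts, with the central reading as the reference standard.
fractured_central = 1536
false_negatives = 396        # fractured centrally, non-fractured locally
non_fractured_central = 6342
false_positives = 397        # non-fractured centrally, fractured locally

# The remaining cells follow by subtraction.
true_positives = fractured_central - false_negatives
true_negatives = non_fractured_central - false_positives
total = true_positives + false_negatives + false_positives + true_negatives

false_positive_rate = false_positives / non_fractured_central
sensitivity = true_positives / fractured_central
specificity = true_negatives / non_fractured_central
overall_agreement = (true_positives + true_negatives) / total

print(f"Evaluable vertebrae: {total}")
print(f"False-positive rate: {false_positive_rate:.1%}")
print(f"Sensitivity (derived): {sensitivity:.1%}")
print(f"Specificity (derived): {specificity:.1%}")
print(f"Overall agreement (derived): {overall_agreement:.1%}")
```

The total of the four cells matches the 7,878 evaluable vertebrae. Because only patients with at least one locally reported fracture were sent for central reading, these derived figures describe this selected sample rather than routine practice as a whole.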

Concordance of diagnosis at the patient level

Among the 588 patients who could be evaluated with the semi-quantitative assessment, 40 were requalified by the central reader as having no fracture, i.e., a discrepancy rate of 6.8% between the two assessments at the patient level. Two hundred thirty-five (40.0%) of the 588 patients had at least one false-negative vertebra, and 205 (34.9%) had at least one false-positive vertebra.

Discussion

This study shows that vertebral fractures were frequently under-diagnosed in the assessment of postmenopausal women with osteoporosis. The false-negative rate for fractured vertebrae was 25.8%, despite a standardized acquisition protocol aimed at avoiding inadequate film quality. Numbering mistakes contributed only slightly to the discrepancies between local and central readings. Moreover, because this false-negative rate was observed on X-rays originally read as showing at least one fracture, it may be an underestimate.

There are several possible explanations for this high rate of failure to identify vertebral fractures in the local assessments. Long-standing fractures may be considered clinically irrelevant. However, although an immediate (i.e., within 1 year) risk has been reported only for incident fractures [16], any vertebral fracture increases the risk of future vertebral and peripheral fractures [3, 4]. Because the presence of at least one vertebral fracture was required for study enrollment, local reviewers may have considered that the number of fractures was not relevant and may have stopped looking for fractures as soon as the first one was observed. This most likely explains why 128 (82 after removing numbering mistakes) grade 2 and 3 fractures were missed. One could argue that missing one fracture is not clinically relevant in patients with several fractures, since it will not change the physician’s therapeutic decision. However, studies of quality of life (QOL) in patients with vertebral fractures using a specific questionnaire for osteoporosis (QUALEFFO) have shown that the decrease in QOL is a function of both the severity and the number of vertebral fractures [7, 8]. These two parameters are therefore important in a clinical evaluation. Moreover, in the Study of Osteoporotic Fractures, there was an increased risk of mortality in patients with vertebral fractures, and mortality rose with the number of fractures [17]. One possible explanation for this observation is the relationship between spine deformity, kyphosis, pulmonary restriction and death [17].

We fully recognize that there is no “gold standard” for the definition of vertebral fractures. In particular, grade 1 fractures, which represent 67.7% of the false-negative fractures, have been a subject of controversy. In a prospective study in postmenopausal women receiving calcium and vitamin D, those with grade 1 fractures had a 3-year incidence of vertebral fractures of 10.5%, which is 2.4 times that of patients without fracture at baseline [18]. These data emphasize the need for accuracy in the radiographic identification of mild fractures. In our study, most of the missed fractures were located in the thoracic spine. These fractures are so common that some physicians may consider them an expected effect of aging. Moreover, degenerative changes are common in the mid-thoracic spine and may explain this high level of discrepancy.

As a consequence of the failure to diagnose vertebral fracture radiographically, many patients who require treatment to reduce fracture risk and maintain quality of life are not being properly identified [19]. Improving the accuracy of reporting of vertebral fractures on X-rays is important for the appropriate management of osteoporotic patients.