Abstract
Assessment of vertebral fractures is critically important for the diagnosis and treatment of osteoporosis. This study aimed to clarify the effectiveness of the semiquantitative (SQ) method in the assessment of vertebral fractures in Japanese clinical practice. Forty-four physicians (seven experts and 37 nonexperts) used the SQ method to assess the spinal radiographs of 40 patients participating in the Adequate Treatment of Osteoporosis (A-TOP) Japanese Osteoporosis Intervention Trial (JOINT)-02 at the baseline, 12 months, and 24 months. The proportion of cases diagnosed as fractured at each vertebral level was higher in the nonexpert group than in the expert group at each time point, and was especially high in the upper thoracic spine (T4–T6). The least squares mean spinal fracture index was significantly higher in the nonexpert group than in the expert group at all time points. The kappa statistics were also higher in the expert group than in the nonexpert group for all vertebral levels at all time points. Nonexpert physicians tended to overestimate vertebral fractures with the SQ method compared with the experts, with poor nonexpert interobserver reliability and well-matched expert interobserver reliability. Conscious efforts to avoid overestimation and to obtain higher reliability with the SQ method should be made to achieve more precise diagnosis and treatment of osteoporosis in Japanese clinical practice.
Introduction
Vertebral fractures are the most common osteoporotic fractures, and the assessment of vertebral fractures is widely used to diagnose osteoporosis or monitor disease progression. This assessment has also been used in many clinical trials as an end point to evaluate the efficacy of drugs for the treatment of osteoporosis [1–4]. Several methods of assessing vertebral fractures have been developed, and most are categorized as quantitative morphometry (QM) [5–9]. The morphometric approach compares the vertebral heights of osteoporotic patients with those of normal women, using measures such as the anterior–posterior ratio, middle–posterior ratio, and posterior–posterior adjacent ratio. The cutoff thresholds differ among methods, and no single measurement is considered the gold standard for vertebral fracture assessment. Recently, the semiquantitative (SQ) method has been used instead of QM to assess vertebral fractures in clinical practice and clinical trials. Genant et al. [10] devised the SQ method as a new way to assess vertebral fractures without measuring vertebral heights. In the SQ method, each vertebra is graded into one of four categories (normal, mild, moderate, and severe) on visual inspection. Excellent interobserver and intraobserver reliability (between experienced and inexperienced but trained observers) was found, with good agreement between the QM and SQ methods [10]. Wu et al. [11] found excellent interobserver agreement using the SQ method. Grados et al. [12] compared the SQ method with four morphometric methods for assessing prevalent vertebral fractures, and found good agreement as well. Crans et al. [13] showed that a spinal deformity index derived from the SQ assessment of vertebral fractures predicted future vertebral fracture risk and that the spinal deformity index or SQ method was clinically useful for the treatment of osteoporosis.
Despite these studies, little is known of exactly how the SQ method is used in clinical practice in Japan. Therefore, our aim was to clarify how the SQ method is used to assess vertebral fractures in clinical practice in Japan, by comparing expert physicians with nonexpert physicians.
Materials and methods
Materials
Lateral (thoracic and lumbar) spine radiographs of 40 osteoporotic patients were included in the present study. All of the radiographs originated from the Japanese Osteoporosis Intervention Trial (JOINT)-02, a nationwide trial conducted in Japan by the Adequate Treatment of Osteoporosis (A-TOP) research group to compare combination therapy (alendronate and alfacalcidol) with monotherapy (alendronate alone). The details of JOINT-02, including the study design, patient characteristics, inclusion and exclusion criteria, and end points, were previously reported [14, 15]. Baseline and follow-up (12 and 24 months) radiographs were converted into electronic data files (Digital Imaging and Communications in Medicine [DICOM] files). The use of the radiographs in this study was approved by the A-TOP executive and ethics committees.
SQ method and spinal fracture index
The SQ approach was developed by Genant et al. [10] in the 1990s as a new method to assess vertebral fractures on visual inspection without measuring vertebral heights. Each vertebra is graded into one of four categories as follows: normal (grade 0); mild deformity (grade 1, 20–25 % reduction in anterior, middle, and/or posterior height and 10–20 % reduction in area); moderate deformity (grade 2, 25–40 % reduction in any height and 20–40 % reduction in area); and severe deformity (grade 3, 40 % or greater reduction in any height and area). For each vertebra, grade 1 or higher was considered “fractured” and grade 0 was considered “not fractured.” The spinal fracture index (SFI) was calculated for each patient by dividing the sum of the individual vertebral grade scores by the number of vertebrae evaluated, which provided general information on the osteoporosis severity in an individual patient [10].
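The SFI definition above can be illustrated with a minimal sketch. The function names and the example grades are hypothetical and are not taken from the study data; the sketch only assumes that each evaluated vertebra carries an integer SQ grade from 0 to 3.

```python
from typing import Sequence


def spinal_fracture_index(grades: Sequence[int]) -> float:
    """Sum of SQ grades divided by the number of vertebrae evaluated."""
    if not grades:
        raise ValueError("at least one vertebra must be graded")
    if any(g not in (0, 1, 2, 3) for g in grades):
        raise ValueError("SQ grades must be 0, 1, 2, or 3")
    return sum(grades) / len(grades)


def is_fractured(grade: int) -> bool:
    """Grade 1 or higher counts as fractured; grade 0 as not fractured."""
    return grade >= 1


# Hypothetical grades for the 13 vertebrae from T4 to L4 of one patient.
example_grades = [0, 0, 0, 1, 0, 0, 0, 2, 3, 1, 0, 0, 0]

sfi = spinal_fracture_index(example_grades)
n_fractured = sum(is_fractured(g) for g in example_grades)
print(f"SFI = {sfi:.3f}, fractured vertebrae = {n_fractured}")
```

In this illustrative case the grades sum to 7 over 13 vertebrae, so the SFI is 7/13, and four vertebrae are counted as fractured.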
Assessment of vertebral fractures using the SQ method
Seven expert physicians (expert group) and 37 nonexpert physicians (nonexpert group) independently assessed the vertebral deformity grade (T4–L4) of each patient on a personal computer using the SQ method. The expert group consisted of three orthopedists, three spinal surgeons, and one radiologist (average medical career, 29 years), all with experience assessing vertebral fractures in several drug trials or highly specialized experience assessing vertebral fractures in clinical practice. The nonexpert group consisted of 18 orthopedists, 14 internal medicine physicians, and 5 radiologists (average medical career, 16 years), none of whom used the SQ method to assess vertebral fractures in daily practice. Baseline and follow-up radiographs were assessed in chronological order for each patient. The physician assessment data were gathered and statistically analyzed as a single data set.
Statistical analysis
The frequency and proportion of each SQ grade per vertebral level and per visit were tabulated for the experts and nonexperts, and the proportions were compared using the Pearson chi-squared test. In addition, the proportion of radiographs that the experts and nonexperts assessed as fractured (grade 1 or higher) was examined per vertebral level at the baseline, 12 months, and 24 months. The SFI was calculated for each patient from each physician's assessment, and the mean value was calculated. With the SFI as the dependent variable, we fitted a mixed effects model that accounted for the correlation between the SFI values of the patients assessed by the same physician. The least squares mean SFI was estimated from the model, both adjusted and not adjusted for years of medical experience, to evaluate the difference between the experts and the nonexperts. To analyze interobserver reproducibility within each group, we calculated the kappa statistics per group per vertebral level (T4–L4). Because we were interested in the degree of agreement between more than two physicians, we used the extended kappa statistical method proposed by Fleiss [16]. Kappa values were interpreted as follows: 0–0.2, poor agreement; 0.2–0.4, fair agreement; 0.4–0.6, moderate agreement; 0.6–0.8, good agreement; 0.8–1.0, very good agreement [16].
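The multi-rater kappa of Fleiss can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names and the toy ratings are hypothetical, every subject is assumed to be graded by the same number of raters, and because the agreement bands above share their boundary values, the sketch assumes that each upper bound is inclusive and that negative values are labeled poor.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]],
                 categories: Sequence[int] = (0, 1, 2, 3)) -> float:
    """Fleiss' kappa; ratings[i][r] is the grade rater r gave to subject i."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every subject needs the same number of raters")
        if any(g not in categories for g in row):
            raise ValueError("rating outside the allowed categories")
    # counts[i][j]: number of raters who assigned category j to subject i.
    counts = [[sum(1 for g in row if g == c) for c in categories]
              for row in ratings]
    # Observed agreement, averaged over subjects.
    p_bar = sum(
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall category proportions.
    total = n_subjects * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2
              for j in range(len(categories)))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


def agreement_label(kappa: float) -> str:
    """Map kappa to a band; inclusive upper bounds are an assumption."""
    for upper, label in ((0.2, "poor"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "good")):
        if kappa <= upper:
            return label
    return "very good"


# Toy data: two subjects, three raters each (hypothetical).
perfect = fleiss_kappa([[0, 0, 0], [1, 1, 1]])
mixed = fleiss_kappa([[0, 0, 1], [1, 1, 0]])
print(perfect, agreement_label(perfect))
print(mixed, agreement_label(mixed))
```

For one vertebral level in a design like the one described here, `ratings` would hold one row per patient and one column per physician in the group.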
Results
Forty-four physicians (seven experts and 37 nonexperts) assessed 40 sets of spine radiographs at the baseline and during the follow-up period using the SQ method. Table 1 shows the proportions of SQ grade per spine level assessed by the expert and nonexpert groups. There was a significant difference in all spine levels and at all time points between the two groups. The proportion of grade 0 was lower for every spine level in the nonexpert group than in the expert group. Figure 1 shows the proportion of fractured cases per spine level that the expert and nonexpert physicians assessed at the baseline, 12 months, and 24 months. The proportion of fractured cases was high in the thoracolumbar spine (T11–L2) compared with other spine levels in both groups. In addition, the proportion per spine was higher in the nonexpert group than in the expert group at each time point, and was especially high in the upper thoracic spine (T4–T6).
The mean values of the SFI assessed per case by the experts and nonexperts were plotted at the baseline, 12 months, and 24 months (Fig. 2). The mean values were consistently higher in the nonexpert group than in the expert group for every time point.
We compared the SFI of the expert group with that of the nonexpert group at the baseline, 12 months, and 24 months using a mixed effects model, both adjusted and not adjusted for years of experience as a physician. The least squares mean SFI was significantly higher in the nonexpert group than in the expert group at all time points (P < 0.0001) (Fig. 3). The difference in the least squares mean SFI between the nonexpert group and the expert group remained almost constant regardless of adjustment, at 0.21 (not adjusted) and 0.19 (adjusted) at the baseline, 0.21 and 0.19 at 12 months, and 0.23 and 0.19 at 24 months, respectively.
Table 2 shows the interobserver kappa statistics in the expert and nonexpert groups for the SQ grade of vertebral deformity per vertebral level. The kappa statistics were higher in the expert group than in the nonexpert group for all vertebral levels at the baseline, 12 months, and 24 months. The expert group scores indicated moderate or good agreement, except for the T4 and L3 levels at the baseline and the T5 and T6 levels at 24 months, and were particularly high between T12 and L4. The kappa statistics for the nonexpert group indicated poor or fair agreement at the baseline, 12 months, and 24 months, and were particularly low between T4 and T6.
Discussion
We assessed the vertebral fractures of 40 patients using the SQ method at the baseline, 12 months, and 24 months, and evaluated the interobserver reproducibility and discrepancies between expert and nonexpert physicians. At all vertebral levels from T4 to L4, the proportion of fractured cases (grade 1 or higher) was higher in the nonexpert group than in the expert group at the baseline and in the follow-up period (Fig. 1), and the distribution of SQ grades differed significantly between the two groups at every vertebral level (Table 1). In addition, in all cases, the mean value of the SFI was higher in the nonexpert group than in the expert group at all visits (Fig. 2). The nonexpert group tended to overestimate the SQ grade of vertebral fractures, particularly in the thoracic spine, compared with the experts. Genant et al. [10] and Wu et al. [11] reported that the SQ assessment of vertebral fractures showed excellent intraobserver and interobserver agreement between experienced and inexperienced but trained physicians, and that the SQ method was a reproducible way to assess osteoporotic vertebral fractures. However, another report indicated that it was difficult to identify the subtle differences between a mild fracture (SQ grade 1) and a borderline deformity (grade 0.5), and that those assessments were sometimes arbitrary [17]. In these reports, the inexperienced physicians were well trained [10, 17], and it would appear that they understood how to make use of all information regarding vertebral body size, shape, and projection to assess vertebral fractures using the SQ method. The discrepancies between the two groups in our study may have resulted from a lack of previous training in SQ assessment for the nonexperts.
In addition, our study was subject to several preexisting biases because the radiographs we used originated from JOINT-02. Participants in that study were previously reported to have high fracture risks, with one or more prevalent vertebral fractures as an inclusion criterion [14, 15], which may have had some effect on the assessment of vertebral fractures by the nonexpert group.
The least squares mean SFI was significantly higher in the nonexpert group than in the expert group at the baseline and in the follow-up period, whether adjusted or not adjusted (Fig. 3). The estimated difference between the two groups was fairly constant at 0.19–0.23. These results also indicate an overestimation by the nonexpert group compared with the expert group for the SQ assessment of vertebral fractures. In addition, there was a major difference between the expert group and the nonexpert group in the kappa statistics at the baseline and in the follow-up period. The kappa statistics for the nonexpert group were notably low at 0–0.2 (poor agreement) from T4 to T6 and 0.2–0.4 (fair agreement) from T7 to L4, whereas those of the expert group were high at 0.4–0.6 (moderate agreement) at most vertebral levels and 0.6–0.8 (good agreement) from T12 to L4. The interobserver reproducibility of the SQ assessment was markedly better in the expert group than in the nonexpert group, in line with the findings of Genant et al. [10] and Wu et al. [11].
Delmas et al. [18] reported that underdiagnosis of vertebral fractures was observed in the IMPACT trial (a multicenter multinational prospective study) in several geographic regions, including North America, Latin America, Europe, South Africa, and Australia. All radiologists were given a radiographic procedure manual, which was the principal tool for standardization of the SQ assessment, and this was a major difference compared with our study. The results of Delmas et al.’s study were as follows: there were 789 patients with vertebral fractures (grade 1 or higher) and 1,662 patients without vertebral fractures (grade 0) in the central readings, and 607 with vertebral fractures and 1,844 without vertebral fractures in the local readings. Further, among the 789 patients with vertebral fractures in the central readings, 266 patients had no vertebral fractures (false-negative rate, 34 %) and 523 patients had vertebral fractures (true-positive rate, 66 %) in the local readings. Among the 1,662 patients without vertebral fractures in the central readings, 1,578 patients had no vertebral fractures (true-negative rate, 95 %) and 84 patients had vertebral fractures (false-positive rate, 5 %) in the local readings. In contrast, our results indicated that the proportion of fractured cases was lower in the expert group than in the nonexpert group, revealing a discrepancy between the results of the two studies. It appears that a bias toward aggressive identification of vertebral fractures occurs in a clinical trial because of the strict protocol and the use of the radiographic procedure manual, which differed from our study design; our results may therefore be reasonable even though no manual was used.
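The rates quoted from Delmas et al. follow directly from the four cell counts given above, with the central readings taken as the reference. The short sketch below reproduces that arithmetic; the function and variable names are illustrative only.

```python
def reading_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Rates of local readings against central readings (the reference)."""
    central_pos = tp + fn  # fractured in the central readings
    central_neg = tn + fp  # not fractured in the central readings
    return {
        "true_positive_rate": tp / central_pos,
        "false_negative_rate": fn / central_pos,
        "true_negative_rate": tn / central_neg,
        "false_positive_rate": fp / central_neg,
    }


# Counts as reported in the text for the IMPACT study.
tp, fn, tn, fp = 523, 266, 1578, 84
rates = reading_rates(tp, fn, tn, fp)

local_pos = tp + fp  # fractured in the local readings
local_neg = tn + fn  # not fractured in the local readings

for name, value in rates.items():
    print(f"{name}: {100 * value:.1f} %")
print(f"local readings: {local_pos} fractured, {local_neg} not fractured")
```

The row and column totals agree with the figures in the text: 789 and 1,662 patients in the central readings, and 607 and 1,844 in the local readings.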
Our study has several limitations. First, the SQ assessment was performed using images on a personal computer rather than the actual X-ray films, which may have reduced the image resolution and the displayed size of the vertebrae. Second, all physicians independently assessed the vertebral fractures from the baseline to 24 months without instructions for standardizing the SQ assessment, such as those that a dedicated manual would provide. Finally, the results of our expert group are a reference rather than a gold standard, and our results are only a relative comparison of the two groups because there was no adjudication of assessments within the expert group.
Vertebral fracture assessment is important not only in the diagnosis of osteoporosis and the evaluation of treatment effects but also in epidemiologic studies of osteoporosis and in the treatment of clinical vertebral fractures. The SQ method may not be well known in daily clinical practice, but it has been widely used to assess vertebral fractures in many clinical trials of osteoporosis drugs. Precise assessment of vertebral fractures using the SQ method in daily practice is necessary for the proper diagnosis and treatment of osteoporosis. Our results suggest that (1) a conscious effort should be made to promote the SQ method in daily practice, and (2) training programs for the SQ method may help nonexpert physicians avoid overestimating vertebral fractures.
In conclusion, we clarified that the SQ assessment of vertebral fractures tended to be overestimated by nonexpert physicians, with poor nonexpert interobserver reliability and well-matched expert physician interobserver reliability in Japan. The SQ method is generally understood to include the entire spectrum of features of spinal deformity and to have a high reproducibility. Conscious efforts should be made to promote the SQ method to contribute to the treatment of osteoporosis.
References
Chesnut CH III, Skag A, Christiansen C et al (2004) Effects of oral ibandronate administered daily or intermittently on fracture risk in postmenopausal osteoporosis. J Bone Miner Res 19:1241–1249
Harris ST, Watts NB, Genant HK et al (1999) Effects of risedronate treatment on vertebral and nonvertebral fractures in women with postmenopausal osteoporosis: a randomized controlled trial. JAMA 282:1344–1352
Ettinger B, Black DM, Mitlak BH et al (1999) Reduction of vertebral fracture risk in postmenopausal women with osteoporosis treated with raloxifene: results from a 3-year randomized clinical trial. JAMA 282:637–645
Neer RM, Arnaud CD, Zanchetta JR et al (2001) Effect of parathyroid hormone (1–34) on fractures and bone mineral density in postmenopausal women with osteoporosis. N Engl J Med 344:1434–1441
Melton LJ 3rd, Lane AW, Cooper C et al (1993) Prevalence and incidence of vertebral deformities. Osteoporos Int 3:113–119
Eastell R, Cedel SL, Wahner HW et al (1991) Classification of vertebral fractures. J Bone Miner Res 6:207–215
Minne HW, Leidig G, Wüster C et al (1988) A newly developed spine deformity index (SDI) to quantitate vertebral crush fractures in patients with osteoporosis. Bone Miner 3:335–349
Sauer P, Leidig G, Minne HW et al (1991) Spine deformity index (SDI) versus other objective procedures of vertebral fracture identification in patients with osteoporosis: a comparative study. J Bone Miner Res 6:227–238
McCloskey EV, Spector TD, Eyres KS et al (1993) The assessment of vertebral deformity: a method for use in population studies and clinical trials. Osteoporos Int 3:138–147
Genant HK, Wu CY, van Kuijk C et al (1993) Vertebral fracture assessment using a semiquantitative technique. J Bone Miner Res 8:1137–1148
Wu CY, Li J, Jergas M et al (1995) Comparison of semiquantitative and quantitative techniques for the assessment of prevalent and incident vertebral fractures. Osteoporos Int 5:354–370
Grados F, Roux C, de Vernejoul MC et al (2001) Comparison of four morphometric definitions and a semiquantitative consensus reading for assessing prevalent vertebral fractures. Osteoporos Int 12:716–722
Crans GG, Genant HK, Krege JH et al (2005) Prognostic utility of a semiquantitative spinal deformity index. Bone 37:175–179
Shiraki M, Kuroda T, Miyakawa N et al (2011) Design of a pragmatic approach to evaluate the effectiveness of concurrent treatment for the prevention of osteoporotic fractures: rationale, aims and organization of a Japanese Osteoporosis Intervention Trial (JOINT) initiated by the Research Group of Adequate Treatment of Osteoporosis (A-TOP). J Bone Miner Metab 29:37–43
Orimo H, Nakamura T, Fukunaga M et al (2011) Effects of alendronate plus alfacalcidol in osteoporosis patients with a high risk of fracture: the Japanese Osteoporosis Intervention Trial (JOINT)-02. Curr Med Res Opin 27:1273–1284
Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76:378–382
Genant HK, Jergas M, Palermo L et al (1996) Comparison of semiquantitative visual and quantitative morphometric assessment of prevalent and incident vertebral fractures in osteoporosis. J Bone Miner Res 11:984–996
Delmas PD, van de Langerijt L, Watts NB et al (2005) Underdiagnosis of vertebral fractures is a worldwide problem: the IMPACT study. J Bone Miner Res 20:557–563
Acknowledgments
The authors thank the expert and nonexpert physicians for the assessment of the vertebral fractures using the SQ method. We also thank the A-TOP research group for providing the spinal radiographs of the patients participating in JOINT-02.
Conflict of interest
All authors have no conflicts of interest.
Cite this article
Uemura, Y., Miyakawa, N., Orimo, H. et al. Comparison of expert and nonexpert physicians in the assessment of vertebral fractures using the semiquantitative method in Japan. J Bone Miner Metab 33, 642–650 (2015). https://doi.org/10.1007/s00774-014-0625-3
DOI: https://doi.org/10.1007/s00774-014-0625-3