Introduction

Over the past decade, dual-energy X-ray absorptiometry (DXA) has emerged as the gold standard to evaluate patients at risk for fragility fractures [1,2]. DXA thresholds have been defined to diagnose osteoporosis in (Caucasian) women and have proven to be useful to identify subjects who are at increased risk of fracture [3]. However, because of cost constraints and limited availability, DXA is not an optimal tool for population screening. In this regard, there is a continuing need for low-cost screening methods to select high-risk individuals who are more likely to benefit from DXA testing [4,5].

Quantitative ultrasound (QUS) at the calcaneus is an attractive screening tool because of the (relatively) low cost, the lack of ionizing radiation, and the fact that it has been shown to predict future fracture risk [6]. However, calcaneal QUS thresholds for the diagnosis or treatment of osteoporosis have not as yet been defined. In view of these limitations, QUS is mainly used to screen for potentially osteoporotic individuals. In most centers, postmenopausal women with low ultrasound values are referred for additional density measurement by DXA for a definitive diagnosis.

The aim of the current study was to evaluate the ability of calcaneal QUS to identify osteoporosis, as defined by a lumbar spine or total hip T-score below –2.5 using DXA, and to compare QUS with two alternative potential screening methods: (i) digital X-ray radiogrammetry (DXR), a new radiogrammetric method providing an estimation of metacarpal bone density from basic geometric measurements [7], and (ii) radiographic absorptiometry (RA) to assess phalangeal bone density [8].

Materials and methods

Patient recruitment

The study population consisted of community-dwelling postmenopausal women who had been consecutively referred to the Leuven University Center for Metabolic Bone Diseases for bone densitometry. To be eligible for participation, women had to be between 50 and 75 years of age, functionally independent, and without clinical evidence of osteoarthritis. None of the women included in the study was receiving therapy for osteoporosis, including hormone therapy, SERMs or bisphosphonates. Women with peripheral oedema were also excluded, to avoid interference with ultrasound transmission. Informed consent was obtained from all women, after which all tests were performed on the same day and in a set order. All procedures were approved by the Leuven University Medical Ethics Committee.

Dual-energy X-ray absorptiometry (DXA)

Areal bone density was measured using the DXA QDR 4500a fan beam system (Hologic Inc., Waltham, Mass., USA) by specially trained and certified technicians. Recently established national reference data [9] were used to derive T-scores at the lumbar spine (vertebrae L2–L4) and the total hip region. For the purpose of this analysis, osteoporosis (the outcome measure) was defined as a T-score below –2.5 at the spine and/or hip. In our unit, the short-term precision errors for both measurements in postmenopausal women are less than 1%.
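The T-score definition and the outcome rule described above can be illustrated with a minimal sketch. It assumes the conventional definition of a T-score (the measured value minus the young-adult reference mean, divided by the reference standard deviation). The reference means, standard deviations and BMD values below are hypothetical placeholders chosen for illustration; they are not the national reference data [9].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Young-adult reference values for one site (hypothetical numbers)."""
    mean: float  # mean BMD of the young-adult reference population (g/cm2)
    sd: float    # standard deviation of that reference population (g/cm2)


def t_score(value: float, ref: Reference) -> float:
    """Number of reference SDs by which a measurement differs from the reference mean."""
    return (value - ref.mean) / ref.sd


def is_osteoporotic(spine_t: float, hip_t: float, threshold: float = -2.5) -> bool:
    """Outcome used in this analysis: T-score below -2.5 at the spine and/or hip."""
    return spine_t < threshold or hip_t < threshold


# Illustrative values only; not taken from the study.
spine_ref = Reference(mean=1.00, sd=0.10)
hip_ref = Reference(mean=0.90, sd=0.10)

spine_t = t_score(0.74, spine_ref)  # about -2.6
hip_t = t_score(0.80, hip_ref)      # about -1.0
osteoporotic = is_osteoporotic(spine_t, hip_t)  # True, because the spine is below -2.5
```

Because the rule uses "and/or", a woman is classified as osteoporotic as soon as either site falls below the threshold, which is why the spine value alone decides the example.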

Ultrasound

Calcaneal ultrasound attenuation was measured using the Sahara equipment (Hologic). The Sahara system linearly combines broadband ultrasound attenuation (BUA) and speed of sound (SOS) values with equal weighting into a single parameter called the quantitative ultrasound index (QUI). QUI is claimed to improve the standardized coefficient of variation of velocity or BUA alone. T-score calculations for QUI were based on local reference data [9]. Short-term precision in postmenopausal women is 2.6% for QUI in our setting.

Digital X-ray radiogrammetry (DXR)

From an assumed physical model of the bone [7], DXR attempts to bridge the gap between radiogrammetry and densitometry. In contrast to DXA, radiogrammetry does not use the intensities of the image in a quantitative manner, but relies on geometric measurements to provide a BMD estimate. In all participants, a plain radiograph of the non-dominant hand was taken according to a standardized protocol and analyzed with the X-Posure System version 2 software (Pronosco A/S, Vedbaek, Denmark), to produce a density estimation based on the measurement of cortical thickness of the metacarpals of the 2nd, 3rd, and 4th digits. Digital radiographs were acquired using computed radiography (ADC-70; Agfa Gevaert NV, Mortsel, Belgium). The X-ray tube, at an average setting of 50 kV and 5 mA, is focused on the middle of the second metacarpal and positioned at 1 m from the hand. Patient exposure to radiation using standard X-ray equipment is 1 µSv per examination. The regions of interest (ROIs), the three middle metacarpal bones, are automatically detected by the software and cannot be modified by the operator. Based on the mean cortical thickness, the bone volume per area is computed and a density (g/cm2) is calculated. T-score calculations for DXR were based on local reference data [9]. The short-term precision error in postmenopausal women in our setting was 0.7%.

Radiographic absorptiometry (RA)

Bone mineral density measurements of the middle phalanges of the 2nd, 3rd and 4th digits of the non-dominant hand were performed with a self-contained single energy (60 kV) X-ray system (Alara Metriscan, Hayward, Calif., USA). To record the image, the system uses a storage phosphor plate, which is scanned to extract the image. The hand radiograph is corrected according to a reference image recorded at startup. An aluminium wedge contained in the image is used as an image positioning reference. After the system has analysed the image and segmented soft tissue and bone into separate components, the ROIs are automatically identified and outlined. Density is estimated in the three phalanges and, after averaging, expressed in arbitrary units (mineral mass/area). T-scores were expressed using local reference data [9]. The patient X-ray exposure is less than 0.012 µSv per examination. Short-term precision in postmenopausal women in our center is 1.14%.

Statistical analysis

The primary aim of our analyses was to compare the value of QUS, DXR, and RA in discriminating osteoporosis (as defined by DXA-determined BMD) and to determine whether the discrimination of osteoporosis by QUS was enhanced by additional measurements using DXR or RA. In view of the impact of age and body mass index (BMI) on bone density and osteoporosis occurrence, all regression analyses were adjusted for age and BMI. Linear regression analyses were performed relating the various DXA measurements with the values of QUS, DXR, and RA. Subsequently, multiple regression models were constructed with total hip or lumbar spine density as response and QUS, DXR, and RA values as regressors. For each model, partial multiple correlation coefficients (partial R²) were calculated to indicate the relative importance of each regressor in the model. No interaction terms were included in any of the fitted regression models. As indicated, osteoporosis was defined as a T-score below –2.5 at the spine and/or hip. Receiver operating characteristic (ROC) curves were constructed by calculating the specificity and sensitivity of QUS, DXR, and RA at different cut-point values in discriminating osteoporosis (as defined by DXA-determined BMD) and the areas under the curves (AUCs) were computed. Sensitivity was defined as the proportion of subjects with osteoporosis who had a “positive” QUS, DXR or RA test. Specificity was defined as the proportion of subjects without osteoporosis who had a “negative” QUS, DXR or RA test. For each test, the positive predictive value (PPV) was calculated as: true positive (positive test and osteoporosis) divided by the number of subjects with a positive test. The negative predictive value (NPV) was calculated as: true negative (negative test and without osteoporosis) divided by the number of subjects with a negative test. All statistical analyses were conducted with the use of SAS (Statistical Analysis Systems Inc., Cary, N.C., USA).
Reported P-values are two-sided. The nominal significance level was set at 0.05.
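The definitions of sensitivity, specificity, PPV and NPV given above can be written out as a short sketch. It is an illustration in Python, not the SAS code used for the analyses. It assumes that a peripheral test is called "positive" when its value falls below a chosen cut-point; the function names and the six example values are hypothetical.

```python
def confusion_counts(scores, has_osteoporosis, cutoff):
    """Count true/false positives and negatives for one cut-point.

    Assumption: a test is "positive" when the peripheral score is below the cut-point.
    """
    tp = fp = tn = fn = 0
    for score, diseased in zip(scores, has_osteoporosis):
        positive = score < cutoff
        if positive and diseased:
            tp += 1
        elif positive and not diseased:
            fp += 1
        elif not positive and diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Definitions as stated in the text (each denominator must be non-zero)."""
    return {
        "sensitivity": tp / (tp + fn),  # positive tests among subjects with osteoporosis
        "specificity": tn / (tn + fp),  # negative tests among subjects without osteoporosis
        "ppv": tp / (tp + fp),          # osteoporosis among subjects with a positive test
        "npv": tn / (tn + fn),          # no osteoporosis among subjects with a negative test
    }


def roc_points(scores, has_osteoporosis):
    """(1 - specificity, sensitivity) at every observed cut-point, plus one above all scores."""
    points = []
    for cutoff in sorted(set(scores)) + [float("inf")]:
        tp, fp, tn, fn = confusion_counts(scores, has_osteoporosis, cutoff)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# Hypothetical T-scores from a peripheral test and the DXA-defined outcome.
scores = [-3.0, -2.0, -1.5, -1.0, 0.0, 0.5]
has_osteoporosis = [True, True, False, True, False, False]

counts = confusion_counts(scores, has_osteoporosis, cutoff=-1.2)  # (2, 1, 2, 1)
metrics = diagnostic_metrics(*counts)
curve = roc_points(scores, has_osteoporosis)
```

Sweeping the cut-point from below the lowest score to above the highest moves the curve from (0, 0) to (1, 1); the AUC summarizes the whole sweep rather than any single threshold.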

Results

Subject characteristics

The study population consisted of 221 postmenopausal women, of whom 41 (18.5%) had osteoporosis (defined as a DXA-determined T-score below –2.5 at the hip or spine). Subject characteristics and the results of the DXA, QUS, DXR, and RA measurements are shown in Table 1. Compared with women without osteoporosis, patients with osteoporosis were older and, as expected, had a lower body mass index (BMI) and significantly lower DXA, QUS, DXR, and RA values.

Table 1 Subject characteristics of women participating in the study and results of the DXA, QUS, DXR, and RA measurements

Prediction of DXA-determined BMD by QUS, DXR and RA

Both unadjusted (Figs 1, 2, and 3) and age- and BMI-adjusted (data not shown) regression analyses revealed statistically highly significant relations between BMD as measured by DXA, regardless of measurement site, and QUS, DXR, and RA values. According to age- and BMI-adjusted multiple regression models (Table 2), only approximately 5% of the variance in lumbar spine BMD could be explained by QUS, compared with about 17% by DXR (P<0.001). Similar findings were observed when assessing total hip BMD. Likewise, prediction of lumbar spine or total hip BMD by RA was not, or was only marginally, enhanced by performing QUS (Table 3).

Fig. 1 Correlation between QUS and DXA-determined lumbar spine BMD (R=0.478, P<0.001)

Fig. 2 Correlation between DXR and DXA-determined lumbar spine BMD (R=0.608, P<0.001)

Fig. 3 Correlation between RA and DXA-determined lumbar spine BMD (R=0.661, P<0.001)

Table 2 Prediction of lumbar spine and total hip BMD as measured by DXA by (age- and BMI-adjusted) multiple regression models based on QUS and DXR
Table 3 Prediction of lumbar spine and total hip BMD as measured by DXA by (age- and BMI-adjusted) multiple regression models based on QUS and RA

Sensitivity and specificity of QUS, DXR and RA in discriminating osteoporosis

The ROC curve for identifying women with osteoporosis is plotted in Fig. 4. The QUS ROC curve showed an AUC (SE) of 0.72 (0.04), whereas the DXR and RA curves were characterized by AUCs of 0.84 (0.03) and 0.80 (0.03), respectively. As indicated in Table 4, QUS identified about 68% of patients with osteoporosis (sensitivity) and about 70% of women without osteoporosis (specificity). Of those women with a negative QUS test, about 90% had no osteoporosis (negative predictive value), whereas only about one-third of patients with a positive QUS test had osteoporosis (positive predictive value). The sensitivity and specificity of DXR and RA in discriminating osteoporosis are indicated in Tables 5 and 6. DXR and RA identified 77% and 83% of the women with osteoporosis, respectively. Both for DXR and RA, the negative predictive value was about 94%. When comparing the AUC values for different tests, the AUC difference of QUS and DXR had a P-value of 0.02. Similar trends were observed when analyzing the AUC difference between QUS and RA (P=0.04). Finally, the screening methods were combined to explore the possibility of improving the sensitivity and specificity attained by QUS, DXR or RA alone (Table 7). In the algorithm used, the cases were detected only if they were identified by both methods. Compared with DXR alone, combining DXR with QUS decreased the sensitivity from 77% to 68%; the specificity was also poorer (from 79% to 69%). Similarly, combining RA with QUS had a negative impact on sensitivity and specificity. When performing sensitivity-specificity analyses on a subset of the study population consisting of age- and BMI-matched individuals with and without osteoporosis (n=131), similar results were obtained but the AUC differences between QUS and DXR or RA were no longer statistically significant (data not shown).
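The reported predictive values can be checked against the reported sensitivity, specificity and prevalence (41 of 221 women) with a short calculation. The sketch below is illustrative Python using the rounded percentages quoted in the text, so the results are approximate; the function name is hypothetical and the code is not part of the original analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence.

    Works with expected proportions of the whole sample in each cell of the 2x2 table.
    """
    tp = sensitivity * prevalence              # positive test, osteoporosis
    fn = (1 - sensitivity) * prevalence        # negative test, osteoporosis
    tn = specificity * (1 - prevalence)        # negative test, no osteoporosis
    fp = (1 - specificity) * (1 - prevalence)  # positive test, no osteoporosis
    return tp / (tp + fp), tn / (tn + fn)


prevalence = 41 / 221  # women with DXA-defined osteoporosis in the study sample

# Rounded values quoted in the text: QUS 68% / 70%, DXR 77% / 79%.
qus_ppv, qus_npv = predictive_values(0.68, 0.70, prevalence)
dxr_ppv, dxr_npv = predictive_values(0.77, 0.79, prevalence)

print(f"QUS: PPV {qus_ppv:.2f}, NPV {qus_npv:.2f}")  # PPV 0.34, NPV 0.91
print(f"DXR: NPV {dxr_npv:.2f}")                     # NPV 0.94
```

With a prevalence of under 20%, even a test of moderate specificity yields a high NPV and a low PPV, which is consistent with the "about 90%" and "about one-third" figures reported for QUS.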

Fig. 4 ROC curve comparison between QUS, DXR, and RA to screen for osteoporosis at the spine or hip for all subjects (n=221)

Table 4 Sensitivity and specificity for various threshold values of QUS in identifying osteoporosis at the lumbar spine or total hip BMD
Table 5 Sensitivity and specificity for various threshold values of DXR in identifying osteoporosis at the lumbar spine or total hip BMD
Table 6 Sensitivity and specificity for various threshold values of RA in identifying osteoporosis at the lumbar spine or total hip BMD
Table 7 Performance comparison using sensitivity and specificity for combinations of tests to screen for osteoporosis at the spine or hip on all subjects versus one test alone

Discussion

To reduce the burden associated with osteoporosis, it is important to identify women at risk for osteoporotic fractures. Several prospective studies have assessed the association between quantitative ultrasonic measures and osteoporotic fractures, providing strong evidence for a relation between baseline QUS and subsequent (hip) fracture risk [10,11]. Even after adjustment for (femoral neck) BMD, the ultrasound variables remained predictive of hip fracture, supporting the concept that QUS may not be a simple measure of bone density, but rather a measure of other qualitative properties of bone [12]. In clinical studies, each 1 SD reduction in calcaneal BUA was associated with a doubling of the risk for hip fractures [relative risk (RR)=2.0], compared to RRs of 2.2 and 1.9–2.6 for BMD measurements in the calcaneus and femoral neck, respectively [10,11]. These findings suggest that QUS and BMD may be equally predictive of future fracture risk. Nevertheless, in the process of making treatment decisions, an accurate knowledge of DXA-assessed BMD status is still useful in many patients. Particularly among those who have not yet suffered a vertebral fracture, those who are in the DXA-defined osteoporotic range, a T-score below −2.5, will benefit most from (anti-resorptive) treatment [13]. Because recent intervention trials did not include QUS in their protocols [14,15,16], similar QUS intervention thresholds remain to be defined.

However, DXA is not adequate for population screening purposes, due to cost constraints and availability. Consequently, screening all women by DXA is not recommended [17,18]. Calcaneal QUS is therefore increasingly being used to identify subjects likely to have low bone mass who should be referred for bone densitometry. The objective of this study was to calculate the sensitivity, specificity, and discriminative values of calcaneal QUS in a community-dwelling sample of postmenopausal women. QUS identified about 68% of patients with osteoporosis (sensitivity) and about 70% of women without osteoporosis (specificity). The QUS AUC was 0.72. This area under the ROC curve measures the probability of the correct risk rating of a randomly selected diseased/healthy pair of subjects. An AUC of at least 0.80 is usually required to consider a test to be effective [19]. Only about one-third of patients with a positive QUS test had osteoporosis (positive predictive value). However, given that many postmenopausal women have one or more risk factors for osteoporosis, the question may not be whom to test, but rather whom not to test [20]. Of those women with a negative QUS test, about 90% had no osteoporosis (negative predictive value).
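The interpretation of the AUC as the probability of correctly ranking a randomly selected diseased/healthy pair can be made concrete with a small sketch. It assumes that lower test values indicate disease and counts ties as half-correct, which is the usual convention; the function name and the six example values are hypothetical and are not study data.

```python
def auc_concordance(diseased_scores, healthy_scores):
    """Fraction of diseased/healthy pairs in which the diseased subject has the lower score.

    Assumption: lower values indicate disease. Tied pairs count as half-correct.
    """
    concordant = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d < h:
                concordant += 1.0
            elif d == h:
                concordant += 0.5
    return concordant / (len(diseased_scores) * len(healthy_scores))


# Hypothetical peripheral T-scores, for illustration only.
diseased = [-3.0, -2.0, -1.0]
healthy = [-1.5, 0.0, 0.5]

auc = auc_concordance(diseased, healthy)  # 8 of 9 pairs ranked correctly, about 0.89
```

On this reading, the QUS AUC of 0.72 means that in roughly 72% of such pairs the woman with osteoporosis would have the lower QUS value, while an AUC of 0.5 would correspond to chance.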

The use of QUS to screen for BMD at different sites has been studied previously by several groups [21,22,23,24]. In these reports, linear regression analysis and ROC analysis were used to predict BMD at different sites from calcaneal [21,22,23] or phalangeal ultrasound [24]. Overall, in all studies, the QUS correlation coefficients and ROC analysis showed results similar to this study.

In addition to QUS measurements of the calcaneus, phalanges, radius and other peripheral bones, a variety of DXA and other X-ray-based peripheral measurement techniques (such as DXR, RA or pDXA of the calcaneus) have recently found widespread clinical use. However, the pattern of age-related bone loss defined by these approaches, even when standardized as a T-score, varies substantially [17,25,26]. Consequently, the uniform application of a threshold level of T equal to −2.5 results in the classification of a varying number of subjects as osteoporotic. Moreover, as with QUS, it is unclear which thresholds should be used for DXR or RA in the decision-making process.

In this study, we addressed the potential of DXR or RA to serve as more efficient case-finders and subsequently increase the diagnostic utility of DXA scans. In our study sample, the sensitivity for identifying women with osteoporosis was 77% using DXR and 83% using RA. The DXR and RA curves were characterized by AUCs of 0.84 and 0.80, respectively. Of those subjects who had a negative metacarpal or phalangeal density test, 94% were identified by DXA as having no lumbar spine or total hip osteoporosis. However, the AUC differences between QUS and DXR or RA, although (marginally) statistically significant, were small and did not persist after adjusting for age and BMI. Based on the results of our specificity-sensitivity analyses, the case is thus not very strong for superiority of the radiation-based hand measures relative to the ultrasound-based calcaneal measures. Taken together, our data suggest that, compared with calcaneal QUS, metacarpal DXR and phalangeal RA may be at least as effective as pre-screening methods for targeting DXA testing in high-risk postmenopausal women. Combining DXR or RA with QUS had a negative impact on sensitivity and specificity. DXR or RA alone remained the best screening methods.

While our results support the concept that DXR and RA, like QUS, can be helpful to identify subjects who should (or should not) be referred for DXA, the aim of our analysis was not to address the appropriateness or even cost-effectiveness of this approach. Given the residual uncertainty of using peripheral density to predict central BMD, the need to re-measure many individuals while missing a significant proportion of true positives, and the economic implications of these strategies, future research should address the costs and benefits of different types of prescreening before making recommendations.

In women with DXA-documented osteoporosis, various antiresorptive agents have been demonstrated to reduce fracture risk. However, a T-score threshold to “define” osteoporosis, the outcome used in our study, may or may not indicate an absolute need for treatment in individual patients. The indication for therapy should be modulated by clinical factors, including age and fracture history [27]. Although there is a strong association between BMD and the likelihood of fracture, other factors may also influence fracture risk. For a proportion of women who are labeled as osteoporotic, the risk of a fracture during their remaining lifetime could be theoretically sufficiently low that treatment would not be appropriate. Conversely, many women who do not reach the threshold “osteoporosis” according to the WHO definition might have other risk factors and circumstances that would justify treatment.

Another important issue when calculating T-scores is the use of an appropriate reference range. Recent studies have shown that there may be limited agreement between a manufacturer’s reference database and data derived from a study population. For example, mean values for femoral neck BMD from the NHANES III study population were approximately 3–5% lower than the manufacturer’s reference values, and the standard deviations were 26–30% higher [28]. There are various reasons for such discrepancies, and the most obvious lie in different sampling procedures of the various studies. In the current study, DXA, QUS, DXR and RA results were expressed using a single, local young-adult reference population [9].

There are several limitations which may affect the inferences derived from these data. In particular, our study was not designed to address the issue of the (relative) value of DXR or RA in predicting fracture risk. In fact, correlations between different measurement approaches rarely allow any deductions to be made about a technique’s ability to predict fracture occurrence. Only prospective fracture endpoint trials can provide an answer to that important question. While QUS has been shown to predict future fracture risk [10,11], there is no prospective evidence as yet of fracture prediction by DXR or RA. For the time being, if DXR or RA is to have clinical use, it will be as a prescreening tool to identify postmenopausal women with increased risk of low BMD. Also, the number of patients with postmenopausal osteoporosis included in this study was limited, and the AUC differences between QUS and DXR or RA were small, supporting the need for additional, large-scale studies. Moreover, our results may not be generalizable, since the participants were referred for bone densitometry and not a random sample of the postmenopausal population. Our study sample had proportionately fewer women older than 65 as compared with the actual distribution of menopausal women in Belgium, with only about 19% of our study subjects diagnosed as osteoporotic. This may have affected the overall specificity of each selection method. Finally, we acknowledge that fracture risk depends on life expectancy and other factors, in addition to bone density. Information about a woman’s BMD must be combined with other risk factors, as well as with information about the effectiveness, inconvenience, side effects, risks, and costs of the treatment considered.

In conclusion, the results of this study suggest that metacarpal DXR and phalangeal RA may be used as effectively as calcaneal QUS for pre-screening in high-risk postmenopausal women. Because an intervention threshold has been developed only for DXA, this method remains relevant to treatment decisions in many women. However, a variety of peripheral measurement approaches can be used to identify women at increased risk for DXA-defined osteoporosis.