Introduction

Osteoporosis is a disease characterized by low bone mass and structural deterioration of bone tissue, leading to bone fragility and an increased susceptibility to fractures [1]. As a systemic bone disease, osteoporosis occurs not only in postmenopausal women but also in many elderly men. Its most serious outcome is osteoporotic fracture, and the resultant high morbidity and mortality have placed an enormous healthcare burden on individuals, families, and society. It has been reported that the number of fractures caused by osteoporosis will double by the year 2050, with medical expenses increasing at an alarming rate, and that the incidence of osteoporosis in men will also increase [2]. In current clinical practice, the diagnosis and treatment of osteoporosis are gradually gaining recognition among doctors, but osteoporosis, especially in men, is still underdiagnosed and undertreated.

Bone mineral density (BMD), measured by dual-energy X-ray absorptiometry (DXA), is the gold standard for diagnosing osteoporosis according to World Health Organization (WHO) criteria [3], but the bulk, nonportability, and relatively high expense of DXA equipment, which requires a professional technician to operate, have limited access to DXA in some countries and regions. In China, DXA use is concentrated in large hospitals; most community health service centers do not have the equipment and therefore cannot meet the screening needs of the elderly population. Thus, clinicians need a simple screening tool to identify the population at high risk of osteoporosis. A variety of simple screening tools have been developed for osteoporosis, but most of them were developed in European populations [4–6]. The Osteoporosis Self-Assessment Tool for Asians (OSTA) was developed from postmenopausal Asian women in eight different Asian regions by assessing several clinical risk factors associated with bone loss and osteoporosis. The final index included only age and weight [7]. OSTA has been validated in South Korean, Filipino, African-American, Caucasian, and other ethnic populations [8–11] and found to be helpful in detecting low BMD in postmenopausal women. In 2005, Kung et al. [12] developed a clinical risk assessment tool for male osteoporosis. In that study, information on risk factors (age; weight; height; weight-bearing activity for > 1 h/day; fracture after age of 50 years; and daily calcium intake) was collected from 420 community-dwelling adult Chinese men aged 50 years and older using a structured questionnaire, and the ability of these risk factors to identify subjects with a femoral neck BMD T score of −2.5 or less was assessed. Multiple regression analysis and item reduction yielded a final clinical risk assessment tool based on only age and weight, similar to the OSTA described previously for Asian women. However, few studies have evaluated the role of OSTA in detecting men with low BMD, especially in Asia.

Quantitative bone ultrasound (QUS), with its good reproducibility, lack of radiation exposure, noninvasiveness, and other significant advantages, is widely used in primary healthcare centers to evaluate bone health. However, there are potential sources of error for QUS measurements in vivo, including soft tissue thickness, temperature [13], and anthropometric parameters [14], that may lead to misclassification of individuals. Because of this limited accuracy, QUS is not used as the gold standard method for the diagnosis of osteoporosis. Nevertheless, QUS can be used as an alternative screening tool for osteoporosis, and several studies have confirmed that calcaneal QUS can predict BMD measurements as well as fracture risk [15–18]. However, the diagnostic T value measured by QUS has not been uniformly defined, as it has varied among populations and studies [19]. Few studies have assessed the value of QUS in identifying low bone mass in Chinese males.

Early detection of the population at high risk for osteoporosis is the key to preventing osteoporotic fractures and reducing their incidence. Therefore, more effort should be made to find accessible and effective methods for the prevention and early detection of osteoporosis. The aim of our study was to validate OSTA and the QUS-T score, as well as the combination of the two, in the diagnosis of osteoporosis among elderly Chinese men, and to establish the best cutoff value for each.

Materials and methods

Subjects and data collection

The study subjects were southeastern Chinese men recruited from the health checkup population of Zhongshan Hospital, Fudan University, from June 2012 to December 2013. These men, from different districts in Shanghai, were attending an annual health checkup and were not hospitalized patients. As a growing city in China, Shanghai has a mixture of rural and urban populations, whose epidemiological patterns of age and gender distribution, income, consumer habits, and lifestyle are considered similar to those of the general Chinese population. Sample size was determined using the formula \( n = \left( \frac{u_{\alpha}}{\delta} \right)^{2} p\,(1 - p) \). We set α = 0.05 and δ (admissible error) = 0.075 [20]; p denotes the expected sensitivity or specificity of the diagnostic tool [12]. Thus, for studying the diagnostic value of OSTA, the number of subjects needed to achieve the objective of this study was 256 (151 for the normal control group and 105 for the case group). For QUS, the number was 279 (121 for the normal control group and 128 for the case group).
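The sample size formula can be checked numerically. The following is a minimal sketch, assuming that u_α is the two-sided standard normal quantile for α = 0.05 (about 1.96); the function name and the value p = 0.75 are illustrative only and are not taken from the study.

```python
from statistics import NormalDist

def sample_size(p, alpha=0.05, delta=0.075):
    """n = (u_alpha / delta)^2 * p * (1 - p).

    p     : expected sensitivity or specificity (illustrative input)
    alpha : significance level; u_alpha is assumed to be the two-sided quantile
    delta : admissible error
    """
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return (u_alpha / delta) ** 2 * p * (1 - p)

# Hypothetical expected sensitivity of 0.75
n = sample_size(0.75)
print(round(n))
```

Because p(1 − p) is largest at p = 0.5, an expected sensitivity or specificity closer to 0.5 requires a larger group under this formula.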

People who visited the health checkup department in Zhongshan Hospital were invited to participate by researchers, who explained the purposes and procedures of the study. Participants were asked to fill in a comprehensive questionnaire regarding personal lifestyle, risk factors for osteoporosis, and personal health and family history. Inclusion criteria were male sex, age ≥ 60 years, willingness to participate in the study, and the ability to read and complete the questionnaire and to provide informed consent. Participants were excluded from the present analysis if they met at least one of the following criteria: a history or evidence of metabolic bone disease (Paget’s disease, osteomalacia, renal osteodystrophy); a bone-related tumor or bone metastases; use of medications that affect bone metabolism, such as steroids; any previous use of anti-osteoporosis drugs such as bisphosphonates or calcitonin; severe liver or kidney impairment; recent confinement to bed for more than 3 months; or previous fracture or replacement of both hips. People who met the inclusion criteria and none of the exclusion criteria were scheduled for DXA and QUS examination.

During our study period, altogether 472 subjects met the criteria and agreed to participate. For the final analysis, only 323 participants completed QUS measurements because of unresolved technical difficulties. Reliable BMD of the lumbar spine and hip was available for all 472 participants.

This study was approved by the Ethics Committee of Zhongshan Hospital, Fudan University. All the study data and information were collected after participants gave informed consent.

Measurements

Anthropometric and DXA measurements were obtained for all participants during the same visit. Height and weight were measured without shoes in light indoor clothing using a stadiometer. The OSTA index was calculated using the following algorithm: OSTA index = 0.2 × [weight (kg) − age (years)] (integer).
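The OSTA calculation can be sketched in a few lines. This is an illustrative sketch only: the text specifies an integer result but not how it is obtained, so truncation toward zero is assumed here, and the function name is hypothetical.

```python
import math

def osta_index(weight_kg, age_years):
    """OSTA index = 0.2 x (weight - age), reduced to an integer.

    Assumption: the integer is obtained by truncation toward zero.
    Dividing by 5 instead of multiplying by 0.2 avoids floating-point
    rounding artifacts near integer boundaries.
    """
    return math.trunc((weight_kg - age_years) / 5)

# Hypothetical participant: 60 kg, 78 years old
print(osta_index(60, 78))  # 0.2 * (60 - 78) = -3.6, truncated to -3
```

Lower (more negative) values correspond to older, lighter men, who are at higher risk under this index.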

All DXA measurements were performed by a well-trained technician using a DXA scanner (Discovery A; Hologic, USA; CV < 1 %) on the left hip (femoral neck, trochanter, internal, Ward’s triangle, and total hip) and lumbar spine (L1–L4, L-Total). T scores were obtained by comparison with a white male reference population (source: NHANES). We used the WHO diagnostic criteria for osteoporosis to classify our patients into three categories: osteoporosis (T score ≤ −2.5 SD), osteopenia (−1.0 ≥ T score > −2.5 SD), or normal (T score > −1.0 SD). The lowest T score at the femoral neck, internal, total hip, or lumbar spine was used for classification. T scores as assessed by DXA were used as the gold standard for diagnosing osteoporosis in our study.
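The classification rule can be expressed as a short function. This is a minimal sketch using the thresholds stated above; the function name and the example T scores are hypothetical.

```python
def who_category(t_scores):
    """Classify a participant from the lowest T score across measured sites.

    t_scores: iterable of DXA T scores (e.g., femoral neck, total hip,
    lumbar spine). Thresholds follow the WHO criteria quoted in the text.
    """
    lowest = min(t_scores)
    if lowest <= -2.5:
        return "osteoporosis"
    if lowest <= -1.0:
        return "osteopenia"
    return "normal"

# Hypothetical participant: the lumbar spine value drives the classification
print(who_category([-1.2, -0.8, -2.6]))
```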

Right calcaneal QUS measurements were taken using the Sahara clinical bone sonometer (Hologic), and two parameters (QUS-BMD and QUS-T score) were recorded. Daily calibration was done during the entire study period, and measurements were made according to the standard procedure by a trained technician.

Statistical analysis

Descriptive characteristics of the study population were tabulated as mean and SD, or as proportions, as applicable. Differences among groups of patients were assessed by analysis of variance (ANOVA). The Z test was used to compare areas under the curve (AUC), and logistic regression analysis was used to fit a new variable (Y) based on the OSTA index and QUS-T score. Significance was determined using a two-sided test at P < 0.05. All statistical analyses were performed using SPSS version 20.0.

Receiver operating characteristic (ROC) curve analysis, which plots sensitivity against (1 − specificity), was performed to assess the ability of the OSTA index, the QUS-T score, and the combination of the two to discriminate osteoporosis (T score ≤ −2.5) or low bone mass (osteoporosis plus osteopenia; T score ≤ −1), based on the lowest T score at the femoral neck, internal, total hip, or lumbar spine. The discriminative performance of the two tests for selecting men with low BMD was compared using the AUC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and likelihood ratios. Sensitivity was defined as the proportion of men with osteoporosis (T score ≤ −2.5) who tested positive (i.e., had an index value at or below the cutoff), and specificity was defined as the proportion of men without osteoporosis who tested negative (i.e., had an index value above the cutoff). PPV was defined as the proportion of men with a positive test result who truly had osteoporosis, and NPV as the proportion of men with a negative test result who truly did not have osteoporosis. The positive likelihood ratio is the ratio of the probability of a positive test result given the presence of osteoporosis to the probability of a positive test result given its absence; the negative likelihood ratio is the ratio of the probability of a negative test result given the presence of osteoporosis to the probability of a negative test result given its absence. Finally, the cutoff that yielded the maximum Youden index (sensitivity + specificity − 1) was chosen as the best cutoff for detecting osteoporosis or low bone mass with each tool.
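The definitions above can be illustrated with a short sketch. The analyses in this study were run in SPSS; the code below is only an illustration of the definitions, with hypothetical function names and a made-up toy data set, and it assumes that a test is positive when the index is at or below the cutoff.

```python
import math

def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero
    return num / den if den else math.nan

def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from the counts of a 2 x 2 table."""
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": _ratio(sens, 1 - spec),
        "lr_neg": _ratio(1 - sens, spec),
        "youden": sens + spec - 1,
    }

def confusion(scores, diseased, cutoff):
    """Count TP, FP, FN, TN; a score <= cutoff is a positive test."""
    tp = sum(s <= cutoff and d for s, d in zip(scores, diseased))
    fp = sum(s <= cutoff and not d for s, d in zip(scores, diseased))
    fn = sum(s > cutoff and d for s, d in zip(scores, diseased))
    tn = sum(s > cutoff and not d for s, d in zip(scores, diseased))
    return tp, fp, fn, tn

def best_cutoff(scores, diseased):
    """Return (cutoff, Youden index) for the cutoff maximizing Youden index."""
    best = None
    for c in sorted(set(scores)):
        j = diagnostic_metrics(*confusion(scores, diseased, c))["youden"]
        if best is None or j > best[1]:
            best = (c, j)
    return best

# Toy data (not study data): index values and DXA-defined disease status
scores = [-6, -5, -4, -3, -2, -1, 0, 1]
diseased = [True, True, True, False, False, True, False, False]
print(best_cutoff(scores, diseased))
```

In this sketch only observed index values are tried as cutoffs, and ties are resolved in favor of the lowest cutoff; statistical packages may instead report midpoints between adjacent observed values.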

Results

Characteristics of the study participants at baseline and by WHO status

A total of 472 men aged 60 years or older were eligible for the present study, of whom 323 had a QUS examination. The baseline characteristics of the study cohort are summarized in Table 1, including age, WHO status, BMD, and T score. The mean age of the study sample was 78.0 years (range, 60–97 years). The percentages of men with osteoporosis and osteopenia were 27.7 % and 53.2 %, respectively.

Table 1 Baseline characteristics [mean ± SD (range) or proportions (in percent)] of participants in the study

Table 2 summarizes the descriptive characteristics for different levels of BMD. The number of cases of osteoporosis at femoral neck, total hip, or lumbar spine was 55 (11.7 %), 22 (4.7 %), and 108 (22.9 %), respectively. The WHO diagnostic status was associated with significant differences regarding height, weight, body mass index (BMI), QUS-BMD, and QUS-T score (P < 0.001), but not with age.

Table 2 Summary of descriptive characteristics among three WHO statuses in our study (n = 472)

Discriminatory performance of OSTA or QUS-T score alone for predicting osteoporosis at different sites (Table 3; Fig. 1)

All 95 % confidence intervals (CIs) of the AUCs lay entirely above 0.5, which suggested that both tools had enough discriminative power to select men with T score ≤ −2.5 at different sites, and the AUCs for OSTA and QUS-T score were not significantly different from each other (Fig. 1). The best cutoffs for OSTA and QUS-T score for predicting osteoporosis at any site (femoral neck, internal, total hip, or lumbar spine T score ≤ −2.5) were −3.5 and −1.25, respectively, giving a sensitivity of 47.3 % and specificity of 76.8 % for OSTA and a sensitivity of 80.4 % and specificity of 59.7 % for QUS-T score (Table 3). In detecting men with osteoporosis, OSTA had better specificity than sensitivity, whereas QUS had better sensitivity than specificity.

Table 3 Discriminatory performance of OSTA index and QUS-T score for predicting osteoporosis at different sites

Fig. 1 Receiver operating characteristic (ROC) curves for Osteoporosis Self-Assessment Tool for Asians (OSTA) index, quantitative bone ultrasound (QUS)-T score, and the combination of these two methods (Y) for detecting osteoporosis at femoral neck (a), total hip (b), lumbar spine (c), and any site (d)

Both tools performed best in predicting total hip osteoporosis, yielding an AUC of 0.787, sensitivity of 81.8 %, and specificity of 72.7 % for OSTA and an AUC of 0.883, sensitivity of 82.4 %, and specificity of 86.6 % for QUS-T score (Fig. 1b). The NPV of both methods for identifying osteoporosis at different sites was quite high, ranging from 79.2 % to 98.9 %, which meant that OSTA and QUS-T score did well in identifying men at low risk and thus in avoiding unnecessary DXA tests.

Performance of different diagnostic points based on the predefined best cutoff of OSTA and QUS-T score in detecting men with any site osteoporosis

We first required both OSTA ≤ −4 and QUS-T score ≤ −1.25 for a positive result; this rule had a sensitivity of 36.1 % and specificity of 92.9 % in identifying men with femoral neck, internal, total hip, or lumbar spine T score ≤ −2.5 (Table 4). We then analyzed the performance of requiring either OSTA ≤ −4 or QUS-T score ≤ −1.25 for a positive result in detecting men with osteoporosis: a high sensitivity (88.7 %) and low specificity (46.5 %) were observed.
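The two decision rules compared here differ only in how the individual test results are combined. The following minimal sketch uses the cutoffs stated in the text; the function name and the example values are hypothetical.

```python
def screen_positive(osta, qus_t, rule="and", osta_cut=-4, qus_cut=-1.25):
    """Combine the two screening tests into one positive/negative result.

    rule="and": positive only if both tests are at or below their cutoffs
                (fewer positives, so higher specificity, lower sensitivity).
    rule="or" : positive if either test is at or below its cutoff
                (more positives, so higher sensitivity, lower specificity).
    """
    osta_pos = osta <= osta_cut
    qus_pos = qus_t <= qus_cut
    if rule == "and":
        return osta_pos and qus_pos
    if rule == "or":
        return osta_pos or qus_pos
    raise ValueError("rule must be 'and' or 'or'")

# Hypothetical man with a low QUS-T score but an OSTA index above the cutoff
print(screen_positive(-2, -1.8, rule="and"))  # False
print(screen_positive(-2, -1.8, rule="or"))   # True
```

Every man who is positive under the "and" rule is also positive under the "or" rule, which is why the first rule can only lose sensitivity and the second can only lose specificity relative to each single test.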

Table 4 Characteristics of different diagnostic points based on OSTA and QUS-T score for predicting osteoporosis at any site

Performance of the combination of OSTA and QUS-T score in detecting men with osteoporosis

We fitted a new variable (Y) based on the OSTA index and QUS-T score by logistic regression analysis in SPSS 20.0, with diagnosis (osteoporosis vs. controls) as the dependent variable and OSTA index and QUS-T score as continuous independent variables, giving logit(P) = −2.553 − 1.088 × QUS-T score − 0.111 × OSTA index, to see whether this new variable offered advantages in efficiency and accuracy. The AUC of the ROC analysis for Y for identifying subjects with T ≤ −2.5 at any site was 0.771 (95 % CI, 0.716–0.825), which was significantly greater than that of OSTA (0.676; 95 % CI, 0.612–0.732, P = 0.008) but was not better than that of QUS-T score alone (0.762; 95 % CI, 0.706–0.818, P = 0.33) (Table 5; Fig. 1). AUCs were also compared between Y and OSTA index or QUS-T score alone for detecting subjects with T ≤ −2.5 at specific sites (femoral neck, total hip, or lumbar spine); however, no significant differences were seen (except that between Y and OSTA index, AUC 0.652 vs. 0.752 at predicting osteoporosis at lumbar spine). Numerically, we found that both sensitivity and specificity were optimized in identifying osteoporosis at different sites by combining OSTA with QUS-T score.
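The fitted equation can be evaluated for an individual as follows. This is a minimal sketch using the coefficients reported above; the function names and the example inputs are hypothetical, and the text does not state whether Y was taken as the logit or as the predicted probability (the two give the same ranking and therefore the same ROC curve).

```python
import math

def combined_logit(qus_t, osta):
    """logit(P) from the reported logistic regression equation."""
    return -2.553 - 1.088 * qus_t - 0.111 * osta

def combined_probability(qus_t, osta):
    """Predicted probability of osteoporosis, P = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-combined_logit(qus_t, osta)))

# Hypothetical man with QUS-T score -1.25 and OSTA index -4
print(round(combined_logit(-1.25, -4), 3))
print(round(combined_probability(-1.25, -4), 3))
```

Both coefficients are negative, so a lower QUS-T score or a lower OSTA index raises the predicted probability of osteoporosis.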

Table 5 The characteristics of OSTA index combined with QUS-T score (Y) for predicting osteoporosis at different sites

Discussion

With the increase in life expectancy and changes in lifestyle, osteoporosis will become an even more serious public health issue. Simple risk assessment tools provide a quick and inexpensive way of detecting individuals at risk of osteoporosis and fractures. Such tools are quite useful in communities where BMD measurement is inaccessible or costly. Currently available noninvasive techniques for detecting osteoporosis and predicting fracture risk include osteoporosis risk assessment instruments such as the OSTA, as well as QUS. The performance of such tools has been reported, but only a few studies have involved Chinese male populations.

Our study validated the usefulness of OSTA and QUS in detecting osteoporosis among Chinese men. The results suggested that the OSTA index and QUS-T score were useful in identifying males with low BMD, so that DXA testing resources can be better allocated to those most likely to benefit. Such tools could also be an alternative method for diagnosing osteoporosis in areas where DXA measurement is unavailable. We analyzed the value of OSTA and QUS for identifying low bone mass in Chinese males aged 60 or over; mean age was 78.0 years, and the prevalence of osteoporosis was quite high in our population (27.7 %), suggesting the need for reliable screening tools for the detection of men at risk for osteoporosis.

Different studies have sometimes applied different cutoffs for OSTA [21–24], making it difficult to compare OSTA with other screening tools. Information on the utility of such tools in different populations is particularly important to establish the generalizability of these approaches and to assure their validity in clinical practice as applied in different clinical and epidemiological settings [25]. In the present study, we used the Youden index to select an optimal cutoff for OSTA. The best cutoff of OSTA for predicting a BMD T score ≤ −2.5 was consistent across sites (femoral neck, total hip, or lumbar spine): in all cases it was −3.5. It is worth noting that the original published cutoff of OSTA was ≤ −1, based on femoral neck BMD in a cohort of 860 women in eight Asian countries [7]. However, this study suggested a much lower point, OSTA ≤ −4 (as OSTA was expressed in integers). This discrepancy may be explained by demographic differences between the samples: the original study was based on a population of younger women (mean age, 62 years), whereas our study was done among men aged 60 years and older. This finding indicated that the best cutoff of OSTA may vary with age and sex. Additionally, the original OSTA study used sensitivity to select the best cutoff, whereas our study adopted the conventional approach of using the Youden index, which considers both sensitivity and specificity.

In addition to OSTA, QUS has also been reported to be useful for determining which patients should be tested using DXA and for predicting fracture risk in postmenopausal women [24, 26, 27] and in men [12, 28, 29]. However, most of these investigations used broadband ultrasound attenuation (BUA) or quantitative ultrasound index (QUI) instead of calcaneal QUS-T score to detect subjects with low bone mass. Also, the few available studies reported different cutoff values of QUS-T score for detecting a population at high risk of osteoporosis [19]. The best cutoff of QUS-T score for predicting osteoporosis at the femoral neck in our current study was −1.15, which was similar to that in the study of Kung et al. [12]. The best cutoff of QUS-T score for predicting osteoporosis at any site in our study was −1.25; to our knowledge, there were no previous data on the best cutoff of QUS-T score for detecting osteoporosis at any site in elderly Chinese males.

The low PPV of both tools in our study suggested that, because of the high false-positive rate of OSTA or QUS-T score, a number of men without osteoporosis would be misclassified as high risk and referred for an unnecessary DXA examination. In contrast, the high NPV (range, 79.2–98.9 %) of both OSTA and QUS-T score implied that such tools could accurately rule out subjects at low risk of osteoporosis, thereby reducing unnecessary medical procedures and increasing efficiency, something especially pertinent for healthcare providers with tight budgets or in areas where patients have to pay for the DXA test.

The role of OSTA combined with QUS in improving estimates of low BMD has been mentioned in previous studies, and our results were similar. In analyzing the performance of combining OSTA index ≤ −4 with QUS-T score ≤ −1.25 in identifying males with osteoporosis at either the proximal femur or the spine, we found that these diagnostic cutoff points yielded a sensitivity of 36.1 % and specificity of 92.9 %. The high specificity indicated that few men without osteoporosis would be falsely referred for BMD examination; the low sensitivity suggested that quite a large proportion of subjects would be falsely excluded from DXA testing, leaving a considerable portion of patients with osteoporosis undiagnosed. Conversely, when we examined subjects with either OSTA ≤ −4 or QUS-T score ≤ −1.25, a relatively high sensitivity but low specificity was seen, which indicated that using either OSTA index ≤ −4 or QUS-T score ≤ −1.25 as the cutoff points will achieve a low missed-diagnosis rate, but at the cost of a high misdiagnosis rate, resulting in a certain number of patients being subjected to an unnecessary DXA test.

When choosing a diagnostic aid, one would ideally pick the test with the best sensitivity and specificity or the largest area under the ROC curve. In practice, however, sensitivity and specificity are usually inversely related [23]. We can improve the sensitivity or the specificity by adjusting the cutoff values, but the two cannot increase at the same time. Our current study showed that OSTA had better specificity than sensitivity in detecting a high-risk osteoporotic population, whereas the opposite was true for QUS-T score. Although the combinations of OSTA and QUS cutoffs described here had somewhat better sensitivity or specificity than OSTA or QUS alone in identifying men with low BMD at any site, the gain in one measure was offset by an unacceptable loss in the other. Given this situation, we wondered whether a new variable combining OSTA and QUS-T score (defined as Y) would perform better. Thus, we assessed the performance of Y in predicting men with low BMD at any site and found that the AUC of Y was greater than that of OSTA alone and that sensitivity and specificity were balanced at a more reasonable level.

Similar to other studies, our study has certain limitations. First, the sample was not large enough to draw definite conclusions, and the representativeness of the study population was relatively limited, so the results may not generalize well to other populations. However, the findings of our current study advanced our knowledge of the usefulness of OSTA and QUS in the diagnosis of osteoporosis. The practical application of these tools in facilitating clinical decisions should be explored further, and a larger population-based study would be valuable to confirm the reliability of our findings. Second, the skewed age distribution in our cohort, with a high proportion of men aged > 70 years (77.8 %), may explain the nonsignificant difference in age among the three WHO statuses (osteoporosis, osteopenia, and normal) and, as a consequence, may have limited the discriminative power of OSTA (AUC 0.676, 95 % CI, 0.612–0.732, sensitivity 47.3 %, specificity 76.8 %) in identifying men with osteoporosis, because age makes half the contribution to the OSTA index. Furthermore, body weight was measured at a single time point, which may not reflect the true long-term weight of a subject. However, such a limitation is present in any study of this type, and the effects are likely to be minor.

In conclusion, OSTA and QUS are simple, noninvasive methods that each have some ability to identify subjects at high risk of osteoporosis when used alone. The combination of OSTA and QUS can help identify populations at high risk for osteoporosis and facilitate the appropriate use of bone densitometry, and it could be an alternative method for diagnosing osteoporosis, especially in areas where DXA measurement is not accessible, although such tools are not the “gold standard” method.