Introduction

Osteoporosis (OP) is a metabolic disease of the skeletal system, defined as low bone density and susceptibility to fracture [1]. Osteoporosis is a silent disease and symptoms of pain and fractures occur in advanced stages [2]. The fractures impose a remarkable burden and cost on individuals and society each year [3] and lead to increased mortality [4].

In Iran, it has been reported that 50% of men and 70% of women over the age of 50 have low bone mineral density [5]. According to previous reports, approximately 0.85% of the global burden of hip fractures and 12.4% of the burden of hip fractures in the Middle East are attributable to Iran [6]. With growing life expectancy, the prevalence of OP is increasing, especially among women. It has been estimated that about 1 in 3 women aged 50 and over will experience an osteoporotic fracture during their lifetime [2]. Osteoporosis is four times more common in postmenopausal women than in men [7], and this difference is attributed to hormonal changes in postmenopausal women [8].

The principal goal in osteoporosis management is to reduce the risk of fractures. Hence, the ability to assess fracture risk in order to identify patients who are eligible for intervention is essential [9]. Currently, dual-energy X-ray absorptiometry (DXA) is the preferred method for evaluating bone mineral density (BMD) and predicting fracture risk [10]. However, it is not recommended for screening the whole population over the age of 50 for OP [11]. Moreover, densitometry is costly, and access to DXA is limited in some areas. Consequently, to identify women at risk for osteoporosis/fracture and to optimize the use of bone densitometers, several pre-screening tools have been established [12, 13]. To date, several osteoporosis/fracture prediction models have been developed, most of them based on non-Asian populations; however, only some of them have been validated. Validated models for osteoporosis screening include the Osteoporosis Risk Assessment Instrument (ORAI), with 94.4% sensitivity and 41.4% specificity [14], and the Osteoporosis Self-Assessment Tool for Asians (OSTA), with 91% sensitivity and 45% specificity [15], each in its original population. The Fracture Risk Assessment Tool (FRAX) is a computer tool developed by the University of Sheffield that estimates the 10-year probability of hip fracture (HF) and major osteoporotic fracture (MOF) based on a person’s clinical risk factors, with or without BMD measurements [16].

Matin et al. [17] from Iran conducted a study to develop a tool for identifying patients at risk of osteoporosis who could benefit from DXA scans. The final Osteoporosis Prescreening Model for Iranian Postmenopausal women (OPMIP) was based on 7 variables and had 73.2% sensitivity and 61% specificity. However, as their study was located at a referral center in Tehran, their results might not be representative of the whole Iranian population.

In this study we aimed to assess and compare different osteoporosis/fracture risk assessment tools in an Iranian population of postmenopausal women.

Materials and methods

Study participants

This study was performed using data from the Bushehr Elderly Health (BEH) program, a population-based cohort study of the elderly population aged 60 years and over [18]. The second stage of the program focused on musculoskeletal disorders in the elderly population, with special attention to osteoporosis and sarcopenia. The details of the study have been reported elsewhere [19]. Inclusion criteria for the BEHP study were residence in Bushehr province for at least one year before entering the study, no plan to leave Bushehr within two years after entering the study, and full consent to participate in the project. Moreover, participants had to have sufficient physical and mental ability to travel to the interview site. The present study was conducted on the female participants of the BEHP study. Women who were taking any medication for osteoporosis treatment were excluded from the current study.

Measurements and definitions

A comprehensive questionnaire was used to collect information on lifestyle and behaviors, medical history, and medication use. Interviews were performed by a trained interviewer.

Height was measured with a tape measure in a standing position without shoes, to a precision of one centimeter, and weight was measured barefoot on a standard digital scale. Waist circumference was measured with a flexible tape at a point midway between the iliac crest and the last rib in a standing position. Diabetes mellitus was defined as fasting plasma glucose (FPG) ≥ 126 mg/dl, glycosylated hemoglobin (HbA1C) ≥ 6.5%, or treatment with hypoglycemic drugs in patients with a history of diabetes. BMI was calculated by dividing weight (kg) by height squared (m²). Fracture history was defined as a fracture with minor trauma after the age of 45. Smoking was defined as current use of cigarettes or hookah at the time of the study. Corticosteroid use and hormone therapy were based on participants' self-report. Corticosteroid use in the models was defined as 5 mg of prednisolone or its equivalent for more than 3 months. Hormone replacement therapy (HRT) is used according to specific guidelines in postmenopausal women; the questionnaire recorded as Yes/No whether an individual was receiving any HRT. Physical activity level was evaluated by a validated self-report questionnaire [19].
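
The BMI and diabetes definitions above can be expressed compactly. The following is a minimal sketch, not the study's own code; the function and parameter names are hypothetical, and height is assumed to be recorded in centimeters.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def has_diabetes(fpg_mg_dl: float, hba1c_pct: float, on_hypoglycemic_drugs: bool) -> bool:
    """Diabetes as defined in the text: FPG >= 126 mg/dl, HbA1C >= 6.5%,
    or treatment with hypoglycemic drugs (in a patient with known diabetes)."""
    return fpg_mg_dl >= 126 or hba1c_pct >= 6.5 or on_hypoglycemic_drugs


# Illustrative values only (not study data).
print(round(bmi(weight_kg=64.0, height_cm=160.0), 1))  # 25.0
print(has_diabetes(fpg_mg_dl=110, hba1c_pct=6.7, on_hypoglycemic_drugs=False))  # True
```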

Osteoporosis diagnosis

BMD at the lumbar spine (L1-L4), femoral neck, and total hip was evaluated by dual-energy X-ray absorptiometry (DXA; Hologic Discovery WI (S/N 88,102), Bedford, Virginia, USA). Osteoporosis was defined as a T-score of −2.5 or lower at one or more of these sites.
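
As an illustration of this reference standard, the sketch below flags a participant as osteoporotic when any measured site has a T-score of −2.5 or lower. The site keys and the function name are hypothetical, and missing sites are assumed to be simply skipped.

```python
from typing import Mapping

OSTEOPOROSIS_T_SCORE = -2.5
SITES = ("lumbar_spine", "femoral_neck", "total_hip")  # hypothetical key names


def is_osteoporotic(t_scores: Mapping[str, float]) -> bool:
    """True if the T-score is <= -2.5 at one or more measured sites."""
    return any(
        t_scores[site] <= OSTEOPOROSIS_T_SCORE
        for site in SITES
        if site in t_scores  # assumption: unmeasured sites are ignored
    )


# Illustrative values only (not study data).
print(is_osteoporotic({"lumbar_spine": -2.7, "femoral_neck": -1.9, "total_hip": -1.2}))  # True
print(is_osteoporotic({"lumbar_spine": -2.4, "femoral_neck": -1.9, "total_hip": -1.2}))  # False
```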

Included screening models

Six osteoporosis risk assessment tools were included in the study: ORAI [14], the Malaysian Osteoporosis Screening Tool (MOST) [20], the osteoporosis prescreening risk assessment (OPERA) [21], OPMIP [17], the Osteoporosis Index of Risk (OSIRIS) [22], and OSTA [15]. The characteristics of the risk assessment tools are presented in Table 1. In summary, the variables used by the models were weight, height, corticosteroid use, hormone replacement therapy, history of minor trauma fractures, years since menopause, diabetes mellitus, physical activity, and hip circumference. The original definitions of the variables of each model were used to determine the risk of osteoporosis.

Table 1 Characteristics of the seven osteoporosis/fracture risk assessment tools

The FRAX tool calculates the age-specific probability of fracture based on a person’s clinical risk factors. FRAX outcomes are the 10-year probability of a major osteoporotic fracture (hip, clinical spine, arm, or wrist) and the 10-year probability of a hip fracture [23]. Considering the common use of the FRAX tool in clinical practice, we also assessed the performance of the model at the recommended cut-off points.

Ethical considerations

The Research Ethics Committees of both Tehran University of Medical Sciences and Bushehr University of Medical Sciences approved the protocol of the BEHP study. Written informed consent was obtained from all participants before recruitment into the study.

Statistical analyses

Quantitative variables were described as mean ± standard deviation or median (interquartile range), and qualitative variables were expressed as numbers (percentage).

Risk values were computed and individuals were classified into high- and low-risk groups using the variables in each model and the scoring system provided in the original articles. Taking bone density as the gold standard and classifying the studied women into two groups, normal/osteopenic or osteoporotic, the numbers of true positives, true negatives, false positives, and false negatives were determined. From these, the performance measures of diagnostic tests, namely sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated for each model. Sensitivity was defined as the proportion of women with osteoporosis (T-score ≤ −2.5 at the femoral neck, total hip, or lumbar spine) who tested positive on the risk assessment tool (i.e., had index values in the range considered increased risk); specificity was defined as the proportion of women without osteoporosis (by BMD according to the WHO definition) who tested normal on the risk assessment tool (i.e., had index values in the range considered low risk). Youden’s J statistic (sensitivity + specificity − 1), which is often used in conjunction with receiver operating characteristic (ROC) curve analysis [24], was calculated for each model, and the models were compared. The value of Youden’s J index is between zero and one. A value of 1 indicates that there are no false positives or false negatives, i.e., the test is perfect [25]. The ROC curve was plotted for each model. The ROC curve plots sensitivity against 1 − specificity across cut-off values, and the area under the curve (AUC) summarizes how well a binary classifier discriminates between the two groups. PPV was defined as the proportion of women with a positive test result who truly had osteoporosis, and NPV as the probability that women with a negative screening test truly did not have osteoporosis.

Considering the importance of false positives and negatives, as well as the ease and risk of diagnostic testing, a sensitivity of ≥ 70%, a specificity of ≥ 40%, and an AUC of at least 0.6 were defined as acceptable model performance in the study population. We also combined the models with appropriate performance using the parallel approach to determine whether such combinations could improve the identification of high-risk individuals.
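
The performance measures described above follow directly from the four cells of the 2 × 2 table. The sketch below is one possible way to compute them and to apply the stated acceptability thresholds; it is not the Stata code used in the study, all names are hypothetical, and the example data are toy values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    youden_j: float


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(tool_positive: Sequence[bool], osteoporotic: Sequence[bool]) -> Metrics:
    """Compare a tool's high-risk flag with the DXA reference standard."""
    if len(tool_positive) != len(osteoporotic):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(tool_positive, osteoporotic))
    tp = sum(1 for t, d in pairs if t and d)          # true positives
    fn = sum(1 for t, d in pairs if not t and d)      # false negatives
    fp = sum(1 for t, d in pairs if t and not d)      # false positives
    tn = sum(1 for t, d in pairs if not t and not d)  # true negatives
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return Metrics(
        sensitivity=sensitivity,
        specificity=specificity,
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        youden_j=sensitivity + specificity - 1.0,
    )


def is_acceptable(sensitivity: float, specificity: float, auc: float) -> bool:
    """Thresholds from the text: sensitivity >= 70%, specificity >= 40%, AUC >= 0.6."""
    return sensitivity >= 0.70 and specificity >= 0.40 and auc >= 0.60


# Toy data (not from the study): 4 osteoporotic and 4 non-osteoporotic women.
tool = [True, True, True, False, True, False, False, False]
dxa = [True, True, True, True, False, False, False, False]
print(evaluate(tool, dxa))
```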

All analyses were performed using the Stata statistical package (version 14), and a p-value < 0.05 was considered statistically significant.

Results

The baseline characteristics of the study participants are presented in Table 2. A total of 1237 female participants with a mean age ± standard deviation of 69.1 ± 6.3 years (age range, 60–94 years) were included. Overall, 733 participants (59.2%) were osteoporotic based on BMD measured at any site. About 80% of participants had no history of fractures, and approximately 75% had low physical activity.

Table 2 Baseline characteristics of study participants

Model performances at their recommended cut-points among the study population

The sensitivity of the seven models ranged from 16.7% (OSIRIS) to 100% (ORAI and MOST) at their recommended cut-off points (Table 3). Based on the Youden index, the OPERA and FRAX models had the best performance in the study population. FRAX had a sensitivity of 78.1%, a specificity of 45.6%, and a positive predictive value (PPV) of 67.6%. The OPERA model had a sensitivity of 76.6% and a specificity of 46%. Excluding ORAI and MOST, the PPV of the four remaining models ranged from 32 to 80%, whilst the corresponding negative predictive value (NPV) ranged from 29 to 68%. All models yielded an AUC of ≥ 0.6 (see Fig. 1). The sensitivity, specificity, PPV, NPV, and AUC of the included screening tools are presented in Table 3.
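
As a worked check on the figures reported here, the Youden index for FRAX and OPERA can be recomputed from the quoted sensitivity and specificity, and the FRAX PPV can be reconstructed from the study prevalence (733 of 1237). This is an illustrative sketch with hypothetical function names; it assumes the rounded percentages in the text and is not a reanalysis of the data.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


prevalence = 733 / 1237  # osteoporotic participants / all participants

frax_j = youden_j(0.781, 0.456)
opera_j = youden_j(0.766, 0.46)
frax_ppv = ppv_from_rates(0.781, 0.456, prevalence)

print(f"FRAX J = {frax_j:.3f}, OPERA J = {opera_j:.3f}")  # FRAX J = 0.237, OPERA J = 0.226
print(f"FRAX PPV = {frax_ppv:.3f}")  # FRAX PPV = 0.676
```

The reconstructed PPV of about 67.6% matches the value reported for FRAX, which suggests the quoted sensitivity, specificity, and prevalence are mutually consistent.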

Table 3 Performance of different osteoporosis/fracture screening models at their recommended cut points among postmenopausal study participants

Fig. 1 The ROC curve for the performance of different osteoporosis/fracture screening models in Iranian postmenopausal women

Combined model performances at their recommended cut points among the study population

In the next step, the models with appropriate performance were combined using the parallel approach, and participants who were positive on at least one of the combined models were considered high-risk individuals. The sensitivity, specificity, PPV, and NPV of the combined screening tools are presented in Table 4.
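
A parallel combination is usually read as an OR rule: a participant is flagged when any of the combined tools flags her, which can only raise sensitivity and lower specificity relative to each tool alone. The sketch below assumes that reading; the function name and the toy flags are hypothetical.

```python
from typing import List, Sequence


def combine_parallel(*tool_results: Sequence[bool]) -> List[bool]:
    """OR rule: a participant is high risk if any tool flags her."""
    lengths = {len(result) for result in tool_results}
    if len(lengths) > 1:
        raise ValueError("all tools must be scored on the same participants")
    return [any(flags) for flags in zip(*tool_results)]


# Toy flags for four participants (not study data).
frax_flags = [True, False, False, True]
opera_flags = [False, True, False, True]
combined = combine_parallel(frax_flags, opera_flags)
print(combined)  # [True, True, False, True]
```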

Table 4 Performance of combined osteoporosis/fracture screening models among study participants

Four clinical risk factors overlapped between the OPERA and FRAX models (age, weight, history of fractures, and steroid use), so the combination of the two models comprised 12 clinical risk factors. This approach resulted in a sensitivity of 85.4%. The number of false-negative cases was 171 for OPERA and 160 for FRAX, and it fell to 107 after the two models were combined.

Since the OSTA model had appropriate specificity, AUC, and Youden index, it was also combined with OPERA and with FRAX. After OSTA was combined with FRAX, the sensitivity rose from 57.2% (OSTA alone) to 83.3%. Moreover, the number of false negatives fell from 313 (OSTA alone) to 122.

Two risk factors (age and weight) were common to the OSTA and OPERA models, so their combination comprised five clinical risk factors. The sensitivity of the combined model increased to 84.8%, and the number of false negatives was reduced to 111.

All combined models had optimal performance (sensitivity above 80%), and there was no statistically significant difference among them.
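
The reported false-negative counts and sensitivities can be cross-checked, since sensitivity equals one minus the share of osteoporotic women who were missed. The sketch below assumes that all 733 osteoporotic participants were scored by every model, which the text does not state explicitly.

```python
N_OSTEOPOROTIC = 733  # osteoporotic participants reported in the Results

# False-negative counts quoted in the text for single and combined models.
false_negatives = {
    "OPERA": 171,
    "FRAX": 160,
    "OSTA": 313,
    "OPERA+FRAX": 107,
    "OSTA+FRAX": 122,
    "OSTA+OPERA": 111,
}


def sensitivity_from_fn(fn: int, n_diseased: int) -> float:
    """Sensitivity = (diseased - missed) / diseased."""
    return (n_diseased - fn) / n_diseased


for name, fn in false_negatives.items():
    print(f"{name}: {100 * sensitivity_from_fn(fn, N_OSTEOPOROTIC):.1f}%")
```

Under this assumption the recomputed values agree with the reported sensitivities to within about 0.1 percentage point, the small gap being consistent with truncation rather than rounding of the published figures.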

Discussion

Over the last decades, several risk assessment tools have been developed to reduce the cost burden of unnecessary bone densitometry. These tools are inexpensive and easy to use [26], which is especially valuable in rural regions where access to DXA is restricted. In this cross-sectional study, we evaluated the performance of seven validated osteoporosis/fracture risk assessment tools in 1237 postmenopausal women to identify the appropriate model for osteoporosis/fracture risk assessment in our population. About 59% and 37% of our study population had osteoporosis and osteopenia, respectively, on DXA examination. The sensitivity of the seven models ranged from 16.7% (OSIRIS) to 100% (ORAI and MOST) at their recommended cut-off points. Considering the Youden index, FRAX and OPERA yielded the best performance, with sensitivities of 78.1% and 76.6%, respectively. Moreover, after the models (FRAX, OPERA, and OSTA) were combined, sensitivity increased to more than 83%. Applying these tools to different populations has yielded varying performance. Some studies showed high sensitivity and optimal performance [27], while others reported lower indices [28]. These discrepancies can be explained by the fact that the tools were developed in demographically different samples.

Unlike other screening models, FRAX is a country-specific model with its own intervention threshold. The World Health Organization (WHO) recommends the use of country-specific intervention thresholds based on the incidence of hip fractures and population demographics. One of the benefits of the FRAX tool, as stated in the U.S. Preventive Services Task Force (USPSTF) recommendation, is that it “relies on accessible clinical data”. Its development was supported by a broad international collaboration, it is widely endorsed by two major United States groups, and it is freely available to physicians and the general public [29].

In line with our findings, Chen et al. [30] from Taiwan examined the performance of nine osteoporosis/fracture screening tools (including FRAX) in 553 individuals over the age of 60 years (357 women). The mean age of the participating women was 67.1 years, and 23.7% of the subjects had a T-score of less than −2.5 at the femoral neck. The researchers found that with a cut-off point of ≥ 3%, the FRAX model had acceptable diagnostic accuracy (AUC = 0.75), a sensitivity of 80%, and a specificity of 54%.

On the other hand, in a 3-year population-based study conducted by Rubin et al. [31] in Denmark, the performance of the FRAX tool (without BMD) was compared with that of simpler screening tools such as OST, ORAI, OSIRIS, and SCORE among 3614 women aged 40–90 years. There was no difference in AUC values between FRAX and the simpler instruments; the AUC values of the FRAX tool were 0.701 (all fractures) and 0.722 (major osteoporotic fractures). The researchers concluded that simpler models based on fewer risk factors, which are easier to use in the clinic by general practitioners or by patients themselves, could be used as well as FRAX to identify women at increased risk for fractures.

OPERA was another model with optimal performance in our study population. Consistent with our results, Sadiq et al. [32] from Pakistan evaluated the performance of OPERA in 200 women above 40 years of age and found a sensitivity of 68% in their study population. On the other hand, Mohammad Abou-Hashem et al. [33] found poor performance for OPERA, with a sensitivity of 5.5%. This inconsistency may be explained by differences in mean age, weight, and sample size between these studies. OPERA was originally developed in 2005 by Salaffi et al. [21] in a group of Italian postmenopausal women, in whom its sensitivity ranged from 88.1% at the femoral neck to 90% at the lumbar spine. It is based on five variables (age, weight, premature menopause, previous history of fracture with minor trauma, and steroid consumption) to predict low BMD.

As mentioned earlier, an effective screening tool that identifies patients at high risk for osteoporosis can reduce the burden of DXA examination. These algorithms were developed using data from Western populations, so they may not be suitable for use in Asian populations because of differences in genetics, lifestyle, and environmental factors.

In the present study, the highest specificity (79%) was achieved by the OSTA model, with a Youden index of 0.361, which indicates that this model produced few false positives and was better at correctly identifying women without osteoporosis. This model, the simplest of those evaluated, is based on only two variables, age and weight, and was developed for the Asian population; in the original study it had a sensitivity of 91% and a specificity of 45% [15].
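
To illustrate how a two-variable index of this kind is scored, the sketch below assumes the commonly published form of the OSTA index, 0.2 × (weight in kg − age in years) truncated to an integer. Neither the formula nor the cut-off is given in this section, so both are assumptions here, and the cut-off is left as an explicit argument rather than fixed.

```python
import math


def osta_index(weight_kg: float, age_years: float) -> int:
    """Assumed OSTA form: 0.2 * (weight - age), truncated toward zero."""
    # Dividing by 5 is equivalent to multiplying by 0.2 and avoids float noise.
    return math.trunc((weight_kg - age_years) / 5.0)


def osta_high_risk(weight_kg: float, age_years: float, cutoff: int) -> bool:
    """Flag increased risk when the index is at or below the chosen cut-off."""
    return osta_index(weight_kg, age_years) <= cutoff


# Illustrative values only; the cut-off of -1 is an assumption, not the study's.
print(osta_index(weight_kg=55, age_years=70))                 # -3
print(osta_high_risk(weight_kg=55, age_years=70, cutoff=-1))  # True
print(osta_high_risk(weight_kg=70, age_years=62, cutoff=-1))  # False
```

Because the index falls as age rises and weight falls, an elderly cohort such as this one will tend to have low index values, which is one reason a cut-off derived in a younger population may behave differently here.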

Since the FRAX tool has been calibrated for and is commonly applied in Iran, we combined it with two models, OPERA and OSTA, to investigate whether its performance could be improved. The FRAX model consists of 11 clinical risk factors, OPERA of 5, and OSTA of only 2 (age and weight). After FRAX was combined with OPERA, the number of risk factors increased to 12 and the sensitivity rose from 78% (FRAX alone) to 85%. In addition, the number of false negatives fell from 160 for FRAX alone to 107 after the combination with OPERA, indicating that fewer women with osteoporosis would be missed. Moreover, the FRAX-OSTA combination required 11 risk factors and increased the sensitivity to 83%, and false negatives were reduced to 122. The OSTA-OPERA combination required only 5 clinical risk factors and had a sensitivity of 85%, with 111 false negatives. Therefore, combining either the OPERA or the OSTA model with the current FRAX model can improve the identification of populations at high risk for osteoporosis.
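
The risk-factor counts quoted for the combined models are set unions: shared factors are counted once. The sketch below reproduces the counts; the OPERA and OSTA variables are those named in the text, whereas the FRAX list is an assumption based on the inputs of the public FRAX calculator (without BMD) and the label strings are hypothetical.

```python
# Assumed FRAX clinical inputs (without BMD); labels are illustrative.
FRAX = frozenset({
    "age", "sex", "weight", "height", "previous fracture",
    "parent fractured hip", "current smoking", "glucocorticoids",
    "rheumatoid arthritis", "secondary osteoporosis", "alcohol",
})
# OPERA and OSTA variables as described in the text.
OPERA = frozenset({
    "age", "weight", "premature menopause", "previous fracture", "glucocorticoids",
})
OSTA = frozenset({"age", "weight"})

print(sorted(FRAX & OPERA))  # the four overlapping factors
print(len(FRAX | OPERA), len(FRAX | OSTA), len(OSTA | OPERA))  # 12 11 5
```

Only the FRAX-OPERA combination adds a variable to FRAX (premature menopause), so the gain in sensitivity comes at the cost of one extra question.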

The present study was performed in a large population of postmenopausal women, and it was the first study to evaluate different osteoporosis/fracture risk assessment tools in an Iranian population. However, the study had some limitations. As it was performed in the population of a specific geographical region, the results may not be generalizable to all Iranian women. Nevertheless, this study adds to the knowledge of risk assessment tools suitable for Iranian postmenopausal women. Further studies using larger, population-based data may develop different models and perform external validation.

Conclusion

In conclusion, FRAX (a model with 11 simple variables) and OPERA (a model with 5 simple variables) had the best performance in the study population. Moreover, combining the models improved on the performance of each model alone.