Introduction

Osteoporosis is a common disease among senior citizens in Taiwan. Among individuals aged 50 years and above, the prevalence of osteoporosis is 23.9% in men and 38.3% in women [1]. Osteoporosis is characterized by low bone mass and microarchitectural deterioration of bone tissue. This structural deterioration compromises bone strength and increases the risk of fractures after minor trauma [2, 3]. Osteoporotic fractures are often observed at certain sites, such as the hip and lumbar spine. These fractures, if treated inappropriately, usually lead to immobility and consequently a worse quality of life. Therefore, early diagnosis and timely treatment of osteoporosis to prevent this vicious cycle of deterioration are both clinically and socially essential. Bone mineral density (BMD), measured by dual-energy X-ray absorptiometry (DXA), is the current gold standard for diagnosing osteoporosis. The US Preventive Services Task Force (USPSTF) recommends DXA examination for women aged 65 years and older and for postmenopausal women younger than 65 years with increased fracture risk [4]. The Endocrine Society guidelines for Osteoporosis in Men recommend DXA examination for men aged 70 years and older and for men aged 50 to 70 years with increased fracture risk [5, 6]. The Taiwanese Osteoporosis Association (TOA) has also incorporated the USPSTF recommendations when suggesting candidates for BMD measurement [7]. However, remote rural areas have a higher proportion of elderly residents, many of whose children work elsewhere. Furthermore, inadequate transportation infrastructure limits access to healthcare for the elderly, which has resulted in more significant health problems.
These circumstances highlight the importance of osteoporosis screening tools (OSTs), which incorporate several clinically accessible factors to predict the risk of osteoporosis, particularly where DXA scanners are not easily accessible. As a preliminary assessment, these tools are helpful in remote regions and rural communities and can enhance citizens’ health awareness [8,9,10]. However, studies of the performance of OSTs in Taiwan, where body stature differs owing to ethnicity and where traditional herbal preparations containing steroids are popular [11, 12], are limited. In the present study, we aimed to validate 10 existing OSTs in rural communities in Taiwan [13,14,15,16,17,18,19,20,21,22,23]. Our goal was to identify a user-friendly OST that performs well for both genders and to determine the best cut-off value for identifying individuals who need further DXA scanning.

Materials and Methods

Guidelines

This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for observational studies [24]. The Research Ethics Committee of the National Health Research Institutes has approved the study protocol (NTUH-REC No.: 202106076RIND).

Study Design and Sample Population

This prospective, cross-sectional analysis included participants aged ≥ 50 years from 31 communities in Taiwan to determine the most suitable OSTs for Taiwanese senior citizens. Recruitment, evaluation, and intervention were performed in congregated meal services (CMS) centers in the communities of Yunlin County, a rural area in Taiwan. CMS offers lunch to the elderly at an affordable price in community centers close to their homes. Approximately 150 CMS centers within Yunlin County serve nearly 3,000 senior citizens every day. All community-dwelling residents aged 50 years or older who participated in CMS were considered eligible. All required clinical factors, including demographic data, body stature, history of low-energy fractures, parental history of hip fractures, current smoking status, alcohol use, glucocorticoid therapy, diabetes mellitus, thyroid disease, and rheumatic arthritis (Supplement Table 1), were collected through a comprehensive questionnaire, and the BMD of the lumbar spine, bilateral femoral necks, and bilateral hips was measured with a mobile DXA scanner. DXA was performed entirely in accordance with International Society for Clinical Densitometry (ISCD) guidelines [25]. A low-energy fracture was defined as a fracture caused by a trip, slip, or fall from standing height [26]. We used the Hologic Discovery Wi Bone Densitometer (Hologic Inc, Bedford, MA) to assess BMD and the NHANES III database as the reference for T-scores, which represents an international and Taiwan-endorsed consensus [27]. BMD results were converted into T-scores using the average BMD of a young-adult population as the reference [28]. Following the WHO criteria [29], we defined osteoporosis as a T-score less than -2.5 at any examined site (i.e., femoral neck, total hip, or lumbar spine).
A total of 635 senior citizens from 31 different community care stations responded to our questionnaire between September 2021 and April 2022. We excluded 68 responses that were blank or had several missing items, leaving 567 responses for analysis, a valid response rate of 89.3%.
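
As a rough illustration of the diagnostic rule described in this section, the following Python sketch converts a BMD value into a T-score and applies the "T-score less than -2.5 at any examined site" definition. The function names, the reference mean and standard deviation arguments, and the example values are hypothetical and chosen only for illustration; in the study itself, T-scores came from the densitometer with the NHANES III reference data.

```python
from typing import Mapping

# Threshold used in this study: osteoporosis if any site has a T-score below -2.5.
OSTEOPOROSIS_T_SCORE = -2.5


def t_score(bmd: float, young_adult_mean: float, young_adult_sd: float) -> float:
    """Express a BMD value in standard deviations from a young-adult reference mean."""
    return (bmd - young_adult_mean) / young_adult_sd


def has_osteoporosis(site_t_scores: Mapping[str, float]) -> bool:
    """Return True if the lowest T-score across the measured sites is below -2.5."""
    return min(site_t_scores.values()) < OSTEOPOROSIS_T_SCORE


# Hypothetical reference values (mean 1.0 g/cm^2, SD 0.1 g/cm^2), not real norms.
example = {
    "femoral_neck": t_score(0.70, 1.0, 0.1),
    "total_hip": t_score(0.85, 1.0, 0.1),
    "lumbar_spine": t_score(0.90, 1.0, 0.1),
}
print(has_osteoporosis(example))
```

Taking the minimum across sites mirrors the "any examined site" wording: one site below the threshold is enough to classify a participant as having osteoporosis.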

Table 1 Demographics of study cohort

Osteoporosis Screening Tools

Among the various risk assessment tools, only a few have been validated in external studies. Based on previous studies, we included 10 existing valid and reliable OSTs that have been externally validated [30]. The 10 OSTs included in the study were Fracture Risk Assessment Tool (FRAX-Major, FRAX-M) [13,14,15], FRAX-Hip (FRAX-H), Simple Calculated Osteoporosis Risk Estimation (SCORE) [16], National Osteoporosis Foundation Score (NOF) [17], Osteoporosis Prescreening Risk Assessment (OPERA) [18], Osteoporosis Index of Risk (OSIRIS) [19], Osteoporosis Risk Assessment Instrument (ORAI) [20], Age, Bulk, One or Never Estrogen (ABONE) [21], Osteoporosis Self-Assessment Tool for Asians (OSTA) [22], and Body weight criteria (BWC) [23]. The suggested cut-off values were based on previous studies [13,14,15,16,17,18,19,20,21,22,23]. Notably, FRAX was originally developed for predicting fracture risk rather than for screening osteoporosis [31]. Although it is a fracture risk assessment tool, FRAX was included in this study to evaluate whether it also serves as a good OST in Taiwan. FRAX was used without entering BMD data in this study. The comparison and calculation of the included OSTs are listed in Supplement Table 1 and Supplement Table 2.

Table 2 Comparison of screening tools and validity in females

Statistical Analysis

Discrimination, measured by the area under the receiver operating characteristic curve (AUC), was analyzed to evaluate the performance of the 10 OSTs [13,14,15,16,17,18,19,20,21,22,23]. An AUC value of 1 indicates the best discriminatory ability, while a value of 0.5 suggests discrimination no better than a random guess. An AUC value of at least 0.7 is considered fair or clinically acceptable [32]. The Youden method was applied to find the best cut-off value in this cohort [33,34,35]. Sensitivities, specificities, positive predictive values (PPVs), and negative predictive values (NPVs) of the 10 OSTs were also calculated using both the previously suggested cut-off values and the best cut-off values in this cohort. All statistical analyses were conducted with SPSS ver. 25 and Microsoft Excel ver. 16.66.
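
The quantities named above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the SPSS workflow used in the study: the function names and the toy data are made up, and the sketch assumes that a higher score means a higher risk and that both cases and non-cases are present.

```python
from typing import Dict, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random non-case.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def metrics_at(scores: Sequence[float], labels: Sequence[int],
               cutoff: float) -> Dict[str, float]:
    """Operating characteristics when score >= cutoff counts as screen-positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float]:
    """Return the cutoff maximising J = sensitivity + specificity - 1, and J."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        m = metrics_at(scores, labels, cutoff)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Made-up scores and DXA labels (1 = osteoporosis), for illustration only.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
print(auc(toy_scores, toy_labels))
print(youden_cutoff(toy_scores, toy_labels))
```

For a tool on which lower values indicate higher risk, the scores would need to be negated before applying this sketch.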

Results

Patient Characteristics

Among the 567 included senior citizens, 107 were males with a mean age of 75.1 ± 8.7 years and 460 were females with a mean age of 74.7 ± 8.7 years. Seventy-nine females (17%) had experienced a low-energy fracture after age 40 years, 22 (4.8%) had a parental history of hip fracture, and 12.6% had undergone menopause before 45 years of age. Nearly half of the females with early menopause had received estrogen for more than 6 months. Nine males (8.4%) had experienced a low-energy fracture after age 40 years and 7 (6.5%) had a parental history of hip fracture; compared with females, males had relatively high proportions of current smoking (9.3%) and alcohol use (5.6%). Overall, 27 males (25.2%) and 123 females (26.5%) had been diagnosed with diabetes. Additionally, four males (3.7%) and 45 females (9.8%) had thyroid disease, and rheumatic arthritis was present in one male (0.9%) and 11 females (2.4%). Twelve percent of females and 6.6% of males had previously been diagnosed with osteoporosis, but only two males (1.9%) and 19 females (4.1%) were receiving osteoporosis treatment. In addition, 353 participants were taking antihypertensive medications or sleeping pills, which can cause dizziness and lead to falls. The DXA examination revealed osteoporosis in 63.0% of females and 22.4% of males (Table 1).

Clinical Prediction Outcome in Female Patients

The performance of the 10 OSTs using recommended cut-off values in females is provided in Table 2. Most of the tools had an AUC value between 0.60 and 0.70. OSIRIS and OSTA had the best AUC values, at 0.71 (0.66–0.76) and 0.70 (0.66–0.75), respectively. The sensitivity of OSTs in females ranged from 29.3% to 99.7%, with most between 85.0% and 100%. The PPVs of all 10 OSTs exceeded 60.0%, ranging from 63.8% to 78.5%. When we used adjusted thresholds based on Youden’s cut-offs (Table 3), the sensitivity of OSTs in females ranged from 64.1% to 88.3%, with most between 60.0% and 80.0%. FRAX-H, SCORE, NOF, OSIRIS, ORAI, ABONE and BWC had a significant increase in specificity. With Youden’s cut-offs, NOF, OPERA, OSIRIS, OSTA, and BWC had acceptable sensitivity and specificity, both exceeding 60%.

Table 3 Operating characteristics of each test at optimum cut-off based on Youden's Index among females

Clinical Prediction Outcome in Male Patients

The performance of the 10 OSTs using recommended cut-off values in males is provided in Table 4. BWC had the best AUC value of 0.77 (0.67–0.86), followed by OSTA, SCORE, and OSIRIS. Seven out of the 10 OSTs, namely SCORE, NOF, OSIRIS, ORAI, ABONE, and BWC, had a sensitivity higher than 90%. The NPVs of the OSTs in males were all outstanding (77.7%–100%). When Youden’s cut-offs were used as adjusted thresholds (Table 5), FRAX-M and OSTA had a significant increase in sensitivity, whereas FRAX-H, SCORE, NOF, OSIRIS, ORAI, ABONE, and BWC had an increase in specificity. The NPVs of the OSTs in males under Youden’s cut-offs were excellent, ranging from 82.6% to 97.7%.

Table 4 Comparison of screening tools and validity in males
Table 5 Operating characteristics of each test at optimum cut-off based on Youden's Index among males

Discussion

Our study validated the 10 OSTs in both females and males. The results provide favorable OST options for both females and males in rural communities of Taiwan. The performance of OSTs in males was not inferior to that in females. In terms of AUC value, only OSIRIS and OSTA showed acceptable performance in females, while four OSTs, namely SCORE, OSIRIS, OSTA, and BWC, showed acceptable performance in males. From a practical standpoint, OSIRIS and OSTA could be recommended given their fair performance in both females and males. Possible factors that may affect the validation of OSTs in different cohorts were investigated. The study provides a reference for the future application and development of OSTs in rural areas of Taiwan.

The Epidemiological Characteristics that Affect the Validation Results

We suggest that the prevalence of osteoporosis affects the performance of OSTs. In the present study, the AUC values of OSTs in females ranged from 0.61 to 0.71, which were lower than those reported in their development studies and in some other external validation studies [27]. For instance, SCORE had an AUC value of 0.77 in its development study, and ORAI had an AUC value of 0.80 in its development study. Except for OSTA, these OSTs were primarily developed and validated for Caucasian postmenopausal females. Data from the National Health and Nutrition Examination Survey from 2017 to 2018 (NHANES 2017–2018) showed that in the United States, the prevalence of osteoporosis at the femoral neck, the lumbar spine, or both among females aged 50 years and over was 19.6% [36]. In contrast, the Nutrition and Health Survey in Taiwan from 2005 to 2008 (NAHSIT 2005–2008) showed that the prevalence of osteoporosis was 38.3% in females aged 50 years and older [1]. Although osteoporosis prevalence rises with age in any population, such variations in prevalence may affect the selection of the optimal screening tool. A suboptimal AUC value in the present study is therefore plausible.

The mean age of the cohort might affect the sensitivity and specificity of OSTs. The mean age of females in this study (74.7 ± 8.7 years) was higher than that in the previous development and validation studies (60 to 64 years) [28]. Because our study population was older, participants were more likely to screen positive under the original cut-offs. With a large proportion of the population classified as positive by the OSTs, most sensitivities in females were around 90.0% and the PPVs were close to the prevalence of osteoporosis.
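
The link between a high screen-positive rate and a PPV close to prevalence follows from the standard definition of PPV, as the short Python sketch below shows. The prevalence (63.0%, the DXA-confirmed figure for females in this cohort) and the sensitivity of about 90% come from the text; the specificity values are hypothetical and serve only to show the trend.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence of osteoporosis in females of this cohort (DXA-confirmed).
PREVALENCE = 0.63

# Hypothetical specificities: the lower the specificity, the more participants
# screen positive and the closer the PPV moves toward the prevalence.
for spec in (0.10, 0.20, 0.50, 0.90):
    print(f"specificity={spec:.2f}  PPV={ppv(0.90, spec, PREVALENCE):.3f}")
```

When sensitivity equals one minus specificity, that is, when cases and non-cases screen positive at the same rate, the PPV equals the prevalence exactly, which is the limiting case of the pattern described above.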

Notably, FRAX-M, OPERA and OSTA differed from the other OSTs and showed lower sensitivity in both females and males. Among the screening tools, OSTA stands out as the only one specifically developed for the Asian population. This origin may help keep its sensitivity and specificity well balanced, making it particularly suited for use in Asian populations. For FRAX and OPERA, the inclusion of steroid use and secondary osteoporosis might be a possible explanation.

The Performance of OST in Males

Most of these tools were developed for postmenopausal females [28]. However, some previous studies in Taiwan have validated OSTs in males [37]. Our study also showed valuable results when applying these tools in males. The performance of certain OSTs in males was not inferior to that in females in our study. The AUC values of SCORE, OSIRIS, OSTA, and BWC in males were higher than 0.70 and were even better than the results in females.

The data collected suggest that the males in our study population may have had better health awareness. The National Health Interview Survey (NHIS) in 2017 showed that in Taiwan the proportion of smokers was 23.5% in males aged 60 to 69 years and 10.9% in males aged 70 years and older, while only 9.3% of male participants in our study were current smokers. In terms of alcohol use, the NHIS in 2017 showed that 18.6% of males aged 70 years and older in Taiwan had used alcohol in the past month, and 61.2% of them drank every 1 to 3 days, whereas only 5.6% of male participants in our study reported habitual alcohol use. Furthermore, senior citizens willing to visit local social-care stations are likely to have better mobility and fewer comorbidities. Therefore, OSTs that performed well enough in this study might be suitable for healthier senior males.

Thresholds of OSTs to Identify Osteoporosis

Setting a cut-off for each OST involves a trade-off between sensitivity and specificity. Youden’s index identifies the cut-point that optimizes the differentiating ability of an OST while giving equal weight to sensitivity and specificity. However, in real-world practice, a Youden cut-off does not necessarily yield the maximum benefit or the minimum economic burden.

When a screening tool for osteoporosis is used, the strategy should depend on the situation. Lowering the sensitivity and raising the specificity of an OST would reduce the cost of DXA referrals because of fewer false-positive cases, but might raise the cost of treating osteoporotic complications because potential cases of osteoporosis are missed. For long-term care residents, who are frail and sometimes bedridden, OSTs with greater sensitivity and PPV would be preferred because of the difficulty and inconvenience of visiting a hospital [36]. However, our study population was healthy enough to visit the local social-care stations and participate in the congregated meal services provided for the elderly in the community. We believe that in this situation the AUC should be prioritized, because it provides a comprehensive measure of overall test accuracy that combines sensitivity and specificity across all possible thresholds; a higher AUC indicates better overall performance of the test.

The choice of threshold remains open to discussion. Su et al. developed and validated the Osteoporosis Self-Assessment Tool for Taiwan (OSTAi) [38]. The risk calculation formula of OSTAi is 0.2 × (age [years] − body weight [kg]). The risk categories in OSTAi are low risk (values < −1), medium risk (values between −1 and 2), and high risk (values ≥ 2). OSTAi and OSTA share the same factors and coefficient in the risk calculation formula. The main difference between OSTAi and OSTA is the threshold of the risk categories. The threshold of the high-risk category is stricter in OSTA. An observational study conducted in southern Taiwan showed that OSTA classified a significantly higher proportion of individuals as being at low and moderate risk and a significantly lower proportion as being at high risk of osteoporosis than OSTAi, suggesting that OSTA is more suitable than OSTAi for individuals at low risk of osteoporosis [37]. We further validated OSTAi in the female population of our study, with a sensitivity of 83.1%, specificity of 46.5%, PPV of 72.6%, and NPV of 61.7%. Since we were not able to measure the benefits and costs precisely, more practice-based studies are needed to determine the best cut-off value for OSTs.
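
The OSTAi calculation described above is simple enough to sketch in Python. The function names are illustrative, the example participants are hypothetical, and the treatment of a value of exactly −1 as medium risk is an assumption, since the text gives only "between" for that category. No rounding or truncation is applied because none is stated here.

```python
def ostai_score(age_years: float, weight_kg: float) -> float:
    """OSTAi value as stated in the text: 0.2 x (age - body weight)."""
    return 0.2 * (age_years - weight_kg)


def ostai_category(score: float) -> str:
    """Map an OSTAi value to a risk category.

    high: score >= 2; low: score < -1; medium: everything in between.
    Treating exactly -1 as medium is an assumption.
    """
    if score >= 2:
        return "high"
    if score < -1:
        return "low"
    return "medium"


# Hypothetical participants: (age in years, body weight in kg).
for age, weight in [(75, 50), (65, 62), (60, 70)]:
    score = ostai_score(age, weight)
    print(age, weight, round(score, 1), ostai_category(score))
```

With only age and body weight as inputs, the calculation can be done by hand or with a chart, which is part of what makes this family of tools practical for community screening.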

A Simple Tool for Elders

A self-assessment tool that mainly targets seniors should be simple and user-friendly. The most complex tool in this study (FRAX) included eleven clinical risk factors in its algorithm. In contrast, OSIRIS and OSTA, which performed well in both genders, include four and two clinical risk factors, respectively (Supplement Table 1). This result supports the conclusion of two systematic reviews [28, 39], which found that no tool consistently performs best and that simpler tools perform no worse than more complex risk assessment tools. We also emphasize that users can choose the appropriate OST based on our results and their own needs. For instance, if they aim to identify more osteoporosis cases, they can select a questionnaire with higher sensitivity. Conversely, if they prefer that most cases screened positive for osteoporosis are confirmed upon referral, they should choose a questionnaire with a higher PPV. Given its simplicity and its acceptable performance in both genders, our study identified OSTA as the most user-friendly tool for seniors.

Limitations

The cohort for this investigation primarily comprised elderly individuals from rural central Taiwan who participated in CMS. Such an inclusion criterion potentially introduces selection bias, notably by excluding elderly individuals of higher socioeconomic status. Moreover, the CMS participants were predominantly women. On the other hand, according to a previous national osteoporosis survey, the prevalence of osteoporosis in the age group similar to our study population is comparable, especially among women. While we acknowledge that our results may not be fully generalizable to the national population, the sample is representative within the context of our specific research focus and might partially reflect the situation in the general population [1].

Although we collected data on sarcopenia and frailty in our study, these were not used in this particular research. The main reason is that our study aimed to validate existing osteoporosis screening questionnaires, which have already undergone at least one external validation, to determine which is more accurate in predicting osteoporosis among participants in a rural community. These previous questionnaires did not use variables related to sarcopenia or frailty [30]. It remains unclear how these factors might influence the results, necessitating further investigation into their potential effects in the future.

Finally, we acknowledge that the high prevalence of osteoporosis in the study population, particularly among female participants, could affect the performance of the OSTs. It is important to consider these factors when interpreting the performance of OSTs in different populations. Despite this, OSTs remain valuable tools for screening, and our findings underscore the need for further validation in diverse populations with varying prevalence rates.

Conclusion

Among the ten osteoporosis screening tools, OSTA appeared to be a useful and simple tool for seniors of both genders. However, most osteoporosis screening tools in our study showed suboptimal performance in terms of AUC values. Further adjustment according to epidemiological data and risk factors is necessary when applying existing OSTs to different cohorts.