Introduction

Osteoporosis and its consequences represent a significant public health issue: about 40% of white postmenopausal women are affected, and with an aging population this number is expected to increase over the next several years. Osteoporotic fractures most commonly occur in the spine, hip, or wrist, but other bones such as the trochanter, humerus, or ribs can also be affected [1, 2].

Fractures that arise as a consequence of osteoporosis result in considerable morbidity, increased mortality, and increased health care costs [3]. Low bone mass is a crucial component of fracture risk, but several other risk factors, such as age, sex, low body mass index, previous fractures, asthma, cardiovascular disease, chronic liver disease, advanced chronic kidney disease, diabetes, rheumatoid arthritis, systemic lupus erythematosus, glucocorticoid use, smoking, alcohol consumption, and family history of fractures, have to be considered as well [4,5,6,7].

Human exposure to toxic metals and metalloids has received considerable attention over the last decades, so much so that the Agency for Toxic Substances and Disease Registry (ATSDR) has placed arsenic, cadmium, lead, and mercury at the top of its priority list of hazardous substances, which requires constant evaluation of human exposure [8]. Exposure to metals and metalloids occurs through various routes, and the timing and “dose” of exposure differ widely among individuals. Polluted water, soil, and air, as well as smoking and food, are the main known routes of exposure [9,10,11].

Exposure to toxic metals and metalloids has been reported as a risk factor for fractures and degenerative bone diseases such as osteoporosis [12]. However, little is known about the influence of co-exposure to multiple metals and metalloids on bone density. Cadmium (Cd) is widely known to have toxic effects on bone: in vivo and in vitro studies have shown that exposure to Cd decreases bone mineralization, alters bone formation, and increases fracture and osteoporosis risk [13,14,15,16]. Lead, which is common in the environment, is readily absorbed, and in human adults trabecular and cortical bones store 90–95% of the lead found within the body [17]. Clinical studies have shown that as the amount of lead accumulated in the body increases, bone density decreases and fracture risk increases [18, 19]. Nevertheless, research on the association between heavy metals and bone health remains scarce, and further, more specific investigations are required.

Data analysis models often associate exposure to a single compound with health outcomes. Current developments in data mining techniques, however, enable analysis of the effect of co-exposure to multiple compounds on health outcomes [20]. Such an approach would allow for a better understanding of the health effects associated with co-exposure to multiple compounds [21, 22]. Therefore, the present study used a data mining approach to examine the associations of blood and urinary levels of toxic elements with bone mineral density (BMD) loss in a representative sample of participants in the 2005–2006, 2007–2008, and 2009–2010 survey cycles of the National Health and Nutrition Examination Survey (NHANES).

Materials and Methods

Data Source

Data were obtained from three cycles (2005–2006, 2007–2008, and 2009–2010) of NHANES. NHANES is a population-based survey of the non-institutionalized US population that includes demographic data, laboratory data, interview data, and a physical examination of the subjects [23,24,25]. In the present study, we included white women who were over 50 years of age and had been selected for BMD testing (N = 1892), since this group has a higher prevalence of osteoporosis. The bone loss group was defined as participants with T-score values below −1.0 (T-score < −1.0), and the normal group as participants with T-score values equal to or higher than −1.0 (T-score ≥ −1.0).
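The grouping rule above can be written as a one-line decision. The following is a minimal sketch in Python (the study itself used R); the function name `classify_bmd` and the example T-scores are hypothetical and serve only to illustrate the −1.0 cutoff.

```python
def classify_bmd(t_score: float, threshold: float = -1.0) -> str:
    """Assign a participant to a group using the T-score cutoff from the text.

    T-score < -1.0  -> "bone loss"
    T-score >= -1.0 -> "normal"
    """
    return "bone loss" if t_score < threshold else "normal"


# Hypothetical T-scores, for illustration only (not NHANES data).
example_scores = [-2.3, -1.0, -0.4, -1.01]
labels = [classify_bmd(t) for t in example_scores]
print(labels)
```

Note that a T-score of exactly −1.0 falls in the normal group, because the bone loss criterion is a strict inequality.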

Metals and Metalloids in Blood and Urine

We considered the NHANES data reported for individual levels in the following:

  1. Whole blood: cadmium (Cd), lead (Pb), mercury (Hg; total and inorganic)

  2. Urine: arsenic (As; total and speciated), antimony (Sb), barium (Ba), beryllium (Be), cadmium (Cd), cobalt (Co), cesium (Cs), lead (Pb), mercury (Hg), molybdenum (Mo), platinum (Pt), thallium (Tl), tungsten (W), and uranium (U)

Trace elements were measured in clinical specimens by the National Center for Environmental Health Laboratories (CDC, Atlanta, GA) by using inductively coupled plasma mass spectrometry (ICP-MS) [26]. The creatinine-adjusted levels were considered for urinary concentrations. The levels of Be, Co, Cs, Mo, Pb, Tl, U, arsenous acid, arsenic acid, arsenocholine, monomethylarsonic acid, and trimethylarsine oxide in urine were excluded from the data analysis because results were below the limit of detection.
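The text does not spell out how the creatinine adjustment was computed. A common convention is to divide the urinary analyte concentration by the urinary creatinine concentration, giving micrograms of analyte per gram of creatinine. The sketch below assumes that convention and assumes creatinine is reported in mg/dL; the function name `creatinine_adjust` and the example values are hypothetical.

```python
def creatinine_adjust(metal_ug_per_l: float, creatinine_mg_per_dl: float) -> float:
    """Convert a urinary concentration (ug/L) to ug per g creatinine.

    Assumption: the adjustment is a simple ratio, which the source
    does not state explicitly.
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine must be positive")
    # 1 mg/dL = 10 mg/L = 0.01 g/L
    creatinine_g_per_l = creatinine_mg_per_dl * 0.01
    return metal_ug_per_l / creatinine_g_per_l


# Hypothetical sample: 0.35 ug/L of analyte, 70 mg/dL urinary creatinine.
adjusted = creatinine_adjust(0.35, 70.0)
print(round(adjusted, 3))  # ug of analyte per g of creatinine
```

The ratio corrects for urine dilution, so that two samples with the same analyte excretion but different hydration states yield comparable values.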

Covariates and Potential Confounders

Demographic variables (age, body mass index), medical history (heart attack, diabetes, cancer, asthma, chronic liver disease, rheumatoid arthritis), glucocorticoid use, heavy alcohol use (eight or more drinks a week), and tobacco use (never, former and current smoker) were considered as covariates in the multivariate models.

Data Analysis

Feature Selection

Feature selection is a pre-processing step of data mining that aims to identify and remove features considered unimportant to the classification process. We employed a filter wrapper method built around the random forest classification algorithm to evaluate each feature of the feature set individually and determine its relative importance for the T-score [27]. The entire analysis was conducted using R software, version 3.6.2 [28].
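The importance measure used later in the text, mean decrease in accuracy, rests on a permutation idea: shuffle one feature, re-score the model, and record how much accuracy drops. The sketch below illustrates only that idea in Python. It is not the authors' R pipeline: a hypothetical toy rule stands in for the random forest, the data are invented, and a real random forest measures the drop on out-of-bag samples rather than on the full dataset.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the classifier output matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): feature 0 determines the label, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [0.6, 3.0], [0.7, 5.0], [0.8, 1.0], [0.9, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def toy_rule(row):
    """Stand-in classifier that looks at feature 0 only."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(toy_rule, X, y)
print(importances)
```

A feature that the classifier ignores receives an importance of zero, while shuffling an informative feature lowers accuracy; ranking features by this drop gives the ordering used to build the feature subsets.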

Classification Algorithms and Model Evaluation

After removing the irrelevant features, we trained a support vector machine (SVM) to verify the importance of the remaining variables and to determine which factors could best predict BMD loss. SVM is a classification technique introduced by Cortes and Vapnik [29] that has been popularized in the data mining and classification literature due to its efficiency and empirical success [30, 31]. The SVM algorithm aims to obtain an optimal hyperplane with maximum margin to separate the classes of samples, applying nonlinear kernel functions to map the data into a high-dimensional space. In summary, the algorithm computes the decision boundary based on the samples that are nearest to the maximum-margin hyperplane, which are designated support vectors [30, 31].

The SVM models were constructed following a specific procedure based on the random forest importance measurements (mean decrease in accuracy). We started by training the SVM with only the one variable that received the highest importance in the random forest metric. Next, we added the second best-rated variable to the training set, and so on. This process resulted in K = {k1, k2, ..., km} feature subsets, each of which was used to train an SVM model, where kj is composed of the j best features according to the random forest measurements, and m = 30 is the total number of features in the original feature set. A total of 30 SVM models were built following this scheme.
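The nested subsets k1, k2, ..., km can be generated directly from a ranked feature list. The sketch below is illustrative Python, not the authors' R code; the function name `nested_subsets` is hypothetical, and the order of the five features shown is an assumption, since the text reports which features ranked highest but not their exact order.

```python
def nested_subsets(ranked_features):
    """Return [k1, k2, ..., km], where kj holds the j best-ranked features."""
    return [ranked_features[:j] for j in range(1, len(ranked_features) + 1)]


# Illustrative ranking only: the relative order of these features is assumed.
ranked = ["age", "BMI", "urinary As", "urinary Cd", "urinary W"]
subsets = nested_subsets(ranked)

for j, subset in enumerate(subsets, start=1):
    print(f"model #{j}: {subset}")
```

With the full ranking of m = 30 features, the same function would return 30 subsets, one per SVM model.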

For each resulting SVM model, we performed a grid search in order to find the best values for the C and σ parameters. The values considered were C = {0,0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2,5} and σ = {0, 0.01, 0.02, 0.025, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.25, 0.5, 0.75, 0.9}.
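To make the size of this search concrete, the sketch below enumerates the candidate (C, σ) pairs in Python. Two assumptions are made and should be checked against the original analysis: the last entries of the C list are read as 2 and 5, and σ is taken to be the inverse-width parameter of a Gaussian (RBF) kernel of the form exp(−σ‖x − z‖²), which is one common convention that the text does not confirm. The function name `rbf_kernel` is hypothetical.

```python
import itertools
import math

# Candidate values as listed in the text (final C entries read as 2 and 5: assumption).
C_VALUES = [0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 5]
SIGMA_VALUES = [0, 0.01, 0.02, 0.025, 0.03, 0.04, 0.05, 0.06,
                0.07, 0.08, 0.09, 0.1, 0.25, 0.5, 0.75, 0.9]

# Every (C, sigma) combination that a grid search would evaluate.
grid = list(itertools.product(C_VALUES, SIGMA_VALUES))
print(len(grid), "candidate pairs per SVM model")


def rbf_kernel(x, z, sigma):
    """Gaussian kernel exp(-sigma * ||x - z||^2) (assumed parameterization)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sigma * squared_distance)


# Larger sigma makes the similarity between two distinct points decay faster.
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], 0.1))
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], 0.9))
```

Under this reading, each pair in the grid is scored by cross-validation and the best-scoring pair is retained for that feature subset.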

To evaluate the performance of the classification models, we used k-fold cross-validation with k = 10. This method splits the data into k subsets (folds) and, in turn, uses k − 1 folds to train the model and the remaining fold to test it. Correct and incorrect classifications are organized in a confusion matrix, from which the performance measures of accuracy, sensitivity, and specificity are obtained.
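The evaluation scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' R code: the helper names `k_fold_indices` and `confusion_metrics` are hypothetical, the folds are contiguous (a real analysis would normally shuffle the rows first), bone loss is treated as the positive class, and the example labels are invented.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def confusion_metrics(y_true, y_pred, positive="bone loss"):
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Fold sizes for the study's sample size (N = 1892, k = 10).
folds = list(k_fold_indices(1892, k=10))
print([len(test) for _, test in folds])

# Hypothetical predictions, for illustration only.
y_true = ["bone loss", "bone loss", "bone loss", "normal", "normal"]
y_pred = ["bone loss", "bone loss", "normal", "normal", "bone loss"]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

Each row appears in exactly one test fold, so every participant is classified once by a model that did not see her during training.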

Results

Our study subjects consisted of 1892 adult white women; 793 (41.9%) had a T-score greater than or equal to −1.0 and were categorized in the normal group, and 1099 (58.1%) had a T-score lower than −1.0 and were categorized in the bone loss group. Table 1 shows the age, femoral neck BMD, body mass index (BMI), and categorical variables (alcohol drinking, smoking status, arthritis, asthma, cancer history, diabetes, glucocorticoid therapy, heart attack, liver disorder, and thyroid disorder) of the NHANES participants in the two groups divided by T-score. Participants in the normal group were, on average, 63.1 ± 8.9 years old, had a BMI of 31.2 ± 6.0 kg/m2, and had a femoral neck BMD of 0.83 ± 0.08 g/cm2. In the bone loss group, the mean age was 70.8 ± 9.5 years, the mean BMI was 26.8 ± 5.3 kg/m2, and the mean femoral neck BMD was 0.63 ± 0.07 g/cm2 (SD ± 0.1). The prevalence of cancer (33.7%), liver disorders (5.5%), and thyroid disorders (37.8%) was greater in the low BMD group than in the normal BMD group.

Table 1 Characteristics of the study population overall and by groups

Table 2 shows geometric means (GMs) and 95% confidence intervals (95% CI) of metals and metalloids measured in either whole blood or urine, by group and overall. Several of the metals in blood and urine were higher in the bone loss group than in the normal group. In particular, the elements As, Cd, and W had GM (95% CI) urine concentration values of 7.8 (7.3–8.3), 0.35 (0.33–0.37), and 0.070 (0.067–0.074) μg/L, respectively, in the bone loss group, and 6.6 (6.0–7.2), 0.25 (0.24–0.28), and 0.057 (0.053–0.060) μg/L, respectively, in the normal group. Figure 1 is a graphical representation of the urinary concentrations of As, Cd, and W in both groups.
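A geometric mean and its confidence interval are usually computed on the log scale and then back-transformed. The sketch below shows that calculation in Python under simplifying assumptions that may differ from the analysis behind Table 2: it uses a normal approximation (z = 1.96) and ignores the NHANES survey weights. The function name `geometric_mean_ci` and the sample values are hypothetical.

```python
import math
import statistics


def geometric_mean_ci(values, z=1.96):
    """Geometric mean with an approximate 95% CI, computed on the log scale.

    Assumptions: positive values, at least two observations, simple random
    sample (no survey weights), normal approximation for the interval.
    """
    logs = [math.log(v) for v in values]
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return (
        math.exp(mean_log),
        math.exp(mean_log - z * se_log),
        math.exp(mean_log + z * se_log),
    )


# Hypothetical urinary concentrations, for illustration only.
sample = [0.21, 0.35, 0.28, 0.50, 0.33]
gm, lower, upper = geometric_mean_ci(sample)
print(f"GM = {gm:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the geometric mean after back-transformation, which suits right-skewed biomarker distributions.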

Table 2 Geometric means and 95% confidence intervals (95% CI) of metals and metalloids overall and by groups

Fig. 1 Distribution of the concentration level of urine heavy metals across the two groups: normal and bone loss groups based on femoral hip T-score

The classification results for the 30 SVM models that were developed are presented in Table 3. The model with the best accuracy (96.46%), sensitivity (95.02%), and specificity (97.47%) was model #19. Here, the training subset was formed by age, BMI, urinary concentration of As, Cd, W, Sb, Ba, Hg, dimethylarsonic acid (DMA), arsenobetaine (AB), Hg, Pt, whole blood concentration of Pb, Hg (total and inorganic), and also arthritis, cancer, thyroid, and former smoker status. However, model #5, which included the five best features selected by the random forest (Fig. 2), namely age, BMI, and urinary concentrations of As, Cd, and W, also achieved high scores for accuracy (92.18%), sensitivity (90.50%), and specificity (93.35%). Together, these data demonstrate the importance of these factors and metals to the classification, since they alone were capable of generating a classification model with high prediction accuracy without requiring the other variables. Figure 3 shows the impact of these variables by comparing three models: model #2, which included only age and BMI as variables; model #5, which was formed by age, BMI, and urinary concentrations of As, Cd, and W; and model #19, which had the best accuracy, sensitivity, and specificity.

Table 3 Accuracy, sensitivity, and specificity values obtained by the SVM model trained on different feature subsets

Fig. 2 Bar plots of relative importance of the compounds and other variables on BMD according to the random forest values computed

Fig. 3 Receiver operating characteristic (ROC) curve. The curve traces the percentage of true positives accurately predicted by a given logit model as the prediction probability cutoff is lowered from 1 to 0
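The construction described in the caption, lowering the cutoff step by step and recording the true and false positive rates at each step, can be sketched as follows. This is illustrative Python with invented scores and labels, not the data or code behind Fig. 3; the function names `roc_points` and `area_under_curve` are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs as the cutoff falls.

    labels: 1 for the positive class, 0 for the negative class.
    Assumes both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and true classes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(round(auc, 4))
```

A curve that rises quickly toward the upper left corner, and hence an area close to 1, indicates that positives tend to receive higher scores than negatives.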

Discussion

To the best of our knowledge, this study is the first to evaluate the associations of blood and urinary levels of toxic elements with BMD loss in a representative sample of 1892 individuals with the use of a data mining approach.

In this study, databases from three NHANES cycles were mined for general demographic, social, and medical history data from white female participants over 50 years of age. Additionally, concentrations of 13 metals in blood (Cd, Hg (total and inorganic), and Pb) and urine (As and its species (arsenobetaine, dimethylarsonic acid), Ba, Cd, Hg, Sb, W) samples were examined, and BMD was used to separate the participants into normal (T-score ≥ −1.0) or bone loss (T-score < −1.0) groups. The resulting database underwent SVM modeling to determine which factors could best predict BMD loss.

It is well known that age and BMI are important factors for BMD. Aside from these factors, our modeling process identified that inclusion of three elements (arsenic, cadmium, and tungsten) was also of critical importance in predicting BMD loss. Importantly, participants in the low BMD group had a higher concentration of all these metals in their urine than did the normal BMD group (Fig. 1). These findings might help fill a gap regarding the relationship between metal and metalloid exposure and bone health. Remarkably, higher concentrations of these metals showed significantly stronger correlations with lower BMD than did smoking or diabetes, which are well-documented factors leading to bone loss and increased fracture risk [32, 33]. Furthermore, previous investigations have shown impaired bone healing due to smoking and even passive smoking, which highlights the important, yet neglected, impact of heavy metal exposure on bone health [34, 35].

Exposure to arsenic typically results from either consumption of arsenic-contaminated drinking water, soil, and food, or arsenic inhalation in factories [36, 37]. The World Health Organization (WHO) estimates that around 200 million people globally are exposed to the metalloid in drinking water at levels above 10 μg/L, the safety threshold [38]. In the present study, the urinary concentration of arsenic was 18.4% higher in the low BMD group than in the normal BMD group.

Arsenic can accumulate in bone tissue, likely by competing with the phosphate group and thereby reducing the formation of hydroxyapatite crystals, forming apatite arsenate and other calcium arsenate crystals instead [39]. Previous studies have shown that exposure to arsenic decreases RANKL and RUNX2 expression, compromising osteoblast maturation, concomitant with reductions in the activity of alkaline phosphatase and of the VCAM-1 adhesion molecule. This causes a decrease in osteoblastogenesis and osteoblastic activity, thus impairing bone remodeling through unbalanced bone turnover [40].

Of interest, clinical studies have shown that the use of dental paste containing arsenic trioxide for endodontic treatment of inflamed pulp can cause alveolar bone osteomyelitis and osteonecrosis [41, 42]. In addition, some evidence suggests that Paget’s disease, which is caused by an imbalance in bone remodeling, might be associated with arsenic intoxication [43].

In addition, studies have reported a high prevalence of glucose intolerance, diabetes, and metabolic syndrome correlated with arsenic exposure; all of these health disorders are related to high blood glucose levels [21, 44,45,46,47]. Notably, high glucose concentrations are detrimental to osteocalcin synthesis by osteoblasts and result in the accumulation of advanced glycation end products (AGEs) [48], which is linked to higher rates of osteoblast apoptosis and higher osteoclast resorptive activity [49, 50]. As a result, bone microdamage accumulates, resulting in increased cortical porosity and bone fragility, which may lead to osteoporotic fractures. Moreover, the systemic inflammation associated with these diseases might activate bone resorption, resulting in decreased BMD [51, 52]. Studies in β cell lines demonstrated the capacity of these cells to methylate inorganic arsenic into monomethylarsenous acid (MMA) and dimethylarsenous acid (DMA). Specifically, MMA can inhibit mitochondrial function and decrease glucose-induced insulin secretion [53,54,55]. Furthermore, arsenic exposure significantly affects insulin gene expression and transcription factor activities. Arsenic might impair β cell function through a decrease in MafA transcriptional activity; such a decrease is an indication of β cell failure or de-differentiation [56,57,58].

It has long been known that glucose is a significant substrate for ATP production via glycolysis in osteoblasts and their progenitors [59]. Arsenic has the potential to inhibit ATP production during glycolysis by replacing the phosphate anion with arsenate. This process is called arsenolysis and might deprive osteoblast-mediated bone formation of its energy supply [60,61,62]. Parallel to glycolysis, glucose-6-phosphate is converted into 6-phosphogluconate by glucose-6-phosphate dehydrogenase in the pentose phosphate pathway, where NADP+ is converted into NADPH, which keeps glutathione in its reduced form. Arsenic can inhibit glucose-6-phosphate dehydrogenase activity and, consequently, reduce glutathione levels [63].

In our study, we found that the urinary concentration of cadmium was 37.6% higher in the low BMD group than in the normal BMD group. Cadmium is widely distributed in the environment, and exposure to this metal occurs mainly through the ingestion of food or the inhalation of cigarette smoke. It is estimated that over 80% of ingested cadmium comes from cereals, primarily rice and wheat [64].

In vivo studies in experimental animals demonstrate that chronic exposure to cadmium decreases bone volume and increases the percentage of TRAP-positive osteoclast cells in subchondral tibial bone, which can increase bone resorption [65]. Exposure to cadmium may also alter bone formation and mineralization processes since cadmium has been linked to decreased expression of RUNX2, osteocalcin, type I collagen, and alkaline phosphatase, which are markers of osteoblastic differentiation [13].

Clinical studies have shown that even low-level exposure to cadmium through diet and smoking is associated with low BMD and bone fragility in both postmenopausal women and elderly men. Moreover, these studies also found an association between fractures and cadmium in never-smoker patients, whose main exposure was from their diet. These findings suggest that long-term cadmium exposure has negative consequences on skeletal health [15, 16].

Similar to arsenic, cadmium might affect the energy metabolism of osteoblasts and osteoblast progenitor cells [62]. Cadmium inhibits enzymes through its high affinity for the free electron pairs in cysteine -SH groups, which are essential to enzyme function. By decreasing phosphofructokinase activity, cadmium has the potential to limit glycolysis in the liver and muscles [66]. We hypothesize that the same imbalance in energy metabolism might occur during bone differentiation and formation.

Mines and industries are the main occupational sources of human exposure to tungsten, where exposure can be due to pure tungsten, tungsten ore, or tungsten-containing alloys. Workers can be exposed through inhalation of, or dermal contact with, contaminated air. Moreover, tungsten mineral is naturally present in the soil, and the consumption of contaminated water or the inhalation of contaminated air in regions near tungsten mines, industrial sites, or military sites is also an environmental source of exposure. Tungsten particulates in the air can be generated through weathering or emission from industrial and mining sites containing tungsten [67, 68].

The current study showed that the urinary concentration of tungsten in the low BMD group was 0.25-fold (25%) higher than in the normal BMD group. Tungsten has been highlighted as an emerging contaminant, and yet there is limited knowledge of its potential human health risks [69]. In vivo evidence suggests that tungsten alters bone homeostasis, since young male mice exposed to sodium tungstate (orally for 4 weeks) had enhanced rosiglitazone (PPARγ ligand)-induced gene expression and adipogenesis [70]. In general, within the bone marrow microenvironment, increases in marrow fat usually result in decreases in bone mass. Therefore, the finding that tungsten increases adipogenesis may suggest that tungsten increases the commitment of progenitor cells to the adipogenic pathway rather than the osteogenic pathway, which could have significant implications for bone quality.

Bone biology is a complex process governed by the equilibrium between bone formation and bone resorption. To obtain a more expansive view of how metals and metalloids affect bone remodeling, we evaluated the effects of multiple metals and metalloids on bone biology using an advanced data mining approach. Our model has some limitations, which include the lack of validation on a secondary dataset and the exclusion of several compounds, such as bisphenols, parabens, and phthalates, whose influence on bone health was therefore not evaluated. Nevertheless, our findings provide insight into the important impact that arsenic, cadmium, and tungsten have on overall bone health (92.18% accuracy, 90.50% sensitivity, and 93.35% specificity). In general, our data demonstrate the importance of classifying these metals as risk factors for bone loss, since together with age and BMI they were capable of generating a classification model with high prediction accuracy without requiring any other variables.