Introduction

Arsenic (As), cadmium (Cd) and lead (Pb) are well-known toxic metals/metalloids (hereafter, simply referred to as “metals”), and substantial experimental and epidemiologic evidence supports a role of these metals in a number of human diseases [1]. Other non-essential metals such as aluminum (Al), antimony (Sb), barium (Ba), thallium (Tl), tungsten (W) and uranium (U) can also have negative health effects. Evidence from humans and animals suggests that exposure to these six metals is associated with neurological and mental disorders [2, 3], cardiovascular and kidney diseases [3, 4], adverse birth outcomes [5, 6], and impaired male reproductive health. Specifically, Al, Tl, W and U have been found to affect germ cell development [7,8,9,10], and Ba, W and U are associated with altered male steroid hormone production [11,12,13].

Spot urine samples are often utilized to determine an individual’s exposure to Al, Sb, Ba, Tl, W and U in large population investigations [14,15,16,17,18,19], because their collection is noninvasive and requires little expertise. Furthermore, compared with other sample types (e.g., seminal fluid or blood), much larger volumes can be collected each time. However, Al, Sb, Ba, Tl, W and U measures in a single spot sample may accurately reflect an individual’s exposure only at a single time point, since various factors can affect urinary metal concentrations within individuals. Variation in external exposure magnitude as a result of changes in diet and lifestyle (e.g., smoking status) is the predominant factor directly influencing urinary element excretion [20]. Varied urinary flow rate due to changes in physiological status, salt intake and time of day can also lead to substantial variations in urinary metal levels [20]. Moreover, creatinine (Cr)-adjusted concentrations may introduce other sources of variability (e.g., related to age, sex, race/ethnicity, body size, and fat-free mass) that may not reflect urine dilution [21]. Therefore, hourly variability of metal concentrations occurs, and determining the exposure of an individual over time intervals of weeks or months may require multiple specimens.

Previous studies have reported substantial within-individual variability in urinary concentrations of arsenic, cadmium and lead with low-to-moderate reproducibility over days or months [22,23,24,25,26]. However, to date, very little is known about the within-individual and between-individual variability and the intra-day and inter-day variability in urinary levels of Al, Sb, Ba, Tl, W and U in spot samples, first-morning voids and 24-h collections. Due to differences in chemical characteristics (e.g., exposure magnitude, routes and frequency of exposure, and elimination half-life) [20], it is critical to understand analyte-specific variability patterns of urinary non-essential metals to optimize exposure assessments in epidemiological studies.

To fill this data gap, we assessed exposure variability and classification among 11 healthy adult Chinese men via repeated urinary measures of Al, Sb, Ba, Tl, W and U in spot samples, first-morning voids, and 24-h collections gathered on 8 days during a 3-month period. We also evaluated the influence of concentration corrections (i.e., Cr-adjusted concentrations, modeling Cr as a covariate and urinary excretion rate calculation) on the reproducibility of Al, Sb, Ba, Tl, W and U measures and assessed the agreement among the 3 sample types (i.e., spot, first-morning, and 24-h samples) gathered within 24 h.

Materials and methods

Study population and design

The Ethics Committee of Tongji Medical College approved this study. From September 2012 to February 2013, we recruited 11 men living in a restricted geographical area in Wuhan, China to participate in a study designed to assess within-individual and between-individual variations in urinary biomarker concentrations [25, 27]. The volunteers were non-smoking healthy men aged 21–28 years (mean age: 24 ± 2.0 years) with no documented renal disease, diabetes, or occupational exposure. Before participation, all men provided written informed consent.

During a 2-phase study period over 3 months, the 11 men were asked to provide all urine samples on days 0, 1, 2, 3 and 4 (days apart, phase 1) and on days 30, 60, and 90 (including day 0, months apart, phase 2). Overall, 529 spot samples (6 voids were missing), including 88 first-morning samples, were collected. The men urinated 3–11 times (arithmetic mean: 6.0 voids) per 24-h period, and the volume of their spot samples ranged from 20 to 741 mL (arithmetic mean: 195 ± 117 mL). All men were asked to collect each void in a disposable, trace-element-free polypropylene container. After the collection time and total volume voided were recorded, approximately 50 mL of urine was decanted into a trace-element-free polypropylene specimen cup. Participants immediately stored the specimens in a cooler filled with ice. Research fellows collected the specimens 3 times per day and froze them at −40 °C until analysis. The men had no diet restrictions but were required to complete a dietary record during the sampling days. Three participants ate fish on one occasion; none of the men reported taking dietary supplements or eating seafood other than fish.

Laboratory measurements

Urinary levels of Al, Sb, Ba, Tl, W and U were determined using a previously described method [18]. To ensure data accuracy and precision, we used the following quality controls: standard reference material 1640a (trace elements in natural water) and 2670a (toxic elements in urine) [18, 25], reagent blank samples and spiked pooled samples from 100 voids selected randomly from the 11 men. Analyte values were below the limits of quantification (LOQ) in blank samples. The spike recoveries for all of the metals under study were in the range of 89–112%, and the coefficients of variation (interday and intraday variation) were lower than 10%. The LOQs for Al, Sb, Ba, Tl, W and U were 0.39, 0.0031, 0.00039, 0.00067, 0.0011 and 0.033 μg/L, respectively. Values lower than the LOQ were assigned a value of LOQ/√2 for analysis. Concentrations of Cr in urine were determined using a commercial test kit based on the picric acid assay [27].

A first-morning void was defined as the first specimen collected from an individual at or after 5:00 AM each day. A spot sample was defined as any individual specimen collected throughout a day (including first-morning voids). A simulated 24-h urine collection was calculated as the volume-weighted average level of all individual spot samples obtained during a 24-h period starting at 0:00 (midnight).
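The volume-weighted average described above can be illustrated with a minimal sketch. The function name and the example values are hypothetical and are not taken from the study data; the sketch only assumes that each void has a measured concentration and a recorded volume.

```python
def simulated_24h_concentration(concentrations, volumes):
    """Volume-weighted mean concentration over all voids in one 24-h period.

    concentrations: metal level in each void (e.g., µg/L)
    volumes: total volume of each void (e.g., mL), in the same order
    """
    if len(concentrations) != len(volumes) or not concentrations:
        raise ValueError("need one volume per concentration and at least one void")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total voided volume must be positive")
    # Each void contributes in proportion to its volume.
    weighted_sum = sum(c * v for c, v in zip(concentrations, volumes))
    return weighted_sum / total_volume


# Hypothetical day with three voids (illustrative values only).
example_24h = simulated_24h_concentration([1.0, 3.0, 2.0], [100.0, 300.0, 100.0])
```

Because the weights are the void volumes, a large dilute void pulls the 24-h value down more than a small concentrated one, which a simple arithmetic mean of the spot concentrations would not capture.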

Statistical analyses

All data analyses were performed using Statistical Analysis Software (SAS) (SAS Institute Inc., Cary, NC, USA). Because urinary measurements of Al, Sb, Ba, Tl, W and U followed a log-normal distribution, we used log10-transformed values for all subsequent analyses.
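As an illustration of the data preparation described in the laboratory and statistical methods (substitution of LOQ/√2 for values below the LOQ, followed by log10 transformation), a minimal Python sketch is given here. The analyses in this study were run in SAS; the function name below is hypothetical and the snippet is not the authors' code.

```python
import math


def log10_with_loq_substitution(value, loq):
    """Return log10 of a concentration, replacing values below the LOQ by LOQ/sqrt(2)."""
    if loq <= 0:
        raise ValueError("LOQ must be positive")
    if value < loq:
        # Substitution rule stated in the laboratory methods.
        value = loq / math.sqrt(2)
    return math.log10(value)


# Using the reported LOQ for Al (0.39 µg/L) with two hypothetical measurements.
above_loq = log10_with_loq_substitution(10.0, 0.39)
below_loq = log10_with_loq_substitution(0.10, 0.39)
```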

Four different models were constructed to assess the influence of concentration corrections on the variability of urinary metal measures: (1) without adjustment for urinary dilution (µg/L); (2) using Cr-adjusted concentrations (µg/g Cr), computed by dividing the unadjusted concentrations by urinary Cr (Cr-adjusted 24-h urine collection was computed by dividing the average uncorrected concentrations of metals by average creatinine levels); (3) modeling Cr as a covariate; and (4) using urinary excretion rate (µg/h), calculated by dividing the total mass of excretion by the time since prior void. We used Akaike information criterion (AIC) values to rank these models based on all spot samples (lower AIC indicates a better model fit).
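The two arithmetic corrections above (Cr adjustment and urinary excretion rate) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and computing the excreted mass as concentration times void volume is an assumption about how "total mass of excretion" was obtained.

```python
def cr_adjusted(metal_ug_per_l, creatinine_g_per_l):
    """Cr-adjusted concentration (µg/g Cr): metal concentration divided by urinary Cr."""
    if creatinine_g_per_l <= 0:
        raise ValueError("creatinine concentration must be positive")
    return metal_ug_per_l / creatinine_g_per_l


def excretion_rate(metal_ug_per_l, void_volume_ml, hours_since_prior_void):
    """Urinary excretion rate (µg/h): excreted mass divided by time since the prior void."""
    if hours_since_prior_void <= 0:
        raise ValueError("time since prior void must be positive")
    # Assumption: mass excreted in the void = concentration x volume (mL converted to L).
    mass_ug = metal_ug_per_l * void_volume_ml / 1000.0
    return mass_ug / hours_since_prior_void


# Hypothetical void: 2.0 µg/L metal, 1.25 g/L creatinine, 250 mL, 4 h since prior void.
adjusted = cr_adjusted(2.0, 1.25)        # µg/g Cr
rate = excretion_rate(2.0, 250.0, 4.0)   # µg/h
```

Modeling Cr as a covariate (model 3) is a regression choice rather than a transformation of the measurement, so it is not represented here.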

Intraclass correlation coefficients (ICCs) were separately calculated for specimens gathered days apart only (collected on days 0, 1, 2, 3 and 4), specimens gathered months apart only (collected on days 0, 30, 60 and 90), and all specimens using a multilevel mixed-effects model with a random effect for each subject. The ICC, which ranges from 0 to 1, is the ratio of between-individual variance to the sum of between-individual and within-individual variance. ICC values lower than 0.40 indicate poor reproducibility; values ranging from 0.40 to 0.75 indicate fair-to-good reproducibility; and values greater than 0.75 indicate excellent reproducibility [28]. The between-individual, within-individual inter-day, and within-individual intra-day variances were estimated for spot samples. For first-morning voids and 24-h collections, only between-individual and within-individual inter-day variances were calculated because only one value was available per day.
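Once the variance components have been estimated by the mixed model, the ICC and its interpretation follow directly from the definitions above. The sketch below assumes the components are already available (it does not fit the model); the function names and example variances are hypothetical.

```python
def icc(between_var, within_vars):
    """ICC = between-individual variance / (between + all within-individual components).

    within_vars: list of within-individual components, e.g. [inter_day, intra_day]
    for spot samples, or [inter_day] for first-morning voids and 24-h collections.
    """
    total = between_var + sum(within_vars)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


def reproducibility_label(icc_value):
    """Categories used in the text: <0.40 poor, 0.40-0.75 fair-to-good, >0.75 excellent."""
    if icc_value < 0.40:
        return "poor"
    if icc_value <= 0.75:
        return "fair-to-good"
    return "excellent"


# Hypothetical variance components on the log10 scale (not study results).
example_icc = icc(0.02, [0.05, 0.03])
example_label = reproducibility_label(example_icc)
```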

We constructed mixed regression models using urinary metal measurements in 24-h collections as the outcome variables and measures in same-day spot samples or first-morning voids as the predictor variables. We estimated the coefficient of determination (R²) to assess the predictive power of the models. We calculated marginal R² (R²m, the proportion of variance explained by fixed effects) and conditional R² (R²c, the proportion of variance explained by fixed and random effects) separately for the linear mixed models according to a method proposed by Nakagawa and Schielzeth [29].
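For a Gaussian linear mixed model with random intercepts, the marginal and conditional R² of Nakagawa and Schielzeth are ratios of variance components. The following sketch assumes the fixed-effect, random-effect and residual variances have already been extracted from a fitted model; the function name and inputs are hypothetical.

```python
def marginal_conditional_r2(var_fixed, var_random, var_residual):
    """Return (R2m, R2c) from the variance components of a linear mixed model.

    var_fixed: variance of the fixed-effect predictions
    var_random: variance of the random effects (e.g., random subject intercept)
    var_residual: residual variance
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    r2_marginal = var_fixed / total                    # fixed effects only
    r2_conditional = (var_fixed + var_random) / total  # fixed + random effects
    return r2_marginal, r2_conditional


# Hypothetical components (not study results).
r2m, r2c = marginal_conditional_r2(0.3, 0.2, 0.5)
```

By construction R²c is never smaller than R²m; the gap between them indicates how much of the explained variance is attributable to differences between subjects.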

The sensitivity and specificity of a single spot sample as a predictor to identify men who were highly exposed (top 33%) based on their 3-month average metal concentrations were evaluated by comparing the distributions of true and predicted levels. For the “true” levels, we averaged metal measures (log10 scale) for each individual using their spot samples (including first-morning voids) collected on the 8 sampling days. Then, we identified the “true” top 33% exposure levels. For the “predicted” levels, we created 10 data sets, and each data set contained 1 randomly selected spot sample per individual. We then identified the “predicted” top 33% exposure levels within each data set. We reported average sensitivity and specificity observed across the 10 separate data sets. The same method was replicated to assess whether 2 or 3 spot specimens randomly gathered from each man on different days improved the exposure classification. The sensitivity and specificity analyses were performed separately for spot samples and first-morning voids.
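The resampling procedure above can be sketched in Python as follows. This is a simplified illustration, not the authors' SAS code: all names are hypothetical, the size of the "top 33%" group is assumed to be the participant count divided by 3 and rounded, and the random draws here ignore the requirement that multiple specimens come from different days.

```python
import random
from statistics import mean


def top_third(value_by_person):
    """IDs of the people ranked in the top third (assumed size: round(n / 3), at least 1)."""
    n_top = max(1, round(len(value_by_person) / 3))
    ranked = sorted(value_by_person, key=value_by_person.get, reverse=True)
    return set(ranked[:n_top])


def sensitivity_specificity(true_high, predicted_high, everyone):
    """Sensitivity and specificity of the predicted high-exposure group."""
    tp = len(true_high & predicted_high)
    fn = len(true_high - predicted_high)
    fp = len(predicted_high - true_high)
    tn = len(everyone) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


def average_performance(samples_by_person, n_draws=1, n_datasets=10, seed=0):
    """Average sensitivity/specificity over random data sets of n_draws samples per person.

    samples_by_person: dict mapping person ID to a list of log10 concentrations.
    Requires at least 2 people so that the low-exposure group is not empty.
    """
    rng = random.Random(seed)
    everyone = set(samples_by_person)
    # "True" classification: mean of all samples per person.
    true_high = top_third({p: mean(v) for p, v in samples_by_person.items()})
    results = []
    for _ in range(n_datasets):
        # "Predicted" classification: mean of n_draws randomly chosen samples per person.
        drawn = {p: mean(rng.sample(v, n_draws)) for p, v in samples_by_person.items()}
        results.append(sensitivity_specificity(true_high, top_third(drawn), everyone))
    return mean(s for s, _ in results), mean(sp for _, sp in results)


# Hypothetical participants with no within-person variability: classification is perfect.
separated = {"P1": [3.0, 3.0, 3.0], "P2": [2.0, 2.0, 2.0], "P3": [1.0, 1.0, 1.0]}
perfect = average_performance(separated, n_draws=1)
```

With real data, within-person variability causes some draws to misrank participants, which lowers the average sensitivity; increasing `n_draws` mimics collecting 2 or 3 specimens per person.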

To facilitate investigations on the impact of urinary metal reproducibility, we calculated the minimal number of specimens (K) needed to estimate the individual-specific mean within 20% (D = 20) of the “true” levels with a probability of 95% (Z = 1.96) using the following equation: K = (Z × CV/D)², where Z is the number of standard deviates required for a stated probability under the normal curve, CV is the within-individual coefficient of variation, and D is the desired percentage closeness to the homeostatic set point [30,31,32].
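As a worked illustration of this equation, the sketch below evaluates K for a given within-individual CV. The function name is hypothetical, the CV values are illustrative rather than study results, and rounding K up to the next whole specimen is an assumption.

```python
import math


def specimens_needed(cv_percent, closeness_percent=20.0, z=1.96):
    """K = (Z x CV / D)^2, rounded up to a whole number of specimens (rounding is assumed)."""
    if closeness_percent <= 0:
        raise ValueError("D must be positive")
    k = (z * cv_percent / closeness_percent) ** 2
    return math.ceil(k)


# Illustrative within-individual CVs (percent), not values from Table 6.
k_cv20 = specimens_needed(20.0)  # (1.96 * 20 / 20)^2 = 3.8416, rounded up
k_cv40 = specimens_needed(40.0)  # (1.96 * 40 / 20)^2 = 15.3664, rounded up
```

Because K grows with the square of the CV, doubling the within-individual variability roughly quadruples the number of specimens required.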

Results

Table 1 lists the unadjusted, Cr-adjusted and urinary excretion rates of Al, Sb, Ba, Tl, W and U in spot samples, first-morning voids and 24-h collections. All elements were detectable in more than 99% of the 529 specimens. Figure 1 presents the levels of 6 elements in all spot samples gathered during the 3-month period. As seen in the line charts, shifts of up to 2 (e.g., Al, Ba and W) or 3 (e.g., Sb, Tl and U) orders of magnitude occurred within a day and across days. However, we did not observe a clear rhythmic pattern for these elements.

Table 1 Unadjusted (μg/L), Cr-adjusted (μg/g Cr) and urinary excretion rate (μg/h) of metal levels in spot samples, first-morning voids and 24-h collections from 11 men

Fig. 1 Urinary Cr-adjusted metal/metalloid levels (µg/g Cr) from 11 men collected on 8 days during a 3-month period. Each graph represents a participant (labeled as P1–P11). The dots in each graph represent the element levels in each spot sample (including first-morning voids). Al aluminum, Sb antimony, Ba barium, Tl thallium, W tungsten, U uranium

Effects of concentration corrections on reproducibility of metals

The apportionments of the within-individual and between-individual variances for the element measurements in spot samples were in close agreement across the unadjusted, Cr-adjusted, Cr as a covariate and urinary excretion rate calculation models (Table 2). The intra-day variance was the largest apportionment of the total variance (range: 47–84%) for Al, Sb, Ba, Tl and U in all 4 models; for W, however, the inter-day variance was predominant (range: 44–56%). According to the AIC values, the best model fit was achieved with the Cr-adjusted model for W and with the Cr as a covariate model for Al, Sb, Ba, Tl and U. Because the variance apportionments of the six elements estimated from the Cr-adjusted and Cr as a covariate models were similar and because the 24-h collections cannot account for urine dilution by modeling Cr as a covariate, Cr-adjusted values were used in all subsequent analyses.

Table 2 The variance apportionment of log10-transformed metal concentrations/excretions in spot samples collected from 11 men (n = 529)

Reproducibility of metals among different sample types

Table 3 presents the variance apportionments of Cr-adjusted Al, Sb, Ba, Tl, W and U levels in spot samples, first-morning voids and 24-h collections. Poor reproducibility was obtained for serial measures of Al, Sb, Ba, Tl, W and U in spot samples during a 3-month period (Cr-adjusted ICCs = 0.01–0.28). Compared to spot samples, greater than 10% increases in ICCs were obtained for Al, Ba, Tl and U for first-morning voids and Al and Tl for 24-h collections. However, the within-individual variance remained the predominant apportionment of the total variance for Al, Sb, Ba, Tl, W and U based on first-morning voids and 24-h collections.

Table 3 The variance apportionment of log10-transformed Cr-adjusted metal concentrations in the three sample types collected from 11 men

In our stratified analysis, serial measures of Tl in spot samples exhibited fair-to-good reproducibility over five consecutive days (Cr-adjusted ICC = 0.40) but became poor when samples were gathered months apart (Cr-adjusted ICC = 0.16) (Table 3). Poor reproducibility was observed for Al, Sb, Ba, W and U whether the spot samples were gathered days or months apart (Cr-adjusted ICCs = 0.01–0.14). The variance components of Al, Ba and W for the 3 sample types gathered months apart closely agreed with the samples collected days apart. For Sb, Tl and U, however, decreased ICCs in each sample type were observed when the samples were gathered further apart in time. Modeling time since the prior urination as an additional covariate did not substantially affect our results (results not shown).

Correspondence of metals in different sample types

Table 4 presents the mixed linear models reflecting the ability of spot samples or first-morning voids to predict same-day 24-h urine collections. Low-to-moderate predictive power was achieved for models examining Al, Sb, Ba, Tl, W and U in a spot sample and its respective 24-h collection (R²c = 0.32–0.67). Using a first-morning void instead of a spot sample as a predictor of same-day 24-h collection provided a greater than 10% increase only for U.

Table 4 Cr-adjusted models of 24-h excretions using same-day spot samples or first-morning voids as predictors

Sensitivity and specificity

Table 5 presents the ability of 1, 2 or 3 randomly selected spot samples or first-morning voids to identify the highly exposed men (based on their 3-month mean measures). The proportion of men who were in the top 33% of average exposure and who were correctly classified as such using single spot samples or first-morning voids (i.e., the sensitivities) ranged from 0.27 to 0.60 for Al, Sb, Ba, Tl, W and U. Tests of repeated spot samples gathered days apart provided ≥10% increases in sensitivities for Al, Sb, Ba, Tl and U. When 3 spot samples were gathered days apart, moderate-to-high sensitivities were obtained for Ba (0.73) and Tl (0.70). Within the first-morning sample group, compared with a single specimen, using 2 specimens gathered days apart to classify the participants offered ≥10% increases in sensitivities for Al, Ba and U. When 3 samples were collected days apart, moderate-to-high sensitivities were observed for Ba (0.67), Tl (0.67), W (0.63) and U (0.73). Collecting specimens months apart did not offer apparent advantages in exposure classification over specimens gathered days apart for Al, Sb, Ba, Tl and U. Specificity was uniformly greater than 0.70 regardless of the sampling strategy.

Table 5 Average sensitivity and specificity of 1, 2, or 3 urine samples as a predictor to identify the highly exposed men (top 33%) based on their 3-month mean measures

Number of samples needed to estimate average metal levels

The minimal numbers of specimens needed to estimate individual-specific means within 20% of the “true” metal levels are presented in Table 6. Four specimens were sufficient for Tl and U to achieve a 95% chance that the mean was within 20% of the “true” level, whereas more than 20 specimens were required for W and Ba.

Table 6 The minimum number of specimens (K) required to estimate an individual-specific mean within 20% of the “true” levels with a probability of 95%

Discussion

The detection frequencies and levels of Al, Sb, Ba, Tl, W and U in the urine of this study population are comparable to those reported for 2004 adult residents in a Chinese community [18]. Al, Sb, Ba, W and U levels in spot samples varied greatly over time intervals of days and months (Cr-adjusted ICCs = 0.01–0.14). Serial measures of Tl in spot samples showed fair-to-good reproducibility over 5 consecutive days (Cr-adjusted ICC = 0.40) but worsened when the specimens were gathered months apart (Cr-adjusted ICC = 0.16), possibly because of both day-to-day changes and monthly trends in metal excretion [33]. Similar to trends of the ICCs, in our sensitivity and specificity analyses, Al, Sb, Ba, Tl, W and U measured in single spot samples had high specificities (0.73–0.80) but relatively low sensitivities (0.27–0.47) when identifying the highly exposed men based on their 3-month mean concentrations (indicating high type 2 classification errors).

In this study population, low-to-moderate predictive power was achieved using Al, Sb, Ba, Tl, W and U in spot samples as predictors of same-day 24-h collections (R²c = 0.32–0.67). Thus, spot samples may not be good surrogates for 24-h urine collections for these elements. Many researchers have recommended first-morning voids as better approximations of true biomarker excretion than spot samples because of the longer accumulation period [34, 35]. However, using first-morning voids instead of spot samples as predictors of same-day 24-h collections provided a greater than 10% increase in R²c only for U. Moreover, though first-morning voids and 24-h collections exhibited greater consistency than spot samples for some elements (i.e., Al, Ba, Tl and U for first-morning voids and Al and Tl for 24-h collections), the within-individual (between-day) variance remained the predominant apportionment of the total variance for Al, Sb, Ba, Tl, W and U in first-morning voids and 24-h collections. The high proportion of inter-day variance indicates that single measures of these elements in first-morning voids or 24-h collections still cannot provide an accurate exposure estimation for an individual over weeks or months. Consistent with this interpretation, in our sensitivity analysis, single first-morning voids correctly identified ≤ 60% of the men whose true 3-month average exposure to Al, Sb, Ba, Tl, W and U was in the top 33%.

Cr-adjusted concentrations, calculated by dividing the measured biomarkers by urinary Cr, are commonly used in epidemiological studies as a method to account for urine dilution. This approach relies upon the assumption that Cr excretion is approximately constant across individuals and time. However, Cr levels can vary with gender, age, body mass index, dietary intake and physical activity. Modeling Cr as a covariate and using the urinary excretion rate calculation are two proposed methods to reduce the variability in biomarker concentrations from urine dilution [21]. The apportionment of within-individual and between-individual variance of urinary metal measurements closely agreed across the Cr-adjusted, Cr as a covariate and urinary excretion rate calculation models. Thus, our observations likely reflect true variations in metal excretion rather than other sources of variability introduced by Cr adjustment, such as diet, muscle mass and body mass index. Consistent with a prior study [25], our results support the use of Cr-adjusted concentrations to standardize urinary non-essential metal levels among adult men. However, our study participants were very homogeneous (healthy young males aged 21–28 years). Large differences in age, diet and body size of adult men may result in very different normalized values among men with similar exposures. Covariate-adjusted standardization is a recently proposed method to adjust for urinary creatinine when estimating the association between a health outcome and environmental chemicals [36]; it was not assessed in the present study because of the limited sample size. Additionally, we did not quantify specific gravity in urine. Additional research is needed to confirm our findings using the covariate-adjusted standardization method and the density of urine for dilution adjustment.

Our findings emphasize the importance of determining analyte-specific patterns of variability to maximize study efficiency. Between-individual and within-individual variations in urinary chemical concentrations can be influenced by external exposure magnitude, route and frequency of exposure, and the kinetics of absorption, distribution, metabolism and elimination of analytes [20]. In our study population, Al, Sb, Ba, W and U levels in spot samples varied greatly over time intervals of days and months, whereas serial measures of Tl in spot samples exhibited fair-to-good reproducibility over 5 consecutive days (though the reproducibility decreased when specimens were gathered months apart). Four specimens were sufficient for Tl and U to achieve a 95% chance that the mean was within 20% of the true level; however, more than 20 specimens were required for Ba and W. Only a few prior studies have explored the variability of urinary levels of Al, Sb, Ba, Tl, W and U. Consistent with our findings, Paglia et al. [37] reported poor reproducibility in urinary levels of Al, Sb, Tl and U among 7 healthy adults who provided first-morning voids 3 times per season over 12 months (Cr-adjusted ICC < 0.40). The high variability observed in the serial measures of Al, Sb, Ba, Tl, W and U in urine over a 3-month period may be related to their short elimination half-lives. Al, Sb, Ba, Tl, W and U are water-soluble; after exposure, all six metals are rapidly excreted from the human body in the urine, with elimination half-lives that range from hours to days [4, 38, 39].

Strengths of this study include the large numbers of spot, first-morning and 24-h samples. Because spermatogenesis occurs over a 90-day period [40], urine samples collected from the same participants during a 3-month period facilitated the investigation of the reliability of exposure assessment in reproductive health epidemiological studies. We also evaluated the correlations between spot samples, first-morning voids and same-day 24-h collections and assessed the influence of concentration corrections (i.e., Cr-adjusted concentrations, modeling Cr as a covariate and urinary excretion rate calculation) on the reproducibility of Al, Sb, Ba, Tl, W and U levels, which can be useful for informing the design of exposure estimation in population investigations.

Our study had some limitations. First, we did not explore potential factors that may affect the reproducibility of urinary Al, Sb, Ba, Tl, W and U levels because of the limited population size. Second, we relied on a rather homogeneous population with no consumption of tobacco, dietary supplements or seafood and with rare consumption of fish. These factors may have resulted in an underestimation of the between-individual variability compared with a more heterogeneous study group. Third, we did not perform duplicate aliquot precision analysis to assess variability resulting from analytic factors, which may bias the estimate of within-individual variability. Finally, our study population was restricted to men of reproductive age. Because different exposure scenarios (e.g., diet and lifestyle) and physiological characteristics (e.g., age and gender) may result in different variability patterns, our results should be applied cautiously to other populations, especially children or pregnant women.

Conclusions

Measurements of Al, Sb, Ba, Tl, W and U in spot samples exhibit a high degree of within-individual variability over a 3-month period. Samples gathered over longer time intervals may be more variable. Using single spot samples to classify an individual’s 3-month average exposure to Al, Sb, Ba, Tl, W and U can lead to high type 2 classification errors. Collecting multiple specimens from an individual improved the classification; however, the number of specimens required to accurately estimate individuals’ exposure levels differed among the metals under study. The high proportion of inter-day variance of Al, Sb, Ba, Tl, W and U in first-morning voids and 24-h collections indicates that single measures of these elements still cannot offer accurate exposure estimations for an individual over time intervals of weeks or months.