Introduction

Linear growth is normally accompanied by large increases in bone mass throughout childhood. Chronic illness and its therapies can interfere with linear growth and bone-mass accrual. The identification of children with inadequate bone accrual and relatively low bone mass is essential to the prevention of osteoporosis in adulthood [1]. Dual-energy x-ray absorptiometry (DXA) is the most common method for clinically evaluating bone mineral content (BMC) and areal bone mineral density (aBMD) [2], and current models for normal pediatric BMC and aBMD reference data have been created using LMS curves [3]. These bone-density standards adjust for age, sex, and race. An additional adjustment [4] provides a Z-score based on a child’s height. Other anthropometric variables, however, such as weight, sexual maturity, and body fat, may confound the interpretation of a bone-mass measurement. The International Society for Clinical Densitometry (ISCD) 2007 Pediatric Position Statements advise that children be compared to pediatric reference data that are adjusted for body size, growth, and maturation [5]. A more complete model including additional anthropometric factors such as weight, height, sexual maturity, and body fat will result in Z-scores that reflect the combined contributions of these important variables.

The multivariate semi-metric smoothing (MCS2) algorithm can be used for additional adjustment to smooth the raw regression coefficients for weight, height, sexual maturity, and body fat across age for each sex/race combination to allow for a connected response surface and can include as many parameters as necessary to explain the data [6, 7]. MCS2 was designed for use on data sets with correlated parameters and uneven data cell sizes.

Previous work [8], focused on preliminary aBMD data of the spine, has shown that adjusting regressions of bone mineral measurements in children for anthropometric values like weight, height, and body fat leads to narrower distributions than methods that do not adjust for anthropometric values. Narrower distributions should lead to more reliable classifications. The goal of this study is to present complete linear models that have been smoothed using MCS2 for multiple pediatric bone-measurement sites.

Subjects and methods

Data set

The Bone Mineral Density in Childhood Study (BMDCS) collected six-year, longitudinal DXA measurements, including BMC and aBMD, at several anatomical sites in 2,014 healthy children and adolescents, age 5–19 years, recruited from multiple ethnic groups at five clinical centers in the USA. The BMDCS study population and data collection methods have been described previously [3, 4]. The final age range of the sample was 5 to 22 years. For this study, age was rounded to the nearest year instead of being truncated to the previous birthday to match the methods used in other approaches [2, 4]. BMC and aBMD of the spine, whole body, whole body less head, femoral neck, hip, and forearm were measured using Hologic QDR 4500A, QDR 4500W, and Delphi/A bone densitometers (Hologic Inc., Bedford, MA). The cross-calibration of the bone densitometers was assessed by measuring the same European Spine Phantom as well as the same Hologic whole-body phantom six times in 7 years on each scanner. Longitudinal calibration was based on on-site Hologic spine phantoms and Hologic whole-body phantoms measured 3–5 times per week. The combined phantom measurements showed that the scanners worked within a range of 3 % for aBMD and 5 % for BMC. Anthropometric parameters measured on study participants included weight (kg), height (cm), body fat (percentage of fat weight relative to total weight determined from whole-body DXA scan), and sexual maturity. Sexual maturity was assessed according to Tanner Stage criteria for testes volume (males) and breast size (females) [9]. Subsequently, sexual maturity was classified as a binary variable (0 = Tanner stage 1–3; 1 = Tanner stage 4–5). Data from a study-participant visit were excluded if the participant had used steroids, birth control medication, or other drugs known to influence bone, or if anthropometric measures were not available.
The sample sizes for the different measurement sites used in the final MCS2 fits of the BMDCS data covering ages 5–22 were spine n = 10,376, whole body n = 10,425, hip n = 10,388, and arm n = 10,320. Data for ages 21 and 22 were used for smoothing but not for final estimation. BMC and aBMD measurements are always paired; if one is present, the other will be as well. Whereas the International Society for Clinical Densitometry does not advise DXA hip measurements in pediatric patients, the proximal femur has been shown to be responsive to exercise interventions [10], so it may be an important site to consider under some circumstances. In addition, as the proximal femur is one of two recommended sites for osteoporosis assessment in adults, it is a useful measure to obtain in teens who are likely to be monitored into adulthood, so that an early baseline can be established.

Data pretreatment/exclusions

Three data points were removed after consulting with the central data analysis facility because they appeared inflated. In addition, 11 points were removed as extreme outliers that did not fit prior or subsequent measurements of the same patient. To define the outliers, the annualized fractional change (AFC) was calculated for each point in the data set as follows:

$$ \mathrm{AFC} = \frac{BM_n - BM_{n-1}}{BM_n \times \left(\mathrm{Age}_n - \mathrm{Age}_{n-1}\right)} $$
(1)

AFC: annualized fractional change

BM: bone parameter of interest (BMC or aBMD)

Age: age where derivative is taken

The AFC for a given site was then fitted with a spline against age, and the residuals of the fit were scored as a Z-score. If the absolute value of a Z-score was larger than 7, we examined the trace over time and removed the value that did not fit with the rest of that patient’s data as an extreme outlier. To be conservative, if a BMC value was removed, then the associated aBMD value was also removed and vice versa. If a femoral neck value was removed, then the associated total hip value was removed as well and vice versa. We believe that these extreme outliers represent measurement errors and not real changes in the measured individual’s skeleton.
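The screening step can be illustrated with a short Python sketch. It computes the AFC of Eq. 1 for one participant's consecutive visits and then scores residuals of AFC against age. The function names are hypothetical, and a low-order polynomial is used here only as a stand-in for the spline; the study itself fitted a spline and inspected each flagged trace by hand before removing a value.

```python
import numpy as np

def annualized_fractional_change(ages, values):
    """AFC per Eq. 1 for consecutive visits of one participant.

    Returns one value per visit after the first.
    """
    ages = np.asarray(ages, dtype=float)
    values = np.asarray(values, dtype=float)
    delta_bm = values[1:] - values[:-1]
    delta_age = ages[1:] - ages[:-1]
    return delta_bm / (values[1:] * delta_age)

def residual_z_scores(ages, afc, degree=3):
    """Z-score the residuals of AFC against age.

    ASSUMPTION: a polynomial trend stands in for the spline fit.
    """
    ages = np.asarray(ages, dtype=float)
    afc = np.asarray(afc, dtype=float)
    trend = np.polyval(np.polyfit(ages, afc, degree), ages)
    residuals = afc - trend
    return residuals / residuals.std(ddof=1)

def flag_extreme(z_scores, threshold=7.0):
    """Mark points whose |Z| exceeds the threshold for manual review."""
    return np.abs(np.asarray(z_scores)) > threshold
```

A flagged point is only a candidate for review; whether it is removed depends on how it fits with the rest of that participant's trace.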

Analytical methods

For the MCS2 approach, a regression was performed for each age/sex/race group, and the resultant regression coefficients for weight, height, sexual maturity, and body fat were placed in a matrix, where each column represents a vector of estimates for a variable’s effect by age on the BMC or aBMD of a specific measurement site. These column vectors were sorted by the variable’s primacy within the model. A Cholesky factorization transformed the matrix to an orthonormal space, where each independent column was smoothed in a nonparametric way and fitted with a smooth spline curve. The inverse Cholesky factorization was used to transform the smoothed spline estimates of the variables back into the original space, where new intercepts were computed and then smoothed.

Partitioning the fits by age/sex/race groups weights the coefficients by the sample size in that group. The transformation into the orthonormal space acts as a component factorization and breaks up the variables’ collinearity. Smoothing in the orthonormal space protects the fit from large parameter jumps from age to age. Recalculating the intercepts based on the smoothed coefficients minimizes the increase in the fit’s residuals. By its nature, the method includes interactions between age/sex/race and the other fitted parameters. Figure 1 shows a representative sample of original and smoothed coefficients.
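The transform, smooth, and back-transform sequence can be sketched loosely in Python. This is not the published MCS2 implementation: the second-moment matrix used for the factorization, the moving-average smoother (standing in for the spline), and all function names are assumptions made for illustration, and the intercept recomputation is omitted. The sketch does show why the column order matters: the Cholesky factor is triangular, so the first transformed column depends only on the first variable, the second on the first two, and so on.

```python
import numpy as np

def to_orthonormal(coefs):
    """Transform an (n_ages, n_variables) coefficient matrix so that its
    columns are mutually orthogonal with unit mean square.

    ASSUMPTION: the factorization is applied to the second-moment matrix
    of the coefficient columns, which are ordered by primacy.
    """
    coefs = np.asarray(coefs, dtype=float)
    n_ages = coefs.shape[0]
    gram = coefs.T @ coefs / n_ages
    chol = np.linalg.cholesky(gram)           # gram = chol @ chol.T
    ortho = np.linalg.solve(chol, coefs.T).T  # coefs @ inv(chol.T)
    return ortho, chol

def moving_average(column, window=3):
    """Centered moving average with edge padding (odd window only).

    ASSUMPTION: stand-in for the smoothing spline.
    """
    pad = window // 2
    padded = np.pad(column, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def smooth_coefficients(coefs, window=3):
    """Smooth each orthogonalized column across age, then map back."""
    ortho, chol = to_orthonormal(coefs)
    smoothed = np.column_stack(
        [moving_average(ortho[:, j], window) for j in range(ortho.shape[1])]
    )
    return smoothed @ chol.T                  # inverse of the transform
```

With `window=1` the smoother is the identity, so the back-transform returns the original coefficients, which is a convenient check that the two transforms are inverses of each other.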

Fig. 1

A sample of coefficients smoothed via multivariate semi-metric smoothing. This example presents the weight coefficients for black males’ arm BMC

One of the goals of creating a linear model was to use as few variables as possible, including as few categories as possible for each categorical variable. Models were run with several variables, and comparisons between the models’ adjusted R² were used to determine which models were most explanatory. The adjusted R² reflects how well a model accounts for the variability in the data, and it was also used to order the primacy of the explanatory variables. Because the adjusted R² does not spuriously inflate as extra variables are introduced into the model, it is a useful tool for comparing models with different explanatory variables [11]. Note that, because of interactions, a variable added later may contribute more to the fit than one added earlier: the two variables may share a significant interaction that cannot appear until both are in the model.
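For reference, the usual textbook definition of the adjusted coefficient of determination can be written as a small Python function. The text does not state the formula it used, so the definition below and the function name are assumptions.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_obs is the number of observations and n_predictors the number of
    explanatory variables, excluding the intercept.
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof
```

For a fixed raw R², adding a predictor lowers the adjusted value, which is why an added variable has to improve the fit by more than chance to raise it.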

Continuous variables from those models were then run in subsets already including the categorical variables, where the difference in adjusted R² was used to establish which continuous variables were primary.

Any deviation from the initially computed least squares coefficients will necessarily reduce the adjusted R² value and increase error estimates. The tradeoff in applying MCS2 is that the coefficients become smoothly connected over age. Adjusted R² was used to compare smoothed and non-smoothed models.

Results

Model creation

While not exactly collinear, age, weight, height, and sexual maturity were highly correlated. When two of these were present, the others became relatively unimportant. Table 1 lists the relative order of importance once previous parameters were already included in the fit. For BMC, the more important variables were age, height, weight, and body fat. Height and weight appeared roughly of equal importance. Sex, race, and sexual maturity appeared less important once other factors were already in the model. The factor ordering for aBMD was not as consistent as the factor ordering for BMC, although sexual maturity was consistently among the least important parameters, and weight was generally more important than height.

Table 1 Order of importance of parameters for fitting BMC and aBMD

In a practical sense, sexual maturity and body fat require additional effort to be obtained. The addition of sexual maturity to the model containing sex, age, height, and weight did not produce a dramatic increase in adjusted R², never more than 0.5 %. Thus, sexual maturity was dropped, as has been recommended by other researchers [12]. Since body fat, acting as a surrogate for the lean body mass value [13, 14], was usually one of the more important variables but requires a total body scan, which may not always be feasible, we present coefficients for two models, one with body fat and one without.

Where BM is BMC or aBMD, model A can be presented as follows:

$$ BM_{ijk} = \overline{BM}_{ijk} + \alpha_{ijk} \times \mathrm{Weight} + \beta_{ijk} \times \mathrm{Height} + \varepsilon_{ijk} $$
(2)

α, β: smoothed by age

ε: error

i: 16 age groups (5–20 years)

j: two sexes (male/female)

k: two race groups (black/non-black)

and model B as follows:

$$ BM_{ijk} = \overline{BM}_{ijk} + \alpha_{ijk} \times \mathrm{Weight} + \beta_{ijk} \times \mathrm{Height} + \gamma_{ijk} \times \mathrm{Body\ Fat} + \varepsilon_{ijk} $$
(3)

α, β, γ: smoothed by age

ε: error

i: 16 age groups (5–20 years)

j: two sexes (male/female)

k: two race groups (black/non-black)

These models will contain interactions between the continuous variables weight, height, and body fat and the categorical variables of age, sex, and race.
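The unsmoothed starting point for model A, one ordinary least-squares fit per age/sex/race group, might be sketched as follows. The record layout, key names, and function name are hypothetical; the coefficients reported in the tables are the smoothed values, not the raw fits produced here.

```python
from collections import defaultdict

import numpy as np

def fit_group_models(records):
    """Fit BM = intercept + alpha * weight + beta * height per group.

    records: iterable of dicts with the (assumed) keys
    "age", "sex", "race", "weight", "height", and "bm".
    Returns {(age, sex, race): array([intercept, alpha, beta])}.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["age"], rec["sex"], rec["race"])].append(rec)

    fits = {}
    for key, rows in groups.items():
        design = np.array([[1.0, r["weight"], r["height"]] for r in rows])
        target = np.array([r["bm"] for r in rows])
        coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        fits[key] = coefs
    return fits
```

Adding a body-fat column to the design matrix gives the corresponding raw fit for model B. Fitting each group separately is what lets every coefficient vary with age, sex, and race.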

The nature of MCS2 demands that we order the continuous variables before we transform them into an orthogonal space. Table 2 presents this parameter ordering. Note that when age, race, and sex are already in the model for aBMD, weight always explains more than height, and when body fat is added, this parameter ranks consistently above height. For BMC, the ordering is less consistent. Also note that when age, race, and sex are already in the model, the relative importance of height and weight often swaps order compared to the ordering presented in Table 1, where the categorical variables have not already been included.

Table 2 Order of importance of continuous variables for fitting BMC and aBMD

MCS2 model comparison

Table 3 shows consistently higher adjusted R² values for BMC than aBMD. Also, model B, which adds body fat to weight and height, improves the R² value. A small decrease in R² is induced by the smoothing process. The aBMD at the femoral neck showed a lower R² value than the aBMD at the other sites, and the whole-body BMC sites showed the highest R² values of all the measurements of bone mass.

Table 3 Adjusted R² for BMC and aBMD using original and smoothed models

The smoothed coefficients, estimating whole-body-less-head aBMD, are presented in Tables 4 and 5 for model A and in Tables 6 and 7 for model B. The creation of Z-scores is straightforward as follows:

Table 4 Model A MCS2 coefficients and root-mean-square errors (RMSE) for whole-body-less-head aBMD for blacks
Table 5 Model A MCS2 coefficients and RMSE for whole-body-less-head aBMD for non-blacks
$$ Z = \frac{\mathrm{Measured}\ BM - BM_{ijk}}{\mathrm{RMSE}_{ijk}} $$
(4)

For an 8-year-old, non-black male with a weight of 25.1 kg, a height of 131.3 cm, and a measured body fat of 17.2 %, using Eq. 3 and Table 7, the expected whole-body-less-head aBMD (BM_ijk in Eq. 4) is

Table 6 Model B MCS2 coefficients and RMSE for whole-body-less-head aBMD for blacks
$$ 0.402520 + 0.012276 \times 25.1 - 0.000093 \times 131.3 - 0.004233 \times 17.2 = 0.6256\ \mathrm{g/cm^2} $$
(5)

If the individual has a measured whole-body-less-head aBMD of 0.5970 g/cm², using Eq. 4 and Table 7, the Z-score is

Table 7 Model B MCS2 coefficients and RMSE for whole-body-less-head aBMD for non-blacks
$$ \left(0.5970 - 0.6256\right) / 0.033553 = -0.852 $$
(6)
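The worked example in Eqs. 5 and 6 can be reproduced with a few lines of Python. The function names are hypothetical; the intercept, coefficients, and RMSE are the values quoted above for an 8-year-old non-black male under model B.

```python
def expected_bm(intercept, coefficients, measurements):
    """Linear prediction of Eq. 3: intercept plus the sum of
    coefficient * measurement over weight, height, and body fat."""
    return intercept + sum(c * m for c, m in zip(coefficients, measurements))

def z_score(measured, expected, rmse):
    """Eq. 4: (measured BM - expected BM) / RMSE for the group."""
    return (measured - expected) / rmse

# Values quoted in the text: weight (kg), height (cm), body fat (%).
expected = expected_bm(
    0.402520,
    (0.012276, -0.000093, -0.004233),
    (25.1, 131.3, 17.2),
)
z = z_score(0.5970, round(expected, 4), 0.033553)
print(round(expected, 4), round(z, 3))
```

Rounding the expected value to four decimals before applying Eq. 4, as the text does, gives a Z-score of −0.852; carrying the unrounded value through gives approximately −0.853.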

The coefficients for the spine BMC/aBMD, whole-body BMC/aBMD, femoral neck BMC/aBMD, total hip BMC/aBMD and forearm BMC/aBMD are available in the online appendix. As an example, average whole-body-less-head BMC values of model B for non-black females at the cohort average height, weight, and body fat values are presented in Fig. 2.

Fig. 2

Whole-body-less-head BMC values for non-black females at BMDCS cohort means for weight, height, and body fat by age. The curves show mean, ±1 and ±2 standard deviations

Normality

When applying the general linear model, one of the primary assumptions is that the errors in the data are normal and that normality will show up in the residuals once the fit is applied. For each of our 6 sites, we have 2 measurements, BMC and aBMD, for 2 sexes, and 2 races at 16 ages for a total of 768 groups. Application of a sequential Bonferroni test to the results of these normality tests for each group points to a few areas of concern [15]. In model B, 6-year-old black females’ hips fail the normality test for femoral neck BMC, femoral neck aBMD, and total hip BMC. There is a single individual visit in common for all of these groups where an outlier is responsible for the non-normality. In model A, 7 of the total of 768 groups fail the normality test. Again, in each of these groups, there is a single outlying point that creates non-normality. Since MCS2 smoothing will naturally smooth rough estimates to create a connected response surface across age for each sex/race group, a few groups out of 768 showing non-normality due to a single point per group is not too alarming.
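A sequential Bonferroni correction is commonly implemented as Holm's step-down procedure, sketched below in Python. The text does not specify which normality test or significance level was used, so the default `alpha` and the function name are assumptions; the input would be one normality-test p-value per group.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, in the input order, that is True where
    the null hypothesis (here, normality of the residuals) is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared with alpha / m, the next
        # with alpha / (m - 1), and so on; stop at the first failure.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject
```

With 768 groups, the smallest p-value would have to fall below `alpha / 768` before any group is declared non-normal, which keeps the family-wise error rate at `alpha`.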

Comparison between MCS2, LMS, and height-adjusted LMS

Z-scores formed from the MCS2 coefficients will adjust for anthropometric values. Table 8 shows the relative overlap between these values and previous models’ Z-scores. From the numbers in Table 8, it becomes clear that the normal anthropometric Z-scores formed by the MCS2 method identify a distinctly different subset of the population than the standard Z-scores. There is some overlap of the two groups, but the majority of those identified as having low Z-scores by the LMS method have a normal Z-score by the anthropometric method. There is somewhat better concordance between the height-adjusted LMS Z-scores and the anthropometric Z-scores of model A, but the discrepancy increases when body fat is included in model B.

Table 8 Percentage of individuals who had a Z-score < −2 using prior models [3] for BMC and aBMD and also have an anthropometric Z-score < −2
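The overlap statistic described in the Table 8 caption can be computed as in the following Python sketch. The function name and the paired-list input format are assumptions; each position in the two lists is taken to refer to the same individual measurement.

```python
def percent_also_low(standard_z, anthropometric_z, cutoff=-2.0):
    """Among measurements with a standard Z-score below the cutoff,
    return the percentage whose anthropometric Z-score is also below it.

    Returns None when no standard Z-score falls below the cutoff.
    """
    paired = [a for s, a in zip(standard_z, anthropometric_z) if s < cutoff]
    if not paired:
        return None
    both_low = sum(1 for a in paired if a < cutoff)
    return 100.0 * both_low / len(paired)
```

A low percentage means the two scoring methods single out largely different individuals, which is the pattern described in the text.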

Discussion

The BMC, aBMD, and anthropometric data from the BMDCS represent the type of data for which MCS2 was designed because the predictors are highly correlated and there is wide variation in the accuracy of the coefficients due to the fluctuations in the groups’ sample sizes. Moreover, it is desirable that the regression coefficients at adjacent ages are reasonably similar. Data sets with these characteristics were the motivating force behind the development of MCS2 [6, 7].

The BMDCS data contain consistent anthropometric information as well as bone measurements for all common sites. This makes the simultaneous development of models spanning the various measurement sites much more consistent than using data from studies containing fewer sites and less complete anthropometric information. The large size of the BMDCS dataset gives us confidence that the models reliably describe the sampled population.

The adjusted R² values for aBMD at the femoral neck and total hip appear lower than the adjusted R² values for other measurement sites, whereas the adjusted R² values for BMC are in the same range as the values from other sites. This may be due to positioning problems, particularly rotation of the femur, causing variability in the femoral measurement since the femoral site is not fully developed in children and changes as they mature. It may also be that density at those sites is affected more by other developmental stresses than those caused by anthropometric values.

Z-scores adjusted for anthropometric values should be usable in populations more diverse than the BMDCS sample. There is great potential for models containing measurements of size and body fat outside the current ethnic groups and thus throughout the world. Also, if bone-density values are related to anthropometric values that are changing in the population over time, then anthropometrically corrected Z-scores may be the only way to keep reference values relevant in a population whose body size is drifting away from that of the original sample. Short of conducting another multi-year study to re-derive the normal curves, such shifts in body size have typically been dealt with through periodic adjustments, as performed for the CDC growth charts [16, 17].

If a measurement for body fat is available, then model B should be used. Otherwise, model A is a suitable alternative. Because the patient database usually contains both weight and height, model A can be used retroactively. The need for a whole-body scan to determine body fat makes the application of model B more restricted. It is important that the percentage of body fat is determined by using the method used in the BMDCS, as using other methods will alter the percentage of body-fat results and lead to misleading model results.

Without a relative-risk fracture study, the purpose of any anthropometric adjustment is to understand the etiology of low bone mass (e.g., short stature for age or small bone area), not to assist in the prediction of fractures [18]. Although anthropometric models will have narrower distributions, anthropometric Z-scores are meant to augment standard Z-scores, not replace them. For example, in the case of an anorexic individual who shows a low aBMD via the standard Z-scores but a normal bone density via the anthropometrically corrected Z-score, the interpretation may be that the individual’s aBMD is low for age but in keeping with body size, and that treating the underlying eating disorder may be warranted rather than attempting to treat the bone directly.

Given the goal of augmenting standard Z-scores with anthropometric information, the models including the most anthropometric information would seem most desirable. The difference between a standard Z-score and any anthropometrically adjusted Z-score informs the clinician about the degree to which the standard Z-score is affected by the extra measurements included in the anthropometric Z-score. Table 8 shows that standard Z-scores, height-adjusted Z-scores, weight-height-adjusted Z-scores, and weight-height-fat-adjusted Z-scores all identify different segments of the population as critical. The more anthropometric information is included in the adjusted Z-score, the more the critical groups diverge. Using the models in the sequence LMS, height-adjusted LMS, model A (weight and height), and model B (weight, height, and fat) increases the amount of information available to a clinician, who can then decide which pieces of information are most useful in a specific case.

In summary, we have derived new models for aBMD and BMC for the pediatric population by including anthropometric parameters that are shown to have a major influence on the models and that are sufficiently practical to obtain. The dataset was of adequate size to guarantee representative models for most age groups, although the number of observations available at the lower and upper age range somewhat diminished the reliability of the models at those ages. The models were not connected to existing data for adults, and such adjustments could prove beneficial, particularly in following BMC and aBMD changes from the pediatric into the adult age range.