Introduction

Chronic disease is among the most potentially harmful influences on childhood bone health. The detrimental effects of chronic disease are often multifactorial, involving poor nutrition, diminished physical activity, disordered puberty, inflammatory cytokine exposure and/or medication. These factors lead to reduced bone mineralisation and poor growth.

As such, it is important to establish a technique that can measure the extent to which bone strength has been compromised by a chronic condition and estimate the child's risk of low trauma fracture. This technique should also be able to identify those children with fractures who have underlying bone fragility.

Children with chronic diseases often have substantially reduced stature compared with their age-matched peers, and the use of dual-energy X-ray absorptiometry (DXA) is limited in this instance [14]. This is because the measurement of areal bone mineral density (BMDa) relies on the two-dimensional areal projection of a three-dimensional object (bone) and is therefore intrinsically related to the size of the bone, which is in turn related to the size of the child. Thus, DXA systematically overestimates bone density in a tall child and underestimates bone density in a short child [2]. Failure to account for delayed growth and maturation is a common cause of misinterpretation of paediatric DXA results [5, 6].
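The size dependence of BMDa can be illustrated with a deliberately simplified sketch. It assumes a bone shaped like a cube of uniform volumetric density, which is an illustrative assumption and not a model used in this study; the function name and the numbers are hypothetical.

```python
def areal_bmd(volumetric_density: float, side_cm: float) -> float:
    """Areal BMD (g/cm^2) of a hypothetical cube-shaped bone.

    Illustrative assumption: uniform density (g/cm^3) and a cube of the
    given side length, scanned face-on.
    """
    bmc = volumetric_density * side_cm ** 3  # bone mineral content, g
    projected_area = side_cm ** 2            # area seen in the 2-D projection, cm^2
    return bmc / projected_area              # reduces to density * side


# Two bones with the same true (volumetric) density but different sizes.
small_bone = areal_bmd(0.30, 2.0)
large_bone = areal_bmd(0.30, 3.0)
print(small_bone, large_bone)
```

Under this assumption the areal value scales with the depth of the bone, so the larger bone appears denser although its volumetric density is identical.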

A number of mathematical and statistical size adjustment techniques have been developed to account for the size limitation of DXA [7–16]. However, there is currently no consensus as to the best method of adjustment and, in particular, which of these approaches will best predict fracture, either in childhood or in later life. The aim of this study was to establish, in a cohort of children referred for bone densitometry studies, whether different size adjustment techniques improve the diagnostic capability of DXA.

Materials and methods

Subjects

The study population consisted of 570 children (273 girls, 297 boys) referred for bone density assessment by DXA at Birmingham Children's Hospital between August 2001 and September 2010. All subjects had lumbar spine and total body measurements performed using a GE Lunar Prodigy™ bone densitometer (software version 8.1). Prior to scanning, the child's height (to the nearest 0.1 cm) was measured using a wall-mounted stadiometer and weight (to the nearest 0.1 kg) was measured using hospital balance scales. All measurements were made with the children in light indoor clothes or a hospital gown, without shoes. At the time of the DXA scan, the children's parents or guardians completed a questionnaire regarding fracture history, physical activity, nutrition, medication and pubertal development [17]. Children aged over 10 years completed a self-assessment of pubertal status, using photographs of Tanner stages, as previously validated [18]. From the DXA output, values for L2–L4 and total body less head (TBLH) BMDa (in gram per square centimeter), bone mineral content (BMC) (in gram), bone area (in square centimeter) and lean body mass (LBM) (in gram) were obtained. These were used along with age, height and weight to calculate bone and body composition standard deviation scores (Z-scores) from UK-based reference data [19].
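As a minimal sketch of the standard deviation score calculation, the snippet below assumes that the reference data supply a predicted mean and SD for a child of the given age and size. The actual form of the UK reference equations [19] is not described here, so the function and its inputs are hypothetical.

```python
def z_score(observed: float, predicted_mean: float, predicted_sd: float) -> float:
    """Standard deviation score of an observed value against a reference.

    Assumption: the reference supplies a predicted mean and SD for the
    child's age (and size, where relevant); values below are invented.
    """
    if predicted_sd <= 0:
        raise ValueError("predicted_sd must be positive")
    return (observed - predicted_mean) / predicted_sd


# Hypothetical lumbar spine BMDa (g/cm^2) against a hypothetical reference.
example = z_score(observed=0.62, predicted_mean=0.80, predicted_sd=0.09)
print(round(example, 2))
```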

Size adjustment methods

The reference models from which the Z-scores were calculated ranged from a simple model of BMDa with no adjustment for body size to a complex three-level regression model with complete statistical adjustment for age, gender, bone and body size. In brief, the six models used were as follows.

Bone density for age (BMDa)—unadjusted

BMDa is the standard unadjusted DXA output.

Volumetric bone density for age (BMAD)

Bone mineral apparent density or volumetric bone density is calculated by estimating the approximate depth of the bone. It is an accepted size adjustment technique [7, 8, 20], and along with BMDa for age, it has also been shown to be related to fracture risk in children [21, 22].
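The text does not state which formula was used to estimate bone depth. As a sketch only, one widely cited formulation for the lumbar spine divides BMC by the projected area raised to the power 1.5; treating it as the method used here is an assumption, and the input values are hypothetical.

```python
def bmad_lumbar(bmc_g: float, area_cm2: float) -> float:
    """Bone mineral apparent density (g/cm^3) as BMC / area^1.5.

    Assumption: bone depth scales with the square root of the projected
    area; this may differ from the formula used in the study.
    """
    if area_cm2 <= 0:
        raise ValueError("area_cm2 must be positive")
    return bmc_g / area_cm2 ** 1.5


# Two hypothetical spines with the same BMDa (0.8 g/cm^2) but different size.
smaller = bmad_lumbar(bmc_g=20.0, area_cm2=25.0)
larger = bmad_lumbar(bmc_g=28.8, area_cm2=36.0)
print(smaller, larger)
```

In this example the smaller spine has the higher apparent density even though the areal values are equal, which is the effect the adjustment is intended to expose.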

Bone mineral content for height

Adjusting BMC for height requires no assumptions about bone size and stature; it assumes a direct relationship between the two parameters.

Allometric approach

The allometric approach or ‘Mølgaard’ model provides a three-stage assessment to explain low bone mass. The model assesses height for age, bone area for height, and BMC for bone area. These three steps then correspond to three different causes of reduced bone mass: short stature, narrow bones and light bones, respectively [9]. Since having narrow bones or light bones predisposes to fracture, both size adjustment stages (bone area for height and BMC for bone area) were considered in the diagnostic test evaluation.
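The three-stage interpretation can be sketched as a simple lookup from Z-scores to the three causes named above. The −2.0 threshold and the function name are assumptions made for illustration; the reference equations that produce the Z-scores are not reproduced.

```python
LOW_Z = -2.0  # assumed threshold for a low Z-score


def molgaard_findings(height_for_age_z: float,
                      area_for_height_z: float,
                      bmc_for_area_z: float,
                      threshold: float = LOW_Z) -> list:
    """Return the causes of reduced bone mass flagged at each stage."""
    findings = []
    if height_for_age_z <= threshold:
        findings.append("short stature")  # stage 1: height for age
    if area_for_height_z <= threshold:
        findings.append("narrow bones")   # stage 2: bone area for height
    if bmc_for_area_z <= threshold:
        findings.append("light bones")    # stage 3: BMC for bone area
    return findings


# Hypothetical child: short for age, normal bone width, low BMC for area.
print(molgaard_findings(-2.5, -0.3, -2.1))  # ['short stature', 'light bones']
```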

Mechanostat functional model

The mechanostat or functional model uses an alternative approach to size adjustment based on mechanical principles proposed by Harold Frost [23] and was first applied to peripheral quantitative computed tomography (pQCT) measures of bone strength and muscle mass [24]. The two-stage algorithm was extended for DXA using the assumptions that BMC could act as a surrogate for bone strength and LBM as a surrogate for muscle force [14, 15]. The two stages of assessment are (a) whether the child has sufficient muscle mass for height and (b) whether it has sufficient bone mass for muscle.
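A minimal sketch of the two-stage assessment is shown below. It assumes that "sufficient" means a Z-score above −2.0 and takes the two Z-scores (LBM for height, BMC for LBM) as already computed; both the threshold and the function name are hypothetical.

```python
def mechanostat_assessment(lbm_for_height_z: float,
                           bmc_for_lbm_z: float,
                           threshold: float = -2.0) -> dict:
    """Answer the two stage questions of the functional model.

    Assumption: a Z-score at or below the threshold counts as insufficient.
    """
    return {
        # Stage (a): is there sufficient muscle mass for height?
        "sufficient_muscle_for_height": lbm_for_height_z > threshold,
        # Stage (b): is there sufficient bone mass for muscle?
        "sufficient_bone_for_muscle": bmc_for_lbm_z > threshold,
    }


# Hypothetical child with adequate muscle but low bone mass for that muscle.
result = mechanostat_assessment(lbm_for_height_z=-0.5, bmc_for_lbm_z=-2.4)
print(result)
```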

Multiple regression analysis three-stage approach

This model has been referred to as the complete model [25] as it uses statistical modelling to adjust for age, body and bone size. Although multiple regression prediction models have been proposed in several different formats [10, 11, 26], the most comprehensive and logical diagnostic approach was published by Horlick and colleagues [13] as a three-level model. The proposed algorithm systematically adjusts for gender, ethnicity and age, then body size and finally bone size, from which eight different diagnostic categories can be established. For the purpose of this study, children were classified as having low bone mass if their Z-scores were either low or normal at level 1 (BMC for age and gender) and low at level 2 (BMC for age, gender, weight and height) and at level 3 (BMC for age, gender, weight, height and bone area).
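The classification rule adopted for this study can be written as a short predicate. This sketch covers only the rule as stated above, not the underlying regression models; the −2.0 threshold and the function name are assumptions.

```python
def low_bone_mass_three_level(level1_z: float,
                              level2_z: float,
                              level3_z: float,
                              threshold: float = -2.0) -> bool:
    """Apply the study's rule to the three-level model Z-scores.

    level1_z: BMC for age and gender (may be low or normal, so it does
              not change the outcome; kept for completeness).
    level2_z: BMC for age, gender, weight and height.
    level3_z: BMC for age, gender, weight, height and bone area.
    """
    return level2_z <= threshold and level3_z <= threshold


# Hypothetical children.
print(low_bone_mass_three_level(-1.0, -2.3, -2.1))  # True
print(low_bone_mass_three_level(-2.5, -1.5, -2.1))  # False
```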

Standard analysis of DXA scans was performed and the measured values were used to calculate age- and size-adjusted Z-scores for all six approaches, using an automated reporting spreadsheet.

Diagnostic test analysis

Low trauma fractures are considered the ultimate consequence of bone fragility and studies in the aged population have demonstrated that one or more fragility fracture increases the probability of a subsequent fracture [27]. Similarly, a previous fracture is the greatest predictor of subsequent fractures in children [28]. Therefore, the reference point in this study was taken as the documented existence of a low trauma fracture in the 12-month period prior to the DXA scan.

Fracture status was established for all 570 children either from medical records or consultant, radiograph or parent validation. Number, site (either long bone or vertebral) and timing of all fractures were recorded. Children were grouped as: (a) fracture, at least one low trauma fracture in the 12 months preceding DXA scan, or (b) no recent history of fracture. Accordingly, Z-scores for each of the normalisation techniques were compared for diagnostic accuracy against the reference standard of low trauma fracture.

Confusion or error matrix

To assess diagnostic accuracy of the size adjustment techniques, the binary prediction model was used, which utilises a 2 × 2 table known as the confusion or error matrix. The confusion matrix cross-tabulates the number of observed and predicted positive and negative outcomes. In the case of continuous variables such as Z-scores, the data must be dichotomised in order to construct the error matrix. The threshold criterion to dichotomise the Z-scores was set at 2.0 SDs below the mean predicted value (a Z-score of −2.0) for each of the techniques, as recommended by the current International Society for Clinical Densitometry (ISCD) guidelines for reporting DXA results in children [3].
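The dichotomisation and cross-tabulation described above can be sketched as follows. The data are invented for illustration, and treating a Z-score exactly equal to −2.0 as a positive test is an assumption.

```python
def confusion_matrix(z_scores, has_fracture, threshold=-2.0):
    """Cross-tabulate a dichotomised Z-score against fracture status.

    Returns (true positives, false positives, false negatives, true negatives).
    A test is positive when the Z-score is at or below the threshold.
    """
    tp = fp = fn = tn = 0
    for z, fracture in zip(z_scores, has_fracture):
        test_positive = z <= threshold
        if test_positive and fracture:
            tp += 1
        elif test_positive:
            fp += 1
        elif fracture:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical Z-scores and fracture status for six children.
z = [-2.6, -2.1, -1.0, -2.3, 0.4, -0.8]
fx = [True, True, True, False, False, False]
print(confusion_matrix(z, fx))  # (2, 1, 1, 2)
```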

Once established, the confusion matrix can be used to generate various diagnostic error and accuracy probabilities and ratios. Having established the test conditions and threshold criteria, error matrices were generated for each of the normalisation processes and used to calculate the error and accuracy probabilities and the relevant odds ratios. The area under the receiver operating characteristic curve (AUC) was calculated to test for significance for each of the diagnostic tests.

Statistical analysis

All statistical analyses were performed using SPSS version 17.0 (SPSS Inc., Chicago, IL) or Microsoft® Excel 2002. Data are presented as mean (SD) and graphical data as mean ± 95 % confidence interval, unless otherwise stated.

Results

Characteristics of the study population

Five hundred and seventy children were referred for a baseline DXA scan between August 2001 and September 2010. Of these 570, 120 were excluded from the analyses due to: poor scan quality (n = 13), age, height or bone area outside the UK dataset reference range (n = 88) or insufficient clinical information (n = 19) (Fig. 1).

Fig. 1 Flow diagram of the inclusion and exclusion of children recruited into the diagnostic evaluation study

The remaining group consisted of 450 children (211 girls, 239 boys), with and without low trauma fractures, who could be categorised into four groups, namely, osteogenesis imperfecta (OI) (n = 48), chronic conditions requiring long-term (≥6 months) corticosteroid treatment (n = 135), severely restricted mobility (n = 38) and other conditions associated with low bone mass or risk of fracture (other) (e.g. thalassaemia, anorexia nervosa) (n = 135). Their age and anthropometric characteristics are listed in Table 1. In the group as a whole, there were 181 children with reported low trauma fractures in the 12-month period prior to DXA scan: 33 with both long bone and vertebral fractures, 52 with vertebral fractures and 96 with long bone fractures. There were no significant differences between the fracture and non-fracture groups for age, height Z-scores and weight Z-scores. However, children from all groups were significantly shorter than their healthy age-matched peers, with the greatest reductions for immobile children (Table 1).

Table 1 Descriptive data of patients included for diagnostic test analysis

Any fracture analysis

Confusion matrices of the dichotomised Z-scores versus fracture status were generated with number of true and false positives and negatives for each of the size adjustment techniques. None of the adjustment techniques correctly classified all those with low trauma fractures as having low bone density or all those without fracture as normal, indicating that DXA is not a perfect test for the diagnosis of low trauma fracture. As a consequence of this, diagnostic test accuracy probabilities and odds were calculated from the confusion matrices and listed as means (95 % CI) in Tables 2 and 3.

Table 2 Diagnostic probabilities for any fracture for each of the size adjustment techniques for lumbar spine DXA measurements
Table 3 Diagnostic probabilities for any fracture for each of the size adjustment techniques for total body less head DXA measurement

Lumbar spine and TBLH BMDa for age were most sensitive for distinguishing those children with low trauma fractures from those without; however, they were also the least specific (Tables 2 and 3). All size adjustment techniques demonstrated greater specificity, but at the cost of lower sensitivity.

All adjustment techniques of DXA measurements increased the post-test probability of having a low trauma fracture above the pre-test probability of 40 %. Post-test probabilities ranged from 58 to 77 %, with the highest value obtained when a low Z-score for L2–L4 BMC for bone area was recorded. Although there were no statistically significant differences between the techniques on the whole, the increase in post-test probability was greater for lumbar spine than for TBLH measures.
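The step from pre-test to post-test probability can be sketched with the usual odds form of Bayes' theorem. The pre-test probability of 40 % comes from the text; the likelihood ratio of 5.0 is a hypothetical value chosen for illustration and is not taken from Tables 2 or 3.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must be between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pre-test probability from the text; the likelihood ratio is hypothetical.
p = post_test_probability(0.40, 5.0)
print(round(p, 2))  # 0.77
```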

Combining the likelihood ratio of a positive test with the likelihood ratio of a negative test gives an overall diagnostic odds ratio for fracture based on a low test result. On this measure, the most accurate tests were L2–L4 BMAD for age for the lumbar spine and BMC for LBM adjusted for height for TBLH, with diagnostic odds ratios of 9.3 (5.8–14.9) and 6.5 (4.1–10.2), respectively. However, a non-significant trend was seen for higher odds for all size adjustment Z-scores. AUCs were significantly greater than 0.5 for all diagnostic tests and, in line with the odds ratios, the greatest values were reported for L2–L4 BMAD and BMC for LBM adjusted for height, for lumbar spine and TBLH, respectively.
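A diagnostic odds ratio and an approximate 95 % confidence interval can be computed from a 2 × 2 table as sketched below. The counts are hypothetical, and the interval uses the common log-based approximation, which is an assumption; the method used to derive the intervals reported here is not stated.

```python
import math


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int,
                          z_crit: float = 1.96):
    """Diagnostic odds ratio with an approximate confidence interval.

    The odds ratio equals LR+ / LR-, i.e. (tp * tn) / (fp * fn).
    Assumption: log-based interval, valid only when all cells are positive.
    """
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("all four cells must be positive for this method")
    dor = (tp * tn) / (fp * fn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z_crit * se_log)
    upper = math.exp(math.log(dor) + z_crit * se_log)
    return dor, lower, upper


# Hypothetical counts, not taken from the study tables.
dor, lower, upper = diagnostic_odds_ratio(tp=80, fp=20, fn=40, tn=160)
print(dor, round(lower, 1), round(upper, 1))
```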

Since no significant differences were observed between the different size adjustment techniques and test sensitivity was relatively poor for all techniques, further analysis was performed according to specific fracture type. The fracture group was partitioned according to whether the child had sustained a vertebral fracture with or without a long bone fracture or whether the child had sustained a long bone fracture only. The diagnostic odds and probabilities are presented in Supplementary Tables 4a, b for vertebral fracture and Supplementary Tables 5a, b for long bone fracture.

Vertebral fractures

Compared to any fracture, selecting vertebral fractures only increased the diagnostic test sensitivity for all size adjustment techniques. This increase was significant for L2–L4 BMDa and L2–L4 BMAD for age, L2–L4 BMC for bone area and L2–L4 BMC for bone and body size, where sensitivities increased from 59, 55, 44 and 46 % to 84, 80, 62 and 67 %, respectively. There were no significant differences between post-test probabilities for the lumbar spine and TBLH regions. The technique which discriminated best between vertebral fracture and no fracture was L2–L4 BMAD for age with odds of 30.7 to 1 (16.0–58.8).

Long bone fractures

In contrast to the analysis for vertebral fractures, long bone only fracture discrimination had poorer sensitivities, predictive probabilities and likelihood ratios for both lumbar spine and TBLH size adjustment techniques (data not shown). Consequently, the diagnostic odds ratios were reduced. The technique which discriminated best between long bone fracture and no fracture was TBLH BMC for LBM adjusted for height with odds of 5.2 to 1 (3.0–8.9).

The results presented include children with both primary and secondary osteoporoses. Since the children with osteogenesis imperfecta (a primary osteoporosis) will usually have more fractures than children with secondary osteoporosis, we repeated the analysis excluding the 48 children with OI. Although the diagnostic odds ratios were slightly reduced when compared to the groups as a whole, the overall results did not change significantly (results not shown). The differences in the diagnostic capabilities of DXA at predicting vertebral fracture versus long bone fragility fracture are highlighted in Fig. 2a, b.

Fig. 2 Odds ratios (mean ± 95 % CI) for the lumbar spine (a) and TBLH (b) size adjustment techniques for DXA. The odds ratios represent the diagnostic accuracy of a DXA scan to confirm low bone mass in children with an existing low trauma fracture. Odds ratios for a long bone fracture are shown in light grey and for a vertebral fracture in dark grey. The greatest diagnostic odds for vertebral fracture are obtained from the lumbar spine scan with BMAD for age (a). The greatest diagnostic odds for long bone fractures are obtained from the total body scan with TBLH BMC for LBM adjusted for height (b). However, there are no significant differences between any of the size adjustment techniques, either at the lumbar spine or the total body

Discussion

This is the first study to compare diagnostic accuracy of DXA size adjustment techniques in children with a variety of chronic conditions suspected to be at risk of fracture. Different size adjustment techniques derived from a single reference population of healthy children were compared to a population of chronically ill children with and without fragility fracture.

The most sensitive but least specific size adjustment technique, based on detected low bone mass in the presence of a previous low trauma fracture in our high-prevalence population, was unadjusted lumbar spine BMDa for age. All size adjustment methods significantly improved diagnosis specificity, but the calculated differences in the accuracy and error, probabilities and odds were small, with no marked difference between any of the size adjustment techniques. Therefore, using any one of the size adjustments improves accuracy of DXA results.

Lumbar spine scanning resulted in the greatest diagnostic odds for predicting a vertebral fracture and TBLH scanning had the greatest odds for predicting a long bone fracture. Vertebral fractures are best predicted by L2–L4 BMAD for age and long bone fractures are best predicted by TBLH BMC for LBM adjusted for height, suggesting that these may be the most suitable size adjustment techniques in clinical practice.

The main limitation of this study is the use of low-impact fracture as the reference point for defining osteoporosis. Contrary to the ISCD guidelines, not all children with osteoporosis will have a low trauma fracture and not all fractures will be a direct result of osteoporotic bones. Due to the retrospective design of our study, the circumstances and degree of trauma of reported fractures were not always recorded. However, studies have shown that prevalent fracture is a significant risk factor for further fracture, irrespective of the degree of trauma [29, 30]. Therefore, this limitation is not believed to have a significant effect on the overall results of this study. A further limitation is that no systematic search for vertebral fractures was made. However, any child presenting with back pain would have had further investigations such as plain radiography or spinal MRI, which should have identified any major vertebral changes. For mild, asymptomatic, occult fractures, there may nevertheless have been a degree of under-reporting.

A final consideration is that the diagnostic odds ratios presented here are only applicable in populations with a high prevalence of the disease, i.e. children referred for DXA already suspected to be at risk of osteoporosis. These results may not be transferable to large population studies of healthy children. However, it is likely that in a low prevalence population, the observed differences between size adjustment techniques would follow a similar pattern but with considerably lower odds.

To date, there have been relatively few studies comparing diagnostic accuracy of the different size adjustment techniques. Fewtrell and colleagues compared six different size adjustments of lumbar spine bone density measurements [25]. Similar to our results, all size adjustment techniques reduced the number of children classified as abnormal (Z-score of ≤−2), although there was a degree of misclassification between the results. They concluded that the simplest adjustments to put into routine clinical practice were lumbar spine BMAD and total body BMC for height. Unfortunately, they did not report on fractures or on specific chronic conditions, which may have an important role in any diagnostic threshold decision.

In a similar comparative study using a cohort of otherwise healthy children with upper limb fractures, Jones et al. also highlighted that BMAD had the strongest association with fracture risk [31]. However, they found no significant association with fracture for the ratio of BMC to LBM. The differences between the two studies could reflect the differences in the populations used and also the differences in the methods of adjustment. Jones et al. presented data on healthy children with fracture. In contrast, our study presents data on children with chronic conditions. The impact of disease on muscle and bone, growth and development may explain why adjusting for muscle mass and height resulted in a stronger association with fracture in our children with chronic disease than in population studies of healthy children.

Leonard and colleagues [16] compared different size adjustment techniques for whole body DXA with pQCT measures of cortical geometry and strength in healthy children. BMC and bone area for height showed the strongest correlation with pQCT measures for CSA and strength, with BMC for LBM or bone area having the lowest correlations. They argued that adjusting for LBM was not necessarily an appropriate method, an argument which was also supported by others [11]. In many disease states, LBM is low and these groups questioned the logic of adjusting one abnormal value with another. However, the mechanostat model approach provides the logic [14, 15, 23, 24] and adjustments for LBM thus may allude to the aetiology of reduced bone mass.

Post-test probabilities of a low trauma fracture in our study were significantly greater than pre-test probabilities, demonstrating that using DXA is better than estimating the risk of fragility fracture without DXA. This confirms results from healthy population studies. The first prospective cohort study of 100 girls with fracture and 100 age-matched girls without fracture found that each 1 SD reduction in BMDa nearly doubled the risk of new fracture [21, 22]. A recent meta-analysis from paediatric population studies demonstrated an association between low BMDa and fracture, with a calculated −0.32 SD (−0.43 to −0.21) difference between fracture and no fracture [32]. In contrast, a study by Skaggs and colleagues [33], using computed tomography of the radius, found no difference in cancellous bone density levels between 50 girls with and without fracture. However, girls with fracture were heavier and had smaller cross-sectional bone size than girls without fracture, possibly alluding to the connection between the fall force and biomechanical resistance of the bone to fracture.

Confirmatory evidence on greater fracture risk with lower bone size for body size and lower bone mass for bone and body size has been recently published in two UK fracture studies [34, 35]. Clark and colleagues demonstrated, in a large prospective cohort of healthy children, that both size-adjusted BMC and bone area were reduced in the children who sustained a fracture with odds ratios of 1.89 and 1.51, respectively [34]. Manias and colleagues presented similar differences in size-adjusted BMC in children with incident or recurrent fracture(s) compared with healthy fracture-free controls [35].

The retrospective design of our study made it impossible to relate risk of new fracture with low bone mass or any of the size adjustment techniques. It is quite feasible that the reported relative risk of fracture, related to reductions in size-adjusted bone mass and bone area in otherwise healthy children, will also be applicable to those with chronic disease. However, interactions between chronic disease and bone growth, mineral accrual, body composition, pubertal maturation and therapeutic intervention may produce misleading results. As such, interpretation of the relative fracture risks associated with healthy children compared with the relative fracture risks in children with chronic diseases should be made with caution. An important consideration when estimating fracture risk both in adults and children is that bone density only reflects one component of bone strength. Bone mass, distribution and material properties also make a significant contribution to bone strength. This argument was further emphasised in a paper by Seeman [36], who suggested that DXA was a good starting point for estimating the risk of fracture but that other factors, such as the material and geometric qualities of bone, may have a more significant role. He also suggested that other measurement techniques may take the fore and enhance the ability to diagnose osteoporosis and identify those at risk.

In conclusion, vertebral fractures are best predicted by L2–L4 BMAD for age and long bone fractures are best predicted by TBLH BMC for LBM adjusted for height. Site-specific DXA size adjustments improve the diagnostic accuracy for predicting low trauma fractures. In a high-risk population, these techniques may be the most suitable size adjustment techniques to use in clinical practice. However, a DXA scan is only one part of a full clinical evaluation of a child at risk of osteoporosis and fracture, and as such, all aspects of fracture risk should be considered.