Abstract
Background
Determination of bone age is routinely used for monitoring substitution therapy in congenital adrenal hyperplasia (CAH), but it remains a procedure with considerable subjectivity.
Objective
The aim was to test the performance of automatic bone age rating by the BoneXpert software package in all radiographs of children with CAH seen at our clinic from 1975 to 2006.
Materials and methods
Eight hundred and ninety-two left-hand radiographs from 100 children aged 0 to 17 years were presented to a human rater and BoneXpert for bone age rating. Images where ratings differed by more than 1.5 years were each rerated by four human raters.
Results
Rerating was necessary in 20 images and the rerating result was closer to the BoneXpert result than to the original manual rating in 18/20 (90 %). Bone age rating precision based on the smoothness of longitudinal curves comprising a total of 327 data triplets spanning less than 1.7 years showed BoneXpert to be more precise (P<0.001).
Conclusion
BoneXpert performs reliable bone age ratings in children with CAH.
Introduction
Congenital adrenal hyperplasia (CAH) is the generic term for a group of autosomal recessive metabolic diseases involving a deficiency of any of five enzymes responsible for the synthesis of cortisol in the adrenal gland. The most frequent form of CAH, accounting for more than 90 % of all cases, is attributable to a deficiency of the enzyme 21-hydroxylase [1]. A life-threatening “salt-wasting syndrome” can ensue if both cortisol and aldosterone synthesis are affected. The variant that only affects cortisol synthesis is referred to as “simple virilizing CAH.” It manifests as virilization of variable degree in girls and precocious puberty in boys. Milder, non-classic forms of CAH often remain undetected in males and lead to hirsutism and impaired fertility in females [2]. The worldwide incidence of CAH has been estimated at 1 in 16,000 births [3].
If left untreated, the overproduction of androgens in CAH also leads to accelerated growth in height. However, this is accompanied by accelerated skeletal maturation, which results in short adult stature in most children [4–8]. Therefore, besides avoiding an Addisonian crisis, attaining normal adult stature is one of the primary goals in the treatment of CAH. Treatment primarily consists of providing the patient with enough glucocorticoids and mineralocorticoids to suppress ACTH-mediated excess androgen production. Finding the correct dosage is challenging because even slight glucocorticoid overdoses will curb growth over time [6, 8–11]. A meta-analysis of 18 studies carried out from 1977 to 1997 found a mean adult height following glucocorticoid treatment of 1.37 SD below the respective population mean [10]. Experimental therapeutic approaches include the addition of growth hormone [12] or peripheral inhibition of androgen activity and estrogen production.
Treatment of children with CAH involves regular measurement of 17-hydroxyprogesterone as well as determination of bone age and height [1]. Some authors consider bone age to be the most useful follow-up parameter after body height, partly because 17-hydroxyprogesterone responds quickly to medication and therefore gives no indication of a patient’s long-term compliance.
Rating bone age in CAH can require greater than average expertise, especially when bone age is extremely advanced. With growing demands for productivity in radiology, radiologic expertise is becoming an increasingly precious resource, and many therefore welcome computerized bone age rating as a potential means of relieving radiologists of the analysis of large numbers of unremarkable radiographs. The CE (European Conformity)-marked medical device used for this purpose is BoneXpert (Visiana, Holte, Denmark), a software package that can rate one image at a time in daily routine or, alternatively, analyze large quantities of hand radiographs in an unattended batch job; its rating reliability in healthy populations has been demonstrated [13–16]. Furthermore, the automated method has been used successfully in children with short stature of various diagnoses [17] and in children with central precocious puberty [18]. However, to our knowledge it has not been validated in children exposed to androgens since early gestation, as is the case in CAH. Moreover, our cohort includes some children with extremely advanced bone age because they had emigrated from countries where screening for CAH was not established at the time of their birth. These children are particularly challenging to manage, starting with their bone age reading. We therefore wanted to test whether the automated method could cope with radiographs from such children.
The purpose of this study was to determine how automated bone age performs on radiographs of children with CAH compared with human bone age raters.
Materials and methods
Eight hundred and ninety-two left-hand radiographs from 100 children and adolescents aged 0 to 17 years with a diagnosis of CAH who had been treated at our clinic from January 1, 1975, to December 31, 2006, were included in the study. Films not already available in digital form as DICOM images were scanned with a Vidar Diagnostic Pro Advantage scanner (Vidar, Herndon, VA). The study was approved by the Tübingen University Hospital Ethics Committee.
Automatic bone age rating was performed using BoneXpert version 1.0 (Visiana, Holte, Denmark, www.BoneXpert.com). BoneXpert calculates bone age based on the shape and appearance of 13 bones of the hand: the phalanges and metacarpals of the first, third and fifth ray and the radius and ulna. The program rejects a bone if its shape is abnormal or if its bone age deviates by more than 2.4 years from the mean of all bones. Furthermore, if fewer than 8 bones are accepted, the entire radiograph is rejected for bone age analysis. The intended Greulich-Pyle bone age range for the automatic rating in its current version is 2 to 15 years for girls, and 2.5 to 17 years for boys. A more detailed description of the calculation methods employed by BoneXpert has been published elsewhere [17, 19].
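The two rejection rules just described can be illustrated with a short Python sketch. This is not BoneXpert's implementation: the function name `accept_bones`, the single-pass filtering, and the assumption that the 2.4-year check is made against the mean of the bones that passed the shape check are all hypothetical simplifications.

```python
from statistics import mean

# Thresholds as stated in the text; everything else here is a hypothetical illustration.
MAX_DEVIATION_YEARS = 2.4
MIN_ACCEPTED_BONES = 8

def accept_bones(bone_ages, shape_ok):
    """Return the accepted per-bone ages, or None if the whole radiograph is rejected.

    bone_ages: per-bone bone age estimates in years (up to 13 bones).
    shape_ok:  per-bone flags, False where the bone's shape was judged abnormal.
    """
    # Rule 1: drop bones with abnormal shape.
    candidates = [ba for ba, ok in zip(bone_ages, shape_ok) if ok]
    if not candidates:
        return None
    centre = mean(candidates)
    # Rule 2: drop bones deviating from the mean by more than 2.4 years (single pass, assumed).
    accepted = [ba for ba in candidates if abs(ba - centre) <= MAX_DEVIATION_YEARS]
    # Reject the radiograph if fewer than 8 bones remain.
    if len(accepted) < MIN_ACCEPTED_BONES:
        return None
    return accepted
```

For example, twelve bones rated 10.0 years plus one outlier at 15.0 years leave twelve accepted bones, whereas an image with only seven shape-accepted bones is rejected outright.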
In 35 examinations, the image was available both as a DICOM file and as a film printout of the DICOM image. Comparing the automated ratings of the DICOM files with those of the scanned film printouts yielded differences of at most 0.5 years; 18 of the 35 differences exceeded 0.2 years. The mean difference was 0.0 years, i.e. the scanned films showed no trend towards older or younger bone age values than the DICOM images. The DICOM file was used whenever one was available.
Validity
Since no objective measure of bone age exists, the validity (accuracy) of a new rating method must be assessed indirectly. Thodberg et al. [15] circumvented the lack of an objective bone age by judging the accuracy of bone age rating by its ability to predict a related objective parameter, namely final height. However, this approach is not advisable in children whose final height may be affected by subsequent treatment. In this paper, we instead examine deviations from the results of the manual method in terms of bias and standard deviation. This is an analysis of agreement between two measurements, so it was natural to use Bland-Altman plots and, in accordance with this concept, to report the standard deviation rather than the correlation. We tested whether the bias differed significantly from zero (a qualitative result), while the standard deviation was reported as a quantitative endpoint.
Note that the standard deviation of the differences between the two methods is usually larger than the standard deviation about the line of fit in a Bland-Altman plot, because it ignores any bone age-related slope between the two methods.
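The agreement statistics discussed above can be sketched with NumPy, assuming paired arrays of BXBA and ManBA in years. The helper name `bland_altman` is hypothetical. The line of fit regresses the difference on the mean of the two ratings, as in a standard Bland-Altman plot, and the residual SD uses n − 2 degrees of freedom because two parameters are fitted.

```python
import numpy as np

def bland_altman(bx_ba, man_ba):
    """Agreement statistics for paired bone age ratings (years)."""
    bx_ba = np.asarray(bx_ba, dtype=float)
    man_ba = np.asarray(man_ba, dtype=float)
    diff = bx_ba - man_ba                      # signed differences BXBA - ManBA
    avg = (bx_ba + man_ba) / 2                 # x-axis of the Bland-Altman plot
    slope, intercept = np.polyfit(avg, diff, 1)
    resid = diff - (slope * avg + intercept)   # scatter about the line of fit
    return {
        "bias": float(diff.mean()),
        "sd_diff": float(diff.std(ddof=1)),    # ignores any slope
        "slope": float(slope),
        "sd_resid": float(resid.std(ddof=2)),  # slope removed
        "mean_abs_diff": float(np.abs(diff).mean()),
    }
```

When the differences drift with bone age, `sd_resid` comes out smaller than `sd_diff`; when the slope is negligible, as reported in the Results, the two are nearly equal.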
An additional, quantitative test was included as follows. Images for which BoneXpert bone age (BXBA) and manual bone age (ManBA) differed by more than 1.5 years were rerated by four experienced raters, blinded to ManBA, BXBA and chronological age. The mean of the four bone age values, referred to as ReferenceBA, was then compared with BXBA and ManBA. The same approach has been used previously [17, 18]. ReferenceBA serves as a secondary and highly reliable outcome. Whether BXBA or ManBA deviated less from ReferenceBA was assessed using the one-sample proportions test with continuity correction, taking into account that only observations from different subjects were statistically independent.
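The scoring step behind this test can be sketched as follows: compute ReferenceBA per image, decide whether BXBA or ManBA is closer, and collapse each child's images into one fractional observation so that only between-subject observations count as independent. The function name and data layout are hypothetical, and treating ties as a loss for BXBA is an assumption.

```python
from collections import defaultdict
from statistics import mean

def bxba_wins_per_child(disputed):
    """disputed: iterable of (child_id, man_ba, bx_ba, rerater_values) for rerated images.

    Returns, per child, the fraction of that child's rerated images in which BXBA was
    strictly closer than ManBA to ReferenceBA (the mean of the rerater values).
    """
    outcomes = defaultdict(list)
    for child_id, man_ba, bx_ba, rerates in disputed:
        reference_ba = mean(rerates)
        # Ties count against BXBA (a conservative assumption).
        outcomes[child_id].append(abs(bx_ba - reference_ba) < abs(man_ba - reference_ba))
    return {cid: sum(wins) / len(wins) for cid, wins in outcomes.items()}
```

Summing the returned fractions gives the number of "independent wins" for BXBA; a child with three rerated visits in which ManBA was closer once contributes 2/3.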
Precision
By contrast, determining the precision of a new bone age rating method, i.e. its ability to generate reproducible results, does not require an objective measure of bone age. One way to determine precision is to study the smoothness of the longitudinal curves obtained. In the present study, the precision of automatic and manual rating was assessed in terms of the smoothness of longitudinal curves using the triplet method [19]. This involves breaking down each individual longitudinal series of n bone age measurements into n − 2 triplets of consecutive measurements and, for each triplet, computing the residual between the middle bone age and the linear interpolation between the two measurements on either side. The triplet method assumes that, in the absence of precision error, the three bone age measurements lie on a straight line as a function of age. Hence, any deviation from the line is attributed to precision error of the bone age method. Since in reality many other factors also cause deviations from a straight line, the triplet method yields an upper limit for the true precision error. To improve the estimate, we only considered triplets spanning less than 1.7 years. This left 327 of the original 604 triplets (54 %). The precision analysis yields both a quantitative and a qualitative result: the quantitative result is the estimated precision of the automated and manual methods with confidence intervals; in addition, we tested whether the precision of the manual and automated methods differed significantly, i.e. a qualitative result (see Notes).
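A minimal Python sketch of the triplet method as described above. Two details are assumptions rather than taken from the paper: the interpolation uses chronological age at each visit, and each squared residual is divided by 1 + w1² + w3² (the residual variance when all three ratings carry independent errors of equal SD), so that the result estimates the SD of a single rating. Function names are hypothetical.

```python
import math

def triplet_terms(ages, bas, max_span=1.7):
    """Normalized squared residuals for one child's series (ages sorted ascending)."""
    terms = []
    for i in range(len(ages) - 2):
        t1, t2, t3 = ages[i], ages[i + 1], ages[i + 2]
        span = t3 - t1
        # Keep only triplets spanning less than max_span years.
        if span <= 0 or span >= max_span:
            continue
        w1 = (t3 - t2) / span   # weight of the first rating in the interpolation
        w3 = (t2 - t1) / span   # weight of the third rating
        residual = bas[i + 1] - (w1 * bas[i] + w3 * bas[i + 2])
        # Assumption: Var(residual) = sigma^2 * (1 + w1^2 + w3^2).
        terms.append(residual ** 2 / (1 + w1 ** 2 + w3 ** 2))
    return terms

def triplet_precision(series_list, max_span=1.7):
    """series_list: iterable of (ages, bone_ages) per child.

    Returns (estimated precision SD in years, number of triplets used).
    """
    terms = [t for ages, bas in series_list for t in triplet_terms(ages, bas, max_span)]
    if not terms:
        raise ValueError("no triplets within the span limit")
    return math.sqrt(sum(terms) / len(terms)), len(terms)
```

For a single triplet at ages 5.0, 5.5 and 6.0 years with bone ages 5.0, 5.8 and 6.0 years, the interpolated middle value is 5.5, the residual is 0.3, and the estimate is √(0.09/1.5) ≈ 0.24 years.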
To further illustrate the behaviour and precision of automated compared with manual bone age rating in children with extremely deviant bone age, we plotted the longitudinal course of three children whose skeletal maturity was particularly advanced. For this, we selected, from the subset of children for whom at least 12 images were available, the three with the most advanced bone age at their first visit (mean of ManBA and BXBA minus chronological age, CA).
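The selection rule can be written out as a short sketch. The helper name and data layout are hypothetical, and the sketch assumes that both ManBA and BXBA exist at each child's first listed visit.

```python
def most_advanced_children(children, min_images=12, k=3):
    """children: dict child_id -> list of (ca, man_ba, bx_ba) visits, sorted by CA.

    Returns the k eligible children with the largest bone age advancement at the first visit.
    """
    eligible = {cid: visits for cid, visits in children.items() if len(visits) >= min_images}

    def advancement_at_first_visit(cid):
        ca, man_ba, bx_ba = eligible[cid][0]
        return (man_ba + bx_ba) / 2 - ca   # mean of ManBA and BXBA minus CA

    return sorted(eligible, key=advancement_at_first_visit, reverse=True)[:k]
```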
Statistical calculations were performed using the JMP 9 software package (SAS Institute Inc., Cary, NC) and version 2.12.1 of the R statistics software (www.r-project.org).
Results
Analysis of rejected images
One hundred sixteen of the 892 images were rejected by BoneXpert. In 111 of these, rejection was due to bone age being below BoneXpert’s specified rating range, i.e. ManBA below 2.0 years in girls or below 2.5 years in boys. Of the remaining five rejections, three were due to poor image quality and one to improper scanning; one remains unexplained and thus represents a failure of the automated method.
Comparison between ManBA and BXBA
For the 776 images (480 from girls, 296 from boys; Table 1) analyzed by automated bone age, the mean difference BXBA – ManBA was −0.02 years (N.S.), the slope of the line of fit in a Bland-Altman plot was negligible at −0.02 years/year and the standard deviation (SD) of the signed differences was 0.72 years. The mean of the absolute (unsigned) differences was 0.54 years (SD 0.40 years). In 20 images, the absolute difference between BXBA and ManBA was greater than 1.5 years (Fig. 1).
These images were submitted for blind rerating by four raters (two radiologists and two pediatric endocrinologists, each with 10 to 30 years of experience in bone age rating). None of the rerated images showed an absolute difference between ReferenceBA and BXBA greater than 1.5 years. ReferenceBA was closer to ManBA in 2 images and closer to BXBA in 18 (Table 2). To estimate the statistical significance of this observed advantage of BXBA, we note that the 20 rerated images came from ten children, so there are ten rather than 20 independent observations. The two images in which ManBA was closer than BXBA occurred in two children with three visits each, so for each of these two children ManBA was better than BXBA in one-third of the visits. Thus, BXBA was better than ManBA in 9.33 of 10 independent cases (93 %). Taking as the null hypothesis that ManBA and BXBA are equally close to ReferenceBA, a proportion test shows that BXBA is closer to ReferenceBA than ManBA (P=0.02). (The test gives P=0.027 for a proportion of 1 in 10 and P=0.004 for 0 in 10; the quoted P-value is the interpolation to 0.67 in 10.) Note also that this computation of statistical significance does not depend on how many images and subjects the total study included before the disputed cases were selected.
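The P-values quoted above can be reproduced with a one-sample proportion test using Yates' continuity correction. The sketch below assumes the chi-square form used by R's `prop.test` (the R software is mentioned in the Methods, but the exact call is not given); `prop_test_p` is a hypothetical name, and the final line applies the linear interpolation to 0.67 in 10 described in the text.

```python
import math

def prop_test_p(x, n, p0=0.5):
    """Two-sided one-sample proportion test with Yates' continuity correction."""
    expected = n * p0
    correction = min(0.5, abs(x - expected))
    statistic = (abs(x - expected) - correction) ** 2 / (n * p0 * (1 - p0))
    # Upper tail of a chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(statistic / 2))

p_one = prop_test_p(1, 10)    # ManBA closer in 1 of 10 children
p_zero = prop_test_p(0, 10)   # ManBA closer in 0 of 10 children
x_observed = 2 / 3            # two children, each counted as 1/3
p_interp = p_zero + x_observed * (p_one - p_zero)
print(round(p_one, 3), round(p_zero, 3), round(p_interp, 3))  # 0.027 0.004 0.019
```

The interpolated value of about 0.019 rounds to the reported P=0.02.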
Our analysis of bone age rating precision based on the smoothness of longitudinal curves comprised a total of 327 data triplets spanning less than 1.7 years for both ManBA and BXBA. The following precision results were obtained: ManBA: 0.32 years (95 % CI: 0.29–0.35); BXBA: 0.21 years (95 % CI: 0.19–0.23). This indicates a significant difference in precision (P<0.001).
Figure 2 shows the longitudinal BXBA and ManBA curves of the three children selected for their extremely advanced skeletal maturity as described above (mean bone age advancement: 3.1 ± 3.0 years). None of the 20 images with a discrepancy greater than 1.5 years between ManBA and BXBA came from these three children. The SD of the signed differences between BXBA and ManBA for these three curves was 0.63 years. A comparison of the smoothness of the longitudinal curves generated by ManBA and BXBA for these three children (33 triplets spanning less than 1.7 years) yielded precision values of 0.24 years (95 % CI: 0.17–0.32) for ManBA and 0.16 years (95 % CI: 0.12–0.23) for BXBA (N.S.).
The difference between the automatic and the original manual bone age ratings as a function of bone age advancement (Fig. 3) shows a slope of 0.1 years/year (x-intercept = +0.7 years). This means that BoneXpert tends to produce slightly lower ratings than the manual rating with increasing bone age advancement.
Discussion
Rejected images – BoneXpert’s behaviour in the low bone age range
BoneXpert’s inability to rate images with bone age below 2 years in girls and 2.5 years in boys was a greater limitation in this study than in other clinical studies of BoneXpert [17, 18]. In contrast to those studies, most children in the present study had been monitored for bone age regularly from birth, which is when CAH is usually diagnosed. Thus, 111 of the 116 rejections were due to low bone age, and the overall rejection rate was 13 % (116/892). In a study on short stature and another on central precocious puberty, the rejection rate was less than 1.5 % (14/1,097 and 9/732, respectively) [17, 18]. In view of the present findings, extending BoneXpert’s application range further towards birth may be a worthwhile project.
Interobserver error between automated and manual ratings
At 0.72 years, the standard deviation of the signed differences between BXBA and ManBA was in the same range as in analogous studies of other pathologies [17, 18], where standard deviations ranged from 0.71 to 0.80 years. The mean of the absolute differences (0.54 years) compares favourably with the interobserver error between human raters reported in the literature. Berst et al. [20] report 0.69 ± 0.48 years for the mean of the absolute differences between two trained observers rating 107 radiographs with knowledge of CA. Assuming a normal distribution, for which the mean absolute deviation is approximately 0.8 times the SD, this corresponds to an SD of the signed differences of 0.69/0.8 = 0.86 years. King et al. [21] reported bone age readings of 50 radiographs by each of three raters, with an SD of the signed differences of 0.80 years.
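The factor 0.8 used above comes from the normal distribution: for a zero-mean normal variable X with SD σ, E|X| = σ·√(2/π) ≈ 0.80σ. A small sketch of the conversion, assuming zero mean signed differences (the function name is hypothetical):

```python
import math

def sd_from_mean_abs(mean_abs_diff):
    """SD of a zero-mean normal variable, given its mean absolute value."""
    # E|X| = sigma * sqrt(2 / pi) for X ~ N(0, sigma^2)
    return mean_abs_diff / math.sqrt(2 / math.pi)

print(round(sd_from_mean_abs(0.69), 2))  # 0.86
```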
Rerating results: implications for automated and manual rating
In 20 images, the discrepancy between manual and automatic rating was greater than 1.5 years. These were each blindly rerated by four independent raters. The mean of these four ratings (ReferenceBA) deviated by less than 1.5 years from BXBA for all images.
We can conclude from our results that the original discrepancies in the 20 rerated images were due more to manual errors than to errors in automatic rating. This is indicated by 18 reratings being closest to the automatic rating and 2 closest to the original manual rating, as well as by the smaller mean absolute difference between BXBA and ReferenceBA as compared to that between ManBA and ReferenceBA. If ReferenceBA is assumed to represent the “true bone age,” then BXBA was significantly more accurate than ManBA (P=0.02) in rating these 20 images. This is supplemented by the outcome of our comparison of the smoothness of longitudinal curves, which clearly showed the automatic rating method to be more precise, despite the fact that the automated method rated the images independently whereas the original manual raters usually had the previous rating and the age of the child available. This tends to enhance smoothness and improve the precision result derived for manual rating.
Performance of BoneXpert in children with extremely advanced bone age
Our comparison of ratings in children with far advanced bone age suggests that advanced bone age is not a specific source of discrepancy between manual and automatic rating in CAH, since the SD of the signed differences between ManBA and BXBA in this subgroup (0.63 years) was even smaller, albeit not significantly, than that for all children taken together (0.72 years).
Conclusion
BoneXpert supplies satisfactory bone age ratings in children with CAH within its designated bone age application range (bone age: 2–15 years for girls, 2.5–17 years for boys). The high rate of image rejection found in younger children underscores the need to extend the programme to infants.
Notes
The triplets formed from one child are not statistically independent, because each visit can participate in up to three triplets. The confidence interval (CI) of the estimated precision was therefore estimated by a Monte Carlo technique, which showed that the CIs are 1.3 times wider than those derived by assuming that the triplets were independent.
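The Monte Carlo technique is not specified further. One common way to obtain a CI that respects within-child dependence is a cluster bootstrap that resamples whole children, sketched below. This is a swapped-in technique, not necessarily the authors' method; the function names are hypothetical, and the residual normalization 1 + w1² + w3² is an assumption.

```python
import numpy as np

def triplet_terms(ages, bas, max_span=1.7):
    """Normalized squared triplet residuals for one child (hypothetical helper)."""
    terms = []
    for i in range(len(ages) - 2):
        t1, t2, t3 = ages[i], ages[i + 1], ages[i + 2]
        span = t3 - t1
        if span <= 0 or span >= max_span:
            continue
        w1, w3 = (t3 - t2) / span, (t2 - t1) / span
        residual = bas[i + 1] - (w1 * bas[i] + w3 * bas[i + 2])
        terms.append(residual ** 2 / (1 + w1 ** 2 + w3 ** 2))
    return terms

def cluster_bootstrap_ci(series_list, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the triplet precision, resampling whole children."""
    rng = np.random.default_rng(seed)
    per_child = [t for t in (triplet_terms(a, b) for a, b in series_list) if t]
    n_children = len(per_child)
    estimates = np.empty(n_boot)
    for j in range(n_boot):
        # Draw children with replacement so a child's triplets stay together.
        picked = rng.integers(0, n_children, size=n_children)
        pooled = np.concatenate([per_child[i] for i in picked])
        estimates[j] = np.sqrt(pooled.mean())
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Because correlated triplets are resampled as a block, the resulting interval is typically wider than one computed as if all triplets were independent, which is the effect the note describes.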
References
White PC, Speiser PW (2000) Congenital adrenal hyperplasia due to 21-hydroxylase deficiency. Endocr Rev 21:245–291
Koletzko B (2007) Kinder- und Jugendmedizin. Springer Medizin Verlag, Heidelberg
Pang SY, Wallace MA, Hofman L et al (1988) Worldwide experience in newborn screening for classical congenital adrenal hyperplasia due to 21-hydroxylase deficiency. Pediatrics 81:866–874
David M, Sempe M, Blanc M et al (1994) Final height in 69 patients with congenital adrenal hyperplasia due to 21-hydroxylase deficiency. Arch Pediatr 1:363–367
Ghali I, David M, David L (1978) Linear growth and pubertal development in treated congenital adrenal hyperplasia due to 21-hydroxylase deficiency. Obstet Gynecol Surv 33:120–122
Jääskeläinen J, Voutilainen R (1997) Growth of patients with 21-hydroxylase deficiency: an analysis of the factors influencing adult height. Pediatr Res 41:30–33
Muirhead S, Sellers EAC, Guyda H (2002) Indicators of adult height outcome in classical 21-hydroxylase deficiency congenital adrenal hyperplasia. J Pediatr 141:247–252
Van der Kamp HJ, Otten BJ, Buitenweg N et al (2002) Longitudinal analysis of growth and puberty in 21-hydroxylase deficiency patients. Arch Dis Child 87:139–144
Bonfig W, Bechtold S, Schmidt H et al (2007) Reduced final height outcome in congenital adrenal hyperplasia under prednisone treatment: deceleration of growth velocity during puberty. J Clin Endocrinol Metab 92:1635–1639
Eugster EA, DiMeglio LA, Wright JC et al (2001) Height outcome in congenital adrenal hyperplasia caused by 21-hydroxylase deficiency: a meta-analysis. J Pediatr 138:26–32
Hargitai G, Solyom J, Battelino T et al (2000) Growth patterns and final height in congenital adrenal hyperplasia due to classical 21-hydroxylase deficiency. Horm Res 55:161–171
Lin-Su K, Vogiatzi MG, Marshall I et al (2005) Treatment with growth hormone and luteinizing hormone releasing hormone analog improves final adult height in children with congenital adrenal hyperplasia. J Clin Endocrinol Metab 90:3318–3325
Martin DD, Neuhof J, Jenni OG et al (2010) Automatic determination of left- and right-hand bone age in the First Zurich Longitudinal Study. Horm Res Paediatr 74:50–55
Thodberg HH, Jenni OG, Caflisch J et al (2009) Prediction of adult height based on automated determination of bone age. J Clin Endocrinol Metab 94:4868–4874
Thodberg HH, Neuhof J, Ranke MB et al (2010) Validation of bone age methods by their ability to predict adult height. Horm Res Paediatr 74:15–22
van Rijn RR, Lequin MH, Thodberg HH (2009) Automatic determination of Greulich and Pyle bone age in healthy Dutch children. Pediatr Radiol 39:591–597
Martin DD, Deusch D, Schweizer R et al (2009) Clinical application of automated Greulich-Pyle bone age determination in children with short stature. Pediatr Radiol 39:598–607
Martin DD, Meister K, Schweizer R et al (2011) Validation of automatic bone age rating in children with precocious and early puberty. J Pediatr Endocrinol Metab 24:1009–1014
Thodberg HH, Kreiborg S, Juul A et al (2009) The BoneXpert method for automated determination of skeletal maturity. IEEE Trans Med Imaging 28:52–66
Berst MJ, Dolan L, Bogdanowicz MM et al (2001) Effect of knowledge of chronologic age on the variability of pediatric bone age determined using the Greulich and Pyle standards. AJR Am J Roentgenol 176:507–510
King DG, Steventon DM, O’Sullivan MP et al (1994) Reproducibility of bone ages when performed by radiology registrars: an audit of Tanner and Whitehouse II versus Greulich and Pyle methods. Br J Radiol 67:848–851
Conflicts of interest
None.
Cite this article
Martin, D.D., Heil, K., Heckmann, C. et al. Validation of automatic bone age determination in children with congenital adrenal hyperplasia. Pediatr Radiol 43, 1615–1621 (2013). https://doi.org/10.1007/s00247-013-2744-8
DOI: https://doi.org/10.1007/s00247-013-2744-8