Introduction

Measurement of the cortical thickness in metacarpals from plain radiographs is a classical method for assessment of skeletal status [1], and it was revived by the introduction of digital X-ray radiogrammetry (DXR) [2]. Extensive literature has demonstrated its clinical usefulness for diagnosis of osteoporosis [3] and as a unique imaging biomarker for progression of rheumatoid arthritis [4]. However, its clinical use is still rather limited, and we believe it deserves more widespread use.

The DXR method has also been implemented for children as part of the BoneXpert software-based medical device for automated bone age assessment [5]; the method determines the bone health index (BHI) computed from the cortical thickness T, bone width W and bone length L [6]. Whereas BMD is approximately equal to T, BHI is approximately equal to T/(LW)^0.333; in other words, BHI makes an adjustment for the variable size of children.

The aim of this work is to introduce a new implementation of the DXR method for adults derived from the paediatric version, including the following changes relative to the existing adult DXR method:

  1. The measurement region is defined relative to the ends of the metacarpals, and its size is scaled to the length of the bone, which appears to be more correct than using a fixed region size.

  2. Four different indices of cortical bones are presented. This is done to facilitate future studies where each index can be compared to clinically relevant endpoints such as fracture risk. Reference curves for European Caucasians for the indices are presented, and the relative standard deviations of the indices are reported.

  3. The transition from childhood to adulthood is assessed with longitudinal data.

  4. More powerful image analysis is introduced, allowing the method to analyse a wider range of images and to be used without manual oversight, except for very few cases flagged for visual inspection.

  5. The method is implemented as a local service for PACS users for a better workflow.

Subjects and methods

The Jena cohort

The main data for this work are the Jena cohort of healthy subjects of age 9–100 years, who obtained a posterior-anterior hand radiograph in the emergency room to rule out fractures after trauma in the period 2001–2005 in Jena, Germany; no additional radiographs were obtained for study purposes.

The data were originally collected to form reference curves for DXR-BMD [7], and we used the same exclusion criteria, i.e. we excluded patients with visible metallic objects, fractures and amputations, and based on a questionnaire, we also excluded subjects with endocrine, rheumatic, renal or oncological disorders, as well as subjects taking bone-influencing medication. Finally, we excluded non-Caucasian subjects and subjects below age 9, and we used only images of the non-dominant hand.

Some images were originally acquired on film (22 %), while others were captured digitally and printed out (78 %), so that all images were available on film; the films were scanned at 300 dpi in 2005. We used the same digital images; however, we examined all images to check for any size reduction that had occurred in the printing process. Typically, this can be inferred from a centimetre scale printed on the image. We found that 30 % of the images were reduced by a factor ranging from 0.63 to 0.83, and we corrected our analysis accordingly.

The image quality varies due to the different methods of acquiring the images. The image analysis method handles this by automatically adjusting the contrast and brightness by a linear transformation of the grey tones. The images have slightly different sharpness or blur. This affects in particular the detection of the outer boundary of the cortex, and the method corrects for this by detecting the steepness of the outer edge.

In total, 1662 images from the Jena study were used, of which 57 % were from males.

The Björk study

The Björk longitudinal study enrolled healthy Danish children for orthodontic treatment at the Royal Dental College in Copenhagen in a study designed to provide information on craniofacial growth in relation to somatic growth [8]. The subjects were born on average in 1952, and X-rays of the non-dominant hand were taken annually, typically from age 7 to 21 years; many subjects also had X-rays taken at 25, 30 and 35 years. We included the 116 subjects (62 males) who had at least one X-ray after age 24 years. This led to 1269 X-rays, i.e. each subject had on average 10.9 images.

The image analysis

The image analysis method is based on mathematical models of the shape and appearance of the bones in the radiographs [9]. The models were derived by machine learning: we manually traced the contours of bones in 200 subjects, and an automated method [10] was used to derive a fixed number of marks on the contours located at anatomically corresponding locations across the subjects—for instance, mark 1 resides consistently at the distal tip of metacarpal 2. These marks represent the shape of the bones. The shapes observed in a group of subjects are aligned to the same size and orientation, and a statistical shape model is constructed by principal component analysis of the mark locations [11].
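The construction of a statistical shape model by principal component analysis can be illustrated with a small sketch. It assumes that the shapes are already aligned to a common size and orientation, uses NumPy, and runs on synthetic ellipses rather than bone contours; the function names are hypothetical and the code is not the implementation used in the software.

```python
import numpy as np


def fit_shape_model(shapes, n_components):
    """Fit a PCA shape model to aligned shapes.

    shapes: array of shape (n_subjects, n_marks, 2), assumed to be already
    aligned to a common size and orientation (the alignment step is omitted).
    Returns the mean shape vector, the principal modes and their variances.
    """
    x = np.asarray(shapes, dtype=float).reshape(len(shapes), -1)
    mean = x.mean(axis=0)
    # SVD of the centred data yields the principal components directly.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    variances = s**2 / (len(x) - 1)
    return mean, vt[:n_components], variances[:n_components]


def project(shape, mean, modes):
    """Express one shape as mode coefficients and rebuild it from them."""
    v = np.asarray(shape, dtype=float).ravel()
    b = modes @ (v - mean)
    return b, mean + modes.T @ b


# Synthetic demonstration: 30 ellipses with 16 marks and variable elongation.
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
base = np.stack([np.cos(angles), np.sin(angles)], axis=1)
stretch = 1.0 + 0.2 * rng.standard_normal(30)
shapes = np.stack([base * [1.0, k] for k in stretch])

mean, modes, variances = fit_shape_model(shapes, n_components=2)
coeffs, rebuilt = project(shapes[0], mean, modes)
```

Because the synthetic shapes vary only in elongation, the first mode captures essentially all the variance, and a shape is rebuilt from its mode coefficients. Restricting a search to a few such coefficients is what limits the search space in a model of this kind.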

The model reconstructs the bones in new X-rays by placing the marks in a trade-off between what the local image texture indicates and what the shape model regards as plausible. The integration of the shape model in the bone reconstruction makes it efficient in finding the bones, and it ensures that it finds only objects of the expected shape. The restriction also makes the method fast by reducing the search space.

For the present analysis, we employed a shape model of metacarpals 2–5 with 20 principal components trained to cover adult bones with age range starting at 18 years for boys and 16 years for girls and extending to 90 years.

An advantage of reconstructing the entire contour of the metacarpals—rather than just the shafts as in the initial DXR method—is that it allows the method to assess whether the bones have been found reliably. If some parts of the bone contours are not “supported” by the presented image, it is a sign that the image quality is insufficient or that the algorithm has made an error. This enables the method to validate its own reconstruction.

A second advantage is that knowledge of the location of the ends can be used to define the region of interest (ROI) for the radiogrammetry, and thirdly, it makes the bone length L available for forming alternative cortical bone indices. As in the paediatric version, each ROI is centred at 44 % of the distance from the proximal to the distal end, and it extends 25 % of the bone length, as illustrated in Fig. 1a. The aim is to cover the same anatomical region across subjects.
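The ROI placement amounts to simple arithmetic along the bone axis. The following minimal sketch assumes a one-dimensional coordinate measured from the proximal end; the function name, the coordinate convention and the example length are illustrative and not taken from the software.

```python
def roi_bounds(bone_length, centre_fraction=0.44, extent_fraction=0.25):
    """Return (start, end) of the ROI along the bone axis.

    Positions are measured from the proximal end, in the same unit as
    bone_length. The defaults are the 44 % centre and 25 % extent quoted
    in the text; the function itself is a hypothetical illustration.
    """
    if bone_length <= 0:
        raise ValueError("bone_length must be positive")
    centre = centre_fraction * bone_length
    half_extent = 0.5 * extent_fraction * bone_length
    return centre - half_extent, centre + half_extent


# Example: a 70 mm metacarpal (an arbitrary illustrative length).
start, end = roi_bounds(70.0)
```

With these fractions, the ROI always spans 31.5 % to 56.5 % of the bone length, so it scales with the bone rather than having a fixed size.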

Fig. 1

Illustration of the image analysis. The first example (a) is a normal subject and was automatically validated. The next five examples were deemed “questionable” by the method, and they require manual validation. b Disturbing rod overlaid on the image. c Crumbled film. d Non-standard pose leading to overlapping metacarpals. e Very old subject (92 years) with clear sign of degeneration. f Rheumatoid arthritis patient with Larsen score 5

When the method reconstructs the bone contours, it also derives a so-called misfit, a measure of the deviation of the observed image from what is expected from the trained image model. We divide the outcome of the analysis into three classes based on the misfit:

  1. Accepted images. If the misfit is less than a certain lower threshold, the reconstruction is so convincing that there is no need for human inspection to validate the result; the system has done that by itself. Virtually all good-quality images of normal subjects fall in this group.

  2. Questionable images. If the misfit falls above the lower threshold, but below a certain upper threshold, the reconstruction can be right or wrong, and without manual control, the measurement cannot be used safely for diagnosis. If the reconstruction is inspected by an expert and found acceptable, the measurement can be used.

  3. Rejected images. If the misfit is larger than the upper threshold, the image is rejected for further analysis.

The thresholds were derived empirically by applying this procedure to independent test data. The lower threshold was set at the highest value at which all accepted images in the test data were reliably analysed. Similarly, the upper threshold was set at the highest value at which more than two thirds of the questionable images were found acceptable upon visual inspection.
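The three-way classification by misfit can be sketched as follows. The threshold values are hypothetical placeholders, since the actual values are not given here, and the handling of a misfit exactly equal to a threshold is an assumption of the sketch.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    QUESTIONABLE = "questionable"
    REJECTED = "rejected"


def classify_by_misfit(misfit, lower, upper):
    """Map a misfit value to one of the three classes described in the text.

    The text does not say how a misfit exactly equal to a threshold is
    handled; treating it as questionable is an assumption of this sketch.
    """
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    if misfit < lower:
        return Outcome.ACCEPTED
    if misfit > upper:
        return Outcome.REJECTED
    return Outcome.QUESTIONABLE


# Hypothetical thresholds, chosen only to exercise the three branches.
LOWER, UPPER = 1.0, 2.0
results = [classify_by_misfit(m, LOWER, UPPER) for m in (0.4, 1.5, 3.0)]
```

Only the middle class calls for human inspection, which is why the position of the lower threshold determines the manual workload.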

The next step in the algorithm constructs the outer and inner boundaries of the cortices of the shafts in metacarpals 2–4 as paths along the steepest gradients and the maximum intensities, respectively, as previously described [12, 13]. These boundaries are validated by five automated controls.

  1. The six average thicknesses in the six cortices must be compatible.

  2. The grey scale variation across the medullar region must be above a certain minimum.

  3. The outer edges must not be blurred.

  4. The outer edges must not be too sharp (due to image postprocessing).

  5. The noise in the soft tissue region between the metacarpals should not be too large.

The so-called radiogrammetry errors triggered by these controls are almost exclusively caused by poor image quality, and the image then cannot be used for diagnosis without manual validation; thus, these radiogrammetry errors are placed in the class “questionable images”, and only if an expert judges the analysis to be acceptable can the measurement be used for diagnosis.

For both the Jena and Björk studies, we used only images which were accepted automatically by the image analysis method, i.e. with misfit below the lower threshold. About 25 % of the original images in the Jena study failed this requirement, mainly because the ends of the metacarpals were not inside the exposed area (something that is not a problem for the original DXR method, which only requires the shafts to be visible). We also rejected four images with radiogrammetry errors. This resulted in the 1662 images.

The Björk study was a research study with a strict imaging protocol for the hand X-rays, which presented the hands in a perfect posterior-anterior position; 97.5 % of the images were automatically accepted. Eight cases with radiogrammetry errors were excluded.

The four cortical bone indices

The DXR method measures the bone width W and the cortical thickness T in the shafts of metacarpals 2–4. Assuming a cylindrical shape of the shaft, the transverse cortical area is computed as A = πTW (1 − T / W).
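The area formula follows if the shaft cross-section is treated as an annulus with outer diameter W and inner diameter W − 2T. This is one reading of the cylindrical assumption; the intermediate steps are not spelled out in the text.

$$ \begin{aligned} A &= \frac{\pi}{4}\left[W^{2} - \left(W - 2T\right)^{2}\right] \\ &= \frac{\pi}{4}\left(4TW - 4T^{2}\right) \\ &= \pi T W \left(1 - T/W\right) \end{aligned} $$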

When we want to use A as basis for assessing the skeletal status of a subject (the amount of bone), we are faced with the problem that subjects present with different heights and widths, and this affects A in a “trivial”, i.e. predictable, way. We therefore seek to divide the area by suitable powers of W and L to eliminate the variation of A that is merely due to variations in height and width.

This idea was introduced in [6] in an analysis of paediatric data, and therein, it was hypothesised that the optimal index of the form A / (W^a L^b) is the one with the smallest relative standard deviation (RSD) in healthy subjects when studied in groups of a given gender and bone age. This is the same as demanding W and L to explain as much as possible of the A variation.

For children, the optimal index according to this principle was found to be the BHI:

$$ \mathrm{BHI} = A/\left(W^{1.333} L^{0.333}\right) \approx T/\left(W^{0.333} L^{0.333}\right) $$

The approximation holds if T << W (and it neglects the factor π). The mathematically inclined reader will appreciate that BHI is the exact geometric mean of the three previously proposed indices: the metacarpal index (MCI) [14], the Exton-Smith Index (ESI) [15] and the cortical volume per area (VPA) [2]:

$$ \begin{array}{l} \mathrm{MCI} = A/W^{2} \approx T/W \\ \mathrm{ESI} = A/\left(W L\right) \approx T/L \\ \mathrm{VPA} = A/W \approx T \end{array} $$

VPA is the volume of cortical bone in the shaft divided by the area of its projection, so multiplying by the calcium density of compact bone (approx. 1.2 g/cm^3 [16]) yields the areal BMD.

BHI was found to have RSD 7.5 % in children, while the indices MCI, ESI and VPA had about 8.5 % RSD.
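The definitions of the four indices, and the statement that BHI is the geometric mean of MCI, ESI and VPA, can be checked numerically. The sketch below uses the exact exponents 4/3 and 1/3 in place of the rounded 1.333 and 0.333; the function name and the input values are illustrative and are not data from the study.

```python
import math


def cortical_indices(T, W, L):
    """Return the cortical area A and the four indices.

    T: cortical thickness, W: bone width, L: bone length, all in the same
    length unit. A uses the cylindrical-shaft formula from the text.
    """
    A = math.pi * T * W * (1.0 - T / W)
    return {
        "A": A,
        "BHI": A / (W ** (4.0 / 3.0) * L ** (1.0 / 3.0)),
        "MCI": A / W**2,
        "ESI": A / (W * L),
        "VPA": A / W,
    }


# Illustrative values in cm (not measurements from the cohorts).
idx = cortical_indices(T=0.18, W=0.85, L=6.5)
geometric_mean = (idx["MCI"] * idx["ESI"] * idx["VPA"]) ** (1.0 / 3.0)
```

The product of the three indices is A^3 / (W^4 L), so its cube root equals A / (W^(4/3) L^(1/3)), which is BHI.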

It should be stressed that the “optimality” of the index with smallest RSD is based on a heuristic principle, i.e. there is no clinical evidence for optimality. So one should consider the RSD as a way to characterise an index, i.e. a “descriptive statistic”.

The local PACS service model

We propose to implement the software in the same way as the paediatric version. The hospital downloads the software and installs it on an existing computer, where it is configured as a service on the local DICOM network, i.e. the DXR analysis engine is available as a DICOM node (or server). All PACS users can export a hand X-ray to the node, which then automatically analyses the image and sends an annotated DICOM image back to PACS, where it is automatically saved in the same “study” as the original image. The result of the analysis is written onto the image itself.

Since the original DICOM image contains the image resolution, the gender and the patient age, BHI and other indices can be computed with no further input, including SD scores for the gender and age.

This allows a workflow where the radiographer, immediately after acquiring the X-ray, stores the image in PACS and sends it to the DXR DICOM node. The radiologist, as well as the referring physician, can then look up the result in PACS with no delay. If the image is questionable or rejected, this is indicated on the image.

Results

Image acceptance and rejection

The Jena cohort contains about 25 % images where the metacarpals are not fully contained in the exposed area, and the method consistently assigns them a misfit above the lower threshold—i.e. they are labelled “questionable”, and they should be rejected, since not seeing the bone ends prevents a reliable determination of the bone length L. We ignore this large fraction of irrelevant images in the following discussion, because they are a clear deviation from the protocol for imaging of the metacarpals for this new DXR implementation.

The remaining questionable images constitute 3 % of the images in the Jena cohort (considering only males above 18 years and females above 16 years). Likewise, there are 2.5 % questionable images in the Björk study. It is interesting to look at the detailed nature of these cases. The intended use of the method is that they should be inspected manually in the clinical workflow to decide whether the BHI measurement is valid or not. We found five types of questionable images, as illustrated in Fig. 1b–f:

  1. Foreign objects on top of the hand. Figure 1b shows a rod which is sometimes used in the Björk study to fix the hand in the correct pose. Other examples of this type are bandages or, in other data sets, implants.

  2. Poor image quality. Figure 1c is an example from the Björk study where the film had disintegrated chemically or “crumbled”. Other examples of this type are artefacts from the scanning process and severe overexposure or underexposure.

  3. Deviation from the standard hand pose. Some of the Jena cohort images did not adhere to the protocol that requires placing the hand flat on the detector. Sometimes, the hand was in a more relaxed pose, so that the metacarpals curved slightly towards the thumb and were closer, or even slightly overlapping, as shown in the example in Fig. 1d. Since the model was developed on hands in the standard pose, it found these cases less plausible, i.e. with larger misfit.

  4. Old age. Above age 80 years, the Jena images often showed strong signs of degeneration, e.g. arthrosis, and this yielded increased misfit; an example is shown in Fig. 1e.

  5. Rheumatoid arthritis. Figure 1f shows an example of a patient with advanced rheumatoid arthritis (Larsen score 5) from Jena, though excluded from the cohort due to this disorder.

We also inspected all the accepted cases (i.e. those with misfit below the lower threshold), and they were indeed all valid. Also, the system was in all cases able to automatically decide whether it was a right or a left hand.

Reference curves

The Jena data were used to form reference curves for the four indices and for T, W and L, for the age range 12–90 years for females and 14–90 years for males. Figures 2 and 3 show the curves for BHI and W. The reference curves were computed as moving averages of the data, but above age 20 for males and 18 for females, we substituted this with a sixth-order polynomial fitted to the averages. Furthermore, above age 70 years, we used a fit of the data to a straight line; for W, this was used already from age 50 years. Reference curves for all four indices can be found in the supplementary material.
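The curve construction (moving averages, a sixth-order polynomial fitted to the averages, and a straight line at high ages) can be sketched with NumPy on synthetic data. The window width, the age grid and the synthetic curve are assumptions made for illustration, and the sketch does not address how the pieces are joined.

```python
import numpy as np
from numpy.polynomial import Polynomial


def moving_average(ages, values, grid, half_width):
    """Mean of the values whose age lies within half_width of each grid point."""
    ages = np.asarray(ages, dtype=float)
    values = np.asarray(values, dtype=float)
    out = []
    for g in grid:
        mask = np.abs(ages - g) <= half_width
        out.append(values[mask].mean() if mask.any() else np.nan)
    return np.array(out)


def true_curve(age):
    """Synthetic stand-in for an index versus age (not study data)."""
    return 5.0 - 0.0004 * (age - 40.0) ** 2


rng = np.random.default_rng(1)
ages = rng.uniform(20.0, 90.0, 2000)
values = true_curve(ages) + 0.3 * rng.standard_normal(ages.size)

grid = np.arange(22.0, 89.0, 1.0)
averages = moving_average(ages, values, grid, half_width=2.5)

# Sixth-order polynomial fitted to the moving averages. Polynomial.fit
# rescales the age axis internally, which keeps the fit well conditioned.
poly = Polynomial.fit(grid, averages, deg=6)

# Straight-line fit to the raw data above age 70.
old = ages >= 70.0
line = Polynomial.fit(ages[old], values[old], deg=1)
```

Both fitted objects are callable, so `poly(50.0)` or `line(80.0)` returns the curve value at that age.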

Fig. 2

The bone health index (BHI) observed in the Jena cohort, versus age. The reference curves are superimposed

Fig. 3

The average width of the shafts in metacarpals 2–4 observed in the Jena cohort, versus age. The reference curves are superimposed

We computed the root mean square (RMS) deviation of the data from the reference curve in intervals of age, and after smoothing, we arrived at SD values for each age, which are used to compute the SD score (SDS) for a new case.
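The SD score is the deviation of a measurement from the reference value at the subject's age, expressed in units of the reference SD. The sketch below uses a hypothetical function name and, as an example, the MCI figures quoted in the Discussion for Fig. 5 (measured 0.64, reference 0.493, SD 0.062).

```python
def sd_score(value, reference_mean, reference_sd):
    """Return the SD score (Z score) of a measurement against a reference."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (value - reference_mean) / reference_sd


# MCI example from the Discussion (Fig. 5).
z = sd_score(0.64, 0.493, 0.062)
```

Rounded to one decimal, `z` is 2.4, the value reported for that example.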

These reference curves are intended for males with bone age above 17 years and for females with bone age above 15 years. For children below these limits, it is recommended to use instead the previously published BHI reference curves versus bone age [6, 17].

Comparison of indices

To derive the relative SD (RSD) at fixed age and gender, we computed the SDs as the RMS deviations of the data from the reference curves in three age intervals, 20–40, 40–60 and 60–80 years, and the results are presented in Table 1.

Table 1 Relative SDs (in %) of four indices of cortical bone for subjects in the Jena cohort

We found that BHI had significantly smaller RSD than the two dimensionless indices MCI and ESI (p < 0.01), which in turn were slightly below VPA. RSD for BHI was 7.4 % for age 20–40 years, close to the 7.5 % found in children, and it grew to 9.7 % at ages 60–80 years.
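The RSD within an age interval can be computed as the RMS of the relative deviations from the reference curve. The sketch below assumes that each deviation is normalised by the reference value at the subject's age; the text does not specify the normalisation, so this is one possible choice. The function name and the data are illustrative.

```python
import math


def relative_sd(ages, values, reference, age_min, age_max):
    """RMS relative deviation (in %) from a reference curve.

    Only subjects with age_min <= age < age_max are used. `reference` is a
    callable returning the reference value at a given age.
    """
    rel = [
        (v - reference(a)) / reference(a)
        for a, v in zip(ages, values)
        if age_min <= a < age_max
    ]
    if not rel:
        raise ValueError("no data in the age interval")
    return 100.0 * math.sqrt(sum(r * r for r in rel) / len(rel))


# Toy data: a flat reference of 5.0 and two subjects at +10 % and -10 %;
# the third subject lies outside the 20-40 year interval and is ignored.
rsd = relative_sd([30, 35, 50], [5.5, 4.5, 9.0], lambda age: 5.0, 20, 40)
```

For the toy data, the two deviations of ±10 % give an RSD of 10 %.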

The transition from childhood to adulthood

Figure 4 shows BHI for the Björk data, including also an average curve of these data, and we have superimposed the paediatric reference curve [6] and a moving average curve of the Jena data. The Björk subjects have about 1 % higher BHI than the Jena cohort at all ages. This is, however, not significant, since with 116 subjects and an RSD of 7.5 %, the difference would have to be at least 1.4 % to be significant. On the other hand, a 1 % difference in BHI level for the two populations would not be implausible. The main conclusion drawn from this figure is that the three studies—the paediatric, the adult and the Björk longitudinal “transition” study—were compatible.
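One plausible reading of the 1.4 % figure is a simple z-test on the mean of 116 independent subjects with an RSD of 7.5 %, treating the Jena curve as exact. The sketch below makes these assumptions explicit; the two-sided 5 % level (z = 1.96) is likewise an assumption, as the text does not state the test used.

```python
import math


def minimum_significant_difference(rsd_percent, n_subjects, z=1.96):
    """Smallest mean difference (in %) that reaches significance.

    Assumes independent subjects and a reference level known without error;
    z = 1.96 corresponds to a two-sided 5 % significance level.
    """
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    return z * rsd_percent / math.sqrt(n_subjects)


threshold = minimum_significant_difference(7.5, 116)
```

Under these assumptions, the threshold rounds to 1.4 %, so an observed difference of about 1 % falls short of significance.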

Fig. 4

The BHI observed in the Björk longitudinal study, versus age. The solid curve is the average curve of the 116 subject trajectories (each trajectory is formed by imputing the data by interpolation and by extrapolation based on the average curve). The dash-dotted curves represent the reference curve of BHI versus bone age from the Dutch Erasmus study, while the dashed curves are the moving averages of the Jena data shown in Fig. 2. The Erasmus reference curves are made versus bone age, and they pinpoint puberty much better: for boys, it shows a steeper ascent during puberty than the other two curves

Discussion

Comparison with the original DXR method

A main difference between the new and the original DXR methods is that the original DXR method centres the ROIs at the jointly narrowest point of the three shafts and it uses a fixed size of the ROIs. The new method centres the ROI relative to the bone ends (at 44 % from the proximal end) and scales the ROIs to 25 % of the bone length. Intuitively, this seems more correct anatomically, and it allows the method to generalise down to infants. It also seems more likely that the method finds the same ROI in a follow-up exam, e.g. a year later, where a slight rotation of a bone might change the location of the narrowest point.

The new method offers a choice between four different indices, which gives a flexibility in applications in several respects:

  1. Firstly, if one wants to study the agreement between DXR and DXA measurements of bone density (mass per area), VPA is the index of choice, because it expresses cortical volume per area.

  2. Secondly, in cases where the images have unknown resolution, BHI and VPA cannot be used, but the dimensionless indices MCI and ESI are still valid, and the ROIs will still be sized correctly. To illustrate this, we show in Fig. 5 the DXR analysis of the X-ray of the Swiss anatomist Albert von Kölliger, who volunteered for this exposure during Wilhelm Conrad Röntgen’s first public presentation of X-rays, which took place in Würzburg on 23 January 1896. The method categorised the analysis as questionable: the algorithm could not by itself assert that the bones were found correctly, but Fig. 5 reveals that indeed they were. In addition, a warning that the image is too blurred was issued. MCI was determined to be 0.64. The MCI reference value at von Kölliger’s age of 78.5 years is 0.493 with SD 0.062, so the MCI SDS (i.e. Z score) is 2.4. In other words, this retrospective osteoporosis diagnosis shows a very healthy old man.

    Fig. 5

    X-ray of 78-year-old von Kölliger taken by Conrad Röntgen on 23 January 1896. The image was analysed with the DXR method yielding a metacarpal index of 0.64, which translates into a Z score of 2.4

  3. Thirdly, when using the DXR method to assess incremental loss in cortical bone, e.g. for assessing progression of RA, the four bone indices are expected to perform equally well, since W and L change very little over a year. What matters is the precision, and all indices have more or less the same precision because it is dominated by the precision error on T.

A potential drawback of the new method could show up in cases where the joints of the bones are degenerated, in particular in very severe RA, as in the example in Fig. 1f, where a manual validation was necessary.

Normality of the Jena data

The Jena cohort data were previously used to form reference curves for BMD and MCI with the original DXR method, and the BMD values for the Jena cohort were approx. 5 % lower than for published curves for Danish, German and American Caucasians for women at peak bone mass and approx. 6 % lower than a published curve for German men [18]. It was speculated back then to be due to the recruitment of the Jena cohort from the emergency room; these subjects could be “weaker” than the average normal population.

However, in this reanalysis, we discovered that the films had been reduced by on average 6 %. The effect on BMD is approximately a 6 % decrease, and correcting for this brings the Jena DXR-BMD data in good agreement with the other Caucasian BMD reference curves. This indicates that the Jena data are indeed representative of the general healthy population, and the good agreement with the Björk data supports this further.

Our reference curve for men based on the Jena cohort is unique. The only other male reference curve [18] is based on three times fewer subjects per decade.

The male and female BHI reference curves have remarkably different shapes. While women show a distinct peak at 39 years, men display a “high plateau” ranging from 30 to 50 years, and while women have lost 10 % of their peak bone mass at age 60, men have lost only 2.5 %.

The bone width, shown in Fig. 3, increases steadily by 7 % from age 20 to 80 years for both males and females. While the two sexes have the same W before puberty, adult males have 15 % larger W. As noted by Garn [19], this is remarkable, since males are only 7.5 % taller, so there is a clear sexual dimorphism here. We divide by (WL)^0.333 in the BHI expression, a factor which is 7.5 % larger for males than for females, and this has the effect that the BHI level at peak bone mass is almost the same in the two genders; it is only 3 % larger in males.
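The size of this factor can be checked with a rough calculation, using the 15 % difference in W and the 7.5 % difference in height quoted above. It assumes that bone length L scales with body height, an assumption made here only for illustration.

$$ \left(1.15 \times 1.075\right)^{0.333} \approx 1.236^{0.333} \approx 1.07 $$

This gives a male-to-female ratio of roughly 7 % for the factor, in line with the value stated in the text.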

Questionable images

The number of questionable images was small, 3 %. If a stricter imaging protocol is used, the percentage will be considerably smaller, unless the subjects suffer from severe arthritis.

With 97 % or more of the cases being validated automatically, it is a limited workload to inspect the questionable cases, and it is even clinically relevant to bring these to the attention of a radiologist: the image quality can then be improved in the future, and—in case of abnormal bone structure—additional diagnostic information can be extracted from the image.

Transition to adults

The new method covers the age range from about 2 to 90 years and automatically reconstructs the metacarpals with a model appropriate for that age and then performs the DXR analysis. This extension of the paediatric BHI method to adults can be clinically useful for following paediatric patients at risk of poor bone health into adulthood. For instance, growth hormone-deficient children can be followed until peak bone mass, and the effect of GH treatment can be monitored [20].

Conclusion

This paper has presented a “second generation” of the DXR method and used it to form reference curves of four indices of cortical bone in normal Caucasian adults. As in children, BHI was found to have the smallest RSD. With this method, it becomes possible to study the relationship between all four indices and occurrence of fractures [3].

The development of DXR has always been centred on workflow and quality improvement: DXR was initially designed to replace the manual work with a calliper for radiogrammetry by a more precise computer method. The second step forward occurred when films were superseded by digital images so that the cumbersome film scanning could be avoided. The new DXR method takes a third step by integrating the calculation in the PACS workflow and introducing image analysis that places the ROIs relative to the ends of the bones and validates most of the cases, so that expert supervision is needed only for 3 % of the images.