Introduction

High-resolution peripheral quantitative computed tomography (HR-pQCT) is a non-invasive, low-radiation approach for assessing compartment-specific volumetric bone mineral density and bone microarchitecture in the peripheral skeleton, most commonly the distal radius and tibia. Until the introduction of the first HR-pQCT device in 2005 [1], assessment of human bone microarchitecture was largely limited to histomorphometric or micro-CT analysis of iliac crest biopsies [2, 3] or technically challenging, limited resolution micro-magnetic resonance imaging. The ability to investigate bone microarchitecture in vivo has provided novel insights into differences due to age, sex, and ethnicity [4,5,6,7,8,9,10,11,12,13]; changes due to disease [14,15,16,17]; response to nutritional and pharmacologic treatments [18,19,20,21,22]; the impact of physical activity [23,24,25,26]; and deficits associated with increased fracture risk [27]. The use of HR-pQCT for clinical research studies has grown rapidly in the past decade, and as its application continues to expand, there is a need to establish recommendations for best practices and to discuss unresolved issues related to its use in clinical studies. Furthermore, with the recent introduction of a second-generation HR-pQCT device with improved resolution and longer scan region, it is important to highlight the differences and comparability between scanner generations, future HR-pQCT devices, and alternative systems, such as cone-beam CT.

Thus, the objective of this manuscript is to provide guidance on the technical aspects of HR-pQCT for use in the clinical setting, including best practices for acquisition and analysis of scans, as well as provide guidance on the presentation and interpretation of results. The work presented here is a product of a joint IOF-ASBMR-ECTS working group, which met in person and by teleconference over several years to produce this document. The final document was endorsed by the IOF, ASBMR, and ECTS.

Overview of HR-pQCT techniques

The following sections describe the steps involved in the conventional evaluation of bone mineral density and microarchitecture in vivo using HR-pQCT. Each section provides the best practices and considerations when conducting clinical studies. In most cases, the first-generation HR-pQCT (XtremeCT, Scanco Medical AG, Brüttisellen, Switzerland) is used as the standard for outlining each step, as it is widely used today. However, these practices are intended to be applicable broadly to other HR-pQCT scanners, namely the second-generation HR-pQCT (XtremeCT II, Scanco Medical AG, Brüttisellen, Switzerland), as well as future devices that may be developed by other manufacturers. The basic steps include image acquisition, image processing, image analysis, and reporting of results.

Imaging principles and radiation dose

HR-pQCT provides a method to non-invasively and three-dimensionally evaluate bone mineral density and microarchitecture at high resolution. The benefit of HR-pQCT over comparable micro-CT systems is the ability to obtain images of human extremities in vivo due to the relatively larger gantry size, at the cost of somewhat reduced resolution. The basic imaging principles are based on the interaction of ionizing radiation (i.e., X-rays) with matter. As an X-ray beam passes through matter, it is attenuated in relation to the density of the matter, with denser materials, such as bone, resulting in greater attenuation than low-density materials, such as soft tissue. In computed tomography, X-ray attenuation data are acquired at multiple projections around the specimen, which allows for a 3D image to be reconstructed [28]. The effective radiation dose from a standard HR-pQCT scan at the distal radius or tibia is 3–5 μSv depending on the scanner generation [29]. In comparison with other common medical imaging techniques, this is considered a low radiation dose procedure. For example, a hip scan using dual-energy X-ray absorptiometry (DXA), commonly used for monitoring osteoporosis, has an effective dose of approximately 9 μSv, a standard chest X-ray has an effective dose of approximately 100 μSv, and a hip CT scan has an approximate effective dose of 286–506 μSv [30, 31].

Image acquisition

Positioning and selection of scan region

Properly positioning the patient’s limb in the scanner and identifying the appropriate scan region are fundamental to all imaging protocols. The limb should be immobilized to minimize motion and properly positioned by using a padded, anatomically formed carbon fiber cast provided by the manufacturer. If not scanning bilaterally, the technician should scan the non-dominant limb, unless prior fracture, surgery, or metal is present, in which case the contralateral side should be scanned.

Once the limb is placed into the gantry of the scanner, a 2D scout view is obtained to select the region of interest for the 3D measurement (this process takes approximately 2–4 min per limb). There are then two approaches used to define the region where the scan is acquired: (1) fixed offset distance and (2) relative offset distance, also called the %-of-length offset. For the fixed offset method, the operator places a reference line at the inflection point on the endplate of the distal radius or tibial plafond, and the scan region begins 9.5 mm and 22.5 mm proximal to the reference line for the radius and tibia, respectively (Fig. 1). The scan region then extends proximally from this point by 9.02 mm (110 slices) for first-generation scanners and 10.20 mm (168 slices) for second-generation scanners [1, 32]. This approach has been implemented in early studies using the second-generation HR-pQCT [32, 33]; however, due to the increased scan region, an additional 1.0 mm is acquired proximally, a predominantly cortical region. As a result, the manufacturer recommendation for the second-generation HR-pQCT is to use a fixed offset of 9.0 mm and 22.0 mm proximal to the reference line for the radius and tibia, respectively [29]. This difference positions the scan region so that it aligns with the center of the first-generation scan region, extending an additional 0.5 mm in both the proximal and distal region.
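The fixed offset arithmetic above can be sketched as follows. This is a minimal illustration for the radius only; the class and function names are hypothetical, and the slice thickness is simply back-calculated from the scan lengths and slice counts quoted in the text rather than taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProtocol:
    """Fixed offset protocol; all distances are measured proximally from the reference line."""
    offset_mm: float       # reference line to the distal edge of the scan region
    n_slices: int          # number of slices in the stack
    scan_length_mm: float  # axial length covered by the stack


def scan_region(protocol):
    """Return (start, end, centre, slice_thickness) in mm for a fixed offset scan."""
    start = protocol.offset_mm
    end = start + protocol.scan_length_mm
    centre = (start + end) / 2.0
    thickness = protocol.scan_length_mm / protocol.n_slices
    return start, end, centre, thickness


# Radius protocols, using only the values quoted in the text.
RADIUS_GEN1 = ScanProtocol(offset_mm=9.5, n_slices=110, scan_length_mm=9.02)
RADIUS_GEN2_EARLY = ScanProtocol(offset_mm=9.5, n_slices=168, scan_length_mm=10.20)
RADIUS_GEN2_RECOMMENDED = ScanProtocol(offset_mm=9.0, n_slices=168, scan_length_mm=10.20)

if __name__ == "__main__":
    for name, protocol in [
        ("first generation", RADIUS_GEN1),
        ("second generation, early studies", RADIUS_GEN2_EARLY),
        ("second generation, recommended", RADIUS_GEN2_RECOMMENDED),
    ]:
        start, end, centre, thickness = scan_region(protocol)
        print(f"{name}: {start:.2f}-{end:.2f} mm, centre {centre:.2f} mm, "
              f"slice {thickness * 1000:.1f} um")
```

Comparing the computed centres shows why the 0.5 mm shift is recommended: with the 9.0 mm offset, the second-generation region is centred within about 0.1 mm of the first-generation region.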

Fig. 1

Scout view from a first-generation HR-pQCT showing reference line placement for the fixed offset distance (left) and relative offset distance (right) methods for the radius and tibia [36]. The edge of the radiocarpal joint surface of the distal radius and tibial plafond are marked in red. Scan regions represented here are approximate and illustrate a scenario where the fixed and relative offsets are aligned

In the %-of-length approach, the operator measures the limb length and then selects a certain percent of the limb length, depending on the specific protocol, as the center point for the scan region. In this approach, the reference line placement for the tibia is identical to that described above, but for the radius, the reference line is placed at the proximal margin of the radiocarpal joint surface (Fig. 1) [34, 35].

The limitation of a %-of-length measure is that it requires an accurate external physical limb length measurement prior to scanning, and it assumes that bone regions (epiphysis, metaphysis, and diaphysis) are proportional to limb length. The ulnar length is recommended as a surrogate for radial length because it is more reliable to identify the distal and proximal ends of the ulna. The ulnar length should be taken as the Euclidean distance between a horizontal surface on which the elbow is rested and the styloid process at the wrist. The tibial length should be taken as the Euclidean distance between the tibial plateau at the knee and the medial tibial malleolus at the ankle [36]. For a detailed description of measurement methods, we recommend the guidelines provided by Bonaretti and colleagues [36]. If using a %-of-length offset, for adults, we recommend the scan position be centered at 4.0% offset from the proximal margin of the radiocarpal joint surface of the distal radius, and 7.3% offset from the tibial plafond (Fig. 1). These relative offsets align, on average, with the standard fixed-length offset of the first-generation scanner. Centering the scan offset in this way yields equivalent variability in distal and proximal directions, minimizing bias introduced by differences in the relative length of the measured limb [37]. In longitudinal studies in adults, limb length should be measured at baseline, and the same length measurement used at follow-up visits in lieu of repeating the limb length measurement at each follow-up time point.
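As a rough sketch of the relative offset calculation, the scan centre can be derived from the measured limb length and the recommended percentages (4.0% for the radius, 7.3% for the tibia). The function names and the example tibial length of 380 mm are hypothetical and serve only to illustrate the arithmetic.

```python
RADIUS_PERCENT_OFFSET = 4.0  # % of ulnar length, as recommended in the text for adults
TIBIA_PERCENT_OFFSET = 7.3   # % of tibial length, as recommended in the text for adults


def relative_scan_centre_mm(limb_length_mm, percent_offset):
    """Distance from the reference line to the centre of the scan region."""
    if limb_length_mm <= 0:
        raise ValueError("limb length must be positive")
    return limb_length_mm * percent_offset / 100.0


def scan_bounds_mm(centre_mm, scan_length_mm):
    """Distal and proximal edges of a scan region centred at centre_mm."""
    half = scan_length_mm / 2.0
    return centre_mm - half, centre_mm + half


if __name__ == "__main__":
    tibia_length_mm = 380.0  # hypothetical baseline measurement
    centre = relative_scan_centre_mm(tibia_length_mm, TIBIA_PERCENT_OFFSET)
    distal, proximal = scan_bounds_mm(centre, 9.02)  # first-generation scan length
    print(f"centre {centre:.2f} mm, region {distal:.2f} to {proximal:.2f} mm")
```

In a longitudinal study, the baseline limb length would be stored and reused at follow-up so that the same centre is obtained at every visit.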

To date, most HR-pQCT studies in adults have employed the fixed offset distance approach, though this method is likely to confound the interpretation of results for studies that compare groups of differing heights (and therefore limb length), such as comparisons by sex and race/ethnicity, and even age, due to secular trends in limb length [37,38,39,40]. This is because a fixed distance offset from an anatomic reference results in measurements that are acquired at a relatively “too” distal location in individuals with long limbs and relatively “too” proximal position in individuals with short limbs. These differences cannot necessarily be made equivalent by statistical adjustment for height or limb length [37]. However, while the bone microarchitecture in the metaphyseal region is highly variable as a function of limb length [38], the difference in scan location between a relative (%-of-length) and a fixed offset approach is generally small. Nevertheless, an individual with a stature that is different from the population mean will have some measurement bias exceeding typical motion-induced errors due to positioning from a fixed offset [37, 41].

Another form of measurement error that arises from reference line placement is operator-driven variability. Bonaretti and colleagues reported that intra-operator variability in scan positioning corresponds to up to 4% of the total scan length, while inter-operator positioning variability is up to 7.5% of the total scan length [42]. This translates to significant measurement precision error, often exceeding the precision error attributable to subject motion. Standardized training for scan positioning, described by Bonaretti and colleagues, can reduce operator positioning error by approximately half [42].

In addition to the conventional scan region, it is possible to acquire scans more proximal to the standard position in order to investigate predominantly cortical bone, either as a proximal subset of slices extracted from a standard distal scan [43], an additional contiguous scan [44], or an independent scan proximal to the standard position [26, 45, 46]. There is some evidence that more proximal locations along the radius and tibia provide greater sensitivity to cortical bone changes. The second-generation HR-pQCT allows greater access to proximal scan positions at the forearm and tibia, and it is now possible to develop diaphyseal scan protocols centered at 30% of the ulnar/tibial length using the standard leg and forearm casts, and up to 66% of ulnar/tibial length using custom casts. Acquisition of scans at these regions allows for additional measurement of cortical bone, muscle, and fat, analogous to lower resolution pQCT [47, 48]; however, development of standardized scan acquisition and analysis protocols is ongoing.

In summary, it is recommended that research centers transition to a standardized protocol in future study designs, using the %-of-length offset scan region described above. However, given the historical use of a fixed offset scan region, there exist valuable cross-sectional and longitudinal datasets worldwide based upon this scan protocol. Consequently, the use of the fixed offset is an acceptable alternative, should studies need to compare results to historical databases. Due to differences across research centers, it is important to report in all publications how the scan region was selected, including choice of reference line placement, type of offset (fixed vs. relative), distance of offset, and total scan length.

Scan acquisition and motion artifacts

Scan acquisition varies depending on the in vivo measurement protocol, which is scanner and study specific. The standard protocol and scanner specifications for the commonly used HR-pQCT scanners, XtremeCT and XtremeCT II, are summarized in Table 1.

Table 1 Technical parameters of first-generation (XtremeCT) and second-generation (XtremeCT II) HR-pQCT standard in vivo scan protocol. Technical specifications adapted from Manske et al. [32]

Scanned images should be inspected visually for motion artifacts, especially at the distal radius due to higher prevalence of motion at this location. Subject motion during image acquisition can lead to degraded image quality and introduce error, particularly for trabecular and cortical microarchitecture [41, 49, 50]. To determine what degree of motion is acceptable, several grading scales have been developed, the most commonly used of which is a 5-level motion grading scale (best score is 1, worst score is 5) that aims to differentiate motion by the severity of artifacts, summarized in Fig. 2 [41]. However, even with a standardized scoring system, motion scoring remains subjective, and operator agreement has been shown to remain only moderate, even with intensive training [49,50,51]. Automatic techniques analyzing CT projections have been suggested to provide a standardized approach [50]; however, these procedures have not yet been integrated into HR-pQCT protocols. Until automated techniques are integrated, the 5-level grading scale is recommended, with scoring done consistently by the same operator where possible. At the time of scanning, if motion artifacts with a score of three or more are observed, then it is recommended the scan be repeated.

Fig. 2

Motion grading guideline as recommended by the manufacturer and presented by Sode et al. [41], with visual examples provided here for second-generation HR-pQCT radius (top row) and tibia (bottom row) scans

In general, density-based measures are less sensitive to motion artifacts than structure-based measures. It is acceptable to include all outcome variables from scans with a motion score of three or less, as the precision error is not substantially compromised for density (< 1% error), microarchitecture (< 5% error), and biomechanical parameters, such as estimated failure load (< 4% error) [50]. Scans with a motion score of four may be acceptable for density-based measurements, but should not be used for trabecular and cortical microarchitecture or biomechanical assessment. Scans with a motion artifact score of 5 should not be used.
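The inclusion rules above can be written as a small lookup. This is only an illustrative sketch with hypothetical names; in particular, grade 4 scans are described in the text as ones that "may be acceptable" for density, so the mapping below encodes the most permissive reading.

```python
ALL_OUTCOMES = frozenset({"density", "microarchitecture", "biomechanics"})


def usable_outcomes(motion_grade):
    """Outcome families that may be reported for a scan with the given motion grade (1 to 5)."""
    if motion_grade not in (1, 2, 3, 4, 5):
        raise ValueError("motion grade must be an integer from 1 to 5")
    if motion_grade <= 3:
        # Precision error is not substantially compromised for any outcome family.
        return set(ALL_OUTCOMES)
    if motion_grade == 4:
        # Permissive reading: density only, never microarchitecture or biomechanics.
        return {"density"}
    # Grade 5 scans should not be used.
    return set()


if __name__ == "__main__":
    for grade in range(1, 6):
        print(grade, sorted(usable_outcomes(grade)))
```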

Image processing

Segmentation of the cortical and trabecular bone compartments is necessary for density and structural analysis. Over the past decades, automated and semi-automated methods have been developed to improve the accuracy and repeatability with which the cortical and trabecular compartments are extracted.

Standard vs. extended cortical analysis

In the first-generation HR-pQCT standard analysis, the operator guides a semi-automated slice-by-slice contouring process to identify the periosteal boundary of the bone, thereby extracting the bone region from surrounding soft tissue (Fig. 3a). The bone region is extracted using a Laplace-Hamming filter and global threshold cut-off of 400 permille (‰) to generate the segmented bone volume to be used for the subsequent morphological analysis (Fig. 3b). Delineation of the cortical and trabecular compartments is done automatically using a filter and threshold-based algorithm, where the original greyscale bone region is smoothed using a high-Gaussian weighted filter, then a cut-off of 160 permille (‰) is used to identify voxels that belong to cortical bone [52]. However, this method is insufficient for extraction of the cortex when it is thin and/or highly porous (Fig. 3c) [53, 54] or when the trabecular structure is rich and well connected to the cortex. Thus, an alternate algorithm, the so-called “extended cortical analysis”, which uses a dual-threshold segmentation technique, has been incorporated to provide a more robust extraction of the cortical and trabecular compartments [55, 56]. It involves a two-step algorithm to automatically identify the periosteal and then the endocortical surface. In the first step, a connectivity filter is applied to the previously generated segmented bone volume to create a mask of the whole bone region. In the second step, a dilation-erosion operation is applied to the background (i.e., marrow cavity) of the original segmented bone volume to remove trabeculae, thus leaving a mask of the cortical compartment (Fig. 3d) [55]. Once the compartments are defined, the extended cortical analysis allows for the assessment of cortical porosity and cortical tissue mineral density, as well as a direct measure of cortical thickness, described in detail in the later sections. With the deployment of the second-generation HR-pQCT scanner, the dual-threshold technique is the default segmentation method, and the image filtration and segmentation thresholds have been adjusted to account for the increased resolution, summarized in Table 1.

Fig. 3

Example of distal radius scan from a first-generation HR-pQCT and outputs from image processing protocol, where a is the greyscale slice-wise view with the periosteal contour (green) identified, b segmented whole bone volume, c cortical bone region extracted using the standard analysis protocol, and d cortical bone region extracted using the dual-threshold technique

Manual correction

Although the dual-threshold technique improves the segmentation of the cortical and trabecular compartments, errors can persist. It is imperative that operators check the periosteal and endocortical contours visually for errors and apply manual corrections as necessary. Omitting corrections of the automatically generated contours leads to greater accuracy errors arising in very low- or high-density bone, resulting in a systematic bias with certain parameters, such as cortical density and thickness [57]. This has the potential to skew the results of studies that investigate osteoporotic or highly active populations; therefore, we advise that contours be checked and manually corrected [57].

Drawbacks to manual corrections include the increased time spent visually inspecting images and the potential for increased precision error that arises when manual corrections are applied [58, 59]. In particular, the endocortical contour can be highly subjective in low-density bone, and human interpretation by one or more operators introduces precision error. Nevertheless, inter-operator variability introduces less error than the accuracy error resulting from uncorrected contours [59], and short-term precision between using corrected versus uncorrected contours is comparable [58]. If manual corrections are applied, it is essential the operator has a suitable level of training to minimize precision error [57].

Image analysis

The defined cortical and trabecular compartments are the basis for measuring density and microarchitecture properties. Tables 2 and 3 provide a summary of common parameters, units, and methods of measurement from HR-pQCT modalities as have been previously described in the literature [1, 32].

Table 2 Definitions, units, and methods of measurement of common whole bone and cortical microarchitecture parameters obtained from available HR-pQCT systems. Italicized parameters are the minimum set to be reported when describing cortical bone morphology
Table 3 Definitions, units, and methods of measurement of common trabecular microarchitecture parameters obtained from available HR-pQCT systems. Italicized parameters are the minimum set to be reported when describing trabecular bone morphology

Trabecular microarchitecture

Trabecular microarchitecture is assessed from the segmented image; however, the spatial resolution of the first-generation HR-pQCT limits the ability to measure trabecular thickness, and as a result, some of the trabecular morphology measures are derived using 2D stereologic methods rather than directly measured. Specifically, trabecular bone volume fraction (Tb.BV/TV) is computed as the ratio of the trabecular bone mineral density (Tb.BMD) to 1200 mg HA/cm³, which is assumed to be the density of fully mineralized bone (Table 3). Trabecular number (Tb.N), by contrast, is directly measured using a ridge extraction technique [52]. Trabecular separation (Tb.Sp) and trabecular thickness (Tb.Th) are then derived from these two parameters assuming a plate-like morphology, as described in Table 3 [60].
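The derived first-generation measures can be sketched as below. Table 3 is not reproduced here, so the expressions for Tb.Th and Tb.Sp are the standard plate-model relations (Tb.Th = BV/TV ÷ Tb.N and Tb.Sp = (1 − BV/TV) ÷ Tb.N) and are assumed to match it; the function name and the example input values are hypothetical.

```python
FULLY_MINERALIZED_DENSITY = 1200.0  # mg HA/cm^3, assumed density of fully mineralized bone


def derived_trabecular_parameters(tb_bmd_mg_ha_cm3, tb_n_per_mm):
    """Derive Tb.BV/TV, Tb.Th (mm), and Tb.Sp (mm) from Tb.BMD and Tb.N.

    Assumes a plate-like trabecular morphology (standard plate-model relations).
    """
    if tb_n_per_mm <= 0:
        raise ValueError("Tb.N must be positive")
    bv_tv = tb_bmd_mg_ha_cm3 / FULLY_MINERALIZED_DENSITY
    tb_th = bv_tv / tb_n_per_mm            # mean plate thickness
    tb_sp = (1.0 - bv_tv) / tb_n_per_mm    # mean spacing between plates
    return {"Tb.BV/TV": bv_tv, "Tb.Th": tb_th, "Tb.Sp": tb_sp}


if __name__ == "__main__":
    # Hypothetical inputs: Tb.BMD of 180 mg HA/cm^3 and Tb.N of 2.0 per mm.
    for name, value in derived_trabecular_parameters(180.0, 2.0).items():
        print(f"{name} = {value:.3f}")
```

Because Tb.Th and Tb.Sp are computed from Tb.BMD and Tb.N, any error in either input propagates directly into both derived values.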

In contrast, the second-generation HR-pQCT has sufficient spatial resolution to “directly” measure Tb.BV/TV, Tb.Th, and Tb.Sp. Accordingly, Tb.BV/TV is defined as the ratio of voxels in the mineralized bone phase to the total number of voxels in the trabecular compartment. Tb.Sp and Tb.Th are measured using the distance transformation method, where 3D distances are estimated by fitting maximal spheres inside the structure (or void space in the case of Tb.Sp) and taking the average sphere diameter as the mean thickness [61].
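The direct, voxel-counting definition of Tb.BV/TV can be illustrated with a short sketch. The nested-list image layout (indexed as `[z][y][x]`) and the function name are assumptions made for this example; the sphere-fitting measurement of Tb.Th and Tb.Sp is not reproduced here.

```python
def bone_volume_fraction(segmented, compartment_mask):
    """Fraction of voxels inside the compartment mask that are segmented as bone.

    Both arguments are nested lists indexed as [z][y][x]; truthy values mark
    bone voxels (segmented) or voxels inside the trabecular compartment (mask).
    """
    bone_count = 0
    total_count = 0
    for seg_slice, mask_slice in zip(segmented, compartment_mask):
        for seg_row, mask_row in zip(seg_slice, mask_slice):
            for is_bone, in_compartment in zip(seg_row, mask_row):
                if in_compartment:
                    total_count += 1
                    if is_bone:
                        bone_count += 1
    if total_count == 0:
        raise ValueError("compartment mask is empty")
    return bone_count / total_count


if __name__ == "__main__":
    # Toy 1 x 2 x 4 volume: the mask covers six voxels, two of which are bone.
    segmented = [[[1, 0, 0, 1], [1, 0, 1, 0]]]
    mask = [[[0, 1, 1, 1], [0, 1, 1, 1]]]
    print(bone_volume_fraction(segmented, mask))
```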

Beyond the common morphometric indices, further parameters have been developed to estimate the level of anisotropy in trabecular microarchitecture from HR-pQCT images. These non-metric indices include the structure model index (SMI) [60], connectivity density (Conn.D) [62], and mean intercept length (MIL) [63], outlined in Table 3. Another non-standard analysis includes individual trabecular bone segmentation (ITS)–based morphological analysis [64]. This approach performs a complete volumetric decomposition of individual trabecular plates and rods to characterize bone morphology and orientation of trabecular bone. These non-standard analyses are susceptible to resolution effects and demand cautious interpretation.

Cortical microarchitecture

Primary cortical microarchitecture parameters include the cortical thickness (Ct.Th) and cortical porosity (Ct.Po). The standard analysis for first-generation HR-pQCT derives Ct.Th from the mean cortical volume divided by the periosteal surface area. However, in the extended cortical analysis, the cortical thickness is directly measured using distance transformation methods once the cortical compartment has been extracted using the dual-threshold segmentation technique. Measurement of Ct.Po is limited by the spatial resolution of the HR-pQCT images, as Haversian canals can range in size from 30 to 350 μm. However, it is estimated that pores smaller than 90 μm contribute to only 5–8% of the total pore volume [65]. Measurement of Ct.Po is improved with the second-generation of HR-pQCT due to increased resolution [33].

Two methods currently exist to assess Ct.Po: (1) the threshold-based approach [55, 56] implemented in the XtremeCT analysis software provided by Scanco and (2) the density-based approach implemented in the StrAx1.0 software [43]. The threshold-based approach is a fully automated segmentation and contouring approach integrated with the extended cortical analysis method. Briefly, intra-cortical porosity (Ct.Po) is calculated as the ratio of the total pore volume within the cortical compartment to the sum of the pore volume and the mineralized cortical bone volume [56]. This method has the advantage of segmenting the individual pores from the scan, which allows for structural information to be measured, including mean cortical pore diameter (Ct.Po.Dm); however, it is limited to capturing only pores within the limits of resolution and pores that do not intersect the periosteal or endocortical surfaces.
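A minimal sketch of the threshold-based calculation is given below. It assumes that Ct.Po is the segmented pore volume divided by the sum of the pore volume and the mineralized cortical bone volume, and that pores and bone have already been labelled voxel by voxel; the function names and the example counts are hypothetical.

```python
def voxel_volume_mm3(voxel_size_mm):
    """Volume of one isotropic voxel."""
    return voxel_size_mm ** 3


def cortical_porosity(pore_voxels, bone_voxels):
    """Threshold-based Ct.Po as a fraction of the cortical compartment.

    pore_voxels: voxels labelled as resolved intracortical pores
    bone_voxels: voxels labelled as mineralized cortical bone
    """
    if pore_voxels < 0 or bone_voxels < 0:
        raise ValueError("voxel counts must be non-negative")
    total = pore_voxels + bone_voxels
    if total == 0:
        raise ValueError("cortical compartment is empty")
    return pore_voxels / total


if __name__ == "__main__":
    # Hypothetical counts for a first-generation scan (0.082 mm voxels).
    pores, bone = 5_000, 95_000
    print(f"Ct.Po = {100 * cortical_porosity(pores, bone):.1f}%")
    print(f"pore volume = {pores * voxel_volume_mm3(0.082):.2f} mm^3")
```

Because only pores large enough to be resolved are labelled, the pore count (and therefore Ct.Po) is a lower bound on the true porosity.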

Alternatively, the density-based approach segments bone into a compact-appearing cortex, transitional zone, and trabecular compartment, described in detail elsewhere by Zebaze and colleagues [43]. In brief, Ct.Po is quantified in each of these regions by assuming that fully mineralized bone has a density between 1000 and 1200 mg HA/cm³. Voxels with a density lower than 1000 mg HA/cm³ indicate that some fraction of the voxel, determined by its density, is composed of void space (i.e., pores). Porosity is quantified by estimating the fraction of void space present in each voxel and taking the mean across all voxels in the compartment of interest [43]. The density-based method of assessing Ct.Po aims to capture pores with diameters below the spatial resolution of the scanner; however, it relies on the assumption of a fixed bone tissue mineral density and is susceptible to image noise and beam hardening.
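The idea behind the density-based estimate can be illustrated with a deliberately simplified sketch. It assumes a linear mapping in which a voxel at or above 1000 mg HA/cm³ contains no void and a voxel at 0 mg HA/cm³ is entirely void; this linear rule is an assumption made for illustration and is not the StrAx1.0 algorithm, whose details are given by Zebaze and colleagues [43].

```python
FULLY_MINERALIZED_THRESHOLD = 1000.0  # mg HA/cm^3, lower bound quoted in the text


def voxel_void_fraction(density_mg_ha_cm3):
    """Assumed linear void fraction of a single voxel, clipped to [0, 1]."""
    fraction = 1.0 - density_mg_ha_cm3 / FULLY_MINERALIZED_THRESHOLD
    return min(1.0, max(0.0, fraction))


def density_based_porosity(voxel_densities):
    """Mean void fraction across all voxels in the compartment of interest."""
    densities = list(voxel_densities)
    if not densities:
        raise ValueError("no voxels supplied")
    return sum(voxel_void_fraction(d) for d in densities) / len(densities)


if __name__ == "__main__":
    # Hypothetical voxel densities in mg HA/cm^3.
    print(density_based_porosity([1100.0, 1000.0, 500.0, 0.0]))
```

Even in this simplified form, the sensitivity to noise is apparent: any noise that lowers a voxel's apparent density below the threshold is counted as void space.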

A comparison of methods for in vivo assessment of Ct.Po revealed that both approaches are well correlated with gold standard porosity measurements from synchrotron radiation micro-computed tomography (SRμCT); however, each method has inherent systematic errors. In compact-appearing cortical bone specimens, the threshold-based approach underestimates Ct.Po by 3 to 11% as it does not capture smaller pores, whereas the density-based approach overestimates Ct.Po by 6 to 21% due to misclassifying image noise and artifacts as void space [65]. The threshold-based approach has improved accuracy in Ct.Po measurements with the second-generation HR-pQCT as this approach is resolution-dependent, whereas the density-based approach would not be impacted as it is not resolution-dependent. Both methods are widely adopted measures of Ct.Po, but due to methodologic differences, they cannot be directly compared and study methodologies should clearly state which method is used.

Finite element analysis

Morphometric parameters are numerous and provide valuable insight describing the structure of a bone, but the interpretation of these data can be complex. Finite element (FE) analysis is a computer modeling technique that, when coupled with HR-pQCT, provides a non-invasive approach to intrinsically account for the complex bone structure and estimate bone strength [66]. The fundamental requirements for an FE model are selecting the bone geometry, assigning material properties, and defining boundary conditions to simulate loading conditions, typically determined through validation studies using experimental loading tests of cadaveric bones [67,68,69]. The application of FE procedures to HR-pQCT images has been explored in depth and is often referred to as micro-FE (μFE). Most HR-pQCT-based μFE models to date have defined constitutive properties that are linear, with uniform elastic material properties. However, non-linear models and density-based elastic material properties have also been implemented [67, 70,71,72]. Here, we outline key considerations and recommendations when choosing a μFE procedure, with a specific focus on linear, homogeneous models for standard HR-pQCT scan regions.

Model generation and boundary conditions

Geometry is determined from the segmented HR-pQCT image by converting it into an FE mesh, most commonly using a voxel-by-voxel conversion where each voxel is converted into a cubic hexahedral finite element [73]. First-generation HR-pQCT images typically result in FE models with 1–9 million elements [66] and second-generation in the range of 2–24 million elements, depending on the scan site and bone volume, resulting in a direct representation of the bone microarchitecture (Fig. 4).
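The voxel-by-voxel conversion can be sketched in a few lines. The nested-list image layout (`[z][y][x]`), the node ordering within each element, and the function name are assumptions for this illustration; production meshing tools work on much larger arrays and use their own conventions.

```python
# Corner offsets (dx, dy, dz) of one hexahedral element: bottom face, then top face.
HEX_CORNERS = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
]


def voxels_to_hex_mesh(voxels):
    """Convert a segmented image into nodes and 8-node hexahedral elements.

    voxels is a nested list indexed as [z][y][x]; truthy entries are bone.
    Returns (nodes, elements): nodes is a list of (x, y, z) grid coordinates,
    and each element is a tuple of eight indices into that list.
    """
    node_ids = {}
    elements = []
    for z, image_slice in enumerate(voxels):
        for y, row in enumerate(image_slice):
            for x, is_bone in enumerate(row):
                if not is_bone:
                    continue
                connectivity = []
                for dx, dy, dz in HEX_CORNERS:
                    corner = (x + dx, y + dy, z + dz)
                    if corner not in node_ids:
                        node_ids[corner] = len(node_ids)  # nodes shared between neighbours
                    connectivity.append(node_ids[corner])
                elements.append(tuple(connectivity))
    nodes = sorted(node_ids, key=node_ids.get)
    return nodes, elements


if __name__ == "__main__":
    nodes, elements = voxels_to_hex_mesh([[[1, 1, 0]]])  # two adjacent bone voxels
    print(len(elements), "elements,", len(nodes), "nodes,", 3 * len(nodes), "degrees of freedom")
```

Multiplying node coordinates by the voxel size gives physical positions in mm. With three translational degrees of freedom per node, models of several million elements quickly reach tens of millions of degrees of freedom.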

Fig. 4

Example of a segmented second-generation HR-pQCT tibia scan converted into an FE mesh using a voxel-by-voxel conversion approach. The model is composed of millions of hexahedral elements

Boundary conditions are selected to simulate a loading condition. Standard HR-pQCT scans are best suited for simulating a compression test along the z-axis (longitudinal axis), defined perpendicular to the scan cross-section. One end is assigned a fixed constraint in the z-direction and a fixed displacement is applied to the opposite end, resulting in 1% apparent strain. If the fixed constraint surface includes conditions that prevent lateral expansion during compression, the test is referred to as axial compression; otherwise, it is termed uniaxial compression. The results of a compression test will differ slightly depending on whether axial or uniaxial boundary conditions are applied, but they are strongly linearly associated (r² > 0.99), and methods have been developed to allow harmonization by applying a linear correction factor [74].
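As a small worked example of the compression test set-up, the prescribed displacement follows from the scan height and the 1% apparent strain, and stiffness is the resulting reaction force divided by that displacement. The function names and the reaction force of 4510 N are hypothetical; in practice the reaction force is an output of the μFE solver.

```python
APPARENT_STRAIN = 0.01  # 1% compressive strain applied along the z-axis


def prescribed_displacement_mm(scan_height_mm, apparent_strain=APPARENT_STRAIN):
    """Displacement applied to the loaded surface to reach the apparent strain."""
    if scan_height_mm <= 0:
        raise ValueError("scan height must be positive")
    return apparent_strain * scan_height_mm


def stiffness_kn_per_mm(reaction_force_n, displacement_mm):
    """Whole-model stiffness from the reaction force at the prescribed displacement."""
    if displacement_mm <= 0:
        raise ValueError("displacement must be positive")
    return reaction_force_n / displacement_mm / 1000.0  # N/mm converted to kN/mm


if __name__ == "__main__":
    displacement = prescribed_displacement_mm(9.02)  # first-generation scan height
    stiffness = stiffness_kn_per_mm(4510.0, displacement)  # hypothetical reaction force
    print(f"displacement {displacement:.4f} mm, stiffness {stiffness:.1f} kN/mm")
```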

Material properties must also be defined for the model, including Poisson’s ratio and the elastic modulus (also called the tissue modulus or Young’s modulus). A Poisson’s ratio of 0.3 is conventionally used for all μFE models. However, selection of the elastic modulus requires closer consideration as it ultimately defines the stiffness of the material under tension or compression. The most common approach is to assign a uniform elastic modulus, previously determined through direct comparison to experimental loading tests, to all bone tissue in the model. Common elastic modulus values (and their respective boundary conditions) for first- and second-generation HR-pQCT scanners are summarized in Table 4. The modulus utilized for a given scanner generation will not yield the same results if applied to another generation, as the elastic modulus is dependent on both the mesh resolution and segmentation protocol, which impact the bone structure extracted from the image. Hence, for second-generation HR-pQCT, an elastic modulus of 8748 MPa has been back-calculated from the modulus validated for the first-generation HR-pQCT [74], and 10,000 MPa has been determined from direct comparison to experimental loading tests [68].

Table 4 Summary of elastic modulus, boundary conditions, and common yield criterion for first- and second-generation HR-pQCT. All models applied a 1% compressive strain in the axial direction and used a Poisson’s ratio of 0.3. Unless specified, the proposed yield criterion and associated constitutive properties were compared with mechanical compressive loading tests; however, loading configurations relative to the scan region vary across studies [67,68,69, 74, 76, 77, 111]

FE outcomes

Table 5 provides a summary of common outcomes from μFE models, with proposed nomenclature and units. The most relevant primary outcomes include stiffness [kN/mm] and yield load, commonly referred to as the estimated failure load [kN]. A linear, elastic FE model intrinsically cannot directly measure failure load; however, non-linear approaches that are more appropriate for strength measurement are computationally demanding and therefore not often employed in high-resolution models [67, 75]. Instead, a failure criterion often used with linear FE models assumes bone yields when a specified volume of bone tissue (critical volume) exceeds a specified critical strain. This is often referred to as the “Pistoia criterion”, and a typical set of yield parameters used assumes failure load can be estimated when the critical strain of 0.7% is exceeded for 2% for the bone tissue [76]; however, a complete table of commonly used yield parameters are provided in Table 4. Use of the Pistoia criterion for HR-pQCT-derived μFE models has led to strong correlations (r2 = 0.73–0.95) with experimentally measured whole bone strength for the first-generation HR-pQCT [67, 77, 78], although some have suggested a 7.5% critical volume and 0.7% critical strain are more appropriate [77]. For the second-generation HR-pQCT, where scan volume and resolution differ, appropriate criterion needs to be adjusted [74]. A recent study by Arias-Moreno and colleagues determined that the appropriate failure criterion for the second-generation HR-pQCT should be a critical strain of 1.0% and critical volume of 5% under axial loading conditions [68]. Regardless of the specific parameters chosen, these variations of criteria provide highly comparable correlations with experimental bone strength and have been shown to be suitable predictors for failure load, but absolute values obtained using different yield parameters should not be directly compared. 
μFE-estimated failure load at the distal tibia scan site has also been shown to be well correlated to bone strength at the femoral neck and vertebra [79].
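As a minimal sketch of how a critical-volume and critical-strain criterion can be applied to a linear model, the snippet below scales the reaction force of a linear solution until the specified fraction of elements just reaches the critical strain. It assumes equal-volume voxel elements and a scalar strain value per element already exported from the solver; the function name, argument names, and defaults (0.7% and 2%, as cited above) are illustrative and do not represent the manufacturer's implementation.

```python
def estimate_failure_load(element_strains, reaction_force,
                          critical_strain=0.007, critical_volume=0.02):
    """Scale a linear FE result to the load at which `critical_volume`
    of the elements reach `critical_strain` (illustrative sketch)."""
    if not element_strains:
        raise ValueError("no element strains supplied")
    ordered = sorted(element_strains, reverse=True)  # highest strain first
    # Number of elements that make up the critical volume
    # (assumes every voxel element has the same volume).
    n_critical = max(1, round(critical_volume * len(ordered)))
    # Strain of the least-strained element inside that critical volume.
    strain_at_limit = ordered[n_critical - 1]
    if strain_at_limit <= 0:
        raise ValueError("strain at the critical volume must be positive")
    # In a linear model, strains scale proportionally with the applied load.
    scale = critical_strain / strain_at_limit
    return reaction_force * scale
```

Because the scale factor depends directly on the chosen critical strain and critical volume, values computed with different parameter sets are not interchangeable, which is consistent with the caution above about comparing absolute values.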

Table 5 Definitions, units, and measurement methods of common micro-finite element analysis outcomes obtained from available HR-pQCT systems

A common misconception is that μFE models predict failure load in a fall condition; however, boundary conditions in that scenario are prohibitively difficult to estimate with an enormous range of loading possibilities. Instead, standardized axial or uniaxial μFE tests of bone strength provide a reproducible approach that gives insight into the effect of bone microarchitecture on estimated bone strength. In addition to the failure load, there are several other μFE model outputs that provide additional insight into the biomechanical properties of bone; details are provided in Table 5.

In summary, for first-generation HR-pQCT, the choices of boundary conditions, elastic modulus, and yield criterion have varied among studies, but harmonization techniques are available to compare results using the different configurations summarized in Table 4 [74]. For the second-generation HR-pQCT, we recommend using an elastic modulus of 10,000 MPa with axial boundary conditions, and a yield criterion of 1.0% critical strain and 5% critical volume. However, if this is not feasible, harmonization methods like those used for the first-generation HR-pQCT should be developed and used to compare results across studies [74]. As future systems become available, it will be necessary to determine the appropriate elastic modulus and yield criterion through back-calculation or direct validation for that particular system, ideally so that comparisons across HR-pQCT technologies are possible.

FE solvers

Due to the resolution of the images, μFE models tend to be very large, on the order of tens of millions of degrees of freedom, and thus, the models pose a challenge for traditional commercial FE solvers. Fortunately, this problem has been overcome by the use of specialized solvers that are designed to leverage the fact that each element in the model (i.e., voxel) has the same hexahedral shape [80]. Scanco Medical provides built-in linear elastic FE software for the XtremeCT and XtremeCT II scanners, with libraries of pre-defined tests that may be applied. Specialized third-party FE solvers are also available that allow for faster computation time and flexibility, including commercial software (e.g., Numerics88 Solutions) as well as open-source versions (e.g., ParFE).

Assumptions

It is important to be aware that μFE carries assumptions about tissue behavior. For instance, in μFE models, bone is assumed to be isotropic and behave purely linearly, both of which are only approximations of the true case. If applied to atypical bone (e.g., osteogenesis imperfecta, fracture healing, or altered tissue-level properties), these assumptions could lead to erroneous conclusions about bone strength, and care must be taken in designing studies and interpreting μFE results in these cases.

Image registration for longitudinal studies

A growing number of studies are producing longitudinal data to evaluate the effects of aging [10, 81] as well as pharmacologic [19, 82] or exercise interventions [25]. Small differences in participant positioning in follow-up scans can result in variations of axial position and in the degree of tilt, illustrated in Fig. 5, and this reduces the ability to detect longitudinal change, particularly for parameters that vary considerably along the length of the bone (e.g., cortical geometry). Despite procedures to ensure that positioning is standardized as much as possible during each image acquisition, small variations in the scan region are inevitable. Consequently, reproducibility has been shown to be suboptimal with precision errors for trabecular and cortical parameters of up to 7.0% and 20.3% RMS-CV% respectively in the radius [83]. Registration techniques reduce positioning errors and are necessary in longitudinal studies to ensure the same region of the image is analyzed at each time point.

Fig. 5

Examples of axial (a) and angular (b) offsets between scan regions captured at baseline (blue) and follow-up (red)

2D registration

A common 2D registration technique uses the total cross-sectional area of each slice to determine an optimal offset between paired (baseline and follow-up) scans resulting in a common volume of interest with a subset of the original slices acquired [56]. This 2D technique is implemented in the Scanco software. It is capable of correcting for discrepancies in an axial direction, but cannot take into consideration the variation in tilt of the limb (Fig. 5). Notably, the 2D registration technique precludes evaluation of periosteal bone apposition and thus limits the interpretation of studies where a change in bone size is anticipated, such as growth, exercise, or perhaps aging over an extended observation.
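The idea behind area-based slice matching can be sketched as follows. This is an illustrative approximation, not the Scanco implementation: it assumes the total cross-sectional area of every slice has already been measured for both scans, and it searches for the axial shift that minimizes the mean squared difference between the two area profiles. The function name and the `max_shift` and `min_common` parameters are hypothetical.

```python
def register_slices_by_area(area_baseline, area_followup,
                            max_shift=30, min_common=10):
    """Return (shift, n_common): follow-up slice j is matched to baseline
    slice j + shift, and n_common slices are shared by both scans."""
    best_shift, best_cost, best_common = None, float("inf"), 0
    n_base, n_follow = len(area_baseline), len(area_followup)
    for shift in range(-max_shift, max_shift + 1):
        start = max(0, shift)                 # first baseline slice in the overlap
        stop = min(n_base, n_follow + shift)  # one past the last baseline slice
        common = stop - start
        if common < min_common:
            continue  # too few shared slices to trust the comparison
        cost = sum((area_baseline[i] - area_followup[i - shift]) ** 2
                   for i in range(start, stop)) / common
        if cost < best_cost:
            best_shift, best_cost, best_common = shift, cost, common
    if best_shift is None:
        raise ValueError("no shift leaves enough common slices")
    return best_shift, best_common
```

The ratio of `n_common` to the number of acquired slices gives the overlap of the common volume of interest. Because the match relies on the area profile itself, a true change in bone size between visits would be absorbed into the offset, which is why this approach limits the study of periosteal apposition.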

3D registration

Methods using 3D rigid registration account for both positional (axial) and angular differences in images by determining the necessary 3D transform (translational and rotational) between two images. The method is based on optimally overlapping all image data within a volume of interest (e.g., within the periosteal contours) and so noise or structures outside the bone of interest do not impact registration. It is somewhat more complex to implement because it requires an optimization process, typically including a pre-alignment (e.g., based on the centers of mass of the two images) followed by an optimizer-guided alignment based on a metric of best overlap (e.g., mutual information), ultimately resulting in a 3D transform matrix [84]. Strategies to maximize efficiency and minimize risk of errors include using downscaled images for initial alignment, and then increasing to full resolution to refine the alignment. Nevertheless, as with any automated process, it can result in errors and therefore it is imperative to visually assess the degree of overlap to ensure a reasonable solution has been achieved. The resulting 3D transform is used to identify a common region between multiple scans in a series, and the masks for that region are transformed, rather than the images themselves, to prevent image degradation associated with interpolation [84]. Currently, the 3D registration procedures are not part of the standard Scanco software.
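The pre-alignment step can be illustrated with a short sketch. It assumes each scan is represented by the (x, y, z) coordinates of the voxels inside its periosteal mask, and it returns only the translation that brings the two centers of mass together; the rotational component and the optimizer-guided refinement described above are not shown. Function names are hypothetical.

```python
def center_of_mass(voxels):
    """Mean (x, y, z) position of a non-empty collection of voxel coordinates."""
    n = len(voxels)
    if n == 0:
        raise ValueError("mask contains no voxels")
    return tuple(sum(v[axis] for v in voxels) / n for axis in range(3))


def initial_translation(baseline_voxels, followup_voxels):
    """Translation that moves the follow-up center of mass onto the baseline one."""
    cb = center_of_mass(baseline_voxels)
    cf = center_of_mass(followup_voxels)
    return tuple(b - f for b, f in zip(cb, cf))
```

A translation of this kind only provides a starting point for the optimizer; visual inspection of the final overlap remains necessary.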

While 3D registration provides excellent reproducibility for morphological parameters, it is unfortunately not as effective for μFE analysis. The application of boundary conditions to non-parallel surfaces resulting from common regions of interest from 3D registration in μFE models is not trivial, and currently, there is ongoing research in this area. To date, reproducibility of μFE outputs has not been found to be greater with 3D than 2D registration; therefore, use of 3D registration is not yet recommended for μFE [84]. Instead, the recommended approach is to apply μFE to 2D registered data, but only relative changes should be compared in this context and not absolute outcomes, as a shorter segment of bone will yield different results from a full-height image. If absolute outcomes are to be compared, unregistered data should be used for μFE analyses and comparisons.

Additional considerations for image registration

Typically, in longitudinal studies, the periosteal and endocortical surfaces are defined independently between baseline and follow-up scans. However, in some cases, it may be preferable to compare changes in a fixed region over time, such as for detecting trabecularization of the cortical region. This can be accomplished by taking the defined cortical and trabecular regions at baseline and applying these to the follow-up scans (in lieu of re-defining the regions independently) [85]. This method ensures the exact same region of bone is compared between baseline and follow-up. However, this approach precludes detecting changes in cortical thickness, perimeter, or changes in total, cortical, or trabecular areas. When conducting 3D registration, it is important to specify if trabecular and cortical compartments were defined independently or if regions were defined at baseline and applied during registration on follow-up scans.

There are circumstances where the value of registration may be diminished. In long-term treatment and aging studies, there can be significant changes in bone geometry and microstructure. This can lead to inaccuracies when using both 2D and 3D registration. Consequently, shorter intervals may be required to allow sequential transformations to take place, with smaller incremental changes, to allow a more optimal comparison of bone health between the start and end of the study. Furthermore, in children, bone growth leads to challenges because a fixed distance from the endplate will gradually become a more distal part of the bone and bone size is anticipated to change substantially [86]. Lastly, in the case of fracture healing, rapid remodeling occurs that changes overall bone geometry [71, 87] which poses challenges in terms of image registration and longitudinal comparison of bone morphology. It is imperative to visually inspect registered images to confirm that unexpected errors have not occurred during registration before the quantitative assessment is performed.

In general, follow-up scans should be registered to ensure the equivalent region is assessed, where 3D registration is the preferred method; however, if this approach is not feasible, 2D registration is an acceptable alternative. In both cases, a reasonable level of overlap must exist to include results from morphological or μFE analysis, so that the region is representative of the full scan length. An optimal minimum overlap between baseline and follow-up scans has not been validated, but we recommend a minimum volumetric overlap of 75%. There are circumstances where in a series of multiple follow-up scans there is one scan that is poorly positioned (i.e., less than 75% overlap). Rather than removing the entire series, that single scan can be removed so that the remainder of the subject’s data can be used in the analysis. The choice of overlap cut-off should be specified in the study design and median or average overlap reported in the results.
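The screening rule described above can be sketched as follows, assuming the percent volumetric overlap with baseline has already been computed for each follow-up scan of one participant. The function name, the dictionary layout, and the visit labels are hypothetical; the 75% default reflects the recommendation in the text.

```python
from statistics import mean, median


def screen_series_by_overlap(overlaps, min_overlap=75.0):
    """Split one participant's follow-up scans into kept and excluded sets.

    `overlaps` maps a visit label to its percent overlap with baseline.
    Only poorly overlapping scans are dropped, not the whole series.
    """
    kept = {visit: pct for visit, pct in overlaps.items() if pct >= min_overlap}
    excluded = [visit for visit, pct in overlaps.items() if pct < min_overlap]
    summary = {
        "kept": list(kept),
        "excluded": excluded,
        # Summary statistics suggested for reporting; None if nothing was kept.
        "mean_overlap": mean(kept.values()) if kept else None,
        "median_overlap": median(kept.values()) if kept else None,
    }
    return summary
```

For example, `screen_series_by_overlap({"month_6": 92.0, "month_12": 68.5, "month_24": 88.0})` would retain the 6- and 24-month scans and list the 12-month scan as excluded.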

Reporting results

Reporting density and morphometric data

The decision of which density and microarchitecture parameters to report depends on the research question. Nevertheless, with the goal of developing standardized procedures for using HR-pQCT in clinical studies, a minimum set of parameters should be reported to appropriately characterize the trabecular and cortical bone. As the research field shifts from the first-generation to the second-generation HR-pQCT, appropriate terminology is necessary, as certain morphological parameters, such as Tb.Th, have fundamentally different methods of measurement between scanner generations and thus cannot be directly compared.

Historically, with the introduction of direct 3D morphological measures, bone microarchitecture that was measured directly instead of derived was denoted with an asterisk (e.g., Tb.Sp*) [61]. However, this naming convention has not been used consistently in the literature. As direct measurement becomes more commonplace with regular use of the extended cortical analysis protocol and shift towards the second-generation HR-pQCT, we propose instead that direct measures be denoted without an asterisk (e.g., Tb.Sp), whereas derived measures be identified with the superscript “d” (e.g., Tb.Spᵈ). Although the type of measurement method can often be inferred based on the scanner generation, it is more appropriate to use explicit nomenclature. Although inconsistencies exist in literature to date, this approach will help avoid confusion in future studies. For example, Ct.Th can be measured directly or derived using the first-generation HR-pQCT scanner, but if not explicitly stated it is not clear which approach was used. Tables 2 and 3 provide a summary of recommended terminology for common indices for future studies, and the minimum set of parameters to be reported are provided in bold.

Reporting μFE data

The μFE outputs to report should be chosen with consideration of which biomechanical properties are most relevant to the research question. In many cases, μFE outputs are highly correlated, and it is not advised to report all available parameters without reasonable justification. Commonly reported parameters for standard HR-pQCT scans are summarized in Table 5. Primary outcomes are strength estimates (such as failure load) and stiffness; it is recommended that at least one of these parameters be reported; however, it is usually not necessary to report both due to their high correlation (i.e., r > 0.9). Other insightful parameters include load distribution properties, specifically compartment load sharing at the distal and proximal ends of the scan region, or in certain cases stress/strain distributions within the model. As μFE outputs are highly dependent on model properties and loading conditions, it is important to provide sufficient details of the μFE analysis conducted (as summarized in Table 4) as well as the mesh generation method and computational solver used. Additional information that is beneficial to report, especially for non-standard scan sites or techniques, is the model complexity (usually reported as the average number of elements in the models), the average computation time per model, and specifications of the computing system (i.e., hardware) used.

Precision error

An important factor in the clinical application of HR-pQCT is its ability to produce highly precise and accurate results. Precision error arises from a combination of technical and operational factors and is specific to the scanner, parameter, operator(s), and study design. As discussed throughout this article, important contributors to precision error are motion artifacts [49, 51], subject positioning and reference line placement [42], manual correction of periosteal and endocortical contours [58, 59, 88], and use of registration techniques [84]. Precision error is most often reported as the root-mean-square coefficient of variation (RMS-CV%) of repeated measurements in a group of individuals. Reported ranges of short-term precision for first- and second-generation HR-pQCT are summarized in Table 6 [27]. Although these ranges provide an idea of expected precision error, it is nevertheless dependent on the individual scanner and operators, and precision should be determined for each scanner and research center in a manner that reflects the study design. For cross-sectional study designs, precision should be measured without registration techniques applied. Longitudinal studies should report precision values with registration applied in a manner that reflects the technique that will be applied to the dataset. In addition, the precision measurements should be conducted on a cohort that reflects the demographics of the study (i.e., elderly versus younger subjects).
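One common way to compute RMS-CV% from a repeat-scan precision study is sketched below. It assumes each participant contributes two or more repeated measurements of the same parameter and takes the root-mean-square of the per-participant coefficients of variation; other formulations exist, and the function name is illustrative.

```python
import math
from statistics import mean, stdev


def rms_cv_percent(repeat_measurements):
    """RMS-CV% from a list of per-participant lists of repeated values."""
    squared_cvs = []
    for values in repeat_measurements:
        if len(values) < 2:
            raise ValueError("each participant needs at least two measurements")
        cv = stdev(values) / mean(values)  # sample SD relative to the mean
        squared_cvs.append(cv ** 2)
    # Root of the mean squared CV across participants, as a percentage.
    return 100.0 * math.sqrt(mean(squared_cvs))
```

The same function can be applied to registered or unregistered repeat scans, so that the reported precision matches the analysis used in the study.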

Table 6 Precision error ranges, reported as the RMS-CV%, for first-generation (XtremeCT) and second-generation (XtremeCT II) scanners. First-generation ranges are from the meta-analysis conducted by Mikolajewicz and colleagues [27], and second-generation ranges were obtained from short-term reproducibility studies conducted to date [33, 88, 112]

Reporting results from longitudinal studies

When reporting results for longitudinal studies, the method used for image registration should be reported, along with the average amount of overlap obtained between accepted baseline and follow-up scans, and the number of scans or subjects excluded from the analyses due to poor overlap. Depending on the study design, investigators may also consider reporting results in comparison with the least significant change (LSC), defined as the minimum change observed in an individual that can be considered statistically significant (p < 0.05), calculated by multiplying precision error by 2.77 [89]. A summary of ranges reported for LSC (primarily from the first-generation XtremeCT) is provided in Table 6; however, LSC should be estimated from the precision error of the specific scanner and registration methods used.
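The factor 2.77 corresponds to 1.96 × √2, reflecting a two-sided 95% confidence level applied to the difference between two measurements that each carry the same precision error. A minimal sketch of the calculation follows, assuming precision is expressed as a percentage (e.g., RMS-CV%); the function names are hypothetical.

```python
import math


def least_significant_change(precision_error, z=1.96):
    """LSC for a given precision error; z * sqrt(2) is approximately 2.77."""
    return z * math.sqrt(2) * precision_error


def exceeds_lsc(baseline, followup, precision_error_percent):
    """True if the percent change from baseline is larger than the LSC."""
    change_percent = 100.0 * (followup - baseline) / baseline
    return abs(change_percent) > least_significant_change(precision_error_percent)
```

With a precision error of 1.5%, for instance, the LSC is roughly 4.2%, so a 5% decrease in an individual would exceed it while a 3% decrease would not.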

Considerations for multiple comparisons

Due to the large number of parameters that are produced using HR-pQCT and the high correlation between some parameters, studies are susceptible to the issue of multiple comparisons and false-positive statistical tests, or type I error. For example, there are often at least 11 parameters that are presented in HR-pQCT studies (see Tables 2 and 3) and this number is multiplied by two when examining both the radius and tibia. If a p value threshold of 0.05 is used to denote statistical significance between groups, then across these 22 comparisons about one is expected to appear significant by chance alone, even when no true difference exists. When reporting results, investigators must interpret their results with this in mind.
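The scale of the problem can be illustrated with a short calculation. The sketch below assumes that all null hypotheses are true and, for the family-wise error rate, that the tests are independent; HR-pQCT parameters are correlated, so the second figure should be read as a rough guide only. Function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false-positive tests when every null is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests
```

For 22 comparisons at a threshold of 0.05, the expected number of false positives is 1.1, and under independence the chance of at least one is close to 68%.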

Although there are no specific recommendations for how to address this issue, one approach could be to formally control for the potential of false-positive statistical tests, with the selection of statistical test dependent on the study design and power. The most conservative approach would be a Bonferroni correction. An alternative is the Benjamini-Hochberg correction, which is advantageous as it controls the false discovery rate with a limited impact on power. It also favors large studies and it is easy to implement [90, 91]. Another way to preserve power in small studies could be to use a hierarchical method, adjusting for the false discovery rate [92]. Though not commonly employed for HR-pQCT studies, this method has the theoretical advantage of relying on a priori hypotheses in the analysis of HR-pQCT data. For instance, a first step could be to test for differences in total density, then in trabecular and cortical density if the comparison of total density is significant. Thereafter, the various parameters of each compartment can be tested if there is a significant difference for the compartment [93].
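The two corrections named above can be sketched in a few lines. This is a generic illustration of the Bonferroni and Benjamini-Hochberg procedures rather than a recommended analysis pipeline; the function names are hypothetical, and established statistical packages provide equivalent routines.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p value falls under its rank-specific threshold.
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected
```

Both functions return a list of booleans in the original order of the p values, so the result can be matched back to the HR-pQCT parameters tested.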

Quality control and training

Routine steps should be taken to ensure that the HR-pQCT system is working properly, and operators should be trained appropriately. The system should undergo routine service and maintenance check by the manufacturer annually.

Daily and weekly quality control

The user should follow the manufacturer’s protocol for quality control to ensure the system is functioning properly. This typically includes daily and weekly scans of a quality control phantom to check performance prior to in vivo scanning and to monitor the stability of density and microarchitecture parameters. A drift over time in the phantom measurements signals decay in the X-ray emission, which has the potential to confound longitudinal studies if not addressed. Use of Shewhart charts to track scanner stability is recommended, and although no specific guidelines for when a scanner should be recalibrated have been developed, typical changes on the order of 1.5% from the mean phantom density are considered a threshold for when an intervention should take place. Daily quality control scans should also be inspected visually to identify common artifacts, such as ring artifacts, or distortions that would affect measurement outcomes. These should only be corrected by the manufacturer during routine maintenance, but it is prudent for the user to carefully monitor image quality control.
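A simple stability check along these lines is sketched below. It assumes a set of reference phantom density measurements collected while the scanner was known to be stable, uses conventional three-standard-deviation Shewhart control limits (an assumption, since no limits are specified here), and applies the 1.5% drift threshold mentioned above. The function and key names are hypothetical.

```python
from statistics import mean, stdev


def qc_flags(reference_densities, new_density,
             drift_threshold_pct=1.5, n_sigma=3.0):
    """Compare a new phantom density against a stable reference period."""
    centre = mean(reference_densities)
    sd = stdev(reference_densities)
    drift_pct = 100.0 * (new_density - centre) / centre
    return {
        "drift_pct": drift_pct,
        # Shewhart-style rule: point outside centre +/- n_sigma * SD.
        "outside_control_limits": abs(new_density - centre) > n_sigma * sd,
        # Threshold on relative change from the mean phantom density.
        "exceeds_drift_threshold": abs(drift_pct) > drift_threshold_pct,
    }
```

A flagged measurement would normally prompt a repeat phantom scan and, if confirmed, contact with the manufacturer rather than any user-side correction.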

Operator training

Operator training is an essential aspect of obtaining high-quality data. Although there is no formal training course or certification for operators of the HR-pQCT device, it is recommended that new operators undergo thorough training from experienced operators in patient management and positioning, location of the reference line, and manual correction of contours generated by the automated and semi-automated protocols. Unfortunately, limited formal training is available despite the knowledge that scan precision and inter-operator variability can be reduced using a standard training platform [42]. In terms of reference line placement, it is recommended that new operators complete the online training developed by UCSF (http://webapps.radiology.ucsf.edu/refline/) [42]. Comparable training for contouring has yet to be developed, and so it is recommended new operators receive training from an experienced operator, and guidelines previously reported be followed to minimize error bias [57]. In the future, standardized training courses should be offered.

Other considerations

Multi-center studies

Measurement of multi-center precision and cross-calibration is important when pooling data from multiple research groups and scan sites, as inter-scanner differences are a key source of error. When compared with single-center precision, HR-pQCT parameters are less precise across different scanners, which may be attributed to differences in quality control of scanners, calibration, and intrinsic manufacturer differences. Burghardt and colleagues reported LSC values for the first-generation HR-pQCT ranging from two to five times that of short-term single-center precision [94]. There is some evidence that second-generation HR-pQCT scanners have notably improved inter-scanner precision, resulting in differences below LSC [95]. However, this may not apply broadly and cross-calibration procedures should still be employed for future studies using second-generation HR-pQCT. Although no standardized procedure exists for the estimation of multi-center precision, the use of a phantom that mimics geometry, microarchitecture, and composition of standard scan regions is recommended for cross-calibration [94]. The information collected can be employed to establish cross-calibration procedures to minimize inter-scanner errors; however, individual approaches vary and there is currently no standardized approach. Optimal methods for cross-calibration in multi-center studies have not yet been established and studies in this area are greatly needed. Of note, the issue of multi-center calibration is more important for cross-sectional studies; in longitudinal studies that have primary outcomes of individual change, the concern for inter-scanner variation is lessened.

Adjusting for different scanner generations

The introduction of the second-generation HR-pQCT poses a challenge of comparing research findings across different systems. Factors such as differences in resolution, scan region of interest, and measurement methods make it challenging to compare data from different generations, and this will be further confounded when future systems become available. Although it is possible for the second-generation HR-pQCT to be operated in a configuration mimicking the original system, this is not ideal because it defeats the purpose of having improved signal-to-noise and better quality images. Cross-calibration between scanner generations has demonstrated that some parameters from the first-generation can be converted to equivalent parameters for the second-generation; however, some parameters such as Tb.Th, which are highly dependent on resolution, should not be compared across systems [29, 33].

Beam hardening and scatter

Beam hardening and scatter effects that arise due to increased adipose (fat) tissue overlying the scan region, as might be expected in an obese population, can introduce measurable bias in HR-pQCT parameters [96]. Specifically, overlying adipose tissue can result in underestimation of density measurements and failure load, and altered trabecular microarchitecture, often leading to overestimation of bone microarchitecture impairment [96]. Similarly, high-density objects such as a clinical cast composed of plaster-of-Paris or fiberglass can result in similar bias, due to beam hardening effects, and should be accounted for in a quantitative analysis if they are present in the scan [97, 98]. Future work is needed to determine the appropriate standardized procedures for overweight and obese individuals.

Bone tissue mineralization

Variations in bone tissue mineralization can influence morphological measurements obtained from HR-pQCT. Extraction of the bone structure uses a fixed threshold technique, and consequently, under-mineralized bone tissue that falls below the threshold may not be captured. In addition, μFE analysis usually assumes a uniform tissue modulus for all of the bone, and thus, variation in the degree of mineralization is not accounted for in bone strength estimates. Typical variability in tissue mineralization has little effect on morphological or μFE-derived outcomes [67, 99]. However, results must be interpreted carefully in situations that produce more extreme values of tissue mineralization, such as metabolic bone diseases (e.g., osteopetrosis) or pharmacologic treatments. In particular, induction of new bone formation by anabolic therapies may lead to a relatively undermineralized bone matrix, whereas potent anti-remodeling drugs lead to an increase in tissue mineralization.

Other high-resolution systems

The discussion here focuses on currently available HR-pQCT systems, which are manufactured by Scanco Medical AG. It is expected that HR-pQCT systems developed by other manufacturers will become available in the near future. In addition, existing imaging technologies are being adapted to study extremity bone microarchitecture as a substitute for HR-pQCT, namely cone-beam CT (CBCT) [100,101,102,103]. Current CBCT systems designed for extremity scanning offer larger scan lengths and shorter scanning times relative to HR-pQCT, with some systems allowing for scan acquisition in a weight-bearing configuration [103]. These benefits come with the trade-off of lower resolution, typically 127-μm nominal resolution reconstructed to a 75-μm voxel size [100]. Consequently, CBCT is susceptible to partial volume effects and can struggle to distinguish thinner trabecular structures that are visible with HR-pQCT [102]. As the application of CBCT for bone microarchitecture grows, and new HR-pQCT systems become available, the recommendations and guidelines outlined in this article should be considered.

Pediatric studies

There is growing interest in the application of HR-pQCT in pediatrics for the study of normal skeletal acquisition [8], as well as the impact of disease [15] and physical activity [24] on skeletal acquisition. However, measurement of bone density and microarchitecture in children and adolescents poses unique challenges due to the complexities of long bone growth and presence of the growth plate near the scan region [86]. Non-linear bone growth and differences in maturation between sexes and ethnicities pose challenges in the appropriate selection of a standardized scan region. Several protocols have been proposed for the selection of the scan region to account for long bone growth [104], and although there is no consensus regarding the most appropriate scan site, there is an agreement that a relative offset should be used in children and adolescents. Further work is required to form a consensus on a standardized protocol for pediatric studies, including how to handle the transition from a pediatric to adult scanning protocol in longitudinal studies where participants are followed from childhood into adulthood.

Non-standard applications

HR-pQCT is specifically designed to measure the distal radius and distal tibia in adults, but its application to study bone microarchitecture at other sites has expanded in recent years. The introduction of XtremeCT II, which has a larger gantry and redesigned staging mechanism, permits knee and elbow joints to be imaged [105]. Furthermore, applications beyond the standard protocol have included the study of fracture healing at the distal radius [71], quantification of muscle morphometry [47], progression of inflammatory arthritis through quantification of joint space narrowing and development of erosions in finger and wrist joints [106, 107], and investigation of changes in bone microarchitecture at the knee due to injury or osteoarthritis [108]. These emerging applications are likely to become more prominent in HR-pQCT research and will require careful development of standardized protocols. A prominent example of this is the development of research focused on inflammatory arthritis, with as many as 20 research centers actively publishing in this area. Ongoing standardization of this novel application is overseen by the international consortium, SPECTRA (Study grouP for xtrEme-Computed Tomography in Rheumatoid Arthritis) [109, 110].

Summary of recommendations

Recommendations for best practices for the acquisition and analysis of HR-pQCT imaging techniques have been presented here, with nomenclature and recommendations on presenting and interpreting results. Many decisions on appropriate techniques and study design remain dependent on the research question. However, with the growing clinical application of HR-pQCT, a certain degree of standardization is necessary to support its further advancement. The key recommendations for standard analysis of the distal radius and tibia HR-pQCT scans are summarized in Table 7. Implementation of these recommendations as “best practices” should facilitate the comparison of results across studies by minimizing technical variation in scan acquisition and analysis.

Table 7 Summary of recommendations

Acknowledgments

These guidelines have been reviewed and endorsed by the Committee of Scientific Advisors of the International Osteoporosis Foundation, the European Calcified Tissue Society, and the American Society for Bone and Mineral Research.