Introduction

Osteoporosis, one of the most common disorders of the elderly, results in significant morbidity and mortality. White women over 50 have a 50% chance of sustaining a fracture in their lifetime [1]. Hip fractures are especially serious, as 12–20% of all patients with hip fracture die within the first year after the fracture [2] and 36% of women and 48% of men die within 2 years [3]. Of those who survive, half do not regain their prefracture level of independence [2,4,5]. These consequences may be ameliorated if individuals with low bone mass are identified and treatment is initiated prior to fracture. Clinical risk factors for low bone mass or fractures include female sex, advanced age, tall stature, low body weight, a history of fragility fractures and corticosteroid use [6,7,8,9,10,11,12]. Because two-thirds of the adult US population visit a dentist annually [13] and dentists often make radiographs of the jaws, dentists are a potentially valuable resource for screening patients for signs of osteoporosis. Several lines of work have demonstrated that individuals with osteoporosis have altered mandibular morphology [14,15,16,17,18,19,20]. Specifically, resorption and thinning of the inferior border of the mandible have been correlated with low hip and spine bone mineral density (BMD). Dentists who identify patients with clinical and radiographic risk factors associated with low bone mass could make appropriate referrals for diagnosis and management.

The aim of this study was to evaluate the efficacy of clinical and dental panoramic radiographic risk factors for identifying individuals with low femoral bone mass.

Materials and methods

Subjects

Of 652 women who visited our clinic for BMD assessment between 1996 and 2003, 227 Japanese postmenopausal women aged 33–84 years (mean±SD, 57.3±7.5) consented to participate in this study. Twenty-seven women had undergone bilateral oophorectomy. Nine women were receiving estrogen replacement therapy (ERT); however, the duration of ERT was less than 1 year in eight women and less than 4 years in one. No subject had menstruated for at least 1 year prior to her BMD measurement. Exclusion criteria were: no consent given for panoramic radiographs and questionnaire, use of tobacco or medications that affect bone metabolism, and presence of metabolic bone disease, cancers with bone metastasis, significant renal impairment, liver disorder, bone destructive lesions in the mandible, non-vertebral osteoporotic fractures or vertebral osteoporotic fracture on X-ray at BMD assessment. Each participating university granted Institutional Review Board approval for this study.

Femoral BMD measurements

BMD at the femoral neck was measured by dual energy X-ray absorptiometry (DXA) (DPX-alpha; Lunar Co., Madison, Wisc., USA). The in vivo short-term precision error for femoral BMD in our clinic is 2.9%. Femoral BMD values were categorized as normal (T-score greater than −1.0), osteopenia (T-score between −1.0 and −2.5) or osteoporosis (T-score less than −2.5) according to the WHO classification. Subjects’ age, height, weight and age at menopause were determined at the time of the DXA examination.
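The categorization above can be written as a small function. This is a minimal sketch: the function name is hypothetical, and the handling of the boundary values (exactly −1.0 and exactly −2.5 counted as osteopenia) follows the wording of the preceding paragraph rather than any software used in the study.

```python
def who_category(t_score: float) -> str:
    """Map a femoral neck T-score to the category described in the text.

    Boundary handling is an assumption taken from the wording above:
    normal is strictly greater than -1.0, osteoporosis is strictly
    less than -2.5, and everything in between is osteopenia.
    """
    if t_score > -1.0:
        return "normal"
    if t_score >= -2.5:
        return "osteopenia"
    return "osteoporosis"


# Illustrative T-scores only, not study data.
for t in (0.3, -1.0, -2.5, -2.6):
    print(t, who_category(t))
```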

Panoramic measurements

Panoramic radiographs were made of all subjects who gave informed consent at the time of DXA measurement. All dental panoramic radiographs were obtained with an AZ-3000 unit (Asahi Co., Kyoto, Japan) at 12 mA and 15 s; the kVp varied between 70 and 80. Screens of speed group 200 (HG-M, Fuji Photo Film Co., Tokyo, Japan) and film (UR-2, Fuji) were used. Mandibular cortical shape on the dental panoramic radiograph was determined by observing the mandible distally from the mental foramen bilaterally and was categorized into one of three groups according to the method of Klemetti et al. [21] as follows: normal cortex: endosteal margin of the cortex is even and sharp on both sides; mildly to moderately eroded cortex: endosteal margin shows semilunar defects (lacunar resorption) or appears to form endosteal cortical residues; severely eroded cortex: cortical layer forms heavy endosteal cortical residues and is clearly porous.

Overall agreements for intra-observer and inter-observer performances were 92% and 82%, respectively.

Mandibular cortical width was measured bilaterally on the radiographs at the site of the mental foramen according to our previous study [16]. A line parallel to the long axis of the mandible and tangential to the inferior border of the mandible was drawn. A line perpendicular to this tangent and intersecting the inferior border of the mental foramen was constructed, along which mandibular cortical width was measured with calipers. The mean of the cortical widths on the two sides of the jaw was used in this study. The coefficient of variation due to positioning error and operator error in the cortical width measurement was less than 2%. Intra-observer variation in the cortical width measurement was 0.1 mm, which was similar to the inter-observer variation.

Strut analyses

We examined the projected image of the trabecular structure of the mandibular ramus on the panoramic radiograph with a custom program. A region of interest measuring approximately 8 cm² was identified in the mid portion of each ramus. The program performed a strut analysis and the values measured for each ramus were averaged. The strut analysis employs methods described in studies of trabecular pattern using both radiographs [22,23] and histologic slides [24,25,26]. Patient radiographs were scanned at 600 dpi. Overall intensity was made uniform by blurring the image with a Gaussian filter (sigma of 10 pixels) and subtracting the blurred version of the image from the original. This process was repeated three times and resulted in a density-corrected image, i.e. an image with an overall uniform density on a scale much larger than the size of individual trabeculae. The density-corrected image was then made binary, skeletonized and analyzed to measure strut features of the selected area of the radiograph, including length of the skeletonized trabeculae, number of termini and nodes per unit area, and number and lengths of strut segments between termini and nodes.
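The final counting step of a strut analysis can be illustrated on an already skeletonized binary image. The sketch below is not the custom program used in this study. It assumes a one-pixel-wide skeleton, uses 4-connected neighbours for simplicity, and treats a pixel with one neighbour as a terminus and a pixel with three or more neighbours as a node. The function name and the toy skeleton are hypothetical.

```python
def strut_features(skeleton):
    """Count skeleton pixels, termini and nodes in a binary skeleton.

    skeleton: list of equal-length strings; '#' marks a skeleton pixel.
    Neighbours are counted with 4-connectivity (an assumption made to
    keep junction counting simple in this sketch).
    """
    rows, cols = len(skeleton), len(skeleton[0])

    def is_on(r, c):
        return 0 <= r < rows and 0 <= c < cols and skeleton[r][c] == "#"

    length = termini = nodes = 0
    for r in range(rows):
        for c in range(cols):
            if not is_on(r, c):
                continue
            length += 1  # skeleton length in pixels
            neighbours = sum(
                is_on(r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            )
            if neighbours == 1:
                termini += 1  # free end of a strut
            elif neighbours >= 3:
                nodes += 1  # junction of three or more struts
    return {"length_px": length, "termini": termini, "nodes": nodes}


# Toy skeleton: a plus-shaped strut pattern with one junction and four free ends.
cross = [
    "..#..",
    "..#..",
    "#####",
    "..#..",
    "..#..",
]
features = strut_features(cross)
print(features)
```

Dividing these counts by the area of the region of interest would give per-unit-area values of the kind described above.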

Run length analysis

Textural properties of the digitized radiographs were examined using a run length analysis described by Cortet et al. [27]. This method calculates various statistics whose values vary depending on the size and distribution of image features along a specified direction (e.g. vertical or horizontal). Two statistics, defined as short run (R1) and long run (R2) emphasis statistics, were evaluated. These are defined as:

$$ R_{1} = \frac{\sum\limits_{i=1}^{N} \sum\limits_{j=1}^{K} \dfrac{L(i,j)}{j^{2}}}{\sum\limits_{i=1}^{N} \sum\limits_{j=1}^{K} L(i,j)} $$
$$ R_{2} = \frac{\sum\limits_{i=1}^{N} \sum\limits_{j=1}^{K} j^{2} L(i,j)}{\sum\limits_{i=1}^{N} \sum\limits_{j=1}^{K} L(i,j)} $$

where N is the number of gray levels in the image, K is the size of the image in the specified direction and L(i, j) is the number of occurrences of “runs” of adjacent pixels of intensity i and length j in the given direction. In the current study R1 and R2 were calculated for the vertical (supero-inferior) and horizontal (antero-posterior) directions of the density-corrected image.
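The two statistics can be computed directly from these definitions. The following is a minimal sketch, assuming the image has already been quantized to a small number of gray levels and is stored as a list of rows. The function names and the toy image are hypothetical and are not taken from the study software.

```python
from collections import Counter


def run_length_counts(image, direction="horizontal"):
    """Return a Counter mapping (gray_level, run_length) to number of runs,
    i.e. the quantity L(i, j) for the chosen direction."""
    if direction == "horizontal":
        lines = image
    else:  # vertical: scan along columns instead of rows
        lines = [list(col) for col in zip(*image)]
    counts = Counter()
    for line in lines:
        start = 0
        for pos in range(1, len(line) + 1):
            # A run ends at the end of the line or where the gray level changes.
            if pos == len(line) or line[pos] != line[start]:
                counts[(line[start], pos - start)] += 1
                start = pos
    return counts


def run_emphasis(counts):
    """Return (R1, R2): short run and long run emphasis."""
    total = sum(counts.values())
    r1 = sum(n / j**2 for (_, j), n in counts.items()) / total
    r2 = sum(n * j**2 for (_, j), n in counts.items()) / total
    return r1, r2


# Toy 3 x 4 image with gray levels 0, 1 and 2 (illustrative only).
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
r1_h, r2_h = run_emphasis(run_length_counts(image, "horizontal"))
r1_v, r2_v = run_emphasis(run_length_counts(image, "vertical"))
print(r1_h, r2_h, r1_v, r2_v)
```

In this toy image the horizontal runs are longer than the vertical ones, so the horizontal R2 is larger and the horizontal R1 is smaller than their vertical counterparts.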

Statistical analysis

We computed the means of the continuous clinical and radiographic putative risk factors for each bone density group (normal, osteopenic or osteoporotic). The reported P-values are from an analysis of variance (ANOVA) that simultaneously compares the three group means. One clinical variable, bilateral oophorectomy, was binary (yes/no); we compared the percentage of women who had undergone bilateral oophorectomy in the three groups with a chi-square test. We also tested the power of the clinical and radiographic risk factors to classify individuals into one of the three categories of femoral BMD by using classification and regression trees (CART) analysis [28]. This multivariate approach, an alternative to linear regression techniques, can be used to predict categorical (classification) or continuous (regression) outcomes [28,29]. Multivariate methods often afford an increase in power to correctly classify subjects over methods using single variables. The CART method employs a binary recursive algorithm. At the start, all individuals are considered together at the “root” of a prediction tree. The data are split on the variable that results in the largest difference between the two resulting “nodes” (in terms of the percentage of low bone mass or normal individuals). In each daughter node, variables are again examined to find the predictor that results in the best split between low bone mass and normal individuals. Splitting continues until stopping criteria are reached or until further splitting does not improve classification. Individuals in these terminal nodes (“leaves”) are classified as normal, osteopenic or osteoporotic. The weighted kappa statistic was used to assess agreement between the actual bone mass category and the category predicted by the different trees. The 95% confidence intervals (CI) were calculated. We also measured cross-validation error rates, an estimate of the performance of the CART classification tree with new data. This is accomplished by randomly dividing the data into ten parts. Trees are successively built using nine of the parts and tested on the tenth part. The mean of these ten misclassification errors is reported as the cross-validation error rate.
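The search for a single split, the basic step that CART repeats recursively, can be sketched as follows. This is an illustration rather than the software used in this study: Gini impurity is assumed as the splitting criterion (the text states only that the split giving the largest difference between nodes is chosen), candidate thresholds are taken midway between adjacent observed values, and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, weighted_impurity) of the best binary split.

    rows: list of dicts mapping feature name to numeric value.
    Subjects with value < threshold go to the left daughter node.
    """
    best = None
    n = len(rows)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] < threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] >= threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Toy data for illustration only; these are not study subjects.
rows = [
    {"age": 50, "weight": 60}, {"age": 55, "weight": 48},
    {"age": 60, "weight": 55}, {"age": 66, "weight": 50},
    {"age": 70, "weight": 58}, {"age": 75, "weight": 45},
]
labels = ["normal", "normal", "normal", "low", "low", "low"]
split = best_split(rows, labels)
print(split)
```

A full tree would apply `best_split` again within each daughter node until a stopping criterion is met. The ten-fold cross-validation described above would then repeat the tree building on nine-tenths of the data and average the misclassification rates on the held-out tenths.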

Results

Clinical risk factors

Mean subject age, height, and weight were different in the three bone density groups (Table 1). Neither mean age at menopause nor the percentage of women with bilateral oophorectomy was significantly different in the three bone density groups. Using only clinical risk factors, the CART algorithm (Fig. 1) correctly identified the WHO classification of 137 (60%) of the 227 subjects. Of the 157 individuals with femoral osteopenia or osteoporosis, 136 were so predicted (sensitivity=87%). Of the 70 individuals with normal femoral BMD values, 31 were so identified (specificity=44%).
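For reference, the sensitivity and specificity quoted above follow directly from the reported counts. The short sketch below reproduces that arithmetic; the function name is hypothetical, and "positive" here means femoral osteopenia or osteoporosis.

```python
def sensitivity_specificity(true_pos, total_pos, true_neg, total_neg):
    """Return (sensitivity, specificity) as fractions."""
    return true_pos / total_pos, true_neg / total_neg


# Counts reported for the clinical CART algorithm:
# 136 of 157 low bone mass subjects and 31 of 70 normal subjects identified.
sens, spec = sensitivity_specificity(136, 157, 31, 70)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```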

Table 1 Clinical risk factor means by femur WHO class
Fig. 1

CART result of clinical predictors of osteoporotic category. The top cell contains all study subjects. At each split, the variable that produces the most favorable division of the data is indicated, along with the levels of the variable at which the best split occurs. Age is the most effective clinical variable for separating osteoporotic subjects from normal or osteopenic subjects. Weight is the most useful clinical variable for separating the latter group into osteopenic or normal subjects. In each terminal node, shown with a bold box outline, the predicted osteoporosis class is indicated with bold text

Radiographic features

Mean mandible cortical width, cortical index and all run length features were significantly different in the three bone density groups (Table 2). Mean levels of multiple strut and Fourier variables were not significantly different in the three bone density groups. A CART analysis considering only radiographic features (Fig. 2) found that the most important radiographic risk factors for classifying subjects by WHO status were thickness of inferior border of the mandible, total length of the node-to-terminus segments in the ROI, and horizontal long run emphasis (R2) statistic. This algorithm correctly identified the WHO class of 144 (63%) of the 227 subjects. Of the 157 individuals with femoral osteopenia or osteoporosis, 130 were so identified although none of the subjects with osteoporosis was correctly classified (sensitivity=83%). Of the 70 individuals with normal BMD, 46 were so identified (specificity=66%). The 95% confidence interval of the weighted kappa score for the radiographic risk factor algorithm (weighted kappa=0.38, SE=0.05, CI=0.28–0.49) overlaps that of the clinical risk factor algorithm (weighted kappa=0.48, SE=0.06, CI=0.37–0.59). Similarly, the 95% confidence interval of the cross-validation error rate of the radiographic algorithm (cross-validation error rate=0.445, SE=0.033), an estimate of its performance with new data, overlaps that of the clinical algorithm (cross-validation error rate=0.471, SE=0.033).
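The overlap of the cross-validation intervals can be checked from the reported estimates and standard errors. The sketch below assumes that each 95% interval is formed as estimate ± 1.96 × SE, which the text does not state explicitly. Because the published standard errors are rounded, limits reconstructed this way may differ from the reported ones in the second decimal place. Function names are hypothetical.

```python
def normal_ci(estimate, se, z=1.96):
    """Approximate 95% interval, assuming estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


def overlaps(a, b):
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Cross-validation error rates and standard errors as reported in the text.
radiographic_cv = normal_ci(0.445, 0.033)
clinical_cv = normal_ci(0.471, 0.033)
print(radiographic_cv, clinical_cv, overlaps(radiographic_cv, clinical_cv))
```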

Table 2 Radiographic risk factor means by femur WHO class
Fig. 2

CART result of radiographic predictors of osteoporotic category. The most useful radiographic predictors for separating subjects are mandibular cortical thickness, total node-to-terminus strut length per square cm of the region of interest, and horizontal long run emphasis (R2)

Clinical and radiographic risk factors combined

A CART analysis using both clinical and radiographic features found that the most useful risk factors for classification were thickness of the inferior border of the mandible and age (Fig. 3). This algorithm correctly classified the WHO class of 143 (63%) of the 227 subjects (Table 3). Of the 157 individuals having femoral osteopenia or osteoporosis, 130 were so predicted (sensitivity=83%). Of the 70 individuals with normal BMD, 30 were so identified (specificity=43%). The 95% confidence intervals for the weighted kappa score and the cross-validation error rate overlap with the corresponding intervals of the algorithms using clinical or radiographic risk factors alone.

Fig. 3

CART result of clinical and radiographic predictors, considered together, of osteoporotic category. Cortical thickness and age are the best of the combined clinical and radiographic predictors for classifying subjects

Table 3 CART classification matrix of study subjects using clinical and radiographic risk factors combined by femur WHO class. Weighted kappa=0.50, SE=0.06, CI=0.39–0.60, exact P-value=5.11×10⁻¹⁶, cross-validation error rate=0.396, SE=0.032

Discussion

This study found that algorithms using clinical or radiographic risk factors available to dentists, alone or in combination, identified individuals with low femoral bone mass. The most useful clinical risk factors in this study, age and weight, are well recognized in the osteoporosis literature. In both CART trees using clinical risk factors, older age and lower weight were associated with reduced BMD. These findings are comparable to those of studies using the Osteoporosis Self-Assessment Tool for Asians (OSTA) [6]. Our model differs in that weight is only considered in women under 63.5 years. In addition, the OSTA model was designed to identify individuals with a femoral neck BMD T-score less than or equal to −2.5, while ours was designed to classify individuals as normal, osteopenic or osteoporotic. The OSTA model has a sensitivity of 91% and specificity of 45% in the sample used for its development and 98% and 29%, respectively, in the population used for its validation. These results are comparable to our clinical algorithm that identified 87% of the subjects with osteopenia or osteoporosis while correctly classifying 44% of the normal subjects. Other risk models for predicting low BMD or hip fracture, including Osteoporosis Risk Assessment Index (ORAI) [7], fracture index [8], Rotterdam Hip Fracture Risk Score [30], Osteoporosis Index Of Risk (OSIRIS) [9] and Simple Calculated Osteoporosis Risk Estimation (SCORE) [10], have also found age and weight to be important risk factors. Although each of these models also uses other variables including current or past estrogen use, current smoking, history of self or maternal hip fracture, or current rheumatoid arthritis, their sensitivities and specificities for predicting low femoral BMD are comparable to OSTA and the results of the current study.

The most useful radiographic risk factor for classifying subjects in this study was thickness of the inferior mandibular border. This finding is consistent with previous studies that found thinning of the mandibular inferior border in subjects with osteoporosis [16,21,31,32,33,34], although some studies have found no such relationship [19,35]. Klemetti et al. [21] found that a diagnostic threshold of 4 mm width of the inferior border is optimal, although insufficient by itself, for good classification of subjects. Devlin and Horner [14] found that a diagnostic threshold of 4.34 mm yielded the highest diagnostic accuracy (sensitivity of 67%, specificity of 74%) in distinguishing between individuals with normal versus low bone density (T-score less than or equal to −1 in the femoral neck, lumbar spine or proximal or distal forearm). Bollen found that subjects with a history of osteoporotic fracture had a mean cortical thickness of 3.8 mm while subjects without fracture had a mean cortical thickness of 4.5 mm [34]. These values correspond well with the optimal classification threshold of 4.785 mm found in this study. We also found that by combining thickness of the inferior border with age we were able to identify 130 of the 157 subjects with low bone mass (sensitivity of 83%) but correctly identified only 30 of the 70 subjects with normal bone mass (specificity of 43%). Devlin and Horner argued that a high incidence of false positives would be unnecessarily stressful and thus suggested reducing the threshold thickness to 3 mm [14]. Doing so increased their specificity to 100% but at the cost of a sensitivity of only 20% for predicting low bone mass. Using a thickness of 3 mm with our data gave comparable results. We believe that if dental panoramic radiographs are used in assessing patients for signs of osteoporosis, it is more appropriate to set the threshold in the mid-4 mm range to identify substantively more individuals with low bone mass.

Changes in trabecular structure associated with osteoporosis are well described in the hip [36], pelvis [37], spine [38,39], distal radius [27,40,41,42] and mandible [43]. Aaron et al. noted that aging women tend to lose trabeculae in the ilium whereas men show thinning of trabeculae [44]. Tested individually, short and long run length R1 and R2 values, measured horizontally or vertically, were significantly different in the three osteoporosis groups. Our finding that subjects with osteoporosis have relatively low short run (R1) and high long run (R2) values is comparable to that of Cortet et al. [27]. These findings are consistent with the concept that the projected trabecular pattern is less complex in individuals with osteoporosis, that is, that there is a relative loss of small side branches and retention of longer trabecular struts. In the CART radiographic algorithm only long run (R2) and total length of node-to-terminus segments per unit area were found useful, and only on a subset of subjects. Both of these measures were less useful than cortical thickness.

Clearly a cost-effective means of screening large numbers of individuals is needed. The results of this study suggest that dentists have the information to assist in identifying individuals with low bone mass. Panoramic radiography is common in dental practices and the dose is comparable to that of a set of four conventional bitewing views. The clinical model and the combined clinical and radiographic model were the most efficient for screening purposes, as each uses only two readily obtainable values, and they give comparable results. Currently the model considering only clinical features has the virtue of simplicity of use in the dental office. Once patients are identified in the dental office as being at risk of osteoporosis, they should be referred to their primary care physician for appropriate evaluation.