Introduction

Osteoporosis is highly prevalent in the Japanese population and increases the risk of fracture, which in turn increases financial and social costs [1]. A cohort study reported that there were 13 million osteoporosis patients in Japan [2]. In 2007, an estimated 150,000 people in Japan experienced hip fracture [3]. Vertebral fracture in older adults is associated with an increased risk of subsequent fracture and leads to increased morbidity and mortality [1]. The reported 5-year survival rate of patients with osteoporotic hip fracture was 32–61% [4, 5]. Thus, it is necessary to diagnose osteoporosis early to prevent femoral fractures and to avoid elderly people becoming bedridden.

Osteoporosis is diagnosed by measuring the bone mineral density (BMD) of the lumbar vertebrae and proximal femoral bones with dual-energy X-ray absorptiometry (DXA) [6, 7]. Taguchi et al. examined the width and shape of the mandibular cortical bone on panoramic radiographs and found a relationship between BMD and the mandibular cortical index [8–13]. Mandibular cortical width and shape differed among groups according to their BMD as measured by DXA [10–12]. Elevation of biochemical markers has been markedly associated with cortical erosion [14–16]. Thus, the role of panoramic radiographs in screening for osteoporosis has been validated. Patients can be screened for osteoporosis based on the criteria of mandibular cortical bone width and shape.

In recent years, computer-assisted detection (CAD) systems have been developed to screen for osteoporosis on panoramic radiographs [17–21]. Nakamoto et al. developed a CAD system to evaluate mandibular cortical erosion based on a mathematical morphology method [17]. Muramatsu et al. designed a system to extract the cortical contour and automatically measure cortical bone width by applying an edge detection technique [20].

However, the reported inter-observer agreement in evaluating mandibular cortical shape has ranged from relatively high to low [22–31]. Although this discrepancy may be caused by individual differences in evaluating the endosteal margin of the mandibular cortical bone [8, 25], the causes of poor agreement have not been thoroughly evaluated. Inter-observer agreement will be improved with better understanding of the factors involved in evaluating cortical shape and clarification of the causes of diagnostic disagreement. This understanding will improve the diagnostic performance of CAD systems.

In this study, three experienced observers classified mandibular cortical shape and their intra- and inter-observer agreement was calculated. Agreement was compared with the results of previous studies. The involvement of cortical width in evaluating cortical shape was examined, and the causes of diagnostic disagreement were investigated.

Materials and methods

Subjects

Digital panoramic radiographs from 228 consecutive female patients at Asahi University Dental Hospital and 12 cooperating facilities in 2014 were evaluated. Patient age ranged from 17 to 88 years, with a median age of 65 years.

This study was approved by the Ethics Committee of Asahi University.

Intra- and inter-observer agreement

Three specialists in oral and maxillofacial radiology with more than 25 years of experience (AK, AT, and EA) evaluated the radiographs. The three observers underwent specific training prior to the actual interpretation. They learned the criteria of Klemetti et al. [32], and practiced the classification using 100 previously prepared panoramic radiographs of patients whose bone mineral content was known.

The 228 anonymized digital panoramic radiographs were displayed in random order. The observers evaluated the mandibular cortical shape and divided the radiographs into the following three classes, according to the criteria of Klemetti et al. [32]: Class 1, the endosteal margin of the cortex was even and sharp on both sides; Class 2, the endosteal margin showed semilunar defects or endosteal cortical residues on one or both sides; and Class 3, the cortical layer formed heavy endosteal cortical residues and was clearly porous (Fig. 1). Each observer evaluated each radiograph twice. The second evaluation was performed at least 2 months after the first.

Fig. 1

Criteria for evaluating mandibular cortical shape. a Class 1: the endosteal margin of the cortex is even and sharp on both sides. b Class 2: the endosteal margin shows semilunar defects or endosteal cortical residues on one or both sides. c Class 3: the cortical layer forms heavy endosteal cortical residues and is clearly porous

Intra-observer agreement (Cohen’s weighted kappa) between the first and second evaluations of each observer was calculated. Inter-observer agreement among the three observers was also calculated. The results were compared with those of previous reports regarding intra- and inter-observer agreement [22–31]. The interpretation of Cohen’s kappa values is shown in Table 1 [33].
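As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's weighted kappa for two sets of ratings on the three ordered classes. The text does not state which weighting scheme was used, so the linear disagreement weights here are an assumption, and the function name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories=(1, 2, 3)):
    """Cohen's weighted kappa for two rating lists on ordered categories.

    Assumption: linear disagreement weights |i - j| / (k - 1). The source
    does not say whether linear or quadratic weights were used.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each set of ratings.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    observed_dis = 0.0
    expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed_dis += w * observed[i][j]
            expected_dis += w * row[i] * col[j]
    if expected_dis == 0.0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed_dis / expected_dis
```

For intra-observer agreement, `ratings_a` and `ratings_b` would be one observer's first and second readings of the same radiographs. How the three observers were combined into a single inter-observer value is not described in the text, so that step is not sketched here.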

Table 1 Interpretation of Cohen’s kappa
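The descriptive labels used in the Results (moderate, substantial, almost perfect) are consistent with the widely used Landis and Koch bands. The sketch below encodes those bands as an assumption; the cut-offs are not taken from Table 1 itself, and the function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a descriptive label.

    Assumption: Landis and Koch style bands; the cut-offs are not
    copied from the paper's Table 1.
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
```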

Involvement of cortical width in evaluating cortical shape

Prior to this study, panoramic radiographs of the dental X-ray head phantom (Kyoto Kagaku Co, Ltd, Kyoto, Japan) were taken with each set of equipment in each institution for calibration. The magnification of each panoramic radiograph was obtained, and the respective corrected value was used in the subsequent analysis.

Mandibular cortical width was measured according to Taguchi’s method [11, 27, 34, 35]. A line was drawn parallel to the long axis of the mandible and tangential to the inferior border of the mandible. This line intersected the inferior border of the mental foramen. The mandibular cortical width was measured manually by one oral and maxillofacial radiologist (YA) [21, 22]. The measurement was performed three times, and the average value was calculated. Measurements were obtained on both sides of the mandible, and the smaller value was accepted. The involvement of mandibular cortical width in evaluating cortical shape was investigated.
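The measurement procedure described above can be sketched as follows. This is an illustration only: the text says a corrected value was used but not how the correction was applied, so dividing by the magnification factor is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def corrected_width_mm(measured_mm, magnification):
    """Convert a width measured on the image to an estimated true width.

    Assumption: magnification is the ratio of image size to true size
    obtained from the phantom, so the correction is a simple division.
    """
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    return measured_mm / magnification


def cortical_width_mm(left_mm, right_mm, magnification):
    """Average the repeated measurements on each side, correct for
    magnification, and keep the smaller of the two sides."""
    left = corrected_width_mm(mean(left_mm), magnification)
    right = corrected_width_mm(mean(right_mm), magnification)
    return min(left, right)
```

For example, `cortical_width_mm([3.9, 4.0, 4.1], [3.2, 3.3, 3.4], 1.25)` averages three readings per side and returns the right-side value because it is the smaller one. The numbers are made up for illustration.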

The reliability of measurement was also obtained prior to this study. One examiner (YA) initially measured the width of 10 mandibles three times, and the intra-examiner intraclass correlation coefficient (ICC) was obtained [ICC (1,3) = 0.989]. In addition, two examiners (YA and AK) measured the width of 10 mandibles once each; inter-examiner ICC (2,1) was 0.984. The reliability of measurement was sufficiently high.
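For readers unfamiliar with the ICC notation, the sketch below computes ICC (1,k) and ICC (2,1) from a subjects-by-ratings table using the standard Shrout and Fleiss mean-square formulas. It is not the authors' code; the function names are hypothetical, and the table layout (one row per mandible, one column per repeated measurement or examiner) is an assumption.

```python
def _mean_squares(table):
    """Return n, k and the ANOVA mean squares for an n x k table
    (rows = subjects, columns = repeated measurements or examiners)."""
    n = len(table)
    k = len(table[0])
    if n < 2 or k < 2 or any(len(r) != k for r in table):
        raise ValueError("need a rectangular table with n >= 2 and k >= 2")
    grand = sum(sum(r) for r in table) / (n * k)
    row_means = [sum(r) / k for r in table]
    col_means = [sum(r[j] for r in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in table for x in r)
    bms = ss_rows / (n - 1)                      # between subjects
    wms = (ss_total - ss_rows) / (n * (k - 1))   # within subjects
    jms = ss_cols / (k - 1)                      # between columns
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return n, k, bms, wms, jms, ems


def icc_1_k(table):
    """ICC(1,k): one-way random effects, reliability of the mean of k ratings."""
    _, _, bms, wms, _, _ = _mean_squares(table)
    return (bms - wms) / bms


def icc_2_1(table):
    """ICC(2,1): two-way random effects, single rating, absolute agreement."""
    n, k, bms, _, jms, ems = _mean_squares(table)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

In the setting described above, the intra-examiner table would have 10 rows and 3 columns, and the inter-examiner table 10 rows and 2 columns.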

Causes of diagnostic disagreement

Six diagnoses were made for each radiograph (two interpretations by each of the three observers) and cases in which four or fewer out of the six diagnoses matched were categorized as a disagreement. The causes of diagnostic disagreement were investigated by two oral and maxillofacial radiologists (YA and EA) with reference to Taguchi’s article [7]. Taguchi listed the following points as causes of disagreement: (1) slight resorption was seen at the endosteal margin of the cortical bone, although the cortical width was sufficiently thick; (2) endosteal cortical residues were seen near the markedly thinned smooth cortex; and (3) the hyoid bone was projected superimposed on the thin cortical bone. Each radiologist separately judged whether the discrepancy in evaluation could be attributed to one of these three causes or to a different cause. If the judgment of two radiologists did not match, a decision was made by discussion.
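The disagreement rule can be expressed compactly. The sketch below reads "four or fewer of the six diagnoses matched" as "the most frequent class was assigned at most four times"; that reading is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def is_disagreement(diagnoses):
    """Return True if a radiograph counts as a disagreement case.

    diagnoses: the six class labels (two readings by each of three
    observers). Assumption: a case is a disagreement when the most
    frequent class appears four times or fewer.
    """
    if len(diagnoses) != 6:
        raise ValueError("expected exactly six diagnoses per radiograph")
    most_common_count = Counter(diagnoses).most_common(1)[0][1]
    return most_common_count <= 4


def classes_involved(diagnoses):
    """Return the sorted distinct classes assigned to one radiograph,
    for example [2, 3] for a Class 2 or 3 disagreement."""
    return sorted(set(diagnoses))
```

Under this reading, a radiograph read as Class 2 four times and Class 3 twice is a disagreement, whereas five matching readings out of six is not.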

Results

Intra- and inter-observer agreement

Intra-observer agreement results are shown in Table 2. The overall kappa values were 0.58 for Observer A, 0.76 for Observer B, and 0.75 for Observer C, indicating moderate to substantial agreement. Regarding evaluation of each class, the kappa value for Class 1 showed substantial to almost perfect agreement for all three observers. However, the kappa values for Classes 2 and 3 were smaller than for Class 1.

Table 2 Intra-observer agreement (three-class classification)

Inter-observer agreement results are shown in Table 3. The overall kappa values were 0.62 and 0.69 for the first and second interpretations, respectively, indicating substantial agreement. Although the kappa values for Class 1 were very high (0.81 and 0.80), those for Class 2 were lower (0.57 and 0.43).

Table 3 Inter-observer agreement (three-class classification)

A summary of previous reports regarding intra- and inter-observer agreement is shown in Table 4. Intra-observer kappa values showed substantial to almost perfect agreement in many reports. Inter-observer agreement ranged widely from 0.30 to 0.86.

Table 4 Summary of intra- and inter-observer agreement in previous reports

Involvement of cortical width in evaluating cortical shape

Figure 2 shows there was a large overlap in the cortical widths of cases diagnosed as Class 2 and those diagnosed as Class 3. A smaller overlap was seen between Class 1 and Class 3 cases.

Fig. 2

Involvement of cortical width in evaluating cortical shape. The horizontal axis indicates the mandibular cortical width (mm). The vertical axis indicates the frequency of each classification of the cortical shape. The normal distribution fitting curves are shown. Blue indicates Class 1; yellow Class 2; and red Class 3
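As an illustration of the normal distribution fitting mentioned in the caption, the sketch below fits a normal curve to the cortical widths of each class and computes the overlapping area of two fitted curves. The study reports the overlap only graphically, so the overlap coefficient is an added illustration rather than a reported result, and the function names are hypothetical. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def fit_normal(widths_mm):
    """Fit a normal distribution (sample mean and standard deviation)
    to the cortical widths of one class. Needs at least two values."""
    return NormalDist.from_samples(widths_mm)


def fitted_overlap(widths_a_mm, widths_b_mm):
    """Overlapping area (0 to 1) of the normal curves fitted to two classes.

    This quantity is not reported in the study; it is shown only as one
    possible way to put a number on the overlap seen in the figure.
    """
    return fit_normal(widths_a_mm).overlap(fit_normal(widths_b_mm))
```

A value near 1 would mean the two fitted curves are almost indistinguishable, and a value near 0 would mean the classes are well separated by width.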

Causes of diagnostic disagreement

The causes of diagnostic disagreement are shown in Table 5. The cases classified as Class 1 or 2, Class 2 or 3, and Class 1–3 numbered 14, 25, and 10, respectively. The disagreement was mostly found in classification as Class 2 or 3.

Table 5 Causes for disagreement of diagnosis of mandibular cortical shape

In all 14 cases that were classified as Class 1 or Class 2, the cortical width was sufficiently maintained and slight resorption was seen at the endosteal margin of the cortical bone (Fig. 3a). Evaluation of this resorption may differ according to the observer. In four cases, the hyoid bone was superimposed on the mandibular cortical bone, and therefore, evaluation of the endosteum of the cortical bone became more difficult (Fig. 3b).

Fig. 3

Causes of diagnostic disagreement about mandibular cortical shape. a The cortical width is sufficiently maintained and slight resorption (arrow) is seen at the endosteal margin of the cortical bone. b The hyoid bone (arrow) is superimposed on the mandibular cortical bone. c Endosteal cortical residues (arrows) can be seen near the markedly thinned smooth cortex

In all 25 cases that were classified as Class 2 or Class 3, endosteal cortical residues were seen near the markedly thinned smooth cortex (Fig. 3c), complicating evaluation. In eight cases, the hyoid bone was superimposed on the thin cortical bone, and therefore, evaluation of the endosteum of the cortical bone became more difficult (Fig. 3b).

In the cases that were variously diagnosed as Class 1, Class 2, or Class 3, the cause of discrepancy was thought to be slight resorption at the endosteal margin of cortical bone (Fig. 3a), or endosteal cortical residues near the thinned smooth cortex (Fig. 3c).

Discussion

The use of panoramic radiographs has gained worldwide recognition as an effective method for screening for osteoporosis [11, 36–38]. Several studies have explored the relationships between mandibular cortical index and BMD, bone turnover, and fracture risk. The correlation coefficient between mandibular cortical width and BMD was reported to be 0.44 [10]. The odds ratio for osteoporosis for patients in the lowermost quartile of cortical width was 5.43, compared with those in the uppermost quartile [10]. The odds ratio for osteoporosis in individuals with a severely eroded cortex was 14.7, compared with those with a normal cortex [10]. Regarding bone turnover, biochemical markers are elevated in association with severe cortical erosion [14–16]. Regarding the risk of osteoporotic fractures, the odds ratio for fracture in patients with a severely eroded cortex was 8.0 [39]. A similar trend was found in a Japanese study [38]. Therefore, female patients diagnosed with Class 3 cortical shape should be referred to a specialized medical facility for further examination. It is important that dentists are able to diagnose patients with Class 3 cortical shape on panoramic radiographs. To improve diagnostic accuracy, it is necessary to examine the reliability of the diagnosis and to clarify the causes of diagnostic discrepancies.

Most previous studies have reported substantial to almost perfect intra-observer agreement in evaluating cortical shape [22, 26–28, 30]. In this study, intra-observer kappa values for Classes 2 and 3 were lower than for Class 1. In contrast, reported inter-observer kappa values range widely from 0.30 to 0.86 [22–27, 29–31]. In this study, the inter-observer kappa for Class 2 was low, indicating moderate agreement. Based on the above findings, improving diagnostic reliability between Classes 2 and 3 will lead to better intra- and inter-observer agreement. Ledgerton et al. reported that most discrepancies occurred at the border between two categories (Class 1–2 or Class 2–3) [25]. Sutthiprapaporn et al. reported that general dental practitioners had sufficient diagnostic skills after attending a training lecture [40]. Therefore, training in how to distinguish between Classes 2 and 3 according to Klemetti et al. [32] should increase diagnostic performance.

When observers classify cortical shape, they may refer to cortical width. In this study, there was a large overlap in cortical width between cases diagnosed as Class 2 and those diagnosed as Class 3. It may be difficult to determine which patients should be referred to a medical facility based only on cortical width.

Finally, the causes of diagnostic discrepancy were examined. In the cases that were classified as Class 1 or Class 2, slight resorption was present at the endosteal margin of the sufficiently thick cortical bone. The observers could not judge easily whether this indicated a part of the eroded cortex. Taguchi pointed out that this finding was seen frequently where the trabecular bone tails connect to the inferior cortex in patients with healthy skeletal BMD, and might be misdiagnosed as eroded cortex [8]. Such trabecular bone can be diagnosed on three-dimensional images, such as computed tomography and cone beam computed tomography. The disagreement was mostly found in Class 2 or 3 classifications. These cases had endosteal cortical residues near the markedly thinned smooth cortex. These findings were frequently seen in patients with a severely eroded cortex [8]. The last cause of diagnostic disagreement was superimposition of the hyoid bone on the cortical bone. The hyoid bone may hide the cortical characteristics, especially in patients with a thin cortex, complicating diagnosis.

A limitation of this study is that no BMD data were obtained. Although inter-observer agreement in the evaluation of mandibular cortical shape was clarified, it is not known which diagnosis was correct when diagnoses differed among the observers.

In conclusion, the intra- and inter-observer overall kappa values in the diagnosis of mandibular cortical shape indicated moderate to substantial agreement. The kappa value for Class 2 was smaller than for Class 1. There was a large overlap in cortical width between cases diagnosed as Class 2 and those diagnosed as Class 3. The disagreement was mostly found in Class 2 or 3 classifications. The main cause of disagreement in these cases was that endosteal cortical residues were seen near the markedly thinned smooth cortex. Development of criteria for categorizing borders is important. Categorization at areas of overlap between Classes 2 and 3 is likely to be improved by adding information about the cortical width to the cortical shape classification.