Introduction

Panoramic radiography is frequently used for screening for various abnormalities of the jaws and their adjacent structures, and is recognized as a reliable and convenient technique [1]. However, because of the complexity of the relationships between anatomical structures and the panoramic image layer, panoramic images may sometimes be difficult to interpret, especially for inexperienced observers, with the result that critical disease may be overlooked [2]. To address this, a number of computer-assisted detection/diagnosis (CAD) systems have been developed for various diseases, including maxillary sinusitis [3], osteoporosis [4, 5], and carotid artery calcification [6]. In these systems, image characteristics that are extracted by experienced human observers are input into the CAD system for diagnostic assistance. More recently, deep learning (DL) systems with convolutional neural networks (CNN) have been introduced into the field of oral and maxillofacial diagnostic imaging [7,8,9,10,11,12,13,14,15,16,17]. In this technique, the computer system can automatically learn to extract image characteristics suitable for performing various tasks such as classification, segmentation, and image enhancement. Object detection, which is one such function, involves the detection of objects learned during the training process. One such CNN-based DL system is DetectNet, which outputs the XY coordinates of a detected object as a colored box when a testing image is input into the trained learning model [18]. DetectNet has already been applied to diseases in various medical fields, such as mammary disease, and has shown high diagnostic performance [19]. Although deep learning systems may provide a way to fully automate the detection of abnormalities or anatomic structures, relatively few studies have reported their application to panoramic imaging [11,12,13,14,15,16].

Vertical root fractures (VRF) are reported to occur in 3.7–30.8% of endodontically treated teeth, and are predominantly seen in the mandibular premolars and molars [20,21,22,23]. They are one of the most difficult dental diseases to treat conservatively, and almost all VRF teeth are either extracted or treated by hemisection or root separation techniques. However, early treatment involving the resection of affected roots can achieve relatively long survival times for the remaining roots, with 5- and 10-year survival rates of 94% and 64%, respectively [24]. In addition, VRFs are sometimes identified incidentally on panoramic images, because some endodontically treated teeth show no or only slight symptoms, even when a VRF is present [25]. Although the detectability of VRF is reported to be higher with cone-beam computed tomography (CBCT) for dental use than with intra-oral or panoramic radiography [20,21,22, 26,27,28,29,30,31,32], the radiation dose to patients from CBCT is relatively large [20, 30, 31], and CBCT is not therefore suitable for screening purposes.

The aims of the present study were to develop an object detection model to identify teeth with a VRF on panoramic images, and to evaluate the diagnostic performance of this model.

Materials and methods

Materials

A total of 1914 images were extracted from our hospital image database of 65,490 panoramic images stored between April 2013 and February 2019. These 1914 images were retrieved with the search engine of our radiology information system, using the search term “root fracture” on the imaging reports. From these 1914 extracted images, 300 subjects (150 females and 150 males, with a mean age of 66.05 years) were selected through a review process involving two oral and maxillofacial radiologists (MF and EA) and an endodontist (KI). For each included subject, at least one tooth with a VRF could be clearly identified on the panoramic image, with all three observers being in agreement. In 28 subjects, two teeth showed a VRF on a single image, while one subject had three teeth with a VRF. Consequently, 300 panoramic images of 1039 × 1378 pixels showing 330 teeth with a VRF were downloaded in JPEG format, and these served to create the dataset for the deep learning process. The distribution of the tooth types of these 330 teeth (hereafter referred to as true boxes) is summarized in Table 1. The VRFs were most frequently observed in the mandibular molars (54.8%) and mandibular premolars (17.6%). Most of the teeth with a VRF (96.4%) could be judged to be endodontically treated teeth on the basis of the presence of root canal filling materials.

Table 1 Materials and results of the computer-aided diagnosis model according to tooth type

All 300 images were obtained on a panoramic machine (Veraviewepocs X550 P-CR, J. Morita Mfg Corp., Kyoto, Japan) with a tube voltage of 75 kVp, tube current of 9 mA, and acquisition time of 16 s.

Preparation of the dataset

Images of 900 × 900 pixels were cropped from the downloaded images, and teeth with a VRF were labeled by setting arbitrary-sized rectangular regions of interest (ROI) sufficient to contain their crown and roots. These ROIs were set by an oral and maxillofacial radiologist (MF) using ImageJ software (National Institutes of Health, Bethesda, Maryland, USA), and the coordinates of the upper left (X1, Y1) and lower right (X2, Y2) corners were recorded and converted to text form (Fig. 1).

Fig. 1

The composition of an item in the dataset. Two files are included, the first being the panoramic image and the second a label file containing the XY coordinate information. These two files must share the same file name
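As a rough illustration of the image and label pairing described above, the following Python sketch writes one text label file per image. It assumes a KITTI-style, 15-field label line, which is the layout commonly used for object detection datasets in DIGITS; the text does not state the exact format used in this study. The class name `vrf`, the helper names, and the example coordinates are hypothetical.

```python
from pathlib import Path


def kitti_label_line(x1, y1, x2, y2, class_name="vrf"):
    """Format one ROI as a 15-field KITTI-style label line.

    Assumption: only the class name and the four box coordinates carry
    information; the remaining fields (truncation, occlusion, angle,
    3D size, 3D position, rotation) are zero placeholders.
    """
    if not (x1 < x2 and y1 < y2):
        raise ValueError("expected upper-left (x1, y1) and lower-right (x2, y2)")
    fields = [
        class_name, "0.0", "0", "0.0",
        f"{x1:.2f}", f"{y1:.2f}", f"{x2:.2f}", f"{y2:.2f}",
        "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0",
    ]
    return " ".join(fields)


def write_label_file(image_path, rois, label_dir):
    """Write a label file that shares its base name with the image file."""
    label_dir = Path(label_dir)
    label_dir.mkdir(parents=True, exist_ok=True)
    label_path = label_dir / (Path(image_path).stem + ".txt")
    lines = [kitti_label_line(*roi) for roi in rois]
    label_path.write_text("\n".join(lines) + "\n")
    return label_path
```

For an image with two labeled teeth, `write_label_file("case001.jpg", [(120, 340, 210, 520), (400, 300, 480, 500)], "labels")` would produce `labels/case001.txt` with one line per ROI.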

Architecture of the deep learning system

The deep learning process was performed using the Digits version 5.0 training system (Nvidia, California, USA) with a customized DetectNet (https://devblogs.nvidia.com/detectnet-deep-neural-network-object-detection-digits/). The workstation had an Ubuntu 16.04 operating system and GeForce 1080Ti graphics processor unit (Nvidia).

Learning process and evaluation of diagnostic performance

A fivefold cross-validation method was applied to the training and testing process [33, 34] (Fig. 2). The dataset was divided into five parts, with four parts (b, c, d, and e) being used as training data and the remaining part (a) being used as testing data. This process was repeated five times, changing the testing dataset each time. Each training process involved 1000 epochs using the Adam (adaptive moment estimation) solver with an initial learning rate of 0.0001. Five learning models were created, and the corresponding testing data were applied to the respective models. Detected areas were shown as red boxes on each image of the testing dataset (Fig. 3), and these were judged to be correct when they sufficiently included the root with the VRF and fracture lines.

Fig. 2

A flow chart of the fivefold cross-validation procedure. Estimated diagnostic performance was calculated as the average of the five models’ test results
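The fivefold procedure can be sketched in Python as follows. This is a minimal illustration only: the text does not state how images were assigned to the five parts, so the seeded random shuffle, the function name `five_fold_splits`, and the image identifiers are assumptions.

```python
import random


def five_fold_splits(items, n_folds=5, seed=0):
    """Yield (train, test) lists so that each fold serves once as test data."""
    items = list(items)
    # Assumption: random assignment with a fixed seed for reproducibility.
    random.Random(seed).shuffle(items)
    folds = [items[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


# Hypothetical identifiers for the 300 panoramic images.
image_ids = [f"case{i:03d}" for i in range(300)]
for fold_index, (train, test) in enumerate(five_fold_splits(image_ids)):
    print(fold_index, len(train), len(test))
```

With 300 images, each of the five models is trained on 240 images and tested on the remaining 60, and every image appears in exactly one testing set.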

Fig. 3

A case of successful detection. Black arrow shows a VRF tooth

Diagnostic performance was evaluated with recall, precision, and F measure values [35,36,37,38] as follows:

$${\text{recall}} = {\text{number}}\,{\text{of}}\,{\text{correctly detected}}\,{\text{boxes}}/{\text{number}}\,{\text{of}}\,{\text{all}}\,{\text{true}}\,{\text{boxes}}$$
$${\text{precision}} = {\text{number}}\,{\text{of}}\,{\text{correctly detected}}\,{\text{boxes}}/\left( {{\text{number}}\,{\text{of}}\,{\text{correctly detected}}\,{\text{boxes}}\, + \,{\text{number}}\,{\text{of}}\,{\text{falsely detected}}\,{\text{boxes}}} \right)$$
$$F{\text{ measure}} = 2 \times \left( {{\text{recall}}\, \times \,{\text{precision}}} \right)/\left( {{\text{recall}}\, + \,{\text{precision}}} \right).$$

The “number of all true boxes” is the number of teeth that truly have a VRF (n = 330). In other words, recall and precision correspond to sensitivity and positive predictive value, respectively. These two indices have a trade-off relationship; therefore, their harmonic mean, the so-called F measure, is also used to evaluate the performance of machine learning models [35,36,37,38]. The estimated diagnostic performance was defined as the mean of the results of the five learning models.

Results

The testing results are summarized in Table 1. Of the 267 boxes detected on the 300 images, 247 boxes correctly identified teeth having a VRF, but 20 boxes incorrectly marked teeth without a VRF. Of the 15 incorrectly detected mandibular molars, 2 had undergone root separation surgery. Of the 330 teeth with true VRFs, 83 teeth were not detected, with no relevant box being shown on the images. Seven (58.3%) of the 12 teeth without endodontic treatment were not detected. The recall rates were low for the maxillary incisors but high for the mandibular premolars and molars, while the precision rates were high regardless of tooth type (Table 2). Consequently, the estimated recall, precision, and F measure were 0.75, 0.93, and 0.83, respectively.
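As a quick check, the overall values can be recomputed from the pooled counts given above (247 correctly detected boxes, 20 falsely detected boxes, and 330 true boxes). The Python sketch below uses a hypothetical helper, `detection_metrics`, and computes the F measure as the harmonic mean of recall and precision. Note that it pools all five testing sets, whereas the study defines the estimate as the mean over the five models, so small differences between the two calculations are possible.

```python
def detection_metrics(correct_boxes, false_boxes, true_boxes):
    """Return (recall, precision, F measure) from box counts."""
    recall = correct_boxes / true_boxes
    precision = correct_boxes / (correct_boxes + false_boxes)
    # Harmonic mean of recall and precision.
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


recall, precision, f_measure = detection_metrics(
    correct_boxes=247, false_boxes=20, true_boxes=330
)
print(f"{recall:.2f} {precision:.2f} {f_measure:.2f}")  # prints "0.75 0.93 0.83"
```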

Table 2 Diagnostic performance according to tooth type

A typical correctly detected box is shown in Fig. 3, with a VRF being detected in the right mandibular first molar. Figure 4 shows a maxillary second molar without endodontic treatment that was not detected in the image. In Fig. 5, although a VRF is correctly detected in the mandibular left premolar, the first molar after root separation is incorrectly detected as a tooth with a VRF. Figure 6 shows a right first molar showing periapical and bifurcation radiolucency that was misdiagnosed as having a VRF.

Fig. 4

A case of failed detection. White arrow shows a VRF tooth

Fig. 5

A case of failed detection. Black arrow shows a VRF tooth, white arrow shows a misdetected tooth

Fig. 6

A case of failed detection. Black arrow shows a VRF tooth, white arrow shows a misdetected tooth

Discussion

In the present study, VRFs were predominantly observed in the mandibular molars (54.8%), followed by the mandibular premolars (17.6%). These results are consistent with the descriptions in textbooks [39] and previous reports [20,21,22,23]. In contrast, a study from Romania reported the maxillary incisors to be the predominant tooth type with fracture in teeth extracted from females and males [40]. This discrepancy could be attributed to various factors, including differences in the study population, inclusion of horizontal fractures as well as VRFs, and the influence of different VRF verification methods such as panoramic radiography, cone-beam CT, or endoscopy. Nevertheless, our results agree with the finding that VRFs are mostly observed in endodontically treated teeth [20,21,22, 25, 29, 41].

The detectability of VRFs on CBCT is reported to be significantly higher than on panoramic or intra-oral radiography [20,21,22, 26,27,28,29,30,31,32]. Takeshita et al. compared diagnostic performance between these modalities and showed that the performance of panoramic radiography, measured by the area under the receiver operating characteristic curve, was lower than that of CBCT but equivalent to that of full-mouth intra-oral radiography [32]. They emphasized that attention should be paid to the diagnosis of VRFs of the incisors and premolars on panoramic radiography, because the former can be overlapped by the vertebrae and the latter may not be exposed at the orthoradial projection angle. Although CBCT allows three-dimensional visualization of the fracture line, it should not be used for screening because of the considerable exposure to ionizing radiation. Therefore, it is recommended that CBCT should only be used as an additional examination after screening of all teeth by panoramic radiography.

In recent years, various studies have evaluated the use of DL systems for oral and maxillofacial imaging procedures, such as periapical radiography [7, 8], panoramic radiography [9,10,11,12,13,14], and CT [15]. Object detection has also been used for diagnosing abnormalities or identifying anatomic structures on panoramic images [11,12,13,14,15,16]. For automated diagnosis, object detection is thought to be more effective than classification. Therefore, we conducted this study as a step towards creating a fully automated diagnostic system for panoramic images. Although our results demonstrate the feasibility of the technique, the performance achieved so far may be insufficient for clinical practice. The low number of teeth used for the training process may be a reason for the low recall rates in the maxillary teeth and mandibular incisors. Moreover, more than half of the endodontically untreated teeth with a VRF were not detected in the present study. This result may be attributed to the low number of such teeth and to the fact that the learning model probably extracted the characteristics of endodontically treated teeth. Therefore, a future study should be conducted with a larger number of such teeth with a VRF, to create a more effective model.

Among the falsely detected teeth, some mandibular molars that had undergone root separation were marked as having a VRF. Although these teeth were regarded as falsely detected boxes in the present study, the detection of separated teeth could be another purpose of the model. If such detections were counted as correct, the measured performance would improve. In addition, even though a fracture line may not be detected on a radiograph, teeth with a VRF frequently show characteristic bony resorption, which presents with distinctive appearances on imaging [42,43,44], such as the so-called halo lesion. The deep learning system might learn and extract these characteristic appearances, and then detect such teeth on the basis of their bony appearance when the testing datasets are applied to the learning model. If so, the teeth that were counted as falsely detected may include teeth with a true VRF whose fracture line was not visible on the radiograph. In this regard, a study should be planned using teeth with endoscopically verified VRFs.

This study has some limitations. First, there were too few training data items for the maxillary teeth, mandibular incisors, and endodontically untreated teeth. Second, we only selected panoramic images that included clear VRF lines. Third, all the panoramic images were obtained in a single hospital. Fourth, preparing the dataset and creating the model were time-consuming tasks. Future studies should address these problems.

Conclusion

We developed an artificial intelligence model for detecting vertical root fractures on panoramic radiography. Evaluation of the performance of this model revealed a recall of 0.75, a precision of 0.93, and an F measure of 0.83, thereby showing that a CAD system for panoramic radiography has potential for this particular function.