Introduction

In the last decade, we have seen a paradigm shift towards a three-dimensional (3D) approach for orthodontic and orthognathic analysis and treatment planning. Conventional records like facial photographs, plaster casts, or traditional frontal and lateral cephalometry, which have been in use since the 1930s [1, 2], have been replaced by 3D photographs, digital dental casts and 3D cone beam CT (CBCT) [3–5]. Integration of these new 3D records eventually will lead to a virtual head of the patient, which can be used for case analysis and treatment planning in the virtual operating room [6].

The applications of CBCT technology have increased rapidly, and the number of publications on this topic grows every week. CBCT delivers a lower radiation dose than a multislice CT scan [7, 8] and therefore can be used for a wider range of patients. Nevertheless, a CBCT scan is still not suitable for the routine orthodontic patient [4]. For patients with craniofacial anomalies, orofacial clefts or orthognathic cases, however, CBCT has many benefits.

Recently, it has been shown [9–13] that conventional lateral cephalometric radiographs, which may still be considered the “gold standard”, can be compared with lateral cephalometric radiographs constructed from CBCT scans. The latter can therefore be used for longitudinal research in cases where conventional radiographic records from the past exist. In a similar study on frontal radiographs, some differences between conventional images and images constructed from CBCT scans were found [14]. These differences were caused by different positioning of the skulls in the two X-ray devices. When constructing two-dimensional (2D) images from 3D data, patient positioning is of utmost importance, but when performing purely 3D cephalometry, the positioning of the patient is of no importance. In two other studies from our group, in which we investigated whether conventional radiographs could be compared with 3D models of human skulls, differences between the two methods were found, showing that comparison of 2D and 3D records cannot be recommended [15, 16].

Multiple studies have compared conventional 2D cephalometry with 2D cephalometry constructed from 3D data [9–13], or 2D with 3D cephalometry [15, 16], but to our knowledge, no study has yet compared 3D models from two different CBCT devices. At present, there are many CBCT devices on the market, all with their own possibilities and limitations. The aim of this study was to evaluate whether measurements on 3D models of human skulls derived from two different CBCT scans are comparable and, moreover, whether the choice of device influences the measurements performed.

Materials and methods

Materials

The sample consisted of 40 dry human skulls obtained from the collection of the Department of Orthodontics and Oral Biology of the Radboud University Nijmegen Medical Centre. The skulls were selected from a larger sample according to the following criteria: presence of permanent upper and lower incisors, presence of first permanent upper and lower molars and presence of a reproducible, stable occlusion. The mandible was related to the skull based on the position of the condyle in the fossa and maximum occlusal interdigitation. The mandibular position was fixed with broad tape from the ipsilateral temporal bone around the horizontal ramus of the mandible to the contralateral temporal bone.

Radiography

Each skull was positioned in the Iluma Cone Beam CT scanner (Imtech, Ardmore, OK, USA) and in the i-CAT CBCT scanner (Imaging Sciences International, Inc., Hatfield, PA, USA). The skulls were placed on a foam platform in the i-CAT and on a Perspex layer in the Iluma with the Frankfurt Horizontal plane parallel to the floor, in the centre of the CBCT scanner, using the midline light beam to coincide with the midsagittal plane. A CBCT scan was taken of all skulls with both devices. The following settings were used: for the Iluma, 120 kVp, 3.8 mA s, 20 s with a resolution of 0.3 mm/voxel; for the i-CAT, 129 kVp, 47.74 mA s, 40 s with a resolution of 0.4 mm/voxel.

Two 3D models of each skull were constructed (Fig. 1a, b), one from the i-CAT CBCT data and one from the Iluma CBCT data. The 3D models were constructed with Maxilim® (Medicim, Sint-Niklaas, Belgium). The same software was used to analyse the constructed 3D models by identifying landmarks directly on the reconstructed 3D models.

Fig. 1 CBCT-constructed 3D models of the same skull. a Three-dimensional model from the i-CAT. b Three-dimensional model from the Iluma

Measurements

For the morphometric analysis, 19 hard tissue landmarks (Table 1) were identified on the 3D models. Fourteen widely used cephalometric variables (12 angles and two linear ratios) were calculated using Maxilim® (Table 2).
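
As an illustration of how such variables can be derived from landmark coordinates, the following minimal sketch computes a three-point angle and a linear ratio. It is not the Maxilim® implementation; the function names and the landmark coordinates are hypothetical, and angles between planes (such as NSL/NL) would need an additional step that is not shown here.

```python
import math

def angle_deg(a, vertex, b):
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    u = [a[i] - vertex[i] for i in range(3)]
    v = [b[i] - vertex[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

def ratio_percent(p1, q1, p2, q2):
    """Length of segment p1-q1 as a percentage of the length of segment p2-q2."""
    return 100.0 * math.dist(p1, q1) / math.dist(p2, q2)

# Hypothetical landmark coordinates in mm (illustration only, not study data).
sella = (0.0, 0.0, 0.0)
nasion = (70.0, 0.0, 0.0)
a_point = (65.0, 0.0, -50.0)

# An SNA-type angle: the angle at nasion between sella and A point.
print(round(angle_deg(sella, nasion, a_point), 2))
```

Because both functions work only on coordinate differences, they do not depend on where the origin of the scanner coordinate system lies.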

Table 1 Anatomical landmarks used in this study
Table 2 Cephalometric variables

Statistical analysis

For both the 3D models derived from the Iluma CBCT scanner and those derived from the i-CAT CBCT scanner, the same operator (OV) marked the landmarks on all 80 3D models five times, with an interval of 1 week between sessions. The intra-observer reliability was calculated by means of the Pearson correlation coefficient for the first and second measurement. The mean value and standard deviation of the five measurements performed were used for the statistical analysis. For each measurement, the duplicate measurement error was calculated and compared with the duplicate measurement error of the same measurement for the other device. Because the standard deviations of the measurement errors were skewed, non-parametric statistics were performed. Wilcoxon signed ranks tests were performed to compare the means of corresponding measurements on the two 3D models of the same skull.
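
The two reliability quantities mentioned above can be sketched as follows. The Pearson correlation coefficient is standard. For the duplicate measurement error, the text does not state which formula was used; the sketch assumes Dahlberg's formula, sqrt(Σd²/2n), which is a common choice in cephalometric studies but remains an assumption here. The function names and the numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dahlberg_error(first, second):
    """Duplicate measurement error by Dahlberg's formula (ASSUMED definition).

    `first` and `second` hold the same variable measured twice on each skull.
    """
    n = len(first)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * n))

# Hypothetical first and second measurement series of one angle (degrees).
series_1 = [81.2, 79.5, 84.0, 82.3, 80.1]
series_2 = [81.0, 79.9, 83.6, 82.5, 80.4]

print(round(pearson_r(series_1, series_2), 3))
print(round(dahlberg_error(series_1, series_2), 3))
```

A high correlation alone does not exclude a systematic shift between the two series, which is one reason an error measure in the original units is usually reported alongside it.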

Results

Intra-observer reliability, expressed as the Pearson correlation coefficient for the first and second measurements, ranged between 0.42 and 0.98 with an average of 0.89 for the i-CAT measurements and between 0.43 and 0.99 with an average of 0.91 for the Iluma measurements (see Table 3). The duplicate measurement error for the i-CAT CBCT-constructed 3D models was significantly smaller for two measurements (NSL/NL, Ll–CR–Lr) as compared with the standard error of the measurements on the Iluma CBCT-constructed 3D models (see Table 3). The duplicate measurement error for the Iluma CBCT-constructed 3D models was significantly smaller for five measurements (SNB, R CR–L, NL/ML, NSL/BOP and AGl/AGr/Lr) as compared with the standard error of the measurements on the i-CAT CBCT-constructed 3D models. For seven measurements, there were no statistically significant differences in measurement error between the two devices (SNA, R ME–AG, NSL/NL, AGr/AGl/Ll, AGl/ME/AGr, ME/AGl/CR and ME/AGr/CR). Reproducibility of the measurements on the Iluma CBCT-constructed 3D models was therefore higher than that of the measurements on the i-CAT CBCT-constructed 3D models.

Table 3 Intra-observer reliability expressed as Pearson’s correlation coefficient for first and second measurements and the average duplicate measurement error for repeated measurements of 40 cases with their corresponding P value according to Wilcoxon signed ranks test

A statistically significant difference was found between the i-CAT CBCT-constructed 3D models and the Iluma CBCT-constructed 3D models for eight out of 14 measurements (Table 4). Although the differences between the two devices for these eight measurements (SNA, R CR–L, R ME–AG, NL/ML, NSL/ML, NSL/NL, AGr/AGl/Ll and AGl/AGr/Lr) were statistically significant (all P values <0.05), for most of them the actual mean difference for repeated measurements was small, ranging from 0.27° to 0.95°. For the angles NL/ML and NSL/ML, the difference between the two methods was more substantial, with values of 2.56° and 3.21°, respectively. For the ratios, the average differences were statistically significant and ranged from 1.46% to 1.84%. For six measurements (SNB, NSL/BOP, AGl/Me/AGr, Ll/CR/Lr, Me/AGl/CR and Me/AGr/CR), no statistically significant differences were found.

Table 4 Difference between corresponding average measurements on the i-CAT 3D model and on the Iluma 3D model with its 95% confidence interval, corresponding P value and reliability, expressed as Pearson’s correlation coefficient

Discussion

In this study, 14 widely used cephalometric measurements on CBCT-constructed 3D models of dry human skulls derived from two different CBCT devices were compared. Dry skulls were used since it is not considered ethical to expose patients to radiation twice. In this way, two fully comparable models were obtained for each skull.

The largest error in 2D and 3D cephalometric studies is the error in landmark identification [17, 18], and each landmark exhibits a characteristic pattern of error that contributes to measurement inaccuracy [19, 20]. Images from dry skulls are not affected by any distortion caused by soft tissues. This reduces the chance of errors in landmark identification because it makes accurate localisation of bony landmarks easier. On the other hand, it is not possible to test reliability under real clinical conditions, i.e. with soft tissue measurements.

In this study, all measurements were performed by one observer. The question remains whether this observer made a systematic error. The duplicate measurement errors in Table 3 are acceptable. The focus of this article is the stability of the new 3D measurements and the comparison between two CBCT devices. A consensus approach would be favourable if a comparison with a gold standard were to be made, but such a gold standard is not available. Furthermore, possible systematic errors in the identification of landmarks would be the same for both types of 3D models and therefore would not influence the reproducibility. Averaging or combining the measurements of several observers would lead to underestimation of the true variability of the measurements. It is therefore justified to have one observer for this type of study.

In a previous study, it was shown that the position of the skull in frontal radiography, where a 2D image is obtained from a 3D data set, plays an important role in the outcome of a cephalometric analysis since the measurements are influenced by tilt or rotation [14]. The position of the patient in a CBCT scanner for 3D measurements is less important, since rotation about the long axis or the sagittal axis does not influence the angles or distances measured. In this study, each skull was placed in both CBCT scanners once, and a 3D model was constructed from each data set. Both 3D models were analysed five times. This approach was chosen because the biggest error in cephalometric studies is the error in landmark identification [17], and we wanted to eliminate other factors that could influence the identification of landmarks. In a clinical situation, the repeated positioning of the patient in a CBCT scanner is probably more variable than the single positioning of a skull in a study protocol. Nevertheless, positioning of the patient is of little to no importance when working in 3D only.
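
The claim that rotation does not affect 3D angles or distances can be checked numerically. The sketch below rotates three hypothetical landmarks about two axes and compares the angle and one distance before and after. The assignment of z as the long axis and x as the sagittal axis, the rotation angles and the coordinates are all assumptions made for illustration.

```python
import math

def rotate_z(p, deg):
    """Rotate point p about the z axis (assumed here to be the long axis)."""
    t = math.radians(deg)
    x, y, z = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z)

def rotate_x(p, deg):
    """Rotate point p about the x axis (assumed here to be the sagittal axis)."""
    t = math.radians(deg)
    x, y, z = p
    return (x,
            y * math.cos(t) - z * math.sin(t),
            y * math.sin(t) + z * math.cos(t))

def angle_deg(a, vertex, b):
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    u = [a[i] - vertex[i] for i in range(3)]
    v = [b[i] - vertex[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norms = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))

# Hypothetical landmark coordinates in mm (illustration only).
landmarks = [(0.0, 0.0, 0.0), (70.0, 0.0, 0.0), (65.0, 0.0, -50.0)]

# Simulate a skull that is turned 12 degrees and tilted 7.5 degrees in the scanner.
tilted = [rotate_x(rotate_z(p, 12.0), -7.5) for p in landmarks]

angle_change = abs(angle_deg(*landmarks) - angle_deg(*tilted))
distance_change = abs(math.dist(landmarks[0], landmarks[1]) - math.dist(tilted[0], tilted[1]))
print(angle_change < 1e-9, distance_change < 1e-9)
```

The same landmarks projected onto a 2D plane after such a rotation would generally yield different angles, which is the positioning effect described for constructed 2D images.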

There was a difference in resolution between both CBCT scanners: a voxel size of 0.3 mm for the Iluma and voxel size of 0.4 mm for the i-CAT. At the time this investigation was performed, the extended height scans used in this study had a pre-set voxel size by the CBCT scanner, which could not be altered. Therefore, the difference in voxel size for both scans could not be avoided. The difference between both scanners is 0.1 mm per voxel; therefore, it is highly unlikely that this has any influence on the measurements performed and no clinical significance is to be expected.

Five repetitions of the measurements were undertaken to allow estimation of the variability of each landmark for each skull. This is where the number of repetitions pays off. For determining the intra-observer reliability, however, by definition only two measurements are needed. Statistically, there is no reason to suspect a different intra-observer reliability between the first and second measurement series compared with, e.g. the second and third, or third and fourth. Therefore, additional intra-observer reliabilities do not give any relevant additional information. This is why only the reliability of the first two series of measurements is mentioned in our results. The mean value of all five repeated measurements, however, was used for further statistical analysis.

Because the measurement errors were skewed, the measurement error is influenced by outliers, making the mean measurement error less suited for comparisons between methods. Therefore, the Wilcoxon signed ranks tests were used to compare the measurement error between the two methods.
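
For readers unfamiliar with the test, the following minimal sketch computes the Wilcoxon signed-rank statistics for paired data: zero differences are dropped, the absolute differences are ranked with average ranks for ties, and the ranks are summed separately for positive and negative differences. It returns only the rank sums, not a P value; a statistics package (for example `scipy.stats.wilcoxon`) would normally be used for that. The function name and the data are hypothetical.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y.

    W_plus and W_minus are the rank sums of the positive and negative
    differences; n is the number of non-zero differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)

# Hypothetical measurement errors of one variable on the same skulls, per device.
errors_device_a = [1.5, 2.0, 3.5, 4.0]
errors_device_b = [1.0, 2.0, 4.5, 2.0]
print(wilcoxon_signed_rank(errors_device_a, errors_device_b))
```

Because the test uses ranks rather than the raw differences, a single extreme outlier changes the statistic no more than any other largest difference would, which is why it suits skewed error distributions.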

For eight out of 14 measurements (Table 4), statistically significant differences between the i-CAT and the Iluma CBCT-constructed 3D models were found. For five of these measurements, the differences were smaller than or similar to the standard error for these measurements and therefore of no clinical significance. For AGl/AGr/Lr (degrees), the difference of 0.95° was larger than the standard error, but still within a clinically acceptable range. For the measurements NL/ML and NSL/ML, the differences were 2.56° and 3.21°, respectively, which are larger than the standard error of these measurements and, in our opinion, beyond a reasonable clinically acceptable range.

The question arises of what might have caused these differences. Possibly one or several landmarks used to calculate these variables were placed differently in the two types of 3D models. Since other measurements for which the same landmarks are used do not display any differences between the two types of 3D models, the difference for NL/ML and NSL/ML must be caused by the Gonion landmark. A possible explanation is that, while making a 3D reconstruction (Fig. 1a, b), a part of the angulus mandibulae is reconstructed differently. During the acquisition of the scan, voxels are assigned different Hounsfield unit (HU) values. When a 3D model is constructed, these HU values are used to differentiate between hard tissues, soft tissues and air. During the acquisition, a voxel can only be assigned one HU value. Due to the size of the voxels, it often happens that not all space in the voxel is filled with the same tissue. For instance, when an image is acquired, only 75% of the voxel may be filled with bone, while the other 25% is filled with air or soft tissue. Since the voxel can only have one value, it receives the value of the tissue that is most present in the voxel. As a result, an area where there is no bone can be shown as bone in the constructed 3D model because the voxel has the HU value of bone. If the skulls were positioned exactly the same in both CBCT devices, this would not be a problem since these mistakes would be made in both scans. However, since the positioning of the skulls will not be exactly the same for both devices, small differences can appear between the i-CAT model and the Iluma model. This can result in minor differences in the positioning of Gonion, which leads to another orientation of the mandibular plane and therefore different measurements. Thus, the different HU values necessary for reconstruction of different types of CBCT 3D models can cause differences in the 3D models.

Conclusion

There are statistically significant differences between some angular measurements performed on i-CAT CBCT-constructed 3D models and those performed on Iluma CBCT-constructed 3D models. Two of these measurements showed a clinically relevant difference. This means that care should be taken when interpreting measurements made on CBCT 3D models derived from different CBCT devices.