Introduction

In recent years, various intraoral scanning systems for direct digitalization have been launched on the dental market to enhance the impression procedure and the manufacturing workflow for computer-aided design (CAD)/computer-aided manufacturing (CAM) fabricated restorations. Although currently available impression materials offer a reliable method for reproducing the clinical situation [1], dental impressions still leave room for improvement in clinical routine [2]. Thus, the overall objective of the evolving direct digital impression technique is to replace the error-prone, multistep and therefore variable process of conventional impression, plaster cast and extraoral indirect digitalization with a more standardized, reliable and predictable approach [3]. However, this means that the new digital impression technologies and devices have to be compared with the current standard procedures to evaluate their reliability and competitiveness. Irrespective of the underlying technology, the current systems seem to offer sufficient scan accuracy and fit of the resulting restorations for single-tooth restorations and are able to keep up with the traditional process [4–6].

However, scanning larger areas requires merging multiple single images, which leads to progressive distortion and therefore higher inaccuracy of the resulting dataset. This is confirmed by evaluations of full-arch scans, which still reveal higher inaccuracies for datasets from intraoral scanning systems than for datasets resulting from the conventional workflow [7]. The scanning strategy also seems to influence the accuracy of the resulting datasets when full arches are scanned [8]. For these reasons, the routine application of intraoral scanning in daily practice is mostly limited to small areas of the dental arch and is used to create single-tooth restorations or small fixed dental prostheses (FDPs) [9–11].

Nevertheless, as the scan forms the basis for the later restoration, the accuracy of the scanning procedure plays an important role. Therefore, the three-dimensional datasets resulting from the different scanning systems have to be evaluated and compared with each other and with the conventional workflow, which is based on a conventional impression and the manufacturing of a plaster cast. For this purpose, the datasets can be superimposed on a highly accurate reference dataset and the divergences can be analysed using inspection software. The 3D analysis focuses on the “trueness” as a value for the closeness between the test dataset and the reference dataset and the “precision” as a value for the repeatability of the procedure, which is usually expressed in terms of standard deviations [12].

Four-unit FDPs currently seem to represent the maximum indication for the application of intraoral scanning in daily routine. This study compares direct digitalization with indirect digitalization using an in vitro reference model imitating this frequently occurring clinical situation. The null hypotheses of the study were that, concerning the accuracy of the obtained datasets, (1) no differences can be found between the digitalization methods and (2) no differences can be found between the scanning systems for direct digitalization.

Material and methods

Testing model and reference dataset

A titanium model with a premolar and a molar with chamfer preparations, representing the base for a four-unit FDP, was used as the testing model (Fig. 1). It was digitized by industrial computed tomography (Perkin Elmer PE XRD 1620; Feinfocus FXE 225.99 at EZRT, Fürth, Germany) to obtain the reference dataset (REF). The fabrication of this titanium testing model, its digitalization and the postprocessing of the measurement data (marching cubes method) to obtain the highly accurate reference dataset were described in detail in a previous publication [13].

Fig. 1 Titanium testing model of the present laboratory study

Direct digitalization using five intraoral scanners

The titanium testing model was directly digitized using five intraoral scanners: (1) CS 3500 intraoral scanner (Dental Imaging software 6.14.0; Carestream Health Inc., Brunn am Gebirge, Austria), (2) Zfx Intrascan (software version 5.02; Zfx GmbH, Dachau, Germany), (3) CEREC AC Bluecam (software version 4.2.4.72893; Sirona, Bensheim, Germany), (4) CEREC AC Omnicam (software version 4.2.3.68181; Sirona, Bensheim, Germany) and (5) True Definition (software version 4.0.3.1; 3M ESPE Dental Products, Seefeld, Germany).

The scanning process (scanning strategy, application of scan powder, handling, etc.) was carried out according to each manufacturer’s recommendations. All scans were performed by an experienced user of intraoral scanners who had completed training sessions before conducting the scans.

Before scanning with the CEREC AC Bluecam, a thin covering powder layer was applied to the model using CEREC Scanpowder (Optispray, LOT no. A3281; Sirona Dental Systems, Bensheim, Germany). Prior to scanning with the True Definition, a light powdering with titanium dioxide particles (Lava Scanpowder, LOT no. N203051; 3M ESPE Dental Products, Seefeld, Germany) was carried out to generate the stochastic pattern that this scanning system requires for accurate data acquisition. Although the manufacturer recommends no powdering for scanning with the CS 3500 intraoral scanner, data acquisition of the titanium reference model was not viable without prior powder application. Therefore, a light powdering with titanium dioxide particles (Lava Scanpowder) was carried out. In all cases, the opaquing layer was renewed before each scan. As specified by the manufacturers, no powdering was conducted before scanning with the Zfx Intrascan and the CEREC AC Omnicam.

With each scanning system, 12 scans were performed, and the corresponding surface tessellation language (STL) datasets were postprocessed and exported. The STL datasets were provided by the respective manufacturers in accordance with the usual clinical workflows.

Indirect digitalization of the sectioned master casts

As the control group, 12 conventional impressions of the testing model were taken using a polyether material (Impregum Penta, LOT no. Base 512888, Catalyst 512314; 3M ESPE Dental Products, Seefeld, Germany) in light-cured custom-made trays (Palatray XL, LOT no. 132651; Heraeus Kulzer, Hanau, Germany). The setting time and procedures were in line with the manufacturers’ recommendations. The impressions were disinfected for 2 min (ORBI-sept Abformdesinfektion, LOT no. A0984; Orbis Dental Handelsgesellschaft mbH, Münster, Germany) according to clinical procedures. After 24 h, the impressions were poured with a scannable type IV gypsum (Fino Scan Stone, LOT no. 313096; DTShop GmbH, Bad Bocklet, Germany). After 24 h, sockets for the unsectioned plaster casts were manufactured (suppen-sockler, LOT no. 4-55262; Picodent Dental-Produktions- und Vertriebs-GmbH, Wipperfürth, Germany). The plaster casts were separated from the stands and sectioned for the manufacturing of master casts. The indirect digitalization of the master casts was performed using the D-810 laboratory scanning device (Dental System 2014-1 × 64 (build 2.9.9.5, 16.09.2014); 3Shape, Copenhagen, Denmark).

Alignment of datasets

All STL datasets from all groups (N = 72; n = 12 per group) were imported into the inspection software Qualify 12 (Version 02.01.2012; Geomagic, Morrisville, NC, US). Prior to the quantitative analysis, the examiner was specially trained in the use of the software. The datasets were reduced to the field of interest (prepared area of the abutment teeth): all artefacts and irrelevant areas below the preparation lines were eliminated manually to ensure precise superimposition and equal boundaries of all datasets (Fig. 2). For comparison, each dataset was separately aligned with the REF dataset by a best fit algorithm.

Fig. 2 Superimposition procedure of the test dataset and the reference dataset. a Reference dataset (grey). b Imported test dataset (blue) and rough alignment with the reference dataset by best fit algorithm. c Selection of the irrelevant surface areas (red) below the preparation margin, followed by manual elimination of irrelevant areas of the test dataset in accordance with the reference dataset. d Reference dataset and reduced test dataset before the final best fit alignment

Analysis of 3D divergences to determine the parameter trueness and precision

Using the inspection software Qualify 12, the three-dimensional Euclidean distance (ED) to the REF dataset was calculated for each measurement point; these distances could take positive or negative values in relation to the REF dataset. Divergences between a test dataset and the reference dataset were expressed as the following parameters: positive deviation, negative deviation and absolute deviation. The absolute deviation was calculated by summing the magnitudes of the positive and negative deviations and dividing the result by the number of measured points.
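The three deviation parameters can be illustrated with a minimal Python sketch. It is not the inspection software's implementation: the function name `deviation_parameters` and the sample distances are hypothetical, and the handling of points with a distance of exactly zero (counted only in the absolute deviation) is an assumption.

```python
from statistics import mean


def deviation_parameters(signed_distances):
    """Summarise the signed point-to-reference distances (mm) of one scan."""
    positive = [d for d in signed_distances if d > 0]
    negative = [d for d in signed_distances if d < 0]
    return {
        # Mean of all points lying outside (+) or inside (-) the reference.
        "positive": mean(positive) if positive else 0.0,
        "negative": mean(negative) if negative else 0.0,
        # Sum of magnitudes divided by the number of measured points.
        "absolute": sum(abs(d) for d in signed_distances) / len(signed_distances),
    }


# Hypothetical signed distances in mm, not data from the study.
example = [0.25, -0.5, 0.75, -0.25, 0.0]
params = deviation_parameters(example)
print(params)
```

The absolute deviation disregards the sign, so positive and negative divergences cannot cancel each other out.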

To assess the overall performance of each scanning system, each test dataset was individually superimposed on the reference dataset in order to obtain values for trueness and precision simultaneously from one single alignment process. The numerical values of the spatial divergences were used to determine the trueness, while the precision was defined on the basis of the standard deviation of the spatial divergences.
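As a sketch of how both values can be derived from one set of alignments, the following Python example takes one deviation value per scan and returns the group mean (trueness) and standard deviation (precision). The function name and the sample values are hypothetical, and the use of the sample standard deviation (n − 1) is an assumption.

```python
from statistics import mean, stdev


def trueness_and_precision(per_scan_deviation):
    """Return (trueness, precision) for one group of repeated scans.

    per_scan_deviation holds one value per scan, e.g. the mean absolute
    deviation (mm) after best fit alignment with the reference dataset.
    """
    trueness = mean(per_scan_deviation)    # closeness to the reference
    precision = stdev(per_scan_deviation)  # sample SD (n - 1), an assumption
    return trueness, precision


# Hypothetical mean absolute deviations (mm) of six scans, not study data.
scans = [0.020, 0.022, 0.018, 0.021, 0.019, 0.020]
trueness, precision = trueness_and_precision(scans)
print(f"trueness: {trueness:.4f} mm, precision (SD): {precision:.4f} mm")
```

A lower value is better for both parameters: a small mean indicates high trueness and a small standard deviation indicates high repeatability.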

Statistical analysis

Descriptive statistics were calculated for each parameter and group. The data were tested for normal distribution using the Kolmogorov-Smirnov and Shapiro-Wilk tests. The Kruskal-Wallis H test (p < 0.05) followed by Mann-Whitney U post hoc analysis was performed to evaluate statistically significant differences in trueness for the positive, negative and absolute deviations. Because of the multiple comparisons, a Bonferroni correction was applied to the individual pairwise comparisons. After Bonferroni correction, a p value of <0.003 (0.05/15) was considered statistically significant.
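The divisor of 15 follows from the six groups (five intraoral scanners and the control group), which yield 15 pairwise comparisons. A short Python sketch of this arithmetic is given below; the group labels and the helper `is_significant` are illustrative only.

```python
from itertools import combinations

# Five intraoral scanners plus the conventional control group.
groups = [
    "CS 3500",
    "Zfx Intrascan",
    "CEREC AC Bluecam",
    "CEREC AC Omnicam",
    "True Definition",
    "Control (D-810)",
]

pairs = list(combinations(groups, 2))  # every unordered pair of groups
alpha = 0.05
alpha_corrected = alpha / len(pairs)   # Bonferroni: divide by number of tests


def is_significant(p_value):
    """Hypothetical helper: compare a pairwise p value with the corrected level."""
    return p_value < alpha_corrected


print(len(pairs), round(alpha_corrected, 4))
```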

To detect statistical differences between the variances of the single groups, which represent the precision, all groups were compared using the Levene test.
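The idea behind the Levene test can be sketched in plain Python: the absolute deviations of each value from its group centre are compared between groups with a one-way analysis of variance. The sketch below computes only the test statistic W. It assumes the group mean as the centre (the text does not state which variant was used), and the p value, which requires the F distribution with k − 1 and N − k degrees of freedom, is not computed.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic based on absolute deviations from group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from the mean of its own group.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(
        len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means)
    )
    within = sum(
        (v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi
    )
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: the second has twice the spread of the first.
w = levene_w([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(w, 4))
```

Groups with identical spread give W = 0 regardless of their means, which is why the test addresses precision rather than trueness.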

To detect a correlation between the number of points captured by the scanning system and the absolute trueness, Spearman’s rho correlation was computed. For statistical analysis, the Statistical Package for the Social Sciences 23.0.0.0 (IBM, Armonk, USA) was used.
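Spearman’s rho is the Pearson correlation of the ranked values, with tied values receiving their average rank. A minimal Python sketch follows; the function names and the sample data are hypothetical and no significance test is included.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation coefficient of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: more captured points, lower absolute deviation (mm).
points = [1000, 2000, 3000, 4000]
deviation = [0.04, 0.03, 0.02, 0.01]
rho = spearman_rho(points, deviation)
print(rho)
```

A negative rho indicates that the deviation decreases as the number of captured points increases.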

Results

Table 1 gives the mean values and their standard deviations, as well as the minimum, median, maximum and 95 % confidence interval for each parameter after superimposition. The Kolmogorov-Smirnov and Shapiro-Wilk tests revealed that 6 of 18 test groups were not normally distributed for the deviation parameters. The Kruskal-Wallis H test showed significant differences between the performances of the different intraoral scanning systems and in comparison with the conventional control group (conventional impression + master cast + 3Shape D-810 laboratory scanner). For the single digitalization systems, an inhomogeneity of variances was observed. Boxplots of the mean negative, mean positive and mean absolute deviations are depicted in Figs. 3 and 4. An overview of the trueness and the precision of the investigated systems is given in Fig. 5 in the form of a bull’s eye.

Table 1 Values of positive, negative and absolute deviation parameters (mean value, standard deviation (SD), minimum, median, maximum, 95 % confidence interval (CI) in mm) calculated from 12 scans in each group

Fig. 3 Boxplots of the positive and negative spatial divergences of the single scanning systems

Fig. 4 Boxplots of the absolute spatial divergences of the single scanning systems

Fig. 5 Trueness and precision of the parameter “mean absolute value” displayed true to scale for each group. The distance of the circles to the “bull’s eye” represents the “trueness” of the single method, whereas the diameter of the circles (SD) represents the distribution of single results and therefore the repeatability, thus “precision”

Spatial deviations to determine trueness

Regarding the positive deviations, the True Definition showed the lowest ED values and therefore the best trueness, followed by the CS 3500 intraoral scanner and the control group. The highest positive deviations were shown by the CEREC AC Bluecam, the Zfx Intrascan and the CEREC AC Omnicam in ascending order.

For the negative values, the True Definition and the CS 3500 again resulted in the lowest ED values, while the CS 3500 did not differ from the control group. The control group, the CEREC AC Bluecam and the Zfx Intrascan showed similar performances. The Zfx Intrascan and the CEREC AC Omnicam were within one value range and showed the highest negative deviations.

For the absolute values of ED, the True Definition showed significantly the lowest deviations, followed by the CS 3500 intraoral scanner and the control group. The CEREC AC Bluecam, the Zfx Intrascan and the CEREC AC Omnicam showed the significantly highest ED values in ascending order. For all parameters, the p values are given in Table 2.

Table 2 Significances between single groups for analysing the “trueness”

Standard deviation to determine precision

Regarding the precision for the positive values, the CS 3500 intraoral scanner together with the True Definition showed the lowest values and therefore the best performance. The highest values and therefore the worst precision were shown by the Zfx Intrascan.

For the negative range, the CS 3500 intraoral scanner obtained the highest precision, followed by the CEREC AC Omnicam, the True Definition and the CEREC AC Bluecam. The control group and, after it, the Zfx Intrascan showed the highest values and therefore the lowest precision.

Regarding the absolute precision, the CS 3500 intraoral scanner showed the lowest values, followed by the True Definition together with the CEREC AC Bluecam and the CEREC AC Omnicam. The control group came next, and the Zfx Intrascan showed the lowest precision. For all parameters, the p values are given in Table 3.

Table 3 Significances between single groups for analysing the “precision”

Correlation of the number of acquired points and the trueness

For all three trueness parameters, a correlation with the number of acquired points was observed (absolute values: r = −0.730, p < 0.001; positive values: r = −0.697, p < 0.001; negative values: r = −0.716, p < 0.001).

Discussion

The results of this study lead to the rejection of both null hypotheses, because direct and indirect digitalization revealed different accuracies of the obtained three-dimensional datasets when compared with a highly accurate reference dataset. Furthermore, differences in accuracy were found between the five investigated intraoral scanning systems.

For the accuracy analysis of the datasets, a previously introduced method was applied, using a reference dataset generated by industrial computed tomography and a best fit alignment for the evaluation and interpretation of spatial divergences between datasets [13, 14]. The approach employed in this study uses the positive and negative deviations as well as the absolute values of the EDs to assess and compare the different digitalization methods and scanners. The spatial deviations are interpreted as the trueness. However, the positive and negative deviations cover separate ranges and therefore do not represent the overall trueness, although they may help to estimate whether the test dataset is larger or smaller than the reference dataset. As trueness is generally defined as the spatial proximity between two datasets, the absolute values of the EDs were calculated and used as a measure of trueness, disregarding the algebraic sign [12]. Taking these absolute EDs as values for the overall trueness of the digitalization methods, the True Definition intraoral scanner offered the significantly highest trueness, followed by the CS 3500 intraoral scanner. Indirect digitalization, applying conventional impressions, plaster casts and the 3Shape D-810 laboratory scanner, resulted in a significantly lower trueness. The Zfx Intrascan and both investigated CEREC systems presented the significantly lowest trueness. For the precision, which represents the repeatability of the single digitalization methods, the variance and therefore the standard deviation was used as the parameter. Regarding the overall precision, the CS 3500 intraoral scanner and the True Definition again obtained the lowest values and therefore showed the highest reproducibility. Indirect digitalization using the D-810 laboratory scanner resulted in significantly higher precision than the Zfx Intrascan. Both CEREC systems were in the same precision range as the indirect method using the D-810 laboratory scanner.

Expressing accuracy in terms of trueness and precision is a common method applied in prior studies [4, 7, 8, 15, 16]: the test dataset is superimposed on the reference dataset for the analysis of trueness, and the test datasets are superimposed on each other for the analysis of precision. However, the authors of the current study consider trueness (mean value) and precision (standard deviation) to be dependent on each other. In the present study, each test dataset was therefore individually superimposed on the reference dataset in order to obtain values for trueness and precision simultaneously from one single alignment process. This allows a better estimate of the overall performance of the single scanners.

The method of data acquisition for the reference dataset, the data handling and the interpretation of divergences after the superimposition of virtual model datasets differed from those of other studies [4, 7, 15, 16]. In some studies, 20 % or even 40 % of the measured points were excluded from the analysis. This procedure bears the risk of obtaining results that appear too accurate, because outliers are eliminated before the analysis. For accurate analysis in the current investigation, all datasets were reduced to the field of interest (abutments above the preparation margin) before the analyses were conducted. Very consistent boundaries of the test and reference datasets could thereby be ensured, resulting in a reproducible superimposition. Therefore, the authors decided to use all measurement points for the analysis of the divergences.
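The effect of excluding measured points can be illustrated with a small Python sketch. It assumes that the excluded points are those with the largest magnitude, which is an assumption about the cited studies; the function names and the sample distances are hypothetical.

```python
def mean_absolute_deviation(distances):
    """Mean magnitude of signed point distances (mm)."""
    return sum(abs(d) for d in distances) / len(distances)


def drop_largest(distances, fraction):
    """Discard the given fraction of points with the largest magnitude."""
    n_drop = round(len(distances) * fraction)
    return sorted(distances, key=abs)[: len(distances) - n_drop]


# Hypothetical distances (mm): eight small deviations and two outliers.
distances = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015, 0.01, 0.02, 0.15, -0.2]

all_points = mean_absolute_deviation(distances)
trimmed = mean_absolute_deviation(drop_largest(distances, 0.20))
print(f"all points: {all_points:.3f} mm, after dropping 20 %: {trimmed:.3f} mm")
```

In this constructed example, removing the two outliers reduces the mean absolute deviation to about one third, although the underlying scan is unchanged.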

Different levels of accuracy between the intraoral scanners may have several possible reasons: firstly, the physical resolution of the scanning system, and secondly, the postprocessing of the data, such as the matching algorithm for the single captured images and the size of the triangles in the resulting STL dataset [11]. Therefore, the differing number of measurement points also has to be considered when interpreting the results. In the present study, a correlation between the number of data points and the trueness values was observed across all tested systems. Given the considerably different numbers of data points, it seems plausible that extreme outliers influence the mean EDs much more when the overall number of data points decreases. However, in the accuracy analysis of a single idealized abutment model using the same analysing software, no correlation between the number of data points and the accuracy of the dataset in comparison to a reference was found [17]. Nevertheless, fewer surface points lead to smaller datasets that can be handled more easily and transferred faster online for postprocessing or manufacturing. On the other hand, there might be a risk of losing important surface information, which may lead to higher inaccuracies when describing organic surfaces, especially on objects with small radii and sharp corners (secant error).
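The secant error can be made concrete with a simplified two-dimensional sketch: when a circular contour of radius r is approximated by a straight edge of length c whose end points lie on the contour, the largest gap between edge and contour is r − sqrt(r² − (c/2)²). The Python example below evaluates this relation; the function name and the numerical values are hypothetical, and real STL surfaces are three-dimensional and not perfectly circular.

```python
import math


def secant_error(radius_mm, edge_length_mm):
    """Largest gap between a circular arc and a chord of the given length."""
    half = edge_length_mm / 2
    if half > radius_mm:
        raise ValueError("edge is longer than the diameter of the contour")
    return radius_mm - math.sqrt(radius_mm ** 2 - half ** 2)


# Hypothetical values: the same edge length on a flat and on a sharp contour.
for radius in (5.0, 0.5):
    error = secant_error(radius, edge_length_mm=0.2)
    print(f"radius {radius} mm: secant error {error * 1000:.1f} µm")
```

For a fixed edge length, the error grows as the radius shrinks, which is why coarse triangulation affects sharp corners more than flat areas.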

An additional parameter influencing the different accuracy ranges might be the reflective surface of the titanium testing model, which might be seen as a limitation of the study design. Therefore, the results of the scanning systems that were used without powder (Omnicam and Zfx Intrascan) can be questioned. The use of powder before scanning might enhance the accuracy; however, the manufacturers of these systems recommend using them powder-free in all clinical situations. This is supported by the fact that the CS 3500 intraoral scanner was not able to capture the titanium surface without powder, although the manufacturer’s recommendations state that no powder application is required before scanning. Clinicians should consider this when deciding on a system. Further investigations should be conducted to determine the influence of scanning powder on the accuracy of the virtual model data and to specify the optimal prerequisites for accurate data collection. Clinically, a slight powder layer might also be used as an indicator of moisture, especially in areas where liquids (saliva/blood) are difficult to detect during scanning.

In general, it has to be questioned to what extent the manufacturing process, including the CAD and CAM parameters, influences the resulting dental restoration. Therefore, not only the three-dimensional datasets but also the subsequent manufacturing process should be investigated. A previously published study testing the full process chain revealed that the marginal fit of zirconia four-unit FDPs after direct digitalization using the LAVA C.O.S. is comparable to that of four-unit FDPs after indirect digitalization [10], even though the underlying datasets of the direct digitalization showed higher accuracy [14]. This means that the higher accuracy of the virtual model datasets did not result in a better marginal fit of the resulting zirconia restorations. Likewise, for the iTero intraoral scanning system, the higher accuracy of the investigated virtual datasets after direct digitalization did not result in a better marginal fit of zirconia restorations compared with indirect digitalization [13]. However, for frameworks made from base metal alloy, the higher accuracy of direct digitalization resulted in a better marginal fit compared with frameworks after indirect digitalization [13]. A further study investigated frameworks made from zirconia and base metal alloy after direct digitalization using the LAVA C.O.S. compared with indirect digitalization using the LAVA Scan ST [18]. In that study, the fit of zirconia restorations was not affected by the digitalization method, whereas for base metal alloy, direct digitalization led to a better marginal fit than indirect digitalization. Given the factors that influence the physical manufacturing by milling or rapid prototyping procedures, scanning accuracy is one important factor in the whole process chain but should not be overestimated. For example, the sintering parameters of zirconia also seem to have a clear effect on the fit of the restoration.

The in vitro trueness of a scanning method is important to know, as trueness parameters cannot yet be evaluated in vivo because reference structures are missing. Additionally, in vitro trueness data can be combined with in vivo evaluation of the reproducibility (precision), which can give a good picture of the performance of the scanners. With that in mind, all tested scanners are able to produce datasets of four-unit FDPs within a clinically acceptable range. Against this background, the absolute trueness of a scan might play a minor part, as long as the occurring error is repeatable and its magnitude is known. The “predictable” inaccuracy might then be compensated by the software. This, of course, holds only within certain limits and not for outliers outside clinically acceptable ranges.

Overall, the digital workflow offers the possibility of compensating for measuring inaccuracies by adapting the interfaces between single working steps. This is one of the advantages of digital dentistry and offers the potential for higher standardization of the clinical and technical processes.

For the present study, a laboratory testing design on an in vitro model was selected to provide standardized ideal conditions for the assessment of data. In contrast, in in vivo tests, the influence of blood or saliva, a lack of space or infragingival preparation margins cannot be completely ruled out, and these factors differ between individuals. The skills of individual operators and their frequency of use in clinical practice, as well as the scanning strategy, may also influence the accuracy results. Therefore, further studies should be conducted to investigate the possible influence of the scanning strategy and of the learning process of the operator. Additionally, the technique applied in the present study for obtaining a highly accurate three-dimensional reference dataset by computed tomography cannot be used on human beings for ethical reasons because of the radiation exposure. In general, further studies investigating the complete workflow, including the manufacturing of the dental restoration, are needed to evaluate whether the high in vitro scanning accuracy of the True Definition and the CS 3500 intraoral scanner translates into a better marginal and internal fit of dental restorations and whether these systems can confirm their performance under in vivo conditions.

Conclusion

Within the limitations of this in vitro study, no explicit recommendation concerning the method of digitizing intraoral structures can be made. Concerning the trueness and precision, the method of indirect digitalization, with its great variety of single process steps, showed values in the mid-range compared with all the tested systems. All datasets were in a clinically acceptable accuracy range.