1 Introduction

Dental caries are related to the loss or erosion of tooth structures, which can be prevented with proper treatment [1]. The human tooth is a layered structure composed of enamel, dentine, and a pulp chamber, from the outside to the inside. The detection of early tooth decay and prompt treatment are important to avoid its progression. Both visual and radiographic examinations are commonly used for defect detection in teeth. Visual examination is a rather subjective method that involves the medical knowledge and clinical experience of the examiner, and radiographic examination does not easily allow hard tissue defects to be defined [2] and involves radiation exposure. Thus, new methods should be introduced to provide more reliable and accurate information.

Ultrasound is widely used in the medical field for diagnosis, acoustic sensing, and nondestructive testing. Due to its short wavelength and associated high resolution, high-frequency ultrasound is most suitable for tooth applications [3]. Although ultrasonography in dentistry dates back to the early 1960s [4], it has not developed into an efficient diagnostic technique. The size of ultrasonic transducers may be the main obstacle, since it is difficult to find a suitable hand-held transducer to carry out examinations in vivo. Early studies focused on the detection of dental flaws or lesions [5–7], or tried to measure the thickness of enamel [8–10] with an operating frequency of 10–20 MHz. Hughes et al. [11] showed that the minimum frequency required for the detection of early decay or acid erosion is 22 MHz. Huysmans and Thijssen [8] concluded that ultrasonic measurement of enamel thickness is feasible without enamel preparation. Louwersea et al. [9] found that ultrasonic enamel thickness measurement is not useful in dental practice because the ranges of measurement were too large. This might be caused by variation in probe tip positioning. All of this work was based on one-dimensional ultrasound signals or two-dimensional (2D) ultrasound images. 2D images provide only one perspective of the sample and do not provide comprehensive information. In addition, it is difficult to reproduce the same image plane for follow-up studies and treatment.

In contrast, three-dimensional (3D) images can provide an overall view of the sample and do not have the problem of relocation. 3D images of teeth have been obtained using high-frequency ultrasound [12–14]. The researchers showed that by using high-frequency ultrasound transducers, 3D representations of the enamel layer could be constructed and small bony defects could be detected. In order to minimize the complexity of natural dental samples, Hughes et al. [12] polished the enamel layer and then encased the sample in epoxy. Heger et al. [13] proposed a scanning scheme based on the superimposition of a limited number of additional image planes with a fixed incidence angle. However, their mechanical system was fairly sophisticated and the samples needed to be prepared. Mahmoud et al. [2] used two transducers (with center frequencies of 30 and 60 MHz, respectively) to achieve the 3D acquisition of the human jawbone and detected the bone surface, which was reconstructed by plotting the bone depth with respect to a common axial reference. Sun et al. [15] used 3D finite element modeling to simulate ultrasonic wave propagation in teeth with several dental conditions and discussed the variation of characteristic features in the ultrasonic wave pattern. This led to a better understanding of how ultrasound could be applied to the diagnosis of pathology within teeth. Pekam et al. [14] applied a thin-spline robust point matching algorithm to the surface reconstruction of prepared teeth, and a comparison of the reconstructed casts revealed an average discrepancy of <30 μm between the ultrasound scans and the optical reference group. 3D ultrasound images allow a 3D-volume-rendered view of the whole tissue, from which an arbitrary 2D image can be visualized easily. Thus, 3D image reconstruction and visualization give doctors a direct and full view of the pathological regions, making them suitable for surgery simulation and treatment planning [16, 17].
However, limited work has been done concerning 3D image reconstruction of teeth using ultrasound imaging. There has not been an implementation that provides a full 3D view of the enamel layer without changing its original form on account of the complexity and diversity of human teeth.

The present study set up an ultrasound scanning system with a computer-controlled 3D translation arm that provides accurate positioning of each A-scan line. Based on arrays of received radio-frequency (RF) signals, digital signal processing methods are applied and 3D images of human teeth are reconstructed and visualized using volume rendering. A frequency-dependent compensation algorithm is utilized to decrease the information loss caused by attenuation. The thickness of tooth enamel can be measured from the 3D images. Micro-computed tomography (micro-CT) data are acquired, visualized, and measured for comparison.

2 Materials and Methods

2.1 System Design and Development

The scanning system comprises the following components: a single-element focused transducer (PI50-2, Olympus NDT Inc., Waltham, MA, USA) with a center frequency of 50 MHz and a bandwidth of 34 MHz, a general-purpose ultrasonic pulser-receiver (Model 5900PR, Olympus NDT Inc.), a 3D translation device with its control box and control card (MPC08SP, Leetro Automation Co. Ltd., Chengdu, PRC), an acquisition card (CompuScope 22G8, GaGe Applied Technology Inc., IL, USA), and a computer, as shown in Fig. 1. The sample is mounted in a water tank just under the transducer, which is driven by the signal generated and amplified in the pulser-receiver. The acoustic power is set to be 32 μJ. An analogue band-pass filter in the pulser-receiver, whose cut-off frequencies are set to 1 and 200 MHz, respectively, is applied to the RF ultrasonic echoes. The transducer is attached to a plastic arm fixed to the translation device, which can move freely in three axial directions under the control of a control box connected to the control card installed in the computer. The echo signals collected by the acquisition card are processed with the computer. The standard focal length of the transducer is 0.75 inches (19.05 mm), and hence the distance between the transducer and the sample is set to 3–5 mm since the height of tooth samples varies from 7 to 14 mm in the water tank.

Fig. 1 Schematic of system

A software system was developed using Visual C++ (Microsoft Visual Studio 2010, Microsoft, Redmond, WA, USA) on the computer. It was designed to perform scanning control, signal acquisition, image processing, volume reconstruction, and 3D measurement. The Visualization Toolkit (VTK, version 5.10.1, Kitware Inc., NY, USA) was utilized for volume rendering and measurement [18]. The dynamic range of imaging was 48 dB.

2.2 Scanning Method

Before scanning, the position of the transducer is adjusted according to the sample’s height so that its surface is within the focal region. Thereafter, a world coordinate system is built up for the following 3D reconstruction. The start position of the transducer is assumed to be the origin and the first scanning point. The Z-axis of the coordinate system is set to be the propagation direction of the ultrasonic pulse. Then, based on the length and width of the sample, the scan ranges along the X- and Y-axes are set up by the operator using the software system. The path for the movement of the transducer is fixed by the software system. A 2D grid matrix is defined to contain Nx × Ny scanning points, each of which corresponds to a position where the transducer collects echoes. The range of the grid matrix fully covers the tooth sample to be scanned. The spacing between adjacent scanning points (i.e., the increment of the transducer position, which must be a multiple of 3.125 μm) is set by the operator. The software designed to control the movement of the 3D translation device makes the device move from point to point. As soon as it reaches the scanning point, the transducer sends out waves, collects the RF data, and records the position of the current scanning point. Then, it moves on to the next point. The increment of the transducer position determines the resolution of the subsequent 3D reconstruction of enamel tissue. The smaller the step size, the higher is the resolution, but at the expense of a longer scanning time. In this study, the increment of the transducer position was set to be 62.5 μm considering a tradeoff between the data acquisition speed and the resolution. To collect the echoes at all scanning points, the transducer moved with respect to the first increment in the X-positive direction and then in the Y-positive direction, resulting in a zigzag scanning pattern. 
The procedure was repeated until all the scanning points within the scan range were visited.
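
The grid of scanning points described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Visual C++ control software: the function name `scan_points`, the serpentine interpretation of the "zigzag" pattern (alternating X direction on successive rows), and the way the number of points is derived from the scan range are all assumptions. Only the 3.125 μm stage increment and the 62.5 μm step come from the text.

```python
STAGE_RESOLUTION_UM = 3.125  # smallest stage increment quoted in the text


def scan_points(x_range_um, y_range_um, step_um=62.5):
    """Return (nx, ny, points) for a serpentine raster over the scan range.

    Assumption: rows are traversed alternately in +X and -X so that the
    stage never makes a long fly-back move between rows.
    """
    ratio = step_um / STAGE_RESOLUTION_UM
    if step_um <= 0 or abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("step must be a positive multiple of 3.125 um")

    nx = int(x_range_um // step_um) + 1  # points per row, origin included
    ny = int(y_range_um // step_um) + 1  # number of rows

    points = []
    for iy in range(ny):
        # Even rows run left to right, odd rows right to left.
        columns = range(nx) if iy % 2 == 0 else range(nx - 1, -1, -1)
        for ix in columns:
            points.append((ix * step_um, iy * step_um))
    return nx, ny, points


if __name__ == "__main__":
    nx, ny, pts = scan_points(12600.0, 20800.0)  # 12.6 mm x 20.8 mm range
    print(nx, ny, len(pts))
```

Halving the step in this sketch roughly quadruples the number of A-scan positions, which is the resolution versus acquisition time tradeoff mentioned above.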

2.3 RF Signal Processing

With a collection of scanned RF data, signal processing algorithms were implemented to process the raw data. First, the echo signals are sampled at a frequency of 125 MHz by the acquisition card. Each RF line contains 1024 sample points in 16-bit signed format. All these lines are saved as a 3D matrix based on the way of scanning. The collected signals are then processed by a band-pass filter with cut-off frequencies of 20 and 60 MHz, respectively, to reduce the influence of high-frequency noise and environmental noise at low frequencies.
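
The text does not state which digital filter design was used, so the following Python sketch only illustrates one common option: a Hamming-windowed sinc FIR band-pass with the 20 and 60 MHz cut-offs and the 125 MHz sampling rate quoted above. The function names and the choice of 101 taps are hypothetical. Note that the upper cut-off lies just below the Nyquist frequency of 62.5 MHz.

```python
import cmath
import math


def bandpass_fir(f_low, f_high, fs, num_taps=101):
    """Hamming-windowed sinc band-pass taps (num_taps should be odd)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (f_high - f_low) / fs
        else:
            # Difference of two ideal low-pass impulse responses.
            ideal = (math.sin(2.0 * math.pi * f_high * k / fs)
                     - math.sin(2.0 * math.pi * f_low * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, f, fs):
    """Magnitude of the filter's frequency response at frequency f."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, h in enumerate(taps)))


def apply_fir(signal, taps):
    """Direct-form convolution, output truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(signal):
                acc += h * signal[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs = 125e6
    taps = bandpass_fir(20e6, 60e6, fs)
    for f in (5e6, 40e6):
        print(f / 1e6, "MHz gain:", round(gain(taps, f, fs), 4))
```

A causal FIR filter of this kind delays every RF line by the same number of samples, (num_taps − 1)/2, so the delay would need to be removed before depth measurements are taken.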

Attenuation is a dominant feature of ultrasound signals and most time gain compensation algorithms work with envelope detected data [19, 20]. Those algorithms may introduce artifacts and hide useful clinical information [21]. The subject of attenuation of ultrasonic waves in biological tissues is extremely complex and remains poorly understood. Therefore, previous studies have had to prepare teeth to minimize image distortion, but at the cost of critical details. Here, a frequency-dependent compensation method is applied to RF data.

The power of an incident wave is an exponentially decreasing function of the propagation path:

$$W = W_{0} \exp ( - 2\alpha x)$$
(1)

where \(W_{0}\) is the incident power, \(\alpha\) is the attenuation coefficient, and \(x\) is the propagation distance.

Generally, the attenuation α is linearly proportional to the transmission frequency. Hence:

$$\alpha = \beta \cdot f$$
(2)

Assuming that the power spectral density of the emitted signal has a Gaussian form, the power spectrum of the reflected signal is given by:

$$S_{xx} (f,x) = \exp \left( - \frac{{(f - f_{0} )^{2} }}{{2\sigma^{2} }}\right)\exp ( - 4\beta fx)$$
(3)

where \(f_{0}\) is the emitted transducer center frequency and \(\sigma\) is the spectral standard deviation, which is related to the transducer bandwidth.

Then, the maximum energy frequency can be obtained by differentiating Eq. (3) with respect to \(f\) and setting it to zero:

$$f_{\hbox{max} } = f_{0} - 4\beta x\sigma^{2}$$
(4)
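
The intermediate step can be made explicit. Because the logarithm is monotonic, the frequency that maximizes \(S_{xx}\) in Eq. (3) also maximizes its logarithm:

$$\ln S_{xx} (f,x) = - \frac{{(f - f_{0} )^{2} }}{{2\sigma^{2} }} - 4\beta fx,\qquad \frac{{\partial \ln S_{xx} }}{\partial f} = - \frac{{f - f_{0} }}{{\sigma^{2} }} - 4\beta x = 0$$

Solving for \(f\) gives \(f = f_{0} - 4\beta x\sigma^{2}\), which is Eq. (4): the spectral peak shifts downward linearly with the propagation distance \(x\), at a rate set by \(\beta\) and the spectral width \(\sigma\).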

By differentiating Eq. (4) with respect to t = 2x/c (where c is the speed of ultrasound), the slope of the attenuation is given by:

$$\beta = - \frac{8.68}{{c\sigma^{2} }} \cdot \frac{{df_{\hbox{max} } }}{dt}$$
(5)

The maximum energy frequency here was estimated using a second-order autoregressive (AR2) model. The choice of AR2 for parametric spectra analysis is discussed elsewhere [22]. The RF data x(n) can be modeled as the output of a linear filter driven by white Gaussian noise u(n) with zero mean and variance \(\sigma_{u}^{2}\) as:

$$x(n) = - a_{1} x(n - 1) - a_{2} x(n - 2) + u(n)$$
(6)

where \(a_{1}\) and \(a_{2}\) are the AR parameters, which satisfy the Yule–Walker equations [21]:

$$a_{1} R_{x} (0) + a_{2} R_{x} (1) = - R_{x} (1)$$
(7)
$$a_{1} R_{x} (1) + a_{2} R_{x} (0) = - R_{x} (2)$$
(8)

where \(R_{x} (\tau )\) represents the autocorrelation of x(n), which yields:

$$a_{1} = \frac{{R_{x} (1)R_{x} (2) - R_{x} (0)R_{x} (1)}}{{R_{x} (0)^{2} - R_{x} (1)^{2} }}$$
(9)
$$a_{2} = \frac{{R_{x} (1)^{2} - R_{x} (0)R_{x} (2)}}{{R_{x} (0)^{2} - R_{x} (1)^{2} }}$$
(10)

Accordingly, the power spectral density of an AR2 process is given by [22]:

$$S_{xx} (f,x) = \frac{{\sigma_{u}^{2} }}{{\left| {1 + a_{1} \exp ( - j2\pi f/f_{s} ) + a_{2} \exp ( - j4\pi f/f_{s} )} \right|^{2} }}$$
(11)

where \(f_{s}\) is the sampling frequency. By differentiating Eq. (11) with respect to f and setting it to zero, the maximum energy frequency is obtained as:

$$f_{\hbox{max} } = \frac{{f_{s} }}{2\pi }\cos^{ - 1} \left(\frac{{ - a_{1} }}{4}(1 + \frac{1}{{a_{2} }})\right)$$
(12)
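
A minimal Python sketch of Eqs. (9), (10), and (12) follows. It assumes the sign convention of Eqs. (7)–(11), in which the denominator polynomial is \(1 + a_{1} z^{-1} + a_{2} z^{-2}\). The function names, the lag-normalized autocorrelation estimator, and the synthetic 40 MHz test tone are illustrative assumptions, not the authors' implementation.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation at the given lag, normalized by the overlap length."""
    n = len(x)
    return sum(x[i] * x[i - lag] for i in range(lag, n)) / (n - lag)


def ar2_coefficients(x):
    """AR2 parameters from the closed-form Yule-Walker solution, Eqs. (9)-(10)."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    det = r0 * r0 - r1 * r1
    a1 = (r1 * r2 - r0 * r1) / det
    a2 = (r1 * r1 - r0 * r2) / det
    return a1, a2


def ar2_peak_frequency(x, fs):
    """Frequency of maximum spectral energy according to Eq. (12)."""
    a1, a2 = ar2_coefficients(x)
    arg = (-a1 / 4.0) * (1.0 + 1.0 / a2)
    arg = max(-1.0, min(1.0, arg))  # guard against rounding outside acos domain
    return fs / (2.0 * math.pi) * math.acos(arg)


if __name__ == "__main__":
    fs = 125e6  # sampling frequency quoted in the text
    tone = [math.cos(2.0 * math.pi * 40e6 * n / fs) for n in range(1024)]
    print(ar2_peak_frequency(tone, fs) / 1e6, "MHz")
```

In practice the estimate would be computed on short, overlapping windows of each RF line so that \(f_{\max}\) can be tracked as a function of depth; the window length is a further choice that the text does not specify.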

The attenuation coefficient β is utilized to compensate the RF data as:

$$\hat{R}(x_{k} ,y_{k} ,z_{k} ) = R(x_{k} ,y_{k} ,z_{k} ) \times \exp (\beta fz_{k} )$$
(13)

where \(R(x_{k} ,y_{k} ,z_{k} )\) is the filtered RF signal at point \((x_{k} ,y_{k} ,z_{k} )\) and \(\hat{R}(x_{k} ,y_{k} ,z_{k} )\) is the compensated signal.
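
Eq. (13) amounts to a depth-dependent gain applied to each RF line. The sketch below is one hedged reading of it in Python. It assumes that \(\beta\) is supplied in dB/(cm·MHz) and converted to nepers before entering the exponential, that a single representative frequency \(f\) is used, and that samples are equally spaced in depth. The function name and these unit choices are assumptions, since the text does not spell them out.

```python
import math

DB_PER_NEPER = 20.0 / math.log(10.0)  # about 8.686 dB per neper


def compensate(rf_line, beta_db_per_cm_mhz, f_mhz, dz_cm):
    """Apply exp(beta * f * z_k) to every sample of one RF line (Eq. 13).

    rf_line: filtered RF samples along depth.
    beta_db_per_cm_mhz: attenuation slope, assumed to be given in dB/(cm MHz).
    f_mhz: frequency used in the gain term, assumed constant along the line.
    dz_cm: depth spacing between consecutive samples, in cm.
    """
    beta_np = beta_db_per_cm_mhz / DB_PER_NEPER
    return [sample * math.exp(beta_np * f_mhz * k * dz_cm)
            for k, sample in enumerate(rf_line)]


if __name__ == "__main__":
    line = [1.0] * 5
    print(compensate(line, 0.14, 50.0, 0.1))
```

Because the gain multiplies whatever is present at a given depth, it boosts noise together with the echoes, which is worth keeping in mind when interpreting deep regions of the image.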

Then, the discrete Hilbert transform (DHT) is used to extract the envelope of the compensated data. The envelope of each RF line is consistent with the intensity of the ultrasound echoes. After a procedure of log-compression, the envelope is mapped to gray levels (0–255) at each point. The gray levels of all points can be used to construct 2D or 3D images directly.
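
The envelope detection and gray-level mapping can be sketched as follows. The code builds the analytic signal in the frequency domain, takes its magnitude as the envelope, and maps a logarithmic scale onto 0–255. It uses a naive DFT only to stay within the Python standard library; an FFT routine would normally be used. The function names are hypothetical, and using the 48 dB dynamic range quoted in Sect. 2.1 as the log-compression window is an assumption.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for a short demo line."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def envelope(rf_line):
    """Envelope as the magnitude of the analytic signal (discrete Hilbert transform)."""
    n = len(rf_line)
    spectrum = dft(rf_line)
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(v) for v in analytic]


def to_gray(env, dynamic_range_db=48.0):
    """Log-compress an envelope and map it to integer gray levels 0-255."""
    peak = max(env)
    if peak <= 0.0:
        return [0] * len(env)
    gray = []
    for v in env:
        db = 20.0 * math.log10(max(v, 1e-12) / peak)  # 0 dB at the peak
        level = 255.0 * (1.0 + db / dynamic_range_db)
        gray.append(int(round(min(255.0, max(0.0, level)))))
    return gray


if __name__ == "__main__":
    rf = [math.cos(2.0 * math.pi * 8 * n / 64) for n in range(64)]
    print(to_gray(envelope(rf))[:4])
```

With this mapping, any echo more than 48 dB below the strongest sample is clipped to black, so the choice of reference level (per line, per slice, or per volume) directly affects how the gray levels are distributed.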

2.4 Volume Reconstruction

In order to obtain a 3D image of the tooth sample, a 3D volume coordinate system should be established according to the scanning range and the focal length of the transducer before volume rendering. In order to simplify the reconstruction process, the coordinate system applied in the scanning method was reused. All the RF lines with the same X-coordinates created a 2D gray level image. Then, the gray level images were mapped onto the 3D volume coordinate system.

For 3D volume reconstruction based on 2D images, various methods have been proposed by our group [23–26]. For the volume reconstruction in this study, only resampling along the Z direction was considered for simplicity. First, the number of resampled pixels \(N_{z}\) along the Z direction was computed as:

$$N_{z} = \frac{{N_{x} \cdot D_{z} }}{{D_{x} }}$$
(14)

where \(N_{x}\) is the number of sampling points along the X direction, \(D_{z}\) is the scanning depth (1.89 cm in this study), and \(D_{x}\) is the scanning distance along the X direction. As illustrated in Fig. 2, \(I_{j}\) and \(I_{j+1}\) are the gray levels of the two original pixels adjacent to the resampled point \(I_{i}\). Then, linear interpolation was conducted for the resampling:

$$I_{i} = \, \left( {1 - d} \right)I_{j + 1} + dI_{j}$$
(15)

where \(d\) is the normalized distance between \(I_{i}\) and \(I_{j+1}\). The resolutions along the X and Y directions were the same as the increment of the transducer, which was 62.5 μm. The resolution along the Z direction, \(r_{z}\), was therefore obtained as:

$$r_{z} = D_{z} /N_{z}$$
(16)
Fig. 2 Illustration of interpolating original gray image along Z direction
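
Combining Eqs. (14) and (16) gives \(r_{z} = D_{x} /N_{x}\), so the resampling makes the voxel size along Z approximately equal to the lateral step. The Python sketch below illustrates Eqs. (14)–(16) for one depth line. The function names, the rounding of \(N_{z}\) to an integer, and the way resampled positions are spread over the original line are assumptions made for illustration.

```python
import math


def resampled_depth_grid(n_x, d_x_mm, d_z_mm):
    """Number of resampled pixels along Z (Eq. 14) and the Z resolution (Eq. 16)."""
    n_z = int(round(n_x * d_z_mm / d_x_mm))  # rounding to an integer is an assumption
    r_z = d_z_mm / n_z
    return n_z, r_z


def resample_line(gray, n_out):
    """Resample one depth line to n_out pixels by linear interpolation (Eq. 15)."""
    n_in = len(gray)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output pixels")
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)       # position in original pixel units
        j = min(int(math.floor(pos)), n_in - 2)  # left neighbour I_j
        d = (j + 1) - pos                        # normalized distance to I_{j+1}
        out.append((1.0 - d) * gray[j + 1] + d * gray[j])
    return out


if __name__ == "__main__":
    n_z, r_z = resampled_depth_grid(200, 12.5, 18.9)
    print(n_z, round(r_z, 4))
    print(resample_line([0, 10, 20], 5))
```

When \(d = 0\) the resampled point coincides with \(I_{j+1}\) and when \(d = 1\) it coincides with \(I_{j}\), which matches the weighting in Eq. (15).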

2.5 Experimental Design

In order to verify the resolution and operating characteristics of the system, a coin made of steel and copper was placed in the water tank and scanned with an increment of 62.5 μm three times at a water temperature of 25 °C. The diameter and thickness of the coin were first measured with a micrometer by two operators, and then measured using the reconstructed 3D images. The measured results are given as the mean ± standard deviation.

In addition, four permanent molars, extracted for clinical purposes from one female and three male subjects in their 20s or 30s at Guanghua College of Stomatology, Sun Yat-sen University, were scanned. Consent forms for the tests were received from all subjects. This work was approved by the Human Subject Ethics Committee of South China University of Technology. The samples were disinfected and kept in a solution of distilled water at 19 °C and 5% thymol before being scanned [27]. Due to the different sizes of the tooth samples, the scan length varied from 12.6 to 17.6 mm along the shorter direction, and from 20.8 to 27.0 mm along the longer direction, as shown in Table 1. The wave speed was assumed to be 5100 m/s in the coin and 5700 m/s in enamel [3].

Table 1 Scanning parameters for four human molar samples

Micro-CT is widely used in the research of tooth morphology. Therefore, the teeth were also scanned using a micro-CT system (μCT-80, Scanco Medical, Bassersdorf, Switzerland) and the obtained 3D images were reconstructed as the benchmark for comparison. The samples were placed in the specimen holder along the longitudinal axes such that the long axes of the tooth were perpendicular to the scanning plane. The accelerating voltage was 70 kV and the current was 114 μA. The samples were rotated over 360° and the slice thickness was 15 μm. The reconstructed images had an isotropic pixel resolution of 15 μm and a dynamic range of 16 bits with an image matrix of 1024 × 1024 pixels. The slices were saved as DICOM files and then imported into Mimics 17.0 (Materialise, Leuven, Belgium), which creates a 3D data model automatically.

3 Results

For the coin, the measured diameter was 20.50 ± 0.08 mm (mean ± SD) and the measured thickness was 1.65 ± 0.02 mm using a micrometer caliper. A typical 3D reconstructed image of the coin is shown in Fig. 3. Then, the two parameters were measured based on the 3D reconstructed images by the same operators. The results measured from the 3D images were compared to those measured by micrometer calipers, as shown in Table 2. Given that the measurement results obtained using the micrometer were regarded as the ground truth, the mean errors for the diameter and thickness for the proposed system were 0.14 ± 0.41% and 0.60 ± 1.44%, respectively.

Fig. 3 Coin and its 3D image reconstructed using proposed system

Table 2 Results of three experiments conducted with coin

The complexity of the inner structure of the tooth made it impossible to acquire an accurate attenuation curve. Hence, segmented curves were chosen to approximate the attenuation coefficient, with the turning point corresponding to the enamel surface. The linear fits of the averaged maximum energy frequency are shown as red regression lines in Fig. 4. The slopes of the straight lines are taken as the derivative of the maximum energy frequency in Eq. (5). The attenuation coefficient β equaled 0.005 dB/cm MHz for the first part of the curve (see the left fitted line) and 0.14 dB/cm MHz for the second part (see the right fitted line). The lines fitted the data in a least-squares sense. In order to verify its reliability, the \(R^{2}\) value of the second part was calculated and turned out to be 0.9123. Since the curve fluctuates greatly, the fitting could be thought of as a good one.

Fig. 4 Evolution of \(f_{\max}\) versus depth obtained using AR2 model
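
The fitting step can be sketched in Python as an ordinary least-squares line with its coefficient of determination, followed by a direct transcription of Eq. (5). The function names are hypothetical and the numbers in the example are synthetic, not data from this study. The units are assumptions chosen so that Eq. (5) yields dB/(cm·MHz): slope in MHz/μs, \(c\) in cm/μs, and \(\sigma\) in MHz.

```python
def linear_fit(t, y):
    """Least-squares line y = intercept + slope * t, plus R^2."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


def beta_from_slope(slope_mhz_per_us, c_cm_per_us, sigma_mhz):
    """Attenuation slope from Eq. (5), using the constant 8.68 as printed there."""
    return -8.68 / (c_cm_per_us * sigma_mhz ** 2) * slope_mhz_per_us


if __name__ == "__main__":
    # Synthetic f_max samples (MHz) at arrival times (microseconds).
    t = [0.0, 1.0, 2.0, 3.0]
    f_max = [50.0, 48.5, 45.5, 44.0]
    slope, intercept, r2 = linear_fit(t, f_max)
    print(slope, intercept, r2)
    print(beta_from_slope(slope, 0.57, 10.0))
```

For the segmented fit described above, the same routine would be applied separately to the samples before and after the turning point at the enamel surface.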

Figure 5 shows a typical A-scan before and after compensation as well as the corresponding envelopes, which clearly illustrate the enamel surface and also the enamel-dentin junction (EDJ). The ratios between the amplitude from the water-enamel interface and EDJ with and without gain compensation were calculated. The average ratio was 6.47 before compensation and 5.78 after compensation. 3D images of one sample (labeled NO.2 in the following figures) with and without gain compensation were also reconstructed to illustrate the advantage of the proposed method. The results are shown in Fig. 6.

Fig. 5 Typical RF line and corresponding envelope before and after compensation. a Original RF line, b envelope of a derived from DHT, c same RF line processed using gain compensation, and d envelope of c derived from DHT

Fig. 6 3D images reconstructed without (left) and with (right) gain compensation

The 3D reconstructions of all four samples were then obtained, as shown in Fig. 7. The profiles of the reconstructed 3D ultrasound images closely match those of the teeth except for some discontinuous areas.

Fig. 7 Four human molar samples and corresponding 3D images reconstructed using proposed system

When an ultrasonic wave encounters an interface between two materials at some inclined angle, the echoes reflected are influenced by the incident angle. Larger incident angles lead to weak echoes or even echo drop-outs, and consequently a discontinuous enamel surface. Based on the differences between the 3D reconstructions and the real samples, it could be concluded that the discontinuities (highlighted by yellow circles in Fig. 7) were caused by relatively large incident angles.

The teeth were also scanned using a micro-CT system and then reconstructed for comparison. Each CT data set contains 1262 to 1676 slices, depending on the length of the tooth. Using the slices, a 3D image could be obtained directly with Mimics 17.0. The four 3D images are shown in Fig. 8. It is obvious that the images reconstructed using micro-CT data have a high signal-to-noise ratio.

Fig. 8 3D images reconstructed from micro-CT data

Figure 9 shows the slice views of the 3D rendering of the four samples obtained using ultrasound data (left) and CT data (right). There is strong background noise at the bottom of the slices because the attenuation coefficient was considered to be linear with depth, whereas in reality the frequency curve varied non-linearly (see Fig. 4). The compensation algorithm amplifies the desired signal as well as the noise.

Fig. 9 Measurements of enamel thickness at specific locations on four molar samples in slice mode using ultrasound data (left) and CT data (right). a–d correspond to samples 1–4. “Line Distance” is vertical distance from water-enamel interface to enamel-dentin interface, which can be thought of as enamel thickness

The enamel thickness was measured in the slice mode of the software developed with VTK. The imaging accuracy of our system was assessed by comparing the measurements from the CT data sets and those from our ultrasound data sets. In this study, the enamel thickness at a specific location of every tooth was measured from the CT slices and the corresponding volumetric ultrasound image data by 3 experienced dentists for 10 runs, respectively. The mean differences of their measurements are presented in Table 3. Because it was difficult to find the exact same position for every measurement in the ultrasound data, the dentists tried to locate the measurement position empirically every time.

Table 3 Measurements of enamel thicknesses at specific location on every tooth sample using ultrasound and CT data sets

From the quantitative results, the measurements of our system are close to those of the CT scans, with a mean difference of −1.5 to −5.0%. In addition, the values measured using the proposed system are smaller than those obtained using the CT data, implying that the ultrasound speed in hard tissues (e.g., bones and tooth) is faster than that in water. Assuming that the measurements from the CT data are the true values, the average error in percentage for our system is 3.55%.
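
The dependence of an ultrasound thickness value on the assumed wave speed can be illustrated with a short Python sketch. It converts the delay between the water-enamel and enamel-dentin echoes into a thickness using the pulse-echo relation (thickness = speed × delay / 2) and computes a signed percentage difference against a reference value. The function names and the example numbers are hypothetical; only the 125 MHz sampling rate and the 5700 m/s enamel speed are taken from the text, and the way the authors' software derives "Line Distance" is not specified.

```python
def thickness_mm(delta_samples, fs_hz=125e6, speed_m_per_s=5700.0):
    """Layer thickness from the sample delay between two echoes (pulse-echo)."""
    delay_s = delta_samples / fs_hz
    return 1e3 * speed_m_per_s * delay_s / 2.0  # divide by 2 for the round trip


def percent_difference(ultrasound_mm, reference_mm):
    """Signed difference of the ultrasound value relative to the reference, in %."""
    return 100.0 * (ultrasound_mm - reference_mm) / reference_mm


if __name__ == "__main__":
    t_us = thickness_mm(50)  # 50 samples between the two interface echoes
    print(t_us, "mm")
    print(percent_difference(0.97, 1.00), "%")
```

In this relation the thickness scales linearly with the assumed speed, so an assumed value that is a few percent too low would produce an underestimate of the same relative size.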

Based on the signal plot in Fig. 5d, the final image should have two bright lines/areas (water-enamel and enamel-dentin interfaces) and some gray areas in between. This did not match what is shown in Figs. 7 and 9. This is due to the theoretical range of gray levels being from 0 to 255, whereas in reality most of the data were distributed at very high gray levels (i.e., 240–255). If the actual range of gray levels was from 240 to 255, it would be very difficult to tell the difference between gray levels. As a result, the enamel appeared as a solid layer instead of a hollow layer in both 2D and 3D images. Furthermore, during volume rendering, the gray mapping and transparency mapping were adjusted accordingly to achieve the best visual effect. The 2D gray images were directly generated from the gray levels and could not be adjusted. This explains why the 3D images in Fig. 7 look less noisy than the 2D images in Fig. 9.

4 Discussion

In previous studies [12, 13], tooth samples had to be carefully prepared, including shouldering, chamfering, and tangential preparations, in order to simulate the form required to receive a crown. In contrast, in this study, only the superficial layer of the tooth, the enamel, was observed; thus, there was no need for any preparation, and the tooth enamel could be recovered in its original form using the proposed system. This is very helpful in the early detection of tooth decay and also very promising for regular dental diagnostic applications. 3D representations give a more informative view to dentists than do 2D images, and can facilitate the measurement and analysis of lesions in 3D space. Quantitative measurements were easily available at arbitrary spots with fairly satisfactory accuracy. A visually significant improvement of the rendering quality was also achieved. Furthermore, even users without any clinical experience can easily conduct measurements of the enamel layer at any position of interest.

Micro-CT can be used to generate images with higher resolution and better visual quality compared to those obtained with the proposed method. This is expected considering the different imaging principles of X-ray CT and ultrasound. However, the high resolution of CT images is achieved at the cost of increased price and time. The average time spent for collecting the data of one tooth was about 5 h, and the price of a micro-CT system is more than 10 times higher than that of the proposed system. Micro-CT is only fit for the detection of extracted teeth. In addition, the risk of radiation makes micro-CT unsuitable for frequent clinical examinations. Micro-CT does not provide practitioners with real-time information at arbitrary points during clinical treatment. In contrast, the average time for data acquisition, signal processing, volume reconstruction, and rendering of a tooth sample using our system is less than 15 min. Our system thus provides imaging results much faster.

It is worth noting that the current system can only be used for ex vivo scanning. Although ex vivo study can help doctors find the cause of tooth disease after tooth extraction, its usefulness is limited in clinical practice. Nevertheless, our study demonstrated that ultrasound is promising for routine examination in vivo with advances in medical ultrasound equipment. In this pilot study, we proved that it is feasible to reconstruct the enamel accurately using the RF data from an A-mode ultrasound transducer. If a system composed of a small transducer attached to a 3D motorized scanning stage and a small optical camera that could collect real-time intraoral images was designed, it could be directly applied into a patient’s mouth. This would inevitably cause some discomfort but compared to the risk of radiation, the discomfort may be acceptable for most patients. During routine oral examinations and treatments, the discomfort caused by medical equipment is accepted by most patients. Consequently, ultrasound may become a popular tool in clinics. For example, it can be used to monitor the wear of teeth, provide dentists with an accurate thickness of enamel during diagnosis procedures, and guide dentists when they remove the decay on the surface of the tooth during the treatment procedure. It is also noted that some areas of the tooth surface were missing and an incomplete tooth body in 3D was reconstructed. This was caused by the large incident angle of ultrasound waves, which reflected a small echo signal to the transducer. In practice, it was difficult to avoid this problem. Future work will be focused on using multiple transducers to get rid of the discontinuities in 3D reconstruction and on using a hand-held probe with wireless data transfer technology [28, 29] for in vivo clinical studies. A faster data acquisition card and more powerful data processing hardware will be used to improve the speed of scanning. 
Furthermore, the feasibility of 3D elasticity imaging [29, 30] for enamel tissues will be considered based on the proposed system.

5 Conclusion

In summary, a 3D ultrasound system for scanning and reconstructing the enamel layer was proposed. A high-frequency ultrasound transducer attached to a 3D translating device was utilized to scan the tooth samples. Signal processing techniques were applied to the collected RF signals and volume reconstruction was implemented to obtain 3D tooth images, in which the enamel boundaries could be detected accurately. The tooth samples remained in their original forms during the scanning. The scanning system was easy to use. The system was tested in ex vivo experiments with permanent molars. The results show that the system is capable of providing a substantial improvement in the 3D reconstruction of the enamel layer compared with previous studies in the literature. These results suggest a promising future for high-frequency ultrasound in routine dental diagnosis.