Introduction

Surgical navigation, or image guidance, has the potential to improve patient outcomes in a variety of surgical fields and procedures by improving spatial orientation and understanding, as well as the perception of hidden anatomy. The technology is routinely applied in orthopedic surgery, neurosurgery and otorhinolaryngology (ORL); however, its effective application is limited by the accuracy with which the positions of tools can be localized relative to the patient's anatomy. These limitations become critical as the scale of the procedure and of the anatomical structures approaches the achievable navigation accuracy or falls below it, as in microsurgical procedures on the lateral skull base. Otologic and lateral skull base microsurgery depends on precise surgical technique and includes interventions in the middle ear, inner ear and lateral skull base (e.g., removal of cholesteatomas, chronic inflammation, and benign and malignant tumors, or treatment of hearing loss by insertion of middle or inner ear implants). These procedures rely on an operating microscope to visualize the surgical site, and while other technologies such as facial nerve monitoring and endoscopy are routinely used, surgical navigation is rarely employed, with limited accuracy being one of the reasons.

The final accuracy achievable with a given navigation system depends, to some extent, on each step of the image guidance workflow. Of these steps, registration, i.e., the calculation of the transformation between the acquired image data and the physical position of the patient in the operating room (OR), contributes a large portion of the overall navigation error. Registration methods can be roughly divided into landmark-based and surface-based methods; below, we consider only methods relevant to registration in the region of the head. Landmarks may be artificial, in which case they are anchored into bone or affixed to the skin, or natural anatomical features that can be identified with some degree of accuracy. Bone-anchored fiducials represent the gold standard in terms of achievable accuracy: they can be identified with high accuracy both in the image data and on the patient, and target registration errors (TRE, the error caused by the registration process at a target position) as low as 0.1 mm have been reported [1]. Although these methods allow high-accuracy registration, bone-anchored markers increase the invasiveness of the procedure and may complicate the workflow, as the fiducials must be implanted prior to imaging. Skin-affixed markers or anatomical landmarks alleviate these issues but generally result in lower registration accuracy; reported errors for these methods are typically above 1 mm [2].

Surface-based methods rely on the detection of surfaces both in the image and on the patient, and may use tracked ultrasound [3, 4], laser surface scanning [5–7] or direct digitization [8] to detect the surface on the patient; TREs of approximately 1 to 3 mm are typical in the literature. Alternative registration methods have also been investigated specifically for otologic surgery: patient-specific templates demonstrated an accuracy of \(0.94\pm 0.29\) mm at the round window in phantom testing [9], and in a subsequent evaluation in 25 cochlear implantation surgeries, estimated TRE values between 0.57 and 5 mm were observed [10]. To date, there is no standardized method for reporting registration accuracy, which makes comparison between studies difficult, particularly when anatomical site and surgical task vary widely. For this reason, any registration technique should be evaluated within the specific context of its intended use. Similarly, the navigation accuracy required for a given procedure depends on the task to be performed and on the specific anatomy of the patient; in microsurgery on the lateral skull base, the distances between structures such as the facial nerve, ossicular chain, chorda tympani, sigmoid sinus and middle fossa dura provide an indication of the accuracy required. For image guidance in minimally invasive cochlear implantation, the commonly cited threshold is 0.5 mm [11]. While this threshold does not capture the full complexity of the anatomical situation, it indicates the level of accuracy required to operate effectively within the region, a level that none of the aforementioned registration techniques (excluding bone-anchored markers) consistently achieves.

Therefore, this work aims to investigate the achievable accuracy of patient-to-image registration for otologic microsurgical procedures without the utilization of bone-anchored fiducial screws. The developed method must be clinically suitable with respect to avoiding increases in the invasiveness of the existing surgical procedure, as well as being simple to perform and achieving accuracy levels approaching those of fiducial-based methods.

Access to the structures of the middle and inner ear and within the skull base often requires the removal of bone with a surgical burr. Mastoidectomy, in which bone is milled away from the mastoid posterior to the external auditory canal to expose and provide access to the anatomy in these regions, is one typical example. Before milling can begin, the bone surface must be exposed: a retroauricular incision (C-shaped, approximately 5 mm to 1 cm from the conchal fold) is made, a periosteal flap (U- or V-shaped) is created, the periosteum is lifted from the bone and retractors are applied to the site. We therefore propose surface matching based on direct digitization of the exposed bone surface with a tracked pointer. Described below is initial work toward these aims and the evaluation of the proposed technique on a total of 14 human temporal bone specimens.

Materials and methods

Sample preparation, imaging and surface extraction

Fourteen human temporal bone specimens, preserved according to the method described in [12], were utilized for the evaluation of the proposed method. Of these, specimens with ID 1–8 were prepared with retroauricular incisions by two ORL surgeons; the surgeons were instructed to perform the incisions as for a standard mastoidectomy, as shown in Fig. 1.

Fig. 1 Preparation of temporal bone samples: retroauricular incisions were performed by two ORL surgeons and fiducial screws implanted to act as a ground truth. Prior to digitization, any remaining periosteum within the exposed region was removed

Fig. 2 Extraction of the mastoid from CT image data: the temporal bone is first segmented using marching cubes, the outer surface of the mastoid is then separated from the larger model using the image processing software Amira, and finally four initialization points are defined on the mastoid surface

For specimens 9–14, these incisions were not performed because the skin in the region had been removed during previous, unrelated experiments; in these cases, digitization was confined as far as possible to the region of interest defined by a standard retroauricular incision.

Four fiducial screws (M-5243.05, Medartis, Switzerland) were implanted into each sample to act as the ground truth, and preoperative imaging was performed (Siemens Somatom Definition Edge, resolution 0.156 mm\(^{2}\), slice thickness 0.2 mm). The fiducial locations were extracted from the image data with custom otologic surgical planning software [13] using the method described in [1]; the same software was then used to extract the temporal bone with marching cubes [14] and to define reference points at the round window and on the surface of the mastoid. Temporal bone segmentation thresholds were defined qualitatively for each case.

The extracted temporal bone model was then imported into the image processing software Amira (FEI, France), where the outer surface of the mastoid within a region of interest posterior to the external auditory canal was separated and exported as a MATLAB (The MathWorks Inc., USA) file. The surface extraction process is illustrated in Fig. 2. Finally, four points were selected along the posterior rim of the external auditory canal to serve as initialization points for the matching algorithm.

Surface collection and ground truth registration

Each specimen was rigidly fixed in a temporal bone holder, a reference tracking marker was rigidly attached with a single screw, with three percutaneous pins providing stabilization, and the sample was placed within the workspace of the tracking system (CamBar B1, Axios3D, Germany). Point acquisition was then performed in three stages using a custom tool with a 1.27-mm sphere at its tip. The position of the sphere center relative to the marker attached to the tool was calibrated by pivoting before each trial.
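Pivot calibration of this kind is commonly solved as a linear least-squares problem: while the tip rests in a fixed divot, every tracked pose \((R_i, t_i)\) of the tool marker must satisfy \(R_i p_{\mathrm{tip}} + t_i = p_{\mathrm{pivot}}\). The NumPy sketch below illustrates this formulation (the study itself used MATLAB); the function name and input layout are assumptions for illustration, not the calibration routine actually used.

```python
import numpy as np


def pivot_calibration(rotations, translations):
    """Least-squares pivot calibration (illustrative sketch).

    rotations:    (N, 3, 3) orientations of the tool marker in the tracker frame
    translations: (N, 3) positions of the tool marker in the tracker frame
    Returns (tip_in_tool, pivot_in_tracker, rms_residual).
    """
    R = np.asarray(rotations, dtype=float)
    t = np.asarray(translations, dtype=float)
    n = R.shape[0]
    # Each pose gives R_i @ p_tip - p_pivot = -t_i; stack all poses into A @ x = b.
    A = np.zeros((3 * n, 6))
    A[:, :3] = R.reshape(3 * n, 3)
    A[:, 3:] = -np.tile(np.eye(3), (n, 1))
    b = -t.reshape(3 * n)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Per-pose residual distance indicates how well the pivot point stayed fixed.
    residuals = (A @ x - b).reshape(n, 3)
    rms = float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))
    return x[:3], x[3:], rms
```

The RMS residual is a convenient sanity check: a large value suggests that the tip slipped in the divot during pivoting.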

First, the positions of the four implanted fiducial screws were digitized (each fiducial 30 times, with the mean calculated). Next, the four initialization points were digitized by placing the pointer on the bone at the approximate locations corresponding to the points defined in the image (again 30 times each, with the mean calculated). Finally, the surface was digitized by moving the pointer along the bone within the exposed region; care was taken to keep the tool in contact with the surface at all times, but no other constraints were applied with respect to minimum distance between points, total extent of the digitized area or specific features to be digitized. A total of 1500 points were collected per trial. The complete acquisition process, comprising ground truth, initialization and surface points, was performed five times per sample, and all acquired points were saved to text files for later analysis. All points were collected in the coordinate system of the reference tracking marker. The experimental setup is shown in Fig. 3.

Fig. 3 The specimen was rigidly fixed in a temporal bone holder, a reference marker was attached, and the fiducial screws, initialization points and mastoid surface within the exposed region were digitized

Registration and accuracy evaluation

All subsequent analysis was completed in MATLAB. Because the digitization tool is calibrated at the center of the sphere at its tip, the acquired points are offset from the bone surface by the radius of the sphere. To compensate, the sphere radius was first added along the normals of the point cloud extracted from the image data, as shown in Fig. 4.
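For a triangle mesh such as the marching-cubes output, this compensation can be sketched as follows: estimate a unit normal at each vertex (here by summing the area-weighted normals of adjacent faces, one common choice) and shift each vertex outward by the sphere radius. This NumPy sketch assumes consistently wound faces whose normals point away from the bone; the normal estimation actually used in the study is not specified. The `radius` argument is left to the caller, since the text gives the sphere size as 1.27 mm without stating whether this is the diameter.

```python
import numpy as np


def vertex_normals(vertices, faces):
    """Area-weighted unit vertex normals of a triangle mesh.

    Assumes faces are wound counter-clockwise when viewed from outside,
    so that the normals point away from the bone.
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    # Cross product of two edges: face normal scaled by twice the face area.
    face_n = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    vert_n = np.zeros_like(v)
    for k in range(3):
        # Accumulate each face normal onto its three vertices.
        np.add.at(vert_n, f[:, k], face_n)
    lengths = np.linalg.norm(vert_n, axis=1, keepdims=True)
    return vert_n / np.where(lengths > 0, lengths, 1.0)


def offset_along_normals(vertices, normals, radius):
    """Shift surface points by `radius` along their unit normals."""
    return np.asarray(vertices, dtype=float) + radius * np.asarray(normals, dtype=float)
```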

Fig. 4 To compensate for the radius of the sphere at the tip of the tool, the radius was added along the normals of the point cloud extracted from image data. Note also that calibrating the tool at the center of the sphere allows points to be collected without requiring the tool to remain perpendicular to the bone surface

The ground truth registration and the initial transformation were then calculated by paired-point matching [15] of the fiducial positions and of the initialization points, respectively. After initialization, the collected and extracted surfaces were registered using the iterative closest point (ICP) algorithm. The libICP C++ library [16] was used for this final registration step: the points were exported to text files and matched in an external application using the point-to-plane variant of ICP, without outlier rejection.
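For readers unfamiliar with these two steps, the NumPy sketch below shows a standard closed-form paired-point solution (the SVD method with a reflection guard) and a basic point-to-plane ICP loop with brute-force nearest neighbors and no outlier rejection. It is not the libICP implementation used in the study, and all function names are hypothetical. Because point-to-plane ICP needs normals on the target cloud, the sketch aligns the digitized points onto the normal-bearing image surface, so the inverse of its result maps image to tracker coordinates; which cloud served as the model in the original setup is not stated.

```python
import numpy as np


def paired_point_registration(src, dst):
    """Rigid 4x4 transform minimizing sum ||R @ src_i + t - dst_i||^2 (SVD method)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = c_dst - R @ c_src
    return T


def rotation_from_vector(w):
    """Rodrigues' formula: rotation by angle |w| about axis w/|w|."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def nearest_indices(points, cloud):
    """Brute-force nearest neighbour in `cloud` for each row of `points`."""
    d2 = (points**2).sum(1)[:, None] + (cloud**2).sum(1)[None, :] - 2.0 * points @ cloud.T
    return np.argmin(d2, axis=1)


def icp_point_to_plane(src, dst, dst_normals, T_init=None, max_iter=50, tol=1e-9):
    """Align `src` onto `dst` by minimizing point-to-plane distances (no outlier rejection)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    nrm = np.asarray(dst_normals, dtype=float)
    T = np.eye(4) if T_init is None else np.array(T_init, dtype=float)
    for _ in range(max_iter):
        p = src @ T[:3, :3].T + T[:3, 3]
        idx = nearest_indices(p, dst)
        q, n = dst[idx], nrm[idx]
        # Linearized residual (p + w x p + dt - q) . n  ->  [p x n, n] @ [w, dt] = -(p - q) . n
        A = np.hstack([np.cross(p, n), n])
        b = -np.einsum("ij,ij->i", p - q, n)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        dT = np.eye(4)
        dT[:3, :3] = rotation_from_vector(x[:3])
        dT[:3, 3] = x[3:]
        T = dT @ T
        if np.linalg.norm(x) < tol:
            break
    return T
```

In this arrangement, the paired-point result computed from the four initialization points would be passed as `T_init`, mirroring the initialization step described above.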

Registration accuracy was evaluated at the two previously defined reference points: one on the surface of the mastoid within the exposed region and one at the round window of the cochlea. These reference points were transformed into a common coordinate system (that of the tracking reference fixed to the temporal bone) using both the ground truth and the surface-based registrations. In the ideal case, in which the surface-based and ground truth registrations agree exactly, both transformations would map each reference point from CT coordinates to the same position in the patient tracker coordinate system. Registration accuracy was therefore assessed as the Euclidean distance between the positions obtained with the ground truth and surface-based registrations. The required coordinate transforms are as follows. The selected reference position \(({}^{\mathrm{Image}} x)\) is first transformed from the image to the patient reference marker coordinate system (Pat) using the ground truth fiducial-based registration \(({}^{\mathrm{Pat}} T_{\mathrm{Image}}^{GT})\).

$$\begin{aligned} {}^{\mathrm{Pat}} x^{GT} = {}^{\mathrm{Pat}} T_{\mathrm{Image}}^{GT} \times {}^{\mathrm{Image}}{x} \end{aligned}$$
(1)

where \(^{\mathrm{Pat}} x^{GT}\) is the position of the reference point in the patient reference coordinate system according to the ground truth registration. The point is then also transformed into the same coordinate system using the paired-point initialization \(({}^{\mathrm{Pat}} T_{\mathrm{Image}}^{\mathrm{Init}})\) followed by the surface-based registration \(({}^{\mathrm{Pat}} T_{\mathrm{Image}}^{SM})\).

$$\begin{aligned} {}^{\mathrm{Pat}} x^{SM} = {}^{\mathrm{Pat}} T_{\mathrm{Image}}^{SM} \times {}^{\mathrm{Pat}} T_{\mathrm{Image}}^{\mathrm{Init}} \times {}^{\mathrm{Image}}{x} \end{aligned}$$
(2)

where \(^{\mathrm{Pat}} x^{SM}\) is the position of the reference point in the patient reference coordinate system according to the surface-based registration. The target registration error (TRE) was then defined as the Euclidean distance between the two transformed reference positions:

$$\begin{aligned} \mathrm{TRE}=\left| {{}^{\mathrm{Pat}} x^{SM} -{}^{\mathrm{Pat}} x^{GT}} \right| \end{aligned}$$
(3)
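Equations (1)–(3) translate directly into a few lines of NumPy using 4 × 4 homogeneous matrices. In this sketch, `T_sm` is applied after `T_init`, mirroring the order in Eq. (2); the function names are illustrative only.

```python
import numpy as np


def transform_point(T, x):
    """Apply a 4x4 homogeneous transform to a 3-D point."""
    return (np.asarray(T, dtype=float) @ np.append(np.asarray(x, dtype=float), 1.0))[:3]


def target_registration_error(x_image, T_gt, T_init, T_sm):
    """Euclidean distance between ground-truth and surface-based target positions."""
    x_gt = transform_point(T_gt, x_image)           # Eq. (1)
    x_sm = transform_point(T_sm @ T_init, x_image)  # Eq. (2)
    return float(np.linalg.norm(x_sm - x_gt))       # Eq. (3)
```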

Because a new ground truth registration was acquired in every trial, the ground truth remains valid even if the setup changed between trials. Each surface-based registration was therefore compared only with the ground truth registration acquired immediately before it.

Results

In all cases, the surface of the mastoid could be extracted from the image data using the defined method. Point acquisition required approximately 3 min per trial, including the acquisition of the ground truth fiducial locations. In one case, ground truth acquisition was not possible, and this specimen was excluded from subsequent evaluation. The accuracy achieved for the remaining 13 specimens is summarized in Fig. 5; mean errors of \(0.16\,\pm \,0.09\) mm at the surface of the mastoid and \(0.23\,\pm \,0.1\) mm at the round window were observed.

Fig. 5 Calculated errors between surface matching registration and fiducial-based ground truth at the surface of the mastoid and round window for each of the 13 evaluated specimens, with mean and standard deviation in orange

A minimum error of 0.06 mm (temporal bone ID 6, digitization attempt 5) and maximum error of 0.44 mm (temporal bone ID 9, digitization attempt 1) were observed at the round window.

No correlation was observed between the accuracy of the initial matching step (calculated as the difference between the round window position from ground truth and initialization registrations) and final registration accuracy (Pearson’s correlation coefficient \(-\)0.05), suggesting that a highly accurate initialization of the matching algorithm is not required to avoid local minima far from the global minimum.
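This correlation analysis can be sketched as follows: the initialization error is the distance between the round-window positions predicted by the ground truth and initial transforms, and Pearson's coefficient is computed from paired per-trial values. The helper names are hypothetical; `np.corrcoef` yields the same coefficient.

```python
import numpy as np


def initialization_error(x_image, T_gt, T_init):
    """Distance between target positions under ground-truth and initial transforms."""
    xh = np.append(np.asarray(x_image, dtype=float), 1.0)
    return float(np.linalg.norm((T_init @ xh)[:3] - (T_gt @ xh)[:3]))


def pearson_r(a, b):
    """Pearson's correlation coefficient of two equally long samples."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))
```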

Furthermore, no correlation was observed between the size of the digitized region (the standard deviation of the Euclidean distances between the digitized positions and their mean position) and the accuracy at the round window (Pearson's correlation coefficient 0.04). A mean fiducial registration error of \(0.034\pm 0.014\) mm was observed for the ground truth registration; no correlation was observed between this value and the surface-based registration accuracy (Pearson's correlation coefficient 0.29). After registration, the mean Euclidean distance between the acquired points and the closest points extracted from the image data was between 0.29 and 0.46 mm; no correlation was observed between this value and the final matching accuracy (Pearson's correlation coefficient 0.14). Note that these matching error values may be affected by several factors, including the density of the extracted point cloud, accidental lifting of the tool tip from the surface during digitization, insufficient removal of tissue from the bone surface and errors in detecting the surface in the image data.
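The three diagnostic quantities in this paragraph can be computed as in the following sketch. Several details are assumptions because the text does not specify them: the spread uses NumPy's default population standard deviation, the FRE is taken as the mean fiducial distance (RMS is an equally common definition), and the closest-point search runs against the image-derived points rather than the mesh surface.

```python
import numpy as np


def digitized_region_size(points):
    """Standard deviation of the distances of digitized points from their mean position."""
    p = np.asarray(points, dtype=float)
    return float(np.linalg.norm(p - p.mean(axis=0), axis=1).std())


def fiducial_registration_error(fid_image, fid_patient, T):
    """Mean distance between transformed image fiducials and digitized fiducials."""
    T = np.asarray(T, dtype=float)
    mapped = np.asarray(fid_image, dtype=float) @ T[:3, :3].T + T[:3, 3]
    return float(np.linalg.norm(mapped - np.asarray(fid_patient, dtype=float), axis=1).mean())


def mean_closest_point_distance(points, surface_points):
    """Mean distance from each registered point to its nearest image-derived point."""
    p = np.asarray(points, dtype=float)
    s = np.asarray(surface_points, dtype=float)
    d2 = (p**2).sum(1)[:, None] + (s**2).sum(1)[None, :] - 2.0 * p @ s.T
    # Clamp tiny negative values caused by floating-point cancellation.
    return float(np.sqrt(np.maximum(d2, 0.0).min(axis=1)).mean())
```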

Discussion

Presented above is work toward a high-accuracy method for registration of the lateral skull base without bone-anchored fiducial screws. The method uses the standard clinical practice of retroauricular bone exposure to allow direct digitization of the mastoid surface with a tracked pointer carrying a 1.27-mm sphere at its tip. The technique was evaluated on a total of 13 temporal bone specimens, achieving an accuracy of 0.23 \(\pm \) 0.1 mm at a reference point on the round window of the cochlea relative to a ground truth registration based on bone-anchored fiducials.

The accuracy of the fiducial-based (ground truth) method has been well characterized previously, with an accuracy of 0.1 \(\pm \) 0.04 mm observed relative to ground truth coordinate measuring machine (CMM) measurements [1]. Acquiring multiple ground truths ensures the validity of the error calculations in the event of changes in the experimental setup, particularly movement of the patient reference marker, but the variability of the acquired transformations may affect the reported accuracy. Consequently, although fiducials are not an ideal ground truth, they allow the developed method to be evaluated down to approximately the previously characterized level (below this level, it becomes unclear whether errors originate from the ground truth or from the evaluated method). The methodology for fiducial digitization in this study differs slightly from that described in [1]; however, no significant differences have been observed between freehand digitization of the screw positions and the previously described robotic method. Variations in the ground truth reference positions on the order of \(\pm 0.04\) mm can therefore be expected. One temporal bone was excluded from evaluation due to a problem with the ground truth registration; the exact cause remains unclear, but in retrospect we suspect that debris was present in one or more of the fiducial screws during digitization.

The achieved results represent a substantial improvement over previously reported accuracies for surface matching. The question remains to which factors this improvement can be attributed. Several aspects likely contribute. Because the incisions allow the rigid bone surface to be digitized directly, the method should improve on approaches that digitize the skin; however, the major factor was likely the use of a high-accuracy tracking system (CamBar B1, Axios3D, Germany) in combination with custom active tracking markers. This system, combined with the developed markers, has demonstrated a mean accuracy of 0.024 \(\pm \) 0.014 mm (maximum error 0.093 mm) in static positioning tests throughout the tracking system workspace [17], an improvement of approximately an order of magnitude over the tracking hardware used by the majority of available state-of-the-art surgical navigation systems. Furthermore, the use of high-resolution CT data likely contributed to the improved accuracy. While clinical imaging equipment and protocols were used, the resolution here may be higher than that typically employed in clinical routine; lower-resolution imaging would likely reduce the accuracy of surface extraction from the image and, consequently, the registration accuracy. Beyond these factors, the technique as described uses standard algorithms both for extracting the surface from preoperative image data (marching cubes) and for the final matching step (point-to-plane ICP). Optimizing each of these steps may further increase the achievable registration accuracy; however, an improved ground truth may be required to quantify such improvements.

The use of a standard ICP variant for the matching of point clouds additionally affects the registration workflow, as the algorithm requires a good initial guess in order to avoid falling into local minima; in this case anatomical landmarks on the posterior surface of the external auditory canal were utilized. Previous work has investigated alternative methods for initialization as well as variations on the selected ICP algorithm [18]; however, these alternatives did not achieve sufficient accuracy in simulation and phantom testing. Furthermore, the accuracy of the initial registration step does not correlate with the final achieved accuracy and does not add significant additional time to the registration process.

The defined evaluation process allows registration accuracy to be assessed without requiring the user to repeatedly localize a landmark on the temporal bone, which may be highly inaccurate. The TRE values reported here therefore do not include target localization error (TLE) and will be lower than those reported in studies in which the target must be explicitly digitized by the user.
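The effect of TLE can be made explicit. If a user-localized target carries a localization error vector \(\mathbf{e}_{\mathrm{TLE}}\) in addition to the registration error \(\mathbf{e}_{\mathrm{TRE}}\), the measured discrepancy \(\mathbf{d}\) is their sum. Assuming the two are independent and at least one has zero mean, the cross term vanishes and the mean squared errors add:

$$\begin{aligned} \mathbf{d} &= \mathbf{e}_{\mathrm{TRE}} + \mathbf{e}_{\mathrm{TLE}},\\ \mathrm{E}\left[ \left\| \mathbf{d} \right\|^{2} \right] &= \mathrm{E}\left[ \left\| \mathbf{e}_{\mathrm{TRE}} \right\|^{2} \right] + \mathrm{E}\left[ \left\| \mathbf{e}_{\mathrm{TLE}} \right\|^{2} \right] + 2\,\mathrm{E}\left[ \mathbf{e}_{\mathrm{TRE}} \cdot \mathbf{e}_{\mathrm{TLE}} \right] = \mathrm{E}\left[ \left\| \mathbf{e}_{\mathrm{TRE}} \right\|^{2} \right] + \mathrm{E}\left[ \left\| \mathbf{e}_{\mathrm{TLE}} \right\|^{2} \right] \end{aligned}$$

Under these assumptions, any nonzero TLE can only increase the expected squared error measured at a user-digitized target.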

From a clinical point of view, the technique can be completed quickly: collection of the initialization points and the surface required less than 3 min per trial, and in all cases the final matching step required less than 5 s to compute the final registration matrix. The available digitization region is relatively poor in features, particularly further posterior from the wall of the external auditory canal. It is therefore vital that the available features, particularly the lateral posterior curve of the ear canal, the spine of Henle and the temporal line, are encompassed within the digitization region. The prominence of these features varies widely between patients, but each approximately constrains one translational degree of freedom (posterior EAC wall: dorsal–ventral translation; temporal line: cranial–caudal translation), while the surface of the mastoid constrains medial–lateral translation of the point cloud. When these features were digitized, an accurate registration relative to the ground truth was achieved in all cases.
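The argument that each feature constrains roughly one translational degree of freedom can be illustrated with a simple diagnostic, which is hypothetical and was not part of the study. For point-to-plane matching, a small translation \(\delta\) changes each residual by \(\mathbf{n}_i \cdot \delta\), so the eigenvalues of \(C = \sum_i \mathbf{n}_i \mathbf{n}_i^{\top}\) indicate how strongly each translation direction is constrained. A near-zero eigenvalue marks a direction in which the digitized points could slide without changing the residuals.

```python
import numpy as np


def translational_constraint(normals):
    """Eigen-decomposition of C = sum_i n_i n_i^T for unit surface normals.

    Returns eigenvalues in ascending order and the matching eigenvectors as
    columns; small eigenvalues mark poorly constrained translation directions.
    """
    n = np.asarray(normals, dtype=float)
    C = n.T @ n
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    return eigenvalues, eigenvectors
```

With synthetic normals from a flat patch alone, two eigenvalues are zero. Adding a few normals along a second and a third axis, standing in for features such as the canal wall and the temporal line, makes all three eigenvalues positive.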

Rigid fixation of the reference marker to the temporal bone requires a single bone screw, with three pins providing stabilization, which increases the invasiveness of the procedure. Accurate navigation within the lateral skull base requires tracking of the patient by means of an attached reference marker, and while non-rigid attachment methods could be considered, they would likely compromise overall navigation accuracy. We therefore consider the additional invasiveness justified when high-accuracy navigation is required; note also that the reference attachment avoids the workflow problems associated with fiducials for registration, which must be implanted before preoperative imaging and planning. The geometry of the reference attachment frame was not optimized for freehand navigation, and the position of the marker was constrained by the size of the temporal bone. Optimizing these factors may be possible and will be the subject of further evaluation in a clinical setting.

As surface digitization in this study was performed by a single user with a good understanding of how the digitized features affect the final accuracy of the matching algorithm, the results may not represent the accuracy achievable by a less experienced user. Performing surface digitization on temporal bone specimens in a laboratory may also differ substantially from doing so in a real clinical setting. Consequently, although the reported results represent an excellent initial step, further evaluation in more clinically relevant scenarios, including an assessment of user variability, is required. This evaluation should also assess the robustness of the algorithm to issues such as temporary lifting of the tool tip from the bone surface (although this may have occurred unnoticed during these trials), or define a method for detecting such errors. Additionally, as a sample of thirteen specimens is unlikely to capture the full range of anatomical variability in the population, further evaluation with respect to individual patient anatomy is also required.

Conclusions

A method for the registration of the temporal bone through the digitization of the mastoid surface using a tracked pointer, after exposure of the bone according to standard clinical practice, was proposed and evaluated. A mean registration accuracy of 0.23 \(\pm \) 0.1 mm was achieved at a target position on the round window of the cochlea. Future work will focus on further improvement of the accuracy and robustness of the process, as well as clinical evaluation of the technique.