Abstract
Purpose
Surgical cameras are prevalent in modern operating theatres, where they are often used as surrogates for direct vision. A surgical navigation system is a useful adjunct, but it requires an accurate “hand-eye” calibration to determine the geometrical relationship between the surgical camera and tracking markers.
Methods
Using a tracked ball-tip stylus, we formulated hand-eye calibration as a Perspective-n-Point problem, which can be solved efficiently and accurately using as few as 15 measurements.
Results
The proposed hand-eye calibration algorithm was applied to three types of camera and validated against five other widely used methods. Using projection error as the accuracy metric, our proposed algorithm compared favourably with existing methods.
Conclusion
We present a fully automated hand-eye calibration technique, based on Procrustean point-to-line registration, which provides superior results for calibrating surgical cameras when compared to existing methods.
Introduction
In the context of minimally invasive surgery (MIS), surgical cameras such as laparoscopes, arthroscopes, and pass-through head-mounted displays (HMD) are often used as a surrogate for direct vision. They provide a superficial view of the anatomy and are incapable of visualizing internal structures beneath the organ surface [14]. One way to enhance surgical video is to overlay preoperative medical images (such as CT and MRI) or intraoperative ultrasound directly onto the video in an augmented reality (AR) environment. This necessitates a highly accurate hand-eye calibration [11, 24] between the optical axis of the camera and the spatial measuring device. Inaccurate calibration would result in misalignment between virtual and real objects in the image overlay, creating additional mental burden for the surgeons and potential errors in instrument placement. Once properly calibrated, advanced visualization techniques can be used to facilitate surgical planning [1] and surgical guidance [16, 18].
Hand-eye calibration is an active research topic in robotics [22], where the camera view (i.e. the eye) must be linked with the kinematics of the robotic systems (i.e. the hand). Most of the approaches rely on imaging salient features of a stationary object from different poses and then solving for rotation and translation either separately [24], jointly [5], or iteratively [6, 10]. Hand-eye calibration in a surgical environment is non-trivial [23], since issues such as sterilization, sensor attachment, and real-time computation requirements remain challenging. In tracked MIS, surgical instruments and patient anatomy are augmented with a dynamic reference frame (DRF), allowing their poses to be determined in a common coordinate system. Thus, an additional tracked calibration device may be used to aid the process of hand-eye calibration. Perhaps the simplest method is the Procrustean approach, where the hand-eye calibration is reduced to paired-point registration [12, 21]. Both Voruganti and Bartz [25] and Chen et al. [4] used a calibrated, tracked planar chessboard pattern for hand-eye calibration, where the 3D position of the chessboard corners is determined in both the tracker’s coordinate system (via tracking) and the camera’s coordinate system (by solving some form of the Perspective-n-Point problem). Calibration of a chessboard to its DRF is a possible contributor to the fiducial localization error (FLE), and while these approaches are simple and effective, their reported accuracy remains sub-optimal [4].
Contribution
We present an accurate hand-eye calibration framework linking a surgical camera and an external spatial measurement device. This framework requires minimal user interaction and compares favourably with other algorithms found in the current literature. We demonstrate the applicability of this framework to three different types of camera commonly used in medical training, surgical planning and image-guided surgery. We propose to use a ball-tip stylus as the calibration device and formulate the hand-eye calibration as a Procrustean point-to-line registration. The numerical algorithm has a very compact formulation (“Appendix”), requiring minimal measurements (typically 12–15 tracked images) to converge to a stable and accurate calibration.
Methods
Without loss of generality, we assume the optical characteristics of the camera lens have been determined accurately by other means (such as Zhang [27]) and the images are un-distorted (i.e. both the radial and tangential distortions are removed). All lenses are assumed to have a fixed focal point. A passive optical spatial measurement device (Spectra, NDI, Canada) was used for this study, although other forms of tracking can be easily incorporated.
Apparatus and image processing
Three cameras were used in this study (Fig. 1): a commercial webcam (C920, Logitech, USA), a HMD with pass-through stereo camera (Ovrvision Pro, Ovrvision Inc., Japan), and a stereo laparoscope (surgical laparoscope, Olympus). Camera specifications are listed in Table 1. For the purpose of evaluating the efficacy of our framework, the DRFs are rigidly attached as close to the camera lenses as possible. While such a spatial arrangement may not be clinically plausible (in the case of the rigid laparoscope), it has the advantage of minimizing the tracking error caused by the lever-arm effect of tracking uncertainty.
We formulate the hand-eye calibration as a Perspective-n-Point problem that can be solved efficiently using a Procrustean point-line registration [3]. A calibration tool in the form of a ball-tip stylus was designed, where the size of the ball tip accommodates the viewing depth of a particular camera (Fig. 2). The ball tip was painted red to facilitate automatic segmentation. The centre of the ball tip can be accurately calibrated by pivoting it against a hemispherical divot of matching radius, a hollow inverted-cone divot, or a hollow tube divot.
When imaged by a camera, the red ball tip is projected as a circular pattern, which can be segmented automatically. First, the acquired colour image (Fig. 3a) is transformed into HSV colour space, in which colour thresholding is applied to locate red objects (Fig. 3b). The colour-thresholded image is smoothed by a Gaussian and a median filter to reduce noise and artefacts. A Hough transform is then applied to detect a circular pattern within the smoothed image, which provides the centroid of the projected ball tip with sub-pixel accuracy (Fig. 3c). The detected circle is drawn back onto the segmented image for visual validation (Fig. 3d). The detected sub-pixel location of the ball tip is recorded in conjunction with the tracker pose. This process is fully automatic and eliminates possible sources of error due to manual interaction and segmentation.
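The thresholding step can be illustrated with a minimal, dependency-free sketch. It is not the pipeline used in the study: it omits the Gaussian and median smoothing, and it swaps the Hough transform for a plain centroid of the thresholded pixels, which is only reasonable when the ball tip is the sole red object in view. The function names and threshold values (`hue_tol`, `min_sat`, `min_val`) are hypothetical choices for illustration.

```python
import colorsys

def is_red(r, g, b, hue_tol=0.05, min_sat=0.5, min_val=0.3):
    """Return True if an 8-bit RGB pixel falls inside an assumed 'red' HSV band."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Red sits at both ends of the hue circle, so the band wraps around 0/1.
    near_red = h <= hue_tol or h >= 1.0 - hue_tol
    return near_red and s >= min_sat and v >= min_val

def red_centroid(image):
    """Centroid (x, y) of all red pixels, or None if no pixel passes the threshold.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Averaging pixel coordinates gives a sub-pixel estimate of the centre.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            if is_red(r, g, b):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

A circle fit such as the Hough transform is more robust than a centroid when the ball is partly occluded or when other red objects appear in the scene, which is presumably why it is preferred in practice.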
Hand-eye calibration as Perspective-n-Point
Given the camera matrix, a 3D coordinate system can be defined that is centred at the principal point of the camera lens, pointing in the direction of the focal point. Using the ideal pinhole camera model, the intrinsic parameters of a camera can be represented as:
\[ M = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{1} \]
where \((f_x,f_y)\) and \((c_x,c_y)\) are the focal lengths and principal point, respectively. A point \(Q=(X,Y,Z)^T\) in 3D space can be projected onto the image by:
\[ q = MQ \tag{2} \]
where \(q=(x,y,w)^T\). Given a pixel location, however, only the corresponding ray can be computed by:
\[ r = M^{-1}q \tag{3} \]
Using the camera model in a canonical form, a pixel can be represented in a homogeneous coordinate system as \(q=[x,y,1]^T\).
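The pinhole relations described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names are hypothetical, the intrinsics are passed as a tuple `(fx, fy, cx, cy)`, and the image is assumed to be undistorted.

```python
import math

def project(intrinsics, point):
    """Project a 3D point (camera frame) to pixel coordinates."""
    fx, fy, cx, cy = intrinsics
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera")
    # Homogeneous projection followed by division by the third coordinate.
    return (fx * X / Z + cx, fy * Y / Z + cy)

def back_project(intrinsics, pixel):
    """Unit direction of the ray through the camera centre and a pixel."""
    fx, fy, cx, cy = intrinsics
    x, y = pixel
    d = ((x - cx) / fx, (y - cy) / fy, 1.0)  # inverse camera matrix applied to [x, y, 1]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def distance_to_ray(point, direction):
    """Perpendicular distance from a 3D point to a ray through the origin."""
    along = sum(p * d for p, d in zip(point, direction))
    return math.sqrt(sum((p - along * d) ** 2 for p, d in zip(point, direction)))
```

Projecting a point and back-projecting the resulting pixel returns a ray that passes through the original point, but the depth along that ray is lost. That missing depth is what the point-line registration has to recover.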
Using the calibrated ball-tip stylus as a calibration device, for each measurement we record the 3D location of the centre of the ball tip (\(Q_i = [X_i,Y_i,Z_i]^\mathrm{T}\) in tracker space), as well as its projection onto the image (\(q_i=[x_i,y_i,1]^\mathrm{T}\)). A line r emanating from the centre of the camera through the centroid of the projected ball can be formulated using Eq. 3 which, after calibration, must pass through the centre of the tracked ball tip (Fig. 4a). This scenario is identical to the Perspective-n-Point (PnP) problem in computer vision, for which we previously presented an efficient solution [3]. Our algorithm requires only simple matrix operations, and the computational requirement is minimal. Refer to “Appendix” for a MATLAB implementation.
Since there is a one-to-one correspondence between the points and lines, we refer to our algorithm as the Procrustean point-line calibration. In our prior work [3], this algorithm was compared against six other well-known PnP algorithms including Efficient PnP and Efficient PnP with Gauss–Newton refinement [15], Procrustean PnP [9], generalized Procrustean PnP [8], generalized Fiore algorithm [8], and the Orthogonal Iteration algorithm [17]. We concluded that our point-line solution, despite having a very compact formulation, performed favourably against other algorithms. We also demonstrated that our algorithm requires a minimum of three paired point-line measurements to establish a stable solution (assuming arbitrary line orientation), with the accuracy of the registration (as assessed by the target registration error, or TRE) improving as the number of measurements increases. The sharpest drop in TRE occurs after 6–7 measurements, reaching a plateau after 12–15 measurements are acquired [2, 3].
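One way to realise a point-line registration of this kind is to alternate between a rigid Procrustes fit and a projection of the transformed points onto their lines. The NumPy sketch below follows that idea under stated assumptions: all lines pass through the camera origin, the starting guess places each target one unit along its ray, and the stopping rule is a hypothetical tolerance on the change in RMS point-to-line distance. It is not the authors' MATLAB implementation, and the function names are illustrative.

```python
import numpy as np

def rigid_procrustes(X, Y):
    """Least-squares rotation R and translation t with R @ x_i + t ~ y_i."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    H = (Y - mean_y).T @ (X - mean_x)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # forbid reflections
    R = U @ S @ Vt
    return R, mean_y - R @ mean_x

def point_line_registration(X, D, tol=1e-10, max_iter=20000):
    """Register points X (n x 3, tracker/DRF frame) to rays D (n x 3, camera frame).

    Returns R, t and the RMS point-to-line distance after registration.
    """
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    Y = D.copy()                                  # assumed start: unit depth on each ray
    previous = np.inf
    for _ in range(max_iter):
        R, t = rigid_procrustes(X, Y)             # fit points to current line targets
        Z = X @ R.T + t                           # transformed points
        Y = np.sum(Z * D, axis=1, keepdims=True) * D  # closest point on each ray
        rms = float(np.sqrt(np.mean(np.sum((Z - Y) ** 2, axis=1))))
        if abs(previous - rms) < tol:
            break
        previous = rms
    return R, t, rms
```

Each half-step is an exact minimisation, so the residual never increases. The number of iterations grows as the bundle of rays becomes narrower, which is one reason a good initial estimate is helpful for cameras with a limited field of view.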
Validation
Several well-known hand-eye calibration algorithms with open-source implementations are available in the current literature. Tsai [24] presented an algorithm that utilizes a series of images rotated around a calibration board. The Navy algorithm [19] formulated a technique by solving AX = XB on the Euclidean group. The Inria [11] calibration technique utilizes the calibration frame instead of the camera frame to reduce error. Dual [5] is a calibration algorithm that writes the line transformation with the dual quaternion product for relative position and orientation. Lastly, Branch and Bound [10] by Heller et al. minimizes an objective function based on the epipolar constraint and was shown to be globally optimal with respect to the \(L_{\infty }\) norm. These calibration techniques provide a basis for comparison for our proposed solution.
The ball-tip stylus is used as a validation tool, providing an automatic assessment framework with minimal user interaction. Each camera was calibrated using six hand-eye calibration algorithms. An independent set of images capturing the ball-tip stylus at varying depths was acquired. The projection errors, defined as the difference (in pixel space) between the 3D location of the ball tip (in tracker space) projected onto the image and the centre of the segmented circle (in video), are reported for each camera/hand-eye calibration algorithm combination (Fig. 4b). We choose not to report the back-projection error, which associates the error with a physical unit, as both projection and back-projection represent the same measure of quality (which is angular error) but expressed in different coordinate systems.
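The projection error for the \(i\)-th validation measurement can be written compactly. The notation here is assumed for illustration: \(M\) denotes the \(3\times 3\) intrinsic camera matrix, \((R,t)\) the rigid transform that takes the tracked ball-tip centre \(Q_i\) into the camera frame (the hand-eye calibration composed with the tracked DRF pose), and \((x_i,y_i)\) the segmented circle centre in pixels.

\[
(x_i', y_i', w_i')^{T} = M\,(R\,Q_i + t), \qquad
e_i = \left\| \begin{pmatrix} x_i'/w_i' \\ y_i'/w_i' \end{pmatrix} - \begin{pmatrix} x_i \\ y_i \end{pmatrix} \right\|_2
\]

Under this definition \(e_i\) is expressed in pixels, so for a fixed angular error it does not grow with the distance between the stylus and the camera.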
Data collection
The intrinsic parameters for all three cameras were determined using the method described by Zhang [27], as implemented in OpenCV, using a square-checkerboard pattern. Each camera was calibrated multiple times to ensure accuracy and consistency. Per-camera intrinsic parameters and validation data were held constant when testing different hand-eye calibration methods, effectively isolating the detected errors to algorithm behaviour. The ball-tip stylus was calibrated multiple times using pivot calibration [26], with an inverted cone serving as the reciprocal surface for pivoting. The root-mean-square pivot calibration error for the two styluses shown in Fig. 2 was typically less than 0.5 mm, a level that was achieved consistently.
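Pivot calibration admits several formulations. The sketch below shows one common algebraic least-squares variant, offered as an illustration rather than as the exact procedure used in this study. Each tracked pose \((R_i, p_i)\) of the stylus DRF must map the unknown tip offset onto the same fixed pivot point, which gives a linear system in six unknowns. The function name and return convention are hypothetical.

```python
import numpy as np

def pivot_calibration(rotations, translations):
    """Algebraic pivot calibration.

    rotations:    n rotation matrices (3 x 3) of the stylus DRF in tracker space
    translations: n translation vectors (3,) of the stylus DRF in tracker space
    Returns (tip offset in the DRF frame, pivot point in tracker space, RMS residual).
    """
    R = np.asarray(rotations, dtype=float)
    p = np.asarray(translations, dtype=float)
    n = R.shape[0]
    A = np.zeros((3 * n, 6))
    b = np.zeros(3 * n)
    for i in range(n):
        # R_i @ tip + p_i = pivot  ->  [R_i, -I] @ [tip; pivot] = -p_i
        A[3 * i:3 * i + 3, :3] = R[i]
        A[3 * i:3 * i + 3, 3:] = -np.eye(3)
        b[3 * i:3 * i + 3] = -p[i]
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = (A @ solution - b).reshape(n, 3)
    rms = float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))
    return solution[:3], solution[3:], rms
```

The system is only well conditioned when the stylus is pivoted about more than one axis. The RMS residual returned here is the kind of figure that the sub-0.5 mm pivot calibration error refers to.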
For all acquired tracked images, the ball-tip stylus was stabilized using a passive arm and moved throughout the viewing frustum of the camera. Static images were acquired and associated with the tracking data. Each measurement, including moving the passive arm, typically took less than 10 s to acquire. To achieve accurate hand-eye calibration, the ball-tip stylus was moved throughout the viewing frustum to maximize the spread of the fiducial placement [2]. Based on our prior work [3], we collected 12–15 measurements for each camera to derive the hand-eye calibration using our algorithm, as more measurements may not necessarily improve the fitness of the calibration in any meaningful way.
An independent set of tracked images was acquired for validation purposes, using the ball-tip styluses, for each of the three cameras. All hand-eye calibration algorithms were evaluated using the same set of validation data, effectively isolating the difference in the projection error to algorithm behaviour. All validation images were acquired with varying stylus poses (particularly the orientation), so that any bias or inaccuracy in pivot calibration would be apparent in our validation data analysis (“Results” section).
Results
Accuracy of the hand-eye calibration as assessed by projection error can be visualized directly on the tracked images, as shown in Fig. 5. Using the ball-tip stylus, the centroid of the circular projection is automatically segmented (“Apparatus and image processing” section). The segmented centre is drawn on the original colour image, serving as the ground truth (shown as a grey marker). The 3D location of the tracked stylus is then projected through the hand-eye calibration (as well as the camera intrinsics) and displayed on the image as markers of different colours. The Euclidean distance between each coloured marker and the ground truth is noted as the projection error (in pixels).
The accuracy of the commercial webcam as assessed by the projection error is shown in Fig. 6, with the distance between the ball tip and the centre of the camera ranging from 150.0 mm to 700.0 mm. For this particular camera, we were unable to achieve a repeatable hand-eye calibration using the Tsai [24] algorithm after several failed attempts; thus, its result is omitted from Fig. 6. The Inria, Navy, and Dual algorithms produced similar results and therefore overlap on the graph. The Tsai algorithm produced the largest projection error due to unstable calibration, whereas our proposed Procrustean point-line algorithm produced the best result. The Branch and Bound algorithm produced consistent results, outperforming the Inria, Navy, and Dual algorithms. Our results suggest that there is no correlation between the projection error and the distance from the camera. A summary of the accuracy for the webcam is listed in Table 2.
Using the result of these hand-eye calibrations, an augmented reality (AR) visualization system was implemented which may be used for the purposes of training and instruction. A patient-specific lumbar vertebra (L2) phantom was manufactured based on a patient CT and registered to an optical DRF. As shown in Fig. 7, the overlay of the virtual representation of the spine coincides precisely with the image.
Accuracy results for the pass-through HMD, as assessed by projection error, are shown in Fig. 8 and summarized in Table 3. They demonstrate a similar trend; our proposed Procrustean point-line solution performed the best, followed closely by the Branch and Bound algorithm. The projection error does not correlate with the distance between the tracked stylus and the camera.
When properly calibrated, the popular Tsai [24] algorithm produced accurate results, although slightly worse than the Branch and Bound [10] algorithm. For both the webcam and the pass-through HMD, Navy [19] and Inria [11] performed almost identically. Both the webcam and HMD pass-through cameras were calibrated using the custom tool, with a ball-tip size of 30.00 mm in radius (Fig. 2b).
The surgical laparoscope by Olympus, originally used in the da Vinci Surgical System, has a limited field of view (roughly 250.00 mm in depth). We were only able to calibrate this laparoscope using our proposed algorithm and the Branch and Bound algorithm, possibly due to the optics and the data selection requirement [20]. The projection error as a function of the distance from the camera is shown in Fig. 9 and summarized in Table 4. The stylus with a ball-tip size of 10.00 mm (Fig. 2a) was used to accommodate the limited field of view.
For the surgical laparoscope with limited field of view, our proposed algorithm outperformed the Branch and Bound [10] algorithm in terms of maximum and mean projection errors by a factor of 3. The surgical laparoscope was difficult to calibrate using the standard techniques, possibly due to the requirement to collect measurements in a dense and controlled manner [20]. For all three cameras, our proposed algorithm consistently performed better than existing techniques. Visualizations of typical projection errors overlaid on the image for the Olympus surgical laparoscope are depicted in Fig. 10.
Discussion
The results presented in this paper demonstrate that our proposed Procrustean point-line hand-eye calibration is well suited for navigated minimally invasive surgery. It is scalable for cameras with varying fields of view, ranging from a commercial webcam to a surgical camera. The versatility, ease of implementation, and ease of data collection make our proposed algorithm a suitable candidate for either preoperative or intraoperative hand-eye calibration. In our laboratory setup, a highly accurate hand-eye calibration can be achieved using 12–15 images, which can be acquired in less than 3 min.
For all the cameras tested, our algorithm delivered the best performance when compared against five other well-understood algorithms. In particular, the Branch and Bound algorithm is considered the state of the art, as it was shown to be globally optimal with respect to the \(L_{\infty }\)-norm. We note that all five publicly available algorithms rely on imaging salient features of an object from varying poses; thus, they are inherently sensitive to the range of poses between measurements as well as the quality of the lens and of the images acquired. In particular, one needs to be careful to optimize the tracked image acquisition for both the Tsai [24] and the Branch and Bound [10] algorithms, as they tend to fail when the pose space is not sampled densely enough. Schmidt et al. [20] addressed the issue of data selection in detail.
As we present our algorithm as a Procrustean registration, its performance can be understood in terms of fiducial localization error (FLE) and target registration error (TRE). It is well understood that the TRE is proportional to FLE and is influenced by the fiducial configuration [7]. The ball-tip stylus, as a calibrator, can be calibrated accurately through pivot calibration. The spherical tip is projected onto the image as a circle, which can be segmented accurately and robustly. Both of these factors minimize the contribution to FLE. In addition, since this calibrator can be placed anywhere within the viewing frustum of the camera, the fiducial configuration can be optimized in terms of the spatial relationship to any target inside the frustum. The ease of optimizing data collection in terms of fiducial configuration is a possible explanation for the superior performance of our proposed algorithm. One possible future direction is to optimize fiducial placement based on a TRE prediction model [2]. A simple heuristic such as placing 16 fiducials evenly across a \(4\times 4\) grid on the image will almost guarantee a reasonable calibration.
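The grid heuristic can be turned into a list of target pixel positions for the operator to aim at. The sketch below assumes that "evenly" means the centres of a regular \(4\times 4\) partition of the image; the function name and this interpretation are illustrative.

```python
def grid_targets(width, height, rows=4, cols=4):
    """Pixel coordinates (x, y) of the cell centres of a rows x cols grid."""
    return [
        ((col + 0.5) * width / cols, (row + 0.5) * height / rows)
        for row in range(rows)
        for col in range(cols)
    ]
```

For a 640 x 480 image this yields 16 targets running from (80, 60) to (560, 420). Varying the depth of the stylus from one target to the next would additionally spread the fiducials along the viewing direction.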
Other Procrustean approaches such as Voruganti and Bartz [25] employ a planar chessboard, which requires its own calibration. In our experience, calibration of the tracked chessboard, which often is aided with a calibrated stylus, also contributes to FLE [4]. The minimization of FLE contributed by three tracked objects requires a considerable amount of engineering effort. In our approach, since we are using a tracked stylus as a calibrator directly, the need to calibrate an intermediate calibrator is eliminated.
We acknowledge that using a stylus as both the hand-eye calibration and validation tool may potentially introduce a systematic bias in favour of our algorithm. The use of the ball-tip stylus, nonetheless, provided a common evaluation framework for all other hand-eye calibration techniques. Employing a separate validation apparatus may introduce additional error (such as its own calibration); thus, it would be ambiguous whether experimental error originated from the calibration tool or the validation tool. Furthermore, if there is a more accurate validation tool suitable for clinical deployment, this tool should be used for calibration instead. The same paradox has been recognized in the context of ultrasound calibration for quite some time [13]. In our validation framework, any rotational and translational error in the image plane can be detected quantitatively using the projection error. Translation error in the viewing direction of the camera would manifest itself as a scaling error, which would be apparent in the image overlay. Figure 7 provides anecdotal evidence of minimal scaling error using our algorithm.
Our proposed point-line calibration is an iterative algorithm that is sensitive to the line configuration. If all measurements were made with parallel lines, our point-line registration would never reach a unique solution. In the case of hand-eye calibration of a camera, where lines form a bundle of rays at the camera origin, we have shown that our algorithm always converges from an uninitialized state as long as more than six measurements are acquired [3]. Using a good initial estimate would reduce the number of iterations required by our algorithm; the efficient PnP [15] or the weak-perspective camera model estimate used by orthogonal iteration [17] would serve as suitable initial estimates.
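The parallel-line degeneracy can be seen in one line of algebra. In the notation assumed here, every line shares a unit direction \(d\) and passes through a point \(o_i\), and \((R,t)\) is the sought rigid transform applied to the point \(Q_i\). The squared point-to-line distance is

\[
\delta_i^2(R,t) = \left\| (I - d\,d^{T})\,(R\,Q_i + t - o_i) \right\|^2 ,
\]

and because \((I - d\,d^{T})\,d = 0\), replacing \(t\) by \(t + \lambda d\) leaves every \(\delta_i\) unchanged for any scalar \(\lambda\). The translation along \(d\) is therefore unobservable. A bundle of rays through the camera origin has distinct directions \(d_i\), so no single shift is annihilated by all of the projectors \(I - d_i\,d_i^{T}\).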
Building on the TRE prediction model for point-line registration that was recently introduced [2], one possible future direction is to assess the fitness of the hand-eye calibration as soon as a new measurement is acquired. Both our experimental results and theoretical prediction [3] suggest that accurate calibration can be achieved using 12 to 15 measurements, with diminishing improvement when more measurements are available. The ability to establish an accurate calibration using minimal data acquisition may be advantageous in a clinical setting.
The pass-through HMD, Oculus Rift, has a built-in spatial measuring device based on IRED and inertial sensors. As shown in Fig. 1b, these IRED emitters are visible to the optical spatial measuring device. Under the standard setting, these IREDs flash in a binary pattern for 2 s in every 20 to compensate for the temporal drift in its inertial sensors. We are exploring the possibility of tracking these IREDs directly using the optical spatial measuring device to improve tracking accuracy as well as ergonomics.
Conclusion
We present a hand-eye calibration for surgical cameras, using a ball-tip stylus as a calibrator. The calibration is formulated as a Procrustean point-line registration, which requires only a small number of measurements yet achieves high accuracy. We demonstrated that the proposed algorithm performed as well as or better than the most common existing methods. The Procrustean formulation allows a target registration error prediction model to be used as a real-time assessment of the fitness of the calibration.
References
1. Abhari K, Baxter J, Chen EC, Khan A, Peters T, de Ribaupierre S, Eagleson R (2015) Training for planning tumour resection: augmented reality and human factors. IEEE Trans Biomed Eng 62(6):1466–1477
2. Chen ECS, Peters TM, Ma B (2016) Guided ultrasound calibration: where, how, and how many calibration fiducials. Int J Comput Assist Radiol Surg 11(6):889–898
3. Chen ECS, Peters TM, Ma B (2017) Which point-line registration? In: Proceedings of the SPIE, vol 10135, pp 1013509–13
4. Chen ECS, Sarkar K, Baxter JSH, Moore J, Wedlake C, Peters TM (2012) An augmented reality platform for planning of minimally invasive cardiac surgeries. In: Proceedings of the SPIE, vol 8316
5. Daniilidis K (1999) Hand-eye calibration using dual quaternions. Int J Robot Res 18(3):286–298
6. Dornaika F, Horaud R (1998) Simultaneous robot-world and hand-eye calibration. IEEE Trans Robot Autom 14(4):617–622
7. Fitzpatrick J, West J (2001) The distribution of target registration error in rigid-body point-based registration. IEEE Trans Med Imaging 20(9):917–927
8. Fusiello A, Crosilla F, Malapelle F (2015) Procrustean point-line registration and the nPnP problem. In: Proceedings of the 2015 International conference on 3D vision, 3DV ’15, pp 250–255. IEEE Computer Society
9. Garro V, Crosilla F, Fusiello A (2012) Solving the PnP problem with anisotropic orthogonal Procrustes analysis. In: 2012 Second international conference on 3D imaging, modeling, processing, visualization and transmission, pp 262–269
10. Heller J, Havlena M, Pajdla T (2012) A branch-and-bound algorithm for globally optimal hand-eye calibration. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR), pp 1608–1615
11. Horaud R, Dornaika F (1995) Hand-eye calibration. Int J Robot Res 14(3):195–210
12. Horn BK (1987) Closed-form solution of absolute orientation using unit quaternions. J Opt Soc Am 4:629–642
13. Hsu PW, Prager RW, Gee AH, Treece GM (2009) Freehand 3D ultrasound calibration: a review. Springer, Berlin
14. Kang X, Azizian M, Wilson E, Wu K, Martin AD, Kane TD, Peters CA, Cleary K, Shekhar R (2014) Stereoscopic augmented reality for laparoscopic surgery. Surg Endosc 28(7):2227–2235
15. Lepetit V, Moreno-Noguer F, Fua P (2008) EPnP: an accurate O(n) solution to the PnP problem. Int J Comput Vis 81(2):155–166
16. Lorensen W, Cline H, Nafis C, Kikinis R, Altobelli D, Gleason L (1993) Enhancing reality in the operating room. In: Proceedings of the IEEE conference on visualization, 1993, pp 410–415
17. Lu CP, Hager GD, Mjolsness E (2000) Fast and globally convergent pose estimation from video images. IEEE Trans Pattern Anal Mach Intell 22(6):610–622
18. Marescaux J, Rubino F, Arenas M, Mutter D, Soler L (2004) Augmented-reality-assisted laparoscopic adrenalectomy. JAMA 292(18):2211–2215
19. Park FC, Martin BJ (1994) Robot sensor calibration: solving AX = XB on the Euclidean group. IEEE Trans Robot Autom 10(5):717–721
20. Schmidt J, Niemann H (2008) Data selection for hand-eye calibration: a vector quantization approach. Int J Robot Res 27(9):1027–1053
21. Schönemann PH (1966) A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1):1–10
22. Shah M, Eastman RD, Hong T (2012) An overview of robot-sensor calibration methods for evaluation of perception systems. In: Proceedings of the workshop on performance metrics for intelligent systems, PerMIS ’12, pp 15–20. ACM
23. Thompson S, Stoyanov D, Schneider C, Gurusamy K, Ourselin S, Davidson B, Hawkes D, Clarkson MJ (2016) Hand-eye calibration for rigid laparoscopes using an invariant point. Int J Comput Assist Radiol Surg 11(6):1071–1080
24. Tsai RY, Lenz RK (1989) A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans Robot Autom 5(3):345–358
25. Voruganti AKR, Bartz D (2008) Alternative online extrinsic calibration techniques for minimally invasive surgery. In: Proceedings of the 2008 ACM symposium on virtual reality software and technology, VRST ’08, pp 291–292. ACM
26. Yaniv Z (2015) Which pivot calibration? In: Proceedings of the SPIE, vol 9415, pp 941527–941527–9
27. Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 22(11):1330–1334
Acknowledgements
This study was funded by Canadian Institutes of Health Research (CIHR #FDN 143232), Canada Foundation for Innovation (CFI, #20994), and Natural Sciences and Engineering Research Council of Canada (NSERC #RPGIN 2014-04504). Co-op student funding for Isabella Morgan was provided by Northern Digital Inc. (Canada).
Ethics declarations
Conflict of interest
Isabella Morgan, Uditha Jayarathne, Adam Rankin, Terry M. Peters, and Elvis C. S. Chen declare that they have no conflict of interest.
Informed consent
This article does not contain patient data.
Ethical standards
This article does not contain any studies with human participants or animals performed by any of the authors.
Appendix: MATLAB implementation for Procrustean hand-eye calibration
Morgan, I., Jayarathne, U., Rankin, A. et al. Hand-eye calibration for surgical cameras: a Procrustean Perspective-n-Point solution. Int J CARS 12, 1141–1149 (2017). https://doi.org/10.1007/s11548-017-1590-9
DOI: https://doi.org/10.1007/s11548-017-1590-9