
1 Introduction

Laparoscopic surgery is a minimally invasive procedure that reproduces the principles of conventional surgery with minimal physical trauma. Compared to open surgery, this approach is more beneficial to the patient but significantly increases the complexity of the surgical gestures. The constraints for surgeons are mostly ergonomic, concerning both the manipulation of surgical instruments (reduced instrument mobility due to the fixed insertion points on the abdominal wall, loss of tactile sense) and the visualization of the surgical scene (limited field of view, indirect view of the surgical scene, endoscope manipulation). Performing a laparoscopy therefore demands considerable adaptability from surgeons and involves a long learning curve.

Automatic localization of instruments can help address several limitations of laparoscopy and assist surgeons during an intervention. For instance, the authors of [1] propose to localize instruments in space in a surgical trainer, based on a projective model and gradient image processing. In [2], a similar approach is proposed (also in a surgical trainer), with the addition of an extended Kalman filter to extract the edges of the instruments.

In [3], the authors use the instrument insertion point as a constraint and a probabilistic algorithm to find instruments, with the aim of controlling a robotic endoscope holder to assist surgeons during surgery.

All these methods use a gradient approach to extract instrument edges in the image. However, such approaches are sensitive to noise, illumination and shadows, which can lead to a segmentation that is insufficient for robust localization of instruments in the image [4]. To overcome this problem, we propose to use a 2D Frangi filter [5] to obtain a robust detection of instrument edges. We present an algorithm to localize and track surgical instruments in endoscopic images in real-time. Our algorithm also estimates the 3D position and orientation of the instruments from 2D information in the images, knowing the camera and instrument models.

2 Instrument Localization and Tracking Framework

The principle of our instrument detection algorithm consists of three steps:

  • roughly identifying all regions corresponding to the location of an instrument in each laparoscopic image (Sect. 2.1),

  • refining the instrument detection within the identified regions (Sect. 2.2),

  • estimating the 3D pose of the instrument (Sect. 2.3).

After an initial detection, the segmentation is constrained by the localization in the previous images to track the instrument.

2.1 Rough Extraction of Instruments Regions

First, the laparoscopic color image (Fig. 2a) is converted from the RGB color space to the CIELab color space. The L channel, corresponding to the luminance, is removed to make the method insensitive to the illumination variations inherent to laparoscopic surgery. We thus obtain a grayscale image composed of the a and b channels (Fig. 2b), corresponding to the chromaticity \(C_{ab}=\sqrt{a^2+b^2}\). This color space is more robust for challenging images than commonly used color spaces such as HSV [7] or RGB, see Fig. 1. We then binarize this grayscale image using the automatic Otsu thresholding approach [8]. Since laparoscopic instruments have a color very distinct from the background (they are usually black, metallic, or blue/green), instrument pixels appear as white and background pixels as black (Fig. 2c). Of course, this pre-processing step is noisy: some background pixels appear as white and some tool pixels as black (Fig. 2c). We disconnect touching regions using a simple distance transform [9] of the binary image, and refine the separation with an erosion step using a cross-shaped kernel (Fig. 2d). Finally, we use a contour detection algorithm [10] to extract the extreme outer contour of each region as an oriented bounding box (see Fig. 3b). Based on the observation that laparoscopic instruments have a long and thin cylindrical shape, we eliminate bounding boxes whose elongation (ratio of the longer to the shorter side) is inferior to 2 (red boxes in Fig. 3c).
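For illustration, this rough extraction step maps to standard OpenCV primitives, as sketched below under our own assumptions (a 3×3 cross-shaped kernel, and OpenCV's 8-bit Lab encoding where a and b are stored with an offset of 128); the function name and helper choices are ours, not the authors' code.

```cpp
// Sketch of the rough region extraction (Sect. 2.1) with OpenCV.
// Illustrative only: kernel sizes and the function name are our assumptions.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

std::vector<cv::RotatedRect> extractCandidateRegions(const cv::Mat& bgr)
{
    // RGB -> CIELab; drop the luminance channel L
    cv::Mat lab;
    cv::cvtColor(bgr, lab, cv::COLOR_BGR2Lab);
    std::vector<cv::Mat> ch;
    cv::split(lab, ch);                          // ch[0]=L, ch[1]=a, ch[2]=b

    // Chromaticity C_ab = sqrt(a^2 + b^2); OpenCV stores a,b offset by 128
    cv::Mat a, b, chroma;
    ch[1].convertTo(a, CV_32F, 1.0, -128.0);
    ch[2].convertTo(b, CV_32F, 1.0, -128.0);
    cv::magnitude(a, b, chroma);
    cv::normalize(chroma, chroma, 0, 255, cv::NORM_MINMAX, CV_8U);

    // Automatic Otsu binarization: instrument pixels vs. background
    cv::Mat bin;
    cv::threshold(chroma, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Disconnect touching regions: distance transform, re-binarization, erosion
    cv::Mat dist;
    cv::distanceTransform(bin, dist, cv::DIST_L2, 3);
    cv::normalize(dist, dist, 0, 255, cv::NORM_MINMAX, CV_8U);
    cv::threshold(dist, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    cv::erode(bin, bin, cv::getStructuringElement(cv::MORPH_CROSS, cv::Size(3, 3)));

    // Extreme outer contours -> oriented bounding boxes; keep only elongated ones
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(bin, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    std::vector<cv::RotatedRect> boxes;
    for (const auto& c : contours) {
        cv::RotatedRect box = cv::minAreaRect(c);
        const float longSide  = std::max(box.size.width, box.size.height);
        const float shortSide = std::min(box.size.width, box.size.height);
        if (shortSide > 0.f && longSide / shortSide >= 2.f)   // elongation test
            boxes.push_back(box);
    }
    return boxes;
}
```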

Fig. 1. Typical images obtained by color-to-grayscale conversion. (a) Original image (b) Saturation-modified channel in HSV space [7] (c) Chromaticity \(C_{ab}\) of CIELab space (Color figure online)

Fig. 2. Segmentation of a surgical instrument in 2D images. (a) Original image (b) Chromaticity \(C_{ab}\) of CIELab space (c) Segmentation using Otsu's thresholding (d) Conversion of the binary image using the distance transform (e) Disconnection of regions in the binary image using the distance transform (f) Binarization of the distance transform image (Color figure online)

Fig. 3. Edge detection of a surgical instrument in 2D images. (a) Original image (b) Edge detection (c) Potential instrument bounding boxes obtained from image (b) (green) and incompatible bounding boxes (red) (Color figure online)

Fig. 4. Extraction of instrument edges. (a) and (b) Two images extracted from the same surgery at different time intervals (c) Image extracted from another surgery (d), (e) and (f) Edge detection in images (a), (b) and (c) by the Canny filter with thresholds \(T_L=30\) and \(T_H=90\) (h), (i) and (j) Edge detection by the Frangi filter in images (a), (b) and (c) with parameters \(\sigma =2\), \(\beta =0.5\) and \(c=0.5\text {max}(S)\)

2.2 Fine Extraction of Instrument Edges

Now that we have potential bounding boxes for the instruments, we search for instrument edges within each bounding box. To do so, we use a Frangi filter [5], which is the major contribution of this paper. We compared the Frangi filter to the classical Canny filter [6] for the detection of instrument edges (see Fig. 4). The Canny filter is the most classical gradient approach, based on the Sobel filter. It uses a hysteresis thresholding that requires finding two optimal thresholds for accurate extraction of the edges of an instrument. However, as shown in Fig. 4, the conditions of the surgical scene evolve during an intervention, so thresholds determined initially may no longer be optimal and can cause false detections. The advantage of the approach based on the Frangi filter is that it can be applied to different surgery conditions without adjusting the filter parameters. This filter is classically used for vessel detection in medical images. It is based on the computation of the eigenvalues \(\lambda _1, \lambda _2\) of the image's Hessian matrix, such that \(|\lambda _1|\leqslant |\lambda _2|\). The Hessian matrix is obtained by convolving the image with second-order derivatives of a Gaussian kernel with standard deviation \(\sigma \).

The Frangi filter function can be defined as:

$$\begin{aligned} V_0 = \left\{ \begin{array}{ll} 0 &{} \text {if}~ \lambda _2>0,\\ \exp \left( -\frac{R_B^2}{2\beta ^2}\right) \left( 1-\exp \left( -\frac{s^2}{2c^2}\right) \right)  &{} \text {otherwise} \end{array} \right. \end{aligned}$$
(1)

where \(R_B = \frac{\lambda _1}{\lambda _2}\) is the blobness measure, \(s=\sqrt{\lambda _1^2+\lambda _2^2}\) is the structureness measure, and \(c\) and \(\beta \) are parameters that adjust the filter sensitivity. After applying the Frangi filter, each pixel value \(V_0\) indicates the probability that the pixel belongs to a tubular structure. Here, we do not use the Frangi filter to extract the whole cylindrical shape of the instrument: the instrument's diameter in the image varies depending on its orientation relative to the endoscope (i.e. we cannot fix the standard deviation \(\sigma \)). Instead, we apply the filter with a very low \(\sigma \) in order to highlight the instrument edges (Fig. 5b). Finally, we identify the two borders of an instrument: the bounding box is extended and separated into two areas in order to search for the top and bottom borders of the instrument separately using the Hough transform [11] with a very low threshold, as illustrated in Fig. 5b. At this step, we can eliminate lines that are incompatible with a surgical instrument based on the relative orientation and position of the detected lines (as illustrated in Fig. 5c).
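As a concrete illustration, a minimal 2D Frangi response following Eq. (1) can be sketched as below, where the Hessian is approximated by Gaussian smoothing followed by Sobel second derivatives and \(c\) is set to \(0.5\,\text{max}(s)\) as in Sect. 3; this is our own sketch, not the authors' implementation.

```cpp
// Sketch of the 2D Frangi response of Eq. (1). The Hessian is approximated by
// Gaussian smoothing followed by Sobel second derivatives; names are ours.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>

cv::Mat frangi2D(const cv::Mat& gray8u, double sigma = 2.0, double beta = 0.5)
{
    cv::Mat img, blurred;
    gray8u.convertTo(img, CV_32F);
    const int k = 2 * static_cast<int>(std::ceil(3.0 * sigma)) + 1;  // kernel size
    cv::GaussianBlur(img, blurred, cv::Size(k, k), sigma);

    // Second-order derivatives (entries of the Hessian matrix)
    cv::Mat Ixx, Iyy, Ixy;
    cv::Sobel(blurred, Ixx, CV_32F, 2, 0, 3);
    cv::Sobel(blurred, Iyy, CV_32F, 0, 2, 3);
    cv::Sobel(blurred, Ixy, CV_32F, 1, 1, 3);

    // Eigenvalues with |lambda_1| <= |lambda_2|; structureness s = sqrt(l1^2 + l2^2)
    cv::Mat l1(img.size(), CV_32F), l2(img.size(), CV_32F), s(img.size(), CV_32F);
    for (int y = 0; y < img.rows; ++y)
        for (int x = 0; x < img.cols; ++x) {
            const float hxx = Ixx.at<float>(y, x), hyy = Iyy.at<float>(y, x),
                        hxy = Ixy.at<float>(y, x);
            const float d = std::sqrt((hxx - hyy) * (hxx - hyy) + 4.f * hxy * hxy);
            float a = 0.5f * (hxx + hyy - d), b = 0.5f * (hxx + hyy + d);
            if (std::abs(a) > std::abs(b)) std::swap(a, b);
            l1.at<float>(y, x) = a;
            l2.at<float>(y, x) = b;
            s.at<float>(y, x)  = std::sqrt(a * a + b * b);
        }

    double sMax = 0.0;
    cv::minMaxLoc(s, nullptr, &sMax);
    const double c = 0.5 * std::max(sMax, 1e-6);         // c = 0.5 max(s)

    cv::Mat V0 = cv::Mat::zeros(img.size(), CV_32F);
    for (int y = 0; y < img.rows; ++y)
        for (int x = 0; x < img.cols; ++x) {
            const float lb = l2.at<float>(y, x);
            if (lb > 0.f) continue;                      // V0 = 0 if lambda_2 > 0
            const float Rb = l1.at<float>(y, x) / lb;    // blobness
            const float ss = s.at<float>(y, x);          // structureness
            V0.at<float>(y, x) =
                std::exp(-(Rb * Rb) / float(2.0 * beta * beta)) *
                (1.f - std::exp(-(ss * ss) / float(2.0 * c * c)));
        }
    return V0;   // per-pixel probability of belonging to a tubular structure
}
```

The response within each extended bounding box can then be thresholded and passed to a standard line detector such as cv::HoughLines to recover the two borders of the instrument.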

Fig. 5. Estimation of instrument poses in the image and in space. (a) Original image (b) Expansion and separation of a compatible bounding box in the Frangi-filtered image (c) Instrument borders refinement process (d) Detection of the instrument borders (e) Instrument tip detection in the Frangi image (f) Instrument pose in the image (g) Geometric representation of an instrument in space (h) Illustration of the computed instrument position in space

2.3 Estimation of 3D Pose of the Instruments

The two borders of an instrument define two planes \(\Sigma _i\) of normal \(\mathbf{n_i}\), tangent to the instrument and passing through the optical center \(\mathbf{C}\) of the camera (see Fig. 5g). The camera calibration can be obtained with a classical chessboard calibration procedure such as [12]. The intersection of these two planes is a line \(\mathbf{D:(C,e_1)}\) passing through the optical center of the camera, with a direction vector \(\mathbf{e_1}\) parallel to the central axis of the instrument. This line gives the direction of the instrument's central axis in space. In order to fully describe the tool's pose in space, we also need to find a point \(\mathbf{P}\) on the instrument's axis. To do so, we follow the approach proposed in [3]: the instrument is modeled as a finite cylinder of radius \({\rho }\) (see Fig. 5g). Such a point \(\mathbf{P}\) can easily be computed in the plane perpendicular to the instrument's axis (Fig. 5h). Indeed, \(\mathbf{P}\) must respect the condition:

$$\begin{aligned} \lambda \mathbf{m_1}-\rho \mathbf{n_1}=\lambda \mathbf{m_2}+\rho \mathbf{n_2} \end{aligned}$$
(2)

where \(\mathbf{m_i}=\mathbf{e_1}\times \mathbf{n_i}\), \(\lambda \) is the distance from the optical center to the tangent points \(\mathbf{S_i}\), and \(\mathbf{n_i}\) is the normal to the plane \(\Sigma _i\). Taking the dot product of both sides of Eq. 2 with \(\mathbf{n_1}+\mathbf{n_2}\), we can compute \(\lambda \) and obtain:

$$\begin{aligned} \overrightarrow{\mathbf{CP}}=\lambda \mathbf{m_1}-\rho \mathbf{n_1}=\rho \frac{\left\| \mathbf{n_1}+\mathbf{n_2} \right\| ^2}{(\mathbf{m_1}-\mathbf{m_2})\cdot (\mathbf{n_1}+\mathbf{n_2})}\mathbf{m_1}-\rho \mathbf{n_1} \end{aligned}$$
(3)

Then, we search for the position of the instrument's tip \(\mathbf{t_{im}}\) in the Frangi image, along the projection of the instrument's axis \(\mathbf{(P,e_1)}\) in the image (see Fig. 5e). The pixel along this line with the maximum grey level in the Frangi image is considered to be the tip. Finally, we find the 3D position of the instrument's tip \(\mathbf{T}\) as the intersection of \(\mathbf{(P,e_1)}\) with the projection line of the tool's tip \(\mathbf{(C,t_{im})}\).
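As a worked example of Eqs. (2) and (3), assuming the two unit tangent-plane normals \(\mathbf{n_1}\), \(\mathbf{n_2}\) have already been recovered from the detected borders and the camera calibration, the axis direction \(\mathbf{e_1}\) and the point \(\mathbf{P}\) can be computed as sketched below (our own illustration, with hypothetical names).

```cpp
// Worked example of Eqs. (2)-(3): compute the axis point P in the camera frame
// (optical center C at the origin) from the two unit tangent-plane normals
// n1, n2 and the known instrument radius rho.
#include <opencv2/core.hpp>

cv::Vec3d instrumentAxisPoint(const cv::Vec3d& n1, const cv::Vec3d& n2, double rho)
{
    // Axis direction e1: intersection of the two tangent planes
    cv::Vec3d e1 = n1.cross(n2);
    e1 *= 1.0 / cv::norm(e1);

    const cv::Vec3d m1 = e1.cross(n1);            // m_i = e_1 x n_i
    const cv::Vec3d m2 = e1.cross(n2);
    const cv::Vec3d nSum = n1 + n2;

    // Eq. (2) gives lambda (m1 - m2) = rho (n1 + n2); dot both sides with (n1 + n2)
    const double lambda = rho * nSum.dot(nSum) / (m1 - m2).dot(nSum);

    // Eq. (3): vector CP = lambda m1 - rho n1
    return lambda * m1 - rho * n1;
}
```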

2.4 Tracking of Surgical Instruments

For our instrument tracking algorithm, we assume that an instrument does not undergo large displacements between two successive images. In the initial step (first image), we find the instrument as described in Sects. 2.1 and 2.2. In the following images, we find the candidate bounding boxes, but we refine the instrument search only inside the bounding box most compatible with the position and orientation of the instrument in the previous image. If the instrument is not found over several images, we re-initialize the algorithm. When several instruments are present, it is possible to track all the visible instruments or a particular one. Since only one instrument can be inserted at a time through an insertion point \(\mathbf{I}\) on the abdominal wall, an instrument can be identified by its insertion point, which can easily be computed by applying a pivot algorithm to \(\mathbf{(P,e_1)}\).
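A possible structure for the per-frame tracking step is sketched below. The pose structure, the compatibility score and the helpers extractCandidateRegions and refineInstrument are hypothetical placeholders standing in for the steps of Sects. 2.1 and 2.2–2.3 and for the position/orientation consistency test; this is not the authors' code.

```cpp
// Per-frame tracking sketch (Sect. 2.4). Helpers are illustrative placeholders.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

struct InstrumentPose { cv::Point2f tip; float angle; bool valid = false; };

// Placeholders for the previous sections: region extraction (Sect. 2.1) and
// edge/pose refinement inside one bounding box (Sects. 2.2-2.3).
std::vector<cv::RotatedRect> extractCandidateRegions(const cv::Mat& frame);
InstrumentPose refineInstrument(const cv::Mat& frame, const cv::RotatedRect& box);

// Simple compatibility score: closeness of a candidate box to the previous tip
// (a placeholder for the position/orientation consistency test).
static double compatibility(const cv::RotatedRect& box, const InstrumentPose& prev)
{
    if (!prev.valid) return 0.0;
    const cv::Point2f d = box.center - prev.tip;
    return -std::hypot(d.x, d.y);
}

InstrumentPose trackFrame(const cv::Mat& frame, const InstrumentPose& prev, int& lost)
{
    const std::vector<cv::RotatedRect> boxes = extractCandidateRegions(frame);

    // Keep only the candidate box most compatible with the previous pose
    // (on the very first image, every candidate box would be refined instead).
    const cv::RotatedRect* best = nullptr;
    double bestScore = -1e9;
    for (const auto& b : boxes) {
        const double score = compatibility(b, prev);
        if (best == nullptr || score > bestScore) { best = &b; bestScore = score; }
    }

    InstrumentPose pose;                         // invalid by default
    if (best != nullptr)
        pose = refineInstrument(frame, *best);   // Frangi edges + pose estimation

    // If the instrument is lost over several consecutive images, the caller
    // re-initializes the full detection of Sects. 2.1 and 2.2.
    lost = pose.valid ? 0 : lost + 1;
    return pose;
}
```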

Fig. 6. Results of the tracking on the laparoscopic image sequences. (a) Sequence 1: Monopolar hook instrument (b) Sequence 2: Monopolar hook instrument (c) Sequence 3: Needle holder instrument (d) Example of a bad tip detection

Table 1. Laparoscopic images statistics
Fig. 7. Experimental test bench to evaluate the 3D pose estimation accuracy, with a printout of a surgical scene as background.

Fig. 8. Estimation of the instrument's pose in space. (a) Calibration step to find the rigid transformation \(\mathbf{T}\) (b) Evaluation of the 3D pose estimation accuracy, with the reference pose obtained with the robot in black and the pose computed with our method in green (Color figure online)

3 Experiments and Results

Our algorithm is implemented in C\(^{++}\) using the OpenCV and OpenMP libraries. For the computations, we used an Intel Xeon PC (2.67 GHz, 3.48 GB RAM). The 2D evaluation was performed on real laparoscopic images (720\(\,\times \,\)556). The 3D evaluation was performed on a laparoscopy test bench using an OLYMPUS OTV600 CCD camera and an IC Imaging Source frame grabber (\(720\,\times \,480\), 25 fps). To achieve a fast processing time, the image resolution is divided by 2 for the region extraction and by 4 for the Frangi filter. We evaluated the 2D tracking of our algorithm on three in-vivo video sequences of laparoscopic rectopexies, obtained through the Digestive Department of Grenoble Hospital, containing challenging situations (see Fig. 6).

In these images, the tip position and orientation of the instrument were compared to manual annotations. The results obtained for each sequence are presented in Table 1, with a mean error of 16.10 pixels (std. dev. 28.98) for the tip position, a mean error of 0.90\(^{\circ }\) (std. dev. 0.88\(^{\circ }\)) for the 2D orientation, and a processing rate of 30 Hz. Videos of this evaluation are included in the supplementary material.

To evaluate the accuracy of the 3D pose estimation, we performed experiments on a test bench (see Fig. 7) consisting of a surgery trainer box on which a commercial robotic instrument holder is directly positioned, with a printout of a surgical scene as background. We compared the 3D tip position of the instrument found by our algorithm to the 3D tip position given by the robot, expressed in the camera frame. This required calibrating the system to find the rigid transformation \(\mathbf{T}\) between the robot and camera frames such that \(\mathbf{p_{cam}^{frangi}=Tp_{cam}^{robot}}\). \(\mathbf{T}\) is obtained by pointing 12 points of a chessboard, for 6 chessboard positions, with the instrument carried by the robot (see Fig. 8). These 12 points can be expressed in the camera frame thanks to a standard extrinsic camera calibration procedure [12], and are also measured in the robot frame. We then solve a classical least-squares problem to find the rigid transformation between the two sets of 3D points, coupled with RANSAC to eliminate outliers. We obtain a camera calibration root mean square (RMS) error of 0.25 pixels and \(\mathbf{T}\) with an RMS error of 1.2 mm. Figure 9 shows an example of the robot trajectory and of our tracking method for a series of instrument movements. The results for 380 measurements are presented in Table 2. In all the results presented, we fixed the Frangi filter parameters to \(\sigma =2\), \(\beta =0.5\) and \(c=0.5\text {max}(s)\), following recommendations from the literature.
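For completeness, the closed-form least-squares rigid registration between the two 3D point sets (the classical SVD/Kabsch solution, without the surrounding RANSAC loop) can be written as sketched below; this is a generic illustration, not the authors' code.

```cpp
// Least-squares rigid registration between two sets of corresponding 3D points
// (Kabsch method via SVD). Finds R, t minimizing sum || R * p_robot + t - p_cam ||^2.
// Assumes at least 3 non-degenerate correspondences; RANSAC loop omitted.
#include <opencv2/core.hpp>
#include <vector>

void rigidTransform(const std::vector<cv::Point3d>& robotPts,
                    const std::vector<cv::Point3d>& camPts,
                    cv::Mat& R, cv::Mat& t)
{
    const int n = static_cast<int>(robotPts.size());
    cv::Mat cR = cv::Mat::zeros(3, 1, CV_64F), cC = cv::Mat::zeros(3, 1, CV_64F);
    for (int i = 0; i < n; ++i) {
        cR += cv::Mat(robotPts[i]);
        cC += cv::Mat(camPts[i]);
    }
    cR = cR * (1.0 / n);                                 // centroids
    cC = cC * (1.0 / n);

    cv::Mat H = cv::Mat::zeros(3, 3, CV_64F);            // cross-covariance matrix
    for (int i = 0; i < n; ++i)
        H += (cv::Mat(robotPts[i]) - cR) * (cv::Mat(camPts[i]) - cC).t();

    cv::Mat w, U, Vt;
    cv::SVD::compute(H, w, U, Vt);
    cv::Mat D = cv::Mat::eye(3, 3, CV_64F);
    D.at<double>(2, 2) = (cv::determinant(Vt.t() * U.t()) < 0) ? -1.0 : 1.0;
    R = Vt.t() * D * U.t();                              // rotation (no reflection)
    t = cC - R * cR;                                     // translation
}
```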

Fig. 9. Robot trajectory (red) and our tracking method trajectory (green) for 380 instrument positions. (a) 3D trajectories (b), (c) and (d) X, Y and Z trajectories with respect to the camera frame (of normal Z) (Color figure online)

Table 2. Error of the 3D pose estimation with our method compared to the position obtained with a robotic instrument holder

4 Conclusion

We presented a surgical instrument tracking algorithm based on image processing. It estimates the 2D/3D pose of the instruments in real-time without artificial fiducials. An extensive 2D evaluation on real surgical videos shows that our 2D pose estimation is accurate and robust on a wide range of realistic cases. In difficult situations, such as a suturing gesture, we can lose accuracy on the instrument's tip position, but the orientation remains correct. A machine learning approach such as [13], applied in the neighbourhood of our estimated tip position, could increase the accuracy of the tip detection. Our approach for 3D pose estimation was validated on a test bench using a printout of a surgery background. Although this might lack realism, we consider that the robustness of the proposed method on realistic images was already shown extensively in the 2D case. This 3D evaluation provides the precision range we can expect when the 2D detection works well. The greatest errors are found in the depth estimation along the z axis. This error could be reduced by using a stereoscopic endoscope.

Our 2D localization approach is robust and accurate enough to control a robotic endoscope holder. Even if the Frangi filter might not be the most obvious choice for edge detection, we showed that it works better than classical approaches. Other, more sophisticated edge detection approaches could easily be compared on our image database. The 3D pose estimation could be useful for surgical gesture recognition or for co-manipulation, if we are able to increase the depth precision. Another application could be the online calibration of non-rigidly-linked robotic endoscope and instrument holders, which could lead to less bulky surgical systems. Our next step will be to evaluate the 3D pose estimation more extensively in conditions closer to clinical reality (cadaver experiments).