Introduction

In open surgery of the abdomen, the trauma of laparotomy wounds is a source of infection or dehiscence and may increase the likelihood of postoperative chest infection, ileus and immobility [30]. New minimally invasive approaches, such as minimally invasive surgery (MIS) and single-incision laparoscopic surgery (SILS), allow the surgery to be performed by bringing the surgical instruments and endoscope to the surgical site through a few, or only one, small incision. This improves the physiological and immune responses with respect to open surgery [6] and reduces trauma, without compromising surgical quality [27, 31].

However, MIS and SILS introduce some limitations: (1) the insertion of the instruments through a few access ports, and their required length, reduces the surgeon’s maneuverability at the surgical site; (2) the freedom of movement of the endoscopic cameras inside the patient’s body is limited due to the single endoscopic port access; (3) the structures of interest, such as blood vessels or cancerous areas, cannot be viewed from different points of view, compromising the accuracy and safety of the surgery; and (4) the surgical field visible to the surgeon is limited by the viewing angle of the endoscopic cameras. Such factors extend the learning curve for surgeons and increase operating times [28].

The recent development of advanced robotic systems for MIS and SILS has been motivated by these drawbacks related to the surgeon’s maneuverability [28]. Robot-assisted devices can greatly help to restore the intuitiveness of operations in such procedures, but this depends largely on the system’s surgical vision. For example, the da Vinci® Surgical System [8] allows intuitive execution of surgery and hand–eye coordination through 3D vision and the configuration of its surgical console. Thus, a requirement in robotic MIS and SILS is to provide an appropriate surgical vision system optimally coupled to the control of the robotic arms. However, in many cases this may not be enough, especially in SILS operations: once the robot is inside and moving around the patient’s body, the surgeon can easily become disoriented, since the pose of the robot cannot be observed (or easily inferred) from the outside. This requires an extra learning phase even for experienced laparoscopic surgeons and also poses a safety risk for the operation.

Computer-assisted technologies can enhance the view of the surgical field. Augmented reality (AR) or augmented virtuality (AV) systems can provide a more comfortable and efficient environment for the surgeon during surgery [15]. For example, in [16], preoperative information about the disease and a surgical plan are fused with the intra-operative visualization of the surgical field.

Nevertheless, aligning multimodal preoperative patient data is highly challenging in soft tissue surgery, since the anatomy undergoes significant changes between the preoperative data acquisition and the intra-operative procedure due to several factors [33]: (1) a different patient pose with respect to the one in which the preoperative image was taken; (2) CO2 abdominal insufflation to increase the working volume (pneumoperitoneum); (3) instrument–tissue interaction; (4) heartbeat; (5) breathing; etc. Overcoming these challenges requires online estimation of the tissue deformations to correct the preoperative plan, which can be done, for example, through depth estimation from stereo endoscopic images (3D reconstruction). The denser the reconstruction, the more accurate the deformation identification will be. The real-time 3D information from the surgical site can also help the surgeon to understand the relationship between the organs and the robotic devices. Moreover, for robotic SILS procedures, the surgical site visualization issues can be overcome by introducing a virtual environment that includes a dynamic model of the robot within a surgical scene recreated from preoperative information. This virtual environment can be updated in real time by exploiting the 3D reconstruction of the soft tissues [4, 22].

In this paper, we present two methods for dense 3D reconstruction of soft tissue from endoscopic images, exploiting a super-pixel approach for the disparity map refinement. The dense surface reconstruction algorithm facilitates the registration between the preoperative model and the intra-operative site. The methods form the basis of the virtual assistive system, introduced in [20], for intra-operative guidance and improved surgical safety in abdominal SILS procedures (Fig. 1). The system provides: (1) surgical guidance, by exploiting patient-specific preoperative 3D models obtained from CT scans to improve the operative vision; and (2) improved surgical safety, by visualizing the pose of the robotic devices.

The paper is organized as follows: After a brief review on the state of the art in “State of the art” section, a description of the virtual assistive system and the contribution of this paper are presented in “Research overview” section. The complete algorithms for soft tissue surface reconstruction are described in “Materials and methods” section. “Evaluation” section presents the evaluation of the proposed methods, and the obtained results are presented and discussed in “Results” section and “Discussion and conclusion” section, respectively.

Fig. 1

Virtual Assistive System. The left side shows the preoperative phase, with the processing of the CT scan images and the camera calibration. The right side shows the intra-operative phase, with the processing of the surgical images and the integration between the real and virtual images. The endoscope shown is from the da Vinci® system

State of the art

The surface reconstruction of man-made environments using stereo images is a well-understood problem [14, 23]. However, 3D surface reconstruction from surgical endoscopic images is still an active area of research due to several challenging aspects: (1) surgical applications require high accuracy and robustness in order to ensure patient safety; and (2) difficulties arise from large lens distortion, many texture-less areas, occlusions introduced by the surgical tools, specular highlights, and smoke and blood produced during the intervention [25]. Many approaches have been proposed in the literature to achieve more reliable, robust and real-time depth measurement of a deforming environment. Some are based on feature detection and tracking [13], allowing the recovery of a sparse surface. In [26], a semi-dense reconstructed surface is obtained by detecting features and propagating the information into neighboring regions. The robustness of these methods depends on the availability of stable features. Other solutions are based on finding correspondences in the pixel intensities to give dense reconstructed surfaces. In [22], a dense GPU-enhanced 3D reconstruction is obtained using the hybrid recursive matching algorithm, which performs a nonparametric transformation on the images. A nonparametric image transformation combined with model-based methods is also exploited in [24] for dense 3D reconstruction, under the assumption of a smooth and continuous surgical surface. [19] presents a comparative study of such state-of-the-art methods. Recently, color-based segmentation has been used in stereo methods for depth estimation [11]. Such techniques first segment a reference image into regions and then label each region with disparity values, preserving object boundaries. For example, segmentation is used in combination with belief propagation within a Markov random field framework in [32]. However, segmentation algorithms are computationally expensive and infrequently applied to surgical images.

Research overview

The overall goal of our research is to enhance the visual information provided to the surgeon during robotic SILS procedures in the abdomen. Toward this end, a first prototype of a virtual assistive system was presented in [20]; an overview is shown in Fig. 1. In the preoperative phase, the surgeon extracts 3D surface models from a CT scan dataset of the patient. These models are inserted into a virtual environment. A stereo camera calibration and a real–virtual camera calibration are performed. During the surgery, a semiautomatic registration of the virtual abdomen onto the real patient creates a connection between reality and the virtual environment, which consists of:

  1. A view of the entire abdomen model, in which the motion of the robot and the surgical tools can be visualized at run-time. The ability to change the transparency of the skin model allows the surgeon to plan the entry point, adjusting the access region with respect to the target to be reached.

  2. A virtual camera image plane, which shows the structures from the same point of view as the real endoscopic camera. This enables the visualization of hidden structures using the transparency function. A zoom feature gives the surgeon a wider field of view if desired.

The current design of the virtual assistive system from [20] is reliable only under the assumption that the patient remains in the same condition as during the CT scan. However, this is an ideal condition and, as noted earlier, changes in anatomy from the preoperative to the intra-operative phase are unavoidable. To overcome this drawback, the run-time 3D reconstruction of the tissues can serve as the keystone for updating the preoperative tissue models. Moreover, by knowing the pose of the tissue surface with respect to the robotic devices, it would be possible to define areas to be protected, e.g., main vessels or vital structures, improving the safety of the surgery.

Research contributions

Two techniques for 3D surface reconstruction of soft tissue are investigated in this paper:

  1. Method 1 follows the traditional approach of the block matching algorithm [23].

  2. Method 2 exploits a nonparametric transformation to make the stereo matching more robust to illumination variations.

The paper introduces a novel method to enhance the density of the surfaces reconstructed by these methods, through a disparity refinement based on the simple linear iterative clustering (SLIC) super-pixel algorithm [1].

The algorithms introduced here are designed to enhance the visualization during robotic SILS procedures and are integrated into the virtual assistive system as highlighted in Fig. 1.

Materials and methods

The workflow for the soft tissue 3D reconstruction algorithm using the two methods is shown in Fig. 2, and it is described in detail in the following subsections.

Fig. 2

Workflow of the proposed surface extraction algorithm. Two methods are proposed: Method 1, based on SAD, and Method 2, based on the census transform. Both methods are improved using the disparity refinement with super-pixel segmentation

Preprocessing

Surgical images acquired from an endoscope need preprocessing in order to facilitate the search for pixel correspondences. For greater accuracy in the 3D reconstruction, we calibrate the cameras using the OpenCV Library [17], obtaining the intrinsic and extrinsic parameters. The images are then undistorted and rectified, so that corresponding pixels lie on the same horizontal (epipolar) line [9]. Generally, the stereo cameras are on the tip of the inserted endoscope and in proximity to a light source; because the two cameras are positioned differently with respect to it, they are exposed differently, which can produce over- or underexposure and differences in brightness between the images. Image processing removes such artifacts and enhances the identification of correspondences in texture-less and homogeneous areas. These steps are described in the following paragraphs.

Specularity removal Clinical images are usually affected by specular reflections due to the tissue characteristics and the proximity of the light source. The specular reflections appear as bright regions in the endoscopic images and can be mistaken for regular/irregular tissue. The algorithm described in [3] is implemented for removing specular highlights.

Image equalization Histogram equalization is performed to improve contrast and enhance details that are over- or underexposed, using the OpenCV Library function equalizeHist. The equalization is performed only in Method 1, since the census transform already makes Method 2 robust to illumination variations.
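As a concrete illustration (in Python for brevity; our implementation is in C++), the following sketch assembles this preprocessing chain with OpenCV. The calibration inputs are assumed to come from a prior stereo calibration, and the specularity removal step of [3] is omitted:

```python
import cv2

def preprocess_pair(im_l, im_r, K1, D1, K2, D2, R, T):
    """Undistort and rectify a stereo pair so that corresponding pixels lie
    on the same horizontal (epipolar) line, then equalize the histograms
    (the equalization applies to Method 1 only). The calibration inputs
    are assumed to come from a prior cv2.stereoCalibrate run."""
    h, w = im_l.shape[:2]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    map_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_l = cv2.remap(im_l, map_l[0], map_l[1], cv2.INTER_LINEAR)
    rect_r = cv2.remap(im_r, map_r[0], map_r[1], cv2.INTER_LINEAR)
    gray_l = cv2.cvtColor(rect_l, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(rect_r, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray_l), cv2.equalizeHist(gray_r)
```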

Surface reconstruction

The two cameras of the endoscope record the surgical field from two different points of view. The difference in position of an object between the left and right images is called disparity (d). Knowing the distance between the cameras (s) and the focal length of the cameras (f), the depth, i.e., the z coordinate of the reconstructed point, can be calculated with the following equation:

$$\begin{aligned} \hbox {depth}(i,j) = \frac{(f \cdot s) }{\hbox {disparity}(i,j)} \end{aligned}$$
(1)

where i and j are, respectively, rows and columns of the image.
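In code, Eq. 1 reduces to an element-wise division over the disparity map. A minimal sketch, with placeholder values standing in for the calibrated focal length and baseline:

```python
import numpy as np

f = 570.0   # focal length in pixels (placeholder value)
s = 5.4     # baseline between the cameras in mm (placeholder value)

def depth_from_disparity(disparity):
    """Apply Eq. 1 pixel-wise; zero disparities are treated as invalid."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = (f * s) / disparity[valid]  # depth in mm
    return depth
```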

Stereo correspondence is the process of identifying the matching between each pixel of the left and the right images (\(im_{l}\) and \(im_{r}\)). The process can be divided into three steps, as stated in [23]: (1) matching cost computation; (2) aggregation cost computation; and (3) disparity computation. The two methods are described below:

Method 1

  1. Matching cost computation This method uses the absolute intensity difference (AD) [10] to calculate the similarity between two pixels.

  2. Aggregation cost computation The aggregation of the matching cost is done by summing the AD over a window of size \(n \times m\), i.e., by computing the sum of absolute differences (SAD). Equation 2 gives the SAD value for a pixel (i, j):

    $$\begin{aligned} \hbox {SAD}(i,j,d) = \sum _{h=-m/2}^{m/2}\; \sum _{k=-n/2}^{n/2} \left| I_l(i+h, j+k) - I_r(i+h-d, j+k) \right| \end{aligned}$$
    (2)

    where \(I_l\) and \(I_r\) are the pixel intensities of \(im_{l}\) and \(im_{r}\), respectively, and d is the disparity. The matching cost computation and aggregation cost of Method 1 are shown in Fig. 3.

  3. Disparity computation The final disparity at each pixel is the one associated with the minimum aggregation cost value, found using the Winner Takes All (WTA) strategy from [23]. Two optimizations are used to reduce the number of incorrect values arising from wrong matches. In the first, the disparity is marked invalid if the two lowest aggregation cost values at a pixel are within a threshold of each other [18]; this helps to invalidate unreliable pixels on uniform surfaces. The second, a left–right consistency (LRC) check, invalidates half-occluded pixels, i.e., points visible in one image but not in the other.

Fig. 3

Disparity computation algorithm for Method 1. It uses the intensity of the pixels and the sum of absolute differences (SAD) operation for the aggregation
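Method 1 maps closely onto OpenCV’s StereoBM, which likewise performs SAD block matching with WTA disparity selection and offers a uniqueness check. The following sketch is a rough stand-in, not our exact implementation: parameter values are illustrative, and the LRC check is approximated with the usual flip trick.

```python
import cv2
import numpy as np

def method1_disparity(gray_l, gray_r, num_disp=64, block=15):
    """Rough stand-in for Method 1 using StereoBM (SAD block matching with
    WTA selection); parameter values are illustrative. Inputs must be
    rectified uint8 grayscale images."""
    bm = cv2.StereoBM_create(numDisparities=num_disp, blockSize=block)
    bm.setUniquenessRatio(10)  # invalidate near-ambiguous cost minima
    disp_l = bm.compute(gray_l, gray_r).astype(np.float32) / 16.0
    # Right-reference disparity via the usual flip trick, for the LRC check
    disp_r = cv2.flip(bm.compute(cv2.flip(gray_r, 1), cv2.flip(gray_l, 1)), 1)
    disp_r = disp_r.astype(np.float32) / 16.0
    h, w = disp_l.shape
    ii = np.broadcast_to(np.arange(h)[:, None], (h, w))
    jj = np.clip(np.round(np.arange(w) - disp_l).astype(int), 0, w - 1)
    lrc_ok = np.abs(disp_l - disp_r[ii, jj]) <= 1.0  # half-occlusion test
    disp_l[(disp_l <= 0) | ~lrc_ok] = 0.0            # 0 marks invalid pixels
    return disp_l
```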

Method 2

  1. Matching cost computation The nonparametric census transform [2] is applied to \(im_{l}\) and \(im_{r}\), using the sparse modified approach from [12]. It converts each pixel inside a moving window into a string of bits C(i, j) (see Eq. 3), encoding the neighboring pixels according to a comparison with the central pixel and with the mean value of the pixels inside the window (Eq. 4).

    $$\begin{aligned} C(i,j)=\bigotimes _{h=-m/2} ^{m/2}\bigotimes _{k=-n/2} ^{n/2}\xi (I(i,j),I((i,j)+(h,k))) \end{aligned}$$
    (3)

    where \(\bigotimes \) denotes the concatenation to a bit string. \(\xi \) is the function for comparing the two intensities, defined as:

    $$\begin{aligned} \xi (I, {I}^{\prime }) =\left\{ \begin{array}{ll} 00 &{}\quad {I}^{\prime } \le \hbox {min}(I, \bar{I}) \\ 01 &{}\quad {I}^{\prime } < I\\ 10 &{}\quad {I}^{\prime } > I\\ 11 &{}\quad {I}^{\prime }\ge \hbox {max}(I, \bar{I}) \end{array}\right. \end{aligned}$$
    (4)

    where \({I}^{\prime }\) is the moving pixel in the window compared with the central pixel I and \(\bar{I}\) is the average of the pixel intensities inside the window.

    The similarity between the two census-transformed images is analyzed using the Hamming distance (HD) applied to a moving window. The HD compares the bit strings representing the pixels and counts the number of positions at which the corresponding bits differ, as shown in Eq. 5.

    $$\begin{aligned} \hbox {HD}(i,j,d) =\sum _{b=0}^{m \cdot n}C_l(i,j)_b \oplus C_r(i-d,j)_b \end{aligned}$$
    (5)

    where \(C_l\) and \(C_r\) are the census transforms of the left and right images, respectively, \(\oplus \) is the XOR operator, and b indexes the bits of the string.

    The HD is computed only for a chessboard pattern of pixels inside the window in order to decrease the computational time of the algorithm.

  2. Aggregation cost computation The aggregation of the matching cost is done by summing the Hamming distances over the window of size \(n \times m \), computing the sum of Hamming distances (SHD). The matching cost computation and aggregation cost of Method 2 are shown in Fig. 4.

  3. Disparity computation This step uses the same strategy as Method 1.

Fig. 4

Disparity computation algorithm for Method 2. It uses the sparse modified census transform and the Hamming distance as cost and aggregation functions. The disparity value is the one associated with the minimum aggregation cost value
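To make the mechanics concrete, the sketch below implements a plain binary census transform and the per-pixel Hamming distance of Eq. 5 with NumPy. It is a simplification of the method actually used: the 2-bit codes of Eq. 4 and the sparse chessboard evaluation are omitted.

```python
import numpy as np

def census(img, win=5):
    """Plain binary census transform: each pixel becomes a bit string that
    encodes neighbor-vs-center intensity comparisons. (The paper's variant
    is the sparse, 2-bit modified census of [12]; this is a simplification.)"""
    h, w = img.shape
    r = win // 2
    codes = np.zeros((h, w), dtype=np.uint32)  # 24 bits suffice for win=5
    center = img[r:h - r, r:w - r]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = img[r + dy:h - r + dy, r + dx:w - r + dx]
            codes[r:h - r, r:w - r] = (codes[r:h - r, r:w - r] << 1) | (neigh > center)
    return codes

def hamming(c_l, c_r):
    """Per-pixel Hamming distance between census codes (Eq. 5: XOR + popcount)."""
    x = c_l ^ c_r
    as_bytes = x.view(np.uint8).reshape(*x.shape, 4)
    return np.unpackbits(as_bytes, axis=-1).sum(axis=-1)
```

The matching cost at disparity d is then the Hamming distance between \(C_l\) and the census codes of the right image shifted by d along the epipolar line; summing it over the window yields the SHD.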

To avoid separate layers in the reconstructed surface due to pixel-level precision, a sub-pixel refinement based on parabola fitting [29] is applied in both methods. A speckle removal filter is also applied to remove small artifacts, i.e., regions of large and small disparities that can be generated near the boundaries of shapes. This filter is the OpenCV Library [17] function filterSpeckles.
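One standard formulation of the parabola fitting (the exact variant of [29] may differ in details) interpolates the aggregation cost around the integer minimum:

```python
def subpixel_disparity(cost, d):
    """Fit a parabola through the aggregation costs at d-1, d, d+1 and
    return the sub-pixel position of its minimum. `cost` is the 1D cost
    profile of one pixel; the caller ensures 0 < d < len(cost) - 1."""
    c0, c1, c2 = cost[d - 1], cost[d], cost[d + 1]
    denom = c0 - 2.0 * c1 + c2
    if denom <= 0:  # flat or concave profile: keep the integer disparity
        return float(d)
    return d + 0.5 * (c0 - c2) / denom
```

The resulting disparity map is then cleaned with cv2.filterSpeckles, which invalidates small connected regions whose disparity differs too much from their surroundings.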

Disparity refinement

The disparity map computed by the procedures described above can contain invalid values (holes), which can compromise the usefulness of the reconstructed surface. As mentioned in “Research overview” section, the 3D reconstruction of the surgical field serves to update the pose of the preoperative models according to the real organ deformations, and also to update the pose of safety areas to be protected from the surgical tools during surgery. Problems can arise if the motion and the deformation of the soft tissue are not identified, i.e., if the reconstructed surgical field is not dense and accurate.

Fig. 5

The super-pixel areas (on the left) can be used to fill gaps in the disparity image (on the right)

To address these issues, a method exploiting the over-segmentation of the reference image \(im_l\) is introduced here, which further refines the results and fills the holes in the disparity map. \(im_l\) is segmented using the SLIC super-pixel algorithm [1], which adapts the k-means clustering approach to generate super-pixels efficiently and with low computational cost. A label L is assigned to each super-pixel. Then, \(im_l(L)\) is defined as the group of pixels of the left image with the same label, i.e., belonging to the same super-pixel. The same labels are assigned to the disparity map, \(\hbox {disparity}(L)\). Each super-pixel is ideally a homogeneous area with similar, or at least continuous, depth. This information can be exploited to fill the holes in the disparity map while avoiding large discontinuities within a super-pixel, as shown in Fig. 5. We apply two different strategies depending on the valid values inside a label (a code sketch follows the list):

  1. Plane fitting The invalid disparity values are filled by a plane fitted to the valid values of the disparity map. The locally optimized RANSAC (LO-RANSAC) method [5] is used to find the optimal plane parameters in pixel space, using the plane model described in Eq. 6.

    $$\begin{aligned} \hbox {disparity}(i,j) = a \cdot i + b \cdot j + c \end{aligned}$$
    (6)

    where a, b and c are the coefficients of the fitted plane.

  2. Median fitting If the plane fitted on the valid values is not reliable (according to the LO-RANSAC results), the invalid disparity values are replaced by the median of the valid disparity values.
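The following Python sketch illustrates the refinement under simplifying assumptions: it uses the SLIC implementation from opencv-contrib (cv2.ximgproc), and plain least-squares plane fitting with a crude residual test stands in for LO-RANSAC [5] and its reliability criterion.

```python
import cv2
import numpy as np

def slic_refine(im_l, disp, region_size=40, max_residual=2.0):
    """Fill invalid disparities (marked 0) super-pixel by super-pixel,
    using plane fitting (Eq. 6) when the fit is reliable and the median
    of the valid disparities otherwise."""
    slic = cv2.ximgproc.createSuperpixelSLIC(im_l, region_size=region_size)
    slic.iterate(10)
    labels = slic.getLabels()
    out = disp.astype(np.float32).copy()
    for lab in range(slic.getNumberOfSuperpixels()):
        mask = labels == lab
        valid, holes = mask & (disp > 0), mask & (disp <= 0)
        if not holes.any() or not valid.any():
            continue
        iv, jv = np.nonzero(valid)
        ih, jh = np.nonzero(holes)
        if len(iv) >= 3:
            # fit disparity(i, j) = a*i + b*j + c on the valid pixels
            A = np.c_[iv, jv, np.ones_like(iv)]
            coef, *_ = np.linalg.lstsq(A, disp[valid].astype(np.float64),
                                       rcond=None)
            if np.abs(A @ coef - disp[valid]).mean() < max_residual:
                out[holes] = coef[0] * ih + coef[1] * jh + coef[2]
                continue
        out[holes] = np.median(disp[valid])  # median fitting fallback
    return out
```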

Evaluation

Quantitative evaluation

This section presents the evaluation of the methods for 3D surface reconstruction. The performance of Method 1 and Method 2 is evaluated using three metrics: accuracy, computational time and percentage of matched pixels.

The Hamlyn Centre laparoscopic/endoscopic video dataset [7], with an associated ground truth representing a surgical scene, is used for the evaluation.

Stereo images of a silicone heart phantom were used in order to validate the results and ensure that the proposed methods are robust and reliable when applied to a surgical scenario. The dataset consists of the intrinsic, extrinsic and distortion parameters of the cameras and a ground truth from a CT scan. We used 20 frames from two videos, heart1 and heart2. These frames are named \(h1_{i}\) and \(h2_{i}\), respectively, where i indicates the frame number. The ground-truth sets are called \(g1_{i}\) and \(g2_{i}\). The point clouds computed using the two methods are \(pc1_{i}\) and \(pc2_{i}\). This dataset was preferred over other standard datasets, such as the Middlebury evaluation [23], because the latter provide only static images and do not show surgical scenarios, which present specific issues related to endoscopic imaging.

The error evaluation consists of computing the Euclidean distance between the corresponding points from the extracted point cloud and the rectified ground truth, as in Eq. 7. The final accuracy of each point cloud is computed as the median value of these Euclidean distances. Method 1 and Method 2 are evaluated on the two datasets heart1 and heart2 with and without the application of the disparity refinement exploiting SLIC segmentation (hereinafter called SLIC refinement).

$$\begin{aligned} \hbox {error}(i,j) = \sqrt{(pc_{x_{i,j}}-g_{x_{i,j}})^2+(pc_{y_{i,j}}-g_{y_{i,j}})^2 +(pc_{z_{i,j}}-g_{z_{i,j}})^2} \end{aligned}$$
(7)

where \(pc_{x_{i,j}}\), \(pc_{y_{i,j}}\) and \(pc_{z_{i,j}}\) are the 3D coordinates of a point from the computed point cloud, and \(g_{x_{i,j}}\), \(g_{y_{i,j}}\) and \(g_{z_{i,j}}\) are the coordinates from the ground truth. Since the point cloud and the ground truth are represented as 2D maps, we can simply calculate this error for each point considered as a pixel of the image.
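As a sketch, if the point cloud and the ground truth are stored as \(H \times W \times 3\) coordinate maps with NaN marking unmatched points, Eq. 7 and the final accuracy reduce to a few lines:

```python
import numpy as np

def median_accuracy(pc, gt):
    """Per-pixel Euclidean distance (Eq. 7) followed by the median,
    ignoring invalid (NaN) points."""
    error = np.linalg.norm(pc - gt, axis=-1)
    return np.nanmedian(error)
```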

A nonparametric test (Wilcoxon \(p< 0.05\)) was performed on the point clouds to test whether the accuracy before and after SLIC refinement was significantly different, for both Method 1 and Method 2. The same test was also performed to verify whether the accuracy obtained with Method 1 and Method 2 was significantly different.

The percentage of matched pixels is computed as the ratio of the valid reconstructed points identified in the region of interest with respect to the total number of points of the image. This region takes into consideration only the area of the image with a pixel intensity \({>}32\), eliminating the areas where the heart phantom is not visible. An example of the region of interest used for the evaluation is shown in Fig. 6.
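A minimal sketch of this metric, assuming invalid disparities are marked with 0 and reading the percentage over the region of interest:

```python
import numpy as np

def matched_percentage(disp, gray_l):
    """Share of valid reconstructed points within the region of interest
    (pixels of the reference image with intensity > 32)."""
    roi = gray_l > 32
    return 100.0 * np.count_nonzero((disp > 0) & roi) / np.count_nonzero(roi)
```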

The computational time is the execution time of the algorithms. The code was developed in C++, using OpenCV for image management, with the Robot Operating System (ROS) as the framework [21]. The program ran on a GNU/Linux system with an Intel Core i7-4820K CPU with four cores and hyper-threading (eight virtual cores).

Fig. 6

Example of the region of interest for heart1 and heart2 models. The points marked with magenta color were not included in the evaluation of the methods

Table 1 Parameters used in the evaluation

The parameters and labels used during the evaluation are listed in Table 1. The window dimension for the census transform and the aggregation cost of Method 2 are taken from [12], where the authors identify the optimal parameters taking into consideration accuracy and computational time.

Qualitative evaluation

The target application of this research is robotic SILS procedures in the abdomen. Unfortunately, video datasets with ground-truth information are not available for this anatomical region. Therefore, qualitative evaluations of the proposed methods were performed using the abdominal dataset from the Hamlyn Centre. This dataset approximates the envisioned application area of our system well, including motion of structures/organs and the presence of surgical instruments in the field of view.

Fig. 7

Example of the application of the workflow presented in this paper on one frame of the heart1 dataset. From left to right: the RGB stereo images \(im_l\) and \(im_r\), the stereo images after the specularity removal filter, and then Method 1 at the top and Method 2 at the bottom. The Method 1 row shows the image equalization, the disparity map calculation, the SLIC segmentation and the subsequent refinement of the disparity map. The Method 2 row shows the census transform, the disparity map calculation, the SLIC segmentation and the subsequent refinement of the disparity map

Results

Figure 7 shows a sample image from heart1 processed with both methods, following the workflow of Fig. 2. The figure demonstrates the presence of invalid values in the disparity map, as well as their almost complete correction by the SLIC refinement. The effect is clearly visible in the percentage of matched pixels in Fig. 8: in this case, the SLIC refinement increases the percentage of valid values to more than 70 %.

A box plot of the accuracy is shown in Fig. 9. The results show an accuracy of 3.17 and 3.27 mm for M1 and M1R (M1 with SLIC refinement) and of 1.68 and 1.77 mm for M2 and M2R, respectively. The variances of the results are similar, indicating good consistency between the presented methods.

Figure 10 shows a colormap representing the accuracy of the reconstructed surfaces computed with the methods. Here again, the effect of the SLIC refinement is evident: the filled holes present a low error with respect to the ground truth.

Regarding the impact of the SLIC refinement on accuracy, the nonparametric test did not reject the null hypothesis for either method, implying that the SLIC refinement did not have a statistically significant effect on the accuracy of M1 or M2. The same test rejected the null hypothesis when comparing the values obtained with M1 and M2, implying that M2 provides significantly better accuracy than M1.

The results of the evaluation are summarized in Table 2, which reports the median values of the accuracy, the percentage of matched pixels and the computational time. Both methods have a similar computational time, around 1.21 s per frame (at an image resolution of 288 \(\times \) 360), and the SLIC refinement adds an overhead of only 0.1 s in both cases.

The results of the qualitative evaluation are shown in Fig. 11. When the methods are applied to an image representing a more complex surgical scenario, with different abdominal structures and instruments visible in the field of view, they are still able to reconstruct the surface, and the SLIC refinement appears to correctly fill the invalid values of the disparity map.

Fig. 8

Box plot representation of the percentage of matching pixels (pixels that are considered valid) with Method 1 and Method 2 evaluated on heart1 and heart2 without and with the SLIC refinement

Fig. 9

Box plot representation of the accuracy obtained with Method 1 and Method 2 evaluated on heart1 and heart2 without and with the SLIC refinement

Fig. 10

Two-dimensional representation of the reconstructed 3D surface, obtained from Method 1 and Method 2 with and without SLIC refinement. The color represents the error

Table 2 Median values of the accuracy, percentage of matched pixels, and computational time for both methods and datasets, with and without SLIC refinement

Discussion and conclusion

The research presented in this paper aims to improve dense 3D surface reconstruction of surgical environments, specifically to enable enhanced visualization during robotic SILS procedures of the abdomen. The 3D surface information from the surgical scene can be exploited to update the pose of preoperative models extracted from a CT scan. These models, inserted into a virtual assistive system, can provide guidance information to the surgeon, helping to improve the quality and safety of procedures through augmented visualization. In addition, information about the pose of the surgical robot with respect to tissue surfaces can assist in defining and enforcing safety constraints intra-operatively.

Two methods for 3D surface reconstruction were presented, based on a traditional approach (M1) and on a nonparametric transformation approach (M2). The comparison between the two methods shows that M2 has significantly higher accuracy than M1 (around 1.7 mm). According to our team of surgeons, the overall system accuracy for the proposed application should be below 10 mm. This is motivated by the fact that abdominal surgery cannot be considered precision surgery, and it is characterized by motions and deformations of the organs. For this reason, the accuracy obtained with both methods can be considered a good result, since it complies with the required accuracy.

Fig. 11

Qualitative results obtained by the application of M1 and M2 on a surgical video. The image on the left shows \(im_l\). The images on the right show the disparity maps obtained with the different methods presented in this paper

Regarding the valid reconstructed points, M1 provides a higher percentage of matched pixels. The methods were also complemented with the SLIC disparity refinement to enhance the density of the reconstruction. With the refinement, the percentage of matched pixels increases in both methods, reaching 72.6 % in the best case. The refinement improved the matching percentage by up to 23 % in M1 and 43 % in M2, without any statistically significant impact on the accuracy.

A comparison of our error results with those from [19], which tests different state-of-the-art algorithms on the same video dataset, shows that our errors are:

  • lower than those of the stereo block matching (SBM) and variational (Svar) algorithms;

  • of similar magnitude to those of stereo semi-global matching (SSGBM).

The strength of the methods presented in this paper is the use of the SLIC super-pixel algorithm to obtain a high-density valid disparity map, which can be exploited for augmented reality applications.

In comparison with [22], the accuracy achieved is slightly worse (mean error 2.47 mm against 1.55 mm), while the percentage of reconstructed pixels obtained with the SLIC refinement is higher (72.6 % against 66.2 %).

Extending this research, future work will focus on a real-time implementation of the proposed algorithm, potentially based on a hybrid CPU–GPU processing framework. The expected speed-up is between 10 and 30 times at the current image resolution, which would allow the target frame rate of 25 fps to be reached. For higher-resolution images, subsampling could be applied to maintain real-time execution. Motion tracking of the organs will also be investigated in combination with the soft tissue surface reconstruction. Finally, these components will be integrated into the virtual assistive system.