Introduction

In open surgery of the abdomen, the trauma of laparotomy wounds is a source of infection or dehiscence and may increase the likelihood of postoperative chest infection, ileus and immobility [30]. New minimally invasive approaches, such as minimally invasive surgery (MIS) and single-incision laparoscopic surgery (SILS), allow the surgery to be performed by bringing the surgical instruments and endoscope to the surgical site through a few, or only one, small incision. This improves the physiological and immune responses with respect to open surgery [6] and reduces trauma, without compromising surgical quality [27, 31].

However, MIS and SILS introduce some limitations: (1) the insertion of the instruments through a few access ports, and their required length, reduces the surgeon’s maneuverability at the surgical site; (2) the freedom of movement of the endoscopic cameras inside the patient’s body is limited due to the single endoscopic port access; (3) the structures of interest, such as blood vessels or cancerous areas, cannot be viewed from different points of view, compromising the accuracy and safety of the surgery; and (4) the surgical field visible to the surgeon is limited by the viewing angle of the endoscopic cameras. Such factors extend the learning curve for surgeons and increase operating times [28].

The recent development of advanced robotic systems for MIS and SILS has been motivated by these drawbacks related to the surgeon’s maneuverability [28]. Robot-assisted devices can greatly help to restore the intuitiveness of operations in such procedures, but this depends largely on the system’s surgical vision. For example, the da Vinci® Surgical System [8] allows intuitive execution of surgery and hand–eye coordination through 3D vision and the configuration of its surgical console. Thus, a requirement in robotic MIS and SILS is to provide an appropriate surgical vision system optimally coupled to the control of the robotic arms. However, in many cases this may not be enough, especially in SILS operations: once the robot is inside and moving around the patient’s body, the surgeon can easily become disoriented, since the pose of the robot cannot be observed (or easily inferred) from the outside. This requires an extra learning phase even for experienced laparoscopic surgeons and also poses a safety risk for the operation.

Computer-assisted technologies can enhance the view of the surgical field. Augmented reality (AR) or augmented virtuality (AV) systems can provide a more comfortable and efficient environment for the surgeon during surgery [15]. For example, in [16], preoperative information about the disease and a surgical plan are fused with the intra-operative visualization of the surgical field.

Nevertheless, aligning multimodal preoperative patient data is highly challenging in soft tissue surgery, since the anatomy undergoes significant changes between the preoperative data acquisition and the intra-operative procedure due to several factors [33]: (1) a different patient pose with respect to the one in which the preoperative image was taken; (2) CO2 abdominal insufflation to increase the working volume (pneumoperitoneum); (3) instrument–tissue interaction; (4) heartbeat; (5) breathing; etc. Overcoming these challenges requires online estimation of the tissue deformations to correct the preoperative plan, which can be done, for example, through depth estimation from stereo endoscopic images (3D reconstruction). The denser the reconstruction, the more accurate the deformation identification will be. The real-time 3D information from the surgical site can also help the surgeon to understand the relationship between the organs and the robotic devices. Moreover, for robotic SILS procedures, the surgical site visualization issues can be overcome by introducing a virtual environment that includes a dynamic model of the robot within a surgical scene recreated from preoperative information. This virtual environment can be updated in real time by exploiting the 3D reconstruction of the soft tissues [4, 22].

In this paper, we present two methods for dense 3D reconstruction of soft tissue from endoscopic images, exploiting a super-pixel approach for the disparity map refinement. The dense surface reconstruction algorithm facilitates the registration between the preoperative model and the intra-operative site. The methods form the basis of the virtual assistive system, introduced in [20], for intra-operative guidance and improved surgical safety in abdominal SILS procedures (Fig. 1). The system provides: (1) surgical guidance, by exploiting patient-specific preoperative 3D models obtained from CT scans to improve the operative vision; and (2) improved surgical safety, by visualizing the pose of the robotic devices.

The paper is organized as follows: After a brief review on the state of the art in “State of the art” section, a description of the virtual assistive system and the contribution of this paper are presented in “Research overview” section. The complete algorithms for soft tissue surface reconstruction are described in “Materials and methods” section. “Evaluation” section presents the evaluation of the proposed methods, and the obtained results are presented and discussed in “Results” section and “Discussion and conclusion” section, respectively.

Fig. 1

Virtual Assistive System. The left side shows the preoperative phase, with the processing of the CT scan images and the camera calibration. The right side shows the intra-operative phase, with the processing of the surgical images and the integration between the real and virtual images. The endoscope shown is from the da Vinci® system

State of the art

The surface reconstruction of man-made environments using stereo images is a well-understood problem [14, 23]. However, 3D surface reconstruction from surgical endoscopic images is still an active area of research due to several challenging aspects: (1) surgical applications require high accuracy and robustness in order to ensure patient safety; and (2) difficulties arise from large lens distortion, many texture-less areas, occlusions introduced by the surgical tools, specular highlights, and smoke and blood produced during the intervention [25]. Many approaches have been proposed in the literature to achieve more reliable, robust and real-time depth measurement of a deforming environment. Some are based on feature detection and tracking [13], allowing the recovery of a sparse surface. In [26], a semi-dense reconstructed surface is obtained by detecting features and propagating the information into neighboring regions. The robustness of these methods depends on the availability of stable features. Other solutions are based on finding correspondences in the pixel intensities to give dense reconstructed surfaces. In [22], a dense GPU-enhanced 3D reconstruction is obtained using the hybrid recursive matching algorithm, which performs a nonparametric transformation on the images. A nonparametric image transformation combined with model-based methods is also exploited in [24] for dense 3D reconstruction, under the assumption of a smooth and continuous surgical surface. [19] presents a comparative study of such state-of-the-art methods. Recently, color-based segmentation has been used in stereo methods for depth estimation [11]. Such techniques first segment a reference image into regions and then label each region with disparity values, preserving object boundaries. For example, segmentation is used in combination with belief propagation within a Markov random field framework in [32]. However, segmentation algorithms are computationally expensive and infrequently applied to surgical images.

Research overview

The overall goal of our research is to enhance the visual information provided to the surgeon during robotic SILS procedures in the abdomen. Toward this end, a first prototype of a virtual assistive system was presented in [20]; an overview is shown in Fig. 1. In the preoperative phase, the surgeon extracts 3D surface models from a CT scan dataset of the patient. These models are inserted into a virtual environment. A stereo camera calibration and a real–virtual camera calibration are performed. During the surgery, a semiautomatic registration of the virtual abdomen onto the real patient creates a connection between reality and the virtual environment, which consists of:

  1. A view of the entire abdomen model, in which the motion of the robot and the surgical tools can be visualized at run-time. The ability to change the transparency of the skin model allows the surgeon to plan the entry point, adjusting the access region with respect to the target to be reached.

  2. A virtual camera image plane, which shows the structures from the same point of view as the real endoscopic camera. This enables the visualization of hidden structures using the transparency function. A zoom feature gives the surgeon a wider field of view if desired.

The current design of the virtual assistive system from [20] is reliable only under the assumption that the patient remains in the same condition as during the CT scan. However, this is an ideal condition and, as noted earlier, changes in anatomy from the preoperative to the intra-operative phase are unavoidable. To overcome this drawback, the run-time 3D reconstruction of the tissues can serve as the keystone for updating the preoperative tissue models. Moreover, by knowing the pose of the tissue surface with respect to the robotic devices, it would be possible to define areas to be protected, e.g., main vessels or vital structures, improving the safety of the surgery.

Research contributions

Two techniques for 3D surface reconstruction of soft tissue are investigated in this paper:

  1. Method 1 follows the traditional approach of the block matching algorithm [23].

  2. Method 2 exploits a nonparametric transformation to make the stereo matching more robust to illumination variations.

The paper introduces a novel method to enhance the density of the surfaces reconstructed by these methods, through a disparity refinement based on the simple linear iterative clustering (SLIC) super-pixel algorithm [1].

The algorithms introduced here are designed to enhance the visualization during robotic SILS procedures and are integrated into the virtual assistive system as highlighted in Fig. 1.

Materials and methods

The workflow for the soft tissue 3D reconstruction algorithm using the two methods is shown in Fig. 2, and it is described in detail in the following subsections.

Fig. 2

Workflow of the proposed surface extraction algorithm. Two methods are proposed: Method 1, based on SAD, and Method 2, based on the census transform. Both methods are improved using the disparity refinement with super-pixel segmentation

Preprocessing

Surgical images acquired from an endoscope need preprocessing in order to facilitate the search for pixel correspondences. For greater accuracy in the 3D reconstruction, we calibrate the cameras using the OpenCV Library [17], obtaining the intrinsic and extrinsic parameters. The images are then undistorted and rectified, so that corresponding pixels lie on the same horizontal (epipolar) line [9]. Generally, the stereo cameras are on the tip of the inserted endoscope and in proximity to a light source; because the two cameras are positioned differently with respect to it, they are exposed differently, which can produce over- or underexposure and differences in brightness between the images. Image processing removes such artifacts and enhances the identification of correspondences in texture-less and homogeneous areas. These steps are described in the following paragraphs.

Specularity removal Clinical images are usually affected by specular reflections due to the tissue characteristics and the proximity of the light source. The specular reflections appear as bright regions in the endoscopic images and can be mistaken for regular/irregular tissue. The algorithm described in [3] is implemented for removing specular highlights.

Image equalization Histogram equalization is performed to improve contrast and enhance details that are over- or underexposed, using the OpenCV Library function equalizeHist. The equalization is performed only in Method 1, since the census transform already makes Method 2 robust to illumination variations.
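As a concrete illustration (in Python for brevity; our implementation is in C++), the following sketch assembles this preprocessing chain with OpenCV. The calibration inputs are assumed to come from a prior stereo calibration, and the specularity removal step of [3] is omitted:

```python
import cv2

def preprocess_pair(im_l, im_r, K1, D1, K2, D2, R, T):
    """Undistort and rectify a stereo pair so that corresponding pixels lie
    on the same horizontal (epipolar) line, then equalize the histograms
    (the equalization applies to Method 1 only). The calibration inputs
    are assumed to come from a prior cv2.stereoCalibrate run."""
    h, w = im_l.shape[:2]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    map_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_l = cv2.remap(im_l, map_l[0], map_l[1], cv2.INTER_LINEAR)
    rect_r = cv2.remap(im_r, map_r[0], map_r[1], cv2.INTER_LINEAR)
    gray_l = cv2.cvtColor(rect_l, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(rect_r, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray_l), cv2.equalizeHist(gray_r)
```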

Surface reconstruction

The two cameras of the endoscope record the surgical field from two different points of view. The difference in position of an object between the left and right images is called disparity (d). Knowing the distance between the cameras (s) and the focal length of the cameras (f), the depth, i.e., the z coordinate of the reconstructed point, can be calculated with the following equation:

$$\begin{aligned} \hbox {depth}(i,j) = \frac{(f \cdot s) }{\hbox {disparity}(i,j)} \end{aligned}$$
(1)

where i and j are, respectively, rows and columns of the image.
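In code, Eq. 1 reduces to an element-wise division over the disparity map. A minimal sketch, with placeholder values standing in for the calibrated focal length and baseline:

```python
import numpy as np

f = 570.0   # focal length in pixels (placeholder value)
s = 5.4     # baseline between the cameras in mm (placeholder value)

def depth_from_disparity(disparity):
    """Apply Eq. 1 pixel-wise; zero disparities are treated as invalid."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = (f * s) / disparity[valid]  # depth in mm
    return depth
```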

Stereo correspondence is the process of identifying the matching between each pixel of the left and the right images (\(im_{l}\) and \(im_{r}\)). The process can be divided into three steps, as stated in [23]: (1) matching cost computation; (2) aggregation cost computation; and (3) disparity computation. The two methods are described below:

Method 1

  1. Matching cost computation This method uses the absolute intensity difference (AD) [10] to calculate the similarity between two pixels.

  2. Aggregation cost computation The aggregation of the matching cost is done by summing the AD over a window of size \(n \times m\), i.e., by computing the sum of absolute differences (SAD). Equation 2 gives the SAD value for a pixel (i, j):

    $$\begin{aligned} \hbox {SAD}(i,j,d) = \sum _{h=-m/2}^{m/2}\; \sum _{k=-n/2}^{n/2} \left| I_l(i+h, j+k) - I_r(i+h-d, j+k) \right| \end{aligned}$$
    (2)

    where \(I_l\) and \(I_r\) are the pixel intensities of \(im_{l}\) and \(im_{r}\), respectively, and d is the disparity. The matching cost computation and aggregation cost of Method 1 are shown in Fig. 3.

  3. Disparity computation The final disparity at each pixel is the one associated with the minimum aggregation cost value, found using the Winner Takes All (WTA) strategy from [23]. Two optimizations are used to reduce the number of incorrect values arising from wrong matches. In the first, the disparity is marked invalid if the two lowest aggregation cost values at a pixel are within a threshold of each other [18]; this helps to invalidate unreliable pixels on uniform surfaces. The second, a left–right consistency (LRC) check, invalidates half-occluded pixels, i.e., points visible in one image but not in the other.

Fig. 3

Disparity computation algorithm for Method 1. It uses the intensity of the pixels and the sum of absolute differences (SAD) operation for the aggregation
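Method 1 maps closely onto OpenCV’s StereoBM, which likewise performs SAD block matching with WTA disparity selection and offers a uniqueness check. The following sketch is a rough stand-in, not our exact implementation: parameter values are illustrative, and the LRC check is approximated with the usual flip trick.

```python
import cv2
import numpy as np

def method1_disparity(gray_l, gray_r, num_disp=64, block=15):
    """Rough stand-in for Method 1 using StereoBM (SAD block matching with
    WTA selection); parameter values are illustrative. Inputs must be
    rectified uint8 grayscale images."""
    bm = cv2.StereoBM_create(numDisparities=num_disp, blockSize=block)
    bm.setUniquenessRatio(10)  # invalidate near-ambiguous cost minima
    disp_l = bm.compute(gray_l, gray_r).astype(np.float32) / 16.0
    # Right-reference disparity via the usual flip trick, for the LRC check
    disp_r = cv2.flip(bm.compute(cv2.flip(gray_r, 1), cv2.flip(gray_l, 1)), 1)
    disp_r = disp_r.astype(np.float32) / 16.0
    h, w = disp_l.shape
    ii = np.broadcast_to(np.arange(h)[:, None], (h, w))
    jj = np.clip(np.round(np.arange(w) - disp_l).astype(int), 0, w - 1)
    lrc_ok = np.abs(disp_l - disp_r[ii, jj]) <= 1.0  # half-occlusion test
    disp_l[(disp_l <= 0) | ~lrc_ok] = 0.0            # 0 marks invalid pixels
    return disp_l
```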

Method 2

  1. Matching cost computation The nonparametric census transform [2] is applied to \(im_{l}\) and \(im_{r}\), using the sparse modified approach from [12]. It converts each pixel inside a moving window into a string of bits C(i, j) (see Eq. 3), encoding the neighboring pixels according to a comparison with the central pixel and with the mean value of the pixels inside the window (Eq. 4).

    $$\begin{aligned} C(i,j)=\bigotimes _{h=-m/2} ^{m/2}\bigotimes _{k=-n/2} ^{n/2}\xi (I(i,j),I((i,j)+(h,k))) \end{aligned}$$
    (3)

    where \(\bigotimes \) denotes the concatenation to a bit string. \(\xi \) is the function for comparing the two intensities, defined as:

    $$\begin{aligned} \xi (I, {I}^{\prime }) =\left\{ \begin{array}{ll} 00 &{}\quad {I}^{\prime } \le \hbox {min}(I, \bar{I}) \\ 01 &{}\quad {I}^{\prime } < I\\ 10 &{}\quad {I}^{\prime } > I\\ 11 &{}\quad {I}^{\prime }\ge \hbox {max}(I, \bar{I}) \end{array}\right. \end{aligned}$$
    (4)

    where \({I}^{\prime }\) is the moving pixel in the window compared with the central pixel I and \(\bar{I}\) is the average of the pixel intensities inside the window.

    The similarity between the two census-transformed images is analyzed using the Hamming distance (HD) applied to a moving window. The HD compares the bit strings representing the pixels and counts the number of positions at which the corresponding bits differ, as shown in Eq. 5.

    $$\begin{aligned} \hbox {HD}(i,j,d) =\sum _{b=0}^{m \cdot n}C_l(i,j)_b \oplus C_r(i-d,j)_b \end{aligned}$$
    (5)

    where \(C_l\) and \(C_r\) are the census transforms of the left and right images, respectively, \(\oplus \) is the XOR operator, and b indexes the bits of the string.

    The HD is computed only for a chessboard pattern of pixels inside the window in order to decrease the computational time of the algorithm.

  2. Aggregation cost computation The aggregation of the matching cost is done by summing the Hamming distances over the window of size \(n \times m \), computing the sum of Hamming distances (SHD). The matching cost computation and aggregation cost of Method 2 are shown in Fig. 4.

  3. Disparity computation This step uses the same strategy as Method 1.

Fig. 4

Disparity computation algorithm for Method 2. It uses the sparse modified census transform and the Hamming distance as cost and aggregation functions. The disparity value is the one associated with the minimum aggregation cost value
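To make the mechanics concrete, the sketch below implements a plain binary census transform and the per-pixel Hamming distance of Eq. 5 with NumPy. It is a simplification of the method actually used: the 2-bit codes of Eq. 4 and the sparse chessboard evaluation are omitted.

```python
import numpy as np

def census(img, win=5):
    """Plain binary census transform: each pixel becomes a bit string that
    encodes neighbor-vs-center intensity comparisons. (The paper's variant
    is the sparse, 2-bit modified census of [12]; this is a simplification.)"""
    h, w = img.shape
    r = win // 2
    codes = np.zeros((h, w), dtype=np.uint32)  # 24 bits suffice for win=5
    center = img[r:h - r, r:w - r]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = img[r + dy:h - r + dy, r + dx:w - r + dx]
            codes[r:h - r, r:w - r] = (codes[r:h - r, r:w - r] << 1) | (neigh > center)
    return codes

def hamming(c_l, c_r):
    """Per-pixel Hamming distance between census codes (Eq. 5: XOR + popcount)."""
    x = c_l ^ c_r
    as_bytes = x.view(np.uint8).reshape(*x.shape, 4)
    return np.unpackbits(as_bytes, axis=-1).sum(axis=-1)
```

The matching cost at disparity d is then the Hamming distance between \(C_l\) and the census codes of the right image shifted by d along the epipolar line; summing it over the window yields the SHD.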

To avoid separate layers in the reconstructed surface due to pixel-level precision, a sub-pixel refinement based on parabola fitting [29] is applied in both methods. A speckle removal filter is also applied to remove small artifacts, i.e., regions of large and small disparities that can be generated near the boundaries of shapes. This filter is the OpenCV Library [17] function filterSpeckles.
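One standard formulation of the parabola fitting (the exact variant of [29] may differ in details) interpolates the aggregation cost around the integer minimum:

```python
def subpixel_disparity(cost, d):
    """Fit a parabola through the aggregation costs at d-1, d, d+1 and
    return the sub-pixel position of its minimum. `cost` is the 1D cost
    profile of one pixel; the caller ensures 0 < d < len(cost) - 1."""
    c0, c1, c2 = cost[d - 1], cost[d], cost[d + 1]
    denom = c0 - 2.0 * c1 + c2
    if denom <= 0:  # flat or concave profile: keep the integer disparity
        return float(d)
    return d + 0.5 * (c0 - c2) / denom
```

The resulting disparity map is then cleaned with cv2.filterSpeckles, which invalidates small connected regions whose disparity differs too much from their surroundings.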

Disparity refinement

The disparity map computed by the procedures described above can contain invalid values (holes), which can compromise the usefulness of the reconstructed surface. As mentioned in “Research overview” section, the 3D reconstruction of the surgical field serves to update the pose of the preoperative models according to the real organ deformations, and also to update the pose of safety areas to be protected from the surgical tools during surgery. Problems can arise if the motion and the deformation of the soft tissue are not identified, i.e., if the reconstructed surgical field is not dense and accurate.

Fig. 5

The super-pixel areas (on the left) can be used to fill gaps in the disparity image (on the right)

To address these issues, a method exploiting the over-segmentation of the reference image \(im_l\) is introduced here, which further refines the results and fills the holes in the disparity map. \(im_l\) is segmented using the SLIC super-pixel algorithm [1], which adapts the k-means clustering approach to generate super-pixels efficiently and with low computational cost. A label L is assigned to each super-pixel. Then, \(im_l(L)\) is defined as the group of pixels of the left image with the same label, i.e., belonging to the same super-pixel. The same labels are assigned to the disparity map, \(\hbox {disparity}(L)\). Each super-pixel is ideally a homogeneous area with similar, or at least continuous, depth. This information can be exploited to fill the holes in the disparity map while avoiding large discontinuities within a super-pixel, as shown in Fig. 5. We apply two different strategies depending on the valid values inside a label (a code sketch follows the list):

  1. Plane fitting The invalid disparity values are filled by a plane fitted to the valid values of the disparity map. The locally optimized RANSAC (LO-RANSAC) method [5] is used to find the optimal plane parameters in pixel space, using the plane model described in Eq. 6.

    $$\begin{aligned} \hbox {disparity}(i,j) = a \cdot i + b \cdot j + c \end{aligned}$$
    (6)

    where a, b and c are the coefficients of the fitted plane.

  2. Median fitting If the plane fitted on the valid values is not reliable (according to the LO-RANSAC results), the invalid disparity values are replaced by the median of the valid disparity values.
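The following Python sketch illustrates the refinement under simplifying assumptions: it uses the SLIC implementation from opencv-contrib (cv2.ximgproc), and plain least-squares plane fitting with a crude residual test stands in for LO-RANSAC [5] and its reliability criterion.

```python
import cv2
import numpy as np

def slic_refine(im_l, disp, region_size=40, max_residual=2.0):
    """Fill invalid disparities (marked 0) super-pixel by super-pixel,
    using plane fitting (Eq. 6) when the fit is reliable and the median
    of the valid disparities otherwise."""
    slic = cv2.ximgproc.createSuperpixelSLIC(im_l, region_size=region_size)
    slic.iterate(10)
    labels = slic.getLabels()
    out = disp.astype(np.float32).copy()
    for lab in range(slic.getNumberOfSuperpixels()):
        mask = labels == lab
        valid, holes = mask & (disp > 0), mask & (disp <= 0)
        if not holes.any() or not valid.any():
            continue
        iv, jv = np.nonzero(valid)
        ih, jh = np.nonzero(holes)
        if len(iv) >= 3:
            # fit disparity(i, j) = a*i + b*j + c on the valid pixels
            A = np.c_[iv, jv, np.ones_like(iv)]
            coef, *_ = np.linalg.lstsq(A, disp[valid].astype(np.float64),
                                       rcond=None)
            if np.abs(A @ coef - disp[valid]).mean() < max_residual:
                out[holes] = coef[0] * ih + coef[1] * jh + coef[2]
                continue
        out[holes] = np.median(disp[valid])  # median fitting fallback
    return out
```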

Evaluation

Quantitative evaluation

This section presents the evaluation of the methods for 3D surface reconstruction. The performance of Method 1 and Method 2 is evaluated using three metrics: accuracy, computational time and percentage of matched pixels.

The Hamlyn Centre laparoscopic/endoscopic video dataset [7], with an associated ground truth representing a surgical scene, is used for the evaluation.

Stereo images of a silicone heart phantom were used in order to validate the results and ensure that the proposed methods are robust and reliable when applied to a surgical scenario. The dataset consists of the intrinsic, extrinsic and distortion parameters of the cameras and a ground truth from a CT scan. We used 20 frames from two videos, heart1 and heart2. These frames are named \(h1_{i}\) and \(h2_{i}\), respectively, where i indicates the frame number. The ground-truth sets are called \(g1_{i}\) and \(g2_{i}\). The point clouds computed using the two methods are \(pc1_{i}\) and \(pc2_{i}\). This dataset was preferred over other standard datasets, such as the Middlebury evaluation [23], because the latter provide only static images and do not show surgical scenarios, which present specific issues related to endoscopic imaging.

The error evaluation consists of computing the Euclidean distance between the corresponding points from the extracted point cloud and the rectified ground truth, as in Eq. 7. The final accuracy of each point cloud is computed as the median value of these Euclidean distances. Method 1 and Method 2 are evaluated on the two datasets heart1 and heart2 with and without the application of the disparity refinement exploiting SLIC segmentation (hereinafter called SLIC refinement).

$$\begin{aligned} \hbox {error}(i,j) = \sqrt{(pc_{x_{i,j}}-g_{x_{i,j}})^2+(pc_{y_{i,j}}-g_{y_{i,j}})^2 +(pc_{z_{i,j}}-g_{z_{i,j}})^2} \end{aligned}$$
(7)

where \(pc_{x_{i,j}}\), \(pc_{y_{i,j}}\) and \(pc_{z_{i,j}}\) are the 3D coordinates of a point from the computed point cloud, and \(g_{x_{i,j}}\), \(g_{y_{i,j}}\) and \(g_{z_{i,j}}\) are the coordinates from the ground truth. Since the point cloud and the ground truth are represented as 2D maps, we can simply calculate this error for each point considered as a pixel of the image.
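As a sketch, if the point cloud and the ground truth are stored as \(H \times W \times 3\) coordinate maps with NaN marking unmatched points, Eq. 7 and the final accuracy reduce to a few lines:

```python
import numpy as np

def median_accuracy(pc, gt):
    """Per-pixel Euclidean distance (Eq. 7) followed by the median,
    ignoring invalid (NaN) points."""
    error = np.linalg.norm(pc - gt, axis=-1)
    return np.nanmedian(error)
```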

A nonparametric test (Wilcoxon \(p< 0.05\)) was performed on the point clouds to test whether the accuracy before and after SLIC refinement was significantly different, for both Method 1 and Method 2. The same test was also performed to verify whether the accuracy obtained with Method 1 and Method 2 was significantly different.

The percentage of matched pixels is computed as the ratio of the valid reconstructed points identified in the region of interest with respect to the total number of points of the image. This region takes into consideration only the area of the image with a pixel intensity \({>}32\), eliminating the areas where the heart phantom is not visible. An example of the region of interest used for the evaluation is shown in Fig. 6.
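A minimal sketch of this metric, assuming invalid disparities are marked with 0 and reading the percentage over the region of interest:

```python
import numpy as np

def matched_percentage(disp, gray_l):
    """Share of valid reconstructed points within the region of interest
    (pixels of the reference image with intensity > 32)."""
    roi = gray_l > 32
    return 100.0 * np.count_nonzero((disp > 0) & roi) / np.count_nonzero(roi)
```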

The computational time is the execution time of the algorithms. The code was developed in C++, using OpenCV for image management, with the Robot Operating System (ROS) as the framework [21]. The program ran on a GNU/Linux system with an Intel Core i7-4820K CPU with four cores and hyper-threading (eight virtual cores).

Fig. 6

Example of the region of interest for heart1 and heart2 models. The points marked with magenta color were not included in the evaluation of the methods

Table 1 Parameters used in the evaluation

The parameters and labels used during the evaluation are listed in Table 1. The window dimension for the census transform and the aggregation cost of Method 2 are taken from [12], where the authors identify the optimal parameters taking into consideration accuracy and computational time.

Qualitative evaluation

The target application of this research is robotic SILS procedures in the abdomen. Unfortunately, video datasets with ground-truth information are not available for this anatomical region. Therefore, qualitative evaluations of the proposed methods were performed using the abdominal dataset from the Hamlyn Centre. This dataset approximates the envisioned application area of our system well, including motion of structures/organs and the presence of surgical instruments in the field of view.

Fig. 7

Example of the application of the workflow presented in this paper on one frame of the heart1 dataset. From left to right: the RGB stereo images \(im_l\) and \(im_r\), the stereo images after the specularity removal filter, and then Method 1 at the top and Method 2 at the bottom. The Method 1 row shows the image equalization, the disparity map calculation, the SLIC segmentation and the subsequent refinement of the disparity map. The Method 2 row shows the census transform, the disparity map calculation, the SLIC segmentation and the subsequent refinement of the disparity map

Results

Figure 7 shows a sample image from heart1 processed with both methods, following the workflow of Fig. 2. The figure demonstrates the presence of invalid values in the disparity map, as well as their almost complete correction by the SLIC refinement. The effect is clearly visible in the percentage of matched pixels in Fig. 8: in this case, the SLIC refinement increases the percentage of valid values to more than 70 %.

A box plot of the accuracy is shown in Fig. 9. The results show an accuracy of 3.17 and 3.27 mm for M1 and M1R (M1 with SLIC refinement) and of 1.68 and 1.77 mm for M2 and M2R, respectively. The variances of the results are similar, indicating good consistency between the presented methods.

Figure 10 shows a colormap representing the accuracy of the reconstructed surfaces computed with the methods. Here again, the effect of the SLIC refinement is evident: the filled holes present a low error with respect to the ground truth.

Regarding the impact of the SLIC refinement on accuracy, the nonparametric test did not reject the null hypothesis for either method, implying that the SLIC refinement did not have a statistically significant effect on the accuracy of M1 or M2. The same test rejected the null hypothesis when comparing the values obtained with M1 and M2, implying that M2 provides significantly better accuracy than M1.

The results of the evaluation are summarized in Table 2, which reports the median values of the accuracy, the percentage of matched pixels and the computational time. Both methods have a similar computational time, around 1.21 s per frame (at an image resolution of 288 \(\times \) 360), and the SLIC refinement adds an overhead of only 0.1 s in both cases.

The results of the qualitative evaluation are shown in Fig. 11. When the methods are applied to an image representing a more complex surgical scenario, with different abdominal structures and instruments visible in the field of view, they are still able to reconstruct the surface, and the SLIC refinement appears to correctly fill the invalid values of the disparity map.

Fig. 8

Box plot representation of the percentage of matching pixels (pixels that are considered valid) with Method 1 and Method 2 evaluated on heart1 and heart2 without and with the SLIC refinement

Fig. 9

Box plot representation of the accuracy obtained with Method 1 and Method 2 evaluated on heart1 and heart2 without and with the SLIC refinement

Fig. 10

Two-dimensional representation of the reconstructed 3D surface, obtained from Method 1 and Method 2 with and without SLIC refinement. The color represents the error

Table 2 Median values of the accuracy, percentage of matched pixels, and computational time for both methods and datasets, with and without SLIC refinement

Discussion and conclusion

The research presented in this paper aims to improve dense 3D surface reconstruction of surgical environments, specifically to enable enhanced visualization during robotic SILS procedures of the abdomen. The 3D surface information from the surgical scene can be exploited to update the pose of preoperative models extracted from a CT scan. These models, inserted into a virtual assistive system, can provide guidance information to the surgeon, helping to improve the quality and safety of procedures through augmented visualization. In addition, information about the pose of the surgical robot with respect to tissue surfaces can assist in defining and enforcing safety constraints intra-operatively.

Two methods for 3D surface reconstruction were presented, based on a traditional approach (M1) and on a nonparametric transformation approach (M2). The comparison between the two methods shows that M2 has significantly higher accuracy than M1 (around 1.7 mm). According to our team of surgeons, the overall system accuracy for the proposed application should be below 10 mm. This is motivated by the fact that abdominal surgery cannot be considered precision surgery, and it is characterized by motions and deformations of the organs. For this reason, the accuracy obtained with both methods can be considered a good result, since it complies with the required accuracy.

Fig. 11

Qualitative results obtained by the application of M1 and M2 on a surgical video. The image on the left shows \(im_l\). The images on the right show the disparity maps obtained with the different methods presented in this paper

Regarding the valid reconstructed points, M1 provides a higher percentage of matched pixels. The methods were also complemented with the SLIC disparity refinement to enhance the density of the reconstruction. With the refinement, the percentage of matched pixels increases in both methods, reaching 72.6 % in the best case. The refinement improved the matching percentage by up to 23 % in M1 and 43 % in M2, without any statistically significant impact on the accuracy.

A comparison of our error results with those from [19], which tests different state-of-the-art algorithms on the same video dataset, shows that our errors are:

  • lower than those of the stereo block matching (SBM) and variational (Svar) algorithms;

  • of similar magnitude to those of stereo semi-global matching (SSGBM).

The strength of the methods presented in this paper is the use of the SLIC super-pixel algorithm to obtain a high-density valid disparity map, which can be exploited for augmented reality applications.

In comparison with [22], the accuracy achieved is slightly worse (mean error 2.47 mm against 1.55 mm), while the percentage of reconstructed pixels obtained with the SLIC refinement is higher (72.6 % against 66.2 %).

Extending this research, future work will focus on a real-time implementation of the proposed algorithm, potentially based on a hybrid CPU–GPU processing framework. The expected speed-up is between 10 and 30 times at the current image resolution, which would allow the target frame rate of 25 fps to be reached. For higher-resolution images, subsampling could be applied to maintain real-time execution. Motion tracking of the organs will also be investigated in combination with the soft tissue surface reconstruction. Finally, these components will be integrated into the virtual assistive system.