Minimally invasive surgery (MIS) is an attractive alternative to conventional open surgery and is known for improving outcomes, leaving fewer scars, and leading to significantly faster patient recovery [1, 2]. For certain surgical procedures, such as laparoscopic cholecystectomy, it has become the standard of care. Despite its success and its increasing application to treat or correct various pathologic conditions, full comprehension of the surgical field is more challenging in laparoscopic surgery than in open surgery. In laparoscopic procedures, real-time video of the surgical field, acquired by a laparoscopic camera, is currently the primary means of guidance. Compared with open procedures, the lack of tactile feedback in laparoscopic procedures creates a greater need for enhanced intraoperative visualization to achieve safe and effective surgical outcomes.

Laparoscopic cameras have made significant advances in image quality in recent years, and high-definition (HD) cameras are now almost universally used. However, conventional laparoscopes are monocular and capable of providing only a single camera view. The resulting display is thus a flat representation of the three-dimensional (3D) operative field and does not give surgeons a good appreciation of the 3D spatial relationships among the anatomical structures. In addition, despite being rich in surface texture, the laparoscopic video provides no information on internal structures located beneath the visible organ surfaces. Both good depth perception and knowledge of internal structures are of critical importance for the safety and effectiveness of laparoscopic procedures and improved surgical outcomes [3–6].

In the present work, the limitation of depth perception is addressed through the use of a stereoscopic laparoscope. Although stereoscopic vision is already standard on the da Vinci surgical robot, the majority of laparoscopic surgeries are performed conventionally without this option. In recent years, stereoscopic laparoscopes have become available, and although their use remains limited at present, it is expected to grow. Using stereoscopic laparoscopes, surgeons can perceive true depth and gain an improved understanding of the 3D spatial relationships among the visible anatomical structures. Emerging studies are demonstrating superior surgical efficiency and precision when using stereoscopic vision [3, 4].

Even with HD image quality and stereoscopic vision, internal structures located beneath the visible organ surfaces cannot be visualized. Our solution to visualize the internal anatomy is to use laparoscopic ultrasound (LUS). Although LUS images are traditionally displayed on a separate monitor, a more desirable approach may be augmented-reality (AR) visualization, which integrates time-matched LUS and stereoscopic camera images into single multimodality images.

Several research groups have reported developing AR visualization for laparoscopic surgery and have presented methods to fuse both pre- and intraoperative tomographic images with live laparoscopic video. These efforts have targeted both robotic surgery [7, 8] and conventional surgery [9–17]. A challenge in developing AR visualization is the dynamic nature of the surgical anatomy: the soft-tissue organs involved move and deform with respiration, cardiac rhythm, insufflation, and surgical manipulation. Methods that use preoperative computed tomography (CT) and magnetic resonance (MR) images must therefore account for the movement and deformation of soft-tissue organs between the time of imaging and the surgery, as well as during the surgery itself. Despite many investigations, a reliable and robust image-to-video registration solution between preoperative CT or MR images and laparoscopic images remains technically difficult.

The use of LUS, which provides intraoperative images in real time, avoids the need to correct for soft-tissue movement and deformation, and accurate image-to-video registration can be accomplished with standard tracking methods. Leven et al. [7] took this approach to create a module for the da Vinci robotic surgical system that superimposes LUS images on stereoscopic laparoscopic video. In that work, the rigid LUS transducer was tracked with a vision-based method that relied on a distinctive pattern fixed to the shaft of the LUS transducer, immediately proximal to the scanning area. The requirement that the LUS-mounted pattern remain within the field of view of the stereoscopic camera for vision-based tracking to work is a limitation that affects the clinical viability of the overall approach. Cheung et al. [9] have reported an AR prototype with a stereoscopic laparoscope and a flexible-tip LUS transducer, both tracked with an electromagnetic tracking system. A phantom study mimicking minimally invasive partial nephrectomy was performed in the laboratory [10]. The prototype, as designed and reported, is not suitable for human use.

In this article we describe the development and characterization of our novel stereoscopic AR prototype for conventional laparoscopic surgery based on LUS and the emerging stereo-laparoscopic camera technology. Designed and developed by a team of biomedical engineers and minimally invasive surgeons, the stereoscopic AR system merges live LUS images with stereoscopic laparoscopic video in real time to create an ultrasound-augmented stereoscopic video stream. Furthermore, the system has been designed with near-term clinical use as a primary goal. This novel capability provides laparoscopic surgeons two new visual cues: (1) perception of true depth with an improved understanding of the 3D spatial relationship among anatomical structures, and (2) visualization of critical internal structures such as blood vessels, bile ducts, and surgical targets such as tumors. Together, these provide a more comprehensive visualization of the surgical field.

Materials and methods

System overview

Figure 1 shows the developed stereoscopic AR system and a block diagram of its major components. The system uses two imaging devices: a stereoscopic vision system (VSII, Visionsense Corp., New York, NY, USA) with a 5-mm laparoscope (henceforth called the 3D laparoscope), and an ultrasound scanner (flex Focus 700, BK Medical, Herlev, Denmark) with a 10-mm laparoscopic transducer. The 3D laparoscope is a 0° scope with a 70° field of view and a fixed focal length of 2.95 cm. It provides an integrated light source and automatic white balance, both of which simplify use in the operating room (OR). The LUS transducer has an operating frequency range of 5–10 MHz, with a maximum scan depth of 13 cm. The LUS system is capable of gray-scale B-mode and color Doppler mode scanning. Both systems have been cleared by the FDA for clinical use.

Fig. 1

The developed stereoscopic AR system on a rolling cart (left) and a block diagram of the major components of the system (right)

In our developed system, the stereoscopic video and LUS images are streamed over high-speed Ethernet from the 3D vision system and the LUS scanner to our stereoscopic AR fusion module. An optical tracker (Polaris, Northern Digital Inc., Waterloo, ON, Canada) is used to track the pose (location and orientation) of the 3D laparoscope and the LUS transducer. Utilizing the tracking data, the LUS images are overlaid on the stereoscopic video in real time, creating two ultrasound-augmented video streams (one for the left eye and the other for the right) for stereoscopic visualization. These two video streams are displayed on a 3D monitor and users can perceive the 3D effect (stereopsis) by wearing passive polarized glasses. The stereoscopic AR fusion module is implemented on a 64-bit Windows 7 PC with an 8-core 3.2-GHz Intel CPU, 12-GB memory, and an NVidia Quadro 4000 (NVidia Corp., Santa Clara, CA, USA) graphics card.
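
Conceptually, the overlay is a chain of rigid transforms evaluated for each frame: an LUS pixel is mapped by the LUS calibration into the transducer's reference frame, by the tracking data into tracker coordinates and then into the camera's optical frame, and finally projected through the camera intrinsics into each eye's image. The following Python sketch illustrates this chain for a single point; the matrix and function names are illustrative placeholders rather than our actual implementation, in which the same mapping is applied to the full LUS image for each eye.

```python
# Illustrative sketch of the per-frame LUS-to-camera overlay transform chain.
# All names are placeholders; lens distortion is omitted for brevity.
import numpy as np

def project_lus_pixel(uv_us, T_img2frame, T_usframe2trk, T_camframe2trk,
                      T_cam2frame, K):
    """Map one LUS pixel (u, v) into one eye's camera image.

    uv_us          -- (u, v) pixel coordinates in the LUS image
    T_img2frame    -- 4x4 LUS calibration: image plane (pixel size folded in,
                      so coordinates come out in mm) -> LUS reference frame
    T_usframe2trk  -- 4x4 tracked pose: LUS reference frame -> tracker
    T_camframe2trk -- 4x4 tracked pose: camera reference frame -> tracker
    T_cam2frame    -- 4x4 camera calibration: camera optics -> camera frame
    K              -- 3x3 intrinsic matrix of one eye of the stereo camera
    """
    p = np.array([uv_us[0], uv_us[1], 0.0, 1.0])   # LUS image plane is z = 0
    # tracker -> camera optics, via the camera's tracked reference frame
    T_trk2cam = np.linalg.inv(T_camframe2trk @ T_cam2frame)
    p_cam = T_trk2cam @ T_usframe2trk @ T_img2frame @ p
    uvw = K @ p_cam[:3]                            # pinhole projection
    return uvw[:2] / uvw[2]
```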

To track the 3D laparoscope and the LUS transducer, two reference frames with passive reflective spheres are mounted onto their respective handles (Fig. 2). Because the 3D laparoscope and the LUS transducer are rigid, they can be tracked through these reference frames, whose pose with respect to the tracker is known in real time.

Fig. 2

Custom-designed fixtures, with reference frames attached, for the 3D laparoscope (top) and the LUS transducer (bottom)

System calibration

The image-to-video registration accuracy is critical from the standpoint of surgical safety. The optical tracker we used is rated to have submillimeter (i.e., negligible) tracking error. The main determinants of accuracy in the developed system are therefore how accurately the imaging parameters of the two devices (focal length, lens distortion, and pixel size for the 3D laparoscope; scan depth and pixel size for the LUS transducer) are estimated during system calibration.

We used Zhang’s method [18] for calibrating the 3D laparoscope and the method proposed by Yaniv et al. [19] for calibrating the LUS transducer. To obtain stable and accurate tracking data, the optical tracker was mounted on a stationary tripod and the 3D laparoscope and the LUS transducer were held by a clamp to avoid hand tremor. With our current laboratory setup, it takes approximately 30 min to perform accurate calibration of both devices.
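
For the 3D laparoscope, Zhang's method estimates each eye's focal lengths, principal point, and lens distortion from multiple views of a planar checkerboard; OpenCV's calibrateCamera function implements this method. A minimal sketch follows, with assumed checkerboard dimensions and image file names:

```python
# Minimal sketch of Zhang's method via OpenCV; the checkerboard geometry
# and file names are illustrative assumptions.
import glob
import cv2
import numpy as np

pattern = (9, 6)     # inner corner count of the checkerboard (assumed)
square_mm = 5.0      # printed square size in mm (assumed)

# 3D corner locations in the checkerboard's own coordinate system (z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_pts, img_pts = [], []
for fname in glob.glob("calib_left_*.png"):   # one eye of the stereo camera
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        # Refine corner locations to subpixel accuracy
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Recovers the intrinsic matrix and lens distortion coefficients
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```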

OR-friendly design

An OR-friendly system design is important for clinical viability and routine OR use. In our view, the ease of OR setup and system portability are two major aspects of OR compatibility.

Because the 3D laparoscope and the LUS transducer must be sterilized before each OR use, the reference frames affixed to them (Fig. 2) for tracking purposes must be detached and sterilized separately. The system calibration could be performed in the OR after reattaching the reference frames; however, doing so would consume expensive OR time and require an extra technician on the surgical team.

To be able to reuse laboratory calibration results in the OR and thus minimize OR setup time, the reference frames must be reattached to the devices in exactly the same position they occupied before disassembly. To this end, purpose-designed mechanical fixtures that mount on the 3D laparoscope and the LUS transducer in a unique position serve as the mounts for the reference frames needed for optical tracking. This strategy maintains a fixed geometric relationship between the reference frames and the imaging devices before and after sterilization.

The fixture for the 3D laparoscope (Fig. 2) was printed on a 3D printer (Objet500 Connex, Stratasys Ltd., Eden Prairie, MN, USA) using a material that can withstand the standard sterilization process. The fixture for the LUS transducer (Fig. 2) was made of aluminum. The fixtures can be easily mounted on the devices in a unique way. To make our stereoscopic AR system portable for OR use, all the components were assembled on a rolling cart, as shown in Fig. 1.

Experiments

A series of experiments was conducted using the developed stereoscopic AR system. The system latency and image-to-video registration accuracy were measured in a well-controlled laboratory setting. To further evaluate system performance, a phantom study and two animal studies were conducted.

System evaluation

Low system latency and high image-to-video registration accuracy are critical for a successful clinical AR system. High latency, that is, a notable lag between the movement of the imaging devices and the corresponding images on the display monitor, is not clinically acceptable. Likewise, poor image-to-video registration accuracy, resulting in misaligned structures in the AR output, would render the system unhelpful.

System latency was measured using the well-accepted method of imaging a high-resolution (millisecond) digital clock. The difference between the actual (clock) time and the time seen in the output image of our system determines system latency. To account for all delays, the stereoscopic AR system was run in the full-function mode with simultaneous LUS imaging. Several measurements were made to arrive at a mean value.
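
For clarity: if the physical clock reads t_actual at the instant a frame showing t_displayed appears on the output monitor, the latency for that trial is t_actual − t_displayed. A sketch of the bookkeeping follows; the paired readings below are hypothetical values, not measurements from this study.

```python
# Latency bookkeeping sketch; the paired readings are hypothetical.
import numpy as np

# (clock reading, clock time visible in the AR output frame), in milliseconds
trials = [(52410, 52265), (53891, 53750), (55102, 54961), (56344, 56190)]
lat = np.array([actual - displayed for actual, displayed in trials])
print(f"system latency = {lat.mean():.0f} ± {lat.std(ddof=1):.0f} ms")
```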

The image-to-video registration accuracy depends on (1) the calibration accuracy of the 3D laparoscope, (2) the calibration accuracy of the LUS transducer, and (3) the accuracy of stereoscopic AR visualization. The target registration error (TRE) metric was used to measure these accuracies. Given a 3D point, the TRE is defined as the 3D distance from its actual (reference) location to the computed location of the point. Figure 3 depicts the procedure of calculating TRE in our experiment. A tracked pointer, held by a clamp and with its tip immersed in a water tank, was imaged using the tracked LUS (held by another clamp). Two frames of stereoscopic AR video were then produced by aiming the tracked 3D laparoscope at the pointer tip from two different angles. The tip of the pointer provided the reference location (denoted as P). Its estimated location (denoted as P′) was calculated through triangulation [20] (depicted by orange lines in Fig. 3) using the two AR frames. Finally, the TRE was computed as the 3D distance from P to P′.
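
The triangulation step admits a compact implementation: given the tracked 3D laparoscope's projection matrix at each of the two poses and the pixel location of the pointer tip in each AR frame, the estimated tip location P′ follows from standard two-view triangulation. The sketch below uses OpenCV; the projection matrices and pixel observations are assumed inputs derived from the tracking and calibration data.

```python
# Sketch of the TRE computation via two-view triangulation.
# cv2.triangulatePoints is a real OpenCV call; the inputs are assumed.
import cv2
import numpy as np

def target_registration_error(P1, P2, uv1, uv2, p_ref):
    """TRE: 3D distance from the tracked pointer tip p_ref (3-vector, mm)
    to the tip location triangulated from the two AR frames.

    P1, P2   -- 3x4 projection matrices K @ [R | t] of the tracked 3D
                laparoscope at the two poses, in tracker coordinates
    uv1, uv2 -- pixel coordinates of the pointer tip in each frame
    """
    X = cv2.triangulatePoints(
        np.float64(P1), np.float64(P2),
        np.float64(uv1).reshape(2, 1), np.float64(uv2).reshape(2, 1))
    p_est = (X[:3] / X[3]).ravel()   # de-homogenize the 4x1 result (= P')
    return float(np.linalg.norm(p_est - np.asarray(p_ref)))
```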

Fig. 3

Illustration showing the procedure of calculating TRE

Phantom study

An intraoperative abdominal ultrasound phantom (IOUSFAN, Kyoto Kagaku Co. Ltd., Kyoto, Japan), created specifically for laparoscopic applications, was used to demonstrate the capability of the stereoscopic AR system. The phantom includes realistic models of the liver, spleen, kidneys, pancreas, biliary tract, and detailed vascular structures, as well as simulated lesions such as biliary stones, cysts, and solid tumors in the liver, pancreas, spleen, and kidneys.

Animal study

All animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC), and the animals were treated in accordance with the PHS Policy on Humane Care and Use of Laboratory Animals, the National Institutes of Health Guide for the Care and Use of Laboratory Animals, and the Animal Welfare Act. After successful anesthesia and intubation, a 40-kg female Yorkshire swine was placed in the left decubitus position. Two trocars were placed: one at the midline midabdomen (12 mm, for instruments and the LUS transducer) and one at the right anterior axillary line in the lower abdomen (5 mm, for the 3D laparoscopic camera). In addition, a hand port was placed in the midline lower abdomen to provide direct access for enhanced tissue manipulation. Carbon dioxide pneumoperitoneum at 10 mmHg was created. After registration of the optical markers on the 3D laparoscopic camera and the LUS transducer, real-time tracking and stereoscopic AR visualization were started. The right kidney, liver, and biliary structures were examined, with the real-time LUS images superimposed on the 3D laparoscopic video to provide internal anatomical details of the organs. The second study, also on a 40-kg female Yorkshire swine, followed the same procedure.

Results

The system latency was measured to be 144 ± 19 ms. This value includes the native latencies of the constituent imaging systems as well as those of data streaming and the stereoscopic AR visualization computation pipeline. The calibration accuracy of the LUS transducer was 1.08 ± 0.18 mm; the calibration accuracy of the 3D laparoscope was 0.93 ± 0.18 mm (left-eye channel) and 0.93 ± 0.19 mm (right-eye channel); and the overall accuracy of stereoscopic AR visualization was 3.34 ± 0.59 mm (left-eye channel) and 2.76 ± 0.68 mm (right-eye channel).

Phantom study

Figure 4 shows three representative frames of the left-eye channel of the stereoscopic AR video generated using our system. The laparoscopic video, augmented with LUS images, revealed the internal structures beneath the organ surfaces. The first row has an LUS cross section of the gallbladder overlaid on the laparoscopic camera view. In the same manner, the second row shows the intrahepatic vessels and a simulated tumor within the liver, and the third row shows the common bile duct.

Fig. 4

Three stereoscopic AR video snapshots (left-eye channel) recorded during the phantom study. Each row has the original 3D laparoscopic camera image (left column), the original LUS image (middle column), and the stereoscopic AR image generated by our system (right column). The first row shows the gallbladder, the second row shows the cystic duct, and the third row shows a simulated solid tumor within the liver

Animal study

The stereoscopic AR system was successfully used for visualization experiments in two swine. Subsurface anatomical structures, along with vascular flow in the liver, kidney, and biliary system, were clearly observed. The surgeons were able to identify the major branches of the right renal vasculature throughout the parenchyma of the kidney, from the hilum to the renal cortex. The calyces could also be identified in relation to the visible renal capsule. Next, the liver was interrogated with good visibility of the internal vasculature and major biliary structures. Even with rapid movements of the LUS transducer, the system latency was not noticeable to the naked eye. Visibility of the LUS image overlay was affected by the contrast of the background tissue: darker tissues (e.g., renal parenchyma, liver) allowed better visualization of the LUS overlay than lighter tissues (e.g., Gerota's fascia, bowel, and peritoneum). The presence and use of the optical tracker did not impede surgeon access or the clinical workflow.

Representative 2D image captures recorded in the animal studies are shown in Fig. 5, including the 3D laparoscopic video images (left-eye channel), the LUS images, and the output stereoscopic AR video images (left-eye channel). The LUS images were overlaid onto the video images correctly, and the internal structures beneath the liver surface were visualized clearly at their proper locations.

Fig. 5

Two stereoscopic AR video snapshots (left-eye channel) recorded during an animal study. Each row has the original 3D laparoscopic camera image (left column), the original LUS image (middle column), and the stereoscopic AR image generated by our system (right column)

Stereoscopic AR video

In addition to the static images presented in this article, three video clips are provided as electronic supplementary material. The clips show live stereoscopic AR visualization from the phantom and animal studies, with explanatory annotations and a voice track.

The original stereoscopic AR video consists of left- and right-eye channels and requires a 3D monitor and polarized glasses for viewing. So that the clips can be viewed with the naked eye, the first part of each clip presents the annotated left-eye channel alone. In the second part, the same content is presented in the red-cyan anaglyph 3D format, which composites the two channels into a single video stream; viewers can appreciate the 3D effect on a regular monitor with general video players (e.g., Windows Media Player or QuickTime Player) by wearing red-cyan glasses.
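
A common red-cyan compositing scheme, sketched below for illustration (our encoder may differ in detail), takes the red channel from the left-eye image and the green and blue channels from the right-eye image:

```python
# Illustrative red-cyan anaglyph compositing (one common scheme).
import numpy as np

def to_anaglyph(left_rgb: np.ndarray, right_rgb: np.ndarray) -> np.ndarray:
    """Red channel from the left eye; green and blue from the right eye."""
    out = right_rgb.copy()
    out[..., 0] = left_rgb[..., 0]   # assumes RGB channel order
    return out
```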

Discussion

Augmented-reality visualization for laparoscopic surgery remains an active area of research and development. We have presented a stereoscopic AR system that achieves improved image-to-video registration accuracy through the use of real-time intraoperative imaging in the form of LUS, and superior depth perception through stereoscopic vision. The design also took practical issues and the OR workflow into consideration to make the system suitable for clinical use.

The majority of previously reported AR visualization methods attempted to overlay pre- and intraoperative CT/MR images on the laparoscopic video. Surface models of the organs of interest are often derived from the images and registered with the laparoscopic video. A variety of registration algorithms, often the most critical component, has been developed, including point-to-surface registration [14], surface-to-surface registration [11], stereo vision-based tracking [8], and image-based tracking relying on custom-designed needles inserted into the organ as navigation aids [12]. Most reported image-to-video registration algorithms have required manual or semiautomatic initialization. In addition, because the organ models derived from pre- and intraoperative images represent a snapshot in time, the reported AR methods often cannot adapt to the changing surgical anatomy. Hostettler et al. [15, 16] have presented a method that attempts to nonrigidly register preoperative tomographic images with the patient and hence with the surgical anatomy. They developed organ deformation models that deform the preoperative images in real time according to the patient's body shape, which is obtained from intraoperative skin surface reconstruction. However, the approach does not model anatomic changes caused by insufflation or by differences in patient position between preoperative imaging and the actual surgery, and it needs intensive testing and validation before it is ready for clinical use.

These systems and methods have many limitations that make them less desirable and less suitable for reliable operation and routine OR use. The limitations arise from (1) the inability of preoperative images to properly and accurately describe the ever-deforming anatomy during surgery; (2) the fact that the image-to-video registration procedures are mostly rigid, whereas the soft-tissue organs deform nonrigidly throughout the surgery; and (3) the subjective and nonreproducible accuracy of interactive (i.e., manual or semiautomatic) registration procedures.

To provide continuous and automatic updates of the surgical field, Shekhar et al. [17] used CT continuously throughout a laparoscopic surgical procedure. Live CT and live laparoscopic video were registered using optical tracking. To minimize radiation exposure, the investigators used low-dose CT intraoperatively and employed high-speed deformable registration to incorporate vasculature data from a standard contrast CT taken immediately before starting surgery. Although the method does account for continuous deformation of the surgical anatomy and performs accurate CT-to-video registration, the risk of the patient and the surgical team being exposed to high levels of radiation, despite the use of low-dose CT, renders this approach clinically impractical in the foreseeable future.

In our prototype, the use of LUS eliminates any risk of radiation exposure and provides real-time imaging with multiple scanning modes (B-mode and color Doppler mode). The developed system also facilitates interpretation of LUS images. In current practice, LUS images are displayed on a separate monitor and are thus visually disconnected from the live laparoscopic video. This separate presentation requires surgeons to mentally register the two types of images, a task that is difficult in general and especially so in the stressful environment of the OR; it also varies with operator expertise and is prone to error. Our stereoscopic AR system presents the LUS images in correct spatial reference to the laparoscopic camera view in real time, obviating the need for mental image integration, eliminating the associated errors, and potentially improving surgical efficiency.

Our system can be integrated into the existing surgical workflow seamlessly and with minimal changes. Its clinically compatible design makes it possible to reuse the laboratory system calibration and maintain sufficient image-to-video registration accuracy in the OR. The current system latency of ~144 ms is sufficiently small to allow surgeons to perform laparoscopic procedures smoothly. The current image-to-video registration accuracy of ~3 mm is also acceptable. This accuracy may be improved with more sophisticated calibration methods; for example, highly accurate ultrasound calibration can be obtained using calibration methods [21] that employ a specially designed high-accuracy calibration phantom and advanced image processing. Our system provides four operation modes: stereoscopic AR, monocular AR, stereoscopic laparoscope only, and monocular laparoscope only. Surgeons can switch between these modes easily. The monocular modes make our system backward compatible in that the AR capability can also be offered in conjunction with conventional monocular laparoscopes.

Potential applications for stereoscopic AR visualization include laparoscopic biliary surgery and partial organ resection procedures. In biliary surgery, the enhanced visualization will help surgeons accurately identify the bile duct and vascular anatomy in real time. If used as a training tool, it could aid resident identification of the common bile duct–cystic duct junction and vascular anatomy during laparoscopic cholecystectomy and could eventually replace current techniques such as intraoperative cholangiography and indocyanine green injection as a standard of care. In children, laparoscopic choledochal cyst excision and biliary reconstruction would be significantly enhanced by stereoscopic AR because the variation and complexity of this condition are substantial; large dilated common bile ducts may obscure the portal vein owing to both size and inflammation, such that blind dissection behind the cyst can result in major hemorrhage. In subtotal organ resection procedures for tumors, such as in the pancreas and kidney, 3D AR visualization will allow rapid identification of the tumor relative to the normal parenchyma (in the absence of palpation or direct visualization when the tumor is subparenchymal), as well as of the proximity of the splenic vein and artery, renal vessels, pelvic calyces, and ureter. These early potential applications make the visualization technology presented here equally applicable to ablative and reconstructive procedures.

Conclusion

Our work has addressed key issues in developing a clinical real-time stereoscopic AR visualization system for conventional laparoscopic surgery. The system prototype has been evaluated in phantoms and animals. Results showed acceptable system latency and image-to-video registration accuracy.

A specific future direction of our work is to replace optical tracking with electromagnetic tracking, which will eliminate the current line-of-sight requirement and allow us to use a flexible-tip LUS transducer and even a flexible-tip 2D/3D laparoscope. Another future direction is to conduct clinical testing and collect feedback to improve the overall clinical usability of the system.

AR visualization, supported with real-time imaging and improved depth perception, promises to greatly enhance the precision of current-generation laparoscopic surgeries while enabling new ones. It is expected that the full development of the real-time stereoscopic AR visualization system will make minimally invasive laparoscopic surgeries more precise and safer.