Fig. 1. The potential goal: medical specialists working together on a patient dataset of fused 3D imaging modalities (MRI, PET and/or CT). The current stage of development provides a functional solution for a single user.

1 Introduction

Current surgical treatments rely on complex planning using traditional multiplanar rendering (MPR) of medical images, including CT and MRI. The resulting imagery shows the patient's anatomical slices in axial, sagittal or coronal planes to plan surgery. However, details of vital internal structures are often scattered and become inconspicuous [3]. The use of 3D modelling partially solves this problem, allowing the reconstruction of patient-specific anatomy. Many studies have shown that using 3D models is beneficial in different surgical fields [3, 8, 9, 11, 13], improving surgical planning, shortening patient exposure to general anaesthesia, decreasing blood loss and shortening wound exposure time. However, visualising and interacting with these complex 3D models in conventional environments with 2D screens remains difficult [1, 3]. Indeed, the user's point of view is limited by the window of the screen, and manipulation via the mouse is not intuitive and biases the appreciation of distances [1, 3]. This complicates clinical diagnosis and surgical planning. Today, medical data visualisation extends beyond traditional 2D desktop environments through the development of Mixed Reality (MR) and Virtual Reality (VR) head-mounted displays [2, 4, 6, 13]. These new paradigms have proven their high potential in the medical field, including surgical training [6, 13] and planning [3, 14]. Specifically, MR solutions have already shown their relevance to the problem described above by displaying a personalised data visualisation [7]. Using MR, the interaction with the hologram can take place at the same location where the holographic presentation is perceived [1]. This point in particular has the potential to improve surgical planning and surgical navigation [3, 8, 9, 11, 12]. The main goal of this paper is to describe an MR tool that displays interactive holograms of virtual organs from clinical data, as illustrated in Fig. 1. With this tool, we provide a powerful means to improve surgical planning and potentially improve surgical outcomes. The end-user can interact with the holograms through an interface developed in this project that combines different services offered by the HoloLens.

Fig. 2. Complete process from patient to medical experts, using holograms to improve understanding and communication between the different actors.

2 Method and Technical Concepts

The usual pipeline for viewing medical images in MR includes a preprocessing step of segmentation and modelling of the medical images. The HoloLens, being an embedded system, is far from having the processing power required to compute an advanced volumetric rendering. Therefore, a remote server with a high-end GPU is needed to handle all the rendering processes. The architecture we propose to implement this is shown in Fig. 2. The process handling the volumetric rendering starts by loading the 3D scans of segmented organs produced by the precomputation pipeline. Once the texture is loaded, the data are transferred to the GPU of the dedicated server and used to render the 3D scene with all the filters and stereoscopic ray casting rendering.

Details of the rendering architecture pipeline (steps 3 to 5 of Fig. 2) are provided in Fig. 3.

Fig. 3. Illustration of the real-time rendering pipeline.

The headset runs a remote client provided by Microsoft (MS) to receive the hologram projection, so that the medical staff can visualise the data and interact in a natural manner. The headset sends the spatial coordinates and the vocal commands back to the server to update the virtual scene.
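Schematically, and under the assumption of a simple message-based exchange (the actual transport is Microsoft's remoting client and is not reproduced here), the server-side loop could look as follows; all class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ClientMessage:
    """Hypothetical headset-to-server message: pose, gesture or voice event."""
    kind: str       # "pose" | "gesture" | "voice"
    payload: object

class Renderer(Protocol):
    """Interface the remote rendering process is assumed to expose."""
    def set_camera(self, pose: object) -> None: ...
    def apply_manipulation(self, hand: object) -> None: ...
    def update_shader_params(self, command: object) -> None: ...
    def render_stereo(self) -> bytes: ...

def server_loop(connection, renderer: Renderer) -> None:
    """Receive headset input, update the virtual scene, render a stereo frame
    on the GPU and stream it back to the HoloLens (transport layer assumed)."""
    while connection.is_open():
        msg: ClientMessage = connection.receive()
        if msg.kind == "pose":
            renderer.set_camera(msg.payload)            # follow the user's head pose
        elif msg.kind == "gesture":
            renderer.apply_manipulation(msg.payload)    # spatial coordinates of the pinch
        elif msg.kind == "voice":
            renderer.update_shader_params(msg.payload)  # recognised vocal command
        connection.send(renderer.render_stereo())       # stream the rendered frame back
```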

Various volumetric imaging protocols are used in clinical routine for surgical planning, such as CT, PET and MRI scans. Being able to mix and interact with all patient information (volumetric anatomy as well as the patient data file displayed as floating windows) in an AR setup constitutes a powerful framework for understanding and apprehending complex organs, tissues and even pathologies. The proposed preprocessing pipeline takes the scans as input, aligns them and encodes them in a 32-bit \(4096^2\) matrix including the segmented structures. This fuses complex structural and functional information into one single data structure, stored as images in the database, that can be efficiently visualised in the AR environment.

The first step required to combine those scans is to resample and align them. All scans were first resampled to 1 mm-edge cubic volumetric pixels (voxels). Nearest-neighbour interpolation was used to preserve Standardized Uptake Value (SUV) units in PET, whereas cubic interpolation was used for CT and MRI. The CT scan is set as the reference, and both PET and MRI are mapped to it with a 3D translation. A second step consisted of cropping the data around the object of interest. The input of the main rendering pipeline is a \(4096\times 4096\) 32-bit-encoded matrix. Therefore, all \(256^3\) volumes were first reshaped into a \(16\times 16\) mosaic of the 256 adjacent axial slices to match the \(4096^2\) pixel format of the rendering pipeline input. Then, the PET, CT and MRI data were converted to 8-bit depth, allowing all three modalities to be encoded in the first three Red, Green and Blue bytes of the 32-bit input matrix.

Since CT and PET protocols yield voxel values corresponding to absolute physical quantities, simple object segmentation can be achieved by image thresholding and morphological closing. Bone was defined as voxel values \(f_{\text{CT}}(\varvec{x}) > 300\) HU. Arteries were defined as \(200 < f_{\text{CT}}(\varvec{x}) < 250\) HU in CT images with IC. Various metabolic volumes were defined as \(f_{\text{PET}}(\varvec{x}) > t\), where \(t\) is a metabolic threshold in SUV. All resulting binary masks were closed with a spherical 3 mm-diameter structuring element to remove small and disconnected segmentation components. The binary volumes were encoded as segmentation flags in the last Alpha byte.
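For concreteness, a minimal Python sketch of this encoding step is given below, assuming NumPy/SciPy and volumes already resampled to 1 mm isotropic voxels, aligned to the CT and cropped to \(256^3\); the intensity windows and the bit layout of the Alpha flags are chosen here for illustration only.

```python
import numpy as np
from scipy import ndimage

def ball(radius_vox):
    """Spherical structuring element (3 mm diameter ~ 1.5-voxel radius at 1 mm spacing)."""
    r = int(np.ceil(radius_vox))
    z, y, x = np.ogrid[-r:r + 1, -r:r + 1, -r:r + 1]
    return x**2 + y**2 + z**2 <= radius_vox**2

def to_uint8(vol, lo, hi):
    """Rescale a volume to 8 bits over a fixed window (window bounds are assumptions)."""
    return np.clip((vol - lo) / (hi - lo) * 255.0, 0, 255).astype(np.uint8)

def to_mosaic(vol):
    """Lay out the 256 axial slices of a 256^3 volume as a 16x16 grid -> 4096x4096."""
    tiles = vol.reshape(16, 16, 256, 256)
    return tiles.transpose(0, 2, 1, 3).reshape(4096, 4096)

def encode_rgba(ct, pet, mri, suv_threshold=3.0):
    """Pack CT, PET, MRI (256^3, aligned, 1 mm voxels) and the segmentation flags
    into one 4096x4096 RGBA (32-bit) image, following the layout described above."""
    struct = ball(1.5)
    bone = ndimage.binary_closing(ct > 300, structure=struct)                 # > 300 HU
    arteries = ndimage.binary_closing((ct > 200) & (ct < 250), structure=struct)  # assumes contrast-enhanced CT
    metabolic = ndimage.binary_closing(pet > suv_threshold, structure=struct)  # > t SUV
    # Bit layout of the Alpha flags is an assumption (one bit per structure).
    flags = (bone.astype(np.uint8)
             | (arteries.astype(np.uint8) << 1)
             | (metabolic.astype(np.uint8) << 2))
    rgba = np.stack([to_mosaic(to_uint8(ct, -1000, 2000)),                    # R: CT (HU window assumed)
                     to_mosaic(to_uint8(pet, 0, 10)),                         # G: PET (SUV window assumed)
                     to_mosaic(to_uint8(mri, 0, max(float(mri.max()), 1e-6))),  # B: MRI (normalised)
                     to_mosaic(flags)], axis=-1)                              # A: segmentation flags
    return rgba  # shape (4096, 4096, 4), dtype uint8
```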

The real-time virtual reconstruction of body structures in voxels is performed by a massively parallel GPGPU ray casting algorithm [5] using the preprocessed image database described above. The volumetric ray casting algorithm allows dynamically changing how the data coming from the different aligned scan images are used for the final rendering. As in the database described above, each rendering contains four main components: the PET, CT and MRI scans and the segmentation. All components are registered in an RGB\(\alpha\) image and can be controlled through the shader parameters. The following colour modes are available: greyscale for each layer, corresponding to the most widely used visualisation method in the medical field; colour highlighting of voxel intensities within the scans and segmentation; and coloured rendering of sliced data.
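The compositing logic of such a ray caster can be sketched on the CPU as follows; the real implementation is a massively parallel GPU shader, and the opacity transfer function and highlight colour below are illustrative assumptions.

```python
import numpy as np

def cast_ray(volume_rgba, origin, direction, step=1.0, n_steps=512,
             layer_gain=(1.0, 1.0, 1.0), highlight=(1.0, 0.2, 0.6)):
    """Front-to-back compositing of a single ray through the RGBA volume.
    volume_rgba: float array (D, H, W, 4) in [0, 1], channels = CT, PET, MRI, flags.
    layer_gain mimics the shader parameters that toggle each scan layer."""
    colour, alpha = np.zeros(3), 0.0
    pos = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    for _ in range(n_steps):
        i, j, k = np.floor(pos).astype(int)
        inside = (0 <= i < volume_rgba.shape[0] and 0 <= j < volume_rgba.shape[1]
                  and 0 <= k < volume_rgba.shape[2])
        if inside:
            ct, pet, mri, flags = volume_rgba[i, j, k]
            gains = np.asarray(layer_gain)
            sample = float(np.dot(gains, (ct, pet, mri)) / max(gains.sum(), 1e-6))
            rgb = np.array([sample, sample, sample])   # greyscale mode
            if flags > 0:                              # segmentation highlight in false colour
                rgb = sample * np.asarray(highlight)
            a = 0.05 * sample                          # crude opacity transfer function (assumption)
            colour += (1.0 - alpha) * a * rgb
            alpha += (1.0 - alpha) * a
            if alpha > 0.99:                           # early ray termination
                break
        pos += step * d
    return colour, alpha
```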

Each layer can be independently controlled, filtered, coloured and highlighted. Moreover, the user can change the aspect of each layer in the ray-traced rendering and enhance the visualisation of a target with the segmentation highlight by changing the corresponding layer intensity. The ray casting algorithm updates the volume rendering according to the user commands: with voice commands, the user can quickly activate and deactivate each scan and gain perspective on the position of each body part; with the pinch command, the user can interact with the model in a spatial control field.
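A minimal sketch of how such voice commands could be mapped onto the shader parameters is given below; the command vocabulary and parameter names are hypothetical and do not reflect the actual HoloLens grammar.

```python
# Hypothetical mapping between recognised commands and shader parameters.
shader_params = {"layer_gain": [1.0, 1.0, 1.0], "segmentation_highlight": False}

def on_voice_command(command: str) -> None:
    """Toggle scan layers or the segmentation highlight when a command is recognised."""
    layer_index = {"show ct": 0, "show pet": 1, "show mri": 2}
    if command in layer_index:
        i = layer_index[command]
        shader_params["layer_gain"][i] = 0.0 if shader_params["layer_gain"][i] else 1.0
    elif command == "highlight segmentation":
        shader_params["segmentation_highlight"] = not shader_params["segmentation_highlight"]
```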

Fig. 4. Shaders implemented in this project according to the use cases, displaying the data in different renderings. Extracted directly from the HoloLens.

Geometric manipulations like rotation, scaling and translation provide a way to control the viewing angles of the visualised hologram. This feature is essential to allow medical staff to fully exploit the 3D rendering and see the details of the organic structures. Moreover, the user can slice the 3D volume and remove parts of the data to focus on specific ones. Hand gestures and voice recognition are based on the MS API (MR Companion Kit).
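As a minimal sketch of these manipulations (homogeneous coordinates, rotation restricted to the vertical axis for brevity), the hologram transform and the slicing-plane test could be expressed as:

```python
import numpy as np

def model_matrix(translation, yaw_deg, scale):
    """Compose a 4x4 model matrix (translation * rotation * scale) for the hologram."""
    t = np.eye(4)
    t[:3, 3] = translation
    a = np.radians(yaw_deg)
    r = np.eye(4)
    r[0, 0], r[0, 2] = np.cos(a), np.sin(a)
    r[2, 0], r[2, 2] = -np.sin(a), np.cos(a)
    s = np.diag([scale, scale, scale, 1.0])
    return t @ r @ s

def is_clipped(point, plane_point, plane_normal):
    """True if a sample lies on the removed side of the user-defined slicing plane."""
    return float(np.dot(np.asarray(point) - np.asarray(plane_point), plane_normal)) > 0.0
```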

3 Results

From initial testing of the MR application within the Otorhinolaryngology Department of the Lausanne University Hospital (CHUV), it was concluded that the current version of the project can be applied in three different ways: (1) greyscale image display (mostly used to plan surgery); (2) PET scan highlighted with false colours; (3) segmentation highlighted with false colours.

The first rendering, shown in Fig. 4, is an example of a patient with an oropharyngeal cancer of the neck. The first image represents the mix of the three scan layers (PET, CT and MRI) in greyscale with a red highlight on the PET and CT. The second rendering shows the bone structure; a pink highlight is added from the segmentation stored in the fourth (Alpha) byte.

The third rendering, shown in Fig. 4, adds a slicing functionality, which enables two kinds of renderings: one displaying a solid 2D slice, and one keeping the volumetric rendering on the slice, as shown in the last rendering, adding the option to navigate through the different layers from different angles.

The current state of the application provides a proof of concept that current hardware can support volumetric rendering with a dedicated server and a remote connection to the headset.

To estimate potential lags, a benchmark was made with the various shader models of the system, as seen in Table 1. It details the average performance variation of the following functionalities: multiple-layer activation, user-finger input activation, vocal inputs, segmentation filtering, threshold filtering, scan slicing, x-ray and surface rendering, as well as several colouring modes. Table 1 confirms that immersion and hologram manipulation were very satisfying [10]. The current project now focuses on improving the following aspects:

  • Low performance when starting the remote connection;

  • The remote connection resets itself if the frame reception takes too long because of packet loss;

  • Frame loss during pinching inputs often leads to inaccurate manipulations;

The weak start-up performance and the connection reset issues will be fixed in a later version of the product. As for frame loss, improving the pipeline stability might be a suitable option.

Table 1. Benchmark of different HoloLens components

4 Conclusion and Perspective

This paper demonstrates the high potential of fused 3D data visualisation. The protocol relies on an innovative software architecture enabling real-time, practical visualisation of a massive individual patient database (i.e. 30 fps with a 120 ms delay). Moreover, the manipulation of simultaneous 3D anatomic reconstructions of PET, CT and MRI allows a better clinical interpretation of complex and specific 3D anatomy. The protocol can be adapted to different disciplines, not only improving surgical planning for medical professionals but also enhancing surgical training, thereby increasing the surgical competence of future generations.

The next steps will be adding multiple users to a single 3D scene, providing a more intuitive interface, and conducting clinical indoor user tests. User feedback points out that one of the main remaining issues concerns the ease of use of the interface. Besides, in terms of graphical rendering, the current approach does not allow very high image resolutions, but only the equivalent of a \(128^3\) voxel space; emphasis is currently placed on a faster and more advanced version taking into account the real environment with natural sphere maps.

5 Compliance with Ethical Standards

Conflict of interest – The authors declare that they have no conflict of interest.

Human and animal rights – This article does not contain any studies with human participants or animals performed by any of the authors.