
1 Introduction

Real-Time Physically-Based Rendering. Conventional rasterization methods generate images by shading objects with empirical models and mostly limit themselves to direct illumination. In contrast, physically-based rendering aims to synthesize images by simulating light propagation. To this end, these methods consider how light quanta are emitted from light sources and interact with the environment before impinging on a camera’s image plane. As a direct consequence, physically-based rendering additionally provides indirect illumination effects, which have a high impact on perceived realism. One such method, ray tracing [12], simulates light rays in reverse order: incoming radiance is integrated for each pixel by following rays that are emitted from the camera. These rays hit objects in the scene and interact with them based on the physical simulation of illumination phenomena such as reflection, refraction, and shadowing. From these hit points, incoming radiance is again integrated, and rays are traced repeatedly until they eventually reach a light source (or exit the scene).

Accurately accounting for imaging physics can result in rendered images that are indiscernible from real ones. However, integrating the incoming radiance for each pixel is computationally expensive and hardly achievable in real time. Hardware such as the Nvidia GeForce RTX has made a big step towards real-time ray tracing by incorporating deep learning technology that drastically reduces the required computations. This is achieved by aggressively limiting light-scene interactions; the resulting artifacts are masked with machine learning-based post-processing.

The increase in compute capabilities of graphics processing units (GPUs) and advances in rendering algorithms have fueled the recent interest in adopting real-time physically-based rendering in daily applications. Unfortunately, these advances do not translate well to applications on embedded devices, because (1) GPU hardware cannot easily be miniaturized and integrated, and (2) remote computation and streaming are not necessarily desirable (particularly in the medical context). In the remainder of this manuscript, we describe methods that aim at bringing real-time physically-based rendering to embedded devices.

Related Work. We limit our non-exhaustive review of related work to plenoptic functions (light fields), and physically-based rendering on embedded devices.

The Plenoptic Function. Light transport in a static 3D scene can be expressed as tracing the set of all possible rays; rays are defined by their origin \((x,y,z) \in \mathbb {R}^3\) and direction \((\theta , \phi ) \in [0, \pi ]\times [0, 2\pi ]\), yielding five degrees of freedom (DoFs). This description is referred to as the light field or plenoptic function. Hardware limitations led to precomputing a subset of the plenoptic function in a domain of interest rather than simulating light-scene interaction on the fly. Image synthesis is then performed by sampling and interpolating the precomputed results. Such approaches are referred to as image-based rendering [10] and greatly reduce the computation needed at runtime.

Among the most well-known representatives of such approaches is the LumiGraph [3], which reduces the five DoFs of the plenoptic function to four. The LumiGraph is based on the assumption that the medium surrounding an object of interest is transparent (radiance is constant along the ray), and therefore, the plenoptic function can be parameterized in terms of a bounding surface, namely a cube. By heavily constraining possible camera-object arrangements, this surface can be further reduced by only considering two opposite sides of the cube, i.e. two planes. Two point sets \(\text {P}_o\) and \(\text {P}_d\) discretize the first and second plane, respectively. The set of precomputed rays can then be determined by connecting every point \(p_o \in \text {P}_o\) with each point \(p_d \in \text {P}_d\). This arrangement may lead to artifacts [1] due to a non-uniform sampling of the light field.

Camahort et al. [1] examined sampling on a sphere to provide more uniform light fields. They perform a binning approach based on a Bresenham-style discretization of the spherical surface which, in addition to not being perfectly uniform, has the major drawback that the runtime complexity of radiance queries depends on the number of bins.

Physically-Based Rendering on Embedded Devices. A patent [14] out of Siemens Healthineers is one of the closest works we are aware of that aims to achieve physically-based rendering on embedded devices. Their method is similar to ours in that it partly front-loads computations to accelerate image generation. However, the application still seems to depend on ray casting at runtime, which is, in its own right, a demanding task for today’s embedded devices, including head-mounted displays [4].

Contributions. In summary, our contributions are:

  • An algorithm that produces physically-based rendering-like results in real time on embedded devices based on uniformly sampled light fields; to the best of our knowledge, it is the first algorithm to do so.

  • A new 2D plenoptic function representation using two Spherical Fibonacci point sets [9], which are sampled uniformly and at arbitrary sampling sizes, providing the flexibility to tailor memory requirements to any embedded device.

  • Fast neighborhood queries of our domain using an extended version of Keinert’s inverse Fibonacci mapping [7]. The query has constant time complexity per pixel, does not require additional query structures, and is decoupled from the light field’s discretization granularity; the runtime depends only on the fixed number of neighbors queried during color interpolation.

  • An effective machine learning-based post-processing filter, together with its evaluation, designed for execution on embedded devices, which increasingly incorporate inference acceleration.

2 Method

In order to allow for physically-based rendering on embedded devices, our prototype consists of a two-step algorithm. First, we compute all values of a reformulated plenoptic function and save the outcome as a texture, which trades off hardware resources for rendering quality (see Fig. 1). Second, we transform the computationally expensive rendering task into a fast data query and interpolation task using this new representation (see Fig. 2). Additionally, we present a neural network that performs post-rendering correction in order to resolve artifacts and substantially enhance image quality.

LumiPath-Based Rendering. The parameterization of the plenoptic function \(L(x,y,z, \theta , \phi )\) implies that we consider our scene as static. Further, we assume that the medium outside of a bounding sphere S which encapsulates our domain of interest (DOI) is totally transparent. Thus, radiance along a ray remains constant and consequently, the radiance emitted from the DOI is equal to the radiance at the intersection point of a ray with S. We reparameterize and discretize the plenoptic function L according to the surface of S (with radius R and origin O) and hence reduce the domain of L to our DOI.

Fig. 1. Computing \(\hat{L}(i,j)\) and filling the texture. The surface of the bounding sphere S is discretized by the two point sets \(\text {P}^M_{o}\) and \(\text {P}^N_{d}\). Rays are traced from each \(p^o_i\) to each \(p^d_j\), resulting in a re-parameterization and discretization of the plenoptic function, referred to as \(\hat{L}(i,j)\). The value of \(\hat{L}(i,j)\) is written to a 2D texture at position (i, j).

For uniform discretization of the surface of S, we use Spherical Fibonacci (SF) point sets \(\text {P}^n_{SF}\) [9]. A point \(p_i\) of an SF point set \(\text {P}^n_{SF}\) is given by

$$\begin{aligned} p_i = C(\phi _i, \cos ^{-1}(z_i)), \text {with}\; \phi _i = 2\pi \left[ \frac{i}{\varPhi }\right] , z_i = 1 - \frac{2i + 1}{n}, \end{aligned}$$

where \(\left[ x\right] \) is the fractional part of \(x: \left[ x\right] = x - \lfloor x \rfloor \), C is the conversion of unit vectors from polar to Cartesian coordinates, \(C(\phi , \theta ) = (x,y,z)^T = (\cos (\phi ) \sin (\theta ), \sin (\phi ) \sin (\theta ), \cos (\theta ))^T\), and \(i \in \left\{ 0, \dots , n-1\right\} \). We use two SF point sets \(\text {P}^M_{o}\) and \(\text {P}^N_{d}\), where M is the number of ray origins o and N is the number of directions d. The set of all rays \(\text {R}^K\) is obtained by connecting every point of \(\text {P}^M_{o}\) with every point of \(\text {P}^N_{d}\), yielding a cardinality of \(M \times N\) for \(\text {R}^K\). A ray \(r_k \in \text {R}^K\) acts as camera ray for the path tracing and is given by

$$\begin{aligned} r_k = (O + Rp^o_i) + t \hat{d}_{i,j}, \text {with } d_{i,j} = p^d_j - p^o_i, i \in \left\{ 0, \dots , M\!-\!1\right\} , j \in \left\{ 0, \dots , N\!-\!1\right\} . \end{aligned}$$
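For concreteness, the following minimal NumPy sketch generates the two SF point sets from the equations above and fills the texture by connecting every origin point with every direction point. The function trace_radiance is a hypothetical placeholder for the offline path tracer; all names and parameters here are ours, not the paper's.

```python
import numpy as np

PHI = (1.0 + np.sqrt(5.0)) / 2.0  # golden ratio

def spherical_fibonacci(n):
    """Return an (n, 3) array of Spherical Fibonacci points on the unit sphere:
    phi_i = 2*pi*frac(i/Phi), z_i = 1 - (2i + 1)/n."""
    i = np.arange(n)
    phi = 2.0 * np.pi * np.modf(i / PHI)[0]        # fractional part [i/Phi]
    z = 1.0 - (2.0 * i + 1.0) / n                  # z_i = cos(theta_i)
    sin_theta = np.sqrt(np.maximum(0.0, 1.0 - z * z))
    return np.stack([np.cos(phi) * sin_theta, np.sin(phi) * sin_theta, z], axis=-1)

def precompute_texture(M, N, R, O, trace_radiance):
    """Fill the M x N texture holding the discretized plenoptic function L_hat(i, j).
    `trace_radiance(origin, direction) -> RGB` stands in for the offline path tracer."""
    P_o = spherical_fibonacci(M)                   # ray origin points p_o_i
    P_d = spherical_fibonacci(N)                   # ray "direction" points p_d_j
    texture = np.zeros((M, N, 3), dtype=np.float32)
    for i in range(M):
        origin = O + R * P_o[i]                    # O + R * p_o_i
        for j in range(N):
            d = P_d[j] - P_o[i]                    # d_ij = p_d_j - p_o_i
            texture[i, j] = trace_radiance(origin, d / np.linalg.norm(d))
    return texture
```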

We use a conventional path tracer comparable to [11]. During tracing, ray origins are uniformly jittered on a disk to substantially reduce rendering noise at the cost of additional blur in the result. The captured radiance for each \(r_k\) is stored in a two-dimensional texture. Each dimension of the texture corresponds to one of the point sets \(\text {P}^M_{o}\) and \(\text {P}^N_{d}\); thus, the indices i and j not only identify a point given by the Fibonacci sequence but also the texel coordinates for memory accesses. Therefore, our reparameterized, discrete form of the plenoptic function is in fact 2D (parameterized by the two point-set indices), denoted by \(\hat{L}(i,j)\).

Fig. 2. Process of image synthesis by sampling \(\hat{L}(i,j)\). A rasterization ray yields two hit points \(h_o\) and \(h_d\) (white dots). In the case of nearest-neighbor sampling, \(h_o\) and \(h_d\) are mapped to their nearest neighbors \(p^o_i\) and \(p^d_j\) in the point sets \(\text {P}_o^M\) and \(\text {P}_d^N\). Their indices (i, j) are used as coordinates to fetch a texel from the texture, thus sampling \(\hat{L}(i,j)\). The mapping is performed for each pixel of the displayed image.

Fig. 3. A human head phantom without overlay (a, c) and augmented with representative LumiPath-based renderings without post-processing (b, d).

To synthesize images on the embedded device, we retrieve the precomputed physically-based rendering result from \(\hat{L}(i,j)\) during the rasterization of simple sphere geometry, which reduces rendering to a texturing process (see Fig. 2). For each rasterization ray that hits the sphere S, we find two hit points, \(h_o\) and \(h_d\), for the front and back face, respectively (discarding tangential rays). Given a point h on the sphere S, we use Keinert’s inverse mapping [7] to find the nearest neighbor in an SF point set P in constant time. Hence, sampling of \(\hat{L}\) queries the nearest neighbor of \(h_o\) from \(\text {P}_o^M\) and of \(h_d\) from \(\text {P}_d^N\), denoted by i and j, and retrieves the sampled value \(\hat{L}(i, j)\) from our plenoptic function.
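For illustration, a minimal NumPy sketch of this lookup is given below. The nearest-neighbor search is written as a brute-force argmin purely for readability; the actual method replaces it with Keinert's constant-time inverse Fibonacci mapping [7], which is not reproduced here.

```python
import numpy as np

def nearest_sf_index(h, points):
    """Index of the SF point closest to the unit-sphere hit point h.
    Brute-force stand-in; the paper uses the constant-time inverse mapping of [7]."""
    return int(np.argmin(np.sum((points - h) ** 2, axis=1)))

def sample_lumipath_nearest(h_o, h_d, P_o, P_d, texture):
    """Nearest-neighbor sampling of L_hat(i, j) for one rasterization ray.
    h_o and h_d are the front/back hit points with S, normalized as (h - O) / R."""
    i = nearest_sf_index(h_o, P_o)
    j = nearest_sf_index(h_d, P_d)
    return texture[i, j]
```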

Unfortunately, nearest-neighbor sampling yields images that are piece-wise constant and thus unpleasant in appearance. We therefore modify the query to return up to nine neighbors of a point h instead. We observe that considering five neighbors for each of the points \(h_o\) and \(h_d\) leads to sufficient results, yielding 25 samples of \(\hat{L}\) per pixel of the displayed image. Neighbors are weighted by their distance to the original hit points via a filter kernel of size \(R\root 4 \of {5}\sqrt{\frac{4\pi }{\sqrt{5}N}}\), where N is the size of the SF point set. As the inverse mapping has constant time complexity and the number of samples is fixed, our image generation algorithm has constant time complexity per pixel. Figure 3 shows representative images obtained with this method.
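The interpolation step can be sketched as follows. The exact kernel shape and the way the per-side weights are combined are not specified above, so a tent (linear) falloff and a pairwise product of origin-side and direction-side weights are assumed here purely for illustration.

```python
import numpy as np

def kernel_radius(R, n):
    """Filter kernel size R * 5^(1/4) * sqrt(4*pi / (sqrt(5) * n)); this simplifies to
    R * sqrt(4*pi / n), roughly the spacing of n uniform samples on a sphere of radius R."""
    return R * 5.0 ** 0.25 * np.sqrt(4.0 * np.pi / (np.sqrt(5.0) * n))

def tent_weights(distances, radius):
    """Distance-based weights for neighbor samples (assumed tent kernel), normalized to sum to 1."""
    w = np.maximum(0.0, 1.0 - np.asarray(distances, dtype=float) / radius)
    s = w.sum()
    return w / s if s > 0.0 else np.full_like(w, 1.0 / w.size)

def interpolate(texture, idx_o, w_o, idx_d, w_d):
    """Combine 5 origin-side and 5 direction-side neighbors into 25 weighted texture samples."""
    color = np.zeros(3)
    for i, wi in zip(idx_o, w_o):
        for j, wj in zip(idx_d, w_d):
            color += wi * wj * texture[i, j]
    return color
```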

Fig. 4. Sample rendering results for four views of our test objects (segmented surfaces from CT data and a skull model [2]). Top: LumiPath-based without post-processing (pp). Middle: LumiPath-based with pp. Bottom: conventionally path traced.

Generative Adversarial Network-Based Post-Processing. Analyzing the higher-frequency components of a LumiPath-based image clearly reveals a deterministic pattern of artifacts, as shown in Fig. 4. A known method to improve image-based renderings is a viewpoint or parallax correction, which takes into account the distance of a hit point to the rendered surface [3]; however, this is non-trivial in use cases such as volume rendering, where the depth of hit points is ill-defined.

We use non-linear filtering in the form of a generative adversarial network (GAN) to improve image quality. Our network structure is adapted from [5]. We use a 3-layer U-Net as generator. The generative loss is the weighted sum of the Structural Similarity Index (SSIM) and L2 losses to encourage smooth structural and color reconstruction. The discriminator is made of 7 blocks of 2D convolution followed by ReLU and dropout. We modify the discriminator’s last layer to be average pooling, which emphasizes local patterns such as those observed in our artifacts. We use the relativistic GAN, which assigns a confidence value to whether a sample is fake or real, with a mean-squared loss to speed up convergence [6]. The renderings of our dataset were generated from alternating viewpoints with uniformly sampled distances and view angles within reasonable ranges, facing the object of interest. The renderings were randomly distributed into training (2220 image pairs), validation (204 image pairs), and test (195 image pairs) sets. Areas that LumiPath did not cover were masked out in the reference image during training to prevent the network from hallucinating missing image parts and to make it concentrate on local artifact patterns instead.
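As an illustration of the adversarial objective, a minimal PyTorch sketch of the relativistic average least-squares formulation [6] is given below. Variable names are ours, and the generator's full loss additionally includes the weighted SSIM and L2 reconstruction terms described above, which are not shown here.

```python
import torch

def ra_lsgan_losses(d_real, d_fake):
    """Relativistic average least-squares GAN losses.

    d_real / d_fake: raw discriminator outputs for reference and generated images.
    When optimizing the discriminator, d_fake should come from a detached generator output.
    Returns (discriminator_loss, generator_loss)."""
    rel_real = d_real - d_fake.mean()   # how much more "real" the references look than the fakes
    rel_fake = d_fake - d_real.mean()   # how much more "real" the fakes look than the references
    d_loss = ((rel_real - 1.0) ** 2).mean() + ((rel_fake + 1.0) ** 2).mean()
    g_loss = ((rel_fake - 1.0) ** 2).mean() + ((rel_real + 1.0) ** 2).mean()
    return d_loss, g_loss

# Example with random stand-in logits for a batch of 8 image patches.
d_loss, g_loss = ra_lsgan_losses(torch.randn(8, 1), torch.randn(8, 1))
```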

3 Results

Figure 4 shows representative images that were synthesized using a conventional path tracer and our LumiPath plenoptic function with and without learning-based post-processing. We evaluated our rendering results with respect to the conventionally path traced image (1024 samples per pixel) using quantitative image quality metrics, namely the SSIM and the Complex Wavelet SSIM (CW-SSIM) [13], which are commonly used for image quality assessment. In contrast to the SSIM, the CW-SSIM performs a complex wavelet transform of the image to a steerable pyramid (with 8 levels in total) prior to the analysis of contrast, structure and luminance [13]. Therefore, the CW-SSIM is especially interesting as image-quality trade-offs for our LumiPath-based renderings are most prominent in higher frequency domains, e.g. along edges and specular highlights (see Fig. 4). Considering ten representative views, we obtain average SSIM values of 0.972/0.975 and a CW-SSIM of 0.997/0.998 with/without post-processing.
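As a pointer for reproduction, an SSIM comparison against a path-traced reference could be computed as sketched below with scikit-image (file names are hypothetical); the CW-SSIM requires a separate steerable-pyramid implementation and is not sketched here.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.io import imread
from skimage.metrics import structural_similarity

# Load a LumiPath-based rendering and the corresponding path-traced reference (assumed RGB PNGs).
rendered = rgb2gray(imread("lumipath_view.png")[..., :3] / 255.0)
reference = rgb2gray(imread("path_traced_view.png")[..., :3] / 255.0)

print("SSIM:", structural_similarity(rendered, reference, data_range=1.0))
```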

The cardinalities of the point sets \(\text {P}^M_{o}\) and \(\text {P}^N_{d}\) were set to \(M = 12288\) and \(N = 24576\), with points of \(\text {P}^M_{o}\) limited to the upper hemisphere. Consequently, the precomputed texture had a size of 576 MB (\(0.5\times 12288\times 24576\times 4\) bytes). We evaluated the performance of our (naïve) prototype on the Microsoft HoloLens v1. We measured framerates of \(\approx \)5.2 to 9.1 fps (\(\approx \)14 to 15 fps in the emulator) for both eyes in total and views similar to the ones shown in Fig. 3. In comparison, rendering one frame of our ground truth as shown in Fig. 4 took about 3 min on an Nvidia GTX 980M. Our path tracing framework is based on the Nvidia OptiX engine.

4 Discussion and Conclusion

We present first steps towards a physically-based rendering pipeline on embedded devices that show promising results. The proposed method achieves \(\approx \)7.5 fps for both views on a HoloLens v1. Additional work on code optimization and on enabling GPU use will further improve performance.

As our prototype is currently designed in two disjoint parts, rendering is limited to static objects. If conditions change, the light field has to be recomputed. Further, our current prototype visualizes surfaces rather than volumes. We will investigate how our approach translates to volume rendering applications, which we ultimately consider the most valuable use case for our method.

While both the rendering and the network run on the HoloLens, limited memory restricts them to running sequentially, with network execution not yet real-time capable. This can be mitigated by further code optimization; moreover, as embedded devices become more powerful, their increased memory bandwidth and dedicated tensor processing units will enable more concurrency. We show that, by using a generative network to perform non-linear filtering, we can remove artifacts introduced by our interpolation method. Our renderings were based on a light field below 600 MB, which seems appropriate for today’s embedded devices. Additionally, during the quantitative comparison to the path-traced ground truth, we observe that our network implicitly denoises our renderings, which further enhances the perceived quality.

Future work on post-processing may explore network architectures that directly sample from our plenoptic function to synthesize the desired image and can be trained end-to-end. Such an approach would be appealing since the ray origin and direction could be taken into account during inference, which is not possible with the post-rendering correction described here, as it operates on already interpolated color values. In addition, a network that is aware of our light field structure might help to further reduce the number of texture accesses, which we identified as one of the biggest bottlenecks in our current prototype (texture reads made up \(\approx \)33.3% of the frame time).

While we currently use machine learning-based post-rendering corrections, other approaches can also be incorporated into our pipeline to improve image quality. A promising direction is the investigation of non-uniform sampling patterns on the Fibonacci spheres, e.g. based on specific object properties. However, adaptive sampling of Fibonacci spheres is complex and requires sophisticated handling of boundaries.

In summary, we understand our results as promising, yet preliminary, evidence that our LumiPath algorithm can achieve reasonable real-time physically-based rendering results on untethered, compute-limited devices such as optical see-through head-mounted displays (OST HMDs). Finally, these developments may prove useful for light field displays, which bring dynamic focal lengths to OST HMDs and address the vergence-accommodation conflict [8].