1 Introduction

Subsurface scattering of light is a physical phenomenon that occurs in translucent materials. Milk, honey, skin, marble, and candle wax are just a few examples of translucent materials. It is possible to produce the qualitative appearance of translucency using interactive volume rendering techniques [32], but such techniques are not quantitatively accurate. With the advent of analytical models for subsurface scattering [26], it became feasible to build more accurate techniques for interactive rendering of translucent objects. The first technique of this kind [33], and more recent ones that also work for deformable objects (see Sect. 2), consider diffuse subsurface scattering only. In practice, this means that subsurface scattering is computed by evaluating an integral over the object surface of an analytic dipole model [26] that only depends on the distance between the points of incidence and emergence. Single scattering and other dependencies of the subsurface scattering on the direction of the incident light are neglected. Recent work in offline rendering however shows that the directional effects are not negligible [10, 14, 20, 50].

We present an interactive technique that supports directional subsurface scattering without relying on precomputation or a grid for volumetric light propagation. To the best of our knowledge, our method is the first of its kind. Since the method does not rely on texture parameterization, it works for deformable and even procedurally generated geometry.

Due to reciprocity of light transport, we would ideally treat the directions of incident and emergent light equally. This is, however, too costly for an interactive technique. To achieve interactivity, we need caching of subsurface scattering computations. Existing techniques typically cache transmitted irradiance [25, 33] (total incoming light in a surface point) and use a precomputed filter to evaluate the subsurface scattering [5, 29, 33]. These techniques require that the subsurface scattering depends on distance only, whereas we need to use the direction of the incoming light. To cache another quantity, we note that subsurface scattering partly diffuses the light even if the incident light and the scattering are highly directional. Every ray of incoming light gives rise to a (non-diffuse) lobe of emergent light at all surface points. When these lobes are added up, the emergent light is in practice nearly diffuse. We therefore store scattered radiosity (outgoing light) instead of transmitted irradiance. Some of the directional subsurface scattering models also neglect the dependency on the direction of emergence but still achieve improved accuracy [14, 20, 50]. Out of these, we can directly use the ones that do not rely on precomputation [14, 20].

In some existing techniques [4, 33, 36, 42], scattered radiosity is stored per vertex. To accommodate more detailed directional effects, we use more detailed maps of the scattered radiosity. We obtain these maps without requiring texture parameterization of the translucent object by rendering the object from multiple views using orthographic cameras. For each of these views, we compute a map of scattered radiosity. We can then efficiently render the translucent object from any view by look-ups into the scattered radiosity maps.

The scattered radiosity maps have two other important advantages. As long as the light source and the object are stationary, we can blend scattered radiosity maps and thereby progressively improve the rendering. Moreover, we can compute the transport of emergent light to the surrounding scene [40, 42]. To include these light paths while keeping the translucent object deformable, we generate a distribution of virtual point lights on the surface of the translucent object and set their intensity according to the scattered radiosity. These virtual point lights enable us to render the transported light using a many-light method [8]. Since we include transport of emergent light, our method is very useful for interactive rendering of scenes with the light source hidden behind a translucent object. Indirect illumination of a scene by light that has scattered through candle wax is one use case (Fig. 1). Another interesting example is light scattering through translucent lamp shades or light bulbs. To the best of our knowledge, we present the first interactive technique for transport of light emerging from deformable translucent objects.

Fig. 1

Deforming translucent candle rendered interactively as with existing techniques (left block), with our transport of emergent light (middle block), and including directional subsurface scattering (right block). Our method is the first to support interactive rendering of the results in the right block (6 frames per second). For this scene, we use 28 scattered radiosity maps, 45 samples per direction, and 80 virtual point lights

2 Related work

One way to obtain interactive subsurface scattering is by means of precomputation. Several early techniques rely on precomputed scattering factors that enable subsurface light transport between surface patches or from patch to vertex [4, 23, 24, 33]. These factors resemble form factors in radiosity algorithms and specify transport of transmitted irradiance to scattered radiosity. An extension of these radiosity-like techniques is to include transport of emergent light [42]. Other work is based on precomputed radiance transfer [43, 46, 47, 49], and some of this includes directional effects such as single scattering in the rendered result [43, 47, 49]. Another approach is to precompute a grid that can be used with a fast diffusion computation to render subsurface scattering in real-time [45, 48]. In contrast to our work, none of these precomputation-based methods can interactively render deformable translucent objects.

Some finite element methods are fast enough to enable interactive rendering of deformable translucent objects [34, 36]. However, as these methods rely on diffuse incoming light (transmitted irradiance) and a multi-resolution mesh (triangular or tetrahedral), they are not easily adapted for directional subsurface scattering and would typically require some mesh preprocessing.

Volume rendering techniques can quite convincingly produce the qualitative appearance of translucency at high frame rates [2, 3, 13, 32]. While such methods are inspired by the volume rendering equation [30], they only provide a rather rough approximation of its solution. In addition, the accuracy of the subsurface scattering is limited by the resolution of the volume or the grid. Some of the more advanced methods [3, 13] also propagate light using low-order spherical harmonics that effectively diffuse the subsurface scattering contribution. Other techniques, which are based on separable filtering and a depth map, also achieve real-time subsurface scattering by aiming at the qualitative appearance and sacrificing quantitative accuracy [18, 19].

Fast filtering techniques can be constructed so that they approximate diffuse subsurface scattering more accurately [5, 12, 21, 29]. The filtering is done in texture space and thus requires texture parameterization of the object surface. To avoid texture space problems, similar filtering techniques are available for light space [9] and screen space [27–29, 35]. The performance of all these filtering techniques, however, depends heavily on the assumption that the subsurface scattering is diffuse so that the convolution kernel is only a function of the distance between the points of incidence and emergence. Our work uses light space sampling [9], but removes the assumption that subsurface scattering is diffuse. If we were to remove this assumption from texture or screen space filtering techniques and adapt them for directional subsurface scattering, they would become texture space or screen space variations of the technique that we propose. The former variation would require texture parameterization of the object surface; the latter would be view dependent.

Another interesting approach to interactive rendering of deformable translucent objects is based on splatting [6, 41]. In this approach, surface points seen from the light source are splatted as screen-aligned quads. These splats contribute according to the subsurface scattering model where they overlap surface points in the geometry buffer of the camera. On first inspection, this seems an ideal approach for interactive rendering of directional subsurface scattering. However, the directional model requires larger splats as it varies not only with distance, and it is more expensive to evaluate as tabulation is impractical. We, therefore, found the splatting approach too expensive.

3 Method

We render translucent objects using a bidirectional scattering-surface reflectance distribution function (BSSRDF). In most BSSRDFs, a translucent material is defined by the following spectral optical properties: refractive index \(\eta \), absorption coefficient \(\sigma _a\), scattering coefficient \(\sigma _s\), and asymmetry parameter g. As is common in graphics, we use trichromatic optical properties (rgb). In addition, the BSSRDF depends on the position \({\varvec{x}}_i\) and the direction \(\vec {\omega }_i\) of the incident light as well as the position \({\varvec{x}}_o\) and the direction \(\vec {\omega }_o\) of the emergent light. The configuration is illustrated in Fig. 2. When rendering a translucent object, we obtain the outgoing radiance \(L_o\) by evaluating the following integral over all \({\varvec{x}}_i\) in the surface area A and over all \(\vec {\omega }_i\) in the hemisphere around the surface normal \(\vec {n}_i\) at \({\varvec{x}}_i\) [26]:

$$\begin{aligned} L_o\left( {\varvec{x}}_o,\vec {\omega }_o\right)= & {} L_e\left( {\varvec{x}}_o,\vec {\omega }_o\right) + \int _{A}\int _{2\pi } S\left( {\varvec{x}}_i, \vec {\omega }_i; {\varvec{x}}_o, \vec {\omega }_o\right) \nonumber \\&\times L_i\left( {\varvec{x}}_i,\vec {\omega }_i\right) \cos \theta _i \, {\mathrm {d}}\omega _i \, {\mathrm {d}}{A}_i , \end{aligned}$$
(1)

where \(\cos \theta _i = \vec {\omega }_i\cdot \vec {n}_i\), \(L_i\) is incident radiance, \(L_e\) is emitted radiance, and S is a BSSRDF. Disregarding surface reflection, as this can be incorporated using well-known techniques, the analytical BSSRDF can be written in the form:

$$\begin{aligned}&S\left( {\varvec{x}}_i, \vec {\omega }_i; {\varvec{x}}_o, \vec {\omega }_o\right) \nonumber \\&\quad = F_t\left( \vec {\omega }_o\right) \left( S_d\left( {\varvec{x}}_i, \vec {\omega }_i; {\varvec{x}}_o\right) + S^{*}\right) F_t\left( \vec {\omega }_i\right) , \end{aligned}$$
(2)

where \(F_t\) is Fresnel transmittance, \(S_d\) is the diffusive part, which is typically modeled by a dipole, and \(S^{*}\) (dependencies omitted) is the remaining light transport, that is, the part not included with \(S_d\).

Fig. 2

BSSRDF configuration on an object surface A. The diagram illustrates the notation we use: bold font as in \({\varvec{x}}_o\) denotes a point, while arrow overline as in \(\vec {\omega }_i\) denotes a normalized direction vector

As in other interactive subsurface scattering techniques that are not based on precomputation, we now assume that \(S^{*}\) is insignificant. For most BSSRDF models [11, 20, 26], this means that single scattering is excluded entirely. However, if we use the directional dipole model [14], most single scattering is included with \(S_d\). We therefore get a more accurate result with this model as the neglected \(S^{*}\) contains a significantly smaller part of the scattered light.

In existing interactive techniques, it is common practice to move the BSSRDF outside the integration over directions of incidence \(\vec {\omega }_i\) (in Eq. (1)) and define transmitted irradiance by [5, 6, 9, 29, 33–36, 41]

$$\begin{aligned} E({\varvec{x}}_i) = \int _{2\pi } L_i\left( {\varvec{x}}_i,\vec {\omega }_i\right) F_t(\vec {\omega }_i) \cos \theta _i \, {\mathrm {d}}\omega _i . \end{aligned}$$
(3)

We would however like to support BSSRDFs that include directional effects [14, 20]. Since such BSSRDFs depend on \(\vec {\omega }_i\), we cannot perform this separation, but we can define scattered radiosity by

$$\begin{aligned} B({\varvec{x}}_o)= & {} \pi \int _{A}\int _{2\pi } S_d({\varvec{x}}_i, \vec {\omega }_i; {\varvec{x}}_o) L_i({\varvec{x}}_i,\vec {\omega }_i) \nonumber \\&\times F_t(\vec {\omega }_i) \cos \theta _i \, {\mathrm {d}}\omega _i \, {\mathrm {d}}{A}_i . \end{aligned}$$
(4)

This is an important quantity as the rendering equation (1) becomes

$$\begin{aligned} L_o({\varvec{x}}_o,\vec {\omega }_o) = L_e({\varvec{x}}_o,\vec {\omega }_o) + \frac{1}{\pi } F_t(\vec {\omega }_o) B({\varvec{x}}_o) , \end{aligned}$$
(5)

which enables view-independent rendering of translucent objects if we store scattered radiosity B. We note that \(L_o\) is not fully view independent because of the Fresnel term \(F_t\), but this term is inexpensive to evaluate per pixel in every frame.

For simplicity, we initially assume a scene consisting of a single object illuminated by a single directional light. In Sect. 3.3, we extend the method to point lights, and in Sect. 4, we show an example of using multiple lights. For surface points lit by a directional light with radiance \(L_{\ell }\) and direction \(\vec {\omega }_{\ell }\), we have

$$\begin{aligned} L_i({\varvec{x}}_i,\vec {\omega }_i) = L_{\ell } \, V({\varvec{x}}_i, -\vec {\omega }_{\ell }) \, \delta (\vec {\omega }_i + \vec {\omega }_{\ell }) , \end{aligned}$$
(6)

where V is visibility and \(\delta \) is a Dirac delta function that makes the inner integral disappear, yielding

$$\begin{aligned} B({\varvec{x}}_o) = \pi L_{\ell } \int _{A_{{\mathrm {lit}}}} S_d({\varvec{x}}_i, -\vec {\omega }_{\ell }; {\varvec{x}}_o) F_t(-\vec {\omega }_{\ell }) \cos \theta _{\ell } \, {\mathrm {d}}{A_i} , \end{aligned}$$
(7)

where \(\cos \theta _{\ell } = -\vec {\omega }_{\ell }\cdot \vec {n}_i\) and \(A_{{\mathrm {lit}}}\) is the directly lit area of the surface (for unlit areas \(L_i = V = 0\)). Since we only need to integrate over the directly lit part of the surface area, we perform the integration in a geometry buffer (G-buffer) rendered from the point of view of the light source (a translucent shadow map [9]). Since we have a directional light, our G-buffer is an orthographic projection of the scene into the light’s view plane, which has \(\vec {\omega }_{\ell }\) as its normal.

In order to distribute samples in the G-buffer according to a distance r and an angle \(\alpha \), we assume a planar surface normal to the light direction and rewrite the integral in polar coordinates with origin \({\varvec{x}}_o\):

$$\begin{aligned} B({\varvec{x}}_o)= & {} \pi L_{\ell } \int _{0}^{2\pi }\int _{0}^{\infty } S_d({\varvec{x}}_i, -\vec {\omega }_{\ell }; {\varvec{x}}_o) \nonumber \\&\times F_t(-\vec {\omega }_{\ell }) \cos \theta _{\ell }\, r \, {\mathrm {d}}{r} \, {\mathrm {d}}\alpha , \end{aligned}$$
(8)

where \(r = \Vert {\varvec{x}}_o - {\varvec{x}}_i\Vert \) and \(\alpha \) is the angle between \({\varvec{x}}_o - {\varvec{x}}_i\) and the first basis vector of the light’s view plane. This assumption is clearly often violated, but it is commonly used in derivation of BSSRDF models [11, 26].

We evaluate the integral in Eq. (8) by Monte Carlo integration. Our estimator for scattered radiosity is

$$\begin{aligned} B_N({\varvec{x}}_o) = \frac{\pi L_{\ell }}{N}\sum _{j = 1}^N \frac{S_d\left( {\varvec{x}}_i, -\vec {\omega }_{\ell }; {\varvec{x}}_o\right) F_t(-\vec {\omega }_{\ell }) \cos \theta _{\ell }\, r_j}{p(r_j, \alpha _j)} , \end{aligned}$$
(9)

where \(p(r, \alpha )\) is the joint probability density function from which we draw the sample pairs \((r_j,\alpha _j)\). Starting from \({\varvec{x}}_o\) transformed to the texture space of the light’s camera, each sample pair corresponds to a texture space offset for looking up \({\varvec{x}}_i\) and \(\vec {n}_i\) in the light’s G-buffer.
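
To make the estimator concrete, the following minimal sketch (Python/NumPy, run on the CPU for illustration) evaluates Eq. (9) for a single shading point and a single color band, using the exponential sampling density introduced in Sect. 3.1. The helpers `Sd` (diffusive BSSRDF term), `Ft` (Fresnel transmittance), and `gbuffer_lookup` (returning \({\varvec{x}}_i\) and \(\vec {n}_i\) for a texture-space position, or None outside the directly lit area) are hypothetical placeholders, not part of our implementation.

```python
import numpy as np

def estimate_B(xo, xo_tex, omega_l, L_l, sigma_tr, N,
               Sd, Ft, gbuffer_lookup, world_to_tex=1.0, rng=None):
    """Monte Carlo estimator of Eq. (9) for one point x_o and one color band.

    world_to_tex converts the world-space sample radius r to a texture-space
    offset in the light's G-buffer (depends on the light camera's extent).
    """
    rng = rng or np.random.default_rng()
    acc = 0.0
    for _ in range(N):
        xi1, xi2 = rng.random(2)
        r = -np.log(1.0 - xi1) / sigma_tr        # Eq. (12); 1 - xi avoids log(0)
        alpha = 2.0 * np.pi * xi2
        # Texture-space offset from x_o in the light's view plane
        uv = xo_tex + world_to_tex * r * np.array([np.cos(alpha), np.sin(alpha)])
        hit = gbuffer_lookup(uv)
        if hit is None:                          # sample fell outside A_lit
            continue
        xi_pos, ni = hit
        cos_theta = max(np.dot(-omega_l, ni), 0.0)
        p = sigma_tr * np.exp(-sigma_tr * r) / (2.0 * np.pi)   # Eq. (11)
        acc += Sd(xi_pos, -omega_l, xo) * Ft(-omega_l) * cos_theta * r / p
    return np.pi * L_l * acc / N
```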

3.1 Sampling distribution

BSSRDFs decay exponentially with the distance \(r = \Vert {\varvec{x}}_o - {\varvec{x}}_i\Vert \). In particular, the asymptotic exponential falloff of the standard and directional dipoles [14, 26] is \(\exp (-\sigma _{{\mathrm {tr}}} d)\), where \(d \rightarrow r\) for \(r \rightarrow \infty \) and \(\sigma _{{\mathrm {tr}}}\) is the effective transport coefficient defined by

$$\begin{aligned} \sigma _{{\mathrm {tr}}} = \sqrt{3 \sigma _a \left( \sigma _a + (1-g) \sigma _s\right) } . \end{aligned}$$
(10)

It is therefore highly beneficial to importance sample according to this exponential decay. We do importance sampling by choosing

$$\begin{aligned} p_{\exp }(r,\alpha ) = p(r) p(\alpha ) = \sigma _{{\mathrm {tr}}} e^{-\sigma _{{\mathrm {tr}}} r} \frac{1}{2 \pi } , \end{aligned}$$
(11)

which is easily sampled by

$$\begin{aligned} (r_j, \alpha _j) = \left( \frac{-\log \xi _1}{\sigma _{{\mathrm {tr}}}}, 2\pi \xi _2\right) . \end{aligned}$$
(12)

The symbols \(\xi _1,\xi _2 \in [0,1]\) denote canonical uniform random variables, which we obtain on the fly using a linear congruential pseudorandom number generator.

It is important to note that the effective transport coefficient \(\sigma _{{\mathrm {tr}}}\) is different for different color bands. As a consequence, we use a separate set of position samples for each color band. In this way, we avoid color shifts, especially for materials with very different scattering coefficients in the different color bands (ketchup, for example).
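
As a small illustration of this per-band sampling, the sketch below first computes \(\sigma _{{\mathrm {tr}}}\) per color band from Eq. (10) and then draws an independent set of \((r_j,\alpha _j)\) pairs for each band using Eq. (12); the optical coefficients are placeholder values, not a measured material.

```python
import numpy as np

def sample_offsets_per_band(sigma_tr_rgb, N, rng=None):
    """Draw N (r, alpha) pairs from p_exp in Eq. (11) for each color band."""
    rng = rng or np.random.default_rng()
    samples = []
    for sigma_tr in sigma_tr_rgb:                     # separate set per band
        r = -np.log(1.0 - rng.random(N)) / sigma_tr   # Eq. (12)
        alpha = 2.0 * np.pi * rng.random(N)
        samples.append(np.stack([r, alpha], axis=-1))
    return samples                                    # list of (N, 2) arrays

# Placeholder trichromatic coefficients (per mm) and Eq. (10)
sigma_a = np.array([0.002, 0.004, 0.007])
sigma_s = np.array([2.2, 2.6, 3.0])
g = 0.0
sigma_tr_rgb = np.sqrt(3.0 * sigma_a * (sigma_a + (1.0 - g) * sigma_s))
offsets_rgb = sample_offsets_per_band(sigma_tr_rgb, N=45)
```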

3.2 Rendering technique

The diffusive part of the standard dipole BSSRDF depends only on \(r = \Vert {\varvec{x}}_o-{\varvec{x}}_i\Vert \) and is, therefore, easily tabulated and used at runtime at nearly no expense. In directional subsurface scattering, on the other hand, the diffusive part of the BSSRDF depends on all of \({\varvec{x}}_o\), \(\vec {n}_o\), \({\varvec{x}}_i\), \(\vec {n}_i\), and \(\vec {\omega }_i\). This means that it is impractical to tabulate and thus expensive to evaluate. To limit the number of times that we need to evaluate the BSSRDF at runtime, we store scattered radiosity in maps and thereby exploit its view independence. In fact, as we noted in Eq. (5), the scattered radiosity does not depend on the view direction \(\vec {\omega }_o\). With view independence, it is convenient to also make the update of the scattered radiosity maps progressive. By doing so, the rendered result improves over time if we are only moving the camera. Our technique is easily made progressive by adding more samples in each frame. This means that we have two render modes: (a) converged translucency with real-time fly-through and (b) fully flexible translucency rendered at interactive frame rates.

Our rendering technique is based on the rasterization pipeline of the graphics processing unit (GPU). In fully flexible mode, we use the three-step multipass algorithm illustrated in Fig. 3. In the first step, we create a G-buffer for each light source. In the second step, we compute scattered radiosity maps using these light G-buffers. In the third step, we sample the scattered radiosity maps and combine the look-ups. If nothing changed except the camera position, we also accumulate radiosity map results with the ones from the previous frames. When convergence is reached, we switch to converged mode and perform the third step only. In the following, we provide the details of the three steps.

Fig. 3

Our three-step multipass technique for interactive rendering of directional subsurface scattering in deformable translucent objects. The scattered radiosity maps enable view-independence and transport of emergent light

In the first step, as in translucent shadow mapping [9], we render a G-buffer from the point of view of the light. For each pixel, we store positions and normals, as well as a material index (for global illumination purposes, Sect. 3.4). Each directional light has an orthographic camera and an associated G-buffer stored in a layered 2D texture. We compute all the light G-buffers in a single rendering pass, where each triangle is fed to each layer of a 2D layered texture in a geometry shader.

In the second step, we render the translucent object from K directions using orthographic cameras. The number of directions is chosen so that the surface of the model is covered well. We place the cameras on the bounding sphere of the object using a quasi-random Halton sequence [22]. We then configure the cameras to look at the center of the bounding sphere with a frustum that encapsulates the sphere. Also in this step, we use layered rendering in order to efficiently render scattered radiosity into the different maps in a single pass. For each fragment of the translucent object observed by an orthographic camera, we compute the scattered radiosity by generating N samples per color band on the fly (Eq. (12)), looking up into the light G-buffers with those samples to get \({\varvec{x}}_i\) and \(\vec {n}_i\), and using those to evaluate Eq. (9). To avoid pattern repetition artifacts, we choose a seed for the random points using the pixel index in the scattered radiosity map as well as the current map and frame numbers.
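
The camera placement can be sketched as follows; the Halton radical inverse, the mapping to the sphere, and the look-at construction are our own illustration of the description above rather than the exact implementation.

```python
import numpy as np

def halton(i, base):
    """Radical inverse of index i in the given base (van der Corput sequence)."""
    f, result = 1.0, 0.0
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result

def map_directions(K):
    """K quasi-random directions on the unit sphere (Halton bases 2 and 3)."""
    dirs = []
    for i in range(1, K + 1):
        u, v = halton(i, 2), halton(i, 3)
        cos_t = 1.0 - 2.0 * u                     # uniform in cos(theta)
        sin_t = np.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = 2.0 * np.pi * v
        dirs.append(np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t]))
    return dirs

def radiosity_map_cameras(center, radius, K):
    """Orthographic cameras on the bounding sphere, all looking at its center."""
    return [(center + radius * d, -d) for d in map_directions(K)]
```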

To progressively update the scattered radiosity maps, we first perform a depth-only pass and then we render the model with writing into the depth buffer disabled. During the second step of the algorithm (except when the lighting is changing or the object is deforming), blending is enabled to allow accumulation in the scattered radiosity maps. We also generate mipmaps for the scattered radiosity maps so that we have the opportunity to apply a cheap low-pass filter that smooths high-frequency noise.

Algorithm 1

In the third and final step, we sample the scattered radiosity maps for each fragment of the translucent object observed by the actual camera. This process is described in the pseudo-code in Algorithm 1. We average the contributions from the various directions with the visibility of the point as a binary weight. In the third step of Fig. 3, the green and the red dots represent the visible and occluded contributions from the point \({\varvec{x}}_o\), respectively. Storing depth with the scattered radiosity maps, we use shadow mapping to obtain a visibility function. To avoid artifacts, we choose a constant shadow bias \(\epsilon _{bias}\) for the visibility function. Moreover, to avoid errors when sampling close to the borders of a scattered radiosity map, we multi-sample the shadow map and introduce an additional bias \(\epsilon _{comb}\) that translates the sample position towards the negative normal direction \(-\vec {n}_o\). After compositing the scattered radiosity B, we obtain outgoing radiance from Eq. (5) and perform tone mapping to finalize the result.
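
As a rough reconstruction of this composition step (a sketch under our own assumptions, not the exact Algorithm 1), the helpers `project` (transforming a world-space point to map k's texture coordinates and depth), `sample_map`, and `sample_depth` stand in for the actual map accesses:

```python
import numpy as np

def compose_B(xo, no, K, project, sample_map, sample_depth,
              eps_bias=1e-3, eps_comb=5e-3):
    """Average B(x_o) over the K scattered radiosity maps (cf. Algorithm 1).

    Visibility of x_o in each map acts as a binary weight: a map contributes
    only where x_o passes a shadow-map style depth test. The sample position
    is shifted by eps_comb along -n_o to reduce errors near map borders.
    """
    B_sum, weight = np.zeros(3), 0
    for k in range(K):
        x_sample = xo - eps_comb * no
        uv, depth = project(k, x_sample)
        if not (0.0 <= uv[0] <= 1.0 and 0.0 <= uv[1] <= 1.0):
            continue
        if depth - eps_bias > sample_depth(k, uv):    # occluded in map k
            continue
        B_sum += sample_map(k, uv)      # filtered (mipmapped) radiosity look-up
        weight += 1
    return B_sum / max(weight, 1)
```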

Considering the procedure described in this section, we can get a better understanding of the parameter N. The total number of Monte Carlo samples used for computing the outgoing radiance (\(L_o\)) in a surface point observed by the camera is 3N times K times the number of frames used for progressive updates. From the point of view of a surface point, N can thus be thought of as the number of samples per frame per map direction per color band.
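
For example, with \(N = 10\), \(K = 16\), and three color bands, an observed point accumulates \(3 \cdot 10 \cdot 16 = 480\) BSSRDF evaluations per frame, and ten progressive frames bring the total to 4800 (illustrative numbers only, not tied to a particular figure).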

3.3 Point lighting

A point light at some distance from the translucent object works much in the same way as a directional light. The light’s camera simply uses perspective instead of orthographic projection and intensity falls off with the distance squared. One particularly important application of our work is however simulation of the light coming through candles, candleholders, and lamp shades (Fig. 1, for example). In these cases, the point light is surrounded by the translucent object and we then use omnidirectional shadow mapping [15] with a cube map G-buffer for the light.

With a cube map captured for a point light at \({\varvec{x}}_{\ell }\), one would first get a sampled point \({\varvec{x}}_j\) by using \((r_j, \alpha _j)\) to offset \({\varvec{x}}_o\) in its tangent plane. A look-up into the cube map with \({\varvec{x}}_j - {\varvec{x}}_{\ell }\) would then provide the sampled \({\varvec{x}}_i\) and \(\vec {n}_i\). However, when observing a translucent object surrounding the light source, this planar sampling of the light’s G-buffer is no longer a good approximation. To have a better approximation that enables sampling of the entire cube map for each \({\varvec{x}}_o\) (instead of only a hemisphere), we use an inverse stereographic projection. With this stereographic correction, the direction used for look-up into the cube map becomes

$$\begin{aligned} {\varvec{x}}_{{\mathrm {stereo}}} - {\varvec{x}}_{\ell } = ({\varvec{x}}_{\ell } - {\varvec{x}}_o) - 2[({\varvec{x}}_{\ell } - {\varvec{x}}_o)\cdot \vec {\ell }\,]\vec {\ell } , \end{aligned}$$
(13)

where

$$\begin{aligned} \vec {\ell } = \frac{({\varvec{x}}_j - {\varvec{x}}_{\ell }) - ({\varvec{x}}_{\ell } - {\varvec{x}}_o)}{\Vert ({\varvec{x}}_j - {\varvec{x}}_{\ell }) - ({\varvec{x}}_{\ell } - {\varvec{x}}_o)\Vert } , \end{aligned}$$
(14)

as illustrated in Fig. 4. The top right image (a) in Fig. 4 is an example of the sampling noise we get if we use \({\varvec{x}}_j - {\varvec{x}}_{\ell }\). The middle right image (b) shows how the stereographic correction alleviates this problem.
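
Equations (13) and (14) amount to reflecting the vector from \({\varvec{x}}_o\) towards the light about the direction \(\vec {\ell }\). A minimal sketch of the corrected look-up direction (the cube-map fetch itself is left abstract) is:

```python
import numpy as np

def stereographic_lookup_dir(xo, xj, xl):
    """Cube-map look-up direction with stereographic correction, Eqs. (13)-(14).

    xo: shaded point, xj: planar sample point in the tangent plane of x_o,
    xl: position of the point light surrounded by the translucent object.
    """
    v = xl - xo                                  # x_l - x_o
    l = (xj - xl) - v
    l = l / np.linalg.norm(l)                    # Eq. (14)
    return v - 2.0 * np.dot(v, l) * l            # x_stereo - x_l, Eq. (13)
```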

Fig. 4

Effect of stereographic correction when a translucent object surrounds a point light. With planar sampling (a), we look-up into the light’s cube map G-buffer using \({\varvec{x}}_j - {\varvec{x}}_{\ell }\). With stereographic correction (b), we use \({\varvec{x}}_{{\mathrm {stereo}}} - {\varvec{x}}_{\ell }\) instead. The insets (a, b) show how the correction improves the final result (torus, potato material)

3.4 Transport of emergent light

We further extend our method to account for transport of emergent light using virtual point lights (VPLs) [31]. We distribute a set of \(N_{{\mathrm {vpl}}}\) points on the surface of the translucent object. Then, for each observed point \({\varvec{x}}_o\), we add the contribution from all VPLs using

$$\begin{aligned} L_o({\varvec{x}}_o, \vec {\omega }_o)= & {} \sum _{v = 1}^{N_{{\mathrm {vpl}}}} f_r({\varvec{x}}_o, -\vec {\omega }_v, \vec {\omega }_o) \nonumber \\&\times G_b({\varvec{x}}_o, {\varvec{x}}_v) V({\varvec{x}}_o,{\varvec{x}}_v) I_v \end{aligned}$$
(15)

with VPL intensity

$$\begin{aligned} I_v= & {} \frac{1}{\pi }F_t\left( \vec {\omega }_v\right) B({\varvec{x}}_v)\, A/N_{{\mathrm {vpl}}} , \end{aligned}$$
(16)

where A is the surface area across which the VPLs were distributed, \(G_b\) is the standard bounded geometry term [8], and B is obtained from the scattered radiosity maps using Algorithm 1.

As in the previous section, we now take special steps to accommodate our key use case of a point light surrounded by a translucent material. Our approach is illustrated in Fig. 5. In this particular case, the scene illuminated by emergent light will most commonly be shadowed from surface points of the translucent object that are directly lit (as the source is surrounded). We, therefore, approximate the visibility term V by distributing VPLs on backlit surfaces only. With this distribution of VPLs, we use the area of the bounding volume of the translucent object as an approximation of A. This is computed for each frame on the CPU.
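
The gathering of the VPL contributions in Eqs. (15) and (16) can be sketched as follows. The sketch assumes a Lambertian BRDF \(f_r = \rho /\pi \), clamps the geometry term at a fixed bound, and sets \(V = 1\) because the VPLs are placed on backlit surfaces only; these simplifications are assumptions of the sketch, not details taken from our implementation.

```python
import numpy as np

def vpl_gather(xo, no, albedo, vpls, A, g_max=10.0):
    """Emergent-light transport at x_o from VPLs on the translucent surface.

    vpls: list of (xv, nv, Ft_wv, B_xv) tuples, where B_xv is the scattered
    radiosity read from the maps (Algorithm 1) and Ft_wv the Fresnel term.
    A: surface area over which the VPLs were distributed.
    """
    N_vpl = len(vpls)
    Lo = np.zeros(3)
    for xv, nv, Ft_wv, B_xv in vpls:
        d = xo - xv
        r2 = np.dot(d, d)
        wv = d / np.sqrt(r2)                              # direction VPL -> x_o
        cos_o = max(np.dot(no, -wv), 0.0)
        cos_v = max(np.dot(nv, wv), 0.0)
        G_b = min(cos_o * cos_v / max(r2, 1e-8), g_max)   # bounded geometry term
        I_v = Ft_wv * B_xv * A / (np.pi * N_vpl)          # Eq. (16)
        Lo += (albedo / np.pi) * G_b * I_v                # Eq. (15) with V = 1
    return Lo
```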

Fig. 5

Transport of emergent light from a translucent object (blue) to a diffuse object (red). We distribute VPLs (gray dots) on the outer surface of the translucent object, and use them to indirectly illuminate the remaining scene

Unfortunately, for a deformable object and a relatively small set of VPLs, the method is prone to flickering unless we ensure that the VPL positions are stable over time. Our solution is to render the outermost surface of the translucent object to a cube map whose center \({\varvec{c}}\) coincides with the object’s bounding box center. Each pixel in the cube map now contains the coordinates of a point on the surface of the translucent object. By sampling the cube map at a constant set of random directions, we obtain a stable set of surface positions that we use as VPL locations (Fig. 5).
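
The essential point is that the look-up directions are generated once and reused every frame, so only the cube-map content changes as the object deforms. A sketch of the look-up, using a standard cube-map face addressing convention (an assumption of this sketch), is:

```python
import numpy as np

def cubemap_face_uv(d):
    """Map a unit direction to a cube-map face index and (u, v) in [0, 1]^2."""
    ax, ay, az = np.abs(d)
    if ax >= ay and ax >= az:
        face, u, v, m = (0 if d[0] > 0 else 1), -d[2] * np.sign(d[0]), -d[1], ax
    elif ay >= az:
        face, u, v, m = (2 if d[1] > 0 else 3), d[0], d[2] * np.sign(d[1]), ay
    else:
        face, u, v, m = (4 if d[2] > 0 else 5), d[0] * np.sign(d[2]), -d[1], az
    return face, 0.5 * (u / m + 1.0), 0.5 * (v / m + 1.0)

def stable_vpl_positions(fixed_dirs, position_cubemap, size):
    """Look up surface positions at a constant set of directions.

    position_cubemap[face, row, col] holds the world-space position of the
    outermost surface as rendered from the bounding-box center c.
    """
    positions = []
    for d in fixed_dirs:
        face, u, v = cubemap_face_uv(d / np.linalg.norm(d))
        col = min(int(u * size), size - 1)
        row = min(int(v * size), size - 1)
        positions.append(position_cubemap[face, row, col])
    return positions
```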

4 Results

The implementation of our method interactively renders directional subsurface scattering in deformable objects and requires neither preprocessing nor texture parameterization of the object surface. We use the diffusive part of the directional dipole [14] as \(S_d\) or the photon beam diffusion model [20] when evaluating Eq. (9). The directional dipole is significantly faster, so we use it unless noted otherwise. We define the translucent objects in our scenes using measured optical properties from different sources [17, 26, 37].

To validate our results, we compare with Monte Carlo ray tracing implemented on the GPU using OptiX [38]. In this reference method, we render directional subsurface scattering using the progressive direct Monte Carlo integration technique described by Frisvad et al. [14]. As prescribed, we use a Russian roulette based on the asymptotic exponential falloff of the model to accept or reject samples. However, we do not equidistribute the samples using a dart throwing technique, as a more brute-force uniform sampling of the object surface is better suited for a GPU ray tracer. This implementation gave us a ground truth for comparison in terms of both quality and performance. However, when comparing performance, one should keep in mind that, unlike our method, the reference method is view dependent.

In all the following examples, performance is at interactive rates. If nothing changes except the camera, our method will converge over a number of frames and then run in real-time. The implementation switches back to interactive rates when something other than the camera changes. By ‘interactive’ we mean a rendering time below 166 ms per frame (6 frames per second, fps), as specified by Akenine-Möller et al. [1]. All the tests were performed on an NVIDIA GeForce GTX 780 Ti graphics card. Unless otherwise indicated, our results use a \(512\times {512}\) frame resolution for both radiosity and light maps.

Figure 6 allows a visual comparison with ground truth (results obtained with the reference method). We chose one highly scattering material with isotropic phase function (\(g = 0\)), namely marble, and two forward scattering materials (\(g > 0\)), namely white grapefruit juice and strawberry shampoo. At convergence (second row), our method compares favorably to the directional dipole reference (third row). Our method improves the details of the subsurface scattering when compared with diffuse subsurface scattering, that is, the standard dipole [26] (fourth row), especially for white grapefruit juice and strawberry shampoo. We also show the results of our method after one frame rendered at interactive frame rates (first row). These results are similar to our converged solution except that there is a slight bit of sampling noise, which we reduce using mipmap filtering.

Fig. 6

Comparison of our method (rows 1, 2) with the reference method (row 3) and diffuse subsurface scattering (row 4) for different materials. Row 1 is our results for a single frame at 6 fps, while row 2 is our view-independent result after convergence. All results use 31 maps

Figure 7 compares the transport of emergent light obtained with our method to that obtained with the reference method. While the 200 VPLs used here do not provide a highly accurate result, they do provide something better than a constant ambient term. At 6 fps, our solution is similar to the reference and converges very quickly to a better result, while the OptiX solution has both high-frequency and low-frequency noise, is view dependent, and converges very slowly.

Fig. 7

Equal time comparison (left column) of our method with the reference method and qualitative comparison with diffuse subsurface scattering (upper right) and the converged reference solution (lower right). The scene is lit by a point light in a white grapefruit candle holder

Figure 8 provides zoom-ins and difference images from Figs. 6 and 7. Our results in general seem to be missing a part of the light transport. As revealed by the difference images, the missing contribution is due to undersampling of the surface at grazing incidence and missing interreflections. This undersampling is the reason why Mertens et al. [35] chose to sample in screen space instead of light space. However, sampling in screen space has other problems, as not all samples are lit. When considering transport of emergent light, the zoom-ins and difference images show missing shadows and inaccuracies due to the small number of VPLs. However, as graphics hardware improves, we will be able to use more VPLs and one of the several fast VPL visibility techniques [8] to get better accuracy while retaining interactive frame rates.

Fig. 8

Zoom-ins and differences from Figs. 6 and 7. Root-mean-squared error of the color bands \(\sqrt{{\varDelta }r^2 + {\varDelta }g^2 + {\varDelta }b^2}\) is used as error metric in the difference images

In Fig. 9, we compare the quality reached by our solution with the quality reached by the ray traced solution in equal time. We perform this comparison for a marble bunny at three different scales. Generally, our method has a uniform behavior for different scales. For materials that are not optically thin (not at low scale), our method converges faster. The highly scattering materials (mid and high scale) are the more important cases to render well, as these are inside the range of materials for which the analytic subsurface scattering models are valid. At high scales, scattering effects become more localized; so our method is better at capturing the effect than the ray traced solution. At low scales, fewer G-buffer samples hit the object, which leads to a more noisy result with our solution.

Fig. 9

Stanford bunny with marble material at different scales (from top to bottom the scale is 0.01, 0.1, and 1 m). The left and middle columns show equal time results for our method and the ray tracer (1 frame at 6 fps). The right column shows the ray traced results after convergence. Here we use 16 maps and a \(1024\times 1024\) light map

In order to test the method using dynamically generated geometry, we created an implicit 3D surface [44] as the sum, \(\varPhi _t = \sum _i \phi _{i,t}(\vec {p})\), of four blobs,

$$\begin{aligned} \phi _{i,t}(\vec {p}) = \exp (-\sigma \Vert \vec {p} - \vec {p}_i(t)\Vert ^2) , \end{aligned}$$

where the position of each blob, \(\vec {p}_i(t)\), is a periodic function. Since the periods are different, the period of the aggregate implicit \(\varPhi _t\) is potentially very large, and precomputation of the light transport inside the object would not be practical. Our method, however, applies, as it does not rely on precomputation, but we do need to rasterize the object. To do this, we compute a triangle mesh for an isosurface of \(\varPhi _t\) using dual contouring [16] implemented in a geometry shader. This is done in a pre-pass to each frame where the geometry shader evaluates \(\varPhi _t\) and its gradient directly based on the current time. The output triangle strips are streamed back to a vertex buffer object using transform feedback. Figure 10 presents a rendered blob using different materials.
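
For concreteness, a sketch of the time-varying field and its analytic gradient (used for surface normals) could look as follows; the periods, amplitude, and sharpness are placeholder values, not the ones used for our figures.

```python
import numpy as np

PERIODS = np.array([3.1, 4.3, 5.7, 7.9])   # seconds, mutually different (placeholder)
AMPLITUDE = 0.5                             # placeholder blob travel radius
SIGMA = 8.0                                 # placeholder blob sharpness

def blob_centers(t):
    """Position p_i(t) of each blob, a simple periodic motion."""
    phase = 2.0 * np.pi * t / PERIODS
    return AMPLITUDE * np.stack([np.cos(phase),
                                 np.sin(2.0 * phase),
                                 np.cos(3.0 * phase)], axis=-1)

def phi(p, t):
    """Phi_t(p) = sum_i exp(-sigma * ||p - p_i(t)||^2)."""
    d = p - blob_centers(t)
    return float(np.sum(np.exp(-SIGMA * np.sum(d * d, axis=-1))))

def grad_phi(p, t):
    """Analytic gradient of Phi_t; the normalized negative gradient gives the outward normal."""
    d = p - blob_centers(t)
    w = np.exp(-SIGMA * np.sum(d * d, axis=-1))
    return np.sum(-2.0 * SIGMA * w[:, None] * d, axis=0)
```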

Fig. 10

Rendering with our method and a dynamically generated 3D surface (‘blob’) and transport of emergent light for three materials. The blob renders at 6 fps with 50 VPLs and 1500 samples per map in six maps. Materials from left to right: white grapefruit juice, soy milk, and glycerine soap

To justify the need for scattered radiosity maps, we compare our method with an implementation without caching of subsurface scattering computations (as in translucent shadow mapping [9]). Note that this approach, as opposed to ours, is view dependent and pixel bound, and that unobserved VPLs would be more expensive to evaluate. Figure 11 compares performance without considering view dependency and VPLs. Caching of scattered radiosity in maps is more efficient as soon as the translucent object occupies more than 5.8 % of a \(1024\times 1024\) image.

Fig. 11

Chocolate milk blob occupying different percentages of the image (noted at the top). We compare our method (ours) with a view-dependent, caching-free implementation (no scattered radiosity maps). We use 1000 samples per map in ten maps when caching, per pixel when not caching. Equal frame rates (12 fps) occur when occupancy is 5.8 % of the image

The candle scene in Fig. 1 demonstrates the usefulness of our method. We scaled the optical properties of glycerine soap to approximate the scattering properties of candle wax. Our method creates a soft ‘caustic’ on the ground with varying intensity depending on the shape of the candle model. We, thus, enable a more realistic lighting of the scene than is obtainable with existing interactive techniques.

To provide a performance breakdown of our technique, Fig. 12 lists render times dedicated to the different steps of our algorithm in our various results. BSSRDF evaluation (step 2 of Fig. 3) dominates all the timings, with the exception of Figs. 1 and 7, where the transport of emergent light dominates. Figure 13 provides timings and coverage improvement of a bunny rendering with increasing K. The first seven directions cover most of the surface, while the remaining directions are necessary to cover small holes in the shading.

Fig. 12

Timing breakdowns for some of our renderings. Initialization times were negligible and were thus included with step 1 of Fig. 3. The evaluation of the BSSRDF and the VPLs (when present) dominate the rendering times

Fig. 13

Converged renderings of a potato bunny (\(N = 30\)) and timings for increasing number of scattered radiosity maps K. We list K followed by rendering time in milliseconds (ms) for each result. The first seven maps cover most of the surface, while the following four cover small details (the small area just to the left of the bunny’s hind leg, for example)

To underline the versatility of our approach, Fig. 14 presents a set of results rendered using the photon beam diffusion model [20]. The weak singularities in this model lead to fireflies (overly bright pixels) with our sampling approach. We avoid this problem by clamping the distance \(d_r\) to a minimum of \(0.25/(\sigma _a + \sigma _t)\) when it is used in a denominator. The factors by which photon beam diffusion is slower than the directional dipole are included in the figure. These factors double if we use a graphics card with 512 cores (GTX 580) instead of 2880 cores.

Fig. 14

Converged results with scenes and parameters as in other figures, but this time rendered using the photon beam diffusion model [20]. For each rendering, we provide the factor by which this model is slower than the directional dipole

Finally, Fig. 15 presents results with multiple directional lights. To approximate an environment light, we sample a number of representative directional light sources from the environment map using the method described by Pharr and Humphreys [39]. Contributions from all the directional lights are cached in the same scattered radiosity maps. In this example, we add specularly reflected light by looking up into the environment map using the direction of the reflected ray and multiplying by Fresnel reflectance.

Fig. 15

Stanford Bunny illuminated by an environment map. The map was importance sampled and converted to eight different directional lights. Potato material, 16 maps

5 Discussion

The resolution of a light’s G-buffer (a light map) should be chosen carefully. If the range of the scattering effects (roughly \(1/\sigma _{{\mathrm {tr}}}\)) is smaller than the size of one pixel in the light map, the contributions from the directional dipole tend to cluster and form ‘pearling’ artifacts. A possible solution would be a variation of cascaded shadow maps [51] to provide a higher resolution light map when needed. Generally, a light map of \(512\times 512\) pixels is an acceptable size that can be brought to \(1024\times 1024\) in problematic cases.

User parameters of our method include the resolutions of the light map and the scattered radiosity maps, the two biases \(\epsilon _{comb}\) and \(\epsilon _{bias}\), the number of samples N, the number of scattered radiosity maps K, and the number of VPLs \(N_{{\mathrm {vpl}}}\). We now provide some guidelines for setting these parameters. The size of the light map was already discussed in the previous paragraph. For the scattered radiosity maps, a size of \(512\times 512\) is fine for most applications, and \(K = 16\) directions generally provide enough coverage for simple models (the dragon, with its complicated geometry, required \(K = 31\) directions). Performance scales linearly with K (Fig. 13), as we spend most of the time evaluating the BSSRDF (Fig. 12). The two biases \(\epsilon _{comb}\) and \(\epsilon _{bias}\) need to be tweaked manually. The numbers N and \(N_{{\mathrm {vpl}}}\) are usually set manually to get the desired performance once the other parameters have been settled.

For most of our results, we choose the directions \(\vec {d}_k\) of the scattered radiosity maps automatically. This works well for objects that are roughly convex, but for more oddly shaped concave objects some part may be left uncovered. Tearing artifacts caused by insufficient coverage appear in the mouth of the dragon in Fig. 6 and in the supplementary video. Figure 13 also illustrates the problem, and shows that increasing the number of directions or manually choosing them can often ease this problem.

The memory consumption of our technique is comparable to that of the texture space filtering techniques [5, 12, 21, 29]. As such, the maps and buffers that we use easily fit in the memory of modern GPUs. Surprisingly, we use more memory than the volumetric techniques [2, 3, 13, 32]. The reason is that they make do with very low-resolution volumes (\(32^3\) or \(64^3\)). It is, however, important to note that the added directionality and level of detail that we achieve cannot be obtained with such low-resolution volumes. High-resolution volumes would be needed with these techniques, which would lead to performance and memory issues.

Since we cache scattered radiosity, we cannot directly use a BSSRDF that fully depends on the direction of emergence \(\vec {\omega }_o\) (the dual-beam model [10], for example). For such a BSSRDF, we would have to rely on the assumption that the emergent radiance integrates to a nearly diffuse distribution. We would then carry out a cosine-weighted integral over \(\vec {\omega }_o\) when computing the scattered radiosity maps and otherwise use the same method. On the other hand, our concept of caching scattered radiosity instead of transmitted irradiance might be of interest in offline rendering techniques such as multiresolution radiosity caching [7]. This would enable use of directional subsurface scattering and inexpensive transport of emergent light in a movie production rendering solution.

6 Conclusion

We have presented a novel technique for interactive rendering of directional subsurface scattering. The method is view independent and applicable to deformable 3D models without requiring a texture parameterization of the object surface. While our method takes the direction of incident light into account, it also relies on the assumption that emergent light is not directional. This enables us to cache emergent light in so-called scattered radiosity maps. These maps enable us to control the output quality, to render progressively, and to illuminate the scene with light that has scattered through a translucent object.