1 Introduction

The generation of reflections is usually part of a rendering engine, which is responsible for creating realistic 3D scenes in computer graphics applications [11]. Every realistic scene contains a robust description of the geometry, viewpoint, texture, reflection, lighting, and shading of its objects. The data describing the scene are sent to a rendering program, which processes them and displays the result as an image.

A very common technique used in the step before generating the output of the rendering pipeline is rasterization, or scan conversion [21]. Nowadays, it runs almost entirely at the hardware level, and it also handles the ordering of objects and the conversion of their coordinates into camera coordinates. On the other hand, a powerful method for rendering 3D graphics with very complex light interactions is ray tracing [19].

Recently, with advances in hardware and parallel processing, dedicated libraries have been created to assist programmers in developing graphical applications with ray tracing, such as Nvidia's OptiX [18].

Visual effects, such as reflections, are extremely important in graphical applications for representing objects at runtime. Game engines, such as the Unreal Engine [8], include those effects for static and dynamic objects. However, these engines typically combine the SSR technique for computing the reflections of dynamic objects with cube maps (created from sphere reflection captures) for static objects. This approach is not very accurate, since the SSR works only in screen space and thus fails to compute the reflection of objects located outside the camera's field of view or occluded by other objects.

The reflections generated using cube maps are not very realistic, since the synthesized reflection on every scene object comes from the same point of origin. Another problem is the occurrence of lighting seams between adjacent objects when different cube maps are used. Besides, for dynamic objects, it is necessary to track which cube map affects them; as dynamic objects can swap their cube maps, popping artifacts are generated. These effects can be reduced, for example, by blending the K nearest cube maps of a dynamic object. However, this is a rather expensive solution for representing reflections on moving objects at runtime.

The work presented in this paper extends our previous work [1] with newly implemented features in our hybrid algorithm, which now supports the rendering of rigid objects not only in realistic static scenes but also in dynamic ones. In addition, we have developed a new solution for multiple recursive rays, especially useful for representing self-reflections on curved surfaces at interactive rates, and we have conducted new performance and qualitative tests using a more precise metric. Our solution combines rasterization with pure GPU-ray tracing. To take advantage of the main benefits that both techniques offer and to bypass their limitations, our algorithm initially generates both the reflections using the SSR and a mask map of the scene points at which the SSR failed to compute the reflections (a well-known problem of the SSR, since it only processes what is visible in the image). The ray tracer then uses the SSR mask map to compute the secondary rays for the reflections and to merge the SSR reflections with the additional ray-traced ones. Finally, the results generated at each step are combined with shadow and lighting calculations for the final composition of realistic 3D scenes with multiple moving objects and real-time dynamic reflections and shadows.

2 Related work

Several visual effects, such as reflections, are quite important for generating both static and dynamic realistic 3D scenes. Many studies have been published on real-time ray-traced reflections for static scenes. In this work, we focus on real-time dynamic reflections for the realistic rendering of 3D scenes.

2.1 Reflection techniques

One of the most common and traditional techniques to simulate reflections is to position and cover the specular object in the center of a cube map [10]. Normally, a cube map is not generated for each object rendered in a scene of a real-time graphical application due to its high cost. Thus, an alternative solution is to add reflection capture points, which are responsible for creating cube maps of the scene from those points. Reflective objects close to a cube map use its information for their reflection calculation, but only static objects are contained in the cube map. This approach creates interesting visual results that are, however, physically incorrect, since it is only an approximation. The physically correct solution would be to generate the reflection exactly from the point of the surface being rendered at that time, as done by ray-tracing algorithms [26].

Newer techniques use screen space to synthesize reflections, such as Real-Time Local Reflections, or SSR [16], which simulates the behavior of light as computed in ray tracing, but in screen space. To this end, it uses the ray marching technique to find intersections in the depth buffer by taking steps along the ray. The algorithm works as a post-processing effect on the final image, which means that the number of polygons (or scene complexity) does not affect its processing cost. An interesting SSR approach to reduce missing geometry, based on two depth layers, was presented by Mara et al. [15].

2.2 GPU-ray tracing

The emergence of powerful GPUs has strongly motivated researchers to explore new ideas and implement ray tracing to produce effects, e.g., reflections, in the context of digital games [3, 20]. However, the major challenge is still to perform the entire processing of dynamic scenes in real time [24]. Despite these advances in graphics hardware, and although real-time dynamic reflections and shadows promise to make computer graphics applications more realistic, developing algorithms that guarantee interactive frame rates, even on GPUs, remains a complex and challenging task.

Carr et al. proposed the first ray-tracing algorithm on the GPU [6]; it computes only the ray-triangle intersections on the GPU. Subsequently, Purcell et al. presented a solution in which ray generation, traversal, ray-triangle intersection, shading, and the creation of secondary rays all run in separate GPU kernels [22].

Some techniques also use pre-computed maps to assist and accelerate the rendering of scenes with ray tracing, or cache data between frames [4, 17]. However, most of the techniques proposed so far do not effectively tackle the problem of generating realistic visual effects in dynamic scenes.

2.3 Hybrid solutions

Hybrid solutions have also been presented that combine rasterization techniques and ray tracing to generate the final scene. In [13], a solution that supports only static objects at runtime is described. The reflections are approximated by combining screen-space reflections with a predefined bounding box whose faces carry associated buffers representing the scene; the latter is used as a backup to fix the parts of the scene where the screen-space method failed to add reflections.

Widmer et al. [27] propose an efficient hybrid acceleration structure capable of generating real-time screen-space ray-traced reflections. Cabeleira [5] has also described a solution in which all the global illumination effects are done by ray tracing; it was implemented on CPU and GPU, and the result is merged with the rasterization process, but it only supports static scenes. In addition, a hybrid solution that uses a simple heuristic to ignore irrelevant static objects during the ray-tracing phase is presented in [2], in which only the information about the primary rays is stored in the G-buffer; the reflections and shadows are generated using pure ray tracing, which is too demanding when dealing with complex environments. Our work focuses on dynamic scenes, taking all the scene objects into consideration to generate a more realistic result.

Other recent and competitive solutions have also been proposed. For example, Ganestam and Doggett [9] use ray tracing for computing the reflections near the camera and rasterization, via a cube map of G-buffers, for objects far from it. However, the reflections located far from the camera miss some of the occluded objects in the scene, since the rasterization process uses only the G-buffer cube map data. Our approach does not have this problem, since it considers all the objects in a scene, including occluded ones: whenever the SSR detects that it is not possible to reflect a specific pixel, the ray tracer computes the reflection.

3 Dynamic reflections in real time

In this section, we present the novel, simple, and effective approach that we have developed for generating dynamic reflections at runtime. Our algorithm combines rasterization with pure GPU-ray tracing through a deferred rendering pipeline.

First, the 3D scene information to be rendered is loaded. Then, we create the G-buffer information (diffuse color, world-space position, normal, Z-buffer, and reflection mask), i.e., the maps containing the geometric information of the scene in screen coordinates. The reflection calculation consists of two steps: (1) the classic SSR algorithm [16] and (2) a pure ray tracer.
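For concreteness, the following minimal C++ sketch shows the kind of per-pixel data such a G-buffer holds; the struct and field names are illustrative assumptions, not taken from our engine:

```cpp
// Illustrative per-pixel G-buffer layout (names are assumptions; the
// engine stores these as separate render targets of the deferred pipeline).
struct GBufferTexel {
    float diffuse[3];   // diffuse color (RGB)
    float worldPos[3];  // world-space position of the visible surface
    float normal[3];    // world-space surface normal
    float depth;        // Z-buffer value
    float reflective;   // reflection mask: 1 if the surface reflects, 0 otherwise
};
```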

In the first step, our algorithm reads a reflection map from the G-buffer indicating which pixels represent objects that have reflections and uses this information to generate reflections with the SSR. To take advantage of the main benefits that both techniques offer and to bypass their limitations, we initially generate both the reflections using the SSR and a mask map of the scene points at which the SSR failed to compute the reflections (a well-known problem of the SSR, since it only processes what is visible in the image).

More specifically, our SSR algorithm works as follows. Initially, we traverse the input image over the G-buffer, which stores the information about which pixels generate reflections. The ray marching process begins when we find a pixel that reflects (i.e., with value 1) and consists in traversing the scene through ray marching in the Z-buffer. To this end, the algorithm fetches the pixel's position in the 3D scene (information also contained in the G-buffer) and reflects the camera-eye ray to that point about the surface normal to obtain the marching direction. Our ray marching starts along this direction vector. For each marching step, we compute the ray's Z-buffer value and compare it with the scene's Z-buffer value. For a given pixel, if the scene's Z-buffer value is smaller than the ray marching's Z-buffer value and the distance between them is greater than a threshold, we mark that pixel to be ray traced, since the captured color would probably be wrong due to that distance.

If the distance between the Z-buffer values is smaller than the threshold, we march backwards in even smaller steps until we find the intersection point between the Z-buffers again. Once this point is found, before capturing its color and applying it to the reflection, we verify whether the dot product of the current point's normal with the ray direction is greater than zero. If so, the point is facing away from the ray, and we also mark it to be ray traced. Pixels that need an additional ray step, for example, when the hit surface is itself reflective, are marked to be ray traced too. In addition, whenever the ray marching goes beyond the image boundaries, we choose ray tracing (this per-pixel procedure is sketched below). In particular, the threshold is directly related to the precision of the reflection. It is often sacrificed to improve performance by restricting the number of samples along the ray (to reduce texture lookups), which, in turn, can miss geometry.
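The following C++ sketch summarizes this per-pixel marching loop. The lookup helpers (outsideImage, projectedDepth, sceneDepthAt, normalAt, refineBackwards, writeReflectionColor) are hypothetical placeholders for screen-space projections and G-buffer accesses, not our actual API:

```cpp
#include <cmath>

// Minimal vector helpers (illustrative).
struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalize(Vec3 v) { return mul(v, 1.0f / std::sqrt(dot(v, v))); }
static Vec3  reflect(Vec3 i, Vec3 n) { return sub(i, mul(n, 2.0f * dot(i, n))); }

// Placeholder lookups into screen space and the G-buffer (assumptions).
bool  outsideImage(Vec3 p);                  // projected point left the screen?
float projectedDepth(Vec3 p);                // Z of the marching point
float sceneDepthAt(Vec3 p);                  // Z stored in the scene's Z-buffer
Vec3  normalAt(Vec3 p);                      // G-buffer normal at the hit
Vec3  refineBackwards(Vec3 p, Vec3 dir, float step); // smaller backward steps
void  writeReflectionColor(Vec3 p);          // capture the reflected color

enum class SSRResult { Resolved, NeedsRayTrace };

SSRResult marchReflection(Vec3 pixelPos, Vec3 cameraPos, Vec3 surfNormal,
                          float threshold, int maxSteps, float stepSize) {
    Vec3 view = normalize(sub(pixelPos, cameraPos)); // camera-eye ray
    Vec3 dir  = reflect(view, surfNormal);           // marching direction
    Vec3 p    = pixelPos;
    for (int i = 0; i < maxSteps; ++i) {
        p = add(p, mul(dir, stepSize));
        if (outsideImage(p))                         // left screen space
            return SSRResult::NeedsRayTrace;
        float rayZ   = projectedDepth(p);
        float sceneZ = sceneDepthAt(p);
        if (sceneZ < rayZ) {                         // ray went behind geometry
            if (rayZ - sceneZ > threshold)           // hit too far: unreliable
                return SSRResult::NeedsRayTrace;
            p = refineBackwards(p, dir, stepSize);   // locate the intersection
            if (dot(normalAt(p), dir) > 0.0f)        // surface faces away
                return SSRResult::NeedsRayTrace;
            writeReflectionColor(p);
            return SSRResult::Resolved;
        }
    }
    return SSRResult::NeedsRayTrace;                 // nothing hit along the ray
}
```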

To test how different threshold values (which are relative to the scene size), T, influence the resulting images, and to evaluate the error, we applied three different T values to the Sponza scene [7], with some other mesh objects added to it. We ran the SSR algorithm alone for the T values of 0.005, 0.025, and 0.065, as shown in Fig. 1; these values were set manually to detect the failure of a screen-space ray and to determine a value capable of producing more realistic visual results.

Fig. 1 a–c Variations in error threshold, T, for the values 0.005, 0.025, and 0.065, respectively

Depending on the size of the scene, this threshold value may change, which means that future work is needed to better estimate this value and make thresholding automated. We can see in Fig. 1 that, above the threshold value of 0.025, the SSR algorithm begins to generate visual artifacts that prolong the reflection of objects, creating a repeating pattern in the images (Fig. 1c). On the other hand, threshold values smaller than 0.025 produce a striped effect in the reflection (Fig. 1a). Therefore, in our implementation, we chose the value 0.025 to detect the difference between the ray position and the Z-buffer, since it was the threshold value that produced the best visual results among those we tested with the Sponza scene.

Following that, we start the second step, in which the 3D scene geometry, the SSR mask map, and the G-buffer information are used to compute the secondary rays for the reflections (data on the primary ones are already available in the G-buffer) and to merge the SSR reflections with the additional ray-traced ones. The ray tracer verifies each pixel of the mask generated during the first step, searching for pixels that were not correctly resolved. If such a pixel is found, a reflected ray from the viewer's eye is computed and traced, resulting in a new complementary reflection image. As a result, the reflections of objects that lie outside the screen space or that are occluded by other objects can be processed.
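A minimal sketch of this mask-driven merge, assuming illustrative Image and Ray types; traceReflectedRay() stands in for the GPU ray-tracer launch (the names are ours, not the OptiX API):

```cpp
#include <cstdint>
#include <vector>

struct Color { float r, g, b; };
struct Ray   { float origin[3], direction[3]; };

struct Image {
    int width, height;
    std::vector<Color> pixels;
    Color at(int x, int y) const { return pixels[y * width + x]; }
    void  set(int x, int y, Color c) { pixels[y * width + x] = c; }
};

Ray   reflectedEyeRay(int x, int y);   // built from G-buffer position/normal
Color traceReflectedRay(const Ray&);   // pure GPU ray tracing of one pixel

// Second step: consult the SSR mask and fill in the failed pixels with
// ray-traced reflections, keeping the SSR result everywhere else.
void mergeReflections(const Image& ssr, const std::vector<uint8_t>& mask,
                      Image& out) {
    for (int y = 0; y < out.height; ++y)
        for (int x = 0; x < out.width; ++x)
            out.set(x, y, mask[y * out.width + x]
                              ? traceReflectedRay(reflectedEyeRay(x, y))
                              : ssr.at(x, y));
}
```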

We remark that, aiming at faster processing and inspired by [24], our ray tracer distinguishes between static and dynamic objects and stores them in two separate bounding volume hierarchy (BVH) trees; each tree is traversed separately and the nearest intersection point is used, avoiding the need to rebuild the whole data structure of the scene on every frame. The same geometric transformations applied to update the dynamic objects during the rasterization process are also applied in the ray tracing to maintain visual coherence between the two algorithms.
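Conceptually, the two-tree scheme reduces the per-ray intersection to the following sketch, where Bvh, Hit, and intersect() are illustrative stand-ins rather than the OptiX API:

```cpp
#include <limits>

struct Ray;
struct Bvh;   // bounding volume hierarchy (stand-in type)
struct Hit { float t = std::numeric_limits<float>::infinity(); /* ... */ };

Hit intersect(const Bvh& tree, const Ray& ray);  // returns t = inf on a miss

// Static and dynamic objects live in two separate BVHs. Both trees are
// traversed per ray and the nearer hit wins, so only the (smaller)
// dynamic tree has to be updated per frame instead of rebuilding one
// structure over the whole scene.
Hit closestHit(const Bvh& staticBvh, const Bvh& dynamicBvh, const Ray& ray) {
    Hit hs = intersect(staticBvh, ray);
    Hit hd = intersect(dynamicBvh, ray);
    return (hd.t < hs.t) ? hd : hs;   // nearest intersection point is used
}
```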

Our engine generates a combined image from the reflection images computed by ray tracing and by the SSR. We apply a Gaussian blur filter [23] to the combined image to soften some of the artifacts (but not all of them) generated by the SSR, especially at the transition regions between screen-space and object-space reflections, as shown on the right side of Fig. 2.
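As an example of this post-process, a minimal separable Gaussian blur over a grayscale buffer (RGB images are filtered per channel); the 5-tap binomial kernel is an illustrative choice, as the kernel size is not specified here:

```cpp
#include <algorithm>
#include <vector>

// 5-tap binomial kernel approximating a Gaussian.
static const float kKernel[5] = {1 / 16.f, 4 / 16.f, 6 / 16.f, 4 / 16.f, 1 / 16.f};

// One 1D pass (horizontal or vertical) with clamped borders; running the
// horizontal pass followed by the vertical pass yields the 2D blur.
std::vector<float> blurPass(const std::vector<float>& src, int w, int h,
                            bool horizontal) {
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.f;
            for (int k = -2; k <= 2; ++k) {
                int sx = horizontal ? std::clamp(x + k, 0, w - 1) : x;
                int sy = horizontal ? y : std::clamp(y + k, 0, h - 1);
                acc += kKernel[k + 2] * src[sy * w + sx];
            }
            dst[y * w + x] = acc;
        }
    return dst;
}
// Usage: blurred = blurPass(blurPass(img, w, h, true), w, h, false);
```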

Fig. 2 Left: hybrid image and visual artifacts (highlighted in a circle) that may appear due to the merging process of the SSR and ray tracing. Right: artifacts reduced by applying a Gaussian blur

In both steps, we also apply an attenuation to the reflection based on the length of the reflection ray from the reflecting surface (making it fade out along the reflection), both to produce a more polished visual result and to obtain faster performance by letting rays terminate earlier in some cases. The ray marching process of the SSR is attractive, as no additional data structure needs to be built, but for long distances, it can become expensive. Regarding the generation of shadows, our algorithm renders a cube map of shadow maps before generating the reflections. Finally, all the lighting is computed and merged with the reflection and shadow maps, resulting in the final composition of the scene.
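The length-based attenuation can be summarized by a simple fade; the function below is a sketch assuming a linear falloff (the exact falloff curve is not specified here), with the fade-off distance RL set to, e.g., 4 U as in our tests:

```cpp
// Reflection attenuation as a function of the distance travelled by the
// reflection ray: the contribution fades out and vanishes at the
// fade-off distance RL, letting such rays terminate early.
float attenuation(float rayLength, float fadeOffDistance /* RL */) {
    float a = 1.0f - rayLength / fadeOffDistance;
    return a < 0.0f ? 0.0f : a;   // rays beyond RL contribute nothing
}
```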

4 Test results on the performance of rendering algorithms

In this section, we report on the quality and performance of our new reflection solution, which uses ray tracing to fix the SSR failures in scenes with reflections, shadows, lighting, and dynamic objects. For our benchmark, we used the Sponza scene from Crytek, with a resolution of \(1280\times 720\) and 260,000 polygons. The floor's reflection properties are enabled, and a point light source is in constant motion.

We created three different test scenarios, described in more detail below. In particular, Test Scenario 2 is a 3D scene with a total of around 1,810,000 polygons. All the tests were executed on an Intel Core i7-4770 at 3.40 GHz with 16 GB of RAM, an Nvidia GeForce GTX 780 Ti graphics card, Windows 10 64-bit, OptiX 3.9, and CUDA 7.5.

The images generated by the SSR and hybrid algorithms were evaluated with the Structural Similarity (SSIM) index [25] to verify their quality, by comparing them to “ideal” reference images rendered with pure ray tracing. The SSIM is a perception-based metric that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, such as luminance and contrast masking terms.
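For reference, the SSIM between two image windows \(x\) and \(y\) is computed as

\[ \mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}, \]

where \(\mu_x\) and \(\mu_y\) are the local means, \(\sigma_x^2\) and \(\sigma_y^2\) the local variances, \(\sigma_{xy}\) the covariance of the two windows, and \(C_1\), \(C_2\) small constants that stabilize the division [25].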

4.1 Test Scenario 1

In this first test scenario, the camera was positioned facing up, toward two curtains, so that parts of the curtains and of the decorative floor vases lie outside the camera's field of view. This is a typical case where the SSR fails to compute the reflection.

We measured the execution times (in milliseconds) of each stage of our rendering engine using a ray-length fade-off distance of 4 U. The breakdown of the rendering process (running times) using pure SSR, pure ray tracing, and our hybrid solution is listed in Table 1.

Table 1 Breakdown of the rendering for Test Scenario 1

More specifically, in the G-buffer stage, we generate the images containing the following pixel information: world position, normal, diffuse color, Z-buffer, and reflection mask. In the shadow stage, we create a cube map with the shadow maps in real time to compute the shadows. Next, in the SSR stage, we run the SSR algorithm to identify and compose the mask of pixels whose reflection information was missed. In the ray-tracing stage, we generate the missing reflections with ray tracing; the computation time includes the time for updating the BVH, traversal, and tracing (since we used OptiX, we could not measure those items individually). Finally, the lighting calculations are done, as well as the final scene composition.

The final images and their respective processing times using the SSR, ray-tracing, and hybrid algorithms are shown in Fig. 3a–c and in Table 1, respectively. The results show that our hybrid solution can achieve high accuracy at interactive frame rates. More specifically, the hybrid algorithm shows a better processing time (16.9 ms) than pure ray tracing (21.2 ms). Although the combination of SSR with ray tracing generates a higher memory load, it does less tracing work overall, since a significant part of the image is computed faster by the SSR.

Fig. 3 Final images and their respective processing times for Test Scenario 1

Although the pure SSR solution is much faster (6.1 ms) than the other two, it does not faithfully portray the realism of the scene and leaves various areas of the image without reflections (this problem can easily be identified in Fig. 3a, on both sides of the floor and on the floor in front of the central decorative vase).

4.2 Test Scenario 2

In this second test scenario, we defined a camera walkthrough animation (with 950 frames) whose movement is controlled by a Catmull-Rom spline. The camera flies over the whole Sponza scene. In the central hall, there are 31 Happy Buddha models (from the Stanford 3D Scanning Repository) moving together and continuously from one side to the other, as shown in Fig. 4.

Fig. 4 Walkthrough animation samples for Test Scenario 2

Initially, we performed tests to gather statistics on the rendering process, focusing on the impact that different resolutions (\(640\times 480\), \(1280 \times 720\), and \(1920 \times 1080\) pixels) have on rendering the scene using the SSR, hybrid, and ray-tracing algorithms. The results are displayed graphically in Fig. 5a–c. As expected, the processing times of the three algorithms increase with the rendering resolution.

Fig. 5 Impact of resolution on the rendering process of the SSR, ray-tracing, and hybrid algorithms, tested with Test Scenario 2

Figure 6 shows the processing times (in milliseconds) for the SSR, hybrid, and ray-tracing (RT) algorithms, using a ray-length fade-off distance of 4 U and a resolution of \(1280\times 720\). Comparing the results, it is evident that our hybrid solution lies (most of the time) between the SSR and pure ray tracing during the walkthrough. This behavior is due to the hybrid nature of the algorithm: as the SSR does not show much performance loss with an increase in scene resolution (Fig. 5a), it helps to reduce the processing time needed to generate some of the reflections. This can be seen more clearly, for instance, between frames 50 and 200.

Moreover, the results show that the hybrid solution scales quite well for scenes with approximately 1,810,000 polygons, of which around 1,550,000 belong to the moving Happy Buddha objects. Even in a dynamic scene, our solution again produces results very close in visual quality to the pure ray-tracing solution.

Fig. 6 Performance comparisons (SSR, hybrid, and RT) for Test Scenario 2, with a resolution of \(1280\times 720\)

In Fig. 7, we show the frame-by-frame comparison between the SSR and our hybrid solution, using the SSIM metric to measure image quality. The values range from \(-1\) (the image is totally different from the reference image) to 1 (the image is identical to the reference image). As shown in the images, our solution achieves much better results than pure SSR.

Fig. 7 SSIM metric applied to the walkthrough animation of Test Scenario 2 (frames #5, #200, #375, #585, and #875 are highlighted). The graphical results were generated using the SSR and hybrid algorithms, with respect to the ray tracing

The SSR algorithm fails in the synthesis of reflections, with a large and contrasting difference in quality compared to ray tracing, especially when the camera is directed at the reflective floor and much of the scene lies outside the camera's field of view. For example, at peak points of processing time in the walkthrough, such as frame #375 of the animation, the difference in visual quality between the SSR and our hybrid solution with respect to ray tracing is quite large, mainly because many objects lie outside the camera's field of view in this frame, as shown in Fig. 4c. It is also noteworthy that our hybrid solution produces results very close in visual quality to the pure ray-tracing solution (although at frame #375 it suffers a considerable impact on FPS performance due to intense use of ray tracing).

Furthermore, for a large section of the walkthrough, for instance between frames #200 and #585, we clearly identify gains in both the FPS and the quality of the generated reflections. Considering these gains accumulated throughout the animation, we show that our hybrid solution can synthesize reflections with quality much higher than that of the SSR and very close to that of ray tracing.

We produced a video as supplementary material to better illustrate our hybrid solution [12]. The results show a clear visual benefit at acceptable cost.

4.3 Test Scenario 3

In this third scenario, the camera is positioned facing the middle of the Sponza central corridor, where there are also 6 additional mesh objects (4 armadillos and 2 T-Rex dinosaur models, with 70k and 40k polygons each, respectively), totaling approximately 620k polygons in the entire scene. We performed several performance and image quality evaluation tests on various scenes (with planar mirrors, curved reflectors, and normal maps on planar and curved surfaces), using different ray-length (RL) values of 4 and 16 U, as shown in the \(1\mathrm{st}\) and \(2\mathrm{nd}\) columns of Fig. 8. We removed the shadows from all the images displayed in Fig. 8 to allow clearer visualization of the reflections.

Using an RL value of 16 U obtained from the AABB (Axis-Aligned Bounding Boxes) of the Sponza 3D model (which is enough to reflect most of the scene), we have also calculated the ray distribution (colored in pink and blue, respectively, indicating where the object and screen space methods were used), and the reflection differences between the ray tracing and hybrid images, as shown in the \(3\mathrm{rd}\) and \(4\mathrm{th}\) columns of Fig. 8, respectively.

Moreover, in Table 2, we compare the performance of each method (SSR, hybrid, and ray tracing) in terms of frame time, measured in seconds. The increase in the RL value had almost no impact on the SSR performance, probably because a value of 16 U is very long and, in many cases, the ray quickly leaves the camera's field of view, switching the rendering to ray tracing.

Fig. 8 Image quality comparisons for Test Scenario 3, tested with different types of reflectors with RL values of 4 and 16 U, including ray distributions (colored in pink and blue, respectively, indicating where the object- and screen-space methods were used) and reflection differences between the ray-tracing and hybrid images (to improve visibility, the images were enlarged to 1.5 times their normal size and their contrast was increased)

Table 2 Frame time comparisons between the SSR, hybrid, and ray-tracing algorithms, tested with different types of reflectors and ray lengths (RL), for Test Scenario 3

For the tests performed with objects whose reflection area increased considerably with longer RL values (Ground, Ground with normal map, and Stanford Bunny), the hybrid algorithm showed an increase in its processing time: with an RL of 4 U, the SSR reaches the final step of the ray marching very quickly, and if the SSR does not hit any object, ray tracing is not chosen. For an RL of 16 U, however, some of the SSR ray marching steps will most likely be unable to reach the final stage and will switch to ray tracing, generating a greater number of pixels with reflections, as shown in the \(2\mathrm{nd}\) column of Fig. 8. Therefore, for an RL of 16 U, the increase in the frame time of the hybrid algorithm can be explained by the increased use of ray tracing. In the tests performed with the reflective sphere (with and without a normal map), the reflection area is not very large, so the difference between the frame times of the reflections generated with RL values of 4 and 16 U is very small for all three methods, as shown in Table 2.

To better illustrate our results, in the \(3\mathrm{rd}\) column of Fig. 8, we also show images highlighting which parts of the reflective objects were generated with the SSR or with ray tracing (RT). In percentage terms, these fractions are as follows: Ground (60% SSR, 30% RT), Ground with normal map (61% SSR, 29% RT), Sphere (45% SSR, 55% RT), Sphere with normal map (49% SSR, 51% RT), and Stanford Bunny (44% SSR, 56% RT).

Finally, we compare the results of the hybrid and ray-tracing algorithms using the SSIM quality metric. As shown in Table 3, the results produced by the hybrid algorithm are very close to those generated by ray tracing. In the \(4\mathrm{th}\) column of Fig. 8, we also add difference-image illustrations showing that the per-pixel reflection differences between the two methods, even with curved reflectors, are negligible.

Table 3 Quality metric comparison (SSIM) between the hybrid and ray-tracing algorithms, tested with different types of reflectors and ray lengths (RL), for Test Scenario 3

5 Conclusions and future work

In this paper, we have presented a simple but novel solution for rendering realistic dynamic reflections of rigid objects in real time. Our solution is based on a combination of the SSR technique with pure GPU-ray tracing (OptiX) for the generation of reflections. For those pixels that have a reflective component and where the SSR fails, GPU-ray tracing is launched to complete the reflections.

The results demonstrate an improvement in performance using our hybrid algorithm, with a small perceptual loss in quality compared to the full ray-tracing solution. The SSIM metric was helpful for analyzing the quality of the dynamic images along the walkthrough animation in a precise and automated way.

In terms of FPS, our solution remains positioned (most of the time) between the SSR and pure ray-tracing methods during the walkthrough. Besides, it scales quite well for scenes with dynamic 3D objects. However, the solution presented in this work does not yet support skeletal mesh animations, since the dynamic objects are built once and only translated in position.

Given that our solution is hybrid, in the worst case, e.g., when facing a mirror, only ray tracing will be chosen for the calculation of reflections, therefore negatively affecting the FPS. However, to simulate reflections in cases where the SSR achieves good results, such as dynamic scenes that contain asphalt-covered ground, a wet floor, a polished floor, etc., our solution is highly recommended.

As future work, we intend to develop a customized ray tracer, fully implemented on the GPU using CUDA, for the generation of reflections in dynamic scenes. We also plan to integrate optimized data structures into our hybrid solution to differentiate objects near and far from the camera [9]. We might also consider using two depth layers [15] or applying an undersampling technique [14] to reduce the number of rays in the full ray-tracing pass, which would probably increase our algorithm's performance.

The reason for choosing OptiX was that it is an easy-to-use and GPU-optimized library that provides specialized data structures for ray tracing. However, it is not open source, which means that we cannot customize its internals for our test cases. Therefore, we also intend to implement and explore new optimized data structures (different from those currently available in OptiX), as well as customize them to specific contexts, since for interactive ray tracing, it is fundamental to update or recreate the spatial data structures at runtime to achieve interactive refresh rates.

It is our position that automated methods for estimating the most appropriate threshold values for any scene, particularly for the generation of reflections in dynamic ones, may further improve the quality of our solution. The same applies to exploring a varying step size for the ray marching [13] or acceleration techniques [27].

Finally, for mirror objects, the blurring will sometimes not be enough to hide the “blocky” appearance of screen-space reflections. Thus, we also plan to investigate new image processing methods to further soften, or even eliminate, these unpleasant visual artifacts.