1 Introduction

Despite recent successes in building efficient ray tracers, optimizing the rendering computation remains desirable and even essential when the computing load for a required rendering task is beyond the processing power of available processors. For instance, the QHD resolution (2560 \(\times \) 1440) has nowadays become common for mobile phones whose processors are often not powerful enough for full ray tracing in real time. An effective way of accelerating the ray-tracing computation is adaptive undersampling, which aims to fire fewer than one ray per pixel, thereby minimizing the total number of ray shootings that incur costly ray–object intersections, while introducing only a small reduction in ray-tracing quality. In fact, the idea of adaptive pixel sampling has long been explored in the ray-tracing community, usually in the context of adaptive supersampling, which aims at reducing aliasing artifacts caused by insufficient point sampling. Whether undersampling or supersampling, the major concern is identical: the goal is to efficiently detect image-space pixels and/or object-space surface regions that may create aliasing, adaptively dispatch rays only where necessary, and apply cheaper interpolation whenever possible.

In this paper, we present an adaptive undersampling technique that is well suited for effective implementation of a mobile GPU ray tracer. Our method collects various pixel attributes on the fly during rendering, which are then used to decide, through similarity checks, whether the expensive ray-tracing operations may be replaced by much cheaper linear interpolation for computing geometric attributes at the first hit points (see Fig. 1). Compared to previous adaptive sampling techniques that exploit both image- and object-space information [1, 4, 5, 13], our method is more “geometric” in that it also examines the higher-order local geometry of object surfaces, such as convexity. This reduces the likelihood of subtle visual artifacts that are hard to eliminate using previous methods. In addition, we propose a low-cost postcorrection method that effectively reduces the occurrence of aliases such as the “missing objects” caused by incomplete ray sampling in undersampled images.

Fig. 1

Problematic adaptive pixels found by our adaptive undersampling method. To render the image in (a), our mobile GPU ray tracer performed costly ray-tracing operations only for those problematic pixels that were detected through the seven similarity checks defined in Table 1, as respectively depicted in (b–h). Only 34.5 % of the image pixels, including both base and adaptive pixels, were ray traced to create the 1024 \(\times \) 1024 image, which was very difficult to distinguish visually from the fully ray-traced image. In general, the ratios of adaptive pixels that fail the respective similarity checks vary in a complicated manner, depending on the scene complexity and rendering parameters

The proposed method is simple in structure and easily mapped to the mobile GPU architecture, offering an efficient parallel undersampling computation. In particular, while most existing adaptive methods recursively subdivide pixels for further sampling based on the attributes of four reference corner pixels, our adaptive undersampling algorithm shades a pixel, through ray shooting or interpolation, with reference to only two neighboring pixels. This two-level pixel sampling scheme is computationally simpler and requires less memory bandwidth. Therefore, compared with recent adaptive sampling methods such as [8] that are optimized for high-performance GPUs, our method allows a more efficient implementation on mobile GPUs, which are more vulnerable to control-path complexity and heavy memory accesses than PC-based GPUs.

2 Previous work on adaptive ray sampling

Adaptive sampling in spatial and temporal spaces has been an important research topic in the ray-tracing community. In his seminal paper, Whitted proposed using hierarchical adaptive supersampling to reduce aliases resulting from the undersampling of high-frequency signals, where pixels were recursively subdivided for further sampling only if colors sampled at their four corners vary significantly [20]. For optimal supersampling in multidimensional space, Lee et al. derived a relationship between the number of ray samples and the quality of the rendering image [10]. Also, for optimal stochastic sampling, Dippé and Wold adaptively determined the sampling rate and filter width based on their error estimates [3]. As a variance reduction technique for solving the rendering equation, Kajiya proposed applying an adaptive hierarchical sampling method so that samples were concentrated in interesting parts of the rendering domain [9].

In his distributed ray-tracing paper, Cook gave an example of using two levels of sampling densities in which a higher-density pattern was applied for troublesome areas [2]. Mitchell also presented a two-level sampling method by subdividing pixels into small squares and finding those that need high-density sampling [12]. Painter and Sloan applied hierarchical adaptive stochastic sampling that worked in a progressive manner [15]. Levoy proposed an adaptive sampling method for volume rendering that also determined the sample rate progressively [11]. Rigau et al. exploited a family of discrimination measures, called the f-divergences, to determine the adaptive sampling rate [16]. Hachisuka et al. proposed a kd-tree-based adaptive refinement and anisotropic integration algorithm for multidimensional sampling in ray tracing [6].

In addition to the color measure, object space information has also been exploited by Thomas et al. [18] and Ohta and Maekawa [14]. Whitted’s adaptive sampling scheme [20] was also extended by Genetti et al. so that decisions regarding extra sampling were made based on object-space information obtained during the ray–object intersection computation [5]. Akimoto et al. proposed a four-level undersampling technique, called pixel-selected ray tracing, to speed up the rendering computation [1], in which both image-space and object-space measures were utilized for adaptive ray tracing. Their idea was then extended by Murakami and Hirota [13] and Formella et al. [4]. Jin et al. also presented a selective and adaptive supersampling method, optimized for today’s many-core processors [8].

In the context of the rasterization-based rendering pipeline, He et al. [7] and Vaidyanathan et al. [19] independently proposed rendering architectures supporting varying shading rates, where different levels of pixel sampling were adopted to reduce the fragment shading cost. These multi-rate shading methods are similar to ours in that GPU-oriented, simply structured mechanisms are employed to perform expensive shading operations, ray tracing in our case, only where needed, eventually leading to effective undersampling. However, the rasterization-based approaches do not extend to the development of a GPU ray tracer.

3 Adaptive undersampling algorithm

3.1 Partition of image pixels

Figure 2 shows an example of pixel partitioning, where a set of regularly distributed pixels, marked as B, forms a group of base pixels. The other pixels, called adaptive pixels, are classified as either A1 or A2 depending on whether they are in the same row as the base pixels or not. Previous related methods [1] often traverse pixels in a recursive, multilevel fashion for adaptive sampling. To produce a simpler control structure and permit efficient implementation on a mobile GPU platform, however, our adaptive undersampling algorithm adopts a simple traversal mechanism, whereby the pixels are processed in a fixed order: base pixels, type-A1 adaptive pixels, and type-A2 adaptive pixels. In this paper, we describe our algorithm in terms of the \(2\times 2\) pixel partitioning shown in Fig. 2. It requires only a simple modification to handle a base block of larger size.
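
For the \(2\times 2\) base block, this classification amounts to inspecting the parity of the pixel coordinates. A minimal C++ sketch follows; the enum and function name are our own placeholders, not part of the renderer.

enum PixelType { BASE, A1, A2 };   // base pixels and the two kinds of adaptive pixels

// Classify pixel (x, y) under the 2x2 base-block partitioning: base pixels sit
// at even (x, y); the other pixels in a base-pixel row are of type A1; all
// pixels in the remaining rows are of type A2.
PixelType classifyPixel(int x, int y)
{
    if (y % 2 == 0)
        return (x % 2 == 0) ? BASE : A1;   // a row containing base pixels
    return A2;                             // a row without base pixels
}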

3.2 Stage I: regular sampling of base pixels

In the first stage of our algorithm, a ray is traced recursively through each of the B pixels. Whereas the eventual goal of firing a primary ray for each pixel is to compute the final shaded color (COL), our method collects various ray attributes at the first hit of the ray, which are exploited later to enable efficient rendering computations. These include a set of geometry attributes of the surface at the first hit, comprising an object identification number (OID), a position vector (POS), a normal vector (NORM), shadow bits (SHDBIT), and texture coordinates (TCOORD), where the SHDBIT attribute stores a set of shadow bits such that a bit is set if and only if a shadow is cast at the surface point with respect to the corresponding light source. In addition, a global shaded color (GCOL) attribute is collected. In our current implementation, this stores the radiance from specular reflection and refraction, although any other radiance caused by a different kind of global illumination may be associated with GCOL. In this work, the vector (OID, POS, NORM, SHDBIT, TCOORD, GCOL) is called the ray-attribute vector (or simply attribute vector) for a pixel.
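
The attributes collected in stage I can be grouped into a small per-pixel record. The following sketch is only illustrative: the field types and the small vector helpers are our assumptions, as the paper names the attributes but not their representation. The later sketches in this section reuse these definitions.

#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3  sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline float length(const Vec3& v)             { return std::sqrt(dot(v, v)); }

// Ray-attribute vector collected at the first hit of a primary ray.
struct RayAttr {
    int      oid;     // OID: identifier of the object hit first
    Vec3     pos;     // POS: position of the first hit
    Vec3     norm;    // NORM: unit surface normal at the first hit
    uint32_t shdBit;  // SHDBIT: one shadow bit per light source
    Vec2     tcoord;  // TCOORD: texture coordinates at the first hit
    Vec3     gcol;    // GCOL: radiance from specular reflection/refraction
};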

Fig. 2

Image pixel partitioning through \(2\times 2\) base blocks. The pixels marked as B, A1, and A2 represent base pixels, and horizontal and vertical adaptive pixels, respectively

3.3 Stages II and III: adaptive sampling of adaptive pixels

The actual adaptive ray-sampling computation proceeds in two separate steps, using the ray-attribute vectors of the B pixels as the initial data. Each elementary sampling operation in stages II and III takes the attribute vectors of two reference pixels as inputs and computes an attribute vector for an adaptive pixel, called the current pixel, which lies horizontally (in stage II) or vertically (in stage III) between the reference pixels. In stage II, the attribute vector of each A1 pixel (the current adaptive pixel) is calculated via interpolation or ray tracing based on the attribute vectors of the two neighboring B pixels (the reference pixels) in the same row. In stage III, the same computation is repeated vertically, taking each A2 pixel as the current pixel and calculating its attribute vector using those of the corresponding B or A1 pixels in the same column as the reference pixels. Note that once the attribute vector of a pixel is ready, its final color can easily be produced from it.
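
The elementary operation shared by stages II and III can then be sketched as follows, reusing the RayAttr record from Sect. 3.2. The names similar, interpolateAttr, and traceRay are placeholders for the similarity checks of Sect. 3.4, the attribute interpolation, and the regular ray tracer; in the actual GPU implementation, the ray tracing of failed pixels is deferred to a packed kernel (Sect. 5).

bool    similar(const RayAttr& a, const RayAttr& b);                  // the seven checks of Table 1
RayAttr interpolateAttr(const RayAttr& a, const RayAttr& b, float t); // componentwise linear interpolation
RayAttr traceRay(int x, int y);                                       // the regular ray tracer

// Elementary sampling operation of stages II and III: the current adaptive
// pixel (x, y) lies between the two reference pixels ref0 and ref1.
void resolveAdaptivePixel(const RayAttr& ref0, const RayAttr& ref1,
                          int x, int y, RayAttr& out, bool& rayTraced)
{
    if (similar(ref0, ref1)) {
        out = interpolateAttr(ref0, ref1, 0.5f);   // cheap linear interpolation
        rayTraced = false;
    } else {
        out = traceRay(x, y);                      // expensive ray tracing for this pixel
        rayTraced = true;
    }
}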

3.4 Similarity checks

The key aim of our method is to compute the attribute vectors of adaptive pixels, as much as possible, through cheap interpolation, for which the linear interpolation of each ray attribute is clearly defined, instead of through expensive ray tracing. To check whether simple linear interpolation may safely be applied, a series of seven elementary tests called similarity checks is performed (refer to Table 1 for a summary of these tests). In our method, the interpolation is applied only if all the tests succeed.

Table 1 Similarity checks

Four local geometry tests  The aim of these four tests is to examine whether the four geometry attributes, OID, POS, NORM, and TCOORD, can be interpolated from those of the reference pixels. First, a difference in the objects hit through two pixels is often the most serious source of annoying aliases. The first test therefore compares the OIDs of the two reference pixels and fails if the objects differ (G1 in Table 1). Second, the next test checks whether the distance between the first hits of the reference pixels is less than a given distance threshold (G2). Third, the normal direction at the first hit is particularly useful for detecting an edge formed by polygons of an object that meet at an acute angle. The third test therefore fails if the dot product of the normal directions of the reference pixels is less than a preset threshold (G3).

Although these three similarity tests have often been used in previous methods, they can introduce aliasing when the real intersection point exists on a complex surface. Figure 3 shows a common adverse situation in which the local surface fluctuates between the first hits \(\text{ POS }_0\) and \(\text{ POS }_1\) of the reference pixels. In this case, a linear interpolation of the two normal vectors may give an inaccurate normal at the current ray’s position \(\text{ POS }\), even though the previous three tests may have succeeded. An incorrect normal can result in a severe error when the surface point is locally shaded or the reflection/refraction direction is generated.

To minimize these problems with normals, we perform a fourth elementary test, called the convexity check, in which we examine the sign of the first hit of one reference ray with respect to the tangent plane defined by the position and normal at the first hit of the other reference ray, and vice versa (G4 in Table 1). If the two signs differ, the local surface between \(\text{ POS }_0\) and \(\text{ POS }_1\) is not smooth, possibly causing a troublesome fluctuation. Although the success of the convexity check does not guarantee surface convexity, because the surface can have multiple inflections, we have found the check to be quite effective for removing normal-related aliasing.
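
Taken together, the four local geometry tests can be expressed compactly. The sketch below reuses the types and vector helpers from Sect. 3.2 and reflects our reading of the thresholds (G2 passes when the hit-point distance is below \(T_{\mathrm{pos}}\); G3 passes when the dot product of the normals is at least \(T_{\mathrm{norm}}\)); it is not the authors' code.

// G1-G4 of Table 1; ref0 and ref1 are the two reference pixels, and T_pos and
// T_norm are the distance and normal thresholds (0.03 and 0.9 by default).
bool localGeometryTests(const RayAttr& ref0, const RayAttr& ref1,
                        float T_pos, float T_norm)
{
    if (ref0.oid != ref1.oid) return false;                  // G1: same object hit?
    if (length(sub(ref1.pos, ref0.pos)) >= T_pos)            // G2: first hits close enough?
        return false;
    if (dot(ref0.norm, ref1.norm) < T_norm) return false;    // G3: normals sufficiently aligned?
    // G4: convexity check. Each hit point is tested against the tangent plane
    // of the other hit; differing signs indicate a locally fluctuating surface.
    float s01 = dot(ref1.norm, sub(ref0.pos, ref1.pos));
    float s10 = dot(ref0.norm, sub(ref1.pos, ref0.pos));
    return s01 * s10 >= 0.0f;
}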

Fig. 3

Convexity check. The local concave geometry around the current ray’s first hit may result in annoying aliasing artifacts when the geometry data interpolated from those of the two adjacent reference rays are used for ray tracing

A shadow test  Next, our method performs a shadow test that succeeds only if all the corresponding shadow bits in the SHDBITs of the two reference pixels are identical (SH in Table 1). If the test succeeds, the current pixel simply inherits the light visibility from the reference pixels without shooting shadow rays. Otherwise, if at least one bit disagrees, shadow rays are fired toward each light. This all-or-nothing strategy may appear excessive, because the visibility could instead be rechecked only for the lights whose bits disagree. However, it produces a simple control structure that eventually results in more efficient SIMD processing on the mobile GPU platform, particularly for scenes with few lights.
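
A sketch of the shadow test and the all-or-nothing fallback follows, with the shadow bits packed into a single machine word; traceShadowBits is a hypothetical helper that fires one shadow ray per light and is not named in the paper.

#include <cstdint>

uint32_t traceShadowBits(const Vec3& p);     // hypothetical helper: fires one shadow ray per light

// SH test: the current pixel may inherit light visibility only if the two
// reference pixels agree on every shadow bit.
bool shadowTest(const RayAttr& ref0, const RayAttr& ref1)
{
    return ref0.shdBit == ref1.shdBit;
}

// All-or-nothing resolution of the SHDBIT attribute for the current pixel.
uint32_t resolveShadowBits(const RayAttr& ref0, const RayAttr& ref1, const Vec3& hitPos)
{
    if (shadowTest(ref0, ref1))
        return ref0.shdBit;                  // inherit visibility without shooting shadow rays
    return traceShadowBits(hitPos);          // otherwise, re-fire a shadow ray toward each light
}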

A texture test  Often, the same texture image is repeatedly applied to surfaces during texture mapping. If the image is not continuous along its boundaries, a careless linear interpolation of texture coordinates from the two reference pixels could cause annoying aliases. To avoid such situations, we perform a texture test that checks whether, for each component of the texture-coordinate vector, at least one of the corresponding coordinates lies in the interval \([T_{\mathrm{tex}}, 1.0-T_{\mathrm{tex}}]\) for some small \(T_{\mathrm{tex}}>0\) (TX in Table 1). See Fig. 1f for an example in which a single texture image was repeatedly mapped onto the floor surface; a naive interpolation of texture coordinates near the tile boundaries would easily cause wrong texture fetches there.
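
A sketch of the texture test under the interval rule stated above; the function name is ours, and the sketch reuses the RayAttr record from Sect. 3.2.

// TX test: guard against interpolating texture coordinates across a tile
// boundary of a repeated texture; T_tex is the texture threshold (0.3 by default).
bool textureTest(const RayAttr& ref0, const RayAttr& ref1, float T_tex)
{
    auto inside = [T_tex](float c) { return c >= T_tex && c <= 1.0f - T_tex; };
    // For each component, at least one of the two reference coordinates must
    // lie safely away from the boundary of the unit texture tile.
    return (inside(ref0.tcoord.u) || inside(ref1.tcoord.u)) &&
           (inside(ref0.tcoord.v) || inside(ref1.tcoord.v));
}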

A global color test  The last, but not least, element of classic ray tracing is the effect of indirect illumination caused by specular reflection and refraction, for which costly secondary rays must be traced recursively. In the same way as for primary rays, we may investigate the geometry of these secondary rays via similarity checks, at both the origins and the destinations. However, our preliminary implementation revealed that such a detailed adaptive technique often worsened the runtime performance markedly, at least on the current mobile GPU platform. Therefore, we conduct a simple global color test in which the reflection/refraction colors of the reference pixels are compared with each other (GC in Table 1).
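
A sketch of the global color test, together with the conjunction of all seven checks that serves as the similarity predicate in stages II and III, is given below. The per-channel maximum-difference metric is our assumption; the paper only states that the reference colors are compared against a threshold \(T_{\mathrm{gcol}}\).

#include <algorithm>
#include <cmath>

// GC test: compare the reflection/refraction radiance (GCOL) of the reference pixels.
bool globalColorTest(const RayAttr& ref0, const RayAttr& ref1, float T_gcol)
{
    float dr = std::fabs(ref0.gcol.x - ref1.gcol.x);
    float dg = std::fabs(ref0.gcol.y - ref1.gcol.y);
    float db = std::fabs(ref0.gcol.z - ref1.gcol.z);
    return std::max({ dr, dg, db }) < T_gcol;       // succeed only if the colors are close
}

// Default thresholds reported in Sect. 6.
const float T_pos = 0.03f, T_norm = 0.9f, T_tex = 0.3f, T_gcol = 0.15f;

// The similarity predicate used in stages II and III is simply the
// conjunction of the seven elementary tests of Table 1.
bool similar(const RayAttr& a, const RayAttr& b)
{
    return localGeometryTests(a, b, T_pos, T_norm)   // G1-G4
        && shadowTest(a, b)                          // SH
        && textureTest(a, b, T_tex)                  // TX
        && globalColorTest(a, b, T_gcol);            // GC
}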

4 Postcorrection of undersampled images

Due to insufficient sampling, our method may introduce the problem that objects, or parts of objects, fall between ray-traced samples and are missed. Figure 4a, b illustrates a typical situation where the vanishing part in Fig. 4b falls between pixels that are classified, through either ray tracing or interpolation, as being outside the thin object. If those pixels have similar geometry attributes, an incorrect OID is interpolated into the intervening adaptive pixels, making the middle part disappear.

Fig. 4

Correction of missing parts. Here, the black dots and the squares indicate pixels whose geometry attributes were obtained through regular ray tracing and interpolation, respectively. The base pixels are marked with B. a Thin object. b Without correction. c Corrected

An important observation is that the missing object problem always occurs in interpolated adaptive pixels that are reachable from ray-traced adaptive pixels. Such troublesome adaptive pixels are marked with an asterisk in Fig. 4b. An effective way of removing this aliasing is to revisit the ray-traced adaptive pixels, outlined with thicker lines in Fig. 4b, and propagate their correct ray–object intersection information into their interpolated neighbors. Given a ray-traced adaptive pixel x, consider a neighboring pixel y of x whose geometry attributes have been interpolated. If the OIDs of x and y differ, y becomes a candidate problematic pixel, marked with white dots in Fig. 4c. To investigate whether it actually is problematic, a primary ray is additionally shot through y, performing the regular ray-tracing operation. If the new OID differs from the old one, a missing part of the object has been found and can be reconstructed. The adaptive pixel y is then reclassified as ray traced, and its interpolated neighbors are investigated in turn, as illustrated in Fig. 4c. In this propagation process, an eight-neighbor examination would give a more robust result. However, we find that the more efficient four-neighbor examination produces sufficiently good rendering results for the \(2\times 2\) base block.
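
The propagation just described maps naturally onto a stack-based traversal (the CPU variant mentioned in Sect. 5). The sketch below reuses the RayAttr record and traceRay placeholder from Sect. 3; isBasePixel and the per-pixel traced flags are our own bookkeeping, not names from the paper.

#include <vector>
#include <utility>

RayAttr traceRay(int x, int y);                                       // the regular ray tracer (assumed)
bool isBasePixel(int x, int y) { return x % 2 == 0 && y % 2 == 0; }   // 2x2 base block

// attrs holds the per-pixel attribute vectors; traced marks pixels whose
// attributes were obtained by ray tracing rather than interpolation.
void correctMissingParts(int W, int H,
                         std::vector<RayAttr>& attrs, std::vector<bool>& traced)
{
    std::vector<std::pair<int, int>> stack;

    // Seed the propagation with every ray-traced adaptive (non-base) pixel.
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (traced[y * W + x] && !isBasePixel(x, y))
                stack.push_back({ x, y });

    const int dx[4] = { 1, -1, 0, 0 }, dy[4] = { 0, 0, 1, -1 };   // four-neighbor examination
    while (!stack.empty()) {
        auto [x, y] = stack.back();
        stack.pop_back();
        for (int k = 0; k < 4; ++k) {
            int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
            int n = ny * W + nx;
            // Only interpolated neighbors whose OID disagrees with the seed are suspicious.
            if (traced[n] || attrs[n].oid == attrs[y * W + x].oid) continue;
            RayAttr fresh = traceRay(nx, ny);         // re-sample the candidate pixel
            if (fresh.oid != attrs[n].oid) {          // a missing part has been found
                attrs[n]  = fresh;
                traced[n] = true;                     // the pixel is now classified as ray traced
                stack.push_back({ nx, ny });          // keep propagating from it
            }
        }
    }
}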

Notice that our correction algorithm is a postprocess performed after the entire adaptive rendering pass is complete. It differs from the previous approach of pixel-selected ray tracing [1], which detects pixels of vanishing objects by referring to the color attributes of pixels during adaptive ray tracing. By separating the adaptive-sampling and error-correction stages, we obtain a simpler, GPU-friendly algorithm. Note also that our antialiasing mechanism is selective in that other aliases, such as “missing shadows,” can be reduced selectively by checking the corresponding attribute (e.g., SHDBIT for shadow antialiasing).

5 Efficient implementation on mobile GPUs

Because runtime performance is usually more vulnerable to a careless GPU implementation on a current mobile platform than on a PC platform, the GPU program must be carefully tuned for maximum efficiency. First, consider our seven elementary similarity checks. If all the local geometry tests succeed but the shadow test fails, for instance, we could interpolate the local geometry but shoot shadow rays for the light visibility. However, to avoid the branch divergence that has a significant negative impact on GPU performance, our implementation falls back to full ray tracing for the current pixel if any similarity check fails.

Table 2 Seven-kernel implementation of adaptive undersampling

Second, as a result of the similarity checks in stages II and III, a set of usually sparse problematic pixels is detected, for which expensive ray tracing must be carried out in the next step. Again, to minimize the branch divergence between concurrent threads, our implementation runs a separate kernel that packs those pixels into a contiguous region before initiating the ray-tracing computation. Although this extra kernel requires a series of parallel scan operations [17] on the GPU, our test results show a significant enhancement in rendering performance, because the pack operation also reduces the global memory traffic considerably. Table 2 summarizes our seven-kernel implementation of the proposed algorithm, which showed the highest rendering performance. Note that splitting the GPU program into kernels of smaller granularity might improve the GPU efficiency further. However, we observe that the increased number of global memory accesses cancels out the benefit of the reduced divergence, ultimately lowering the GPU efficiency.
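
Conceptually, the packing kernel is a stream compaction: an exclusive prefix sum over a per-pixel failure flag assigns each problematic pixel a slot in a contiguous buffer. A sequential C++ sketch of the idea is given below; the actual kernels perform the prefix sum with a parallel scan [17] on the GPU.

#include <vector>

// Compact the indices of the pixels that failed the similarity checks so that
// the subsequent ray-tracing kernel works on a contiguous, divergence-free range.
std::vector<int> packProblematicPixels(const std::vector<int>& failed /* 0 or 1 per pixel */)
{
    // Exclusive prefix sum over the failure flags (a parallel scan on the GPU).
    std::vector<int> offset(failed.size());
    int sum = 0;
    for (int i = 0; i < (int)failed.size(); ++i) { offset[i] = sum; sum += failed[i]; }

    // Scatter each failing pixel index into its packed slot.
    std::vector<int> packed(sum);
    for (int i = 0; i < (int)failed.size(); ++i)
        if (failed[i]) packed[offset[i]] = i;
    return packed;
}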

Third, while the optional postcorrection technique in Sect. 4 can easily be implemented using a stack on a CPU, a different implementation scheme is needed for effective many-core processing. In our method, a concurrent thread, associated with each row of the image, first scans its row from left to right, detecting and correcting problematic pixels progressively. The same operation is then carried out repeatedly from right to left, from top to bottom, and finally from bottom to top. This requires four applications of the scanning process, but our experiments have also shown that scanning in just two orthogonal directions, e.g., from left to right and from top to bottom, usually produces sufficiently good correction outcomes.
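
Our interpretation of one such sweep is sketched below for the left-to-right direction; the right-to-left and vertical sweeps are symmetric. It reuses the bookkeeping of the sketch in Sect. 4 and is a simplification of, not a substitute for, the actual kernels.

#include <vector>

// One left-to-right sweep over row y; a concurrent thread owns each row.
// Propagation in the remaining directions is handled by the other sweeps.
void sweepRowLeftToRight(int y, int W,
                         std::vector<RayAttr>& attrs, std::vector<bool>& traced)
{
    for (int x = 0; x + 1 < W; ++x) {
        int cur = y * W + x, nxt = cur + 1;
        if (!traced[cur] || isBasePixel(x, y)) continue;        // propagate only from ray-traced adaptive pixels
        if (traced[nxt] || attrs[nxt].oid == attrs[cur].oid) continue;
        RayAttr fresh = traceRay(x + 1, y);                     // re-sample the suspicious neighbor
        if (fresh.oid != attrs[nxt].oid) {                      // a missing part has been found
            attrs[nxt]  = fresh;
            traced[nxt] = true;                                 // the sweep continues from this pixel
        }
    }
}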

6 Experimental results

To test our method, we first implemented a kd-tree-based full ray tracer using the OpenCL 1.2 API on an LG G3 Cat.6 mobile phone that uses the Qualcomm Snapdragon 805 chipset equipped with an Adreno 420 GPU. The proposed adaptive undersampling technique was then applied to optimize the rendering computation on the mobile platform. All the timings were measured using OpenCL workgroups of \(8\times 8\) work items and default thresholds \(T_{\mathrm{pos}} = 0.03\), \(T_{\mathrm{norm}} = 0.9\), \(T_{\mathrm{tex}} = 0.3\), and \(T_{\mathrm{gcol}} = 0.15\), which generally produced good results.

Fig. 5

Example scenes and the camera views tested. To achieve a fair evaluation of our method on a mobile phone, we selected six scenes with low to high geometric and rendering complexity, whose triangle numbers ranged from 29,359 to 588,402. Because of the limited memory space of the tested mobile phone, some part of the original dataset for San Miguel was omitted. Note that, because the distance threshold \(T_{\mathrm{pos}}\) for the local geometry test G2 is dependent on the dimension of the scene, each scene was normalized such that the longest side of the axis-aligned bounding box has length 1. a Café. b Ben. c Kitchen. d Conference. e Bathroom. f San Miguel

6.1 Computation time

Table 5 at the end of this article compares our method to full ray tracing, which shoots one ray through every pixel (see the sampling—1 \(\times \) 1 rows). The timing results in the Time and Speedup columns indicate that the proposed adaptive sampling method (Ours) compares quite favorably to the nonadaptive method (Full RT), being 1.48–2.21 times faster when the 2 \(\times \) 2 base block was used to render 1024 \(\times \) 1024 images for the six example scenes shown in Fig. 5. This efficiency gain was achieved primarily by the reduction in the costly ray-tracing computation, despite the extra overhead of the adaptive undersampling: the figures in the RT ratio column show that only 27.0–34.5 % of the image pixels, including both base and adaptive pixels, were actually ray traced by our method. Figure 1 shows the adaptive pixels that were found to be problematic by the respective similarity checks.

Despite our efforts toward lowering the ray-tracing cost, it still accounts for a major portion of the rendering computation, which paradoxically shows the importance of adaptive undersampling on the mobile GPU. As implied by the timing results in Table 3a, which reports the breakdown of runtimes for the test scenes measured for \(1024 \times 1024\) images, the kernels step-I, step-II-c, and step-III-c spent 64.4 % (Conference) to 78.3 % (Kitchen) of the rendering time ray tracing around 30 % of the 1024\(^2\) pixels, whereas the other kernels, including the optional postcorrection (step-IV) and the overhead of initiating the GPU program and transferring data (step-ETC), used the remaining time to shade the other pixels.

Table 3 Analysis of the timing performance of our adaptive undersampling method

Note that the base pixels, comprising one-quarter of the image pixels, are always ray traced when the \(2\times 2\) base block is employed, implying that the lowest possible ray-tracing ratio is 25 %. With increasing scene and rendering complexity, the ratio must increase to maintain the rendering quality, in turn lowering the speedup; otherwise, the method would rely too heavily on interpolation. When the complexity is beyond the capability of a given ray-sampling density, spatial aliasing artifacts occur even for full ray tracing (the case in which the ratio is 100 %), for which supersampling has been the inevitable remedy. The statistics in the sampling—\(2\times 2\)/\(4\times 4\) rows of Table 5 show that the ray-tracing pixel ratio decreases in the supersampling settings, thereby improving the efficiency.

6.2 Image quality

Figure 6a–c depicts an example of how effectively the postcorrection algorithm reconstructs the vanishing parts of thin objects, and the shadows cast by them, in the Bathroom scene, where our basic adaptive undersampling method suffered from the “missing object” problem. At the extra cost of postcorrection through the OID and SHDBIT attributes, our method was able to reconstruct these missing parts, resulting in an image that appeared very similar to that produced by full ray tracing. Note that, for the tested scenes, the correction stage required 7.7 % (Ben) to 17.1 % (Conference) of the entire rendering time, resulting in a slight decrease in the frame rate. Although the timings in Table 5 include the postcorrection computation, this feature can often be turned off for scenes such as Café, Ben, and Conference that do not contain very thin objects, producing an additional performance gain without significant harm to the rendering quality.

Fig. 6

Comparison of rendering results. Parts of the images, which were rendered at \(1024\times 1024\) pixels, are shown to aid analysis of the rendering quality. See the text for details. a Without postcorrection. b With postcorrection. c Full ray tracing. d Ours (\(1\times 1\)). e Full RT (\(1 \times 1\)). f Ours (\(4\times 4\)). g Full RT (\(4 \times 4\)). h Café. i Bathroom

However, the postcorrection algorithm could not effectively remove other kinds of aliasing artifacts, such as those appearing on the sink surface where the floor surface, represented as a single object, is reflected (see Fig. 6d, e). These artifacts are caused mainly by the small details of the reflected floor surface being simply beyond the capability of the applied sampling density of one sample per pixel (sampling—\(1\times 1\)); they also appear in the fully ray-traced image. Although the adaptive undersampling technique worsens the situation somewhat for the sake of runtime efficiency, the appropriate solution is again supersampling: with 16 ray samples per pixel (sampling—\(4\times 4\)), the visual difference between the results of our method and of full ray tracing is reduced (see Fig. 6f, g).

Overall, our experiments show that good image quality is maintained despite the reduced numbers of ray shots, as given in the PSNR column in Table 5. Figure 6h, i also compares the results from the full ray tracing (top) and our adaptive undersampling (middle) for two example scenes. The most obvious visual errors, as displayed in the difference image (bottom), usually occur around corners or for highly curved objects, which are often hard to detect using similarity checks. Furthermore, when textures are applied, the shaded colors of interpolated adaptive pixels differ slightly from those of the ray-traced pixels. However, these visual differences are often difficult to detect, particularly when rendered interactively.

6.3 Further analysis

Our method is usually more effective when more shadow and/or reflection/refraction rays must be traced. With full ray tracing, the extra rendering cost increases linearly with the number of these additional rays. As clearly indicated by the experiment in which the ratio of pixels for which reflective objects are visible (i.e., reflection rays are fired) is varied (see Table 3b), the rendering cost for handling the extra secondary rays increased only slowly in our method because the ray-tracing computation was suppressed effectively.

Table 4 Preliminary performance test on a PC platform
Table 5 Performance comparison with full ray tracing (\(2\times 2\) base block). Six scenes of low to high geometric complexity were tested, with triangle numbers given in parentheses. The figures in the RT ratio column indicate the ratios of the number of ray samples used by our renderer (“ours”) to that for the full ray tracer (“full RT”). The PSNR values were measured by comparing the respective rendering images produced by the two methods

In summary, the timing performance of our method was primarily determined by the ratio of pixels, including base pixels, for which ray tracing must be performed. This is clearly confirmed by the experiment in which the ratio was varied over the interval containing the ratios observed for the tested example scenes (refer to Table 3c). Note that our method is selectively controllable in that the performance drop can be suppressed by lowering this ratio through relaxed tolerances in the relevant similarity checks. Developing an effective way to find an optimal set of tolerance values for input scenes remains an open problem.

7 Concluding remarks

As noted earlier, many-core processing driven with a parallel programming tool such as the OpenCL API is more vulnerable to the complexity of parallel algorithms on mobile GPUs than on PC-based GPUs. Therefore, it was critical to design a mobile GPU algorithm with a simple control structure, which explains why we had to compromise between simplicity and flexibility in developing our algorithm. As a result, our adaptive sampling scheme is orthogonal to the acceleration structures and traversal algorithms that are routinely used in ray tracing. We expect that it will be combined effectively with a mobile ray-tracing hardware architecture for even higher ray-tracing throughput in the future.

In the future, it will be worthwhile to investigate the potential of our method in a PC GPU ray tracer. Our preliminary experiments show that a simple port of the OpenCL-based ray tracer also enables effective adaptive ray sampling on a PC platform when high-quality, high-resolution images are to be ray traced for complicated scenes. Table 4 summarizes the statistics collected when a reflective Hairball model made of 2,880,000 triangles was rendered at the 4K UHD resolution of \(3840\times 2160\) pixels on a desktop PC with an AMD Radeon R9 Fury X GPU. Here, because of the complexity of the model, high sampling rates were needed to ensure the rendering quality. As in the mobile case, we observe that our method achieves marked speedups while retaining good image quality compared with full ray tracing. Tailoring our adaptive undersampling algorithm to best fit the PC GPU remains a topic of future research.