1 INTRODUCTION

Visualization of large 3D scenes remains a challenging computer graphics problem. Even as the capabilities of graphics hardware grow and new methods and algorithms are developed, realistic and authentic visualization of complex scenes remains an unattainable goal for many computer graphics applications. Many classes of industrial software, e.g., CAD/CAE/CAM, BIM, and GIS, are particularly sensitive to scene complexity and to the level of detail of individual objects, because they rely on interactive human–machine interaction and must therefore render scenes efficiently on the user's hardware. Nowadays, scenes often include thousands or millions of polygonal models created in 3D modeling applications or obtained by scanning real-world objects. Moreover, scene objects can exhibit dynamic behavior defined by deterministic or random events of their appearance, disappearance, or movement in a scene.

There are many approaches to solving this problem: frustum culling, occlusion culling, geometry simplification, and rendering optimization. Currently, geometry simplification is one of the most efficient approaches. It has one goal: reducing the number of polygons while preserving, to the extent possible, the main features of the original model. Geometry simplification methods differ in their basic decimation operators (vertex removal, edge collapse, vertex clustering, etc.), error metrics, and topological requirements on the polygonal model. More information can be found in [1]. The method described in [2] deserves special mention. Based on edge collapse and a quadric error metric, it demonstrates good performance and high simplification quality. In addition, it can be applied to models that are not simply connected 2D manifolds.

One of the promising approaches to visualizing complex scenes is the use of levels of detail (LOD) for scene objects. This line of research in computer graphics has a long history, which began with the introduction of the LOD concept by James Clark in 1976. The concept implies that, for each scene object, a number of representations with different degrees of detail are created. When visualizing a scene, suitable levels of detail are selected in such a way that more accurate representations are used for objects close to the viewer, while coarser ones are used for distant objects.

Presently, to visualize large scenes, hierarchical levels of detail (HLOD) are widely employed [3, 4]. In contrast to classic LOD, HLOD provide simplified representations not only for individual objects but also for groups of objects organized in multi-level hierarchies. Instead of selecting a suitable level of detail for each object, it becomes possible to process groups of objects. This provides a higher degree of simplification and reduces both the scene tree traversal time and the number of draw calls.

However, HLOD is difficult to use for arbitrary dynamic scenes. Each time objects appear, disappear, or move in the scene, their representations must be recomputed. The time required for the recomputation generally exceeds the time required for rendering the scene, which makes HLOD useless for scenes with a large number of dynamic objects. Known attempts to optimize the recomputation process by performing incremental updates executed in parallel were not successful [3].

In this paper, we discuss dynamic scenes with deterministic events that determine the appearance, disappearance, and movement of objects in the scene. Here, determinism means a priori knowledge of when the events occur and which objects they affect. For effective visualization of such scenes, we propose a new method called hierarchical dynamic levels of detail (HDLOD). It creates hierarchical levels of detail that do not require recomputation when animating a dynamic scene. Our method differs fundamentally from the LOD methods used to render scenes with typical geometric models and predefined behavior patterns, e.g., scenes that simulate pedestrian flows [5].

The HDLOD method does not preclude the use of various rendering methods, including frustum culling and occlusion culling [6]. Once a simplified representation of the scene has been loaded into video memory, popular GPU-based visibility check methods can be employed [7–9]. This paper describes algorithms for the generation of HDLOD clusters and their visualization by using various rendering methods and presents results of our computational experiments, which confirm the efficiency and practical potential of the method proposed.

2 HIERARCHICAL DYNAMIC LEVELS OF DETAIL

Suppose that a scene S(t) is defined in a 3D Euclidean space \({{E}^{3}}\) on a simulation time interval \(t \in [0, T]\) and is represented as a linear list of objects \(s({{g}_{s}},{{{v}}_{s}},{{p}_{s}}) \in S\). Each object has an invariable geometric representation \({{g}_{s}} \subseteq {{E}^{3}}\). The presence of the objects in the scene is determined by their visibility functions \({{{v}}_{s}}(t)\) : \([0,T] \to \{ 0,1\} \), where \({{{v}}_{s}}(t)\) is one if the object s is present in the scene at the instant t and zero otherwise. The position of the object is determined by its position function ps(t) : \([0,T] \to M\), where M is the set of 4 × 4 matrices. It should be noted that the movement of objects can be defined in many different ways, e.g., by a set of key points at which the position and orientation of the object are specified, together with the times at which the object reaches them. The position of the object at each instant can then be determined by interpolating its positions at the nearest key points. For rendering, however, the position of an object is typically specified by a model transformation matrix, and any representation of the object's position can be expressed in terms of such a matrix. Hence, without loss of generality, it can be assumed that, at each instant \(t \in [0,T]\), the position of the object is represented by a certain matrix.
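The object model above can be sketched in code. The following is a minimal illustration, assuming piecewise-constant sampling of the visibility and position functions over key instants; the field names and the sampling scheme are our assumptions, not part of the formal definition.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class SceneObject:
    triangles: list      # geometric representation g_s (triangle soup)
    vis_times: list      # sorted instants in [0, T] at which v_s changes
    vis_values: list     # v_s value held on [vis_times[i], vis_times[i+1])
    pos_times: list      # key instants of the position function p_s
    pos_matrices: list   # 4x4 model matrix at each key instant

    def visibility(self, t):
        """v_s(t): 1 if the object is present in the scene at time t, else 0."""
        i = bisect_right(self.vis_times, t) - 1
        return self.vis_values[max(i, 0)]

    def position(self, t):
        """p_s(t): model matrix at time t; held constant between key points
        (a real implementation would interpolate neighbouring keys)."""
        i = bisect_right(self.pos_times, t) - 1
        return self.pos_matrices[max(i, 0)]
```

For instance, an object with `vis_times = [0, 2, 5]` and `vis_values = [0, 1, 0]` appears at t = 2 and disappears at t = 5, matching the step-function behavior illustrated in Fig. 2.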

Figure 1 shows an example of a dynamic scene that simulates the construction of a skyscraper in accordance with a predefined design project. As simulation time advances, structural elements and construction machinery are installed at the construction site or removed from it. These appearances and disappearances are captured by the visibility functions shown in Fig. 2. The landscape elements do not change throughout the entire simulation period. In addition, during the construction, various machinery moves all over the construction site (see Fig. 3).

Fig. 1.
figure 1

Example of a dynamic scene that simulates the construction of a skyscraper.

Fig. 2.
figure 2

Some typical visibility functions.

Fig. 3.
figure 3

Example of dynamic objects in the scene (construction machinery).

Hereinafter, we assume that the geometric representation of the scene objects is given by a set of triangles. We make no assumptions about the topology of the objects' boundary representations and do not require them to be connected or manifold, because rendering input is often just the so-called polygon soup, provided without guarantees of any topological properties.

In terms of dynamic behavior, all objects in the scene can be divided into three classes.

The class of static objects includes the objects whose visibility and position do not change throughout the entire simulation period, i.e., \(\forall t \in [0,T]\): \({{{v}}_{s}}(t) = 1\), \({{p}_{s}}(t) = I\), where I is the identity matrix.

The class of pseudo-dynamic objects consists of the objects that appear and disappear without changing their positions: \(\forall t \in \left[ {0,T} \right]\): \({{p}_{s}}\left( t \right) = I\), but \(\exists {{t}_{1}},{{t}_{2}} \in \left[ {0,T} \right]\) such that \({{{v}}_{s}}({{t}_{1}}) \ne {{{v}}_{s}}({{t}_{2}})\).

Finally, the class of dynamic objects comprises the objects whose visibility and position functions both vary with time: \(\exists {{t}_{1}},{{t}_{2}},{{t}_{3}},{{t}_{4}} \in [0,T]\) such that \({{{v}}_{s}}({{t}_{1}}) \ne {{{v}}_{s}}({{t}_{2}})\) and \({{p}_{s}}({{t}_{3}}) \ne {{p}_{s}}({{t}_{4}})\).
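The three-way classification above can be implemented by sampling each object's visibility and position functions over the simulation period. A sketch, assuming `visibility` and `position` are callables t → v_s(t) and t → p_s(t) supplied by the host application:

```python
IDENTITY = tuple(tuple(1.0 if i == j else 0.0 for j in range(4)) for i in range(4))

def classify(visibility, position, times):
    """Classify a scene object as static, pseudo-dynamic, or dynamic by
    sampling its visibility and position functions at the instants `times`."""
    vis = [visibility(t) for t in times]
    if any(position(t) != IDENTITY for t in times):
        return "dynamic"            # position function varies (or is not I)
    if any(v != vis[0] for v in vis):
        return "pseudo-dynamic"     # position fixed, visibility varies
    return "static"                 # constant visibility, identity position
```

In practice, `times` would include every event instant recorded for the object, so that no visibility or position change is missed between samples.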

We refer to the hierarchical dynamic levels of detail (HDLOD) as a tree of clusters C(G, V, P) = \(\{ c({{g}_{c}},{{{v}}_{c}},{{p}_{c}}), \prec \} \) represented by a set of clusters \(c({{g}_{c}},{{{v}}_{c}},{{p}_{c}})\) with assigned geometric representations gc, visibility functions \({{{v}}_{c}}\left( t \right):\left[ {0,T} \right] \to \left[ {0,1} \right]\), and position functions pc(t) : \([0,T] \to M\). In contrast to the visibility functions of objects \({{{v}}_{s}}(t)\), which take values 0 or 1, the visibility functions of clusters \({{{v}}_{c}}(t)\) take values on the interval from 0 to 1, thus employing the concept of partial truth. Indeed, since some objects in a cluster can appear in the scene at some instant while other objects disappear at the same instant, it is impossible to make an unambiguous decision about the appearance of the whole cluster. For the clusters, an agglomeration relation \( \prec \) is also defined in such a way that \(c' \prec \, c\) if and only if the cluster \(c' \in C\) is a direct descendant (in the tree) of the cluster \(c \in C\).

The first step of HDLOD generation is the classification of the scene objects into static, pseudo-dynamic, and dynamic. Each class uses its own cluster generation method.

Static and pseudo-dynamic cluster trees have similar structures: the leaves of the trees are the individual objects themselves, while the internal nodes are clusters that aggregate the geometry (in the pseudo-dynamic case, also the visibility) of the corresponding subtrees. Moreover, as the level in the tree rises, the geometry and visibility functions of the clusters become simpler, with the root of the tree being the simplest representation of a large group of objects.

Dynamic objects can have different motion patterns and different instants of appearance and disappearance. This significantly complicates the analysis and hinders their effective clustering. That is why, for each dynamic object, an individual cluster is created. In typical scenes, the same model can be used to represent different objects. For instance, several excavators that have the same geometric representation, being instances of the same general model of an excavator, can move all over the construction site. This is taken into account when generating clusters for dynamic objects: they can refer to the same geometric representation. It should be noted that, for clusters of dynamic objects, parent clusters can be additionally generated, which contain simplified geometry, simplified visibility function, and simplified position function.

Finally, all root nodes in the trees of static, pseudo-dynamic, and dynamic clusters are attached to a virtual node to form a single HDLOD tree. Figure 4 shows an example of this tree.

Fig. 4.
figure 4

Example of the HDLOD tree.

In addition to geometric and behavioral representations, each cluster \(c \in C\) stores the following derived attributes: bounding box \({{b}_{c}}\), size or power \({{w}_{c}}\), as well as spatial error \({{\varepsilon }_{c}}\) and time error \({{\gamma }_{c}}\) defined below. These attributes are evaluated when generating the hierarchical dynamic levels of detail and are used when displaying them. Thus, each HDLOD cluster is represented as a tuple \(c\left( {{{g}_{c}},{{{v}}_{c}},{{p}_{c}},{{b}_{c}},{{w}_{c}},{{\varepsilon }_{c}},{{\gamma }_{c}}} \right)\).

The spatial error \({{\varepsilon }_{c}}\) is defined as an absolute error that specifies the maximum permissible local deviation of the geometric representation of the cluster c from the aggregated representation of its descendant objects \(o \prec \cdots \prec c' \prec c\):

$${{\varepsilon }_{c}} = \mathop {\max }\limits_{c' \prec \, c} \left( {{{\varepsilon }_{{c'}}}} \right) + {{D}_{H}}\left( {{{g}_{c}},\bigcup\limits_{c' \prec \, c}^{} {{{g}_{{c'}}}} } \right).$$

Here, the error \({{\varepsilon }_{c}}\) is determined recursively using the Hausdorff metric DH(A, B), which is the longest distance from a point of one set to the nearest point of another set:

$${{D}_{H}}(A,B) = {\text{max}}\{ \mathop {\max }\limits_{x \in A} \mathop {\min }\limits_{y \in B} D\left( {x,y} \right),\mathop {\max }\limits_{y \in B} \mathop {\min }\limits_{x \in A} D\left( {x,y} \right)\} ,$$

where A and B are closed sets of points and D(x, y) is the Euclidean distance between the points x and y.
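For finite point sets, the Hausdorff metric and the recursive spatial error can be sketched directly from the definitions above. This is a brute-force illustration: production code would sample the triangle surfaces and use a spatial index instead of the O(|A||B|) double loop, and the `points`/`eps` attributes on child clusters are our assumed interface.

```python
from math import dist

def hausdorff(A, B):
    """Symmetric Hausdorff distance D_H(A, B) between finite point sets:
    the larger of the two directed distances max_x min_y D(x, y)."""
    def directed(P, Q):
        return max(min(dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

def spatial_error(g_c, children):
    """eps_c per the recursive definition: the worst child error plus the
    Hausdorff deviation of the simplified cluster geometry g_c from the
    union of the children's geometries."""
    union = [p for c in children for p in c.points]
    return max(c.eps for c in children) + hausdorff(g_c, union)
```

Note that the two directed distances are generally unequal, so both must be computed: a simplified mesh may lie close to the original everywhere while the original still has features far from the simplified mesh.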

The time error \({{\gamma }_{c}}\) is defined for pseudo-dynamic clusters as the maximum deviation of the cluster’s visibility function from the visibility functions of the original objects:

$${{\gamma }_{c}} = \mathop {\max }\limits_{c' \prec ~c} \left( {{{\gamma }_{{c'}}} + {{D}_{V}}\left( {{{{v}}_{c}},{{{v}}_{{c'}}}} \right)} \right),$$

where the distance \({{D}_{V}}({{{v}}_{A}},{{{v}}_{B}})\) is computed using a functional metric:

$${{D}_{V}}\left( {{{{v}}_{A}}(t),{{{v}}_{B}}(t)} \right) = \frac{1}{T}\mathop \smallint \limits_0^T \left| {{{{v}}_{A}}(t) - {{{v}}_{B}}(t)} \right|dt.$$
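The functional distance above can be approximated numerically when the visibility functions are given as black-box callables. A sketch using midpoint-rule sampling; when both functions are piecewise constant (as for scene objects), the integral can instead be computed exactly from their breakpoints.

```python
def visibility_distance(vA, vB, T, samples=1000):
    """D_V(v_A, v_B): mean absolute difference of two visibility
    functions over [0, T], approximated by midpoint-rule sampling."""
    dt = T / samples
    total = sum(abs(vA((i + 0.5) * dt) - vB((i + 0.5) * dt)) for i in range(samples))
    return total * dt / T
```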

These parameters estimate the deviation of the geometry and the proximity of the temporal behaviors, respectively. The spatial errors are computed when simplifying the geometry of the cluster, while the temporal errors can be evaluated together with the cluster visibility function, which is computed by the following formula:

$${{{v}}_{c}}(t) = \frac{{\sum\limits_{c' \prec \, c}^{} {{{w}_{{c'}}}{{{v}}_{{c'}}}(t)} }}{{\sum\limits_{c' \prec \, c}^{} {{{w}_{{c'}}}} }}.$$

The use of the weighted sums allows us to better take into account the behavior of significant objects and adequately render the behavior of the cluster. The function \({{{v}}_{c}}(t)\) expresses the weighted number of visible objects in the cluster at the instant t. If it takes a value of one, then all objects of the cluster are present in the scene at the instant t; if it takes a zero value, then they are absent in the scene at the instant t.
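The weighted average above is straightforward to compute; the following sketch assumes each child is given as a pair of its power w and its visibility function v (both names are ours, not from the formal definition).

```python
def cluster_visibility(children, t):
    """v_c(t): power-weighted average of the children's visibility
    functions, where `children` is a list of (w, v) pairs."""
    total = sum(w for w, _ in children)
    return sum(w * v(t) for w, v in children) / total
```

For example, a cluster whose visible child has weight 1 and whose hidden child has weight 3 evaluates to v_c(t) = 0.25, i.e., the cluster is treated as mostly absent at that instant.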

For each cluster, it must be decided whether to display it, ignore it, or use its more accurate child representations. For this purpose, a time-dependent spatial error \({{\delta }_{c}}(t)\) is computed, i.e., the spatial error caused by both geometric and temporal simplifications:

$${{\delta }_{c}}(t) = \begin{cases} 0 & \text{for}\quad {{{v}}_{c}}(t) = 0, \\ {{\varepsilon }_{c}} & \text{for}\quad {{{v}}_{c}}(t) = 1, \\ {{\varepsilon }_{c}} + \left( {1 - {{{v}}_{c}}(t)} \right)\sum\limits_{c' \prec \, c} {{w}_{{c'}}} & \text{for}\quad \tfrac{1}{2} \leqslant {{{v}}_{c}}(t) < 1, \\ {{\varepsilon }_{c}} + {{{v}}_{c}}(t)\sum\limits_{c' \prec \, c} {{w}_{{c'}}} & \text{for}\quad 0 < {{{v}}_{c}}(t) < \tfrac{1}{2}. \end{cases}$$

When rendering the scene, the cluster tree is traversed. At each node, the cluster attributes are analyzed to determine whether further traversal of the subtree is required. It is checked whether the cluster is present in the scene at the given simulation instant (\({{{v}}_{c}}(t) > 0.5\)), whether the bounding box bc of the cluster falls within the view frustum, and whether the cluster error \({{\delta }_{c}}(t)\) is small enough to provide the desired quality. If all conditions are met, the representation of the cluster is immediately selected for display, and the entire subtree can be excluded from the traversal. If the third condition is not satisfied, then the traversal of the cluster subtree continues and its child nodes undergo the same checks until the leaf nodes with the original scene objects are reached. The pseudocode of the HDLOD visualization algorithm is presented in Appendix A.
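The error evaluation and the traversal can be sketched together. This is an illustrative rendering of the piecewise formula and the three checks, not the pseudocode from Appendix A; the node interface (`visibility`, `eps`, `weight_sum`, `bbox`, `children`) and the `in_frustum` callback are our assumptions.

```python
def time_dependent_error(eps_c, v, weight_sum):
    """delta_c(t): spatial error caused by both geometric and temporal
    simplification; `weight_sum` is the total power of the child clusters."""
    if v == 0:
        return 0.0
    if v == 1:
        return eps_c
    if v >= 0.5:
        return eps_c + (1.0 - v) * weight_sum
    return eps_c + v * weight_sum

def select_clusters(c, t, in_frustum, tol, out):
    """Recursive HDLOD traversal: cull invisible subtrees, accept a cluster
    whose error is within tolerance and which is present (v_c(t) > 0.5),
    and otherwise descend to its more accurate children."""
    if not in_frustum(c.bbox):
        return                                  # cull the whole subtree
    d = time_dependent_error(c.eps, c.visibility(t), c.weight_sum)
    if d <= tol or not c.children:
        if c.visibility(t) > 0.5:
            out.append(c)                       # draw this representation
        return
    for child in c.children:                    # error too large: refine
        select_clusters(child, t, in_frustum, tol, out)
```

Note how the formula penalizes partial visibility from both sides: near v_c(t) = 0.5 the error is largest, which is exactly when the cluster's aggregate representation is least trustworthy and descent into children is forced.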

3 HDLOD GENERATION

The hierarchical dynamic levels of detail can be generated automatically using the proposed method. As noted above, its first step is the classification of the scene objects into static, pseudo-dynamic, and dynamic ones, with each class processed by its own cluster generation method.

First, we consider the method for processing pseudo-dynamic objects, since they represent the most complex case. This method uses hierarchical (bottom-up) clustering with multi-level accuracy control. It begins with individual objects and sequentially groups them into larger clusters until a desired number of levels is reached. The root clusters can be further simplified if the scene needs to be displayed as part of a more complex composition. The clustering must be carried out under rigid accuracy control: the resulting tree of clusters has to satisfy many requirements, including the expected number of levels of detail, the degrees of its nodes, the spatial density and overlap of child clusters, their temporal proximity, and the reduction in cluster complexity as the level of the tree rises.

Unfortunately, classical clustering methods cannot be directly applied to the HDLOD generation problem. The corresponding requirements are quite complex and may contradict each other, which complicates the mathematical formalization of the metric functions and connectivity criteria required by clustering methods [10]. The unacceptably high computational complexity of these methods is another factor preventing the adaptation of the classical results. For instance, a naive implementation of agglomerative clustering has a time complexity of \(O({{n}^{3}})\) and memory consumption of \(O({{n}^{2}})\), which makes it inapplicable even to fairly simple scenes. Faster hierarchical clustering with a time complexity of \(O({{n}^{2}})\) and memory consumption of \(O({{n}^{2}})\) is also unsuitable for the problems under consideration [10].

Thus, we use another approach to generate the tree. The clustering process is divided into steps, at each of which the clusters generated must satisfy certain accuracy requirements. As new steps are made, these requirements are weakened in such a way as to guarantee the completion of the process after a certain number of steps equal to the number of levels of detail \(L\).

At each step of the method l (\(1 \leqslant l \leqslant L\)), an attempt is made to form new clusters with sizes \({{w}_{c}} \leqslant w\left( l \right)\) and errors \({{\varepsilon }_{c}} \leqslant \varepsilon \left( l \right)\) and \({{\gamma }_{c}} \leqslant \gamma \left( l \right)\). For this purpose, the threshold values w(l), \(\varepsilon \left( l \right)\), and \(\gamma \left( l \right)\) are selected in such a way that they monotonically increase with rising level. For instance, we can suggest the following values:

$$\begin{gathered} w(l) = \frac{l}{L}W,\quad \varepsilon (l) = 0.01\frac{l}{L}W = 0.01w(l), \\ \gamma (l) = 0.25\frac{l}{L}T, \\ \end{gathered} $$

where W is the size of the entire scene.
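The suggested schedule is trivially computed per level; a sketch, with the constants 0.01 and 0.25 taken directly from the formulas above:

```python
def thresholds(l, L, W, T):
    """Monotone threshold schedule for clustering level l (1 <= l <= L):
    returns (w(l), eps(l), gamma(l)) for scene size W and simulation
    period T. All three thresholds grow linearly with the level, which
    guarantees termination after L steps."""
    w = l / L * W
    return w, 0.01 * w, 0.25 * l / L * T
```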

At each step, the method compiles a list of active clusters to be used at the next step. Initially, the list of active clusters includes all original objects. At each step of the method, candidates among the active clusters are selected using Hilbert space-filling curves [11]. This allows us, on the one hand, to select candidates throughout the entire volume of the scene and, on the other hand, to localize them in densely filled regions. Next, for each candidate, its neighbors are found. The neighbors must satisfy conditions of spatial and temporal proximity and also guarantee the generation of a parent cluster with the desired accuracy. A new cluster is formed from the candidate and its neighbors, which become its children. The geometric representation of the new cluster is obtained by combining the representations of the children and is then simplified to achieve the desired error \({{\varepsilon }_{c}} \leqslant \varepsilon \left( l \right)\), e.g., by using the algorithm mentioned above [2]. The visibility function of the new cluster is determined by evaluating the weighted visibility function \({{{v}}_{c}}(t)\) described above. The new cluster is added to the list of active clusters, while its children are removed from it. If no suitable neighbors are found for the current candidate, then it is excluded from the analysis at the current step; however, it participates in the following steps. The pseudocode of the algorithm described above is presented in Appendix B.
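One level of this bottom-up process can be sketched as follows. The sketch is heavily simplified: a plain coordinate sort stands in for the Hilbert-curve ordering, a Chebyshev-distance `radius` stands in for the full proximity and accuracy conditions, and geometry simplification and the time error are omitted entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    center: tuple        # representative point used for the neighbor search
    w: float             # cluster size ("power") w_c
    eps: float = 0.0     # accumulated spatial error eps_c
    children: list = field(default_factory=list)

def build_level(active, w_max, eps_max, radius):
    """One clustering step: visit candidates in spatial order, merge each
    with its unmerged neighbours within `radius`, and accept the parent
    only if it respects the w(l) and eps(l) thresholds; `radius` serves
    as a crude bound on the merge error."""
    order = sorted(range(len(active)), key=lambda i: active[i].center)
    done, result = set(), []
    for i in order:
        if i in done:
            continue
        nbrs = [j for j in order if j != i and j not in done and
                max(abs(a - b) for a, b in zip(active[i].center, active[j].center)) <= radius]
        group = [active[i]] + [active[j] for j in nbrs]
        w = sum(c.w for c in group)
        eps = max(c.eps for c in group) + radius
        if len(group) > 1 and w <= w_max and eps <= eps_max:
            dim = len(active[i].center)
            center = tuple(sum(c.center[k] for c in group) / len(group) for k in range(dim))
            result.append(Cluster(center, w, eps, group))
            done.update([i] + nbrs)
        else:
            result.append(active[i])   # no viable merge: retry at a later level
            done.add(i)
    return result
```

Repeated L times with the relaxing thresholds of the previous section, this produces the levels of the cluster tree from the leaves upward.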

Using spatial indexing, the clustering process can be carried out in \(O\left( {n\log n} \right)\) time, where n is the number of scene objects. Compared to the classical clustering methods mentioned above, this complexity is far better suited to large scenes. To quickly find the neighbors, various index structures can be employed [11]; in particular, regular dynamic octrees [12] perform well for pseudo-dynamic scenes. In our implementation, we use ordering with respect to each coordinate.

For static objects, the same clustering method is employed; in this case, however, only the spatial aspect is taken into account. Thus, static objects can be regarded as a special case of pseudo-dynamic ones.

For each dynamic object, an individual cluster is generated. In this case, the geometric representations of the objects are analyzed. If there are objects with the same representation, then the representation of their clusters is set by referring to this representation. This makes it possible to avoid data duplication.

4 RENDERING

The efficiency of the HDLOD cluster visualization depends on the rendering methods employed. In this paper, we propose two subsystems operating in parallel. The first one determines visible clusters (visible surface determination), while the second one uploads polygonal representations of the clusters into video memory and sends instructions to the GPU by using the OpenGL programming interface.

When visualizing the HDLOD tree, it is traversed with cluster visibility checks at the current simulation instant from the specified camera position, taking into account the accuracy of the cluster's polygonal representation. For the clusters that pass the checks, visibility messages are sent to the second subsystem. Having received a message, the second subsystem sends the triangulated representation of the cluster and its rendering instruction to the GPU. Once all necessary data are buffered in GPU memory, a draw call is made to execute the rendering instructions. The execution of each instruction consists of the following steps: transforming the vertices (vertex shader), rasterizing the triangles, computing the color of the fragments (fragment shader), compositing, and displaying (see Fig. 5).

Fig. 5.
figure 5

Main steps of scene object rendering.

The HDLOD method reduces the time required to complete the whole process described above. First, the number of triangles in the geometric representations of the clusters is reduced, which accelerates data transfer to GPU memory and reduces the time of vertex transformation. Second, the total number of rendering instructions is reduced, thus reducing the cost of their processing.

When processing a large number of instructions, CPU resources are used to validate and store the current state of the graphics pipeline. To reduce these overheads, the OpenGL extension NV_command_list can be employed, which makes it possible to pre-generate an array of rendering instructions together with the current state of the pipeline [13]. This extension is quite efficient but has limited support on modern graphics hardware.

The software implementation of the HDLOD tree visualization method uses the OpenGL procedure MultiDrawElementsIndirect, which allows an array of buffered instructions to be executed in a single call [14]. However, this requires arrays of instructions, transformation matrices, and materials. To reduce the cost of buffering and of removing invisible surfaces, a spatial decomposition method based on single reference octrees is employed [15]. Each octant is associated with corresponding arrays that are kept in a state consistent with the HDLOD tree and are updated as clusters appear, disappear, or move. If a cluster moves within an octant, then its matrix is updated in the corresponding array. If a cluster moves to another octant, then the arrays of both octants are updated.
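Each entry of the indirect buffer consumed by glMultiDrawElementsIndirect follows the fixed DrawElementsIndirectCommand layout of five consecutive GLuints. The following sketch packs a per-octant command array for upload into a GL_DRAW_INDIRECT_BUFFER; the per-octant grouping is an application-level convention of ours, not part of OpenGL, and little-endian packing is assumed for the target host.

```python
import struct

def pack_indirect_commands(commands):
    """Pack per-cluster draw parameters into the DrawElementsIndirectCommand
    layout: (count, instanceCount, firstIndex, baseVertex, baseInstance),
    each a 32-bit unsigned integer, packed little-endian."""
    return b"".join(struct.pack("<5I", *cmd) for cmd in commands)
```

The resulting byte string is 20 bytes per command, so the `drawcount` argument of the GL call is simply `len(buf) // 20`.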

5 COMPUTATIONAL EXPERIMENTS

To test the introduced concept of HDLOD and our method for automatic HDLOD generation and visualization, we carried out a series of computational experiments. The times of frame rendering when navigating the scene at fixed simulation instants and the average times of frame rendering for animation throughout the entire simulation period were measured. For testing, we selected the dynamic scene of skyscraper construction (Fig. 1). This scene is an example of a typical project in the construction industry: it has a static environment, a pseudo-dynamic model of a skyscraper under construction (the structural elements of which are installed in accordance with a certain design project), and dynamic machinery used for construction. All scene objects are represented by polygonal meshes. Table 1 shows parameters of the scene.

Table 1. Number of objects and triangles in the scene

To estimate the effectiveness of HDLOD, the scene was visualized for various camera positions. In the first position, the camera was placed at the closest range possible to make the scene fully visible. In the other positions, the camera was placed at such distances that the scene occupied one fourth and one sixteenth of the screen. The time stamps were selected at the beginning, at the end, at one third, and at two thirds of the simulation period. The camera positions at which the scene was fully visible on the screen were selected so as to exclude the effect of frustum culling. The computational experiments were conducted on a computer with Intel Core i7-4790 (3.6 GHz), 16 GB RAM, and GeForce GTX 750 Ti (2 GB).

Table 2 shows the performance measurements taken during the experiments. The last column contains results obtained when rendering the scene without using simplified representations, while the first three columns contain results obtained using the HDLOD tree. It can be seen that the use of HDLOD significantly improves performance. As the camera zooms out, the benefit grows because increasingly large (and simple) clusters are selected to visualize the scene. This dependence is observed both when navigating the static scenes fixed at the selected instants and when animating the scene. In this experiment, the levels of detail for dynamic objects were not used. However, when simplifications for dynamic objects are applied, the growth in performance with increasing distance to the scene becomes even more significant. As the scene becomes more complex toward the end of the simulation period (the skyscraper is built up), we observe an increase in frame time when visualizing the scene without HDLOD. Note that HDLOD shows its worst performance at the time stamp of 1/3 of the simulation period. This is because, at that point, a large number of changes occur in the scene: for many clusters, the visibility functions \({{{v}}_{c}}(t)\) are close to 0.5, while the time-dependent spatial errors \({{\delta }_{c}}(t)\) are very large, which forces the use of child representations. Even so, the performance remains higher than without HDLOD.

Table 2. Frame rendering time (in milliseconds) when visualizing the skyscraper construction scene

A similar series of experiments was carried out for other problems associated with the visual simulation of construction projects, urban infrastructure programs, and engineering processes. The experimental results also confirmed the high efficiency and practical potential of the proposed method.

6 CONCLUSIONS

In this paper, we have proposed a method for visualizing scenes with deterministic dynamics that is based on hierarchical dynamic levels of detail (HDLOD). In contrast to popular LOD methods and their hierarchical extensions (HLOD), the proposed method is applicable to a wide class of large dynamic scenes. We have also described algorithms for the HDLOD tree generation and visualization at a specified accuracy. The results of the computational experiments have confirmed the high efficiency and practical potential of the proposed approach. Our further research will be devoted to analyzing algorithmic variants of the proposed method, as well as its industrial application.