1 Introduction

Thanks to the recent development of remote sensing technologies such as Light Detection and Ranging (LiDAR), the amount of available spatial data, represented as massive raw point clouds, has been increasing exponentially. LiDAR data is used in a variety of fields, including urban modeling [69], climate studies [3], earthquake analysis [47], disaster management [43, 75], flood risk mapping [71], forest analysis [12, 37, 74], and coastal morphology analysis [65, 73]. Surface models based on LiDAR data enable the extraction of features relevant to several applications. For example, the slope of a terrain is used to map seafloor habitats [42], while slope, roughness, and curvature are combined to model coral distribution [17]. Also, the segmentation of a terrain according to its critical points (e.g., peaks and pits) provides information about terrain morphology, which is fundamental for assessing the risk of landslides or floods. Research in these fields has been greatly enhanced by the increasing availability of open data repositories.

A raster representation is often used for modeling a terrain or a surface from airborne LiDAR data, mainly because of the large number of software tools available for processing raster data in Geographic Information Systems (GISs) and remote sensing. While it is always possible to transform a point cloud into a raster representation, this is a computationally intensive operation, which can account for 70-80% of the time required by the whole analysis pipeline [1, 6]. Moreover, the resolution of the raster grid may introduce artifacts, especially if the original point cloud contains noise from acquisition errors or missing data caused by occlusions.

Adaptive alternatives to raster representations are Triangulated Irregular Networks (TINs), which can encode irregularly distributed data at the cost of a higher memory consumption than raster-based elevation models (usually called Digital Elevation Models (DEMs)). However, compact and widely used data structures for encoding TINs suffer from scalability issues. For instance, using the most compact state-of-the-art data structure for triangle meshes, we can process meshes of up to 150 million vertices (300 million triangles) on our workstation with an Intel Xeon E5-2630 v4 CPU at 2.20 GHz and 64 GB of RAM, which is much smaller than the size of currently available data sets (see, for instance, the OpenTopography repository).

We propose here a new data structure for in-memory processing of TINs, the Terrain tree, which encodes the topology of a triangulated terrain combined with a spatial index built on the triangle mesh. We present, discuss, and compare three spatial indexes, which together form the Terrain tree family. By encoding the vertices of each triangle as well as the field values associated with each vertex, we provide the minimum amount of information required for extracting the full mesh connectivity and for processing such fields locally. The spatial index, in turn, provides the ability to navigate the triangle mesh at a global scale. Together, these two components enable the efficient extraction of connectivity information while guaranteeing compactness and scalability. Our work builds on a short paper [23], in which we introduced a single spatial index for triangle meshes, based on a point-based decomposition of the domain, and discussed the extraction of some basic morphological terrain features, such as slope, curvature, and critical points. In this work, we have developed a new distributed approach for extracting such morphological information on all Terrain trees. Moreover, we have developed distributed algorithms for computing and analyzing the topology of a terrain by using discrete Morse theory [30]. Such algorithms require an efficient navigation of the triangle mesh, which is challenging in hierarchical and modular mesh data structures like the Terrain trees.

Morse theory [52] has been used for computing segmentations of the graph of 2D and 3D scalar fields based on the critical points of the field, like the critical net, which consists of the critical points and of the separatrix lines connecting them, or Morse decompositions, defined by the regions of influence of the critical points. Due to the discrete nature of the problem, recent research has focused on discrete Morse theory [29] which is a combinatorial counterpart of smooth Morse theory [52]. Discrete Morse theory is based on the definition of a discrete vector field, also called a Forman gradient, which emulates the gradient of the original function. By means of the Forman gradient, the connectivity of the critical points and the Morse decomposition can be extracted, thus providing an efficient and compact representation of the terrain topology [15, 38].

It is common in applications to study a terrain together with additional fields defined on it, and such data arise in several domains, like forest management, weather prediction, and geological sciences [40]. In geological data, such fields usually represent gravity and magnetic field intensity. In the case of LiDAR point clouds, these fields can be: (i) generated by the instrument used for creating the point cloud (e.g., the elevation of each point), (ii) directly derived from the raw data (e.g., the slope or the airborne laser scanning cover), or (iii) computed by a domain expert at run time (e.g., estimating curvature and roughness values from the elevation of each point). Such data sets are referred to as multifield datasets, and they are formally represented by a triangle mesh and a collection of scalar values at the vertices of the mesh. To analyze such multifield datasets, we have extended the strategy defined by Nagaraj et al. [53] for computing a new scalar function capturing the relationships among the multiple fields.

The remainder of the paper is organized as follows. In Section 2, we review some background notions on triangle meshes and on discrete Morse theory. In Section 3, we discuss related work on connectivity-based data structures for triangle meshes, on spatial indexes for triangle meshes and maps, and on techniques from topological data analysis relevant to our work. In Section 4, we present the Terrain trees, describing the different subdivision rules and how they are generated, and, in Section 5, their encoding structure and how to execute basic spatial and topological queries on them. In Section 6, we describe how to extract classical morphological terrain features, namely triangle and edge slope, curvature, and roughness, in the Terrain trees framework. In Section 7, we define how to perform topology-based terrain analysis on Terrain trees by using a discrete Morse gradient. First, we introduce a distributed algorithm for extracting a discrete gradient field, and then a distributed procedure for extracting the critical net from the critical points of the gradient. In Section 8, we describe how multivariate visualization is performed in Terrain trees. In Section 9, we provide an experimental evaluation of the Terrain trees, comparing the performance of the different spatial indexes with each other and against a state-of-the-art compact data structure for meshes. Finally, in Section 10, we draw some concluding remarks and discuss directions for future work.

2 Background notions

In this Section, we review some background notions on triangle meshes, which are at the basis of triangulated terrains, and on Morse and discrete Morse theories, which are at the basis of the terrain analysis developed in this work.

2.1 Triangulated Irregular Networks (TINs)

A Triangulated Irregular Network (TIN) is a digital terrain model defined by a finite set of irregularly distributed points in the plane, each with an associated elevation value. A TIN consists of a triangle mesh connecting the points in the plane and of a piecewise linear function interpolating the elevations over the triangles of such mesh.

To define a triangle mesh, we need to introduce the concept of a simplex. A k-simplex \(\sigma\) is the convex hull of \(k+ 1\) affinely independent points in the Euclidean space \(\mathbb {E}^n\) (with \(k\ge 0\)), and \(k\) is the dimension of \(\sigma\). A 0-simplex is a vertex, a 1-simplex is an edge, and a 2-simplex is a triangle. An h-facet \(\sigma '\) of a \(k\)-simplex \(\sigma\) is an h-simplex (\(0 \le h < k\)) generated by \(h + 1\) vertices of \(\sigma\). For instance, a triangle has three 0-facets, its vertices, and three 1-facets, its edges. The set of all the facets of a simplex defines its boundary. Conversely, the star of a simplex \(\sigma\) is the set of simplices that have \(\sigma\) as a facet. For instance, the star of a vertex is the set of triangles and edges incident in it. The link of a simplex \(\sigma\) is the set of all the facets of simplices in the star of \(\sigma\) that are not incident in \(\sigma\). A triangle mesh \(\varSigma\) is a collection of vertices (0-simplices), edges (1-simplices), and triangles (2-simplices) such that any two triangles in \(\varSigma\) either have an empty intersection or intersect at a common simplex (an edge or a vertex).
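To make the notions of facet, star, and link concrete, here is a minimal Python sketch that computes them for a mesh given only as a list of triangles. It represents simplices as sorted tuples of vertex indexes and reads "not incident in \(\sigma\)" as "sharing no vertex with \(\sigma\)". Both the representation and the function names are illustrative assumptions, not part of the Terrain trees library.

```python
from itertools import combinations


def simplices_of(triangles):
    """All vertices, edges and triangles of the mesh, as sorted vertex tuples."""
    out = set()
    for t in triangles:
        t = tuple(sorted(t))
        for k in (1, 2, 3):
            out.update(combinations(t, k))
    return out


def star(sigma, triangles):
    """Simplices that have sigma as a proper facet."""
    s = set(sigma)
    return {tau for tau in simplices_of(triangles) if s < set(tau)}


def link(sigma, triangles):
    """Facets of the simplices in the star of sigma that share no vertex with sigma."""
    s = set(sigma)
    result = set()
    for tau in star(sigma, triangles):
        # Proper facets of tau have between 1 and len(tau) - 1 vertices.
        for size in range(1, len(tau)):
            for facet in combinations(tau, size):
                if not s & set(facet):
                    result.add(facet)
    return result


tris = [(0, 1, 2), (0, 2, 3)]
print(sorted(star((0,), tris)))  # [(0, 1), (0, 1, 2), (0, 2), (0, 2, 3), (0, 3)]
print(sorted(link((0,), tris)))  # [(1,), (1, 2), (2,), (2, 3), (3,)]
```

For the two-triangle fan, the star of vertex 0 contains its three incident edges and two triangles. Its link is the polyline through vertices 1, 2, and 3.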

The incidences and adjacencies among the simplices of a triangle mesh are captured by connectivity relations [16]. We distinguish among boundary relations, which relate a simplex to its facets, co-boundary relations, which relate a simplex to the simplices for which it is a facet, and adjacency relations, which relate simplices sharing a facet. For instance, the Triangle-Vertex (TV) relation is a boundary relation that associates with a triangle t its three vertices. The Vertex-Triangle (VT) relation is a co-boundary relation that associates with a given vertex the triangles in its star. The Triangle-Triangle (TT) relation is an adjacency relation that associates with a triangle t the three triangles sharing an edge with t.
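Every other relation can be derived from the Triangle-Vertex relation. The sketch below derives VT and TT globally, using a dictionary keyed by sorted edges. It assumes a 0-based list of vertex-index triples and an edge-manifold mesh. It only illustrates the relations themselves, not the local, per-block extraction performed in Terrain trees.

```python
from collections import defaultdict


def vt_relation(tv, n_vertices):
    """VT: for each vertex, the triangles in its star."""
    vt = [[] for _ in range(n_vertices)]
    for t, tri in enumerate(tv):
        for v in tri:
            vt[v].append(t)
    return vt


def tt_relation(tv):
    """TT: for triangle (a, b, c), entry i is the triangle across the edge
    opposite the i-th vertex, or -1 if that edge is on the mesh boundary."""
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(tv):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_tris[tuple(sorted(e))].append(t)
    tt = []
    for t, (a, b, c) in enumerate(tv):
        adj = []
        for e in ((b, c), (c, a), (a, b)):
            others = [u for u in edge_to_tris[tuple(sorted(e))] if u != t]
            adj.append(others[0] if others else -1)
        tt.append(adj)
    return tt


tv = [(0, 1, 2), (0, 2, 3)]
print(vt_relation(tv, 4))  # [[0, 1], [0], [0, 1], [1]]
print(tt_relation(tv))     # [[-1, 1, -1], [-1, -1, 0]]
```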

2.2 Morse theory

Let us consider a domain \(D \subseteq \mathbb {R}^2\) and a smooth (i.e., \(C^{\infty }\)) scalar function f defined over D [70]. A point \(p \in D\) is called a critical point of f if and only if the gradient of f vanishes at p. The Hessian matrix \(Hess_p(f)\) of the second-order partial derivatives of f, evaluated at p, provides additional information about the critical points of f. The number of negative eigenvalues of \(Hess_p(f)\), called the index of p, defines the type of the critical point. A critical point of index 0 is a minimum, a critical point of index 1 is a saddle, and a critical point of index 2 is a maximum. For each critical point p, the eigenvectors of \(Hess_p(f)\) associated with its negative eigenvalues define the directions in which f decreases. The lines everywhere tangent to the gradient of f are called integral lines. An integral line connecting two critical points of consecutive index is called a separatrix line.
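In 2D, the index of a critical point can be read off the signs of the two eigenvalues of the symmetric 2×2 Hessian, which have a closed form. The sketch below assumes the second derivatives at p are already known (e.g., analytically). The function names and the tolerance `eps` are hypothetical.

```python
import math


def hessian_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    mean = (fxx + fyy) / 2.0
    radius = math.hypot((fxx - fyy) / 2.0, fxy)
    return mean - radius, mean + radius


def classify_critical_point(fxx, fxy, fyy, eps=1e-12):
    """Index = number of negative eigenvalues; a zero eigenvalue means degenerate."""
    l1, l2 = hessian_eigenvalues(fxx, fxy, fyy)
    if abs(l1) < eps or abs(l2) < eps:
        return "degenerate"
    index = sum(1 for lam in (l1, l2) if lam < 0)
    return {0: "minimum", 1: "saddle", 2: "maximum"}[index]


print(classify_critical_point(2, 0, 2))  # minimum (f = x^2 + y^2 at the origin)
print(classify_critical_point(0, 1, 0))  # saddle  (f = x*y at the origin)
```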

Morse theory [52] has been developed for smooth functions f whose critical points are all non-degenerate, i.e., \(\det (Hess_p(f)) \ne 0\). Such a function f is called a Morse function. The critical points and the integral lines define a decomposition of D based on the regions of influence of the critical points. If we consider a critical point p of index k, the integral lines converging at p form a k-cell, called the descending manifold of p. The descending manifold of a maximum is a region, the descending manifold of a saddle is a line, and that of a minimum is a point. The collection of all descending manifolds forms the descending Morse complex. Dually, the integral lines originating at p form a \((2-k)\)-cell, called the ascending manifold of p. Here, the ascending manifold of a maximum is a point, the ascending manifold of a saddle is a line, and that of a minimum is a region. The collection of the ascending manifolds forms the ascending Morse complex.

Figure 1(a) shows the critical points of f, namely minima, saddles, and maxima. Lines indicate integral lines, while bold lines indicate separatrix lines connecting minima to saddles and saddles to maxima. In Fig. 1(b), we depict in red the descending manifold defined by the integral lines converging at the maximum. In Fig. 1(c), we depict in yellow the ascending manifold defined by the integral lines originating at the minimum. The collections of all descending and ascending manifolds define the descending and ascending Morse complexes, respectively, depicted in Fig. 1(d) and (e). In this work, we are specifically interested in the critical net, which compactly describes the terrain morphology. The critical net is a network having the critical points as nodes and, as arcs, the separatrix lines connecting critical points of consecutive indexes. In Fig. 1(a), the critical net is depicted as the set of bold lines connecting critical points.

Fig. 1

(a) Minima, saddles, and maxima, and the integral lines connecting them. (b) Descending manifold corresponding to a maximum. (c) Ascending manifold corresponding to a minimum. (d) Descending Morse complex and (e) ascending Morse complex

2.3 Discrete Morse theory

In applications, we deal with scalar fields sampled at discrete locations within a domain. To this end, the results of Morse theory, defined in the smooth case, have been extended by its discrete counterpart, called Discrete Morse Theory [29]. Given a scalar function F defined at the vertices of a triangle mesh \(\varSigma\), discrete Morse theory allows for the computation of a combinatorial gradient approximating the gradient of F, called a Forman gradient. The Forman gradient is defined by a collection of simplex pairs such that a k-simplex of \(\varSigma\) is paired either with a \((k-1)\)-simplex on its boundary or with a \((k+1)\)-simplex in its star, and each simplex of \(\varSigma\) is in at most one pair. A k-simplex involved in no pair is called a critical simplex of index k.

A gradient pair can be viewed as an arrow formed by a head (k-simplex) and a tail (\((k-1)\)-simplex). In a triangle mesh, we have arrows formed by a triangle and an edge (triangle-edge pairs) and by an edge and a vertex (edge-vertex pairs). Unpaired simplices can be critical triangles, indicating maxima, critical edges, indicating saddles, and critical vertices, indicating minima. Figure 2(b) shows the Forman gradient computed on the triangle mesh shown in Fig. 2(a). Black arrows indicate gradient pairs. Red points indicate critical triangles, green points indicate critical edges, and blue points indicate critical vertices.
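A Forman gradient can be stored as a set of (tail, head) pairs. The sketch below checks the two pairing conditions stated above and returns the unpaired, i.e., critical, simplices grouped by index. The first condition is that each pair links a simplex to a facet of one dimension less. The second is that each simplex occurs in at most one pair. The sorted-tuple representation of simplices is an illustrative assumption. The sketch does not check that V-paths are acyclic, which a valid Forman gradient also requires.

```python
from itertools import combinations


def all_simplices(triangles):
    """All vertices, edges and triangles of the mesh, as sorted vertex tuples."""
    out = set()
    for t in triangles:
        t = tuple(sorted(t))
        for k in (1, 2, 3):
            out.update(combinations(t, k))
    return out


def critical_simplices(triangles, pairs):
    """pairs: iterable of (tail, head), the tail being a (k-1)-facet of the k-simplex head.
    Returns the unpaired (critical) simplices grouped by index (dimension)."""
    simplices = all_simplices(triangles)
    used = set()
    for tail, head in pairs:
        tail, head = tuple(sorted(tail)), tuple(sorted(head))
        if tail not in simplices or head not in simplices:
            raise ValueError(f"unknown simplex in pair {tail} -> {head}")
        if len(head) != len(tail) + 1 or not set(tail) < set(head):
            raise ValueError(f"{tail} is not a facet of {head} of one dimension less")
        for s in (tail, head):
            if s in used:
                raise ValueError(f"simplex {s} belongs to more than one pair")
            used.add(s)
    critical = {0: [], 1: [], 2: []}
    for s in sorted(simplices):
        if s not in used:
            critical[len(s) - 1].append(s)
    return critical


# A single triangle with F(0) < F(1) < F(2): only vertex 0 (a minimum) stays critical.
pairs = [((1,), (0, 1)), ((2,), (0, 2)), ((1, 2), (0, 1, 2))]
print(critical_simplices([(0, 1, 2)], pairs))  # {0: [(0,)], 1: [], 2: []}
```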

Fig. 2

(a) A TIN with elevation depicted according to a diverging blue-red colormap. (b) Forman gradient, (c) separatrix V-paths, and (d) the corresponding critical net

Analogously, critical simplices are the discrete counterpart of critical points, and sequences of gradient pairs are the discrete counterpart of integral lines. We call a V-path a sequence of simplices \([\sigma _0, \tau _0, \ldots , \sigma _i, \tau _i, \ldots , \sigma _q, \tau _q]\) such that \((\sigma _i, \tau _i)\) is a gradient pair for \(i=0,\ldots , q\) and, for \(i=0,\ldots , q-1\), \(\sigma _{i+1} \ne \sigma _i\) is a simplex on the boundary of \(\tau _i\) having the same dimension as \(\sigma _i\). A separatrix V-path is a triple \((\tau ,\rho ,\sigma )\), where \(\tau\) and \(\sigma\) are two critical simplices having consecutive indexes and \(\rho\) is a V-path connecting \(\tau\) to \(\sigma\). In a triangle mesh \(\varSigma\), we have separatrix \(V_1\)-paths connecting a critical edge to a critical vertex and separatrix \(V_2\)-paths connecting a critical triangle to a critical edge. In Fig. 2(c), separatrix V-paths are depicted in red.
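Following a separatrix \(V_1\)-path amounts to walking down edge-vertex pairs until an unpaired vertex is reached. The minimal sketch below assumes the edge-vertex pairs are given as a dictionary from each paired vertex to its edge. This is a hypothetical encoding chosen for clarity, not the compact gradient encoding used by Terrain trees. The toy input is a chain of pairs rather than a full mesh.

```python
def trace_v1_paths(critical_edge, vertex_pair):
    """vertex_pair maps a vertex (tail) to the edge (head) it is paired with.
    Returns one descending V-path per endpoint of the critical edge; each path
    ends at an unpaired, i.e. critical, vertex."""
    paths = []
    for v in critical_edge:
        path = [tuple(critical_edge), v]
        seen = {v}
        while v in vertex_pair:
            edge = vertex_pair[v]
            w = edge[0] if edge[1] == v else edge[1]  # other endpoint of the paired edge
            if w in seen:  # impossible for a valid (acyclic) Forman gradient
                raise ValueError("closed V-path: not a valid gradient")
            path += [edge, w]
            seen.add(w)
            v = w
        paths.append(path)
    return paths


pairs = {1: (0, 1), 2: (1, 2), 3: (3, 4)}
for path in trace_v1_paths((2, 3), pairs):
    print(path)
# [(2, 3), 2, (1, 2), 1, (0, 1), 0]
# [(2, 3), 3, (3, 4), 4]
```

The two paths end at critical vertices 0 and 4. Together with the critical edge (2, 3), they give the arcs of the critical net incident in this saddle. \(V_2\)-paths are traced analogously through triangle-edge pairs.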

V-paths and separatrix V-paths are used to extract features from the Forman gradient, including the critical net. Specifically, within the framework of Forman theory, the nodes of the critical net are the critical simplices of the Forman gradient V, and its arcs are the separatrix V-paths connecting them. A geometrical interpretation of the critical net is given by connecting the tails and heads of all the arrows in the separatrix V-paths. Figure 2(d) shows in red the critical net computed by following the separatrix V-paths.

3 Related work

In this Section, we review some related work on connectivity-based data structures for triangle meshes, on hierarchical spatial indexes for maps and meshes, and on techniques for topological data analysis.

3.1 Connectivity-based data structures

Connectivity-based data structures extend graph-based data structures to support the efficient extraction of connectivity relations (see [16] for a survey). A variety of connectivity-based data structures have been developed in the literature for triangle meshes [16]. The most widely used are triangle-based data structures, which encode the vertices and the triangles of the mesh but not the edges. The most compact triangle-based representation is the indexed data structure, which encodes the vertices and the triangles of the mesh and, for each triangle, the references to its vertices. The Indexed data structure with Adjacencies (IA data structure) [55, 59] extends the indexed data structure by explicitly encoding the adjacencies between triangles through the Triangle-Triangle relation and, for each vertex, the index of one triangle incident in it.

Other triangle-based data structures are the Corner Table (CoT) [61] and the Sorted Opposite Table (SOT) [32] data structures, in which the connectivity of the triangles is encoded through the concept of corner. Given a triangle t, a corner is a reference to an edge-adjacent triangle of t associated with one of the vertices of t. Each triangle is thus identified by three corners. The storage requirement of the CoT data structure is the same as that of the IA data structure, as shown in [16]. The SOT data structure is a compact version of the CoT and IA data structures that encodes only the triangles of the mesh and the Triangle-Triangle relation, thus requiring about 50% of the storage of the other two representations. However, the SOT data structure is only suited for static meshes and applications, since modifications to the mesh require the global reconstruction of the Triangle-Vertex relation. Edge-based data structures are also used for triangle meshes, but they have been shown to be more verbose than triangle-based ones [16], while providing the same computational performance.
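To illustrate the corner concept, the sketch below builds the two arrays of a classic corner table. V[c] is the vertex of corner c, and O[c] is the opposite corner across the edge facing c. The sketch assumes that the corners of triangle t are 3t, 3t+1, and 3t+2 and that the mesh is edge-manifold. With this layout, O[c] // 3 gives the edge-adjacent triangle, and each array stores three entries per triangle.

```python
def corner_table(tv):
    """Return (V, O): V[c] is the vertex of corner c, O[c] its opposite corner
    (-1 when the edge facing c lies on the mesh boundary)."""
    V = [v for tri in tv for v in tri]
    O = [-1] * len(V)
    pending = {}  # edge (sorted vertex pair) -> first corner seen facing it
    for c in range(len(V)):
        t = c // 3
        nxt, prv = 3 * t + (c + 1) % 3, 3 * t + (c + 2) % 3
        edge = tuple(sorted((V[nxt], V[prv])))  # edge opposite corner c
        if edge in pending:
            o = pending.pop(edge)
            O[c], O[o] = o, c
        else:
            pending[edge] = c
    return V, O


V, O = corner_table([(0, 1, 2), (0, 2, 3)])
print(V)  # [0, 1, 2, 0, 2, 3]
print(O)  # [-1, 5, -1, -1, -1, 1]
```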

3.2 Hierarchical spatial indexes

Hierarchical spatial indexes use a recursive subdivision of the space in which the objects of interest are embedded, according to different refinement rules. The main distinction is between regular refinement and bisection refinement [5]. Regular refinement of rectangular blocks generates quadtrees in 2D space and octrees in 3D space, while the bisection refinement of axis-aligned hyper-rectangles by axis-aligned hyperplanes generates kD-trees. These decompositions were originally defined for indexing point sets. They subdivide the space either into blocks of equal size, generating Point-Region (PR) quadtrees and kD-trees [57], or by using the positions of the points, generating point quadtrees and point kD-trees [28]. In the following, we review spatial indexes dealing with connected entities and maps (see [63] for an in-depth treatment of the subject).

The class of Polygonal Map (PM)-quadtrees [64] extends the PR-quadtree to represent polygonal maps in 2D space, considered as collections of segments that intersect, if at all, only at their endpoints. There are three variants of a PM-quadtree, namely the PM\(_1\)-quadtree, the PM\(_2\)-quadtree, and the PM\(_3\)-quadtree. They differ in their subdivision rules, but they all maintain a list of edges in their leaf blocks.

The Randomized Polygonal Map (PMR)-quadtree [35, 54] is an index for collections of line segments in the plane (not necessarily forming a polygonal map). In a PMR-quadtree, if the insertion of an edge causes the number of edges in a leaf block to exceed a given threshold, the block is split, but only once, thus generating an order-dependent quadtree subdivision. In [45], it has been proven that the number of blocks in a PMR-quadtree is proportional to the number of line segments and is independent of the depth of the tree.

A first attempt to extend the PM\(_2\)-quadtree to index triangle meshes is the PM\(_2\)-Triangle quadtree (PM2T-quadtree) [13], in which triangles, in place of edges, guide the partition into blocks. However, the PM2T-quadtree has two fundamental limitations. The first one is that each block indexes just one vertex, leading to a very deep hierarchy. The second limitation is that the spatial index is stored on top of the IA data structure, leading to a verbose data structure, which greatly limits its scalability.

Spatial indexes have been widely used for terrain rendering [10, 31], providing efficient ways to generate adaptive meshes in in-core and out-of-core environments. Such representations are optimized for rendering, but not for geometric processing, as required in terrain analysis. We refer the interested reader to [58] for a description of such approaches.

3.3 Methods from topological data analysis

Morse theory [52] has been the basis for extracting topological structures, like Morse and Morse-Smale complexes [33]. Morse theory is defined for smooth functions, but two discrete counterparts have been developed: piecewise linear Morse theory [4] and Discrete Morse Theory [30]. In this work, we focus on the latter, and we refer the reader to [15] for a complete analysis of these methods.

Among the many algorithms defined for computing a Forman gradient [15], the most efficient ones are those computing the gradient from a function sampled at the vertices of a triangle or tetrahedral mesh, or of a regular grid. The algorithm described in [41] is based on a divide-and-conquer approach, and its main drawback is that it introduces many spurious critical simplices. In [67], a similar approach, based on a weighted discrete function, has been defined for computing a Forman gradient on 2D regular grids. The algorithm is well suited for parallelization and significantly reduces the number of spurious critical cells. In [60], an algorithm is proposed for 3D regular grids that processes the lower star of each vertex independently. The lower star of a vertex v is the set of grid cells in the star of v on which the function values at the vertices other than v are lower than the function value at v. This algorithm does not generate spurious critical cells. The algorithm has been extended to triangle [24] and tetrahedral meshes [72] by using a new implicit encoding of the discrete gradient. It has been shown [44] that, for triangle meshes, the discrete Forman gradient finds a critical vertex for each piecewise linear minimum (i.e., a minimum found by applying piecewise linear Morse theory), while piecewise linear saddles and maxima are on the boundary of critical edges and critical triangles.
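On a triangle mesh, the lower star of a vertex can be computed from the triangles incident in it, which is what makes per-vertex processing independent. The sketch below takes a mesh given as vertex-index triples and a list `f` of vertex values. It includes v itself in its lower star and breaks ties between equal values by vertex index. Both choices are assumptions (the latter is a common simulation-of-simplicity choice), not details stated above.

```python
from itertools import combinations


def lower_star(v, triangles, f):
    """Simplices containing v whose other vertices all have a lower value than v.
    Ties between equal values are broken by vertex index (an assumption)."""
    key = lambda u: (f[u], u)
    result = {(v,)}
    for t in triangles:
        if v not in t:
            continue
        for size in (2, 3):  # edges and the triangle itself
            for s in combinations(sorted(t), size):
                if v in s and all(key(u) < key(v) for u in s if u != v):
                    result.add(s)
    return result


tris = [(0, 1, 2), (0, 2, 3)]
f = [3.0, 1.0, 2.0, 5.0]
print(sorted(lower_star(0, tris, f)))  # [(0,), (0, 1), (0, 1, 2), (0, 2)]
print(sorted(lower_star(1, tris, f)))  # [(1,)], so vertex 1 is a minimum candidate
```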

Recently, great interest has arisen in the visualization community in the analysis of datasets having several scalar values per sampled point, called multifield data. Critical features are extracted from such data to highlight information about the scalar fields defined therein. Examples of such structures are the Reeb space [20], the Joint Contour Nets (JCNs) [7], fiber surfaces [8, 68], the Jacobi set [18], and the Pareto sets [36]. A few approaches based on discrete Morse theory extend the extraction of a discrete gradient to the multivariate case [2]. The first algorithm capable of dealing with real data of reasonable size is discussed in [39]. In the case of triangulated terrains, capturing the relationships among the different scalar fields in an “aggregate” value is particularly effective, as it reduces the problem to the visualization of a single scalar field. A recent technique proposed by Nagaraj et al. [53] computes an aggregate value for the multiple fields, indicating the presence of Jacobi sets [18]. This technique will be discussed in Section 8 together with our extension to triangle meshes for multifield analysis and visualization on terrains.

4 Terrain trees

Terrain trees are a family of spatial data structures for triangulated terrains based on a nested subdivision of the terrain domain. Given a set S of data points, each characterized by x and y coordinates plus an elevation value, we consider the projections of the points of S on the plane, and we call D the square domain in the plane containing such projections. A Terrain tree built on S consists of:

  1. a triangle mesh \(\varSigma\) connecting the projections on the plane of the points in S;

  2. a quadtree describing the nested subdivision of the domain D into square blocks in such a way that the vertices and triangles of \(\varSigma\) are associated with the leaf blocks of the quadtree subdivision.

The association of the vertices and triangles with the leaf blocks is defined as follows. A vertex is associated with the unique leaf block containing it. A triangle is associated with all leaf blocks having a non-empty intersection with it. Note that a block is considered closed on the two edges incident in its lower-left corner, and open on the remaining two edges. More precisely, a block consists of all points (x, y) such that \(x_1\le x < x_2\) and \(y_1\le y < y_2\), where \((x_1,y_1)\) is the lower-left corner and \((x_2,y_2)\) is the upper-right corner of the block. Blocks having their upper or right edge on the boundary of D are closed on the corresponding edge.
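With this half-open convention, every point of D lies in exactly one leaf block, so each vertex is assigned to a single leaf. Below is a small sketch of the containment test, assuming blocks are given by their lower-left and upper-right corners. The `Block` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Axis-aligned block with lower-left (x1, y1) and upper-right (x2, y2) corners."""
    x1: float
    y1: float
    x2: float
    y2: float


def contains(block, x, y, domain):
    """Closed on the left and bottom edges, open on the right and top edges,
    except for right/top edges lying on the boundary of the domain D."""
    right_ok = x < block.x2 or (x == block.x2 == domain.x2)
    top_ok = y < block.y2 or (y == block.y2 == domain.y2)
    return block.x1 <= x and block.y1 <= y and right_ok and top_ok


D = Block(0, 0, 4, 4)
children = [Block(0, 0, 2, 2), Block(2, 0, 4, 2), Block(0, 2, 2, 4), Block(2, 2, 4, 4)]
# The shared corner (2, 2) falls only in the upper-right child.
print([contains(b, 2, 2, D) for b in children])  # [False, False, False, True]
```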

Fig. 3

Given the triangle mesh of Fig. 4, a vertex threshold \(k_V=2\) and a triangle threshold \(k_T=2\), the figure shows the spatial subdivision obtained with a PR-T tree (a), a PM-T tree (b), and a PMR-T tree (c). Black lines highlight the spatial decomposition caused by the vertex threshold, and red lines the decomposition caused by the triangle threshold. For the PMR-T tree, we also highlight the triangle insertion order that drives the spatial decomposition (note that \(k_T\) is not a bucketing threshold for the PMR-T tree)

Similarly to spatial indexes for 2D maps, we have defined different criteria for domain subdivision, based only on the TIN vertices, on both vertices and triangles, or only on the TIN triangles. Thus, the Terrain trees family consists of three spatial data structures, namely the PR-Terrain tree (PR-T tree), the PM-Terrain tree (PM-T tree), and the PMR-Terrain tree (PMR-T tree), which are bucketed versions of the PM\(_3\)-quadtree, of the PM\(_2\)-quadtree, and of the PMR-quadtree for maps, respectively [54, 55, 64]. As we show in Section 9, bucketing is a crucial aspect of our spatial indexes, since it allows the indexing of much larger datasets than existing representations in the literature.

A PR-T tree subdivides the domain D based on the vertices of \(\varSigma\). Its subdivision rule uses a threshold \(k_V\) on the number of vertices contained in a leaf block b. If b contains more than \(k_V\) vertices, then b is recursively split into four blocks until this condition is met. An example of a PR-T tree is shown in Fig. 3(a). The generation of a PR-T tree is entirely guided by the vertices of \(\varSigma\), and, thus, the first step in the generation process is exactly the same as for the PR-quadtree [62]. Then, each triangle t of \(\varSigma\) is added to all the leaf blocks intersecting t, without affecting the spatial decomposition.
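The two-phase PR-T tree generation can be sketched as follows. This is a simplified, pointer-based illustration under several assumptions. Points are (x, y) pairs, and the domain D is the bounding square of the points. A `max_depth` guard stops splitting on coincident points. Triangle-block intersection uses a closed separating-axis test, so a triangle that only touches a block's boundary is also assigned to it (a conservative choice). All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Quadtree block; leaves store vertex and triangle indexes."""
    x1: float
    y1: float
    x2: float
    y2: float
    vertices: list = field(default_factory=list)
    triangles: list = field(default_factory=list)
    children: list = field(default_factory=list)


def _in_block(n, p, dom):
    # Half-open block, closed on edges lying on the boundary of the domain D.
    x, y = p
    return (n.x1 <= x and n.y1 <= y
            and (x < n.x2 or x == n.x2 == dom.x2)
            and (y < n.y2 or y == n.y2 == dom.y2))


def _split(n):
    mx, my = (n.x1 + n.x2) / 2, (n.y1 + n.y2) / 2
    n.children = [Node(n.x1, n.y1, mx, my), Node(mx, n.y1, n.x2, my),
                  Node(n.x1, my, mx, n.y2), Node(mx, my, n.x2, n.y2)]


def _tri_box_intersect(tri, n):
    # Separating-axis test: the box axes and the three triangle edge normals.
    axes = [(1.0, 0.0), (0.0, 1.0)]
    for i in range(3):
        (ax, ay), (bx, by) = tri[i], tri[(i + 1) % 3]
        axes.append((by - ay, ax - bx))
    box = [(n.x1, n.y1), (n.x2, n.y1), (n.x2, n.y2), (n.x1, n.y2)]
    for nx, ny in axes:
        pt = [nx * x + ny * y for x, y in tri]
        pb = [nx * x + ny * y for x, y in box]
        if max(pt) < min(pb) or max(pb) < min(pt):
            return False
    return True


def build_pr_t_tree(points, tv, k_v, max_depth=20):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    side = max(max(xs) - min(xs), max(ys) - min(ys))
    root = Node(min(xs), min(ys), min(xs) + side, min(ys) + side)

    def insert_vertices(n, ids, depth):
        # Phase 1: split on vertices only, as in a bucketed PR-quadtree.
        if len(ids) <= k_v or depth == max_depth:
            n.vertices = ids
            return
        _split(n)
        for c in n.children:
            insert_vertices(c, [i for i in ids if _in_block(c, points[i], root)], depth + 1)

    def insert_triangle(n, tri, t):
        # Phase 2: add t to every leaf it intersects; the decomposition is unchanged.
        if not _tri_box_intersect(tri, n):
            return
        if n.children:
            for c in n.children:
                insert_triangle(c, tri, t)
        else:
            n.triangles.append(t)

    insert_vertices(root, list(range(len(points))), 0)
    for t, (a, b, c) in enumerate(tv):
        insert_triangle(root, [points[a], points[b], points[c]], t)
    return root


points = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1)]
tv = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
root = build_pr_t_tree(points, tv, k_v=2)
for leaf in root.children:
    print(leaf.vertices, leaf.triangles)
```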

A PM-T tree uses the same subdivision rule defined for the vertices of \(\varSigma\) as the PR-T tree. A splitting rule on the triangles is also defined, based on a threshold \(k_T\) on the number of triangles per leaf block, as follows:

  (1) a block \(b\) containing up to \(k_T\) triangles is a leaf block;

  (2) a block \(b\) that contains more than \(k_T\) triangles is a leaf block if and only if all triangles intersecting \(b\) are incident in the same vertex v, which can be either inside or outside \(b\);

  (3) otherwise, the block is recursively split until either condition (1) or (2) is met.

An example of the subdivision obtained with the PM-T tree is shown in Fig. 3(b). Note that a PM-T tree extends the PM\(_2\)-quadtree defined for maps by adding bucketing thresholds \(k_V\) and \(k_T\) for both vertices and triangles. The generation of the initial hierarchical decomposition in a PM-T tree is entirely guided by the vertices, as in PR-T trees. Then, each triangle t in \(\varSigma\) is added to a leaf block \(b\) intersecting t if and only if \(b\) contains fewer than \(k_T\) triangles. Otherwise, b is split until either condition (1) or (2) is met.

A PM-T tree can be seen as a bucketed version of the PM2T-quadtree, with some fundamental differences. The PM-T tree has a bucketing threshold for both vertices and triangles. The lack of a bucketing threshold in the PM2T-quadtree produces much deeper decompositions, with a number of leaf blocks that is at least four times the number of vertices in \(\varSigma\) (see [14] for details). The PM-T tree uses a simple indexed representation encoding only Triangle-Vertex relations, while the PM2T-quadtree encodes the triangle mesh through the IA data structure (see Section 5).

The subdivision strategy for the PMR-T tree is driven only by the triangles of \(\varSigma\). It extends to triangle meshes the subdivision approach defined for sets of segments in the plane in [54]. A leaf block b is split if b intersects more than \(k_T\) triangles, where \(k_T\) is a user-defined threshold, but b is split only once, not recursively. The decomposition, and thus the shape of the tree, depends on the insertion order of the triangles. In [54], it has been proven that, in a PMR quadtree built on the edges of a map, the number of edges intersected by a leaf block cannot exceed the sum of the splitting threshold and of the depth of the leaf block. This result still holds for a PMR-T tree and, thus, the number of triangles in a leaf block of a PMR-T tree is at most \(d_b+k_T\), where \(d_b\) is the depth of the leaf block and \(k_T\) is the splitting threshold. An example of a PMR-T tree is shown in Fig. 3(c). For instance, leaf block \(b\) indexes three triangles (i.e., triangles 2, 6, and 8), but no split operation is triggered, since the space has already been split once upon the insertion of triangle 8. The bound \(d_b+k_T\) is also satisfied, as \(b\) is at depth 3. Notice that the split condition may trigger unnecessary splits. For example, in Fig. 3(c), the insertion of triangles 7 and 8 causes unnecessary split operations around the vertices shared by these two triangles.

A PMR-T tree is generated as follows. For each triangle t, and for each leaf block b intersecting t, t is added to b if and only if b contains fewer than \(k_T\) triangles. Otherwise, b is split and its triangles are distributed to the newly generated leaf blocks (i.e., the children of b). Triangle t is also added to the children of b intersecting it.
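The insertion procedure above can be sketched in a few lines. In this minimal, hypothetical sketch, a full leaf is split exactly once and the children are never split during the same insertion. The sketch assumes points are (x, y) pairs and uses the same closed separating-axis test as a stand-in for triangle-block intersection. Because of this rule, every leaf respects the \(d_b+k_T\) bound: a child created at depth d receives at most one more triangle than its parent held.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    x1: float
    y1: float
    x2: float
    y2: float
    depth: int = 0
    triangles: list = field(default_factory=list)
    children: list = field(default_factory=list)


def intersects(tri, n):
    """Closed separating-axis test between triangle tri and block n."""
    axes = [(1.0, 0.0), (0.0, 1.0)]
    for i in range(3):
        (ax, ay), (bx, by) = tri[i], tri[(i + 1) % 3]
        axes.append((by - ay, ax - bx))  # normal of edge i
    box = [(n.x1, n.y1), (n.x2, n.y1), (n.x2, n.y2), (n.x1, n.y2)]
    for nx, ny in axes:
        pt = [nx * x + ny * y for x, y in tri]
        pb = [nx * x + ny * y for x, y in box]
        if max(pt) < min(pb) or max(pb) < min(pt):
            return False
    return True


def split_once(n):
    mx, my = (n.x1 + n.x2) / 2, (n.y1 + n.y2) / 2
    d = n.depth + 1
    n.children = [Node(n.x1, n.y1, mx, my, d), Node(mx, n.y1, n.x2, my, d),
                  Node(n.x1, my, mx, n.y2, d), Node(mx, my, n.x2, n.y2, d)]


def pmr_insert(root, t, points, tv, k_t):
    """Insert triangle t into every leaf it intersects, splitting full leaves once."""
    tri = [points[v] for v in tv[t]]
    stack = [root]
    while stack:
        n = stack.pop()
        if not intersects(tri, n):
            continue
        if n.children:
            stack.extend(n.children)
        elif len(n.triangles) < k_t:
            n.triangles.append(t)
        else:
            # Split n once, hand its triangles and t to the children they
            # intersect, and do not split the children further now.
            split_once(n)
            for s in n.triangles + [t]:
                tri_s = [points[v] for v in tv[s]]
                for c in n.children:
                    if intersects(tri_s, c):
                        c.triangles.append(s)
            n.triangles = []


def leaves(n):
    if not n.children:
        yield n
    for c in n.children:
        yield from leaves(c)


points = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1)]
tv = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
root = Node(0, 0, 4, 4)
for t in range(len(tv)):
    pmr_insert(root, t, points, tv, k_t=1)
print(all(len(b.triangles) <= b.depth + 1 for b in leaves(root)))  # True
```

Inserting the same triangles in a different order generally yields a different decomposition, which reflects the order dependence noted above.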

5 Implementation of Terrain trees

In this section, we describe the implementation of the Terrain trees in the Terrain trees library (TTL). The kernel of the library contains the implementation of the three Terrain trees, the PR-Terrain tree (PR-T tree), the PM-Terrain tree (PM-T tree), and the PMR-Terrain tree (PMR-T tree), together with their generation algorithms and algorithms for answering basic spatial and connectivity-based queries. Other functions currently implemented and discussed in the following sections are the extraction of morphological features (see Section 6), the extraction of topology-based features (see Section 7), and the analysis of multivariate terrain data (see Section 8).

The encoding of the triangle mesh \(\varSigma\) in a Terrain tree consists of two arrays, \({\varSigma }_{V}\) and \({\varSigma }_{T}\), storing the vertices and the triangles of \(\varSigma\), respectively. Each vertex v and each triangle t is represented by a unique index, \({i}_{\textit{v}}\) and \(i_{t}\), within arrays \({\varSigma }_{V}\) and \({\varSigma }_{T}\), respectively. \({\varSigma }_{V}\) encodes the geometry of the terrain \(\varSigma\) by storing the longitude, latitude, and elevation of each vertex v in \(\varSigma\), together with any other field values associated with it. \({\varSigma }_{T}\) encodes the connectivity of each triangle by storing its three vertex indexes (see Fig. 4(b)).

Fig. 4

(a) Triangle mesh and PR-T tree as encoded in a Terrain tree. (b) The mesh topology is organized by encoding, for each triangle, the boundary relation with its vertices. (c) The PR-T tree represents a hierarchy in which each leaf block encodes the list of vertices contained in the block and the list of triangles intersecting the block

We use a pointer-based representation for the hierarchy describing the nested subdivision of a Terrain tree (see Fig. 4(c)). Each internal block of a Terrain tree contains a reference to its parent block and references to its children. Each leaf block contains a reference to its parent block plus the information about the vertices and triangles it indexes, or only about the triangles in the case of a PMR-T tree.

To encode the information associated with the leaf blocks, we use the compact encoding proposed in [27] for an arbitrary-dimensional connectivity-based data structure for simplicial complexes based on vertex clustering. Such an encoding uses the sequential range encoding (SRE), a variant of run-length encoding [34], which represents a run of consecutive indexes using two integers. The first (negative) integer encodes the starting index of the run, while the second encodes the number of remaining elements of the run. The effectiveness of this compression increases with longer runs. This is exploited by representing all vertices inside a leaf block with a single run. Once we obtain the spatial decomposition, a single tree traversal is sufficient to reindex the vertices. The vertices indexed by the same leaf block get a contiguous range of indexes in the reindexed vertex array \({\varSigma }_{V}\). Within each leaf block, this range is represented as a pair of integers. Exploiting the spatial coherence of the triangles is more involved. The reindexing and compression of triangles is performed in such a way that, at the end, triangles indexed by the same set of leaves have contiguous indexes in \({\varSigma }_{T}\). To obtain this representation, we first traverse the tree to extract, for each triangle t, the tuple of leaf blocks indexing t. Then, we extract the dual relation, i.e., we associate the list of triangles with each tuple of leaf blocks. Given this inverted relation, we extract a coherent ordering for the triangles of \({\varSigma }_{T}\), in which triangles indexed by the same leaf tuple have contiguous indexes. Once we have this spatial ordering of the triangles, we apply it to the triangle list of each leaf block, and we compress this list by using SRE. Finally, we update \({\varSigma }_{T}\) to be consistent with this spatial ordering.
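The triangle reordering and the SRE compression described above can be sketched in Python as follows. This is a minimal illustration, not the library's code: it assumes integer leaf ids, orders the groups of triangles by their sorted leaf tuple (an arbitrary but deterministic choice), stores a run start as \(-(\textit{start}+1)\) so that index 0 remains representable (our assumption), and compresses only runs of at least three indexes. All function names are hypothetical.

```python
from collections import defaultdict


def spatial_triangle_order(leaf_tuples):
    """leaf_tuples[t] lists the leaf blocks indexing triangle t.

    Returns (order, new_index): order[new] = old triangle id and
    new_index[old] = new id, so that triangles indexed by the same
    tuple of leaves receive contiguous indexes.
    """
    groups = defaultdict(list)  # dual relation: leaf tuple -> triangles
    for t, leaves in enumerate(leaf_tuples):
        groups[tuple(sorted(leaves))].append(t)
    order = [t for key in sorted(groups) for t in groups[key]]
    new_index = [0] * len(order)
    for new, old in enumerate(order):
        new_index[old] = new
    return order, new_index


def sre_encode(indices):
    """Sequential range encoding of a sorted list of non-negative indexes."""
    out, i = [], 0
    while i < len(indices):
        j = i
        while j + 1 < len(indices) and indices[j + 1] == indices[j] + 1:
            j += 1
        if j - i + 1 >= 3:  # a run: (negative start marker, remaining elements)
            out += [-(indices[i] + 1), j - i]
        else:  # short runs are stored verbatim
            out += indices[i:j + 1]
        i = j + 1
    return out


def sre_decode(encoded):
    """Inverse of sre_encode."""
    out, k = [], 0
    while k < len(encoded):
        if encoded[k] < 0:
            start = -encoded[k] - 1
            out += range(start, start + encoded[k + 1] + 1)
            k += 2
        else:
            out.append(encoded[k])
            k += 1
    return out
```

For example, the triangles of leaf 0 in `leaf_tuples = [(1,), (0,), (0, 1), (1,), (0,)]` are renumbered to `[0, 1, 2]`, which `sre_encode` stores as the single pair `[-1, 2]`.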

5.1 Spatial and connectivity-based queries in a Terrain tree

We have developed algorithms on Terrain trees for answering two fundamental spatial queries, namely point location and window queries, and for extracting connectivity-based relations (described in Section 2), which enable the local traversal and processing of the underlying mesh. The point location query consists of finding the triangle (or triangles) containing a given query point, while the window query consists of finding all the triangles that intersect an axis-aligned rectangular window.
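Once the leaf block containing the query point has been reached, point location reduces to testing the triangles indexed by that leaf. A minimal Python sketch of this last step, assuming 2D coordinates and treating points on a triangle boundary as contained (so a point on a shared edge returns both triangles); the function names are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle (a, b, c) in the xy-plane."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle (a, b, c)."""
    d = (orient(a, b, p), orient(b, c, p), orient(c, a, p))
    return not (min(d) < 0 < max(d))  # mixed signs mean p is outside


def locate_in_leaf(p, leaf_triangles, triangles, coords):
    """Triangles of a leaf block (ids into `triangles`) containing point p."""
    return [t for t in leaf_triangles
            if point_in_triangle(p, *(coords[v] for v in triangles[t]))]
```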

We have implemented algorithms for connectivity-based queries, which extract the connectivity relations discussed in Section 2. Since the indexed TIN representation underlying a Terrain tree encodes the TIN vertices and triangles, extracting the vertices and edges of a triangle (the latter expressed as vertex pairs) does not require the tree structure, unless we combine the extraction of such information with a window query to focus on a portion of the TIN. As our experiments show, extracting co-boundary relations at the vertices of the mesh efficiently and on a massive scale is fundamental for computing morphological features, the discrete Forman gradient, and the critical net (see Sections 6 and 7). We describe here how we extract some fundamental co-boundary relations, namely the Vertex-Triangle (VT), the Vertex-Vertex (VV), the Vertex-Edge (VE), and the Edge-Triangle (ET) relations, as well as the Edge-Vertex (EV) relation, inside a leaf block of the Terrain tree. These relations are used in the terrain analysis algorithms presented in the following sections.

Extracting the VT relations in a block b requires knowing the set of vertices contained in b. The range of indexes of the vertices contained in any given block b is explicitly encoded in the PR-T and PM-T trees. A PMR-T tree encodes only the triangles indexed by a block b. In this case, the set of vertices in b is extracted by performing a point-in-block test for each vertex of the triangles in b. Then, the VT relation for the vertices in block b is extracted by cycling over the set of triangles in b. For each triangle t, the algorithm iterates through the vertices of t. For each vertex v of t, if v is indexed by b, t is added to the list of triangles incident in v. The strategy for extracting the Vertex-Vertex (VV) and Vertex-Edge (VE) relations in a block b combines the VT relation with either the Triangle-Vertex (TV) relation, which provides the list of the vertices of a given triangle, or the Triangle-Edge (TE) relation, which provides the list of the edges of a given triangle. For the VV relation, for each vertex v of a triangle t, if v is indexed by b, we add the other two vertices on the boundary of t, namely v\(_{i}\) and v\(_{j}\), to the set of vertices adjacent to v. Similarly, for the VE relation of v, we pair v\(_{i}\) and v\(_{j}\) with v to get the two edges of t that are incident in v.
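The single pass over the triangles of a block that yields the VT, VV and VE relations can be sketched in Python as follows. The sketch assumes the PR-T/PM-T case, in which the block indexes a contiguous vertex range `[start, end)`; for a PMR-T tree the membership test would be replaced by the point-in-block test described above. The function name and arguments are hypothetical.

```python
def vt_vv_ve(vrange, block_triangles, triangles):
    """VT, VV and VE relations for the vertices indexed by a leaf block.

    vrange = (start, end): the block indexes vertices start..end-1.
    block_triangles: ids of the triangles intersecting the block.
    triangles: global triangle array (TV relation), 3-tuples of vertex ids.
    """
    start, end = vrange
    vt = {v: [] for v in range(start, end)}
    vv = {v: set() for v in range(start, end)}
    ve = {v: set() for v in range(start, end)}
    for t in block_triangles:
        tv = triangles[t]
        for k, v in enumerate(tv):
            if start <= v < end:  # v is indexed by the block
                vi, vj = tv[(k + 1) % 3], tv[(k + 2) % 3]
                vt[v].append(t)
                vv[v].update((vi, vj))
                ve[v].update((tuple(sorted((v, vi))), tuple(sorted((v, vj)))))
    return vt, vv, ve
```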

Extracting the Edge-Triangle (ET) relations in a block b is slightly more involved, since the edges are not explicitly encoded in a Terrain tree. The algorithm iterates over the triangles in a block b and extracts the edges on their boundary provided by the TE relation. An edge e belonging to the boundary of a triangle t is considered internal to b if it has at least one vertex indexed by b. Each internal edge e is encoded in a local associative array having as key e and as value a pair containing the index of the two triangles in the co-boundary of e. These two triangles are identified during a single iteration on the triangles in b. The strategy for extracting the Edge-Vertex (EV) relation in a block b is similar to the one for extracting the ET relation. The algorithm iterates over the triangles of b and extracts their edges. Since each edge is encoded as a pair of vertex indices, we add those edges having at least one vertex indexed by b to the output.
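The ET and EV extraction can be sketched in the same setting. This minimal Python version assumes, as before, that the block indexes the vertex range `[start, end)`, represents an edge as a sorted vertex pair, and uses `None` for the missing triangle of a boundary edge; the names are hypothetical.

```python
def et_ev(vrange, block_triangles, triangles):
    """ET and EV relations for the edges internal to a leaf block."""
    start, end = vrange
    et = {}  # local associative array: edge -> [triangle, triangle or None]
    for t in block_triangles:
        a, b, c = triangles[t]
        for e in ((a, b), (b, c), (c, a)):  # TE relation, edges as vertex pairs
            e = tuple(sorted(e))
            if start <= e[0] < end or start <= e[1] < end:  # edge internal to the block
                pair = et.setdefault(e, [None, None])
                pair[0 if pair[0] is None else 1] = t
    ev = sorted(et)  # EV: the internal edges as vertex pairs
    return et, ev
```

The associative array lives only while the block is processed, which matches the idea of building auxiliary structures per block and discarding them afterwards.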

6 Morphological terrain features

We have developed and implemented in the Terrain trees library algorithms which extract classical morphological terrain features, namely triangle and edge slope, curvature and roughness. An experimental comparison among the various Terrain trees and the IA data structure in computing such features is presented in Section 9.

The slope of an edge or a triangle in a TIN represents the steepness and the direction of its extent, where the steepness is the absolute value of the slope. A zero slope indicates a horizontal element. The direction is increasing if the slope is positive and decreasing if the slope is negative. Specifically, the slope of an edge e is the angle between e and its projection on the horizontal plane defined by the z-coordinate of its lowest endpoint. In other words, let \(\textit{v}_i =\{\textit{v}_{i_x}, \textit{v}_{i_y}, \textit{v}_{i_z}\}\) and \(\textit{v}_j =\{\textit{v}_{j_x}, \textit{v}_{j_y}, \textit{v}_{j_z}\}\) be the endpoints of e, with \(\textit{v}_i\) the endpoint of minimum z-value, and let \(\textit{v}_j'\) be the projection of vertex \(\textit{v}_j\) on the plane \(z=\textit{v}_{i_z}\). The slope of edge e is the angle \(\widehat{\textit{v}_j\textit{v}_i\textit{v}_j'}\). Since the edges are not explicitly encoded, computing edge slopes requires extracting the edges as pairs of vertices by using the TV relation, which in turn requires an auxiliary data structure. Here, the decomposition of the mesh defined by the blocks becomes computationally relevant, since the auxiliary data structure is created in each block independently and discarded after processing the block. This also makes it possible to compute slopes in a region of interest without iterating through the entire domain.

In a similar way, the triangle slope is defined as the angle between the normal to the plane to which the triangle belongs and a vector aligned with the z-axis. The computation of triangle slopes requires only the Triangle-Vertex (TV) relation for each triangle, which is stored in the triangle array of a Terrain tree.
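Both slope definitions can be written directly from vertex coordinates. The following minimal Python sketch assumes \((x, y, z)\) tuples in a common metric unit and returns angles in degrees; the triangle slope uses the absolute z-component of the normal so that the result does not depend on the vertex orientation. Function names are hypothetical.

```python
import math


def edge_slope(vi, vj):
    """Angle (degrees) between edge (vi, vj) and its projection on the
    horizontal plane through its lowest endpoint."""
    lo, hi = (vi, vj) if vi[2] <= vj[2] else (vj, vi)
    horizontal = math.hypot(hi[0] - lo[0], hi[1] - lo[1])
    return math.degrees(math.atan2(hi[2] - lo[2], horizontal))


def triangle_slope(a, b, c):
    """Angle (degrees) between the triangle normal and the z-axis."""
    u = [b[k] - a[k] for k in range(3)]
    w = [c[k] - a[k] for k in range(3)]
    n = (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])
    cos_angle = abs(n[2]) / math.sqrt(sum(x * x for x in n))
    return math.degrees(math.acos(min(1.0, cos_angle)))
```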

We approximate the curvature at the vertices of a TIN by using a discrete approach. In our previous work [49, 50], we have developed three discrete curvature approximations of Gaussian and mean curvature, and compared and evaluated them in [48] for curvature estimation on a TIN. Our results showed that all curvature estimators provide similar results, also when used as the basis for TIN segmentation, and that concentrated curvature is the least sensitive to noise.

We consider a TIN \(\varSigma\) and a vertex v of \(\varSigma\). Let \(t_1,\ldots,t_n\) be the triangles incident in v, and let \(\textit{v}_i\) and \(\textit{v}_i'\) be the two vertices of triangle \(t_i\) different from v. The concentrated curvature is defined as \(K_c(\textit{v}) = 2 \pi - \varTheta _{\textit{v}}\) for internal vertices, and \(K_c(\textit{v}) = \pi - \varTheta _{\textit{v}}\) for boundary vertices, where \(\varTheta _{\textit{v}} = \sum _{i=1}^n \widehat{\textit{v}_i\textit{v}\,\textit{v}_i'}\) is the sum of the angles at v of the triangles incident in v.

The computation of curvature requires extracting, for each vertex v of the TIN, the set of triangles incident at v, i.e., extracting the Vertex-Triangle relation, which is performed as discussed in Section 5.1. As the internal vertices and boundary vertices have different equations for estimating concentrated curvature, a preprocessing step to identify boundary vertices is performed. For each vertex v, we check the number of edges \(\vert e \vert\) and the number of triangles \(\vert t \vert\) in the boundary of the star of v. A vertex is on the boundary when \(2\vert e \vert \ne 3 \vert t \vert\).
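A minimal Python sketch of the concentrated curvature computation from the VT relation of a vertex. For the boundary check, it uses the standard manifold test that an edge \((\textit{v},u)\) bounding exactly one triangle of the star of v marks v as a boundary vertex, rather than the counting condition stated above; this substitution, and all function names, are our own.

```python
import math
from collections import Counter


def angle_at(p, a, b):
    """Angle at p of the triangle (p, a, b), in radians (3D coordinates)."""
    u = [a[k] - p[k] for k in range(3)]
    w = [b[k] - p[k] for k in range(3)]
    cross = (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])
    return math.atan2(math.hypot(*cross), sum(x * y for x, y in zip(u, w)))


def is_boundary_vertex(v, vt, triangles):
    """v is on the boundary if some edge (v, u) bounds exactly one triangle of its star."""
    count = Counter(u for t in vt for u in triangles[t] if u != v)
    return any(c == 1 for c in count.values())


def concentrated_curvature(v, vt, triangles, coords):
    """K_c(v) = 2*pi - Theta_v (internal) or pi - Theta_v (boundary)."""
    theta = 0.0
    for t in vt:
        a, b = [u for u in triangles[t] if u != v]
        theta += angle_at(coords[v], coords[a], coords[b])
    total = math.pi if is_boundary_vertex(v, vt, triangles) else 2 * math.pi
    return total - theta
```

For the apex of a square pyramid of height 1 over the unit diamond, each of the four angles at the apex is \(\pi/3\), so \(K_c = 2\pi - 4\pi/3 = 2\pi/3\); flattening the apex to \(z=0\) gives \(K_c = 0\).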

There are several ways to define surface roughness, and the most commonly used is the standard deviation of local elevation at each vertex, evaluated based on the neighbors of the vertex itself [66]. We have extended the definition that is given for raster grids to TINs by considering the vertices adjacent to a vertex v, which are the ones sharing an edge with v. Based on that definition, the roughness at a vertex v in a TIN is computed as:

$$\begin{aligned} R(\textit{v})=\sqrt{\frac{\sum _{i=1}^m(z_i-\overline{z})^2}{m}} \end{aligned}$$
(1)

where m is the number of vertices adjacent to v plus v itself, \(z_1\), \(z_2\), ..., \(z_m\) are the elevations at such vertices, and \(\overline{z}\) is the average of those elevations. From (1), the roughness computation requires the extraction of the VV relation of v (see Section 5.1 for details).
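Equation (1) translates directly into code once the VV relation of v is available. A minimal Python sketch, assuming `z` is an indexable collection of vertex elevations and `vv` the list of vertices adjacent to v; the function name is hypothetical.

```python
import math


def roughness(v, vv, z):
    """Roughness R(v) of Eq. (1): standard deviation of the elevations of v
    and of its adjacent vertices (m = len(vv) + 1 values)."""
    zs = [z[v]] + [z[u] for u in vv]
    m = len(zs)
    mean = sum(zs) / m
    return math.sqrt(sum((zi - mean) ** 2 for zi in zs) / m)
```

For instance, elevations 2, 1 and 3 have mean 2 and variance 2/3, so the roughness is \(\sqrt{2/3}\approx 0.816\).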

7 Topology-based terrain segmentation

The basis for terrain analysis is a segmentation of the terrain based on its critical points, their regions of influence and how they are connected together in the critical net. The approach we consider here is rooted in discrete Morse Theory, which supports an efficient computation of a discrete gradient on large meshes and the efficient computation of topological descriptors like the Morse decompositions and the critical net.

In Section 7.1, we present the general strategy for computing a discrete Morse (Forman) gradient and a distributed approach based on Terrain trees. In Section 7.2, we discuss how to compute the critical net and present an algorithm for Terrain trees.

7.1 Forman gradient computation

We consider a triangle mesh \(\varSigma\) and an elevation function \(f:\varSigma _V\longrightarrow \mathbb {R}\) defined on the vertices of \(\varSigma\). The algorithm for computing the Forman gradient is based on the extension to TINs of the algorithm proposed for regular grids [60]. It consists of three major steps, which are described below and illustrated by referring to Fig. 5.

Step 1 (Indexing)

The first step requires computing a total order I on the vertices of \(\varSigma\). The total order serves as a guiding schema for subdividing the triangles, edges and vertices of \(\varSigma\) into independent sets. This is done by Simulation of Simplicity [21], i.e., by sorting the vertices of \(\varSigma\) in ascending order of elevation and assigning a unique index to each of them. In Fig. 5(a) we indicate the index in I of each vertex of a triangle mesh. On Terrain trees, Step 1 is executed by sorting the vertices stored in the global vertex array.
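Step 1 can be sketched in a few lines of Python. The sketch assumes that ties in elevation are broken by the original vertex index, a common simulation-of-simplicity convention that we adopt here as an assumption; `total_order` is a hypothetical name.

```python
def total_order(elev):
    """Return I with I[v] = rank of vertex v when sorting by (elevation, vertex id)."""
    order = sorted(range(len(elev)), key=lambda v: (elev[v], v))
    I = [0] * len(elev)
    for rank, v in enumerate(order):
        I[v] = rank
    return I
```

For elevations `[5.0, 2.0, 5.0, 1.0]` this gives `I = [2, 1, 3, 0]`: the two vertices at elevation 5.0 are distinguished by their indexes.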

Step 2 (Partition)

\(\varSigma\) is then subdivided by associating each vertex v with the set of edges and triangles having the same value of I as v. To this end, I is extended to the edges and triangles of \(\varSigma\) via \(I(\sigma ):=\max _{\textit{v}\in \sigma }I(\textit{v})\), where \(\sigma\) is either an edge or a triangle of \(\varSigma\) and v ranges over the vertices on the boundary of \(\sigma\). Thus, each edge or triangle \(\sigma\) is associated with its vertex having the maximum value of I. The set of triangles and edges associated with v is called the lower star of v according to I, and is denoted \(L_I(\textit{v})\). It can be proved that each triangle or edge in \(\varSigma\) belongs to exactly one lower star. Hence, the lower stars of the vertices form a partition of \(\varSigma\) and can be processed in parallel. In Fig. 5(a) the lower star of vertex 6 is depicted with bold lines.

On Terrain trees, Step 2 is implemented through a tree traversal where each leaf block b is processed once. In a leaf block b, the algorithm extracts the lower star \(L_I(\textit{v})\) for each vertex v in b by retrieving the set of triangles incident in v (Vertex-Triangle (VT) relation) and the set of edges (Vertex-Edge (VE) relation), and by computing their values of I at runtime.
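Extracting the lower star of a vertex from its VT relation follows from the definition \(I(\sigma)=\max_{\textit{v}\in\sigma}I(\textit{v})\): an edge or triangle incident in v belongs to \(L_I(\textit{v})\) exactly when all its other vertices have a smaller index. A minimal Python sketch in which the incident edges are derived from the triangles rather than from a separate VE relation; the name `lower_star` is hypothetical.

```python
def lower_star(v, vt, I):
    """Edges and triangles of the lower star of v.

    vt: triangles incident in v (VT relation), as 3-tuples of vertex ids.
    I: total order, mapping vertex id -> index.
    """
    triangles = [t for t in vt if all(I[u] < I[v] for u in t if u != v)]
    edges = sorted({(v, u) for t in vt for u in t if u != v and I[u] < I[v]})
    return edges, triangles
```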

Fig. 5

Homotopy expansion [60] computed on the simplices belonging to the lower star of vertex 6

Step 3 (Pairing)

Pairings of edges and vertices, and edges and triangles are computed through a process called homotopy expansion on each lower star. Recall that the discrete gradient is a collection of vertex-edge and edge-triangle pairs.

We initialize a set LS with \(L_I(\textit{v})\). If \(LS=\{\textit{v}\}\), v is declared a critical vertex. Otherwise, the pair \((\textit{v},e)\) is created by pairing v with the edge e in LS whose other endpoint has the minimum value of I. In Fig. 5(b) edge (6, 1) is selected to be paired with vertex 6.

Then, for each triangle t in LS, we compute the number of unpaired edges on the boundary of t, which are also in LS. We are interested in two cases:

  • if t has no unpaired edges on its boundary, then it is classified as critical,

  • if t has exactly one unpaired edge e on its boundary, then it is paired with e

If a full cycle over the triangles in LS pairs no triangle, one of the unpaired edges is classified as critical at the end of the cycle, and the cycle starts again.

In Fig. 5(c), triangle (6, 3, 1) is paired with its unique unpaired boundary edge (6, 3). In Fig. 5(d), no triangle has either zero or exactly one unpaired edge, so edge (6, 4) is declared critical. In Fig. 5(e), triangle (6, 5, 4) is paired with its unique unpaired boundary edge (6, 5). Figure 5(f) shows the discrete gradient computed within the lower star of vertex 6.
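The pairing of Step 3 on a single lower star can be sketched in Python as follows. This is a minimal illustration of the steps described above, not the library's implementation: when a cycle pairs no triangle, the sketch declares the unpaired edge with the lowest other endpoint critical (our assumption, since the text does not specify which edge), and edges left unpaired at the end are reported as critical. Simplices are frozensets of vertex ids, and `homotopy_expansion` is a hypothetical name.

```python
def homotopy_expansion(v, lower_edges, lower_triangles, I):
    """Pair the simplices of the lower star of v.

    lower_edges / lower_triangles: simplices incident in v whose other
    vertices have a smaller index in the total order I (vertex -> index).
    """
    out = {"critical_vertex": None, "ve_pair": None, "et_pairs": [],
           "critical_edges": [], "critical_triangles": []}
    if not lower_edges:  # LS == {v}: v is a critical vertex (minimum)
        out["critical_vertex"] = v
        return out

    def rank(s):  # sort key: indices in I of the vertices of s other than v
        return sorted(I[u] for u in s if u != v)

    unpaired = {frozenset(e) for e in lower_edges}
    e0 = min(unpaired, key=rank)  # edge towards the lowest neighbour
    unpaired.remove(e0)
    out["ve_pair"] = (v, e0)
    remaining = sorted((frozenset(t) for t in lower_triangles), key=rank)
    while remaining:
        progress = False
        for t in list(remaining):
            free = [e for e in unpaired if e <= t]  # unpaired edges of t in LS
            if not free:
                out["critical_triangles"].append(t)
            elif len(free) == 1:
                out["et_pairs"].append((free[0], t))
                unpaired.remove(free[0])
            else:
                continue
            remaining.remove(t)
            progress = True
        if not progress:  # assumption: the lowest unpaired edge becomes critical
            e = min(unpaired, key=rank)
            unpaired.remove(e)
            out["critical_edges"].append(e)
    out["critical_edges"].extend(sorted(unpaired, key=rank))
    return out
```

On a lower star consistent with the steps narrated for Fig. 5 (edges from vertex 6 to vertices 1, 3, 4, 5 and triangles (6, 3, 1) and (6, 5, 4)), the sketch pairs 6 with (6, 1), pairs (6, 3) with (6, 3, 1), declares (6, 4) critical, and finally pairs (6, 5) with (6, 5, 4).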

On Terrain trees, Step 3 is performed by considering each leaf block independently, and performing computation locally to the block on the lower stars of the vertices which belong to the block.

7.2 Extracting the critical net

Computing the critical net means visiting all the separatrix \(V_1\)-paths connecting critical vertices and edges, and all the separatrix \(V_2\)-paths connecting critical edges and triangles (see Section 2.3). The geometrical representation of the critical net is computed by connecting the barycenters of the triangles and edges visited in the separatrix V-paths. For the vertices of the triangle mesh, we consider the points themselves.

Extracting separatrix vertex-edge paths (\(V_1\)-paths)

Given a critical edge, the two vertices in its boundary are first extracted. For each vertex, we extract its paired edge and we insert such edges into a stack Q. The stack is used to implement a depth-first traversal of the path. At each iteration, we extract an edge e from Q and we compute its boundary vertices. For each vertex v, we compute its paired edge \(e'\) and we add \(e'\) to Q if \(e' \ne e\). This retrieves the connections of critical edges with the critical vertices. The lines of the critical net reconstructed at this stage are obtained by connecting the barycenters of each edge with the boundary vertices.
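The depth-first visit of the \(V_1\)-paths can be sketched in Python as follows. It assumes the discrete gradient is given as a map from each non-critical vertex to its paired edge (edges as frozensets), so critical vertices are simply absent from the map. As a simplification, the critical edge itself is pushed first: it is not paired with either of its endpoints, so this is equivalent to the initialization described above. The function name is hypothetical.

```python
def v1_separatrices(critical_edge, vertex_pair):
    """Depth-first visit of the V1-paths leaving a critical edge.

    Returns the (edge, vertex) segments of the critical net, each joining
    the barycenter of an edge with a vertex, and the minima reached.
    """
    segments, minima = [], []
    stack = [critical_edge]  # the stack Q
    while stack:
        e = stack.pop()
        for w in e:  # boundary vertices of e
            segments.append((e, w))
            paired = vertex_pair.get(w)
            if paired is None:
                minima.append(w)  # critical vertex reached
            elif paired != e:
                stack.append(paired)
    return segments, minima
```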

Figure 6 shows the extraction of the critical net limited to a critical edge and two critical vertices. Starting from the critical edge (Fig. 6(a)) the two vertices in its boundary are extracted and connected with the critical edge. Edges paired with such vertices are inserted in the stack Q. In Fig. 6(b), edge \(e_1\) is extracted from the stack and connected with its boundary vertex \(\textit{v}_1\). In Fig. 6(c), the other vertex on the boundary of \(e_1\) (i.e., \(\textit{v}_2\)) is connected to \(e_1\) and its paired edge \(e_2\) is added to the stack Q. The depth-first traversal continues in the same manner until two critical vertices are encountered (Fig. 6(d)).

Extracting separatrix edge-triangle paths (\(V_2\)-paths)

Given a critical edge, all the triangles in its coboundary are extracted. For each such triangle, we extract its paired edge and push the edge onto a stack Q. Each time we extract an edge e from Q, we compute the triangles in its coboundary. For each such triangle t, we compute the edge \(e'\) paired with t, and we push \(e'\) onto Q if \(e' \ne e\). The lines of the critical net reconstructed at this stage are obtained by connecting the barycenter of each triangle with the barycenters of the boundary edges encountered during the visit.
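The \(V_2\)-path visit is the dual of the previous one. A minimal Python sketch, assuming an ET map from each edge to the triangles in its coboundary and a map from each non-critical triangle to its paired edge (all simplices as frozensets); as before, the critical edge itself is pushed first, and the names are hypothetical.

```python
def v2_separatrices(critical_edge, et, triangle_pair):
    """Depth-first visit of the V2-paths leaving a critical edge.

    et: edge -> triangles in its coboundary (one or two).
    triangle_pair: non-critical triangle -> paired edge.
    Returns the (edge, triangle) segments and the maxima reached.
    """
    segments, maxima = [], []
    stack = [critical_edge]  # the stack Q
    while stack:
        e = stack.pop()
        for t in et.get(e, []):  # coboundary triangles of e
            segments.append((e, t))  # joins the barycenters of e and t
            paired = triangle_pair.get(t)
            if paired is None:
                maxima.append(t)  # critical triangle reached
            elif paired != e:
                stack.append(paired)
    return segments, maxima
```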

Figure 7 shows the extraction of the critical net limited to a critical edge and a critical triangle. Starting from a critical edge (Fig. 7(a)), the triangle \(t_1\) in its coboundary is retrieved. The edge \(e_1\) paired with \(t_1\) is inserted in the stack Q. In Fig. 7(b), the edge \(e_1\) is extracted from the stack and connected with the triangle paired with it (i.e., triangle \(t_1\)). In Fig. 7(c), the next triangle in the coboundary of \(e_1\) (i.e., \(t_2\)) is connected to \(e_1\) and its paired edge \(e_2\) is added to the stack Q. The depth-first traversal continues in the same manner until a critical triangle is encountered (Fig. 7(d)).

Fig. 6

Reconstruction of a portion of the critical net connecting a critical saddle with two critical minima. (a) Starting from a critical edge, the boundary vertices are connected with the edge. (b) The edge \(e_1\) paired with \(\textit{v}_1\) is added to the stack. (c) The edge \(e_1\) is extracted from the stack and connected with its paired vertex \(\textit{v}_1\). The other vertex on the boundary of \(e_1\) (i.e., \(\textit{v}_2\)) is retrieved and its paired edge \(e_2\) is added to the stack. (d) The portion of the critical net is reconstructed by repeating the same steps until both paths reach critical vertices (minima)

As the extraction of the separatrix V-paths involves an intensive mesh traversal, the leaf blocks of Terrain trees need to be visited multiple times. Thus, for efficiency, we use an auxiliary cache for encoding a subset of the required connectivity relations. Extracting separatrix \(V_1\)-paths, which are composed of edges and vertices, requires the Edge-Vertex (EV) relation. Extracting separatrix \(V_2\)-paths, which are composed of edges and triangles, requires the Edge-Triangle (ET) relation. The cache uses a Least-Recently-Used replacement policy (LRU cache), which lets us improve processing times with a negligible storage overhead.
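An LRU cache keyed by leaf block can be sketched with Python's `collections.OrderedDict`. This is a minimal illustration, assuming block ids as keys and the per-block EV/ET relations (for instance, the dictionaries built by a per-block extraction routine) as values; the class name and its capacity parameter are hypothetical. An explicit `get`/`put` interface is used instead of `functools.lru_cache` because relations are saved after a block has been processed, as in step 3 of the procedure described in this section.

```python
from collections import OrderedDict


class BlockRelationCache:
    """Least-recently-used cache of connectivity relations, keyed by block id."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, block_id):
        """Return the cached relations of a block, or None on a miss."""
        if block_id not in self._data:
            return None
        self._data.move_to_end(block_id)  # mark as most recently used
        return self._data[block_id]

    def put(self, block_id, relations):
        """Store the relations of a block, evicting the least recently used one."""
        self._data[block_id] = relations
        self._data.move_to_end(block_id)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)
```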

Fig. 7

Reconstruction of a portion of the critical net connecting a critical saddle with a critical maximum. (a) Starting from a critical edge, the coboundary triangle (i.e., \(t_1\)) is visited. (b) The traversal starts by adding the edge \(e_1\), paired with \(t_1\), to the stack. (c) The edge \(e_1\) is extracted from the stack and connected with its paired triangle \(t_1\). The other triangle in the coboundary of \(e_1\) (i.e., \(t_2\)) is extracted and its paired edge \(e_2\) is added to the stack. (d) The portion of the critical net is reconstructed by repeating the same steps until a critical triangle (maximum) is reached

Within each leaf block b of a Terrain tree, we execute the following steps:

  1. expand the leaf block representation by computing and storing in the block the connectivity relations required, as discussed above;

  2. extract the separatrix V-paths in b;

  3. save in cache the connectivity relations of b.

During the computation of the separatrix V-paths, a V-path may leave the leaf block b currently processed. To deal with this situation, we introduce the notion of a dangling path, i.e., a V-path whose continuation lies outside the block currently processed. Our strategy uses dangling paths to postpone the construction of certain V-paths, thus limiting the number of times we have to enter and exit a leaf block.

The extraction of the gradient V-paths within a leaf block b is performed as follows:

  • the new V-paths starting from the critical edges indexed by b are visited. Notice that an edge can be shared by two blocks. To process each critical edge only once, we use the following convention: a critical edge is indexed by b if and only if b indexes its endpoint with the higher label value in the vertex array \({\varSigma }_{V}\);

  • the dangling paths entering b are then expanded. Each time a dangling path is expanded, the corresponding entry in the auxiliary data structure storing dangling paths is removed. In this way, the storage requirements of this structure are kept negligible during the extraction process.

Notice that the visit of a V-path can be interrupted several times, as it can cross multiple leaf blocks. Once the visit of the V-paths in b is terminated, the connectivity relations computed are saved into the cache.
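The interplay between blocks and dangling paths can be illustrated with a simplified Python sketch restricted to \(V_1\)-paths. Assumptions, all ours: `vertex_block` maps each vertex to the leaf block indexing it; an edge belongs to the block of its endpoint with the larger vertex index, following the convention above; a single pending list per block holds both the new critical edges and the dangling paths entering it; blocks with pending work are visited in increasing id order; and relation expansion and caching are omitted. The function name is hypothetical.

```python
from collections import defaultdict


def v1_paths_by_block(critical_edges, vertex_pair, vertex_block):
    """Block-wise extraction of V1-paths with dangling paths.

    critical_edges: frozensets {a, b}; vertex_pair: non-critical vertex -> paired edge;
    vertex_block: vertex -> id of the leaf block indexing it.
    Returns the critical vertices reached and the number of block visits.
    """
    def owner(e):  # an edge belongs to the block of its higher-label endpoint
        return vertex_block[max(e)]

    pending = defaultdict(list)  # block id -> critical edges and dangling paths
    for e in critical_edges:
        pending[owner(e)].append(e)
    minima, visits = [], 0
    while pending:
        b = min(pending)  # visiting order: smallest pending block id
        stack = pending.pop(b)
        visits += 1
        while stack:
            e = stack.pop()
            for w in e:
                nxt = vertex_pair.get(w)
                if nxt is None:
                    minima.append(w)  # critical vertex reached
                elif nxt != e:
                    if owner(nxt) == b:
                        stack.append(nxt)  # the path continues inside b
                    else:
                        pending[owner(nxt)].append(nxt)  # dangling path
    return minima, visits
```

Postponing the dangling paths means that each block is entered only when it has pending work, instead of every time a path crosses its border.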

8 Multifield visualization

Multifield data are scientific data characterized by multiple field values. An example of this type of data is airborne LiDAR data where, for each point, multiple measures are recorded such as the intensity of the laser pulse, the point classification (i.e., ground, canopy, water, etc.), RGB bands, scan angle, and direction.

Extracting and visualizing descriptive information for multifield data is a major challenge. The technique implemented in this work relates to the notion of Jacobi set [19]. The Jacobi set of a collection of real-valued Morse functions defined on a common manifold is the set of all points where the function gradients are linearly dependent, which is directly related to the rank of the Jacobian matrix. This definition inspired numerical techniques aimed at rendering a single function built out of the multiple scalar fields in a way compatible with the relationships among the scalar fields. In [53], a comparison measure is defined for evaluating the local coherence among different scalar fields based on their gradients.

Given m scalar fields \(f_1,\ldots,f_m\) defined on a domain with coordinates \(x_1,\ldots,x_n\), the matrix of partial derivatives (the Jacobian matrix) at a point v is:

$$\begin{aligned} dF(\textit{v}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(\textit{v}) & \cdots & \frac{\partial f_1}{\partial x_n}(\textit{v}) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(\textit{v}) & \cdots & \frac{\partial f_m}{\partial x_n}(\textit{v}) \end{bmatrix} \end{aligned}$$

The multifield comparison measure in [53] is defined as the norm of this matrix, \(\eta ^{F}(\textit{v}) =||dF(\textit{v})||\). To speed up the computation, the estimation of \(\eta ^{F}(\textit{v})\) is reduced to computing the square root of the largest eigenvalue of the matrix \((dF(\textit{v}))^T(dF(\textit{v}))\). To compute \(\eta ^{F}(\textit{v})\) when v is a vertex of a TIN, we need to estimate the partial derivatives at v. In [46] several methods have been analyzed, and the Average Gradient on Star (AGS) method [51] has been shown to be the best one, being both accurate and efficient. In AGS, the gradient at a vertex v is approximated by averaging the gradients estimated at the triangles incident in v.
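On a terrain the domain is two-dimensional (\(n = 2\)), so \((dF)^T dF\) is a symmetric \(2\times 2\) matrix \(\begin{bmatrix} a & b\\ b & c\end{bmatrix}\) whose largest eigenvalue has the closed form \(\lambda_{\max} = \frac{a+c}{2} + \sqrt{\left(\frac{a-c}{2}\right)^2 + b^2}\). A minimal Python sketch, assuming the per-field gradients at v have already been estimated (e.g., with AGS) and are passed as the rows of the Jacobian; the function name is hypothetical.

```python
import math


def multifield_norm(jacobian):
    """eta^F(v): square root of the largest eigenvalue of J^T J for an m x 2 Jacobian.

    jacobian: list of m rows (df_i/dx, df_i/dy), one per scalar field.
    """
    a = sum(r[0] * r[0] for r in jacobian)
    b = sum(r[0] * r[1] for r in jacobian)
    c = sum(r[1] * r[1] for r in jacobian)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    return math.sqrt(lam)
```

With a single field, the measure reduces to the gradient magnitude: a gradient \((1, 1)\) gives \(\sqrt{2}\).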

In particular, given a scalar function f defined on the vertices of a TIN \(\varSigma\), the gradient at a triangle t of \(\varSigma\), denoted as \(\nabla f_t\), is calculated as follows:

$$\begin{aligned} \nabla f_t=(f(\textit{v}_j)-f(\textit{v}_i))\,\frac{(p_i-p_k)^{\perp }}{2A_t}+(f(\textit{v}_k)-f(\textit{v}_i))\,\frac{(p_j-p_i)^{\perp }}{2A_t} \end{aligned}$$

where \(\perp\) denotes the counterclockwise rotation of a vector by 90 degrees, \(A_t\) is the area of the triangle t, \(\textit{v}_i, \textit{v}_j, \textit{v}_k\) are the three vertices of t, taken in counterclockwise order, and \(p_i,p_j,p_k\) are vectors representing the x- and y-coordinates of \(\textit{v}_i,\textit{v}_j\) and \(\textit{v}_k\), respectively.

To compute the gradient at vertex v, we need to compute the so-called mixed area [51]. Let t be a triangle incident in v, and let \(p_i,p_j,p_k\) be the vectors of coordinates of its three vertices, with \(p_i\) corresponding to v. If t is non-obtuse, the contribution of t to the mixed area of v is \(\frac{1}{8}(|p_ip_k|^2 \cot\angle p_j+|p_ip_j|^2\cot\angle p_k)\). If t is obtuse, there are two cases: if the angle of t at v is obtuse, the contribution is half of the area of t, while if the angle at v is not obtuse, the contribution is a quarter of the area of t. Then, the gradient at v is the average of the gradients computed at the triangles incident in v, weighted by the corresponding mixed-area contributions.

Since the computation of the gradient at vertex v relies on the gradients at all the triangles incident in v, computing such a gradient in Terrain trees requires extracting the Vertex-Triangle (VT) relation, as discussed in Section 5.1.
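The triangle gradient, the mixed area, and their weighted average at a vertex can be combined in a short Python sketch. Assumptions, ours: 2D coordinates; the triangle gradient uses the signed area, so the result does not depend on the vertex orientation; and the function names are hypothetical. For a linear function, every triangle gradient equals the true gradient, so the vertex estimate reproduces it whatever the weights.

```python
def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def _dot(u, w):
    return u[0] * w[0] + u[1] * w[1]


def _cross(u, w):
    return u[0] * w[1] - u[1] * w[0]


def triangle_gradient(pi, pj, pk, fi, fj, fk):
    """Gradient of the linear interpolant of f on the triangle (pi, pj, pk)."""
    two_a = _cross(_sub(pj, pi), _sub(pk, pi))  # signed 2 * area
    dx, dy = _sub(pi, pk)
    a = (-dy, dx)  # (pi - pk) rotated by 90 degrees counterclockwise
    dx, dy = _sub(pj, pi)
    b = (-dy, dx)  # (pj - pi) rotated by 90 degrees counterclockwise
    return (((fj - fi) * a[0] + (fk - fi) * b[0]) / two_a,
            ((fj - fi) * a[1] + (fk - fi) * b[1]) / two_a)


def _cot(at, p, q):
    """Cotangent of the angle at `at` in the triangle (at, p, q)."""
    u, w = _sub(p, at), _sub(q, at)
    return _dot(u, w) / abs(_cross(u, w))


def mixed_area(pi, pj, pk):
    """Contribution of triangle (pi, pj, pk) to the mixed area of the vertex at pi."""
    area = abs(_cross(_sub(pj, pi), _sub(pk, pi))) / 2
    if _dot(_sub(pj, pi), _sub(pk, pi)) < 0:  # obtuse at v
        return area / 2
    if _dot(_sub(pi, pj), _sub(pk, pj)) < 0 or _dot(_sub(pi, pk), _sub(pj, pk)) < 0:
        return area / 4  # obtuse elsewhere
    return (_dot(_sub(pk, pi), _sub(pk, pi)) * _cot(pj, pi, pk)
            + _dot(_sub(pj, pi), _sub(pj, pi)) * _cot(pk, pi, pj)) / 8


def vertex_gradient(v, vt, triangles, coords, f):
    """Mixed-area-weighted average of the gradients of the triangles in vt (VT of v)."""
    gx = gy = total = 0.0
    for t in vt:
        j, k = [u for u in triangles[t] if u != v]
        pi, pj, pk = coords[v], coords[j], coords[k]
        w = mixed_area(pi, pj, pk)
        tx, ty = triangle_gradient(pi, pj, pk, f[v], f[j], f[k])
        gx, gy, total = gx + w * tx, gy + w * ty, total + w
    return gx / total, gy / total
```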

9 Experimental results

In this section, we study the performance of the Terrain trees family. The TINs used are computed from LiDAR point clouds by means of a Delaunay triangulation algorithm from the CGAL library [9].

The characteristics of each TIN are reported in Table 1. We have used a total of seven TINs with a number of vertices ranging from 34 million to 193 million. Great smokey mountain, Canyon lake gorge and Big creek are three datasets provided by the OpenTopography repository [56]. For each of them we have computed a single TIN. The original Sonoma county dataset includes more than 60 billion points. We created four different datasets by subsampling the original point cloud at four different resolution levels. For each point cloud obtained we have computed a TIN that is used in our experiments.

Table 1 Overview of experimental datasets. For each terrain, we list the number of vertices \(|{\varSigma }_{V}|\) and triangles \(|{\varSigma }_{T}|\)

Our experimental evaluation addresses five aspects: (i) calibration of the thresholds guiding the construction of Terrain trees, (ii) evaluation of the requirements for initializing Terrain trees, (iii) extraction of connectivity-based relations, (iv) computation of morphological and topology-based features, and (v) analysis and visualization of multifield data. The hardware configuration used for these experiments is a dual Intel Xeon E5-2630 v4 CPU at 2.20 GHz, with 64 GB of RAM. The source code of the Terrain trees library implementing the Terrain trees is available at [26]. The source code of the LibTri library implementing the IA data structure is available at [25].

9.1 Selection of thresholds for Terrain trees generation

Terrain trees are generated based on at most two input values. The first one is the maximum number of vertices per leaf block, denoted as \(k_{\textit{v}}\). The second one is the maximum number of triangles per leaf block, denoted as \(k_t\). Since the number of triangles in a TIN is about twice the number of its vertices, in the following we set \(k_t = 2k_{\textit{v}}\). To calibrate such thresholds efficiently, we performed a preliminary evaluation to identify non-optimal values, i.e., values that create hierarchies that are either too deep or too coarse. This evaluation also established an initial test range such that each leaf block contains between 1 millionth and 10 millionth of the vertices of the TIN. For each dataset, we create a total of 20 spatial indexes within this test range. All triangle meshes are generated over irregularly distributed LiDAR point clouds. Since the observed performance trend is similar on all datasets, in this section we show only the plots for the Great smokey mountain dataset to evaluate the results on threshold selection. The plots describing the performance trends on the other datasets are provided in Appendix A. To evaluate the effects of varying \(k_{\textit{v}}\) and \(k_t\), we analyze: (i) storage costs, (ii) time requirements for generating a Terrain tree, and (iii) time requirements for answering the most common connectivity-based query, i.e., the extraction of the Vertex-Triangle relation.

Fig. 8

Storage costs of the hierarchical index of the Terrain trees on the Great smokey mountain dataset using different values of \(k_{\textit{v}}\) and \(k_t\). The x-axis shows the threshold value on the vertices

Since all Terrain trees encode the TIN with the same indexed representation, we focus only on the storage costs of the hierarchical index, without considering the storage costs for encoding the vertices and the triangle connectivity. Figure 8 shows these results. Storage costs decrease sharply as the thresholds grow from small values, while they remain nearly constant for larger ones. The differences in storage costs among the three indices are more noticeable when the thresholds are small, while the costs become nearly identical for larger ones. The storage cost of the hierarchical index is closely related to the number of nodes in a Terrain tree. When using larger thresholds (i.e., \(k_V\) greater than 550), the three Terrain trees have the same number of nodes, which means that the spatial index is losing its effectiveness at decomposing the embedding space. This result also highlights that, when both thresholds become large, the threshold on the triangles is the one guiding the spatial decomposition.

Figure 9(a) shows the time required for generating the spatial decomposition of a Terrain tree on the Great smokey mountain dataset. Generation times decrease for all types of Terrain trees when using larger thresholds, since each leaf block can contain more vertices and triangles, and the spatial decomposition is obtained by executing fewer split operations. Also, since the subdivision rule for the PMR-T tree is not recursive, its construction is always faster than that of the other two trees, which, on average, have similar generation times.

Fig. 9

Timings for generating the spatial decomposition of Terrain trees and extracting the VT relation in Terrain trees on Great smokey mountain dataset using different values of \(k_{\textit{v}}\) and \(k_t\). The x-axis shows the vertex threshold value

We now evaluate how different thresholds affect the extraction of the Vertex-Triangle (VT) relation, which is key to most of the applications defined in our framework. Figure 9(b) summarizes the results we have obtained. The larger the threshold, the faster the extraction of the VT relation. The largest time drop occurs for small thresholds, while for larger thresholds the time difference becomes smaller. Since PR-T and PM-T trees explicitly encode the range of vertices, they show similar performance and are always faster than PMR-T trees, using from 30% to 70% less time, as the latter have to compute such vertex ranges at run time.

These results highlight how the performance of Terrain trees is affected by the two user-defined thresholds. Considering only storage requirements and generation times, larger thresholds reduce both quantities, since the decomposition is coarser and the compressed encoding becomes more effective. However, for larger values of \(k_V\) and \(k_T\), the spatial index becomes less efficient, as it gets closer to a global representation. With such thresholds, leaf blocks encode more entities, leading to a reduced speedup and to higher storage requirements for the application-specific auxiliary data structures. Based on the results and evaluations above, we choose thresholds that generate trees with between 300K and 500K nodes, as we have noticed that such spatial indices are neither too coarse nor too deep, and offer good overall performance in generating and encoding Terrain trees and in answering connectivity queries.

9.2 Terrain trees evaluation

In this section, we compare the storage costs and the timings for generating the spatial indices of Terrain trees and the IA data structure, as well as their performance at extracting the Vertex-Triangle (VT) relation.

Table 2 Overview of Terrain trees. For each Terrain tree, we list the thresholds \(k_V\) and \(k_T\) and the number of blocks in the index (|N|)

We generate a PR-T tree, a PM-T tree, and a PMR-T tree for each TIN. A single value for \(k_V\) and \(k_T\) is selected for each dataset according to the results discussed in Section 9.1. Table 2 shows the selected thresholds and the total number of internal and leaf nodes of each Terrain tree. Notice that PR-T trees and PMR-T trees use only one threshold (\(k_V\) and \(k_T\), respectively), while PM-T trees use both. As shown in Table 2, the PM-T tree and the PMR-T tree of the same dataset always have a similar number of nodes, which leads to comparable storage costs, while the PR-T tree always has fewer nodes than the other two. These results match those of Section 9.1, which show that a PR-T tree always has a lower storage cost than the other two Terrain trees.

Table 3 Comparison of storage, expressed in megabytes (MB) and gigabytes (GB), for the underlying TIN, Terrain trees and IA data structure. O.O.M. stands for Out Of Memory

Table 3 shows the storage costs of the Terrain trees and of the IA data structure. Since both Terrain trees and the IA data structure encode an indexed representation of the TIN, we report this storage requirement separately. Thus, the storage costs shown in Table 3 only consider the requirements for encoding the spatial index in a Terrain tree, and for encoding the adjacency and coboundary relations (i.e., the Triangle-Triangle and the partial Vertex-Triangle relations) in the IA data structure. The total storage requirements can be computed by adding the storage cost of the TIN to the overhead of the corresponding Terrain tree or IA data structure. Note that the overhead of a Terrain tree is between 1% and 3% of the overhead of the IA data structure. When the cost of the underlying indexed TIN representation is also considered, a Terrain tree encodes the same dataset using, on average, 36% less storage than the IA data structure. The differences among the three Terrain trees are minimal; the PR-T tree is the most compact, since it generates fewer leaf blocks than the other two.
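The relation between the 1% to 3% overhead ratio and the roughly 36% saving on the total storage can be checked with a back-of-the-envelope estimate. The sketch assumes 8-byte coordinates (x, y, z per vertex), 4-byte indices, an indexed TIN storing coordinates plus the Triangle-Vertex relation, an IA overhead of three adjacent-triangle indices per triangle plus one incident-triangle index per vertex, and about twice as many triangles as vertices (as Euler's formula gives for planar triangulations). These byte sizes and the function name `storage_estimate` are assumptions, not values taken from the experiments.

```python
def storage_estimate(n_vertices, n_triangles, terrain_overhead_ratio,
                     coord_bytes=8, index_bytes=4):
    """Estimated total bytes for the IA data structure and a Terrain tree.

    terrain_overhead_ratio: Terrain tree overhead as a fraction of the IA overhead.
    Returns (ia_total, terrain_total, relative_saving).
    """
    tin = n_vertices * 3 * coord_bytes + n_triangles * 3 * index_bytes   # x, y, z + TV
    ia_overhead = n_triangles * 3 * index_bytes + n_vertices * index_bytes  # TT + partial VT
    terrain_overhead = terrain_overhead_ratio * ia_overhead
    ia_total = tin + ia_overhead
    terrain_total = tin + terrain_overhead
    return ia_total, terrain_total, 1 - terrain_total / ia_total


if __name__ == "__main__":
    for ratio in (0.01, 0.02, 0.03):
        print(ratio, round(storage_estimate(1_000_000, 2_000_000, ratio)[2], 4))
```

With \(10^6\) vertices, \(2 \times 10^6\) triangles, and a Terrain tree overhead of 1%, 2%, or 3% of the IA overhead, the estimated saving is about 36.5%, 36.1%, and 35.7%, respectively, which is in line with the average reported above.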

Fig. 10

Comparison of total timings, expressed in minutes (m), for generating a Terrain tree or the IA data structure

Figure 10 shows the time required for generating a Terrain tree or the IA data structure. Generation times do not include the time needed to load the TIN from file, only the time for generating the corresponding data structure. Generating the IA data structure takes about 10% of the time needed for a Terrain tree. This is expected, since the IA data structure only computes the adjacencies between triangles and a partial VT relation, while a Terrain tree is created by first computing the spatial decomposition and then compressing its representation, as described in Section 5. Also in this case, the differences among the three Terrain trees are minimal, and generating a PMR-T tree is always 20% faster than generating the other two, which is consistent with the findings in Section 9.1.
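A minimal sketch of the idea behind the compression step, assuming a leaf-by-leaf reordering: once vertices are renumbered in leaf traversal order, each leaf only needs the bounds of its vertex range, and sorted triangle index lists can be stored as runs of consecutive indices. This illustrates the use of spatial coherence to reorder the indexed data; it is not the exact encoding of Section 5, and `reindex_vertices` and `run_length_encode` are hypothetical helpers.

```python
def reindex_vertices(leaf_vertex_lists):
    """Renumber vertices leaf by leaf so that every leaf owns a contiguous range.

    leaf_vertex_lists: old vertex ids per leaf, in leaf traversal order.
    Returns (old_to_new, ranges), with ranges[i] = (start, end) for leaf i.
    """
    old_to_new, ranges, next_id = {}, [], 0
    for verts in leaf_vertex_lists:
        start = next_id
        for v in verts:
            old_to_new[v] = next_id
            next_id += 1
        ranges.append((start, next_id))  # two integers replace the whole list
    return old_to_new, ranges


def run_length_encode(sorted_ids):
    """Compress a sorted index list into (first, length) runs of consecutive ids."""
    runs = []
    for i in sorted_ids:
        if runs and i == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((i, 1))  # start a new run
    return runs
```

If triangles are reindexed with the same leaf-by-leaf strategy, the triangle lists of neighboring leaves tend to contain long runs of consecutive indices, which is where an encoding of this kind pays off.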

Fig. 11

Comparison of total timings, expressed in minutes (m), for extracting the VT relations

As shown in Fig. 11, Terrain trees always extract the VT relation faster than the IA data structure. PR-T and PM-T trees use from 57% to 72% less time than the IA data structure. PMR-T trees use, on average, from 30% to 70% more time than PR-T and PM-T trees, while still saving at least 30% of the time with respect to the IA data structure. The differences among Terrain trees are due to the encoding of the vertex ranges: in PR-T and PM-T trees, such ranges are explicitly encoded, while in PMR-T trees they are computed at run-time on a block-by-block basis.

9.3 Computing morphological features

In this section, we evaluate the performance of computing the morphological features described in Section 6. Results are shown in Figs. 12, 13, and 14.

Fig. 12

Comparison of total timings, expressed in minutes (m), for extracting edge slopes

Computing the triangle slope requires the Triangle-Vertex relation, while computing the edge slope requires the (dual) Vertex-Triangle relation. Since both Terrain trees and the IA data structure store the Triangle-Vertex relation explicitly, we only compare their performance in computing the edge slope.

Since edges are explicitly encoded neither in Terrain trees nor in the IA data structure, both implementations use an auxiliary lookup table for encoding the slope values without duplicates. Terrain trees allow using a local lookup table within each leaf block, which reduces the cost of both encoding and accessing it. As shown in Fig. 12, slope estimation benefits from the spatial index and the modular structure: computing edge slopes on Terrain trees requires from 37% to 45% less time than on the IA data structure, as well as less memory (the IA data structure runs out of memory in five cases). The difference among the three Terrain trees is limited, since extracting the VT relation accounts for a small portion of the overall slope computation time.
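The per-block lookup table can be sketched as follows. We assume that the slope of an edge is the angle, in degrees, between the edge and the horizontal plane, and that an edge is assigned to the block owning its endpoint with the smaller index, so that each edge is computed exactly once; both choices, as well as the function name `edge_slopes_in_block`, are hypothetical and not necessarily those of Section 6.

```python
import math


def edge_slopes_in_block(v_start, v_end, vt, tv, coords):
    """Slopes (degrees) of the edges owned by one leaf block.

    vt: local VT relation (vertex -> incident triangles) for the block.
    tv: Triangle-Vertex relation; coords[v] = (x, y, z).
    The dict is local to the block and keyed by the sorted endpoint pair,
    so an edge shared by two triangles is computed only once.
    """
    slopes = {}
    for v in range(v_start, v_end):
        for t in vt.get(v, []):
            for w in tv[t]:
                if w == v:
                    continue
                key = (min(v, w), max(v, w))
                if not (v_start <= key[0] < v_end):
                    continue  # owned by the block storing the smaller endpoint
                if key in slopes:
                    continue  # already computed from another incident triangle
                (xa, ya, za), (xb, yb, zb) = coords[v], coords[w]
                run = math.hypot(xb - xa, yb - ya)
                slopes[key] = math.degrees(math.atan2(abs(zb - za), run))
    return slopes
```

Keeping the table per block keeps it small and is discarded after the block is processed, which reflects the reduced encoding and access cost described above.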

Fig. 13

Comparison of total timings, expressed in minutes (m), for extracting concentrated curvatures

As discussed in Section 6, estimating the curvature requires visiting the star of each vertex (i.e., extracting the Vertex-Triangle relation). As shown in Fig. 13, the implementation based on Terrain trees is always faster than the one based on the IA data structure, requiring from 25% to 30% less time. Both Terrain trees and the IA data structure require the same amount of space for encoding curvature values, and the size of the auxiliary data structures is negligible. The three Terrain trees perform similarly, which shows that extracting the vertices of a block at run-time does not significantly affect the performance of the spatial index.
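Concentrated curvature is commonly defined as the angle deficit at a vertex: \(2\pi\) minus the sum of the triangle angles incident at an interior vertex, or \(\pi\) minus that sum at a boundary vertex. The sketch below computes it from the VT relation of a single vertex; the helper names are hypothetical, and boundary detection is left to the caller through the `on_boundary` flag.

```python
import math


def corner_angle(p, q, r):
    """Angle at p in the triangle (p, q, r), using 3D coordinates."""
    u = [q[i] - p[i] for i in range(3)]
    w = [r[i] - p[i] for i in range(3)]
    dot = sum(a * b for a, b in zip(u, w))
    nu = math.sqrt(sum(a * a for a in u))
    nw = math.sqrt(sum(a * a for a in w))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nw))))  # clamp rounding errors


def concentrated_curvature(v, vt_v, tv, coords, on_boundary):
    """Angle deficit at v, given the triangles vt_v incident in v."""
    total = 0.0
    for t in vt_v:
        others = [u for u in tv[t] if u != v]
        total += corner_angle(coords[v], coords[others[0]], coords[others[1]])
    return (math.pi if on_boundary else 2 * math.pi) - total
```

On a flat fan of triangles the deficit is zero, while raising the central vertex of a square fan by one unit (with neighbors at unit distance) gives four 60-degree angles and a deficit of \(2\pi/3\).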

Fig. 14

Comparison of total timings, expressed in minutes (m), for extracting roughness values

Estimating roughness requires computing the Vertex-Vertex (VV) relation. On Terrain trees, the roughness computation is efficient (see Fig. 14), being from 36% to 55% faster than the corresponding procedure on the IA data structure. On larger datasets, PMR-T trees are always slower than PR-T and PM-T trees, since they are less efficient at extracting the VV relation.
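The VV relation can be derived from the VT relation by collecting the other vertices of the incident triangles. The sketch assumes, for illustration only, that the roughness at a vertex is the standard deviation of the elevations over the vertex and its VV neighbors; the exact estimator used in Section 6 may differ, and the function names are hypothetical.

```python
import statistics


def vv_from_vt(v, vt_v, tv):
    """Vertex-Vertex relation of v, derived from its Vertex-Triangle relation."""
    return sorted({w for t in vt_v for w in tv[t] if w != v})


def roughness(v, vt_v, tv, elevation):
    """Assumed estimator: population std. deviation of elevation over v and its VV neighbors."""
    ring = [v] + vv_from_vt(v, vt_v, tv)
    return statistics.pstdev([elevation[u] for u in ring])
```

Because VV is obtained from VT, any speedup in extracting the VT relation carries over to this computation, while slower VT extraction (as with run-time vertex ranges) also slows it down.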

9.4 Computing topology-based segmentations

In this section, we evaluate the performances in computing the Forman gradient, and in extracting the critical net.

9.4.1 Forman gradient computation

As discussed in Section 7, computing the discrete gradient on a Terrain tree requires extracting the star of each vertex of the TIN, i.e., the Vertex-Triangle relation. As shown in Fig. 15, Terrain trees are always faster than the IA data structure, requiring about 20% less time. PMR-T trees are usually slightly faster than PR-T and PM-T trees, but overall the time difference among the three Terrain trees is small.
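A widely used strategy for building a Forman gradient on a triangulated terrain processes each vertex independently through its lower star, i.e., the simplices in its star whose highest vertex, with respect to the scalar field, is the vertex itself. The sketch below only extracts the lower star from the VT relation of a vertex; pairing the simplices into gradient vectors is omitted, ties are broken by vertex index as an assumption, and the function name `lower_star` is hypothetical.

```python
def lower_star(v, vt_v, tv, f):
    """Edges and triangles in the star of v whose highest vertex is v.

    vt_v: triangles incident in v (the VT relation of v).
    f: scalar value (e.g. elevation) per vertex; ties are broken by vertex index.
    """
    def key(u):
        return (f[u], u)

    edges, triangles = set(), []
    for t in vt_v:
        lower = [u for u in tv[t] if u != v and key(u) < key(v)]
        edges.update((v, u) for u in lower)
        if len(lower) == 2:  # both other vertices are lower: triangle is in the lower star
            triangles.append(tuple(tv[t]))
    return sorted(edges), triangles
```

Since each lower star depends only on the star of one vertex, this step fits the per-block processing of a Terrain tree, where the VT relation is extracted locally.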

Since Terrain trees have lower storage requirements than the IA data structure, they can complete the Forman gradient computation on all datasets, i.e., there is enough memory for encoding the discrete gradient and the auxiliary data structures used in the process. This does not apply to the IA data structure, which runs out of memory on the two largest datasets.

Fig. 15

Comparison of total timings, expressed in minutes (m), for computing the Forman gradient vector

9.4.2 Critical net extraction

In a Terrain tree, extracting the critical net requires an intensive navigation of the spatial index, and the run-time extraction of connectivity relations that, in the IA data structure, are either explicitly encoded (Triangle-Triangle relation) or efficiently extracted (Vertex-Vertex relation). This application therefore represents an interesting worst-case scenario for Terrain trees.

Fig. 16

Comparison of total timings, expressed in minutes (m), for computing the critical net

Both Terrain trees and the IA data structure use auxiliary data structures for extracting the critical net. The IA data structure uses a global stack to perform the TIN traversal. Conversely, a Terrain tree uses a cache of leaf blocks with expanded connectivity information and a list of dangling paths, plus a local stack within each leaf block (see Section 7.2). As shown in Fig. 16, thanks to the lower storage requirements of Terrain trees, such auxiliary structures fit in memory, while the IA data structure runs out of memory on the three largest datasets.
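The cache of expanded leaf blocks can be sketched as a least-recently-used (LRU) map from leaf identifiers to their expanded connectivity. The LRU eviction policy, the `LeafCache` class, and the `expand` callback (standing in for the run-time extraction of connectivity relations inside a leaf) are assumptions for illustration, not details taken from Section 7.2.

```python
from collections import OrderedDict


class LeafCache:
    """LRU cache mapping a leaf id to its expanded connectivity (illustrative)."""

    def __init__(self, capacity, expand):
        self.capacity = capacity
        self.expand = expand            # hypothetical callback: leaf id -> relations
        self._items = OrderedDict()     # least recently used entry comes first
        self.misses = 0

    def get(self, leaf_id):
        if leaf_id in self._items:
            self._items.move_to_end(leaf_id)  # mark as most recently used
            return self._items[leaf_id]
        self.misses += 1
        data = self.expand(leaf_id)           # expand connectivity on a miss
        self._items[leaf_id] = data
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used leaf
        return data
```

A larger capacity trades memory for fewer re-expansions: with capacity 2, the access sequence 1, 2, 1, 3, 2 expands four leaves, since accessing leaf 3 evicts leaf 2.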

In terms of timings, PR-T and PM-T trees perform comparably to the IA data structure, using at most 10% more time, while PMR-T trees perform significantly worse, being up to 8 times slower than the IA data structure. Among Terrain trees, PR-T and PM-T trees have similar performance and are at least 5 times faster than PMR-T trees. As already observed in the other applications, this gap is due to the fact that a PMR-T tree has to compute the range of vertices at run-time.

9.5 Multifield data visualization

We evaluate the performance of Terrain trees in computing the multifield measure described in Section 8. We test our implementation with a visual analysis on two small datasets (shown in Figs. 17 and 18), and with a performance comparison against the IA data structure on three different areas of the Sonoma County dataset. Each dataset has a total of five scalar fields: elevation, a color field (encoded as an RGB triple), and roughness.
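The measure itself is defined in Section 8 and is not reproduced here. As a hedged illustration, gradient-based comparison measures for multiple scalar fields build on the gradient of each piecewise-linear field on every triangle; the sketch computes that gradient and, as one example of a pairwise comparison, the magnitude of the 2D wedge (cross) product of two gradients, which vanishes where the gradients are parallel. The function names are hypothetical.

```python
def triangle_gradient(p0, p1, p2, f0, f1, f2):
    """Gradient (df/dx, df/dy) of the linear interpolant of f over a triangle in the xy-plane."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p0[0], p2[1] - p0[1]
    df1, df2 = f1 - f0, f2 - f0
    det = ax * by - ay * bx
    if det == 0:
        raise ValueError("degenerate triangle")
    # Solve [ax ay; bx by] [gx gy]^T = [df1 df2]^T with Cramer's rule
    gx = (df1 * by - ay * df2) / det
    gy = (ax * df2 - df1 * bx) / det
    return gx, gy


def pairwise_comparison(g, h):
    """|g x h|: magnitude of the wedge product of two 2D gradients."""
    return abs(g[0] * h[1] - g[1] * h[0])
```

Since the gradients only need the vertices of each triangle, they can be computed block by block, with each field stored as one value per vertex.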

Fig. 17

The satellite map and the visualization of the multifield comparison measure on an area with few trees and some human artifacts. Figures show the satellite image (a), the multifield measure based on red, green and blue values (b), based on elevation paired with green values (c), and based on elevation paired with RGB values (d)

First, we visually analyze the results of the algorithm on an area with few trees and some human artifacts (a fence and a small building). Figure 17 presents the raster image of this area (Fig. 17(a)) and three output images obtained with our algorithm using as input scalar fields: (i) the RGB values (Fig. 17(b)), (ii) the green band paired with the elevation (Fig. 17(c)), and (iii) the RGB values paired with the elevation (Fig. 17(d)). Using only the RGB values, the algorithm clearly identifies the boundaries of the buildings and their shadows, but it does not precisely highlight the trees. Pairing the green band with the elevation improves the identification of trees, while the human artifacts appear smoothed and less clear. Lastly, when the three RGB values are paired with the elevation, the algorithm correctly highlights both trees and human artifacts.

To understand how well our strategy distinguishes forest areas from other land cover types (like rivers or streets), we use a region with higher tree density (see Fig. 18). We compare the visualization results obtained using only the RGB values (Fig. 18(b)) and pairing them with roughness (Fig. 18(c)). In this case, we pair the RGB values with roughness instead of the bare elevation, since roughness has been shown to be a better estimator for identifying surface deformations in a terrain [66]. The visual comparison of the outputs shows that the multifield strategy highlights the different cover types more precisely when a geometric field is also added to the identification procedure. Pairing roughness with the color values enables the identification of both the road crossing the forest and areas of low vegetation (Fig. 18(c)), which cannot be clearly identified using only the color (Fig. 18(b)). In particular, the narrow band representing the road in the forest can be clearly identified only if roughness values are included in the input fields.

The visual analysis of the results shows that the multifield strategy can be effectively used for highlighting key areas in satellite datasets, and that the identification improves when pairing a geometric attribute with other scalar fields that are not spatial or geometric.

Fig. 18

The satellite map and the visualization of the multifield comparison measure on an area with high tree density. Figures show the satellite image (a), the multifield measure based on red, green and blue values (b), and based on roughness paired with RGB values (c)

Fig. 19

Comparison of total timings, expressed in minutes (m), for computing the multifield measure

Finally, we compare the performance of Terrain trees and the IA data structure at computing the multifield measure on the three areas of the Sonoma County dataset. Results are reported in Fig. 19. Overall, the timings of Terrain trees and of the IA data structure are very similar, with Terrain trees being about 5% faster. The differences among the three Terrain trees are minimal. Thanks to their lower storage requirements, Terrain trees can compute the multifield measure on all three datasets, while the IA data structure runs out of memory on the largest one.

10 Concluding remarks

We have presented a family of spatial data structures, the Terrain trees, for the efficient representation, analysis, and visualization of triangulated terrains. A Terrain tree combines a minimal connectivity-based encoding of the underlying triangle mesh with a hierarchical spatial index, which implicitly encodes the other connectivity relations. The family includes three spatial indexes that use different bucketed subdivision rules. Borrowing an idea presented in [27] for a distributed data structure for simplicial complexes in arbitrary dimensions, we use spatial coherence to reorder the indexed data, thus compressing both vertex and triangle information inside the spatial index. This enables a large storage reduction and optimized algorithms.

We have proven the effectiveness of our proposal by designing and implementing state-of-the-art morphological estimators for terrain analysis, like slope, curvature, and roughness, as well as a distributed technique based on discrete Morse theory for topology-based segmentation of such terrains. Lastly, as it is common to study a terrain in combination with additional fields attached to it, we have defined a distributed strategy for visualizing multifield data.

We have experimentally shown how the bucketing thresholds of Terrain trees affect generation times, storage requirements, and performance in extracting a basic connectivity relation. This has enabled us to identify the most appropriate threshold ranges. We have then demonstrated the efficiency of our data structure by comparing it against the most common and compact connectivity-based data structure, the IA data structure. Terrain trees always require less storage, generally perform better than the IA data structure, and become more effective as the dataset size increases. Conversely, the difference in performance among the three Terrain trees is minimal. We have noticed, however, that the spatial indexes explicitly encoding the vertices in each leaf block perform better when computing those estimators which require the efficient extraction of the triangles incident in a vertex. The source code of the Terrain trees library and of the library implementing the IA data structure, called LibTri, is publicly available [25, 26].

The experimental results show that encoding the triangle mesh has a significant impact on the storage cost. As future work, we will consider distributing the global arrays across the leaf blocks of the Terrain trees. We also plan to design an algorithm for computing the TIN at run-time, locally within each leaf block, and discarding it when no longer needed. This will further reduce the memory footprint and enable the handling of even larger point clouds.

For topology-based analysis of terrains, we are currently extending our tool with algorithms for geometric and topology-based simplification. Simplification algorithms based on the edge contraction operator have been developed for reducing the size of a TIN, but their major problem is that they can create or remove critical points in an uncontrolled way. Topology-aware operators [38] have been defined to solve this issue by coarsening a TIN without affecting its topology. While effective, existing algorithms are sequential in nature and do not scale well to large terrains. We are currently developing a simplification algorithm in Terrain trees using the topology-aware edge contraction operator first introduced in [38]. Thanks to the compact and distributed representation of Terrain trees, this algorithm will improve both the memory and time requirements of the simplification procedure. Furthermore, we plan to investigate a new parallel topology-aware simplification algorithm that takes advantage of the spatial domain decomposition underlying Terrain trees.

Algorithms for geometric and topology-based simplification intensively update both the TIN and the Terrain tree. We are currently defining an update procedure that keeps the Terrain tree up-to-date after a simplification without significantly affecting its performance. In the future, we plan to extend this procedure to also support generic updates to the TIN, like adding new points or removing TIN vertices. The generic mechanism for updating the Terrain tree is similar to the one applied after a simplification, but it might be more challenging if larger portions of the TIN need to be re-triangulated.

The morphology of a terrain, represented by the critical net, can also be simplified by modifying the underlying discrete gradient [29]. Simplification strategies for the discrete gradient have been defined for reducing noise and obtaining clearer and more accurate representations of the critical net [22]. We plan to implement a simplification algorithm for the discrete gradient, which will result in more robust descriptions of the terrain morphology.

Lastly, we are currently studying an extension of Terrain trees to distributed frameworks like Apache Spark [76] or MPI [11]. A distributed environment will dramatically increase the scalability of our approach, and the hierarchical representation of Terrain trees is well suited to being organized in a distributed framework. The challenge here is defining a distributed algorithm for constructing the TIN in such a context.