1 Introduction

We are concerned with the tracking of regions defined by merge trees. In [12], we devised a method that tracks the superlevel or sublevel sets of a scalar field as defined by the subtrees of the merge tree. However, once these regions have been extracted in each time step, we disregard their origin and record tracking information such as overlap and histogram similarity in a directed acyclic graph (DAG). Its nodes are the regions (Figs. 1 and 2). Overlapping and similar regions in consecutive time steps are connected by an edge, weighted by the amount of overlap and similarity. We solve a shortest path problem to track a region over time. This global approach to tracking avoids the issues that arise from purely local decisions, as shown in Fig. 1.

Fig. 1

Tracking regions solely based on local decisions leads to broken tracks. In this simple example, a small fluctuation between time steps t_3 and t_5 causes the creation of a region C that has significant overlap and similarity with region A. Assigning the locally best match neglects that there can be more than one suitable track between two time steps (e.g., between t_5 and t_6), and causes tracks to break. A graph structure as illustrated in Fig. 3a and b helps circumvent this problem

Fig. 2

Illustration of an objective function that does not satisfy the condition expressed in Eq. (4). If f signifies the standard deviation of the weights along a path, f(CABD) < f(CFGD) but f(CABD ∪ DE) > f(CFGD ∪ DE)

Fig. 3

Several nodes in the graph can have the same shortest path. Hence, running Dijkstra's algorithm for every node independently would be expensive and redundant. (a) Dijkstra's algorithm finds the shortest path through the DAG, which represents the track of this region. Several regions may have the same source and sink and result in the same shortest path. In this example, the shortest path starting at source A_1 and ending at sink A_7 is common to all nodes with a bold outline. The path is shown as a blue band. (b) The shortest path through the nodes with a bold green outline has the same source-sink pair (A_1, A_7) as in (a). It is given by the green band. The shortest path through the nodes with a bold red outline is given by the magenta band

In [12], we present, among other things, a method for tracking a single region using the DAG. This is done by computing a shortest path from the given node backward to a reachable source and another one forward to a reachable sink, and combining the two. This, however, is not necessarily how one would define a shortest path via a node. In this paper, we define a shortest path via a node as the path with the smallest objective function value among all paths starting at a source, passing through the given node, and ending at a sink. An objective function can be any function that assigns a score to a path based on how well it represents the evolution of a particular feature along that path. Under this definition of a shortest path, the previous method of combining backward and forward shortest paths may not work.

In this work, we extend the previous work and present a non-trivial solution to tracking all regions from all time steps, i.e., a method for extracting all feature tracks. The trivial solution is to iterate over all nodes of the DAG and execute the single region tracking algorithm from [12]. However, we will show in this paper how this leads to very long running times. Our approach is up to two orders of magnitude faster. Our method employs a shortest path algorithm but differs considerably from the standard Dijkstra algorithm and from the Floyd-Warshall algorithm for all-pairs shortest paths. Since our DAG contains only temporal edges, i.e., edges between nodes of two successive time steps, it admits better runtime bounds than these standard algorithms.

2 Related Work

The sheer size of time-dependent data sets often necessitates a data reduction step before an efficient analysis can take place. It is therefore a common approach to extract and track features.

Many methods track topological structures. Tricoche et al. [23] track critical points and other topological structures in 2D flows by exploiting the linearity of the underlying triangle grid. Garth et al. [4] extend this to 3D flows. Theisel and Weinkauf [18, 26] develop feature flow fields as a general concept to track many different features independent of the underlying grid. Reininghaus et al. [11] extend this idea to the discrete setting.

In the area of time-dependent scalar fields, several methods exist to track and visualize topological changes over time. Samtaney et al. [15] provide one of the first algorithms to track regions in 3D data over time using overlap. Kettner et al. [7] present a geometric basis for the visualization of time-varying volume data of one or several variables. Szymczak [17] provides a method to query different attributes of contours as they merge and split over a certain time interval. Sohn and Bajaj [16] present a tracking graph of contour components of the contour tree and use it to detect significant topological and geometric evolutions. Bremer et al. [2] provide an interactive framework to visualize the temporal evolution of topological features.

Other methods for tracking the evolution of merge trees, such as the method by Oesterling [8], track changes to the hierarchy of the tree. This comes at the price of a very high computation time: the runtime complexity is polynomial in the data size, more precisely O(n^3) with n being the number of voxels. However, the method tracks the full merge tree instead of just critical points or super-arcs.

Vortex structures are another important class of features that can be tracked in time-dependent flows. Reinders et al. [10] track them by solving a correspondence problem between time steps based on the attributes of the vortices. Bauer and Peikert [1] and Theisel et al. [19] provide different methods for tracking vortices defined by swirling stream lines. This notion was extended later to include swirling path lines [25], swirling streak and time lines [27], swirling trajectories of inertial particles [5] and rotation invariant vortices [6].

Pattern matching was originally developed in the computer vision community and has inspired a number of visualization methods. Examples are pattern matching methods for vector fields based on moment invariants as proposed by Bujack et al. [3], or pattern matching for multi-fields based on the SIFT descriptor as proposed by Wang et al. [24].

A similar, though technically rather different, line of research is the analysis of structural similarity in scalar fields, which has gained popularity recently. Thomas and Natarajan detect symmetric structures in a scalar field using either the contour tree [20], the extremum graph [21], or by clustering contours [22]. Saikia et al. compare merge trees by means of their branch decompositions [13] or by means of histograms over parts of the merge tree [14]. Our method outputs a set of best tracks of topologically segmented structures in a spatio-temporal setting, and enables an all-to-all temporal pattern matching scheme using techniques like dynamic time warping.

3 Method

In the following, we will first briefly recapitulate the tracking method for single regions of [12], and then present our new and fast approach for tracking all regions.

3.1 Tracking Merge Tree Regions using a Directed Acyclic Graph

We are given a time-dependent scalar field. It may have any number of spatial dimensions; our implementation supports 2D and 3D. A merge tree is computed for each time step independently. After an optional simplification, all subtrees (as defined in [13]) are converted into a set of nodes to be used within the directed acyclic graph (DAG). They represent the components of the superlevel or sublevel sets of the scalar field and are connected regions in the domain.

All overlapping nodes from consecutive time steps are connected via edges in the DAG. Their weights represent local tracking information in the sense that a lower edge weight indicates a higher likelihood for the two connected regions to be part of the same track. We use a linear combination of a volume overlap distance and a histogram difference to compute these weights. The volume overlap distance d_o between two non-empty regions a and b is determined from the number of voxels they have in common and the total number of voxels covered by both regions:

d_o(a, b) = 1 − |a ∩ b| / |a ∪ b|     (1)

The chi-squared histogram distance (see, e.g., [9]) between two regions a and b is defined as

d_s(a, b) = (1/2) Σ_i (h_{a,i} − h_{b,i})^2 / (h_{a,i} + h_{b,i})     (2)

where h_{a,i} and h_{b,i} denote the bins of the histograms h_a and h_b, respectively. Here, the histograms record the vertices contained in a region, as described in [14].

Our combined distance measure for an edge is given by d = λ d_s + (1 − λ) d_o, where λ ∈ [0, 1] is a tunable parameter. The DAG can now either be used for the next step as is, or it can be further thresholded to weed out edges with extremely large weights (for instance, in Fig. 3a the edges between the green and pink nodes have been removed).
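To make the edge weight computation concrete, here is a minimal Python sketch. It assumes regions are given as sets of voxel indices and histograms as NumPy arrays; the function names and the exact normalizations follow our reading of Eqs. (1) and (2) and are illustrative rather than the implementation of [12].

```python
import numpy as np

def overlap_distance(voxels_a, voxels_b):
    """Volume overlap distance d_o (cf. Eq. (1)): one minus the fraction of
    shared voxels relative to all voxels covered by either region."""
    common = len(voxels_a & voxels_b)
    total = len(voxels_a | voxels_b)
    return 1.0 - common / total          # 0 for identical regions, 1 for disjoint ones

def chi_squared_distance(h_a, h_b):
    """Chi-squared histogram distance d_s (cf. Eq. (2))."""
    denom = h_a + h_b
    mask = denom > 0                     # skip empty bins to avoid division by zero
    return 0.5 * float(np.sum((h_a[mask] - h_b[mask]) ** 2 / denom[mask]))

def edge_weight(voxels_a, voxels_b, h_a, h_b, lam=0.5):
    """Combined edge weight d = lam * d_s + (1 - lam) * d_o with tunable lam in [0, 1]."""
    return lam * chi_squared_distance(h_a, h_b) \
        + (1.0 - lam) * overlap_distance(voxels_a, voxels_b)
```

The default lam = 0.5 is an arbitrary illustration; the paper only states that λ is tunable.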

We track a region by solving a shortest path problem with Dijkstra's algorithm on the DAG. The method in [12] does this for one region at a time. From the selected region, a shortest path is found to a source in an earlier time step, and another shortest path is found to a sink in a later time step. Combining these paths yields the track of the region. We define a path as a sequence of successive directed edges in the graph. Since there is at most one directed edge between any two nodes in the graph, a path can equivalently be described by the sequence of nodes connecting these edges. Source and sink refer in this context to nodes that have no incoming or outgoing edges, respectively. We discuss this in more detail in the next section.

3.2 Objective Function and Its Validity with Dijkstra’s Algorithm

The classic Dijkstra algorithm finds the shortest path by summing up the edge weights along the path. Applying this directly to our setting would yield unsuitable tracks: instead of following a long path with many likely edges, the tracking would rather choose an unlikely edge to an immediate sink.

Hence, we use a measure assessing the average edge weight along a path. The goal is to find the path through a given node that has the smallest normalized squared sum of edge weights d_i:

f(P) = (1 / |P|) Σ_i d_i^2     (3)

where |P| denotes the number of edges of the path P.

The purpose of this section is to demonstrate that Dijkstra's algorithm can be used to optimize this objective. To do so, let us define an objective function f that assigns a non-negative score to a path and satisfies the following condition:

Condition 1

Consider two paths P_1 and P_2 with f(P_1) ≤ f(P_2). We require the objective function to maintain this relationship after adding an edge e:

f(P_1 ∪ e) ≤ f(P_2 ∪ e)     (4)

Dijkstra’s algorithm can only be used to optimize an objective if this condition is fulfilled, since the condition allows a solution to be built incrementally, which is the essential cornerstone of Dijkstra’s algorithm.

The objective function used in the classic Dijkstra shortest path algorithm is the sum of the weights of all edges in a path P, i.e., f(P) = Σ_i d_i. This function trivially satisfies the above condition. An objective function that does not satisfy it is the standard deviation of the weights, as shown in Fig. 2. Thus, not all objective functions that determine the quality of a path can be used with Dijkstra’s algorithm.

Regarding the objective function (3), we note that it can be optimized with Dijkstra’s algorithm if |P_1| = |P_2| holds, i.e., the two paths compared in Condition 1 are of equal length. This keeps the denominator of (3) equal, and the numerators are just sums of values consistent with Condition 1. The condition always holds in our setting, since edges connect only consecutive time steps and we start Dijkstra’s algorithm at a particular source, which keeps all considered paths at equal length. Hence, Dijkstra’s algorithm can be used to solve (3).
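To make Condition 1 tangible for the objective of Eq. (3), the following minimal Python sketch (our own illustration, not part of the paper's implementation) evaluates the objective and brute-force checks the condition for equal-length paths; the standard-deviation objective from Fig. 2 would fail an analogous check.

```python
from itertools import product

def objective(weights):
    """Normalized squared sum of edge weights along a path, cf. Eq. (3)."""
    return sum(d * d for d in weights) / len(weights)

def condition_holds(p1, p2, e):
    """Condition 1 / Eq. (4): the ordering of two paths must survive appending edge e."""
    if objective(p1) <= objective(p2):
        return objective(p1 + [e]) <= objective(p2 + [e])
    return True  # the condition only constrains the ordered case

# Brute-force check over a small grid of equal-length paths and extension edges.
grid = [0.1, 0.5, 1.0, 2.0]
print(all(condition_holds(list(p1), list(p2), e)
          for p1 in product(grid, repeat=3)
          for p2 in product(grid, repeat=3)
          for e in grid))   # True: for equal-length paths, Eq. (3) is Dijkstra-compatible
```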

3.3 Algorithm for Finding All Paths

Tracking a single node in the DAG is done by finding the shortest path P through that node from any source to any sink of the DAG. The shortest paths through other nodes may coincide with P. This is illustrated in Fig. 3. Hence, to find the shortest paths through all nodes, running a naïve Dijkstra search for every node independently would be expensive and redundant.

Instead, we run Dijkstra’s algorithm for every source and sink (in a joint fashion in two passes, see below), record the gathered information at every node, and stitch this information together to obtain the shortest path for every node.

To facilitate this, we define a function to incrementally compute the objective function in (3). We denote this new function by the symbol ⊕ and call it the incremental path operator. The incremental path operator takes as input the objective value for a path P and a connecting node n, and computes the objective value for the extended path P ∪ n. If the weight of the connecting edge between P and n is given by d, ⊕ is defined as follows:

f(P) ⊕ n = f(P ∪ n) = ( |P| · f(P) + d^2 ) / ( |P| + 1 )     (5)
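Under the same assumptions as above, a minimal sketch of the incremental path operator: a partial path is represented by its current objective value and its edge count, so Eq. (5) becomes a running update of the normalized squared sum.

```python
def extend(path_value, path_len, d):
    """Incremental path operator (cf. Eq. (5)): append one edge of weight d to a
    path that has path_len edges and objective value path_value."""
    new_len = path_len + 1
    new_value = (path_value * path_len + d * d) / new_len
    return new_value, new_len

# Extending the empty path edge by edge reproduces the objective of Eq. (3).
value, length = 0.0, 0
for d in (0.2, 0.4, 0.1):
    value, length = extend(value, length, d)
print(value)   # equals (0.2**2 + 0.4**2 + 0.1**2) / 3
```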

Furthermore, all nodes are topologically sorted. That is, for a node n_{p,i} at timestep t = p and another node n_{q,j} at timestep t = q, node n_{p,i} occurs before node n_{q,j} in the sorted order if p < q.

Our algorithm works as follows. We make two passes through this list of sorted nodes: one in the sorted order (from past time steps to future time steps) and one in the reverse order. During the first pass, at every node, the best path from every reachable source to that node is recorded. This is done by checking all incoming edges to that node and incrementally computing, over all incoming edges, the best path from each reachable source. This is possible because all nodes connected to the incoming edges have already been processed earlier (they live at the previous time step). Consider a node n_i with some incoming edges as illustrated in Fig. 4. The best score from any given source s to n_i is calculated using:

f(P_{s → n_i}) = min_{n_j} ( f(P_{s → n_j}) ⊕ n_i ),     (6)

where the minimum is taken over all incoming neighbors n_j of n_i that are reachable from source s, and P_{s → n_j} denotes the best path from s to n_j.
Fig. 4

Illustration of Algorithm 1. For every node n_i in the DAG, the lowest cost (and the corresponding best neighbor) to every reachable source is computed iteratively and stored in an associative map. In this figure, for example, the best path from n_i to source s_1 is via its neighbor n_{i−1,1}. Similarly, the lowest costs to all reachable sinks are stored in a second map. After these values are computed, the best source-sink pair (s, k) with the lowest cost is determined using Eq. (8), and the best path from s to k passing through n_i is traced out. All nodes lying on this path that have the same best source-sink pair (s, k) need not be processed, as the best path through any such node is this path itself. The final output is the set of all paths passing through every node in the DAG

Algorithm 1 shows the pseudo-code for the first pass described above. The second pass is equivalent to the first, but operates on the DAG with its edges reversed. We now record the best path to every reachable sink. This is done by checking the outgoing edges and the sinks they lead to. The best score from n_i to any given sink k is calculated using:

f(P_{n_i → k}) = min_{n_j} ( f(P_{n_j → k}) ⊕ n_i ),     (7)

where the minimum is taken over all outgoing neighbors n_j of n_i from which the sink k is reachable.
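The sketch below outlines the first pass in Python under our own data-structure assumptions: nodes are processed in time order, and each node keeps an associative map from every reachable source to the best objective value, the edge count, and the predecessor on that best route. The second pass is identical on the DAG with edges reversed. None of these names come from the paper's implementation.

```python
def forward_pass(nodes, incoming, weight):
    """Sketch of Algorithm 1 (first pass).

    nodes    -- node ids, topologically sorted by time step
    incoming -- dict: node -> list of predecessors (nodes of the previous time step)
    weight   -- dict: (predecessor, node) -> edge weight d
    Returns a dict: node -> {source: (objective value, edge count, predecessor)}.
    """
    best = {}
    for n in nodes:
        best[n] = {}
        if not incoming.get(n):                  # no incoming edges: n is itself a source
            best[n][n] = (0.0, 0, None)
            continue
        for pred in incoming[n]:
            d = weight[(pred, n)]
            for src, (val, length, _) in best[pred].items():
                cand = (val * length + d * d) / (length + 1)    # apply Eq. (5)
                if src not in best[n] or cand < best[n][src][0]:
                    best[n][src] = (cand, length + 1, pred)     # minimum of Eq. (6)
    return best
```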

Let S_i be the set of all sources from which node n_i can be reached, and K_i the set of all sinks reachable from n_i. After the two passes are complete, the combined best path for every node is calculated by choosing the paths from the source-sink pair which minimizes the objective function on the combined path as follows:

(s*, k*) = argmin_{s ∈ S_i, k ∈ K_i} ( |P_{s → n_i}| · f(P_{s → n_i}) + |P_{n_i → k}| · f(P_{n_i → k}) ) / ( |P_{s → n_i}| + |P_{n_i → k}| )     (8)

Algorithm 2 shows the pseudo-code to obtain all best paths. It can be observed that, if for any given node n_i the best source-sink pair is (s_i, k_i) and the extracted best path is P_i, then all nodes lying on this path that have the same best source-sink pair (s_i, k_i) will trace out the exact same path. Hence, while determining unique paths in our solution, we can avoid tracing paths from all such nodes. See Fig. 4 for an illustration.
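A sketch of the combination step, again under the assumptions above: for every node we select the source-sink pair minimizing the objective of the concatenated path as in Eq. (8), trace that path once, and mark all nodes on it so they are skipped if they later select the same pair.

```python
def best_paths(nodes, fwd, bwd):
    """Sketch of Algorithm 2: shortest paths via every node, each traced only once.

    fwd, bwd -- results of forward_pass on the DAG and on the reversed DAG:
                node -> {source or sink: (objective value, edge count, next hop)}
    Returns a list of (objective value, node sequence) pairs.
    """
    paths, traced = [], set()
    for n in nodes:
        best_pair, best_val = None, float("inf")
        for s, (fv, fl, _) in fwd[n].items():          # choose (source, sink) via Eq. (8)
            for k, (bv, bl, _) in bwd[n].items():
                if fl + bl == 0:
                    continue                            # isolated node, no real path
                val = (fv * fl + bv * bl) / (fl + bl)
                if val < best_val:
                    best_pair, best_val = (s, k), val
        if best_pair is None or (n, best_pair) in traced:
            continue                                    # n already lies on a traced path with this pair
        s, k = best_pair
        back, m = [], n                                 # walk predecessors back to the source
        while m is not None:
            back.append(m)
            m = fwd[m][s][2]
        ahead, m = [], bwd[n][k][2]                     # walk successors forward to the sink
        while m is not None:
            ahead.append(m)
            m = bwd[m][k][2]
        path = list(reversed(back)) + ahead
        paths.append((best_val, path))
        traced.update((q, best_pair) for q in path)     # these nodes need not trace this path again
    return paths
```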

After all nodes have been examined, we are left with the set of best paths passing through every single node in the DAG. An illustration of the output is shown in Fig. 5.

Fig. 5

The shortest paths through all nodes in the DAG combined represent our track graph structure. Our algorithm avoids naïvely computing the three shortest paths (given by the blue, green, and magenta bands) for every single node, and instead traces each shortest path only once. Nodes that lie on a shortest path and have the same source-sink pair trivially trace the same path

Algorithm 1: Algorithm to find the associations for the best routes to any node from all reachable sources. The best routes to all reachable sinks are determined by running the same algorithm with the nodes sorted in reverse order

Algorithm 2: Algorithm to find shortest paths via every node

3.4 Complexity Analysis

Let us assume, without loss of generality, that the average number of features in every timestep is n. For t timesteps, we would then have a total of tn nodes in the entire DAG. The number of edges between every pair of successive timesteps is bounded by n^2, so the total number of edges is bounded by tn^2. The naïve version of the algorithm is a combination of two simple Dijkstra runs from a given node to all reachable sources and sinks. Since shortest paths in a DAG can be computed in O(V + E) time for V vertices and E edges by processing the nodes in topological order, the runtime in the naïve case is O(tn + tn^2) = O(tn^2) per node. Hence, running the naïve algorithm for all nodes takes O(t^2 n^3) in the worst case.

Now for our improved algorithm, assuming the number of sources/sinks is given by p, we can safely say that p ≪ tn. The runtime of Algorithm 1 is then O(tnp + tn^2 p) = O(tn^2 p). For Algorithm 2, it is O(tnp^2 + t^2 n). The total runtime of our algorithm is thus O(tn^2 p + tnp^2 + t^2 n), which in practice (as seen in Table 1) is far less than O(t^2 n^3).

Table 1 Computation runtimes and memory requirements of our algorithm versus the naïve one for several data sets

The memory footprint of the naïve version is bounded by that of a single Dijkstra run, i.e., O(N) for N nodes. Thus, in our scenario, it is O(tn), since the shortest path via every node is computed independently. For the improved algorithm, however, we need to store the mappings of shortest paths from all incoming/outgoing edges to all reachable sources/sinks, and hence the memory footprint is O(tnp).

3.5 Filtering Similar Paths for Visualization

For visualization purposes, we need to choose the candidate paths that best represent a feature track in a given spatio-temporal region.

In most cases, due to slight perturbations in the DAG, two unique paths differ only at very few node positions, with most of their nodes being identical. An example of this can be observed in Fig. 5, where the blue and green paths show in essence the same structure with only a slight perturbation.

We aim to show the path with the best objective score, while other similar paths falling within a specified threshold are filtered out. The similarity g between two paths P_1 and P_2 is estimated using

g(P_1, P_2) = m / min(|P_1|, |P_2|)     (9)

where m represents the number of matching edges, i.e., edges present in both paths. The function g estimates the fraction of edges that are identical in both paths. The filtering using g is applied as follows. All paths obtained by solving Eq. (8) for every node are sorted according to their objective function score given by Eq. (3). Paths are then processed in this sorted order, from lowest score to highest. If a path exceeds the similarity threshold with respect to any path encountered before, it is filtered out. All other paths are retained.

If the filter rate is set to 100%, we are left with the complete set of unique paths. In our experiments, a filter rate of 70% gives the best results.
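Finally, a sketch of the filtering step, assuming each path is given as a node sequence together with its objective value; the similarity measure follows our reading of Eq. (9), normalizing the number of shared edges by the length of the shorter path, and is not the paper's implementation.

```python
def filter_similar(scored_paths, threshold=0.7):
    """Keep only the best-scoring representative among similar paths.

    scored_paths -- list of (objective value, node sequence); lower values are better
    threshold    -- similarity above which a path is discarded (0.7 = 70% filter rate)
    """
    def edges(path):
        return set(zip(path, path[1:]))                 # consecutive node pairs

    def similarity(p, q):                               # cf. Eq. (9)
        e_p, e_q = edges(p), edges(q)
        if not e_p or not e_q:
            return 0.0
        return len(e_p & e_q) / min(len(e_p), len(e_q))

    kept = []
    for value, path in sorted(scored_paths, key=lambda x: x[0]):
        if all(similarity(path, q) <= threshold for _, q in kept):
            kept.append((value, path))                  # not too similar to any better path
    return kept
```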

4 Results

The timing and memory consumption for our method are given in Table 1. Regarding the computation times, note how our algorithm improves over the naïve version by up to two orders of magnitude. Regarding the memory consumption, the naïve method has lower memory usage as it only processes one node at a time, while our algorithm processes all nodes together. Hence, considering the number of nodes in each data set, our algorithm is quite efficient with regard to memory usage as well.

Figure 6 shows a rotating and translating benzene data set. Since the data is not truly time-dependent, but merely transformed rigidly, it serves as a test case to show that we capture all expected tracks and that our method is invariant under rotations and translations.

Fig. 6

Our method applied to the Benzene dataset. The paths indicate tracking of centers of mass of the regions signified by nodes in our DAG. (a) Paths of all lengths at filter rate 100%. (b) Paths of length 100 and above at filter rate 100%. (c) Paths of length 100 and above at filter rate 70%

Figure 7 shows the 2D time-dependent Streak Line Curvature dataset.

Fig. 7

(a) The 2D Streak Line Curvature dataset at filter rate 100% and showing paths of all lengths. (b) At 100% filter rate and full length paths only. (c) At 70% filter rate and full length paths only

Figure 8 shows the tracks for the smallest super/sub level set regions in a 2D Checkerboard dataset. The checkerboard pattern starts off smoothly and becomes increasingly noisy with time.

Fig. 8

Rotating 2D Checkerboard dataset. Tracks for the centers of mass of the smallest super/sub level sets are shown. (a) Filter rate 100% and paths of all lengths. (b) Filter rate 100% and long (100 length or more) paths only. (c) Filter rate 70% and long paths only. (d) Filter rate 70% with long paths obtained from the naive version of the algorithm

Figure 9 shows all the tracks in a flow around a 3D Square Cylinder. The location of the center of mass of a region is used to visualize the paths in all result images.

Fig. 9

Flow around a Square Cylinder dataset. (a) Paths of all lengths extracted at filter rate of 100%. (b) Paths of all lengths at filter rate 90%. (c) Paths of all lengths at filter rate 70%

5 Conclusion

We presented an extension of the method in [12], which extracts the best track through a chosen region at a given timestep of a time-dependent scalar field. These regions are based on topological segmentations of the spatial domain using merge trees and form the nodes of a directed acyclic graph (DAG) in the spatio-temporal domain. Using the method in [12] to extract the best tracks through all nodes naïvely results in tracing the same paths multiple times. The algorithm presented in this paper exploits the structure of the DAG to iteratively compute the best paths from all reachable sources and sinks to every node. This in turn allows us to compute the best paths through all nodes up to two orders of magnitude faster than the naïve approach. We also presented a filtering scheme that removes very similar paths when visualizing all paths together. Future work may include clustering these paths according to their similarity using temporal similarity estimation techniques such as dynamic time warping.