Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Many different systems show recurring behavior and its study has attracted attention in almost all scientific fields. The climate system can express recurring behavior due to Milankovich cycles [1], seasonal changes, El Niño/Southern Oscillation, etc. The study of these recurrences allows for a better understanding of the climate system; its past as well as its future [24]. The cardiorespiratory system is investigated by its recurrence properties to get insights in its mechanisms or to measure dysfunctions for diagnosing life threatening conditions [59]. Recurrence analysis is promising for investigating brain activity [10] and early detection of epileptic states [11]. Furthermore, it is applied to monitor engines and technological processes, like power generation gas turbines health, cutting processes or crack detection in materials [1214].

Recurrence plots (RPs) and recurrence quantification analysis (RQA) are powerful methods for analyzing recurrences in measured time series [15]. Their application in many fields have proven their potential for various kinds of analyses [16]. A recurrence plot is a two-dimensional representation of a time series when a \(m\)-dimensional phase space trajectory recurs to former (or later) states. Recurrence of a state at time \(i\) at a different time \(j\) is captured within a two-dimensional squared matrix \(\mathbf {r}\) [15]:

$$\begin{aligned} r_{i,j} = \varTheta \left( \varepsilon -\left\| \varvec{\text{ x }}_{i} - \varvec{\text{ x }}_{j}\right\| \right) , \quad \varvec{\text{ x }}_{i} \in \mathbb {R}^m,\; i,j=1 \ldots N. \end{aligned}$$
(1)

Both of its axes represent the set of states in temporal order. \(N\) is the number of considered states \(x_i\) (length of phase space trajectory). \(\varepsilon \) is a threshold distance, \(\Vert \cdot \Vert \) a norm, and \(\varTheta (\cdot )\) the Heaviside function. A pair of states that fulfills the threshold condition is assigned with the value \(1\) (recurrence point), whereas a pair that is considered to be dissimilar is assigned with the value \(0\). Further details about the reconstruction of phase space vectors from a scalar time series, the recurrence parameters, as well as the typical visual characteristics of RPs can be found in [15].

Small scale structures in the RP, like diagonal lines, are used to define measures of complexity establishing the recurrence quantification analysis (RQA) [15, 17, 18]. As an example, we present the RQA measure percent determinism (DET):

$$\begin{aligned} DET=\frac{\sum _{l=d_{\min }}^N l\, H_{\text {D}}(l)}{\sum _{i,j=1}^N r_{i,j}}. \end{aligned}$$
(2)

It is the fraction of recurrence points that form diagonal lines; \(H_{\text {D}}(l)\) is the number of diagonal lines of exactly length \(l\) and \(d_{\min }\) is a minimal length necessary to be a diagonal line. This measure characterizes the deterministic nature of a dynamical system from a heuristic point of view (further discussions can be found in [15, 19]). Further measures quantify average line lengths or the complexity of the line length frequency distributions \(H_{\text {D}}(l)\) (diagonal lines) and \(H_{\text {V}}(l)\) (vertical lines).

The time complexity of basic RQA measures is \(O(N^2)\), where \(N\) denotes the number of data points. This property hampers an efficient computation for very long data. Furthermore, current implementations are limited by the memory these tools can manage. The CRP Toolbox for MATLAB [20] is limited to \(N<\) 10,000 data points when calculating the entire RP; for standard PC configurations even less, i.e., \(N<\) 5,000 data points. The RQA software by Webber [21] is capable of processing only up to \(N=\) 5,000 data points.

We present an algorithm based on the concept of Divide and Recombine (D&R) that allows RQA of very long time series in Sect. 2 and evaluate our algorithm in Sect. 3. In Sect. 4 we highlight the utilization of our approach for a concrete application example taken from the climate research domain.

2 Our Approach

2.1 Divide and Recombine

D&R is a very general approach to address large computational problems. The basic idea is to divide a data set into small sub sets allowing the fast computation of analytical results of the subsets. The intermediate results of the sub sets are recombined into a global solution.

The main application of D&R as presented by Guha et al. in [22] is to enable the analysis of big data sets. They use the MapReduce [23] framework to distribute the data and analytical computation between several computing nodes. In contrast, the central challenge in the context of RQA is not the amount of data itself but rather that the determination of RQA measures is compute intensive. To meet this challenge, we utilize the paradigm D&R as follows.

We divide the underlying matrix of a RP, the so called recurrence matrix [see Eq. 1] into small sub matrices (Divide). For each sub matrix, we compute RQA measures in a massively parallel manner on GPU devices. This includes especially the detection of diagonal and vertical lines. The key issue of the divide step is distributing RQA computations of sub matrices between multiple GPU devices. In a final step, the individual results of the sub matrices are recombined into a global RQA solution (Recombine).

Having distributed the computational load to several GPU devices using D&R, we further reduce the runtime by exploiting the parallel processing capabilities of a GPU device itself. We subdivide the processing within a sub matrix into a set of subtasks that can be processed concurrently. The underlying workflow of our approach is summarized in Fig. 1.

Fig. 1
figure 1

Our D&R approach. a Given a recurrence matrix. b We divide the recurrence matrix into a set of sub matrices. c We distribute sub matrices to several GPU devices. For each sub matrix, we compute the frequency distributions of diagonal and vertical lines. The computation of the frequency distributions of a single sub matrix is done in a massively parallel manner on a GPU device by identifying independent sub tasks; depicted as dotted arrows. d The individual results are recombined into a global RQA solution

An important challenge with D&R for RQA is that diagonal and vertical lines may spread over multiple sub matrices (see Fig. 2). To compute a valid global frequency distribution of diagonal and vertical line lengths, we introduce the carryover buffer. A carryover buffer is a data structure that allows to share information about the length of diagonal or vertical lines that exceed a sub matrix. In the following, we describe how our approach addresses the challenges of computing valid global frequency distributions of diagonal and vertical line lengths (see Sect. 1).

Fig. 2
figure 2

Challenges of divide and recombine for RQA. The single diagonal and vertical line in (a) are distributed between several sub matrices in (b). a Full recurrence plot. b Divided recurrence plot

2.2 Detection of Vertical and Diagonal Lines

To determine vertical lines within the \(i\)th column of the recurrence matrix \(\mathbf {r}\), the computation starts at element \(r_{i,0}\), representing its first element. The index \(j\) is increased until the first recurrence point (see Sect. 1) has been found. Assuming the element \(r_{i,j}\) is a recurrence point, the counter representing the length of the current vertical line is increased by \(1\). This counter is initially set to \(0\). If \(r_{i,j+1}\) is also a recurrence point, the counter is increased again. If \(r_{i,j+1}\) is not a recurrence point, the vertical line stops at \(r_{i,j}\). We then update the frequency distribution of vertical line lengths \(H_{\text {V}}(l)\), reset the line length counter to \(0\) and continue the detection of vertical lines at \(r_{i,j+2}\). Note, each column of the recurrence matrix has a separate line length counter attached.

Fig. 3
figure 3

Detection of vertical lines. Detecting the vertical line in (a) is straight-forward. Performing the processing on multiple sub matrices in (b) requires to preserve the execution order \(I. \rightarrow II. \rightarrow III.\) The intermediate states of the carryover buffer element corresponding to the column after processing each sub matrix (\(2\), \(5\) and \(0\)) are depicted above the recurrence plot. a Full recurrence matrix. b Multiple sub matrices

Subdividing \(\mathbf {r}\), its columns are split into a number of parts belonging to different sub matrices. We introduce the vertical carryover buffer to address this challenge. For each column of \(\mathbf {r}\) it stores the length of a vertical line that exceeds the horizontal border of a sub matrix.

Figure 3 compares the detection of vertical lines using (a) the recurrence matrix as a whole with applying (b) the vertical carryover buffer to the set of sub matrices. The states of the carryover buffer element corresponding to the column containing the vertical line after processing a particular sub matrix are shown above the recurrence plot. If a vertical line reaches the last element of a column of a sub matrix, the carryover buffer element stores its current length. Otherwise its value is \(0\).

The value of the carryover buffer element is used as input for processing parts of the \(i\)th column of \(\mathbf {r}\) that belong to the adjacent sub matrix. To compute valid results, applying the carryover buffer requires a particular order of processing the sub matrices. We present the vertical execution order rule that reflects this dependency.

A sub matrix of \(\mathbf {r}\) is referred to as \(S_{g,h}\). The sub matrix representing the bottom left corner of the recurrence matrix has the indices \(g=0\) and \(h=0\).

Vertical Execution Order Rule Suppose a sub matrix \(S_{g,h}\) with \(g\) being the horizontal index and \(h\) being the vertical index. All sub matrices \(S_{g,n}\) with \(0 \le n < h\) have to be processed before \(S_{g,h}\) is processed.

Fig. 4
figure 4

Detection of diagonal lines. Given a full recurrence matrix in (a), the detection of the diagonal line is straight-forward. By dividing the recurrence matrix in (b), the diagonal line stretches over 5 sub matrices. To preserve the validity of the detection result, they must be processed in the order \(I. \rightarrow II. \rightarrow III. \rightarrow IV. \rightarrow V.\) The intermediate states of the corresponding carryover buffer element is depicted on the right next to the recurrence plot. a Full recurrence matrix. b Multiple sub matrices

Sub matrices which do not share any element of the carryover buffer can be processed concurrently. This allows us to compute the local frequency distribution of vertical line lengths of multiple sub matrices at the same time.

This concept can easily be adapted to the detection of diagonal lines, including the use of a carryover buffer and a particular order of execution concerning the set of sub matrices. The major difference is that a diagonal line may transcend not only the horizontal but also the vertical borders of sub matrices. Furthermore, the size of the carryover buffer is equivalent to the number of diagonals of \(\mathbf {r}\).

Figure 4 illustrates the detection of diagonal lines. For the purpose of demonstration, the RP contains only a single diagonal line of length \(6\). Since diagonal lines may cross horizontal as well as vertical sub matrix borders, we define a diagonal execution order rule that reflects this property.

Diagonal Execution Order Rule Suppose a sub matrix \(S_{g,h}\) with \(g\) being the horizontal index and \(h\) being the vertical index. All sub matrices \(S_{m,n}\) that fulfill either the condition \((0 \le m \le g) \wedge (0 \le n < h)\) or \((0 \le m < g) \wedge (0 \le n \le h)\) have to be processed before \(S_{g,h}\) is processed.

3 Performance Evaluation

To evaluate our approach, we compare the performance characteristics of three RQA implementations. We contribute an implementation of our D&R approach using version 1.1 of the OpenCL framework for parallel programming of heterogeneous systems [24]. To fulfill the performance requirements, we use the low-level programming language C\(++\) for implementing the host program.

The hardware setup of the experiment consists of an off-the-shelf desktop workstation, containing an Intel i5-3570 quad-core CPU at up to 3.80 GHz and 16 GB of main memory. It also includes a NVIDIA GeForce GTX 690 that provides two GPU processors running at up to 1.019 GHz; each of them is supplied with 2 GB of memory. In the context of heterogeneous computing, each GPU processor is treated as a separate computing device. The workstation runs on a 64-bit version of OpenSuse 12.1 with version 4.2.1 of CUDA.

To reduce the impact of outliers, we repeat each experiment five times. Measuring the runtime, we rely on the chrono package of the Boost library [25] with an accuracy up to one millisecond.

Table 1 Runtime for RQA calculation for time series of varying length

As stated in [26], it is important to compare optimized code that is running on the GPU to optimized CPU code. For this reason, we compare the massively parallel GPU implementation approach to two non-distributed, non-D&R CPU implementations. As a baseline we refer to a single-threaded C\(++\) implementation. Additionally, we employ a parallel C\(++\) implementation that is extended with OpenMP statements. It executes the detection of diagonal and vertical lines using multiple CPU threads.

Table 1 and Fig. 5 compare the runtime of the three implementations based on a time series capturing the Sinus wave. We vary the length of the time series between 20,000 and 200,000 data points with a step size of 20,000. For each experiment we use the same parameters for reconstructing the states from the time series (see Sect. 1). Furthermore, we set the size of the sub matrices processed by a GPU processor to 20,000\(\,\times \,\)20,000 elements.

In all cases, our OpenCL implementation outperforms the multi-threaded OpenMP implementation (up to a factor \(>\)5) and the single-threaded C\(++\) implementation (up to a factor \(>\)28) using only one GPU processor. Using both GPU processors available, the runtime can be reduced additionally up to 40 %.

Fig. 5
figure 5

Runtime comparison. Our GPU implementation (OpenCL) of D&R outperforms both the single-threaded (C\(++\)) and multi-threaded (OpenMP) non-D&R CPU implementations. Balancing the work between two GPU processors, the runtime can be reduced additionally up to 40 %

4 Application to Climate Data

In the following we will investigate the hourly temperature dynamics by RQA, which will be applied on a measurement record of hourly air temperature in Potsdam. This record is one of the longest, non-interrupted, hourly climate records in the world. In our analysis it is covering the period from 1893 until 2011 (although the measurements are still ongoing), resulting in 1,043,112 data points (see Fig. 6a). For the period between 1893 and 1974, the warming trend of the annual mean temperature was 0.46 K per century, but after 1974 the trend rose to 3.4 K per century (see Fig. 6b). In the following we will consider the full time period as well as the two periods 1893–1974 (718,776 data points) and 1975–2011 (324,336 data points) separately.

Fig. 6
figure 6

a Hourly temperature in Potsdam. b Annual mean temperature in Potsdam and warming trend for the periods 1893–1974 and 1975–2011

To study the short-term dynamics, we remove the annual trend (seasonal cycle) from the data by phase averaging, resulting in an anomaly temperature record. We use a time delay embedding of dimension \(m=5\) and delay \(\tau = 3\), which have been found by false nearest neighbors approach for finding \(m\) [27] as well as a combined autocorrelation and visual recurrence plot inspection approach for finding an optimal \(\tau \) [28]. We calculate the four RQA measures (1) recurrence rate \(RR\), (2) determinism \(DET\), (3) average diagonal line length \(L\), and (4) laminartity \(LAM\) [15] for a recurrence threshold of \(\varepsilon =1\) (Euclidean norm). These measures reflect different aspects of the short-term dynamics, e.g., predictability. We find that all four measures do not remarkably change for the full period and the sub periods 1893–1974 and 1975–2011 (see Table 2). This result suggests that, in contrast to the longer time-scales, the short-term dynamics, and, thus, the short-term weather predictability, has not (yet) changed due to the climate change.

Table 2 RQA results for the full time series of hourly temperature anomaly data of Potsdam as well for the two periods 1893–1974 and 1975–2011
Table 3 Runtime for RQA calculation for the full time series of hourly temperature anomaly data of Potsdam as well for the two periods 1893–1974 and 1975–2011

The calculation of these RQA measures benefits highly from our D&R approach, which makes the calculations possible for these long time series. We apply the same implementations as well as the same experimental environment as described in Sect. 3. Using the single-threaded common RQA software, it takes over six hours to calculate the RQA for the full time period, whereas the OpenMP implementation still needs over one and a half hour. The OpenCL implementation allows to reduce the runtime to about five minutes, using two GPU processors (see Table 3, Fig. 7). This significant runtime improvement of RQA will allow comprehensive investigations of big data collections of weather data, consisting of thousands of time series similar to the present Potsdam temperature record.

Despite its performance improvements, it is of great importance that the GPU implementation computes correct RQA results. Table 4 compares a selection of RQA measures computed by the different implementations for the period between 1893 and 2011. It shows that the GPU implementation calculates the same results as the single-threaded C\(++\) and the multi-threaded OpenMP implementation. In addition, distributing the computations between two GPU processors does not influence the results.

Fig. 7
figure 7

Runtime for RQA calculation for the full time series of hourly temperature anomaly data of Potsdam as well for the two periods 1893–1974 and 1975–2011

Table 4 Results for the RQA measures recurrence rate \(RR\), determinism \(DET\), average diagonal line length \(L\), and laminarity \(LAM\) computed by the different implementations for the period 1893–2011

5 Conclusion

We present an approach based on the paradigm of D&R that allows to perform RQA on very long time series (\(>\)1,000,000 data points) efficiently. By splitting the underlying matrix of a RP into a set of sub matrices, we are able to distribute the computational load between several GPU devices; each sub matrix is processed individually. We address the problem of diagonal and vertical lines transcending the borders of multiple sub matrices by introducing the diagonal and vertical carryover buffer as global data structures. To preserve the correctness of the RQA measures, we provide specific execution order rules for processing the individual sub matrices.

In comparison to a parallel CPU implementation using OpenMP, we can improve the runtime performance significantly up to factor \(>\)9, using two GPU processors. Our experiments have shown that GPU devices are well suited to compute basic RQA measures.

The application of the proposed RQA implementation to a specific problem from climate research has demonstrated its potential for an efficient recurrence study and will allow future RQA investigations of very long time series.