1 Introduction

The statistical tools used to extract knowledge from time series have undergone considerable development during the past decade (see Livina and Lenton 2007; Livina et al. 2011; Lenton et al. 2012; Scheffer et al. 2009; Dakos et al. 2008; Held and Kleinen 2004; Cimatoribus et al. 2013). Driven by the ultimate aim of understanding past climate variability, the above studies focused on the statistical analysis of time series that demonstrate threshold behaviour, in the sense used by Alley et al. (2003). Candidate explanations for transitions of a system over thresholds are rooted in dynamical systems analysis, which is used to gain insight into internal variability modes and the response to external forcing of both simple and complex systems (Saltzman 2001). Adopting the notation of Ashwin et al. (2012), an abrupt shift from one stable state to another can be due to, e.g., B-tipping or N-tipping. In B-tipping the system is driven past bifurcation points, where equilibrium solutions lose their stability beyond a critical value of a control parameter (Saltzman 2001). In N-tipping, noise fluctuations of a fast component affect a slower variable, pushing the system away from the neighbourhood of one attractor towards that of another (Hasselmann 1976). These are only two of the candidate explanations of tipping points; combined with the fact that in open systems subject to internal and external forcings, such as the climate system, both dynamical behaviours can take place (Ashwin et al. 2012), attributing causal relations to abrupt transitions with certainty is a challenging task. Therefore, statistical analysis is required to complement the mapping of the variability before tipping points, and sophisticated tools have been developed for the statistical monitoring of a system's evolution to reveal additional properties.

One of the most studied paleoclimate proxy records demonstrating tipping behaviour is the oxygen isotope \(\delta ^{18}\hbox {O}\) record of the Greenland ice cores, spanning more than 100,000 years before present (Svensson et al. 2008). Abrupt transitions corresponding to Dansgaard–Oeschger (DO) events can be seen in this time series. DO events consist of sudden warming in Greenland followed by gradual cooling. The last of these abrupt warmings marked the end of the Younger Dryas period and the transition to today's warm climate (Alley et al. 2003).

By focusing on a single transition event, Dakos et al. (2008) drew attention to bifurcation points and their precursors, identifying critical slowing down as an early warning signal (EWS) before the Younger Dryas period. In particular, an increase in autocorrelation within a fixed sliding window as wide as half the record length was found to precede the transition. Ditlevsen and Johnsen (2010) pointed out that the fluctuation-dissipation theorem of Kubo (1966) implies both increasing autocorrelation and increasing variance before a bifurcation is crossed. Cimatoribus et al. (2013) suggested preprocessing or filtering the raw proxy data in order to infer novel diagnostics, such as the Detrended Fluctuation Analysis (DFA) coefficient. Based on that indicator, Livina and Lenton (2007) measured the proximity of a system to a tipping point not in units of critical parameter difference, but by monitoring the evolution of the DFA propagator. Held and Kleinen (2004) proposed the method of degenerate fingerprinting, modelling the system as an auto-regressive process with lag-1 autocorrelation and considering only exponentially decaying modes and white noise in the system dynamics. Rahmstorf (2003) attempted to automate the characterization of the DO events by introducing an event detection algorithm based on the slope of the data curve within time intervals of 200 years; the study failed to detect certain DO events because of the different time scales over which each DO event developed. From these studies, the statistical quantities of variance and lag-1 autocorrelation emerge as early warning indicators of critical transitions, while the slope within a fixed time interval was found to characterise most of the DO events.

In this paper, a time series segmentation algorithm combining a genetic algorithm and a clustering technique is proposed to address the existence of EWSs before tipping points in climate time series. Starting from a random division pattern, the different segments within the time series are classified according to similarities in their statistical parameters (including variance, lag-1 autocorrelation and slope, as considered in previous works). The segmentation pattern is then renewed at each iteration through genetic evolution, where the algorithm's operators are applied and the best-fitted individuals are selected, so as to optimize the class label assigned to each segment. By keeping intact the segments that group well in statistical similarity and rearranging those far from any of the groups, the iteration converges to a final division pattern of the time series, providing as output the data regions that share common characteristics. No prior knowledge of the tipping point locations is supplied to the algorithm. The goal of the algorithm is to provide a more compact representation of the time series while retaining a high number of statistical parameters. The time series segmentation problem has been widely studied within various disciplines: such algorithms have been successfully applied to phoneme recognition (Xiong et al. 1994; Prandom et al. 1997), paleoecological problems (Bennett 1996), telecommunication applications (Himberg et al. 2001) and financial problems (Tseng et al. 2009). For an excellent review of the field see Keogh et al. (2001). Furthermore, interest in GAs applied to climate tipping points is rising; e.g. Lenton et al. (2009) used a GA to tune 12 physical parameters of an Earth System Model, following a multi-objective optimization method, to study the tipping of the Atlantic thermohaline circulation.

The layout of the paper is as follows. Section 2 introduces the segmentation algorithm with a detailed description of the embedded genetic algorithm, the statistical metrics and the clustering process. Section 3 presents the synthetic and paleoclimate datasets used in this study and the algorithm parameters. Section 4 presents the main results of the segmentation algorithm on the paleoclimate and synthetic datasets, the latter being used to analyse the algorithm's capabilities in a controlled environment. Section 5 discusses the results from the point of view of dynamical systems theory, in addition to possible limitations of the algorithm. Finally, Sect. 6 reviews the main findings of this paper.

2 Segmentation algorithm

2.1 General overview of the segmentation algorithm

This paper proposes a novel Genetic Algorithm (GA) from the field of time series segmentation (see Sclove 1983; Himberg et al. 2001; Keogh et al. 2001; Chung et al. 2004). The general objective of the GA is to identify segments with common characteristics and assign a class label to each of them. In practice this means finding the cutpoints of the different segments to be discovered, together with their class labels. As in traditional GAs, the proposed approach considers a population of candidate solutions representing different possible segmentations, which are evolved towards better segmentation solutions. Each individual is represented by an array of integer values (chromosome representation) that can be mutated and recombined. The evolution starts from a population of randomly generated segmentations. After that, every segment in every chromosome is characterized by six statistical metrics, most of which have previously been considered in climate tipping point research (including variance, autocorrelation, and skewness). A clustering technique, the \(k\)-means algorithm (MacQueen 1967), is then applied over this six-dimensional space for every chromosome, and a fitness value is assigned to every chromosome according to the degree of homogeneity of the segments with respect to their centroids (i.e. the mean value of each cluster). A class label is assigned during the clustering process. After that, different mutation and crossover operators are applied to explore the search space. This procedure is repeated for \(n\) generations. The final mathematical goal of the proposed GA is to minimize the distance of each segment to its centroid in the six-dimensional space whose dimensions are the statistical properties of each segment.
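To make this loop concrete, the following minimal Python sketch outlines the procedure; the helper names (`random_chromosome`, `evaluate`, `crossover`, `mutate`) are hypothetical stand-ins for the components detailed in the subsections below, not the authors' implementation:

```python
import numpy as np

def run_ga(series, n_individuals=80, n_generations=2000, k=5,
           p_c=0.9, p_m=0.075, seed=10):
    """Sketch of the segmentation GA of Sect. 2 (assumed helper functions)."""
    rng = np.random.default_rng(seed)
    n = len(series)
    population = [random_chromosome(series, rng) for _ in range(n_individuals)]
    for _ in range(n_generations):
        # metrics -> normalisation -> k-means -> fitness = m / SSE (Sects. 2.3-2.5)
        fitness = np.array([evaluate(series, c, k) for c in population])
        elite = population[int(np.argmax(fitness))]          # elitism (Sect. 2.6)
        # every individual reproduces, promoting diversity (Sect. 2.6)
        offspring = [mutate(crossover(c, population[int(rng.integers(n_individuals))],
                                      n, p_c, rng), n, p_m, rng)
                     for c in population]
        off_fit = np.array([evaluate(series, c, k) for c in offspring])
        # roulette-wheel replacement; one slot reserved for the elite
        probs = off_fit / off_fit.sum()
        keep = rng.choice(n_individuals, size=n_individuals - 1, p=probs)
        population = [offspring[i] for i in keep] + [elite]
    fitness = np.array([evaluate(series, c, k) for c in population])
    return population[int(np.argmax(fitness))]               # best segmentation
```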

The time series segmentation algorithm presented in this paper features the following characteristics:

  • A class label is assigned to the different segments via the combination of the GA with the clustering technique; traditional approaches would only provide the segmentation points (Sclove 1983; Himberg et al. 2001; Keogh et al. 2001). This is especially useful for finding common patterns arising in the climate datasets.

  • Apart from the determination of the cutpoints \((t_{i},i= 1,\ldots , m-1)\), the main idea is that of transitions between classes. The analysis of these transitions is crucial to the detection of EWSs, as they indicate changes in the statistical parameters.

  • Each segment is represented by a six-dimensional vector where the dimensions are the six statistical metrics, some of which have been previously considered in the detection of EWSs of critical transitions (Cimatoribus et al. 2013; Dakos et al. 2008, 2012).

  • Instead of representing the time series evolution by plotting one of its metrics, as done in previous works, the approach proposed in this paper makes it possible to visualise several metrics simultaneously and to compare several sections of the time series in search of common patterns.

  • The proposed algorithm automatically finds the length of each data segment of interest, in contrast to calculating the statistics within a fixed sliding window or a fixed time interval, and does so without requiring prior information or a trial-and-error approach.

  • This algorithm is a significant extension of the one presented in Tseng et al. (2009) as it enables the clustering of segments of various lengths without performing advanced signal processing such as wavelet analysis, which would add bias to the signal by assuming arbitrary mother wavelet shapes.

The different GA characteristics are defined in the following subsections.

2.2 Chromosome representation and initial population

A direct encoding of the final segmentation solution is adopted where each individual chromosome consists of an array of integer values (Michalewicz 1996). Each position stores a cutting point of the time series. A chromosome of \(m\) segments is represented by \(\{t_1,\ldots ,t_{m-1}\}\), where the value \(t_i\) is the index of the \(i\)th cutting point. In this way, the first segment is delimited by the cutting points 1 and \(t_1\), the second by the cutting points \(t_1\) and \(t_2\) and so on. An example of this chromosome representation is given in Fig. 1.

Fig. 1

Chromosome representation. a Example chromosome. Each position represents an index of a time series value. b Segments of the time series resulting from the chromosome. c Corresponding segmentation and time series. The characteristics of each segment will be obtained for the corresponding part of the time series

A GA requires a population of feasible solutions to be initialized and updated during the evolutionary process. As mentioned above, each individual within a population is a possible segmentation result for the time series under consideration. An initial set of chromosomes is thus generated with some constraints to form feasible segments. This initial population of \(t\) individuals is randomly generated. The number of individuals will be kept constant during the evolution. Further information on the creation of each initial individual can be found in Pérez-Ortiz et al. (2014).
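As an illustration, a feasible individual can be sampled as a sorted array of interior cutpoint indices. In this sketch the bound on the number of segments is illustrative; the exact feasibility constraints are those of Pérez-Ortiz et al. (2014):

```python
import numpy as np

def random_chromosome(series, rng, max_segments=40):
    """Sample a random feasible chromosome: sorted interior cutpoints.
    max_segments is an illustrative bound, not a value from the paper."""
    n = len(series)
    m = int(rng.integers(2, max_segments + 1))        # number of segments
    cuts = rng.choice(np.arange(1, n - 1), size=m - 1, replace=False)
    return np.sort(cuts)

def split_segments(series, cuts):
    """Recover the m segments delimited by the cutpoints."""
    return np.split(np.asarray(series, dtype=float), cuts)
```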

2.3 Segment characteristics

As a result of the genetic operators, the segments in a chromosome may have different lengths. Thus, an approach is needed to map all segments to the same dimensional space. In this paper, six statistical metrics are computed for every segment in a chromosome, allowing the GA to calculate similarities between segments within the same dimensional space. For simplicity, the following characteristics refer to the segment \(S_s\), mathematically defined as \(S_s= \{y_{t_{s-1}},\ldots ,y_{t_{s}}\}\):

  1.

    Variance \((S_s^2)\): It is a measure of variability that indicates the degree of homogeneity of a group of observations. The mathematical expression of this metric is:

    $$\begin{aligned} S_s^2=\frac{1}{t_s-t_{s-1}} \sum _{i=t_{s-1}}^{t_s} \left( {y_{i}} - \bar{y_{s}}\right) ^2, \end{aligned}$$
    (1)

    where \((t_s-t_{s-1})\) is the number of points within a segment, \(t_{s-1}\) is the index of the first point in the \(s\)th segment, \(t_s\) is the index of the last point in the segment, \({y_{i}}\) are the time series values of the segment, and \(\bar{y_{s}}\) is the average value of the segment.

  2.

    Skewness \((\gamma _{1s})\): The skewness represents the asymmetry of a distribution of values within a segment. Segments can be skewed either up or down with respect to the arithmetic mean. The skewness is defined as:

    $$\begin{aligned} \gamma _{1s}=\frac{\frac{1}{t_s-t_{s-1}}\sum _{i={t_{s-1}}}^{t_s} (y_i - \bar{y_s})^3}{S_s^3}, \end{aligned}$$
    (2)

    where \(S_s\) is the standard deviation of the \(s\)th segment.

  3.

    Kurtosis \((\gamma _{2s})\): It measures the degree of concentration of the values around the mean of the distribution. Positive kurtosis (i.e. long tails) indicates large excursions away from the arithmetic mean. Kurtosis is defined as:

    $$\begin{aligned} \gamma _{2s}=\frac{\frac{1}{t_s-t_{s-1}}\sum _{i=t_{s-1}}^{t_s} (y_i - \bar{y_s})^4}{S_s ^4} - 3. \end{aligned}$$
    (3)
  4.

    Slope of a linear regression over the points of the segment \((a_s)\): A linear model is constructed for every segment, trying to achieve the best linear approximation of the points in the evaluated segment. The slope of the linear model is a measure of the general tendency of the segment. The slope parameter is obtained as:

    $$\begin{aligned} a_s = \frac{S_{s}^{yt}}{\left( {S_s^t}\right) ^2}, \end{aligned}$$
    (4)

    where \(S_{s}^{yt}\) is the covariance of the time indices \(t\) and the time series values \(y\) for the \(s\)th segment; and where \(S_s^t\) is the standard deviation of the time values. The mathematical expression for the covariance is:

    $$\begin{aligned} S_{s}^{yt} = \frac{1}{t_s-t_{s-1}}\sum _{i=t_{s-1}}^{t_s} (i - \bar{t_s})\cdot (y_i - \bar{y_s}). \end{aligned}$$
    (5)
  5.

    Mean Squared Error \((MSE_s)\): This metric measures the degree of nonlinearity of the segment. As for the slope, a linear model is fitted and used to obtain the \(MSE_s\):

    $$\begin{aligned} MSE_s = S_{s}^2 \cdot ( 1 - r_s^2), \end{aligned}$$
    (6)

    where:

    $$\begin{aligned} r_s^2 = \frac{\left( S_{s}^{yt}\right) ^2}{S_s^2 \cdot \left( {S_s^t}\right) ^2}. \end{aligned}$$
    (7)
  6.

    Autocorrelation coefficient \((AC_s)\): It measures the dependence of a time series on itself shifted by some time delay, i.e. the degree of correlation between the current values of the time series and the values at the previous time step. The \(AC_s\) is defined as:

    $$\begin{aligned} AC_s = \frac{\frac{1}{t_s-t_{s-1}}\sum _{i=t_{s-1}}^{t_s-1} (y_i - \bar{y_s}) \cdot (y_{i+1} - \bar{y_s})}{{S_{s}^2}}. \end{aligned}$$
    (8)

Once the six statistical metrics have been calculated for each segment in a chromosome, a clustering technique is applied over the six-dimensional space.
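For reference, the six metrics can be transcribed directly from Eqs. (1)–(8). The sketch below assumes a non-constant numpy array segment with at least two points and uses the population (1/n) normalisation of the text:

```python
import numpy as np

def segment_metrics(y, t):
    """Six statistical metrics of one segment (Eqs. 1-8).
    y: segment values; t: the corresponding time indices."""
    y_mean, t_mean = y.mean(), t.mean()
    var = np.mean((y - y_mean) ** 2)                       # Eq. (1)
    sd = np.sqrt(var)
    skew = np.mean((y - y_mean) ** 3) / sd ** 3            # Eq. (2)
    kurt = np.mean((y - y_mean) ** 4) / sd ** 4 - 3.0      # Eq. (3)
    cov_yt = np.mean((t - t_mean) * (y - y_mean))          # Eq. (5)
    var_t = np.mean((t - t_mean) ** 2)
    slope = cov_yt / var_t                                 # Eq. (4)
    r2 = cov_yt ** 2 / (var * var_t)                       # Eq. (7)
    mse = var * (1.0 - r2)                                 # Eq. (6)
    ac1 = np.mean((y[:-1] - y_mean) * (y[1:] - y_mean)) / var  # Eq. (8), lag-1
    return np.array([var, skew, kurt, slope, mse, ac1])
```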

2.4 Clustering: k-means algorithm

A clustering process has to be applied in order to obtain the value of the fitness function for each individual. The algorithm chosen, \(k\)-means, is applied to the time series segments. For further information on \(k\)-means and the initialization procedure see Pérez-Ortiz et al. (2014).

Before applying the clustering algorithm, the values of the segment metrics should be normalized, as the distance of each segment to its centroid strongly depends on the range of values of each metric (e.g. variance can have a much broader range of variation than skewness). Metrics with larger ranges would otherwise dominate those with smaller ranges. Scaling is used to avoid this problem: for a given segmentation, the segment metrics are normalized to the range \([0, 1]\) using the min-max normalization:

$$\begin{aligned} v^* =\frac{{v - v_\mathrm {min}}}{{v_\mathrm {max}-v_\mathrm {min}}}, \end{aligned}$$
(9)

where \(v\) is the value of the metric for a given segment, \(v^*\) is the normalized value, \(v_{\mathrm{min}}\) is the minimum value for this metric when considering all the segments and \(v_{\mathrm{max}}\) is the maximum value. A constant value of \(v^*=0.5\) is assigned whenever the metric is constant for all segments.
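A direct transcription of Eq. (9), including the constant-metric rule, could read:

```python
import numpy as np

def normalize_metrics(M):
    """Min-max scale each metric (column) of the m-by-6 matrix M to [0, 1].
    Columns that are constant across all segments are set to 0.5 (Eq. 9)."""
    v_min, v_max = M.min(axis=0), M.max(axis=0)
    span = v_max - v_min
    out = np.full_like(M, 0.5, dtype=float)    # constant-metric default
    nz = span > 0
    out[:, nz] = (M[:, nz] - v_min[nz]) / span[nz]
    return out
```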

2.5 Fitness

All GAs need an evaluation mechanism to assign a quality index to each population individual. In clustering processes, one way to evaluate the obtained groups is to consider the Sum of Squared Errors \((SSE)\), which consists of the sum of the squared distances between each segment and its cluster centroid:

$$\begin{aligned} SSE = \sum _{i=1}^m d_{i}^2, \end{aligned}$$
(10)

where \(i\) is the segment being evaluated, \(m\) is the total number of segments, and \(d_{i}\) is the distance between the segment \(i\) and its closest centroid.

Our goal is to minimize this \(SSE\) in order to obtain more compact clusters (where each point is as close as possible to its centroid, and the centroids are as far as possible from each other). However, when the GA tries to minimize the \(SSE\), it tends to reduce the number of segments as much as possible and can even produce a partition where each cluster contains a single segment. For instance, if the number of clusters considered is five and a chromosome includes only five segments, the \(SSE\) would be minimal, \(SSE = 0\), because each segment would constitute its own cluster. Since this is not an acceptable solution, the fitness function is redefined to also consider the number of segments:

$$\begin{aligned} fitness = \frac{m}{SSE}. \end{aligned}$$
(11)

In this way, the algorithm tries to find partitions of the time series where the number of segments is sufficiently high to ensure the acquisition of valuable information from the clustering process.
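Combining the pieces above, the fitness of one chromosome (Eqs. 10–11) can be sketched with scikit-learn's \(k\)-means as a stand-in for the paper's implementation; `segment_metrics` and `normalize_metrics` are the sketches given earlier, and every segment is assumed to contain at least two points (with at least \(k\) segments in total):

```python
import numpy as np
from sklearn.cluster import KMeans

def evaluate(series, cuts, k=5, random_state=0):
    """Fitness = m / SSE (Eq. 11) for one chromosome (sketch)."""
    segs = np.split(np.asarray(series, dtype=float), cuts)
    M = np.array([segment_metrics(s, np.arange(len(s), dtype=float))
                  for s in segs])                  # m-by-6 metric matrix
    X = normalize_metrics(M)                       # Eq. (9)
    km = KMeans(n_clusters=k, n_init=1, max_iter=20,
                random_state=random_state).fit(X)
    sse = km.inertia_                              # Eq. (10): SSE to centroids
    # small floor guards against the degenerate SSE = 0 partition
    return len(segs) / max(sse, 1e-12)             # Eq. (11)
```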

2.6 Selection and replacement processes

In each generation, all individuals within the population are selected for reproduction and generation of offspring; since parents are not selected based on their fitness, this promotes greater diversity.

The replacement process is performed by roulette wheel selection, i.e. a selection probability for each individual chromosome is calculated from its fitness value. The number of individuals selected is the population size minus one, and the vacant place is occupied by the best segmentation (with the highest fitness) of the previous generation, making the algorithm elitist.

As can be seen, the selection process promotes diversity, while the replacement process promotes elitism.
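The replacement step of the loop sketched in Sect. 2.1 can thus be expanded as follows (fitness-proportional roulette wheel with a single elitist slot):

```python
import numpy as np

def replace_generation(offspring, off_fitness, elite, rng):
    """Roulette-wheel replacement with one elitist slot (Sect. 2.6 sketch)."""
    probs = np.asarray(off_fitness, dtype=float)
    probs = probs / probs.sum()                  # selection probability ~ fitness
    keep = rng.choice(len(offspring), size=len(offspring) - 1,
                      replace=True, p=probs)     # population size minus one
    return [offspring[i] for i in keep] + [elite]
```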

2.7 Mutation operator

The algorithm is endowed with four mutation operators whose main function is to improve the random exploration of the search space, reducing the dependence on the initial population and helping the search escape local optima. The probability \(p_\mathrm{m}\) of performing a mutation is set by the user. Once a mutation is decided, the kind of perturbation applied to the chromosome is randomly selected from the following list: (1) add a cutpoint, (2) remove a cutpoint, (3) move half of the cutpoints to the left, and (4) move half of the cutpoints to the right.

The number of cutpoints to be added or removed is determined randomly. The number of cutpoints to move is approximately half of the available ones; they are randomly selected and pushed towards the previous or the following cutpoint, with the constraint that a moved cutpoint never reaches its neighbour. An example of the four mutation operations is included in Fig. 2, where two cutpoints are removed, one is added, and half of them are moved to the left and to the right.
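A sketch of the four mutations follows. The one-step shift applied to the moved cutpoints is an assumption (the text does not fix the shift size), and the neighbour constraint is enforced by clipping:

```python
import numpy as np

def mutate(cuts, n, p_m, rng):
    """Apply one of the four mutations with probability p_m (Sect. 2.7 sketch).
    cuts: sorted cutpoint array; n: length of the time series."""
    if rng.random() >= p_m:
        return cuts
    cuts = np.asarray(cuts).copy()
    op = int(rng.integers(4))
    if op == 0:                                    # (1) add a cutpoint
        free = np.setdiff1d(np.arange(1, n - 1), cuts)
        if free.size:
            cuts = np.sort(np.append(cuts, rng.choice(free)))
    elif op == 1:                                  # (2) remove a cutpoint
        if cuts.size > 1:
            cuts = np.delete(cuts, rng.integers(cuts.size))
    else:                                          # (3)/(4) move ~half of the cutpoints
        idx = np.sort(rng.choice(cuts.size, size=max(1, cuts.size // 2),
                                 replace=False))
        step = -1 if op == 2 else 1                # left or right
        bounds = np.concatenate(([0], cuts, [n - 1]))
        # a moved cutpoint must never reach its neighbours
        cuts[idx] = np.clip(cuts[idx] + step, bounds[idx] + 1, bounds[idx + 2] - 1)
    return cuts
```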

Fig. 2

Mutation operator. a Mutation operator: remove two cutpoints (18 and 52). b Mutation operator: add a cutpoint: 39. c Mutation operator: randomly move cutpoints to the left. d Mutation operator: randomly move cutpoints to the right

2.8 Crossover operator

The algorithm includes a crossover operator whose main function is to better exploit existing solutions. For each parent individual, the crossover operator is applied with a given probability \(p_\mathrm{c}\). The operator randomly selects the other parent and a random index of the time series, and interchanges the left and right parts of the two chromosomes with respect to this point. An illustration of the crossover operator can be seen in Fig. 3.
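A minimal sketch of this operator (producing one child; the symmetric child is obtained by swapping the roles of the two parents):

```python
import numpy as np

def crossover(parent, other, n, p_c, rng):
    """Single-point crossover on the time axis (Sect. 2.8 sketch).
    Keeps the parent's cutpoints left of a random index and takes the
    other parent's cutpoints right of it; n is the series length."""
    if rng.random() >= p_c:
        return np.asarray(parent)
    parent, other = np.asarray(parent), np.asarray(other)
    point = int(rng.integers(1, n - 1))            # random crossover index
    return np.concatenate((parent[parent < point], other[other >= point]))
```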

Fig. 3

Crossover operator. a Before applying crossover operator. b After applying crossover operator. The crossover point was randomly decided to be 60

3 Experiments

3.1 Climate datasets

The paleoclimate datasets chosen for this study are the GISP2 (Greenland Ice Sheet Project 2) and the NGRIP (North Greenland Ice Core Project) \(\delta ^{18}\hbox {O}\) time series (Grootes and Stuiver 1997; Stuiver and Grootes 2000; Andersen et al. 2004; Svensson et al. 2008). The \(\delta ^{18}\hbox {O}\) isotope record is a proxy for past surface temperature variations (Bradley 2015). In this study we focus on the 20-year resolution \(\delta ^{18}\hbox {O}\) isotope records from both drilling sites. Pre-processing the datasets with a 5-point average was found to reduce short-term fluctuations within the datasets and improve the analysis of the time series segmentations. If \(\{y_n\}_{n=1}^N\) is the original time series, then the considered time series is \(\{y^*_n\}_{n=1}^{N/5}\) with \(y^*_i=\frac{1}{5}\sum _{j=5i-4}^{5i} y_j\).
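This 5-point block average amounts to the following sketch (any trailing remainder shorter than the window is dropped, an assumption not stated in the text):

```python
import numpy as np

def block_average(y, w=5):
    """Non-overlapping w-point average used to pre-process the ice-core records."""
    y = np.asarray(y, dtype=float)
    n = (len(y) // w) * w          # drop the trailing remainder, if any
    return y[:n].reshape(-1, w).mean(axis=1)
```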

In addition to the paleoclimate records, synthetic datasets obtained from well-known dynamical systems are also studied, both to better understand the algorithm behaviour and as a preliminary attempt to reject or reinforce hypotheses about underlying dynamical mechanisms for the DO events. Synthetic time series were produced using two simple mathematical models demonstrating noise-induced transitions, as described in Benzi et al. (1981). We name Benzi-A the time series \(x(t)\) obtained from a Langevin equation driven by a Gaussian noise component, the Wiener process \(dW\):

$$\begin{aligned} dx=[x(a-x^2)]dt+\epsilon dW , \end{aligned}$$
(12)

where \(\epsilon\) is the noise level and \(a\) is a constant. For \(a > 0\) the system has two equilibrium solutions and is able to alternate between them because of the noise fluctuations on \(x\). We name Benzi-B the time series with an additional periodic forcing of frequency \(\varOmega\), as described in Benzi et al. (1981):

$$\begin{aligned} dx=[x(a-x^2) + A\cos (\varOmega t)]dt+\epsilon dW. \end{aligned}$$
(13)
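For reproducibility, Eqs. (12) and (13) can be integrated with a simple Euler–Maruyama scheme; the initial condition \(x_0=\sqrt{a}\) (one of the two stable wells) is an assumption here, and the parameter values used in the paper are listed in the next paragraph:

```python
import numpy as np

def simulate_benzi(n_steps, a=1.0, dt=1.0, eps=0.10, A=0.0, omega=1e-3, seed=0):
    """Euler-Maruyama integration of Eq. (12) (A = 0, Benzi-A) or
    Eq. (13) (A > 0, Benzi-B). A sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = np.sqrt(a)                              # assumed: start in a stable well
    for i in range(1, n_steps):
        drift = x[i-1] * (a - x[i-1] ** 2) + A * np.cos(omega * i * dt)
        x[i] = x[i-1] + drift * dt + eps * np.sqrt(dt) * rng.standard_normal()
    return x

# benzi_a = simulate_benzi(50_000)            # noise-induced transitions
# benzi_b = simulate_benzi(50_000, A=0.12)    # stochastic resonance
```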

The Benzi-B time series demonstrates stochastic resonance for the chosen noise level. The following parameter values were used: \(a=1.0\), \(dt= 1.0\), \(A=0.12\), \(\epsilon =[0.08, 0.10]\), and \(\varOmega =10^{-3}\).

In addition to the noise-induced transitions, time series of a simplified 2D model of the Atlantic thermohaline Meridional Overturning Circulation, demonstrating critical as well as other types of bifurcations, were also examined in this paper. A detailed description of the Thermohaline Ocean Model (TOM) and its parameterizations can be found in Artale et al. (2002). Two experiments (TOM-A and TOM-B) were conducted by varying the hydrological cycle strength \(F_{NS}\) in time to push the system through bifurcation points. \(F_{NS}\) was varied at a rate of \(10 \times 10^{-11}\) psu/s over a total period of 700,000 years, which is slower than the time scale of the slowest process (diffusion) included in the system, in order to ensure the resilience of the system to the external forcing (Saltzman 2001). Experiments TOM-A and TOM-B were performed by increasing and decreasing the control variable \(F_{NS}\) following the sequences of states (A, F, B, C, D, C, E, F, A) and (D, C, E, F, A, G, H), respectively, as shown in the state diagram of Fig. 4.

Fig. 4

Bifurcation diagram of the TOM system: average vorticity \(( \langle Q \rangle )\) of the circulation cell as a function of \(F_{NS}\): the strength of the evaporation minus precipitation at the surface of the ocean, or hydrological cycle, in practical salinity units psu/s. Notice the various bifurcation points \(BF1', BF1''\), BF2 and the limit cycles LC1, LC2 that mark certain values of \(F_{NS}\) where transitions in the stability of the equilibrium solutions take place. Each cross point is a stable solution. The dashed branches represent unstable solutions where the system cannot be found and are included for completeness. The two experiments TOM-A and TOM-B were conducted by increasing and decreasing \(F_{NS}\) to follow the state sequences: TOM-A: (A, F, B, C, D, C, E, F, A) and TOM-B: (D, C, E, F, A, G, H). Notice the hysteresis branches involved along the track of the experiments

3.2 Algorithm parameters

GAs usually involve adjusting a notable set of parameters. However, their search dynamics can adapt to the problem under study, so that performance is only negligibly affected by minor changes in the initial parameters. In this paper, the following parameters were initially set by trial and error and then used for every dataset under study. The number of individuals (candidate segmentations) in the population is \(t=80\). The crossover probability is \(p_\mathrm {c}=0.9\) and the mutation probability is \(p_\mathrm {m}=0.075\). The number of clusters to be discovered from each candidate segmentation is \(k=[4,5]\); such values are high enough to discover new information among the derived clusters, but not so high as to threaten the interpretability and reproducibility of the results. The maximum number of generations is set to 2,000, and the \(k\)-means clustering process is allowed a maximum of 20 iterations.

Finally, a GA is a stochastic optimization algorithm with an embedded random number generator. Given that the results can differ depending on the seed value, the algorithm is run several times with different seeds: for each dataset, the GA was run 10 times, with seeds in the set \(\{10,20,\ldots ,100\}\), to evaluate and remove the dependence of the results on the seed value. This also provides a means to evaluate the accuracy of the algorithm.

4 Results

This section presents the main results of the segmentation algorithm for the synthetic and paleoclimate datasets under study. The segmentation returned by the GA in the final generation was analyzed using the following approach. First, it was verified whether the abrupt transitions belonged to different classes or were grouped into the same class according to some common characteristics. Second, the behaviour of each metric in the six-dimensional parameter space was observed at the onset of the transitions to find common patterns, in the form of certain class sequences, that would be indicative of EWSs, e.g. increasing variance and autocorrelation coefficient. This was done for the two independent paleoclimate datasets, the synthetic datasets, and the ten seed values.

4.1 Synthetic datasets

Following the approach described above, the results listed below were obtained for the synthetic datasets, evaluating the performance of the proposed algorithm on well-described dynamical systems. Figures 5 and 6 present the segmentation results for the Benzi-A and -B models, where transitions are caused by Gaussian noise and stochastic resonance, respectively.

  1.

    The algorithm consistently attributes classes with high variance, autocorrelation, and MSE to the abrupt transitions (shown in red, magenta, and green). Variance, MSE, and autocorrelation increase by one order of magnitude close to the transition.

  2.

    There are no false alarms for the N-tippings in the Benzi-A and -B models, i.e. the above classes are never found outside abrupt transitions.

  3.

    The algorithm performs much less robustly for transitions that are closely spaced. The detection rate for transitions occurring less than 700 time steps after the previous transition drops to 40 %, compared to transitions occurring after longer time intervals (e.g. \(\varDelta t \ge 1{,}200\)).

  4.

    The algorithm cannot identify whether the system under study is governed by stochastic resonance or by white noise-induced transitions, since the statistical parameters observed by the algorithm behave similarly in both cases.

  5.

    A spurious class containing segments with kurtosis equal to \(-1\) (i.e. a broad, flat distribution) can easily be identified in the segmentation results; otherwise the kurtosis is usually positive \((\gamma \in (0,2))\), corresponding to narrower distributions.

Fig. 5

Results of the segmentation algorithm for the Benzi-A and B models governed by Gaussian noise (left) and stochastic resonance (right), \(\epsilon =0.08\). The transition belongs to a single class (shown in red) with high autocorrelation, variance, MSE, skewness and kurtosis compared to the segments preceding the transition. a Benzi-A model: Gaussian noise. b Benzi-B model: stochastic resonance

Fig. 6

Results of the segmentation algorithm for the Benzi-A and B models where transitions are governed by Gaussian noise (left) and stochastic resonance (right), \(\epsilon =0.10\). In both cases, the transitions are attributed classes (shown in red, magenta, and green) with high autocorrelation, variance, and MSE compared to the class of the preceding segments. a Benzi-A model: Gaussian noise. b Benzi-B model: stochastic resonance

Figures 7 and 8 present the segmentation results for the Thermohaline Ocean Model (TOM), where the system undergoes critical bifurcations in experiment TOM-A and both critical and limit-cycle transitions in experiment TOM-B. The main results are listed below, employing the maximum flow strength \(\varPsi\) as a second state variable in addition to the average vorticity \(Q\).

  1.

    The TOM-A experiment pushes the system consecutively across two bifurcation points, \(BF1'\) and \(BF1''\), that mark the limits of a hysteresis loop on the bifurcation diagram (see Fig. 4). The B-tipping associated with the \(BF1'\) point is always detected by the algorithm in the form of increased autocorrelation, MSE and variance in the sequence of classes leading up to the transition. This is encountered in both state variables \(Q\) and \(\varPsi\) (see Fig. 7a: green, blue, red, green, and Fig. 7b: blue, red sequence).

  2.

    The detection of the second bifurcation tipping \(BF1''\) was accompanied by false alarms in 40 % of the cases for the variable \(\varPsi\), meaning that the sequence of classes seen at the B-tipping was repeated in parts of the time series where no bifurcation point was crossed (see Fig. 7b: blue, green sequence), as verified against the bifurcation diagram (Fig. 4).

  3.

    The experiment TOM-B pushes the system consecutively across \(BF1''\) and the limit cycles LC1 and LC2, where the system engages in internally excited oscillations unrelated to any external periodic forcing, hence the term “cycle” (see Fig. 4). The variability of \(Q\) and \(\varPsi\) differs in amplitude, \(\varPsi\) having the larger amplitude.

  4.

    The increase in variance, MSE and autocorrelation by two orders of magnitude, found in 100 % of the runs while crossing the critical bifurcation point \(BF1''\), is a robust indicator of B-tipping in the \(Q\) time series.

  5.

    The algorithm divides the section of the \(Q\) data series corresponding to the interval (F, A, G) (see Fig. 4) into consecutive segments of the same class. It successfully (90 %) attributes the same class to qualitatively identical states of the system (see Fig. 8a: multiple blue segments), without prior knowledge of their similarity on the bifurcation diagram.

  6.

    The algorithm fails to detect the B-tipping in the \(\varPsi\) series at the \(BF1''\) point, falsely attributing to the B-tipping one of the two similar classes used for the interval (F, A, G), where no qualitative change is observed in the system's state (see Fig. 8b: magenta, green, blue). In contrast, the algorithm captures the \(BF1''\) successfully in the \(Q\) series.

  7.

    An increase in variance and MSE but a decrease in autocorrelation by one order of magnitude is diagnosed in the \(\varPsi\) time series following the transition to the limit cycle at point LC1, showing a different statistical profile for this transition. The signal is very robust, persisting over three consecutive classes for 80 % of the algorithm seeds, instead of the two classes of the previous statements, providing a stronger diagnostic of the tipping with these three metrics as robust indicators (see Fig. 8b: green, red).

Fig. 7

Results of the segmentation algorithm for the Thermohaline Ocean Model (TOM) with the system undergoing critical bifurcations (experiment TOM-A) for the average vorticity \(Q\) (left) and maximum flow strength \(\varPsi _{max}\) (right). The \(BF1'\) bifurcation is successfully detected through a unique class attribution with an increase in autocorrelation, MSE and variance. Neither dataset captures the intermediate stable state reached before the \(BF1''\) transition, resulting in false alarms (decrease in autocorrelation, MSE and variance for the second \(BF1''\) point), in contrast to the EWSs expected from the literature. The seed value \(s\) of the random initiation is noted below each image. a Experiment TOM-A \((\langle Q(t) \rangle, s=20)\). b Experiment TOM-A \((\psi _{max}, s=60)\)

Fig. 8

Results of the segmentation algorithm for the Thermohaline Ocean Model (TOM) with the system undergoing critical and limit-cycle bifurcations (experiment TOM-B) for the average vorticity \(Q\) (left) and maximum flow strength \(\varPsi _{max}\) (right). Transition through the \(BF1''\) is captured in the \(Q\) series but is documented as a false alarm in the \(\varPsi\) time series. The LC1 is detected in both runs with the difference that in (b) the autocorrelation decreases close to the transition instead of increasing as in (a). The seed value \(s\) of the random initiation is noted below each image. a Experiment TOM-B (\(\langle Q(t) \rangle, s = 40\)). b Experiment TOM-B \((\psi _{max},s =70)\)

4.2 Paleoclimate datasets

This subsection presents the main results of the segmentation algorithm for the paleoclimate datasets, following the same approach as for the synthetic datasets. They are listed below:

  1.

    The DO events are grouped into two main classes, sometimes three, because the values of autocorrelation, variance, and MSE may differ significantly from one DO event to another. The high number of classes considered here (5 in total) gives the algorithm this flexibility.

  2.

    EWSs of DO events are found by the segmentation algorithm in the form of an order of magnitude increase in autocorrelation, variance, and mean square error (MSE) across a sequence of two classes. These EWSs are robustly found (in at least 70 % of the seed runs) for DO events 0, 1, 2, 4, 7, 8, 11, 12 in the GISP2 \(\delta ^{18}\hbox {O}\) dataset and for DO events 0, 1, 4, 8, 10, 11, 12 in the NGRIP \(\delta ^{18}\hbox {O}\) dataset (see Fig. 9).

  3.

    The increase in mean square error (MSE) is suggested here as an additional indicator of abrupt climate change. The increase in MSE, which indicates nonlinear behaviour, was found to coincide with an increase in variance prior to DO events in \(\sim \)90 % of the seed runs for the GISP2 \(\delta ^{18}\hbox {O}\) dataset (see Fig. 9) and in \(\sim \)100 % of the seed runs for the NGRIP \(\delta ^{18}\hbox {O}\) dataset.

  4.

    The increase in the autocorrelation coefficient cannot be used on its own as an indicator of climate change. The algorithm sometimes found an increase in MSE and variance but a decrease in the autocorrelation coefficient at the onset of DO events. This signature was minor in the GISP2 \(\delta ^{18}\hbox {O}\) dataset (e.g. DO 2, 10) but much more prevalent in the NGRIP \(\delta ^{18}\hbox {O}\) dataset (e.g. DO 0, 1, 5, 7, 8, 10). Hints of this behaviour could already be found for DO 1 by Lenton et al. (2012). We stress that the increase in variance and MSE is a much more robust EWS, especially for the NGRIP dataset.

  5.

    Analysis of the GISP2 and NGRIP paleoclimate records did not reveal any consistent change in skewness or kurtosis at the onset of DO events.

Fig. 9

Time series metrics after the clustering process (i.e. the segments found by the algorithm are replaced with their cluster centroids). The increase in MSE is associated with an order of magnitude increase in variance and autocorrelation at the onset of DO events. All DO events are indicated for reference (GISP2 \(\delta ^{18}\hbox {O}\) ice core, seed = 10)

Running the algorithm with different seed values revealed minor differences, such as DO events being attributed to other classes and the cutpoints between classes not lying at exactly the same locations, but the main characteristics described in the five points above remained robust throughout the analysis. Finally, the average computational time over the 10 runs was \(53.24\pm 10.36\) s and \(65.45\pm 10.38\) s for the GISP2 \(\delta ^{18}\hbox {O}\) and NGRIP \(\delta ^{18}\hbox {O}\) datasets, respectively, using an Intel Core i7 (R) CPU 3610QM at 2.3 GHz with 8 GB of RAM. Taking the length of the datasets into account, this computational cost is justified and affordable.

Fig. 10

Results of the segmentation algorithm on \(\delta ^{18}\hbox {O}\) ice core data (seed = 10). The Dansgaard–Oeschger events are found grouped into two or three main classes with high autocorrelation, MSE, and variance. a GISP2: \(\mathcal {C}_1, \mathcal {C}_2\), and \(\mathcal {C}_5\). b NGRIP: \(\mathcal {C}_1\) and \(\mathcal {C}_5\). All Dansgaard–Oeschger events are numbered for reference. a Segmentation results on GISP2 dataset. b Segmentation results on NGRIP dataset

Figure 10 presents the detailed segmentation results for the GISP2 and NGRIP \(\delta ^{18}\hbox {O}\) ice core data for a fixed seed value. The warning signals can be seen in the transitions between consecutive segments and the labels assigned to those segments; e.g. the upward trend in nonlinearity was seen via one or two consecutive increases in value, although this depends on the number of segment classes relative to the whole dataset length. The Dansgaard–Oeschger events are found grouped into two or three main classes with high autocorrelation, MSE, and variance, corresponding to classes \(\mathcal {C}_1\) and \(\mathcal {C}_2\) for GISP2 and classes \(\mathcal {C}_1\) and \(\mathcal {C}_5\) for NGRIP in that run. An order of magnitude increase in these statistical parameters is found at the onset of the events; they decrease back to normal values as the dynamical system slowly recovers to its stable state. This behaviour can also be seen in Fig. 9, which illustrates the evolution of the statistical metrics across the whole GISP2 data series.

A detailed analysis of Fig. 10 reveals that class \(\mathcal {C}_3\) was the third main class for the GISP2 dataset, grouping segments with the lowest MSE, variance, and autocorrelation for that seed run; it was found at the onset of several DO events (e.g. 1, 4, 8, 12), collocated with the Heinrich events H1, H3, H4, H5, as well as during the Holocene period (for an introduction to Heinrich events see Hemming 2004). Classes \(\mathcal {C}_4\) and \(\mathcal {C}_5\) were found outside the plotted area (in the \(-\)50 to \(-\)60 ka range) and therefore do not appear in the graph. As for the NGRIP dataset, classes \(\mathcal {C}_2\) and \(\mathcal {C}_4\) (with the lowest MSE, variance, and autocorrelation) were also found at the onset of several DO events (e.g. 4, 7, 8, 10 and 12), with an unusual behaviour of the autocorrelation coefficient for DO 1. A detailed analysis of their six-dimensional vectors revealed that classes \(\mathcal {C}_2\) and \(\mathcal {C}_4\) differ only in kurtosis; this is discussed further in Sect. 5 in relation to the limitations of the algorithm. Class \(\mathcal {C}_5\) (cyan curve in Fig. 10b) is considered the main DO class in the NGRIP data for that particular run, with a highly linear relationship (ratio of 1:1) between variance and MSE within the class and a constant, high autocorrelation coefficient. This is illustrated in Fig. 11, which shows the 3D representation of the clustering results for variance, autocorrelation, and MSE (normalised values), where each point is a segment within its own cluster, colour-coded according to the classes of Fig. 10.

Fig. 11

3D representation of the clustering results for variance, autocorrelation and MSE (normalized values), where each point is a segment within its own cluster. The centroids are represented by black circles. a GISP2. b NGRIP

5 Discussion

Before diagnosing the highly variable ice core data series for EWSs, the algorithm was tested on synthetic data, which include transitions associated with N- and B-tipping (Ashwin et al. 2012). Patterns preceding such transitions can be studied for the possible existence of EWSs in the controlled environment of simulated data, with the advantage of prior knowledge of the underlying dynamical mechanisms. The proposed algorithm was successful in detecting the N-tippings (100 %) and the B-tippings (100 %) in the simulated time series. EWSs of those tippings were revealed in the form of sequences of classes with pronounced differences in their autocorrelation, variance and MSE. In particular, an order of magnitude increase in autocorrelation, variance, and MSE was observed before an N-transition point in the data series via the change of class through two or three consecutive segments. This sequence is not seen elsewhere in the dataset in the absence of an N-tipping event. Frequently the last segment includes the transition point itself; this occurs due to the randomised initial segmentation. However, the sequence of attributed classes still compactly describes the statistical evolution of the time series before and after the onset of the tipping, and serves to classify the tipping type according to differences in the attributed class sequence.

In the case of B-tipping, the algorithm always captures the critical bifurcations using the same EWS as in the N-tipping, in the absence of other B-tippings in the dataset. In agreement with previous studies (see “Introduction”), an increase in variance and autocorrelation is found, and an increase in MSE is added to the tipping precursors. The existence of a second critical bifurcation, as seen in the TOM-A experiment, can undermine the detection of multiple B-tippings if they occur within a short time interval compared to the time series length. It could also be that warning signals or resilience indicators are simply absent, as proposed in Batt et al. (2013) for stable-to-cycle transitions of phytoplankton blooms. For systems whose state is described by multiple variables, the dynamical transitions may be depicted differently in the evolution of each variable and can therefore be elusive to the algorithm from variable to variable, as seen in experiment TOM-B. An important aspect of the precursor signal at the LC1 transition is that both the variance and the MSE increase, while the algorithm fails to see an increase in autocorrelation prior to the transition: it only sees a sharp decrease in autocorrelation following the transition. Such signals mark the transition of the system from one stable state to a state that alternates between stable points, and have already been encountered in the study of similar ecosystem dynamics (Batt et al. 2013). This encouraging result demonstrates the algorithm's ability to distinguish between qualitatively different B-tippings. The algorithm was able to cluster segments with qualitatively identical states into one or two classes with minor differences in their statistical metrics, without prior knowledge of the bifurcation diagram. Regarding technical limitations, the kurtosis and slope coefficients are inconclusive as classification criteria, and the algorithm is unable to detect multiple tipping transitions that occur within a short time interval, an expected weakness for any statistically based approach that relies on the sample size.

The comparative study of the two independent ice core datasets revealed that the algorithm captured the same main characteristics in the warning signals of DO events. For instance, warning signals were robustly detected (\(\ge\)70 % of the seed runs) for DO events 0, 1, 4, 8, 11, and 12 in both datasets, in the form of an order of magnitude increase in autocorrelation, variance, and mean square error (MSE). Significant changes in these metrics were detected in the two ice cores for DO 2, 3, 5, 6, 7, and 10 in 20–70 % of the algorithm seeds, suggesting that these events possess weak warning signals. To analyse EWSs we considered a time span starting 100 years after the previous DO event and ending at the one under study. Warning signals were detected at least 600 years in advance by the algorithm (e.g. DO 1, 2, 4, 8, GISP2 data) and up to 1.8 kyr before the event (e.g. DO 12, NGRIP data). A minimum time period for investigating EWSs before the onset of DO events would therefore correspond to 600 years, which is comparable with values found in the literature (e.g. 700 years for the increase in variance found by Cimatoribus et al. (2013)). The starting point for this time span should be the start of the cold stadial state, to discriminate from the dynamical behaviour of the previous event, e.g. a decrease in autocorrelation. The non-detection of any warning signals for DO event 9 supports this hypothesis: DO 9 starts 1 kyr after the onset of the previous event but only 500 years after the beginning of the cold, stable period. The case of DO 9 falls within the shortcomings of the algorithm, which underperforms when the temporal distance from the previous transition, i.e. another DO event, is short. Detection of warning signals notably improves for DO events preceded by long cold stadials. This is also consistent with the results on the synthetic datasets, e.g. the noise-driven system and the stochastic resonance system with \(\epsilon =0.1\): the detection rate for transitions occurring less than 700 years after the previous transition drops to 40 %, compared to transitions following longer stable states (e.g. \(\varDelta t \ge 1{,}200\)).

Hypotheses about underlying dynamical mechanisms for DO events can be reinforced or rejected by comparing the algorithm's behaviour on simulated data produced with well-known dynamical models. In this reverse-engineering approach, more than one model can reproduce the variability encountered in the ice core or any other paleoclimatic record (Crucifix 2012), so any dynamical-systems interpretation should be used with caution and in combination with scientific insight. Distinct behaviours of the autocorrelation coefficient were found according to the type of transition, as revealed by the synthetic data analysis. The algorithm sometimes found an increase in MSE and variance but a decrease in the autocorrelation coefficient at the onset of DO events 1 and 10 in both the GISP2 and NGRIP datasets. It is therefore proposed to revisit the hypothesis that the DO events were caused by a uniform dynamical mechanism, e.g. the stochastic resonance hypothesis (Ganopolski and Rahmstorf 2001, 2002), as the statistical indicators seem to point to certain DO events having other dynamical causes. This aspect is left for investigation by specialized paleoclimate scientists.

When analyzing the results of segmentation algorithms, the segment length was also considered as a possible source of bias in the diagnostics. The algorithm may be unable to assign a proper class to a segment and prefer to divide it into smaller sections to reduce, e.g., the MSE and kurtosis values. The new smaller sections are likely to be grouped together in this parameter space, allowing the algorithm to perform the clustering process. Moreover, from Eq. (11), the fitness is directly proportional to the number of segments, so segmentations with a high number of cutpoints will be preferred. One signature of this effect is that all small segments are found in a single class with very low kurtosis \((\gamma =[-1.6,-1.9])\), constant skewness (equal to 0), and a large range of slope coefficients; they are represented by a straight line in Fig. 12. Special care was taken to discard those small segments (e.g. containing 2 or 3 points) in the analysis of EWSs, as they would otherwise have led to false alarms, i.e. events of nonlinear behaviour that do not lead up to a DO event, e.g. at \(-\)20.66 kyr (Figs. 9, 10a, GISP2 data). Closer inspection revealed that this increase in nonlinearity was a spurious effect of the small segments, which were discarded in order to avoid false alarms.

Fig. 12

3D representation of the clustering results for slope, skewness and kurtosis (normalized values) where each point is a segment within its own cluster. The centroids are represented by black circles. a GISP2, b NGRIP

6 Conclusion

In this paper, a novel genetic algorithm (GA) from the field of time series segmentation was applied to paleoclimate data in order to identify common patterns that could act as early warning signals of abrupt climate shifts. The algorithm automatically found the cutpoints, segment lengths, and the length of the window for calculating six statistical metrics, without being provided with climate knowledge or any other prior information. In addition to the variance and autocorrelation, the MSE metric was found to respond robustly before a tipping point. Several datasets were used to evaluate the algorithm behaviour, revealing similarities and differences amongst the EWSs occurring before qualitatively different tipping types. Experimental results show that warning signals of Dansgaard–Oeschger events could be robustly found for several of these events, in the form of an order of magnitude increase in autocorrelation, variance, and mean square error in both the GISP2 and NGRIP \(\delta ^{18}\hbox {O}\) ice core data, although events occurring within a short time interval of one another were elusive. The GA applied to the NGRIP \(\delta ^{18}\hbox {O}\) ice core record showed that an increase in the autocorrelation coefficient cannot be used on its own as an indicator of climate shifts, as the expected increase is not seen for certain DO events. Comparison with synthetic datasets of well-known dynamical behaviour suggests that different DO events might be triggered by different underlying dynamics. Finally, the proposed approach provides a novel visualisation tool for climate time series analysis and the detection of warning signals of abrupt transitions.

As future steps, improvements of the algorithm are required to overcome its limitations regarding consecutive fast transitions. We also suggest creating a dataset comprising several tipping points and their statistical metrics, to investigate the development of data-driven mathematical functions representative of abrupt transitions.