1 Introduction

In this paper, we consider the problem of estimating quantiles when data arrive sequentially (data stream). The problem has been considered for many applications like portfolio risk measurement in the stock market [1, 12], fraud detection [28], signal processing and filtering [24], climate change monitoring [29], SLA violation monitoring [21, 22] and backbone network monitoring [7].

Real-life data streams typically have the following properties:

  1. 1.

    The distribution of data from the data stream changes with time. All sorts of changes may happen like a shift of the distribution, change in the expectation value or the variance or other changes of the shape of the distribution.

  2. 2.

    Data are received with a high intensity over a long period of time.

  3. 3.

    Following a data stream over time, one may expect outliers and some times extreme outliers.

In this paper, we consider the problem of maintaining running estimates of multiple quantiles for data streams with the properties described above (quantile tracking). A natural requirement of the quantile estimates is that the monotone property of quantiles is satisfied, i.e. that the estimate of the, say, 70% quantile always is above the estimate of the, say, 50% quantile.

Efficiently tracking quantiles for data streams with the properties described above (and as shown in Fig. 1) is a challenging task. The most natural is to maintain a sorted list of the data and estimate the quantiles from the sorted list. Such a quantile estimator is not viable for massive data streams as computation time and memory requirement increase with time. In addition, the quantile estimates will not adapt to the dynamic changes of the data stream. Another alternative could be to fit a time series model to the received data and compute quantiles of the forecast distribution, but such approaches are vulnerable to changes in the properties of the data stream, e.g. if the stream changes from a period with slow variations to a period with rapid variations. The method that will be presented in this paper is completely nonparametric and only relies on a single tuning parameter making it robust to changes in the properties of the data stream distribution.

Several algorithms have been proposed to deal with those challenges. Most of the proposed methods fall under the category of what can be called histogram or batch-based methods. The methods are based on efficiently maintaining a histogram estimate of the data stream distribution such that only a small storage footprint is required. See [3, 8, 10, 11, 17] for representative examples and [18] for a recent survey.

Another ally of methods is so-called incremental update methods. The methods are based on performing small updates of quantile estimates every time a new sample is received from the data stream. The methods only need to store one value for each quantile estimate and therefore are very memory efficient compared to histogram/batch methods. The literature on such quantile estimation methods is sparse. One of the first and prominent examples is the algorithm in [25] by Tierney which is based on the stochastic approximation theory. The method is developed for a static data stream and will not work for dynamically changing data streams. A few modifications of the Tierney method have been suggested that are able to track quantiles of dynamically varying data streams, see e.g. [4, 6]. For more recent methods, we can mention the Frugal methods [19] which run a discrete Markov chain and estimate quantiles of discrete probability distribution. Recently, Hammer and Yazidi proposed the deterministic based multiplicative incremental quantile estimator (DUMIQE) [27] which is a version of the Frugal method that works on continuous sample spaces in addition to an improved version, based on deterministic updates. Other recent methods are the DQTRE and DQTRSE algorithms by Tiwari and Pandey [26]. A nice property of the DUMIQE, DQTRE, DQTRSE and the estimator suggested in this paper is that the update size is automatically adjusted dependent on the scale/range of the data. This makes the estimators robust to substantial changes in the data stream. The DQTRE and DQTRSE aim to achieve this by estimating the range of the data using peak and valley detectors. However, a disadvantage with these algorithms is that several tuning parameters are required to estimate the range making the algorithms challenging to tune compared to the DUMIQE and the estimator suggested in this paper. Recently in [14, 16], Hammer, Yazidi and Rue presented an incremental estimator that used the values of the received samples directly separating it from all incremental estimators previously presented in the literature. The estimator is in fact a generalized exponentially weighed average of previous observations received from the data stream and documents state-of-the-art performance [14, 16].

Given a dynamically changing data stream, two main problems are considered in the literature, namely to (1) dynamically update estimates of quantiles of all data received from the stream so far or (2) estimate quantiles of the current distribution of the data stream (tracking). Despite the importance of efficient tracking of statistical properties, the tracking problem (2) has been far less studied in the literature than problem (1). Incremental methods are well suited to address the tracking problem (2), while histogram and batch methods mainly have been used to address problem (1). Histogram and batch-based methods are not well suited for the tracking problem (2), and incremental methods typically are the only viable lightweight alternatives [5].

A disadvantage with the incremental methods referred to above is that they are constructed to track only a single quantile of the data stream. Of course, one could run such methods for several quantile probabilities, but for such methods, the quantile estimates usually will be unrealistic since the monotone property of quantiles will be violated. The problem with monotone property violation will be reduced if the incremental update size is reduced, but this is not a viable alternative for dynamically changing data streams. Using too small update steps, the incremental methods will not be able to track the dynamic changes of the data stream. In other words, a good quantile tracking algorithm must on one hand be able to efficiently track the quantiles of the data stream and at the same time satisfy the monotone property of quantiles in every iteration.

The only methods we have found in the literature that attempt to satisfy this is that of Cao et al. in [5] and by Hammer and Yazidi in [13]. The method by Cao et al. is based on first running an incremental update of each quantile estimate and secondly computing a monotonically increasing approximation of the cumulative distribution of the data stream distribution. Finally, the quantile estimates are computed from the approximate cumulative distribution. A disadvantage of the method is that the data from the data stream will be used directly making it sensitive to outliers (recall property three of real-life data streams). The method by Hammer and Yazidi in [13] is an immature version of the algorithm presented in this paper. A disadvantage of both of these methods is that we do not have any guarantee that the resulting quantile estimates converge to the true quantiles.

In this paper, we suggest a novel incremental method to track multiple quantiles that handles all the challenges with real-life data streams as described above. The method adapts efficiently to dynamically changing data streams (property 1 of real-life data streams) and only needs one operation and to store one value per quantile estimate per iteration making it extremely computational and memory efficient (property 2). The method does not use the values of the data streams directly, but only if the values are above or below the current estimates. This makes the method robust to outliers (property 3). The method is constructed such that quantile estimates satisfy the monotone property of quantiles in every iteration. A theoretical proof will be given showing that the quantile estimates converge to the true quantiles. It is hard, or maybe even impossible, to prove such convergence for the alternative methods in [5, 13].

As a first demonstration of the suggested algorithm, we look at the problem of detecting and characterizing real-world events based on tweets. A popular approach is to monitor the number of posted tweets [2], although also more advanced approaches have been explored [9, 15, 23]. Figure 1 shows an example to illustrate the use of quantile tracking on Twitter data.

Fig. 1
figure 1

The upper and lower panel shows quantile tracking for the algorithm suggested in this paper (MDUMIQE) and the DUMIQE algorithm, respectively. The grey circles show the number of tweets posted by Norwegian Twitter users every minute from 21 July 2011 to 24 July 2011. The black, blue and red curves show running estimates of the 20, 50 and 80% quantiles of the distribution of the number of tweets posted (colour figure online)

The upper and lower panel shows quantile tracking for the algorithm suggested in this paper and the DUMIQE, respectively. The grey circles show the number of tweets posted by Norwegian Twitter users every minute in the time period before and after the Oslo bombing and Utøya massacre in Norway 22 July 2011. The terror attack was initiated by a bomb going off in Oslo at July 22 3:25 p.m, and as expected, we see a rapid increase in the number of posted tweets after that time. The black, blue and red curves show the tracking of the 20, 50 and 80% quantiles of the distribution of the number of tweets posted using the method that will be presented in this paper. Comparing the two methods, we observe that the suggested algorithm better represents the simultaneous estimates of the three quantiles in each iteration. The estimates are more smooth. The DUMIQE quantile estimates even violate the monotone property of quantiles in several iterations. This is further demonstrated in the paper, for example, in Fig. 2.

2 Estimation of multiple quantiles

Let \(X_n\) denote a stochastic variable for the possible outcomes from a data stream at time n and let \(x_n\) denote a random sample (realization) of \(X_n\). We assume that \(X_n\) is distributed according to some distribution \(f_n(x)\) that varies dynamically with time n. Further, let \(Q_{n}(q)\) denote the quantile associated with probability q, i.e \(P(X_n \le Q_n(q)) = F_{X_n}(Q_n(q)) = q\).

In this paper, we focus on simultaneously estimating the quantiles for K different probabilities \(q_1, q_2, \ldots , q_K\) at each time step (tracking) for a data stream where the distribution of the samples varies dynamically with time. We assume an increasing order of the target quantiles to be tracked, i.e. \(q_1< q_2< \cdots < q_K\). The straightforward approach to estimate the quantiles would be to run K online quantiles estimators in parallel and in isolation, one for each probability. Using the deterministic based multiplicative incremental quantile estimator (DUMIQE) approach from [27], the update equations become

$$\begin{aligned} \begin{array}{ll} {\widehat{Q}}_{n+1}(q_k) \leftarrow (1 + \lambda q_k) {\widehat{Q}}_{n}(q_k) &{}\quad \text { if } {\widehat{Q}}_{n}(q_k) < x_n \\ {\widehat{Q}}_{n+1}(q_k) \leftarrow (1 - \lambda (1-q_k)) {\widehat{Q}}_{n}(q_k) &{}\quad \text { if } {\widehat{Q}}_{n}(q_k) \ge x_n \end{array} \end{aligned}$$
(1)

for \(k = 1,2, \ldots , K\).

The scheme is presented in detail in Algorithm 1. We see that the algorithm requires to only store a single value for each quantile estimate, namely the estimate itself. Further, we see that the algorithm is computationally efficient requiring only a single update per quantile estimate per iteration. Finally, we see that the method is robust to outliers since the observations \(x_n\) are not used directly, but only if it is above or below the current quantile estimate.

figure a

We assume for now that \({\widehat{Q}}_{n}(q_k) > 0\,\, \forall \,\, k,n\). Generalization such that \({\widehat{Q}}_{n}(q_k)\) can take any positive or negative value will be explained in remark 1 below. Unfortunately, using (1) results in unrealistic estimates as the monotone property of quantiles, as given by the constraint \({\widehat{Q}}_{n}(q_1) \le {\widehat{Q}}_{n}(q_2) \le \cdots \le {\widehat{Q}}_{n}(q_K)\), is most likely violated in some iterations. For the DUMIQE, this can be explained as follows (see [5] for an example of another method). Assume at time n that the monotone property is satisfied and that the sample \(x_n\) admits a value between \({\widehat{Q}}_{n}(q_k)\) and \({\widehat{Q}}_{n}(q_{k+1})\), i.e

$${\widehat{Q}}_{n}(q_1) \le \cdots \le {\widehat{Q}}_{n}(q_k)< x_n < {\widehat{Q}}_{n}(q_{k+1}) \le \cdots \le {\widehat{Q}}_{n}(q_K)$$
(2)

Then according to (1), the estimates are updated as follows

$$\begin{aligned} \begin{array}{ll} {\widehat{Q}}_{n+1}(q_j) \leftarrow (1 + \lambda q_j) {\widehat{Q}}_{n}(q_j) &{}\quad \text { for } j = 1,2, \ldots , k\\ {\widehat{Q}}_{n+1}(q_{j}) \leftarrow (1 - \lambda (1-q_{j})) {\widehat{Q}}_{n}(q_{j}) &{}\quad \text { for } j = k+1, \ldots , K \end{array} \end{aligned}$$
(3)

which means that the estimates are increased for the quantiles with an estimate below \(x_n\) and decreased for the estimates above \(x_n\). Consequently, the monotone property may get violated. In the next section, we will present a novel update scheme that satisfies the monotone property of quantiles while converging both theoretically and experimentally to the true quantiles.

2.1 Multiple quantile DUMIQE

When updating \({\widehat{Q}}_{n}(q_k)\), we ensure that the value of \(\lambda\) is such that \({\widehat{Q}}_{n}(q_k)\) never cross the “neighbours” \({\widehat{Q}}_{n}(q_{k-1})\) and \({\widehat{Q}}_{n}(q_{k+1})\). Assume at time n that the monotone property is satisfied and that the sample \(x_n\) gets a value between \({\widehat{Q}}_{n}(q_k)\) and \({\widehat{Q}}_{n}(q_{k+1})\) as given by (2). We now use a \(\lambda\) (denoted \({\widetilde{\lambda }}_k\) below) such that the distance between \({\widehat{Q}}_{n+1}(q_k)\) and \({\widehat{Q}}_{n+1}(q_{k+1})\) is equal to some portion, \(\alpha\), of the distance from the previous iteration, i.e.

$$\begin{aligned} &{\widehat{Q}}_{n+1}(q_{k+1}) - {\widehat{Q}}_{n+1}(q_k) = \alpha \Big({\widehat{Q}}_{n}(q_{k+1}) - {\widehat{Q}}_{n}(q_k) \Big) \\ & \Big(1 - {\widetilde{\lambda }}_k (1-q_{k+1})\Big) {\widehat{Q}}_{n}(q_{k+1}) - \Big(1 + {\widetilde{\lambda }}_k q_k \Big) {\widehat{Q}}_{n}(q_k) = \alpha \Big({\widehat{Q}}_{n}(q_{k+1}) - {\widehat{Q}}_{n}(q_k) \Big) \end{aligned}$$
(4)

By solving (4) with respect to \({\widetilde{\lambda }}_k\), we obtain

$${\widetilde{\lambda }}_k = (1-\alpha ) \frac{{\widehat{Q}}_{n}(q_{k+1}) - {\widehat{Q}}_{n}(q_{k})}{(1-q_{k+1}){\widehat{Q}}_{n}(q_{k+1}) + q_k{\widehat{Q}}_{n}(q_{k})}$$
(5)

To avoid crossing, we must ensure that \({\widehat{Q}}_{n}(q_{k})\) stays above the estimate below, \({\widehat{Q}}_{n}(q_{k-1})\) as well. Thus, a sufficient criterion to guarantee that \({\widehat{Q}}_{n}(q_k)\) stays between \({\widehat{Q}}_{n}(q_{k-1})\) and \({\widehat{Q}}_{n}(q_{k+1})\) is to use the minimum of \({\widetilde{\lambda }}_k\) computed from \({\widehat{Q}}_{n}(q_{k})\) and \({\widehat{Q}}_{n}(q_{k+1})\) and computed from \({\widehat{Q}}_{n}(q_{k})\) and \({\widehat{Q}}_{n}(q_{k-1})\). This gives the following

$$\begin{aligned} {\widetilde{\lambda }}_k&= (1-\alpha ) \min \left\{ \frac{{\widehat{Q}}_{n}(q_{k}) - {\widehat{Q}}_{n}(q_{k-1})}{(1-q_{k}){\widehat{Q}}_{n}(q_{k}) + q_{k-1} {\widehat{Q}}_{n}(q_{k-1})}, \frac{{\widehat{Q}}_{n}(q_{k+1}) - {\widehat{Q}}_{n}(q_{k})}{(1-q_{k+1}){\widehat{Q}}_{n}(q_{k+1}) + q_k{\widehat{Q}}_{n}(q_{k})} \right\} \\&= (1-\alpha ) H\left( {\widehat{Q}}_{n}(q_{k}); {\widehat{Q}}_{n}(q_{k-1}), {\widehat{Q}}_{n}(q_{k+1}) \right) \end{aligned}$$
(6)

By using \(\lambda = {\widetilde{\lambda }}_k\) from (6) in (1) when updating the estimates \({\widehat{Q}}_{n}(q_k), k = 1,2,\ldots ,K\), the monotone property will be satisfied for all the quantile estimates. Of course, the lowest quantile estimate \({\widehat{Q}}_{n}(q_1)\) only needs to satisfy the monotone property against \({\widehat{Q}}_{n}(q_2)\), and therefore, H becomes

$$H\left( {\widehat{Q}}_{n}(q_{1}), {\widehat{Q}}_{n}(q_{2}) \right) = \frac{{\widehat{Q}}_{n}(q_{2}) - {\widehat{Q}}_{n}(q_{1})}{(1-q_{2}){\widehat{Q}}_{n}(q_{2}) + q_{1} {\widehat{Q}}_{n}(q_{1})}$$
(7)

and similarly for the highest quantile estimate

$$H\left( {\widehat{Q}}_{n}(q_{K-1}), {\widehat{Q}}_{n}(q_{K}) \right) = \frac{{\widehat{Q}}_{n}(q_{K}) - {\widehat{Q}}_{n}(q_{K-1})}{(1-q_{K}){\widehat{Q}}_{n}(q_{K}) + q_{K-1} {\widehat{Q}}_{n}(q_{K-1})}$$
(8)

Substituting \({\widetilde{\lambda }}_k\) in (6) for \(\lambda\) in (1) and defining \(\beta = 1 - \alpha\), we obtain the following update rules

$$\begin{aligned} \begin{array}{ll} {\widehat{Q}}_{n+1}(q_k) \leftarrow \left( 1 + \beta H\left( {\widehat{Q}}_{n}(q_{k}); {\widehat{Q}}_{n}(q_{k-1}), {\widehat{Q}}_{n}(q_{k+1}) \right) q_k\right) {\widehat{Q}}_{n}(q_k) &{}\text { if } {\widehat{Q}}_{n}(q_k) < x_n \\ {\widehat{Q}}_{n+1}(q_k) \leftarrow \left( 1 - \beta H\left( {\widehat{Q}}_{n}(q_{k}); {\widehat{Q}}_{n}(q_{k-1}), {\widehat{Q}}_{n}(q_{k+1}) \right) (1-q_k)\right) {\widehat{Q}}_{n}(q_k)&{}\text { if } {\widehat{Q}}_{n}(q_k) \ge x_n \end{array} \end{aligned}$$
(9)

for \(k = 2, \ldots , K-1\). For \(k=1\) and \(k=K\), it results in

$$\begin{aligned} \begin{array}{ll} {\widehat{Q}}_{n+1}(q_1) \leftarrow \left( 1 + \beta H\left( {\widehat{Q}}_{n}(q_{1}), {\widehat{Q}}_{n}(q_{2}) \right) q_1\right) {\widehat{Q}}_{n}(q_1) &{}\text { if } {\widehat{Q}}_{n}(q_1) < x_n \\ {\widehat{Q}}_{n+1}(q_1) \leftarrow \left( 1 - \beta H\left( {\widehat{Q}}_{n}(q_{1}), {\widehat{Q}}_{n}(q_{2}) \right) (1-q_1)\right) {\widehat{Q}}_{n}(q_1)&{}\text { if } {\widehat{Q}}_{n}(q_1) \ge x_n \end{array} \end{aligned}$$
(10)

and

$$\begin{aligned} \begin{array}{ll} {\widehat{Q}}_{n+1}(q_K) \leftarrow \left( 1 + \beta H\left( {\widehat{Q}}_{n}(q_{K-1}), {\widehat{Q}}_{n}(q_{K}) \right) q_K\right) {\widehat{Q}}_{n}(q_K) &{}\text { if } {\widehat{Q}}_{n}(q_K) < x_n \\ {\widehat{Q}}_{n+1}(q_K) \leftarrow \left( 1 - \beta H\left( {\widehat{Q}}_{n}(q_{K-1}), {\widehat{Q}}_{n}(q_{K}) \right) (1-q_K)\right) {\widehat{Q}}_{n}(q_K)&{}\text { if } {\widehat{Q}}_{n}(q_K) \ge x_n \end{array} \end{aligned}$$
(11)

The parameter \(\beta \in [0,1)\) controls the size of the update when a new sample arrives, and the H-functions ensure that the monotone property will be satisfied in every iteration. Please note that since the \(H-\) functions depend of k, the increment lengths vary with k.

In the rest of the paper, we will refer to the method in Algorithm 2 as MDUMIQE which is an abbreviation for multiple DUMIQE. Please see Algorithm 2 for a detailed representation of the algorithm. The structure of the algorithm is quite simple and intuitive. First in lines 3–9, the update size \(\lambda\) is computed to ensure no monotone violations. In lines 10 to 14, the quantiles estimates are updated using the ordinary DUMIQE with the \(\lambda\) computed in lines 3–9. Similar to the original DUMIQE, the MDUMIQE requires to only store a single value for each quantile estimate, namely the estimate itself. Further, only two operations are necessary per quantile estimate in every iteration, namely to adjust the step size and update the quantile estimate. In other words, the MDUMIQE algorithm is extremely memory and computationally efficient. Finally, we observe that, similar to the DUMIQE, MDUMIQE does not use the observations \(x_n\) directly and thus is very robust to outliers. Please see [15] for a further demonstration of the robustness of the DUMIQE and MDUMIQE.

figure b

Now we will present a theorem that catalogues the properties of the estimators \({\widehat{Q}}_{n}(q_k),\,\, k = 1,2,\ldots , K\) given in (9)–(11) for a stationary data stream, i.e. \(X_n = X \sim f(x), \,\, n=1,2,\ldots\). We assume that all the estimators \({\widehat{Q}}_{n}(q_k) > 0,\,\, k = 1,2,\ldots , K\) and the true quantiles \(Q(q_k) > 0,\,\, k = 1,2,\ldots , K\). A sufficient condition to obtain \(Q(q_k) > 0,\, k = 1,2,\ldots ,K\) is that the random variable X only takes positive values.

Theorem 1

Let\(Q(q_k) = {F_X}^{-1}(q_k),\, k = 1,2,\ldots ,K\)be the true quantiles to be estimated and suppose that\(Q(q_k)> 0,\, k=1,2,\ldots ,K\). In addition, we suppose that\({\widehat{Q}}_1(q_k) >0,\, k = 1,2,\ldots ,K\). Applying the updating rules (9)– (11), we obtain

$$\lim _{n \beta \rightarrow \infty , \beta \rightarrow 0} {\widehat{Q}}_{n}(q_k) = Q(q_k),\quad k=1,2,\ldots ,K$$

The proof of the theorem can be found in “Appendix 1”. Although the quantile estimators \({\widehat{Q}}_{n}(q_k),\,\, k = 1,2,\ldots , K\) given in (9) to (11) are mainly designed to estimate quantiles for dynamic environments, it is an important requirement of the estimators that they converge to the true quantiles for static data streams as given by Theorem 1.

We end this section with a few remarks.

Remark 1

A potential challenge with multiplicative update schemes, as given by (1) and by (9)–(11), is that if we start with a quantile estimate above zero, \({\widehat{Q}}_{0}(q_k) > 0\), the estimates will stay above zero. Similarly, if we start with a quantile estimate below zero, it will stay below zero. A simple solution is to estimate the quantiles of a transformation of the data \(h(X_n)\) where \(h(\cdot )\) is a monotonically increasing function and \(h(x)>0\, \forall x\). A natural alternative is \(h(x) = \exp (x)\).

Remark 2

We see that the updating rules (9)–(11) only update based on \({\widehat{Q}}_{n}(q_k) < x_n\) or \({\widehat{Q}}_{n}(q_k) > x_n\) and not the value of \(x_n\). This means that the algorithm is very robust against outliers, which is important for real-life data streams.

Remark 3

The strategy of adjusting the value \(\lambda\) in order to avoid the monotone violation of quantiles, as described in Sect. 2.1, can also be used for the additive alternative to DUMIQE in (1) given by

$$\begin{aligned} \begin{array}{ll} {\widehat{Q}}_{n+1}(q_k) \leftarrow {\widehat{Q}}_{n}(q_k) + \lambda q_k &{}\quad \text { if } {\widehat{Q}}_{n}(q_k) < x_n \\ {\widehat{Q}}_{n+1}(q_k) \leftarrow {\widehat{Q}}_{n}(q_k) - \lambda (1-q_k) &{}\quad \text { if } {\widehat{Q}}_{n}(q_k) \ge x_n \end{array} \end{aligned}$$
(12)

In our experiments, MDUMIQE outperformed this additive alternative and the experimental results thus limit to MDUMIQE.

3 Experiments

In this section, we evaluate the performance of the estimators presented in this paper. We compare against the only alternative multiple quantile tracking algorithm we are aware of, namely the method of Cao et al. [5]. It would be interesting to evaluate the performance of different methods for real-life data, but this is challenging to do in a systematic way for dynamical data streams as the ground truth generally is missing. Before proceeding to systematic experiments based on synthetic data, we just recall Fig. 1 showing that MDUMIQE can be used to efficiently track quantiles of challenging real-life data streams.

We look at two different cases where we assume that the data are outcomes from a normal distribution or from a \(\chi ^2\) distribution. For the normal distribution case, we assume that the expectation of the distribution varies with time

$$\mu _n = a \sin \left( \frac{2\pi }{T} n \right) , \quad n = 1,2,3, \ldots$$

which is the sinus function with period T. Further, we assume that the standard deviation of the distribution does not vary with time and is equal to one. For the \(\chi ^2\) distribution case, we assume that the number of degrees of freedom varies with time as follows:

$$\nu _n = a \sin \left( \frac{2\pi }{T} n \right) + b, \quad n = 1,2,3, \ldots$$

where \(b > a\) such that \(\nu _n > 0\) for all n. In the experiments below, we used \(a = 2\) and \(b=6\).

Figure 2 shows a small section of the estimation processes using DUMIQE and MDUMIQUE. The grey dots show the samples from the data stream and are the same in both panels. The data are generated from the normal distribution above with \(T=75\). The grey and the black curves show estimates of the 0.4 and the 0.6 quantiles of the data, respectively. We see that using DUMIQE, the monotone property is violated in several iterations, while MDUMIQE satisfies the monotone property and at the same time is able to track the quantiles efficiently.

Fig. 2
figure 2

Estimation processes using DUMIQE and MDUMIQE. The grey dots show samples from the data stream distribution, while the black and the grey curves show estimates of the 0.4 and the 0.6 quantiles of the data, respectively

We now turn to performing a thorough analysis of how well the proposed method in Sect. 2 estimates the quantiles of data streams. We considered two different periods, namely \(T=800\) (rapid variation) and \(T=8000\) (slow variation), i.e. in total four different data streams. In addition, for each of the four data streams, we estimated quantiles around the median and in the tail of the distribution. We estimated three or nine quantiles representing cases where the distance between the quantiles is either large or small, respectively. Obviously, if the quantiles are close to each other, the monotone property will be violated more frequently, making the estimation problem more difficult. In more detail, for the different cases we estimated the following quantiles:

  • For the normal distribution and the quantiles around the median, we estimated the quantiles related to the following probabilities \(q_k = \Phi (-\,0.8 + 0.2(k-1)), \,\,\, k=1,2,\ldots,9\). For the case with three quantiles, we only used \(k=1,5\) and 9. \(\Phi (\cdot )\) refers to the cumulative distribution function of the standard normal distribution. Recall that in dynamically changing data streams, as in these experiments, the value of a quantile related to a specific probability varies with time.

  • For the normal distribution and the quantiles in the tail of the distribution, we use \(q_k = \Phi (0.8 + 0.2(k-1)), \,\,\, k=1,2,\ldots ,9\). For the case with three quantiles, we only used \(k=1,5\) and 9.

  • For the \(\chi ^2\) distribution and the quantiles around the median, we estimated the quantiles related to the following probabilities \(q_k = F(4.2 + 0.3(k-1); \nu = 6), \,\,\, k=1,2,\ldots ,9\) where \(F(\cdot ;\nu )\) refers to the cumulative distribution function of the \(\chi ^2\) distribution with \(\nu\) degrees of freedom. For the case with three quantiles, we only used \(k=1,5\) and 9.

  • Finally, for the \(\chi ^2\) distribution and the quantiles in the tail of the distribution, we estimated the quantiles related to the following probabilities \(q_k = F(12 + 0.4(k-1); \nu = 6), \,\,\, k=1,2,\ldots ,9\). For the case with three quantiles, we only used \(k=1,5\) and 9.

The probabilities related to quantiles in the median and in tail of the distribution are centred around the probabilities 0.5 and 0.95, respectively. When estimating nine quantiles, the choices above resulted in a monotone property violation at about every third iteration using a typical value of \(\lambda = 0.05\) in (1). Similarly when estimating three quantiles, we got a monotone property violation at about every eleventh iteration.

To measure estimation error, we use the average of the root-mean-square error (RMSE) for each quantile. In order to get a good overview of the performance of the algorithms, we measure the estimation error for a large set of different values of the tuning parameters \(\lambda\) and \(\beta\).

The results for the normal and \(\chi ^2\) distribution cases when estimating three quantiles are shown in Figs. 3 and 4, respectively. We proceed to discussing the normal distribution cases. For all the four cases, we observe that MDUMIQE outperforms DUMIQE for the optimal values of \(\beta\) and \(\lambda\) which are chosen (results in the smallest RMSE). We also see that estimation performance of the MDUMIQE is less sensitive to the choice of \(\beta\) than DUMIQE on the choice of \(\lambda\). This is a crucial remark since for real-life applications we do not know the optimal values of \(\beta\) and \(\lambda\) that yield the best results. Hence, not only are we able to satisfy the monotone property of quantiles, we also improve estimation precision compared to DUMIQE. For the \(\chi ^2\) distribution cases, we see that MDUMIQE and DUMIQE perform about equally well except that DUMIQE performs better when \(T = 800\) and when estimating quantiles in the tail of the distribution. The reason will be explained below.

Fig. 3
figure 3

Estimation error for data from the normal distribution when estimating three quantiles

Fig. 4
figure 4

Estimation error for data from the \(\chi ^2\) distribution when estimating three quantiles

The results for the normal and \(\chi ^2\) distribution cases when estimating nine quantiles are shown in Figs. 5 and 6, respectively. We start by discussing the normal distribution cases. We see that for \(T = 800\), the DUMIQE performs better than the MDUMIQE, especially when estimating in the tail of the distribution. On the other hand, for \(T=8000\) and estimating the median, the MDUMIQE outperforms DUMIQE. The explanation for why DUMIQE performs better when \(T=800\) is that for such a rapidly changing data stream, large updates of the quantile estimates must be used to track the true quantiles. MDUMIQE is required to satisfy the monotone property which sets a limitation on how far MDUMIQE can update the estimates in each iteration. More specifically, for MDUMIQE, the estimate \({\widehat{Q}}_{n}(q_{k})\) must always be between \({\widehat{Q}}_{n}(q_{k-1})\) and \({\widehat{Q}}_{n}(q_{k+1})\), recall (6). Whenever the difference between \({\widehat{Q}}_{n}(q_{k-1})\) and \({\widehat{Q}}_{n}(q_{k+1})\) is small, we can only do small updates of \({\widehat{Q}}_{n}(q_{k})\). For the \(\chi ^2\) distribution, the two methods perform about equally well for \(T=8000\) and estimating the median, and for the other cases, DUMIQE performs better than MDUMIQE. An appealing property of the MDUMIQE approach (in addition to satisfying the monotone property) is that the estimation performance is less sensitive to the choice of \(\beta\) than DUMIQE is on the choice of \(\lambda\). Using \(\beta = 0.5\), we get satisfactory results in all the cases. Such a “universal” \(\lambda\) does not exist for DUMIQE.

Fig. 5
figure 5

Estimation error for data from the normal distribution when estimating nine quantiles

Fig. 6
figure 6

Estimation error for data from the \(\chi ^2\) distribution when estimating nine quantiles

For comparison purposes, we tested the multiple quantile estimation method in [5] as well for the estimation tasks described above. This is the only viable method we have found in the literature for estimating multiple quantiles in dynamically changing data streams. The method has two tuning parameters, a weight parameter similar to \(\lambda\) and \(\beta\) for the methods in this paper, and a parameter that controls the width of intervals to estimate the distribution of the data stream around a quantile. To achieve the best possible results, we ran the method for a large set of values for the two parameters. The best estimation results (smallest RMSE) are shown in Tables 1 and 2 for the cases with three and nine quantiles, respectively. Comparing these results with the results in Figs. 3, 4, 5 and 6, we see that MDUMIQE clearly outperforms Cao et al. [5].

Table 1 RMSE estimation error using the method in Cao et al. [5] when estimating three quantiles
Table 2 RMSE estimation error using the method in Cao et al. [5] when estimating nine quantiles

4 Closing remarks

In this paper, we present a novel algorithm for keeping online estimates of multiple quantiles in a dynamically changing data stream (tracking). The algorithm is an extension of the efficient DUMIQE algorithm from [27], developed to avoid monotone violations of the quantile estimates. A theoretical proof is given that it ensures the convergence of each quantile estimate to its true quantile.

The experimental results in Sect. 3 show that the suggested algorithm performs very well. For most of the experiments, the method performs equally well or better than DUMIQE that does not satisfy the monotone property of quantiles. The suggested algorithm is well suited to track multiple quantiles for real-life data streams as it adapts efficiently to dynamic changes in the data streams, is very computationally and memory efficient and is robust to outliers.

Another advantage of the suggested algorithm is that the estimation performance is less sensitive to the choice of the tuning parameter compared to DUMIQE. Choosing a \(\beta = 0.5\) performed well in all the experiments. This is a crucial property for real-life data streams since the distribution of data streams may vary slowly in some time periods and more rapidly in others (see Fig. 1 for an example). Using the DUMIQE, one must choose a tuning parameter that performs well either where the data stream varies slowly or rapidly. Since the performance of the algorithm suggested in this paper is less sensitive to the choice of the tuning parameter, it will perform well both when the data stream varies slowly and rapidly. In Fig. 1, we see that the algorithm performs well both when the data stream varies slowly and rapidly. In addition, we saw that the algorithm outperformed the state-of-the-art method of Cao et al. [5] with a clear margin.

The suggested algorithm experiences some reduction in performance for rapidly changing data streams. For such data streams, large updates of the quantile estimates are necessary to track the true quantiles efficiently and the requirement of satisfying the monotone property of quantiles sets a limit on how large increments that are possible, and thus reduces the performance of the algorithm. An interesting challenge for future research is to develop an incremental quantile estimation algorithm that performs better in rapidly changing dynamically changing data streams when many quantiles need to be estimated.