1 Introduction

The volumes of automatically generated data are constantly increasing [27] and such data usually must be analyzed in real time [18]. Unfortunately, conventional statistical and data mining techniques are not suited to such real-time analysis [20]. Consequently, the analysis of automatically generated data has been transitioning from being predominantly offline (or batch) to primarily online (or streaming) [18]. A wide range of streaming algorithms aimed at real-time analysis are continuously being developed, e.g. for clustering, filtering, cardinality estimation, estimation of moments or quantiles, prediction and anomaly detection.

In this paper we consider the problem of estimating quantiles of streaming data. Streaming quantile estimation has been considered for a wide range of applications, such as portfolio risk measurement in the stock market [1, 15], fraud detection [40], signal processing and filtering [32], climate change monitoring [41], SLA violation monitoring [30, 31], network monitoring [7, 22], Monte Carlo simulation [36], structural health monitoring [16] and non-parametric statistical testing [21].

The first and second moments of the data, i.e. the mean and variance, are the most commonly used features in machine learning. However, in many real-life applications these features can be misleading, and quantiles are better suited to capture the different aspects of the data [11, 12]. A feature selection technique can further be applied to select the most appropriate quantile features, as done in [9, 29, 35].

Suppose that we are interested in estimating the quantile associated with some probability q. The natural approach is to use the q-quantile of the empirical distribution of the samples. Unfortunately, this conventional approach has clear disadvantages for data streams, as both computation time and memory requirements grow linearly with the number of samples received so far from the data stream. Such methods are thus infeasible for large data streams.

Several algorithms have been proposed to deal with these challenges. Most of the proposed methods fall under the category of what can be called histogram or batch based methods, which efficiently maintain a histogram estimate of the data stream distribution such that only a small storage footprint is required. Another family of methods are the so-called incremental update methods, which perform a small update of the quantile estimate every time a new sample is received from the data stream. Generally, the current estimate is a convex combination of the estimate at the previous time step and a quantity depending on the current observation. A thorough review of state-of-the-art streaming quantile estimation methods is given in the related work section (Section 2).

In data stream applications, a common situation is that the distribution of the samples from the data stream varies with time; such a system or environment is referred to as a dynamical system in the literature. Given a dynamical system, two main problems are considered: i) dynamically updating estimates of quantiles of all data received from the stream so far, and ii) estimating quantiles of the current distribution of the data stream (tracking). Incremental methods are well suited to address the tracking problem ii), while histogram and batch based methods have mainly been used to address problem i). Histogram and batch based methods are not well suited for the tracking problem, for which incremental methods typically are the only viable lightweight alternatives [4].

To address the tracking problem ii), several incremental quantile estimators have been suggested [3,4,5, 22, 24, 34, 38]. The intuition behind these estimators is simple: if the received sample has a value below some threshold, e.g. the current quantile estimate, the estimate is decreased; whenever the received sample has a value above the same threshold, the estimate is increased. Even though these estimators document state-of-the-art tracking performance [38], none of them use the values of the received samples directly to update the estimate, only whether the value of the samples is above or below some varying threshold. Intuitively, this seems like a waste of information received from the data stream. In this paper, we thus present an estimator that uses the values of the received samples directly, separating it from all incremental estimators suggested in the literature. The estimator is constructed such that the update step size is proportional to the distance between the current estimate and the value of the sample. Thus, if the current estimate is off-track compared to the data stream, the estimator will perform large jumps to rapidly get back on track. A theoretical proof is provided to document the convergence properties of the estimator, in addition to extensive simulation experiments. The experiments show that the estimator outperforms several state-of-the-art quantile tracking algorithms.

The exponentially weighted average (EWA) of observations is known to be a state-of-the-art estimator for tracking the expectation of a dynamically varying data stream [14]. Interestingly, we will show that the quantile estimator suggested in this paper is in fact an instance of a generalized EWA, such that quantiles rather than expectations are tracked. To the best of our knowledge, this is the first EWA based quantile estimator found in the literature.

The paper is organized as follows. In Section 3, we present the novel quantile estimator based on a generalized EWA of observations. In Section 4, we present a quantile estimation algorithm based on the estimator in Section 3. In Section 5, we perform extensive experiments that document the superiority of the suggested algorithm. Finally, in Section 6 we apply the quantile estimator to real-life data related to the problem of efficient online control of indoor climate. More specifically, the estimator is used to detect when a machine learning model should be retrained/updated, which is commonly referred to as concept drift detection [13].

2 Related work

In this section, we review some of the related work on estimating quantiles from data streams. However, as we will explain, these related works have memory requirements that render our work radically distinct from them; in fact, our approach requires storing only a single value in order to update the estimate. The most representative work for this type of “streaming” quantile estimator is the seminal work of Munro and Paterson [25], who described a p-pass algorithm for selection using O(n^{1/(2p)}) space for any p ≥ 2. Cormode and Muthukrishnan [8] proposed a more space-efficient data structure, called the Count-Min sketch, which is inspired by Bloom filters, where one estimates the quantiles of a stream as the quantiles of a random sample of the input. The key idea is to maintain a random sample of an appropriate size to estimate the quantile, where the premise is to select a subset of elements whose quantile approximates the true quantile. From this perspective, the latter body of research requires an amount of memory that increases with the required accuracy of the estimator [37]. Furthermore, if the underlying distribution changes over time, those methods suffer from a large bias in the summary information since the stored data might be stale [4].

As Arandjelovic remarks [2], most quantile estimation algorithms are not single-pass algorithms and thus are not applicable to streaming data. On the other hand, the available single-pass algorithms are concerned with the exact computation of the quantile and thus require storage of the order of the size of the data, which is clearly infeasible in the context of big data streams. Thus, we submit that all work on quantile estimation using more than one pass, or storage of the same order as the number of observations seen so far, is not relevant in the context of this paper.

When it comes to memory efficient methods that require a small storage footprint, histogram based methods form an important class. A representative work in this direction is that of Schmeiser and Deutsch [28], who proposed to use equidistant bins whose boundaries are adjusted online. Arandjelovic et al. [2] instead attempt to maintain bins in a manner that maximizes the entropy of the corresponding estimate of the historical data distribution, with the bin boundaries again adjusted in an online manner. Nevertheless, histogram based methods have problems addressing the problem of tracking quantiles of the current data stream distribution [4] and are mainly used to recursively update quantiles for all data received so far. Finally, Lou et al. [23] perform extensive experiments comparing several histogram based algorithms.

Another group of methods are incremental quantile algorithms, which are particularly suitable for tracking quantiles of dynamically varying data stream distributions. In [3,4,5,6], the authors propose modifications of the stochastic approximation algorithm [33]. While Tierney [33] uses a sample mean update based on previous quantile estimates, [3,4,5,6] apply an exponential decay to the usage of old estimates, making them able to track quantiles of non-stationary data stream distributions. Indeed, a “weighted” update scheme is applied to incrementally build local approximations of the distribution function in the neighborhood of the quantiles. A more recent approach in this direction is the Frugal algorithm by Ma et al. [24]. The RUMIQE and DUMIQE algorithms by Yazidi and Hammer [38] use multiplicative updates rather than the additive updates of the other incremental methods. A nice property of the RUMIQE and DUMIQE algorithms, and of the estimator suggested in this paper, is that the update size is automatically adjusted to the scale/range of the data, which makes the estimators robust to substantial changes in the data stream. The DQTRE and DQTRSE algorithms by Tiwari and Pandey [34] aim to achieve the same by estimating the range of the data using peak and valley detectors. However, a disadvantage of these algorithms is that several tuning parameters are required to estimate the range of the data, which renders the algorithms difficult to tune.

3 Quantile estimator using a generalized exponentially weighted average of observations

Let \(X_{n}\) denote a stochastic variable representing the possible outcomes from a data stream at time \(n\), and let \(x_{n}\) denote a random sample (realization) of \(X_{n}\). We assume that \(X_{n}\) is distributed according to some distribution \(f_{n}(x)\) that varies dynamically over time \(n\). We denote the cumulative distribution of \(X_{n}\) by \(F_{n}(x)\), i.e. \(P(X_{n} \leq x) = F_{n}(x)\). Further, let \(Q_{n}(q)\) denote the quantile associated with probability \(q\), i.e. \(P(X_{n} \leq Q_{n}(q)) = F_{n}(Q_{n}(q)) = q\). A summary of the most central notation is given in Table 1.

Table 1 Key notations with their meaning

In 2017, Hammer and Yazidi suggested the DUMIQE algorithm [38] given by

$$ \begin{array}{ll} \widetilde{Q}_{n + 1}(q) \leftarrow \widetilde{Q}_{n}(q) + \lambda q \widetilde{Q}_{n}(q) & \text{if } x_{n} > \widetilde{Q}_{n}(q) \\ \widetilde{Q}_{n + 1}(q) \leftarrow \widetilde{Q}_{n}(q) - \lambda (1-q) \widetilde{Q}_{n}(q) & \text{if } x_{n} \leq \widetilde{Q}_{n}(q) \end{array} $$
(1)
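To make the update concrete, the following is a minimal Python sketch of one DUMIQE step per (1); the function name and signature are ours, not from [38].

```python
def dumiqe_update(q_est, x, lam, q):
    """One DUMIQE step, cf. (1). The update is multiplicative: the step
    size scales with the magnitude of the current estimate q_est."""
    if x > q_est:
        return q_est + lam * q * q_est
    return q_est - lam * (1 - q) * q_est
```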

The DUMIQE documents state-of-the-art tracking performance. However, a weakness of the DUMIQE and the other proposed quantile tracking algorithms is that none of them use the values of the received samples directly to update the estimate, only whether the value of the samples is above or below some varying threshold. Intuitively, this seems like a waste of information received from the data stream. We now propose an incremental quantile estimator where the update step size is proportional to the distance between the received sample and the current estimate. Thus, if the current estimate is off-track compared to the data stream, the estimator will take large jumps to rapidly get back on track, and more efficient tracking is expected. The suggested estimator is described formally as follows

$$ \begin{array}{ll} \widehat{Q}_{n + 1}(q) \leftarrow \widehat{Q}_{n}(q) + \lambda c_{n} \frac{q}{\mu_{n}^{+} - \widehat{Q}_{n}(q)} \left|x_{n} - \widehat{Q}_{n}(q) \right| & \text{if } x_{n} > \widehat{Q}_{n}(q) \\ \widehat{Q}_{n + 1}(q) \leftarrow \widehat{Q}_{n}(q) - \lambda c_{n} \frac{1-q}{\widehat{Q}_{n}(q) - \mu_{n}^{-}} \left|x_{n} - \widehat{Q}_{n}(q) \right| & \text{if } x_{n} \leq \widehat{Q}_{n}(q) \end{array} $$
(2)

where \(\mu_{n}^{+} = E(X_{n} \mid X_{n} > \widehat{Q}_{n}(q))\) and \(\mu_{n}^{-} = E(X_{n} \mid X_{n} < \widehat{Q}_{n}(q))\). Naturally, the conditional expectations satisfy the inequality

$$\mu_{n}^{-} < \widehat{Q}_{n}(q) < \mu_{n}^{+} $$

such that \(\mu _{n}^{+} - \widehat {Q}_{n}(q) > 0\) and \(\widehat {Q}_{n}(q) - \mu _{n}^{-} > 0\). The factors \(q/(\mu _{n}^{+} - \widehat {Q}_{n}(q))\) and \((1-q)/(\widehat {Q}_{n}(q) - \mu _{n}^{-})\) are included to ensure that the estimator converges to the true quantile value.

The constants \(c_{n}\) can be any sequence of positive and bounded values. The estimator performed well when the fractions in (2) were “normalized” as follows

$$ c_{n} = \left( \frac{q}{\mu_{n}^{+} - \widehat{Q}_{n}(q)} + \frac{1-q}{\widehat{Q}_{n}(q) - \mu_{n}^{-}} \right)^{-1} $$
(3)

Substituting (3) into (2) we get

$$ \begin{array}{ll} \widehat{Q}_{n + 1}(q) \leftarrow \widehat{Q}_{n}(q) + \lambda a_{n} \left|x_{n} - \widehat{Q}_{n}(q) \right| & \text{if } x_{n} > \widehat{Q}_{n}(q) \\ \widehat{Q}_{n + 1}(q) \leftarrow \widehat{Q}_{n}(q) - \lambda (1-a_{n}) \left|x_{n} - \widehat{Q}_{n}(q) \right| & \text{if } x_{n} \leq \widehat{Q}_{n}(q) \end{array} $$
(4)

where

$$ a_{n} = \frac{q}{\mu_{n}^{+} - \widehat{Q}_{n}(q)} \left/ \left( \frac{q}{\mu_{n}^{+} - \widehat{Q}_{n}(q)} + \frac{1-q}{\widehat{Q}_{n}(q) - \mu_{n}^{-}} \right) \right. $$
(5)

Please note that since \(\mu_{n}^{+} - \widehat{Q}_{n}(q) > 0\) and \(\widehat{Q}_{n}(q) - \mu_{n}^{-} > 0\), we have that \(0 < a_{n} < 1\). By factoring out \(\widehat{Q}_{n}(q)\) and \(x_{n}\) we get

$$ \begin{array}{ll} \widehat{Q}_{n + 1}(q) \leftarrow (1 - \lambda a_{n}) \widehat{Q}_{n}(q) + \lambda a_{n} x_{n} & \text{if } x_{n} > \widehat{Q}_{n}(q) \\ \widehat{Q}_{n + 1}(q) \leftarrow (1 - \lambda (1-a_{n})) \widehat{Q}_{n}(q) + \lambda (1-a_{n}) x_{n} & \text{if } x_{n} \leq \widehat{Q}_{n}(q) \end{array} $$

which can be written as

$$ \widehat{Q}_{n + 1}(q) \leftarrow (1 - b_{n}) \widehat{Q}_{n}(q) + b_{n} x_{n} $$
(6)

where \(b_{n} = \lambda \left(a_{n} + I\left(x_{n} \leq \widehat{Q}_{n}(q)\right)(1-2a_{n})\right)\) and \(I(A)\) is the indicator function returning one (zero) if \(A\) is true (false).
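As a minimal sketch (the function name and signature are ours), the idealized update (6) can be written as follows, assuming the conditional expectations \(\mu_{n}^{+}\) and \(\mu_{n}^{-}\) are known:

```python
def qewa_update_ideal(q_est, x, lam, q, mu_plus, mu_minus):
    """One step of the idealized update (6), assuming the conditional
    expectations mu_plus = E(X | X > q_est) and mu_minus = E(X | X < q_est)
    are known exactly."""
    # a_n in (5): weighs the update by the asymmetry of the distribution
    # on each side of the current estimate.
    a = (q / (mu_plus - q_est)) / (
        q / (mu_plus - q_est) + (1 - q) / (q_est - mu_minus)
    )
    # b_n in (6) reduces to lam * a above the estimate, lam * (1 - a) below.
    b = lam * a if x > q_est else lam * (1 - a)
    return (1 - b) * q_est + b * x
```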

Now we will present a theorem that establishes the convergence properties of the estimator \(\widehat{Q}_{n}(q)\) for a stationary data stream, i.e. \(X_{n} = X \sim F(x),\ n = 1, 2, \ldots\).

Theorem 1

Let \(Q(q) = F^{-1}(q)\) be the true quantile to be estimated. Applying the updating rule in (6), we obtain:

$$\lim\limits_{n \lambda \to \infty, \lambda \to 0} \widehat{Q}_{n}(q) = Q(q) $$

The proof of the theorem can be found in Appendix A. Although the quantile estimator \(\widehat{Q}_{n}(q)\) given in (6) is designed to track quantiles in dynamic environments, it is an important requirement that the estimator converges to the true quantile for static data streams, as verified by Theorem 1.
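As an illustration (not part of the formal argument), the convergence can be checked empirically on a stationary N(0,1) stream, for which the conditional expectations are available in closed form. The sketch below reuses the hypothetical `qewa_update_ideal` helper defined above, with an assumed step size.

```python
import numpy as np
from scipy.stats import norm

# Empirical check of Theorem 1 on a stationary N(0,1) stream.
rng = np.random.default_rng(2)
q, lam = 0.7, 0.01          # target probability and (small) assumed step size
Q = 0.0                     # initial estimate
for x in rng.normal(size=500_000):
    # Conditional means of a standard normal around the current estimate:
    # E[X | X > Q] and E[X | X < Q] via the inverse Mills ratio.
    mu_plus = norm.pdf(Q) / (1 - norm.cdf(Q))
    mu_minus = -norm.pdf(Q) / norm.cdf(Q)
    Q = qewa_update_ideal(Q, x, lam, q, mu_plus, mu_minus)

print(Q, norm.ppf(q))  # estimate should land close to the true 0.7-quantile (approx. 0.524)
```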

We end this section with a remark.

Remark 1

If the conditional expectations are symmetrically positioned on each side of the quantile estimate, then \(\mu_{n}^{+} - \widehat{Q}_{n}(q) = \widehat{Q}_{n}(q) - \mu_{n}^{-}\) and \(a_{n} = q\), i.e. the same weighting of the two sides as in DUMIQE. In other words, we can interpret the factors \(\mu_{n}^{+} - \widehat{Q}_{n}(q)\) and \(\widehat{Q}_{n}(q) - \mu_{n}^{-}\) as ensuring that the update rules take into account the asymmetry of the data stream distribution on each side of the quantile.

3.1 Connection to the EWA

A simple and intuitive approach to track the expectation of a data stream distribution, i.e. \(\mu_{n} = E(X_{n})\), is the weighted moving average

$$ \widehat{\mu}_{n} = \frac{1}{W_{n}} \sum\limits_{i = 0}^{n} w_{i} x_{i} $$
(7)

where \(W_{n} = \sum_{j = 0}^{n} w_{j}\). Using \(w_{n-h} = \cdots = w_{n} = 1\) and setting the other weights to zero, (7) reduces to the standard moving average over a window of the \(h+1\) most recent samples. Intuitively, it seems more reasonable to use weights with decreasing values, and the decrease should be more rapid than for the standard sample mean (\(w_{i} = 1/i\)) to be able to track the changes in the data stream.

Consider the following recursive update scheme

$$\begin{array}{@{}rcl@{}} \widehat{\mu}_{0} &\leftarrow& x_{0} \end{array} $$
(8)
$$\begin{array}{@{}rcl@{}} \widehat{\mu}_{n + 1} &\leftarrow& (1 - \alpha) \widehat{\mu}_{n} + \alpha x_{n} \end{array} $$
(9)

where the current estimate is a convex combination of the estimate at the previous time step and the observation. By substitution, we get

$$ \widehat{\mu}_{n + 1} = \alpha (x_{n} + (1-\alpha)x_{n-1} + (1-\alpha)^{2}x_{n-2} + {\cdots} + (1-\alpha)^{n-1}x_{1}) + (1-\alpha)^{n} x_{0} $$
(10)

Interestingly, from (10) we see that (8)-(9) can be interpreted as an EWA of observations. This estimator is highly popular and known to be the state-of-the-art approach for tracking the expectation of a dynamically varying data stream. Inspecting the incremental update form of our quantile estimator in (6), we see that it is identical to the update form of (9), except that the weight \(0 < b_{n} < 1\) varies with time. Thus, keeping the weight constant as in (9), the estimator tracks the expectation of the data stream distribution, while using the weights \(0 < b_{n} < 1\) from (6), the estimator tracks a quantile of the distribution.
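The equivalence between the recursive form (8)-(9) and the unrolled exponentially weighted form (10) is easy to verify numerically; the small sketch below does so (variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
alpha = 0.05

# Recursive form (8)-(9).
mu = x[0]
for xi in x[1:]:
    mu = (1 - alpha) * mu + alpha * xi

# Unrolled exponentially weighted form (10).
n = len(x) - 1
weights = alpha * (1 - alpha) ** np.arange(n)        # for x_n, x_{n-1}, ..., x_1
unrolled = weights @ x[:0:-1] + (1 - alpha) ** n * x[0]

assert np.isclose(mu, unrolled)
```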

4 Quantile estimation algorithm

The interpretation of the update rule in (6) as an EWA of observations (recall Section 3.1) and Theorem 1 constitute some intriguing theoretical results on the link between EWA and quantile estimation. However, the update rule in (6) cannot be used directly since the conditional expectations, \(\mu _{n}^{+}\) and \(\mu _{n}^{-}\), are unknown and need to be estimated. Probably the most natural approach is to track conditional expectations using an EWA of observations as given in (8)-(9). This results in the following update rules:

$$ \widehat{Q}_{n + 1}(q) \leftarrow (1 - \widehat{b}_{n}) \widehat{Q}_{n}(q) + \widehat{b}_{n} x_{n} $$
(11)

If \(x_{n} > \widehat{Q}_{n}(q)\):

$$ \widehat{\mu}_{n + 1}^{+} \leftarrow \widehat{Q}_{n + 1}(q) - \widehat{Q}_{n}(q) + (1-\gamma) \widehat{\mu}_{n}^{+} + \gamma x_{n} $$
(12)

$$ \widehat{\mu}_{n + 1}^{-} \leftarrow \widehat{Q}_{n + 1}(q) - \widehat{Q}_{n}(q) + \widehat{\mu}_{n}^{-} $$
(13)

Else:

$$ \widehat{\mu}_{n + 1}^{+} \leftarrow \widehat{Q}_{n + 1}(q) - \widehat{Q}_{n}(q) + \widehat{\mu}_{n}^{+} $$
(14)

$$ \widehat{\mu}_{n + 1}^{-} \leftarrow \widehat{Q}_{n + 1}(q) - \widehat{Q}_{n}(q) + (1-\gamma) \widehat{\mu}_{n}^{-} + \gamma x_{n} $$
(15)

Then:

$$ \widehat{a}_{n + 1} \leftarrow \frac{q}{\widehat{\mu}_{n + 1}^{+} - \widehat{Q}_{n + 1}(q)} \left/ \left( \frac{q}{\widehat{\mu}_{n + 1}^{+} - \widehat{Q}_{n + 1}(q)} + \frac{1-q}{\widehat{Q}_{n + 1}(q) - \widehat{\mu}_{n + 1}^{-}} \right) \right. $$
(16)

$$ \widehat{b}_{n + 1} \leftarrow \lambda\left( \widehat{a}_{n + 1} + I\left( x_{n} \leq \widehat{Q}_{n + 1}(q)\right)(1-2\widehat{a}_{n + 1})\right) $$
(17)

In each of (12)-(15), the term \(\widehat{Q}_{n + 1}(q) - \widehat{Q}_{n}(q)\) is included to ensure that the conditional expectation estimates are kept relative to the current quantile estimate \(\widehat{Q}_{n + 1}(q)\).

Thus, (11) tracks the overall trend of the dynamic data stream, while (12)-(15) are responsible for estimating the conditional expectations relative to the quantile estimate. Consequently, for most dynamic data streams it is reasonable to use a value of the EWA tuning parameter, γ, that is on a smaller scale than λ [19], which is verified in our experiments. In the rest of the paper, we refer to this EWA-based quantile estimation approach as QEWA. A sketch of the complete procedure is given below, and we end this section with a remark.
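The following is a minimal Python sketch of the complete QEWA procedure. The update rules follow (11)-(17), but the class name, the initialisation of the estimate, the conditional mean estimates and the first weight are our own assumptions, and guards against degenerate streams are omitted.

```python
class QEWA:
    """Sketch of the QEWA tracker, implementing the updates (11)-(17)."""

    def __init__(self, q, lam, gamma, x0):
        self.q, self.lam, self.gamma = q, lam, gamma
        self.Q = x0                # quantile estimate, seeded with a first value
        self.mu_plus = x0 + 1.0    # assumed initial conditional mean estimates
        self.mu_minus = x0 - 1.0
        self.b = lam * q           # assumed initial weight

    def update(self, x):
        Q_old = self.Q
        # (11): EWA-type update of the quantile estimate.
        self.Q = (1 - self.b) * Q_old + self.b * x
        # (12)-(15): the shift keeps the conditional mean estimates relative
        # to the current quantile estimate; an EWA update on the active side.
        shift = self.Q - Q_old
        if x > Q_old:
            self.mu_plus = shift + (1 - self.gamma) * self.mu_plus + self.gamma * x
            self.mu_minus = shift + self.mu_minus
        else:
            self.mu_plus = shift + self.mu_plus
            self.mu_minus = shift + (1 - self.gamma) * self.mu_minus + self.gamma * x
        # (16): asymmetry factor a_{n+1}.
        w_plus = self.q / (self.mu_plus - self.Q)
        w_minus = (1 - self.q) / (self.Q - self.mu_minus)
        a = w_plus / (w_plus + w_minus)
        # (17): weight b_{n+1} for the next sample.
        self.b = self.lam * (a + (x <= self.Q) * (1 - 2 * a))
        return self.Q
```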

Remark 2

We evaluated a second approach based on estimating the streaming distribution, \(f_{n}(x)\), and computing the unknown conditional expectations from the estimated distribution. The streaming distribution was estimated by tracking several quantiles \(Q_{n}(q_{1}), Q_{n}(q_{2}), \ldots, Q_{n}(q_{K})\) and interpolating a linear spline between the quantile estimates. However, experiments showed that the QEWA approach performed better than this spline approach, and the spline approach is therefore not pursued further in this paper.

5 Experiments based on synthetic data

In this section we perform a thorough comparison of the suggested QEWA algorithm against other quantile estimators in the literature. Figure 1 shows tracking of the quantile with probability q = 0.7 for the suggested QEWA algorithm and DUMIQE. The true quantile is given as the dashed black line. The tuning parameters are adjusted such that the estimation error in the stationary parts, after convergence, is the same for the two algorithms. We see that the proposed QEWA algorithm tracks the true quantile more efficiently after a switch than DUMIQE does. For the suggested algorithm, the step size is proportional to the difference between the observations and the quantile estimate (recall (4)); after a switch these differences are large, and the algorithm takes large steps to get back on track. DUMIQE, and the other state-of-the-art incremental algorithms, use the same step size independent of these differences, resulting in poorer tracking.

Fig. 1
figure 1

Quantile estimates in every iteration using the DUMIQE and the suggested algorithm QEWA using ratio γ/λ = 1/100

The results below show a more systematic evaluation of the performance of the suggested algorithm against seven state-of-the-art quantile estimators, namely the DUMIQE and RUMIQE by Yazidi and Hammer [38], the estimator due to Cao et al. [3], the Frugal approach by Ma et al. [24], the selection algorithm by Guha and McGregor [17], and the DQTRE and DQTRSE algorithms by Tiwari and Pandey [34]. For the DQTRE and DQTRSE algorithms we used the tuning parameter values recommended in [34], namely α = 0.1, β = (1 − α)λ, p_b = 1/10 and l = 1/4, which performed well in our experiments.

The estimator in this paper is designed to perform well for dynamically changing data streams and the experiments will focus on such streams.

We considered four different data cases. For the first case, the data stream distributions were normally distributed and the expectations, \(\mu_{n}\), varied smoothly as follows

$$\mu_{n} = a \sin \left( \frac{2\pi}{T} n \right), n = 1,2,3, \ldots $$

which is a sine function with period T. For the second case, the data stream distributions were also normally distributed, but the expectation switched between the values a and −a

$$\mu_{n} = \left\{ \begin{array}{ll} a & \text{if } n \bmod T \leq T/2 \\ -a & \text{else} \end{array} \right. $$

We assumed that the standard deviation of the normal distributions did not vary with time and was equal to one. For the two remaining cases, the data stream distributions were χ² distributed, one with smooth changes and one with rapid switches. For the smooth case, the number of degrees of freedom, \(\nu_{n}\), varied with time as follows

$$\nu_{n} = a \sin \left( \frac{2\pi}{T} n \right) + b, n = 1,2,3, \ldots $$

where b > a such that νn > 0 for all n. For the switch case, the number of degrees of freedom switched between values a + b and − a + b

$$\nu_{n} = \left\{ \begin{array}{ll} a+b & \text{if } n \bmod T \leq T/2 \\ -a+b & \text{else} \end{array} \right. $$

In the experiments we used a = 2 and b = 6.
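As a sketch of the simulation setup (the function and case names are ours), the four data streams can be generated as follows:

```python
import numpy as np

def generate_stream(case, T, n_samples, a=2.0, b=6.0, seed=0):
    """Generate one of the four synthetic data streams described above."""
    rng = np.random.default_rng(seed)
    n = np.arange(1, n_samples + 1)
    if case == "normal_smooth":
        mu = a * np.sin(2 * np.pi * n / T)
        return rng.normal(loc=mu, scale=1.0)
    if case == "normal_switch":
        mu = np.where(n % T <= T / 2, a, -a)
        return rng.normal(loc=mu, scale=1.0)
    if case == "chi2_smooth":
        nu = a * np.sin(2 * np.pi * n / T) + b   # b > a keeps nu > 0
        return rng.chisquare(df=nu)
    if case == "chi2_switch":
        nu = np.where(n % T <= T / 2, a + b, -a + b)
        return rng.chisquare(df=nu)
    raise ValueError(f"unknown case: {case}")
```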

We estimated quantiles of both the normally and χ² distributed data streams above using two different periods, namely T = 100 (rapid variation) and T = 500 (slow variation), i.e. in total eight different data streams. For each data stream we estimated the 50, 70 and 90% quantiles, ending up with a total of 24 different estimation tasks.

To measure estimation error, we used the root mean squared error (RMSE) for each quantile, given as:

$$ \text{RMSE} = \sqrt{ \frac{1}{N}\sum\limits_{n = 1}^{N} \left( Q_{n}(q) - \widehat{Q}_{n}(q)\right)^{2}} $$
(18)

where N is the total number of samples received from the data stream. In the experiments we used N = 10^6, which efficiently removed any Monte Carlo error from the experimental results. In order to obtain a good overview of the performance of the algorithms, we measured the estimation error for a large set of values of the tuning parameters of each algorithm.
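As an illustration of the evaluation procedure, the sketch below computes the RMSE (18) for the normal smooth case using the QEWA sketch from Section 4. The tuning values for λ and γ, and the shorter stream length, are assumptions for the example; the ratio γ/λ = 1/100 matches Fig. 1.

```python
import numpy as np
from scipy.stats import norm

T, N, q, a = 100, 10**5, 0.7, 2.0
rng = np.random.default_rng(1)
n = np.arange(1, N + 1)
mu = a * np.sin(2 * np.pi * n / T)
x = rng.normal(loc=mu, scale=1.0)
true_q = mu + norm.ppf(q)          # true quantile of N(mu_n, 1)

# Assumed tuning values, with a small ratio gamma/lambda = 1/100.
tracker = QEWA(q=q, lam=0.05, gamma=0.0005, x0=x[0])
est = np.array([tracker.update(xi) for xi in x])

rmse = np.sqrt(np.mean((true_q - est) ** 2))   # RMSE per (18)
print(rmse)
```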

Figures 2, 3, 4 and 5 illustrate the results of our experiments. For the normal distribution smooth case (Fig. 2), we see that the QEWA algorithm outperforms all the algorithms from the literature. In accordance with the analysis in Section 4, the QEWA algorithm performed best using a small value of the ratio γ/λ. The Cao et al. algorithm struggled with numerical problems for some choices of the tuning parameters, and therefore some of its curves are short.

Fig. 2
figure 2

Normal distribution smooth case: The left and right columns show results for T = 100 and T = 500, respectively. The rows from top to bottom show results when estimating the quantiles Qn(q = 0.5), Qn(q = 0.7) and Qn(q = 0.9), respectively. Ratio refers to the ratio between the tuning parameters, i.e. ratio = γ/λ. The upper x axis refers to the step size in the Frugal algorithm

Fig. 3
figure 3

Normal distribution switch case: The left and right columns show results for T = 100 and T = 500, respectively. The rows from top to bottom show results when estimating the quantiles Qn(q = 0.5), Qn(q = 0.7) and Qn(q = 0.9), respectively. Ratio refers to the ratio between the tuning parameters, i.e. ratio = γ/λ. The upper x axis refers to the step size in the Frugal algorithm

Fig. 4
figure 4

χ² distribution smooth case: The left and right columns show results for T = 100 and T = 500, respectively. The rows from top to bottom show results when estimating the quantiles Qn(q = 0.5), Qn(q = 0.7) and Qn(q = 0.9), respectively. Ratio refers to the ratio between the tuning parameters, i.e. ratio = γ/λ. The upper x axis refers to the step size in the Frugal algorithm

Fig. 5
figure 5

χ² distribution switch case: The left and right columns show results for T = 100 and T = 500, respectively. The rows from top to bottom show results when estimating the quantiles Qn(q = 0.5), Qn(q = 0.7) and Qn(q = 0.9), respectively. Ratio refers to the ratio between the tuning parameters, i.e. ratio = γ/λ. The upper x axis refers to the step size in the Frugal algorithm

For the normal distribution switch case (Fig. 3), we see that the QEWA algorithm again outperforms all the algorithms in the literature. Again we see that the QEWA performs best using a small value of the ratio γ/λ.

For the χ² distribution cases, we see that the QEWA algorithm also outperforms the other algorithms; for q = 0.9 it documents results competitive with the best performing alternatives. Also here, a small value of the ratio γ/λ is the preferable choice.

Among the alternative algorithms, there was no consistency in which algorithm came closest to the performance of the QEWA, but overall the DUMIQE and DQTRE seem to be the closest. However, all the alternative algorithms suffer from significantly poorer results than the QEWA in at least some cases; e.g. DQTRE performed poorly when estimating quantiles in the tails (q = 0.9), and DUMIQE for the switch cases.

Tables 2, 3, 4 and 5 show the results for the selection algorithm [17], which does not have any tuning parameters. We see that QEWA outperforms the selection algorithm by a clear margin in all the different cases.

Table 2 Normal distribution smooth case: Root mean squared estimation error for the selection algorithm [17]
Table 3 Normal distribution switch case: Root mean squared estimation error for the selection algorithm [17]
Table 4 χ² distribution smooth case: Root mean squared estimation error for the selection algorithm [17]
Table 5 χ² distribution switch case: Root mean squared estimation error for the selection algorithm [17]

In summary, the QEWA algorithm outperforms all the different state-of-the-art algorithms from the literature, and the best performance is achieved using a small value of the ratio γ/λ.

6 Real-life data experiments – concept drift detection

In most challenging data prediction tasks, the relation between input and output data evolves over time; thus, if static relationships are assumed, prediction performance will degrade with time. In the fields of machine learning and data mining this phenomenon is referred to as concept drift [13]. Different strategies have been suggested to detect when the performance of the predictive model degrades such that the model should be retrained/updated [13]. Current state-of-the-art strategies monitor the average predictive error, but for real-life applications it is often more relevant to ensure that the prediction error rarely goes above some critical threshold. In this example we demonstrate how to perform concept drift detection and adaptation against such a critical threshold by tracking an upper quantile of the prediction error distribution, e.g. the 80% quantile. As an application domain, we investigate the case of efficient control of indoor climate.

Heating, ventilation and air conditioning (HVAC) systems typically control indoor climate by reacting to the current room conditions, such as the indoor temperature. However, given the time required for an HVAC system to adjust to changes in the indoor climate, such strategies will always lag behind, resulting in poor control of indoor climate and energy usage. This raises the need for models that forecast future indoor temperature and for using these forecasts as input to the HVAC system. Zamora-Martínez et al. [39] propose to use artificial neural network (ANN) models to forecast future indoor temperature based on a total of 20 features, including outdoor climate variables such as temperature and precipitation amounts and indoor climate variables such as the CO2 level. Since more observations are received with time and the relation between input and output may evolve with time, the model is retrained in an online manner. The authors, however, do not take advantage of concept drift detection in order to efficiently decide when to retrain the model.

We now demonstrate how the quantile estimator suggested in this paper can be used for concept drift detection in the online indoor temperature forecasting problem described above. We consider the same dataset as in [39], where a new observation of the input and output variables is received every 15 minutes. We forecasted the indoor temperature 15 minutes into the future using an autoregressive (AR) model of order one. In addition to the current indoor temperature, the current values of the other 20 features were used as input to the forecasting model. Given the large number of features, regularization of the model parameters was required to obtain reliable forecasts, and we relied on LASSO regularization [10].

First, we trained the LASSO AR model on eight days of observations and used the model to predict 15 minutes into the future each time a new observation was received. The results are shown in Fig. 6. The figure demonstrates that if the model is not retrained after day eight, the forecasting error gradually increases with time (the red line). In other words, the data is subject to concept drift, and the forecasting model should be retrained as more observations are received. Instead of retraining the model at a fixed periodicity, which is clearly inefficient, a more sophisticated approach is to retrain the model only when concept drift is detected.

Fig. 6
figure 6

The left and right panels refer to the dining room and the bedroom, respectively. The x-axis refers to the number of days since the observations started. The gray curves show the forecasting error when predicting 15 minutes into the future. The red curves show the linear trends in the forecasting error

We now build a concept drift detection and model retraining procedure based on quantile tracking. We required that the indoor temperature forecasting error should rarely go above two degrees centigrade. We used the QEWA estimator to track the 80% quantile of the forecasting error data stream (the gray curves in Fig. 6), and if the quantile estimate went above two degrees centigrade, the model was retrained. We trained the model for the first time after 24 hours of observations. The results are shown in Fig. 7. After the initial training, the 80% quantile estimate of the forecasting error distribution went above two degrees three times, and each time the model was retrained. The results demonstrate that with a few selected retrainings of the model, the forecasting error is kept under control, as indicated by the horizontal linear trends (red curves) in Fig. 7.
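A sketch of the retraining loop is given below, reusing the QEWA sketch from Section 4. The helpers `train_model` and `observation_stream`, the initial data handling and the tuning values are hypothetical placeholders, not from [39].

```python
THRESHOLD = 2.0    # critical forecasting error, degrees centigrade

# Assumed tuning values; the estimate is seeded at zero error.
tracker = QEWA(q=0.80, lam=0.05, gamma=0.0005, x0=0.0)
history = list(initial_24_hours)               # hypothetical initial data
model = train_model(history)                   # hypothetical training routine

for features, temperature in observation_stream():   # one sample every 15 minutes
    error = abs(model.predict(features) - temperature)
    if tracker.update(error) > THRESHOLD:      # 80% quantile estimate too high
        model = train_model(history)           # concept drift detected: retrain
    history.append((features, temperature))
```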

Fig. 7
figure 7

The left and right panels refer to the dining room and the bedroom, respectively. The x-axis refers to the number of days since the observations started. The gray curves show the forecasting error when predicting 15 minutes into the future. The blue curves show the tracking of the 80% quantile of the forecasting error data streams. The black dots along the x-axis show when the model was retrained. The red curves show the linear trends in the forecasting error

In conclusion, the example demonstrates how the suggested quantile estimator can be useful for concept drift detection and model adaptation.

7 Closing remarks

The exponentially weighted moving average of observations is known to be the state-of-the-art estimator for tracking the expectation of dynamically varying data stream distributions. In this paper, we have presented an incremental quantile estimator that is in fact a generalized exponentially weighted moving average estimator. To the best of our knowledge, this is the first quantile estimator in the literature that falls within this well-known class of efficient estimators. The experiments show that the estimator outperforms state-of-the-art quantile estimators from the literature.

We have also demonstrated how the tracking of quantiles has applications in the field of machine learning. More particularly, we showed how the suggested estimator can be used to track quantiles of the prediction error distribution in order to detect when a machine learning model should be retrained.

A potential avenue for future research is to extend the QEWA estimator to simultaneously track multiple quantiles. One could, of course, simply run the QEWA estimator for each quantile of interest, but this could lead to a violation of the monotone property of quantiles, i.e. the requirement that an estimate of a higher quantile must always be larger than an estimate of a lower quantile; e.g. the 50% quantile estimate should always be above the 30% quantile estimate.