1 Introduction

How can we discover currently emerging patterns in high-speed data streams, such as keyword streams from social networks or click streams from e-commerce sites? How can we track them in real time with high accuracy and a small memory footprint? How can we guarantee that the results are self-consistent? These questions are directly related to mining recently frequent patterns in a data stream.

Formally, a data stream is defined as a sequence of transactions, each of which is a set of items, arriving one by one. Usually, the length of the stream and the number of distinct items are very large, possibly unbounded. Due to this massiveness, it is impossible to store all the information from the stream, and thus memory must be used efficiently. As a result, most data stream mining algorithms [29, 30] perform approximation rather than exact computation, and the following requirements are generally imposed [6]. First, the stream should be scanned as few times as possible: a single scan suffices for many recent algorithms. Second, the amount of memory used should be limited and independent of the number of distinct items and the stream length, e.g. O(k) space complexity for finding the top-k frequent items. Third, processing each transaction should be fast because the rate of transaction arrival can be bursty, e.g. Internet traffic may explode due to network anomalies like DDoS (Distributed Denial of Service) attacks. Fourth, an up-to-date result should be available on demand. These requirements allow data stream mining algorithms to run in real time with small memory.

A number of studies [9, 10, 17, 21, 32, 38] have shown the efficacy of their methods in finding frequent patterns, including items and itemsets, in a data stream, but it is still not clear whether they can find recent frequent patterns (items or itemsets) correctly. Although a pattern whose frequency decreases over time tends to have a small count by construction of these algorithms, recency is not handled explicitly. Another problem is that among the discovered frequent patterns, it is hard to know when they became frequent. Depending on the application, this problem is crucial. For example, when we monitor keywords mentioned on social networking services, it is important to know which keyword is a current trend and which one is a past trend. Also, in an e-commerce site, a manager would be interested in sets of products co-purchased frequently not over the entire history but in recent days, to understand consumers’ current needs correctly. To overcome this weakness, finding recent frequent patterns in a data stream has also been studied [1, 13, 18]. However, these methods have limitations in accuracy, running time, and memory usage.

Table 1 Comparison of performance of TwMinSwap, TwSample and competitors
Fig. 1

Our proposed algorithm TwMinSwap outperforms the others in both memory usage and error of estimated time-weighted counts of top-k items. Despite comparable estimation of TwHCount(\(10\)) to that of TwMinSwap, TwHCount(\(10\)) requires much more memory space. a Static distribution, b dynamic distribution

Table 2 Comparison of performance of TwMinSwap-Is and the competitor

In this paper, we propose two recently frequent pattern mining algorithms: TwMinSwap for items and TwMinSwap-Is for itemsets. The idea is to count items or itemsets with time-weighting, which means that the contribution of each occurrence of an item or itemset decays over time.

TwMinSwap is a deterministic version of our motivating algorithm TwSample, a sampling-based randomized algorithm with theoretical guarantees. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both algorithms require only O(k) space. In particular, TwMinSwap requires no parameters other than k and \(\alpha \) (the time-decaying factor), and this simplicity not only saves memory but also reduces per-item processing time. Table 1 compares our proposed TwMinSwap with the competitors, and Fig. 1 plots memory usage versus error in estimated time-weighted counts. TwMinSwap outperforms the others in terms of precision and recall, time-weighted count estimation, and memory usage; its speed is comparable to that of the fastest competitor TwHCount, which requires large memory (see also Fig. 5).

Our second algorithm TwMinSwap-Is is devised for finding the top-k recently frequent itemsets in data streams. TwMinSwap-Is always keeps at most k itemsets and outputs self-consistent results, i.e. the Apriori property holds in the found top-k itemsets, which makes the results more reliable. Also, the maximum size of the itemsets found remains reasonable, while the competitor’s can be arbitrarily large. Table 2 briefly compares TwMinSwap-Is and the competitor.

Through extensive experiments, we demonstrate that TwMinSwap finds top-k time-weighted frequent items with small memory, high precision and recall, and small error in the estimated time-weighted counts. Applying TwMinSwap to real-world data streams, we also show that tracking recently frequent activities and keywords enables us to discover sudden bursts of attention to currently hot events and trends in real time. We further evaluate TwMinSwap-Is in terms of accuracy and non-triviality of the results. TwMinSwap-Is outputs top-k itemsets whose time-weighted counts are highly correlated with the ground truth, and the results include itemsets of size up to 5–7, although a large proportion of the transactions in the data have size less than 7: depending on the dataset, up to \(95\%\) of transactions have a size smaller than or equal to 7.

Our contributions are summarized as follows.

  1. Method Based on time-weighted counting, we propose TwMinSwap and TwMinSwap-Is for finding top-k recently frequent items and itemsets, respectively. TwMinSwap, with an O(k) memory requirement, is a deterministic variation of our motivating randomized algorithm TwSample. Moreover, extending TwMinSwap to itemsets, we develop TwMinSwap-Is, which guarantees the Apriori property in the results.

  2. Performance We show that TwMinSwap outperforms the competitors in precision, recall, and time-weighted count estimation, and that its speed is comparable to that of the fastest competitor TwHCount. In particular, TwMinSwap is \(1.5 \times \) better in time-weighted count estimation and \(2.5 \times \) better in memory usage than the second best competitors. We also show that TwMinSwap-Is finds itemsets whose time-weighted counts show high correlation with the true values and whose sizes are non-trivial, i.e. 5–7.

  3. Discovery We apply TwMinSwap and TwMinSwap-Is to real-world data streams and present interesting discoveries, including the difference in trends between the winning and losing U.S. presidential candidates and different human contact patterns in real life.

The rest of the paper is organized as follows. In Sect. 2, we review related work, including algorithms with and without time-weighting. In Sect. 3, we describe and analyze TwSample and TwMinSwap for recently frequent items, and in Sect. 4 we develop TwMinSwap-Is for recently frequent itemsets. In Sects. 5 and 6, performance evaluation results, including comparison with competitors, are presented for TwMinSwap and TwMinSwap-Is, respectively. In Sect. 7, we show the results of applying TwMinSwap to real-world data streams. Finally, we conclude in Sect. 8.

Table 3 lists the symbols frequently used in this paper.

Table 3 Table of symbols

2 Related works

2.1 Finding frequent items

There have been numerous studies to find frequent items from a data stream, including not only developing methods but also comparing them [10, 32]. Here, we classify them into three categories as follows.

Sampling-based Approach This approach involves a probabilistic process, and the obtained items are random samples of all the items occurring in a data stream. Vitter [37] proposed a uniform sampling method for a data stream by which, in essence, higher-frequency items are sampled much more often than lower-frequency ones. Improving the space requirement of the sampling, Gibbons and Matias [17] invented a method called concise sampling which represents sampled items efficiently. By slightly modifying the method, they also proposed counting sampling to estimate the frequency of each item more accurately. A similar approach was adopted in [33] to obtain high-frequency items.

Counter-based Approach A very basic form of the counter-based approach is Majority, which finds the majority item in a stream if it exists [3, 16]. By generalizing Majority, Misra and Gries [36] developed a method to find items occurring at least N/k times, and its per-item processing time was improved by Demaine et al. [14] and Karp et al. [22]. LossyCounting [33] does the same job, but it additionally guarantees that no item whose count is less than \(N(1/k - \epsilon )\) is reported. SpaceSaving [35] reduces the space requirement not only for an arbitrary data distribution but also for a Zipf distribution.

Sketch-based Approach The sketch-based approach is usually based on multiple hash functions that map incoming items to a hash table. This can also be understood as maintaining a list of independent counters, where each counter is shared by a few items and the sharing is determined by the hash functions. Charikar et al. [8] proposed CountSketch, which computes items appearing at least \(N/(k+1)\) times with probability \(1-\delta \) while requiring \(O(k/\epsilon ^2 \log N/\delta )\) memory. The space requirement was improved by CountMin [11]. GroupTest [12] was developed for hot item queries; it groups items and assumes one frequent item in each group. Jin et al. [20] improved GroupTest in space, and their algorithm guarantees the minimum count of the output items.

2.2 Finding recent frequent items

Despite the many algorithms for finding frequent items in a data stream, researchers have agreed that recent items are more important than old ones, and answering queries for recently frequent items is often required. Below, we introduce two approaches for this purpose.

Sliding Window-based Approach This approach divides recent times into window blocks, performs counting for items in the windows, and aggregates them. Golab et al. [18] proposed a method based on basic window blocks for identifying frequent items in packet streams in which item frequency is known to follow a power-law distribution. They also extended the work to the more general case that the item frequency follows a multinomial distribution [19]. There have been works using windows of various sizes. Arasu and Manku [1] provided deterministic and randomized algorithms for \(\epsilon \)-approximate quantiles over sliding windows. Dallachiesa and Palpanas [13] studied the problem in ad hoc time windows. They use overlapping window blocks whose sizes exponentially grow by which a query for frequent items during a certain recent time period can be answered more accurately.

Time-aware-based Approach This approach considers recentness implicitly. Liu et al. [31] proposed a pruning method, a key operation of counter-based algorithms, that considers time information so that an older item is more likely to be pruned than a more recent one when the memory becomes full. A similar approach has been examined in [9, 39, 40]. All of them adopt a time-fading factor, which decreases the weight of an item over time, in their methods for finding frequent items. As a result, recent items have larger weights, leading to more accurate results. Our proposed algorithms in this paper, whose preliminary version appeared in [28], also belong to this category. We show that time-weighted counting can be done via sampling, which guarantees accuracy in expectation, and propose our main algorithm TwMinSwap via derandomization.

2.3 Finding frequent itemsets

In this problem, an object from a data stream is a set of items called a transaction, and every subset of the transaction is a candidate frequent itemset. This means that the number of candidates is exponential in the length of a transaction, and thus the problem is more challenging than frequent item mining.

LossyCounting has been refined for frequent itemset mining, both with and without an information decaying parameter [7, 33]. The problem has been studied from various angles, including focusing on maximal frequent itemsets [23, 27], targeting streams with bursty transactions [38], and window-based approaches [5, 26]. Several studies [6, 7] have taken an approach similar to our time-weighted counting. They preserve the Apriori property using a prefix-tree lattice structure, but its space requirement increases as the number of distinct items grows. In contrast, our proposed algorithm TwMinSwap-Is maintains only O(k) entries for the top-k recently frequent itemsets while also keeping the Apriori property.

3 Recently frequent item mining

In this section, we propose our method for finding the time-weighted top-k items in a data stream. The main idea is derandomization of a sampling-based algorithm. As a result, we propose a deterministic algorithm TwMinSwap, which requires only O(k) memory where k is the number of time-weighted frequent items we want to find. To develop TwMinSwap, we first propose a sampling-based randomized algorithm TwSample whose performance is guaranteed in expectation, which provides the theoretical background of TwMinSwap. However, the probabilistic nature of TwSample requires several independent sampling sessions to achieve high accuracy, leading to large memory usage and slow running time. In contrast, TwMinSwap achieves high accuracy with fast running time while using only a single session.

We start with the definition of the time-weighted count of an item [9, 39, 40].

Definition 1

(Time-weighted Count) Let u be an item occurring in a data stream at times \(t_1,\ldots ,t_c\). The time-weighted count of the item u is defined by

$$\begin{aligned} W(u) = \sum _{i=1}^c \alpha ^{t_{cur}-t_i}, \end{aligned}$$

where \(0<\alpha < 1\) is a decaying parameter and \(t_{cur}\) is the current time.
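For concreteness, W(u) can be maintained incrementally: at every time step all stored counts are multiplied by \(\alpha \), and the arriving item gains weight 1. The following minimal Python sketch computes the exact time-weighted counts this way (the function name and the choice \(\alpha =0.99\) are ours for illustration); it keeps one counter per distinct item and is therefore only an exact baseline, not the streaming algorithm itself.

```python
def time_weighted_counts(stream, alpha=0.99):
    """Exact time-weighted counts W(u), updated incrementally (baseline sketch).

    `stream` is a sequence of items, one per time step. Each step decays every
    stored count by alpha and adds 1 for the arriving item, which is equivalent
    to W(u) = sum over occurrence times t_i of alpha**(t_cur - t_i).
    """
    counts = {}
    for item in stream:
        for u in counts:                      # one time step passes: decay all
            counts[u] *= alpha
        counts[item] = counts.get(item, 0.0) + 1.0
    return counts

# Example: for the stream a, b, a, a we get W(a) = 1 + alpha + alpha**3.
print(time_weighted_counts(["a", "b", "a", "a"]))
```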

3.1 Randomized algorithm

To develop a sampling-based randomized algorithm, we first define a penalized time-weighted count as follows.

Definition 2

(Penalized Time-weighted Count) Let u be an item occurring in a data stream and let \(T_u\) be a set of the times at which u has occurred. The penalized time-weighted count of the item u is defined by

$$\begin{aligned} L(u) = \sum _{t\in T_u} \alpha ^{t_{cur}-t + \lambda - 1}, \end{aligned}$$

where \(0<\alpha < 1\) is a decaying parameter, \(t_{cur}\) is the current time, and \(\lambda \ge 1\) is a default penalty term for items just arriving.

Let the sequence of items up to time \(t_{cur}\) be \(u_1,\ldots , u_{t_{cur}}\). Given \(0<\alpha < 1\) and \(\lambda \ge 1\), our randomized algorithm samples each item \(u_t\) with probability \(\alpha ^{t_{cur}-t+\lambda -1}\). This is done incrementally, increasing \(\lambda \) over time to ensure that the number of distinct items in the samples is at most k. Indeed, our algorithm can be understood as an extension of uniform sampling from a data stream [17] to time-weighted sampling. Below, we say that u is monitored if u has been sampled at least once.

Precisely, our randomized algorithm TwSample requires three parameters: the maximum number k of monitored items, the time-weighting factor \(\alpha \), and the increase ratio \(\sigma \) for the penalty term \(\lambda \). Three pieces of information are updated incrementally: the penalty term \(\lambda \), the set K of monitored items, and a counter \(c_v\) associated with each \(v\in K\). Initially, \(\lambda = 1\) and \(K=\emptyset \). A newly arriving item u is sampled with probability \(\alpha ^{\lambda -1}\). The sampling is done as follows: if u is currently monitored, i.e. \(u\in K\), \(c_u\) is incremented by 1; otherwise u is added to K with its associated counter \(c_u=1\). After the sampling, if \(|K| > k\), downsampling is applied to all the items sampled so far while increasing \(\lambda \). Precisely, \(\lambda \) is incremented by \(\sigma \), and for each \(v\in K\), \(c_v\) is replaced by a random number drawn from \(binomial(c_v,\alpha ^{\sigma })\). If \(c_v\) becomes 0, v is evicted from K. This downsampling is repeated until \(|K|\le k\). Lastly, whenever one time step passes, all items in K are unconditionally downsampled as follows: for each \(v\in K\), \(c_v = binomial(c_v,\alpha )\). Algorithm 1 fully describes TwSample.

[Algorithm 1: TwSample (pseudocode figure)]
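Since the pseudocode figure is not reproduced here, the sketch below re-expresses one sampling session of TwSample in Python following the description above. It is our own sketch, not the authors' implementation; the binomial helper simply sums Bernoulli trials.

```python
import random

def binomial(n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (n is small here)."""
    return sum(random.random() < p for _ in range(int(n)))

def tw_sample(stream, k, alpha=0.99, sigma=0.0001):
    """One sampling session of TwSample (sketch of Sect. 3.1).

    Keeps at most k monitored items; each counter is an integer sample count
    whose expectation equals the penalized time-weighted count L(u).
    """
    lam = 1.0          # penalty term lambda
    counts = {}        # monitored items K with their counters c_v
    for u in stream:
        # one time step passes: unconditional downsampling with rate alpha
        for v in list(counts):
            counts[v] = binomial(counts[v], alpha)
            if counts[v] == 0:
                del counts[v]
        # sample the arriving item with probability alpha**(lam - 1)
        if random.random() < alpha ** (lam - 1):
            counts[u] = counts.get(u, 0) + 1
        # too many monitored items: downsample while increasing lambda
        while len(counts) > k:
            lam += sigma
            for v in list(counts):
                counts[v] = binomial(counts[v], alpha ** sigma)
                if counts[v] == 0:
                    del counts[v]
    # the estimated time-weighted count of v is alpha**(1 - lam) * c_v
    return {v: alpha ** (1 - lam) * c for v, c in counts.items()}
```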

Effect of \(\sigma \) Essentially, \(\sigma \) determines the sampling rate used to evict existing items so that the number of sampled items does not exceed k. As \(\sigma \) gets smaller, the decrement of each item count in Line 14 is reduced, leading to a longer eviction process in Lines 12 to 15. Accuracy may increase since we do not evict more than required. On the other hand, as \(\sigma \) gets larger, the eviction process finishes quickly, although accuracy may decrease since we may evict more items than necessary.

The following lemma shows that the sampling probability of any individual item is equal to its penalized time-weight.

Lemma 1

At time \(t_{cur}\) with the penalty term \(\lambda \), each item u occurring at time \(t\le t_{cur}\) has been sampled with probability

$$\begin{aligned} \Pr \left[ u~\mathrm{{is}}~\mathrm{{sampled}}\right] = \alpha ^{t_{cur}-t + \lambda - 1}. \end{aligned}$$

Proof

Let u be a new item at \(t=1\) and \(t_{cur}=1\). It is unconditionally sampled because \(\lambda = 1\), and all counters are empty. In other words, u is sampled with the probability \(\alpha ^{t_{cur}-t+\lambda -1} = 1\).

Let \(t_{cur}\ge 1\). Assume that the lemma holds for time \(t_{cur}\). That is, each sample v occurring at time \(t\le t_{cur}\) has been sampled with probability \(\alpha ^{t_{cur}-t+\lambda -1}\). Let us consider the process for \(t_{next} = t_{cur}+1\). In Line 4, all samples are downsampled with probability \(\alpha \). This means that after Line 4, each occurrence remains a sample with probability \(\alpha ^{t_{cur}+1-t+\lambda -1}\), which matches the sampling probability at time \(t_{next}=t_{cur}+1\).

Next, we verify Lines 5 to 11. Let u be a new item occurring at \(t=t_{next}\). Clearly, u is sampled with probability \(\alpha ^{t_{next}-t+\lambda -1}=\alpha ^{\lambda -1}\) by Line 5, regardless of whether u is currently monitored or not. Let us verify the downsampling. Let d be the number of iterations of the while statement in Line 12. Then, after the downsampling, each sample survives with probability \(\alpha ^{d\sigma }\), and thus each occurrence is a sample with probability \(\alpha ^{t_{next}-t+\lambda +d\sigma -1}\), which matches the update \(\lambda = \lambda + d\sigma \).

Note that the sampling probability does not increase over time since \(\lambda \) increases or remains the same at every time step. Hence, an item that is not a sample at a certain time cannot be a sample in the future. \(\square \)

From Lemma 1, we obtain the following corollary which states that the expected count of a distinct item is equal to its penalized time-weighted count.

Corollary 1

At any time \(t_{cur}\) in TwSample, the expectation of a counter \(c_u\) of a monitored item \(u\in K\) is

$$\begin{aligned} \mathbb {E}\left[ c_u\right] = L(u) = \sum _{t\in T_u} \alpha ^{t_{cur} - t + \lambda - 1}, \end{aligned}$$

where \(T_u\) is a set of times at which u has occurred.

Since we know \(\lambda \) at any time, we can compute the expected time-weighted count of a monitored item u. That is, the estimated time-weighted count of \(u\in K\) is \(\alpha ^{1-\lambda }c_u\). Next, we show that the probability that an item is not monitored decreases exponentially in its penalized time-weighted count; hence, significant items are monitored with high probability.

Lemma 2

At any time, the probability \(p_u\) that an item u is monitored satisfies the following inequality:

$$\begin{aligned} p_u \ge 1 - \exp \left( -L(u)/2\right) . \end{aligned}$$

Proof

Note that \(p_u=\Pr \left[ c_u>0\right] \) since \(c_u = \sum _{i=1}^{|T_u|} c_u(i)\) is a random variable where \(c_u(i)\) indicates whether the i-th occurrence of u is sampled or not. Applying the Chernoff bound, we obtain the following inequality:

$$\begin{aligned} p_u = \Pr \left[ c_u>0\right] = 1-\Pr \left[ c_u=0\right] \ge 1 - \exp \left( -\mathbb {E}\left[ c_u\right] /2\right) . \end{aligned}$$

Since \(\mathbb {E}\left[ c_u\right] = L(u)\) by Corollary 1, the proof is done. \(\square \)

Although TwSample is simple and provides theoretical guarantees on its output, its performance may be degraded due to its probabilistic nature. First, the running time may become slow because the number of iterations of the downsampling with increasing \(\lambda \) is not fixed, and the time to draw a random variable from \(binomial(c,\theta )\) in the downsampling depends on c. Second, the discovered top-k items and their counters may be inaccurate due to an unintendedly large \(\lambda \). This inaccuracy can be resolved by maintaining s independent sessions, each of which monitors at most k items, but this leads to larger memory usage and longer running time. In the next section, we propose a deterministic variation of TwSample which is fast and requires a single session of at most k monitored items.

3.2 Deterministic algorithm

[Algorithm 2: TwMinSwap (pseudocode figure)]

In this section, we propose TwMinSwap for efficient discovery of top-k time-weighted frequent items. This algorithm is a deterministic version of TwSample that truncates insignificant old items. Concretely, the time-decaying factor \(\alpha \) has the same meaning as in TwSample, but it affects the item counts in a deterministic manner.

The main idea is to record the expected number of samples for each item directly instead of applying the random process. Concretely, for an item u occurring at \(t\le t_{cur}\), instead of incrementing \(c_u\) by 1 with probability \(\alpha ^{t_{cur}-t}\), we increment \(c_u\) by \(\alpha ^{t_{cur}-t}\) with probability 1. The downsampling with rate \(\theta \) then becomes \(c_v = \theta c_v\) for each monitored item \(v\in K\). With this deterministic scenario, however, we encounter a problem when all counters become occupied. Because the expected count of an item occurring at least once in the stream never becomes 0, a pruning step is needed to drop an insignificant monitored item in order to start monitoring a new one. We propose a simple heuristic for the pruning which requires no additional memory, leading to smaller memory usage than the previous approaches [9, 39, 40]. We note that this pruning corresponds to decreasing the sampling rate by increasing \(\lambda \) in TwSample.

Details of the pruning heuristic are as follows. Let K be the set of currently monitored items where \(|K|=k\), and let u be an item just arriving from the data stream. Our pruning method first finds an item \(v^*\in K\) having the minimum time-weighted count \(c_{v^*} = \min _{v\in K}c_v\). If \(c_{v^*} < 1\), we drop \(v^*\) and start monitoring u with initial count 1; otherwise, u is ignored. This swap of u and \(v^*\) makes sense because \(c_{v^*}\) is determined by the few most recent occurrences of \(v^*\) yet is smaller than the weight 1 contributed by the single occurrence of u at \(t_{cur}\). Under this strategy, a newly added item is not evicted for the next r time steps even if it never occurs again, where r is the number of monitored items with a count less than 1 at the time of its addition. The overall procedure of TwMinSwap is described in Algorithm 2.
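Under the same assumptions as the previous sketch (one item per time step, our own function names), the deterministic counting and the min-swap pruning just described can be written compactly as follows; this version decays every counter at each step, while the lazy variant discussed below avoids that scan.

```python
def tw_min_swap(stream, k, alpha=0.99):
    """TwMinSwap sketch (Sect. 3.2): deterministic time-weighted top-k counting.

    Maintains at most k (item, time-weighted count) pairs. For simplicity this
    version decays every counter at each time step.
    """
    counts = {}                          # monitored items K
    for u in stream:
        for v in counts:                 # decay all monitored counts
            counts[v] *= alpha
        if u in counts:                  # already monitored: add fresh weight 1
            counts[u] += 1.0
        elif len(counts) < k:            # room left: start monitoring u
            counts[u] = 1.0
        else:                            # full: swap with the minimum if it is < 1
            v_min = min(counts, key=counts.get)
            if counts[v_min] < 1.0:
                del counts[v_min]
                counts[u] = 1.0
            # otherwise the new item u is ignored
    return counts
```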

The advantages of TwMinSwap are summarized as follows. First, its memory usage is small. For each item, only an item identifier and its time-weighted count are maintained. This simple structure reduces per-item computation compared with similar counter-based approaches [39] and greatly saves memory compared with sketch-based algorithms [9]. Second, TwMinSwap requires only minimal parameters: k and \(\alpha \). This is especially beneficial in practice since it reduces the effort for parameter tuning. Third, in contrast to TwSample, TwMinSwap is guaranteed to output k items so long as \(N\ge k\).

Per-item Processing Time The main time-consuming operations are (1) finding the item with the minimum count and (2) multiplying \(\alpha \) into all counters, each of which requires scanning K. The second operation can be eliminated by maintaining, for each item, the time of its most recent occurrence [39]. In this way, we gain speed when a newly arriving item is already monitored. In contrast, the first operation, taking O(k) time, is unavoidable. Although an efficient data structure was proposed for the case without time-weighting [14], it cannot be applied to our time-weighted case since our counters hold real numbers whose pairwise differences are not fixed increments. As a result, TwMinSwap requires O(k) computation per item.
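The lazy-decay bookkeeping mentioned above can be sketched as follows: each monitored item stores its count together with the time of its last update, and the accumulated decay \(\alpha ^{t_{now}-t_{last}}\) is applied only when the item is read or updated. The class below is our own illustration of the idea from [39], not code from that paper; note that finding the minimum still requires evaluating every counter at the current time, which is exactly the O(k) scan discussed above.

```python
class LazyCounter:
    """Time-weighted counter with lazy decay (one per monitored item)."""

    def __init__(self, t_now, alpha):
        self.alpha = alpha
        self.count = 1.0      # weight of the occurrence that created the counter
        self.t_last = t_now   # time of the last update

    def value(self, t_now):
        """Current time-weighted count, decayed to time t_now without mutation."""
        return self.count * self.alpha ** (t_now - self.t_last)

    def touch(self, t_now):
        """Apply the pending decay and add weight 1 for a new occurrence."""
        self.count = self.value(t_now) + 1.0
        self.t_last = t_now
```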

3.2.1 Analysis

We analyze TwMinSwap, in particular the condition under which a monitored item is not evicted from K. More precisely, Lemmas 3 and 4 state that TwMinSwap will not evict items whose occurrence frequencies are above certain thresholds, for a general item distribution and a power-law item distribution, respectively.

Lemma 3

Let \(0<\alpha < 1\) be a time-decaying parameter of TwMinSwap. Any item with count \(c\ge 1\) will not be evicted from K if it occurs at least once per every \(1-\log _\alpha \gamma \) times where \(\gamma = \min \left\{ 1+\alpha , c\right\} \).

Proof

Assume that \(v\in K\) with count \(c_v=c\ge 1\) at a certain time occurs once every d time steps. Note that any item whose count is at least 1 is never evicted from K by construction. Let us define the following function.

$$\begin{aligned} g(r+1) = \left( g(r)\alpha + 1 \right) \alpha ^{d-1}, \end{aligned}$$

where \(g(1) = c\alpha ^{d-1}\). It is clear that g(r) is \(c_v\) after \(rd-1\) time steps. Since \(c_v\) always decreases from \((r-1)d+1\) to \(rd-1\), it suffices to show that \(g(r)\ge 1\) with \(d\le 1-\log _\alpha \gamma \) for every \(r\ge 1\) where \(\gamma = \min \left\{ 1+\alpha , c\right\} \).

For \(r=1\), assume that \(c \le 1+\alpha \); then,

$$\begin{aligned} g(1) = c \alpha ^{d-1} \ge c \alpha ^{-\log _\alpha c} = 1. \end{aligned}$$

Assume that \(c > 1+\alpha \).

$$\begin{aligned} g(1) = c \alpha ^{d-1} > (1+\alpha ) \alpha ^{d-1} \ge (1+\alpha ) \alpha ^{-\log _\alpha (1+\alpha )} = 1. \end{aligned}$$

For \(r>1\), assume that \(g(r-1)\ge 1\); then,

$$\begin{aligned} g(r)= & {} \left( g(r-1)\alpha + 1 \right) \alpha ^{d-1} \ge (\alpha + 1) \alpha ^{d-1} \\\ge & {} \min \left\{ 1+\alpha ,c\right\} \times \alpha ^{-\log _\alpha \min \left\{ 1+\alpha ,c \right\} } = 1. \end{aligned}$$

\(\square \)

Below, we show which items are expected not to be evicted from K before their next occurrence under a power-law item distribution.

Lemma 4

Let n be the number of items and let us consider a power-law item distribution with the exponent 2. Any monitored item \(i\le \sqrt{(1-\log _\alpha (1+\alpha ))/1.7}\) with count \(c_i\ge \alpha ^{1-1.7i^2}\) is expected not to be evicted.

Proof

For an item \(i\in [1,n]\), the probability that it occurs in a stream is

$$\begin{aligned} \Pr \left[ \text {i occurs}\right] = i^{-2}/Z, \end{aligned}$$

where Z is the normalization constant. Hence, the expected number of occurrences before i occurs becomes \(i^2Z\). Since an occurrence of i obeys \(geometric(i^{-2}/Z)\), the probability that i does not occur during \(i^2 Z\) time steps is less than 1/e.

Assume that \(c_i \le \alpha +1\). The following guarantees in expectation that i will not be evicted by Lemma 3:

$$\begin{aligned} 1-\log _\alpha c_i \ge i^2 Z ~ \Longleftrightarrow ~ 1-i^2Z \ge \log _\alpha c_i ~ \Longleftrightarrow ~ \alpha ^{1-i^2Z} \le c_i. \end{aligned}$$

The feasible i should satisfy

$$\begin{aligned} \alpha ^{1-i^2 Z} \le \alpha + 1 ~ \Longleftrightarrow ~ i\le \sqrt{\frac{1-\log _\alpha (1+\alpha )}{Z}}. \end{aligned}$$

Assume \(c_i > \alpha + 1\). The following inequality guarantees in expectation that i will not be evicted by Lemma 3:

$$\begin{aligned} 1 - \log _{\alpha } (1+\alpha ) \ge i^2 Z ~ \Longleftrightarrow ~ i\le \sqrt{\frac{1-\log _\alpha (1+\alpha )}{Z}}. \end{aligned}$$

Finally, \(Z \le \sum _{j=1}^{\infty } j^{-2} = \pi ^2/6 \le 1.7\), which completes the proof. \(\square \)

4 Recently frequent itemset mining

In this section, we describe TwMinSwap-Is to find the top-k time-weighted frequent itemsets in data streams. The overall procedure is similar to TwMinSwap, i.e. maintaining at most k recently frequent itemsets, accepting a new itemset which is likely to be recently frequent, and evicting a maintained itemset whose value is assessed to be smaller than that of the new one.

Algorithm 3 shows the outline of TwMinSwap-Is, where Faded, Ordering, MostRel and MostCold will be defined later. At a high level, Faded computes \(\Theta \subseteq K\) containing insignificant itemsets; Ordering sorts \(2^I\), the power set of I, by relevance to K; \(MostRel(\Omega )\) returns and removes the itemset in \(\Omega \) most relevant to K; \(MostCold(\Theta )\) returns and removes the most insignificant itemset in \(\Theta \).

[Algorithm 3: TwMinSwap-Is (pseudocode figure)]

To define those functions, our approach is to introduce several properties that should be satisfied by the algorithm and to propose appropriate implementations of Ordering, Faded, MostRel and MostCold. We start by defining the following concept.

Definition 3

(Closeness) Let K be a set of itemsets. The set K is closed if and only if the Apriori property holds in K, that is, for every \(\mu \in K\),

$$\begin{aligned} \nu \in 2^{\mu }\setminus \emptyset ~~\Longrightarrow ~~\nu \in K. \end{aligned}$$

Example of Closeness. Let us consider two sets of itemsets as follows: \(K_1=\{a,b,c,d,ab,bc\}\) and \(K_2=\{a,b,c,d,ab,bc,abc\}\). Note that \(K_1\) is closed by definition, but \(K_2\) is not closed because \(ac\in 2^{abc}\) is not an element of \(K_2\).
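The property of Definition 3 is easy to check directly. The helper below (our own illustration, with itemsets represented as Python frozensets) verifies closeness and reproduces the example: \(K_1\) is closed and \(K_2\) is not.

```python
from itertools import combinations

def nonempty_subsets(itemset):
    """All nonempty subsets of an itemset, as frozensets."""
    items = list(itemset)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_closed(K):
    """True iff the Apriori property (Definition 3) holds in the set K of itemsets."""
    K = {frozenset(mu) for mu in K}
    return all(nu in K for mu in K for nu in nonempty_subsets(mu))

K1 = [{"a"}, {"b"}, {"c"}, {"d"}, {"a", "b"}, {"b", "c"}]
K2 = K1 + [{"a", "b", "c"}]
print(is_closed(K1), is_closed(K2))   # True False ({"a","c"} is missing from K2)
```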

Let K be the set of itemsets maintained by the algorithm; the desired properties of our itemset mining are as follows.

  • (P1) K should be always closed.

  • (P2) For \(\nu ,\mu \in K\), \(\nu \subset \mu \) implies \(c_{\nu } \ge c_{\mu }\) where \(c_{\nu }\) and \(c_{\mu }\) are the counts of itemsets \(\nu \) and \(\mu \), respectively. Note that \(\subset \) denotes a proper subset.

The first property (P1) is to keep K satisfying the Apriori property, the most fundamental property in frequent itemset mining, in which all subsets of any frequent itemset should be frequent. The second property (P2) is a stronger condition than (P1). Note that in the ideal case, the number of occurrences of an itemset cannot be larger than that of its any subset.

To achieve the properties (P1) and (P2), we propose two scoring functions for selecting new itemsets from I and evicting faded itemsets in K, respectively. The first one measures irrelevance of an itemset with respect to a set of itemsets, which is formally defined as follows.

Definition 4

(Irrelevancy of an Itemset) Given an itemset \(\mu \), its irrelevancy with respect to a set K of itemsets is defined as

$$\begin{aligned} f_K(\mu ) = \left| 2^{\mu } \setminus K \right| +\frac{1}{|\mu |}. \end{aligned}$$

The first term of \(f_K(\mu )\) measures the irrelevance of \(\mu \) to K: the score gets smaller as more subsets of \(\mu \) are in K. The second term is used to prefer a larger itemset over a smaller one at the same relevance degree. Note that an itemset becomes more relevant to K as its irrelevancy approaches 0. Next, we define the hotness of an itemset in K, which is used to rank the itemsets in K.

Definition 5

(Hotness of an Itemset) Given an itemset \(\mu \in K\) and a new transaction I, the hotness \(h(\mu )\) of \(\mu \) is defined by

$$\begin{aligned} h(\mu ) = \rho _\mu + \frac{1}{|\mu |}, \end{aligned}$$

where \(\rho _\mu = j\) if \(\mu \) has the j-th smallest \(y_\mu = c_\mu +\delta (\mu \subseteq I)\) among all itemsets in K, and \(\delta (\cdot ) = 1\) if \(\cdot \) is true and 0 otherwise.

In terms of hotness, at the same count an itemset of smaller size is preferred. This is the same as sorting the itemsets in K by their counts first and breaking ties by size. Note that an itemset has a higher hotness score as its time-weighted count, i.e. its recent frequency, becomes larger.
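Definitions 4 and 5 translate almost literally into code. The sketch below is our own illustration: itemsets are frozensets, K is a set of frozensets with a dictionary of counts, and I is the new transaction. The subset enumeration is exponential in \(|\mu |\), which is acceptable because itemset sizes stay small, and \(\rho _\mu \) is computed by sorting K by the y values.

```python
from itertools import combinations

def subsets(mu):
    """All subsets of mu, including the empty set, as frozensets."""
    items = list(mu)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def irrelevancy(mu, K):
    """f_K(mu) = |2^mu \\ K| + 1/|mu| (Definition 4); smaller means more relevant."""
    return sum(1 for s in subsets(mu) if s not in K) + 1.0 / len(mu)

def hotness(mu, K, counts, I):
    """h(mu) = rho_mu + 1/|mu| (Definition 5), where rho_mu is the 1-based rank
    of y_mu = c_mu + [mu is a subset of I] among all itemsets in K."""
    I = frozenset(I)
    y = {nu: counts[nu] + (1 if nu <= I else 0) for nu in K}
    rho = sorted(K, key=lambda nu: y[nu]).index(mu) + 1   # j-th smallest y
    return rho + 1.0 / len(mu)

# With K = {{a}}, f_K({a,b}) = |{{}, {b}, {a,b}}| + 1/2 = 3.5.
print(irrelevancy(frozenset("ab"), {frozenset("a")}))
```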

With these two scoring functions, we determine the four main operations \({ Ordering}, { Faded, MostRel}\) and \({ MostCold}\) as follows.

  • Ordering constructs a priority queue for all itemsets generated by a new transaction I, where an itemset \(\mu \)’s priority is inversely proportional to the irrelevancy \(f_K (\mu )\).

  • Faded constructs a priority queue for itemsets in K, where an itemset \(\mu \)’s priority is inversely proportional to the hotness \(h(\mu )\). During the construction, only itemsets \(\mu \in K\) such that \(c_{\mu } < 1\) and \(\mu \nsubseteq I\) are considered.

  • MostRel and MostCold become pop operations for the priority queues by Ordering and Faded, respectively.

In terms of implementation, Faded is easy because, given a transaction I, the counts and sizes of the itemsets considered in Faded do not change from Line 4 to Line 17 of Algorithm 3. As a result, Faded outputs an ordinary queue of itemsets sorted in ascending order of hotness. Accordingly, MostCold becomes a pop operation on this queue.

In contrast, there is a performance issue in implementing Ordering and MostRel because the irrelevancy of an itemset changes depending on the itemsets (subsets of I) previously inserted into K. A naive approach is to compute the most relevant itemset per request, that is, whenever MostRel is called. However, this computation is expensive. We present an efficient implementation of Ordering and MostRel in Sect. 4.1.

Lastly, we prove that Algorithm 3 satisfies the properties (P1) and (P2). We start by examining addition to and deletion from the set K of itemsets in Algorithm 3. First, Lemma 5 states that (P1) holds if itemsets are added to K by Ordering and MostRel without deleting itemsets from K.

Lemma 5

Assume that (P1) and (P2) hold. Let \(\psi _i\) be an itemset returned by the i-th call to \(MostRel(\Omega )\) where \(\Omega =Ordering(I,K)\). Then, \(K\cup \{\psi _1,\ldots ,\psi _i\}\) is closed for every \(0\le i\le 2^{|I|}-1\).

Proof

Let \(X_i = \{\psi _1,\ldots ,\psi _i\}\). For \(i=0\), the statement holds since \(X_i=\emptyset \). Assume that for \(0\le i \le 2^{|I|}-2\), \(K'_i = K\cup X_i\) is closed. Letting \(\mu = \psi _{i+1}\), suppose that there is \(\nu \subset \mu \) such that \(\nu \notin K'_{i+1}\). Consider any \(\chi \) with \(\nu \subset \chi \subseteq \mu \). It holds that \(\chi \notin K'_i\); otherwise \(\nu \in K'_i\subseteq K'_{i+1}\) since \(K'_i\) is closed, a contradiction. Let \(V = 2^\nu \setminus K'_i\) and \(U = 2^\mu \setminus K'_i\). We have \(|V| \le |U|\) by \(\nu \subset \mu \). We also have \(|V| \ne |U|\) because \(\chi \notin V\) by \(\chi \supset \nu \) and \(\chi \in U\) by \(\chi \notin K'_i\). As a result, we obtain \(|V| < |U|\). Then, \(\nu \) has a higher priority than \(\mu \) as follows.

$$\begin{aligned} f_{K'_i}(\nu ) = |V| + \frac{1}{|\nu |} \le |V| + 1\le |U| < |U| +\frac{1}{|\mu |} = f_{K'_i}(\mu ) \end{aligned}$$

In other words, \(\mu \) is not the itemset of the highest priority at the \((i+1)\)-th iteration, and consequently \(\mu \ne \psi _{i+1}\) which is a contradiction. \(\square \)

Below, Lemma 6 states that (P1) holds if itemsets in K are removed in the order determined by Faded and MostCold without adding itemsets to K.

Lemma 6

Assume that (P1) and (P2) hold. Let \(\phi _i\) be an itemset returned by the i-th call to \(MostCold(\Theta )\) where \(\Theta =Faded(K,I)\). Then, \(K \setminus \{\phi _1,\ldots ,\phi _i\}\) is closed for every \(0 \le i\le |K|\).

Proof

Let \(Y_i = \{\phi _1,\ldots ,\phi _i\}\). For \(i=0\), the statement holds since \(Y_i = \emptyset \). For \(K=\emptyset \), the proof is done. Below, we consider \(K\ne \emptyset \). Assume that for \(0\le i \le |K|-1\), \(K'_i = K\setminus Y_i\) is closed. Letting \(\nu = \phi _{i+1}\), suppose that there is \(\mu \supset \nu \) such that \(\mu \in K'_{i+1}\). From (P2), \(c_\nu \ge c_\mu \); also, if \(\mu \subseteq I\), then \(\nu \subset I\). Then, the following inequality holds:

$$\begin{aligned} h(\nu ) = \rho _\nu + \frac{1}{|\nu |} > \rho _\mu + \frac{1}{|\mu |} = h(\mu ), \end{aligned}$$

where \(\rho \) is the rank defined by Definition 5. This implies \(\mu \) has a higher priority than \(\nu \), and thus \(\mu \notin K'_{i+1}\) which is a contradiction. \(\square \)

Lemmas 5 and 6 state that if K is closed, adding itemsets by \(\{Ordering,MostRel\}\) to and removing itemsets by \(\{Faded,MostCold\}\) from K keep K closed, respectively. Next, we show that (P1) is preserved by TwMinSwap-Is where both the addition and the removal operations are performed.

Lemma 7

TwMinSwap-Is guarantees (P1).

Proof

Initially, \(K=\emptyset \), and thus (P1) and (P2) hold. Assume that (P1) and (P2) hold at the beginning of a certain iteration, and let I be a new transaction. Let \(X\subseteq 2^I\) be the set of itemsets added to K and \(Y\subseteq K\) be the set of itemsets deleted from K during the iteration. We have already shown that the two operations \(K=K\cup X\) and \(K=K\setminus Y\) preserve (P1) by Lemmas 5 and 6, respectively.

Suppose that \(K'=K\cup X\setminus Y\) is not closed. Then there is at least one pair of itemsets \(\mu \in K'\) and \(\nu \notin K'\) such that \(\nu \subset \mu \). There are two cases: \(\mu \in K\setminus Y\) or \(\mu \in X\). Assume that \(\mu \in K\setminus Y\); then \(\nu \in Y\) since K is closed. But this implies that \(K\setminus Y\) is not closed, contradicting Lemma 6. Assume that \(\mu \in X\). Then \(\nu \in Y\) since \(\nu \in K\cup X\) by Lemma 5. By the definition of Faded, Y does not contain any itemset that is a subset of I; but \(\nu \subseteq I\) since \(\nu \subset \mu \in X\), which is a contradiction. Consequently, \(K'\) is closed. \(\square \)

The following lemma states that subsets of I already contained in K are considered before those not in K.

Lemma 8

Assume that (P1) and (P2) hold. Let \(\psi _i\) be an itemset returned by the i-th call to \(MostRel(\Omega )\) where \(\Omega =Ordering(I,K)\). If i is the largest number such that \(\psi _i\in K\), then for every \(1\le j\le i\), \(\psi _j\in K\).

Proof

The size of any itemset is finite. For \(\mu \subseteq I\) contained in K, \(|2^\mu \setminus K| = 0\), and for \(\nu \subseteq I\) not contained in K, \(|2^{\nu } \setminus K| \ge 1\). Thus, always \(f_K(\mu ) < f_K(\nu )\). \(\square \)

With Lemma 8, Algorithm 3 guarantees that all itemsets \(\mu \) such that \(\mu \subseteq I\) and \(\mu \in K\) are unconditionally considered, i.e. \(c_{\mu } = c_{\mu } + 1\) at Line 9 of Algorithm 3. This is because processing such \(\mu \) does not affect K. In other words, the while loop of Algorithm 3 never reaches Line 15 before the counts of all such itemsets \(\mu \) have been incremented by 1 at Line 9.

Lemma 9

TwMinSwap-Is guarantees (P2).

Proof

Initially, \(K=\emptyset \), and thus both (P1) and (P2) hold. Assume that (P1) and (P2) hold at the beginning of a certain iteration; let I be a new transaction. Note that \(c_\nu = 0\) if \(\nu \notin K\). We denote K and \(c_\mu \) after processing I by \(K'\) and \(c'_\mu \), respectively. Consider \(\mu \in K'\) and its proper subset \(\nu \ne \emptyset \).

Case 1 \(\mu \notin K\). This means that \(\mu \) is a new itemset added to K during the iteration, and thus \(c'_\mu =1\) by Line 13. It also implies \(\mu \subseteq I\) and hence \(\nu \subset I\). Assume \(\nu \in K\). We obtain \(c'_\nu = \alpha c_\nu + 1 > 1 = c'_\mu \) from Lemma 8. Assume \(\nu \notin K\). We know by Lemma 7 that \(\nu \) is added to K during the iteration, just as \(\mu \) is. That is, \(c'_\nu = 1 = c'_\mu \). Consequently, \(c'_\mu \le c'_\nu \).

Case 2 \(\mu \in K\). It holds \(\nu \in K'\) and \(\nu \in K\) due to Lemma 7. In addition, \(c_\mu \le c_\nu \) holds. Assume \(\mu \subseteq I\). It also implies \(\nu \subseteq I\); then due to Lemma 8, both \(c_\nu \) and \(c_\mu \) are guaranteed to increase by 1 in Line 9. Thus, \(c'_\mu = \alpha c_\mu +1 \le \alpha c_\nu +1 = c'_\nu \). Assume \(\mu \nsubseteq I\); then \(c'_\mu = \alpha c_\mu \le \alpha c_\nu \le c'_\nu \). \(\square \)

The following lemma shows that TwMinSwap-Is does not miss itemsets that occur at a rate above a certain threshold.

Lemma 10

Let \(0<\alpha < 1\) be a time-decaying parameter of TwMinSwap-Is. Any itemset with count \(c\ge 1\) will not be evicted from K if it occurs at least once per every \(1-\log _\alpha \gamma \) times where \(\gamma = \min \left\{ 1+\alpha , c\right\} \).

Proof

The proof of Lemma 3 applies with an itemset in place of an item. \(\square \)

4.1 Implementation of Ordering and MostRel

In this section, we describe an efficient implementation for Ordering and MostRel.

As stated previously, Ordering returns a priority queue \(\Omega \) for \(2^I\) where a smaller irrelevancy score implies a higher priority, and MostRel is a pop operation for \(\Omega \). The problem is that adding an itemset \(\mu \in 2^I\) to K may change irrelevancies of other itemsets. For instance, let \(I=\{a,b\}\) and \(K=\{\{a\}\}\); the irrelevancy \(f_K(\{a,b\})\) of \(\{a,b\}\) is currently \(3.5 = |2^{\{a,b\}}\setminus K| + 1/|\{a,b\}|\), but if we add \(\{b\}\) to K, \(f_K(\{a,b\}) = 2.5\).

A naive solution is to search for the itemset of the highest priority whenever MostRel is called; accordingly, Ordering does nothing. This approach, however, results in \(O(|2^I||K|^2)\) time complexity per iteration, as discussed in Sect. 4.1.3. In contrast, our implementation presented in Sects. 4.1.1 and 4.1.2 reduces this to \(O\left( |K|\left( |I| + \log |K|\right) \right) \), as shown in Lemma 13.

In our implementation, Ordering constructs the priority queue \(\Omega \) as a triple \((\Phi ,\mathcal {T},z)\): \(\Phi \) is an ordered list containing only the itemsets of the highest priorities; \(\mathcal {T}\supseteq \Phi \) additionally contains unordered itemsets that have lower priorities than those in \(\Phi \) but higher priorities than the rest; and \(z_\mu \) for \(\mu \in \mathcal {T}\) conceptually records how many subsets of \(\mu \) are kept in K. Here, \(\mathcal {T}\) is a candidate set for \(\Phi \), and \(\nu \in \mathcal {T}\) is moved into \(\Phi \) once \(z_\nu \) exceeds a threshold specified in Sect. 4.1.1. Our MostRel returns the itemset \(\mu \) of the highest priority in \(\Omega =(\Phi ,\mathcal {T},z)\) and updates \(\Omega \) according to the changes in irrelevancy caused by adding \(\mu \) to K.

4.1.1 Ordering

Ordering has two tasks. First, it increases the counter of every itemset in \(K\cap 2^I\) by 1. That is, all itemsets originally considered at Line 9 of Algorithm 3 are processed before the while loop. By Lemma 8, this preserves the correct processing order for \(2^I\).

Second, Ordering constructs a priority queue \(\Omega \) where a smaller irrelevancy implies a higher priority. The queue \(\Omega \) keeps a correctly ordered list \(\Phi \) only for the itemsets with the highest priorities. Precisely, \(\Phi \) contains every itemset \(\mu \in 2^I\) such that \(2<f_K(\mu )\le 3\), which means that all nonempty proper subsets of \(\mu \) are already in K. Note that adding any itemset in \(\Phi \) to K preserves the (P1) property.

In addition, a set \(\mathcal {T} \subseteq 2^I\) of itemsets with \(\Phi \subseteq \mathcal {T}\), having lower priorities than those in \(\Phi \), is kept unordered as a wait-list for addition to \(\Phi \). Precisely, the unordered set \(\mathcal {T}\) includes every itemset \(\nu \in 2^I\) that has a subset of length \(|\nu |-1\) in K, and every singleton of \(2^I\). Each itemset \(\nu \in \mathcal {T}\) has an associated value \(z_\nu \) recording the number of subsets of \(\nu \) already contained in K, i.e. \(|2^\nu \cap K|\). Instead of the exact number, we set \(z_\nu \) to the number of such subsets whose length is at least \(|\nu |-1\), which reduces the cost of updating \(z_\nu \). Note that by Lemma 7, \(z_\nu = |\nu |\) implies that all nonempty proper subsets of \(\nu \) are contained in K. This also means that \(z_{\mu } = |\mu |\) for every \(\mu \in \Phi \). Similarly, \(z_\nu = |\nu |+1\) implies \((2^\nu \setminus \emptyset ) \subseteq K\). As a result, the priority queue \(\Omega \) returned by Ordering is the triple \((\Phi ,\mathcal {T}, z)\).

To describe \(\Omega = (\Phi ,\mathcal {T},z)\) more formally, we define the following two sets related to an itemset.

Definition 6

(1-smaller Subset) Given an itemset \(\mu \), its 1-smaller subsets \(S(\mu )\) consists of subsets of \(\mu \) whose length is smaller than \(\mu \) by 1. That is,

$$\begin{aligned} S(\mu ) = \{ \nu \subset \mu : |\nu | = |\mu |-1 \}. \end{aligned}$$

We especially denote the inclusion of \(\mu \) to \(S(\mu )\) by

$$\begin{aligned} S^+(\mu ) = S(\mu ) \cup \{\mu \}. \end{aligned}$$

Definition 7

(1-larger Superset) Given an itemset \(\mu \) and a transaction I, its 1-larger supersets \(L_I(\mu )\) in I consist of supersets of \(\mu \) in I whose length is larger than \(\mu \) by 1. That is,

$$\begin{aligned} L_I(\mu ) = \{ \nu \subset I : \mu \subset \nu \text { and } |\nu | = |\mu | + 1 \} \end{aligned}$$

We especially denote the inclusion of \(\mu \) to \(L_I(\mu )\) by

$$\begin{aligned} L^+_I(\mu ) = L_I(\mu ) \cup \{\mu \}. \end{aligned}$$

Letting \(\Psi = (2^I\cap K) \cup \{\emptyset \}\), the priority queue \(\Omega =(\Phi ,\mathcal {T},z)\) is determined by Ordering as follows.

$$\begin{aligned} \begin{aligned} \mathcal {T}&= \bigcup _{\mu \in \Psi } L_I^+(\mu ), \\ z_\mu&= |S^+(\mu ) \cap K|,~\forall \mu \in \mathcal {T}, \\ \Phi&= \{\mu \in \mathcal {T} : z_\mu = |\mu |\}. \end{aligned} \end{aligned}$$
(1)
[Algorithm 4: Ordering (pseudocode figure)]

Algorithm 4 presents the whole procedure of Ordering. In Line 1, Scan(K, I) performs the first task. It computes \(\Psi =K\cap 2^I\), sorted in descending order of itemset length, while increasing \(c_\mu \) by 1 for every \(\mu \in \Psi \).

After appending \(\emptyset \) to \(\Psi \), the second task is performed, i.e. \(\Phi \), \(\mathcal {T}\), and z are constructed. Every \(\mu \in \Psi \) is added to \(\mathcal {T}\) with \(z_\mu =|\mu |+1\) since \(\mu \in K\). For each \(\nu \in L_I(\mu )\), if \(\nu \in \mathcal {T}\), we set \(z_\nu = \min (z_\nu +1, |\nu |+1)\); in particular, if \(\nu \) is an element of \(\Psi \), \(z_\nu = |\nu | + 1\) remains unchanged. If \(\nu \notin \mathcal {T}\), \(\nu \) is added to \(\mathcal {T}\) with \(z_\nu = 1\). After that, if \(z_\nu \) becomes equal to \(|\nu |\), \(\nu \) is added to the tail of \(\Phi \).

Note that the itemsets in \(\Phi \) are sorted in ascending order of their irrelevancy scores \(f_K\). This is because (1) \(|2^\mu \setminus K|=2\) by \(z_\mu = |\mu |\) for every \(\mu \in \Phi \), and (2) \(\Phi \) is sorted in descending order of itemset length, since \(\Psi \) is considered in descending order of itemset length at Line 4 of Algorithm 4. Also note that \(f_K(\mu ) < f_K(\nu )\) for \(\mu \in \Phi \) and \(\nu \in 2^I\setminus K\setminus \Phi \) because all nonempty proper subsets of \(\mu \) are in K but not all of those of \(\nu \) are, i.e. \(f_K(\mu )\le 3\) and \(f_K(\nu )>3\).
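The construction of Eq. (1) and the procedure just described can be sketched as follows. This is our own Python rendering of Algorithm 4, with itemsets as frozensets and hypothetical helper names; \(\Psi \) is processed in descending length order so that \(\Phi \) comes out sorted by priority, with its head being the most relevant itemset.

```python
def one_larger_supersets(mu, I):
    """L_I(mu): supersets of mu inside I whose size is larger by exactly one."""
    return [mu | {x} for x in I if x not in mu]

def ordering(I, K, counts):
    """Build Omega = (Phi, T, z) of Eq. (1) for a new transaction I.

    Also performs the first task of Ordering: every itemset in K that is a
    subset of I gets its count incremented by 1 (assuming the alpha decay of
    Algorithm 3 has already been applied to `counts` for this step).
    """
    I = frozenset(I)
    psi = sorted((mu for mu in K if mu <= I), key=len, reverse=True)
    for mu in psi:
        counts[mu] += 1.0
    psi.append(frozenset())              # append the empty itemset

    T, z, phi = set(), {}, []            # phi is kept sorted by priority
    for mu in psi:                       # descending itemset length
        T.add(mu)
        z[mu] = len(mu) + 1              # mu is in K (or is the empty set)
        for nu in one_larger_supersets(mu, I):
            if nu in T:
                z[nu] = min(z[nu] + 1, len(nu) + 1)
            else:
                T.add(nu)
                z[nu] = 1
            if z[nu] == len(nu):         # all nonempty proper subsets of nu in K
                phi.append(nu)
    return phi, T, z                     # phi[0] has the smallest irrelevancy
```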

Fig. 2

Example of \(\Omega =(\Phi , \mathcal {T}, z)\) and its change by MostRel. The boxes correspond to \(\mathcal {T}\), where the green ones correspond to \(\Phi \) and the gray ones contain itemsets already in K. Each box shows the corresponding itemset \(\mu \) and \(z_\mu \). The number attached to a green box denotes the priority in \(\Phi \), where a smaller number means a higher priority. The dotted lines denote set inclusion between two sets; note that for the itemset \(\mu \) of each non-gray box, \(z_\mu \) equals the number of dotted lines attached to the bottom of the box. Calling MostRel returns \(\{a,c\}\), which has the highest priority, and updates \(\Omega \) as in the rightmost figure. After that, Algorithm 3 adds \(\{a,c\}\) to K and evicts \(\{e\}\) from K since \(\{e\}\) is the only element that neither has a count of at least 1 nor is an element of \(2^I\), i.e. \(\{e\}\) is the only element of \(\Theta =Faded(K,I)\). This completes the iteration since \(\Theta \) becomes empty (color figure online)

Figure 2 shows an example of \(\Omega =(\Phi ,\mathcal {T},z)\) constructed by Ordering for a given K and I. The leftmost box shows the current K and I; the middle shows \(\Omega \) after calling Ordering(I, K). The itemset and the number in each box are an element of \(\mathcal {T}\) and the corresponding z value, respectively. The itemset in each gray box corresponds to \(\mu \in 2^I\cap K\), for which the associated count \(c_\mu \) is increased by 1 in Scan(K, I). The green boxes correspond to \(\Phi \), and the priority of the itemset is attached to each box, where a smaller number implies a higher priority. Note that all nonempty proper subsets of the itemset in each green box are already in K, and \(\{a,c\}\) has a higher priority than \(\{d\}\) because \(|\{a,c\}| > |\{d\}|\).

As a result, \(\Phi \) consists of itemsets of the highest priorities with the correct order.

Lemma 11

The time complexity of Algorithm 4 is \(O\left( |K|\left( |I| + \log |K|\right) \right) \) with a constant itemset size.

Proof

Line 1 requires \(O(|K|\log |K|)\) time. First, computing \({\hat{\Phi }} = K\cap 2^I\) takes O(|K|) time since each element \(\mu \in K\) is tested for being a subset of I by checking, for every \(u'\in \mu \), whether \(u'\in I\). Second, sorting \({\hat{\Phi }}\) takes \(O(|K|\log |K|)\) since \(|{\hat{\Phi }}| \le |K|\).

For every set \(\mu \), \(|L_I(\mu )| \le |I|\) by definition. Thus, the outer and inner for loops run at most \(|K|\ge |\Phi |\) and \(|I|\ge |L_I(\mu )|\) iterations, respectively. As a result, Algorithm 4 has \(O\left( |K|\left( |I| + \log |K|\right) \right) \) time complexity. \(\square \)

4.1.2 MostRel

Next, we explain MostRel in detail, which returns the itemset \(\mu \) of the highest priority in \(\Omega =(\Phi ,\mathcal {T},z)\). Finding \(\mu \) is trivial because \(\Phi \) is sorted in ascending order of \(f_K\), i.e. the desired itemset is at the head of \(\Phi \). As shown in Algorithm 3, \(\mu \) returned by MostRel is added to K at Line 12, or the iteration terminates. Note that \(\mu \) never enters Line 9 of Algorithm 3 since the task of Lines 8–9 has already been processed in Scan of Algorithm 4. In this scenario, there are two problems: (1) \(\Phi \) can be empty after removing \(\mu \), and (2) \(\Phi \) can be out of order due to the addition of \(\mu \) to K. Thus, we need to update \(\Omega \) appropriately for the next use.

Before adding \(\mu \) to K, \(\Phi \) contains the top-\(|\Phi |\) itemsets in the correct order, and \(z_\nu \) for \(\nu \in \Phi \) is unchanged unless \(\nu \) itself is added to K. In other words, the relative order of the itemsets in \(\Phi \) remains the same. Thus, we only need to focus on itemsets newly added to \(\Phi \) by the addition of \(\mu \) to K. Candidates for such itemsets are those \(\nu \) whose \(z_\nu \) increases by 1. Considering these candidates, the update preserves the definitions of \(\Phi \), \(\mathcal {T}\) and z given by Eq. (1) after the addition of \(\mu \) to K.

[Algorithm 5: MostRel update (pseudocode figure)]

Algorithm 5 describes the update procedure. In the algorithm, \(\Phi .pop()\) returns the head itemset of \(\Phi \) and removes it. In Line 2, every \(\nu \) whose \(z_\nu \) changes is examined. If \(\nu \) is already in \(\mathcal {T}\), its \(z_\nu \) increases by 1; otherwise \(\nu \) is added to \(\mathcal {T}\) with \(z_\nu =1\). After that, if \(z_\nu = |\nu |\), \(\nu \) is placed at the head of \(\Phi \), i.e. \(\nu \) has the highest priority. This addition guarantees the correct order because for every itemset \(\omega \in \Phi \), \(z_\omega = |\omega |\) and \(|\nu | > |\mu | \ge |\omega |\).
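A matching sketch of the pop-and-update step, again our own rendering, reusing one_larger_supersets from the ordering sketch above and assuming that the popped itemset will indeed be added to K (as in the example of Fig. 2):

```python
def most_rel(phi, T, z, I):
    """Pop the most relevant itemset from phi and update (phi, T, z),
    assuming the popped itemset will be added to K (sketch of Algorithm 5)."""
    mu = phi.pop(0)                          # head of phi: smallest irrelevancy
    for nu in one_larger_supersets(mu, I):   # only these z values can change
        if nu in T:
            z[nu] = min(z[nu] + 1, len(nu) + 1)
        else:
            T.add(nu)
            z[nu] = 1
        if z[nu] == len(nu):
            phi.insert(0, nu)                # nu now has the highest priority
    return mu
```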

Figure 2 shows the change of \(\Omega =(\Phi , \mathcal {T}, z)\) by MostRel in the rightmost figure. The itemset \(\{a,c\}\) of the highest priority before calling MostRel is popped from \(\Phi \). Note that the update is done for the case that \(\{a,c\}\) is added to K, so the box of \(\{a,c\}\) is colored gray. Accordingly, \(z_{\{a,b,c\}}\), where \(\{a,b,c\}\in L_I(\{a,c\})\), increases by 1, which causes \(\{a,b,c\}\) to be added to \(\Phi \) because \(z_{\{a,b,c\}}=|\{a,b,c\}|=3\). Note that \(\{a,b,c\}\) has a higher priority than \(\{d\}\) since \(\{a,b,c\}\) is added at the head of \(\Phi \), which matches \(|\{a,b,c\}| > |\{d\}|\). The other element \(\{a,c,d\}\in L_I(\{a,c\})\) is newly added to \(\mathcal {T}\) with \(z_{\{a,c,d\}} = 1\).

Lemma 12

The time complexity of Algorithm 5 is O(|I|) with a constant itemset size.

Proof

The time consumption is determined by the for loop, and \(L_I(\mu ) \le |I|\) for every set \(\mu \). \(\square \)

4.1.3 Time costs of TwMinSwap-Is and naive method

Below, we analyze the time complexity of TwMinSwap-Is.

Lemma 13

The time complexity of TwMinSwap-Is for every new transaction is \(O(|K|(|I| + \log |K|))\) with a constant itemset size.

Proof

The only remaining time-consuming part is the while loop in Line 6 of Algorithm 3. Line 8 of Algorithm 3 is never reached in our implementation, and \(|\Theta |\le |K|\). Hence, the number of iterations by the while loop is O(|K|). Combining Lemmas 11 and 12, the processing for a new transaction takes \(O\left( |K|\left( |I| + \log |K|\right) \right) \). \(\square \)

Compared with our implementation, the naive approach described at the beginning of Sect. 4.1 is much slower. It requires O(1) and \(O(|2^I||K|)\) time for Ordering and MostRel, respectively, since for every \(\mu \in 2^I\), \(f_K(\mu )\) must be recalculated whenever MostRel is called. Worse, MostRel is called up to |K| times while Ordering is called only once per new transaction. Consequently, the naive approach takes \(O(|2^I||K|^2)\) time per transaction, and TwMinSwap-Is is orders of magnitude faster than the naive method.

5 Experiments on frequent item mining

In this section, we present experimental results showing the performance of our proposed TwMinSwap on synthetic data streams. In particular, we want to answer the following questions:

  1. Q1 How many top-k time-weighted frequent items can we discover?

  2. Q2 How accurately can we estimate the time-weighted counts of the discovered items?

  3. Q3 How fast is TwMinSwap?

5.1 Setup

We consider two types of data streams generated from power-law distributions to simulate bursty item occurrences, where N is the stream length and n is the number of distinct items.

  • Static distribution: N size-one transactions are generated from the following power-law distribution.

    $$\begin{aligned} \Pr \left[ i~\text {is generated}\right] \propto i^{-\beta }, \end{aligned}$$

    where \(i\in [1,n]\).

  • Dynamic distribution: for \(0\le r\le 1\), the first rN size-one transactions are generated from

    $$\begin{aligned} \Pr \left[ i~\text {is generated}\right] \propto i^{-\beta }, \end{aligned}$$

    and the last \((1-r)N\) size-one transactions are generated from

    $$\begin{aligned} \Pr \left[ i~\text {is generated}\right] \propto (n-i+1)^{-\beta }, \end{aligned}$$

    where \(i\in [1,n]\). This provides a setting in which the frequent items differ from the time-weighted frequent items (a small stream-generator sketch is given after this list).
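As referenced above, both synthetic settings can be generated with a few lines of Python; the sketch below uses inverse-CDF sampling over the n items, and the function name and interface are our own.

```python
import bisect
import random

def power_law_stream(N, n, beta, r=None):
    """Generate N size-one transactions over the items 1..n.

    With r=None, item i is drawn with probability proportional to i**(-beta)
    (static distribution). With 0 <= r <= 1, the first r*N transactions use that
    distribution and the remaining (1-r)*N use (n-i+1)**(-beta), so the recently
    frequent items differ from the overall frequent ones (dynamic distribution).
    """
    def sampler(weights):
        total = sum(weights)
        cdf, acc = [], 0.0
        for w in weights:
            acc += w / total
            cdf.append(acc)
        # inverse-CDF sampling; min() guards against floating-point rounding
        return lambda: min(bisect.bisect_left(cdf, random.random()) + 1, n)

    head = sampler([i ** (-beta) for i in range(1, n + 1)])
    if r is None:
        return [head() for _ in range(N)]
    tail = sampler([(n - i + 1) ** (-beta) for i in range(1, n + 1)])
    split = int(r * N)
    return [head() for _ in range(split)] + [tail() for _ in range(N - split)]
```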

Table 4 shows the difference between the two types of distributions.

Table 4 Top-5 items with and without time-weighting for the two types of item distributions

In our experiments, \(\beta \) is varied in \(\{0.5,0.75,1,1.25,1.5,1.75\}\); \(N=10^6\), \(n=10^4\), \(r=0.8\) and \(k=50\) are fixed. As \(\beta \) gets larger, the item frequencies become more skewed. Note that although the stream length N is fixed in our experiments, the per-item time and space complexities of our algorithms are independent of N.

We consider the following competitors, and in our experiments, all methods are implemented in Java.

  • TwFreq [39]: a counter-based algorithm to find time-weighted frequent items from a data stream.

  • TwHCount [9]: a sketch-based algorithm to find time-weighted frequent items from a data stream.

  • SpaceSaving [10, 35]: a counter-based algorithm to find frequent items from a data stream. Since it does not use time-weighting, we compare it with the others only in terms of precision and recall.

The memory requirements of the algorithms are as follows. TwMinSwap, TwFreq, and SpaceSaving require O(k) memory space; TwSample requires O(sk) where s is the number of parallel sessions, each of which independently samples at most k distinct items; TwHCount requires \(O(k + rm)\) where r is the number of hash functions and m is the range size of the hash functions.

Fig. 3

Our proposed TwMinSwap shows the best performance in both precision and recall, with values close to 1, regardless of the distribution types and their \(\beta \) values. For distributions with small \(\beta \), in which item frequencies are relatively even, TwSample shows low precision and recall, but as \(\beta \) gets larger, the values increase rapidly. TwFreq also performs better for large \(\beta \), but its precision and recall remain below 0.8. Regardless of the hash table size, TwHCount shows the second best performance; however, it remains below TwMinSwap. As expected, SpaceSaving is degraded for the dynamic item distributions while performing well on average for the static ones. a Static distribution, b static distribution, c dynamic distribution, d dynamic distribution

We denote TwSample with s independent sessions by TwSample(\(s\)), and TwHCount with a hash table of size \(w\%\) of n, i.e., \(rm = \frac{wn}{100}\), by TwHCount(\(w\)). For TwSample, we use \(s=10\) and \(\sigma =0.0001\); for TwSample and TwMinSwap, we use \(\alpha =0.99\); for TwHCount, we use the parameters in the original paper [9] and set the hash table size rm to 1 and \(10\%\) of n.

The overall comparison is summarized in Table 1 and Fig. 1. TwMinSwap outperforms the others in terms of accuracy and memory usage, and its speed is comparable to that of the fastest competitor TwHCount.

5.2 Discovering top-k time-weighted frequent items

Figure 3 presents the accuracy of the discovered items in terms of precision and recall, defined as follows:

$$\begin{aligned} \text {precision} = \frac{|\Theta \cap {\hat{\Theta }}|}{|{\hat{\Theta }}|}, \hbox {and} ~ \text {recall} = \frac{|\Theta \cap {\hat{\Theta }}|}{|\Theta |}, \end{aligned}$$
(2)

where \(\Theta \) and \({\hat{\Theta }}\) are the sets of the true and estimated top-k items, respectively. Overall, our proposed TwMinSwap outperforms the other algorithms regardless of the types of item distributions and the exponent values \(\beta \); its precision and recall are always very close to 1. TwSample(\(10\)) generally improves in precision and recall as \(\beta \) gets larger. The reason why the recall of TwSample(\(10\)) is low for \(\beta =1.75\) is that a large amount of probability mass is assigned to only a few items as \(\beta \) gets larger, leading to a number \({\hat{k}}<k\) of discovered items; the exact numbers are 33 and 36 for the static and the dynamic distributions, respectively. For TwMinSwap and TwHCount, always \({\hat{k}}=50\), and for TwFreq, always \({\hat{k}}\ge 49\).
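For concreteness, Eq. (2) corresponds to the following small evaluation routine over the true and estimated top-k sets (a sketch with illustrative item ids; not part of the original implementation):

```java
import java.util.HashSet;
import java.util.Set;

// Precision and recall of an estimated top-k set against the true top-k set, as in Eq. (2).
public class TopKAccuracy {
    static double[] precisionRecall(Set<Integer> truth, Set<Integer> estimated) {
        Set<Integer> hit = new HashSet<>(estimated);
        hit.retainAll(truth);                                   // intersection of the two sets
        double precision = (double) hit.size() / estimated.size();
        double recall = (double) hit.size() / truth.size();
        return new double[]{precision, recall};
    }

    public static void main(String[] args) {
        Set<Integer> truth = Set.of(1, 2, 3, 4, 5);             // illustrative item ids
        Set<Integer> estimated = Set.of(1, 2, 3, 6, 7);
        double[] pr = precisionRecall(truth, estimated);
        System.out.printf("precision=%.2f recall=%.2f%n", pr[0], pr[1]);  // 0.60 0.60
    }
}
```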

Fig. 4

The estimated time-weighted counts by TwMinSwap are the most accurate. The plots show the error in estimated time-weighted counts for the at most k discovered items of each algorithm (the lower, the better). While TwHCount(\(10\)) shows the second best performance, TwHCount(\(1\)), which uses less memory, performs poorly, especially for small \(\beta \). The accuracy of TwFreq is generally better for high-rank items than for low-rank items. TwSample(\(10\)) performs slightly better than TwFreq on average and, in contrast to TwFreq, shows no severe degradation of the estimates for low-rank items

TwFreq shows a similar pattern to TwSample(\(10\)), but its improvement is less significant than that of TwSample(\(10\)). TwHCount achieves high precision and recall regardless of the hash table size, but they are still below those of TwMinSwap. SpaceSaving performs quite well for the static item distributions: both precision and recall are about 0.9 on average. However, since SpaceSaving does not consider a time-weighting factor, its performance for the dynamic item distributions is greatly degraded as \(\beta \) gets larger.

The overall result implies that our time-weighted counting plays an important role in finding recent frequent items from data streams.

5.3 Estimating time-weighted counts

Accurately estimating time-weighted counts for the discovered items enables quantitative comparison among them. Figure 4 shows the estimated time-weighted counts of the discovered top-k items. The error \(\epsilon _i\) for \(1\le i\le k\) is calculated as follows:

$$\begin{aligned} \epsilon _i = \frac{\left| x_i-{\hat{x}}_i\right| }{x_i}, \end{aligned}$$
(3)

where \(x_i\) and \({\hat{x}}_i\) are the true and estimated time-weighted counts of the ith-ranked item, respectively. Notably, TwMinSwap estimates the true time-weighted counts very accurately, as shown by the blue line (almost overlapping the green one).

Since TwFreq provides an upper and a lower bound of the true time-weighted count, we use the mean of the two bounds. Its accuracy degrades as the ranks of items get lower. One reason for the poor estimation of TwFreq for low-rank items is that TwFreq was originally developed to find frequent items whose time-weighted counts exceed a certain threshold, rather than top-k ones, although it maintains at most k items. TwSample(\(10\)) shows the third best performance and estimates time-weighted counts more accurately for relatively large \(\beta \). Its performance is slightly better than that of TwFreq on average, and TwSample(\(10\)) is not degraded for relatively low-rank items.

The performance of TwHCount highly depends on the hash table size rm. The estimation of TwHCount(\(1\)) results in larger error, while that of TwHCount(\(10\)) is comparable to TwMinSwap. This is notable because the estimation of TwMinSwap is as accurate as that of TwHCount(\(10\)), which keeps an additional hash table of size 0.1n for accuracy. The comparison of the average error is shown in Fig. 1.

Fig. 5

TwMinSwap is faster than TwSample(\(10\)) and TwFreq in most cases of \(\beta \). Although TwHCount is slightly faster than TwMinSwap, its memory usage is much larger than that of TwMinSwap, as shown in Fig. 1. a Static distribution, b dynamic distribution

5.4 Running time

Figure 5 shows the running times of the algorithms for processing all the items in the data streams. The overall trend is that running time decreases as \(\beta \) increases. This is because with large \(\beta \), the probability of a new item being already monitored increases, leading to infrequent invocation of eviction processes such as DownSampling in TwSample. TwSample(\(10\)) shows slow running times because it performs 10 independent sampling sessions.

Although TwFreq has the same per-item processing time as TwMinSwap in big-O notation, TwFreq involves more computation for the eviction process than TwMinSwap: it updates more variables and scans the monitored items twice, while TwMinSwap scans them once. As a result, the running time of TwFreq changes more dramatically than that of TwMinSwap.

Since TwHCount has O(r) per-item processing time, it is the fastest. This fast running time is achieved by maintaining a hash table of size rm for approximate time-weighted counting, which requires more memory space than TwMinSwap. Although the running time of TwHCount does not depend on m, rm should be large to guarantee small error in the estimated time-weighted counts of the discovered items, as shown in Fig. 4. In our experiments, \(rm=0.1n\) is satisfactory while \(rm=0.01n\) is not.

All algorithms show no meaningful difference in running time with respect to the two types of distributions.

5.5 Effect of parameters \(\alpha \) and k

We investigate the two parameters of TwMinSwap: the time-decaying factor \(\alpha \) and the number k of time-weighted frequent items to be discovered. Figure 6 shows the precision and mean error of TwMinSwap while varying \(\alpha \) and k for a dynamic item stream with \(\beta =1\). TwMinSwap works very accurately for \(\alpha \in \{0.9,0.95,0.99\}\). For extremely large \(\alpha =0.999\), TwMinSwap is relatively degraded when k is small, but still keeps a precision of about 0.7 and an error of about 0.2.
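For reference, the quantity that TwMinSwap approximates for each item is its exponentially time-decayed count, assuming the usual definition \(\sum _{s} \alpha ^{t-s}\) over the item's past occurrence times s. The following minimal Java sketch (our illustration with lazy decay, not the TwMinSwap algorithm itself) computes this quantity exactly:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative exact time-weighted counter: every item's count is multiplied by
// alpha at each time step and incremented on arrival. Lazy decay: a count is
// brought up to date only when it is touched.
public class DecayedCounter {
    private final double alpha;
    private final Map<Integer, Double> count = new HashMap<>();
    private final Map<Integer, Long> lastUpdate = new HashMap<>();
    private long now = 0;

    DecayedCounter(double alpha) { this.alpha = alpha; }

    void tick() { now++; }                       // advance one time step

    void observe(int item) {                     // item occurs at the current step
        count.put(item, get(item) + 1.0);
        lastUpdate.put(item, now);
    }

    double get(int item) {                       // decayed count as of 'now'
        double c = count.getOrDefault(item, 0.0);
        long last = lastUpdate.getOrDefault(item, now);
        return c * Math.pow(alpha, now - last);
    }
}
```

Unlike this reference counter, which keeps one entry per distinct item, TwMinSwap monitors only k items and relies on an eviction process when a new, unmonitored item arrives.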

6 Experiments on frequent itemset mining

In this section, we present experimental results to show the performance of our proposed TwMinSwap-Is.

Fig. 6

Precision and error of TwMinSwap with varying k and \(\alpha \). A dynamic stream with \(\beta = 1\) is used. TwMinSwap outputs accurate results unless \(\alpha \) becomes too large. Even for extremely large \(\alpha =0.999\), the precision and error stay at about 0.7 and 0.2, respectively, when k is small, and improve as k gets larger. a Precision, b error

Table 5 Dataset used in our experiments on frequent itemset mining

6.1 Setup

Table 5 lists the datasets used in our experiments. Figure 7a shows the cumulative size distribution of transactions for every dataset. Note that a large portion of the transactions are short. We consider the following competitor:

  • Skip LC-SS [38]: a method to find frequent itemsets in data streams which is based on combining LossyCounting and SpaceSaving.

Although Skip LC-SS does not consider time-weighting, we choose it as a competitor to examine the effect of the Apriori property being preserved in the top-k results: Skip LC-SS maintains at most k itemsets but neglects the Apriori property among them. For all experiments, we use \(k=1000\) and \(\alpha =0.999\). We omit evaluation using precision and recall since calculating the true time-weighted counts of all itemsets is intractable: e.g., there are \(2^{267} - 1\) possible itemsets in the largest transaction of BMS1.

Fig. 7

a Cumulative distribution on transaction sizes for every dataset. b True time-weighted count (left y-axis) and true count (right y-axis) for itemsets discovered by TwMinSwap-Is and Skip LC-SS over their corresponding estimated ranks (x-axis), respectively, for Retail. Note that the itemsets by TwMinSwap-Is are approximately ranked with respect to their time-weighted counts, while this is not the case for Skip LC-SS

6.2 Results

Figure 7b shows the true time-weighted count and the true count of the itemsets found in Retail by TwMinSwap-Is and Skip LC-SS over their ranks, respectively. TwMinSwap-Is ranks itemsets according to their true time-weighted counts, while Skip LC-SS does not perform well (Footnote 4). The main reason for the poor performance of Skip LC-SS is that it swaps itemsets frequently, since it randomly selects the itemsets to be monitored among all subsets of a transaction. In addition, we observe that TwMinSwap-Is detects recently frequent itemsets despite their small total occurrence counts. For instance, in Retail, the two itemsets \(\mu =\{39, 48, 4994\}\) and \(\nu =\{36,16430,16431\}\) occur 129 and 9 times in the stream, respectively. At the end of the stream, however, only \(\nu \) is identified as one of the top-k itemsets by TwMinSwap-Is, because \(\nu \) occurs more recently than \(\mu \), as shown in Fig. 8.

Fig. 8

In Retail, the itemset \(\{36,16430,16431\}\) occurs only 9 times but very recently, which leads to its detection by TwMinSwap-Is. In contrast, the much more frequent itemset \(\{39,48,4994\}\), with 129 occurrences over the entire stream, is not identified as a time-weighted frequent itemset by TwMinSwap-Is since it is rare in recent times. We omit points for which the corresponding y-axis value is 0 or remains the same as the previous one

Figure 9a shows the average error in time-weighted count over all itemsets discovered by TwMinSwap-Is. During the entire stream, TwMinSwap-Is maintains small error in time-weighted counts on average. Figure 9b shows the size distribution of the itemsets found by TwMinSwap-Is for every dataset. The largest itemset size is about 5–7, and most transactions in the datasets are shorter than that, as shown in Fig. 7a. In other words, an itemset longer than 7 inherently has a low chance of being frequent in terms of time-weighted count.

Fig. 9

a Error in time-weighted count for itemsets found by TwMinSwap-Is for Retail. Note that the error is kept low for all datasets while processing the stream. b Size distribution of itemsets found by TwMinSwap-Is. Itemsets of moderate size (5–7) are discovered for every dataset

7 Discovery

In this section, we present discoveries from applying our methods to several real-world data streams. For each dataset, the time-decaying factor \(\alpha \) used is specified. Although choosing the optimal \(\alpha \) is not trivial, we observed that in practice, TwMinSwap and TwMinSwap-Is work well with a large \(\alpha \ge 0.9\).
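A convenient way to interpret \(\alpha \) (standard exponential-decay arithmetic, not specific to this paper) is via its half-life and total weight:

$$\begin{aligned} \alpha ^{t}=\tfrac{1}{2} \;\Leftrightarrow \; t=\frac{\ln 2}{-\ln \alpha }, \qquad \sum _{t\ge 0}\alpha ^{t}=\frac{1}{1-\alpha }, \end{aligned}$$

so, for example, \(\alpha =0.9\) halves the weight of an occurrence after about 6.6 time steps, while \(\alpha =0.99\) does so after about 69 steps; a larger \(\alpha \) thus corresponds to a longer effective horizon.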

7.1 MemeTracker dataset

Setup The MemeTracker dataset [24] provides quotes and phrases from blogs and news media. We consider a keyword stream consisting of the words in the quotes and phrases, excluding the 571 stopwords provided in [25]. The stream covers the time period between August 2008 and April 2009; the length of the stream is 1,681,760,809; we consider 1 min as one time step.

Results We run TwMinSwap with \(k=300\) and \(\alpha =0.9\), and examine top-300 keywords for every month.

Figure 10a shows the tracking results of keywords related to the U.S. presidential election on Nov 4, 2008. The value for each month is the time-weighted count normalized by the sum of the time-weighted counts of the k discovered items. Since multiple items can occur at one time step, this normalization is required to eliminate the effect of undesirably large time-weighted counts caused by relatively large stream lengths in certain time periods. The keywords related to the candidates, obama and mccain, were mentioned actively before the election and received less attention afterward. Notably, despite the high frequencies of both keywords, the winner obama was mentioned more frequently in blogs and media than the loser mccain. Even after the election, obama occasionally becomes hot.
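In our notation (introduced here only for clarity), writing \({\hat{x}}^{(m)}_i\) for the estimated time-weighted count of keyword i at the end of month m and \({\hat{k}}\) for the number of discovered items, the plotted value is

$$\begin{aligned} \tilde{x}^{(m)}_i = \frac{{\hat{x}}^{(m)}_i}{\sum _{j=1}^{{\hat{k}}} {\hat{x}}^{(m)}_j}. \end{aligned}$$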

Figure 10b, c show the sudden rise and quick disappearance of keywords closely related to two incidents: the Mumbai terror attack in Nov 2008 and the Gaza War beginning in Dec 2008. Although each incident happened in the last part of a month, TwMinSwap correctly detects the related keywords as hot items in the report for that month.

7.2 Amazon movie review dataset

Setup The Amazon movie review dataset [34] provides user reviews with product id information, where the movie title of each product id can be checked at http://www.amazon.com/dp/PRODUCT_ID. We consider the stream of the product ids. The stream covers the period from Aug 20, 1997 to Sep 25, 2012; the length of the stream is 7,911,684; the number of distinct product ids is 253,059.

Results We run TwMinSwap with \(k=100\) and \(\alpha =0.9\). Figure 11 shows the tracking results for several movies among the top-100, which are summarized as two patterns. The major pattern is doubly-active attention, when a movie is released in theaters and on DVD, as for Minority Report, X-Men 2, The Day After Tomorrow, Ratatouille, and Captain America: The First Avenger. The other pattern is periodic attention: e.g., A Christmas Carol appears every winter.

Fig. 10

a Changes of time-weighted counts of keywords related to the presidential election on Nov 4, 2008. The keywords obama, mccain, and vote became hot before the election and cooled down afterward. Notably, the winner obama was more actively mentioned than the loser mccain before the election. Also, the winner obama does not disappear after the election, in contrast to the loser mccain. b Changes of time-weighted counts of keywords related to the Mumbai terror attack on Nov 26, 2008. Similarly to the pattern for the Gaza War, keywords related to the Mumbai terror attack such as india, mumbai, and terrorists show sudden rises right after the attack. c Changes of time-weighted counts of keywords related to the Gaza War beginning in the last part of Dec 2008. In December, several keywords related to the war suddenly arose and quickly disappeared. Note that the war ended in the middle of January

Fig. 11

Changes of time-weighted counts of movies reviewed by users in Amazon. We observe two patterns. (1) The general pattern is doubly-active attention to movies when they are released in theaters and on DVD. (2) The minor pattern is periodic attention: e.g., A Christmas Carol is popular every winter

7.3 Yelp dataset

Setup The Yelp dataset (Footnote 5) provides tip data for businesses by users. Here, the tips are short comments about the businesses, and each business has several associated categories. We consider the stream of the business ids. The stream covers the period from April 16, 2009 to February 11, 2014; the length of the stream is 113,993; the number of distinct businesses is 15,585.

Results We run TwMinSwap with \(k=50\) and \(\alpha =0.9\), and track the top-30 hot businesses per month. For each month, we obtain a category distribution with respect to the top-30 businesses. Figure 12 shows the result for the shopping-related categories—Shopping and Shopping Centers. Around the turn of each year, shopping activity increased. This reflects that there are several special days such as Christmas, New Year, and Valentine's Day in winter.

Fig. 12

Count distribution over time for shopping-related categories—Shopping and Shopping Centers. For each month, the value is calculated with respect to top-30 businesses. In winter, there were a number of visits to shopping businesses

Fig. 13

Time-weighted counts of several itemsets over time. We observe three patterns in physical human contacts: periodical (black), occasional (red), and rare (green) (color figure online)

7.4 Human contact dataset

Setup The Human Contact dataset (Footnote 6), originally provided in [15], contains physical human contact information over 9 months. The dataset is given as an undirected multigraph with timestamped edges, where a node corresponds to a person and an edge to a contact between two persons. To obtain transactional data, we (1) divide time into non-overlapping intervals of 10 min, (2) construct a subgraph from the nodes and edges appearing in each interval, and (3) for each node in the subgraph, create one transaction consisting of all neighbors of the corresponding person. The resulting stream contains 94 persons and 239,443 transactions. A minimal sketch of this transformation is given below.
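The following Java sketch is our illustration of steps (1)–(3); the Edge record, field names, and the example contacts are assumptions, not the dataset's actual schema:

```java
import java.util.*;

// Converts timestamped contact edges into per-person transactions per 10-minute window.
public class ContactTransactions {
    record Edge(int a, int b, long ts) {}     // contact between persons a and b at time ts (seconds)

    static List<Set<Integer>> toTransactions(List<Edge> edges, long windowSec) {
        // (1) group edges by non-overlapping window index
        Map<Long, List<Edge>> byWindow = new TreeMap<>();
        for (Edge e : edges) byWindow.computeIfAbsent(e.ts() / windowSec, w -> new ArrayList<>()).add(e);

        List<Set<Integer>> transactions = new ArrayList<>();
        for (List<Edge> window : byWindow.values()) {
            // (2) adjacency of the subgraph appearing in this window
            Map<Integer, Set<Integer>> nbr = new HashMap<>();
            for (Edge e : window) {
                nbr.computeIfAbsent(e.a(), x -> new HashSet<>()).add(e.b());
                nbr.computeIfAbsent(e.b(), x -> new HashSet<>()).add(e.a());
            }
            // (3) one transaction per person: the set of its neighbors in the window
            transactions.addAll(nbr.values());
        }
        return transactions;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge(10, 15, 0), new Edge(10, 35, 120), new Edge(15, 35, 700));
        toTransactions(edges, 600).forEach(System.out::println);
    }
}
```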

Results We run TwMinSwap-Is with \(k=300\) and \(\alpha =0.998\). We divide the 9 months into 100 equal-sized time intervals and track the top-300 time-weighted frequent itemsets at every interval boundary. Figure 13 shows the changes of time-weighted counts over time for several itemsets that exhibit different contact patterns. For instance, the three persons 10, 15, and 35 in black meet a number of common people periodically. Such meetings are occasional for persons 34, 37, 51, and 89 in red, and rare for persons 11, 12, 27, 41, and 67 in green. This shows the effectiveness of TwMinSwap-Is in discovering temporal patterns in stream data.

8 Conclusion

In this paper, we propose algorithms to track recently frequent patterns in high-speed data streams: TwMinSwap for items and TwMinSwap-Is for itemsets. TwMinSwap performs efficient time-weighted counting of items and is inspired by TwSample, our sampling-based randomized algorithm with theoretical guarantees; both methods require only O(k) memory space for tracking top-k items. TwMinSwap-Is extends time-weighted counting to itemsets, guarantees self-consistency in the results, i.e., the Apriori property holds, and requires O(k) memory space for a constant itemset size. Through extensive experiments on synthetic and real data streams, we show that TwMinSwap is fast and outperforms all existing methods in accuracy, and that TwMinSwap-Is is accurate and discovers itemsets of non-trivial length. Analyzing real-world data streams, we discover interesting patterns, including the difference in trends between the winner and the loser of the U.S. presidential election and temporal patterns in physical human contacts.