1 Introduction

Elasticity, one of the prominent features of cloud computing, is the degree to which a system adapts to workload changes by provisioning and deprovisioning resources automatically so that the allocated resources match the current demand [1, 2]. Predicting future demand is the only practical and effective way to provision resources quickly and implement rapid elasticity [3, 4]. The most important challenges of application prediction models are as follows [3]:

  • Complexity: Each prediction model needs computation resources to estimate the future behavior of applications. The resource consumption of the prediction model should not be significant in comparison with that of the applications themselves. So, the time and space complexities of the prediction model should be reasonable enough that its deployment is affordable.

  • Pattern length: In most prediction models, such as [4,5,6,7,8,9,10], the pattern length is fixed. In these models, patterns extracted using a sliding window have a predefined length. This constraint restricts the model to specific patterns and prevents it from learning other useful patterns. However, choosing the pattern length is itself a challenge: it should be selected so that the most popular patterns can be extracted and application behavior can be estimated accurately.

In our previous works, based on Sequential Pattern Mining (SPM), we proposed two new prediction models, called POSITING [11] and RELENTING [2]. As Fig. 1 shows, POSITING considers the past behavior of the application, extracts behavioral patterns and stores them in the off-line pattern base. Based on the extracted patterns and the recent behavior of the application, POSITING predicts the future demand for different resources. In [2], we extended POSITING with the capability of online learning and proposed RELENTING. While RELENTING predicts the status of all the allocated resources, it also learns new application behavior rapidly without gathering new data and retraining the model.

Fig. 1 The scheme of POSITING [11]

According to Fig. 1, the pattern mining engine is the core of the predictors. To improve mining efficiency and avoid information loss, a compressed set of patterns, called closed patterns, is extracted by the pattern mining engine [2, 11]. A pattern is closed if none of its super-patterns has the same frequency as the pattern itself [12]. The common approach for extracting closed episodes under gap constraints is based on the hash table [12, 13]. As our experimental results show, this approach is very time-consuming when there are many candidate closed patterns.

The main goal of the paper is to improve the time and space complexities of the pattern mining engine so that it can be employed efficiently in different fields such as [14,15,16,17]. To improve space complexity, we define redundant occurrences of patterns and prove that omitting them causes no information loss. Then, by using a new imProved RepresentatiOn of the Stream based on PointERs (\({\textit{P}ROSPER}\)), we extend the algorithms proposed in [11] to identify the redundant occurrences and omit them from the occurrence list of patterns. Thus, memory consumption improves. To improve time complexity, we introduce a new data structure, called closed pattern backward tree (\({\textit{C}PBT}\)), to extract the closed patterns directly. As will be shown in the experimental results, \({\textit{C}PBT}\) improves the efficiency of the pattern mining engine significantly when there are a huge number of patterns. So, the contributions of this paper are as follows:

  • Since SPM is widely used in different fields, this paper presents a general approach to improve time and space complexities of the pattern mining engine.

  • To improve space complexity, this paper proposes \({\textit{P}ROSPER}\), defines the redundant occurrences of patterns and suggests algorithms to identify and omit them. Thus, the length of the occurrence list of patterns decreases and memory consumption improves (Sect. 4).

  • To improve time complexity, this paper introduces a new data structure, called \({\textit{C}PBT}\), to store the closed patterns. Based on \({\textit{C}PBT}\), depth-first search algorithms are presented to extract the closed patterns under gap constraints directly without storing/processing all of the candidate closed patterns (Sect. 5).

  • The performance of the proposed approach is evaluated using both real and synthetic workloads and compared with the hashing-based approach. According to the evaluation results, the proposed approach outperforms the hashing-based approach significantly when there are a large number of candidate closed patterns (Sect. 6).

The rest of the paper is organized as follows: The related works are discussed in Sect. 2. Section 3 introduces the main concepts of POSITING and RELENTING briefly. To improve space complexity, the redundant occurrences of patterns are discussed in Sect. 4. To improve time complexity, Sect. 5 introduces \({\textit{C}PBT}\) and presents new algorithms to extract the closed patterns directly. Experimental results are presented in Sect. 6. Finally, the paper is concluded in Sect. 7.

2 Related work

Sequential patterns can capture causative chains of influence present in data. They are useful in many real-life applications [12, 18, 19] such as the prediction of system failures [20], ICT risk assessment and management [21] and mining web access patterns [22].

Events can be classified into two groups: time-point events and time-interval events. Time-point events are stamped with the time of the event occurrence. In contrast, time-interval events describe the status of variables over time intervals. An episode is defined as a partially ordered collection of events that occur together [23]. The main goal of episode mining is to find the relationships between events [24].

Most research works focus on time-point events and extend algorithms to extract their corresponding episodes. However, time-interval events arise in many applications such as health care, data networks, financial applications [25, 26] and cloud workloads [2, 11]. The problem of mining time-interval episodes is a relatively young research field [27]. Most research works ignore time-interval events and the relationships between them [28], so presenting algorithms capable of learning from such complex data is one of the most important and challenging topics in the field of data mining [27, 29]. It is clear that mining time-interval episodes from such data is more complicated [28]. This paper focuses on time-interval events and proposes a general approach for mining closed time-interval episodes efficiently. In the following, we briefly review some research works on time-interval events.

Winarko et al. [28] propose a new algorithm, ARMADA, for discovering temporal patterns from interval-based data. The authors extend MEMISP (MEMory Indexing for Sequential Pattern mining) [30] to mine frequent patterns. Their algorithm requires only one database scan and does not require candidate generation.

In [31], time-stamped multivariate data are converted into time interval-based abstract concepts by using the temporal abstraction (see Sect. 3.1). An algorithm, called KarmaLego, enumerates all of the patterns whose frequency is above a given support threshold. Moskovitch et al. [32] improve KarmaLego to handle thousands of symbol types.

Batal et al. [27] present Recent Temporal Pattern (RTP) mining to find predictive patterns for event detection problems. First, their approach converts the time series data into time-interval sequences of temporal abstractions. Then complex time-interval patterns are constructed by using temporal operators. The mining algorithm explores the space of temporal patterns in a level-by-level fashion. They also present the minimal predictive recent temporal patterns framework to choose a small set of predictive and non-spurious patterns.

In [33], the abstracted time series is used to find temporal association rules by generalizing Allen’s rules [34] into a relation called PRECEDES. The user defines a set of complex patterns, which constitutes the basis of the construction of temporal rules. An Apriori-like algorithm looks for meaningful temporal relationships among the complex patterns.

Patel et al. [35] augment the hierarchical representation with count information to achieve a lossless representation. The hierarchical representation provides a compact mechanism to express the temporal relations among events. Based on this representation, an Apriori-based algorithm, called IEMiner (Interval-based Event Miner), is proposed to discover frequent temporal patterns. IEMiner employs two optimization strategies to reduce the search space. Finally, the interval-based temporal patterns are used for classification.

Physiological conditions of patients are reported by using variables such as blood pressure and heart rate [27, 36, 37]. Ghosh et al. [37] propose an approach that combines sequential patterns extracted from multiple physiological variables and captures interactions between these patterns by Coupled Hidden Markov Models (CHMM).

Laxman et al. [38] present a pattern discovery framework that utilizes event duration information in the temporal data mining process. They incorporate event duration information explicitly into the episode structure and discover event duration-dependent temporal correlations in the data. Furthermore, they define “principal episodes”, which are similar to closed episodes, and extract them based on the hashing approach.

As observed, most research works focus on mining frequent patterns, which might lead to extracting a huge number of patterns. To improve mining efficiency and avoid information loss, closed patterns are usually extracted instead [39]. Most works, such as [12, 39], focus on extracting closed episodes from sequences of time-point events. The common approach to extract closed episodes under gap constraints is based on the hash table [12, 13]. This approach generates closed patterns under gap constraints in two steps [13]. In the first step, candidate closed patterns are extracted from the frequent patterns. In the next step, the closed patterns are determined among them by using a hashing procedure with frequency as the key. In this step, all the candidates with the same frequency are hashed to the same bucket in the hash table. Among the candidate patterns hashed to the same bucket, those for which a super-pattern with the same frequency is found are discarded. As our experimental results show, this approach is not appropriate for extracting closed time-interval patterns for small values of the frequency threshold because it generates a huge number of candidate closed patterns. In this paper, we introduce a new data structure, called \({\textit{C}PBT}\), to store the closed patterns. Based on \({\textit{C}PBT}\), depth-first search algorithms are proposed to extract the closed patterns directly. As will be shown in the experimental results, \({\textit{C}PBT}\) improves the efficiency of the pattern mining engine significantly when there are a huge number of candidate closed patterns.
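
For concreteness, the following is a minimal sketch of this hashing-based post-processing step, assuming candidates are given as (pattern, frequency) pairs together with a hypothetical predicate is_super_pattern(a, b); it illustrates the approach described above, not the exact implementation of [12, 13]:

```python
from collections import defaultdict

def filter_closed(candidates, is_super_pattern):
    """candidates: list of (pattern, freq) candidate closed patterns."""
    buckets = defaultdict(list)
    for pattern, freq in candidates:
        buckets[freq].append(pattern)  # hash with frequency as the key

    closed = []
    for freq, patterns in buckets.items():
        for p in patterns:
            # discard p if another candidate with the same frequency
            # is a super-pattern of p
            if not any(q is not p and is_super_pattern(q, p) for q in patterns):
                closed.append((p, freq))
    return closed
```

Since every pattern in a bucket may be compared with every other pattern in that bucket, the cost is quadratic in the bucket sizes, which motivates the complexity analysis in Sect. 5.4.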

3 An overview of the pattern mining engine of POSITING/RELENTING

In this section, we consider the structure of POSITING briefly. Firstly, the background concepts such as event, stream and episode are defined. Then, the episode occurrence is discussed. Finally, the pattern mining engine is explained concisely. We recommend that readers refer to [2, 11] for more detail.

3.1 Background concepts

As Fig. 2 shows, POSITING converts the numeric time series of all the resources allocated to the application into a sequence of abstractions \(\langle S_1[st_1, et_1],\ldots , S_n[st_n, et_n] \rangle \) where \(S_i\in Status,1\le i \le n\) is an abstraction that holds from time \(st_i\) to time \(et_i\) and Status is the abstraction alphabet. Let \(Status=\{S_1,\ldots ,S_M\}\) be a set of the abstract values and \(ResourceType=\{R_1,\ldots ,R_N\}\) be a set of all the resources allocated to the application. Without loss of generality, we define an arbitrary order on ResourceType, for example \(R_1<R_2<\cdots <R_N\).

Definition 1

An event \(e_i\) is defined as a 4-tuple \(\langle r_i,s_i,st_i,et_i\rangle \), which means the abstract value of \(r_i\in ResourceType\) is \(s_i\in Status\) from the start time \(st_i\) to the end time \(et_i\). The span of the event \(e_i=\langle r_i,s_i,st_i,et_i\rangle \) is \(\varDelta e_i=et_i-st_i>\epsilon \), where \( \epsilon \) is a non-negative integer constant (\( \epsilon \in \mathbb {Z}_{\ge 0} \)). So the span of each event is at least \( \epsilon +1 \) time slots.

Fig. 2 Converting a time series into a symbolic (discretized) time series by value abstraction, with \(Status=\{Very\;Low,Low,Medium,High\}\); blue dashed lines show the borders of the values [11, 27] (colour figure online)

All the discretized time series of the resources are represented as a multivariate stream. Note that the value of \(\epsilon \) depends on the length of the sampling intervals. In coarse-grained sampling, \(\epsilon \) is set to small values. For fine-grained sampling, \(\epsilon \) could be set to larger values.

Definition 2

A multivariate stream \(E=\langle e_1,e_2,\ldots ,e_n\rangle \), where n is the index of the latest observed event, is a sequence of events that are ordered according to their start time:

$$\begin{aligned} \forall e_i,e_j\in E \text { such that } 1\le i<j\le n:\;(st_i<st_j)\text { or }(st_i=st_j\text { and } r_i<r_j) \end{aligned}$$

Definition 3

A state is an ordered pair \((r,s)\), where \(r\in ResourceType\) and \(s\in Status\). The Resource-Status (RS) is the set of all possible states: \(RS=\{(r,s)|\forall r\in ResourceType, \forall s\in Status\}\).

Fig. 3 An example of a multivariate stream with \(ResourceType=\{CPU,Memory,Disk\}\) and \(Status=\{Very\;Low,Low,Medium,High\}\) [27]

Example 1

Figure 3 shows a multivariate stream E with \(ResourceType=\{CPU,Memory,Disk\}\) and \(Status=\{Very\;Low,Low,Medium,High\}\). If the order on ResourceType is defined as \(CPU<Memory<Disk\), the events of E are ordered according to Definition 2.

If the span of an event is large, it is decomposed based on the decomposition unit \(\mu \). For example, the event \((Disk,Medium,3,7)\) with \(\mu =3\) is decomposed into the two events \((Disk,Medium,3,6)\) and \((Disk,Medium,6,7)\). However, after decomposing the event e, the span of the last decomposed event might be no more than \(\epsilon \). In this case, to satisfy Definition 1, the last and penultimate decomposed events are merged together.
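
As an illustration, here is a minimal sketch of this decomposition in Python, assuming events are tuples (r, s, st, et) as in Definition 1 (the function name is ours):

```python
def decompose(event, mu, epsilon):
    """Split an event into pieces of span mu; merge a too-short tail piece."""
    r, s, st, et = event
    pieces = []
    t = st
    while et - t > mu:
        pieces.append((r, s, t, t + mu))
        t += mu
    pieces.append((r, s, t, et))
    # merge the last two pieces if the last span would violate Definition 1
    if len(pieces) > 1 and pieces[-1][3] - pieces[-1][2] <= epsilon:
        r_, s_, st_, _ = pieces[-2]
        pieces[-2:] = [(r_, s_, st_, et)]
    return pieces

# (Disk, Medium, 3, 7) with mu = 3 yields [(Disk, Medium, 3, 6), (Disk, Medium, 6, 7)]
print(decompose(("Disk", "Medium", 3, 7), mu=3, epsilon=0))
```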

Inspired by the temporal relations defined in [27], we define two types of relations between events: concurrent and consecutive.

Definition 4

Given the stream \(E=\langle e_1,\ldots ,e_n \rangle \), two events \(e_i\) and \(e_j\), \(1\le i,j\le n \), are concurrent iff \(|st_i-st_j|\le \epsilon \) and are consecutive iff \(|st_i-st_j|>\epsilon \).

Mannila et al. [23] informally define an episode as a partially ordered collection of events that occur together. Inspired by the definition of the episode in [23], we present a detailed definition of the episode based on our problem domain. Note that we use the terms “pattern” and “episode” interchangeably in this paper.

Definition 5

A Concurrent Nodes Group (CNG) \(G=D_1D_2\cdots D_l\) is a group of nodes such that \(\forall D_{j},D_{m}\in G,1\le j,m\le l\), there is no partial order between \(D_{j}\) and \(D_{m}\).

Definition 6

An episode \(\alpha \) is defined as a directed acyclic graph \(\alpha =(V_{\alpha },\prec _{\alpha },g_{\alpha })\), where \(V_{\alpha }\) is a set of nodes, \(\prec _{\alpha }\) is a partial order on \(V_{\alpha }\) and \(g_{\alpha }:V_{\alpha }\rightarrow RS\) is a function that maps each node into one state. The episode \(\alpha \) is composed of \(k\;(\ge 1)\) CNGs (note that 1-node episodes have a single CNG) in the form of \(G_1=D^{1}_{1},D^{1}_{2},\ldots ,D^{1}_{l_1}\), ..., \(G_k=D^{k}_{1},D^{k}_{2},\ldots ,D^{k}_{l_k}\) such that:

  1. \(|G_i|=l_i\)

  2. \(V_{\alpha }=\{D^{1}_{1},\ldots ,D^{1}_{l_1},D^{2}_{1},\ldots ,D^{2}_{l_2},\ldots ,D^{k}_{1},\ldots ,D^{k}_{l_k}\}\)

  3. \(\forall D^{i}_{j}\in G_i,\forall D^{m}_{n}\in G_{m},1\le i<m\le k,j\in \{1,\ldots ,l_i\},n\in \{1,\ldots ,l_m\}:D^{i}_j\prec _{\alpha }D^{m}_{n}\)

  4. \(|CNG_{\alpha }|=k\)

  5. \(G'_{i}=\{(r,s)\in RS|g_{\alpha }(v)=(r,s)\;\;,\forall v\in G_i\}\).

The episode \(\alpha \) could be represented as a general form \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\).

Fig. 4 The graphical representation of the episode \(\alpha =(CPU,High)(Memory,Medium)\rightarrow (CPU,Low)(Disk,Low)\) [11]

Example 2

Consider the episode \(\alpha =(V_{\alpha },\prec _{\alpha },g_{\alpha })\) in Fig. 4. The set \(V_{\alpha }\) contains four nodes. As it is shown, the function \(g_{\alpha }\) maps the nodes into the states and \(D^{1}_{1}\prec _{\alpha }D^{2}_{1}\), \(D^{1}_{1}\prec _{\alpha }D^{2}_{2}\), \(D^{1}_{2}\prec _{\alpha }D^{2}_{1}\) and \(D^{1}_{2}\prec _{\alpha }D^{2}_{2}\). As a simple graphical notation, this episode is represented as \(\alpha =(CPU,High)(Memory,Medium)\rightarrow (CPU,Low)(Disk,Low)\).

3.2 The episode occurrence

Informally, an occurrence of an episode in the stream means that the nodes of the episode have corresponding events in the stream such that the partial order of the episode is preserved [40]. A frequent episode occurs often enough in the stream. Given a frequency threshold \( \theta \in {\mathbb {R}}_{\ge 0} \), the goal of episode mining is to extract all the frequent episodes in the stream. We choose the Non-Overlapped (NO) frequency [41] to compute the frequency of episodes. Two occurrences \(h_1\) and \(h_2\) of the episode \( \alpha \) are said to be non-overlapped if either \( h_1 \) starts after \( h_2 \) ends or \( h_2 \) starts after \( h_1 \) ends [41]. The NO frequency is computed by using minimal occurrences. A minimal occurrence is an occurrence that includes no other occurrence. So, \( freq(\alpha ) \) is the cardinality of a maximal NO set of minimal occurrences of the episode \( \alpha \) in the stream [11, 41]. In Sect. 4, we discuss how to compute the episode frequency based on the occurrences.
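
A minimal sketch of how the cardinality of a maximal NO set can be counted, assuming each occurrence has already been reduced to its overall span (start, end) and the spans are sorted by start time; this greedy scan illustrates the idea, not the exact ComputeFreq of [11]:

```python
def no_frequency(spans):
    """spans: list of (start, end) occurrence spans, sorted by start time."""
    freq = 0
    last_end = float("-inf")
    for start, end in spans:
        if start > last_end:  # this occurrence starts after the previous one ends
            freq += 1
            last_end = end
    return freq

print(no_frequency([(1, 5), (3, 8), (6, 9), (10, 12)]))  # -> 3
```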

Definition 7

Given the episode \(\alpha \) such that \(|CNG_{\alpha }|=k\) and \(1\le i\le k \), for each occurrence of \(\alpha \), the starting interval of the occurrence of \(G_i\), \([t^{i}_{1},t^{i}_{2}]\), is:

$$\begin{aligned} t^{i}_{1}=\min \{st\text { of the corresponding events of the nodes of }G'_i\text { in the occurrence}\} \end{aligned}$$
(3.1)
$$\begin{aligned} t^{i}_{2}=\max \{st\text { of the corresponding events of the nodes of }G'_i\text { in the occurrence}\} \end{aligned}$$
(3.2)

Definition 8

Given the episode \(\alpha \) such that \(|CNG_{\alpha }|=k\), each occurrence O of \(\alpha \) is determined as a sequence of the starting intervals of CNGs: \( O=([t^{i}_{1},t^{i}_{2}]^{k}_{i=1}) \)

Example 3

Consider the stream \(E=\langle e_1=(CPU,High,0,3),e_2=(Memory,Medium,0,4),e_3=(Network,Low,0,2),e_4=(Disk,Medium,0,3),e_5=(Network,Medium,2,5),e_6=(CPU,Low,3,5),e_7=(Disk,Low,3,5),e_8=(Memory,Very\;Low,4,5)\rangle \). For \(\epsilon =0\), there is an occurrence of the episode \(\alpha \) given in Example 2 in the stream E. The starting intervals of the occurrences of \(G_1\) and \(G_2\) are \([t^{1}_1,t^{1}_2]=[0,0]\) and \([t^{2}_1,t^{2}_2]=[3,3]\) respectively. So the occurrence is represented as \( O=([0,0],[3,3])\).

Dynamic resource allocation is based on virtualization techniques [42]. Given the time spent on booting VMs, patterns should be extracted from application behavior in a way that the SLA is satisfied and wasting energy is avoided. Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\) and an occurrence \(O=([w^{i}_1,w^{i}_2]^{k}_{i=1})\) of \(\alpha \), if the time it takes to instantiate a new VM instance is \(\delta (>\epsilon )\) time slots, the starting interval of \(G'_{i+1},1\le i< k,\) should begin after \(w^{i}_2+\delta \). Thus, the resource manager has enough time to instantiate a new VM instance. On the other hand, if resources are allocated long before workload burstiness occurs, energy and resources are wasted. At the discretion of the resource manager and according to the characteristics of the cloud data center, the gap constraint \(\varDelta (\ge \delta )\) determines that resources may be allocated at most \(\varDelta -\delta \) time slots before workload burstiness occurs. Therefore, the Valid Interval (VI) of \(G'_{i+1}\) is \(VI([w^{i}_1,w^{i}_2],i+1)=[w^{i}_2+\delta , w^{i}_2+\varDelta ]\), which satisfies QoS and SLA requirements and avoids wasting energy. \(\delta \) and \(\varDelta \) are called the minimum internal gap and maximum internal gap respectively.

To compute the NO frequency of episodes under gap constraints, tracking the minimal occurrences of episodes is not enough [12]. So, in [11], we introduced a new type of occurrence, called the latest occurrence, to compute the NO frequency under gap constraints:

Definition 9

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\) and the internal gaps \(\delta \) and \(\varDelta \), an occurrence \(O=([w^{i}_1,w^{i}_2]^{k}_{i=1})\) of \(\alpha \) is a valid occurrence iff \(\forall i, 1\le i<k, w^{i}_2+\delta \le w^{i+1}_{1}\le w^{i}_2+\varDelta \).

Example 4

Consider Example 3. If \( \delta =2 \) and \( \varDelta =3 \), \(VI([0,0],2)=[0+2,0+3]=[2, 3]\). In addition, the occurrence O is valid because we have \( 0+2\le 3\le 0+3 \).
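
A minimal sketch of the valid interval computation and the validity test of Definition 9, assuming an occurrence is given as the list of starting intervals of its CNGs (function names are ours):

```python
def valid_interval(prev_end, delta, Delta):
    """VI of the next CNG after a CNG whose starting interval ends at prev_end."""
    return (prev_end + delta, prev_end + Delta)

def is_valid(occurrence, delta, Delta):
    """occurrence: [(w1, w2), ...] starting intervals of the CNGs, in order."""
    for (w1, w2), (n1, n2) in zip(occurrence, occurrence[1:]):
        lo, hi = valid_interval(w2, delta, Delta)
        if not (lo <= n1 <= hi):
            return False
    return True

# Example 4: O = ([0,0],[3,3]) with delta=2, Delta=3 is valid
print(is_valid([(0, 0), (3, 3)], delta=2, Delta=3))  # -> True
```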

Definition 10

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\), if for a valid occurrence \(O=([t^{i}_1,t^{i}_2]^{k}_{i=1})\) of \(\alpha \) there exists no other valid occurrence \(Q=([w^{i}_1,w^{i}_2]^{k-1}_{i=1},[t^k_1,t^k_2])\) of \( \alpha \) such that \(\exists j,1\le j<k, w^{j}_1>t^{j}_{1}\), it is said that O includes the Latest Prefix Occurrence (LPO).

Definition 11

Each valid occurrence of the episode \(\alpha \) that includes LPO, is called the Latest Occurrence (LO). \(LO(\alpha )\) is a set of all the latest occurrences of \(\alpha \).

Example 5

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow G'_3\), \(\epsilon =1\), \(\delta =4\) and \(\varDelta =7\), Fig. 5a shows the occurrences of \(G'_i,i=1,2,3\). Note that there are two occurrences for each \(G'_i\): \(A_1=[1,2]\) and \(A_2=[4,5]\) are the occurrences of \(G'_1\), \(B_1=[9,10]\) and \(B_2=[12,13]\) are the occurrences of \(G'_2\) and \(C_1=[17,18]\) and \(C_2=[21,22]\) are the occurrences of \(G'_3\). Figure 5b shows that four valid occurrences (note that each LO is also a valid occurrence) can be identified for \(\alpha \) under the gap constraints. According to Definition 11, the occurrences corresponding to the red lines in Fig. 5b are not LOs. As the figure shows, there are two LOs for \(\alpha \): \((A_2,B_2,C_1)\) and \((A_2,B_2,C_2)\).

Fig. 5 Extracting LOs of the episode \(\alpha \) based on the occurrences of its CNGs

In previous works, a superset of the minimal occurrences, called Minimal Prefix Occurrences (MPOs), is extracted to compute the NO frequency under gap constraints. An MPO with the span of [ts, te] is a valid occurrence (satisfying the gaps) such that there is no other valid occurrence that starts strictly after ts and ends at or before te [12].

Example 6

Consider Example 5 again. Assume there is only the occurrence \(A_2\) for \(G'_1\). In this case, according to the definition of MPO, there are three MPOs: \(O_1=(A_2,B_1,C_1)\), \(O_2=(A_2,B_2,C_1)\) and \(O_3=(A_2,B_2,C_2)\) for \(\alpha \). On the contrary, there are two LOs: \(O_2=(A_2,B_2,C_1)\) and \(O_3=(A_2,B_2,C_2)\) for \(\alpha \). This example shows that LOs of \(\alpha \) are a subset of its MPOs.

Lemma 1

Given the episode \(\alpha =G'_1\rightarrow \cdots \rightarrow G'_k\), if \(MPO(\alpha )\) is the set of all the minimal prefix occurrences of \(\alpha \), then \(LO(\alpha )\subseteq MPO(\alpha )\).

Definition 12

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\), \(LOList(\alpha )\) includes a 4-tuple \((t^{k-1}_{2},t^{k}_1,t^{k}_2,t^{1}_1)\) for each occurrence \(O=([t^{i}_{1},t^{i}_{2}]^{k}_{i=1})\in LO(\alpha )\). \( LOList(\alpha )[i]\) is the i-th member of \(LOList(\alpha )\).

The focus of the following sections is on improvements to time and space complexities of the pattern mining engine.

3.3 The pattern extraction

The pattern mining engine constructs a pattern tree and extracts frequent patterns.

Definition 13

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\) and \((r,s)\in RS\), the serial extension of \(\alpha \) with \((r,s)\) is:

$$\begin{aligned} \alpha \oplus (r,s)=G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k \rightarrow (r,s) \end{aligned}$$
(3.3)

Definition 14

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_{k-1}\rightarrow G'_k\) and \((r,s)\in RS\), the concurrent extension of \(\alpha \) with \((r,s)\) is:

$$\begin{aligned} \alpha \odot (r,s)=G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_{k-1} \rightarrow G''\;\;\text {where}\;\;G''=G'_k\cup \{(r,s)\} \end{aligned}$$
(3.4)

The lexicographic tree (pattern tree) is constructed based on the serial and concurrent extensions as follows [13]:

  • The root is labeled with \(\varnothing \).

  • Each node n of the tree is labeled with a state. Label(n) is the corresponding label of the node n.

  • Each node n of the tree corresponds to an episode. Pattern(n) is the corresponding episode of the node n.

  • If \(Pattern(n)=\alpha \), the corresponding episode of each child of n is either a serial extension or a concurrent extension of \(\alpha \).

  • The left sibling is less than the right sibling.

Fig. 6 A part of the lexicographic pattern tree [11]

Figure 6 shows a part of the pattern tree constructed on RS [11]. Note that \(\forall i,j, 1\le i\le N,1\le j\le M,(R_i,S_j)\in RS\). Here, the lexicographic order is defined on RS as \((R_1,S_1)<\cdots<(R_1,S_M)<(R_2,S_1)<\cdots<(R_2,S_M)<\cdots<(R_N,S_1)<\cdots <(R_N,S_M)\). The root of the tree is null. All the patterns in the tree are generated only by the serial extension or the concurrent extension. For example, the episode \(((R_1,S_1)(R_2,S_1))\) is generated by the concurrent extension of \((R_1,S_1)\) with \((R_2,S_1)\), and the episode \(((R_1,S_1)\rightarrow (R_N,S_M))\) is generated by the serial extension of \((R_1,S_1)\) with \((R_N,S_M)\).
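
A minimal sketch of the two extension operators of Definitions 13 and 14, assuming an episode is represented as a list of CNGs and each CNG as a sorted tuple of (resource, status) states (this representation is ours; the paper uses SAVE [11]):

```python
def serial_extension(episode, state):
    """alpha (+) (r, s): append a new CNG containing only (r, s)."""
    return episode + [(state,)]

def concurrent_extension(episode, state):
    """alpha (.) (r, s): add (r, s) to the last CNG, keeping states sorted."""
    last = tuple(sorted(episode[-1] + (state,)))
    return episode[:-1] + [last]

alpha = [(("R1", "S1"),)]
print(serial_extension(alpha, ("RN", "SM")))      # (R1,S1) -> (RN,SM)
print(concurrent_extension(alpha, ("R2", "S1")))  # (R1,S1)(R2,S1)
```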

Mining frequent episodes might lead to extracting a huge number of patterns. To improve mining efficiency and avoid information loss, a compressed set of episodes, called closed episodes, is extracted [43].

Definition 15

The episode \( \beta \) is a sub-episode of the episode \( \alpha \) (or \( \alpha \) is a super-episode of \( \beta \)), \( \beta \sqsubseteq \alpha \), if all the states of \( \beta \) and the partial order between them exist in \( \alpha \).

Definition 16

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k \) and \( 1\le i\le k \), \( Prefix(\alpha ,i)\)= \(G''_1\rightarrow G''_2\rightarrow \cdots \rightarrow G''_i\) and \(Postfix(\alpha ,i)=G''_i\rightarrow G''_{i+1}\rightarrow \cdots \rightarrow G''_k\), where \( G''_i\subseteq G'_i\). We use the terms postfix and suffix interchangeably, writing \(Suffix(\alpha ,i)\) for \(Postfix(\alpha ,i)\).

Definition 17

The episode \( \alpha \) is closed under gap constraints \(\delta \) and \(\varDelta \) iff there is no other episode \(\beta \) whose prefix or suffix is \( \alpha \) and \(freq(\alpha )=freq(\beta )\) [12].

Example 7

Consider the two episodes \( \alpha =(Memory,Low)\rightarrow (Disk,High) \) and \( \beta =(CPU,High)(Memory,Low)\rightarrow (Disk,High) \). It is clear that \( \alpha \sqsubseteq \beta \) and \( \alpha =Suffix(\beta ,1) \). So if \( freq(\alpha )=freq(\beta ) \), then the episode \( \alpha \) is not closed.

Based on the definitions of the serial extension and the concurrent extension, if nodes \(n'\) and \(n''\) are the serial and concurrent extensions of the node n respectively, then we have:

$$\begin{aligned} \underbrace{Pattern(n')}_{\alpha }&=\underbrace{Pattern(n)}_{\beta }\oplus \underbrace{Label(n')}_{x} \end{aligned}$$
(3.5)
$$\begin{aligned} \underbrace{Pattern(n'')}_{\gamma }&=\underbrace{Pattern(n)}_{\beta }\odot \underbrace{Label(n'')}_{y} \end{aligned}$$
(3.6)

So without scanning the stream, a maximal non-overlapped set of minimal occurrences of \(\alpha \) and \(\gamma \) can be determined by using the join of \(LO(\beta )\) with the occurrences of x and y respectively [12].

Firstly, the pattern mining engine extracts frequent episodes by a complete traverse of the pattern tree in a depth-first way. For this purpose, all the frequent 1-node episodes are extracted. Then, the pattern tree is traversed in a depth-first manner from each of the frequent 1-node episodes. When the serial and concurrent extensions of an episode are constructed, it is checked whether any of the super-patterns has the same frequency as the episode; if not, the episode is added to the list of candidate closed episodes. After extracting the candidate closed episodes, a post-processing step is performed on them using a hashing procedure with frequency as the key. Finally, the set of all the closed frequent episodes is extracted. Note that to avoid enlarging the pattern tree, we could limit the number of CNGs of episodes. We define Level as the maximum number of CNGs of episodes.

4 Improving space complexity: computing NO frequency based on redundant occurrences

This section focuses on improvements to space complexity of the pattern mining engine. In Sect. 4.1, redundant LOs are introduced. The improved representation of the stream is introduced to identify the redundant occurrences in Sect. 4.2. In Sect. 4.3, algorithms are proposed to compute the NO frequency under gap constraints.

4.1 Redundant LO

In this section, the redundant LOs are defined and it is proved that removing these occurrences does not affect the frequency of episodes. Therefore, the redundant occurrences could be removed without any information loss, which improves memory consumption.

Definition 18

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\) and the occurrence \(O=([t^{i}_{1},t^{i}_{2}]^{k}_{i=1})\in LO(\alpha )\), the occurrence O is a redundant occurrence iff three conditions below are satisfied:

  1. There exists another occurrence \(Q=([w^{i}_{1},w^{i}_{2}]^{k}_{i=1})\in LO(\alpha )\) such that \(w^1_1=t^1_1\) and \(w^k_2<t^k_1\).

  2. There is no sub-interval of \([t^k_2+\delta ,t^k_2+\varDelta ]\) such that only O covers it.

  3. There exists no event \(e=(r,s,st,et)\) such that \(|t^k_2-st|\le \epsilon \) and \(|t^k_1-st|\le \epsilon \).

In Sect. 3.3, we explained how the pattern tree is constructed based on the serial and concurrent extensions. If for the occurrence O there is an event \(e =(r,s,st ,et)\) such that \(|t^k_2-st|\le \epsilon \) and \(|t^k_1-st|\le \epsilon \), then an episode \( \beta \) can be extended from \(\alpha \) by the concurrent extension (\( \beta =\alpha \odot (r,s) \)). Therefore, the occurrence O is not redundant and should not be removed.
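
Putting the three conditions together, the following is a minimal sketch of the redundancy test, examining only the LOs A and C immediately before and after B (justified by Lemma 3 below). Each LO is summarised here as (first start, last starting interval), and has_concurrent_event is a hypothetical stand-in for the \({\textit{P}ROSPER}\) pointer lookup of condition 3:

```python
def is_redundant(A, B, C, delta, Delta, has_concurrent_event):
    """A, B, C: consecutive LOs, each given as (first_start, (tk1, tk2))."""
    a_first, (a1, a2) = A
    b_first, (b1, b2) = B
    c_first, (c1, c2) = C
    # condition 1: another LO with the same first start whose last interval
    # ends strictly before B's last interval starts
    cond1 = (a_first == b_first) and (a2 < b1)
    # condition 2: VI(B) is fully covered by VI(A) and VI(C);
    # since a2 < b2 < c2, it suffices that c2 + delta <= a2 + Delta
    cond2 = c2 + delta <= a2 + Delta
    # condition 3: B cannot be extended concurrently
    cond3 = not has_concurrent_event(b1, b2)
    return cond1 and cond2 and cond3

# Example 8: O2 is redundant between O1 and O3 (delta=12, Delta=23)
print(is_redundant((80, (100, 101)), (80, (103, 104)), (92, (106, 106)),
                   12, 23, lambda s, e: False))  # -> True
```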

Example 8

Given the episode \(\alpha =G'_1\rightarrow G'_2\), \(\epsilon =1\), \(\delta =12\) and \(\varDelta =23\), Fig. 7 shows the occurrences of \(G'_i,i=1,2\). \(A_1=[80,81]\) and \(A_2=[92,93]\) are the occurrences of \(G'_1\) and \(B_1=[100,101]\), \(B_2=[103,104]\) and \(B_3=[106,106]\) are the occurrences of \(G'_2\). There are three LOs: \(O_1=(A_1,B_1)\), \(O_2=(A_1,B_2)\) and \(O_3=(A_2,B_3)\). Note that the occurrence \(O_2\) satisfies the first two conditions of Definition 18: the occurrence \(O_1\) plays the role of Q in the first condition and, as Fig. 8 shows, VI([103, 104], 3) is covered completely by the valid intervals of \(O_1\) and \(O_3\). Therefore, if there exists no event \(e=(r,s,st,et)\) such that \(|104-st|\le \epsilon \) and \(|103-st|\le \epsilon \), then \(O_2\) is redundant.

Fig. 7 The occurrences of \(G'_i,i=1,2\) of the episode \(\alpha =G'_1\rightarrow G'_2\)

Fig. 8 The valid intervals of the occurrences \(O_1\), \(O_2\) and \(O_3\) in Example 8

Lemma 2

Given the episode \(\alpha \), if \(\beta \) and \(\gamma \) are the serial and concurrent extensions of \(\alpha \), removing redundant occurrences from \(LOList(\alpha )\) does not affect \(freq(\alpha )\), \(freq(\beta )\) and \(freq(\gamma )\).

Lemma 3

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\) and the occurrences \(A=([a^i_1,a^i_2]^k_{i=1})\), \(B=([b^i_1,b^i_2]^k_{i=1})\) and \(C=([c^i_1,c^i_2]^k_{i=1})\), where \(A,B,C\in LO(\alpha )\) and A and C are LOs immediately before and after B respectively, if \(a^1_1\ne b^1_1\) and \([b^k_1,b^k_2]\) is not covered by \([a^k_1,a^k_2]\) and \([c^k_1,c^k_2]\), then all of the LOs before A start before B and \([b^k_1,b^k_2]\) is covered by none of the LOs before A and after C.

Lemma 4

If \(\epsilon \ge \frac{\delta }{4}\) and \( \varDelta \in [\delta ,2\delta ) \), then there is no redundant LO.

4.2 Improved representation of the stream based on pointers \(({\textit{P}ROSPER})\)

According to Lemma 2, not all the LOs of an episode carry useful information. So removing the redundant LOs can reduce the memory consumption of the LOList of episodes. According to the third condition of Definition 18, to identify redundant LOs it must be checked that they cannot be extended concurrently. For this purpose, the occurrence lists of all the states of RS would have to be checked, which might be time-consuming. To expedite the identification of concurrent events and the removal of redundant LOs, we propose \({\textit{P}ROSPER}\), an improved representation of the stream based on pointers. \({\textit{P}ROSPER}\) is based on the vertical representation of the stream and connects the concurrent events by using pointers.

In the vertical representation of the stream [44], each \((r,s)\in RS\) is associated with a list whose entries include the starting intervals of that \((r,s)\). In \({\textit{P}ROSPER}\), each entry of the list is augmented with two pointers, called Next and Previous. The concurrent events are connected by these pointers. The corresponding list of \((r,s)\) in \({\textit{P}ROSPER}\) is called \(LOListRS(r,s)\).

Definition 19

\(LOListRS(r,s)\) includes a 4-tuple \( ([v,v'],Next,Previous) \) for each occurrence of \( (r,s)\in RS \), where \( [v,v'] \) is the starting interval of the occurrence and Next and Previous are the pointers that connect the concurrent events. \(LOListRS(r,s)[i]\) is the i-th member of \(LOListRS(r,s)\). (Note that for each occurrence of \((r,s)\), \( v=v' \).)
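
A minimal sketch of a \({\textit{P}ROSPER}\) entry and of linking concurrent events while the stream is scanned in start-time order (class and function names are ours, not the paper's code):

```python
class Entry:
    def __init__(self, start):
        self.interval = (start, start)  # starting interval [v, v'] with v = v'
        self.next = None                # pointer to the next concurrent event
        self.previous = None            # pointer to the previous concurrent event

def link_concurrent(prev_entry, new_entry, epsilon):
    """Connect two entries if their events are concurrent (Definition 4)."""
    if abs(prev_entry.interval[0] - new_entry.interval[0]) <= epsilon:
        prev_entry.next = new_entry
        new_entry.previous = prev_entry
```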

Example 9

Consider the stream E:

$$\begin{aligned} \begin{aligned} E=\langle&(CPU,Low,0,1),(Memory,Medium,0,3),\\&(CPU,High,1,2),(CPU,Low,2,3),\\&(CPU,Medium,3,4),(Memory, High,3,4),(CPU,Low,4,6),\\&(Memory,Low,4,5),(Memory,High,5,6)\rangle \\ \end{aligned} \end{aligned}$$

Figure 9 shows the vertical representation and \({\textit{P}ROSPER}\) of the stream E for \(\epsilon =0\). As Fig. 9b shows, the concurrent events can easily be identified by using the pointers. Note that the pointer L always points to the last event of the stream.

Fig. 9 The representation of the stream E in Example 9 in the forms of the vertical representation and \({\textit{P}ROSPER}\)

4.3 NO frequency under gap constraints

To compute the NO frequency of episodes, their LOList should be extracted first [11]. In this section, based on \({\textit{P}ROSPER}\), we modify the algorithms SSMakeLOList and SCMakeLOList presented in [11] to extract the non-redundant LOs of episodes.

4.3.1 Extracting the non-redundant LOs of episodes using the serial extension

The algorithm SMakeLOList (Algorithm 1) is proposed to extract the non-redundant LOs of episodes using the serial extension. The algorithm receives \(LOList(\alpha )\) (see Definition 12) and \(LOListRS(r,s)\) (see Definition 19), where \(\alpha \) is an episode and \((r,s)\in RS\), and computes \(LOList(\beta =\alpha \oplus (r,s))\) without scanning the stream. Note that \(LOListRS(r,s)\) is the occurrence list of \((r,s)\) in \({\textit{P}ROSPER}\). The counters i, z and j traverse the LOLists of \(\alpha \) and \(\beta \) and \(LOListRS(r,s)\) respectively. Lines 2 to 23 consider, for each LO of \(\alpha \), which LOs of \((r,s)\) could create a non-redundant LO for \(\beta \). Lines 4 to 6 check whether the i-th LO of \(\alpha \) could be the latest prefix occurrence for the j-th occurrence of \((r,s)\). If not, this LO of \(\alpha \) cannot be the latest prefix occurrence for the next occurrences of \((r,s)\), so the next LOs of \(\alpha \) are considered (line 22). For the new LOs of \(\alpha \), we start from the occurrences of \((r,s)\) for which there is no latest prefix occurrence yet. In lines 4 to 5, if an LO of \(\alpha \) is the latest prefix occurrence for an LO of \((r,s)\), the corresponding LO of \(\beta \) is generated and inserted in \(LOList(\beta )\). In lines 7 to 17, the LOs immediately before and after each LO of \(\beta \) are examined to determine whether that LO is redundant based on Definition 18. In line 12, the function CExtending (see Algorithm 7 in “Appendix B”) checks whether the LO can be extended concurrently. According to Lemma 3, if the conditions of Definition 18 are not satisfied for the next and previous LOs, they would not be satisfied by the other LOs either. In line 13, if an LO is redundant, it is removed and the counter z is updated. Note that in line 10, if \(b_3+\delta \le b_1+\varDelta \), then \([b_2+\delta ,b_2+\varDelta ]\) is covered because we have \(b_1<b_2<b_3\). So if \(b_1+\delta<b_2+\delta<b_3+\delta \le b_1+\varDelta<b_2+\varDelta <b_3+\varDelta \), then \(VI([a_2,b_2],k+1)\) is completely covered by \(VI([a_1,b_1],k+1)\) and \(VI([a_3,b_3],k+1)\), where \(|CNG_{\alpha }|=k\). The time complexity of the algorithm is discussed in Lemma 9 in “Appendix A”.

Algorithm 1: SMakeLOList
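
As a highly simplified illustration of the serial join at the heart of SMakeLOList, ignoring the redundancy elimination and the full 4-tuple bookkeeping of Definition 12, the following sketch pairs each LO of \(\alpha \), summarised by its first start f and the end t of its last starting interval, with the occurrences of \((r,s)\) that fall in its valid interval, keeping only the latest prefix for each occurrence:

```python
def serial_join(lo_list, rs_starts, delta, Delta):
    """lo_list: [(f, t)] in occurrence order; rs_starts: sorted starts of (r, s)."""
    latest = {}  # start w of (r, s) -> first start f of its latest prefix
    j = 0
    for f, t in lo_list:
        for k in range(j, len(rs_starts)):
            w = rs_starts[k]
            if w < t + delta:
                j = k + 1      # too early for this LO and all later ones
                continue
            if w > t + Delta:
                break          # w lies beyond VI; try the next LO of alpha
            latest[w] = f      # a later LO of alpha overwrites: latest prefix
    return sorted(latest.items())

# Example 5 restricted to G'_1 -> G'_2 (delta=4, Delta=7):
print(serial_join([(1, 2), (4, 5)], [9, 12], 4, 7))  # -> [(9, 4), (12, 4)]
```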

Theorem 1

Given the episode \(\alpha \) and \((r,s)\in RS\), the algorithm SMakeLOList finds all and only the non-redundant LOs of \(\beta =\alpha \oplus (r,s)\).

Example 10

Consider Example 8. Figure 10 shows \( LOList(G'_1) \), \(LOListRS(r,s)\) and \( LOList(\beta =G'_1\rightarrow (r,s))\) extracted by Algorithm 1. Since all of the pointers of \(LOListRS(r,s)\) are null, according to lines 10 to 20 of the algorithm, the second element of \( LOList(\beta ) \) is redundant. So it is removed.

Fig. 10 The serial extension of \( G'_1 \) with \((r,s)\) and extracting \( LOList(G'_1\rightarrow (r,s)) \) by the algorithm SMakeLOList in Example 10

4.3.2 Extracting the non-redundant LOs of episodes using the concurrent extension

The algorithm CMakeLOList (Algorithm 2) is proposed to extract the non-redundant LOList of episodes using the concurrent extension. The algorithm receives \(LOList(\alpha )\) and \(LOListRS(r,s)\), where \(\alpha \) is an episode and \((r,s)\in RS\), and computes \(LOList(\beta =\alpha \odot (r,s))\) without scanning the stream.

Algorithm 2: CMakeLOList

The counters i, z and j traverse \(LOList(\alpha )\), \(LOList(\beta )\) and \(LOListRS(r,s)\) respectively. In lines 3 to 5, the first element of \(LOListRS(r,s)\) that is concurrent with at least one event is found. There are three cases for \(LOList(\alpha )\) and \(LOListRS(r,s)\): (1) In lines 7 to 20, if the corresponding entries of \(LOList(\alpha )[i]\) and \(LOListRS(r,s)[j]\) can generate an LO of \(\beta =\alpha \odot (r,s)\), it is inserted in \(LOList(\beta )\) and the three counters i, j and z are incremented. In lines 10 and 20, if an LO is redundant, it is removed and the counter z is updated. (2) In lines 21 to 22, if \(LOListRS(r,s)[j]\) occurs after \(LOList(\alpha )[i]\), then \(LOList(\alpha )[i]\) should not be considered for the members after \(LOListRS(r,s)[j]\), so the next LO of \(\alpha \) is checked. (3) In lines 23 and 24, if \(LOListRS(r,s)[j]\) occurs before \(LOList(\alpha )[i]\), the next occurrences of \((r,s)\) are considered for \(LOList(\alpha )[i]\). The time complexity of the algorithm CMakeLOList is discussed in Lemma 10 in “Appendix A”.

Theorem 2

Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_{k}\) and \(G=(r,s)\in RS\), the algorithm CMakeLOList finds all and only the non-redundant LOs of \(\beta =\alpha \odot G\).

Example 11

Given the episode \( \alpha =G'\rightarrow (r,s)\), \( (r',s')\in RS \), \( (r,s)<(r',s') \), \( \epsilon =1 \), \( \delta =12 \) and \( \varDelta =23\), Fig. 11 shows \( LOList(\alpha ) \), \(LOListRS(r,s)\), \( LOListRS(r',s') \) and \( LOList(\beta )\) extracted by Algorithm 2. According to the algorithm, the second element of \( LOList(\beta ) \) is removed because it is redundant.

Fig. 11 The concurrent extension of \( \alpha \) with \((r,s)\) and extracting \( LOList(\beta ) \) by the algorithm CMakeLOList in Example 11

After extracting the LOList of episodes, their NO frequency can easily be computed by calling the function ComputeFreq presented in [11]. ComputeFreq scans the LOList of the episode and counts the number of non-overlapped LOs.

5 Improving time complexity: a new approach for mining the closed episodes

As mentioned, the common approach to extract closed episodes under gap constraints is based on the hash table [13]. It is a two-step approach. In the first step, candidate closed episodes are extracted. In the next step, the closed episodes are determined among them by using a hashing procedure with frequency as the key [12, 13]. As will be shown in the evaluation results, the number of closed episodes is usually much smaller than the number of candidate closed episodes. So extracting the closed episodes from among a huge number of candidate closed episodes is very time-consuming. For this purpose, we introduce a new data structure, called \({\textit{C}PBT}\), to store closed episodes and present depth-first search algorithms based on \({\textit{C}PBT}\) to extract closed episodes directly. In this section, \({\textit{C}PBT}\) is introduced first. Then, the algorithms for mining closed frequent episodes are presented.

5.1 The data structure \({\textit{C}PBT}\)

The data structure \({\textit{C}PBT}\) is introduced to store closed episodes compactly. The root of \({\textit{C}PBT}\) is labeled with \(\varnothing \). An episode is traversed in the backward direction and inserted in \({\textit{C}PBT}\) in a way that each node corresponds to one CNG of the episode. So episodes whose postfixes are the same share the same nodes. To avoid losing the frequency of the episodes, each node maintains the sum of the frequencies of the episodes that share that node. Each node n of \({\textit{C}PBT}\) includes:

  • label: Given the episode \(\alpha =G'_1\rightarrow \cdots \rightarrow G'_k\), if the node n corresponds to \(G'_i,1\le i\le k\), then label(n) is the inverse of \(G'_i\).

  • freq: The sum of the frequencies of the episodes that share n.

  • children: For each episode that shares n such that n does not correspond to the first CNG of that episode, there is a node as a child of n (see the sketch after this list).
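
A minimal sketch of such a node and of the backward insertion described above, assuming CNG labels are modelled as tuples of states (names are ours; the absorption handling is added by the algorithms of Sect. 5.3):

```python
class CPBTNode:
    def __init__(self, label=None):
        self.label = label   # the (inverted) CNG this node corresponds to
        self.freq = 0        # sum of frequencies of episodes sharing this node
        self.children = {}   # label -> CPBTNode

def insert_backward(root, episode, freq):
    """episode: list of CNGs G'_1 ... G'_k; traversed from G'_k down to G'_1."""
    node = root
    for cng in reversed(episode):
        label = tuple(reversed(cng))
        child = node.children.get(label)
        if child is None:
            child = node.children[label] = CPBTNode(label)
        child.freq += freq   # shared nodes accumulate frequencies
        node = child

# Example 12: alpha and gamma share the node labeled MF, whose freq becomes 150
root = CPBTNode()
insert_backward(root, [("A", "B"), ("C", "E", "G"), ("F", "M")], 50)   # alpha
insert_backward(root, [("D", "Z"), ("F", "M")], 100)                   # gamma
```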

Example 12

Given \(RS_1=\{A,B,C,D,E,F,G,M,N,Z\}\), where \(RS_1\subseteq RS\), assume the three episodes \(\alpha \), \(\beta \) and \(\gamma \) are extracted as follows:

$$\begin{aligned} \alpha&=AB\rightarrow CEG\rightarrow FM \quad&\beta = A\rightarrow C\rightarrow MN \quad \gamma&=DZ\rightarrow FM\\&freq(\alpha )=50 \quad&freq(\beta )=70 \quad&freq(\gamma )=100 \end{aligned}$$

As Fig. 12a shows, the episode \(\alpha \) is inserted in the backward direction in \({\textit{C}PBT}\). Each node of \({\textit{C}PBT}\) corresponds to one CNG of \(\alpha \). Note that each node includes the frequency of \(\alpha \). Figure 12b shows \({\textit{C}PBT}\) after inserting the episode \(\beta \). Since the episodes \(\alpha \) and \(\beta \) have no equal postfix, the episode \(\beta \) is inserted as a new branch of the root. Since the postfixes of the episodes \(\gamma \) and \(\alpha \) are equal, the node labeled MF is shared between them. Note that the frequency of this node is the sum of the frequencies of \(\alpha \) and \(\gamma \).

Fig. 12 The step-by-step construction of \({\textit{C}PBT}\) while inserting the episodes of Example 12

5.2 Mining closed frequent episodes

Firstly, we introduce two new supersets of closed episodes, called Forward Closed and Backward Closed. Then based on them and \({\textit{C}PBT}\), we propose algorithms that extract closed frequent episodes directly.

Definition 20

Given \(\alpha =G'_1\rightarrow \cdots \rightarrow G'_k\), if there is no episode \( \beta \) such that \(\alpha =Prefix(\beta ,k)\) and \(freq(\alpha )=freq(\beta )\), then \(\alpha \) is Forward Closed (FC).

Definition 21

Given \(\alpha =G'_1\rightarrow \cdots \rightarrow G'_k\), if there is no episode \(\beta \) such that \( |CNG_{\beta }|=u\), \(\alpha =Suffix(\beta ,u-k+1)\) and \(freq(\alpha )=freq(\beta )\), then \(\alpha \) is Backward Closed (BC).

If an episode is both FC and BC, then it is a closed episode. Therefore, we first extract the FC episodes and insert them in \({\textit{C}PBT}\). From among the FC episodes, the episodes that are not BC are absorbed by their super-episodes. Thus, \({\textit{C}PBT}\) only includes closed frequent episodes. Note that for two FC episodes \( \alpha \) and \(\beta \), checking the CNG occurrences of the episodes is redundant because we can simply check whether \( \alpha \) is a suffix of \( \beta \) and compare their frequencies by using \({\textit{C}PBT}\). As will be shown, closed episodes and their frequencies can easily be extracted from \({\textit{C}PBT}\).

Algorithm 3 extracts closed episodes by a complete traverse of the pattern tree in a depth-first way. At first, in line 1, all the 1-node episodes (denoted by P) are extracted. Then, in lines 3 and 4, the pattern tree is traversed in a depth-first manner from each of the 1-node episodes using the recursive calls of the algorithm FindClosedFreqEpisode (see Algorithm 8 in “Appendix B”). Note that LOListRS(p) is the corresponding list of p in \({\textit{P}ROSPER}\). The algorithm FindClosedFreqEpisode identifies the FC episodes and inserts them in \( {\textit{C}PBT} \) by calling the function InsertInCPBT. Finally, after extracting the closed episodes, Algorithm 3 calls the function ExtractClosedEpisodesFromCPBT to traverse \({\textit{C}PBT}\) and extract the closed episodes from it. Note that episodes are represented in the form of SAVE [11] to expedite the episode extraction. In the next section, we comprehensively explain how to insert the FC episodes in \({\textit{C}PBT}\).

Algorithm 3: AllClosedFreqEpisodes

5.3 Insert in \({\textit{C}PBT}\)

When the FC episode \(\alpha \) is inserted in \({\textit{C}PBT}\), two cases can occur: (1) There is already another episode \(\beta \) in \({\textit{C}PBT}\) whose suffix is \(\alpha \) and \(freq(\alpha )=freq(\beta )\). This means \(\beta <\alpha \) because \(\beta \) has been inserted in \({\textit{C}PBT}\) earlier than \(\alpha \). So \(\alpha \) is not BC and should not be inserted in \({\textit{C}PBT}\). (2) After inserting \(\alpha \), another episode \(\beta \) whose suffix is \(\alpha \) and \(freq(\alpha )=freq(\beta )\) might be inserted. This means \(\alpha <\beta \). So \(\alpha \) should be removed from \({\textit{C}PBT}\).

Definition 22

If the episodes \(\alpha \) and \(\beta \) are FC, \(\alpha \) is a suffix of \(\beta \) and \(freq(\alpha )=freq(\beta )\), it is said that \(\beta \) absorbs \(\alpha \).
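
A minimal sketch of the absorption test on \({\textit{C}PBT}\), using the node structure sketched in Sect. 5.1 and simplifying by treating each node's freq as the frequency of a single episode passing through it (the paper's EpisodeAbsorbByTree, Algorithm 10, handles shared nodes more carefully):

```python
def absorbed_by_tree(root, branch, freq):
    """branch: labels of alpha from its last CNG to its first (backward order)."""
    node = root
    for label in branch:
        node = node.children.get(label)
        if node is None:
            return False  # alpha's suffix path does not exist in the tree
    # alpha is a suffix of a longer stored episode iff the walk can continue;
    # beta absorbs alpha only if such an episode has the same frequency
    return any(child.freq == freq for child in node.children.values())
```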

The procedure of calling functions to insert an FC episode in \( {\textit{C}PBT} \) is shown in Fig. 13. According to the figure, the episode is converted to a branch by calling the function CreateBranch. In the next step, the function SearchInTree is called. This function calls the function EpisodeAbsorbByTree, which considers whether \( {\textit{C}PBT} \) absorbs the episode or not. If not, the function TreeAbsorbByEpisode is called. This function finds the branches that are absorbed by the episode. Then, these branches are updated by calling the function UpdateBranch. Finally, the function InsertInTree is called to insert the episode in the right place.

Fig. 13 The flowchart of the function InsertInCPBT to insert the FC episodes in \( {\textit{C}PBT} \)

The algorithm InsertInCPBT inserts the FC episode \(\alpha \) in \({\textit{C}PBT}\). As Algorithm 4 shows, this function receives the FC episode \(\alpha \) and its frequency. The pointer R points to the root of \({\textit{C}PBT}\) at first. In line 2, the corresponding branch of the episode \(\alpha \), \(\alpha '\), is created by calling the function CreateBranch (Algorithm 9). In line 3, the branch \(\alpha '\) is searched in \({\textit{C}PBT}\) by calling the function SearchInTree (Algorithm 5). If \(\alpha '\) is absorbed by a branch of \({\textit{C}PBT}\), this function returns \(-\,1\). Otherwise, the branches of \({\textit{C}PBT}\) that are absorbed by \(\alpha '\) are removed and the function returns 0. Finally, the branch \(\alpha '\) is inserted in \({\textit{C}PBT}\) in lines 4 to 6 by calling the function InsertInTree (Algorithm 6). So while inserting the episode \(\alpha \) in \({\textit{C}PBT}\), it must be considered whether \(\alpha \) absorbs other episodes or is absorbed by another episode. In the following, the functions called by the function InsertInCPBT are presented in detail.

Algorithm 4: InsertInCPBT

(1) Function CreateBranch: The algorithm CreateBranch (Algorithm 9 in “Appendix B”) receives the FC episode \(\alpha \) and its frequency, converts them into a branch of \({\textit{C}PBT}\) and returns a pointer to this branch. Time complexity of the algorithm is discussed in Lemma 11 in “Appendix A”.

Definition 23

Given the node n of \({\textit{C}PBT}\), Episode(n) is the corresponding episode of a branch that starts from the root and ends in the node n.

(2) Function SearchInTree: This function (Algorithm 5) searches an episode in \({\textit{C}PBT}\) and checks whether the episode is absorbed by a branch of the tree or not. If it is not absorbed, the function checks whether the episode absorbs branches of \({\textit{C}PBT}\) or not. As Algorithm 5 shows, by calling the function EpisodeAbsorbByTree (Algorithm 10 in “Appendix B”) in lines 2 to 4, it is checked whether the branch \(\alpha '\) is absorbed by a branch of \({\textit{C}PBT}\) or not. If \(\alpha '\) is absorbed, the value \(-\,1\) is returned, which shows that \(\alpha '\) should not be inserted in \({\textit{C}PBT}\). If \(\alpha '\) is not absorbed, it should be checked whether \(\alpha '\) absorbs branches of \({\textit{C}PBT}\) or not. If it does, these branches should be removed from \({\textit{C}PBT}\) before \(\alpha '\) is inserted. The stack Path, defined in line 1, stores the path of the branches that should be removed. The function TreeAbsorbByEpisode (Algorithm 11 in “Appendix B”) in line 6 finds these Paths and inserts them into the global variable PathList. Finally, in lines 7 to 9, the function UpdateBranch (Algorithm 12 in “Appendix B”) is called to update the corresponding branches of the Paths in PathList.

Algorithm 5: SearchInTree

(3) Function InsertInTree: This function (Algorithm 6) inserts the corresponding branch of the episode \(\alpha \), \(\alpha '\), in the subtree of the node R in \({\textit{C}PBT}\). As Algorithm 6 shows, in lines 2 to 12, a child of R whose label is equal to the label of \(\alpha '\) is found. Note that there exists at most one such node because the labels of the children are unique. Since this node is shared with the branch \(\alpha '\), the frequency of \(\alpha '\) is added to the node's frequency in line 4. In lines 5 to 8, the following nodes of the branch \(\alpha '\) are traversed and this process is repeated. While inserting \(\alpha '\), if no node whose label is equal to the label of \(\alpha '\) is found, the value of flag remains False. So in lines 13 to 15, \(\alpha '\) is added to the children of R as a new child.

Algorithm 6: InsertInTree

Theorem 3

The algorithm AllClosedFreqEpisodes finds all and only the closed episodes.

Example 13

Given \(RS_1=\{A,B,C,E,F,G\}\), where \(RS_1\subseteq RS\), assume the FC episodes \(\alpha _i,i=1,2,3,4,5\) are identified as follows:

$$\begin{aligned} \alpha _1&= A\rightarrow B\rightarrow C,freq(\alpha _1)=100 \quad \alpha _2= BE\rightarrow C,freq(\alpha _2)=170 \\ \alpha _3&= B\rightarrow C,freq(\alpha _3)=170 \quad \alpha _4= F\rightarrow A \rightarrow B\rightarrow C,freq(\alpha _4)=100 \\ \alpha _5&= G\rightarrow A \rightarrow B\rightarrow C,freq(\alpha _5)=50 \end{aligned}$$

We define the lexicographic order on RS as \(A<B<C<E<F<G\). Therefore, based on the lexicographic tree of episodes, we have \(\alpha _1<\alpha _2<\alpha _3<\alpha _4<\alpha _5\) [11]. It is clear that the function InsertInCPBT is called for episodes in ascending order:

  • \(\alpha _1\): Since CPBT has no branch, \(\alpha _1\) is inserted in it. Figure 14a shows \({\textit{C}PBT}\) after inserting \(\alpha _1\).

  • \(\alpha _2\): As Fig. 14b shows, the episode \(\alpha _2\) is also inserted in \({\textit{C}PBT}\).

  • \(\alpha _3\): When the function InsertInCPBT is called for \(\alpha _3\), the function SearchInTree returns \(-\,1\) because the function EpisodeAbsorbByTree detects that \(\alpha _3\) is absorbed by \(\alpha _2\). Therefore, \(\alpha _3\) is not inserted in \({\textit{C}PBT}\).

  • \(\alpha _4\): Since none of the branches of \({\textit{C}PBT}\) absorbs \(\alpha _4\), the function TreeAbsorbByEpisode is called for \(\alpha _4\). It detects that \(\alpha _1\) is absorbed by \(\alpha _4\). As Fig. 14c shows, \(\alpha _1\) is replaced with \(\alpha _4\).

  • \(\alpha _5\): \(\alpha _5\) is absorbed by no episode. Furthermore, it absorbs no episode. Therefore, it is inserted in \({\textit{C}PBT}\) as Fig. 14d shows.

The function ExtractClosedEpisodesFromCPBT (see Algorithm 3 in “Appendix B”) extracts the closed episodes stored in \({\textit{C}PBT}\) and provides fast access to them.

Fig. 14 Inserting the FC episodes of Example 13 in \({\textit{C}PBT}\)

5.4 Analysis of time complexity

In general, the running time of an algorithm is roughly proportional to the number of times some basic operation is performed [45]. To analyze time complexity, we consider the comparison of the FC/candidate closed episodes as the basic operation. In the hashing approach, all the candidates with the same frequency are hashed to the same bucket in the hash table. If there are v distinct frequency values, then there are v buckets such that \( |bucket_i|=l_i, 1\le i\le v \) and \( \sum _{i=1}^{v}{l_i}=|CandidateClosedEpisodes|=|FC~Episodes| \). Among the candidate patterns which are hashed to the same bucket, those for which a super-pattern with the same frequency is found are discarded. So the number of comparisons is \( \sum _{i=1}^{v}{|bucket_i|^2}\le |CandidateClosedEpisodes|^2 \), which means the time complexity of the hashing approach is \( O(|CandidateClosedEpisodes|^2)\). Since the maximum number of CNGs of episodes is Level, the height of \( {\textit{C}PBT} \) is also Level. Therefore, in our approach, the number of comparisons is \( O(BranF\times Level) \), where BranF is the branching factor of \( {\textit{C}PBT}\). In the worst case, the branching factor of \( {\textit{C}PBT} \) is \(MaxBranF=\sum _{|CNG|=1}^{N}{{N \atopwithdelims ()|CNG|}M^{|CNG|}} \), where \( |ResourceType|=N \) and \( |Status|=M \). As we will see in Sect. 6.2, we should choose small values such as 6 for Level. In addition, M and N are not large (in this paper, we set \( M=N=4 \)). In general, BranF depends on the extracted FC episodes, and in practice \( BranF\ll MaxBranF \). As we will see in Sect. 6, if |CandidateClosedEpisodes| is small, the hashing approach is a good choice. Otherwise, our approach can identify closed episodes much faster than the hashing approach.
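
For instance, with the values used in this paper (\( M=N=4 \)), the worst-case branching factor evaluates to

$$\begin{aligned} MaxBranF=\sum _{|CNG|=1}^{4}{{4 \atopwithdelims ()|CNG|}4^{|CNG|}}=4\cdot 4+6\cdot 16+4\cdot 64+1\cdot 256=624 \end{aligned}$$

so even in the worst case a single insertion into \( {\textit{C}PBT} \) needs on the order of \( 624\times 6=3744 \) comparisons for \( Level=6 \), whereas the cost of the hashing approach grows quadratically with the number of candidate closed episodes.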

6 Evaluation

In [2, 11], we evaluated POSITING and RELENTING on both real and synthetic workloads comprehensively and investigated the impact of different parameters on them. According to the main concepts introduced in Sect. 3, the pattern mining engine has the following parameters: \(\delta \), \(\varDelta \), \(\epsilon \), \(\mu \), Level and \(\theta \). The parameter settings for the evaluation of the proposed approach are as follows:

  • \(\varDelta \) and \(\delta \): As mentioned in Sect. 3.2, \( \delta \) and \( \varDelta \) are the internal gaps, which determine the starting intervals of CNGs. The value of \( \delta \) depends on the time spent on booting VMs. In [2, 11], the valid interval of \( \varDelta \) is \( [\delta ,\delta +\epsilon ] \). To provide a more general approach for mining closed episodes in different fields and conduct more comprehensive experiments, we extend the valid interval of \( \varDelta \) as follows:

    • Given the episode \(\alpha =G'_1\rightarrow G'_2\rightarrow \cdots \rightarrow G'_k\) and the occurrence \(O=([t_j,t'_j]_{j=1}^{k})\in LOList(\alpha )\) such that \(k>2\) and \(1\le i<k-1\), if there is overlap between \(VI([t_i,t'_i], i + 1)\) and \(VI([t_{i+1},t'_{i+1}], i + 2)\), then the two serial episodes \(\beta =G'_1\rightarrow \cdots \rightarrow G'_i\rightarrow G'_{i+1}\) and \(\gamma =G'_1\rightarrow \cdots \rightarrow G'_i\rightarrow G'_{i+2}\) could be extracted. Based on these episodes, different statuses could be predicted for the next slots. So \(\varDelta \) should be selected in a way that precise prediction is possible and extracting redundant episodes is avoided. For this purpose, as (6.1) implies, \(\varDelta \) should lie in the interval \([\delta ,2\delta )\); the bound \(\varDelta <2\delta \) follows from the tightest spacing of consecutive end points, \(t'_{i+1}=t'_i+\delta \):

      $$\begin{aligned} \left. \begin{aligned} t'_{i+1}&\ge t'_i+\delta \\ t'_{i+1}+\delta&>t'_{i}+\varDelta \end{aligned}\right\} \rightarrow \varDelta< t'_{i+1}-t'_i+\delta \rightarrow \varDelta <2\delta \rightarrow \varDelta \in [\delta ,2\delta ) \end{aligned}$$
      (6.1)
  • \(\epsilon \): According to Definition 1, the span of the events should be greater than \( \epsilon \). \( \epsilon \) should be determined based on the length of sampling intervals. Since the real workloads [46, 47] are coarse-grained and the synthetic workloads are also generated in a similar way to them, we set \(\epsilon =0\) in all the experiments.

  • \(\mu \): Events whose span is large are decomposed into two events based on the decomposition unit \( \mu \) (see the sketch after this list). For smooth workloads, small values of \(\mu \) might generate many events, which could increase the duration of episode extraction. On the other hand, large values of \(\mu \) might prevent extracting all the hidden useful episodes. We evaluate the impact of \(\mu \) on the pattern mining engine for both the real and synthetic workloads.

  • \(\theta \): This threshold value is used to extract frequent episodes. Small values of \(\theta \) might lead to identifying a huge number of episodes, which might be very time-consuming. On the other hand, large values of \(\theta \) could lose some episodes that are useful for prediction. We evaluate the impact of \(\theta \) on the pattern mining engine for both the real and synthetic workloads.

  • Level: To avoid enlarging the pattern tree, Level limits the length of episodes. As Level increases, the height of the pattern tree, the number of episodes and the time consumed to identify them increase. So we investigate the impact of Level on the pattern mining engine for both the real and synthetic workloads.
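
As a reading aid for the role of \(\mu \), the following hedged sketch splits long events against a decomposition unit. The exact splitting rule of [11] is not restated here; the repeated two-way split below is an illustrative assumption:

def decompose(event, mu):
    """Split (start, end) into sub-events whose span does not exceed mu."""
    start, end = event
    parts = []
    while end - start > mu:          # event too long: cut off a length-mu prefix
        parts.append((start, start + mu))
        start += mu
    parts.append((start, end))       # remainder keeps the original end point
    return parts

# With mu = 3, an event spanning 8 slots yields three events:
print(decompose((0, 8), 3))          # [(0, 3), (3, 6), (6, 8)]

Under such a rule, smaller values of \(\mu \) produce more events and hence more work for the miner, matching the trend reported in the experiments below.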

Since the focus of the paper is on the pattern mining engine, we evaluate the efficiency of the proposed approach in extracting closed episodes. For this purpose, the effect of the two important parameters \(\theta \) and \(\mu \) on the pattern mining engine is considered for both the synthetic and real workloads.

6.1 Workloads

The data set GWA-T-12 Bitbrains contains the performance metrics of 1750 VMs from a distributed data center of Bitbrains, a service provider that specializes in managed hosting and business computation for enterprises. The workload traces correspond to the requested and actually used resources in a distributed data center servicing business-critical workloads. The data set focuses on four key types of resources, which can become bottlenecks for business-critical applications: CPU, disk I/O, memory and network I/O. For each VM, the performance metrics are sampled every 5 min. The traces cover 1750 nodes with over 5000 cores and 20 TB of memory, and include over 5 million CPU hours across 4 operational months [48].

Since the workloads of the data set GWA-T-12 Bitbrains are more dynamic than the other public workloads [48], in a similar way to [11], we use these workloads for evaluation. Furthermore, we use the synthetic workloads generated in [2, 11]. Table 1 shows the types of the generated synthetic workloads and their corresponding embedded episodes. Since instantiating a new VM takes about 5–15 min [49] and VMs are sampled every 5 min, the three values 1, 2 and 3 should be evaluated for \(\delta \).

Table 1 The types of the synthetic workloads and their embedded episodes [11]

6.2 Impact of Level

Since both methods extend the pattern tree in the same way, we only report the impact of Level on the hashing approach. For this purpose, we set \(\theta = 0.1\), \( \mu =3 \) and \( \delta =\varDelta \) and investigate the impact of Level on both the real and synthetic workloads.

Impact of Level on the real workload: One VM, called \( VM_1 \), is selected randomly from GWA-T-12 to evaluate the impact of Level on the number of episodes and the processing time to identify them. Table 2 shows the impact of Level on \( VM_1 \) for \( \delta =\varDelta =1,2,3\). According to the table, as Level increases, the number of episodes and the time consumed to identify them increase. Although there is no significant change in the number of episodes for \( Level>6 \), the processing time increases strongly. So, according to the bolded row, \(Level=6\) offers a good trade-off between the processing time (Time) and the number of episodes.

Table 2 The impact of the parameter Level on the pattern tree for \( VM_1(\mu =3, \theta =0.1) \)

Impact of Level on the synthetic workload: As Table 3 shows, for each workload type, one trace is generated by the workload generator. A distinct value of \( \delta \) is selected randomly for each workload type. Table 4 shows the impact of Level on \( Trace_i,i=A,B,C \). According to the table, as Level increases, the number of episodes and the time consumed to identify them increase. As with the real workloads, although there is no significant change in the number of episodes for \( Level>6 \), the processing time increases strongly. According to the bolded rows, for all the traces, \(Level=6\) offers a good trade-off between the processing time and the number of episodes. So in all of the experiments, we set \(Level=6\).

Table 3 The traces generated from the different types of the synthetic workload
Table 4 The impact of the parameter Level on the pattern tree for \( Trace_i,i=A,B,C~(\mu =3, \theta =0.1) \)

6.3 Evaluation results

To the best of our knowledge, the hashing approach is the only approach used in the literature to identify closed patterns among frequent patterns [12, 13, 44]. Therefore, to evaluate the performance and efficiency of the approach proposed in this paper, it is compared to the hashing approach [13]. For evaluation, there are four parameters, \(\delta \), \(\varDelta \), \(\theta \) and \(\mu \), which should be investigated. As mentioned, the valid values of \(\delta \) are 1, 2 and 3, and \(\varDelta \) lies in the interval \([\delta ,2\delta )\). The parameter \(\theta \) is in the interval [0, 1]. Since \(\mu \ge \max (2\epsilon +1,\epsilon + 2)\) [11] and \(\epsilon = 0\), we have \(\mu \ge 2\). If \(\mu \) is bounded by 10 and evaluated in steps of 1, and \(\theta \) is evaluated in steps of 0.1, then there are \(6\times 9\times 10=540\) distinct combinations of the parameters, as the sketch below confirms.

Due to space limitation, we select some combinations of the parameters and investigate their impact on the pattern extraction of some VMs. The valid values of the parameters \(\delta \) and \(\varDelta \) can be divided into three groups: \(\delta =\varDelta \), \(\delta <\varDelta \) with a span of 1 time slot, and \(\delta <\varDelta \) with a span of 2 time slots. So we select (\(\delta =1,\varDelta =1\)) from the group \(\delta =\varDelta \), (\(\delta =2\), \(\varDelta =3\)) from the group \(\delta <\varDelta \) with the span of 1 and (\(\delta =3\), \(\varDelta =5\)) from the group \(\delta <\varDelta \) with the span of 2. The impact of \(\varDelta -\delta \) on episode mining is investigated on the real workloads. Since the main goal of the cloud is to satisfy SLAs and avoid wasting resources [3], the occurrence time of future events should be determined precisely. So the main focus of evaluation is on the group \(\delta =\varDelta \); the values of \(\delta \) and \(\varDelta \) for the synthetic workloads are selected from this group randomly. All of the experiments run on a machine with an Intel Core 2 Duo 2.53 GHz processor and 4 GB of RAM.
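The combination count can be checked mechanically; the short sketch below, assuming integer \(\varDelta \) in \([\delta ,2\delta )\) and \(\theta \in \{0.1,\ldots ,1.0\}\), reproduces the figure of 540:

# Enumerate the evaluation grid described above.
pairs = [(d, D) for d in (1, 2, 3) for D in range(d, 2 * d)]  # 6 valid (delta, Delta) pairs
mus = list(range(2, 11))                                      # mu in [2, 10], step 1
thetas = [t / 10 for t in range(1, 11)]                       # theta in steps of 0.1
print(len(pairs) * len(mus) * len(thetas))                    # 6 * 9 * 10 = 540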

Table 5 The impact of the parameter \(\theta \) on extracting closed episodes from \(VM_i,i=1,2,3\) using the hashing approach (\(\mu =3\))

6.3.1 Experimental results of the real workload

In addition to \( VM_1 \), we select two other VMs of GWA-T-12 randomly, called \(VM_2\) and \(VM_3\). Each VM is evaluated for distinct values of \(\delta \) and \(\varDelta \): \(VM_1(\delta =1,\varDelta =1)\), \(VM_2(\delta =2,\varDelta =3)\) and \(VM_3(\delta =3,\varDelta =5)\). For each VM, the impact of the parameters \(\mu \) and \(\theta \) on the number of patterns and the processing time is investigated. Note that since both methods extract the same set of closed episodes from each VM, the number of closed episodes is only reported for the hashing approach.

Impact of \(\theta \): To consider the impact of \(\theta \), the values of \(\mu \) and Level are set to 3 and 6, respectively. For different values of \(\theta \), Table 5 shows the number of episodes, candidate closed and closed episodes, the extraction time of candidate closed episodes (\(Time_E\)) and the processing time of candidate closed episodes (\(Time_P\)) in seconds for the hashing approach. The symbol \(\infty \) indicates that the hashing approach could not complete the extraction of candidate closed episodes due to insufficient memory; in this case, the number of closed episodes is reported using the proposed approach. As the table shows, small values of \(\theta \) increase the number of extracted episodes (and closed episodes) and the time consumed to identify them. According to the table, as the span of \(\varDelta -\delta \) increases, the number of candidate and closed episodes increases abruptly. For example, if \(\theta =0.1\), the number of closed episodes is 247 for \(VM_1\) with \(\varDelta -\delta =0\) and 22166 for \(VM_2\) with \(\varDelta -\delta =1\), while for \(VM_3\) with \(\varDelta -\delta =2\), the hashing approach could not extract the candidate closed episodes due to insufficient memory. On the other hand, as the table shows for \(VM_2\) with \(\theta =0.1\) and \(VM_3\) with \(\theta =0.3\), when there are many candidate closed episodes, processing them consumes more time than extracting them. Furthermore, the number of closed episodes is much smaller than the number of candidate closed episodes. These points imply that the hashing approach is not a good choice for small values of \(\theta \) or large spans of \(\varDelta -\delta \).

The total time consumed by the hashing approach to extract closed episodes is defined as \(Time_E+Time_P\). Figure 15 shows the total time consumed to extract the closed episodes by the hashing approach and the proposed approach for different values of \(\theta \). According to the figure, as the value of \(\theta \) increases, the consumed time decreases. Furthermore, for \(VM_1\) with \(\varDelta -\delta =0\) (Fig. 15a), the time consumed by the proposed approach is reasonable and similar to that of the hashing approach. As the span of \(\varDelta -\delta \) increases for \(VM_2\) and \(VM_3\) (Fig. 15b, c), the number of candidate closed episodes increases and the total time consumed by the hashing approach increases abruptly. As Fig. 15 shows, the proposed approach extracts the closed episodes much faster than the hashing approach for small values of \(\theta \) and large spans of \(\varDelta -\delta \) because it extracts closed episodes directly, without storing/processing all of the candidate closed episodes.

Fig. 15
figure 15

The total time consumed by the hashing approach and the proposed approach to extract closed episodes from \(VM_i,i=1,2,3\) for different values of \(\theta \) and \(\mu =3\)

Table 6 The impact of the parameter \(\mu \) on extracting closed episodes from \(VM_i,i=1,2,3\) using the hashing approach (\(\theta =0.1\))

Impact of \(\mu \): To evaluate the impact of \(\mu \), we set \(\theta = 0.1\) and \( Level=6 \). Since \(\mu \ge \max (2\epsilon +1,\epsilon +2)\) and \(\epsilon = 0\), we have \(\mu \ge 2\). Table 6 shows the impact of \(\mu \) in the interval [2, 10] on the number of patterns and the time consumed by the hashing approach. For small values of \(\mu \), such as 2 and 3, the number of candidate closed episodes increases abruptly. On the other hand, as the span of \(\varDelta -\delta \) increases, the number of episodes increases abruptly. So for \(VM_2\) with \(\mu =2\) and \(VM_3\) with \(\mu =2,3\), the hashing approach could not complete the extraction of candidate closed episodes due to insufficient memory; in these cases, the number of closed episodes is reported using the proposed approach. It is clear that as \(\mu \) increases, the span of events increases and the number of events decreases subsequently; so the number of episodes decreases as \(\mu \) increases. Furthermore, for \(VM_2\) with \(\mu =3\) and \(VM_3\) with \(\mu =4\), processing the candidate closed episodes consumes more time than extracting them due to the large number of candidates. Therefore, since the hashing approach extracts a large number of candidate closed episodes, it is not an appropriate choice for small values of \(\mu \) or large spans of \(\varDelta -\delta \).

Figure 16 shows the total time consumed by the hashing approach and the proposed approach to extract the closed episodes for different values of \(\mu \) and \(\theta =0.1\). According to the figure, as the value of \(\mu \) increases, the consumed time decreases. Furthermore, for \(VM_1\) with \(\varDelta -\delta =0\) (Fig. 16a), the processing time of the hashing approach is reasonable. As the span of \(\varDelta -\delta \) increases for \(VM_2\) and \(VM_3\) (Fig. 16b, c), the number of candidate closed episodes increases and the total time consumed by the hashing approach increases abruptly. As Fig. 16 shows, the proposed approach extracts closed episodes much faster than the hashing approach for small values of \(\mu \) and large spans of \(\varDelta -\delta \). For example, for \(VM_3\), the time consumed by the hashing approach for \(\mu =4\) is nearly equal to the time consumed by the proposed approach for \(\mu =2\). All these points show that extracting closed episodes without storing/processing all of the candidate closed episodes improves the mining efficiency significantly.

Fig. 16
figure 16

The total time consumed by the hashing approach and the proposed approach to extract closed episodes from \(VM_i,i=1,2,3\) for different values of \(\mu \) and \(\theta =0.1\)

6.3.2 Experimental results of the synthetic workload

In this section, we consider the impact of the two parameters \(\theta \) and \(\mu \) on the number of closed episodes extracted from the synthetic workloads and on the time consumed by the methods to identify them. We use the traces generated in Table 3 and compare the time consumed by the hashing approach with that of the proposed approach.

Table 7 The impact of the parameter \(\theta \) on extracting closed episodes from \(Trace_i,i=A,B,C\) using the hashing approach (\(\mu =3\))

Impact of \(\theta \): To consider the impact of \(\theta \), the value of \(\mu \) is set to 3. For different values of \(\theta \), Table 7 shows the number of episodes, candidate closed and closed episodes, \(Time_E\) and \(Time_P\) in seconds for the hashing approach. According to the table, small values of \(\theta \) increase the number of extracted episodes (and closed episodes) and the time consumed to extract them. As the table shows for \(Trace_B\) with \(\theta =0.1\), when there are many candidate closed episodes, processing them consumes more time than extracting them. So the time consumed by the hashing approach mainly depends on the number of candidate closed episodes. For example, \(Trace_C\) with \(\theta =0.1\) requires the least time for episode extraction because it has the fewest candidate closed episodes. Furthermore, for all the traces, the closed episodes are a small fraction of the candidate closed episodes. These points imply that the hashing approach is not an appropriate choice for small values of \(\theta \) because it generates a large number of candidate closed episodes.

Figure 17 shows the total time consumed by the hashing approach and the proposed approach to extract closed episodes for different values of \(\theta \). According to the figure, as the value of \(\theta \) increases, the consumed time decreases for both methods. Since a large number of candidate closed episodes are extracted for \(Trace_A\) and \(Trace_B\) with small values of \(\theta \), the proposed approach improves the consumed time for them significantly (Fig. 17a, b). As observed, for \(Trace_B\), with a larger number of candidate closed episodes, the effect of the proposed approach on the consumed time is more prominent. On the contrary, since a smaller set of candidate closed episodes is extracted by the hashing approach for \(Trace_C\), the time consumed by the hashing approach is less than that of the proposed approach (Fig. 17c). However, the times consumed by the two methods are comparable.

Fig. 17
figure 17

The total time consumed by the hashing approach and the proposed approach to extract closed episodes from \(Trace_i,i=A,B,C\) for different values of \(\theta \) and \(\mu =3\)

Table 8 The impact of the parameter \(\mu \) on extracting closed episodes from \(Trace_i,i=A,B,C\) using the hashing approach (\(\theta =0.1\))

Impact of \(\mu \): To evaluate the impact of \(\mu \), we set \(\theta = 0.1\). Table 8 shows the impact of \(\mu \) in the interval [2, 10] on the number of patterns and the time consumed by the hashing approach. Unlike the real workloads, the impact of \(\mu \) on the traces shows no clear trend. Increasing \(\mu \) shows no clear trend for \(Trace_A\); for example, the number of candidate closed episodes for \(\mu =10\) is greater than that for \(\mu =4\). On the other hand, for \(\mu \ge 3\), there is no change in the number of candidate closed episodes of \(Trace_B\), and for \(\mu \ge 8\), there is no significant change in the episode extraction of \(Trace_C\). These results imply that increasing \(\mu \) does not affect the span of events of dynamic workloads. As the table shows, the time consumed by the hashing approach mainly depends on the number of episodes and candidate closed episodes.

Figure 18 shows the total time consumed by the hashing approach and the proposed approach to extract the closed episodes for different values of \(\mu \) and \(\theta =0.1\). According to the figure, there is no clear trend in the impact of \(\mu \) on the time consumed by either method. However, as the figure shows, since there are a large number of candidate closed episodes for \(Trace_A\) and \(Trace_B\), the proposed approach reduces the time consumed for episode mining (Fig. 18a, b). According to the results, the larger the number of candidate closed episodes, the more prominent the effect of the proposed approach on the consumed time (Fig. 18b). On the contrary, although the times consumed by the two methods are comparable for \(Trace_C\), the hashing approach needs less time due to its smaller set of candidate closed episodes.

Fig. 18
figure 18

The total time consumed by the hashing approach and the proposed approach to extract closed episodes from \(Trace_i,i=A,B,C\) for different values of \(\mu \) and \(\theta =0.1\)

7 Conclusion and future work

The prediction of the future workload of applications is an essential step before resource provisioning in the cloud. This paper improves the efficiency of our previous predictors based on pattern mining. It proposes a general approach which not only improves the time and space complexities of the pattern mining engine of the predictors, but can also be employed in different fields of SPM. To improve space complexity, redundant LOs are identified and omitted based on the improved vertical representation of the stream. To improve time complexity, a new data structure, called \({\textit{C}PBT}\), is introduced to store closed episodes, and based on \({\textit{C}PBT}\), a new approach is suggested to extract closed episodes directly. The experimental results show that for small values of the frequency threshold and the decomposition unit, the proposed approach improves the efficiency of mining closed episodes significantly in comparison to the hashing approach.

In future work, we plan to conduct more experiments to evaluate the efficiency of the proposed approach for pattern mining in different fields. Furthermore, we plan to propose an approach that selects the values of the parameters dynamically according to workload changes.