1 Introduction

Cloud applications are reaching more and more industries, including online social networks, user data processing, online video, customer relationship management, massively multiplayer online games, online booking and online banking. Different cloud applications have different workload patterns. At the same time, the global distribution of cloud users and the continuous development of mobile technology make workload patterns more complex. For example, because users are in different time zones, the observed diurnal pattern may be a superposition of several traditional diurnal patterns. In most real-world cloud scenarios, the workload is a hybrid of more than one cloud application, and the workload patterns are increasingly diverse. To mimic cloud workloads more realistically, a generic workload model is needed. A generic arrival process model plays a key role in it: the model should cover a variety of arrival processes and be independent of application type.

The limitations of existing arrival process models for cloud workload generation fall broadly into two types. First, many arrival process models are too narrow: they target only specific applications or mimic a specific scenario such as bursts. Second, the models are not general and scalable enough to cover a variety of arrival processes. For instance, most current arrival models define arrival time points whose intervals are independent and identically distributed (i.i.d.), most often as a Poisson process. These models lack time-dependent characteristics such as periodicity. In recent years, arrival process models based on the MAP (Markov arrival process) have been studied to describe some time-dependent characteristics. Unfortunately, the number of states and the generation matrix of a MAP model depend on the application or scenario. As a result, these arrival models lack generality, and the computation required for workload generation multiplies as the number of states increases. Therefore, the MAP model is not suitable as a generic arrival process model for generating hybrid workloads.

In this paper, we develop a hierarchical generic arrival process model. It captures the main characteristics of various arrival process models and is independent of application type. We also propose a unified algorithm that automatically generates arrival process instances conforming to the model definition.

The arrival process model is defined in four steps: (1) the duration of the workload, defined as the number of time intervals; (2) the variation of the length of each time interval; (3) the variation of the number of requests within each time interval; and (4) the arrival time points within each time interval.

Our generation algorithm allows each step to be defined as a constant, a statistical distribution or process, or a self-defined function. The algorithm then automatically generates an arrival process instance from the defined arrival process model. The generated sample arrival time points conform to the corresponding definition and are random. The option of a self-defined function further enhances the extensibility and flexibility of the generic arrival process model.

In the case study following the model definition, we demonstrate how to use the generic arrival model to define arrival process models for three representative types of cloud application: a Web application, a batch application and a MapReduce application. Additionally, an arrival process model that includes a self-defined function is illustrated. The four corresponding arrival process instances are generated from their arrival process models.

The rest of this article is organized as follows. Section 2 reviews related work on arrival process models and generation algorithms. Section 3 details the generic arrival process model and its generation algorithm, together with four example demonstrations. Section 4 summarizes our work and future directions.

2 Related Works

A number of researchers have investigated the modeling and simulation of arrival processes. There are three typical representations of an arrival process: point process, count process and rate process [1]. The random variable in a point process is the interval between adjacent arrival time points. The random variable in a count process is the number of points arriving in each equally spaced interval of length T. The random variable in a rate process is the normalized count, which equals the number of points in each time interval divided by the interval length T. The point process is the most accurate because it requires all the arrival time points. The count process and the rate process lose the exact time points within each interval T. However, if the research problem does not depend on the exact arrival time points, the latter two processes can be chosen. For instance, in [2] different arrival rates were defined for different types of jobs. In existing studies of arrival process models, some adopt a single representation while others combine representations.
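The relationship between the three representations can be illustrated with a minimal Python sketch. The function names and the toy arrival list are illustrative assumptions, not taken from [1]; intervals are assumed to be half-open, [xT, (x+1)T).

```python
import math
from typing import List


def point_process(arrivals: List[float]) -> List[float]:
    """Point process view: intervals between adjacent arrival time points."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def count_process(arrivals: List[float], interval: float, horizon: float) -> List[int]:
    """Count process view: number of arrivals in each interval of length T."""
    n_bins = math.ceil(horizon / interval)
    counts = [0] * n_bins
    for t in arrivals:
        index = int(t // interval)  # assumed convention: [x*T, (x+1)*T)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def rate_process(counts: List[int], interval: float) -> List[float]:
    """Rate process view: count in each interval divided by T."""
    return [c / interval for c in counts]


# Toy data (hypothetical): five arrival time points over a 6-second horizon.
arrivals = [0.5, 1.2, 1.9, 4.0, 4.1]
counts = count_process(arrivals, interval=2.0, horizon=6.0)
rates = rate_process(counts, interval=2.0)
```

The count and rate views can be computed from the point view, but not the reverse, which is the loss of accuracy described above.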

Next, related work on arrival process models is classified into three types: i.i.d. first-order models, temporally dependent models and hierarchical models.

At present, first-order arrival models mostly adopt the point process representation; that is, the sequence of intervals conforms to some distribution or statistical process. The most common assumption is that the arrival process follows the Poisson process [3, 4]. Some researchers argued that this assumption is unrealistic and built their arrival process models by fitting real data. For example, in [5] the arrival of batch jobs conformed to a Weibull distribution. In [6] the arrival intervals followed a Pareto distribution fitted to traces from the Google internal data center. In [7] the interarrival times in private cloud workloads were well modeled by a 3-phase hyper-exponential distribution; the author found that this model with 5 parameters was more realistic than the lognormal and Pareto models with 2 parameters. In [8] the author used a queueing system to model a cloud system with many servers, and the interarrival times followed a phase-type distribution, which is composed of an adjustable number of exponential phases. This phase-type arrival process model is a general i.i.d. first-order model because any distribution can be approximated closely by a combination of such exponential phases [9].

The i.i.d. first-order arrival process models describe the statistical distribution of interarrival times within a period of time, but not the temporal dependence between different time periods. That is, they cannot express temporal features such as periodicity, burstiness and self-similarity. To address these shortcomings, a MAP (Markov Arrival Process) model was proposed in [10] to represent both the distribution and the correlation of arrival times. The author of [10] also pointed out that the MAP model is a superset of i.i.d. first-order models, and gave a method for fitting it. In a MAP model the arrival process can be in different states. During each state holding period, the arrival process conforms to a certain distribution, and the state transitions are defined by an intensity matrix. When the distribution is exponential [11], the model is called an MMPP (Markov Modulated Poisson Process); with other distributions [12], it is called a semi-Markov process. In recent years, some researchers have argued that MAP models are more realistic than i.i.d. first-order processes. In [13], the arrival process of a Web application was modeled by a MAP. The work in [14] fitted the interarrival times of Grid level jobs with Poisson, Interrupted Poisson Process (IPP), MMPP2, MMPP3 and MMPP4 models, and found that MMPP2 is closer to the real data in variability than Poisson and IPP. Although the MAP model introduces more realistic temporal correlation, its state definitions and generation matrix depend on the application and workload scenario. Moreover, the complexity of the model definition multiplies as the number of states increases. Therefore, MAP models are not suitable for generating diverse hybrid cloud workloads in terms of generality and scalability.

To capture the characteristics of the arrival process more accurately, some studies have adopted hierarchical models, which is similar to our modeling approach. In [15] the author built a two-level arrival process model to describe accesses to a file system. First, the access times were divided into groups by a clustering method. Then three features were used to define the arrival process: the interval between clusters, the number of accesses within a cluster and the interarrival times within a cluster. Lastly, the model was shown to synthesize access instances of a distributed replicated file system close to the original data. Another study [1] fitted an LRD (Long Range Dependent) arrival process model in two steps. First, the arrival rate process was fitted by a multifractal wavelet model. Second, Controlled-Variability InF was used to convert the rate process to arrival time points. After these two steps a completely determined LRD arrival instance was generated. Strictly speaking, the above two studies only gave hierarchical methods for generating arrival process instances close to real trace logs. Neither of them proposed an arrival process model formally, and neither is generally applicable to other types of applications or scenarios. The hierarchical arrival process model we propose combines the counting process with the point process, captures the main characteristics of the arrival process, and maximizes the flexibility of the model through time sectioning and hierarchy. This model can be used to specify diverse arrival scenarios for different applications.

At present, most cloud workload generation tools are built into benchmarks. These workload generators typically let users generate the required workloads by defining the distributions or parameter values of the arrival process. Most arrival process models in benchmarks are i.i.d. first-order models. For example, Rubis [16], a popular Web benchmark, defined that user session length and think time between sessions conform to negative exponential distributions. Some researchers [17] adjusted the arrival process model of Rubis so that request arrival rates can be set; at a given arrival rate, the requests arrive according to a uniform distribution. A “standard” benchmark for NoSQL cloud systems is the Yahoo! Cloud Serving Benchmark (YCSB) [18]. The total number of operations and different throughputs can be configured before workload generation. Then, during each timeframe, the interarrival times are generated according to a uniform distribution. An arrival process generated from arrival rates in this way loses the variability of arrival time points within a timeframe. SPEC Cloud™ IaaS 2016 [19] uses the open-source CBTool [20] to generate workloads. CBTool allows users to set the duration of a workload, the maximum number of requests, and the distribution of the arrival process. Our workload generation tool is more flexible than CBTool: four features can be configured, namely the number of intervals during the workload lifetime, the length of each time interval, the number of requests arriving during each time interval and the arrival time points during each time interval.

We noticed that some recent benchmarks have turned to generating workloads based on MAP models. For example, BURSE [21] was proposed to generate workloads with spikes and self-similarity according to an MMPP model. However, the MMPP model is complex and not intuitive for users, and one model usually holds only for one specific application or a few specific scenarios. As a consequence, the MMPP model is not suitable for a general workload generation tool. By contrast, our arrival process model and generation algorithm are simpler, more general and more intuitive. Our work lays a solid foundation for generating hybrid cloud workloads. For our work on a general workload model and generation tool in cloud computing, see [22].

3 A Generic Arrival Process Model

For simplicity and generality, the arrival process model leaves out inner features that are specific to applications. For example, for Web applications, only the arrival time point of the first request in each session is considered. The other requests in a session are not taken into account because they arrive interdependently. Similarly, for MapReduce applications, only the arrival time point of each job is included; the start times of the mappers and reducers in each job are excluded. In other words, our model is concerned only with the variations in the number and frequency of user requests, which are common to different applications. For a generic model that defines the interdependencies within a cloud application, refer to our work [22]. One further point requires explanation: the generic arrival model defines one arrival at a time, but it can easily be extended to a batch arrival process by adding a bulk-number parameter.

In this section, the generic arrival process model is presented in three parts: (1) the mathematical specification of the generic arrival process model, (2) the instance generation algorithm based on the arrival process model, and (3) a case study of three typical cloud applications and one arrival process model with a self-defined function.

3.1 Formal Specification of a Generic Arrival Process Model

In this section the arrival process is formally defined by a hierarchical mathematical model. For the convenience of the reader, Table 1 summarizes the main notation in the order in which it is introduced in the paper.

Table 1. The notation used in the model

A generic arrival process model that can be used to generate hybrid cloud workloads must satisfy the following two conditions:

  • Generality: the model can capture the essential characteristics of the arrival process for different cloud applications, and more complex and diverse arrival processes can be generated by superposition.

  • Accuracy: the arrival process model combines the counting process and the point process to generate time-dependent workloads without losing the accurate arrival time points.

In this model, the total time of the workload is first defined in terms of the number of time intervals it contains. Then, the length of each time interval and the number of requests arriving within each time interval can be constant or variable, and the arrival time points within each time interval can be defined individually. This enables the arrival process model to describe not only the accurate arrival points within a time interval but also the temporal dependence among different time intervals, such as periodicity, burstiness, variability and self-similarity. The generic arrival process model is defined in four steps as follows.

  1. The total time: Duration

    An arrival process is divided into several time intervals. The total time of an arrival process is defined as a positive integer Duration, which represents the number of time intervals included in an arrival process. The length of each time interval could be unequal.

  2. The length of each time interval

    A total arrival process is split into Duration continuous and disjoint time intervals. The lengths of the time intervals can be equal, denoted by a constant \( \Delta {\text{T}} \). They can also vary, in which case the variation of the lengths is represented as a stochastic process \( {\text{F}}(\Delta {\text{T}}_{x} ),\;\{\Delta {\text{T}}_{x} ,\quad x = 1,2, \ldots ,{\text{Duration}}\} \), where \( \Delta {\text{T}}_{x} \) is a random variable representing the length of the \( x \)th time interval. Let \( \uptheta\left( . \right) \) be a function that determines the variation of the interval lengths. The relationship between \( {\text{F}}(\Delta {\text{T}}_{x} ) \) and \( \uptheta\left( . \right) \) is denoted as

    $$ F\left( {\Delta T_{x} } \right)\sim\theta \left( . \right) $$
    (1)
  3. The quantity of arrivals in each interval

    The variation of the arrival numbers in each time interval is denoted as a stochastic process \( {\text{F}}N^{R\_U} \left( {\Delta T_{x} } \right),\left\{ { N^{R\_U} \left( {\Delta T_{x} } \right),\;x = 1,2, \ldots ,Duration } \right\} \), where \( {\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\Delta {\text{T}}_{\text{x}} } \right) \) is the number of arrived requests within \( \Delta {\text{T}}_{x} \) (the \( x{\text{th}} \) time interval). Let \( \updelta\left( . \right) \) be a function used to determine the variation of the arrival numbers in each time interval. The relationship between \( {\text{FN}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right) \) and \( \updelta\left( . \right) \) is denoted as

    $$ FN^{R\_U} \left( {\Delta T_{x} } \right)\sim\delta \left( . \right) $$
    (2)

    A special case is that all time intervals are equal, in which case the stochastic process simplifies to \( {\text{FN}}^{{{\text{R}}\_{\text{U}}}} \left( {\text{x}} \right),\left\{ {{\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\text{x}} \right),{\text{x}} = 1,2, \cdots } \right\} \), where \( {\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\text{x}} \right) \) is the quantity of arrivals in the \( {\text{xth}} \) interval. With the same function \( \updelta\left( . \right) \) determining the variation of the arrival numbers, the relationship between \( {\text{FN}}^{R\_U} \left( x \right) \) and \( \updelta\left( . \right) \) is denoted as

    $$ {\text{FN}}^{R\_U} \left( x \right)\sim\delta \left( . \right) $$
    (3)
  4. The arrival time points in each time interval

    The arrival point process for each time interval can be defined individually. For each time interval \( \Delta {\text{T}}_{\text{x}} ,\;{\text{x}} = 1,2, \cdots ,{\text{Duration}} \), an arrival point process composed of \( {\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\Delta {\text{T}}_{\text{x}} } \right) \) time points is denoted as a stochastic process

    $$ {\text{F}}\left( {{\text{T}}\left( {{\text{x}},{\text{k}}} \right)} \right),\;\left\{ {{\text{T}}\left( {{\text{x}},{\text{k}}} \right),{\text{k}} = 1,2, \cdots {\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\Delta {\text{T}}_{\text{x}} } \right)} \right\} $$
    (4)

    where \( {\text{T}}\left( {{\text{x}},{\text{k}}} \right) \) is the \( {\text{kth }} \) arrival time point within the \( {\text{xth}} \) time interval. Each \( {\text{T}}\left( {{\text{x}},{\text{k}}} \right),{\text{k}} = 1,2, \cdots ,{\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\Delta {\text{T}}_{\text{x}} } \right) \) must lie within the \( {\text{xth}} \) time interval, that is, it must satisfy

    $$ \sum\nolimits_{{{\text{l}} = 1}}^{{{\text{x}} - 1}} {\Delta {\text{T}}_{\text{l}} } < T\left( {{\text{x}},{\text{k}}} \right) \le \sum\nolimits_{{{\text{l}} = 1}}^{{\text{x}}} {\Delta {\text{T}}_{\text{l}} } ,\;{\text{k}} = 1,2, \cdots ,{\text{N}}^{{{\text{R}}\_{\text{U}}}} \left( {\Delta {\text{T}}_{\text{x}} } \right) $$
    (5)

    We denote a set of functions \( \left\{ {{\text{Func}}_{\text{x}} \left( \right), {\text{x}} = 1,2, \cdots ,{\text{Duration}}} \right\} \), where \( {\text{Func}}_{\text{x}} \left( \right) \) is the function that the arrival time points within the xth time interval conform to. Namely,

    $$ {\text{F}}\left( {T\left( {x,{\text{k}}} \right)} \right)\sim{\text{Func}}_{x} \left( \right) ,x = 1,2, \cdots ,Duration, \,\,k = 1,2, \cdots {\text{N}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right) $$
    (6)

    A special case is that every time interval uses the same arrival point process, in which case the stochastic process and the function simplify to \( {\text{F}}\left( {{\text{T}}\left( {\text{k}} \right)} \right) \) and \( {\text{Func}}\left( \right) \) respectively, with

    $$ {\text{F}}\left( {T\left( {\text{k}} \right)} \right) \sim{\text{Func}}\left( \right) , k = 1,2, \cdots {\text{N}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right) $$
    (7)
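The requirement that every arrival time point of the xth interval falls inside that interval can be checked mechanically. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the convention that the xth interval is open at its start and closed at its end, so that a point lies after the end of interval x − 1 and no later than the end of interval x.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def interval_bounds(lengths: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (start, end) of each interval from the list of interval lengths."""
    ends = list(accumulate(lengths))   # cumulative sums of the lengths
    starts = [0.0] + ends[:-1]         # each interval starts where the previous ends
    return list(zip(starts, ends))


def points_in_own_interval(lengths: Sequence[float],
                           points: Sequence[Sequence[float]]) -> bool:
    """Check start_x < T(x, k) <= end_x for every interval x and every point k."""
    return all(
        start < t <= end
        for (start, end), interval_points in zip(interval_bounds(lengths), points)
        for t in interval_points
    )


# Toy instance (hypothetical values): three intervals of lengths 2, 3 and 1.
lengths = [2.0, 3.0, 1.0]
valid = points_in_own_interval(lengths, [[0.5, 2.0], [2.5, 5.0], [5.5]])
invalid = points_in_own_interval(lengths, [[0.5], [1.5], []])  # 1.5 is not in interval 2
```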

3.2 A Generation Algorithm Based on Arrival Process Model

The input parameters of the generation algorithm follow the generic arrival process model and include: (1) the number of time intervals Duration, (2) the function \( \uptheta\left( . \right) \) determining the variation of the interval lengths, (3) the function \( \updelta\left( . \right) \) determining the variation of the arrival numbers in each time interval, and (4) a set of functions \( \{ {\text{Func}}_{x} \left( \right), x = 1,2, \cdots ,Duration \} \) that the arrival time points within each time interval conform to. Here \( \uptheta\left( . \right) \), \( \updelta\left( . \right) \) and every \( {\text{Func}}_{x} \left( \right), x = 1,2, \cdots ,Duration \) can be assigned in three ways: as a constant, as a statistical distribution or process, or as a self-defined function. Any statistical distribution can be assigned, such as the exponential or the Weibull distribution. The steps of the algorithm are consistent with the steps of the generic arrival process model. First, a list \( \{\Delta {\text{T}}_{x} ,\;x = 1,2, \cdots ,Duration\} \) is generated randomly according to \( \uptheta\left( . \right) \). Second, a list \( \left\{ {{\text{N}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right),\;x = 1,2, \cdots ,Duration} \right\} \) is generated randomly according to \( \updelta\left( . \right) \). Lastly, the arrival time points \( \{ T\left( {x,{\text{k}}} \right),\;x = 1,2, \cdots ,Duration,\;k = 1,2, \cdots ,{\text{N}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right) \} \) are generated randomly according to the corresponding \( {\text{Func}}_{\text{x}} \left( \right),\;x = 1,2, \cdots ,Duration \). Table 2 shows the pseudo-code for generating an arrival process instance based on the arrival process model.

Table 2. Pseudo-code of generation algorithm
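As an illustration of the steps described above, the following Python sketch shows one possible shape of such a generator. It is not the pseudo-code of Table 2: the function names, the calling convention (each step is either a constant or a callable taking the interval index and a random generator) and the helper `uniform_points` are assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence, Union

# A step definition is a constant or a callable f(x, rng); a callable covers
# both statistical distributions and self-defined functions (assumed convention).
Step = Union[float, Callable[[int, random.Random], float]]
# Func_x returns offsets of the arrival points measured from the interval start.
PointFunc = Callable[[int, float, random.Random], List[float]]


def resolve(step: Step, x: int, rng: random.Random) -> float:
    """Evaluate a step definition for interval x."""
    return step(x, rng) if callable(step) else step


def uniform_points(count: int, length: float, rng: random.Random) -> List[float]:
    """Example Func_x: `count` offsets drawn uniformly within the interval."""
    return [rng.uniform(0.0, length) for _ in range(count)]


def generate_arrivals(duration: int, theta: Step, delta: Step,
                      funcs: Union[PointFunc, Sequence[PointFunc]],
                      seed: Union[int, None] = None) -> List[List[float]]:
    """Generate one arrival process instance, as a list of points per interval."""
    rng = random.Random(seed)
    instance, start = [], 0.0
    for x in range(1, duration + 1):
        length = float(resolve(theta, x, rng))             # step 2: interval length
        count = max(0, round(resolve(delta, x, rng)))      # step 3: number of arrivals
        func = funcs if callable(funcs) else funcs[x - 1]  # step 4: Func_x
        offsets = func(count, length, rng)
        # Keep only offsets inside the interval, then shift to absolute time.
        points = sorted(start + o for o in offsets if 0.0 <= o <= length)
        instance.append(points)
        start += length
    return instance


# Usage: 3 equal intervals of 10 s, 4 arrivals each, uniform arrival points.
example = generate_arrivals(3, 10.0, 4, uniform_points, seed=1)
```

Passing a constant for `theta` or `delta` gives the equal-interval or fixed-count special cases, while a callable such as `lambda x, rng: rng.expovariate(1 / 60)` gives random interval lengths.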

3.3 Case Study

To show in detail the generality of the arrival process model for different cloud applications, we take three typical cloud applications as examples: Web applications [23], MapReduce applications [24] and batch applications [5]. We explain the definitions of the three arrival processes using our generic model, and give three arrival process instances generated with our generation algorithm. We also present an extra example: the definition of an arrival process model with a self-defined function, together with the generated arrival process instance.

Web Application Arrival Process Model Example

In this Web application workload, the arrival process is divided into rounds of unequal length. Within one round, the number of active users is denoted as \( Concurrent\_Users \), and each active user initiates one session. \( Ramp\_Up\_Period \) specifies the time taken to initiate all the sessions in one round. If all sessions are created at the same time, then \( Ramp\_Up\_Period = 0 \). Otherwise, the sessions are created one after another at regular intervals, and the interval between two adjacent sessions is \( Ramp\_Up\_Period/Concurrent\_Users \). For example, with 5 active users in a round and a \( Ramp\_Up\_Period \) of 10 s, a new session is created every 2 s. The length of each session conforms to the negative exponential distribution \( {\text{Exp}}\left( {15} \right) \). Thus, the time of each round is defined as

$$ {\text{F}}\left( {\Delta {\text{T}}_{\text{x}} } \right)\sim\uptheta\left( . \right) = {\text{Ramp}}\_{\text{Up}}\_{\text{Period}} + {\text{Exp}}\left( {15} \right) $$
(8)

Because in one round each active user can only create one session, the number of sessions in one round \( {\text{N}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right) \) is the number of active users. According to the specification in [23], the variation of the arrival numbers in each round is defined as

$$ {\text{FN}}^{{{\text{R}}\_{\text{U}}}} \left( {\text{x}} \right)\sim\updelta\left( . \right) = {\text{N}}\left( {10,3} \right) $$
(9)

The arrival time points in round x, measured from the start of the round, are defined as

$$ T\left( {x,{\text{k}}} \right) = \left( {{\text{k}} - 1} \right) *\left( {\frac{{{\text{Ramp}}\_{\text{Up}}\_{\text{Period}}}}{{{\text{Concurrent}}\_{\text{Users}}}}} \right),\;{\text{k}} = 1,2, \ldots {\text{N}}^{R\_U} \left( {\Delta {\text{T}}_{x} } \right) $$
(10)

We use the following steps to generate the arrival process instance.

  1. Give Duration a value.

  2. Generate Duration values of Concurrent_Users, randomly sampled from the normal distribution \( N\left( {10,3} \right) \).

  3. In each round, generate Concurrent_Users session lengths, randomly sampled from the negative exponential distribution \( {\text{Exp}}\left( {15} \right) \). The length of the round is the sum of Ramp_Up_Period and the maximum of the sampled session lengths.

  4. In each round, Ramp_Up_Period is set to 800 s, so the interarrival time is Ramp_Up_Period/Concurrent_Users.

Because of limited space, Fig. 1 shows an arrival process instance with Duration = 4. The numbers of sessions in the 4 rounds are 9, 11, 13 and 5, and the 4 round lengths (Ramp_Up_Period plus the maximum session length) are 823.6955 s, 849.5503 s, 826.9957 s and 826.9957 s. We take the start time of each round as its first arrival point. The interarrival times in the four rounds are, in turn, Ramp_Up_Period/Concurrent_Users = 88.8889 s, 72.7273 s, 61.5385 s and 160 s.

Fig. 1. A web application arrival process instance
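The four generation steps of the Web example can be sketched in Python as follows. This is an illustrative sketch, not the authors' tool. It assumes that N(10, 3) means mean 10 and standard deviation 3, that Exp(15) means a mean session length of 15 s, and that the sampled user count is rounded and kept at 1 or more; the function and key names are hypothetical.

```python
import random
from typing import Dict, List, Union


def generate_web_rounds(duration: int, ramp_up_period: float = 800.0,
                        seed: Union[int, None] = None) -> List[Dict]:
    """Generate `duration` rounds of session arrivals for the Web example."""
    rng = random.Random(seed)
    rounds, start = [], 0.0
    for _ in range(duration):
        # Step 2: Concurrent_Users ~ N(10, 3), assumed (mean, standard deviation).
        users = max(1, round(rng.gauss(10, 3)))
        # Step 3: one session length per user, assumed Exp with mean 15 s.
        sessions = [rng.expovariate(1 / 15) for _ in range(users)]
        length = ramp_up_period + max(sessions)
        # Step 4: sessions start at regular gaps, the first at the round start.
        gap = ramp_up_period / users
        arrivals = [start + k * gap for k in range(users)]
        rounds.append({"start": start, "users": users, "gap": gap,
                       "length": length, "arrivals": arrivals})
        start += length  # the next round begins when this one ends
    return rounds


rounds = generate_web_rounds(duration=4, seed=0)
```

With 9 users and a Ramp_Up_Period of 800 s, the gap computed this way is 800/9 ≈ 88.8889 s, matching the first round of the instance reported above.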

MapReduce Application Arrival Process Model Example

The arrival process model in [24] is first-order: the interarrival times conform to \( Weibull\left( {20,0.5} \right) \). As a result, it is not necessary to divide the arrival process into time intervals, so Duration = 1. The interarrival time \( \Delta {\text{t}}_{k} \) conforms to \( Weibull\left( {20,0.5} \right) \), that is, \( {\text{Func}}\left( {\,} \right) \sim Weibull\left( {20,0.5} \right) \). Thus, the \( {\text{kth}} \) arrival time point is equal to

$$ T\left( {1,\;{\text{k}}} \right) = {\text{T}}\left( {1,({\text{k}} - 1)} \right) +\Delta {\text{t}}_{\text{k}} ,\;{\text{k}} = 1,2 \ldots $$
(11)

We use the following steps to generate the arrival process instance.

  1. \( Duration = 1 \), \( \Delta {\text{T}} = 6000 \) s

  2. Randomly generate sample points conforming to \( Weibull\left( {20,0.5} \right) \) as interarrival times until the arrival time point is beyond \( \Delta {\text{T}} \).

Figure 2 shows an instance of the arrival process, which generates 20 arrivals within 6000 s.

Fig. 2. A MapReduce application arrival process instance
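The two generation steps of this first-order example amount to accumulating sampled interarrival times until the horizon is exceeded. The sketch below illustrates this in Python. The parameter order of Weibull(20, 0.5) is not stated in the text, so reading it as (scale, shape) is an assumption, as is the function name; with a different parameterization the number of generated arrivals will differ.

```python
import random
from typing import List, Union


def generate_first_order(horizon: float, scale: float = 20.0, shape: float = 0.5,
                         seed: Union[int, None] = None) -> List[float]:
    """Accumulate Weibull interarrival times until the point exceeds `horizon`."""
    rng = random.Random(seed)
    arrivals = []
    # random.weibullvariate(alpha, beta) takes the scale first, then the shape.
    t = rng.weibullvariate(scale, shape)
    while t <= horizon:
        arrivals.append(t)
        t += rng.weibullvariate(scale, shape)  # T(1, k) = T(1, k - 1) + delta_t_k
    return arrivals


# Duration = 1 and a single interval of 6000 s, as in the example.
arrivals = generate_first_order(horizon=6000.0, seed=42)
```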

Batch Application Arrival Process Model Example

The arrival process model in [5] exhibits a daily-cycle pattern. The day is first divided into 48 half-hour intervals, i.e. \( \Delta {\text{T}} = 1800\,{\text{s}} \). The variation of the number of arrivals during each of the 48 intervals is defined to conform to a Weibull distribution, i.e. \( \updelta\left( . \right)\sim{\text{W}}\left( {1.79,24.16} \right) \). The hours from 8 AM to 5 PM are called “peak hours”. The variation of the interarrival time \( \Delta {\text{t}}_{k} \) also conforms to a Weibull distribution, with different parameters, i.e. \( Func\left( \right)\sim {\text{W}}\left( {4.25,7.86} \right) \). Thus, the arrival time points in each interval can be defined as

$$ {\text{T}}\left( {x,{\text{k}}} \right) = {\text{T}}\left( {x,{\text{k}} - 1} \right) +\Delta {\text{t}}_{k} $$
(12)

We use the following steps to generate the arrival process instance.

  1. \( Duration = 48 \), \( \Delta {\text{T}} = 1800 \) s

  2. Randomly generate 48 sample points conforming to \( {\text{W}}\left( {1.79,24.16} \right) \) as the number of arrivals within each of the 48 intervals.

  3. During the peak hours, the interarrival times in each interval are generated randomly according to the Weibull distribution \( {\text{W}}\left( {4.25,7.86} \right) \). The number of sample points within each interval is determined by step 2. In each interval, the first arrival time point is the start time of the interval.

Because the author gave the interarrival times only for the peak hours, and because space is limited, Fig. 3 shows an instance of the arrival process during only the first four time intervals of the peak hours (that is, from 8 to 10). In a batch arrival process, multiple jobs arrive at a time. However, this work is concerned with the arrival times, so the number of jobs in an arrival is not defined.

Fig. 3. A batch application arrival process instance
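A minimal Python sketch of the three generation steps for the batch example is given below. Several points are assumptions rather than facts from [5]: W(a, b) is read as (shape, scale), the sampled number of arrivals is rounded to an integer, the same interarrival distribution is applied to every generated interval, and points that would fall beyond the end of an interval are dropped. The function name is hypothetical.

```python
import random
from typing import List, Union


def generate_batch_intervals(duration: int = 4, interval: float = 1800.0,
                             seed: Union[int, None] = None) -> List[List[float]]:
    """Generate arrival points for `duration` consecutive half-hour intervals."""
    rng = random.Random(seed)
    instance = []
    for x in range(duration):
        start, end = x * interval, (x + 1) * interval
        # Step 2: number of arrivals ~ W(1.79, 24.16), assumed (shape, scale).
        count = round(rng.weibullvariate(24.16, 1.79))
        # Step 3: the first arrival is the interval start; later points add
        # interarrival times ~ W(4.25, 7.86), assumed (shape, scale).
        points, t = [], start
        for _ in range(count):
            if t >= end:   # drop points that would leave the interval
                break
            points.append(t)
            t += rng.weibullvariate(7.86, 4.25)
        instance.append(points)
    return instance


# Four intervals, as in the instance shown for the start of the peak hours.
instance = generate_batch_intervals(duration=4, seed=3)
```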

An Example of Arrival Process Model with Self-defined Functions

The arrival process model gains flexibility by supporting self-defined functions. We present an example of an arrival process that includes a self-defined function. First, we give the specification of the self-defined arrival process.

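$$ \updelta\left( x \right) = \begin{cases} 5 + 5x, & 0 \le x < 3 \\ 20, & 3 \le x < 5 \\ 20 - 2\left( {x - 5} \right), & 5 \le x \le 9 \\ 0, & x > 9 \end{cases} $$
(13)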

As can be seen from Fig. 4, the arrival rate increases linearly from 8 am to 11 am, from an average of 5 arrivals per hour to 20 arrivals per hour. From 11 am to 1 pm, the arrival rate remains at an average of 20 arrivals per hour. From 1 pm to 5 pm, the arrival rate drops linearly to 12 arrivals per hour. After 5 pm the service is closed. The time is divided into hours, i.e. \( \Delta {\text{T}} = 1\,{\text{h}} \). The arrival time points in an interval conform to the Poisson process, and \( \uplambda_{i} \) of the Poisson process is set to the mean number of arrivals in the \( ith \) interval. In Fig. 4, the horizontal axis is time in hours and the vertical axis is the arrival rate; 8 am corresponds to the value 0 on the x-axis. The mathematical function of the variation of the arrival rate is defined in Eq. 13.

Fig. 4. Arrival rate function example

We specify the arrival process with our model as follows:

  1. The time is divided into hours, i.e. \( \Delta {\text{T}} = 1\,{\text{h}} \).

  2. The mean number of arrivals in the \( ith \) interval is

    $$ {\text{E}}\left( {{\text{N}}^{R\_U} \left( i \right)} \right) = \mathop \smallint \limits_{i - 1}^{i}\updelta\left( x \right)dx $$
    (14)
  3. The arrival time points in the \( ith \) interval are

    $$ \left\{ {T\left( {i,{\text{k}}} \right),\;k = 1,2, \cdots ,{\text{N}}^{R\_U} \left( i \right)} \right\} ,\;{\text{Func}}_{i} \left( \right)\,\sim\,Poisson\left( {{\text{E }}\left( {{\text{N}}^{R\_U} \left( i \right) } \right)} \right) $$
    (15)

We use the following steps to generate the arrival process instance.

  1. \( Duration = 9,\;\Delta {\text{T}} = 1\,{\text{h}} . \)

  2. The number of arrivals in each interval is randomly generated according to the mean number of arrivals in this interval. As shown in Fig. 5, the nine samples are 6, 11, 16, 20, 19, 18, 15, 12.

    Fig. 5. A self-defined arrival process instance

  3. The interarrival times in each interval are generated randomly according to the Poisson process with parameter \( \uplambda_{i} \).
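The self-defined example can be sketched in Python as follows. The piecewise-linear rate is transcribed from the prose description of Fig. 4 (5 to 20 arrivals per hour over the first three hours, 20 for two hours, then a linear drop to 12 at hour 9). The mean of Eq. 14 is evaluated with a midpoint rule. Two techniques are swapped in as assumptions: the Poisson count is drawn with Knuth's multiplication method, and the arrival points are placed as sorted uniform samples, which is the standard conditional distribution of Poisson arrivals given their count, instead of sampling interarrival times directly. All function names are hypothetical.

```python
import math
import random
from typing import List, Union


def rate(x: float) -> float:
    """Arrival rate (arrivals per hour) at x hours after 8 am."""
    if 0 <= x < 3:
        return 5 + 5 * x
    if 3 <= x < 5:
        return 20.0
    if 5 <= x <= 9:
        return 20 - 2 * (x - 5)
    return 0.0


def mean_arrivals(i: int, steps: int = 1000) -> float:
    """Eq. 14: integral of the rate over [i - 1, i], by the midpoint rule."""
    h = 1.0 / steps
    return sum(rate(i - 1 + (j + 0.5) * h) for j in range(steps)) * h


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def generate_day(seed: Union[int, None] = None) -> List[List[float]]:
    """Arrival points (in hours after 8 am) for the nine one-hour intervals."""
    rng = random.Random(seed)
    day = []
    for i in range(1, 10):
        count = poisson_sample(mean_arrivals(i), rng)
        day.append(sorted((i - 1) + rng.random() for _ in range(count)))
    return day


means = [mean_arrivals(i) for i in range(1, 10)]
day = generate_day(seed=7)
```

Under these assumptions the nine interval means are 7.5, 12.5, 17.5, 20, 20, 19, 17, 15 and 13 arrivals, against which sampled counts can be compared.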

4 Conclusion

This paper presented a general hierarchical arrival process model and an algorithm for generating arrival process instances based on the arrival model.

The general arrival process model, specified in four steps, has two advantages. (1) It captures the essential features of arrival processes and is independent of application and workload scenario. (2) It combines the advantages of the point process and the count process: it can describe both the accurate time points in each interval and the temporal dependence between intervals.

Our generation algorithm allows each of the four steps to be defined as a constant, a statistical distribution or process, or a self-defined function. The option of a self-defined function further enhances the extensibility and flexibility of the generic arrival process model. Compared with existing generation tools, our algorithm is simple and effective, and is more scalable.

The case study showed the generality, flexibility and effectiveness of our work. In sum, the generic arrival process model and generation algorithm provide a solid foundation for generating more realistic hybrid cloud workloads.

In the future, we will study how to formally define more complex arrival process models by combining multiple general arrival process models along the vertical and horizontal axes of time. Along the vertical axis, a complicated arrival process model can be the superposition of multiple parallel arrival process models. Along the horizontal axis, a time-dependent arrival process model can be defined by multiple consecutive arrival process models. We believe that this study will make the general arrival process model more comprehensive.