Abstract
This paper examines various techniques for estimating the intensity of Long-Range Dependence (LRD). Trial data sets with LRD are generated using Fractional Gaussian noise and Markov modulated Poisson process. The real data set collected in IITiS PAN is also used.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
- Self-similarity
- Long-range dependence
- Hurst parameter
- Fractional gaussian noise
- Markov modulated poisson process
1 Introduction
Many empirical and theoretical studies have shown the self-similar and LRD characteristics of the network traffic [1–4]. These features have a great impact on a network performance [3, 5, 6]. They enlarge mean queue lengths at buffers and increase packet loss probability, reducing this way the quality of services provided by a network [7]. That is why it is necessary to take into account this feature when you want to create a realistic model of traffic sources [8, 9].
Various modelling techniques are currently used for generating the LRD traffic. The majority of them is based on non-Markovian approach [10–13]. The advantage of these models is that they give a good description of the traffic with the use of few parameters. Their drawbacks consist in the fact that they do not allow the use of traditional and well known queueing models and modeling techniques for computer networks performance analysis. There are also Markov based models to generate a LRD traffic over a finite number of time scales [8, 14–17]. This approach makes possible the adaptation of traditional Markovian queueing models to evaluate network performance.
In this work we investigate long-range dependence in the synthetic data generated from Fractional Gaussian noise traffic source and Markovian traffic source. We also use for our analysis the realistic packet traffic trace collected in IITiS PAN.
Section 2 briefly describes the distinction between the terms: Self-similarity and Long-Range Dependence. Section 3 demontrates Fractional Gaussian noise as an example of exactly self-similar traffic source. Section 4 shortly describes the Markovian traffic source which is sufficient to generate long-range dependent traffic over several time scales. Section 5 presents the intensity of LRD estimation for real and synthetic traffic traces. Some conclusions are presented in Sect. 6.
2 The Distinction Between Self-Similarity and Long-Range Dependence
Over the last two decades, long-range dependency (LRD), self-similarity and heavy-tailed distributions have dominated Internet traffic analysis. Extensive measurements have revealed these phenomena in network traffic.
In the literature the terms long-range dependence and self-similarity are often used without distinction, although they are not equivalent concepts [18].
A continuous time process \(Y(t)\) is exactly self-similar with the Hurst parameter \(H\) if it satisfies the following condition [19]:
for \(t\ge 0\), \(a\ge 0\) and \(0<H<1\). The above equality is in the sense of finite dimensional distributions and the Hurst parameter expresses the degree of the self-similarity [20]. The process \(Y(t)\) may be nonstationary [21].
In the case of network traffic one usually has to deal with time series rather than a continuous process. In that context the above definition can be summarized as follows. Let \(X(t)\) be a stationary sequence representing increment process (e.g. in bytes/second). The corresponding aggregated sequence having level of aggregation \(m\):
is obtained by averaging \(X(t)\) over nonoverlapping blocks of length \(m\). The following condition is satisfied for a self-similar process:
for all integers \(m\). A stationary sequence \(X\) is second-order self-similar if for all \(m\), \(m^{1-H} X^{(m)}\) has the same variance and auto-correlation as \(X\). A stationary sequence \(X\) is asymptotically second-order self-similar if \(m^{1-H} X^{(m)}\) has the same variance and auto-correlation as \(X\) as \(m\rightarrow \infty \).
Asymptotically second-order self-similar processes are also called long-range dependent processes and this is a property exhibited by network traffic [19]. Long-range dependence of data means that the behavior of a time-dependent process shows statistically significant correlations across large time scales and self-similarity describes the phenomenon in which the behavior of a process is preserved irrespective of scaling in space or time [2]. Figure 1 shows as an example the fluctuation over differents time scales for IITiS trace described in Sect. 5.
The autocorrelation function od LRD process decays very slowly and if the Hurst parameter is between \(0.5\) and \(1\), the autocorrelation has an asymptotically behavior. If the Hurst parameter is smaller or equal to \(0.5\), the process exhibits SRD (Short-Range Dependence). Second-order stationary process whose autocorrelation function decays hyperbolically is asymptotically second-order self-similar [22]. For this reason the term self-similarity is often used interchangeably with the long-range dependence [19].
3 Fractional Gaussian Noise
Fractional Gaussian noise (fGn) has been proposed as a model [23] for the long-range dependence postulated to occur in a variety of hydrological and geophysical time series. Nowadays, fGn is one of the most commonly used self-similar processes in network performance evaluation [18]. Let \(B_h(t)\) be a fractional Brownian motion process. Then the sequence of increments:
is an exactly self-similar stationary Gaussian process with zero mean, referred to as fGn process.
The autocorrelation function of fGn process is given by [2]:
which is sufficient condition for second-order selfsimilarity. The fGn process is the only stationary Gaussian process that is exactly self-similar [24].
For \(0.5<H<1\) the autocorrelation decays hyperbolically [25]:
so the process exhibits long-range dependence.
The spectral density of fGn process is given by [18]:
where \(\lambda \in [-\pi ,\pi ]\), \(0.5<H<1\) and \(c\) is a normalization constant such that \(\int ^{\pi }_{-\pi }f(\lambda )d\lambda =Var(X)\).
The important problem is the systhetic generation of sample paths (traces) of self-similar processes [18]. In this paper we use the fast algorithm for generating approximate sample paths for a fGn process, first introduced in [26].
4 Markovian Model of LRD Traffic
Many researchers discussed the suitability of Markovian models to describe IP network traffic that exhibits self-similarity and long-range dependence [8, 14, 15, 17, 21]. They concluded that matching LRD is only required within the time scales of interest to the system under research [21, 27, 28].
In consequence, more traditional traffic models such as Markov Modulated Poisson Process may still be used for modeling LRD traffic. Markov Modulated Poisson Process (MMPP) is a widely used tool for the teletraffic models analysis. To generate LRD traffic we use in this paper the MMPP model proposed in [29], and precisely described in [17]. This model consists of \(d\) two-state MMPP models. The \(i\)-th MMPP (\(1\le i \le d\)) can be parameterized by two square matrices:
The element \(c_{1i}\) is the transition rate from state 1 to 2 of the \(i\)-th MMPP and \(c_{2i}\) is the rate out of state 2 to 1. \(\lambda _{1i}\) and \(\lambda _{2i}\) are the traffic rate when the \(i\)-th MMPP is in state 1 and 2 respectively. The sum of \(\mathbf{D_0}^i\) and \(\mathbf{D_1}^i\) is an irreducible infinitesimal generator \(\mathbf{Q}^i\) with the stationary probability vector:
The superposition of these two-state MMPPs is a new MMPP with \(2^d\) states and its parameter matrices, \(\mathbf{D_0}\) and \(\mathbf{D_1}\), can be computed using the Kronecker sum of those of the \(d\) two-state MMPPs [30]:
Let \(N_t^i\) be a number of arrivals from the \(i\)-th MMPP in time slot \((0,t]\). The variance time for this MMPP can be expressed as:
where:
The second-order properties are determined by this three entities: \(\lambda ^*_i\), \(k_{1i}\) and \(k_{2i}\). The covariance function of the number of arrivals in two time slots of size \({\varDelta }t\) is expressed by [29]:
Article [29] illustrates that a superposition of four described above two-state MMPP models suffices to replicate second-order self-similar behaviour over several time scales.
5 Long-Range Dependence Estimation
The Hurst parameter characterizes a process in terms of the degree of self-similarity and LRD [31]. The degree of self-similarity and LRD increases with increasing of \(H\) [32]. A Hurst value smaller or equal to \(0.5\) means the lack of self-similarity or the presence of SRD [33]. A Hurst parameter greater than \(0.5\) means the existance of LRD [31].
In our works three types of LRD traffic traces were considered. Two of them: fGn traffic source and MMPP traffic source have been described in Sects. 3 and 4. We also used a trace of real Internet traffic collected from the network of IITiS institute serving \(<\!\!50\) academic users. This data set has been collected during the whole May 2012 on the Internet gateway of our Institute [4, 34]. The datasets contains different subsets of network protocols. Table 1 gives a summary description of the IITiS traffic data analyzed in the paper.
There are many methods of estimating this parameter. In order to estimate the true values of the Hurst parameter we used in our works five estimators:
-
aggregate variance method,
-
R-S Plot,
-
method based on periodogram,
-
local Whittle’s estimator,
-
Wavelet based method.
We used the statistical package \(R\) mentioned in [35].
The aggregate variance method was described in [35–38]. This technique is time-domain based. It considers \(Var(X^{(m)})\), where the aggregated process \(X^m\) is a time series derrived from \(X\) by aggregating it over blocks of size \(m\). The aggregated sequence is then plotted versus \(m\) after taking logarithm. The estimated value of Hurst parameter is obtained by fitting a simple least squares line through the resulting points in the plane. The asymptotic slope between \(-1\) and \(0\) suggests LRD and estimated Hurst parameter is given by \(H=1-\mathrm {slope}/{2}\). Figure 2 shows as an example the variance-time plot for IITiS trace.
The R-S Plot method [35, 37, 39] is one of the oldest Hurst parameter estimator. Let \(R(n)\) be the range of the data aggregated over blocks of length \(n\) and \(S^2(n)\) be the sample variance of data aggregated at the same scale. For a stochastic process \(X\) the rescaled range of \(X\) over a time interval \(n\) is defined as the ratio \({R}/{S}\):
where \(\overline{X}(n)\) is the sample mean over the time interval \(n\) and \(S(n)\) is standard deviation. For LRD processes, the ratio has the following characteristic for large \(n\):
A log-log plot of \(\frac{R}{S}(n)\) versus \(n\) should have a constant slope \(H\) as \(n\) becomes large. Figure 2 shows as an example the R-S plot for IITiS trace.
The method using Periodogram [35, 37, 40] is frequency domain method, the periodogram is defined by:
A log-log plot \(I_X(\omega _{n,k})\) versus \(\omega _{n,k}=\frac{2\pi k}{n}\) should have a slope of \(1-2H\) around \(\omega =0\). Figure 3 shows as an example the periodogram for IITiS trace.
Local Whittle’s estimator assumes only a functional form for the spectral density at frequencies near zero. To estimate Hurst parameter one should minimize the function:
where \(I_X(\omega )\) is the periodogram and \(f(\omega )=c\omega ^{2H-1}\).
Wavelet analysis is successfully used to measure the Hurst parameter [35]. Wavelets can be thought of as akin to Fourier series, but using waveforms other than sine waves. Figure 3 shows as an example the wavelet analysis for IITiS trace. The estimator fits a straight line to a frequency spectrum derived using wavelets.
Table 2 gives the the obtained Hurst parameters for one day IITiS traces:
-
trace 1 – 6 002 874 samples,
-
trace 2 – 13 874 610 samples,
-
trace 3 – 36 135 490 samples.
In the case of Wavelets based method a \(95\,\%\) confidence interval should be interpreted only as a confidence interval on the fitted line. Our previous work [4] did not confirm the relationship between the degree of LRD and the number of transmitted packet of a given type. One can see a little agreement between the estimators.
Table 3 shows the obtained Hurst parameters for MMPP data traces:
-
MMPP 1 – input parameters: \(d=5\), \(n=6\), \(\lambda ^*=3.5\), \(H=0.75\) and \(\rho =0.6\),
-
MMPP 2 – input parameters: \(d=5\), \(n=6\), \(\lambda ^*=3.5\), \(H=0.6\) and \(\rho =0.6\),
-
MMPP 3 – input parameters: \(d=4\), \(n=5\), \(\lambda ^*=9.985\), \(H=0.79\) and \(\rho =0.0213\).
The fGn model and the MMPP 3 model were used to simulate the LRD data using a theoretical Hurst parameter and the same theoretical mean as the IITiS data. Different Hurst parameters have been chosen to represent a low and a high level of long-range dependence in data. The models are run to produce 200 000 packets. The MMPP 1 and MMPP 2 models have the example parameters described above.
As can be seen the MMPP model is more inconsistent than the fGn model (see Table 4).
6 Conclusions
The article confirms that although the Hurst parameter is well defined mathematically, it is problematic to measure it properly [35]. There are several methods to estimate the Hurst parameter but they often produce conflicting results [2]. Many researchers conclude that wavelet technique generally is faring well in comparative studies [31, 35].
Our results demonstrate the best agreement among the estimators results in the case of fGn model. In the case of real traffic the differences estimators predictions are more visible. This confirms the opinion that the problems with real-life data are worse than those faced when measuring and characterizing synthetic data [35]. Real data can have periodicity, trends and quantisation effect and obtained results are more dependent on the sampling rate than in the case of synthetic traffic traces.
References
Crovella, M., Bestavros, A.: Self-similarity in world wide web traffic: evidence and possible causes. IEEE/ACM Trans. Netw. 5, 835–846 (1997)
Karagiannis, T., Molle, M., Faloutsos, M.: Long-range dependence: ten years of internet traffic modeling. IEEE Internet Comput. 8(5), 57–64 (2004)
Domański, A., Domańska, J., Czachórski, T.: The impact of self-similarity on traffic shaping in wireless LAN. In: Balandin, S., Moltchanov, D., Koucheryavy, Y. (eds.) NEW2AN 2008. LNCS, vol. 5174, pp. 156–168. Springer, Heidelberg (2008)
Domańska, J., Domański, A., Czachórski, T.: A few investigation of long-range dependence in network traffic. In: Czachórski, T., Gelenbe, E., Lent, R. (eds.) Information Science and Systems 2014, pp. 137–144. Springer International Publishing, Switzerland (2014)
Domańska, J., Domański, A.: The influence of traffic self-similarity on QoS mechanism. In: International Symposium on Applications and the Internet, SAINT. Trento, Italy (2005)
Domańska, J., Augustyn, D.R., Domański, A.: The choice of optimal 3-rd order polynomial packet dropping function for NLRED in the presence of self-similar traffic. Bull. Pol. Acad. Sci. Tech. Sci. 60(4), 779–786 (2012)
Stallings, W.: High-Speed Networks: TCP/IP and ATM Design Principles. Prentice-Hall, New Jersey (1998)
Muscariello, L., Mellia, M., Meo, M., Ajmone Marsan, M., Lo Cigno, R.: Markov models of internet traffic and a new hierarchical MMPP model. Comput. Commun. 28(16), 1835–1851 (2005)
Foremski, P., Gorawski, M., Grochla, K.: Source model of TCP traffic in LTE networks. In: Czachórski, T., Gelenbe, E., Lent, R. (eds.) Information Science and Systems 2014, pp. 125–135. Springer International Publishing, Switzerland (2014)
Erramilli, A., Singh, R.P., Pruthi, P.: An application of deterministic chaotic maps to model packet traffic. Queueing Syst. 20(1–2), 171–206 (1995)
Gallardo, J.R., Makrakis, D., Orozco-Barbosa, L.: Use of \(\alpha \)-stable self-similar stochastic processes for modeling traffic in broadband networks. Perform. Eval. 40(1–3), 71–98 (2000)
Harmantzis, F.C., Hatzinakos, D.: Heavy network traffic modeling and simulation using stable FARIMA processes. In: 19th International Teletraffic Congress, pp. 300–303. Beijing, China (2005)
Laskin, N., Lambadatis, I., Harmantzis, F.C., Devetsikiotis, M.: Fractional Levy motion and its application to network traffic modeling. In: Computer Networks, vol. 40, no. 3, pp. 363–375. Elsevier Science Publishers B.V. (2002)
Robert, S., Boudec, J.Y.L.: New models for pseudo self-similar traffic. Perform. Eval. 30(1–2), 57–68 (1997)
Clegg, R.G.: Markov-modulated on/off processes for long-range dependent internet traffic. In: Computing Research Repository (2006). CoRR. arXiv:cs/0610135
Domańska, J., Domański, A., Czachórski, T.: Internet traffic source based on hidden markov model. In: Balandin, S., Koucheryavy, Y., Hu, H. (eds.) NEW2AN 2011 and ruSMART 2011. LNCS, vol. 6869, pp. 395–404. Springer, Heidelberg (2011)
Domańska, J., Domański, A., Czachórski, T.: Modeling packet traffic with the use of superpositions of two-state MMPPs. In: Kwiecień, A., Gaj, P., Stera, P. (eds.) CN 2014. CCIS, vol. 431, pp. 24–36. Springer, Heidelberg (2014)
Lopez-Ardao, J.C., Lopez-Garcia, C., Suarez-Gonzalez, A., Fernandez-Veiga, M., Rodriguez-Rubio, R.: On the use of self-similar processes in network simulation. ACM Trans. Model. Comput. Simul. 10(2), 125–151 (2000)
Gong, W.-B., Liu, Y., Misra, V., Towsley, D.: Self-similarity and long range dependence on the internet: a second look at the evidence, origins and implications. Comput. Netw. 48(3), 377–399 (2005)
Bhattacharjee, A., Nandi, S.: Statistical analysis of network traffic inter-arrival. In: 12th International Conference on Advanced Communication Technology, pp. 1052–1057. USA (2010)
Nogueira, A., Salvador, P., Valadas, R., Pacheco, A.: Markovian modelling of internet traffic. In: Kouvatsos, D.D. (ed.) Next Generation Internet: Performance Evaluation and Applications. LNCS, vol. 5233, pp. 98–124. Springer, Heidelberg (2011)
Tsybakov, B., Georganas, N.D.: On self-similar traffic in ATM queues: definitions, overflow probability bounds, and cell delay distribution. IEEE/ACM Trans. Netw. 5(3), 397–409 (1997)
Mandelbrot, B.B., Ness, J.V.: Fractional brownian motions, fractional noises and applications. SIAM Rev. 10(4), 422–437 (1968)
Samorodnitsky, G., Taqqu, M.S.: Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall (1994)
Cox, D.R.: Long-range dependance: a review. In: Statistics: An Appraisal (1984)
Paxson, V.: Fast, approximate synthesis of fractional Gaussian noise for generating self-similar network traffic. ACM SIGCOMM Comput. Commun. Rev. 27(5), 5–18 (1997)
Grossglauser, M., Bolot, J.C.: On the relevance of long-range dependence in network traffic. IEEE/ACM Trans. Netw. 7(5), 629–640 (1999)
Nogueira, A., Valadas, R.: Analyzing the relevant time scales in a network of queues. In: SPIE Proceedings, vol. 4523 (2001)
Andersen, A.T., Nielsen, B.F.: A markovian approach for modeling packet traffic with long-range dependence. IEEE J. Sel. Areas Commun. 16(5), 719–732 (1998)
Fischer, W., Meier-Hellstern, K.: The Markov-modulated poisson process (MMPP) cookbook. Perform. Eval. 18(2), 149–171 (1993)
Stolojescu, C., Isar, A.A.: Comparison of some Hurst parameter estimators. In: 13th International Conference on Optimization of Electrical and Electronic Equipment, pp. 1152–1157. Brasov, Romania (2012)
Rutka, G.: Neural network models for internet traffic prediction. Electron. Electr. Eng. 4(68) (2006)
Abry, P., Veitch, D.: Wavelet analysis of long-range-dependent traffic. IEEE Trans. Inf. Theory 44(1), 2–15 (1998)
Foremski, P., Callegari, C., Pagano, M.: Waterfall: rapid identification of IP flows using cascade classification. In: Kwiecień, A., Gaj, P., Stera, P. (eds.) CN 2014. CCIS, vol. 431, pp. 14–23. Springer, Heidelberg (2014)
Clegg, R.G.: A practical guide to measuring the hurst parameter. Int. J. Simul. 7(2), 3–14 (2006)
Beran, J.: Statistics for Long-Memory Processes. Chapman and Hall (1994)
Taqqu, M.S., Teverovsky, V.: On estimating the intensity of long-range dependence in finite anf infinite variance time series. In: A Practical Guide To Heavy Tails: Statistical Techniques and Applications, pp. 177–217. Birkhauser Boston Inc., Boston (1998)
Park, C., Hernandez-Campos, F., Long, L., Marron, J., Park, J., Pipiras, V., Smith, F., Smith, R., Trovero, M., Zhu, Z.: Long range dependence analysis of internet traffic. J. Appl. Stat. 38(7), 1407–1433 (2011)
Mandelbrot, B.B., Wallis, J.: Computer experiments with fractional gaussian noises. Water Resour. Res. 5(1), 228–241 (1969)
Geweke, J., Porter-Hudak, S.: The estimation and application of long memory time series models. J. Time Ser. Anal. 4(4), 221–238 (1983)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Domańska, J., Domański, A., Czachórski, T. (2015). Estimating the Intensity of Long-Range Dependence in Real and Synthetic Traffic Traces. In: Gaj, P., Kwiecień, A., Stera, P. (eds) Computer Networks. CN 2015. Communications in Computer and Information Science, vol 522. Springer, Cham. https://doi.org/10.1007/978-3-319-19419-6_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-19419-6_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19418-9
Online ISBN: 978-3-319-19419-6
eBook Packages: Computer ScienceComputer Science (R0)