1 Introduction

The performance evaluation of wireless networks requires a proper representation of the traffic transmitted within the network. A realistic assessment of a network architecture or networking mechanism must be carried out under conditions representative of the analyzed scenario. In particular, the traffic transmitted through the network must have statistical properties similar to those of the traffic expected in future working conditions. Therefore, good modeling of the traffic source is crucial for the performance evaluation of any network protocol.

The network traffic modeling problem has been analyzed for many years. A large number of models representing popular protocols and applications have been proposed, such as HTTP [2], video streaming [14], and online gaming [1]. As most Internet traffic is transmitted via HTTP, new models that better represent this protocol are still being developed (see, e.g., [19]). The first traffic models were based on the Poisson distribution, but the authors of [18] showed that it does not correctly represent the traffic characteristics. Internet traffic has self-similar characteristics, so a large number of more sophisticated methods have been used to represent the observed statistical properties: on/off processes, Hidden Markov Models (HMM), ARIMA processes, wavelets, diffusion approximation, and multifractals [3, 6, 10].

Wireless network traffic is growing rapidly: according to the Cisco Visual Networking Index, the volume of data transmitted by mobile devices increased by more than 80 % in 2013 [17]. Mobile users use different applications and browse the web slightly differently than users of regular PCs: according to [16], the largest fraction of traffic to mobile devices is multimedia, and the HTTP objects transmitted to mobile devices are on average larger than those transmitted to other devices. A considerable part of the traffic on mobile devices is generated by the synchronization of mobile applications (e.g., social network, mail, or calendar applications) with their servers. This type of traffic is not present on stationary computers, so dedicated models representing the properties of mobile device traffic are needed for proper performance evaluation of wireless networking protocols and algorithms.

There are a few traffic source models dedicated to the evaluation of wireless networks. LiTGen [20] accurately reproduces the traffic burstiness and internal properties over a wide range of timescales, but is limited to mail and P2P applications. The authors of [15] use Hidden Markov Models to represent different QoS classes of network traffic, but the model is based on a small traffic trace gathered in a laboratory from a WiFi network, and thus is not representative of large-scale mobile network traffic. In [4], the authors propose a theoretical model of eNodeB traffic that considers 6 different parameters (for example, the number of subscribers, data and voice activity during the busy hour, and the bandwidth required for bearer sessions), but the model provides only rough estimates of the aggregated traffic volume of a typical eNodeB. To the best of our knowledge, there is no traffic source model based on large-scale measurements in \(4\mathrm{th}\) generation wireless networks, i.e., Long-Term Evolution (LTE) networks.

2 Proposed Model

In this work, we propose a traffic generator representative of the traffic of modern mobile devices in LTE networks. It is based on measurements of traffic in a large-scale wireless network. The proposed tool generates TCP flow sizes and durations. We implement our model in the OMNeT++ environment and verify it by comparison to traces available in the literature. The model is based on real-world research data [12], and thus generates traffic that is similar to the transmissions of real mobile devices.

2.1 Network Traffic Data

Traffic traces are the key requirement for modeling network traffic. For example, popular methods apply HMM to model the interpacket time gaps and the packet lengths based on observations of real traffic [6]. However, obtaining adequate samples of real-world LTE traffic is hard in practice, so in our work we build on the data presented in the recent paper by J. Huang et al., which gives an in-depth study of LTE performance [12]. We consider this work representative and authoritative, as it presents the first study of a large real-world LTE packet trace. The traffic was collected from 22 eNBs in 2012 in a large metropolitan area in the U.S., and contains 2.9 TB of LTE traffic from over 300,000 users. The details of the dataset are presented in Table 1. For other papers that analyze wireless network traffic, see, e.g., [7, 8, 11, 13, 21].

Table 1 Characteristics of the real-world LTE dataset [12]
Fig. 1 Histograms of TCP flow sizes

The paper by Huang et al. characterizes several aspects related to the performance of LTE networks, but in our work we use the most fundamental data: the statistical characteristics of TCP session size and TCP session duration, given in Sect. 4.1 of [12]. As the authors point out, the TCP protocol dominates in LTE networks—carrying over 97 % of bytes and 95 % of flows—so we believe it is enough to model just the TCP protocol for a good approximation of the whole mixture of LTE traffic. Interestingly, for TCP, the pair of HTTP/HTTPS protocols is responsible for over 92 % of traffic flows.

In order to obtain the data required for our model, we digitized Figs. 2 and 3 from [12] and statistically processed the obtained data, as explained below. After digitizing the Cumulative Distribution Functions (CDFs) of TCP session sizes and durations, we used the finite difference method to obtain their histograms and Probability Density Functions (PDFs), displayed in Figs. 1, 2, 3, and 4. Due to the range of the underlying data, the density functions had to be presented on a logarithmic scale. This makes their interpretation harder, because the integral of a PDF over its entire range must equal 1, but the corresponding histograms allow us to draw conclusions easily. The fit of the log-normal distribution is presented in Figs. 3 and 4 to roughly compare the data with a simple model. As visible in Fig. 4, the empirical distribution of flow duration cannot be explained using such a model.
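As a minimal sketch of the finite-difference step, the following Python snippet converts sampled CDF points into per-bin probability masses (the histogram) and densities. The function name `cdf_to_histogram` and the sample points are hypothetical; they are not the values digitized from [12].

```python
def cdf_to_histogram(xs, cdf):
    """Forward finite differences of a sampled CDF.

    xs  -- strictly increasing sample positions (e.g., flow sizes in bytes)
    cdf -- non-decreasing cumulative probabilities at those positions
    Returns (masses, densities), one entry per interval [xs[i], xs[i+1]).
    """
    if len(xs) != len(cdf) or len(xs) < 2:
        raise ValueError("need at least two matching CDF points")
    masses, densities = [], []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        dF = cdf[i + 1] - cdf[i]
        if dx <= 0 or dF < 0:
            raise ValueError("CDF points must be sorted and non-decreasing")
        masses.append(dF)          # histogram bar: probability of the bin
        densities.append(dF / dx)  # PDF estimate: probability per unit of x
    return masses, densities


# Hypothetical digitized points (illustration only).
xs = [100.0, 1_000.0, 10_000.0, 100_000.0]
cdf = [0.0, 0.3, 0.8, 1.0]
masses, densities = cdf_to_histogram(xs, cdf)
```

The masses sum to the CDF range covered by the points, whereas the densities only integrate to that value once multiplied by the bin widths, which is why the histogram is easier to read on a logarithmic axis.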

Fig. 2 Histogram of flow durations

Fig. 3 Distribution densities of TCP flow sizes

Fig. 4 Distribution density of flow durations

Fig. 5 Uncorrelated flow data

Fig. 6 Correlated flow data, \(c = 0.16\)

We chose to generate traffic flow data using inverse transform sampling on the digitized CDF plots. Each flow \(F\) is fully characterized by a tuple \(F = \langle S_u, S_d, D \rangle \), where \(S_u\) and \(S_d\) describe the TCP payload size in the uplink and downlink direction, respectively, and \(D\) is the flow duration (i.e., the time from the first packet to the last packet). In order to linearly correlate \(\mathrm{log}(S_d)\) and \(\mathrm{log}(D)\), we used two correlated random variables \(X\) and \(Y\) for sampling the CDFs of \(S_d\) and \(D\):

$$\begin{aligned} X \sim \mathcal{U}(0, 1) , \end{aligned} \tag{1}$$
$$\begin{aligned} Y = c \cdot X + (1 - c) \cdot Z , \end{aligned} \tag{2}$$
$$\begin{aligned} Z \sim \mathcal{U}(0, 1) , \end{aligned} \tag{3}$$

where \(c \in [0,1]\). The correlation of \(0.196\) reported in [12] was obtained for \(c = 0.16\). See Figs. 5 and 6 for comparison between uncorrelated and correlated \(\mathrm{log}(S_d)\) and \(\mathrm{log}(D)\), for a sample of 2,500 pairs. For the uplink direction, we consider \(\mathrm{log}(S_u)\) and \(\mathrm{log}(D)\) as uncorrelated. Please refer to Chap. 7 and Example 2.1 in [5] for background on generating correlated random variates. We leave the task of describing \(F\) with a more sophisticated model for future work.
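The sampling procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of the generator: the CDF tables are hypothetical placeholders given as \(\log_{10}\) values, the CDFs are assumed to be piecewise linear between the digitized points, and the names `inverse_cdf` and `sample_flow` are ours.

```python
import bisect
import random


def inverse_cdf(u, xs, cdf):
    """Invert a piecewise-linear, non-decreasing CDF given as points (xs[i], cdf[i])."""
    if u <= cdf[0]:
        return xs[0]
    if u >= cdf[-1]:
        return xs[-1]
    i = bisect.bisect_right(cdf, u)  # first index with cdf[i] > u
    t = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
    return xs[i - 1] + t * (xs[i] - xs[i - 1])


def sample_flow(rng, c, up_cdf, down_cdf, dur_cdf):
    """Draw one flow <S_u, S_d, D>; each *_cdf is (log10 values, probabilities)."""
    x = rng.random()            # Eq. (1)
    z = rng.random()            # Eq. (3)
    y = c * x + (1.0 - c) * z   # Eq. (2); y stays within [0, 1]
    w = rng.random()            # independent draw: uplink size is uncorrelated
    s_u = 10.0 ** inverse_cdf(w, *up_cdf)
    s_d = 10.0 ** inverse_cdf(x, *down_cdf)
    d = 10.0 ** inverse_cdf(y, *dur_cdf)
    return s_u, s_d, d


# Hypothetical tables (log10 of bytes / seconds), NOT the data from [12].
UP = ([1.0, 3.0, 6.0], [0.0, 0.6, 1.0])
DOWN = ([1.0, 4.0, 8.0], [0.0, 0.5, 1.0])
DUR = ([-2.0, 0.5, 3.0], [0.0, 0.5, 1.0])

rng = random.Random(42)
flows = [sample_flow(rng, 0.16, UP, DOWN, DUR) for _ in range(2500)]
```

With \(c = 0\) the two draws are independent, and with \(c = 1\) the same quantile is used for both CDFs; intermediate values of \(c\) interpolate between these two cases.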

Fig. 7 The OMNeT++ implementation

2.2 Network Traffic Model

To model the traffic from mobile devices, we used the OMNeT++/INET discrete event simulator, based on the model of a TCP connection described in [9]. To simulate uplink and downlink flows, we implemented two modules: a client (representing a mobile device) and a server (representing the Internet). Both modules were implemented as TCP applications (TCPApps) in the INET StandardHost module (Fig. 7). The StandardHost module enables simulating data flows in a TCP environment, along with a representation of all network layers. We assume no concurrent sessions, hence the communication is realized using a single pair of sockets. The sockets are closed after the session ends, and reopened for the next session: the server socket is set to the ListenOnce state and binds to the client socket on a connection request. Flow data is generated on the client side, where the correlated values described in Sect. 2.1 are drawn randomly. The packet transmission interval is set to \(100\) ms, and the packet size is constant for the whole session, so the total flow size matches the generated data.

Fig. 8 QQ-plot of flow durations

The transmission is realized as follows. First, an uplink packet (named “Precast”) is sent. It includes the information about the expected response size and session duration. When the server receives the first packet, it starts its own independent transmission with an adequate packet size. The transmission ends when both sides finish their transfers, and the session is immediately closed. In case of no uplink data, the client sends a single Precast packet and waits for the server to finish. For session times below \(100\) ms, the transmission is resolved by the TCP modules; for session times between 100 and 200 ms, the transmission is trimmed down to \(100\) ms. The interval between consecutive sessions is set by default to \(1\) s, but it can be adjusted to any value or distribution, as needed. Note that this interval regulates the aggregated client bandwidth. The current implementation does not support simultaneous transmissions; this is left for future versions of our traffic generator.
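One possible reading of these timing rules is sketched below in Python. It is an assumption-laden illustration rather than the OMNeT++ code: we assume that a session is split into whole \(100\) ms windows (rounding the duration down), that the payload is divided equally among the windows, and that sessions shorter than one window are handed to TCP in one piece. The function `plan_session` and its return convention are hypothetical.

```python
def plan_session(size_bytes, duration_ms, interval_ms=100):
    """Return (windows, bytes_per_window, scheduled_ms) for one direction of a session.

    Assumed rule: the duration is rounded down to a whole number of
    transmission intervals, and the payload is split equally among them.
    """
    if size_bytes < 0 or duration_ms < 0:
        raise ValueError("size and duration must be non-negative")
    if duration_ms < interval_ms:
        # Shorter than one interval: send everything at once, TCP decides timing.
        return 1, float(size_bytes), 0
    windows = int(duration_ms // interval_ms)  # e.g., 150 ms -> 1 window (100 ms)
    return windows, size_bytes / windows, windows * interval_ms


short = plan_session(1000, 50)     # below one interval
trimmed = plan_session(1000, 150)  # between 100 and 200 ms
longer = plan_session(1000, 450)   # several windows
```

Because the per-window payload is constant, the total transferred size equals the generated flow size regardless of how the duration is rounded.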

3 Experimental Results

For the experimental evaluation, we ran our model implemented in OMNeT++ and collected statistics of 100,000 TCP sessions.

In Fig. 8, we compare the flow durations obtained from simulation with the expected flow data values, using a Quantile-Quantile plot (QQ-plot) of 1,000 samples. As visible, the distributions generally match each other, with the exception of the data in the range of 100–200 ms (on the horizontal axis). This is expected, because our model presented in Sect. 2.2 divides data into 100 ms transfer windows, which limits the granularity of flow durations observed in simulation. For larger flow durations this effect is negligible, hence not visible on the plot.
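The granularity effect can be illustrated with a small Python sketch. It assumes, as a simplification, that simulated durations are rounded down to a multiple of \(100\) ms while shorter sessions are left unchanged; the helper names and the sample values are hypothetical.

```python
def quantize_ms(duration_ms, interval_ms=100):
    """Round a duration down to whole intervals; sub-interval values are kept."""
    if duration_ms < interval_ms:
        return duration_ms
    return (duration_ms // interval_ms) * interval_ms


def qq_pairs(expected, observed):
    """Pair the order statistics of two equally sized samples (points of a QQ-plot)."""
    if len(expected) != len(observed):
        raise ValueError("samples must have the same size")
    return list(zip(sorted(expected), sorted(observed)))


# Hypothetical expected durations in milliseconds.
expected = [40, 150, 199, 950, 12_050]
observed = [quantize_ms(d) for d in expected]
pairs = qq_pairs(expected, observed)
rel_err = [(e - o) / e for e, o in pairs]
```

In this toy sample the relative deviation is largest for durations just below 200 ms and falls below 1 % for the longest flow, which agrees qualitatively with the QQ-plot.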

Fig. 9 Uplink flow rates (500 samples)

Fig. 10 Downlink flow rates (500 samples)

Fig. 11 CDF comparison of uplink flow rates

Fig. 12 CDF comparison of downlink flow rates

In Figs. 9, 10, 11, and 12 we compare the TCP flow rates obtained in OMNeT++ with the measured flow data shown in [12]. For both directions we see that the distributions are similar, with only minor deviations. Generally, slow flows (less than a few kilobits per second) and fast flows (more than a few megabits per second) tend to be transmitted with lower bit rates in the simulation environment. On the other hand, some flows in the middle are transmitted faster: for uplink the flows in the range of 5–100 kbps and for downlink the flows in the range of 50–1,000 kbps.
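For readers who wish to repeat such a comparison on their own traces, the Python sketch below computes flow rates and compares two empirical CDFs. It assumes that the flow rate is the payload size divided by the flow duration, and it uses the largest vertical gap between the two CDFs (the two-sample Kolmogorov-Smirnov statistic) as a summary; this statistic is our illustrative choice and is not used in the evaluation above.

```python
def flow_rate_kbps(size_bytes, duration_s):
    """Average flow rate in kilobits per second (assumed: payload / duration)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_s / 1000.0


def ecdf(samples, x):
    """Empirical CDF: fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def max_cdf_gap(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, p) - ecdf(b, p)) for p in points)


# Hypothetical example: a 125 kB flow lasting one second.
rate = flow_rate_kbps(125_000, 1.0)
```

A gap of 0 means that the two samples have identical empirical distributions, while a gap of 1 means that they do not overlap at all.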

4 Conclusions

In this paper we proposed a model of TCP traffic that represents the statistical properties of traffic in large-scale LTE networks, using the OMNeT++ discrete event simulation environment. The model is based on measurements of flow sizes and durations in a real network. The evaluation of the model confirms that it matches the traffic reported in the literature well, providing similar distributions of transmission rates for both the downlink and uplink directions.

The presented model can be used in various simulations as a source of TCP traffic generated by a mobile device working in an LTE network. We acknowledge that the model is based on the data collected in the U.S., whereas the behavior of the network users in other parts of the world may vary. However, as of this writing, there is no credible network data available, e.g., for European users, which may be an interesting avenue for future research.

The model can be downloaded at https://projekty.iitis.pl/rezultaty-badan-2-en.