3.5 Simulation and Statistical Tools
3.5.3 Estimation Tools
Traffic models require the estimation of traffic parameters. In most cases this is a distribution fitting problem: determining the distribution of packet sizes, packet inter-arrival times, data slot sizes, etc. Besides the basic distributions (e.g. exponential, normal, lognormal), some parameters are better fitted by a mixture of distributions, especially quantities measured in bytes. We suggest estimating the probability density function by a mixture distribution y(x) defined as follows:
\[ y(x) = \alpha_1 f_1(x) + \alpha_2 f_2(x) + \cdots + \alpha_n f_n(x) \tag{3.20} \]
where $\{\alpha_i,\ i = 1, \dots, n\}$ are positive weights such that $\sum_{i=1}^{n} \alpha_i = 1$. We use the EM (Expectation-Maximization) algorithm to estimate the distribution parameters by the maximum likelihood approach [DLR77].
On the other hand, function fitting problems, such as estimating the autocorrelation function, are better solved using non-linear regression methods. We use the Levenberg-Marquardt algorithm to estimate the correlation structure of traffic traces.
Chapter 3. Generic Framework for Traffic Modelling 45
3.5.3.1 EM Algorithm
The EM algorithm provides an efficient method to estimate the parameter α of a parametric model from an incomplete observation data set y. The algorithm introduces a hidden variable h that is supposed to complete the observations, giving a complete data set (y, h). In other words, we assume the existence of a joint distribution p(y, h|α) that is simpler to estimate. The idea of EM is to compute the expected log-likelihood of the complete data set, where the expectation over h is taken under the current estimate α′, and then to maximize this expectation to obtain the new parameter α. Formally, the auxiliary function is:
\[ Q(\alpha, \alpha') = E_h\left[ \log p(y, h \mid \alpha) \mid y, \alpha' \right] \tag{3.21} \]
When estimating a mixture distribution as defined in equation (3.20), the problem is formulated as follows.
We have n known distributions $f_1(x), f_2(x), \dots, f_n(x)$ with an incomplete observation data set $x = x_1, x_2, \dots, x_T$, and:
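\[ f_\alpha(x) = \sum_{i=1}^{n} \alpha_i f_i(x) \quad \text{with } \alpha_i \in [0, 1] \text{ and } \sum_{i=1}^{n} \alpha_i = 1 \tag{3.22} \]
We estimate $\hat{\alpha} = \arg\max_{\alpha} \log f_\alpha(y)$.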
The hidden variable in this case is the state s ($s \in [1, n]$) of the model $f_\alpha$ at the prediction instant. The auxiliary function is expressed as follows:
\[ Q(\alpha, \alpha') = \sum_{y} \tilde{f}(y) \sum_{i=1}^{n} f_{\alpha'}(s = i \mid y) \log f_\alpha(y, s = i) \tag{3.23} \]
where $\tilde{f}$ is the empirical distribution and:
• αi is the initial probability of being in state i,
• $f_\alpha(y, s = i) = \alpha_i f_i(y)$ is the joint probability of being in state i and generating y,
• and $f_{\alpha'}(s = i \mid y) = \frac{\alpha'_i f_i(y)}{\sum_{j} \alpha'_j f_j(y)}$ is the probability of being in state i knowing that y is the current observation, computed with the current weights α′.
EM maximizes the auxiliary function with respect to α under the constraint that the coefficients sum to 1. Introducing a Lagrange multiplier λ, which acts as a normalisation factor, we can write:
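\[ \frac{\partial}{\partial \alpha_i} \left[ Q(\alpha, \alpha') - \lambda \left( \sum_{i} \alpha_i - 1 \right) \right] = \frac{1}{\alpha_i} \underbrace{\sum_{y} \tilde{f}(y) f_{\alpha'}(s = i \mid y)}_{C_i} - \lambda = 0 \tag{3.24} \]
Hence $\alpha_i = C_i / \lambda$, and the constraint $\sum_i \alpha_i = 1$ gives $\lambda = \sum_j C_j$, so that $\alpha_i = \frac{C_i}{\sum_j C_j}$. The EM algorithm in this case can be summarized as follows: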
• Initialize $\alpha'_i$ to any values satisfying $\sum_{i=1}^{n} \alpha'_i = 1$.
• Repeat until convergence:
– Expectation-step: Evaluation of Ci.
– Maximization-step: for each i, $\alpha_i \leftarrow \frac{C_i}{\sum_j C_j}$.
$C_i$ can be seen as the expected number of times, normalised by the sample size, that model i is used to generate the observations, which makes the update intuitive. Note that, in general, the maximization step depends on the distribution type. Solutions for the most commonly used probability density functions can be found in [Vil00].
We use the EM algorithm to estimate the parameters and weights of the distribution mixtures suggested to fit traffic model parameters. The main advantage of the EM algorithm is that it converges towards a local maximum of the likelihood from any initial point; this maximum is not guaranteed to be global. On the other hand, its convergence may be very slow. We therefore use an accuracy criterion to stop the iterations of the algorithm. The distribution fit results are generally very good.
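The E-step and M-step listed above can be sketched in a few lines of Python for the simplest case, where the component densities $f_i$ are known and only the weights $\alpha_i$ are estimated. This is a minimal illustration under stated assumptions, not the implementation of the estimation module: the names exp_pdf and em_mixture_weights, the exponential components, the synthetic sample and the stopping threshold are all hypothetical.

```python
import math
import random


def exp_pdf(rate):
    """Exponential density with the given rate (hypothetical helper)."""
    return lambda x: rate * math.exp(-rate * x) if x >= 0 else 0.0


def em_mixture_weights(data, components, tol=1e-8, max_iter=1000):
    """Estimate the weights alpha_i of a mixture of fixed, known densities f_i."""
    n = len(components)
    alpha = [1.0 / n] * n  # any starting point with sum(alpha) == 1
    # f_i(y) does not depend on alpha, so evaluate it once per sample.
    dens = [[f(y) for f in components] for y in data]
    prev_ll = -math.inf
    for _ in range(max_iter):
        c = [0.0] * n
        ll = 0.0
        for row in dens:
            mix = sum(a * d for a, d in zip(alpha, row))
            ll += math.log(mix)
            # E-step: accumulate the posterior f_alpha'(s = i | y) into C_i.
            for i in range(n):
                c[i] += alpha[i] * row[i] / mix
        # M-step: alpha_i <- C_i / sum_j C_j.
        total = sum(c)
        alpha = [ci / total for ci in c]
        # Accuracy criterion (assumption): stop when the log-likelihood stalls.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return alpha


# Synthetic check: draw from a two-component mixture with known weights.
random.seed(0)
true_alpha = [0.3, 0.7]
rates = [0.5, 5.0]
sample = [
    random.expovariate(rates[0] if random.random() < true_alpha[0] else rates[1])
    for _ in range(5000)
]
alpha_hat = em_mixture_weights(sample, [exp_pdf(r) for r in rates])
```

Since the empirical distribution gives the same mass to every sample, accumulating the posteriors without the 1/T factor changes $C_i$ only by a constant that cancels in the M-step. Estimating the component parameters as well would require a distribution-specific M-step, which this sketch leaves out.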
3.5.3.2 Levenberg-Marquardt Algorithm
The Levenberg-Marquardt algorithm [Lev44, Mar63] is used to numerically solve the minimization problem in least squares curve fitting. In many cases it finds a solution even when it is initialized far from the final minimum, at the cost of slower convergence than traditional algorithms (e.g. Gauss-Newton).
Let g be a function of two vectors x and P. The least squares criterion f(P) can be expressed as:
\[ f(P) = \left\langle \left( g(x, P) - y \right)^2 \right\rangle \tag{3.25} \]
where $\langle \cdot \rangle$ stands for the average calculated over a set of couples (x, y). The algorithm is iterative and seeks the vector P that minimizes f(P) given a set of measured values y.
At iteration i, the algorithm calculates the vector $P_i$ as a function of the vector $P_{i-1}$, so that $f(P_i)$ tends to a local minimum of f. To do so, a quadratic approximation $\hat{f}$ of f is calculated based on a linear approximation $\hat{g}$ of g around the point $P_{i-1}$. This approximation is accurate only if g is close to linear around $P_{i-1}$; otherwise, very bad results are obtained. This motivated Levenberg to use the approximation only in the regions where g is quasi-linear, and to use gradient descent elsewhere. Later, Marquardt's work focused on quickly switching to gradient descent to avoid a large number of iterations in linear regions. The combined algorithm is known as the Levenberg-Marquardt algorithm. Implementation details and further technical information can be found in [EM78].
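For reference, the combined step is commonly written in the following textbook form, which may differ in detail from the variant described in [EM78]. The symbols J (Jacobian of g with respect to P, evaluated at $P_{i-1}$), r (vector of residuals $g(x, P_{i-1}) - y$), δ (parameter update) and μ (damping parameter) are introduced here for illustration only:

\[ \left( J^{T} J + \mu \, \mathrm{diag}(J^{T} J) \right) \delta = - J^{T} r, \qquad P_i = P_{i-1} + \delta \]

When μ is close to 0 the step reduces to the Gauss-Newton step derived from the linear approximation $\hat{g}$; when μ is large the step becomes a short move along the negative gradient direction, rescaled per parameter. The value of μ is typically decreased after a successful step and increased after a step that fails to reduce f.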
In practice, this algorithm converges in a small number of iterations. However, the number of operations per iteration is proportional to $N^3$, where N is the size of the vector P. As a consequence, the algorithm is usually limited to problems with a small number of parameters to optimize.
This algorithm is used to estimate the correlation structure of traffic traces based on autocorrelation function models for short-range and long-range dependent correlations, where the number of parameters to optimize is very limited (at most three).
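As an illustration of such a fit, the following self-contained Python sketch implements a basic damped least squares iteration and applies it to a two-parameter autocorrelation model. Everything specific in it is an assumption made for the example: the model $\rho(k) = a \, e^{-bk}$, the synthetic data, the starting point, the forward-difference Jacobian, and the rule of dividing or multiplying the damping parameter by 10. It is not the implementation described in [EM78].

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def levenberg_marquardt(g, xs, ys, p0, mu=1e-3, max_iter=200, tol=1e-12):
    """Minimize sum_k (g(x_k, P) - y_k)^2 over P, starting from p0."""
    p = list(p0)
    n = len(p)

    def residuals(q):
        return [g(x, q) - y for x, y in zip(xs, ys)]

    def cost(r):
        return sum(v * v for v in r)

    r = residuals(p)
    for _ in range(max_iter):
        # Forward-difference Jacobian of the residuals, one column per parameter.
        jac = []
        for j in range(n):
            h = 1e-7 * max(1.0, abs(p[j]))
            q = p[:]
            q[j] += h
            rq = residuals(q)
            jac.append([(rq[k] - r[k]) / h for k in range(len(r))])
        jtj = [[sum(u * v for u, v in zip(jac[i], jac[j])) for j in range(n)]
               for i in range(n)]
        jtr = [sum(u * v for u, v in zip(jac[i], r)) for i in range(n)]
        # Damped normal equations: (J^T J + mu * diag(J^T J)) step = -J^T r.
        damped = [[jtj[i][j] + (mu * jtj[i][i] if i == j else 0.0)
                   for j in range(n)] for i in range(n)]
        step = solve(damped, [-v for v in jtr])
        trial = [pi + si for pi, si in zip(p, step)]
        r_trial = residuals(trial)
        if cost(r_trial) < cost(r):
            improvement = cost(r) - cost(r_trial)
            p, r, mu = trial, r_trial, mu / 10  # accept: closer to Gauss-Newton
            if improvement < tol:
                break
        else:
            mu *= 10  # reject: closer to gradient descent
    return p


def acf_model(k, params):
    """Hypothetical short-range model rho(k) = a * exp(-b * k)."""
    a, b = params
    return a * math.exp(-b * k)


lags = list(range(1, 21))
acf = [acf_model(k, (0.9, 0.2)) for k in lags]  # synthetic, noise-free data
a_hat, b_hat = levenberg_marquardt(acf_model, lags, acf, p0=[1.0, 0.1])
```

The linear solve is the step whose cost grows as $N^3$ with the number of parameters; with two or three parameters, as here, it is negligible.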
3.6 Conclusion
In this chapter we presented a generic and hierarchical framework for traffic modelling, based on the generalization of the ON-OFF behavioural model. The model can describe most of the applications deployed on the Internet through a detailed description of the application behaviour. Its generic structure allows the modelling of more complex applications with several parallel flows. We implemented this framework in a simulation tool for performance evaluation studies. Throughout this thesis we use the framework to describe multimedia applications: audio, video and Web. A complementary analysis and estimation module was also implemented. It is used to statistically analyze traffic sources and to estimate traffic model parameters. The framework offers a complete workbench for estimating, modelling and evaluating multimedia traffic sources.
Chapter 4
IP Traffic Modelling
4.1 Introduction
Since Paxson and Floyd [PF95] reported the failure of the Poisson process in modelling Wide Area Network (WAN) traffic, the long-range dependence of Internet traffic has been widely reported (e.g. [LTWD94]). Many approaches have been explored to model Internet traffic. We can classify them into two broad categories: flow level approaches and packet level approaches. The flow level approach focuses on traffic modelling at the connection level: we group IP packets into connections, or into more generic “flows”, and we model the flow arrival process and the fluctuation of active flows. The packet level approach, on the other hand, follows two different trends: point processes and aggregate count processes. In the first case, traffic is seen as single arrivals of discrete entities (packets, cells, etc.). It can be mathematically described as a point process [Çin75], consisting of a sequence of arrival instants $t_1, t_2, \dots, t_n$. There are two equivalent descriptions of point processes: counting processes and inter-arrival time processes. A counting process N(t) is a continuous-time, non-negative integer-valued stochastic process, where $N(t) = \max\{n : t_n \le t\}$ is the number of (traffic) arrivals in the interval $(0, t]$. An inter-arrival time process is a non-negative random sequence A(n), where $A(n) = t_n - t_{n-1}$ is the length of the time interval separating the nth arrival from the previous one. However, given the huge number of packets involved in any network traffic, this would result in huge data sets. As a consequence, the aggregate count process, denoted $X_\Delta(k)$, is generally preferred. The aggregate count process $X_\Delta(k)$ consists of the number of packets (or bytes) whose time stamps $t_i$ lie within the kth slot of size $\Delta > 0$, that is, $k\Delta \le t_i < (k + 1)\Delta$.
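As an illustration, the three descriptions above can be computed from a list of time stamps as in the following minimal Python sketch. The function names and the sample values are hypothetical. Time stamps are assumed to be sorted, non-negative and measured from the start of the trace, and the slot index k is assumed to start at 0.

```python
import bisect


def inter_arrivals(times):
    """A(n) = t_n - t_(n-1); the first arrival has no predecessor in the trace."""
    return [b - a for a, b in zip(times, times[1:])]


def counting_process(times, t):
    """N(t): number of arrivals with t_n <= t (times must be sorted)."""
    return bisect.bisect_right(times, t)


def aggregate_counts(times, delta, sizes=None):
    """X_delta(k): packets (or bytes, if sizes is given) with k*delta <= t_i < (k+1)*delta."""
    if not times:
        return []
    n_slots = int(times[-1] // delta) + 1
    counts = [0] * n_slots
    for idx, t in enumerate(times):
        k = int(t // delta)  # index of the slot containing t
        counts[k] += 1 if sizes is None else sizes[idx]
    return counts


# Hypothetical trace: arrival instants in seconds and packet sizes in bytes.
times = [0.1, 0.4, 1.2, 1.3, 1.9, 3.5]
sizes = [100, 200, 50, 50, 1500, 40]

gaps = inter_arrivals(times)
arrivals_so_far = counting_process(times, 1.3)
counts = aggregate_counts(times, delta=1.0)
byte_counts = aggregate_counts(times, delta=1.0, sizes=sizes)
```

The aggregate series has one entry per slot regardless of the number of packets, which is why it yields much smaller data sets than the point process descriptions. Slot boundaries computed with floating-point division can misplace time stamps that fall exactly on a boundary, so integer time units may be preferable in practice.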
A recurrent theme in traffic modelling for broadband networks is traffic “burstiness”. Burstiness is present in a traffic process if the arrival points $\{t_n\}$ appear to form visual clusters; that is, $\{A_n\}$ tends to give rise to runs of several relatively short inter-arrival times followed by a relatively long one. Burstiness mainly stems from the shapes of the probability distribution and of the autocorrelation function of $\{A_n\}$. There is no single widely-accepted measure of burstiness. Some of the commonly-used mathematical measures are: the ratio of peak rate to mean rate, the Index of Dispersion of Inter-arrivals (IDI), and the Hurst exponent (see Chapter 3 for definitions).
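The first two measures can be estimated from a trace as in the following Python sketch. The definitions used in this thesis are given in Chapter 3 and are not reproduced here, so the sketch assumes the usual ones: the peak-to-mean ratio of an aggregate count series, and the IDI at lag k taken as the variance of the sum of k consecutive inter-arrival times divided by k times the squared mean inter-arrival time. The function names, the non-overlapping block estimator and the synthetic data are hypothetical; the Hurst exponent is left out.

```python
import random
from statistics import mean, pvariance


def peak_to_mean(counts):
    """Ratio of the largest slot count to the average slot count."""
    return max(counts) / mean(counts)


def idi(gaps, k):
    """Assumed IDI at lag k: Var(A_1 + ... + A_k) / (k * mean(A)^2).

    The variance is estimated over non-overlapping blocks of k inter-arrivals,
    so the trace must contain at least k of them.
    """
    sums = [sum(gaps[j:j + k]) for j in range(0, len(gaps) - k + 1, k)]
    return pvariance(sums) / (k * mean(gaps) ** 2)


# Synthetic reference: independent exponential inter-arrivals (Poisson arrivals).
random.seed(1)
poisson_gaps = [random.expovariate(1.0) for _ in range(20000)]
idi_poisson = idi(poisson_gaps, 10)
```

For a Poisson process the IDI equals 1 at every lag, so the estimate above should be close to 1; values that grow with k indicate positively correlated inter-arrival times, that is, burstier traffic.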
In this chapter, we model IP traffic by an aggregate byte count process (a window-based approach). We construct traffic models for some Internet traffic traces and we evaluate the generated traffic both statistically and in a network environment. In particular, we analyze the poor performance of the generated traffic when it is injected into a queuing system. Then we introduce proposals to enhance the packet generation process. The proposed modifications are also valid for similar traffic models based on the aggregate byte count process approach.