TECHNICAL BACKGROUND - Self-Similar Network Traffic and Performance Evaluation

1.4.1 Self-Similar Processes and Long-Range Dependence

1.4.1.1 Second-Order Self-Similarity and Stationarity Consider a discrete time stochastic process or time series $X(t)$, $t \in \mathbb{Z}$, where $X(t)$ is interpreted as the traffic volume (measured in packets, bytes, or bits) at time instance $t$. Of interest is also the interpretation that $X(t)$ is the total traffic volume up to time $t$, say, from time 0. To minimize confusion, when a "cumulative" view is taken, we will denote the process by $Y(t)$. We will then reserve $X(t)$ to be the increment process corresponding to $Y(t)$, that is, $X(t) = Y(t) - Y(t-1)$.

For traffic modeling purposes, we would like $X(t)$ to be "stationary" in the sense that its behavior or structure is invariant with respect to shifts in time. In other words, $t$'s responsibility as an absolute reference frame is relieved. Without some form of stationarity, "anything" is allowed and a model loses much of its usefulness as a compact description of (assumed) tractable phenomena. $X(t)$ is strictly stationary if $(X(t_1), X(t_2), \ldots, X(t_n))$ and $(X(t_1+k), X(t_2+k), \ldots, X(t_n+k))$ possess the same joint distribution for all $n \in \mathbb{Z}_+$, $t_1, \ldots, t_n, k \in \mathbb{Z}$. Denoting the $k$-shifted process or time series by $X_k$, $X$ and $X_k$ are said to be equivalent in the sense of finite-dimensional distributions, $X =_d X_k$. Imposing strict stationarity, it turns out, is too restrictive and we will be interested in a weaker form of stationarity, second-order stationarity^7, which requires that the autocovariance function

$$\gamma(r,s) = E[(X(r) - \mu)(X(s) - \mu)]$$

satisfies translation invariance, that is, $\gamma(r,s) = \gamma(r+k, s+k)$ for all $r, s, k \in \mathbb{Z}$.

^7 Equivalent names are weak, covariance, and wide sense stationarity.

The first two moments are assumed to exist and be finite, and we set $\mu = E[X(t)]$, $\sigma^2 = E[(X(t) - \mu)^2]$ for all $t \in \mathbb{Z}$. We will also assume $\mu = 0$. Since, by stationarity, $\gamma(r,s) = \gamma(r-s, 0)$, we denote the autocovariance by $\gamma(k)$.
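As a rough illustration of how $\gamma(k)$ might be estimated from a finite measurement trace, the following Python sketch computes the usual biased sample autocovariance. The function name `sample_autocovariance` and the toy series are assumptions made for this example only.

```python
def sample_autocovariance(x, k):
    """Biased sample autocovariance of the series x at lag k >= 0."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(x)")
    mean = sum(x) / n
    # Average the lagged products of deviations from the sample mean.
    # Dividing by n (rather than n - k) gives the conventional biased estimator.
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n


series = [1.0, 2.0, 3.0, 4.0]            # hypothetical traffic counts
print(sample_autocovariance(series, 0))  # lag 0 is the sample variance
print(sample_autocovariance(series, 1))
```

Under second-order stationarity the estimate depends only on the lag, which is why a single trace suffices to estimate it.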

To formulate scale invariance, first define the aggregated process $X^{(m)}$ of $X$ at aggregation level $m$,

$$X^{(m)}(i) = \frac{1}{m} \sum_{t=m(i-1)+1}^{mi} X(t).$$

That is, $X(t)$ is partitioned into nonoverlapping blocks of size $m$, their values are averaged, and $i$ is used to index these blocks. Let $\gamma^{(m)}(k)$ denote the autocovariance function of $X^{(m)}$. Under the assumption of second-order stationarity we arrive at the following definitions of second-order self-similarity.
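The block averaging can be sketched in a few lines of Python. The name `aggregate` and the sample counts are hypothetical, and the code uses 0-based blocks, so block `i` in the code corresponds to $X^{(m)}(i+1)$ in the notation above. A trailing partial block is simply discarded, which is one of several reasonable conventions.

```python
def aggregate(x, m):
    """Return the means of x over nonoverlapping blocks of size m."""
    if m < 1:
        raise ValueError("aggregation level m must be >= 1")
    n_blocks = len(x) // m  # a trailing partial block is dropped
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


traffic = [4, 0, 2, 6, 1, 5, 3]   # hypothetical per-slot packet counts
print(aggregate(traffic, 2))      # [2.0, 4.0, 3.0]
```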

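Definition 1.4.1 (Second-Order Self-Similarity) $X(t)$ is exactly second-order self-similar with Hurst parameter $H$ ($1/2 < H < 1$) if

$$\gamma(k) = \frac{\sigma^2}{2}\left( (k+1)^{2H} - 2k^{2H} + (k-1)^{2H} \right) \tag{1.1}$$

for all $k \ge 1$. $X(t)$ is asymptotically second-order self-similar if

$$\lim_{m \to \infty} \gamma^{(m)}(k) = \frac{\sigma^2}{2}\left( (k+1)^{2H} - 2k^{2H} + (k-1)^{2H} \right). \tag{1.2}$$

It can be checked that Eq. (1.1) implies $\gamma(k) = \gamma^{(m)}(k)$ for all $m \ge 1$. (With $X^{(m)}$ defined as a block mean, the equality holds after rescaling by the factor $m^{2H-2}$ that appears in Section 1.4.1.2, so strictly it is the autocorrelation that is preserved.) Thus, second-order self-similarity captures the property that the correlation structure is exactly (condition (1.1)) or asymptotically (the weaker condition (1.2)) preserved under time aggregation. The form of $\gamma(k) = \left( (k+1)^{2H} - 2k^{2H} + (k-1)^{2H} \right)\sigma^2/2$ is not accidental and implies further structure, namely long-range dependence, to which we will return later. Second-order self-similarity (in the exact or asymptotic sense) has been a dominant framework for modeling network traffic and this is also reflected in the chapters of this book.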
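The invariance under aggregation can be checked numerically from Eq. (1.1) alone. The sketch below (all function names are our own) obtains the covariance between two block means by summing $\gamma$ over every pair of samples drawn from the two blocks. Because $X^{(m)}$ is a block mean, its autocovariance carries a scale factor $m^{2H-2}$, so the comparison is made on autocorrelations, which should agree at every aggregation level.

```python
def autocov(k, hurst, sigma2=1.0):
    """Autocovariance of Eq. (1.1), extended to all integer lags by symmetry."""
    k = abs(k)
    e = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def aggregated_autocov(k, m, hurst, sigma2=1.0):
    """Covariance of two block means of size m that lie k blocks apart."""
    total = 0.0
    for i in range(m):          # position inside the first block
        for j in range(m):      # position inside the block k steps later
            total += autocov(k * m + j - i, hurst, sigma2)
    return total / (m * m)


def autocorr(k, hurst):
    return autocov(k, hurst) / autocov(0, hurst)


def aggregated_autocorr(k, m, hurst):
    return aggregated_autocov(k, m, hurst) / aggregated_autocov(0, m, hurst)


for lag in (1, 2, 5):
    # The two columns should coincide up to floating-point error.
    print(lag, autocorr(lag, 0.8), aggregated_autocorr(lag, 10, 0.8))
```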

1.4.1.2 An Allegory into Distributional Self-Similarity To understand the particular form of $\gamma(k)$ in the definition of second-order self-similarity, we will make a short detour and discuss self-similar processes in slightly more generality. Further extensions and detailed treatments can be found in Beran [9] and Samorodnitsky and Taqqu [60].

Consider the cumulative process $Y(t)$, albeit in continuous time $t \in \mathbb{R}$. Following is a definition of self-similarity for continuous-time processes in the sense of finite-dimensional distributions.

Definition 1.4.2 ($H$-ss) $Y(t)$ is self-similar with self-similarity parameter, that is, Hurst parameter, $H$ ($0 < H < 1$), denoted $H$-ss, if for all $a > 0$ and $t \ge 0$,

$$Y(t) =_d a^{-H} Y(at). \tag{1.3}$$

Thus $Y(t)$ and its time scaled version $Y(at)$, after normalizing by $a^{-H}$, must follow the same distribution. In the traffic modeling context, it is convenient to think of $Y(t)$ as the cumulative or total traffic up to time $t$. For $a > 1$ (time is stretched or dilated) a contraction factor $a^{-H}$ is applied to make the magnitude of $Y(at)$ comparable to that of $Y(t)$. For $a < 1$, the opposite holds true. As $a$ varies, the scaling exponent $H$ remains invariant. This is a most natural definition; however, it has an important drawback: unless $Y(t)$ is degenerate, that is, $Y(t) = 0$ for all $t \in \mathbb{R}$, $Y(t)$ cannot be stationary due to the normalization factor $a^{-H}$. Its increment process $X(t) = Y(t) - Y(t-1)$, however, is another matter. In particular, consider the case where $Y(t)$ is $H$-ss and has stationary increments; in this case we say $Y(t)$ is $H$-sssi. Let us further assume that $Y(t)$ has finite variance. It can be checked that $E[Y(t)] = 0$, $E[Y^2(t)] = \sigma^2 |t|^{2H}$, and

$$\gamma(t,s) = \frac{\sigma^2}{2}\left( |t|^{2H} - |t-s|^{2H} + |s|^{2H} \right). \tag{1.4}$$

This is achieved by noting that^8

$$Y(t) =_d t^{H} Y(1),$$

from which it follows that $E[Y^2(t)] = \sigma^2 t^{2H}$. The latter, then, can be used in the derivation of the autocovariance function (1.4). The increment process $X(t)$ has mean 0 and autocovariance $\gamma(k)$ as given in Eq. (1.1). The derivation is similar to that of $Y(t)$.
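The two derivations mentioned above can be spelled out as follows. This is a sketch that uses only the zero mean, the stationarity of the increments, and $Y(0) = 0$ (which follows from Eq. (1.3) applied at $t = 0$); the intermediate notation $\gamma_Y$ for the autocovariance of $Y$ is introduced here for clarity.

```latex
\begin{aligned}
\gamma_Y(t,s) = E[Y(t)Y(s)]
  &= \tfrac{1}{2}\left( E[Y^2(t)] + E[Y^2(s)] - E[(Y(t) - Y(s))^2] \right) \\
  &= \tfrac{1}{2}\left( E[Y^2(t)] + E[Y^2(s)] - E[Y^2(t-s)] \right) \\
  &= \frac{\sigma^2}{2}\left( |t|^{2H} - |t-s|^{2H} + |s|^{2H} \right), \\[6pt]
\gamma(k) = E[X(k+1)X(1)]
  &= \gamma_Y(k+1, 1) - \gamma_Y(k, 1) \\
  &= \frac{\sigma^2}{2}\left( (k+1)^{2H} - 2k^{2H} + (k-1)^{2H} \right), \quad k \ge 1.
\end{aligned}
```

The second line uses $Y(t) - Y(s) =_d Y(t-s) - Y(0)$, and the last line uses $X(1) = Y(1) - Y(0) = Y(1)$ and reproduces Eq. (1.1).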

^8 From $a^{-H} Y(at) =_d Y(t)$, substitute $t = 1$ and $a = t$.

How does distributional self-similarity (of a continuous time process) tie in with second-order self-similarity (of a discrete time process), which requires exact or asymptotic invariance with respect to second-order statistical structure of the aggregated time series $X^{(m)}$? A key observation lies in noting that $X^{(m)}$ can be viewed as computing a sample mean

$$X^{(m)} = \frac{1}{m} \sum_{t=1}^{m} X(t) = m^{-1}\left( Y(m) - Y(0) \right) =_d m^{-1} m^{H} \left( Y(1) - Y(0) \right) = m^{H-1} X.$$

Thus, if $Y(t)$ is an $H$-sssi process then its increment process $X(t)$ satisfies

$$X =_d m^{1-H} X^{(m)}, \tag{1.5}$$

which shows how $X^{(m)}$ is related to $X$ via a simple scaling relationship involving $H$ in the sense of finite-dimensional distributions. Equations (1.1) and (1.2), then, express the fact that $X$ and $m^{1-H} X^{(m)}$ are required to have exactly or asymptotically the same second-order structure. As a result, depending on whether a discrete time process $X(t)$ satisfies Eq. (1.5) for all $m \ge 1$ or only in the limit as $m \to \infty$, $X(t)$ is said to be exactly self-similar or asymptotically self-similar. Note that in the Gaussian case, this definition coincides with second-order self-similarity. As a lead-in to the role of the parameter $H$, recall that the variance of the sample mean $\bar{Z}$ of a random variable $Z$ satisfies $\mathrm{var}(\bar{Z}) = \sigma_Z^2 / m$, where $m$ is the sample size. From Eq. (1.5) it follows that $\mathrm{var}(X^{(m)}) = \sigma^2 m^{2H-2}$. When viewed as a sample mean where the samples are drawn independently, $\mathrm{var}(X^{(m)})$ reduces to $\sigma^2 m^{-1}$ if $H = 1/2$. If $H \ne 1/2$, in particular, $1/2 < H < 1$, then

$$\mathrm{var}(X^{(m)}) = \sigma^2 m^{-\beta}$$

with $0 < \beta < 1$ (and $H = 1 - \beta/2$), which hints at a certain (and not just any) dependency structure in the "samples" (i.e., time series in our case) that causes $\mathrm{var}(X^{(m)})$ to converge to zero slower than the rate $m^{-1}$.
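The decay of the aggregated variance, and the way $H$ can be read back from it, can be illustrated with a short Python sketch. The helper names are assumptions; `hurst_from_variances` simply inverts $\mathrm{var}(X^{(m)}) = \sigma^2 m^{-\beta}$ at two aggregation levels and applies $H = 1 - \beta/2$. With measured data the variances would be estimates, so the recovered value would only be approximate.

```python
import math


def aggregated_variance(m, hurst, sigma2=1.0):
    """var(X^(m)) = sigma^2 * m^(2H - 2), the relation implied by Eq. (1.5)."""
    return sigma2 * m ** (2.0 * hurst - 2.0)


def hurst_from_variances(m1, v1, m2, v2):
    """Recover H from the variances v1, v2 at aggregation levels m1, m2."""
    # beta is minus the slope of log(variance) against log(m).
    beta = -(math.log(v2) - math.log(v1)) / (math.log(m2) - math.log(m1))
    return 1.0 - beta / 2.0


for m in (10, 100, 1000):
    independent = aggregated_variance(m, 0.5)  # decays like 1/m
    dependent = aggregated_variance(m, 0.9)    # beta = 2 - 2H = 0.2
    print(m, independent, dependent)

print(hurst_from_variances(10, aggregated_variance(10, 0.9),
                           1000, aggregated_variance(1000, 0.9)))
```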

1.4.1.3 Long-Range Dependence Thus far we have focused on explicating the role of self-similarity in the second-order stationary and distributional senses with little regard to the role of $H$ and its range of values. Let us return to the definition of second-order self-similarity and its autocovariance $\gamma(k)$. Let $r(k) = \gamma(k)/\sigma^2$ denote the autocorrelation function. For $0 < H < 1$, $H \ne 1/2$, it holds

$$r(k) \sim H(2H-1) k^{2H-2}, \quad k \to \infty. \tag{1.6}$$

In particular, if $1/2 < H < 1$, $r(k)$ asymptotically behaves as $c k^{-\beta}$ for $0 < \beta < 1$, where $c > 0$ is a constant, $\beta = 2 - 2H$, and we have

$$\sum_{k=-\infty}^{\infty} r(k) = \infty. \tag{1.7}$$

That is, the autocorrelation function decays slowly (hyperbolically), which is the essential property that causes it to be not summable. When $r(k)$ decays hyperbolically such that condition (1.7) holds, we call the corresponding stationary process $X(t)$ long-range dependent. $X(t)$ is short-range dependent if the autocorrelation function is summable^9. An essentially equivalent definition can be given in the frequency domain where the spectral density $\Gamma(\nu) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} r(k) e^{ik\nu}$ is required to satisfy the property

$$\Gamma(\nu) \sim c |\nu|^{-\alpha}, \quad \nu \to 0.$$

Here $c > 0$ is a constant and $0 < \alpha = 2H - 1 < 1$. Thus $\Gamma(\nu)$ diverges around the origin, implying ever larger contributions by low-frequency components.

^9 Technically, more subtle definitions of long-range dependence are possible, but in this book, we will ...
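A small numerical check can make the hyperbolic decay and the lack of summability more tangible. The Python sketch below (function names are ours, and $H = 0.8$ is an arbitrary choice) compares the exact autocorrelation from Eq. (1.1) with the asymptotic form of Eq. (1.6), then prints partial sums of $r(k)$, which keep growing as more lags are included.

```python
def autocorr(k, hurst):
    """Autocorrelation r(k) = gamma(k) / sigma^2 from Eq. (1.1), for k >= 1."""
    e = 2.0 * hurst
    return 0.5 * ((k + 1) ** e - 2.0 * k ** e + (k - 1) ** e)


def asymptotic_autocorr(k, hurst):
    """Leading-order behaviour H(2H - 1) k^(2H - 2) from Eq. (1.6)."""
    return hurst * (2.0 * hurst - 1.0) * k ** (2.0 * hurst - 2.0)


H = 0.8
for k in (10, 100, 1000):
    # The ratio of exact to asymptotic value approaches 1 as k grows.
    print(k, autocorr(k, H) / asymptotic_autocorr(k, H))

for n in (10, 1000, 100000):
    # Partial sums over lags 1..n do not level off when 1/2 < H < 1.
    print(n, sum(autocorr(k, H) for k in range(1, n + 1)))
```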

Following are some simple facts regarding the value of $H$ and its impact on $r(k)$. First, if $H = 1/2$, then $r(k) = 0$, and $X(t)$ is trivially short-range dependent by virtue of being completely uncorrelated. In the case where $0 < H < 1/2$, we have $\sum_{k=-\infty}^{\infty} r(k) = 0$, an artificial condition rarely encountered in applications. $H = 1$ is uninteresting since it leads to the degenerate situation $r(k) = 1$ for all $k \ge 1$. Finally, $H$-values bigger than 1 are prohibited due to the stationarity condition on $X(t)$.
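These facts can be read off a closed form for the partial sums of $r(k)$. The following is a sketch of that calculation, where the shorthand $\Delta(k)$ is introduced here and is not notation from the text:

```latex
\begin{aligned}
r(k) &= \tfrac{1}{2}\left[ \Delta(k) - \Delta(k-1) \right],
  \qquad \Delta(k) = (k+1)^{2H} - k^{2H}, \\
\sum_{k=1}^{n} r(k) &= \tfrac{1}{2}\left[ \Delta(n) - \Delta(0) \right]
  = \tfrac{1}{2}\left[ (n+1)^{2H} - n^{2H} - 1 \right].
\end{aligned}
```

For $0 < H < 1/2$ the difference $(n+1)^{2H} - n^{2H}$ vanishes as $n \to \infty$, so $\sum_{k \ge 1} r(k) = -1/2$ and, with $r(0) = 1$ and $r(-k) = r(k)$, the two-sided sum is $1 + 2(-1/2) = 0$. For $1/2 < H < 1$ the same difference grows like $2H n^{2H-1}$, which gives condition (1.7). Setting $H = 1/2$ makes every term with $k \ge 1$ vanish, and setting $H = 1$ gives $r(k) = \tfrac{1}{2}\left[ (k+1)^2 - 2k^2 + (k-1)^2 \right] = 1$.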

1.4.1.4 Self-Similarity Versus Long-Range Dependence The preceding discussion indicates that there are self-similar processes that are not long-range dependent, and vice versa. For example, Brownian motion is $\frac{1}{2}$-sssi with white Gaussian noise as its increment process, but the latter is not long-range dependent. Conversely, certain fractional ARIMA time series generate long-range dependence but they are not self-similar in the distributional sense. In the case of asymptotic second-order self-similarity, however, by the restriction $1/2 < H < 1$ in the definition, self-similarity implies long-range dependence, and vice versa. It is for this reason and the fact that asymptotic second-order self-similar processes are employed as "canonical" traffic models, that we sometimes use self-similarity and long-range dependence interchangeably when the context does not lead to confusion.

1.4.2 Impact of Heavy Tails

1.4.2.1 Heavy-Tailed Distribution There is an intimate relationship between heavy-tailed distributions and long-range dependence, which we will discuss in the next sections. First, a few definitions and basic facts. A random variable $Z$ has a heavy-tailed distribution if

$$\Pr\{Z > x\} \sim c x^{-\alpha}, \quad x \to \infty, \tag{1.8}$$

where $0 < \alpha < 2$ is called the tail index or shape parameter and $c$ is a positive constant^10. That is, the tail of the distribution, asymptotically, decays hyperbolically. This is in contrast to light-tailed distributions (for example, exponential and Gaussian), which possess an exponentially decreasing tail. A distinguishing mark of heavy-tailed distributions is that they have infinite variance for $0 < \alpha < 2$, and if $0 < \alpha \le 1$, they also have an unbounded mean. In the networking context, we will be primarily interested in the case $1 < \alpha < 2$. A frequently used heavy-tailed distribution is the Pareto distribution whose distribution function is given by

$$\Pr\{Z \le x\} = 1 - \left( \frac{b}{x} \right)^{\alpha}, \quad b \le x,$$

where $0 < \alpha < 2$ is the shape parameter and $b$ is called the location parameter. The mean is given by $\alpha b/(\alpha - 1)$ when $\alpha > 1$. We remark that there are distributions (for example, Weibull and lognormal) that have subexponentially decreasing tails but possess finite variance.

^10 Technically, more subtle definitions involving slowly varying functions are possible and can be found in some chapters of this book. However, for practical purposes and to convey the main ideas, our working definition, centered around condition (1.8), will suffice.
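To make the Pareto definition concrete, the following Python sketch evaluates its tail and mean and draws samples by inverting the distribution function. The function names, the seed, and the parameter values $\alpha = 1.2$, $b = 1$ are assumptions chosen for illustration; inverse-transform sampling is one standard way to generate such variates, not something prescribed by the text.

```python
import random


def pareto_ccdf(x, alpha, b):
    """Pr{Z > x} = (b / x)^alpha for x >= b, and 1 below the location b."""
    return 1.0 if x < b else (b / x) ** alpha


def pareto_mean(alpha, b):
    """Population mean alpha * b / (alpha - 1); infinite when alpha <= 1."""
    return float("inf") if alpha <= 1.0 else alpha * b / (alpha - 1.0)


def pareto_sample(alpha, b, rng):
    """Inverse-transform sample: solve (b / z)^alpha = u for z."""
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids division by zero
    return b * u ** (-1.0 / alpha)


rng = random.Random(1)  # fixed seed so that the run is repeatable
draws = sorted(pareto_sample(1.2, 1.0, rng) for _ in range(10000))
print("median:", draws[len(draws) // 2], "max:", draws[-1])
print("population mean:", pareto_mean(1.2, 1.0))
```

Comparing the printed median and maximum gives a feel for how a few very large values coexist with a bulk of small ones.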

The main characteristic of a random variable obeying a heavy-tailed distribution is that it exhibits extreme variability. Practically speaking, a heavy-tailed distribution gives rise to very large values with nonnegligible probability so that sampling from such a distribution results in the bulk of values being "small" but a few samples having "very" large values. Not surprisingly, heavy-tailedness impacts sampling by slowing down the convergence rate of the sample mean to the population mean, with the slowdown becoming more severe as the tail index $\alpha$ approaches 1. For example, depending on the sample size $m$, the sample mean $\bar{Z}_m$ of a Pareto distributed random variable $Z$ may significantly deviate from the population mean $\alpha b/(\alpha - 1)$, oftentimes underestimating it. In fact, the absolute estimation error $|\bar{Z}_m - E(Z)|$ asymptotically behaves as $m^{1/\alpha - 1}$ (see, e.g., Crovella and Lipsky [15]), and thus for $\alpha$ values close to 1, care must be given when sampling from heavy-tailed distributions such that conclusions about network behavior and performance attributable to sampling error are not advanced. A more detailed discussion of sampling issues is given in Chapter 3.
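The practical weight of the rate $m^{1/\alpha - 1}$ becomes clearer when it is inverted to ask how many samples a given accuracy would require. The sketch below does this arithmetic in Python; the function names are ours, and the multiplicative constant in the asymptotic rate is ignored, so the results indicate orders of magnitude only.

```python
def error_scale(m, alpha):
    """Order of |sample mean - E(Z)| for sample size m: m^(1/alpha - 1)."""
    return m ** (1.0 / alpha - 1.0)


def samples_for_error(eps, alpha):
    """Sample size at which m^(1/alpha - 1) drops to eps (constants ignored)."""
    return eps ** (1.0 / (1.0 / alpha - 1.0))


for alpha in (1.9, 1.5, 1.2, 1.1):
    # Required sample size for an error scale of 0.01 at each tail index.
    print(alpha, samples_for_error(0.01, alpha))
```

For $\alpha = 1.2$ the exponent is $-1/6$, so an error scale of $10^{-2}$ corresponds to roughly $10^{12}$ samples under this simplified reading of the rate.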

1.4.2.2 Heavy Tails and Predictability Heavy-tailedness of certain network-related variables (for example, file sizes and connection durations) can be shown to underlie the root cause of long-range dependence and self-similarity in network traffic. First, let us examine a simple fact on the intrinsic predictability associated with heavy-tailed random variables. Let $Z$ be a heavy-tailed random variable interpreted as the duration or lifetime of a network connection (e.g., TCP connection, IP flow, or session). Since connection durations are physically measurable events, assume that we observe, in time, that a connection has been active for $\tau > 0$ seconds. To simplify the discussion, assume time is discrete ($t \in \mathbb{Z}_+$) and $A: \mathbb{Z}_+ \to \{0, 1\}$ is an indicator function such that $A(t) = 1$ iff $Z \ge t$. We are interested in the probability that the connection will persist into the future given that it has been active for $\tau$ seconds. That is, we would like to estimate the conditional probability

$$L(\tau) = \Pr\{A(\tau+1) = 1 \mid A(t) = 1, 1 \le t \le \tau\}. \tag{1.9}$$

$L(\tau)$ can be expressed as

$$L(\tau) = \frac{\Pr\{Z \ge \tau+1\}}{\Pr\{Z \ge \tau\}} = 1 - \frac{\Pr\{Z = \tau\}}{\Pr\{Z \ge \tau\}}. \tag{1.10}$$

Let us first compute $L(\tau)$ for light tails, in particular, distributions with asymptotically exponential tails $\Pr\{Z > x\} \sim c_1 e^{-c_2 x}$, where $c_1, c_2 > 0$ are constants. The second term in Eq. (1.10) is computed by

$$\frac{\Pr\{Z = \tau\}}{\Pr\{Z \ge \tau\}} \sim \frac{c_1 e^{-c_2 \tau} - c_1 e^{-c_2 (\tau+1)}}{c_1 e^{-c_2 \tau}} = 1 - e^{-c_2}$$

for large $\tau$, and we get $L(\tau) \sim e^{-c_2}$. Thus for exponentially light tails, prediction is not enhanced by conditioning on ever longer periods of observed activity. For heavy tails, the corresponding derivations are

$$\frac{\Pr\{Z = \tau\}}{\Pr\{Z \ge \tau\}} \sim \frac{c \tau^{-\alpha} - c (\tau+1)^{-\alpha}}{c \tau^{-\alpha}} = 1 - \left( \frac{\tau}{\tau+1} \right)^{\alpha},$$

which yields

$$L(\tau) \to 1, \quad \tau \to \infty. \tag{1.11}$$

Thus the longer the period of observed activity, the more certain that it will persist into the future. In fact, it is straightforward to generalize Eq. (1.9) so that we can measure the persistence of activity $d \ge 1$ time units into the future, that is,

$$L(\tau) = \Pr\{A(\tau+s) = 1, 1 \le s \le d \mid A(t) = 1, 1 \le t \le \tau\}.$$

This does not change the qualitative results: for the light-tailed case, $L(\tau) \sim e^{-c_2 d}$; for the heavy-tailed case, $L(\tau)$'s asymptotic behavior follows $(1 + d/\tau)^{-\alpha} \to 1$. Since $(1 + d/\tau)^{-\alpha} \sim e^{-\alpha d/\tau}$, we observe that in both cases predictability is exponentially sensitive to the prediction interval $d$. However, in the heavy-tailed case, for any desired $d$ time unit "peek into the future," by conditioning the prediction on a sufficiently long past observation of activity, the prediction error can be reduced to an arbitrarily small level.
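The contrast between the two cases can be tabulated directly from the asymptotic expressions. In the Python sketch below, the function names and the parameter values ($c_2 = 0.5$, $\alpha = 1.2$) are assumptions for illustration; the formulas are the asymptotic forms $e^{-c_2 d}$ and $(1 + d/\tau)^{-\alpha}$, so the numbers are only meaningful for large $\tau$.

```python
import math


def persistence_light(c2, d=1):
    """Asymptotic L(tau) for an exponential tail: exp(-c2 * d), whatever tau is."""
    return math.exp(-c2 * d)


def persistence_heavy(tau, alpha, d=1):
    """Asymptotic L(tau) for a hyperbolic tail: (tau / (tau + d))^alpha."""
    return (tau / (tau + d)) ** alpha


for tau in (1, 10, 100, 1000):
    # The light-tailed column stays fixed; the heavy-tailed column climbs toward 1.
    print(tau, persistence_light(0.5), persistence_heavy(tau, 1.2))
```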

We remark that the mathematical implications of asymptotic analysis need not detract from the practical relevance of its conclusions, even considering the fact that tails are always finite in a physical network environment. First, if heavy tails are modeled using the Pareto distribution, then its shape is hyperbolic across its entire range, not just asymptotically, and accurate finitary computations can be carried out. Second, given an empirical distribution with finite support, the fact that it has a finite cutoff point will not significantly influence the predictability computations carried out in practice as long as the tail is "sufficiently" long (for example, several orders of magnitude beyond the mean). As with time series, the identification problem of whether an empirical distribution is best modeled by heavy-tailed or light-tailed distributions is intrinsically ill-posed and secondary to the fact that the predictability structure as computed by Eq. (1.10) from empirical distributions is significant.

1.4.2.3 Heavy Tails and Long-Range Dependence As we saw in the previous section, heavy tails lead to predictability, and for a related reason, they lead to long-range dependence in network traffic. First, we give a definition of fractional Brownian motion (FBM) and its increment process, fractional Gaussian noise (FGN), which are Gaussian self-similar processes with, in general, long-range dependence, first introduced by Mandelbrot [45]. Their Gaussian structure renders them especially useful as aggregate traffic models where aggregation of independent traffic sources leads, by the central limit theorem, to the Gaussian property. In practice, of course, traffic flows need not be independent if they engage in feedback control and share common resources at bottleneck routers. The definitions of FBM and FGN are couched in the framework of distributional self-similarity given in Section 1.4.1.2.

Definition 1.4.3 (FBM) $Y(t)$, $t \in \mathbb{R}$, is called fractional Brownian motion with parameter $H$, $0 < H < 1$, if $Y(t)$ is Gaussian and $H$-sssi.

Definition 1.4.4 (FGN) $X(t)$, $t \in \mathbb{Z}$, is called fractional Gaussian noise with parameter $H$ if $X(t)$ is the increment process of FBM with parameter $H$.
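One way to make Definition 1.4.4 concrete is to synthesize a short FGN sample path. The sketch below is not a method described in this section; it assumes the standard approach of building the covariance matrix from Eq. (1.1) with $\sigma^2 = 1$, taking its Cholesky factor, and multiplying by independent standard normal draws. All function names are hypothetical, and the cubic cost in the path length limits the approach to short traces.

```python
import math
import random


def fgn_autocov(k, hurst):
    """Unit-variance FGN autocovariance, Eq. (1.1) with sigma^2 = 1."""
    k = abs(k)
    e = 2.0 * hurst
    return 0.5 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_fgn(n, hurst, rng):
    """Draw X(1), ..., X(n) as L * z, where z is i.i.d. standard normal."""
    cov = [[fgn_autocov(i - j, hurst) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(n)]


def cumulate(x):
    """Partial sums Y(t) = X(1) + ... + X(t): an FBM path sampled at integers."""
    path, total = [], 0.0
    for value in x:
        total += value
        path.append(total)
    return path


rng = random.Random(7)            # fixed seed so that the run is repeatable
noise = sample_fgn(64, 0.8, rng)  # FGN with H = 0.8
print(cumulate(noise)[-1])        # value of the FBM path at t = 64
```

For $H = 1/2$ the covariance matrix is the identity, so the output reduces to the independent normal draws themselves.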

By the definition of $H$-sssi, FBM reduces to Brownian motion, and FGN to white Gaussian noise, when $H = 1/2$. Thus $X(t)$, $t \in \mathbb{Z}$, becomes completely uncorrelated. Since Gaussian processes are characterized by their second-order structure, for each $H$, $0 < H < 1$, there is a unique Gaussian process that is the
