
1.4.1 Self-Similar Processes and Long-Range Dependence

1.4.1.1 Second-Order Self-Similarity and Stationarity Consider a discrete time stochastic process or time series $X(t)$, $t \in \mathbb{Z}$, where $X(t)$ is interpreted as the traffic volume (measured in packets, bytes, or bits) at time instance $t$. Of interest is also the interpretation that $X(t)$ is the total traffic volume up to time $t$, say, from time 0. To minimize confusion, when a "cumulative" view is taken, we will denote the process by $Y(t)$. We will then reserve $X(t)$ to be the increment process corresponding to $Y(t)$, that is, $X(t) = Y(t) - Y(t-1)$.

For traffic modeling purposes, we would like $X(t)$ to be "stationary" in the sense that its behavior or structure is invariant with respect to shifts in time. In other words, $t$'s responsibility as an absolute reference frame is relieved. Without some form of stationarity, "anything" is allowed and a model loses much of its usefulness as a compact description of (assumed) tractable phenomena. $X(t)$ is strictly stationary if $(X(t_1), X(t_2), \ldots, X(t_n))$ and $(X(t_1+k), X(t_2+k), \ldots, X(t_n+k))$ possess the same joint distribution for all $n \in \mathbb{Z}_+$, $t_1, \ldots, t_n, k \in \mathbb{Z}$. Denoting the $k$-shifted process or time series by $X_k$, $X$ and $X_k$ are said to be equivalent in the sense of finite-dimensional distributions, $X =_d X_k$. Imposing strict stationarity, it turns out, is too restrictive and we will be interested in a weaker form of stationarity, second-order stationarity$^7$, which requires that the autocovariance function $\gamma(r,s) = E[(X(r)-\mu)(X(s)-\mu)]$ satisfies translation invariance, that is, $\gamma(r,s) = \gamma(r+k, s+k)$ for all $r, s, k \in \mathbb{Z}$.

$^7$ Equivalent names are weak, covariance, and wide sense stationarity.

The first two moments are assumed to exist and be finite, and we set $\mu = E[X(t)]$, $\sigma^2 = E[(X(t)-\mu)^2]$ for all $t \in \mathbb{Z}$. We will also assume $\mu = 0$. Since, by stationarity, $\gamma(r,s) = \gamma(r-s, 0)$, we denote the autocovariance by $\gamma(k)$.

To formulate scale invariance, first define the aggregated process $X^{(m)}$ of $X$ at aggregation level $m$,

$$X^{(m)}(i) = \frac{1}{m} \sum_{t=m(i-1)+1}^{mi} X(t).$$

That is, $X(t)$ is partitioned into nonoverlapping blocks of size $m$, their values are averaged, and $i$ is used to index these blocks. Let $\gamma^{(m)}(k)$ denote the autocovariance function of $X^{(m)}$. Under the assumption of second-order stationarity we arrive at the following definitions of second-order self-similarity.
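As a minimal sketch of the aggregation step, the following Python function computes the block means $X^{(m)}(i)$ for a finite sample. The function name `aggregate` and the choice to drop an incomplete trailing block are assumptions made for illustration. List indices are 0-based, so element `i` of the result corresponds to block $i+1$ in the formula above.

```python
def aggregate(series, m):
    """Return the aggregated series X^(m): means of nonoverlapping blocks of size m.

    Assumption (not from the text): samples left over after the last
    complete block are discarded.
    """
    if m < 1:
        raise ValueError("aggregation level m must be a positive integer")
    n_blocks = len(series) // m
    # Block i (0-based) covers series[i*m : (i+1)*m], i.e., t = m*i + 1, ..., m*(i + 1).
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


print(aggregate([1, 2, 3, 4, 5, 6, 7], 3))  # the 7th sample is dropped
```

For instance, `aggregate([1, 2, 3, 4, 5, 6, 7], 3)` averages the blocks (1, 2, 3) and (4, 5, 6) and returns `[2.0, 5.0]`.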

Definition 1.4.1 (Second-Order Self-Similarity) $X(t)$ is exactly second-order self-similar with Hurst parameter $H$ ($1/2 < H < 1$) if

$$\gamma(k) = \frac{\sigma^2}{2}\left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right) \tag{1.1}$$

for all $k \ge 1$. $X(t)$ is asymptotically second-order self-similar if

$$\lim_{m \to \infty} \gamma^{(m)}(k) = \frac{\sigma^2}{2}\left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right). \tag{1.2}$$

It can be checked that Eq. (1.1) implies $\gamma(k) = \gamma^{(m)}(k)$ for all $m \ge 1$. Thus, second-order self-similarity captures the property that the correlation structure is exactly (condition (1.1)) or asymptotically (the weaker condition (1.2)) preserved under time aggregation. The form of $\gamma(k) = \left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right)\sigma^2/2$ is not accidental and implies further structure, namely long-range dependence, to which we will return later. Second-order self-similarity (in the exact or asymptotic sense) has been a dominant framework for modeling network traffic and this is also reflected in the chapters of this book.

1.4.1.2 An Allegory into Distributional Self-Similarity To understand the particular form of $\gamma(k)$ in the definition of second-order self-similarity, we will make a short detour and discuss self-similar processes in slightly more generality. Further extensions and detailed treatments can be found in Beran [9] and Samorodnitsky and Taqqu [60].

Consider the cumulative process $Y(t)$, albeit in continuous time $t \in \mathbb{R}$. Following is a definition of self-similarity for continuous-time processes in the sense of finite-dimensional distributions.

Definition 1.4.2 ($H$-ss) $Y(t)$ is self-similar with self-similarity parameter, that is, Hurst parameter, $H$ ($0 < H < 1$), denoted $H$-ss, if for all $a > 0$ and $t \ge 0$,

$$Y(t) =_d a^{-H} Y(at). \tag{1.3}$$

Thus $Y(t)$ and its time scaled version $Y(at)$, after normalizing by $a^{-H}$, must follow the same distribution. In the traffic modeling context, it is convenient to think of $Y(t)$ as the cumulative or total traffic up to time $t$. For $a > 1$ (time is stretched or dilated) a contraction factor $a^{-H}$ is applied to make the magnitude of $Y(at)$ comparable to that of $Y(t)$. For $a < 1$, the opposite holds true. As $a$ varies, the scaling exponent $H$ remains invariant. This is a most natural definition; however, it has an important drawback: unless $Y(t)$ is degenerate, that is, $Y(t) = 0$ for all $t \in \mathbb{R}$, $Y(t)$ cannot be stationary due to the normalization factor $a^{-H}$. Its increment process $X(t) = Y(t) - Y(t-1)$, however, is another matter. In particular, consider the case where $Y(t)$ is $H$-ss and has stationary increments; in this case we say $Y(t)$ is $H$-sssi. Let us further assume that $Y(t)$ has finite variance. It can be checked that $E[Y(t)] = 0$, $E[Y^2(t)] = \sigma^2 |t|^{2H}$, and the autocovariance of $Y$ at times $t$ and $s$ is

$$\gamma(t,s) = \frac{\sigma^2}{2}\left(|t|^{2H} - |t-s|^{2H} + |s|^{2H}\right). \tag{1.4}$$

This is achieved by noting that$^8$

$$Y(t) =_d t^{H} Y(1),$$

from which it follows that $E[Y^2(t)] = \sigma^2 t^{2H}$. The latter, then, can be used in the derivation of the autocovariance function (1.4). The increment process $X(t)$ has mean 0 and autocovariance $\gamma(k)$ as given in Eq. (1.1). The derivation is similar to that of $Y(t)$.
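The two derivations mentioned above can be written out in a few lines. The following is a sketch that uses only the $H$-sssi property, zero mean, and finite variance, with $\sigma^2 = E[Y^2(1)]$ and $Y(0) = 0$ almost surely (the latter follows from (1.3) at $t = 0$); the intermediate steps are not taken from the text.

```latex
% Autocovariance of Y, Eq. (1.4): polarization identity plus stationary increments.
\begin{align*}
E[Y(t)Y(s)]
  &= \tfrac{1}{2}\left\{ E[Y^2(t)] + E[Y^2(s)] - E[(Y(t)-Y(s))^2] \right\} \\
  &= \tfrac{1}{2}\left\{ E[Y^2(t)] + E[Y^2(s)] - E[Y^2(t-s)] \right\} \\
  &= \frac{\sigma^2}{2}\left( |t|^{2H} - |t-s|^{2H} + |s|^{2H} \right).
\end{align*}
% Autocovariance of the increments X(t) = Y(t) - Y(t-1), Eq. (1.1), for k >= 1.
\begin{align*}
\gamma(k) &= E[X(k+1)X(1)] = E[(Y(k+1)-Y(k))\,Y(1)] \\
  &= \frac{\sigma^2}{2}\left( (k+1)^{2H} - k^{2H} + 1 \right)
   - \frac{\sigma^2}{2}\left( k^{2H} - (k-1)^{2H} + 1 \right) \\
  &= \frac{\sigma^2}{2}\left( (k+1)^{2H} - 2k^{2H} + (k-1)^{2H} \right).
\end{align*}
```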

How does distributional self-similarity (of a continuous time process) tie in with second-order self-similarity (of a discrete time process), which requires exact or asymptotic invariance with respect to second-order statistical structure of the aggregated time series $X^{(m)}$? A key observation lies in noting that $X^{(m)}$ can be viewed as computing a sample mean

$$X^{(m)} = \frac{1}{m} \sum_{t=1}^{m} X(t) = m^{-1}(Y(m) - Y(0)) =_d m^{-1} m^{H} (Y(1) - Y(0)) = m^{H-1} X.$$

Thus, if $Y(t)$ is an $H$-sssi process then its increment process $X(t)$ satisfies

$$X =_d m^{1-H} X^{(m)}, \tag{1.5}$$

which shows how $X^{(m)}$ is related to $X$ via a simple scaling relationship involving $H$ in the sense of finite-dimensional distributions. Equations (1.1) and (1.2), then, express the fact that $X$ and $m^{1-H} X^{(m)}$ are required to have exactly or asymptotically the same second-order structure. As a result, depending on whether a discrete time process $X(t)$ satisfies Eq. (1.5) for all $m \ge 1$ or only in the limit as $m \to \infty$, $X(t)$ is said to be exactly self-similar or asymptotically self-similar. Note that in the Gaussian case, this definition coincides with second-order self-similarity.

$^8$ From $a^{H} Y(t) =_d Y(at)$, substitute $t = 1$ and $a = t$.

As a lead-in to the role of the parameter $H$, recall that the variance of the sample mean $\bar{Z}$ of a random variable $Z$ satisfies $\mathrm{var}(\bar{Z}) = \sigma_Z^2/m$, where $m$ is the sample size. From Eq. (1.5) it follows that $\mathrm{var}(X^{(m)}) = \sigma^2 m^{2H-2}$. When viewed as a sample mean where the samples are drawn independently, $\mathrm{var}(X^{(m)})$ reduces to $\sigma^2 m^{-1}$ if $H = 1/2$. If $H \ne 1/2$, in particular, $1/2 < H < 1$, then

$$\mathrm{var}(X^{(m)}) = \sigma^2 m^{-\beta}$$

with $0 < \beta < 1$ (and $H = 1 - \beta/2$), which hints at a certain, and not just any, dependency structure in the "samples" (i.e., time series in our case) that causes $\mathrm{var}(X^{(m)})$ to converge to zero slower than the rate $m^{-1}$.
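The variance scaling $\mathrm{var}(X^{(m)}) = \sigma^2 m^{2H-2}$ can be checked numerically from the autocovariance (1.1) alone. The sketch below sums $\gamma(i-j)$ over one block of size $m$ and compares the result with the closed form; the function names and the example values of $H$ and $m$ are assumptions chosen for illustration.

```python
def autocovariance(k, hurst, sigma2=1.0):
    """gamma(k) from Eq. (1.1), extended to k = 0 and negative lags by symmetry."""
    k = abs(k)
    p = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** p - 2.0 * k ** p + abs(k - 1) ** p)


def aggregated_variance(m, hurst, sigma2=1.0):
    """var(X^(m)) = m^-2 * (sum of gamma(i - j) over i, j = 1, ..., m).

    The lag d = i - j occurs (m - |d|) times in the double sum.
    """
    total = sum((m - abs(d)) * autocovariance(d, hurst, sigma2)
                for d in range(-(m - 1), m))
    return total / m ** 2


# Compare the direct sum with sigma^2 * m^(2H - 2) and with the i.i.d. rate 1/m.
for m in (1, 10, 100):
    print(m, aggregated_variance(m, 0.8), m ** (2 * 0.8 - 2), 1.0 / m)
```

For $H = 0.8$ the first two printed columns agree up to rounding, and both decay more slowly than the $1/m$ column, which is the behavior described above.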

1.4.1.3 Long-Range Dependence Thus far we have focused on explicating the role of self-similarity in the second-order stationary and distributional senses with little regard to the role of $H$ and its range of values. Let us return to the definition of second-order self-similarity and its autocovariance $\gamma(k)$. Let $r(k) = \gamma(k)/\sigma^2$ denote the autocorrelation function. For $0 < H < 1$, $H \ne 1/2$, it holds that

$$r(k) \sim H(2H-1)k^{2H-2}, \quad k \to \infty. \tag{1.6}$$

In particular, if $1/2 < H < 1$, $r(k)$ asymptotically behaves as $ck^{-\beta}$ for $0 < \beta < 1$, where $c > 0$ is a constant, $\beta = 2 - 2H$, and we have

$$\sum_{k=-\infty}^{\infty} r(k) = \infty. \tag{1.7}$$

In other words, the autocorrelation function decays slowly (hyperbolically), which is the essential property that causes it to be not summable. When $r(k)$ decays hyperbolically such that condition (1.7) holds, we call the corresponding stationary process $X(t)$ long-range dependent. $X(t)$ is short-range dependent if the autocorrelation function is summable$^9$. An essentially equivalent definition can be given in the frequency domain where the spectral density $\Gamma(\nu) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} r(k) e^{ik\nu}$ is required to satisfy the property

$$\Gamma(\nu) \sim c|\nu|^{-\alpha}, \quad \nu \to 0.$$

Here $c > 0$ is a constant and $0 < \alpha = 2H - 1 < 1$. Thus $\Gamma(\nu)$ diverges around the origin, implying ever larger contributions by low-frequency components.

$^9$ Technically, more subtle definitions of long-range dependence are possible, but in this book, we will

Following are some simple facts regarding the value of $H$ and its impact on $r(k)$. First, if $H = 1/2$, then $r(k) = 0$ for all $k \ge 1$, and $X(t)$ is trivially short-range dependent by virtue of being completely uncorrelated. In the case where $0 < H < 1/2$, we have $\sum_{k=-\infty}^{\infty} r(k) = 0$, an artificial condition rarely encountered in applications. $H = 1$ is uninteresting since it leads to the degenerate situation $r(k) = 1$ for all $k \ge 1$. Finally, $H$-values bigger than 1 are prohibited due to the stationarity condition on $X(t)$.
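The facts above can be illustrated numerically. The following sketch evaluates $r(k)$ from Eq. (1.1), compares it with the asymptotic form (1.6), and computes two-sided partial sums $\sum_{|k| \le K} r(k)$ for several values of $H$. The function names and the sample values of $H$ and $K$ are illustrative assumptions, and a finite partial sum can only suggest, not prove, divergence or convergence.

```python
def autocorrelation(k, hurst):
    """r(k) = gamma(k) / sigma^2 with gamma(k) taken from Eq. (1.1); r(0) = 1."""
    k = abs(k)
    if k == 0:
        return 1.0
    p = 2.0 * hurst
    return 0.5 * ((k + 1) ** p - 2.0 * k ** p + (k - 1) ** p)


def asymptotic_autocorrelation(k, hurst):
    """Right-hand side of Eq. (1.6): H (2H - 1) k^(2H - 2)."""
    return hurst * (2.0 * hurst - 1.0) * k ** (2.0 * hurst - 2.0)


def two_sided_partial_sum(max_lag, hurst):
    """Sum of r(k) over -max_lag <= k <= max_lag, using r(-k) = r(k)."""
    return 1.0 + 2.0 * sum(autocorrelation(k, hurst)
                           for k in range(1, max_lag + 1))


for hurst in (0.3, 0.5, 0.8):
    sums = [two_sided_partial_sum(K, hurst) for K in (10, 1000, 100000)]
    print("H =", hurst, "partial sums:", sums)

print("ratio r(k) / asymptote at k = 1000, H = 0.8:",
      autocorrelation(1000, 0.8) / asymptotic_autocorrelation(1000, 0.8))
```

The partial sums grow without apparent bound for $H = 0.8$, stay at 1 for $H = 1/2$ (only the lag-0 term contributes), and shrink toward 0 for $H = 0.3$.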

1.4.1.4 Self-Similarity Versus Long-Range Dependence The preceding discussion indicates that there are self-similar processes that are not long-range dependent, and vice versa. For example, Brownian motion is $\frac{1}{2}$-sssi with white Gaussian noise as its increment process, but the latter is not long-range dependent. Conversely, certain fractional ARIMA time series generate long-range dependence but they are not self-similar in the distributional sense. In the case of asymptotic second-order self-similarity, however, by the restriction $1/2 < H < 1$ in the definition, self-similarity implies long-range dependence, and vice versa. It is for this reason, and the fact that asymptotic second-order self-similar processes are employed as "canonical" traffic models, that we sometimes use self-similarity and long-range dependence interchangeably when the context does not lead to confusion.

1.4.2 Impact of Heavy Tails

1.4.2.1 Heavy-Tailed Distribution There is an intimate relationship between heavy-tailed distributions and long-range dependence, which we will discuss in the next sections. First, a few definitions and basic facts. A random variable $Z$ has a heavy-tailed distribution if

$$\Pr\{Z > x\} \sim c x^{-\alpha}, \quad x \to \infty, \tag{1.8}$$

where $0 < \alpha < 2$ is called the tail index or shape parameter and $c$ is a positive constant$^{10}$. That is, the tail of the distribution, asymptotically, decays hyperbolically. This is in contrast to light-tailed distributions (for example, exponential and Gaussian), which possess an exponentially decreasing tail. A distinguishing mark of heavy-tailed distributions is that they have infinite variance for $0 < \alpha < 2$, and if $0 < \alpha \le 1$, they also have an unbounded mean. In the networking context, we will be primarily interested in the case $1 < \alpha < 2$. A frequently used heavy-tailed distribution is the Pareto distribution whose distribution function is given by

$$\Pr\{Z \le x\} = 1 - \left(\frac{b}{x}\right)^{\alpha}, \quad b \le x,$$

where $0 < \alpha < 2$ is the shape parameter and $b$ is called the location parameter. For $\alpha > 1$, the mean is given by $\alpha b/(\alpha - 1)$. We remark that there are distributions (for example, Weibull and lognormal) that have subexponentially decreasing tails but possess finite variance.

$^{10}$ Technically, more subtle definitions involving slowly varying functions are possible and can be found in some chapters of this book. However, for practical purposes and to convey the main ideas, our working definition, centered around condition (1.8), will suffice.
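To make the Pareto definition concrete, the sketch below implements its distribution function and mean and draws samples by inverting the distribution function. The function names, the seed, and the parameter values ($\alpha = 1.2$, $b = 1$) are assumptions for illustration only; inverse-transform sampling is a standard technique that the text does not prescribe.

```python
import random


def pareto_cdf(x, alpha, b):
    """Pr{Z <= x} = 1 - (b / x)^alpha for x >= b, and 0 below b."""
    if x < b:
        return 0.0
    return 1.0 - (b / x) ** alpha


def pareto_mean(alpha, b):
    """Population mean alpha * b / (alpha - 1); infinite when alpha <= 1."""
    if alpha <= 1.0:
        return float("inf")
    return alpha * b / (alpha - 1.0)


def pareto_sample(alpha, b, rng):
    """Draw one value by solving 1 - (b / x)^alpha = 1 - u for x."""
    u = 1.0 - rng.random()          # uniform on (0, 1], so u is never zero
    return b * u ** (-1.0 / alpha)


rng = random.Random(0)              # fixed seed so that runs are repeatable
samples = [pareto_sample(1.2, 1.0, rng) for _ in range(100_000)]
print("population mean:", pareto_mean(1.2, 1.0))
print("sample mean:    ", sum(samples) / len(samples))
print("largest sample: ", max(samples))
```

Comparing the printed sample mean and largest sample across different seeds or sample sizes gives a feel for how strongly a few very large values influence the estimate.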

The main characteristic of a random variable obeying a heavy-tailed distribution is that it exhibits extreme variability. Practically speaking, a heavy-tailed distribution gives rise to very large values with nonnegligible probability so that sampling from such a distribution results in the bulk of values being "small" but a few samples having "very" large values. Not surprisingly, heavy-tailedness impacts sampling by slowing down the convergence rate of the sample mean to the population mean, dilating it as the tail index $\alpha$ approaches 1. For example, depending on the sample size $m$, the sample mean $\bar{Z}_m$ of a Pareto distributed random variable $Z$ may significantly deviate from the population mean $\alpha b/(\alpha - 1)$, oftentimes underestimating it. In fact, the absolute estimation error $|\bar{Z}_m - E(Z)|$ asymptotically behaves as $m^{(1/\alpha)-1}$ (see, e.g., Crovella and Lipsky [15]), and thus for $\alpha$ values close to 1, care must be taken when sampling from heavy-tailed distributions so that conclusions about network behavior and performance that are attributable to sampling error are not advanced. A more detailed discussion of sampling issues is given in Chapter 3.

1.4.2.2 Heavy Tails and Predictability Heavy-tailedness of certain network-related variables (for example, file sizes and connection durations) can be shown to underlie the root cause of long-range dependence and self-similarity in network traffic. First, let us examine a simple fact on the intrinsic predictability associated with heavy-tailed random variables. Let $Z$ be a heavy-tailed random variable interpreted as the duration or lifetime of a network connection (e.g., TCP connection, IP flow, or session). Since connection durations are physically measurable events, assume that we observe, in time, that a connection has been active for $\tau > 0$ seconds. To simplify the discussion, assume time is discrete ($t \in \mathbb{Z}_+$) and $A: \mathbb{Z}_+ \to \{0, 1\}$ is an indicator function such that $A(t) = 1$ iff $Z \ge t$. We are interested in the probability that the connection will persist into the future given that it has been active for $\tau$ seconds. That is, we would like to estimate the conditional probability

$$L(\tau) = \Pr\{A(\tau+1) = 1 \mid A(t) = 1,\ 1 \le t \le \tau\}. \tag{1.9}$$

Since the conditioning event is $\{Z \ge \tau\}$ and $A(\tau+1) = 1$ iff $Z \ge \tau + 1$, $L(\tau)$ can be expressed as

$$L(\tau) = \frac{\Pr\{Z \ge \tau+1\}}{\Pr\{Z \ge \tau\}} = 1 - \frac{\Pr\{Z = \tau\}}{\Pr\{Z \ge \tau\}}. \tag{1.10}$$

Let us first compute $L(\tau)$ for light tails, in particular, distributions with asymptotically exponential tails $\Pr\{Z > x\} \sim c_1 e^{-c_2 x}$, where $c_1, c_2 > 0$ are constants. The second term in Eq. (1.10) is computed by

$$\frac{\Pr\{Z = \tau\}}{\Pr\{Z \ge \tau\}} \sim \frac{c_1 e^{-c_2 \tau} - c_1 e^{-c_2(\tau+1)}}{c_1 e^{-c_2 \tau}} = 1 - e^{-c_2}$$

for large $\tau$, and we get $L(\tau) \sim e^{-c_2}$. Thus for exponentially light tails, prediction is not enhanced by conditioning on ever longer periods of observed activity. For heavy tails, the corresponding derivations are

$$\frac{\Pr\{Z = \tau\}}{\Pr\{Z \ge \tau\}} \sim \frac{c\tau^{-\alpha} - c(\tau+1)^{-\alpha}}{c\tau^{-\alpha}} = 1 - \left(\frac{\tau}{\tau+1}\right)^{\alpha},$$

which yields $L(\tau) \sim (\tau/(\tau+1))^{\alpha}$, that is,

$$L(\tau) \nearrow 1, \quad \tau \to \infty. \tag{1.11}$$

Thus the longer the period of observed activity, the more certain that it will persist into the future. In fact, it is straightforward to generalize Eq. (1.9) so that we can measure the persistence of activity $d \ge 1$ time units into the future, that is,

$$L(\tau) = \Pr\{A(\tau+s) = 1,\ 1 \le s \le d \mid A(t) = 1,\ 1 \le t \le \tau\}.$$

This does not change the qualitative results: for the light-tailed case, $L(\tau) \sim e^{-c_2 d}$; for the heavy-tailed case, $L(\tau)$'s asymptotic behavior follows $(1 + d/\tau)^{-\alpha} \nearrow 1$. Since $(1 + d/\tau)^{-\alpha} \sim e^{-\alpha d/\tau}$, we observe that in both cases predictability is exponentially sensitive to the prediction interval $d$. However, in the heavy-tailed case, for any desired $d$ time unit "peek into the future," by conditioning the prediction on a sufficiently long past observation of activity, the prediction error can be reduced to an arbitrarily small level.
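The contrast between the two cases can be tabulated with a few lines of Python. The sketch below evaluates the persistence probability as the ratio of tail probabilities, $\Pr\{Z \ge \tau + d\}/\Pr\{Z \ge \tau\}$, treating the asymptotic tail forms as if they held exactly; that simplification, the function names, and the constants ($c_1 = c = 1$, $c_2 = 0.5$, $\alpha = 1.2$) are assumptions made for illustration.

```python
import math


def persistence(tail, tau, d=1):
    """L(tau) = Pr{Z >= tau + d} / Pr{Z >= tau} for a given tail function."""
    return tail(tau + d) / tail(tau)


def exponential_tail(c1, c2):
    """Light tail: x -> c1 * exp(-c2 * x)."""
    return lambda x: c1 * math.exp(-c2 * x)


def power_law_tail(c, alpha):
    """Heavy tail: x -> c * x^(-alpha)."""
    return lambda x: c * x ** (-alpha)


light = exponential_tail(c1=1.0, c2=0.5)
heavy = power_law_tail(c=1.0, alpha=1.2)

# Columns: observed activity tau, light-tailed L(tau), heavy-tailed L(tau).
for tau in (1, 10, 100, 1000):
    print(tau, persistence(light, tau), persistence(heavy, tau))
```

In the light-tailed column the value stays at $e^{-c_2}$ regardless of $\tau$, whereas the heavy-tailed column equals $(1 + 1/\tau)^{-\alpha}$ and climbs toward 1 as $\tau$ grows.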

We remark that the mathematical implications of asymptotic analysis need not detract from the practical relevance of its conclusions, even considering the fact that tails are always finite in a physical network environment. First, if heavy tails are modeled using the Pareto distribution, then its shape is hyperbolic across its entire range, not just asymptotically, and accurate finitary computations can be carried out. Second, given an empirical distribution with finite support, the fact that it has a finite cutoff point will not significantly influence the predictability computations carried out in practice as long as the tail is "sufficiently" long (for example, several orders of magnitude beyond the mean). As with time series, the identification problem of whether an empirical distribution is best modeled by heavy-tailed or light-tailed distributions is intrinsically ill-posed and secondary to the fact that the predictability structure as computed by Eq. (1.10) from empirical distributions is significant.

1.4.2.3 Heavy Tails and Long-Range Dependence As we saw in the previous section, heavy tails lead to predictability, and for a related reason, they lead to long-range dependence in network traffic. First, we give a definition of fractional Brownian motion (FBM) and its increment process, fractional Gaussian noise (FGN), which are Gaussian self-similar processes with, in general, long-range dependence, first introduced by Mandelbrot [45]. Their Gaussian structure renders them especially useful as aggregate traffic models where aggregation of independent traffic sources leads, by the central limit theorem, to the Gaussian property. In practice, of course, traffic flows need not be independent if they engage in feedback control and share common resources at bottleneck routers. The definitions of FBM and FGN are couched in the framework of distributional self-similarity given in Section 1.4.1.2.

Definition 1.4.3 (FBM) $Y(t)$, $t \in \mathbb{R}$, is called fractional Brownian motion with parameter $H$, $0 < H < 1$, if $Y(t)$ is Gaussian and $H$-sssi.

Definition 1.4.4 (FGN) $X(t)$, $t \in \mathbb{Z}_+$, is called fractional Gaussian noise with parameter $H$ if $X(t)$ is the increment process of FBM with parameter $H$.

By the definition of $H$-sssi, FBM reduces to Brownian motion, and FGN to white Gaussian noise, when $H = 1/2$. In that case $X(t)$, $t \in \mathbb{Z}_+$, becomes completely uncorrelated. Since Gaussian processes are characterized by their second-order structure, for each $H$, $0 < H < 1$, there is a unique Gaussian process that is the