1.4.1 Self-Similar Processes and Long-Range Dependence
1.4.1.1 Second-Order Self-Similarity and Stationarity Consider a discrete-time stochastic process or time series $X(t)$, $t \in \mathbb{Z}$, where $X(t)$ is interpreted as the traffic volume, measured in packets, bytes, or bits, at time instance $t$. Of interest is also the interpretation that $X(t)$ is the total traffic volume up to time $t$, say, from time 0. To minimize confusion, when a "cumulative" view is taken, we will denote the process by $Y(t)$. We will then reserve $X(t)$ to be the increment process corresponding to $Y(t)$, that is, $X(t) = Y(t) - Y(t-1)$.
For traffic modeling purposes, we would like $X(t)$ to be "stationary" in the sense that its behavior or structure is invariant with respect to shifts in time. In other words, $t$'s responsibility as an absolute reference frame is relieved. Without some form of stationarity, "anything" is allowed and a model loses much of its usefulness as a compact description of (assumed) tractable phenomena. $X(t)$ is strictly stationary if $(X(t_1), X(t_2), \ldots, X(t_n))$ and $(X(t_1+k), X(t_2+k), \ldots, X(t_n+k))$ possess the same joint distribution for all $n \in \mathbb{Z}_+$, $t_1, \ldots, t_n, k \in \mathbb{Z}$. Denoting the $k$-shifted process or time series by $X_k$, $X$ and $X_k$ are said to be equivalent in the sense of finite-dimensional distributions, $X \stackrel{d}{=} X_k$. Imposing strict stationarity, it turns out, is too restrictive, and we will be interested in a weaker form of stationarity, second-order stationarity^7, which requires that the autocovariance function

$$\gamma(r,s) = E\left[(X(r)-\mu)(X(s)-\mu)\right]$$

satisfies translation invariance, that is, $\gamma(r,s) = \gamma(r+k, s+k)$ for all $r, s, k \in \mathbb{Z}$. The first two moments are assumed to exist and be finite, and we set $\mu = E[X(t)]$, $\sigma^2 = E[(X(t)-\mu)^2]$ for all $t \in \mathbb{Z}$. We will also assume $\mu = 0$. Since, by stationarity, $\gamma(r,s) = \gamma(r-s, 0)$, we denote the autocovariance by $\gamma(k)$.

^7 Equivalent names are weak, covariance, and wide-sense stationarity.
To formulate scale invariance, first define the aggregated process $X^{(m)}$ of $X$ at aggregation level $m$,

$$X^{(m)}(i) = \frac{1}{m} \sum_{t=m(i-1)+1}^{mi} X(t).$$

That is, $X(t)$ is partitioned into nonoverlapping blocks of size $m$, their values are averaged, and $i$ is used to index these blocks. Let $\gamma^{(m)}(k)$ denote the autocovariance function of $X^{(m)}$. Under the assumption of second-order stationarity we arrive at the following definitions of second-order self-similarity.
Definition 1.4.1 (Second-Order Self-Similarity) $X(t)$ is exactly second-order self-similar with Hurst parameter $H$ ($1/2 < H < 1$) if

$$\gamma(k) = \frac{\sigma^2}{2}\left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right) \tag{1.1}$$

for all $k \ge 1$. $X(t)$ is asymptotically second-order self-similar if

$$\lim_{m \to \infty} \gamma^{(m)}(k) = \frac{\sigma^2}{2}\left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right). \tag{1.2}$$

It can be checked that Eq. (1.1) implies $\gamma(k) = \gamma^{(m)}(k)$ for all $m \ge 1$. Thus, second-order self-similarity captures the property that the correlation structure is exactly (condition (1.1)) or asymptotically (the weaker condition (1.2)) preserved under time aggregation. The form of $\gamma(k) = \left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right)\sigma^2/2$ is not accidental and implies further structure, namely long-range dependence, to which we will return later. Second-order self-similarity (in the exact or asymptotic sense) has been a dominant framework for modeling network traffic, and this is also reflected in the chapters of this book.
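The preservation of correlation structure under aggregation can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the comparison is made on the autocorrelation $\gamma(k)/\gamma(0)$, since block averaging as defined above rescales the variance by $m^{2H-2}$ (see the discussion of $\mathrm{var}(X^{(m)})$ later in this section). The covariance of two block means is obtained by summing $\gamma$ over all pairs of samples, one from each block.

```python
def autocov(k, hurst, sigma2=1.0):
    """Autocovariance of Eq. (1.1), extended to lag 0 and negative lags by symmetry."""
    k = abs(k)
    e = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def aggregated_autocov(k, m, hurst, sigma2=1.0):
    """Autocovariance at lag k of the block-averaged series X^(m).

    Two blocks that are k blocks apart contain samples whose time
    difference is k*m + i - j for i, j = 0, ..., m-1.
    """
    total = 0.0
    for i in range(m):
        for j in range(m):
            total += autocov(k * m + i - j, hurst, sigma2)
    return total / (m * m)


def autocorr(k, hurst):
    """Autocorrelation r(k) = gamma(k) / gamma(0) of the original series."""
    return autocov(k, hurst) / autocov(0, hurst)


def aggregated_autocorr(k, m, hurst):
    """Autocorrelation of the aggregated series at aggregation level m."""
    return aggregated_autocov(k, m, hurst) / aggregated_autocov(0, m, hurst)


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, autocorr(3, 0.8), aggregated_autocorr(3, m, 0.8))
```

For a fixed lag, the two printed autocorrelations should agree up to rounding for every aggregation level $m$, which is the invariance that condition (1.1) encodes.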
1.4.1.2 An Allegory into Distributional Self-Similarity To understand the particular form of $\gamma(k)$ in the definition of second-order self-similarity, we will make a short detour and discuss self-similar processes in slightly more generality. Further extensions and detailed treatments can be found in Beran [9] and Samorodnitsky and Taqqu [60].
Consider the cumulative process $Y(t)$, albeit in continuous time $t \in \mathbb{R}$. Following is a definition of self-similarity for continuous-time processes in the sense of finite-dimensional distributions.
Definition 1.4.2 ($H$-ss) $Y(t)$ is self-similar with self-similarity parameter, that is, Hurst parameter, $H$ ($0 < H < 1$), denoted $H$-ss, if for all $a > 0$ and $t \ge 0$,

$$Y(t) \stackrel{d}{=} a^{-H} Y(at). \tag{1.3}$$

Thus $Y(t)$ and its time-scaled version $Y(at)$, after normalizing by $a^{-H}$, must follow the same distribution. In the traffic modeling context, it is convenient to think of $Y(t)$ as the cumulative or total traffic up to time $t$. For $a > 1$ (time is stretched or dilated) a contraction factor $a^{-H}$ is applied to make the magnitude of $Y(at)$ comparable to that of $Y(t)$. For $a < 1$, the opposite holds true. As $a$ varies, the scaling exponent $H$ remains invariant. This is a most natural definition; however, it has an important drawback: unless $Y(t)$ is degenerate, that is, $Y(t) = 0$ for all $t \in \mathbb{R}$, $Y(t)$ cannot be stationary due to the normalization factor $a^{-H}$. Its increment process $X(t) = Y(t) - Y(t-1)$, however, is another matter. In particular, consider the case where $Y(t)$ is $H$-ss and has stationary increments; in this case we say $Y(t)$ is $H$-sssi. Let us further assume that $Y(t)$ has finite variance. It can be checked that $E[Y(t)] = 0$, $E[Y^2(t)] = \sigma^2 |t|^{2H}$, and

$$\gamma(t,s) = \frac{\sigma^2}{2}\left(|t|^{2H} - |t-s|^{2H} + |s|^{2H}\right). \tag{1.4}$$

This is achieved by noting that^8

$$Y(t) \stackrel{d}{=} t^{H} Y(1),$$

from which it follows that $E[Y^2(t)] = \sigma^2 t^{2H}$. The latter, then, can be used in the derivation of the autocovariance function (1.4). The increment process $X(t)$ has mean 0 and autocovariance $\gamma(k)$ as given in Eq. (1.1). The derivation is similar to that of $Y(t)$.
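For readers who want the intermediate steps, the following is one way to carry out both derivations. It is a sketch rather than the authors' own argument, and it assumes $Y(0) = 0$ (which follows from the $H$-ss property with $H > 0$) together with zero mean and stationary increments. The first part gives the covariance of $Y(t)$, the second the autocovariance of the increments at lag $k \ge 1$.

```latex
\begin{align*}
E\,[Y(t)-Y(s)]^2 &= E\,Y^2(t-s) = \sigma^2 |t-s|^{2H}
  && \text{(stationary increments, } Y(0)=0\text{)} \\
E\,[Y(t)Y(s)]
  &= \tfrac{1}{2}\Bigl( E\,Y^2(t) + E\,Y^2(s) - E\,[Y(t)-Y(s)]^2 \Bigr) \\
  &= \frac{\sigma^2}{2}\Bigl( |t|^{2H} - |t-s|^{2H} + |s|^{2H} \Bigr)
   = \gamma(t,s). \\[1ex]
E\,[X(t+k)X(t)]
  &= \gamma(t+k,t) - \gamma(t+k,t-1) - \gamma(t+k-1,t) + \gamma(t+k-1,t-1) \\
  &= \frac{\sigma^2}{2}\Bigl( (k+1)^{2H} - 2k^{2H} + (k-1)^{2H} \Bigr).
\end{align*}
```

In the last step all terms involving $|t+k|$, $|t+k-1|$, $|t|$, and $|t-1|$ cancel in pairs, which is why the result depends on the lag $k$ only.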
How does distributional self-similarity (of a continuous-time process) tie in with second-order self-similarity (of a discrete-time process), which requires exact or asymptotic invariance with respect to second-order statistical structure of the aggregated time series $X^{(m)}$? A key observation lies in noting that $X^{(m)}$ can be viewed as computing a sample mean

$$X^{(m)} = \frac{1}{m}\sum_{t=1}^{m} X(t) = m^{-1}\left(Y(m) - Y(0)\right) \stackrel{d}{=} m^{-1} m^{H} \left(Y(1) - Y(0)\right) = m^{H-1} X.$$

Thus, if $Y(t)$ is an $H$-sssi process then its increment process $X(t)$ satisfies

$$X \stackrel{d}{=} m^{1-H} X^{(m)}, \tag{1.5}$$

which shows how $X^{(m)}$ is related to $X$ via a simple scaling relationship involving $H$ in the sense of finite-dimensional distributions. Equations (1.1) and (1.2), then, express the fact that $X$ and $m^{1-H} X^{(m)}$ are required to have exactly or asymptotically the same second-order structure. As a result, depending on whether a discrete-time process $X(t)$ satisfies Eq. (1.5) for all $m \ge 1$ or only in the limit as $m \to \infty$, $X(t)$ is said to be exactly self-similar or asymptotically self-similar. Note that in the Gaussian case, this definition coincides with second-order self-similarity.

^8 From $a^{H} Y(t) \stackrel{d}{=} Y(at)$, substitute $t = 1$ and $a = t$.

As a lead-in to the role of the parameter $H$, recall that the variance of the sample mean $\bar{Z}$ of a random variable $Z$ satisfies $\mathrm{var}(\bar{Z}) = \sigma_Z^2/m$, where $m$ is the sample size. From Eq. (1.5) it follows that $\mathrm{var}(X^{(m)}) = \sigma^2 m^{2H-2}$. When viewed as a sample mean where the samples are drawn independently, $\mathrm{var}(X^{(m)})$ reduces to $\sigma^2 m^{-1}$ if $H = 1/2$. If $H \ne 1/2$, in particular, $1/2 < H < 1$, then

$$\mathrm{var}(X^{(m)}) = \sigma^2 m^{-\beta}$$

with $0 < \beta < 1$ (and $H = 1 - \beta/2$), which hints at a certain (and not just any) dependency structure in the "samples" (i.e., time series in our case) that causes $\mathrm{var}(X^{(m)})$ to converge to zero slower than the rate $m^{-1}$.
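The variance scaling can be verified directly from the autocovariance (1.1), without simulation. The sketch below is illustrative only and its function names are hypothetical. It computes the variance of the mean of $m$ consecutive samples as $m^{-2}\sum_{i,j}\gamma(i-j)$ and then reads $\beta$ off the log-log slope between two aggregation levels, recovering $H = 1 - \beta/2$.

```python
import math


def autocov(k, hurst, sigma2=1.0):
    """Autocovariance of Eq. (1.1), extended to lag 0 and negative lags by symmetry."""
    k = abs(k)
    e = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def var_block_mean(m, hurst, sigma2=1.0):
    """Variance of the mean of m consecutive samples.

    Uses (1/m^2) * (m*gamma(0) + 2*sum_{k=1}^{m-1} (m-k)*gamma(k)),
    which is the double sum of gamma(i-j) grouped by lag.
    """
    total = m * autocov(0, hurst, sigma2)
    for k in range(1, m):
        total += 2.0 * (m - k) * autocov(k, hurst, sigma2)
    return total / (m * m)


def hurst_from_variance_decay(m1, v1, m2, v2):
    """Estimate H from two (m, var) points via the slope -beta, H = 1 - beta/2."""
    beta = -(math.log(v2) - math.log(v1)) / (math.log(m2) - math.log(m1))
    return 1.0 - beta / 2.0


if __name__ == "__main__":
    for h in (0.5, 0.7, 0.9):
        v10, v100 = var_block_mean(10, h), var_block_mean(100, h)
        print(h, v10, v100, hurst_from_variance_decay(10, v10, 100, v100))
```

With measured traffic the variances would be estimated from data rather than computed from $\gamma(k)$, so the recovered slope would only be approximate.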
1.4.1.3 Long-Range Dependence Thus far we have focused on explicating the role of self-similarity in the second-order stationary and distributional senses with little regard to the role of $H$ and its range of values. Let us return to the definition of second-order self-similarity and its autocovariance $\gamma(k)$. Let $r(k) = \gamma(k)/\sigma^2$ denote the autocorrelation function. For $0 < H < 1$, $H \ne 1/2$, it holds that

$$r(k) \sim H(2H-1)\,k^{2H-2}, \quad k \to \infty. \tag{1.6}$$

In particular, if $1/2 < H < 1$, $r(k)$ asymptotically behaves as $ck^{-\beta}$ for $0 < \beta < 1$, where $c > 0$ is a constant, $\beta = 2 - 2H$, and we have

$$\sum_{k=-\infty}^{\infty} r(k) = \infty. \tag{1.7}$$

That is, the autocorrelation function decays slowly (that is, hyperbolically), which is the essential property that causes it to be not summable. When $r(k)$ decays hyperbolically such that condition (1.7) holds, we call the corresponding stationary process $X(t)$ long-range dependent. $X(t)$ is short-range dependent if the autocorrelation function is summable^9. An essentially equivalent definition can be given in the frequency domain where the spectral density $\Gamma(\nu) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} r(k) e^{ik\nu}$ is required to satisfy the property

$$\Gamma(\nu) \sim c|\nu|^{-\alpha}, \quad \nu \to 0.$$

Here $c > 0$ is a constant and $0 < \alpha = 2H - 1 < 1$. Thus $\Gamma(\nu)$ diverges around the origin, implying ever larger contributions by low-frequency components.

^9 Technically, more subtle definitions of long-range dependence are possible, but in this book, we will
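The asymptotic relation (1.6) is easy to inspect numerically. The following sketch, with hypothetical function names, evaluates the exact autocorrelation of Eq. (1.1) (with $\sigma^2 = 1$) next to the hyperbolic approximation $H(2H-1)k^{2H-2}$ and reports their ratio, which should approach 1 as the lag grows.

```python
def autocorr(k, hurst):
    """Exact autocorrelation r(k), k >= 1, from Eq. (1.1) divided by sigma^2."""
    e = 2.0 * hurst
    return 0.5 * ((k + 1) ** e - 2.0 * k ** e + (k - 1) ** e)


def autocorr_asymptote(k, hurst):
    """Hyperbolic approximation H(2H - 1) k^(2H - 2) of Eq. (1.6)."""
    return hurst * (2.0 * hurst - 1.0) * k ** (2.0 * hurst - 2.0)


def asymptote_ratio(k, hurst):
    """Ratio exact / approximate; close to 1 for large k when H != 1/2."""
    return autocorr(k, hurst) / autocorr_asymptote(k, hurst)


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, autocorr(k, 0.8), autocorr_asymptote(k, 0.8), asymptote_ratio(k, 0.8))
```

The approximation is the leading term of a second difference of $k^{2H}$, so the relative error shrinks on the order of $1/k^2$.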
Following are some simple facts regarding the value of $H$ and its impact on $r(k)$. First, if $H = 1/2$, then $r(k) = 0$ for all $k \ge 1$, and $X(t)$ is trivially short-range dependent by virtue of being completely uncorrelated. In the case where $0 < H < 1/2$, we have $\sum_{k=-\infty}^{\infty} r(k) = 0$, an artificial condition rarely encountered in applications. $H = 1$ is uninteresting since it leads to the degenerate situation $r(k) = 1$ for all $k \ge 1$. Finally, $H$-values bigger than 1 are prohibited due to the stationarity condition on $X(t)$.
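These regimes can be illustrated with partial sums of the autocorrelation. The sketch below is an illustration with hypothetical function names: it sums $r(k)$ over lags $-n, \ldots, n$ and compares the result with the closed form $(n+1)^{2H} - n^{2H}$, which follows from telescoping the second differences in Eq. (1.1). The sum tends to 0 for $H < 1/2$, equals 1 for $H = 1/2$, and grows without bound for $H > 1/2$.

```python
def autocorr(k, hurst):
    """Autocorrelation r(k) from Eq. (1.1) with sigma^2 = 1, symmetric in k."""
    k = abs(k)
    e = 2.0 * hurst
    return 0.5 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def two_sided_partial_sum(n, hurst):
    """Sum of r(k) over k = -n, ..., n, using r(0) = 1 and r(-k) = r(k)."""
    return 1.0 + 2.0 * sum(autocorr(k, hurst) for k in range(1, n + 1))


def two_sided_closed_form(n, hurst):
    """Telescoped value of the same sum: (n + 1)^(2H) - n^(2H)."""
    e = 2.0 * hurst
    return (n + 1) ** e - n ** e


if __name__ == "__main__":
    for h in (0.3, 0.5, 0.8):
        for n in (10, 1000, 10000):
            print(h, n, two_sided_partial_sum(n, h), two_sided_closed_form(n, h))
```

The closed form also makes the growth rate explicit: for $1/2 < H < 1$ the partial sum behaves like $2H\,n^{2H-1}$, which is the divergence stated in condition (1.7).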
1.4.1.4 Self-Similarity Versus Long-Range Dependence The preceding discussion indicates that there are self-similar processes that are not long-range dependent, and vice versa. For example, Brownian motion is $\tfrac{1}{2}$-sssi with white Gaussian noise as its increment process, but the latter is not long-range dependent. Conversely, certain fractional ARIMA time series generate long-range dependence but they are not self-similar in the distributional sense. In the case of asymptotic second-order self-similarity, however, by the restriction $1/2 < H < 1$ in the definition, self-similarity implies long-range dependence, and vice versa. It is for this reason, and the fact that asymptotic second-order self-similar processes are employed as "canonical" traffic models, that we sometimes use self-similarity and long-range dependence interchangeably when the context does not lead to confusion.
1.4.2 Impact of Heavy Tails
1.4.2.1 Heavy-Tailed Distribution There is an intimate relationship between heavy-tailed distributions and long-range dependence, which we will discuss in the next sections. First, a few definitions and basic facts. A random variable $Z$ has a heavy-tailed distribution if

$$\Pr\{Z > x\} \sim c\,x^{-\alpha}, \quad x \to \infty, \tag{1.8}$$

where $0 < \alpha < 2$ is called the tail index or shape parameter and $c$ is a positive constant^10. That is, the tail of the distribution, asymptotically, decays hyperbolically. This is in contrast to light-tailed distributions (for example, exponential and Gaussian), which possess an exponentially decreasing tail. A distinguishing mark of heavy-tailed distributions is that they have infinite variance for $0 < \alpha < 2$, and if $0 < \alpha \le 1$, they also have an unbounded mean. In the networking context, we will be primarily interested in the case $1 < \alpha < 2$. A frequently used heavy-tailed distribution is the Pareto distribution whose distribution function is given by

$$\Pr\{Z \le x\} = 1 - \left(\frac{b}{x}\right)^{\alpha}, \quad b \le x,$$

where $0 < \alpha < 2$ is the shape parameter and $b$ is called the location parameter. For $\alpha > 1$ the mean is given by $\alpha b/(\alpha - 1)$. We remark that there are distributions (for example, Weibull and lognormal) that have subexponentially decreasing tails but possess finite variance.

^10 Technically, more subtle definitions involving slowly varying functions are possible and can be found in some chapters of this book. However, for practical purposes and to convey the main ideas, our working definition, centered around condition (1.8), will suffice.
The main characteristic of a random variable obeying a heavy-tailed distribution is that it exhibits extreme variability. Practically speaking, a heavy-tailed distribution gives rise to very large values with nonnegligible probability so that sampling from such a distribution results in the bulk of values being "small" but a few samples having "very" large values. Not surprisingly, heavy-tailedness impacts sampling by slowing down the convergence rate of the sample mean to the population mean, dilating it as the tail index $\alpha$ approaches 1. For example, depending on the sample size $m$, the sample mean $\bar{Z}_m$ of a Pareto distributed random variable $Z$ may significantly deviate from the population mean $\alpha b/(\alpha - 1)$, oftentimes underestimating it. In fact, the absolute estimation error $|\bar{Z}_m - E(Z)|$ asymptotically behaves as $m^{(1/\alpha)-1}$ (see, e.g., Crovella and Lipsky [15]), and thus for $\alpha$ values close to 1, care must be taken when sampling from heavy-tailed distributions such that conclusions about network behavior and performance attributable to sampling error are not advanced. A more detailed discussion of sampling issues is given in Chapter 3.
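To experiment with this slow convergence, Pareto samples can be drawn by inverting the distribution function given above. The following is a minimal sketch; the function names and the parameter values in the usage loop are assumptions chosen for illustration, not values taken from the text.

```python
import math
import random


def pareto_cdf(x, alpha, b):
    """Pr{Z <= x} = 1 - (b/x)^alpha for x >= b, and 0 below the location b."""
    return 0.0 if x < b else 1.0 - (b / x) ** alpha


def pareto_quantile(u, alpha, b):
    """Inverse of the distribution function for 0 <= u < 1."""
    return b / (1.0 - u) ** (1.0 / alpha)


def pareto_mean(alpha, b):
    """Population mean alpha*b/(alpha - 1); unbounded when alpha <= 1."""
    if alpha <= 1.0:
        return math.inf
    return alpha * b / (alpha - 1.0)


def pareto_sample_mean(m, alpha, b, rng):
    """Mean of m independent Pareto draws obtained by inverse-transform sampling."""
    return sum(pareto_quantile(rng.random(), alpha, b) for _ in range(m)) / m


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed only to make a run repeatable
    alpha, b = 1.2, 1.0         # illustrative values with 1 < alpha < 2
    print("population mean:", pareto_mean(alpha, b))
    for m in (10**2, 10**4, 10**6):
        print(m, pareto_sample_mean(m, alpha, b, rng))
```

Rerunning with different seeds and with $\alpha$ closer to 1 gives a feel for how erratic the sample mean remains even at large $m$.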
1.4.2.2 Heavy Tails and Predictability Heavy-tailedness of certain network-related variables (for example, file sizes and connection durations) can be shown to underlie the root cause of long-range dependence and self-similarity in network traffic. First, let us examine a simple fact on the intrinsic predictability associated with heavy-tailed random variables. Let $Z$ be a heavy-tailed random variable interpreted as the duration or lifetime of a network connection (e.g., TCP connection, IP flow, or session). Since connection durations are physically measurable events, assume that we observe, in time, that a connection has been active for $\tau > 0$ seconds. To simplify the discussion, assume time is discrete ($t \in \mathbb{Z}_+$) and $A: \mathbb{Z}_+ \to \{0,1\}$ is an indicator function such that $A(t) = 1$ iff $Z \ge t$. We are interested in the probability that the connection will persist into the future given that it has been active for $\tau$ seconds. That is, we would like to estimate the conditional probability

$$L(t) = \Pr\{A(t+1) = 1 \mid A(\tau) = 1,\ 1 \le \tau \le t\}. \tag{1.9}$$

$L(t)$ can be expressed as

$$L(t) = \frac{\Pr\{Z \ge t+1\}}{\Pr\{Z \ge t\}} = 1 - \frac{\Pr\{Z = t\}}{\Pr\{Z \ge t\}}. \tag{1.10}$$

Let us first compute $L(t)$ for light tails, in particular, distributions with asymptotically exponential tails $\Pr\{Z > x\} \sim c_1 e^{-c_2 x}$, where $c_1, c_2 > 0$ are constants. The second term in Eq. (1.10) is computed by

$$\frac{\Pr\{Z = t\}}{\Pr\{Z \ge t\}} \sim \frac{c_1 e^{-c_2 t} - c_1 e^{-c_2 (t+1)}}{c_1 e^{-c_2 t}} = 1 - e^{-c_2}$$

for large $t$, and we get $L(t) \sim e^{-c_2}$. Thus for exponentially light tails, prediction is not enhanced by conditioning on ever longer periods of observed activity. For heavy tails, the corresponding derivations are

$$\frac{\Pr\{Z = t\}}{\Pr\{Z \ge t\}} \sim \frac{c\,t^{-\alpha} - c\,(t+1)^{-\alpha}}{c\,t^{-\alpha}} = 1 - \left(\frac{t}{t+1}\right)^{\alpha},$$

which yields

$$L(t) \sim \left(\frac{t}{t+1}\right)^{\alpha} \to 1, \quad t \to \infty. \tag{1.11}$$
Thus the longer the period of observed activity, the more certain that it will persist into the future. In fact, it is straightforward to generalize Eq. (1.9) so that we can measure the persistence of activity $d \ge 1$ time units into the future, that is,

$$L(t) = \Pr\{A(t+s) = 1,\ 1 \le s \le d \mid A(\tau) = 1,\ 1 \le \tau \le t\}.$$

This does not change the qualitative results: for the light-tailed case, $L(t) \sim e^{-c_2 d}$; for the heavy-tailed case, $L(t)$'s asymptotic behavior follows $(1 + d/t)^{-\alpha} \to 1$. Since $(1 + d/t)^{-\alpha} \approx e^{-\alpha d/t}$, we observe that in both cases predictability is exponentially sensitive to the prediction interval $d$. However, in the heavy-tailed case, for any desired $d$ time unit "peek into the future," by conditioning the prediction on a sufficiently long past observation of activity, the prediction error can be reduced to an arbitrarily small level.
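The contrast between the two cases can be tabulated in a few lines. The sketch below assumes, for simplicity, that the tail forms $c_1 e^{-c_2 x}$ and $c\,x^{-\alpha}$ hold exactly rather than only asymptotically, so that the constants cancel in the ratio $\Pr\{Z \ge t+d\}/\Pr\{Z \ge t\}$. The function names and parameter values are hypothetical.

```python
import math


def persistence_exponential(t, d, c2):
    """Pr{Z >= t + d | Z >= t} for an exponential tail c1*exp(-c2*x); c1 cancels."""
    return math.exp(-c2 * (t + d)) / math.exp(-c2 * t)


def persistence_power_law(t, d, alpha):
    """Pr{Z >= t + d | Z >= t} for a power-law tail c*x^(-alpha); c cancels.

    Equal to (1 + d/t)^(-alpha), written here as (t/(t+d))^alpha.
    """
    return (t / (t + d)) ** alpha


if __name__ == "__main__":
    d, c2, alpha = 5, 0.1, 1.2  # illustrative values only
    for t in (1, 10, 100, 1000, 10000):
        print(t, persistence_exponential(t, d, c2), persistence_power_law(t, d, alpha))
```

The exponential column stays at $e^{-c_2 d}$ regardless of $t$, while the power-law column climbs toward 1 as the observed active period $t$ lengthens.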
We remark that the mathematical implications of asymptotic analysis need not deter from the practical relevance of its conclusions, even considering the fact that tails are always finite in a physical network environment. First, if heavy tails are modeled using the Pareto distribution, then its shape is hyperbolic across its entire range, not just asymptotically, and accurate finitary computations can be carried out. Second, given an empirical distribution with finite support, the fact that it has a finite cutoff point will not significantly influence the predictability computations carried out in practice as long as the tail is "sufficiently" long, for example, several orders of magnitude beyond the mean. As with time series, the identification problem of whether an empirical distribution is best modeled by heavy-tailed or light-tailed distributions is intrinsically ill-posed and secondary to the fact that the predictability structure as computed by Eq. (1.10) from empirical distributions is significant.
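Computing the predictability structure from an empirical distribution amounts to counting. The following sketch estimates $\Pr\{Z \ge t+d \mid Z \ge t\}$ from a list of observed durations. The function name and the toy data are hypothetical, and real traces would need enough observations beyond $t$ for the ratio to be meaningful.

```python
def empirical_persistence(durations, t, d=1):
    """Estimate Pr{Z >= t + d | Z >= t} from observed durations.

    Returns None when no observation reached t, since the conditional
    probability is then undefined for this sample.
    """
    alive_at_t = sum(1 for z in durations if z >= t)
    if alive_at_t == 0:
        return None
    alive_later = sum(1 for z in durations if z >= t + d)
    return alive_later / alive_at_t


if __name__ == "__main__":
    observed = [1, 1, 2, 3, 10, 50]  # toy connection durations in time units
    for t in (1, 2, 3, 10):
        print(t, empirical_persistence(observed, t))
```

With $d = 1$ this is the empirical counterpart of the ratio form of $L(t)$; larger $d$ gives the generalized persistence discussed above.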
1.4.2.3 Heavy Tails and Long-Range Dependence As we saw in the previous section, heavy tails lead to predictability, and for a related reason, they lead to long-range dependence in network traffic. First, we give a definition of fractional Brownian motion (FBM) and its increment process, fractional Gaussian noise (FGN), which are Gaussian self-similar processes with, in general, long-range dependence, first introduced by Mandelbrot [45]. Their Gaussian structure renders them especially useful as aggregate traffic models where aggregation of independent traffic sources leads, by the central limit theorem, to the Gaussian property. In practice, of course, traffic flows need not be independent if they engage in feedback control and share common resources at bottleneck routers. The definitions of FBM and FGN are couched in the framework of distributional self-similarity given in Section 1.4.1.2.

Definition 1.4.3 (FBM) $Y(t)$, $t \in \mathbb{R}$, is called fractional Brownian motion with parameter $H$, $0 < H < 1$, if $Y(t)$ is Gaussian and $H$-sssi.

Definition 1.4.4 (FGN) $X(t)$, $t \in \mathbb{Z}$, is called fractional Gaussian noise with parameter $H$ if $X(t)$ is the increment process of FBM with parameter $H$.

By the definition of $H$-sssi, FBM reduces to Brownian motion, and FGN to white Gaussian noise, when $H = 1/2$. Thus $X(t)$, $t \in \mathbb{Z}$, becomes completely uncorrelated. Since Gaussian processes are characterized by their second-order structure, for each $H$, $0 < H < 1$, there is a unique Gaussian process that is the
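Because a zero-mean Gaussian process is fixed by its covariance, a short FGN sample path can be generated by factoring the covariance matrix built from Eq. (1.1). The sketch below uses a plain Cholesky factorization. This is one simple choice among several generation methods, with cost cubic in the path length, and is not a method prescribed by the text. All function names are hypothetical.

```python
import math
import random


def fgn_autocov(k, hurst, sigma2=1.0):
    """Autocovariance of Eq. (1.1), extended to lag 0 and negative lags by symmetry."""
    k = abs(k)
    e = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def fgn_covariance(n, hurst, sigma2=1.0):
    """n-by-n covariance matrix with entries gamma(i - j)."""
    return [[fgn_autocov(i - j, hurst, sigma2) for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_fgn(n, hurst, rng, sigma2=1.0):
    """Draw X(1), ..., X(n) by multiplying L with independent standard normals."""
    low = cholesky(fgn_covariance(n, hurst, sigma2))
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][j] * white[j] for j in range(i + 1)) for i in range(n)]


def cumulate(increments):
    """FBM sampled at t = 0, 1, ..., n from its increments, with Y(0) = 0."""
    path = [0.0]
    for x in increments:
        path.append(path[-1] + x)
    return path


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only to make a run repeatable
    noise = sample_fgn(64, 0.8, rng)
    print(cumulate(noise)[:5])
```

For $H = 1/2$ the covariance matrix is the identity, the factor $L$ is the identity as well, and the generated noise is simply the white Gaussian input.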