Generation and validation of empirically-derived TCP application workloads


TCP APPLICATION WORKLOADS

Felix Hernandez-Campos

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.

Chapel Hill

2006

Approved by:

Advisor: Kevin Jeffay

Reader: F. Donelson Smith

Reader: Ketan Mayer-Patel

Reader: J. Steve Marron

Reader: Andrew Nobel


2006

Felix Hernandez-Campos

FELIX HERNANDEZ-CAMPOS: Generation and Validation of Empirically-Derived TCP Application Workloads.
(Under the direction of Kevin Jeffay)

This dissertation proposes and evaluates a new approach for generating realistic traffic in networking experiments. The main problem solved by our approach is generating closed-loop traffic consistent with the behavior of the entire set of applications in modern traffic mixes. Unlike earlier approaches, which described individual applications in terms of the specific semantics of each application, we describe the source behavior driving each connection in a generic manner using the a-b-t model. This model provides an intuitive but detailed way of describing source behavior in terms of connection vectors that capture the sizes and ordering of application data units, the quiet times between them, and whether data exchange is sequential or concurrent. This is consistent with the view of traffic from TCP, which does not concern itself with application semantics.
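As an illustration only, the description above can be sketched as a small data structure. The class and field names, the reading of a and b as the two directions of the connection and t as quiet time, and the example values are all assumptions made for this sketch; they are not the notation defined in the dissertation itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Epoch:
    """One exchange of application data units (ADUs) in a connection."""
    a_bytes: int    # ADU size in one direction (assumed: sent by the initiator)
    b_bytes: int    # ADU size in the other direction (assumed: sent by the acceptor)
    quiet_s: float  # quiet time following this exchange, in seconds


@dataclass
class ConnectionVector:
    """Hypothetical container for the source-level behavior of one TCP connection."""
    epochs: List[Epoch] = field(default_factory=list)
    concurrent: bool = False  # False: sequential exchange; True: concurrent

    def total_bytes(self) -> Tuple[int, int]:
        """Total ADU bytes carried in each direction."""
        return (sum(e.a_bytes for e in self.epochs),
                sum(e.b_bytes for e in self.epochs))

    def total_quiet(self) -> float:
        """Sum of all quiet times in the connection, in seconds."""
        return sum(e.quiet_s for e in self.epochs)


# Made-up example: two request/response exchanges separated by 0.5 s of quiet time.
example = ConnectionVector(epochs=[Epoch(340, 12000, 0.5), Epoch(400, 3000, 0.0)])
print(example.total_bytes(), example.total_quiet())
```

Nothing in this structure refers to HTTP, SMTP, or any other protocol: only sizes, ordering, and timing are recorded, which is the sense in which the description is generic.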

The a-b-t model also satisfies a crucial property: given a packet header trace collected from an arbitrary Internet link, we can algorithmically infer the source-level behavior driving each connection, and cast it into the notation of the model. The result of packet header processing is a collection of a-b-t connection vectors, which can then be replayed in software simulators and testbed experiments to drive network stacks. Such a replay generates synthetic traffic that fully preserves the feedback loop between the TCP endpoints and the state of the network, which is essential in experiments where network congestion can occur. By construction, this type of traffic generation is fully reproducible, providing a solid foundation for comparative empirical studies.

Our experimental work demonstrates the high quality of the generated traffic, by directly [...] connection, and their reproduction together with the corresponding source behavior. Our final contribution consists of two resampling methods for introducing controlled variability in network [...]

First of all, I must thank Kevin Jeffay and Don Smith for their guidance and encouragement throughout my doctoral program. Their patience and friendship have been invaluable all these years. I also thank them, together with other faculty and student members of the Distributed and Real-Time Systems group (DiRT), for building a phenomenal infrastructure for Internet measurement and experimental networking research. DiRT students have greatly contributed to my doctoral experience, most especially Jay Aikat and David Ott.

My committee members and other collaborators have contributed tremendously to my efforts. I am especially indebted to Steve Marron and Andrew Nobel, who have greatly enriched the statistical side of my work. In this regard, being part of SAMSI's "Network Modeling for the Internet" program and of the inter-disciplinary Internet study group at UNC gave me superb opportunities to widen my understanding of Internet research. I must also thank UNC's Department of Computer Science as a whole, including faculty, students and staff, for creating an outstanding research and teaching environment. Overall, my years at UNC were an incredibly positive experience.

I thank the National Science Foundation, IBM, Cisco, Intel, Sun Microsystems and others for supporting this work. I am especially grateful to the Computer Measurement Group (CMG) for their doctoral fellowship.

Finally, I thank my family for their support. Their constant example of hard work and their respect for intellectual endeavors have motivated me during my entire life. My wife's help with the editing of this manuscript was invaluable, as was her constant encouragement during my graduate studies. More than anybody else, my parents gave me my passion for knowledge, [...]

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 Introduction
1.1 Abstract Source-Level Modeling
1.2 Source-Level Trace Replay
1.3 Trace Resampling and Load Scaling
1.4 Thesis Statement
1.5 Contributions
1.6 Overview

2 Related Work
2.1 Packet-Level Traffic Generation
2.2 Source-Level Traffic Generation
2.2.1 Web Traffic Modeling
2.2.2 Non-Web Traffic Source-level Modeling
2.2.3 Beyond Single Application Modeling
2.3 Scaling Offered Load
2.4 Implementing Traffic Generation
2.5 Summary

[...]

3.1.2 Beyond Client/Server Applications
3.2 The Concurrent a-b-t Model
3.3 Abstract Source-Level Measurement
3.3.1 From TCP Sequence Numbers to Application Data Units
3.3.2 Logical Order of Data Segments
3.3.3 Data Analysis Algorithm
3.4 Validation using Synthetic Applications
3.5 Analysis Results
3.5.1 Variability Across Sites
3.5.2 Time-of-Day Variability and Workload Directionality
3.6 Summary

4 Network-Level Parameters and Metrics
4.1 Network-level Parameters
4.1.1 Round-Trip Time
4.1.2 Receiver Window Size
4.1.3 Loss Rate
4.2 Network-level Metrics
4.2.1 Aggregate Throughput Time Series
4.2.2 Throughput Marginals
4.2.3 Throughput Self-Similarity and Long-Range Dependence
4.2.4 Time Series of Active Connections
4.3 Summary

5 Generating Traffic
5.1 Replaying Traces at the Source-Level
5.1.1 Trace Partitioning
5.1.2 Conducting Experiments
[...]

5.2.1 Leipzig-II
5.2.2 UNC 1 PM
5.2.3 Abilene-I
5.3 Summary

6 Reproducing Traffic
6.1 Beyond Comparing Connection Vectors
6.2 Source-level Replay of Leipzig-II
6.2.1 Time Series of Byte Throughput
6.2.2 Time Series of Packet Throughput
6.2.3 Marginal Distributions
6.2.4 Long-Range Dependence
6.2.5 Time Series of Active Connections
6.3 Source-level Replay of UNC 1 PM
6.3.1 Time Series of Byte Throughput
6.3.2 Time Series of Packet Throughput
6.3.3 Marginal Distributions
6.3.4 Long-Range Dependence
6.3.5 Time Series of Active Connections
6.4 Mid-Chapter Review
6.4.1 Observations on Byte Throughput
6.4.2 Observations on Packet Throughput
6.4.3 Observations on Active Connections
6.5 Source-level Replay of UNC 1 AM
6.5.1 Time Series of Byte Throughput
6.5.2 Time Series of Packet Throughput
6.5.3 Marginal Distributions
[...]

6.6 Source-level Replay of UNC 7:30 PM
6.6.1 Time Series of Byte Throughput
6.6.2 Time Series of Packet Throughput
6.6.3 Marginal Distributions
6.6.4 Long-Range Dependence
6.6.5 Time Series of Active Connections
6.7 Source-level Replay of Abilene-I
6.7.1 Time Series of Byte Throughput
6.7.2 Time Series of Packet Throughput
6.7.3 Marginal Distributions
6.7.4 Long-Range Dependence
6.7.5 Time Series of Active Connections
6.8 Summary

7 Trace Resampling and Load Scaling
7.1 Poisson Resampling
7.1.1 Basic Poisson Resampling
7.1.2 Byte-Driven Poisson Resampling
7.2 Block Resampling
7.3 Summary

8 Conclusions and Future Work
8.1 Empirical Modeling of Traffic Mixes
8.2 Refining and Extending our Modeling
8.3 Assessing Realism in Synthetic Traffic
8.4 Incorporating Additional Network-Level Parameter
8.5 Flexible Traffic Generation

3.1 Breakdown of the TCP connections found in five traces.
4.1 Estimated Hurst parameters and their confidence intervals for the packet throughput time series of five traces.
4.2 Estimated Hurst parameters and their confidence intervals for the byte throughput time series of five traces.
6.1 Estimated Hurst parameters and their confidence intervals for the byte throughput time series of Leipzig-II and its four types of source-level trace replay.
6.2 Estimated Hurst parameters and their confidence intervals for the packet throughput time series of Leipzig-II and its four types of source-level trace replay.
6.3 Estimated Hurst parameters and their confidence intervals for the byte throughput time series of UNC 1 PM and its four types of source-level trace replay.
6.4 Estimated Hurst parameters and their confidence intervals for the packet throughput time series of UNC 1 PM and its four types of source-level trace replay.
6.5 Estimated Hurst parameters and their confidence intervals for the byte throughput time series of UNC 1 AM and its four types of source-level trace replay.
6.6 Estimated Hurst parameters and their confidence intervals for the packet throughput time series of UNC 1 AM and its four types of source-level trace replay.
6.7 Estimated Hurst parameters and their confidence intervals for the byte throughput time series of UNC 7:30 PM and its four types of source-level trace replay.
6.8 Estimated Hurst parameters and their confidence intervals for the packet throughput time series of UNC 7:30 PM and its four types of source-level trace replay.
6.9 Estimated Hurst parameters and their confidence intervals for the byte throughput time series of Abilene-I and its four types of source-level trace replay.
6.10 Estimated Hurst parameters and their confidence intervals for the packet throughput time series of Abilene-I and its four types of source-level trace replay.
7.1 Estimated Hurst parameters and their confidence intervals for the connection arrival time series of UNC 1 PM and UNC 1 AM, and their Poisson arrival fits.
7.2 Estimated Hurst parameters and their confidence intervals for five subsamplings [...]

1.1 Network traffic seen from different levels.
1.2 An a-b-t diagram illustrating a persistent HTTP connection.
1.3 A diagram illustrating the interaction between two BitTorrent peers.
1.4 Overview of Source-level Trace Replay.
3.1 An a-b-t diagram representing a typical ADU exchange in HTTP version 1.0.
3.2 An a-b-t diagram illustrating a persistent HTTP connection.
3.3 An a-b-t diagram illustrating an SMTP connection.
3.4 Three a-b-t diagrams representing three different types of NNTP interactions.
3.5 An a-b-t diagram illustrating a server push from a webcam using a persistent HTTP connection.
3.6 An a-b-t diagram illustrating Icecast audio streaming in a TCP connection.
3.7 Three a-b-t diagrams of connections taking part in the interaction between an FTP client and an FTP server.
3.8 An a-b-t diagram illustrating an NNTP connection in "stream-mode", which exhibits data exchange concurrency.
3.9 An a-b-t diagram illustrating the interaction between two BitTorrent peers.
3.10 A first set of TCP segments for the connection vector in Figure 3.1: lossless example.
3.11 A second set of TCP segments for the connection vector in Figure 3.1: lossy example.
3.12 Distributions of ADU sizes for the testbed experiments with synthetic applications.
3.13 Distributions of quiet time durations for the testbed experiments with synthetic applications.
3.14 Distributions of ADU sizes for the testbed experiments with synthetic applications.
3.15 Distributions of quiet time durations for the testbed experiments with synthetic applications.
[...]

3.18 Bodies of the A and B distributions with per-byte probabilities for Abilene-I, Leipzig-II and UNC 1 PM.
3.19 Bodies of the E distributions for Abilene-I, Leipzig-II and UNC 1 PM.
3.20 Bodies of the E distributions with per-byte probabilities for Abilene-I, Leipzig-II and UNC 1 PM.
3.21 Tails of the E distributions for Abilene-I, Leipzig-II and UNC 1 PM.
3.22 Average size of the epochs in each connection vector as a function of the number of epochs for Abilene-I, Leipzig-II and UNC 1 PM.
3.23 Average of the median size of the ADUs in each connection vector as a function of the number of epochs for Abilene-I, Leipzig-II and UNC 1 PM.
3.24 Average of the median size of the ADUs in each connection vector as a function of the number of epochs, for Leipzig-II.
3.25 Average of the median size of the ADUs in each connection vector as a function of the number of epochs for Abilene-I.
3.26 Bodies of the TA and TB distributions for Abilene-I, Leipzig-II and UNC 1 PM.
3.27 Tails of the TA and TB distributions for Abilene-I, Leipzig-II and UNC 1 PM.
3.28 Distribution of the durations of the quiet times between the final ADU and connection termination.
3.29 Bodies of the A and B distributions for the concurrent connections in Abilene-I, Leipzig-II and UNC 1 PM.
3.30 Tails of the A and B distributions for the concurrent connections in Abilene-I, Leipzig-II and UNC 1 PM.
3.31 Bodies of the TA and TB distributions for the concurrent connections in Abilene-I, Leipzig-II and UNC 1 PM.
3.32 Tails of the TA and TB distributions for the concurrent connections in Abilene-I, Leipzig-II and UNC 1 PM.
3.33 Bodies of the A distributions for UNC 1 AM, UNC 1 PM and UNC 7:30 PM.
3.34 Bodies of the B distributions for UNC 1 AM, UNC 1 PM and UNC 7:30 PM.
3.35 Bodies of the TB distributions for UNC 1 AM, UNC 1 PM and UNC 7:30 PM.
3.36 Tails of the TB distributions for UNC 1 AM, UNC 1 PM and UNC 7:30 PM.

4.1 A set of TCP segments illustrating RTT estimation from connection establishment.
4.2 Two sets of TCP segments illustrating RTT estimation ambiguities in the presence of loss and early retransmission in connection establishment.
4.3 A set of TCP segments illustrating RTT estimation using the sum of two OSTTs.
4.4 A set of TCP segments illustrating the impact of delayed acknowledgments on OSTTs.
4.5 Comparison of RTT estimators for a synthetic trace: no loss and enabled delayed acknowledgments.
4.6 Comparison of RTT estimators for a synthetic trace: no loss and disabled delayed acknowledgments.
4.7 Comparison of RTT estimators for a synthetic trace: fixed loss rate of 1% for all connections.
4.8 Comparison of RTT estimators for a synthetic trace: loss rates uniformly distributed between 0% and 10%.
4.9 A set of TCP segments illustrating an invalid OSTT sample due to the interaction between loss and cumulative acknowledgments.
4.10 Comparison of RTT estimators for a synthetic trace: loss rates uniformly distributed between 0% and 10%.
4.11 Comparison of RTT estimators for synthetic traces: fixed loss rate of 1%; real RTTs up to 4 seconds.
4.12 Bodies of the RTT distributions for the five traces.
4.13 Bodies of the RTT distributions with per-byte probabilities for the five traces.
4.14 Comparison of the sum-of-minima and sum-of-medians RTT estimators for UNC 1 PM.
4.15 Comparison of the sum-of-minima and sum-of-medians RTT estimators for Leipzig-II.
4.16 Bodies of the distributions of maximum receiver window sizes for the five traces.
4.17 Bodies of the distributions of maximum receiver window sizes with per-byte probabilities for the five traces.
4.18 Measured loss rates from experiments with 1% loss rates applied only on one direction or on both directions of the TCP connections.
[...]

traces.
4.21 Breakdown of the byte throughput time series for Leipzig-II inbound.
4.22 Breakdown of the packet throughput time series for Leipzig-II inbound.
4.23 Breakdown of the byte throughput time series for Leipzig-II outbound.
4.24 Breakdown of the packet throughput time series for Leipzig-II outbound.
4.25 Breakdown of the byte throughput time series for Leipzig-II outbound.
4.26 Breakdown of the packet throughput time series for Leipzig-II outbound.
4.27 Breakdown of the byte throughput time series for Abilene-I Ipls/Clev.
4.28 Breakdown of the packet throughput time series for Abilene-I Ipls/Clev.
4.29 Breakdown of the byte throughput time series for Abilene-I Clev/Ipls.
4.30 Breakdown of the packet throughput time series for Abilene-I Clev/Ipls.
4.31 Breakdown of the byte throughput time series for UNC 1 PM inbound.
4.32 Breakdown of the packet throughput time series for UNC 1 PM inbound.
4.33 Breakdown of the byte throughput time series for UNC 1 PM outbound.
4.34 Breakdown of the packet throughput time series for UNC 1 PM outbound.
4.35 Breakdown of the byte throughput time series for the three UNC traces.
4.36 Breakdown of the packet throughput time series for the three UNC traces.
4.37 Byte throughput marginals of Leipzig-II inbound, its normal distribution fit, the marginal distribution of its Poisson arrival fit, and the normal distribution fit of this Poisson arrival fit.
4.38 Packet throughput marginals of Leipzig-II inbound, its normal distribution fit, the marginal distribution of its Poisson arrival fit, and the normal distribution fit of this Poisson arrival fit.
4.39 Byte throughput marginals of UNC 1 PM outbound, its normal distribution fit, the marginal distribution of its Poisson arrival fit, and the normal distribution fit of this Poisson arrival fit.
[...]

Leipzig-II inbound. The top four plots show byte throughput, while the four bottom plots show packet throughput.
4.42 Quantile-quantile plots with simulation envelopes for the marginal distribution of UNC 1 PM outbound. The top four plots show byte throughput, while the four bottom plots show packet throughput.
4.43 Wavelet spectra of the packet throughput time series for Leipzig-II inbound and its Poisson arrival fit.
4.44 Wavelet spectra of the byte throughput time series for Leipzig-II inbound and its Poisson arrival fit.
4.45 Wavelet spectra of the packet throughput time series for Abilene-I.
4.46 Wavelet spectra of the byte throughput time series for Abilene-I.
4.47 Wavelet spectra of the packet throughput time series for UNC 1 PM.
4.48 Wavelet spectra of the byte throughput time series for UNC 1 PM.
4.49 Breakdown of the active connections time series for Leipzig-II.
4.50 Impact of the definition of active connection on Leipzig-II.
4.51 Breakdown of the active connections time series for Abilene-I.
4.52 Impact of the definition of active connection on Abilene-I.
4.53 Breakdown of active connections time series for UNC 1 PM using both definitions of active connection.
4.54 Impact of the time-of-day on the active connections time series for the three UNC traces.
5.1 Overview of Source-level Trace Replay.
5.2 Diagram of the network testbed where the experiments of this dissertation were conducted.
5.3 End-host architecture of the traffic generation system.
5.4 Bodies and tails of the A distributions for Leipzig-II and its source-level trace replays.
5.5 Bodies and tails of the B distributions for Leipzig-II and its source-level trace replays.
[...]

replays.
5.8 Bodies and tails of the TB distributions for Leipzig-II and its source-level trace replays.
5.9 Bodies of the round-trip time and receiver window size distributions for Leipzig-II and its source-level trace replays.
5.10 Bodies of the loss rate distributions for Leipzig-II and its source-level trace replays, with probabilities computed per connection (left) and per byte (right).
5.11 Bodies and tails of the A distributions for UNC 1 PM and its source-level trace replays.
5.12 Bodies and tails of the B distributions for UNC 1 PM and its source-level trace replays.
5.13 Bodies and tails of the E distributions for UNC 1 PM and its source-level trace replays.
5.14 Bodies and tails of the TA distributions for UNC 1 PM and its source-level trace replays.
5.15 Bodies and tails of the TB distributions for UNC 1 PM and its source-level trace replays.
5.16 Bodies of the round-trip time and receiver window size distributions for UNC 1 PM and its source-level trace replays.
5.17 Bodies of the loss rate distributions for UNC 1 PM and its source-level trace replays, with probabilities computed per connection (left) and per byte (right).
5.18 Bodies and tails of the A distributions for Abilene-I and its source-level trace replays.
5.19 Bodies and tails of the B distributions for Abilene-I and its source-level trace replays.
5.20 Bodies and tails of the E distributions for Abilene-I and its source-level trace replays.
5.21 Bodies and tails of the TA distributions for Abilene-I and its source-level trace replays.
5.22 Bodies and tails of the TB distributions for Abilene-I and its source-level trace replays.
5.23 Bodies of the round-trip time and receiver window size distributions for Abilene-I [...]

replays, with probabilities computed per connection (left) and per byte (right).
6.1 Byte throughput time series for Leipzig-II inbound and its four types of source-level trace replay.
6.2 Byte throughput time series for Leipzig-II outbound and its four types of source-level trace replay.
6.3 Packet throughput time series for Leipzig-II inbound and its four types of source-level trace replay.
6.4 Packet throughput time series for Leipzig-II outbound and its four types of source-level trace replay.
6.5 Byte throughput marginals for Leipzig-II inbound and its four types of source-level trace replay.
6.6 Byte throughput marginals for Leipzig-II outbound and its four types of source-level trace replay.
6.7 Packet throughput marginals for Leipzig-II inbound and its four types of source-level trace replay.
6.8 Packet throughput marginals for Leipzig-II outbound and its four types of source-level trace replay.
6.9 Wavelet spectra of the byte throughput time series for Leipzig-II inbound and its four types of source-level trace replay.
6.10 Wavelet spectra of the byte throughput time series for Leipzig-II outbound and its four types of source-level trace replay.
6.11 Wavelet spectra of the packet throughput time series for Leipzig-II inbound and its four types of source-level trace replay.
6.12 Wavelet spectra of the packet throughput time series for Leipzig-II outbound and its four types of source-level trace replay.
6.13 Active connection time series for Leipzig-II and its four types of source-level trace replay.
6.14 Byte throughput time series for UNC 1 PM inbound and its four types of source-level trace replay.
6.15 Byte throughput time series for UNC 1 PM outbound and its four types of source-level trace replay.
6.16 Packet throughput time series for UNC 1 PM inbound and its four types of [...]

source-level trace replay.
6.18 Byte throughput marginals for UNC 1 PM inbound and its four types of source-level trace replay.
6.19 Byte throughput marginals for UNC 1 PM outbound and its four types of source-level trace replay.
6.20 Packet throughput marginals for UNC 1 PM inbound and its four types of source-level trace replay.
6.21 Packet throughput marginals for UNC 1 PM outbound and its four types of source-level trace replay.
6.22 Wavelet spectra of the byte throughput time series for UNC 1 PM inbound and its four types of source-level trace replay.
6.23 Wavelet spectra of the byte throughput time series for UNC 1 PM outbound and its four types of source-level trace replay.
6.24 Wavelet spectra of the packet throughput time series for UNC 1 PM inbound and its four types of source-level trace replay.
6.25 Wavelet spectra of the packet throughput time series for UNC 1 PM outbound and its four types of source-level trace replay.
6.26 Active connection time series for UNC 1 PM and its four types of source-level trace replay.
6.27 Byte throughput time series for UNC 1 AM inbound and its four types of source-level trace replay.
6.28 Byte throughput time series for UNC 1 AM outbound and its four types of source-level trace replay.
6.29 Packet throughput time series for UNC 1 AM inbound and its four types of source-level trace replay.
6.30 Packet throughput time series for UNC 1 AM outbound and its four types of source-level trace replay.
6.31 Byte throughput marginals for UNC 1 AM inbound and its four types of source-level trace replay.
6.32 Byte throughput marginals for UNC 1 AM outbound and its four types of source-level trace replay.
6.33 Packet throughput marginals for UNC 1 AM inbound and its four types of [...]

source-level trace replay.
6.35 Wavelet spectra of the byte throughput time series for UNC 1 AM inbound and its four types of source-level trace replay.
6.36 Wavelet spectra of the byte throughput time series for UNC 1 AM outbound and its four types of source-level trace replay.
6.37 Wavelet spectra of the packet throughput time series for UNC 1 AM inbound and its four types of source-level trace replay.
6.38 Wavelet spectra of the packet throughput time series for UNC 1 AM outbound and its four types of source-level trace replay.
6.39 Active connection time series for UNC 1 AM and its four types of source-level trace replay.
6.40 Byte throughput time series for UNC 7:30 PM inbound and its four types of source-level trace replay.
6.41 Byte throughput time series for UNC 7:30 PM outbound and its four types of source-level trace replay.
6.42 Packet throughput time series for UNC 7:30 PM inbound and its four types of source-level trace replay.
6.43 Packet throughput time series for UNC 7:30 PM outbound and its four types of source-level trace replay.
6.44 Byte throughput marginals for UNC 7:30 PM inbound and its four types of source-level trace replay.
6.45 Byte throughput marginals for UNC 7:30 PM outbound and its four types of source-level trace replay.
6.46 Packet throughput marginals for UNC 7:30 PM inbound and its four types of source-level trace replay.
6.47 Packet throughput marginals for UNC 7:30 PM outbound and its four types of source-level trace replay.
6.48 Wavelet spectra of the byte throughput time series for UNC 7:30 PM inbound and its four types of source-level trace replay.
6.49 Wavelet spectra of the byte throughput time series for UNC 7:30 PM outbound and its four types of source-level trace replay.
6.50 Wavelet spectra of the packet throughput time series for UNC 7:30 PM inbound [...]

and its four types of source-level trace replay.
6.52 Active connection time series for UNC 7:30 PM and its four types of source-level trace replay.
6.53 Byte throughput time series for Abilene-I Clev/Ipls and its four types of source-level trace replay.
6.54 Byte throughput time series for Abilene-I Ipls/Clev and its four types of source-level trace replay.
6.55 Packet throughput time series for Abilene-I Clev/Ipls and its four types of source-level trace replay.
6.56 Packet throughput time series for Abilene-I Ipls/Clev and its four types of source-level trace replay.
6.57 Byte throughput marginals for Abilene-I Clev/Ipls and its four types of source-level trace replay.
6.58 Byte throughput marginals for Abilene-I Ipls/Clev and its four types of source-level trace replay.
6.59 Packet throughput marginals for Abilene-I Clev/Ipls and its four types of source-level trace replay.
6.60 Packet throughput marginals for Abilene-I Ipls/Clev and its four types of source-level trace replay.
6.61 Wavelet spectra of the byte throughput time series for Abilene-I Clev/Ipls and its four types of source-level trace replay.
6.62 Wavelet spectra of the byte throughput time series for Abilene-I Ipls/Clev and its four types of source-level trace replay.
6.63 Wavelet spectra of the packet throughput time series for Abilene-I Clev/Ipls and its four types of source-level trace replay.
6.64 Wavelet spectra of the packet throughput time series for Abilene-I Ipls/Clev and its four types of source-level trace replay.
6.65 Active connection time series for Abilene-I and its four types of source-level trace replay.
7.1 Bodies of the distributions of connection inter-arrivals for UNC 1 PM and 1 AM, and their exponential fits.
7.2 Tails of the distributions of connection inter-arrivals for UNC 1 PM and 1 AM, [...]

II, and their exponential fits.
7.4 Tails of the distributions of connection inter-arrivals for Abilene-I and Leipzig-II, and their exponential fits.
7.5 Average offered load vs. number of connections for 1,000 Poisson resamplings of UNC 1 PM.
7.6 Histogram of the average offered loads in 1,000 Poisson resamplings of UNC 1 PM.
7.7 Tails of the distributions of connection sizes for UNC 1 PM.
7.8 Analysis of the accuracy of connection-driven Poisson Resampling from 6,000 resamplings of UNC 1 PM (1,000 for each target offered load).
7.9 Comparison of average offered load vs. number of connections for 1,000 connection-driven Poisson resamplings and 1,000 byte-driven Poisson resamplings of UNC 1 PM.
7.10 Histogram of the average offered loads in 1,000 byte-driven Poisson resamplings of UNC 1 PM.
7.11 Analysis of the accuracy of byte-driven Poisson Resampling from 4,000 resamplings of UNC 1 PM (1,000 for each target offered load).
7.12 Analysis of the accuracy of byte-driven Poisson Resampling using source-level trace replay: replays of three separate resamplings of UNC 1 PM for each target offered load, illustrating the scaling down of load from the original 177.36 Mbps.
7.13 Analysis of the accuracy of byte-driven Poisson Resampling using testbed experiments: replay of one resampling of UNC 1 AM for each target offered load, illustrating the scaling up of load from the original 91.65 Mbps.
7.14 Connection arrival time series for UNC 1 PM (dashed line) and a Poisson arrival process with the same mean (solid line).
7.15 Connection arrival time series for UNC 1 AM and a Poisson arrival process with the same mean.
7.16 Wavelet spectra of the connection arrival time series for UNC 1 PM and a Poisson arrival process with the same mean.
7.17 Wavelet spectra of the connection arrival time series for UNC 1 AM and a Poisson arrival process with the same mean.
7.18 Block resamplings of UNC 1 PM: impact of different block lengths on the wavelet spectrum of the connection arrival time series.
7.19 Block resamplings of UNC 1 AM: impact of different block lengths on the wavelet [...]

vectors (left) and corresponding histograms of average offered loads (right) in 3,000 resamplings.
7.21 Wavelet spectra of several random subsamplings of the connection vectors in UNC 1 PM (left) and 1 AM (right).
7.22 Analysis of the accuracy of byte-driven Block Resampling using source-level trace replay: replays of two separate resamplings of UNC 1 PM for each target offered load, illustrating the scaling down of load from the original 177.36 Mbps.
7.23 Analysis of the accuracy of byte-driven Block Resampling using source-level trace replay: replay of one resampling of UNC 1 AM for each target offered load, illustrating the scaling up of load from the original 91.65 Mbps.
7.24 Wavelet spectra of the packet arrival time series for UNC 1 PM and the source-level trace replays of two block resamplings of this trace.
7.25 Wavelet spectra of the packet arrival time series for UNC 1 PM and the [...]

ACK     Positive acknowledgment TCP segment
ADU     Application Data Unit
API     Application Programming Interface
AQM     Active Queue Management
BGP     Border Gateway Protocol
BPF     Berkeley Packet Filter
C.I.    Confidence Interval
CCDF    Complementary Cumulative Distribution Function
CDF     Cumulative Distribution Function
DAG     Data Acquisition and Generation
FIFO    First-In First-Out
FIN     TCP control flag indicating "no more data from sender"
FTP     File Transfer Protocol
GB      Gigabyte
GPS     Global Positioning System
HTML    HyperText Markup Language
HTTP    HyperText Transfer Protocol
I/O     Input/Output
ICMP    Internet Control Message Protocol
IP      Internet Protocol
IRC     Internet Relay Chat
ISP     Internet Service Provider
K-S     Kolmogorov-Smirnov test
KB      Kilobyte
Kpps    Kilopackets per second

MIME    Multipurpose Internet Mail Extensions
MSS     Maximum Segment Size
MTU     Maximum Transmission Unit
Mbps    Megabits per second
NNTP    Network News Transfer Protocol
OSTT    One-Side Transit Time
PMA     Passive Measurement and Analysis
Q-Q     Quantile-Quantile
RED     Random Early Detection
RFC     Request For Comments
RST     TCP control flag indicating "connection reset"
RTT     Round-Trip Time
SMTP    Simple Mail Transfer Protocol
SSH     Secure Shell
SYN     Synchronize TCP control segment
SYN-ACK Positive acknowledgment of SYN segment
TCP     Transmission Control Protocol
UDP     User Datagram Protocol
UNC     University of North Carolina at Chapel Hill

Introduction

As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.
- Albert Einstein (1879–1955)

Humankind cannot stand very much reality.
- T. S. Eliot (1888–1965)

Research in networking has to deal with the extreme complexity of many layers of technology interacting with each other in frequently unexpected ways. As a consequence, there is a broad consensus among researchers that purely theoretical analysis is not enough to demonstrate the effectiveness of network technologies. More often than not, careful experimentation in simulators and network testbeds under controlled conditions is needed to validate new ideas. Every researcher therefore faces, at some point or another, the need to design realistic networking experiments, and synthetic network traffic is a foremost element of these experiments. Synthetic network traffic represents not only the workload of a computer network, but also the direct or indirect target of any optimization. For instance, congestion control research focuses on preserving as much as possible the ability of a network to transfer data in the face of overload. Therefore, evaluating a new congestion control mechanism in a transport protocol such as the Transmission Control Protocol (TCP) [Pos81] usually requires constructing experiments in which a number of network hosts exchange data using this protocol in an environment with one or more saturated links. The value of the new mechanism is then expressed as a function of the performance of these data exchanges. For example, the new mechanism may be optimized


the characteristics of synthetic traffic have a dramatic impact on the outcome of networking experiments. For example, a new mechanism that improves the throughput of bulk, long-lasting file transfers in a congested environment may not improve and may even degrade the response time of the small data exchanges in web traffic. This was precisely the case of Random Early Detection (RED), an Active Queue Management (AQM) mechanism. The original analysis by Floyd and Jacobson [FJ93a] clearly demonstrated the benefits of RED over the basic First-In First-Out (FIFO) queuing mechanism for bulk transfers. In this study, RED queues were exposed to a small number (2-4) of large file transfers. However, a later experimental study by Christiansen et al. [CJOS00] showed that this first AQM mechanism degraded the performance of web traffic in highly congested environments. In contrast to the original evaluation, web traffic mostly consists of a very large number of small data transfers, which create a very different workload. The emergence of the web clearly changed the nature of Internet traffic, and made it necessary to revisit existing results obtained under different workloads. The systematic evaluation of network mechanisms must therefore include experiments covering the wide range of traffic characteristics observed on Internet links. It is critical to provide the research community with methods and tools for generating synthetic traffic as representative as possible of this range of characteristics.

The concept of source-level modeling introduced by Paxson and Floyd [PF95] constitutes a major influence on this dissertation. These authors advocated building models of the behavior of Internet applications (i.e., the sources of Internet traffic), and generating traffic in networking experiments by driving network stacks with these application models. The main benefit of this approach is that traffic is generated in a closed-loop manner, which fully preserves the fundamental feedback loop between network endpoints and network characteristics. For example, a model of web traffic can be used to generate traffic using TCP/IP network stacks, and the generated traffic will properly react to different levels of congestion in networking experiments. In contrast, open-loop traffic generation is associated with models of the packet

for experimental studies that change these conditions.

The main motivation of our work is to address one important difficulty with source-level modeling. In the past, source-level modeling has been associated with characterizing the behavior of individual applications. While this approach can result in high-quality models, it is a difficult process that requires a large amount of effort. As a consequence, only a small number of models is available, and they are often outdated. This is in sharp contrast to the traffic observed in most Internet links, which is driven by rich traffic mixes composed of a large number of applications. Source-level modeling of individual applications does not scale to modern traffic mixes, making it very problematic for networking researchers to conduct representative experiments with closed-loop traffic.

This dissertation presents a new methodology for generating network traffic in testbed experiments and software simulations. We make three main contributions. First, we develop a new source-level model of network traffic, the a-b-t model, for describing in a generic and intuitive manner the behavior of the applications driving TCP connections. Given a packet header trace collected at an arbitrary Internet link, we use this model to describe each TCP connection in the trace in terms of data exchanges and quiet times, without any knowledge of the actual semantics of the application. Our algorithms make it possible to efficiently derive empirical characterizations of network traffic, reducing modeling times from months to hours. The same analysis can be used to incorporate network-level parameters, such as round-trip times, into the description of each connection, providing a solid foundation for traffic generation. Second, we propose a traffic generation method, source-level trace replay, where traffic is generated by replaying the observed behavior of the applications as sources of traffic. This is therefore a method for generating entire traffic mixes in a closed-loop manner. One crucial benefit of our method is that it can be evaluated by directly comparing an original trace and its source-level replay. This makes it possible to systematically study the realism of synthetic traffic, in terms of how well our description of the connections in the original traffic mix

Figure 1.1: Network traffic seen from different levels.

to understand the impact that the different characteristics of a traffic mix have on specific traces and on Internet traffic in general. Third, we propose and study two approaches for introducing variability in the generation process and scaling (up or down) the level of traffic load in the experiments. These operations greatly increase the flexibility of our approach, enabling a wide range of experimental investigations conducted using our traffic generation method.

1.1 Abstract Source-Level Modeling

This dissertation presents a methodology for generating synthetic network traffic that addresses some of the main shortcomings of existing techniques. Figure 1.1 illustrates the levels of detail at which Internet traffic can be studied, providing a good starting point for framing our discussion. We focus on the traffic on a single Internet link, such as the one between the University of North Carolina at Chapel Hill (UNC) and the Internet. We can study the traffic in this link at different levels of detail. The top-most time-line represents traffic observed in


were interleaved creating a complex arrival process in the network link. In general, TCP traffic accounts for the vast majority of the packets on Internet links (usually between 90% and 95%), which justifies our focus on TCP in this work. The second time-line depicts the packet arrivals that belonged to a single TCP connection. These packets were used to send data back and forth between two network endpoints, one located at UNC, and the other one somewhere on the Internet. The sources of these data are applications running on the endpoints, which rely on the packet switching service provided by the Internet to communicate. Prominent examples of these applications are the World Wide Web, email, file sharing, etc. Hundreds of different applications are commonly found on Internet links. The traffic observed at an Internet link is therefore the result of multiplexing the communication of a large number of endpoints driven by a wide range of applications. This dissertation considers the problem of generating traffic in networking experiments that preserves both the aggregate-level and the connection-level properties of traffic observed in a real network link. Note that we restrict ourselves to this most basic form of the problem where only a single link is considered both for observing traffic and for reproducing it in networking experiments. Our findings can certainly be applied to a broader context, e.g., multiple links along a path following the "parking lot topology" [PF95], links in an ISP, etc., but we choose to keep to this problem in its most essential form throughout this dissertation.

As mentioned before, every connection on the Internet is driven by an application exchanging data between two endpoints. It is therefore possible to examine traffic at a higher level, where the communication is described in terms of application data units (ADUs) rather than network packets. This application level is illustrated in the bottom time-line of Figure 1.1, which reveals that the source of the packets in the second time-line was the exchange of data between a web browser and a web server using a TCP connection. The time-line shows a first ADU of 2,500 bytes, which carried a request for an HTML page. The way the data is organized within this ADU and its meaning is given by the specification of the HyperText Transfer Protocol (HTTP) [FGM+

the first ADU. It carried the actual HTML source code of the page requested by the browser. Its size was 4,800 bytes, which included not only the HTML source code but also an appropriate HTTP header. The time-line shows another pair of ADUs that also corresponded to an HTTP request and an HTTP response, which this time carried an image file. Each ADU is associated with one or more packets in the second time-line. The amount of data in these ADUs and its meaning was decided by the application, while the actual number of packets, their sizes, the need for retransmissions, etc., were decided by lower layers (transport and below).

The application level provides the starting point for the traffic modeling and generation methodology developed in this dissertation. Our approach to traffic generation relies on the notion of source-level modeling, advocated by Paxson and Floyd [FP01]. Rather than directly generating packets according to some trace or some packet arrival model, source-level modeling involves simulating the behavior of the applications running on the endpoints and allowing lower layers to control the actual exchange of packets. For example, generating traffic with a source-level model of web traffic means simulating web browsers and web servers according to statistical models of web page sizes, the durations of user think times and other source-level parameters [Mah97, BC98, SHCJO01].

Modeling traffic at the source level produces descriptions of traffic that are mostly independent of the underlying protocols and network conditions, so they can be used to drive traffic generation in experiments that modify these same protocols and conditions. For this reason, source-level models are also known as network-independent models. For example, the size of an HTML page carried in a TCP connection does not change with the degree of congestion (it always has the same number of characters). Therefore, its size is a network-independent property. Lower-level descriptions of traffic, such as characterizations of packet arrivals, are network dependent. For example, the rate at which the packets of a TCP connection arrive decreases as the degree of congestion increases, since TCP uses a congestion control algorithm that decreases the sending rate as the loss rate increases. Also, packet losses force TCP endpoints to


to be transferred, depending on the number of lost packets. A source-level model describes the sizes of ADUs, but not the times at which a connection should lower its sending rate or retransmit a packet. For this reason, the same model can be used to generate traffic under different network conditions, such as low and high levels of congestion. Endpoints generating traffic using these models are able to adapt to each specific set of network conditions in the experiments. This preserves the fundamental feedback loop that exists between endpoints and network conditions. For this reason, this type of traffic generation is said to be closed-loop. On the contrary, traffic generated according to lower level models is necessarily open-loop. For example, tcpreplay [tcpb] can be used to reenact the sending of every packet recorded in a trace, which results in open-loop traffic that is insensitive to the underlying network conditions. This traffic is inappropriate for experiments where network conditions are important, such as the evaluation of congestion control mechanisms.

In the past, source-level modeling has been considered a synonym of application modeling, so researchers have developed a number of application-specific models including models for web traffic, file transferring and other individual applications. This approach is good if one is interested in the traffic generated by a single application (or by a handful of applications). However, if one is interested in realistic traffic mixes, application-specific traffic modeling has some important shortcomings. The first problem is that application-specific modeling does not scale well to the large number of applications that form contemporary traffic mixes. For example, the weekly traffic report from Internet2 [Con04] collects separate statistics for more than 80 different applications that make up Internet2 traffic. Using existing technology, it is simply too time-consuming to develop and populate individual models for each application. Moreover, even if we had the resources to examine the behavior of all applications, many applications use proprietary protocols, so painstaking reverse engineering is needed to understand and model their behavior. In addition, Internet traffic evolves quickly, since new applications and improved versions of the existing ones appear very frequently.

Figure 1.2: An a-b-t diagram illustrating a persistent HTTP connection.

traffic generation problems. We develop an abstract model of network data exchange wherein each connection is described independently of the semantics of the application initiating the connection. This idea is illustrated in the third time-line of Figure 1.1. Here the communication is described in generic terms, simply as a sequence of ADU exchanges between the two endpoints of the TCP connection, without attaching any meaning to the ADUs. Other generic characteristics of traffic include the direction in which the ADUs are sent, from the connection initiator or from the connection acceptor, and the duration of quiet times between ADUs, which are due to user behavior and processing times. These characteristics can generally be used to describe the behavior of any specific application. For example, the ADUs of web traffic are HTTP requests and responses, while the inter-ADU times are user think times and server processing times. The crucial observation is that the sizes of ADUs and the times between them can be measured from the packet traces of two connections without knowledge of the behavior of the application driving the connection. This makes it possible to construct a source-level description of the entire set of connections observed in a measured link, instead of only the connections driven by one or a few well-known applications. Any trace of packets traversing a network link can be transformed into an abstract source-level trace, without examining the payload of the packets and without instrumenting the endpoints.

Our approach to source-level modeling results in an abstract representation of a TCP connection using a notation that we call an a-b-t connection vector. We also refer to this idea as


at the source level, rather than in the sense of a mathematical or statistical model (see footnote 1). The term a-b-t is descriptive of the basic building blocks of this model: a-type ADUs (a's), which are sent from the connection initiator to the connection acceptor, b-type ADUs (b's), which flow in the opposite direction, and quiet times (t's), during which no data segments are exchanged. We will make use of these terms to describe the source-level behavior of TCP connections throughout this dissertation.

Our a-b-t model has a sequential version and a concurrent version. The sequential version applies to connections where the endpoints follow a strict order in their exchange of ADUs. In this version, a TCP connection is described by a vector of epochs $(e_1, e_2, \ldots, e_n)$. Each epoch has the form $e_j = (a_j, ta_j, b_j, tb_j)$, where $a_j$ is the size of an ADU sent from the connection initiator to the connection acceptor, $b_j$ is the size of an ADU sent in the opposite direction, and $ta_j$ and $tb_j$ are inter-ADU quiet times (during which the endpoints are idle). We call this representation of source-level behavior a sequential connection vector. For example, the connection illustrated in Figure 1.2 is represented as

$$((329, 0, 403, 0.12), (403, 0, 25821, 3.12), (356, 0, 1198, 15.3))$$

using the sequential a-b-t model. This connection has three epochs, each carrying one HTTP request/response pair. The first epoch has an ADU $a_1$ of size 329 bytes, which was sent from the connection initiator (a web browser) to the connection acceptor (a web server), and an ADU $b_1$ of size 403 bytes, which was sent in the opposite direction. We also observe some quiet times between the ADUs, such as $tb_2$, which had a duration of 3.12 seconds. While Figure 1.2 includes labels for HTTP requests, responses and documents, our a-b-t notation is completely generic.
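The sequential notation maps naturally onto a small data structure. The following Python sketch is only an illustration: the names `Epoch`, `total_bytes` and `total_quiet_time` are assumptions of this example and are not taken from the dissertation's tools. It encodes the example vector given above and computes two simple aggregates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Epoch:
    """One epoch e_j = (a_j, ta_j, b_j, tb_j) of a sequential connection vector."""
    a: int      # size in bytes of the ADU sent by the connection initiator
    ta: float   # quiet time listed after a_j, in seconds
    b: int      # size in bytes of the ADU sent by the connection acceptor
    tb: float   # quiet time listed after b_j, in seconds


def total_bytes(vector: List[Epoch]) -> Tuple[int, int]:
    """Return (bytes sent by the initiator, bytes sent by the acceptor)."""
    return sum(e.a for e in vector), sum(e.b for e in vector)


def total_quiet_time(vector: List[Epoch]) -> float:
    """Return the sum of all quiet times in the vector, in seconds."""
    return sum(e.ta + e.tb for e in vector)


# The three-epoch vector shown in the text for the connection of Figure 1.2.
http_connection = [
    Epoch(329, 0.0, 403, 0.12),
    Epoch(403, 0.0, 25821, 3.12),
    Epoch(356, 0.0, 1198, 15.3),
]
```

Nothing in this structure refers to HTTP, which is the point of the notation: the same four fields describe an epoch of any application.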

We consider this TCP connection sequential because only one endpoint sent data to the other one at any point in the lifetime of the connection. It is important to reiterate that an ADU is not a TCP segment (i.e., TCP packet), but an application message that is independent of its actual network representation as a link-level packet. As such, an ADU can be of arbitrary size, like the smaller $a_1 = 329$ bytes and the larger $b_2 = 25{,}821$ bytes in the previous example. The transfer of $a_1$ would usually involve a single TCP segment, but it is also possible that this segment gets duplicated, or lost and then retransmitted. In this case, sending $a_1$ would result in the generation of two or more segments carrying this ADU. Our notation would still describe this part of the TCP connection as a single 329-byte ADU, and not as the sequence of TCP segments used to transfer the data. Similarly, transferring $b_2 = 25{,}821$ bytes requires a minimum of 18 TCP segments in a path without loss and with a regular Maximum Segment Size (MSS) of 1,460 bytes (the one derived from Ethernet's Maximum Transmission Unit (MTU) of 1,500 bytes, after subtracting 20 bytes for the IP header and 20 bytes for the TCP header). It may require many more segments in a lossy environment, or in a path with a lower MTU. However, these details are irrelevant at the abstract source level, where $b_2$ captures the need of one of the endpoints to send 25,821 bytes of data, and this need is independent of the way in which the data is transferred by the network. Our modeling is therefore network-independent, which makes it suitable for generating closed-loop traffic.

Footnote 1: Our a-b-t model does, however, provide a good foundation for developing mathematical and statistical models of traffic at the source level. This dissertation consistently follows a non-parametric approach to traffic modeling. The only exception is the Poisson Resampling method presented in Chapter 7, for which we also offer a more

Figure 1.3: A diagram illustrating the interaction between two BitTorrent peers.

While most TCP connections are driven by applications that follow a sequential pattern of ADU exchanges, we can also find cases in which the two endpoints send data to each other at the same time. This is illustrated in Figure 1.3 using a BitTorrent [Coh03] connection, where we can see ADUs whose transmission overlaps in time (i.e., the ADUs are exchanged concurrently). This pattern is certainly less common than the sequential one, but it is supported in important protocols like HTTP/1.1 (pipelining), NNTP (streaming mode) and BitTorrent. Our analysis shows that while the fraction of connections with concurrent data exchanges is usually small,

realistic traffic mixes.

To represent concurrent ADU exchanges, the actions of each endpoint are considered to occur independently of each other. Thus each endpoint is a separate source generating ADUs that appear as a sequence of epochs following a unidirectional flow pattern. Formally, this means that we represent each connection as a pair $(\alpha, \beta)$ of connection vectors of the form

$$\alpha = ((a_1, ta_1), (a_2, ta_2), \ldots, (a_{n_a}, ta_{n_a}))$$

and

$$\beta = ((b_1, tb_1), (b_2, tb_2), \ldots, (b_{n_b}, tb_{n_b})),$$

where $a_i$ and $b_i$ are sizes of ADUs sent from the initiator and from the acceptor of the TCP connection respectively, and $ta_i$ and $tb_i$ are quiet times between the ADUs. We call this representation of source-level behavior a concurrent connection vector. Unlike the sequential version of the a-b-t model, this representation does not capture any causality between the two directions of a TCP connection. As a consequence, traffic generated according to this version of the model usually exhibits a substantial number of concurrent data exchanges.
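A concurrent connection vector can be sketched as two independent sequences, one per direction. In the Python illustration below, the names `Adu`, `ConcurrentVector` and `bytes_per_direction` are assumptions of this example, and the numeric values are hypothetical rather than measured.

```python
from typing import List, NamedTuple, Tuple


class Adu(NamedTuple):
    size: int          # ADU size in bytes
    quiet_time: float  # quiet time associated with this ADU, in seconds


# A concurrent connection vector is a pair of independent sequences:
# one for the initiator's ADUs and one for the acceptor's ADUs.
ConcurrentVector = Tuple[List[Adu], List[Adu]]


def bytes_per_direction(vector: ConcurrentVector) -> Tuple[int, int]:
    """Return (bytes sent by the initiator, bytes sent by the acceptor)."""
    initiator_adus, acceptor_adus = vector
    return (sum(x.size for x in initiator_adus),
            sum(x.size for x in acceptor_adus))


# Hypothetical values chosen only for illustration.
example: ConcurrentVector = (
    [Adu(68, 0.0), Adu(16384, 1.5), Adu(5, 0.2)],   # initiator side, n_a = 3
    [Adu(68, 0.1), Adu(16384, 0.0)],                # acceptor side, n_b = 2
)
```

The two lists may have different lengths, and no element of one list is tied to an element of the other, which mirrors the absence of causality between directions noted in the text.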

The a-b-t model provides a simple yet expressive way of describing source-level behavior in a generic manner that is not tied to the details of any application. In addition, this non-parametric model was designed to incorporate quantities (ADU sizes, ADU directionality, and inter-ADU quiet time duration) that can be extracted from packet header traces in an efficient, accurate manner. We can easily imagine more complex and expressive models of TCP connections for which no efficient data acquisition algorithm exists, or models that deal with characteristics of source-level behavior that cannot be extracted purely from packet headers. In the case of the a-b-t model, we have developed a data acquisition algorithm that relies on TCP sequence numbers for measuring ADU sizes, and on the packet arrival timestamps obtained during trace collection to determine inter-ADU quiet times. Our algorithm constructs a data structure in which TCP

Figure 1.4: Overview of Source-level Trace Replay. [Diagram labels: Original Packet Header Trace $T_h$; Trace Analysis; Original Connection Vectors $T_c$; Trace Partitioning; Tmix Traffic Generators (testbed); Generated Packet Header Trace $T'_h$; Trace Analysis; Replayed Connection Vectors $T'_c$.]

be delivered to the application layer of the receiving endpoint. In reconstructing this logical order for each connection, we have developed methods for dealing with network pathologies such as arbitrary segment reordering, duplication and retransmission. Furthermore, when the data segments in a TCP connection cannot be ordered according to the logical data order, we can classify the connection as concurrent with certainty. Our data structure supports both sequential (i.e., bidirectional) and concurrent (i.e., unidirectional) ordering, making it possible to extract ADU sizes and quiet times with a single pass over the segments of a TCP connection found in a trace. The analysis can be performed in $O(sW)$ time, where $s$ is the number of data segments in the connection and $W$ is the maximum size of the TCP window (which bounds the maximum amount of reordering).
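The core idea of turning data segments into ADUs can be illustrated with a heavily simplified Python sketch. It is not the dissertation's algorithm: it assumes the segments are already in logical order with no loss, duplication or reordering, and it uses payload lengths directly instead of TCP sequence numbers. The names `Segment`, `Adu` and `segments_to_adus` are assumptions of this example.

```python
from typing import List, Tuple

Segment = Tuple[float, str, int]   # (arrival time in s, 'a' or 'b', payload bytes)
Adu = Tuple[str, int, float]       # (direction, size in bytes, quiet time before it)


def segments_to_adus(segments: List[Segment]) -> List[Adu]:
    """Group consecutive same-direction data segments into ADUs (single pass)."""
    adus: List[Adu] = []
    last_time = None
    for time, direction, payload in segments:
        if payload == 0:
            continue                                  # pure ACKs carry no application data
        if adus and adus[-1][0] == direction:
            d, size, quiet = adus[-1]
            adus[-1] = (d, size + payload, quiet)     # the current ADU continues
        else:
            quiet = 0.0 if last_time is None else time - last_time
            adus.append((direction, payload, quiet))  # a change of direction starts a new ADU
        last_time = time
    return adus
```

In this simplification a new ADU starts whenever the direction of data flow changes, and the quiet time is the gap between the last data segment of one ADU and the first data segment of the next.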

1.2 Source-Level Trace Replay

Our abstract source-level modeling of TCP connections provides a solid foundation for generating traffic mixes in simulators and network testbeds. We propose to generate traffic using source-level trace replay, as illustrated in Figure 1.4. Given a packet header trace $T_h$ collected from some Internet link, we first use our data acquisition algorithm to analyze the trace and


i

corresponding to this connection. The basic approach for generating traffic according to $T_c$ is to replay every connection vector $C_i$. Each connection vector $C_i$ is replayed by starting a TCP connection precisely at $C_i$'s relative start time $T_i$, and transmitting the measured sequence of ADUs ($a_j$ and $b_j$) separated in time by the measured inter-ADU quiet times ($ta_j$ and $tb_j$). In this dissertation, we evaluate a specific implementation of this approach for FreeBSD network testbeds, where traffic is generated using a tool we developed called tmix.
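The replay of one sequential connection vector can be pictured as two scripts, one per endpoint. The Python sketch below only builds these scripts; it does not open sockets and is not the tmix implementation. It assumes that $ta_j$ is idle time at the acceptor between receiving $a_j$ and sending $b_j$, and that $tb_j$ is idle time at the initiator between receiving $b_j$ and sending $a_{j+1}$; this reading of the quiet times and the names `Epoch`, `Action` and `replay_plan` are assumptions of this example.

```python
from typing import List, Tuple

Epoch = Tuple[int, float, int, float]   # (a_j, ta_j, b_j, tb_j)
Action = Tuple[str, float]              # ('send' | 'recv', bytes) or ('sleep', seconds)


def replay_plan(vector: List[Epoch]) -> Tuple[List[Action], List[Action]]:
    """Return the (initiator, acceptor) action lists for one sequential vector."""
    initiator: List[Action] = []
    acceptor: List[Action] = []
    for a, ta, b, tb in vector:
        # Initiator: send the request-like ADU, wait for the reply, then stay quiet.
        initiator += [("send", a), ("recv", b), ("sleep", tb)]
        # Acceptor: wait for a_j, stay quiet for ta_j, then send b_j.
        acceptor += [("recv", a), ("sleep", ta), ("send", b)]
    return initiator, acceptor
```

Because each `recv` blocks until the peer's data actually arrives through a real TCP stack, the time a replay takes depends on network conditions, while the ADU sizes and quiet times stay fixed.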

The goal of the direct source-level trace replay of $T_c$ is to reproduce the source-level characteristics of the traffic in the original link, generating the traffic in a closed-loop fashion. Closed-loop traffic generation implies the need to simulate the behavior of applications, using regular network stacks to actually translate source-level behavior into network traffic. In particular, our experiments use an implementation which relies on the standard socket interface to reproduce the data exchanges in each connection vector. Generating traffic in this manner is closed-loop in the sense that it preserves the feedback mechanism in TCP, which adapts its behavior to changes in network conditions, such as loss and receiver saturation. In contrast, packet-level trace replay, the direct reproduction of $T_h$, is an open-loop traffic generation method in the sense that TCP control algorithms are not used during the generation, and hence the traffic does not adapt to network conditions.

The evaluation of our methodology consists of comparing the original trace $T_h$ and the synthetic trace $T'_h$ obtained from the source-level trace replay. Validating our traffic generation method consists of transforming $T'_h$ into a set of connection vectors $T'_c$, using the same method used to transform $T_h$ into $T_c$. We then compare the resulting set of connection vectors $T'_c$ with the original $T_c$. In principle, they should be identical, since $T_c$ represents the invariant source-level characteristics of $T_h$. There are however some differences that are explained by the nature of the model and our measurement methods.

The direct comparison of $T_h$ and $T'_h$ also provides a way to study the accuracy of our approach in terms of how well traffic is described by the a-b-t model. This is however a subtle exercise. The actual replay of $T_c$, which creates $T'$


for each TCP connection in the source-level trace replay. The exact set of generated TCP segments and their arrival times is a direct function of these parameters. As a consequence, if we conduct a source-level trace replay using arbitrary network-level parameters, we obtain a $T'_h$ with little resemblance to the original $T_h$. The replayed a-b-t connection vectors may be a perfect description of the source behavior driving the original connections, but the generated packet-level trace $T'_h$ would still be very different from the original $T_h$. To address this difficulty, our replay incorporates network-level parameters individually derived from each connection in $T_h$. We have also incorporated methods for measuring three important network-level parameters (round-trip time, TCP receiver window size and loss rate) into our analysis and generation procedure. While this set of parameters is by no means complete, it does include the main parameters that affect the average throughput of a TCP connection found in a trace. This enables us to generate traffic in a closed-loop manner that approximates measured traces very closely.

Incorporating network-level properties is important, but it is critical to understand the main shortcoming of this approach. The goal of our work is not to make the generated traffic $T'_h$ identical to the original traffic $T_h$, which could be accomplished with a simple packet-level replay. As mentioned before, packet-level replays generate traffic that does not adapt to changes in network conditions, resulting in open-loop traffic. Our goal is to develop a closed-loop traffic generation method based on a detailed characterization of source behavior. Traffic generated in a closed-loop manner can adapt to different network conditions, which are intrinsic when evaluating different network mechanisms. Our comparison of $T_h$ and $T'_h$ is only a means to understand the quality of the traffic generation method, where quality is considered to be higher as the original trace is more closely approximated. If enough parameters of the original traffic are accurately measured and incorporated into the traffic generation experiment, we expect to observe a great similarity between $T_h$ and $T'_h$. On the contrary, if we are missing some important parameters, we expect to observe substantial differences between traces.


multiplexing a large number of connections onto a single link, and these connections traverse a large number of different paths with a variety of network conditions. It is simply not possible to fully characterize this environment and reproduce it in a laboratory testbed or in a simulation. This is both because of the limitations of passive inference from packet headers, and because of the stochastic nature of network traffic. Source-level trace replay can never incorporate every factor that shaped $T_h$, and therefore differences between $T_h$ and $T'_h$ are unavoidable. Still, finding a close match between an original trace and its replay, even if they are not identical, constitutes strong evidence of the accuracy of the a-b-t model and the data acquisition and generation methods we have developed. It also demonstrates the feasibility of generating realistic network traffic in a closed-loop manner that resembles a rich traffic mix.

1.3 Trace Resampling and Load Scaling

As long as the network setup of a simulation or testbed experiment remains unchanged, the source-level trace replay of a connection vector trace $T_c = \{(T_i, C_i)\}$ always results in traffic that is similar to the original trace. Every replay contains the same number of TCP connections behaving according to the same connection vector specification and starting at the same times. Only tiny variations are introduced on the end-systems by changes in clock synchronization, operating system scheduling and interrupt handling, and at switches and routers by the stochastic nature of packet multiplexing. Source-level trace replay has therefore two desirable properties:

• The quality of the synthetic traffic can be evaluated by directly comparing synthetic and original traffic. This makes it possible to study the accuracy of the analysis methods and the generation system with complete freedom, using any metric that can be derived from real traffic. In contrast, more abstract methods based on parametric models of traffic are inherently stochastic and therefore more difficult to evaluate. For such methods, it is less obvious whether the observed difference between the traffic generated using the parametric


collection of network protocols and mechanisms to exactly the same closed-loop traffic, which provides the right foundation for fair comparative studies. In contrast, stochastic variation in the traffic generated using parametric models is often difficult to control. For example, experiments with models that rely on heavy-tailed distributions converge very slowly to comparable conditions, as discussed by Crovella and Lipsky [CL97].

While these properties are important, the practice of experimental networking often requires introducing controlled variability in the generated traffic for exploring a wider range of scenarios. This motivates the development of methods that manipulate $T_c$ in order to generate different traffic that still resembles the original one. Furthermore, developing a statistically sound way of manipulating $T_c$ is essential for generating traffic with different levels of offered load. This manipulation to match a target offered load is a very common need in experimental networking research. This is because the performance of a network mechanism or protocol is often affected by the amount of traffic to which it is exposed. Therefore, rigorous experimental studies frequently require generating a complete range of target loads.

In this dissertation, we propose two flexible methods for introducing variability in traffic generation experiments. In both cases, the set of connection vectors in $T_c$ is randomly resampled, resulting in a new set $T'_c$ that preserves the aggregate source-level characteristics of the original traffic. In our first method, Poisson Resampling, we construct a new connection vector trace $T'_c$ by randomly resampling connections from $T_c$, and assigning them exponentially distributed inter-arrival times. As a result, connections in $T'_c$ arrive according to a Poisson process. In the second method, Block Resampling, we resample blocks (groups) of connections rather than individual connections. This method results in a more realistic connection arrival process, which matches the substantial burstiness observed in real traces. In more technical terms, Block Resampling preserves the moderate long-range dependence found in real connection arrival processes, while Poisson Resampling results in a short-range dependent connection arrival process. This difference is demonstrated in our experimental evaluation of the two methods.


trade- dependence (which disappears for short blocks). Our analysis demonstrates that block durations between 1 and 5 minutes offer the best compromise.
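The two resampling ideas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's procedure: connections and blocks are drawn uniformly with replacement, blocks are fixed-duration windows of the original trace, and the names `poisson_resample` and `block_resample` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Connection = Tuple[float, object]   # (start time T_i in seconds, connection vector C_i)


def poisson_resample(trace: Sequence[Connection], rate: float,
                     duration: float, rng: random.Random) -> List[Connection]:
    """Draw vectors with replacement; arrivals form a Poisson process of the given rate."""
    vectors = [c for _, c in trace]
    resampled, t = [], rng.expovariate(rate)
    while t < duration:
        resampled.append((t, rng.choice(vectors)))
        t += rng.expovariate(rate)          # exponential inter-arrival times
    return resampled


def block_resample(trace: Sequence[Connection], block: float,
                   duration: float, rng: random.Random) -> List[Connection]:
    """Concatenate randomly chosen blocks, keeping arrival offsets within each block."""
    n_source = int(max(t for t, _ in trace) // block) + 1
    blocks: List[List[Connection]] = [[] for _ in range(n_source)]
    for t, c in trace:
        blocks[int(t // block)].append((t % block, c))
    resampled: List[Connection] = []
    for k in range(int(duration // block)):
        chosen = rng.choice(blocks)         # a whole block is reused as a unit
        resampled += [(k * block + off, c) for off, c in chosen]
    return resampled
```

Reusing whole blocks keeps the arrival pattern inside each block intact, which is how burstiness at time scales shorter than the block survives the resampling.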

Researchers often need to conduct a set of experiments with a range of different traffic loads. When using a traditional source-level model, e.g., a model of web traffic, researchers have to first conduct a preliminary experimental study to determine how the parameters of the model, e.g., the number of user equivalents, affect the generated load [CJOS00, LAJS03, KLH+02]. This is usually known as the calibration of the traffic generator. Our resampling methods eliminate this common need for calibrating traffic generators, since the resampling process can be controlled to match a specific target load (i.e., the generated load is known a priori). In the case of Poisson Resampling, this is accomplished by changing the mean arrival rate of connections. In the case of Block Resampling, offered load is manipulated using block thinning (i.e., subsampling) and block thickening (i.e., combining blocks). Our work reveals that load scaling cannot be based simply on controlling the number of connections. Such an approach frequently results in offered loads that are far from the target, because the number of connections in a resample is not strongly correlated with the offered load represented by these connections. We address this difficulty by developing byte-driven versions of Poisson Resampling and Block Resampling, which scale load using a running count of the total data in the resampled trace T'. Unlike the number of connections, the total amount of data in T' is strongly correlated to the traffic load offered by T'. Our experiments confirm that byte-driven resampling is highly accurate, eliminating the common need for calibrating traffic generators.
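The byte-driven idea can be sketched for the Poisson case as follows. This is a simplified illustration under stated assumptions rather than the procedure evaluated in this dissertation: the function name, the (vector, total bytes) representation of a connection, and the target expressed in bytes per second are hypothetical. The sketch also places arrivals uniformly at random over the trace duration, relying on the standard property that, given their number, the arrival times of a Poisson process on an interval are distributed as sorted uniform samples.

```python
import random


def byte_driven_poisson_resample(connections, target_bytes_per_sec, duration, rng):
    """Resample until a running byte count reaches the target (sketch).

    connections          -- list of (vector, total_bytes) pairs from trace T
    target_bytes_per_sec -- target offered load (hypothetical unit choice)
    duration             -- length of the resampled trace T', in seconds
    rng                  -- a random.Random instance
    """
    if sum(size for _, size in connections) <= 0:
        raise ValueError("the original trace carries no data")

    target_bytes = target_bytes_per_sec * duration
    chosen = []
    total_bytes = 0
    # Stop on the running count of data, not on the number of connections.
    while total_bytes < target_bytes:
        conn = rng.choice(connections)  # assumption: with replacement
        chosen.append(conn)
        total_bytes += conn[1]

    # Given the number of arrivals, Poisson arrival times on an interval
    # are distributed as sorted uniform samples over that interval.
    starts = sorted(rng.uniform(0.0, duration) for _ in chosen)
    return list(zip(starts, chosen)), total_bytes
```

In this sketch the total exceeds the target by at most the size of the last connection drawn, so the error can still be noticeable when a short trace happens to end on a very large connection.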

1.4 Thesis Statement

This dissertation considers the following thesis:

1. An abstract source-level model can describe in detail the entire set of TCP application


header traces in an efficient, accurate manner.

3. Traffic generation based on this abstract source-level modeling results in synthetic traffic that is realistic and suitable for experimental networking research.

4. The abstract source-level model of a trace can be manipulated to introduce statistically valid variability in the generated traffic and also to accurately match a target offered load while preserving application characteristics.

1.5 Contributions

We highlight the following contributions from this dissertation:

We develop the concept of abstract source-level modeling and the a-b-t notation for describing the source-level behavior of entire traffic mixes. We identify a fundamental dichotomy in source-level behavior between connections that exchange data sequentially and connections that exchange data concurrently. Our a-b-t notation includes a sequential version and a concurrent version that makes it possible to appropriately describe these two types of behaviors.

We formulate a formal test of concurrency that can be applied to the packet headers of any TCP connection, and that does not suffer from false positives. This enables us to accurately classify connections as sequential or concurrent. We show that only a small fraction of TCP connections (less than 4% in our traces) exchange data concurrently, but that these TCP connections account for a substantial fraction (up to 32%) of the total traffic.

We present an efficient algorithm for transforming a packet header trace into a collection of sequential and concurrent a-b-t connection vectors. Given a TCP connection for which

Figure 1.2: An a-b-t diagram illustrating a persistent HTTP connection.
Figure 1.3: A diagram illustrating the interaction between two BitTorrent peers.
Figure 3.2: An a-b-t diagram illustrating a persistent HTTP connection.
Figure 3.4: Three a-b-t diagrams representing three different types of NNTP interactions.
