What TCP/IP Protocol Headers Can Tell Us About the Web

http://www.cs.unc.edu/Research/dirt

University of North Carolina at Chapel Hill

SIGMETRICS, June 2001

Félix Hernández Campos

F. Donelson Smith

Kevin Jeffay

David Ott

Motivation

Traffic Modeling and Characterization

• Can we continuously acquire network traffic data using off-the-shelf hardware and software?
• Can we use this information to construct up-to-date, application-level traffic models?
  – Populate traffic generator with analytic distributions for simulations and lab experiments
• Can we study the traffic generated by a large population of users while protecting their privacy?
• Case study: Web Traffic

Internet Traffic Characterization

Previous Work

• Traffic modeling before the WWW explosion
  – Danzig et al. (91, 92)
  – Paxson (94)
• Browsing-based web traffic models
  – Mah (95)
  – Crovella et al. (95, 98)
• Models of TCP connections in the web
  – Cleveland et al. (00)
• Other large-scale trace analyses related to the web
  – Gribble & Brewer (97), Balakrishnan et al. (98), Wolman et al. (99), and Feldmann (00)

Methodology

Trace Acquisition

[Figure: a Traffic Monitor (tcpdump) on the Gigabit Ethernet link between the University of North Carolina at Chapel Hill (35,000 people) and the Internet]

• Study Internet traffic generated by a large and diverse population

Methodology

Benefits of TCP/IP Header Tracing

• Light-weight
  – Off-the-shelf hardware
  – Freely available software
• Privacy
  – Easy to address by anonymizing IP addresses offline
• Efficient
  – Reduces storage requirements
    » E.g. 161 GB for headers instead of 803 GB for entire packets
  – Reduces processing requirements during tracing
    » Header extraction and recording only
• Large-scale
  – E.g. 7 days x 12 hr, 1 Gbps link (20% avg. util.), 35K users
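One possible way to anonymize IP addresses offline is a keyed hash. The slides do not say which technique was used, so the Python sketch below is only an assumption: the function name `anonymize_ip` and the secret key are hypothetical, and HMAC-SHA256 is just one reasonable choice. The same address always maps to the same pseudonym, so per-client and per-server statistics still work after anonymization.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(address: str, secret_key: bytes) -> str:
    """Map an IPv4 address to a stable pseudonym using HMAC-SHA256.

    Hypothetical helper: the keyed-hash approach is an assumption,
    not the method described in the slides.
    """
    # .packed gives the 4 raw address bytes and rejects malformed input.
    packed = ipaddress.IPv4Address(address).packed
    digest = hmac.new(secret_key, packed, hashlib.sha256).digest()
    # Keep 4 bytes so the pseudonym still has the shape of an IPv4 address.
    return str(ipaddress.IPv4Address(digest[:4]))
```

Truncating the digest to 32 bits means two different addresses can occasionally share a pseudonym; keeping more digest bytes avoids this at the cost of no longer looking like an address.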

Trace Collection

Summary

• Three sets of traces from UNC
  – October 99, October 00, April 01
  – 1 hour-long tracing periods (1-6 GB per trace)
  – 42 traces in each set
• Two sets of traces from NLANR (for comparison)
  – October 99, October 00
  – 2 sites
    » San Diego Supercomputing Center
    » Univ. of Michigan/Merit
  – 90 second tracing periods (3-67 MB per trace)
  – 58 traces in each set

Trace Collection

Summary

                                         99        00        01
Packets   Total                       525 M    1873 M    2419 M
          TCP                           85%       91%       91%
          HTTP                          38%       29%       28%
Bytes     Total                      212 GB    721 GB    905 GB
          TCP                           86%       90%       91%
          HTTP                          56%       35%       36%
Total Traces Size                     36 GB    127 GB    164 GB
Avg. % of Packets Lost by Monitor        0%     0.02%    0.003%

Case Study: Web Traffic

Packet Capturing

[Figure: Web Clients at the University of North Carolina at Chapel Hill send HTTP Requests to Web Servers on the Internet and receive HTTP Responses]

• We study a large collection of users as web content consumers
• We only capture TCP/IP headers
  – No HTTP headers

Methodology

Do We Really Need HTTP Headers?

• We can infer plenty of HTTP information from TCP/IP headers
  – Request size
  – Response size
  – Embedded objects per web page
  – Servers per page
  – Use of persistent connections
  – ...
• TCP/IP headers are sufficient for
  – Constructing application-level traffic models
  – Studying the impact of new HTTP dynamics

Web Traffic Analysis

Processing Sequence Overview

[Figure: processing pipeline: tcpdump -> Raw TCP/IP headers trace -> Filter & Sort -> TCP Connections (Port 80) -> Connection Analysis -> HTTP Req/Rsp Exchanges -> Statistics -> HTTP Distributions]
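The "Filter & Sort" stage can be pictured as grouping header records by TCP connection and ordering each group by time. The Python sketch below is illustrative only: the `HeaderRecord` fields and the `filter_and_sort` function are hypothetical names, and it assumes the tcpdump output has already been parsed into records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class HeaderRecord(NamedTuple):
    """One parsed TCP/IP header (hypothetical layout)."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    seqno: int
    ackno: int


# (client_ip, client_port, server_ip, server_port)
ConnKey = Tuple[str, int, str, int]


def filter_and_sort(records: Iterable[HeaderRecord],
                    server_port: int = 80) -> Dict[ConnKey, List[HeaderRecord]]:
    """Keep port-80 segments and group them per connection, sorted by time."""
    connections: Dict[ConnKey, List[HeaderRecord]] = defaultdict(list)
    for rec in records:
        if rec.src_port == server_port:
            # Server-to-client segment: the destination is the client.
            key = (rec.dst_ip, rec.dst_port, rec.src_ip, rec.src_port)
        elif rec.dst_port == server_port:
            # Client-to-server segment: the source is the client.
            key = (rec.src_ip, rec.src_port, rec.dst_ip, rec.dst_port)
        else:
            continue  # not web traffic under the port-80 assumption
        connections[key].append(rec)
    for segments in connections.values():
        segments.sort(key=lambda r: r.timestamp)
    return dict(connections)
```

Keying on the client and server endpoints puts both directions of a connection in the same group, although a unidirectional trace only ever fills one of the two branches.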

Methodology

Request/Response Traces

[Figure: the Web Client sends an HTTP Request of x bytes; the Web Server returns an HTTP Response of y bytes]

TCP/IP Headers and HTTP

Request/response Exchange

Exchange between a Web Client (UNC) and a Web Server (Internet), carrying an HTTP Request of 304 bytes and an HTTP Response of 2875 bytes:

Direction          Segment    seqno   ackno
client -> server   SYN
server -> client   SYN-ACK
client -> server   ACK
client -> server   DATA         305       1
server -> client   ACK            1     305
server -> client   DATA        1461     305
server -> client   DATA        2876     305
client -> server   ACK          305    2876

The connection is closed with a FIN and a FIN-ACK segment from each side.

Packet Capturing

Inbound TCP/IP Headers Only

[Figure: Web Clients at the University of North Carolina at Chapel Hill and Web Servers on the Internet are connected through Gigabit Ethernet over an Outbound Fiber and an Inbound Fiber; the Traffic Monitor (tcpdump) is attached to the Inbound Fiber]

• Two fiber links
• Only inbound TCP/IP headers are captured
  – Eliminate synchronization and buffering issues on the NIC
  – Reduce trace size


TCP/IP Headers and HTTP

Server-to-client Segments Only

Exchange between a Web Client (UNC) and a Web Server (Internet), showing only the segments sent by the server:

Segment    seqno   ackno
SYN-ACK
ACK            1     305
DATA        1461     305
DATA        2876     305
FIN
FIN-ACK

HTTP Request: 304 bytes. HTTP Response: 2875 bytes.

Methodology

Request/Response Traces

[Figure: between Web Client and Web Server, the HTTP Request (304 bytes) is labeled "Computed" and the HTTP Response (2875 bytes) is labeled "Directly Observed"]

• Unidirectional TCP/IP header traces are sufficient for capturing application-level behavior
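The arithmetic behind "Computed" and "Directly Observed" can be sketched in Python. This is a minimal illustration, not the authors' tool: the function name and its four parameters are hypothetical, and it assumes the caller has already located the server's SYN-ACK and the highest seqno and ackno among the server-to-client segments. The request size is how far the server's ackno advanced, the response size is how far its seqno advanced, and both are taken modulo 2^32 because TCP sequence numbers wrap around.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def exchange_sizes(synack_seq: int, synack_ack: int,
                   last_seq_end: int, last_ack: int) -> tuple:
    """Return (request_bytes, response_bytes) for a single exchange.

    All arguments come from server-to-client headers (hypothetical inputs):
      synack_seq, synack_ack: seqno and ackno of the server's SYN-ACK
      last_seq_end: seqno just past the last response byte sent
      last_ack: highest ackno sent by the server
    The byte consumed by a FIN is assumed to be excluded by the caller.
    """
    # The SYN-ACK acknowledges the client's SYN, so its ackno is the
    # sequence number of the first request byte.
    request_bytes = (last_ack - synack_ack) % MOD
    # The SYN itself consumes one sequence number on the server side.
    response_bytes = (last_seq_end - (synack_seq + 1)) % MOD
    return request_bytes, response_bytes
```

With the relative numbers used on the slides, `exchange_sizes(0, 1, 2876, 305)` returns `(304, 2875)`.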

HTTP Characterization

Response Data Sizes – Body CDF

[Figure: CDF of response sizes. x-axis: Response Size (in bytes); y-axis: Cumulative Probability (% Responses); curves labeled 1999 and 2001; the value 0.85 is marked]

HTTP Characterization

Response Data Volumes – Body CDF

[Figure: CDF of response data volume. x-axis: Response Size (in bytes); y-axis: Cumulative Probability (% Bytes); the values 0.25, 10000 and 1e+05 are marked]

HTTP Characterization

Response Data Sizes – Tail CCDF

[Figure: CCDF of response sizes. x-axis: Response Size (in bytes); y-axis: Complementary Cumulative Probability; curves labeled "90 second traces" and "1-hour long traces"]
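A tail plot of this kind shows the complementary CDF, P[X > x], of the measured sizes. The slides do not say how the curves were produced; the Python sketch below is one straightforward way to compute an empirical CCDF, with `empirical_ccdf` as a hypothetical name.

```python
from typing import List, Sequence, Tuple


def empirical_ccdf(sizes: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (x, P[X > x]) for each distinct value x in `sizes`."""
    ordered = sorted(sizes)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered):
        # Emit one point per distinct value, at its last occurrence,
        # so that exactly n - (i + 1) samples are strictly larger than x.
        if i + 1 == n or ordered[i + 1] != x:
            points.append((x, (n - (i + 1)) / n))
    return points
```

The body CDF is one minus these values. The largest sample always gets probability 0, so that point has to be dropped before plotting on logarithmic axes.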

HTTP Characterization

Request Data Size – Body CDF

[Figure: CDF of request sizes. x-axis: Request Size (in bytes); y-axis: Cumulative Probability; curves labeled 1999, 2000 and 2001]

HTTP Characterization

Request Data Size – Tail CCDF

[Figure: CCDF of request sizes. x-axis: Request Size (in bytes); y-axis: Complementary Cumulative Probability]

Persistent Connections in HTTP

Effective Persistence

• An HTTP persistent connection can use a single TCP connection to carry one or more request/response exchanges
• This feature is supported in newer versions of the protocol
  – HTTP/1.0 (limited support)
  – HTTP/1.1
• We study how persistent connections are used
  – We define effective persistence as two or more request/response exchanges in the same TCP connection

Persistent Connections in HTTP

Example – TCP/IP Headers

Segments sent by the Web Server (Internet) to the Web Client (UNC):

Segment    seqno   ackno   Inference
SYN-ACK        1       1
ACK            1     305   Ackno increased: Request 1, 304 bytes
DATA        1461     305
DATA        2876     305   Seqno increased: Response 1, 2875 bytes
ACK         2876     567   Ackno increased: Request 2, 262 bytes
DATA        4336     567
DATA        5796     567
DATA        6341     567   Seqno increased: Response 2, 3465 bytes
FIN
FIN-ACK

Persistent Connections in HTTP

Example – Request/Response Exchanges

Exchanges between the Web Client (UNC) and the Web Server (Internet):

Exchange   HTTP Request (Computed)   HTTP Response (Directly Observed)
1          304 bytes                 2875 bytes
2          262 bytes                 3465 bytes
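The example suggests a simple rule for splitting a connection into exchanges: an increase in ackno means the client sent request bytes, and an increase in seqno means the server sent response bytes. The Python sketch below applies that rule to server-to-client (seqno, ackno) pairs. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, numbers are relative (both start at 1 after the handshake), segments arrive in order without retransmissions, requests are not pipelined, and FIN segments are left out.

```python
from typing import Iterable, List, Tuple


def infer_exchanges(segments: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (request_bytes, response_bytes) pairs for one TCP connection."""
    exchanges = []
    prev_seq, prev_ack = 1, 1
    req_bytes, rsp_bytes = 0, 0
    for seqno, ackno in segments:
        if ackno > prev_ack:
            # Ackno increased: the client sent more request data.
            if rsp_bytes > 0:
                # A response was already under way, so this is a new exchange.
                exchanges.append((req_bytes, rsp_bytes))
                req_bytes, rsp_bytes = 0, 0
            req_bytes += ackno - prev_ack
            prev_ack = ackno
        if seqno > prev_seq:
            # Seqno increased: the server sent more response data.
            rsp_bytes += seqno - prev_seq
            prev_seq = seqno
    if req_bytes > 0 or rsp_bytes > 0:
        exchanges.append((req_bytes, rsp_bytes))
    return exchanges


def is_effectively_persistent(segments: Iterable[Tuple[int, int]]) -> bool:
    """Effective persistence: two or more exchanges in the same connection."""
    return len(infer_exchanges(segments)) >= 2
```

Fed the seqno/ackno pairs from the example, `(1, 1), (1, 305), (1461, 305), (2876, 305), (2876, 567), (4336, 567), (5796, 567), (6341, 567)`, the function returns `[(304, 2875), (262, 3465)]`, so the connection counts as effectively persistent.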

Effective Persistent Connections

Summary Statistics

                               UNC 00   NLANR 00
Connections   Persistent        15.1%      13.8%
              Non-Persistent    78.1%      63.4%
              Unclassified       6.8%      22.8%
Objects       Persistent        49.7%      42.8%
              Non-Persistent    50.3%      57.2%
Bytes         Persistent        40.4%      35.7%
              Non-Persistent    49.6%      54.3%

HTTP Characterization

Objects in Persistent Connections

[Figure: CDF of the number of exchanges per persistent connection. x-axis: No. of Request/Response Exchanges; y-axis: Cumulative Probability; the value 0.87 is marked]

HTTP Characterization

Other Statistics

• Page-based statistics (based on Mah and Crovella et al.)
  – Think times
  – Top-level vs. embedded objects
    » Requests and Responses
  – Unique TCP connections per page
  – Unique server IP addresses per page
  – Consecutive pages per server
  – Number of pages per client
  – Primary vs. secondary servers
    » Requests and Responses
• Other non-page-based statistics
  – Number of exchanges per client

Limitations

TCP/IP Header Tracing

• Uncertainties arise when application-level information is inferred from transport-level headers
• We discuss several issues in our paper
  – Pipelining
  – User/browser interactions
    » Stop and reload
  – Caches
    » Local cache and proxies
  – TCP segment processing
    » Segment reordering
• In summary, limited or no impact on our results

Summary and Conclusions

Methodology

• Unidirectional TCP/IP header tracing is a powerful and light-weight traffic measurement methodology
• Limitations have a minor impact on application-level results
• We also applied this methodology to
  – SMTP
  – FTP
  – Other application-level protocols

Summary and Conclusions

Web Traffic Characterization

• New data to populate traffic generators
  – Request sizes
  – Response sizes
  – Use of persistent connections
  – ...
• 1-hour long traces are sufficient to capture application-level behavior
  – Short traces cut off large objects, which skews the tails of the distributions
• Persistent Connections:
  – ~15% of all the HTTP connections
  – 40-50% of all the transferred HTTP bytes
