
Computer Networking Chap3


Academic year: 2021


(1)

Chap.3 Transport Layer

‰ Goal : study the principles of providing comm services to app processes, and implementation issues in the Internet transport protocols, TCP and UDP

‰ Contents

z Relationship bw transport and net layers

{ extending net layer’s delivery service to a delivery service bw two app-layer processes, by covering UDP

z Principles of reliable data transfer and TCP

(2)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

z Relationship Between Transport and Network Layers

z Overview of the Transport Layer in the Internet

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport: UDP

‰ Principles of Reliable Data Transfer

‰ Connection-Oriented Transport: TCP

‰ Principles of Congestion Control

(3)

Overview of Transport-layer

‰ provide logical comm bw app processes running on diff hosts

‰ transport protocols run in end systems

z sending side: converts msgs from app process into transport-layer pkts (segments, in Internet terms), and passes them to net layer

{ (possibly) breaks app msgs into small chunks, and adds a header to each

z receiving side: processes segments from net layer, making the data available to app

‰ more than one transport protocol available to apps (in the Internet : TCP and UDP)

(4)

Relationship bw Transport and Network layers

‰ transport layer provides logical comm bw processes, whereas net layer provides logical comm bw hosts

‰ Household analogy

z kids in one household (A) write letters to kids in another household (B)

{ Ann in A and Bill in B collect/distribute mail from/to other kids

z analogies

{ letters in envelopes ~ app messages

{ kids ~ processes

{ houses ~ hosts

{ Ann and Bill ~ transport protocol

Š not involved in delivering mail bw mail centers

(5)

Overview of Transport-layer in the Internet

‰ IP (Internet Protocol) provides best-effort delivery service

z makes its “best effort” to deliver segments, but makes no guarantees : no guarantee of delivery, of orderly delivery, or of the integrity of data in segments ⇒ unreliable service

‰ User Datagram Protocol (UDP) : provides an unreliable connectionless service, a no-frills extension of IP service

z transport-layer multiplexing and demultiplexing : extends IP’s host-to-host delivery to process-to-process delivery

z integrity checking by including error-detection fields in segment header

‰ Transmission Control Protocol (TCP) : provides a reliable connection-oriented service with several additional services to app

z reliable data transfer : correct and in-order delivery by using

{ flow control and error control (seq #, ack, timers)

z connection setup

z congestion control

(6)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport: UDP

‰ Principles of Reliable Data Transfer

‰ Connection-Oriented Transport: TCP

‰ Principles of Congestion Control

(7)

Multiplexing and Demultiplexing

‰ a process can have one or more sockets; each socket has a unique identifier

‰ multiplexing at sending host : Ann’s job in household analogy

z gathering data chunks at source host from diff sockets

z encapsulating each chunk with header info to create segments

z passing segments to net layer

‰ demultiplexing at receiving host : Bill’s job in household analogy

z delivering the data in each received segment to the correct socket

(8)

How Demultiplexing Works

‰ host receives IP datagrams

z each datagram has src and dst IP addrs

{ each datagram carries a transport-layer seg

z each seg has src and dst port #s

{ well-known port #s : reserved for well-known app protocols, ranging 0 ~ 1023 : HTTP(80), FTP(21), SMTP(25), DNS(53)

{ other #s : can be used for user apps

(9)

Connectionless Multiplexing and Demultiplexing

‰ creating UDP socket

DatagramSocket mySocket1 = new DatagramSocket();

{ transport layer automatically assigns the socket a port # in the range 1024~65535 not currently used by any other UDP socket

DatagramSocket mySocket2 = new DatagramSocket(19157);

{ app assigns a specific port # 19157 to the UDP socket

z typically, the port # on the client side is automatically assigned, whereas the server side assigns a specific port #

‰ When a host receives UDP seg, it checks dst port # in the seg and directs the seg to the socket with that port #

z UDP socket identified by 2-tuple :

(dst IP addr, dst port #)

{ IP datagrams with diff src IP addrs and/or src port #s, but the same dst IP addr and dst port #, are directed to the same socket

(10)

‰ TCP socket identified by 4-tuple

(src IP addr, src port #, dst IP addr, dst port #)

‰ demultiplexing at receiving host

z 4-tuple used to direct seg to appropriate socket

z TCP segs with diff src IP addrs or src port #s are directed to two diff sockets (except a TCP seg carrying a conn-establishment request)

‰ server host may support many simultaneous TCP sockets

z each socket identified by its own 4-tuple
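To make the 2-tuple vs 4-tuple difference concrete, here is a toy Java model, not OS code: the classes Segment and Demux and all addresses are hypothetical. UDP sockets are looked up by (dst IP addr, dst port #) only, TCP sockets by the full 4-tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of transport-layer demultiplexing: a "socket" is just an inbox (a list of
// payloads) and a segment is a plain object. Not how an OS kernel implements it.
class Segment {
    final String srcIp, dstIp;
    final int srcPort, dstPort;
    final String payload;

    Segment(String srcIp, int srcPort, String dstIp, int dstPort, String payload) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstIp = dstIp;
        this.dstPort = dstPort;
        this.payload = payload;
    }
}

class Demux {
    // UDP socket id: 2-tuple (dst IP addr, dst port #)
    static String udpKey(String dstIp, int dstPort) {
        return dstIp + ":" + dstPort;
    }

    // TCP socket id: 4-tuple (src IP addr, src port #, dst IP addr, dst port #)
    static String tcpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    final Map<String, List<String>> udpSockets = new HashMap<>();
    final Map<String, List<String>> tcpSockets = new HashMap<>();

    void openUdp(String localIp, int localPort) {
        udpSockets.put(udpKey(localIp, localPort), new ArrayList<>());
    }

    void openTcp(String remoteIp, int remotePort, String localIp, int localPort) {
        tcpSockets.put(tcpKey(remoteIp, remotePort, localIp, localPort), new ArrayList<>());
    }

    // src addr/port are ignored: every seg with the same (dst IP, dst port) lands here
    boolean deliverUdp(Segment s) {
        return deliver(udpSockets.get(udpKey(s.dstIp, s.dstPort)), s);
    }

    // all four fields must match, so each client conn reaches its own socket
    boolean deliverTcp(Segment s) {
        return deliver(tcpSockets.get(tcpKey(s.srcIp, s.srcPort, s.dstIp, s.dstPort)), s);
    }

    private static boolean deliver(List<String> inbox, Segment s) {
        if (inbox == null) return false;   // no matching socket
        inbox.add(s.payload);
        return true;
    }
}
```

With this model, two clients sending to 10.0.0.5:19157 from different src addrs land in the same UDP inbox, while two TCP conns to port 80 from different clients land in two different sockets.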

(11)
(12)

Connection-Oriented Mux/Demux : Threaded Server

‰ today’s high-performing Web servers often use only one process, creating a new thread with a new connection socket for each new client conn

(13)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport: UDP

z UDP Segment Structure

z UDP Checksum

‰ Principles of Reliable Data Transfer

‰ Connection-Oriented Transport: TCP

‰ Principles of Congestion Control

(14)

User Datagram Protocol (UDP) [RFC 768]

‰ no-frills, bare-bones transport protocol : adds nothing to IP but

z multiplexing/demultiplexing : src and dst port #s

z (light) error checking

‰ features of UDP

z unreliable best-effort service : no guarantee on correct delivery

{ UDP segments may be lost and delivered out of order to app

z connectionless : no handshaking bw UDP sender and receiver

‰ Q: Isn’t TCP always preferable to UDP? A: No

z simple, and well suited to certain apps such as real-time apps (no conn-setup delay, no conn state, small header, no sending-rate throttling)

(15)
(16)

Controversy on UDP

‰ UDP lacks congestion control and reliable data transfer

‰ when many users start streaming high-bit-rate video, pkts overflow at routers, resulting in

z high loss rates for UDP pkts

z TCP senders decreasing their sending rates

⇒ mechanisms forcing all sources, including UDP sources, to perform adaptive congestion control are required, in particular for streaming multimedia apps

‰ build reliability directly into app (e.g., add ack/rexmission)

z many of today’s proprietary streaming apps run over UDP, but build ack and rexmission into the app in order to reduce pkt loss

z nontrivial, but can avoid the xmission-rate constraint imposed by TCP’s congestion control

(17)

UDP Segment Structure

‰ Source port #, dst port # : used for multiplexing/demultiplexing

‰ Length : length of UDP seg including header, in bytes

‰ Checksum : to detect errors (i.e., altered bits) on an end-end basis

z error source : noise in the links, or corruption while stored in a router

(18)

UDP Checksum Calculation (1) : Sender

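‰ sum all 16-bit words in the segment, two words at a time, with any overflow (carry out of the top bit) wrapped around and added to the low-order bit

‰ take 1’s complement of the sum; the result is the checksum value

(ex) three 16-bit words : 0110011001100000, 0101010101010101, 1000111100001100

z sum of first two words : 0110011001100000 + 0101010101010101 = 1011101110110101

z adding third word : 1011101110110101 + 1000111100001100 = 1 0100101011000001 ⇒ wraparound (add the carry back in) ⇒ 0100101011000010

z checksum = 1’s complement of the sum : 1011010100111101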

(19)

‰ at receiver : add all 16-bit words, including the checksum, and decide

z no error detected, if the result is 1111111111111111

z error detected, otherwise

{ nonetheless, the check is not perfect : errors may have occurred even when none is detected

‰ UDP is not responsible for recovering from error

z reaction to detecting errors depends on implementations

{ simply discard damaged seg, or

{ pass damaged seg to app with warning
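The sender and receiver steps above can be sketched in Java as follows. This assumes the segment has already been split into 16-bit words (class and method names are illustrative only), and it leaves out the pseudo-header that real UDP also covers.

```java
import java.util.Arrays;

// Sketch of the 16-bit 1's complement checksum. Words are ints in 0..0xFFFF.
// Real UDP also sums a pseudo-header and pads an odd trailing byte (omitted here).
class OnesComplementChecksum {

    // Running 1's complement sum: any carry out of bit 15 is wrapped back into bit 0.
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            sum = (sum & 0xFFFF) + (sum >>> 16);   // wraparound
        }
        return sum;
    }

    // Sender side: checksum = 1's complement of the sum.
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    // Receiver side: sum of all words plus the checksum must be 1111111111111111.
    static boolean noErrorDetected(int[] words, int checksum) {
        int[] all = Arrays.copyOf(words, words.length + 1);
        all[words.length] = checksum;
        return onesComplementSum(all) == 0xFFFF;
    }
}
```

Run on the three example words, checksum(...) gives 1011010100111101 and noErrorDetected(...) returns true; flipping the last bit of the first word makes it return false.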

(20)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport: UDP

‰ Principles of Reliable Data Transfer

z Building a Reliable Data Transfer Protocol

z Pipelined Reliable Data Transfer Protocol

z Go-Back-N (GBN)

(21)

Reliable Data Transfer : Service Model and Implementation

‰ reliable data transfer : no corruption, no loss, and in-order delivery

z of central importance to networking : not only at transport layer, but also at link layer and app layer

z interfaces of an rdt protocol

{ rdt_send() : called from above (by app) to pass data for delivery

{ udt_send() : called by rdt to send pkt over unreliable channel

{ rdt_rcv() : called from channel upon pkt arrival at receiver

{ deliver_data() : called by rdt to deliver data to app

(22)

Reliable Data Transfer: Implementation Consideration

‰ characteristics of the unreliable channel determine the complexity of the reliable data transfer protocol

‰ We will

z incrementally develop sender and receiver sides of rdt protocol, considering increasingly complex models of the underlying channel

z consider only unidirectional data transfer for simplicity

{ but control pkts flow in both directions

z use finite state machines (FSMs) to specify sender and receiver

{ transition label : event causing state transition / actions taken on state transition

{ dashed arrow : initial state

(23)

rdt1.0 : Perfectly Reliable Channel

‰ Assumptions of underlying channel

z perfectly reliable : no bit errors, no loss of packets

‰ separate FSMs for sender and receiver

z sender sends data into underlying channel

z receiver reads data from underlying channel

(24)

rdt2.0 : Channel with Errors

‰ New assumptions of underlying channel

z bits in a pkt may be corrupted when transmitted, propagated, or buffered

z no loss, and in-order delivery

‰ Automatic Repeat reQuest (ARQ) protocols

z error detection : extra bits placed in checksum field

z receiver feedback : ACK/NAK pkt explicitly sent back to sender

{ ACK (positive acknowledgement) : when pkt received OK

{ NAK (negative acknowledgement) : when pkt received in error

(25)
(26)
(27)

rdt2.0 : Fatal Flaw

Q: How to recover from errors in ACK or NAK pkts?

z minimally, need to add checksum bits to ACK/NAK pkts

z possible solutions

{ sender asks receiver to repeat a garbled ACK or NAK : but this request may itself be garbled, with no clear way out

{ add enough checksum bits to correct (not just detect) bit errors : doesn’t help if pkts can be lost

{ simply resend the pkt when receiving a garbled ACK or NAK ⇒ may create duplicates at receiver

Š receiver doesn’t know whether it is a new pkt or a rexmission (i.e., a duplicate pkt)

‰ handling duplicates : add a new field (seq # field) to the packet

z sender puts a seq # into this field, and receiver discards duplicate pkts

z a 1-bit seq # suffices for a stop-and-wait protocol

‰ rdt2.0 is a stop-and-wait protocol : sender sends one pkt, then waits for the receiver’s response (ACK or NAK)

(28)

Description of Sol. 1 to the Fatal Flaw of rdt2.0

‰ A dictates something to B; B replies “OK” or “please repeat”

‰ A didn’t understand B’s reply, so asks “What did you say?”, but this request is corrupted (“What did y…”)

‰ B has no idea whether it is part of the dictation or a request for repetition of its last reply

(29)
(30)
(31)

rdt2.1 : Discussion

‰ sender

z seq # added to pkt

z two seq #’s (0,1) will suffice

z must check if received ACK/NAK is corrupted

z twice as many states

{ state must remember whether current pkt has seq # of 0 or 1

‰ receiver

z must check if received pkt is duplicate

{ state indicates whether 0 or 1 is expected pkt seq #

(32)

rdt2.2 : NAK-free

‰ accomplishes the same effect as a NAK by sending an ACK for the last correctly received pkt

z receiver must explicitly include seq # of pkt being ACKed

‰ a sender that receives two ACKs for the same pkt (i.e., duplicate ACKs) knows that the receiver didn’t correctly receive the pkt following the pkt being ACKed twice, and thus rexmits the latter

(33)
(34)
(35)

rdt3.0 : Channel with Errors and Loss

‰ new assumptions of underlying channels :

z can lose pkts (data or ACKs)

Q : how to detect pkt loss, and what to do when pkt loss occurs?

z checksum, seq #, ACKs, and rexmissions help, but are not enough

‰ approaches

z sender waits a proper amount of time (at least round-trip delay + processing time at receiver) to convince itself of pkt loss

z rexmits the pkt if ACK not received within this time

z if a pkt (or its ACK) is just overly delayed, sender may rexmit the pkt even though it has not been lost

{ but seq # handles the possibility of duplicate pkts

‰ implementation

z a countdown timer, set appropriately, starts each time a pkt is sent

z rexmit pkt when the timer expires
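A toy Java simulation of the rdt3.0 ideas: alternating seq # 0/1, rexmission until the matching ACK arrives, and duplicates discarded by the receiver. Assumptions, for illustration only: bit errors are not modelled, losses follow a scripted pattern, and a timeout is represented simply as "no usable ACK came back for this attempt". Rdt30 and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy rdt3.0 (alternating-bit, stop-and-wait) over a lossy channel with scripted losses.
class Rdt30 {

    // Receiver (same as rdt2.2): ACKs carry the seq # of the pkt being acked.
    static class Receiver {
        int expectedSeq = 0;
        final List<String> delivered = new ArrayList<>();

        int onPacket(int seq, String data) {
            if (seq == expectedSeq) {      // new pkt: deliver it, then expect the other seq #
                delivered.add(data);
                expectedSeq ^= 1;
            }
            return seq;                    // duplicate or not, ACK the seq # just received
        }
    }

    // Sender: returns the total # of data-pkt xmissions, including rexmissions.
    // lostData[i] / lostAck[i] == true: the data pkt / its ACK is lost on attempt i.
    static int send(List<String> msgs, Receiver rcv, boolean[] lostData, boolean[] lostAck) {
        int seq = 0;
        int attempt = 0;
        for (String m : msgs) {
            while (true) {
                boolean dataLost = attempt < lostData.length && lostData[attempt];
                boolean ackLost = attempt < lostAck.length && lostAck[attempt];
                attempt++;                              // one (re)xmission; timer starts
                if (!dataLost) {
                    int ack = rcv.onPacket(seq, m);
                    if (!ackLost && ack == seq) break;  // correct ACK: stop timer
                }
                // no usable ACK: timer expires, loop rexmits the same pkt with the same seq #
            }
            seq ^= 1;                                   // alternate seq # for the next pkt
        }
        return attempt;
    }
}
```

For example, with msgs a, b, c, losing b's first copy and then the ACK of its second copy gives 5 xmissions; the third copy of b is recognized by its seq # as a duplicate, so the receiver still delivers exactly a, b, c.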

(36)
(37)
(38)
(39)

Performance of rdt3.0 (Stop-and-Wait Protocol)

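‰ assumption : ignore xmission time of ACK pkt (which is extremely small) and processing time of pkt at sender and receiver

‰ sender utilization U_sender : fraction of time sender is actually busy sending bits into channel

ex) 1 Gbps link, 30 ms RTT, 1 KB packet (L = 8,000 bits, R = 10^9 bits/sec)

$$ t_{trans} = \frac{L}{R} = \frac{8{,}000\ \text{bits/packet}}{10^{9}\ \text{bits/sec}} = 0.008\ \text{ms} $$

$$ U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008\ \text{ms}}{30\ \text{ms} + 0.008\ \text{ms}} \approx 0.00027 $$

⇒ very poor!

z i.e., the sender sends one 8,000-bit pkt every 30.008 ms ⇒ effective throughput ≈ 267 kbps over a 1 Gbps link

z net protocol limits the capabilities provided by underlying net HW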

(40)

Pipelining

‰ sends multiple pkts without waiting for acks

z range of seq #s is increased

z buffering at sender and/or receiver required

{ sender : pkts that have been xmitted but not yet acked

{ receiver : pkts correctly received

(41)

Go-Back-N (GBN) Protocol

‰ sender’s view of seq #s in GBN

z window size N : # of pkts allowed to send without waiting for ACK

{ GBN often referred to as sliding window protocol

z pkt’s seq # : carried in a k-bit field in pkt header

{ range of seq # : [0, 2^k − 1] with modulo-2^k arithmetic

‰ events at GBN sender

z invocation from above : before sending, check if window isn’t full

z receipt of an ACK : cumulative ack; an ack with seq # n indicates that all pkts with a seq # up to and including n have been correctly received

z timeout : resend all pkts previously xmitted but not yet acked

‰ drawback of GBN : when window size and bw-delay product are large, many pkts can be in the pipeline, and a single pkt error can make GBN rexmit a large # of pkts, many unnecessarily

(42)
(43)

Go-Back-N (GBN) Protocol : Receiver

‰ when pkt with seq # n is received correctly and in-order, receiver sends an ACK for pkt n and delivers data portion to upper layer

‰ receiver discards out-of-order pkts and resends an ACK for the most recently received in-order pkt

z simple receiver buffering : needn’t buffer any out-of-order pkts

z only info needed : seq # of next in-order pkt, expectedseqnum
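A round-based Java sketch of GBN putting the sender and receiver rules together. Names are hypothetical, and the model is simplified: seq #s are not wrapped modulo 2^k, ACKs are never lost, and each round stands for sending the window, collecting cumulative ACKs, and timing out if anything is still unacked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy round-based Go-Back-N with window size N and cumulative ACKs.
class GoBackN {
    int expectedSeqNum = 0;                    // the receiver's only state
    final List<String> delivered = new ArrayList<>();
    int transmissions = 0;

    // Receiver: deliver only the next in-order pkt; discard anything else.
    // Returns the cumulative ACK: the most recently received in-order seq #.
    int receive(int seq, String data) {
        if (seq == expectedSeqNum) {
            delivered.add(data);
            expectedSeqNum++;
        }
        return expectedSeqNum - 1;
    }

    // lostOnce: seq #s whose first xmission is dropped by the channel.
    void run(List<String> data, int n, Set<Integer> lostOnce) {
        int base = 0;
        while (base < data.size()) {
            int lastAck = base - 1;
            for (int seq = base; seq < base + n && seq < data.size(); seq++) {
                transmissions++;
                if (lostOnce.remove(seq)) continue;           // dropped in the channel
                lastAck = Math.max(lastAck, receive(seq, data.get(seq)));
            }
            base = lastAck + 1;   // slide window past everything cumulatively acked;
                                  // anything still unacked is resent next round (timeout)
        }
    }
}
```

With 5 pkts, N = 3 and the first copy of pkt 1 lost, the receiver discards pkt 2 as out of order, so the sender must resend both 1 and 2: 7 xmissions for 5 pkts, which is exactly the GBN drawback noted above.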

(44)

Go-Back-N (GBN) Protocol : Operation

(45)

Selective Repeat (SR) Protocol

‰ sender rexmits only pkts for which ACK not received ⇒ avoids unnecessary rexmission

‰ receiver individually acks correctly received pkts regardless of their order

(46)

SR Protocol : Sender/Receiver Events and Actions

‰ sender

z data from above : if next available seq # is in window, send pkt

z timeout(n) : resend pkt n, restart timer

{ each pkt has its own (logical) timer

z ACK(n) in [send_base, send_base+N−1]

{ mark pkt n as received

{ if n is equal to send_base, window base is moved forward to next unacked pkt, and unxmitted pkts in the advanced window are xmitted

‰ receiver

z pkt n in [rcv_base, rcv_base+N−1] correctly received : send ACK(n)

{ if not previously received, it is buffered

{ if n is equal to rcv_base, this pkt and previously buffered in-order pkts are delivered to upper layer, and the window is moved forward

z pkt n in [rcv_base−N, rcv_base−1] correctly received : send ACK(n), even though previously acked

z otherwise : ignore pkt
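The receiver rules above can be sketched in Java as follows. The class is hypothetical, and seq #s are plain ints without modulo-2^k wraparound, so the window checks are ordinary comparisons.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a Selective Repeat receiver with window size N.
class SrReceiver {
    final int n;
    int rcvBase = 0;
    final Map<Integer, String> buffer = new HashMap<>();   // correctly received, not yet delivered
    final List<String> delivered = new ArrayList<>();

    SrReceiver(int n) { this.n = n; }

    // Returns the seq # to ACK, or -1 if the pkt is ignored.
    int onPacket(int seq, String data) {
        if (seq >= rcvBase && seq <= rcvBase + n - 1) {
            buffer.putIfAbsent(seq, data);                  // buffer if not previously received
            while (buffer.containsKey(rcvBase)) {           // deliver the in-order run ...
                delivered.add(buffer.remove(rcvBase));
                rcvBase++;                                  // ... and move the window forward
            }
            return seq;                                     // ACK(n), individually
        }
        if (seq >= rcvBase - n && seq <= rcvBase - 1) {
            return seq;                                     // already delivered: ACK again
        }
        return -1;                                          // otherwise: ignore
    }
}
```

E.g., with N = 4, pkts arriving as 0, 2, 3, 1 are each acked individually; 2 and 3 wait in the buffer until 1 arrives, then 1, 2, 3 are delivered together and rcv_base moves to 4.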

(47)
(48)

Max. Window Size

‰ GBN protocol (stop-and-wait is the special case k = 1, N = 1)

z window size N ≤ 2^k − 1 (k : # of bits in seq # field), not 2^k, why?

ex) k = 2 ⇒ seq #s : 0, 1, 2, 3; max N = 3

‰ SR protocol

z scenarios (k = 2, N = 3)

(a) : all acks are lost

Š receiver incorrectly accepts the rexmitted (duplicate) pkt 0 as new

(b) : all acks received correctly, but pkt 3 is lost

{ receiver can’t distinguish xmission of new pkt 0 in (b) from rexmission of old pkt 0 in (a)

z further consideration on scenario (a)

{ A rexmits pkt 0; B receives and buffers it

z ⇒ for SR, window size must be at most half the seq # space : N ≤ 2^(k−1)
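The scenario (a) argument can be checked mechanically. This Java sketch (hypothetical helper names) assumes pkts 0 .. N−1 were received and acked but all ACKs were lost, and asks whether the rexmitted old pkt 0 falls inside the receiver's already-advanced window, using modulo-2^k seq #s.

```java
// Reproduces the SR window-size argument with modulo-2^k seq #s (scenario (a)).
class SrWindowCheck {

    // true if seq lies in the window [rcvBase, rcvBase+N-1], computed modulo seqSpace
    static boolean inWindow(int seq, int rcvBase, int n, int seqSpace) {
        return Math.floorMod(seq - rcvBase, seqSpace) < n;
    }

    // After pkts 0..N-1 are received (ACKs all lost), the receiver window starts at
    // N mod 2^k. true if the rexmitted old pkt 0 would be wrongly accepted as new.
    static boolean oldPkt0Ambiguous(int k, int n) {
        int seqSpace = 1 << k;
        int rcvBase = n % seqSpace;
        return inWindow(0, rcvBase, n, seqSpace);
    }
}
```

For k = 2, oldPkt0Ambiguous(2, 3) is true and oldPkt0Ambiguous(2, 2) is false; more generally the ambiguity appears exactly when N > 2^(k−1).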

(49)

rdt : Comment on Packet Reordering

‰ since seq #s are reused, old copies of a pkt with a seq/ack # of x can appear, even though neither the sender’s nor the receiver’s window contains x

z use of a max pkt lifetime : constrain how long a pkt may live in the net, so that old copies are gone before a seq # is reused

(50)
(51)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport: UDP

‰ Principles of Reliable Data Transfer

‰ Connection-Oriented Transport: TCP

z TCP Connection

z TCP Segment Structure

z Round-Trip Time Estimation and Timeout

z Reliable Data Transfer

z Flow Control

z TCP Connection Management

‰ Principles of Congestion Control

‰ TCP Congestion Control

(52)

TCP Connection

‰ two processes establish a connection via a 3-way handshake before sending data, initializing TCP variables

z full duplex : bi-directional data flow bw processes in the same conn

z point-to-point : bw one sender and one receiver

{ multicasting is not possible with TCP

‰ a stream of data passes through a socket into the send buffer

z TCP grabs chunks of data from the send buffer

z max seg size (MSS) : max amount of app-layer data in a seg

{ set based on the path MTU (largest link-layer frame that can be sent along the path)

{ typically, 1,460 bytes, 536 bytes, or 512 bytes

(53)

TCP Segment Structure

• seq # and ack # : for reliable data xfer; count in bytes, not pkts

• receive window : for flow control; # of bytes receiver is willing to accept

• header length : 4-bit # counting in 32-bit words

• options : typically empty; e.g., time-stamping, MSS and window-scaling-factor negotiation

• checksum : for error detection

• ACK : indicates value in ack field is valid

• SYN, RST, FIN : used for connection setup and teardown

• PSH : receiver should pass data to upper layer immediately

• URG : indicates there is urgent data in the seg, marked by sending-side upper layer; urgent data pointer indicates the last byte of urgent data

(54)

Seq Numbers and Ack Numbers

‰ seq # : # of the 1st byte of the seg within the xmitted byte stream, not the # of the seg within the series of xmitted segs

z TCP implicitly numbers each byte in the data stream

z initial seq # is chosen randomly rather than set to 0, why?

‰ ack # : seq # of next byte expected from other side
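A small Java sketch of the numbering rules. Helper names are hypothetical; the ISN is passed in rather than chosen randomly, and the file size and MSS in the usage note are assumed values.

```java
// Sketch of TCP byte numbering: a seg's seq # is the byte-stream # of its first data
// byte, and the ack # is the seq # of the next byte expected. Seq #s are 32-bit,
// so all arithmetic is mod 2^32.
class TcpNumbering {
    static final long MOD_MASK = 0xFFFFFFFFL;   // keep values in 0 .. 2^32 - 1

    // seq # of every seg when fileBytes bytes are cut into MSS-sized chunks
    static long[] segmentSeqNumbers(long isn, long fileBytes, int mss) {
        int count = (int) ((fileBytes + mss - 1) / mss);   // ceil(fileBytes / MSS)
        long[] seqs = new long[count];
        for (int i = 0; i < count; i++) {
            seqs[i] = (isn + (long) i * mss) & MOD_MASK;
        }
        return seqs;
    }

    // ack # after in-order receipt of a seg: next byte expected = seq # + data length
    static long ackNumber(long seq, int dataBytes) {
        return (seq + dataBytes) & MOD_MASK;
    }
}
```

With ISN = 0, a 5,000-byte file and MSS = 1,000 bytes, the segs get seq #s 0, 1000, 2000, 3000, 4000, and the receiver acks the first seg with ack # 1000. Because the arithmetic is mod 2^32, seq # 2^32 − 10 plus 20 bytes gives ack # 10.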

(55)

Telnet : Case Study of Seq and Ack Numbers

‰ each ch typed by A is echoed back by B and displayed on A’s screen

ACK piggybacked on B-to-A data seg

(56)

Estimating Round-Trip Time (RTT)

‰ clearly, TCP timeout value > RTT

Q : How much larger? How to estimate RTT? Should every seg be used in estimating RTT? …

‰ estimating RTT

z SampleRTT : time measured from seg xmission until ACK receipt

{ measured not for every xmitted seg, but for only one of the xmitted segs, approximately once every RTT

{ rexmitted segs are not considered in measurements

{ fluctuates from seg to seg, so a single sample may be atypical ⇒ some sort of avg needed

‰ Exponential Weighted Moving Average (EWMA) of RTT

EstimatedRTT = (1−α)⋅EstimatedRTT + α⋅SampleRTT

z recommended value of α : 0.125

(57)

RTT Samples and RTT Estimates

(58)

Retransmission Timeout Interval

‰ DevRTT, variation of RTT : an estimate of how much SampleRTT typically deviates from EstimatedRTT

DevRTT = (1-β)⋅DevRTT + β⋅|SampleRTT−EstimatedRTT|

z large (or small) when there is a lot of (or little) fluctuation

z recommended value of β : 0.25

‰ TCP’s timeout interval

z should be larger than EstimatedRTT, or unnecessary rexmissions occur!

z but if much larger, TCP wouldn’t quickly rexmit a lost seg, leading to large data transfer delay

z thus, timeout interval should be EstimatedRTT plus some safety margin, larger when DevRTT is large

TimeoutInterval = EstimatedRTT + 4⋅DevRTT
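Putting the three formulas together gives the following Java sketch of the estimator (hypothetical class; times in ms). Assumptions not on the slides: the first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2, and DevRTT is updated before EstimatedRTT, following RFC 2988; the lower bound real TCPs put on the timeout (1 s in RFC 2988) is omitted.

```java
// Sketch of TCP RTT estimation and timeout computation (all times in ms).
class RttEstimator {
    static final double ALPHA = 0.125, BETA = 0.25;
    double estimatedRtt, devRtt;
    boolean haveSample = false;

    void onSample(double sampleRtt) {
        if (!haveSample) {                 // first measurement initializes the estimates
            estimatedRtt = sampleRtt;
            devRtt = sampleRtt / 2;
            haveSample = true;
            return;
        }
        // DevRTT uses EstimatedRTT from before this sample (RFC 2988 ordering)
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        // EWMA of RTT
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;  // EstimatedRTT plus a safety margin
    }
}
```

Feeding samples of 100 ms and then 120 ms gives EstimatedRTT = 102.5 ms, DevRTT = 42.5 ms, and TimeoutInterval = 272.5 ms.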

(59)

TCP Reliable Data Transfer

‰ reliable data transfer service on top of IP’s unreliable service

z seq # : to identify lost and duplicate segs

z cumulative ack : positive ACKs only (i.e., NAK-free)

z timer

{ a single rexmission timer is recommended [RFC 2988], even if there are multiple xmitted but not yet acked segs

{ rexmissions triggered by

Š timeout events

Š 3 duplicate acks at sender : fast rexmit in certain versions

‰ We’ll discuss TCP rdt in two incremental steps

z highly simplified description : only timeouts considered

z more subtle description : duplicate acks as well as timeouts considered

(60)

Simplified TCP Sender

seq # is byte-stream # of the first data byte in seg

(61)

TCP Retransmission Scenarios

[Figure: TCP rexmission scenarios (SendBase = 100, 120) : rexmission due to a lost ack; segment 100 not rexmitted; cumulative ack avoids rexmission of first seg]

(62)

TCP Modifications : Doubling Timeout Interval

‰ at each timeout, TCP rexmits and sets the next timeout interval to twice the previous value

⇒ timeout intervals grow exponentially after each rexmission

‰ but for the other events (i.e., data received from app and ACK received), the timeout interval is derived from the most recent values of EstimatedRTT and DevRTT

(63)

TCP ACK Gen Recommendation [RFC 1122, 2581]

‰ timeout period can be relatively long ⇒ may increase e-t-e delay

‰ when sending a large # of segs back to back (such as a large file), if one seg is lost, there will likely be many back-to-back duplicate ACKs

(64)

TCP Modifications : TCP Fast Retransmit

‰ TCP Fast Retransmit : rexmits a (missing) seg before its timer expires, if TCP sender receives 3 duplicate ACKs for the same data

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acked segs)
            start timer
    }
    else {  // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with seq # y   // TCP fast retransmit
    }
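A Java rendering of the ACK handler above, as a sketch: the class is hypothetical, the timer is reduced to a flag, "resend" only records the seq # that would be rexmitted, byte #s are assumed not to wrap, and only ACKs equal to SendBase are counted as duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "ACK received" event handler with fast retransmit.
// sendBase: oldest unacked byte; nextSeqNum: next byte to be sent.
class FastRetransmitSender {
    long sendBase, nextSeqNum;
    boolean timerRunning;
    int dupAckCount = 0;                       // duplicate ACKs seen for sendBase
    final List<Long> resent = new ArrayList<>();

    FastRetransmitSender(long sendBase, long nextSeqNum) {
        this.sendBase = sendBase;
        this.nextSeqNum = nextSeqNum;
        this.timerRunning = nextSeqNum > sendBase;   // segs outstanding => timer on
    }

    void onAck(long y) {
        if (y > sendBase) {                      // ACK for new data
            sendBase = y;
            dupAckCount = 0;
            timerRunning = sendBase < nextSeqNum;   // (re)start only if segs still unacked
        } else if (y == sendBase) {              // duplicate ACK for already-ACKed data
            dupAckCount++;
            if (dupAckCount == 3) {              // TCP fast retransmit
                resent.add(y);                   // resend seg with seq # y now
            }
        }
        // y < sendBase: stale ACK, ignored in this sketch
    }
}
```

After an ACK for byte 200, three more ACKs carrying 200 trigger exactly one fast rexmission of the seg starting at byte 200; a fourth duplicate does not trigger another.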

(65)

Is TCP Go-Back-N or Selective Repeat?

‰ similarity of TCP with Go-Back-N

z TCP acks are cumulative : ack for the last correctly received, in-order byte; correctly received but out-of-order segs are not individually acked

z TCP sender need only maintain SendBase and NextSeqNum

‰ differences bw TCP and Go-Back-N : many TCP implementations

z buffer correctly received but out-of-order segs rather than discard them

z also, suppose a sequence of segs 1, 2, … N is received correctly and in order, ACK(n), n < N, gets lost, and the remaining N−1 acks arrive at sender before their respective timeouts

{ TCP rexmits at most one seg, i.e., seg n, instead of pkts n, n+1, …, N

{ TCP wouldn’t even rexmit seg n if ACK(n+1) arrived before the timeout for seg n

‰ a modification to TCP in [RFC 2018] : selective acknowledgement

z TCP receiver acks out-of-order segs selectively rather than just cumulatively

z when combined with selective rexmission (skipping segs selectively acked by receiver), TCP looks a lot like a generic SR protocol

(66)

Flow Control : Goal

‰ receiving app may not read data from the rcv buffer as quickly as it arrives

z it may be busy with some other task

z it may be relatively slow at reading data, so a sender sending too much data too quickly can overflow the receiver’s buffer

‰ flow control : a speed-matching service, matching the sending rate against the reading rate of the receiving app

z goal : eliminate possibility of sender overflowing receiver buffer

(note) to make the discussion simple, TCP receiver is assumed to discard out-of-order segs

(67)

Flow Control : How Does It Work?

‰ at receiver

z not to overflow : LastByteRcvd – LastByteRead ≤ RcvBuffer

{ LastByteRcvd – LastByteRead : # of bytes received but not yet read

z RcvWindow advertising : RcvWindow placed in the receive window field of every seg sent to sender

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

‰ at sender : limits # of unacked bytes to RcvWindow

z LastByteSent – LastByteAcked ≤ RcvWindow

{ LastByteSent – LastByteAcked : # of bytes sent but not yet acked

‰ notation

z RcvBuffer : size of buffer space allocated to a conn

z RcvWindow : amount of free buffer space in receiver’s buffer; initial value of RcvWindow = RcvBuffer

z LastByteRcvd, LastByteRead : variables at receiver

z LastByteSent, LastByteAcked : variables at sender
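The two inequalities translate directly into code. A minimal Java sketch using the slide's variable names (hypothetical class; byte counters are assumed not to wrap):

```java
// Sketch of TCP flow-control bookkeeping with the variable names used on the slide.
class FlowControl {

    // Receiver: free buffer space, advertised to the sender in every seg.
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        long unread = lastByteRcvd - lastByteRead;   // received but not yet read by app
        return rcvBuffer - unread;
    }

    // Sender: how many more bytes may be sent so that
    // LastByteSent - LastByteAcked <= RcvWindow still holds.
    static long usableWindow(long rcvWindow, long lastByteSent, long lastByteAcked) {
        long inFlight = lastByteSent - lastByteAcked;  // sent but not yet acked
        return Math.max(0, rcvWindow - inFlight);
    }
}
```

E.g., RcvBuffer = 4,096 bytes, LastByteRcvd = 3,000 and LastByteRead = 1,000 give RcvWindow = 2,096; a sender with 500 bytes in flight (LastByteSent = 3,500, LastByteAcked = 3,000) may then send 1,596 more bytes.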

(68)

Flow Control : Avoiding Sender Blocking

‰ suppose A is sending to B, B’s rcv buffer becomes full so that RcvWindow = 0, and after advertising RcvWindow = 0 to A, B has nothing to send to A

z note that TCP at B sends a seg only if it has data or an ack to send

{ there is no way for B to inform A that some space has opened up in B’s rcv buffer ⇒ A is blocked, and can’t xmit any more!

z way out : A continues to send segs with one data byte when RcvWindow = 0, which will be acked

{ eventually, the buffer will begin to empty and the acks will contain a nonzero RcvWindow value

(69)

TCP Connection Management : Establishment

‰ 3-way handshake

1. client sends SYN seg to server

{ contains no app data

{ randomly select client initial seq #

2. server replies with SYNACK seg

{ server allocates buffers and variables to the connection

{ contains no app data

{ randomly select server initial seq #

3. client replies with ACK seg

{ client allocates buffers and variables to the connection

{ may contain data

[Figure: 3-way handshake, showing the SYN segment and the SYNACK segment]
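The slide leaves out the seq/ack #s carried by the three segs. As a sketch: the SYN consumes one seq #, so the SYNACK acks client_isn + 1, and the final ACK acks server_isn + 1. In this Java sketch (hypothetical class), SecureRandom merely stands in for the OS's ISN generator.

```java
import java.security.SecureRandom;

// Sketch of the seq/ack #s carried by the three handshake segs.
// TCP seq #s are 32-bit, so "+ 1" is taken mod 2^32.
class Handshake {
    static final long MOD = 1L << 32;

    static String[] exchange(long clientIsn, long serverIsn) {
        long clientNext = (clientIsn + 1) % MOD;   // the SYN consumes one seq #
        long serverNext = (serverIsn + 1) % MOD;   // so does the SYNACK
        return new String[] {
            "SYN seq=" + clientIsn,
            "SYNACK seq=" + serverIsn + " ack=" + clientNext,
            "ACK seq=" + clientNext + " ack=" + serverNext
        };
    }

    // Random 32-bit ISN (stand-in for the OS's ISN generator)
    static long randomIsn(SecureRandom rng) {
        return rng.nextInt() & 0xFFFFFFFFL;
    }
}
```

exchange(42, 79) yields SYN seq=42, SYNACK seq=79 ack=43, and ACK seq=43 ack=80.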

(70)

TCP Connection Management : Termination

‰ either the client or the server can end the TCP connection

‰ duration of TIME_WAIT period : implementation dependent

z typically, 30 secs, 1 min, or 2 mins

‰ RST seg : seg with RST flag set to 1

z sent when receiving a TCP seg whose dst port # or src IP addr doesn’t match any ongoing conn

(71)

TCP State Transition : Client

(72)

TCP State Transition : Server

(73)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport: UDP

‰ Principles of Reliable Data Transfer

‰ Connection-Oriented Transport: TCP

‰ Principles of Congestion Control

z The Causes and the Costs of Congestion

z Approaches to Congestion Control

z Network-Assisted Congestion-Control Example for ATM ABR

(74)

Preliminary of Congestion Control

‰ pkt loss (at least as perceived by sender) results from overflowing router buffers as the net becomes congested

z rexmission treats a symptom, but not the cause, of net congestion

‰ cause of net congestion : too many sources attempting to send data at too high a rate

z basic idea of way out : throttle senders in the face of net congestion

z how is this different from flow control?

(75)

Causes and Costs of Congestion : Scenario 1

‰ assumptions

z no error control, flow control, and congestion control

z hosts A and B each send data at an avg rate of λin bytes/sec

z they share a router with outgoing link capacity of R and infinite buffer space

z ignore additional header info (transport-layer and lower-layer)

cost of congested net : avg delay grows unboundedly large as arrival rate nears link capacity

(76)

Causes and Costs of Congestion : Scenario 2 (1)

‰ assumptions

z one router with finite buffer space, so pkts can be dropped and rexmitted

(77)

Causes and Costs of Congestion : Scenario 2 (2)

‰ case a (unrealistic) : host A can somehow determine whether the router buffer is free, and sends a pkt only when the buffer is free

z no loss, thus no rexmission ⇒ λ’in = λin

‰ case b : sender rexmits only when a pkt is known for certain to be dropped

z e.g., with offered load λ’in = R/2 : R/3 is original data and R/6 is rexmitted data

z cost of congested net : sender must rexmit dropped pkts

‰ case c : premature timeout for each pkt ⇒ each pkt is sent twice (original + one unneeded rexmission)

z cost of congested net : unneeded rexmissions waste router link bw

(78)

Causes and Costs of Congestion : Scenario 3

‰ assumptions

z 4 routers, each with finite buffer space and link capacity of R

z each of 4 hosts has the same λin, and sends (with rexmission) over a 2-hop path

• consider the A→C conn

• a pkt dropped at R2 (due to high λin from B) wastes the work done by R1 to forward it

(79)

Two Broad Approaches to Congestion Control

‰ end-end congestion control

z no explicit support (by feedback) from net layer

z congestion inferred by end systems based on observed net behavior, e.g., pkt loss and delay

z approach taken by TCP

{ congestion is inferred from TCP seg loss, indicated by a timeout or triple duplicate acks

‰ network-assisted congestion control

z routers provide explicit feedback to end systems regarding congestion state in the net

z e.g., single-bit indication

{ SNA, DECnet, TCP/IP ECN [RFC 2481], ATM ABR congestion control

(80)

Two Types of Feedback of Congestion Info

‰ direct feedback : from a router to the sender, e.g., by using a choke pkt

‰ feedback via receiver

z router marks/updates a field in a pkt flowing forward to indicate congestion; receiver then notifies sender

(81)

ATM ABR Congestion Control

‰ Asynchronous Transfer Mode (ATM)

z a virtual-circuit switching architecture

z info delivered in fixed-size cells of 53 bytes

z each switch on src-to-dst path maintains per-VC state

‰ Available Bit Rate (ABR) : an elastic service

z if net underloaded, use as much bandwidth as available

z if net congested, sender rate is throttled to a predetermined min guaranteed rate

‰ Resource Management (RM) cells

z interspersed with data cells, conveying congestion-related info

{ rate of RM cell interspersion : tunable parameter

Š default value : one every 32 data cells

z provides both feedback-via-receiver and direct feedback

{ sent by src, flowing thru switches to dst, and back to src

(82)

Mechanisms of Congestion Indication in ATM ABR

‰ Explicit Forward Congestion Indication (EFCI) bit

z EFCI bit in a data cell is set to 1 at congested switch

z if a data cell preceding RM cell has EFCI set, dst sets CI bit of RM cell, and sends it back to src

(83)

Chap.3 Transport Layer

‰ Introduction and Transport-Layer Services

‰ Multiplexing and Demultiplexing

‰ Connectionless Transport : UDP

‰ Principles of Reliable Data Transfer

‰ Connection-Oriented Transport : TCP

‰ Principles of Congestion Control

‰ TCP Congestion Control

z Fairness

(84)

Preliminary of TCP Congestion Control (1)

‰ basic idea of TCP congestion control : limit sending rate based on the net congestion perceived by sender

z increase/reduce sending rate when sender perceives little/much congestion along the path bw itself and dst

‰ to keep the description concrete, sending a large file is assumed

‰ How does sender limit sending rate?

LastByteSent - LastByteAcked ≤ min{CongWin, RcvWindow}   (1)

z CongWin : a variable limiting sending rate due to perceived congestion

z henceforth, RcvWindow constraint ignored in order to focus on congestion control

(85)

Preliminary of TCP Congestion Control (2)

‰ How does sender perceive congestion on path bw itself and dst?

z a timeout or the receipt of three duplicate ACKs

‰ TCP is self-clocking : acks are used to trigger increases in its cong window size, thus in its sending rate

z consider the optimistic case of no congestion, in which acks are taken as an indication that segs are successfully delivered to dst

z if acks arrive at a slow/high rate, cong window is increased more slowly/quickly

‰ How to regulate sending rate as a function of perceived congestion?

z TCP congestion control algorithm, consisting of 3 components

{ additive-increase, multiplicative-decrease (AIMD)

Š AIMD is a big-picture description; details are more complicated

{ slow start

{ reaction to timeout events

(86)

Additive-Increase, Multiplicative-Decrease

‰ multiplicative decrease : cut CongWin in half (but not below 1 MSS) when detecting a loss

‰ additive increase : increase CongWin by 1 MSS every RTT until a loss is detected (i.e., while the e-t-e path is perceived to be congestion-free)

z commonly accomplished by increasing CongWin by MSS⋅(MSS/CongWin) bytes for each receipt of a new ack

ex) MSS = 1,460 bytes, CongWin = 14,600 bytes ⇒ 10 segs sent within an RTT

Š an ACK for a seg increases CongWin by 1/10⋅MSS, thus after acks for all 10 segs (i.e., after one RTT) CongWin is increased by about 1 MSS

z congestion avoidance (CA) : linear-increase phase of TCP cong control

z AIMD yields a saw-toothed pattern of CongWin
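The per-ACK update can be checked against the example above with a short Java sketch. The class is hypothetical, and CongWin is kept in bytes as a double so fractional increments are not lost, whereas real stacks use integer byte counts.

```java
// Sketch of the AIMD window updates, CongWin and MSS in bytes.
class AdditiveIncrease {

    // Congestion avoidance: CongWin += MSS * (MSS / CongWin) on each new ACK.
    static double onNewAck(double congWin, double mss) {
        return congWin + mss * (mss / congWin);
    }

    // Multiplicative decrease on a loss: halve CongWin, but never below 1 MSS.
    static double onLoss(double congWin, double mss) {
        return Math.max(congWin / 2, mss);
    }

    // Apply `acks` consecutive new ACKs and return the resulting CongWin.
    static double afterAcks(double congWin, double mss, int acks) {
        for (int i = 0; i < acks; i++) {
            congWin = onNewAck(congWin, mss);
        }
        return congWin;
    }
}
```

afterAcks(14600, 1460, 10) returns about 16,000 bytes, an increase of roughly 1,400 bytes rather than exactly 1 MSS, because CongWin grows slightly during the RTT and each later ACK adds a little less.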

(87)

TCP Slow Start

‰ when a TCP conn begins, CongWin is typically initialized to 1 MSS ⇒ initial rate ≈ MSS/RTT

ex) MSS = 500 bytes, RTT = 200 msec ⇒ initial sending rate : only about 20 kbps (500 × 8 bits / 0.2 sec)

z linear increase in the initial phase would waste bw, considering the available bw may be >> MSS/RTT

z desirable to quickly ramp up to some respectable rate

‰ slow start (SS) : during initial phase, increase sending rate exponentially fast by doubling CongWin every RTT until a loss occurs

z achieved by increasing CongWin by 1 MSS each time a sent seg is first acked

(88)

Reaction to Congestion

Q: When does CongWin switch from exponential increase to linear increase?

A : when CongWin reaches Threshold

z Threshold : a variable set to half of CongWin just before a loss

{ initially set large, typically 65 Kbytes, so that it has no initial effect

{ maintained until the next loss

‰ TCP Tahoe, an early version of TCP

z CongWin is cut to 1 MSS both for a timeout and for 3 duplicate acks

{ Jacobson’s algorithm [Jacobson 1988]

‰ TCP Reno [RFC2581, Stevens ’94] : reaction to loss depends on loss type

z for receipt of 3 duplicate acks : CongWin is cut in half, then grows linearly

z for a timeout event : CongWin is set to 1 MSS (SS phase), then grows exponentially up to Threshold, then grows linearly (CA phase)
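A compact Java sketch of the Tahoe/Reno reactions (hypothetical class; CongWin and Threshold in MSS units). It simplifies real Reno: fast recovery details and the lower bound real stacks apply to Threshold are left out.

```java
// Sketch of TCP Tahoe/Reno window reactions, in MSS units.
// Slow start (CongWin < Threshold): +1 MSS per new ACK, i.e., doubling every RTT.
// Congestion avoidance: +1/CongWin MSS per new ACK, i.e., about +1 MSS per RTT.
class TcpCongestionSketch {
    final boolean reno;        // false = Tahoe
    double congWin = 1;        // CongWin starts at 1 MSS
    double threshold;          // Threshold, in MSS

    TcpCongestionSketch(boolean reno, double initialThreshold) {
        this.reno = reno;
        this.threshold = initialThreshold;
    }

    void onNewAck() {
        if (congWin < threshold) {
            congWin += 1;                 // slow start
        } else {
            congWin += 1 / congWin;       // congestion avoidance
        }
    }

    void onTripleDupAck() {
        threshold = congWin / 2;          // half of CongWin just before the loss
        congWin = reno ? threshold : 1;   // Reno: halve and grow linearly; Tahoe: 1 MSS
    }

    void onTimeout() {
        threshold = congWin / 2;
        congWin = 1;                      // both: back to 1 MSS, slow start again
    }
}
```

With an initial Threshold of 8 MSS (the value used in the figure), 7 ACKs take CongWin from 1 to 8 MSS in slow start; the next ACK adds only 1/8 MSS. A triple duplicate ACK then leaves Reno at half the window but sends Tahoe back to 1 MSS.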

(89)

TCP Congestion Control Algorithms

• initial value of Threshold = 8 MSS

(90)

TCP Reno Congestion Control Algorithm

(91)

Steady-State Behavior of a TCP Connection

‰ Consider a highly simplified macroscopic model for the steady-state behavior of TCP

z SS phases are ignored since they are typically very short

z letting W be the window size when a loss event occurs, RTT and W are assumed to be approximately constant during a conn

Q : What’s avg throughput of a long-lived TCP conn as a function of window size and RTT?

A :

z a pkt is dropped when the rate increases to W/RTT

z then the rate is cut in half and increases linearly by MSS/RTT every RTT until it again reaches W/RTT

z this process repeats over and over again

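z averaging the rate over this linear ramp from W/(2⋅RTT) to W/RTT gives

$$\text{avg throughput of a TCP connection} = \frac{0.75 \cdot W}{RTT} \qquad (2)$$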
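The 0.75 factor in (2) can be checked numerically: averaging a window that climbs by 1 MSS per RTT from W/2 to W gives 0.75⋅W. The sketch assumes an even W measured in MSS (`avg_sawtooth_window` is a hypothetical name); multiplying the result by MSS/RTT would turn it into a rate.

```python
def avg_sawtooth_window(w: int) -> float:
    """Average window (MSS) over one cycle: W/2, W/2 + 1, ..., W, one value per RTT."""
    windows = range(w // 2, w + 1)    # window sizes seen in successive RTTs
    return sum(windows) / len(windows)


w = 100
print(avg_sawtooth_window(w) / w)     # ratio of average to peak window
```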

(92)

TCP Futures

‰ TCP congestion control has evolved over the years and continues to evolve

z [RFC 2581] : a summary as of the late 1990s

z [Floyd 2001] : some recent developments

z the traditional scheme is not necessarily good for today’s HTTP-dominated Internet or for future Internet services

ex) Consider a high-speed TCP conn with 1500-byte segments and a 100 ms RTT, over which we want to achieve 10 Gbps throughput

z to meet this, from (2) the required window size is

$$W = \frac{\text{tput} \cdot RTT}{0.75 \cdot S} = \frac{10^{10}\ \text{bits/sec} \cdot 0.1\ \text{sec}}{0.75 \cdot 1{,}500 \cdot 8\ \text{bits/seg}} = \frac{10^{7}}{90} \approx 111{,}111\ \text{segs}$$

z this is a lot of segs, so there is a high possibility of errors, which leads us to derive a relationship bw throughput and error rate [prob. P39]
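The window arithmetic can be reproduced by inverting (2), assuming the segment size is converted to bits before dividing; `required_window_segs` is a hypothetical helper.

```python
def required_window_segs(tput_bps: float, rtt_s: float, seg_bytes: int) -> float:
    """Invert (2): tput = 0.75 * W / RTT  =>  W = tput * RTT / 0.75, in segments."""
    seg_bits = seg_bytes * 8
    return tput_bps * rtt_s / (0.75 * seg_bits)


w = required_window_segs(10e9, 0.1, 1500)   # 10 Gbps, 100 ms RTT, 1500-byte segs
print(round(w))
```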

(93)

TCP Fairness (1)

‰ suppose K TCP conns pass through a bottleneck link of rate R, with each conn sending a large file

⇒ avg xmission rate of each conn is approximately R/K

‰ TCP congestion control is fair : each of the competing TCP conns gets an equal share of the bottleneck link’s bw

‰ consider a link of rate R shared by two TCP conns, with idealized assumptions

z same MSS and RTT, sending a large amount of data, operating in CA

(94)

TCP Fairness (2)

‰ the bw realized by the two conns eventually fluctuates along the equal bw share line, regardless of their initial rates

‰ in practice, the RTT value differs from conn to conn

z conns with a smaller RTT grab the available bw more quickly (i.e., open their cong window faster), thus getting higher throughput than conns with larger RTTs

(figure: throughputs of the two conns, with the equal bw share line, the ideal operating point, and points where loss occurs)
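The convergence toward the equal bw share line can be illustrated with a toy AIMD model. It assumes both conns have the same RTT, rates are counted in MSS per RTT, and both see a loss (and halve) whenever their combined rate exceeds R. Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates approach each other. `aimd_two_flows` is hypothetical, and real losses are not perfectly synchronized like this.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Two equal-RTT flows sharing a link; rates in MSS per RTT.

    Each RTT both flows add 1 (additive increase); if their sum exceeds the
    link capacity, both see a loss and halve (multiplicative decrease).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # gap between the rates is halved
        else:
            x1, x2 = x1 + 1, x2 + 1    # gap between the rates is unchanged
    return x1, x2


a, b = aimd_two_flows(10, 70, capacity=100, rounds=1000)
print(f"{a:.2f} {b:.2f}")
```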

(95)

Some other Fairness Issues

‰ Fairness and UDP

z multimedia apps, e.g., Internet phone and video conferencing, do not want their rate throttled even if the net is congested

z thus they run over UDP rather than TCP, pumping audio/video at a const rate and occasionally losing pkts rather than reducing their rate when congested ⇒ UDP sources may crowd out TCP traffic

z research issue : TCP-friendly cong control

{ goal : make UDP traffic behave fairly, thus preventing the Internet from being flooded

‰ Fairness and parallel TCP connections

z a session can open multiple parallel TCP conns bw C/S, thus getting a large portion of the bw of a congested link

{ e.g., a Web browser opening parallel conns to xfer multiple objects in a page

ex) a link of rate R supporting 9 ongoing C/S apps, each using 1 TCP conn

{ a new app asking for 1 TCP conn gets an equal share of R/10

{ a new app asking for 11 TCP conns gets an unfair rate of 11R/20, i.e., more than R/2
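The share arithmetic in this example can be checked exactly, assuming every TCP conn on the link gets an equal share and the 9 existing apps use 1 conn each (`new_app_share` is a hypothetical helper):

```python
from fractions import Fraction


def new_app_share(new_conns: int, existing_conns: int = 9) -> Fraction:
    """Fraction of R obtained by the new app if every TCP conn gets an equal share."""
    return Fraction(new_conns, new_conns + existing_conns)


print(new_app_share(1), new_app_share(11))
```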

(96)

TCP Delay Modeling

‰ We compute the time for TCP to send an object under some simple models

z latency : the time from when a client initiates a TCP conn until the time at which it receives the requested object

‰ assumptions : made in order not to obscure the central issues

z simple one-link net of rate R bps

z the amount of data the sender can xmit is limited solely by the cong window

z pkts are neither lost nor corrupted, thus no rexmission

z all protocol header overheads : ignored

z object consists of an integer # of MSS

{ O : object size [bits], S : seg size [bits] (e.g., 536 bytes)

z xmission times for segs carrying only control info (acks, conn-setup segs) : ignored

z the initial threshold of the TCP cong control scheme is so large that it is never reached by the cong window

(97)

Static Congestion Window (1)

‰ W : a positive integer denoting a fixed-size static congestion window

z upon receipt of the rqst, server immediately sends W segs back to back to client, then one seg for each ack from client

‰ 1st case : WS/R > RTT+S/R

z ack for 1st seg in 1st window is received before the server finishes sending the 1st window’s worth of segs

z server xmits segs continuously until the entire object is xmitted

z thus, the latency is 2⋅RTT + O/R

(98)

Static Congestion Window (2)

‰ 2nd case : WS/R < RTT+S/R

z ack for 1st seg in 1st window is received after the server has sent the 1st window’s worth of segs

‰ latency = setup time + time for xmitting object + sum of times in idle state

z let K : # of windows covering object

K = O/(WS), rounded up to ⌈O/(WS)⌉ if it is not an integer

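z # of times the server is in idle state = K-1

z duration of each idle state = S/R + RTT - WS/R

z thus, the latency is

$$\text{latency} = 2 \cdot RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right]$$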
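A sketch of the static-window model, assuming the idealized conditions listed under TCP Delay Modeling; both helper names are hypothetical. The closed form, 2⋅RTT + O/R + (K-1)[S/R + RTT - WS/R] with the idle term clamped at zero, folds the 1st case (no idle periods, latency 2⋅RTT + O/R) and the 2nd case into one expression. As a cross-check, the segment-level simulation lets the server start seg j only once the ack for seg j-W is back, with 1.5 RTT of setup and request before the first seg and 0.5 RTT for the last bit to reach the client.

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """2*RTT + O/R + (K-1) * max(0, S/R + RTT - W*S/R), with K = ceil(O / (W*S))."""
    K = math.ceil(O / (W * S))                  # windows covering the object
    idle = max(0.0, S / R + RTT - W * S / R)    # idle time after each full window
    return 2 * RTT + O / R + (K - 1) * idle


def simulate_static_window(O, S, R, RTT, W):
    """Segment-level cross-check of the same idealized model."""
    n = O // S                                  # object = integer # of segments
    send = []                                   # start time of seg j, from rqst arrival
    link_free = 0.0                             # when the server's link is next free
    for j in range(n):
        # seg j may go once the ack for seg j-W is back (first W segs go at once)
        ack_back = 0.0 if j < W else send[j - W] + S / R + RTT
        start = max(link_free, ack_back)
        send.append(start)
        link_free = start + S / R
    # 1.5 RTT (conn setup + rqst) before sending starts, 0.5 RTT for the last bit
    return 1.5 * RTT + link_free + 0.5 * RTT


for W in (2, 20):                               # toy numbers: S/R = 1 sec, RTT = 10 sec
    print(W, static_window_latency(40, 8, 8, 10, W), simulate_static_window(40, 8, 8, 10, W))
```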

(99)

Dynamic Congestion Window (1)

‰ cong window grows according to slow start, i.e., doubles every RTT

z O/S : # of segs in the object

z # of segs in kth window : $2^{k-1}$

z K : # of windows covering object

$$K = \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right)\right\rceil$$

z xmission time of kth window = $(S/R)\,2^{k-1}$

z duration in idle state after kth window = $\left[S/R + RTT - 2^{k-1} S/R\right]^{+}$, where $[x]^{+} = \max(x, 0)$

(figure example: O/S = 15, K = 4, Q = 2, P = min{Q, K-1} = 2)

(100)

Dynamic Congestion Window (2)

‰ latency = setup time + time for xmitting object + Σ times in idle state

$$\text{latency} = 2 \cdot RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+} \qquad (3)$$

z Q : # of times the server would be idle if the object were of infinite size

$$Q = \max\left\{k : \frac{S}{R} + RTT - 2^{k-1}\frac{S}{R} \ge 0\right\} = \max\left\{k : 2^{k-1} \le 1 + \frac{RTT}{S/R}\right\} = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right)\right\rfloor + 1$$

‰ the actual # of times the server is idle is P = min{Q, K-1}, so (3) becomes

$$\text{latency} = 2 \cdot RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R} \qquad (4)$$
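Equations (3) and (4) can be cross-checked in code, assuming O and S are in bits with O an integer multiple of S; `slow_start_latency` is a hypothetical helper. K and Q are found by integer search straight from their min/max definitions, which avoids floating-point rounding in the log2 closed forms. With O/S = 15, S/R = 1 and RTT = 2 (values chosen to match the figure example), it gives K = 4, Q = 2, P = 2.

```python
def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, latency via (3), latency via (4)) under slow start."""
    s_r = S / R                                   # xmission time of one segment
    K = 1
    while 2 ** K - 1 < O / S:                     # K = min{k : 2^k - 1 >= O/S}
        K += 1
    Q = 1
    while 2 ** Q <= 1 + RTT / s_r:                # Q = max{k : 2^(k-1) <= 1 + RTT/(S/R)}
        Q += 1
    P = min(Q, K - 1)                             # actual number of idle periods
    # (3): sum the positive parts of the idle times after windows 1..K-1
    idle = sum(max(0.0, s_r + RTT - 2 ** (k - 1) * s_r) for k in range(1, K))
    eq3 = 2 * RTT + O / R + idle
    # (4): closed form in terms of P
    eq4 = 2 * RTT + O / R + P * (RTT + s_r) - (2 ** P - 1) * s_r
    return K, Q, P, eq3, eq4


print(slow_start_latency(120, 8, 8, 2))           # O/S = 15, S/R = 1, RTT = 2
```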

(101)

Dynamic Congestion Window (3)

‰ comparing TCP latency of (4) with the minimal latency 2⋅RTT + O/R

$$\frac{\text{latency}}{\text{minimal latency}} = 1 + \frac{P\left[RTT + S/R\right] - \left(2^{P} - 1\right)S/R}{O/R + 2 \cdot RTT} \le 1 + \frac{P \cdot RTT}{O/R + 2 \cdot RTT}$$

z slow start significantly increases latency when O/R is small relative to RTT, i.e., when the object size is relatively small (or the xmission rate is high) and RTT is relatively large

{ this is often the case with the Web

‰ See the examples in the text
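To see the claim numerically, the sketch below computes latency/minimal latency from (4) together with the upper bound P⋅RTT/(O/R + 2⋅RTT). It assumes a hypothetical 1 Mbps link, RTT = 100 ms and 536-byte segments; these values are illustrative and are not the examples in the text. `latency_ratio` is a hypothetical helper.

```python
def latency_ratio(O, S, R, RTT):
    """Return (latency (4) / minimal latency, upper bound on that ratio)."""
    s_r = S / R
    K = 1
    while 2 ** K - 1 < O / S:          # windows covering the object
        K += 1
    Q = 1
    while 2 ** Q <= 1 + RTT / s_r:     # idle periods for an infinite object
        Q += 1
    P = min(Q, K - 1)
    minimal = 2 * RTT + O / R
    latency = minimal + P * (RTT + s_r) - (2 ** P - 1) * s_r
    bound = 1 + P * RTT / minimal
    return latency / minimal, bound


S, R, RTT = 536 * 8, 1e6, 0.1          # hypothetical seg size [bits], rate [bps], RTT [s]
for segs in (5, 1000):                 # small vs large object, in segments
    ratio, bound = latency_ratio(segs * S, S, R, RTT)
    print(f"O/S = {segs:4d}: latency/minimal = {ratio:.3f} (bound {bound:.3f})")
```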

(102)

HTTP Modeling

Assume a Web page consists of

z 1 base HTML page (of size O bits)

z M images (each of size O bits)

‰ non-persistent HTTP

z M+1 TCP conns in series

z response time = 2⋅(M+1)RTT + (M+1)O/R + sum of idle times

‰ persistent HTTP

z 2 RTT to request and receive base HTML file

z 1 RTT to request and receive M images

z response time = 3⋅RTT + (M+1)O/R + sum of idle times
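A sketch comparing the two response-time formulas, assuming the "sum of idle times" term is zero in both cases (e.g., a window large enough that the server never stalls); `http_response_times` and the sample values are hypothetical. Subtracting the formulas shows that, under this assumption, persistent HTTP saves (2M-1)⋅RTT.

```python
def http_response_times(M, O, R, RTT):
    """(non-persistent, persistent) response times in sec, idle times taken as zero."""
    non_persistent = 2 * (M + 1) * RTT + (M + 1) * O / R   # M+1 conns in series
    persistent = 3 * RTT + (M + 1) * O / R                 # 2 RTT base + 1 RTT images
    return non_persistent, persistent


np_time, p_time = http_response_times(M=10, O=80_000, R=1e6, RTT=0.1)
print(f"non-persistent: {np_time:.2f} s, persistent: {p_time:.2f} s")
```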
