TCP, Active Queue Management and QoS
Don Towsley
UMass Amherst
Collaborators: W. Gong, C. Hollot, V. Misra
Outline
• motivation
• TCP friendliness/fairness
• bottleneck invariant principle
• active queue management (AQM) & RED
• QoS
Properties of TCP
• 90% of Internet traffic
– primary deliverer of multimedia (e.g., Napster)
• conservative end-end congestion control (CC)
– additive increase multiplicative decrease CC
– equal bandwidth share
• only end-end protocol w. congestion control
[Figure: four TCP flows (1-4) share a 20 Mb/s link; B1 = B2 = B3 = B4 = 5 Mb/s]
[Figure: the same four TCP flows plus a 12 Mb/s UDP flow on the 20 Mb/s link; UDP takes 12 Mb/s and B1 = B2 = B3 = B4 = 2 Mb/s]
Additive-Increase Multiplicative-Decrease (AIMD) Congestion Control
$r_i$ - rate after $i$-th feedback
$r_{i+1} = r_i + c$ if the $i$-th feedback signals no congestion
$r_{i+1} = a \times r_i$ if the $i$-th feedback indicates congestion, $a < 1$
• similar algorithms for window-based CC
• basic building block of most congestion control
Example (Chiu, Jain 89)
• two sources, rates $r_i^1$, $r_i^2$
• initial rates $r^1$ and $r^2$
• bandwidth $C$
• as time goes on, $i$ increases, source rates converge to a fair share
[Figure: plot of ($r_i^1$, $r_i^2$) starting from ($r^1$, $r^2$); both axes run up to C]
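A minimal sketch of the two-source example, assuming both sources receive the same synchronous feedback (congestion whenever the summed rate exceeds C). The function name, the starting rates, and the values c = 1 and a = 0.5 are illustrative choices, not taken from the slides.

```python
def aimd_two_sources(r1, r2, capacity, c=1.0, a=0.5, steps=200):
    """Iterate synchronous AIMD for two sources sharing one link.

    Assumption: both sources see congestion exactly when r1 + r2 > capacity.
    Returns the list of (r1, r2) pairs visited.
    """
    path = [(r1, r2)]
    for _ in range(steps):
        if r1 + r2 > capacity:
            # congestion: multiplicative decrease halves the gap r1 - r2
            r1, r2 = a * r1, a * r2
        else:
            # no congestion: additive increase leaves the gap unchanged
            r1, r2 = r1 + c, r2 + c
        path.append((r1, r2))
    return path


path = aimd_two_sources(r1=1.0, r2=15.0, capacity=20.0)
start_gap = abs(path[0][0] - path[0][1])
end_gap = abs(path[-1][0] - path[-1][1])
print(f"rate gap: start {start_gap:.2f}, end {end_gap:.2e}")
```

Additive steps keep the difference between the two rates fixed and every multiplicative step shrinks it by the factor a, which is why the pair drifts toward equal rates.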
Generic TCP Behavior
• window algorithm (window W )
– up to W packets can be in network
– return of ACK allows sender to send another packet
– ACKS cumulative
• increase window by one per RTT: $W \leftarrow W + 1/W$ per ACK ⇒ $W \leftarrow W + 1$ per RTT
[Figure: sender and receiver]
Generic TCP Behavior
• window algorithm (window W)
• increase window by one per RTT: $W \leftarrow W + 1/W$ per ACK
• decrease window by half on detection of loss (triple duplicate ACK), $W \leftarrow W/2$
[Figure: sender and receiver]
Generic TCP Behavior
• window algorithm (window W)
• increase window by one per RTT (or one over window per ACK, $W \leftarrow W + 1/W$)
• halve window on detection of loss, $W \leftarrow W/2$
• timeouts due to lack of ACKs, $W \leftarrow 1$
• successive timeout intervals grow exponentially long
• slow start mechanism
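The window rules can be sketched as a per-RTT simulation. This is a simplification under stated assumptions: loss and timeout are drawn once per RTT with hypothetical probabilities `p_loss` and `p_timeout` (the slides use a per-packet loss probability), and slow start and exponential timeout backoff are left out.

```python
import random


def tcp_window_trace(p_loss, p_timeout=0.0, rtts=1000, seed=0):
    """Evolve the congestion window W once per RTT.

    Assumption: at most one event per RTT. A timeout resets W to 1,
    a triple-duplicate-ACK loss halves W, otherwise W grows by one.
    """
    rng = random.Random(seed)
    w = 1.0
    trace = []
    for _ in range(rtts):
        u = rng.random()
        if u < p_timeout:
            w = 1.0                   # timeout: W <- 1
        elif u < p_timeout + p_loss:
            w = max(1.0, w / 2.0)     # loss: W <- W/2, never below one packet
        else:
            w += 1.0                  # W <- W + 1 per RTT (1/W per ACK)
        trace.append(w)
    return trace


for p in (0.01, 0.05, 0.2):
    trace = tcp_window_trace(p_loss=p)
    print(f"p_loss={p}: mean window {sum(trace) / len(trace):.1f}")
```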
IETF and TCP fairness
IETF mandated (1997): “thou must be TCP fair”
original definition
$B \propto MTU / (R \sqrt{p})$
$p$ - loss probability;
$R$ - round trip time;
$MTU$ - pkt length;
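Derivation of $p^{-1/2}$ expression
• equilibrium analysis
• focus on avg window size $W$
• drift down = drift up: $p \cdot W/2 = (1-p) \times 1/W$
• $W = \sqrt{2(1-p)/p} \approx \sqrt{2/p}$ for $p$ small
• $B = \sqrt{2} / (R \sqrt{p})$
[Figure: $W(t)$ vs. $t$, with round trip time $R$ marked]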
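A short numeric check of the derivation, assuming throughput is measured in packets per second (B = W/R). The function names and the sample values p = 0.01 and R = 0.1 s are illustrative.

```python
import math


def equilibrium_window(p):
    """Window W solving the drift balance p * W/2 = (1 - p) / W."""
    return math.sqrt(2.0 * (1.0 - p) / p)


def sqrt_throughput(p, rtt):
    """Small-p approximation B = sqrt(2) / (R * sqrt(p)), in packets/s."""
    return math.sqrt(2.0) / (rtt * math.sqrt(p))


p, rtt = 0.01, 0.1                     # illustrative values
w = equilibrium_window(p)
drift_down = p * w / 2.0               # expected decrease per packet
drift_up = (1.0 - p) / w               # expected increase per packet
print(f"W = {w:.2f}, drift down = {drift_down:.4f}, drift up = {drift_up:.4f}")
print(f"B = {sqrt_throughput(p, rtt):.1f} packets/s")
```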
How well does this work?
• unidirectional bulk transfer from UMass to INRIA
• 100 sec. samples
• significant number of timeouts in most traces
• $p^{1/2}$ formula inaccurate (off by 1-2 orders of magnitude for large $p$)
[Figure: throughput vs. loss probability, log-log axes; + measurements against the Floyd $p^{1/2}$ formula]
Including timeouts:
• renewal theoretic approximation
$B(p,R) \propto \left[ R \sqrt{4p/3} + T_0 \cdot 3 \sqrt{3p/4} \; p \, (1 + 32p^2) \right]^{-1}$
$T_0$ - timeout length
Basis of revised definition of
TCP friendliness
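A sketch that evaluates the timeout-aware expression exactly as written on the slide and compares it with the square-root formula. The function names and the sample values R = 0.2 s and T0 = 2 s are assumptions made for illustration; the proportionality constant is taken as 1.

```python
import math


def pftk_throughput(p, rtt, t0):
    """Throughput (packets/s) from the slide's expression:

    B ~ 1 / (R*sqrt(4p/3) + T0 * 3*sqrt(3p/4) * p * (1 + 32 p^2)).
    """
    ack_term = rtt * math.sqrt(4.0 * p / 3.0)
    timeout_term = t0 * 3.0 * math.sqrt(3.0 * p / 4.0) * p * (1.0 + 32.0 * p ** 2)
    return 1.0 / (ack_term + timeout_term)


def sqrt_throughput(p, rtt):
    """Square-root formula, B = sqrt(2) / (R * sqrt(p))."""
    return math.sqrt(2.0) / (rtt * math.sqrt(p))


rtt, t0 = 0.2, 2.0                     # illustrative values, not from the traces
for p in (0.001, 0.01, 0.1, 0.3):
    ratio = sqrt_throughput(p, rtt) / pftk_throughput(p, rtt, t0)
    print(f"p={p}: sqrt formula / timeout model = {ratio:.1f}")
```

The ratio grows with p because the timeout term dominates at high loss, which is the regime where the square-root formula overestimates throughput.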
Validation
Experiments:
– 38 traces from 18 hosts
– unidirectional bulk transfers
– 100 sec measurements
Conclusions:
– good validation
– other studies support model
– insensitive to TCP version
[Figure: UMass - INRIA; packets/s vs. loss probability (p), log-log axes; F measured, full (PFTK) model, Floyd/Ott $p^{1/2}$ model]
Lessons
• TCP exhibits well defined bandwidth curve
– decreasing function of R and p
• timeouts important
• little difference between TCP versions
– AIMD, timeouts
– Vegas an exception
Bottleneck invariance principle
• bottleneck router
– where loss occurs
– high load, util. ≈ 1
• bottleneck invariance principle (BIP)
$\sum_i B_i(R_i, p) = C$
$C$ - router bandwidth
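One way to use the principle is to solve it numerically for the loss probability. The sketch below assumes every flow follows the square-root formula, which is only one possible choice of B_i; the function names, the bisection bounds, and the example round trip times and capacity are hypothetical.

```python
import math


def sqrt_throughput(p, rtt):
    """Assumed per-flow model: B = sqrt(2) / (R * sqrt(p)), packets/s."""
    return math.sqrt(2.0) / (rtt * math.sqrt(p))


def bip_loss(rtts, capacity, lo=1e-9, hi=1.0, iters=100):
    """Bisection for the p at which sum_i B_i(R_i, p) equals capacity."""
    def excess(p):
        return sum(sqrt_throughput(p, r) for r in rtts) - capacity

    if excess(hi) > 0:
        raise ValueError("demand exceeds capacity even at p = 1")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)       # geometric midpoint: p spans decades
        if excess(mid) > 0:
            lo = mid                   # flows send too much: p must be larger
        else:
            hi = mid
    return hi


rtts = [0.05, 0.1, 0.2]                # seconds, illustrative
p_star = bip_loss(rtts, capacity=1000.0)
shares = [sqrt_throughput(p_star, r) for r in rtts]
print(f"p = {p_star:.5f}, per-flow rates = {[round(s, 1) for s in shares]}")
```

All flows see the same p at the bottleneck, so under this model the flow with the shortest round trip time takes the largest share.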
Applications of BIP
• accurate models of networks supporting infinite/finite duration TCP flows
– thruput, loss rate, avg. queue length, …
• provides simple checks of protocol design
– new improved congestion control algorithms
– forward error correction
– active queue management – RED
New and improved TCP
• new/improved, $B_{UMass}(p)$
• TCP, $B_{TCP}(p)$
[Figure: thruput vs. $p$; curves $B_{UMass}(p)$ and $B_{TCP}(p)$]
Sharing bottleneck with TCP
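[Figure: $N_{TCP}$ TCP flows and $N_{UM}$ UMass flows share a bottleneck of bandwidth $C$ with loss probability $p$]
$N_{UM} B_{UM}(p) + N_{TCP} B_{TCP}(p) = C$
⇒ $B_{UM}(p) > B_{TCP}(p)$
• a win!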
Replacing TCP with UMass
[Figure: $N$ flows share a bottleneck of bandwidth $C$]
$N B_{UM}(p_{UM}) = C$ vs $N B_{TCP}(p_{TCP}) = C$
⇒ $p_{UM} > p_{TCP}$
• a loss!
• SACK worse than Reno?
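The win/loss argument can be checked with numbers. This sketch assumes TCP follows the square-root formula and models the improved protocol as a hypothetical `gain` times the TCP rate at the same p; the factor 1.5, the flow counts, and the link parameters are invented for illustration.

```python
import math


def tcp_rate(p, rtt):
    """Assumed TCP model: B_TCP = sqrt(2) / (R * sqrt(p)), packets/s."""
    return math.sqrt(2.0) / (rtt * math.sqrt(p))


def shared_loss(n_um, n_tcp, capacity, rtt, gain):
    """Closed-form p for n_um * gain * B_TCP(p) + n_tcp * B_TCP(p) = capacity."""
    return 2.0 * ((gain * n_um + n_tcp) / (rtt * capacity)) ** 2


capacity, rtt, gain = 1000.0, 0.1, 1.5   # illustrative values

# Sharing: 5 improved flows next to 5 TCP flows see the same p.
p_mix = shared_loss(5, 5, capacity, rtt, gain)
print(f"mixed: TCP {tcp_rate(p_mix, rtt):.0f}, improved {gain * tcp_rate(p_mix, rtt):.0f} pkts/s")

# Replacing: 10 flows of one kind get C/N either way, but p differs.
p_tcp = shared_loss(0, 10, capacity, rtt, gain)
p_um = shared_loss(10, 0, capacity, rtt, gain)
print(f"all TCP: p = {p_tcp:.3f}; all improved: p = {p_um:.3f}")
```

Once every flow uses the new protocol, each still gets C/N, and the only change is the higher loss probability needed to hold the flows to that rate.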
SACK vs Reno
• SACK ACK includes bit vector of status of most recent group of packets
1. one window reduction max per RTT
2. reduces timeout rate
⇒ $p_{SACK} > p_{Reno}$ at bottleneck (difference slight)
3. increases retransmission efficiency
⇒ reduces duplicate packets
• benefits of 3. outweigh 1. + 2.
Use of FEC
• 10% pkt loss ⇒ use packet level FEC!
[Figure: TCP source → encoder → link $C$ (10% loss) → decoder → TCP rcvr, with ACKs returning to the source; labels $p$ and $p_{FEC}$]
• $p_{FEC} \ll p$
Use of FEC
• available bandwidth, $C_{FEC} < C_{wo}$
• $B_{FEC}(\cdot) = B_{wo}(\cdot) = B_{TCP}(\cdot)$
• $N B_{wo}(p_{wo}) = C_{wo}$
• $N B_{FEC}(p_{FEC}) = C_{FEC}$
⇒ $p_{FEC} > p_{wo}$
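The size of the effect can be worked out if one assumes, for illustration only, that $B_{TCP}$ is the square-root formula and that all $N$ flows have the same round trip time $R$:

```latex
\begin{align*}
N \frac{\sqrt{2}}{R\sqrt{p}} = C
  \;&\Longrightarrow\; p = \frac{2N^2}{R^2 C^2}, \\
\frac{p_{FEC}}{p_{wo}} &= \left(\frac{C_{wo}}{C_{FEC}}\right)^2 > 1
  \quad\text{since } C_{FEC} < C_{wo}.
\end{align*}
```

As a hypothetical example, if redundancy consumed a tenth of the link ($C_{FEC} = 0.9\,C_{wo}$), the loss probability seen by TCP would rise by a factor of $1/0.81 \approx 1.23$.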
BIP can provide tremendous insight into TCP and congestion control.
Active queue management
• drop tail - drop pkt when buffer fills
• active queue management (AQM)
– proactively drop/mark packets before buffer overflow
– example: drop pkt with probability $p(x)$, $x$ - avg. queue length
RED (Random Early Detection)
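RED: marking/dropping based on average queue length $x(t)$
[Figure: marking prob. $p$ vs. avg queue length $x$; ticks $t_{min}$, $t_{max}$, $2t_{max}$ on the $x$ axis and $p_{max}$, 1 on the $p$ axis]
[Figure: $q(t)$ and $x(t)$ vs. time $t$]
$x(t)$: smoothed, time averaged $q(t)$
$x(t_{i+1}) = \alpha \, q(t_i + \delta) + (1-\alpha) \, x(t_i)$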
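A minimal sketch of the two pieces of RED: the marking profile and the averaging filter. It assumes the classic profile that jumps from p_max to 1 at t_max; the function names and the thresholds, p_max, and alpha in the example are hypothetical values.

```python
def red_mark_prob(x, t_min, t_max, p_max):
    """Marking probability as a function of the average queue length x."""
    if x < t_min:
        return 0.0
    if x < t_max:
        # linear ramp from 0 at t_min up to p_max at t_max
        return p_max * (x - t_min) / (t_max - t_min)
    return 1.0                         # assumed jump from p_max to 1 at t_max


def ewma_update(x_prev, q_sample, alpha):
    """x(t_{i+1}) = alpha * q + (1 - alpha) * x(t_i)."""
    return alpha * q_sample + (1.0 - alpha) * x_prev


# The queue steps from 0 to 100 packets; the average follows slowly.
x = 0.0
for _ in range(100):
    x = ewma_update(x, q_sample=100.0, alpha=0.002)
print(f"average after 100 samples: {x:.1f} packets")
print("mark prob. on average:", red_mark_prob(x, 50.0, 150.0, 0.1))
print("mark prob. on instantaneous queue:", red_mark_prob(100.0, 50.0, 150.0, 0.1))
```

With a small alpha the average stays below t_min long after the real queue has passed it, so no packets are marked during that interval.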
Droptail vs. RED
Experiment (Christiansen et al., SIGCOMM ’00)
• finite duration http flows into router
• low load
– no difference
• high load
– droptail often produces lower latencies
– careful tuning of RED can reduce difference
Droptail vs. RED: high load
[Figure: queue length $q$ under drop tail and under RED]
⇒ $q_{DT} > q_{RED}$
• $R = A + q/C$
⇒ $R_{DT} > R_{RED}$
⇒ $p_{DT} < p_{RED}$
⇒ longer latencies for finite duration flows under RED
• true for other AQM policies
Solution
- packet marking (ECN)
RED discard function
• RED queue
• N identical TCP sources B(R,p) = C/N
• p increases with N
[Figure: RED discard function with $p_{max}$ at $t_{max}$; operating points for N1 < N2 < N3, with N4 marked "?"]
RED discard function
• RED queue (Firoiu, Borden, 00)
• N identical TCP sources B(R,p) = C/N
• p increases with N
• once p > pmax, queue oscillates around tmax
⇒ RED unstable!
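The argument can be turned into a simple check. This sketch assumes the square-root formula for B(R,p), a round trip time of the form R = prop_delay + q/C, and a RED profile that reaches p_max at t_max; all names and numbers are hypothetical.

```python
def required_loss(n, q, capacity, prop_delay):
    """p at which n flows each get capacity/n when the queue holds q packets.

    Assumes B(R, p) = sqrt(2) / (R * sqrt(p)) and R = prop_delay + q / capacity.
    """
    rtt = prop_delay + q / capacity
    return 2.0 * (n / (rtt * capacity)) ** 2


def red_has_equilibrium(n, capacity, prop_delay, t_max, p_max):
    """True if the loss needed at q = t_max does not exceed p_max.

    The required loss falls as q grows while RED's ramp rises, so the two
    curves cross below t_max exactly when this condition holds.
    """
    return required_loss(n, t_max, capacity, prop_delay) <= p_max


capacity, prop_delay = 1250.0, 0.1     # packets/s and seconds, illustrative
for n in (10, 50, 100):
    p_needed = required_loss(n, 150.0, capacity, prop_delay)
    ok = red_has_equilibrium(n, capacity, prop_delay, t_max=150.0, p_max=0.1)
    print(f"N={n}: p needed at t_max = {p_needed:.4f}, equilibrium below t_max: {ok}")
```

When the check fails, no point on the ramp supplies enough loss, so the average queue is pushed to t_max, where the marking probability jumps.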
Improved RED
[Figure: marking prob. $p$ vs. avg queue length; ticks $t_{min}$, $t_{max}$, $2t_{max}$; levels $p_{max}$, 1]
• discontinuity removed in gentle_ variant
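A sketch of the gentle_ profile, assuming the usual form in which the marking probability ramps linearly from p_max at t_max to 1 at 2 t_max. The function name and the example thresholds are hypothetical.

```python
def gentle_red_mark_prob(x, t_min, t_max, p_max):
    """RED marking probability with a gentle ramp above t_max."""
    if x < t_min:
        return 0.0
    if x < t_max:
        return p_max * (x - t_min) / (t_max - t_min)
    if x < 2.0 * t_max:
        # linear ramp from p_max at t_max to 1 at 2 * t_max
        return p_max + (1.0 - p_max) * (x - t_max) / t_max
    return 1.0


for x in (149.9, 150.0, 225.0, 300.0):
    print(x, round(gentle_red_mark_prob(x, 50.0, 150.0, 0.1), 4))
```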
Another problem
• queue length smoothing adds delay to control loop
[Figure: block diagram of the feedback loop: window $W$ and queue $q$ dynamics, Rtt time delay, control law (e.g. RED) producing $p$]
• generates queue length oscillations
• solution requires other tools
– fluid models
– differential equations
– control theory
• 30 long-lived, 60 short lived flows
• default ns RED parameters
• RED w/o smoothing better than RED w/ smoothing
[Figure: queue size (pkts) vs. time (seconds); RED with smoothing]
Solution
• remove smoothing (May et al., 00; Hollot et al., INFOCOM 01), but feedback delay + queue fill/empty times remain
⇒ use classical methods to compensate for delays (Hollot et al., INFOCOM 01)
– addition of phase lead
– PI control (also Low ’00)
Improving congestion control
• PI controller vs. RED
• time varying http, ftp workload
[Figure: queue length vs. time; PI controller and RED controller]
• PI controller: faster response, decouples queue size and load level
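A generic discrete-time PI update gives a feel for how such a controller behaves. This is a sketch under stated assumptions, not the controller evaluated in the experiment: the class name, the reference queue length `q_ref`, and the coefficients `a` and `b` are hypothetical, chosen with a > b > 0.

```python
class PIMarker:
    """Discrete PI controller that steers the queue toward q_ref.

    Update rule (assumed form): p <- p + a * err - b * prev_err,
    where err = q - q_ref, clipped to [0, 1].
    """

    def __init__(self, q_ref, a, b):
        self.q_ref, self.a, self.b = q_ref, a, b
        self.p = 0.0
        self.prev_err = 0.0

    def update(self, q):
        err = q - self.q_ref
        self.p += self.a * err - self.b * self.prev_err
        self.p = min(1.0, max(0.0, self.p))   # keep p a valid probability
        self.prev_err = err
        return self.p


marker = PIMarker(q_ref=100.0, a=2e-5, b=1.8e-5)
above = [marker.update(200.0) for _ in range(5)]    # queue stuck above target
at_ref = [marker.update(100.0) for _ in range(3)]   # queue back at target
print("p while q > q_ref:", [round(v, 4) for v in above])
print("p once q = q_ref:", [round(v, 4) for v in at_ref])
```

The marking probability keeps rising while the queue sits above the target and holds a nonzero value once the queue returns to it, so the steady-state queue length does not have to grow with load to produce more marking.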
TCP and QoS
• proper design of TCP affects QoS
• AQM design significantly impacts QoS
– most DiffServ proposals rely on RED
• no additional mechanisms required for underloaded network
• call admission required for overloaded network
Summary
• although complex, TCP is well behaved and
easy to characterize
• BIP can provide tremendous insight
• AQM not useful without marking
• RED is flawed AQM policy
• control theory explains flaws, suggests