Adaptive or Active
Queue Management
Prof. C. Tschudin, M. Sifalakis, T. Meyer, & M. Monti
University of Basel
Cs321 - HS 2013
Overview
•
Queue management inside Internet’s routers
•
Issues arising with queue management
•
What is Active Queue Management
•
Old AQM
–
RED, FRED, CHOKE, ARED, BLUE
•
Bufferbloat: the new threat of the Internet
•
New AQM
Inside Internet Routers
•
The forwarding plane of every packet handling router
has
–
Queues + a queue management policy:
• Allocate buffer space to incoming flows
• Absorbers of transient load (bursts of traffic)
–
A scheduling discipline:
• Allocate bandwidth to en-queued flows
• Ensure flow multiplexing
Collectively referred to as the router’s “Queue Management”
Typical Internet Queue Management
•
E.g. Single queue, FIFO schedule (+ Tail-drop policy)
Single Class of Traffic
[Figure: incoming flows feed a single queue of bounded length (Max Q length) with tail-drop; packets leave in FIFO (First In, First Out) order]
Problems
• No separation between different types of flows
– Aggressive flows get more packets through
– Bursty flows react differently than CBR flows
– Poor flow inter-leaving/mixing
• Lockout
– A few flows can monopolize the queue space, preventing new flows from being admitted
• Synchronization to events
– End hosts react (congestion avoidance) at the same time to the same events (maybe at different scales)
– Periodic load-surges appear and become resident-periodic
Typical Internet Queue Management
•
E.g. Multi-queue, Fair schedule (+ tail-drop policy)
Fair Queueing: each incoming flow is hashed to its own queue (Nagle)
Stochastic FQ: all incoming flows are hashed and multiplexed across N queues (more realistic for core routers)
[Figure: incoming flows pass through a hash-based classifier into N queues of bounded length (Max Q length), served by (Deficit) Round Robin; any full queue tail-drops]
Some so-called “optimisations”
• Router serves individual queues to exhaustion
– Play market-games: Credit the Deficit! (in DRR)
– Less frequent context switches for the scheduler
– Manufacturers can sell faster backplanes (win the benchmark)
• Router employs very large queues
– Reduce tail-drops during bursts
– On links with highly variable bandwidth, gives TCP time to exit the slow-start phase (successful flow admission)
– Telcos can advertise high bandwidth in their networks
Are they really optimisations?... Or just marketing games?
What (TCP) communication
endpoints see
• Bandwidth delay product estimation: aka “Pipe”
– Sender perceived Pipe capacity = (Tx Rate) x (RTT/2)
– Adjust transmission rate accordingly every RTT
• En-path Queueing affects RTT (bigger pipe size)
• Tail-drops are signals to slow down (reduce Tx Win)
[Figure: data in flight between sender and receiver, i.e. the pipe]
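A minimal sketch of the pipe estimate above, assuming the slide's definition (Tx rate x RTT/2) and a single bottleneck queue. The function names and the numbers in the usage lines are illustrative assumptions, not part of the lecture.

```python
def pipe_capacity_bytes(tx_rate_bps: float, rtt_s: float) -> float:
    """Sender-perceived pipe capacity, using the slide's (Tx rate) x (RTT/2)."""
    return (tx_rate_bps / 8.0) * (rtt_s / 2.0)


def rtt_with_queue(base_rtt_s: float, queued_bytes: float, link_rate_bps: float) -> float:
    """RTT seen by the sender once the bottleneck queue holds queued_bytes."""
    queueing_delay_s = queued_bytes * 8.0 / link_rate_bps
    return base_rtt_s + queueing_delay_s


if __name__ == "__main__":
    rate = 10e6        # assumed 10 Mbit/s bottleneck
    base_rtt = 0.040   # assumed 40 ms RTT with empty queues
    print(pipe_capacity_bytes(rate, base_rtt))
    bloated_rtt = rtt_with_queue(base_rtt, 250_000, rate)  # 250 kB standing queue
    print(bloated_rtt, pipe_capacity_bytes(rate, bloated_rtt))
```

With these assumed numbers the standing queue multiplies the RTT, and with it the perceived pipe size, several times over: the sender sees a "bigger pipe" although the link is no faster.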
Issues that can emerge
• As queues build up, RTT increases and congestion becomes hard to remedy (notice this is the opposite of the original intent)
Takes longer for endpoints to react (TxWin adjustment)
ACK spacing can get compressed increasing burstiness of transmissions
• The larger the capacity of the queue
The more detached are the endpoints from the truth about congestion
The more likely that the queues become persistent: “bufferbloat” (long standing delay is treated as the norm)
• Add serving queues to exhaustion
High variability in Avg. queue sizes (and thus delay)
RTT estimation at end node does not stabilize
Likely poor flow mixing – More “ACK compression”
“Butterfly effects” start to appear and dominate .. (see video)
Overall the network can become slower and slower up to a halting point (collapse)
Active Queue Management
•
Not all problems were realised at once
–
As Queues build up, RTT increases
Takes longer for endpoints to react (TxWin adjustment)
ACK spacing can get compressed leading to burstiness of transmissions
•
Early AQM approaches aimed to address this problem
only
–
Provide feedback about imminent congestion
–
While the endpoint can still react fast (i.e. before RTT
increase due to the queue, blocks feedback)
Approaches that we discuss in this lecture
–
RED and variants, BLUE and variants, CHOKe, ECN
Active Queue Management
•
Eventually the other issues became apparent
–
Too long queues (with resident load)
–
High-frequency congestion signal fluctuations (due to
scheduling “optimisations”)
•
New AQMs try to serve their purpose around these
nuisances
–
CoDel, PIE (the ones we discuss here)
•
Plus overcome a big shortcoming of early approaches
AQM Design Objectives
• Maximize throughput
– Keep lines busy at all times (queues not staying empty)
• Minimize delay
– Queues at steady state almost empty
• Serve transient load surges
– Queue size should reflect the ability to absorb bursts, not resident load
• No flow lockout
– Packet drops should affect already admitted flows rather than newly arriving ones
AQM as a Control System
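[Figure: control loop between sender, bottleneck router queue, and receiver. The queue dynamics are measured against an objective function (parameter); the resulting action is embedded in the queue input; feedback, or the lack thereof, reaches the sender via ACKs]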
Arrows are not data traffic but rather the embodiment of signaling (E.g. the measured parameter is not necessarily the departure rate)
• Measure at router
– Routers can distinguish between propagation and persistent queuing delays
– Routers can decide on transient congestion, based on workload
• Act at the sender
EARLY AQM
Lock-out Problem: Easier to solve
•
Random drop policy
–
Packet arriving when queue is full causes some
random packet to be dropped
•
Drop front policy
–
On full queue, drop packet at head of queue
•
SFQ + Tail drop
–
Same effect as Random drop
•
Solving the lock-out problem does not address the full-queues problem
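The drop policies listed above can be sketched on a bounded FIFO as follows. This is an illustrative sketch only: the `enqueue` helper, its `policy` strings, and the choice to pick the random victim among already queued packets are assumptions.

```python
import random
from collections import deque


def enqueue(queue: deque, packet, max_len: int, policy: str = "tail", rng=random):
    """Admit `packet` to a bounded FIFO; return the dropped packet, or None."""
    if len(queue) < max_len:
        queue.append(packet)
        return None
    if policy == "tail":      # classic tail-drop: the arriving packet is lost
        return packet
    if policy == "front":     # drop-front: the packet at the head is lost
        dropped = queue.popleft()
        queue.append(packet)
        return dropped
    if policy == "random":    # random drop: some queued packet is lost
        idx = rng.randrange(len(queue))
        dropped = queue[idx]
        del queue[idx]
        queue.append(packet)
        return dropped
    raise ValueError(f"unknown policy: {policy}")
```

With "front" and "random" the arriving packet is always admitted, so a flow that already occupies the queue can no longer lock newcomers out. The queue is still full, though, which is the harder problem.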
Full Queues Problem: Bigger challenge
•
Notify sender before queue becomes full (early
drop)
–
Notify = Drop packets: takes > RTT to be sensed
–
Notify = Mark packets: takes <= 1 RTT, but the mark can be lost
•
Notice that how fast the signal will arrive to the
sender depends on Net weather
•
Challenges
–
When to notify.. Or how often
–
Who to notify (cannot afford per flow monitoring)
RED Model
[Figure: drop probability vs. average queue length: zero below minth, rising linearly to Pmax at maxth, then jumping to 1.0. The thresholds apply to the average queue length, not the actual queue length]
• Maintain Exp Moving Avg (EMA) of queue size
– Byte mode vs. packet mode, depending on whether Tx delay is a function of packet length or not
• For each packet arrival
    if (avgq < minth) do nothing
    if (maxth ≤ avgq) mark(packet)
    if (minth ≤ avgq < maxth)
        calculate probability Pm
        mark(Pm, packet)
• Marked packets are either dropped or ECN flagged
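The per-arrival rule above can be sketched as follows, assuming the linear marking probability between the two thresholds that the RED model describes. The function names are illustrative, and the averaging of the queue length is left out here.

```python
import random


def red_mark_probability(avgq: float, min_th: float, max_th: float, p_max: float) -> float:
    """Marking probability as a function of the average queue length."""
    if avgq < min_th:
        return 0.0                      # below minth: do nothing
    if avgq >= max_th:
        return 1.0                      # at or above maxth: always mark
    return p_max * (avgq - min_th) / (max_th - min_th)


def red_on_arrival(avgq: float, min_th: float, max_th: float, p_max: float,
                   rng=random) -> bool:
    """Return True if the arriving packet should be marked (dropped or ECN-flagged)."""
    return rng.random() < red_mark_probability(avgq, min_th, max_th, p_max)
```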
RED Parameters
• EMA of Queue size computed at every packet arrival (not periodically!)
$avgq_{now} = w_q \cdot qlen_{now} + (1 - w_q) \cdot avgq_{prev}$
• Special condition if the queue was idle, i.e. $qlen_{now} = 0$
– Same as if there had been 100% link utilisation with an empty queue
– Approximate that m small packets were processed while idle:
$m = (t_{now} - t_{last\_arrival}) / t_{pkt}$, where $t_{pkt}$ is the transmission time of a nominal (small) packet
$avgq_{now} = (1 - w_q)^m \cdot avgq_{prev}$
• Packet marking probability grows linearly with the position of avgq between the two thresholds
$P_m = P_{max} \cdot (avgq_{now} - min_{th}) / (max_{th} - min_{th})$
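A minimal sketch of the averaging step, including the idle-queue correction. The class name, the `idle_since` argument, and the interpretation of the nominal packet term as a transmission time in seconds are assumptions made for illustration.

```python
class RedAverager:
    """EMA of the queue length, updated on every packet arrival."""

    def __init__(self, wq: float, nominal_tx_time: float):
        self.wq = wq
        self.nominal_tx_time = nominal_tx_time  # assumed: seconds to send one small packet
        self.avgq = 0.0

    def on_arrival(self, qlen: int, now: float, idle_since=None) -> float:
        if qlen == 0 and idle_since is not None:
            # Queue was idle: decay as if m small packets had passed an empty queue.
            m = (now - idle_since) / self.nominal_tx_time
            self.avgq = (1.0 - self.wq) ** m * self.avgq
        else:
            self.avgq = self.wq * qlen + (1.0 - self.wq) * self.avgq
        return self.avgq
```

Without the idle branch, the average would stay frozen at its last value during a silent period, and the first packets of the next burst would be marked for congestion that has long gone.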
Issues with RED: configurability
• avgq is an EMA: wq adjusts the lag, trend, and window of averaging
– Short window: fast sensing but vulnerable to transients
– Long window: slow adaptation
• minth adjusts the “power of the network”
– “Too close” to 0: the queue is likely to have idle periods (bandwidth not used)
– “Too far” from 0: increases path latency, delays feedback signal to endpoints
• maxth – minth adjusts the step-granularity of marking
– “Too small”: the AQM becomes spasmodic in its reaction, forces flows to sync
– Must be larger than the typical avgq increase in an RTT
A visual analogy: which vessel size for which sea condition ?
– Think traffic bursts like the swell of the sea (height, length)
– Think of the AQM as having to behave like a speed boat or like a tanker, depending on the swell
Issues with RED: configurability
•
Average Queue size oscillation
•
Difficult to control congestion when many flows
– esp. unresponsive ones
[Figure: actual vs. average queue length (0 to max) over time, one panel for 32 flows and one for 8 flows]
RED & variants – FRED
•
Fair RED or Flow RED – fairness among flow types
–
Flow differentiation based on queue use:
• non-adaptive (UDP), fragile (sensitive to loss), robust
–
per-active-flow accounting and loss-regulation
• All flows entitled to admit minq packets without loss
• Adjust minq per flow based on avg. per-flow queue occupancy (avgcq)
• Set upper per flow capacity to maxq and count violations
RED & variants – CHOKE
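• CHOose and Kill unresponsive flows... or... CHOose and Keep responsive flows
– Compare incoming packet with a randomly selected packet in the queue
– If both belong to the same flow, drop both; otherwise apply RED (variant)
– Packets of aggressive flows are more likely to be selected
[Figure: CHOKe flowchart between Min thresh and Max thresh: on packet arrival, select a queued packet randomly and test for a flow ID match; on Yes drop, on No apply RED (variant)]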
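The CHOKe decision can be sketched as below. It is an illustrative sketch: packets are assumed to be dicts with a `flow` key, the averaged queue length `avgq` is supplied by the caller, and plain RED is used as the fallback variant.

```python
import random
from collections import deque


def choke_on_arrival(queue: deque, packet: dict, avgq: float,
                     min_th: float, max_th: float, p_max: float, rng=random) -> str:
    """Process one arrival and return 'admitted', 'dropped' or 'dropped_both'."""
    if avgq < min_th:
        queue.append(packet)
        return "admitted"
    if queue:
        idx = rng.randrange(len(queue))           # select a queued packet at random
        if queue[idx]["flow"] == packet["flow"]:
            del queue[idx]                        # drop the matching victim ...
            return "dropped_both"                 # ... and the arriving packet
    if avgq >= max_th:
        return "dropped"
    p = p_max * (avgq - min_th) / (max_th - min_th)   # RED between the thresholds
    if rng.random() < p:
        return "dropped"
    queue.append(packet)
    return "admitted"
```

A flow that holds most of the queue is the most likely to be hit by the random comparison, so it is penalised without any per-flow state in the router.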
RED & variants – ARED
•
Adaptive RED – minimize delay variance +
parameter auto-tune
–
Adapt Pmax periodically, slowly, and with an AIMD policy
–
Fix maxth = 3 * minth
–
Fix wq = 1 - exp(-1/Link_capacity)
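A hedged sketch of the ARED adaptation. The target band (40% to 60% of the way between the thresholds), the increase step, the decrease factor, and the bounds on Pmax are assumed values in the spirit of the Adaptive RED proposal, not taken from these slides. The weight uses a negative exponent so that wq falls between 0 and 1.

```python
import math


def ared_adapt_pmax(p_max: float, avgq: float, min_th: float, max_th: float,
                    beta: float = 0.9) -> float:
    """One periodic AIMD step on p_max, steering avgq into a target band."""
    target_lo = min_th + 0.4 * (max_th - min_th)   # assumed band limits
    target_hi = min_th + 0.6 * (max_th - min_th)
    if avgq > target_hi and p_max <= 0.5:
        return p_max + min(0.01, p_max / 4.0)      # additive increase
    if avgq < target_lo and p_max >= 0.01:
        return p_max * beta                        # multiplicative decrease
    return p_max


def ared_wq(link_capacity_pps: float) -> float:
    """Averaging weight from the link capacity in packets per second."""
    return 1.0 - math.exp(-1.0 / link_capacity_pps)
```

Calling `ared_adapt_pmax` on a slow timer (for example twice per second) keeps the adaptation much slower than the queue dynamics it is trying to steer.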
RED & variants – SRED
•
Stabilised RED – eliminate need of avgq (and wq)
–
Pm = f(inst. queue length, # of active flows, rank of flow)
–
Zombie list: history of K seen flows with Hit counters. On packet arrival:
    pick Zombie flow randomly
    if flows match: Hits++
    else: replace Zombie with prob. p
–
Statistical counting of flows based on Hit Freq.
–
Rank of flow = Hit counter on match
RED & variants – BLUE
• Putting past insights in new light
– Avoid parameter tuning nightmare
– Avoid effects of avg queue fluctuation on AQM
• Adaptive marking probability
– Pm = f (packet loss, link idle events)
    On packet loss: if (tnow – tlast_update > freeze_period) Pm = Pm + d1
    On idle link: if (tnow – tlast_update > freeze_period) Pm = Pm – d2
– d1 >> d2: faster reaction to a rise in congestion than to its decrease
– freeze_period is a sort of A/D discretizer
• Filters out high-frequency transient oscillations
• Adjusts the parameter at packet arrival times, mod a fixed quantum
• SFBlue uses ideas of SFQ and FRED to discriminate flows
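The BLUE update rule can be sketched as a small state machine. The class name, the default values of d1, d2 and freeze_period, and the clamping of Pm to [0, 1] are illustrative assumptions. The sketch measures the freeze period from the last change of Pm.

```python
class Blue:
    """Marking probability driven by loss and idle events, not queue length."""

    def __init__(self, d1: float = 0.02, d2: float = 0.002, freeze_period: float = 0.1):
        self.pm = 0.0
        self.d1 = d1                          # increment on packet loss
        self.d2 = d2                          # decrement on idle link (d1 >> d2)
        self.freeze_period = freeze_period    # minimum time between two updates
        self.last_update = float("-inf")

    def _may_update(self, now: float) -> bool:
        return now - self.last_update > self.freeze_period

    def on_packet_loss(self, now: float) -> None:
        if self._may_update(now):
            self.pm = min(1.0, self.pm + self.d1)
            self.last_update = now

    def on_link_idle(self, now: float) -> None:
        if self._may_update(now):
            self.pm = max(0.0, self.pm - self.d2)
            self.last_update = now
```

Events that arrive within freeze_period of the last change are ignored, which is what filters out the high-frequency oscillations.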
MODERN AQM
Bufferbloat: The new Internet threat
•
What it is ?
– Constant residue of packets in Queues that never goes away
– Adds constant delay component to the e2e path latency
•
Where does it come from ?
– A combination of the following
• Senders transmit at higher rate than the bottleneck link can sustain
• Excessively large queues increase e2e RTT and delay TCP feedback
– Sender response is phase shifted to congestion phenomena
•
Why is it a threat ?
– Excessive delays become resident even on high speed networks
– Confuses TCP’s flow/congestion control algorithm
•
Solutions ?
Illustrating Bufferbloat
• good queue operating at link speed (rate)
• Also good queue (typical of delayed ACK scheme, or when serving synchronised flows)
• Bloated queue, that cannot get rid of resident load
The last two queues have the same constant average length of N pkts over an RTT! What distinguishes them, however, is the average minimum length over a large enough window (large enough: ≥ 1 RTT)!
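A tiny illustration of why the minimum, not the average, reveals a standing queue. Both traces are made-up samples over one window and are not measurements.

```python
def window_stats(qlen_samples):
    """Return (average, minimum) of queue-length samples taken over one window."""
    return sum(qlen_samples) / len(qlen_samples), min(qlen_samples)


# Two hypothetical traces over one RTT, both averaging 10 packets.
bursty_good_queue = [20, 15, 10, 5, 0, 20, 15, 10, 5, 0]   # drains to empty
bloated_queue = [10, 11, 9, 10, 10, 11, 9, 10, 10, 10]     # never drains

if __name__ == "__main__":
    print(window_stats(bursty_good_queue))
    print(window_stats(bloated_queue))
```

An AQM that looks only at the average cannot tell these two apart. The minimum over the window is zero for the first trace and stays high for the second.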
Codel – Controlled delay
• Time-based model of queue dynamics instead of a spatial one
• Monitor how long the Min queue length remains above a threshold (desired Min)
– Less descriptive than Avg. Min queue length,
– Yet sufficient, and simpler computationally
• Sojourn time as a measure of instantaneous queue length
– How long packets stay in the queue: Time delta between a packet’s departure and arrival time
– Works with a single queue or multiple queues
– Works for variable link rates (e.g. wireless links) – well statistically speaking!
– Simple to measure, easy to implement
Codel – Controlled delay
On packet arrival:
    timestamp(packet)

On packet departure:
    sojourn = now – packet.tstamp
    if (sojourn < Target)                        // sojourn falls below Target
        if (drp_mode == 1)
            drp_mode = 0
            exit_drp = now
        if (now – exit_drp >= Interval)          // only reset drop rate memory if sojourn is below Target for Interval
            drp_count = 0
    else                                         // sojourn >= Target
        if (drp_mode == 0 && now – exit_drp < Interval)
            // sojourn above Target again after temporary improvement: resume last drop rate
            drp_mode = 1
            if (now >= next_drp)
                drp(packet)
                drp_count++
                next_drp = now + Interval/sqrt(drp_count)
        else if (drp_mode == 0)
            // sojourn above Target for the first time, after Interval: start dropping
            drp_mode = 1
            drp(packet)
            drp_count = 1
            next_drp = now + Interval/sqrt(drp_count)
        else
            // already in drp_mode: sojourn remains above Target, continue dropping
            if (now >= next_drp)
                drp(packet)
                drp_count++
                next_drp = now + Interval/sqrt(drp_count)
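The state machine above can be sketched in Python as follows. This follows the simplified logic of the slide rather than any reference CoDel implementation. The class name and the default Target (5 ms) and Interval (100 ms) are assumptions.

```python
import math


class SlideCodel:
    """Sketch of the slide's simplified CoDel state machine (times in seconds)."""

    def __init__(self, target: float = 0.005, interval: float = 0.100):
        self.target = target            # acceptable standing delay (assumed)
        self.interval = interval        # on the order of a worst-case RTT (assumed)
        self.drp_mode = False
        self.drp_count = 0
        self.exit_drp = float("-inf")   # time at which drop mode was last left
        self.next_drp = 0.0             # earliest time of the next drop

    def on_departure(self, enqueue_time: float, now: float) -> bool:
        """Return True if the departing packet should be dropped."""
        sojourn = now - enqueue_time
        if sojourn < self.target:
            if self.drp_mode:
                self.drp_mode = False
                self.exit_drp = now
            if now - self.exit_drp >= self.interval:
                self.drp_count = 0      # forget the drop rate after a calm Interval
            return False
        if not self.drp_mode:
            self.drp_mode = True
            if now - self.exit_drp < self.interval:
                # Relapse shortly after leaving drop mode: resume the old drop rate.
                if now < self.next_drp:
                    return False
            else:
                # Fresh episode: the increment below sets drp_count to 1.
                self.drp_count = 0
        elif now < self.next_drp:
            return False
        self.drp_count += 1
        self.next_drp = now + self.interval / math.sqrt(self.drp_count)
        return True
```

A dequeue loop would call `on_departure(packet_timestamp, now)` for each head-of-line packet and discard the packet whenever the call returns True.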
Codel – Controlled delay
•
Significantly less configuration magic involved
• Interval: const (≥ 1RTT)
• Target (delay): const
– max{ equiv of 1-2 packets worth of queue, 5% of worst case RTT}
• Drop/Mark rate: const acceleration in Interval
– inverse-square-root progression of the drop spacing gives a linear increase of drops per RTT
– dropping speed up is independent of queue accumulation speed !?!?
•
fq_Codel combines Codel with SFQ
• treats different traffic classes fairer
•
Sojourn measurement does not block the queue!
• by contrast to queue length averaging
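Why the inverse-square-root spacing amounts to a constant acceleration of the drop rate can be sketched by treating the drop count n as a continuous quantity. This is an approximation added for illustration; t is the time since dropping started and r(t) the drop rate.

```latex
\Delta t(n) = \frac{\mathit{Interval}}{\sqrt{n}}
\quad\Rightarrow\quad
\frac{dt}{dn} \approx \frac{\mathit{Interval}}{\sqrt{n}}
\quad\Rightarrow\quad
t(n) \approx 2\,\mathit{Interval}\,\sqrt{n}

r(t) = \frac{dn}{dt} \approx \frac{\sqrt{n}}{\mathit{Interval}}
     \approx \frac{t}{2\,\mathit{Interval}^{2}}
```

The drop rate therefore grows linearly in time, with a slope that depends only on Interval and not on how fast the queue is building up.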
PIE – Proportional Integral Enhanced
On packet arrival: decide packet fate
    mark/drop(P_drop, pkt)

On packet depart: estimate output rate
    if (q_len > pkt_threshold)                  // start counting bytes contributing to bufferbloat once the threshold is reached
        byte_count = byte_count + pkt_bytes
        if (byte_count > pkt_threshold)         // once enough bytes are counted, compute the queue drain rate
            inst_rate = byte_count / (now – last)
            avg_rate = (1-w)*avg_rate + w*inst_rate    // exp. weighted moving avg. of the rate
            last = now
            byte_count = 0

On Interval expiration (periodically): update drop rate
    q_delay = q_len / avg_rate                  // Little's law
    P_drop = P_drop + a*(q_delay – ref_delay)   // deviation from desired delay
                    + b*(q_delay – q_delay_old) // delay change in 1 interval
    q_delay_old = q_delay
    where  P_drop < 1%:  a = A/8, b = B/8
           P_drop < 10%: a = A/2, b = B/2
           else:         a = A,   b = B
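A hedged Python sketch of the three PIE event handlers. The class name, the default gains A and B, the reference delay, the threshold, and the clamping of P_drop to [0, 1] are illustrative assumptions. The queue delay is estimated as queue length divided by the averaged drain rate.

```python
import random


class SlidePie:
    """Sketch of PIE: arrival decision, drain-rate estimate, periodic update."""

    def __init__(self, ref_delay=0.020, A=0.125, B=1.25, w=0.5, dq_threshold=10_000):
        self.ref_delay = ref_delay        # desired queueing delay in seconds (assumed)
        self.A, self.B, self.w = A, B, w  # controller gains and EMA weight (assumed)
        self.dq_threshold = dq_threshold  # bytes; plays the role of pkt_threshold
        self.p_drop = 0.0
        self.q_delay_old = 0.0
        self.avg_rate = 0.0               # bytes per second
        self.byte_count = 0
        self.last = 0.0

    def on_arrival(self, rng=random) -> bool:
        """Return True if the arriving packet should be dropped or marked."""
        return rng.random() < self.p_drop

    def on_departure(self, q_len_bytes: int, pkt_bytes: int, now: float) -> None:
        if q_len_bytes > self.dq_threshold:
            self.byte_count += pkt_bytes
            if self.byte_count > self.dq_threshold and now > self.last:
                inst_rate = self.byte_count / (now - self.last)
                if self.avg_rate == 0.0:
                    self.avg_rate = inst_rate     # first sample seeds the average
                else:
                    self.avg_rate = (1 - self.w) * self.avg_rate + self.w * inst_rate
                self.last = now
                self.byte_count = 0

    def on_interval(self, q_len_bytes: int) -> float:
        q_delay = q_len_bytes / self.avg_rate if self.avg_rate > 0 else 0.0
        if self.p_drop < 0.01:
            a, b = self.A / 8, self.B / 8
        elif self.p_drop < 0.10:
            a, b = self.A / 2, self.B / 2
        else:
            a, b = self.A, self.B
        self.p_drop += a * (q_delay - self.ref_delay) + b * (q_delay - self.q_delay_old)
        self.p_drop = min(1.0, max(0.0, self.p_drop))
        self.q_delay_old = q_delay
        return self.p_drop
```

The smaller gains at low P_drop keep the controller gentle when little congestion is present; the full gains apply only once the drop probability is already substantial.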
PIE – Proportional Integral Enhanced
• Also controls delay instead of queue length like Codel
• Dropping at the tail of the queue to save buffer space
– Instead of head (Codel)
• 3 modes of operation for 3 different traffic classes
– Parameters a,b adjustment
– Quite a lot of other magic numbers (in contrast to Codel)
• Queue delay prediction based on queue size and smoothed output rate
– Instead of actual measurement (Codel)
• Drop probability takes into account deviation from nominal value and corrects/improves effects of previous action (direction/magnitude of change)
– Instead of binary accelerate-or-switch_off (Codel)
Explicit Congestion Notification
• Works with TCP traffic. Instead of packet dropping, packet marking
– An old idea called DEC-bit from DECnet (early-day TCP/IP competitor)
[Figure: TCP sender and receiver exchanging Data and ACK sequences for packets 1 to 7, once with packet dropping and once with ECN]
ECN – How marking works
•
At the IP header: Signal from router to receiver
IP header fields:
    VER 4 bits | HLEN 4 bits | DS 8 bits | Total Length 16 bits
    Identification 16 bits | Flags 3 bits | Fragmentation offset 13 bits
    Time to Live 8 bits | Protocol 8 bits | Header Checksum 16 bits
    Source IP address 32 bits
    Destination IP address 32 bits
    Options (if any)
    Data
DS field (8 bits): Differentiated Services 6 bits + ECN 2 bits (formerly reserved)
ECT CE  Interpretation
 0   0  Not-ECT (Not ECN Capable Transport)
 0   1  ECT(1) (ECN Capable Transport (1))
 1   0  ECT(0) (ECN Capable Transport (0))
 1   1  CE (Congestion Experienced)
ECT: ECN Capable Transport
ECN – How marking works
•
At the TCP header: Signal from receiver to sender
TCP header fields:
    Source port address 16 bits | Destination port address 16 bits
    Sequence Number 32 bits
    Acknowledgement Number 32 bits
    HLEN 4 bits | Reserved 4 bits | CWR | ECE | URG | ACK | PSH | RST | SYN | FIN | Window size 16 bits
    Checksum 16 bits | Urgent pointer 16 bits
    Options (if any)
    Data
[Figure note: the original 6 reserved bits are reduced to 4; the two freed bits carry CWR and ECE]
CWR: Congestion Window Reduced Flag
ECE: ECN-Echo Flag
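The two signalling hops (router to receiver in the IP header, receiver to sender in the TCP header) can be sketched with plain bit operations. The function names are illustrative, and halving the congestion window on ECE is a simplifying assumption about the sender's reaction.

```python
from typing import Optional, Tuple

# ECN codepoints: the two least-significant bits of the DS byte.
ECN_MASK = 0b11
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

# TCP flag bits used by ECN (same byte as FIN=0x01 ... URG=0x20).
TCP_ECE = 0x40
TCP_CWR = 0x80


def router_mark(ds_byte: int) -> Optional[int]:
    """Set CE on an ECN-capable packet; None means the packet must be dropped instead."""
    if (ds_byte & ECN_MASK) == NOT_ECT:
        return None
    return ds_byte | CE


def receiver_ack_flags(ds_byte: int, ack_flags: int) -> int:
    """Echo a received CE mark back to the sender via the ECE flag."""
    if (ds_byte & ECN_MASK) == CE:
        return ack_flags | TCP_ECE
    return ack_flags


def sender_on_ack(ack_flags: int, cwnd: int) -> Tuple[int, int]:
    """Return (new cwnd, flags to set on the next data segment)."""
    if ack_flags & TCP_ECE:
        return max(1, cwnd // 2), TCP_CWR   # react as to a loss, then confirm with CWR
    return cwnd, 0
```

The mark travels the rest of the forward path and the whole return path before the sender sees it, which is why it can still be lost on the way even though no data packet was dropped.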
ECN Vs. Packet drop
as a feedback signal
•
Packet drop effective even with full queues, while ECN
makes only sense before queues get full
•
Packet drop = 3 DUP ACK or timeout before sender acts
ECN delivers the feedback faster
•
Packet drop => retransmissions
–
Judas’ kiss: communicate a signal through an impairment
(B. Briscoe)
Some links on Bufferbloat
•
How can I tell if I’m suffering from bufferbloat?
–
http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/
•
Can I do anything personally to reduce my suffering from
bufferbloat?
–
http://gettys.wordpress.com/2010/12/13/mitigations-and-solutions-of-bufferbloat-in-home-routers-and-operating-systems/
•
Bufferbloat triggered the network neutrality debate
–
http://gettys.wordpress.com/2010/12/07/bufferbloat-and-network-neutrality-back-to-the-past/