Adaptive or Active Queue Management


Prof. C. Tschudin, M. Sifalakis, T. Meyer, & M. Monti

University of Basel

Cs321 - HS 2013

Overview

• Queue management inside Internet’s routers

• Issues arising with queue management

• What is Active Queue Management

• Old AQM

– RED, FRED, CHOKe, ARED, BLUE

• Bufferbloat: the new threat of the Internet

• New AQM


Inside Internet Routers

• The forwarding plane of every packet-handling router has

– Queues + a queue management policy:

• Allocate buffer space to incoming flows

• Absorbers of transient load (bursts of traffic)

– A scheduling discipline:

• Allocate bandwidth to enqueued flows

• Ensure flow multiplexing

Collectively referred to as the router’s “Queue Management”

Typical Internet Queue Management

• E.g. Single queue, FIFO schedule (+ Tail-drop policy)



[Figure: single class of traffic. Incoming flows enter one queue with tail-drop at max Q length; the queue is served FIFO (First In, First Out) toward the output.]
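As a minimal sketch of the single-queue arrangement above (the class name `TailDropFifo` and its fields are illustrative, not taken from the lecture), a FIFO queue with tail-drop can be modelled as:

```python
from collections import deque


class TailDropFifo:
    """Single FIFO queue with a tail-drop policy (hypothetical helper)."""

    def __init__(self, max_len):
        self.max_len = max_len  # "Max Q length", counted in packets
        self.queue = deque()
        self.drops = 0

    def enqueue(self, pkt):
        # Tail-drop: an arrival that finds the queue full is discarded.
        if len(self.queue) >= self.max_len:
            self.drops += 1
            return False
        self.queue.append(pkt)
        return True

    def dequeue(self):
        # FIFO: always serve the oldest packet first.
        return self.queue.popleft() if self.queue else None


q = TailDropFifo(max_len=3)
# Flow "a" fills the queue first, so the packet of flow "b" is tail-dropped.
accepted = [q.enqueue(p) for p in ["a1", "a2", "a3", "b1"]]
```

Here the late-arriving flow loses its packet only because another flow already occupies the buffer.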


Problems

• No separation between different types of flows

– Aggressive flows get more packets through

– Bursty flows react differently than CBR flows

– Poor flow inter-leaving/mixing

• Lockout

– A few flows can monopolize the queue space, preventing new flows from being admitted

• Synchronization to events

– End hosts react (congestion avoidance) at the same time to the same events (maybe at different scales)

– Periodic load surges appear and become persistent

Typical Internet Queue Management

• E.g. Multi-queue, Fair schedule (+ tail-drop policy)

[Figure: incoming flows pass through a hash-based classifier into N queues, each with a max Q length; any full queue tail-drops; the queues are served by (Deficit) Round Robin toward the output.]

• Fair Queueing: each incoming flow hashed to its own queue (Nagle)

• Stochastic FQ: all incoming flows hashed-muxed across N queues (more realistic for core routers)
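The (Deficit) Round Robin scheduler named above can be sketched as follows. This is an illustrative model only: the function name, the quantum value, and the packet sizes are assumptions, and the hash-based classifier is left out (the queues are given directly).

```python
from collections import deque


def drr_schedule(queues, quantum, rounds):
    """Serve queues of packet sizes (bytes) with Deficit Round Robin."""
    deficit = [0] * len(queues)
    sent = []  # (queue index, packet size) in service order
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0  # an empty queue keeps no credit
                continue
            deficit[i] += quantum  # credit earned in this round
            # Send head-of-line packets while the credit covers them.
            while q and q[0] <= deficit[i]:
                size = q.popleft()
                deficit[i] -= size
                sent.append((i, size))
            if not q:
                deficit[i] = 0
    return sent


queues = [deque([1500, 1500]), deque([500, 500, 500, 500])]
order = drr_schedule(queues, quantum=1000, rounds=3)
```

Queue 0 has to save up credit over two rounds before its first 1500-byte packet fits, which is the "deficit" being carried over.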


Some so-called “optimisations”

• Router serves individual queues to exhaustion

– Play market-games: Credit the Deficit! (in DRR)

– Less frequent context switches for the scheduler

– Manufacturers can sell faster backplanes (win the benchmark)

• Router employs very large queues

– Reduce tail-drops during bursts

– On links with highly variable bandwidth, gives TCP time to exit the slow-start phase (successful flow admission)

– Telcos can advertise high bandwidth in their networks

Are they really optimisations?... Or just marketing games?

What (TCP) communication endpoints see

• Bandwidth delay product estimation: aka “Pipe”

– Sender perceived Pipe capacity = (Tx Rate) x (RTT/2)

– Adjust transmission rate accordingly every RTT

• En-path Queueing affects RTT (bigger pipe size)

• Tail-drops are signals to slow down (reduce Tx Win)
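A small worked example of the pipe estimate above, using the slide's formula (Tx Rate) x (RTT/2). The link rate and delay figures are made-up illustration values.

```python
def pipe_capacity_bytes(tx_rate_bps, rtt_s):
    """Sender-perceived pipe capacity = (Tx rate) x (RTT / 2), in bytes."""
    return tx_rate_bps / 8 * (rtt_s / 2)


# 10 Mbit/s path with a 40 ms RTT when queues are empty.
base = pipe_capacity_bytes(10_000_000, 0.040)
# Same path after en-path queueing has added 200 ms to the RTT.
bloated = pipe_capacity_bytes(10_000_000, 0.240)
```

With these numbers the sender sees a pipe six times larger, although the link itself has not changed: the extra "capacity" is just queued data.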


Issues that can emerge

• As queues build up, RTT increases and congestion becomes hard to remediate (notice this is the opposite of the original intent)

– Takes longer for endpoints to react (TxWin adjustment)

– ACK spacing can get compressed, increasing burstiness of transmissions

• The larger the capacity of the queue

– The more detached the endpoints are from the truth about congestion

– The more likely the queues become persistent: “bufferbloat” (long-standing delay is treated as the norm)

• Add serving queues to exhaustion

– High variability in avg. queue sizes (and thus delay)

– RTT estimation at end node does not stabilize

– Likely poor flow mixing – more “ACK compression”

– “Butterfly effects” start to appear and dominate (see video)

• Overall the network can become slower and slower up to a halting point (collapse)


Active Queue Management

• Not all problems were realised at once

– As queues build up, RTT increases

– Takes longer for endpoints to react (TxWin adjustment)

– ACK spacing can get compressed, leading to burstiness of transmissions

• Early AQM approaches aimed to address this problem only

– Provide feedback about imminent congestion

– While the endpoint can still react fast (i.e. before the RTT increase caused by the queue delays the feedback)

• Approaches that we discuss in this lecture

– RED and variants, BLUE and variants, CHOKe, ECN

Active Queue Management

• Eventually the other issues became apparent

– Too long queues (with resident load)

– High-frequency congestion signal fluctuations (due to scheduling “optimisations”)

• New AQMs try to serve their purpose around these nuisances

– CoDel, PIE (the ones we discuss here)

• Plus overcome a big shortcoming of early approaches


AQM Design Objectives

• Maximize throughput

– Keep lines busy at all times (queues not staying empty)

• Minimize delay

– Queues at steady state almost empty

• Serve transient load surges

– Queue size should reflect its ability to absorb bursts, and not resident load

• No flow lockout

– Packet drops should affect already admitted flows rather than newly arriving ones

AQM as a Control System

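[Figure: control loop. Sender → bottleneck router queue → receiver, with feedback (ACK) or lack thereof returning to the sender. The router evaluates an objective function (parameter) over the queue dynamics and embeds its action in the queue input.]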

Arrows are not data traffic but rather the embodiment of signaling (E.g. the measured parameter is not necessarily the departure rate)

• Measure at router

– Routers can distinguish between propagation and persistent queuing delays

– Routers can decide on transient congestion, based on workload

• Act at the sender


EARLY AQM

Lock-out Problem: Easier to solve

• Random drop policy

– Packet arriving when queue is full causes some random packet to be dropped

• Drop front policy

– On full queue, drop packet at head of queue

• SFQ + Tail drop

– Same effect as Random drop

• Solving the lock-out problem does not address


Full Queues Problem: Bigger challenge

• Notify sender before queue becomes full (early drop)

– Notify = Drop packets → takes > RTT to be sensed

– Notify = Mark packets → takes <= 1 RTT but can be lost

• Note that how fast the signal reaches the sender depends on the “Net weather”

• Challenges

– When to notify, or how often

– Who to notify (cannot afford per-flow monitoring)

RED Model

[Figure: RED model. The min and max thresholds are applied to the average (not the actual) queue length. Plot of drop probability vs. avg queue length: zero below minth, rising to Pmax at maxth, then 1.0.]

• Maintain Exp Moving Avg (EMA) of queue size

– Byte mode vs. packet mode, depending on whether Tx delay is a function of packet length or not

• For each packet arrival

    if (avgq < minth) do nothing

    if (maxth ≤ avgq) mark(packet)

    if (minth ≤ avgq < maxth)

        calculate probability Pm

        mark(Pm, packet)

• Marked packets are either dropped or ECN flagged
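The per-arrival decision above can be sketched as below. The function names are illustrative; the linear ramp between the thresholds is the one shown in the RED model figure, and the injectable `rng` argument is an assumption added here so the sketch can be tested deterministically.

```python
import random


def red_mark_probability(avgq, min_th, max_th, p_max):
    """Marking probability as a function of the average queue length."""
    if avgq < min_th:
        return 0.0  # below minth: do nothing
    if avgq >= max_th:
        return 1.0  # at or above maxth: mark every packet
    # Between the thresholds: linear ramp from 0 up to p_max.
    return p_max * (avgq - min_th) / (max_th - min_th)


def red_on_arrival(avgq, min_th, max_th, p_max, rng=random.random):
    """Return True if the arriving packet is marked (dropped or ECN-flagged)."""
    return rng() < red_mark_probability(avgq, min_th, max_th, p_max)
```

For example, with minth = 5, maxth = 15 and Pmax = 0.1, an average queue of 10 packets sits halfway between the thresholds and yields a marking probability of about 0.05.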


RED Parameters

• EMA of queue size computed at every packet arrival (not periodically!)

avgq_now = w_q * qlen_now + (1 - w_q) * avgq_prev

• Special condition if queue was idle, i.e. qlen_now = 0

– Same as if it had been 100% link utilisation with 0 queue.

– Approximate that m small packets were processed

m = (t_now – t_last_arrival) / pkt_size_nominal

avgq_now = (1 - w_q)^m * avgq_prev

• Packet marking probability is a function of how far (in %) avgq sits between the two thresholds

Pm = Pmax * (avgq_now – minth) / (maxth – minth)
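A hedged sketch of the averaging step, including the idle-queue correction. The parameter `pkt_time_nominal` is an assumption made here: it expresses the slide's nominal packet term as a transmission time so that m comes out as a packet count.

```python
def update_avgq(avgq_prev, qlen_now, w_q, idle_time=0.0, pkt_time_nominal=1.0):
    """EMA of the queue length, updated at each packet arrival."""
    if qlen_now == 0 and idle_time > 0:
        # Idle queue: decay as if m small packets had passed through
        # an empty queue at 100% link utilisation.
        m = idle_time / pkt_time_nominal
        return (1 - w_q) ** m * avgq_prev
    return w_q * qlen_now + (1 - w_q) * avgq_prev


busy = update_avgq(avgq_prev=10.0, qlen_now=20, w_q=0.25)
idle = update_avgq(avgq_prev=8.0, qlen_now=0, w_q=0.5, idle_time=3.0)
```

The weights used here are deliberately large to keep the arithmetic readable; they are not recommended settings.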

Issues with RED: configurability

• avgq is an EMA → w_q adjusts the lag, trend, and window of averaging

– Short window: fast sensing but vulnerable to transients

– Long window: slow adaptation

• minth → adjusts the “power of the network”

– “Too close” to 0: the queue is likely to have idle periods (bandwidth not used)

– “Too far” from 0: increases path latency, delays feedback signal to endpoints

• maxth – minth adjusts the step-granularity of marking

– “Too small”: the AQM becomes spasmodic in its reaction, forces flows to sync

– Must be larger than the typical avgq increase in an RTT

• A visual analogy: which vessel size for which sea condition?

– Think of traffic bursts as the swell of the sea (height, length)

– Think of the AQM as having to behave like a speed boat or like a tanker, depending on the swell


Issues with RED: configurability

• Average Queue size oscillation

• Difficult to control congestion when many flows

– esp. unresponsive ones

[Figure: two plots of queue length (0 to max) over time, each showing actual vs. average queue length, one for 32 flows and one for 8 flows.]

RED & variants – FRED

• Fair RED or Flow RED – fairness among flow types

– Flow differentiation based on queue use:

• non-adaptive (UDP), fragile (sensitive to loss), robust

– Per-active-flow accounting and loss-regulation

• All flows entitled to admit minq packets without loss

• Adjust minq per-flow based on avg. per-flow queue occupancy (avgcq)

• Set upper per-flow capacity to maxq and count violations


RED & variants – CHOKe

• CHOose and Kill unresponsive flows... or... CHOose and Keep responsive flows

– Compare incoming packet with a randomly selected packet in the queue

– Packets of aggressive flows are more likely to be selected

[Figure: CHOKe flowchart. Packet arrival → check against min thresh / max thresh → select a queued packet randomly → flow ID match? (Yes / No) → apply RED (variant).]
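A minimal sketch of the CHOKe admission step. The slide does not spell out the action on a match; the sketch assumes the behaviour of the published CHOKe algorithm, where both the arriving packet and the matched queued packet are dropped. The function name and the representation of the queue as a list of flow IDs are illustrative.

```python
import random


def choke_on_arrival(queue, pkt_flow, avgq, min_th, max_th, p_max, rng=random):
    """Return True if the arriving packet is admitted; queue holds flow IDs."""
    if avgq < min_th:
        queue.append(pkt_flow)
        return True
    if queue:
        # Compare the arrival with one randomly selected queued packet.
        victim = rng.randrange(len(queue))
        if queue[victim] == pkt_flow:
            del queue[victim]  # assumed CHOKe action: drop both packets
            return False
    if avgq >= max_th:
        return False
    # No match: fall back to the RED marking probability.
    pm = p_max * (avgq - min_th) / (max_th - min_th)
    if rng.random() < pm:
        return False
    queue.append(pkt_flow)
    return True


hog = ["udp"] * 4
admitted = choke_on_arrival(hog, "udp", avgq=10, min_th=5, max_th=15, p_max=0.1)
```

A flow that already fills the queue matches with high probability and therefore loses two packets per arrival, which is how the scheme penalises unresponsive flows without per-flow state.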

RED & variants – ARED

• Adaptive RED – minimize delay variance + parameter auto-tune

– Adapt P_max periodically, slowly, and with an AIMD policy

– Fix max_th = 3 * min_th

– Fix w_q = 1 - exp(-1/Link_capacity)
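The AIMD adaptation of P_max might look like the sketch below. The target band (40% to 60% of the way between the thresholds), the additive step min(0.01, P_max/4), the multiplicative factor 0.9, and the bounds 0.01 and 0.5 are assumptions taken from the commonly cited Adaptive RED defaults; the lecture itself does not give them.

```python
import math


def ared_adapt_pmax(p_max, avgq, min_th, max_th, beta=0.9):
    """One periodic AIMD adjustment of p_max (assumed ARED defaults)."""
    span = max_th - min_th
    target_lo = min_th + 0.4 * span
    target_hi = min_th + 0.6 * span
    if avgq > target_hi and p_max <= 0.5:
        p_max += min(0.01, p_max / 4)  # additive increase
    elif avgq < target_lo and p_max >= 0.01:
        p_max *= beta  # multiplicative decrease
    return p_max


def ared_wq(link_capacity_pps):
    """w_q = 1 - exp(-1/C), with C assumed to be in packets per second."""
    return 1 - math.exp(-1 / link_capacity_pps)
```

Calling `ared_adapt_pmax` once per adaptation period keeps the average queue drifting toward the middle of the threshold range instead of requiring a hand-tuned P_max.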


RED & variants – SRED

• Stabilised RED – eliminate the need for avgq (and w_q)

– P_m = f(inst. queue length, # of active flows, rank of flow)

– Zombie list: history of K seen flows with Hit counters. On packet arrival:

    pick Zombie flow randomly
    if flows match
        Hits++
    else
        replace Zombie with prob. p

– Statistical counting of flows based on Hit Freq.

– Rank of flow = Hit counter on match

RED & variants – BLUE

• Putting past insights in new light

– Avoid parameter tuning nightmare

– Avoid effects of avg queue fluctuation on AQM

• Adaptive marking probability

– Pm = f (packet loss, link idle events)

Pkt loss if (tnow – tlast_arrival > freeze_period) Pm= Pm+d1 Idle link if (tnow – tlast_arrival > freeze_period) Pm= Pm-d2

– d1 >> d2 : faster reaction to congestion up-rise than decrease

– freeze_periodis a sort-of A/D discretizer

• Filters out high-freq transient oscillations

• Adjusts parameter at packet arrivals times mod a fixed quantum • SFBlue uses ideas of SFQ and FRED to discriminate flows
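The BLUE update rule can be sketched as a small class. The numeric defaults for d1, d2 and freeze_period are placeholders chosen for illustration, and the clamping of Pm to [0, 1] is an assumption added here.

```python
class Blue:
    """Marking probability driven by packet-loss and link-idle events."""

    def __init__(self, d1=0.02, d2=0.002, freeze_period=0.1):
        self.pm = 0.0
        self.d1 = d1  # increment on packet loss
        self.d2 = d2  # decrement on idle link (d1 >> d2)
        self.freeze_period = freeze_period
        self.last_update = float("-inf")

    def on_packet_loss(self, now):
        # Only one adjustment per freeze period is allowed.
        if now - self.last_update > self.freeze_period:
            self.pm = min(1.0, self.pm + self.d1)
            self.last_update = now

    def on_link_idle(self, now):
        if now - self.last_update > self.freeze_period:
            self.pm = max(0.0, self.pm - self.d2)
            self.last_update = now


blue = Blue()
blue.on_packet_loss(now=0.00)  # accepted: pm rises by d1
blue.on_packet_loss(now=0.05)  # ignored: still inside the freeze period
blue.on_link_idle(now=0.20)    # accepted: pm falls by d2
```

The second loss event is filtered out, which is the discretizing effect of freeze_period described above.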


MODERN AQM

Bufferbloat: The new Internet threat

• What is it?

– Constant residue of packets in Queues that never goes away

– Adds constant delay component to the e2e path latency

• Where does it come from ?

– A combination of the following

• Senders transmit at higher rate than the bottleneck link can sustain

• Excessively large queues increase e2e RTT and delay TCP feedback

– Sender response is phase shifted to congestion phenomena

• Why is it a threat ?

– Excessive delays become resident even on high speed networks

– Confuses TCP’s flow/congestion control algorithm

• Solutions ?


Illustrating Bufferbloat

• good queue operating at link speed (rate)

• Also good queue (typical of delayed ACK scheme, or when serving synchronised flows)

• Bloated queue, that cannot get rid of resident load


Both these queues have the same constant average length of N packets over an RTT! What distinguishes them, however, is the average minimum length (over a large enough window, i.e. ≥ 1 RTT).
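To illustrate why the minimum is the more telling statistic, the sketch below compares two made-up queue-length traces sampled over one window. The sample values are invented for illustration only.

```python
from statistics import mean

# Queue that absorbs a burst and then drains completely.
good = [8, 7, 6, 5, 4, 3, 2, 1, 0, 4]
# Queue with a standing (resident) load that never drains.
bloated = [4, 5, 3, 4, 4, 5, 3, 4, 4, 4]

same_average = mean(good) == mean(bloated)
good_min, bloated_min = min(good), min(bloated)
```

Both traces average 4 packets, but only the first one ever reaches zero; the second keeps at least 3 packets queued, which is pure added delay.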


Codel – Controlled delay

• Time-based model of queue dynamics instead of a spatial one

• Monitor how long the Min queue length remains above a threshold (desired Min)

– Less descriptive than Avg. Min queue length,

– Yet sufficient, and simpler computationally

• Sojourn time as a measure of instantaneous queue length

– How long packets stay in the queue: Time delta between a packet’s departure and arrival time

– Works with a single queue or multiple queues

– Works for variable link rates (e.g. wireless links) – well statistically speaking!

– Simple to measure, easy to implement


Codel – Controlled delay

On packet arrival:

    timestamp(packet)

On packet departure:

    sojourn = now – packet.tstamp

    if (sojourn < Target)
        if (drp_mode == 1)                  // Sojourn falls below Target
            drp_mode = 0
            exit_drp = now
        if (now – exit_drp >= Interval)     // Only reset drop rate memory if Sojourn is below Target for Interval
            drp_count = 0
    else // sojourn >= Target
        if (drp_mode == 0 && now – exit_drp < Interval)
            // Sojourn above Target again after temporary improvement, resume last drop rate
            drp_mode = 1
            if (now >= next_drp)
                drp(packet)
                drp_count++
                next_drp = now + Interval/sqrt(drp_count)
        else if (drp_mode == 0)
            // Sojourn above Target first time, after Interval, start dropping
            drp_mode = 1
            drp(packet)
            drp_count = 1
            next_drp = now + Interval/sqrt(drp_count)
        else
            // Already in drp_mode: Sojourn continues to remain above Target, continue dropping
            if (now >= next_drp)
                drp(packet)
                drp_count++
                next_drp = now + Interval/sqrt(drp_count)

Codel – Controlled delay

• Significantly less configuration magic involved

• Interval: const (≥ 1RTT)

• Target (delay): const

– max{ equiv of 1-2 packets worth of queue, 5% of worst case RTT}

• Drop/Mark rate: const acceleration in Interval

– inverse-square-root progression → linear increase of drops per RTT

– dropping speed-up is independent of queue accumulation speed (!?)

• fq_Codel combines Codel with SFQ

– treats different traffic classes more fairly

• Sojourn measurement does not block the queue!

– by contrast to queue length averaging
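The inverse-square-root progression of the drop schedule can be made concrete with a few lines. The function name and the 100 ms Interval are illustrative assumptions; the spacing rule Interval/sqrt(drp_count) is the one from the pseudocode.

```python
import math


def codel_drop_times(interval, n_drops, start=0.0):
    """Times of successive drops while sojourn stays above Target."""
    times = []
    t = start
    for drp_count in range(1, n_drops + 1):
        times.append(t)
        # The next drop is scheduled Interval / sqrt(drp_count) later.
        t += interval / math.sqrt(drp_count)
    return times


times = codel_drop_times(interval=0.100, n_drops=5)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
```

The gaps shrink from Interval to Interval/2 by the fourth drop, so the drop rate ramps up steadily for as long as the delay stays above Target.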



PIE – Proportional Integral Enhanced

On packet arrival: decide packet fate

    mark/drop(P_drop, pkt)

On packet departure: estimate output rate

    if (q_len > pkt_threshold)
        // Start counting bytes contributing to bufferbloat when threshold is reached
        byte_count = byte_count + pkt_bytes
    if (byte_count > pkt_threshold)
        // Once the bufferbloat in bytes is counted, compute the queue drain rate
        inst_rate = byte_count / (now – last)
        avg_rate = (1-w)*avg_rate + w*inst_rate    // exp. weighted moving avg. of the rate
        last = now
        byte_count = 0

On Interval expiration (periodically): update drop rate

    q_delay = interval * q_len / avg_rate    (Little’s law)
    // a-term: deviation from desired delay; b-term: delay change in 1 interval
    P_drop = P_drop + a*(q_delay – ref_delay) + b*(q_delay – q_delay_old)
    q_delay_old = q_delay

Gain scaling:

    if P_drop < 1%          a = A/8, b = B/8
    else if P_drop < 10%    a = A/2, b = B/2
    else                    a = A,   b = B
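A sketch of the periodic drop-probability update. It assumes avg_rate is expressed in bytes per second, so the delay estimate reduces to q_len / avg_rate; the default gains A and B and the clamping of P_drop to [0, 1] are also assumptions made for this illustration.

```python
def pie_update(p_drop, q_len, avg_rate, q_delay_old, ref_delay, A=0.125, B=1.25):
    """One periodic PIE update; returns (new p_drop, current q_delay)."""
    # Delay estimate from queue size and smoothed drain rate (Little's law).
    q_delay = q_len / avg_rate
    # Scale the gains down while the drop probability is still small.
    if p_drop < 0.01:
        a, b = A / 8, B / 8
    elif p_drop < 0.10:
        a, b = A / 2, B / 2
    else:
        a, b = A, B
    # a-term: deviation from the reference delay; b-term: trend of the delay.
    p_drop += a * (q_delay - ref_delay) + b * (q_delay - q_delay_old)
    return min(max(p_drop, 0.0), 1.0), q_delay


# 30 kB queued, draining at 1 MB/s, against a 20 ms reference delay.
p, d = pie_update(p_drop=0.0, q_len=30_000, avg_rate=1_000_000,
                  q_delay_old=0.020, ref_delay=0.020)
```

The caller would store the returned delay as q_delay_old for the next interval.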


PIE – Proportional Integral Enhanced

• Like Codel, controls delay instead of queue length

• Dropping at the tail of the queue to save buffer space

– Instead of head (Codel)

• 3 modes of operation for 3 different traffic classes

– Parameters a,b adjustment

– Quite a lot of other magic numbers (in contrast to Codel)

• Queue delay prediction based on queue size and smoothed output rate

– Instead of actual measurement (Codel)

• Drop probability takes into account deviation from nominal value and corrects/improves effects of previous action (direction/magnitude of change)

– Instead of binary accelerate-or-switch_off (Codel)


Explicit Congestion Notification

• Works with TCP traffic. Instead of packet dropping, packet marking

– An old idea called DEC-bit from DEC-net (early day TCP/IP competitor)

[Figure: two TCP sender/receiver sequence diagrams (Data and ACK streams), contrasting packet dropping with ECN.]

ECN – How marking works

• At the IP header: Signal from router to receiver

[Figure: IPv4 header layout. VER (4 bits), HLEN (4 bits), DS (8 bits), Total Length (16 bits), Identification (16 bits), Flags (3 bits), Fragmentation offset (13 bits), Time to Live (8 bits), Protocol (8 bits), Header Checksum (16 bits), Source IP address (32 bits), Destination IP address (32 bits), Options (if any), Data. The DS byte consists of Differentiated Services (6 bits) plus 2 formerly reserved bits that now carry ECN (2 bits).]

ECT  CE  Interpretation
0    0   Not-ECT (Not ECN Capable Transport)
0    1   ECT(1) (ECN Capable Transport (1))
1    0   ECT(0) (ECN Capable Transport (0))
1    1   CE (Congestion Experienced)

ECT: ECN Capable Transport
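The codepoint table can be exercised with a short sketch. It relies on the ECN field being the two least-significant bits of the DS byte; the function names and the choice to return None for "must drop instead" are illustrative.

```python
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_codepoint(ds_byte):
    """Name of the ECN codepoint carried in the low two bits of the DS byte."""
    return ECN_NAMES[ds_byte & 0b11]


def mark_ce(ds_byte):
    """Router-side marking: set CE only for ECN-capable packets."""
    if ds_byte & 0b11 == 0b00:
        return None  # Not-ECT: the router has to drop instead of marking
    return ds_byte | 0b11


ds = 0b10111010  # example DS byte whose ECN bits are 10, i.e. ECT(0)
marked = mark_ce(ds)
```

Marking leaves the six Differentiated Services bits untouched and only rewrites the ECN field.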


ECN – How marking works

• At the TCP header: Signal from receiver to sender

[Figure: TCP header layout. Source port address (16 bits), Destination port address (16 bits), Sequence Number (32 bits), Acknowledgement Number (32 bits), HLEN (4 bits), Reserved (6 bits, of which two become the CWR and ECE flags, leaving 4 reserved bits), flags URG, ACK, PSH, RST, SYN, FIN, Window size (16 bits), Checksum (16 bits), Urgent pointer (16 bits), Options (if any), Data.]

CWR: Congestion Window Reduced Flag

ECE: ECN-Echo Flag
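As a small illustration of where ECE and CWR sit, the sketch below decodes the TCP flags byte using the standard bit positions (CWR and ECE occupy the two bits directly above URG). The helper name is illustrative.

```python
TCP_FLAGS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}


def decode_flags(flags_byte):
    """Return the set of flag names set in the TCP flags byte."""
    return {name for name, bit in TCP_FLAGS.items() if flags_byte & bit}


# Receiver echoing congestion back to the sender on an ACK.
echo = decode_flags(0x50)
# Sender confirming that it has reduced its congestion window.
reduced = decode_flags(0x90)
```

The receiver sets ECE after seeing a CE-marked packet, and the sender answers with CWR once it has reacted.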

ECN Vs. Packet drop as a feedback signal

• Packet drop is effective even with full queues, while ECN only makes sense before queues get full

• Packet drop = 3 DUP ACKs or timeout before the sender acts → ECN delivers the feedback faster

• Packet drop => retransmissions

– Judas’ kiss: communicate a signal through an impairment (B. Briscoe)


Some links on Bufferbloat

• How can I tell if I’m suffering from bufferbloat?

– http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/

• Can I do anything personally to reduce my suffering from bufferbloat?

– http://gettys.wordpress.com/2010/12/13/mitigations-and-solutions-of-bufferbloat-in-home-routers-and-operating-systems/

• Bufferbloat triggered the network neutrality debate

– http://gettys.wordpress.com/2010/12/07/bufferbloat-and-network-neutrality-back-to-the-past/