
SIP Server Overload Control: Design and Evaluation


Academic year: 2021


(1)

SIP Server Overload Control:

Design and Evaluation

Charles Shen and Henning Schulzrinne, Columbia University

Erich Nahum

(2)

Session Initiation Protocol (SIP)

[Figure: SIP call flow between two UAs through two proxies (UA, Proxy, Proxy, UA): INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, media session, BYE, 200 OK]

§ Application-layer signaling protocol for managing sessions in the Internet

§ Runs on top of common transport protocols, e.g., UDP, TCP and SCTP

§ Typical usage: voice-over-IP call setup, instant messaging, presence, conferencing

(3)

SIP Server Overload Problem

§ Many causes of SIP server overload

Natural disasters and emergency-induced call volume (earthquake)

Predictable special events (Mother’s Day)

Flash crowds: American Idol, “Free tickets to the third caller”

Denial-of-service attacks


§ Simply dropping requests on overload?

Simple message dropping induces more messages due to retransmission (especially for SIP over UDP)

E.g., Timer A for INVITE retransmission starts at T1 = 500 ms and doubles after each retransmission
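To make the retransmission concern concrete, the sketch below lists when an unanswered INVITE over UDP is resent. It assumes the RFC 3261 defaults (T1 = 500 ms, Timer A doubling after each firing, Timer B = 64 × T1 = 32 s ending the transaction); the function name `invite_retransmit_times` is hypothetical and the code is illustrative, not part of the evaluated simulator.

```python
# Sketch: RFC 3261 INVITE client transaction over UDP (assumed defaults).
T1 = 0.5  # seconds; RFC 3261 default round-trip time estimate


def invite_retransmit_times(t1: float = T1) -> list[float]:
    """Return the times (s after the first INVITE) at which Timer A fires and
    the INVITE is resent, assuming no response ever arrives.

    Timer A starts at T1 and doubles after each firing; Timer B, which ends
    the transaction, fires at 64 * T1.
    """
    timer_b = 64 * t1
    times: list[float] = []
    t, interval = 0.0, t1
    while t + interval < timer_b:
        t += interval
        times.append(t)
        interval *= 2
    return times


if __name__ == "__main__":
    schedule = invite_retransmit_times()
    print(f"{len(schedule)} retransmissions at {schedule} s")
```

With these defaults a single unanswered INVITE is sent up to seven times within about 32 s, so a server that silently drops requests under overload can see its offered load multiplied.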

(4)

SIP Server Overload Problem (Cont.)

§ Rejecting excessive requests upon overload?

SIP 503 (Service Unavailable) response code used to reject individual requests

– overall sending rate is not reduced

– rejecting a request costs CPU cycles comparable to accepting it!

503 (Service Unavailable) with Retry-After?

– Client is completely shut off for the specified period

– On/off rate reduction may cause oscillation

§ Trying an alternative server?

(5)

Feedback-based SIP Overload Control

(6)

SIP Overload Feedback Control Design Considerations

Requirements

§ Approaching ideal performance

§ Few “tweak” control parameters

Design decisions

§ SIP session as basic control unit

§ Characterizing SIP session

check number of INVITEs accepted

§ Dynamic session backlog estimation

count both INVITEs and non-INVITEs for current session backlog

§ Active source estimation

directly track each currently active SE (sending entity) that sends incoming load

[Figure: ideal goodput vs. load]
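The ideal goodput curve can be written down directly. A minimal sketch, assuming load and capacity are expressed in the same units (for example calls per second) and goodput is normalized to capacity; `ideal_goodput` is a hypothetical helper.

```python
def ideal_goodput(load: float, capacity: float = 1.0) -> float:
    """Ideal normalized goodput: every offered call succeeds until the server
    saturates; beyond that, goodput stays at full capacity (1.0)."""
    if load < 0 or capacity <= 0:
        raise ValueError("load must be >= 0 and capacity > 0")
    return min(load, capacity) / capacity


if __name__ == "__main__":
    # Load normalized to capacity, as on the plot's x-axis.
    for load in (0.5, 1.0, 4.0, 9.0):
        print(load, ideal_goodput(load))
```

An overload control algorithm is judged by how closely its measured goodput tracks this plateau at loads well above 1.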

(7)

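Window-based Feedback Control Algorithms

Win-auto

§ Tuning parameters: N/A

§ Window size adjustment: after processing a new INVITE request

§ Window size decrement: upon receiving a session request (INVITE)

Win-cont

§ Tuning parameters: budget queuing delay, measurement interval

§ Window size adjustment: on every message arrival

§ Window size decrement: upon receiving a session request (INVITE)

Win-disc

§ Tuning parameters: budget queuing delay, control interval, measurement interval

§ Window size adjustment: every control interval

§ Window size decrement: upon receiving a session request (INVITE)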
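The window mechanics shared by these algorithms can be sketched as follows. This is a hypothetical illustration, not the authors' code: `SenderWindow` models the SE-side gate that consumes one slot per INVITE, and `win_disc_window` assumes a win-disc style update in which the RE, once per control interval, admits the sessions it can serve within the control interval plus the budget queuing delay, minus its current session backlog. The service rate is assumed to be measured over the measurement interval.

```python
from dataclasses import dataclass


@dataclass
class SenderWindow:
    """Hypothetical SE-side gate: a new INVITE may be forwarded to the RE
    only while the window granted by the RE is positive."""

    window: int = 0

    def try_send_invite(self) -> bool:
        # Each new session request (INVITE) consumes one window slot.
        if self.window > 0:
            self.window -= 1
            return True
        return False  # no slot: the SE must hold or reject the request

    def on_feedback(self, new_window: int) -> None:
        # The RE's feedback (e.g., piggybacked on a response) resets the window.
        self.window = max(0, new_window)


def win_disc_window(service_rate: float, control_interval: float,
                    budget_delay: float, session_backlog: int) -> int:
    """Assumed win-disc style update, run once per control interval.

    service_rate: sessions/s the RE completed over the measurement interval.
    Admit what can be served within control_interval + budget_delay seconds,
    minus the sessions already waiting in the RE's queue.
    """
    w = service_rate * (control_interval + budget_delay) - session_backlog
    return max(0, int(w))


if __name__ == "__main__":
    se = SenderWindow()
    se.on_feedback(win_disc_window(72.0, 0.2, 0.2, session_backlog=10))
    sent = sum(se.try_send_invite() for _ in range(30))
    print(f"{sent} of 30 INVITEs forwarded")
```

Win-cont and win-auto differ from win-disc mainly in when the RE recomputes or replenishes the window, not in the SE-side gate.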

(8)

Rate-based Feedback Control Algorithms

Rate-occ

§ Tuning parameters: budget CPU occupancy, control interval, measurement interval

§ Rate adjustment algorithm (every control interval): adjusts the request acceptance ratio

Rate-abs

§ Tuning parameters: budget queuing delay, control interval, measurement interval

§ Rate adjustment algorithm (every control interval): adjusts the acceptable rate
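Both rate-based updates can be sketched in a few lines. The formulas are assumptions for illustration: `rate_abs` admits at the measured service rate plus a correction that steers the backlog toward what drains within the budget queuing delay, and `rate_occ` applies a multiplicative occupancy update to the acceptance ratio using the 85% CPU budget mentioned in the evaluation. Neither is claimed to match the authors' exact equations.

```python
def rate_abs(service_rate: float, control_interval: float,
             budget_delay: float, backlog: float) -> float:
    """Assumed rate-abs style update (sessions/s), run every control interval.

    Admit at the measured service rate, plus a correction that moves the
    backlog toward service_rate * budget_delay (the queue that drains in
    exactly the budget queuing delay) within one control interval.
    """
    target_backlog = service_rate * budget_delay
    correction = (target_backlog - backlog) / control_interval
    return max(0.0, service_rate + correction)


def rate_occ(accept_ratio: float, measured_occ: float,
             budget_occ: float = 0.85, min_ratio: float = 0.02) -> float:
    """Assumed rate-occ style update of the request acceptance ratio.

    Scale the current ratio by budget/measured CPU occupancy and clamp it to
    [min_ratio, 1]; min_ratio is an illustrative floor, not from the slides.
    """
    if measured_occ <= 0:
        return 1.0  # idle CPU: accept everything
    ratio = accept_ratio * budget_occ / measured_occ
    return min(1.0, max(min_ratio, ratio))


if __name__ == "__main__":
    print(rate_abs(72.0, 0.2, 0.2, backlog=20.0))  # backlog above target: admit below service rate
    print(rate_occ(0.5, measured_occ=1.0))         # CPU saturated: cut the ratio
```

Because rate-occ reacts to CPU occupancy rather than to the session backlog, its signal is one step removed from queuing delay.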

(9)

Simulation Assumptions and Metrics

§ RFC3261 compatible simulator built on OPNET

§ Exponential call inter-arrival times

§ Standard seven-message call flow

§ RE (receiving entity) service capacity: 72 cps (calls per second); rejection rate: 3000 cps

§ UAs and SEs have infinite capacity

§ UDP transport, no link delay or loss

§ Piggyback feedback

§ Goodput = number of calls whose INVITE-to-ACK delay is below 10 s

§ Delay = time from INVITE sent to ACK (200 OK) received
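The metrics above can be computed from per-call records. A minimal sketch, assuming the seven-message flow is the basic RFC 3261 call (INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, BYE, 200 OK) and that each call's INVITE-to-ACK delay is recorded, with `None` for calls that never complete; all names here are hypothetical.

```python
import random
from typing import Optional, Sequence

# Assumed seven-message call flow (basic RFC 3261 call).
CALL_FLOW = ("INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK")
GOODPUT_DEADLINE_S = 10.0  # INVITE-to-ACK delay threshold


def exponential_arrivals(rate_cps: float, duration_s: float, seed: int = 0) -> list[float]:
    """Call start times with exponential inter-arrival gaps (Poisson arrivals)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_cps)
        if t >= duration_s:
            return times
        times.append(t)


def goodput_calls(setup_delays: Sequence[Optional[float]]) -> int:
    """Number of calls whose INVITE-to-ACK delay is below the deadline;
    None marks a call that never completed."""
    return sum(1 for d in setup_delays if d is not None and d < GOODPUT_DEADLINE_S)


def normalized_goodput(good_calls: int, duration_s: float,
                       capacity_cps: float = 72.0) -> float:
    """Goodput as a fraction of the RE service capacity (72 cps here)."""
    return good_calls / (duration_s * capacity_cps)
```

A call that completes only after 10 s counts as offered load but not as goodput, which is how retransmission-driven delay shows up as lost goodput.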

(10)

No Feedback Control

Simple drop

§ message dropped when queue full

Threshold rejection

§ queue length configured with a high and a low threshold value

high threshold: new INVITEs rejected but other messages still processed

low threshold: new INVITE processing restored

Similar congestion collapse, but for different reasons:

§ Simple Drop

Only 1/3 of INVITEs arrive at the callee

all 180 Ringing and most 200 OK responses are also dropped due to queue overflow

[Figure: goodput vs. load, threshold rejection]
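The two no-feedback policies can be contrasted with a small queue model. This is a hypothetical sketch: `ThresholdRejectionQueue` rejects new INVITEs between the high and low thresholds (hysteresis) while still queuing other messages, and falls back to simple drop of any message once the queue is full. The return strings stand in for sending a 503 or silently discarding.

```python
from collections import deque
from typing import Optional


class ThresholdRejectionQueue:
    """Hypothetical RE input queue contrasting the two no-feedback policies.

    Threshold rejection: once the queue reaches `high`, new INVITEs are
    rejected while other messages are still queued; INVITE acceptance
    resumes only after the queue drains to `low` (hysteresis).
    Simple drop: any message arriving at a full queue is discarded.
    """

    def __init__(self, capacity: int, high: int, low: int) -> None:
        if not 0 <= low < high <= capacity:
            raise ValueError("require 0 <= low < high <= capacity")
        self.capacity, self.high, self.low = capacity, high, low
        self.queue: deque = deque()
        self.rejecting = False

    def _update_state(self) -> None:
        if len(self.queue) >= self.high:
            self.rejecting = True
        elif len(self.queue) <= self.low:
            self.rejecting = False

    def arrive(self, msg: str) -> str:
        """Return 'queued', 'rejected' (new INVITE refused, e.g. with 503)
        or 'dropped' (queue full). Every INVITE is treated as a new session
        request, a simplification."""
        self._update_state()
        if msg == "INVITE" and self.rejecting:
            return "rejected"
        if len(self.queue) >= self.capacity:
            return "dropped"
        self.queue.append(msg)
        self._update_state()
        return "queued"

    def process_one(self) -> Optional[str]:
        """Serve the oldest queued message, if any."""
        msg = self.queue.popleft() if self.queue else None
        self._update_state()
        return msg
```

The gap between `high` and `low` keeps the server from flipping between accepting and rejecting on every message; it does nothing to reduce the rate at which upstream senders keep offering INVITEs.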

(11)

Sensitivity to Budget Queuing Delay and Control Interval

§ Small budget queuing delay (< ½ T1 timer) avoids timeouts and gives the best results

§ Example results for win-disc

delay budget (DB) <= 200 ms with control interval (CI) = 200 ms gives the best results

goodput degrades by 25% for DB = 500 ms

§ Similar results for win-cont and rate-abs

§ Sensitivity to control interval

smaller CI is better

§ Example results for win-disc

at DB = 200 ms, CI <= 200 ms is sufficient to achieve unit goodput in our scenario

[Figures: win-disc goodput vs. load for DB = 200, 300, 400, 500 and 600 ms; win-disc goodput vs. load for CI = 200 ms, 500 ms and 1 s]

(12)

Impact of Control Interval across Algorithms

§ Comparing CI for win-disc, rate-abs and rate-occ at DB = 200 ms

§ Both win-disc and rate-abs stay close to unit goodput, except for CI = 1 s under heavy load

§ win-disc is more sensitive to CI than rate-abs

§ rate-occ is not as good as the other two

[Figures: goodput of win-disc, rate-abs and rate-occ for CI = 14 ms, 100 ms, 200 ms and 1 s]

(13)

Best Performance Comparison across Algorithms

All except rate-occ reach unit goodput

§ no retransmissions

§ server always busy processing messages

§ each message is part of a successful session
rate-occ < unit goodput

§ artificial 85% CPU limit

§ occupancy is too indirect a measure

§ extremely small CI improves performance at heavy load but incurs problems

[Figure: goodput vs. load for rate-occ1, rate-occ2, win-cont, win-disc and rate-abs]

(14)

Fairness

User-centric fairness

§ equal success rate for each individual user

§ implementation: divide RE capacity proportionally to original SE load arrivals

§ applicability example: “Free tickets to the third caller”

Provider-centric fairness

§ each provider (SE) gets the same aggregate share of total capacity

§ implementation: divide RE capacity equally among SEs

§ applicability example: equal-share SLA

Customized fairness

§ any allocation as pre-specified by SLA, …
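The two fairness policies reduce to different capacity splits. A minimal sketch with hypothetical helper names: user-centric shares are proportional to each SE's original load, and the provider-centric split is equal per SE, with capacity an SE cannot use handed to the others (max-min water-filling). That redistribution is an assumption; the slide only says capacity is divided equally.

```python
def user_centric_shares(capacity: float, loads: dict[str, float]) -> dict[str, float]:
    """Divide RE capacity in proportion to each SE's original incoming load,
    so every user sees the same success ratio."""
    total = sum(loads.values())
    if total <= capacity:
        return dict(loads)  # not overloaded: every request can be served
    return {se: capacity * load / total for se, load in loads.items()}


def provider_centric_shares(capacity: float, loads: dict[str, float]) -> dict[str, float]:
    """Divide RE capacity equally among SEs; capacity an SE cannot use is
    handed to the remaining SEs (max-min water-filling, an assumption)."""
    shares: dict[str, float] = {}
    remaining = capacity
    pending = sorted(loads, key=loads.get)  # smallest demand first
    while pending:
        se = pending.pop(0)
        equal_split = remaining / (len(pending) + 1)
        shares[se] = min(loads[se], equal_split)
        remaining -= shares[se]
    return shares


if __name__ == "__main__":
    loads = {"se1": 144.0, "se2": 36.0, "se3": 72.0}  # offered cps per SE
    print(user_centric_shares(72.0, loads))
    print(provider_centric_shares(72.0, loads))
```

With three SEs offering 144, 36 and 72 cps to a 72 cps RE, the user-centric split grants about 41, 10 and 21 cps (each SE's calls succeed about 29% of the time), while the provider-centric split grants 24 cps to each.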

(15)

Dynamic Load Performance with Provider Centric Fairness

§ Realistic server-to-server overload situations likely involve short periods of bulk load accompanied by source arrivals or departures

§ Example result using rate-abs algorithm

§ Each upstream SE gets a close-to-equal share of RE capacity

§ Fast dynamic transition

[Figures: load and goodput of ua1, ua2 and ua3 over time (0 to 1800 s)]

(16)

User Centric Fairness

[Figures: goodput and load of ua1, ua2 and ua3 over time (0 to 1800 s)]

§ Double-feed architecture: SEs also provide their incoming load to the RE

§ Example using win-cont algorithm

§ Upstream SEs share RE capacity proportionally

§ Fast dynamic transition

(17)

Win-auto

[Figures: load and goodput of ua1, ua2 and ua3 over time (0 to 1800 s)]

§ Source arrival transition time could be noticeably longer

§ Hard to enforce explicit fairness (no processing intervention)

§ Still achieves aggregate unit goodput

(18)

Conclusions and Future Work

[Table: Win-disc, Win-cont, Rate-occ, Win-auto (with double-feed) and Rate-abs (with double-feed) compared on optimal steady-state performance, optimal dynamic performance, provider-centric fairness and user-centric fairness]

