
In the Trenches of a Globally Spanning SIP Network




© 2009 - 2015 Twilio, Inc. All rights reserved.

& the days spent firefighting


- Our SIP Network at a glance

- Loops

- Failover strategies

- Connection Management

- Registration

- Misc

@borjessonjonas


INTRO

- Deployed across the world

- 6 different AWS regions


- Each region deployed across multiple AZ

- Each region looks the same


The SIP load balancers = OpenSIPS

- Philosophy: simple & fast

- Transaction Stateful

- No external lookups


Max-Forwards:

- Works as a TTL: every hop decrements it

- Typical default: 70

- Are you resetting it?
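
A minimal sketch of the Max-Forwards check a proxy performs, in Python. Per RFC 3261, a request that arrives with Max-Forwards at 0 is answered with 483 Too Many Hops instead of being forwarded; otherwise the value is decremented, and 70 is added when the header is missing. The function name and the dict-based header model are assumptions made for illustration, not part of any SIP stack.

```python
from typing import Optional, Tuple

DEFAULT_MAX_FORWARDS = 70  # recommended initial value in RFC 3261


def process_max_forwards(headers: dict) -> Tuple[Optional[int], dict]:
    """Return (error_status, headers_to_forward).

    error_status is 483 when the request must not be forwarded,
    otherwise None. `headers` is a simplified dict model of a request.
    """
    raw = headers.get("Max-Forwards")
    if raw is None:
        # Header missing: add it with the default value before forwarding.
        return None, {**headers, "Max-Forwards": str(DEFAULT_MAX_FORWARDS)}
    value = int(raw)
    if value <= 0:
        # Hop budget is used up: answer 483 Too Many Hops, do not forward.
        return 483, headers
    # Normal case: decrement; never reset to 70 in the middle of the path.
    return None, {**headers, "Max-Forwards": str(value - 1)}
```

An element that creates a fresh outbound leg with Max-Forwards: 70 (as a B2BUA may do) effectively resets the counter, so a loop through it can outlive the 70-hop budget.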


LOOPS

Basic SIP Load balancer

INVITE sip:[email protected] SIP/2.0 ...

INVITE sip:[email protected] SIP/2.0 …


Bug in downstream element

- SIP LB marked call inbound

- B2BUA marked call outbound

- SIP LB marked call inbound

...


Malicious attacks

INVITE sip:[email protected] SIP/2.0 …
Route: <sip:sip_lb_1;lr>
Route: <sip:sip_lb_2;lr>

INVITE sip:[email protected] SIP/2.0 …
Route: <sip:b2bua_no_14;lr>
Route: <sip:sip_lb_2;lr>


Don’t trust Route headers from external entities!
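
One way to act on this at the network edge, sketched in Python: drop every Route header on a request unless it arrived from a peer you trust. The function name, the list-of-tuples header model and the trusted-source set are all hypothetical. The sketch also deliberately ignores in-dialog requests, where the Route set legitimately comes from the proxy's own Record-Route and would need validating rather than dropping.

```python
from typing import Iterable, List, Tuple

Header = Tuple[str, str]


def strip_untrusted_routes(
    headers: List[Header], source_ip: str, trusted_sources: Iterable[str]
) -> List[Header]:
    """Drop all Route headers unless the request came from a trusted peer."""
    if source_ip in set(trusted_sources):
        return list(headers)
    # Header names are case-insensitive in SIP.
    return [(name, value) for (name, value) in headers
            if name.strip().lower() != "route"]
```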


FAILOVER STRATEGY

A node crashes or misbehaves

- Goal: we need to handle it


Scenario: Shared Resource is the culprit

- SQL

- NoSQL

- REST service

- Another SIP element

- Network partition

What do you think about our failover strategy now?


Be mindful when you fail over

- Strategy:

  - The node closest to the problem spot probably knows best

  - Each node is responsible for exhaustively trying all options

  - If a node fails to contact XXX, then say so

- A node will only fail over on:

  - 408

  - 503

  - ...
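
The strategy can be sketched as a small Python loop: a node tries each of its own targets in turn and moves on to the next one only when the answer is in the failover set. The names `try_targets` and `send` are hypothetical, and the set holds only the two codes named on the slide (the slide's list continues with "..."). How the node reports "I tried everything" to its upstream is left to the caller, since the slides do not specify it.

```python
from typing import Callable, Optional, Sequence, Tuple

# Only the codes named on the slide; the original list is open-ended.
FAILOVER_STATUSES = frozenset({408, 503})


def try_targets(
    targets: Sequence[str], send: Callable[[str], int]
) -> Tuple[int, Optional[str]]:
    """Try every target in order; fail over only on FAILOVER_STATUSES.

    `send` is a hypothetical callable returning the final SIP status
    for one target. Returns (status, target_that_answered), or
    (last_status, None) once every option has been exhausted.
    """
    if not targets:
        raise ValueError("no targets to try")
    last_status = 0
    for target in targets:
        last_status = send(target)
        if last_status not in FAILOVER_STATUSES:
            # Any other final answer is authoritative: stop here.
            return last_status, target
    return last_status, None
```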


Drawback: network partition or bad network card

- The node is experiencing a localized network issue

- The node will signal upstream: "nope, no can do, don’t fail over"

- In this case, failing over would have helped

Conclusion: you have to figure out which is worse for you.


FLOW MANAGEMENT

A flow:

- Represents a bidirectional communication path between two endpoints

- Identified by: local IP:port + remote IP:port + transport

- Has to be maintained across NATs and firewalls
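
As a data structure, the flow identity above is just a five-field key. A minimal Python sketch follows; the `FlowKey` name and the `flows` table are illustrative assumptions, not taken from any particular SIP stack.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    transport: str  # "udp", "tcp", "tls", ...


# Flow table: key -> whatever handle the server uses to write to that flow.
flows: Dict[FlowKey, object] = {}

tcp_flow = FlowKey("10.0.0.1", 5060, "62.63.64.65", 5678, "tcp")
udp_flow = FlowKey("10.0.0.1", 5060, "62.63.64.65", 5678, "udp")
flows[tcp_flow] = "conn-1"
```

The same pair of addresses over a different transport is a different flow, which is why the transport is part of the key.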


Registration

- What are we trying to achieve?

1. REGISTER as sent by the client:

REGISTER sip:example.com SIP/2.0 ...
Via: SIP/2.0/TCP 192.168.0.100;rport;branch=...
Contact: <sip:[email protected];transport=tcp;lr>

2. The same REGISTER after the SIP load balancer:

REGISTER sip:example.com SIP/2.0 ...
Via: SIP/2.0/TCP 192.168.0.100;rport=5678;received=62.63.64.65;branch=...
Contact: <sip:[email protected];transport=tcp;lr>
Path: <sip:[email protected];transport=udp;lr;ob>


- No RFC 5626 support in OpenSIPS

- Can use force_tcp_alias

- But: "tcpconn_add_alias: possible port hijack attempt"


- force_tcp_alias will store the connection under a key computed from the source IP of the incoming connection + the port found in the Via header


- Users behind the same NAT => same “tcp_alias”

- First one wins

- Note: you must also “fix” the Contact header.
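
A toy Python model of why two users behind one NAT collide. It assumes only what the slides state: the alias key is (source IP of the connection, port from the Via header), and the first connection to claim a key keeps it. `AliasTable` and its methods are hypothetical names, and this is not how OpenSIPS implements the table internally.

```python
from typing import Dict, Optional, Tuple


class AliasTable:
    """Toy model of alias storage keyed by (source IP, Via port)."""

    def __init__(self) -> None:
        self._aliases: Dict[Tuple[str, int], str] = {}

    def add(self, src_ip: str, via_port: int, connection_id: str) -> bool:
        key = (src_ip, via_port)
        if key in self._aliases and self._aliases[key] != connection_id:
            # Key already points at another connection: first one wins.
            return False
        self._aliases[key] = connection_id
        return True

    def lookup(self, src_ip: str, via_port: int) -> Optional[str]:
        return self._aliases.get((src_ip, via_port))


table = AliasTable()
# Two clients behind NAT 62.63.64.65, both advertising port 5060 in Via.
first = table.add("62.63.64.65", 5060, "conn-alice")
second = table.add("62.63.64.65", 5060, "conn-bob")
```

Here `second` is rejected, so traffic routed via the alias would reach the first client's connection only.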


Maintain the Flow

- Keep-alive traffic

- Use TCP keep-alive

  - But it doesn’t always work; AT&T, for example, will kill your connection anyway

  - Client has to send a payload (double CRLF) as a ping

- iOS app

  - What happens when it is backgrounded?

  - Conclusion: the server has to send the double CRLF

(OpenSIPS ask: please add server-side double CRLF support)
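
A rough Python sketch of the CRLF keep-alive mechanics. In RFC 5626 the ping is a double CRLF sent between SIP messages and the reply is a single CRLF. The helper names are hypothetical, and the idle-timer function reflects the slide's conclusion (the server initiates the ping) rather than anything the RFC requires of servers.

```python
from typing import Tuple

PING = b"\r\n\r\n"  # double CRLF keep-alive
PONG = b"\r\n"      # single CRLF reply defined by RFC 5626


def consume_keepalives(buffer: bytes) -> Tuple[int, bytes]:
    """Strip keep-alive pings from the front of a receive buffer.

    Returns (number_of_pings, remaining_bytes). Only meaningful between
    SIP messages, i.e. when the buffer is not in the middle of a message.
    """
    pings = 0
    while buffer.startswith(PING):
        pings += 1
        buffer = buffer[len(PING):]
    return pings, buffer


def keepalive_due(last_activity: float, now: float, interval: float) -> bool:
    """True when the flow has been idle long enough to send a ping."""
    return now - last_activity >= interval
```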


Registration Flood

- Huge issue: it’s the perfect DDoS

- Client has to play along

  - Honor 302 on REGISTER (not standard; we do this for our clients. Facilitates connection migration)

  - If the connection is lost, wait a random time between 0 and X seconds

  - Honor Retry-After headers (the server could send a 500, for example)

  - Recycle the connection after X hours

- Server side:

  - Limit the total number of connections per IP

  - Throttle incoming connections
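
Two of these measures are easy to sketch in Python: the client-side random back-off (adding any Retry-After value on top) and a server-side cap on open connections per IP. All names and the way the two delays are combined are assumptions; X is whatever limit you choose.

```python
import random
from collections import Counter
from typing import Optional


def reconnect_delay(
    max_wait: float,
    retry_after: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds a client should wait before reconnecting."""
    rng = rng or random.Random()
    jitter = rng.uniform(0.0, max_wait)  # spread clients over 0..X seconds
    # Honor Retry-After by never coming back earlier than the server asked.
    return jitter if retry_after is None else retry_after + jitter


class PerIpLimiter:
    """Caps the number of simultaneously open connections per source IP."""

    def __init__(self, max_per_ip: int) -> None:
        self.max_per_ip = max_per_ip
        self._open: Counter = Counter()

    def try_accept(self, ip: str) -> bool:
        if self._open[ip] >= self.max_per_ip:
            return False
        self._open[ip] += 1
        return True

    def release(self, ip: str) -> None:
        if self._open[ip] > 0:
            self._open[ip] -= 1
```

Without the jitter, every client that lost its connection in the same outage would re-register at the same instant, which is the flood the slide warns about.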


Fragmentation

- Are you building a SIP-based smartphone app?

- Do you use ICE?

- Are your SDPs massive?

- Using TCP? (I certainly hope you are!)

- Then check your tcp_max_msg_chunks setting

TCP Connection Timeout

- Behaves differently in OpenSIPS 1.11 than in 1.8


MISC

Monitor Everything

- OpenSIPS

  - We log all requests and responses in a parser-friendly format

  - We publish those stats to dashboards

Carrier Edge | US: XYZ / s

Carrier Edge | EU: XYZ / s

Carrier Edge | BR: XYZ / s
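
A small Python sketch of the idea: emit one machine-parsable line per SIP message, then aggregate the lines into per-second counts that a dashboard can plot. The talk does not say which log format is used; JSON, the field names and both function names are assumptions.

```python
import json
from collections import Counter
from typing import Iterable


def format_event(ts: float, edge: str, region: str, summary: str) -> str:
    """One log line per request/response, as a single-line JSON object."""
    return json.dumps(
        {"ts": ts, "edge": edge, "region": region, "msg": summary},
        sort_keys=True,
    )


def messages_per_second(lines: Iterable[str]) -> Counter:
    """Count messages per (edge, region, whole second)."""
    counts: Counter = Counter()
    for line in lines:
        event = json.loads(line)
        counts[(event["edge"], event["region"], int(event["ts"]))] += 1
    return counts
```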


Capture Everything

- tshark

- voipmonitor

- We push to S3

- Tools for downloading, parsing and drawing (we heavily use pkts.io)
