Defending the Network: Mitigating Attacks

(1)

(2)

Agenda

  Six Phases of Incident Response

  Reacting with the Data Plane

  Reacting with the Control Plane

(4)

Six Phases of Incident Response

(5)

  Preparation: prep the network, create tools, test tools, prep procedures, train team, practice

  Identification: How do you know about the attack? What tools can you use? What’s your process for communication?

  Classification: What kind of attack is it?

  Traceback

  Reaction: What options do you have to remedy?

  Post Mortem: What was done? Can anything be done to prevent it? How can it be less painful in the future?

(6)

Preparation: Develop and Deploy a Solid Security Foundation

Preparation

  Includes technical and non-technical components

  Encompasses best practices

  The hardest yet most important phase

  Without adequate preparation, you are destined to fail

  The midst of a large attack is not the time to be implementing foundational best practices and processes

(7)

Preparation

  Know the enemy

– Understand what drives the miscreants

– Understand their techniques

  Create the security team and plan

– Who handles security during an event: the security folks or the networking folks?

– A good operational security professional needs to be a cross between the two: silos are useless

(8)

Preparation

  Establish upstream/downstream contacts

– Understand their capabilities

– Establish a relationship and contact procedures

– An attack is no time to figure out how to contact an upstream or understand how they could potentially assist you

  Infrastructure security

– All of the techniques talked about today also assume that the infrastructure is available to route and forward

(9)

Are You Pushing the Envelope?

  Know the performance envelope of all your equipment (routers, switches, workstations, etc.). You need to know what your equipment is really capable of doing.

  Know the capabilities of your network. If possible, test it. Surprises are not kind during a security incident.

  Packets per second (PPS) vs. bits per second (BPS), and how enabling features impacts both
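The PPS versus BPS distinction can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes Ethernet framing with 20 bytes of per-frame wire overhead (preamble plus inter-frame gap), and the function name `line_rate_pps` is hypothetical. It shows why a link filled with small packets demands far more forwarding work than the same link filled with large ones.

```python
# Assumption: Ethernet framing, where every frame costs an extra
# 8 bytes of preamble/SFD plus 12 bytes of inter-frame gap on the wire.
ETHERNET_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: int, frame_bytes: int) -> float:
    """Packets per second needed to fill a link with frames of one size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for size in (64, 512, 1518):
        pps = line_rate_pps(10**9, size)
        print(f"{size:>5}-byte frames at 1 Gbps: {pps:,.0f} pps")
```

A device that forwards 1 Gbps of large frames comfortably may still fall over when the same bit rate arrives as minimum-size packets, which is the envelope worth testing before an incident.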

(10)

Are You Pushing the Envelope? Get Real!

  Operator, “I tried to push my aircraft to 70,000 ft and it stalled.”

  Vendor, “But the aircraft was only designed for a 50,000 ft ceiling.”

  Operator, “I need it to go to 70,000 ft, so you should make that happen.”

  Vendor, “But that is not going to happen; 50,000 ft is the only thing it can do. You knew that when you bought it.”

  Operator, “Your equipment sucks if you cannot exceed

(11)

Identification, Classification, and Traceback

  All of this assumes you can detect and understand the attack

  Reacting to attacks depends, in a lot of ways, on how you detect the attacks

  Time of reaction is often a critical factor

– Once stateful devices fail, the restoration path is

(12)

Reaction

  Many varying reaction mechanisms

  No one tool or technique is applicable in all circumstances

– Think ‘toolkit’

– Automate where possible

– Don’t forget about the operational costs

  It is critical to identify and classify an attack so you can choose the most appropriate mitigation tool

– Not every problem calls for a hammer; simplicity is key

(13)

Attack Vectors

  Infrastructure attacks

– Attacks against vulnerability or protocol weakness

– (D)DoS attacks directed at infrastructure

  Downstream customer attacks

– Attack that only impacts target IP

– Attack that impacts downstream customer

– Attack where collateral damage impacts multiple customers

  Downstream customer sourced attacks

(15)

Post Mortem

  The step everyone forgets or doesn’t make time to conduct

  What can you do to make it faster, easier, less painful in the future?

  Complete the loop

Post Mortem: Analyzing What Just Happened. What Can Be Done to Build Resistance to the Attack Happening Again?

(18)

RFC3704/BCP84 Ingress Packet Filtering

  Packets should be sourced from valid, allocated address space, consistent with the topology and space allocation

– Our goal here is to bound the problem and reduce the requirements for implementing security

  No BCP84 means that:

– Devices can (wittingly or unwittingly) send traffic with spoofed and/or randomly changing source addresses out to the network

– Complicates traceback immensely

– Sending bogus traffic is not free!
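The ingress-filtering rule above boils down to one question per packet: does the source address fall inside the space allocated to the interface it arrived on? A minimal sketch of that check follows, assuming a hypothetical customer allocation of `198.51.100.0/24` (a documentation prefix) and a hypothetical helper name `is_valid_source`. Real deployments enforce this in the forwarding plane with ACLs or uRPF, not in Python.

```python
import ipaddress

# Hypothetical allocation for one customer-facing interface.
CUSTOMER_PREFIXES = ["198.51.100.0/24"]


def is_valid_source(src: str, allocated: list[str]) -> bool:
    """Return True when src lies inside one of the allocated prefixes."""
    addr = ipaddress.ip_address(src)
    return any(addr in ipaddress.ip_network(prefix) for prefix in allocated)


if __name__ == "__main__":
    for src in ("198.51.100.7", "203.0.113.9"):
        verdict = "permit" if is_valid_source(src, CUSTOMER_PREFIXES) else "deny (spoofed)"
        print(src, "->", verdict)
```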

(19)

BCP 84 Packet Filtering Principles

  Filter as close to the edge as possible

  Filter as precisely as possible

  Filter both source and destination where possible

  Can be implemented in various ways

– Infrastructure ACLs (iACLs)

– Unicast reverse path forwarding (uRPF)

– Cable source verify DHCP

(20)

Where to React?

[Figure: network topology showing peers A and B, IXP-W and IXP-E, upstreams A and B, the sinkhole network, the NOC, a POP, and the attack target]

(21)

Where to React?

[Figure: network topology showing peers A and B, IXP-W and IXP-E, upstreams A and B, and the sinkhole network]

The Proper Reaction Is the Easiest, Fastest, and Simplest One That Can Minimize the Collateral Damage

(22)

Reacting with the Data Plane: Access Control List (ACL)

(23)

Reacting to an Attack with ACLs

 

  Traditional method for stopping attacks

  Scaling issues encountered:

– Operational difficulties

– Changes on the fly

– Multiple ACLs per interface

– Performance concerns

(24)

ACLs: Deployment Considerations

  How does the ACL load into the router? Does it interrupt packet flow?

  How many ACEs can be supported in hardware? In software?

  How does ACL depth impact performance?

  How do multiple concurrent features affect

(25)

Working the PPS Engineering Performance Envelope

[Figure: PPS performance envelope. Vertical axis: packets per second, from 1,000 to 10,000,000; reference lines at 100 Mbps, 1 Gbps, and the CPU limit]

(26)

ACL Update and Reaction Speed

  Internet speeds above the OC-12/48 range require specialized forwarding/feature ASICs to provide services at line rate

  ACLs loaded into these ASICs require special processing:

1.  Load ACL into router from mgmt app or ftp server (transfer time for big ACLs)

2.  Commit ACL to “active”

3.  Pre-process (compile) ACL

4.  Push to Line Card(s) (if distributed architecture)

5.  Process for loading into Line Card ASIC

6.  Load into Line Card ASIC and activate

  Modular design, simplicity, and ‘amplification effect’ principles apply to ACL design

Example: 100Kb file for 5,000-line ACL

• Access-speed and memory-card dependent; can be slow (e.g., “minutes”)

• Small

• Can be lengthy: tens of seconds to minutes

• Milliseconds

• Small

(27)

Filtering Fragments

  Fragments can be explicitly denied

  Fragment handling is enabled via fragments keyword

  Default permit behavior: permit fragments that match ACE L3 entries

  Denies fragments and classifies fragment by protocol:

access-list 110 deny tcp any any fragments
access-list 110 deny udp any any fragments
access-list 110 deny icmp any any fragments
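To make the fragment match less abstract, here is a rough sketch of the header test involved. It assumes the `fragments` keyword applies to non-initial fragments, meaning packets whose IPv4 fragment-offset field is non-zero. The function name `fragment_info` and the sample header bytes are made up for illustration.

```python
import struct


def fragment_info(ipv4_header: bytes) -> tuple[bool, int, int]:
    """Return (is_non_initial_fragment, offset_in_bytes, ip_protocol).

    Bytes 6-7 of the IPv4 header hold 3 flag bits followed by a 13-bit
    fragment offset counted in 8-byte units; byte 9 is the protocol.
    """
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    offset_units = flags_and_offset & 0x1FFF
    protocol = ipv4_header[9]
    return offset_units != 0, offset_units * 8, protocol


# Illustrative 20-byte header: UDP (protocol 17), fragment offset 185 units.
SAMPLE_HEADER = bytes(
    [0x45, 0, 0, 0x1C, 0, 1, 0x00, 0xB9, 64, 17, 0, 0, 10, 0, 0, 1, 10, 0, 0, 2]
)

if __name__ == "__main__":
    print(fragment_info(SAMPLE_HEADER))
```

Because a non-initial fragment carries no L4 header, a filter can classify it only by L3 fields and protocol, which is why the ACL entries above are written per protocol rather than per port.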

(28)

Packet Filtering Viewed Horizontally

[Figure: customer traffic passing through packet shields #1 to #4; labels: spoofed source addresses, targeting the infrastructure, application filters (policy enforcement), targeting the customer]

  The best ACL may actually be multiple ACLs at

(29)

ACL Construction

  Most common problems:

– Poorly-constructed ACLs

– Ordering matters!

– Understand platform specifics (e.g., 6500/7600 LOUs, masks)!

  Scaling and maintainability issues with ACLs are commonplace

  Make your ACLs as modular and simple as

(30)

ACL Categories: Hybrid Philosophy

  Anti-spoofing

  Anti-bogon (source)

  Infrastructure

  Explicit deny specific L3

  Incident reaction

  Explicit permit L3 (good traffic)

  Explicit deny everything else

(31)

Layered/Modular ACLs

  Anti-spoofing

  Anti-bogon (source)

  Infrastructure

  Hybrid permit/deny

[Figure: each ACL layer is annotated with how often it changes: rarely, sometimes, or every day]

(32)

Operational Cost Impact

  Anti-spoofing

  Anti-bogon (source)

  Infrastructure

  Hybrid permit/deny

  Explicit deny everything else

[Figure: each ACL layer is annotated with its change frequency (rarely, sometimes, every day) and its relative operational cost ($ to $$$$$$), ranging from more static to very dynamic]

(33)

ACL Summary

  ACLs are widely deployed as a primary containment tool

  Prerequisites: identification and classification; you need to know what to filter

  Apply as specific an ACL as possible

  ACLs are good for static attacks, not as effective for rapidly changing attack profiles

  Understand ACL performance limitations before an attack occurs

(34)

The Pros and Cons of ACLs

  ACLs: key strengths

– Detailed packet filtering (ports, protocols, ranges, fragments, etc.)

– Relatively static filtering environment

– Clear filtering policy

  ACLs can have issues when faced with:

– Dynamic attack profiles (different sources, different entry points, etc.)

– Frequent changes

– Quick, simultaneous deployment on a multitude of devices

– Operationally hard to remove

(35)

Reacting with the Data Plane: Committed Access Rate (CAR)

(36)

Reacting to an Attack with CAR

[Figure: a Layer-3 CAR filter sitting between the Internet and the customers]

  Layer-3 input and output rate limits, specifically input rate limits

  Security filters use the input rate limit to drop packets before they are forwarded through the network

  Aggregate and granular limits

– Port, MAC address, IP address, application, precedence, QOS_ID

  Excess burst policies

(37)

A CAR Example: SMURF

access-list 199 permit icmp any <target> echo-reply
access-list 199 deny ip any any
!
interface POS2/0
 rate-limit output access-group 199 256000 64000

(38)

access-list 199 permit tcp any <target> eq www syn
access-list 199 deny ip any any
!
interface POS2/0
 rate-limit output access-group 199 256000 64000 64000 conform-action transmit exceed-action drop

  Syn-floods are generally high volume

  If attack is 99% of traffic, legitimate traffic has a small chance of making it through the rate-limit

  Are we really achieving anything?
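The closing question can be answered with rough arithmetic. The sketch below assumes the rate limiter drops excess packets without regard to who sent them, so legitimate traffic survives in proportion to its share of the offered load. The function name and the traffic figures are hypothetical; only the 256000 bps limit comes from the example above.

```python
def legit_delivery_ratio(legit_bps: float, attack_bps: float, limit_bps: float) -> float:
    """Fraction of legitimate traffic that survives an indiscriminate rate limit."""
    offered = legit_bps + attack_bps
    if offered <= limit_bps:
        return 1.0  # nothing is dropped while under the limit
    # The limiter passes limit_bps out of offered bps, blind to the sender.
    return limit_bps / offered


if __name__ == "__main__":
    # Hypothetical load: 10 kbps of real SYNs buried in 990 kbps of flood.
    ratio = legit_delivery_ratio(10_000, 990_000, 256_000)
    print(f"legitimate SYNs delivered: {ratio:.1%}")
```

Under these assumptions only about a quarter of legitimate connection attempts get through, and the fraction shrinks as the flood grows: the limit protects downstream capacity but does little for the target's real users.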

(41)

Routers Drop Data, Often

[Figure: routers A to H across AS 100, AS 65530, and AS 65531; prefixes shown: 10.1/16, 10.1.0.0/19, 10.1.32.0/19, 10.1.64.0/19; dropped traffic is labeled “scans, backscatter, other garbage”]

(42)

Reacting with the Control Plane: Destination-Based

(43)

Black Hole Filtering

  Black hole filtering or black hole routing forwards a packet to a router’s bit-bucket

– Also known as “route to Null0”

  Works only on destination addresses, since it is really part of the forwarding logic

  Forwarding ASICs are designed to work with routes to Null0, dropping the packet with minimal to no performance impact

  Used for years as a means to ‘blackhole’

(44)

Remotely Triggered Black Hole Filtering

  We will use BGP to trigger a network-wide response to an attack

  A simple static route and BGP will enable a network-wide destination address black hole as fast as iBGP can update the network (msecs)

  This provides a tool that can be used to respond to security-related events and forms a foundation for other remotely triggered uses

(45)

Remotely Triggered Black Hole (RTBH)

  Configure all edge routers with static route to Null0 (must use some “reserved” network)

  Configure trigger router

– Part of iBGP mesh

– Dedicated router recommended

  Activate black hole

– Redistribute host route for victim into BGP with next-hop set to 192.0.2.1

– Route is propagated using BGP to all BGP speakers and

(46)

Step 1: Prepare All the Routers with Trigger

  Select a small block that will not be used for anything other than black hole filtering; Test-Net (192.0.2.0/24) is optimal since it should not be in use

  Put a static route for Test-Net (192.0.2.0/24) to Null0 on every edge router on the network

(47)

Step 1: Prepare All the Routers with Trigger

[Figure: topology with peers A and B, IXP-W, IXP-E, upstreams A and B, and the sinkhole network 171.16.61.0/24; each edge router carries the Test-Net route to Null0]

(48)

Step 2: Prepare the Trigger Router

  Should be part of the iBGP mesh, but does not have to accept routes

  Can be a separate router (recommended)

  Can be a production router

  Can be a workstation with Zebra/Quagga (interface with Perl scripts and other tools)

  Can be Arbor Peakflow SP: GUI interface, integrated with detection, cleanup timer, etc.

The Trigger Router Is the Device that Will Inject the iBGP Announcement into the ISP’s Network

(49)

Trigger Router’s Config

router bgp 65535
 ! Redistribute static with a route-map
 redistribute static route-map static-to-bgp
!
route-map static-to-bgp permit 10
 ! Match static route tag
 match tag 66
 ! Set next-hop to the trigger
 set ip next-hop 192.0.2.1
 ! Set local-pref
 set local-preference 200
 set community no-export
 set origin igp
!
route-map static-to-bgp permit 20

(50)

Step 3: Activate the Black Hole

  Add a static route to the destination to be black holed; the static is added with the “tag 66” to keep it separate from other statics on the router

  ip route 172.16.61.1 255.255.255.255 null0 tag 66

  BGP advertisement goes out to all BGP-speaking routers

  Routers receive BGP update and “glue” it to the existing static route; due to recursion, the next-hop is now Null0
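The “glue” step is ordinary recursive next-hop resolution. The toy model below walks through it, assuming a simplified routing table that maps prefixes to either an interface name, `Null0`, or another IP address. The function names and the `POS2/0` default route are illustrative, and a real router resolves this in the RIB/FIB rather than in a dictionary.

```python
import ipaddress

NULL0 = "Null0"


def longest_match(table: dict, addr: str):
    """Return the next hop of the most specific prefix covering addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, hop) for net, hop in table.items() if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def resolve(table: dict, dst: str, max_depth: int = 8) -> str:
    """Follow IP next hops recursively until an interface or Null0 is reached."""
    target = dst
    for _ in range(max_depth):
        hop = longest_match(table, target)
        if hop is None:
            return "unreachable"
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return hop  # an interface name or Null0 ends the recursion
        target = hop  # the next hop is itself an address: look it up too
    raise RuntimeError("next-hop recursion too deep")


EDGE_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "POS2/0",           # illustrative default
    ipaddress.ip_network("192.0.2.0/24"): NULL0,            # Step 1 static route
    ipaddress.ip_network("172.16.61.1/32"): "192.0.2.1",    # learned via BGP trigger
}

if __name__ == "__main__":
    print("172.16.61.1 ->", resolve(EDGE_TABLE, "172.16.61.1"))
    print("172.16.61.2 ->", resolve(EDGE_TABLE, "172.16.61.2"))
```

Only the victim host route resolves to Null0; its neighbor in the same subnet still follows the normal path, which is why the trigger uses a /32.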

(51)

Step 3: Activate the Black Hole

[Figure: BGP best path selection. Routes from AS 65000, AS 65535, and AS 65536 feed the BGP 65560 RIB; the route 172.16.61.1 with next-hop 192.0.2.1 and no-export is installed in the FIB, which glues the next-hop of 172.16.61.1 to Null0, triggering the black hole]

(52)

Step 3: Activate the Black Hole

BGP Sent: 172.16.61.1 Next-Hop = 192.0.2.1

Static Route in Edge Router—192.0.2.1 = Null0

172.16.61.1 = 192.0.2.1 = Null0

Next-Hop of 172.16.61.1

(53)

Step 3: Activate the Black Hole

[Figure: topology with routers A to E, peers A and B, IXP-W, IXP-E, and upstreams A and B; the black hole route is distributed via iBGP]

(54)

Customer Is DOSed (After): Packet Drops Pushed to the Edge

[Figure: topology with routers A to G, peers A and B, IXP-W, IXP-E, upstreams A and B, a POP, the NOC, and the target; iBGP advertises the list of blackholed prefixes]

(55)

Trigger Router Config

  Can use multiple tags

  One tag to redirect attack to sinkhole

  Another tag to redirect attack to anycast sinkhole

  Multiple tags to black hole for different reasons

– Tag #1 is for ongoing (D)DoS attack

– Tag #2 is for black holing botnet command and control

– Tag #3 is for phishing site

– Tag #4 is for SPAM

(56)

An Alternative: Community-Based Trigger

  BGP community-based triggering allows for more fine-tuned control over where you drop the packets

  Three parts to the trigger:

– Static routes to Null0 on all the routers

– Trigger router sets the community

– Reaction router (on the edge) matches community

(57)

Why Community-Based Triggering?

Allows for More Control on the Attack Reaction

  Trigger community #1 can be for all routers in the network

  Trigger community #2 can be for all peering routers and no customer routers, which allows customers to talk to the DOSed customer within your AS

  Trigger community #3 can be for all customers; used to push an inter-AS traceback to the edge of your network

  Trigger communities per ISP peer can be used to black hole on only one ISP peer’s connection, which allows the DOSed customer to have partial service

(58)

Trigger Router Config

(Community-Based Approach)

 redistribute static route-map static-to-bgp
!
 set community 65535:123 no-export
 set local-preference 200
 set origin igp
!
 set community 65535:124 no-export
 set local-preference 200
 set origin igp

Callouts: match static route tag; redistribute static with a route-map; set community; set local-pref

(59)

Drop Router Config

(Community-Based Approach)

 neighbor <ibgp peer> route-map ibgp-peers in
!My Region
ip community-list 1 permit 65535:123
!Other region
ip community-list 2 permit 65535:124
!
route-map ibgp-peers permit 10
 match community 1
!
route-map static-to-bgp deny 20
 match community 2

Callouts: this router drops; set next-hop to trigger

(60)

Two RTBH Approaches

  Tag-based approach:

– Concentrates configuration complexity on one “trigger” router

– Edge devices require simple static route to Null0

– Monitoring (OpEx): prefixes which are being dropped (and why) are best viewed on the “trigger” router (e.g., “show run | include tag”)

  Community-based approach:

– Configuration complexity spread equally to all devices

– Allows greater flexibility for drop control (e.g., regional)

– Monitoring (OpEx): prefixes which are being dropped on a particular device (and why) can be determined by reviewing the output of “sh ip bgp community” on that device

(61)

Reacting with the Control Plane:

Service Provider Support for Customer Initiated Destination-Based Black Hole Filtering

(62)

Customer-Initiated RTBH

  Many service providers offer their customers a customer-triggered version of RTBH

– “We’ll accept /32s with community <AS>:666 and we’ll black hole them in our network for you”

  It’s critical to understand which of your upstreams/peers support this

– How many prefixes will they accept?

– What community triggers it?

(63)

 neighbor <customer> route-map customer-RTBH in
 neighbor <customer> prefix-list customerA-filter in
!
ip community-list 1 permit 65535:666
!
route-map customer-RTBH permit 10
 match community 1
!
route-map customer-RTBH deny 20
!Deny BOGONs
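The acceptance policy behind this configuration can be sketched as three checks: the announcement carries the trigger community, it is a /32, and it falls inside address space the customer is allowed to announce. The function name, the community string default, and the sample prefixes below are assumptions for illustration, not part of any router API.

```python
import ipaddress


def accept_rtbh(prefix: str, communities: set[str],
                customer_prefixes: list[str],
                trigger: str = "65535:666") -> bool:
    """Decide whether a customer announcement may trigger a black hole."""
    net = ipaddress.ip_network(prefix)
    if trigger not in communities:
        return False  # not a black hole request at all
    if net.prefixlen != 32:
        return False  # only host routes are accepted
    # The host must sit inside the customer's own allocation.
    return any(net.subnet_of(ipaddress.ip_network(p)) for p in customer_prefixes)


if __name__ == "__main__":
    allowed = ["198.51.100.0/24"]  # hypothetical customer allocation
    for candidate in ("198.51.100.10/32", "203.0.113.5/32"):
        print(candidate, accept_rtbh(candidate, {"65535:666"}, allowed))
```

The third check is the important one: without it, a customer could black hole an address that belongs to someone else.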

(64)

Customer-Initiated RTBH: Issues

  Must ensure prefix-list-based filtering

– We wouldn’t want your customers to be able to black hole some important website, now would we?

  Restrict the black hole to the PE router only with no-advertise, restrict to network only with no-export, or pass along to peers and upstreams?

  Use the same eBGP session to customers or build a dedicated eBGP session

– If using the same session, be careful with

(65)

Reacting with the Control Plane: Source- and Destination-Based

(66)

S/RTBH: Triggered Source Drops

  Dropping on destination is very important

– Dropping on source is often what we really need

  Reacting using source address provides some interesting options:

– Stop the attack without taking the destination offline

– Filter command and control servers

– Filter (contain) infected end stations

  Must be rapid and scalable

(67)

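Strict uRPF Check

router(config-if)# ip verify unicast source reachable-via rx

[Figure: strict uRPF check on a router with interfaces i/f 1 to i/f 3; an arriving packet (S, D, data) is compared against the FIB entry for source S, shown once as S -> i/f x and once as S -> i/f 2]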

(68)

Loose uRPF Check (Unicast Reverse Path Forwarding)

[Figure: loose uRPF check on a router with interfaces i/f 1 to i/f 3; if the FIB holds a route for source S via any interface, the packet is forwarded; the other case shown is a source that is not in the FIB or whose route points to Null0]

(69)

Source-Based Remotely-Triggered

Black Hole Filtering (S/RTBH)

  Uses the same architecture as destination-based filtering and Unicast RPF

  Edge routers must have static in place

  They also require Unicast RPF

  BGP trigger sets next-hop—in this case the

(70)

Source-Based Remotely Triggered

Black Hole Filtering

  What do we have?

– Black Hole Filtering: if the route to the destination address points to Null0, we drop the packet

– Remotely Triggered: trigger a prefix to point to Null0 on routers across the network at iBGP speeds

– uRPF Loose Check: if the route to the source address points to Null0, we drop the packet

  Put them together and we have a tool to trigger a drop for any packet coming into the network whose source or destination resolves to Null0
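Putting the three pieces together, the per-packet decision on an edge router can be modeled roughly as follows. This is a simplified sketch: the FIB is a plain dictionary of already-resolved routes, the interface names and prefixes are invented, and default-route handling in loose uRPF is deliberately left out.

```python
import ipaddress

NULL0 = "Null0"


def lookup(fib: dict, addr: str):
    """Longest-prefix match; returns the outgoing interface, Null0, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, out) for net, out in fib.items() if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def edge_decision(fib: dict, src: str, dst: str) -> str:
    """Apply the loose uRPF source check, then the destination black hole."""
    if lookup(fib, src) in (None, NULL0):
        return "drop: source fails loose uRPF"
    if lookup(fib, dst) in (None, NULL0):
        return "drop: destination black holed"
    return "forward"


FIB = {
    ipaddress.ip_network("172.16.0.0/16"): "Gi0/1",     # customer space
    ipaddress.ip_network("203.0.113.0/24"): "POS2/0",   # stand-in Internet route
    ipaddress.ip_network("192.0.2.0/24"): NULL0,         # Test-Net static
    ipaddress.ip_network("203.0.113.66/32"): NULL0,      # triggered attack source
}

if __name__ == "__main__":
    print(edge_decision(FIB, "203.0.113.66", "172.16.61.1"))
    print(edge_decision(FIB, "203.0.113.10", "172.16.61.1"))
```

The attacking source is dropped while other hosts in the same /24 still reach the target, which is the point of source-based triggering: the destination stays online.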

(71)

Customer Is DOSed (After): Packet Drops Pushed to the Edge

[Figure: topology with peers A and B, IXP-W, IXP-E, and upstreams A and B; the trigger is distributed via iBGP, and edge routers drop incoming packets based on their source address]

(72)

Source-Dropping Caution

  Caution: you will drop all packets with that source and/or destination

  Remember spoofing!

– Don’t let the attacker spoof the true target and trick you into black holing it for them

– Whitelist important sites which should never be blocked (e.g., root and TLD nameservers) via prefix-lists

(73)

Source-Based RTBH - S/RTBH

  Advantages:

– No ACL update

– No change to the router’s configuration

– Drops happen in the forwarding path

– Frequent changes when attacks are dynamic (for multiple attacks on multiple customers)

  Limitations:

– Source detection and enumeration

– Attack termination detection (reporting)

– Resource utilization: finite resources

– Affects all traffic, on all triggered interfaces,

(76)

Given Everything Said, What Remains?

  We have discussed techniques that are very effective at limiting the collateral damage

  Raise the bar; stop only bad traffic

  In asymmetric environments, especially across peers, packet spoofing is still problematic

  Detection of exactly who is attacking is problematic

  Doing all this in the core requires specialized hardware, which has scaling and availability problems

(77)

Network IDS/’IPS’ Terminology

  False positives: system mistakenly reports certain benign activity as malicious; also called false alarms

  False negatives: system does not detect and report actual malicious activity

  False positives, performance, need for traffic symmetry, and increased risk of DDoS due to capacity/state are the banes of IDS technology!

  Additionally, you require a signature in order to stop the attack – what if this is a new attack?

(78)

Modern Stateful Firewalls: The Inadequate Security Default

What It Is

  Sometimes called a hybrid

  Combines features of other firewall approaches such as:

– Access control lists

– Application-specific proxies/inspections

– Stateful inspection

  Plus features of other devices:

– Web (HTTP) cache

– Specialized servers: SSH, SOCKS, NTP

– Most include VPN, some include IDS/’IPS’

Pros and Cons

  Pro: Maintains most of the speed advantage of a simple stateful firewall

  Pro: Application layer gateway services provide application security while resolving the NAT issue

  Con: Does not provide complete session termination, as would a full proxy

  Con: Actively tracks the state of incoming connections, which is a DoS issue

  Con: Performance, which is a DoS issue

  Con: ‘Inspectors’ are an attack surface

(79)

Formal Requirements for a Core

Security Device

  Need to avoid state

– Constant state tracking leaves us vulnerable to DDoS attacks

  Doesn’t rely on signatures

– If I get an attack with no signature, I cannot block it

– Possibly can use signature-like filters, however, after the fact

  Doesn’t have to be in-line when it isn’t needed

  Scales easily

(80)

Firewalls and IDS/’IPS’ don’t help!

  It’s time to put the firewall and IDS/’IPS’ myth to rest!

  Firewalls are policy-enforcement devices: they can’t help with DDoS, and in most cases, the policies applied to the firewalls have been devised with no visibility into network traffic, so the firewall rules bear little relation to what should actually be permitted and denied.

  IDS/’IPS’ are by definition always behind the attackers: in order to have a signature for something, you must have seen it before.

  IDS/’IPS’ have proven to be totally ineffective at dealing with application-layer compromises, which is how most hosts are botted and used for DDoS, spam, corporate espionage, identity theft, theft of intellectual property, etc.

  Firewalls and IDS/’IPS’ output reams of syslog which lacks context, and which nobody analyzes. It is almost impossible to relate this syslog output to network behaviors.

  End-customers subscribe to traditional managed security services based on firewalls and IDS/’IPS’, and still get compromised!

  Firewall and IDS/’IPS’ deployments cause performance and usability problems, don’t scale, and shouldn’t be deployed in front of servers!

(81)

Core Design Philosophies

  Scale by using traffic shunting

  Core packet cleaning requirements

– 1) Validate incoming traffic to make sure it comes from the source IPs that are in the SRC IP field of the packet

– 2) Evaluate these validated sources against a baseline and then recommend either further processing or dropping for sources that misbehave

  Don’t need to stop every bad packet; instead, focus on not stopping any good packets

– Pad thresholds to reduce likelihood of false positives
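The baseline-plus-padding idea can be illustrated with a small classifier. Everything here is an assumption made for the sketch: the per-source rates in packets per second, the padding factor of 2.0, the fallback baseline for unknown sources, and the function name. The intent is only to show how padding trades missed bad traffic for fewer false positives.

```python
def classify_sources(observed_pps: dict[str, float],
                     baseline_pps: dict[str, float],
                     pad: float = 2.0,
                     default_baseline: float = 100.0) -> dict[str, str]:
    """Flag only sources that exceed their padded baseline rate."""
    verdicts = {}
    for src, rate in observed_pps.items():
        # Padding the threshold leaves headroom for legitimate bursts.
        threshold = baseline_pps.get(src, default_baseline) * pad
        verdicts[src] = "forward" if rate <= threshold else "further processing"
    return verdicts


if __name__ == "__main__":
    baseline = {"198.51.100.10": 500.0, "198.51.100.20": 50.0}
    observed = {"198.51.100.10": 900.0, "198.51.100.20": 4000.0, "203.0.113.7": 150.0}
    for src, verdict in classify_sources(observed, baseline).items():
        print(src, "->", verdict)
```

A source running at nearly twice its usual rate is still forwarded, which matches the stated priority of not stopping good packets even if some bad ones slip through.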

(82)

Packet Cleaning Issues

Shunting the Packets

(83)

Traffic Shunts

  Intercept and shunt traffic to the mitigation device, the “scrubber”

  Return good traffic back to the customer

  Need to avoid forwarding loops—means some sort

(84)

Arbor DDoS Solution: Diversion/Offramping

[Figure: routers export NetFlow to Arbor Peakflow SP; an Arbor TMS scrubs diverted traffic on behalf of protected zones (zone 1: web; zone 2: name servers; zone 3); the slide builds up the six steps below]

1. Detect

2. Activate: auto/manual

3. Divert only target’s traffic (BGP announcement)

4. Identify and filter the malicious traffic destined to the target

5. Forward the legitimate traffic to the target

6. Non-targeted traffic flows freely

(91)

Design Considerations

  Network chokepoints

–  Backhauling attack traffic across potentially costly or congested links

  SLAs being offered

–  Availability

–  Guaranteed mitigation capacity

  Provisioning and operation

–  Simpler is better

  Existing networking technology

–  Often limited by what capabilities exist in the network today

–  Number of mitigation devices and capacity

(92)

Deployment strategies

  Distributed mitigation

–  Regional deployment strategy

–  Per PoP or per Peering center location

[Figure: POP A and POP B, each with peering routers (P1, P2), core routers (C1, C2), L3 switches (S1, S2), and a local Peakflow SP TMS, connected to the Internet and to the core]

(93)

Distributed Mitigation

Benefits

  Keeps attack traffic at edge

  Limited backhauling of attack traffic

  Limits exposure of internal infrastructure

  Easier capacity planning

–  Not as worried about how much attack traffic would have to be backhauled

  Good shared mitigation

Drawbacks

  Power and space requirements

  Scalability: how much mitigation capacity can you add at each location?

  Harder to dedicate mitigation capacity per customer

  Potentially more equipment to purchase upfront

  Potential to backhaul customer-to-customer attack traffic

(94)

Distributed Mitigation – Bottom line

  This can be a workable strategy for Tier-Two, -Three, and MSOs

  Strategically-consolidated Internet access points. Backbone capacity is already focused on these locations.

  Customer-to-customer attacks are likely to be small in size. Not many large-bandwidth customers.

  Backbone capacity and infrastructure likely to be vulnerable to large scale attacks. Better ROI to keep attack traffic off of backbone.

  Fewer business customers likely to pay for dedicated mitigation capacity.

(95)

Deployment Strategies

  Centralized mitigation

–  AKA ‘Cleaning Center’

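[Figure: centralized cleaning center in a data center (D1, D2) attached to the IP core; peering (P1, P2), customer (C2), and transit routers reach it through switches (S1)]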

(96)

Centralized Mitigation

Benefits

  Can start small

–  2 TMS devices for redundancy

  Can be located where power and space allow easy growth

  Possibly fits in more with other hosted service offerings

  Easier to troubleshoot

–  You know where traffic should go to and from

Drawbacks

  Backhauls attack traffic

  Potential for backbone infrastructure to be impacted by attack traffic

  Must plan for capacity

–  How much attack traffic could customers potentially pull to the Cleaning Center?

  Limited topological/geographical diversity; regional Cleaning Centers are the answer

(97)

Centralized Mitigation – Bottom Line

  This is a good strategy for most deployments

  Very distributed Peering locations and limited or no purchased Transit.

  Customer-to-customer attacks can be quite large.

–  Think of an MSO-based zombie attacking a large bank.

  Many large-bandwidth customers.

  Backbone and Internet Data Center capacity readily available.

(98)

Offramping/Diversion

  Goal – Get attack traffic to correct TMS and Port

–  BGP Next-hop Anycast

  Get attack traffic to closest TMS

  Load balance distributed attacks geographically

  Built-in redundancy by leveraging routing protocols

–  ECMP (Equal Cost Multi-Path)

  Multiple equal routes pointing to same advertised TMS BGP next-hop IP to achieve multi-gigabit performance

  ‘CEF-based load-leveling’ is Cisco terminology

–  SLA-based Next-hops or Communities

  Provide dedicated mitigation ports and capacity to meet customer SLA
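The ECMP bullet relies on per-flow hashing: every packet of a flow takes the same path, while different flows spread across the equal-cost paths. The sketch below imitates that with a CRC32 over the 5-tuple. The hash choice, the port names, and the function name are assumptions for illustration; routers use their own platform-specific hash.

```python
import zlib


def pick_port(flow: tuple, ports: list[str]) -> str:
    """Map a flow 5-tuple onto one of several equal-cost TMS ports."""
    key = "|".join(str(field) for field in flow).encode()
    # A stable hash keeps all packets of one flow on the same port.
    return ports[zlib.crc32(key) % len(ports)]


TMS_PORTS = ["tms-port-1", "tms-port-2"]  # hypothetical port names

if __name__ == "__main__":
    counts = {port: 0 for port in TMS_PORTS}
    for i in range(1000):
        # (src IP, dst IP, protocol, src port, dst port)
        flow = (f"203.0.113.{i % 250}", "198.51.100.10", 6, 1024 + i, 80)
        counts[pick_port(flow, TMS_PORTS)] += 1
    print(counts)
```

An attack from many sources spreads across the ports, but a single very large flow stays on one port, so per-port capacity still matters when sizing the off-ramp.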

(99)

BGP Next-hop Anycast

[Figure: POP A and POP B, each with peering routers (P1, P2), core, switches (S1, S2), and a Peakflow SP TMS, connected to the Internet]

1. TMS advertises off-ramp (victim) prefix with next hop of TMS L0 (virtual IP)

2. P1 and P2 do a recursive lookup on TMS L0, which matches a static route to the directly connected interface(s)

3. Since victim prefix and next-hop is learned in both PoP-A and PoP-B, attack traffic is sent to

(100)

ECMP (Equal Cost Multi-Path)

[Figure: POP B with peering and core routers, a Peakflow SP TMS with direct off-ramp and direct on-ramp links, a BGP announcement, and the customer CPE]

1. TMS advertises off-ramp (victim) prefix with next hop of 1.1.1.1

2. Equal static routes to 1.1.1.1 (the BGP next-hop) load-balance the off-ramp across two or more ports

Note: You can also use port bonding (logical ports) to treat

(101)

SLA-based – Dedicated Mitigation Capacity

[Figure: POP B with peering and core routers and a Peakflow SP TMS]

1. TMS advertises off-ramp prefix with next hop of 1.1.1.1 and a customer-specific off-ramp community

2. Router’s BGP policy takes the customer-specific community and changes the next-hop to a dedicated point-to-point interface, or a recursive lookup matches ECMP routes

(102)

Off-ramping/Diversion – Bottom Line

  Use lessons learned from sinkhole and black hole usage. Keep it simple with next-hop changes, or add granularity by utilizing BGP communities, multiple next-hops, and ECMP routes to next-hops.

  To achieve multi-gigabit mitigation capability, traffic must transit multiple TMS ports.

  Think about where routes are being heard and how. Then run through failure scenarios. Is attack traffic still going to make it to a TMS, or will it reach the customer dirty?

(103)

On-Ramping/Re-injection

  Goal – Avoid Routing loops

–  GRE/mGRE Tunnels

  Routing loop avoided by forwarding decision being performed on tunnel endpoint

–  VRF VPN

  Avoid routing loop by utilizing non-global route table

–  L2 Forwarding

  Routing loops avoided by selective route advertisement and distribution control. Requires hierarchical logical network design.

(104)

Tunnel – GRE (Generic Routing Encapsulation)

[Figure: POP B with an L3 switch, core, and a Peakflow SP TMS; labels: BGP announce, direct off-ramp, direct on-ramp, clean traffic, customer CPE]

  One-to-one mapping of off-ramp port to on-ramp port

  TMS advertises victim prefix, which is propagated throughout the network

  To avoid a routing loop, TMS encapsulates the packet in a GRE packet with tunnel source and destination IP

  Customer CPE processes the GRE packet on its tunnel interface and forwards the original packet to the victim IP
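A bare-bones view of the encapsulation step may help here. The sketch below builds only the basic 4-byte GRE header (no checksum, key, or sequence number, version 0, protocol type 0x0800 for IPv4) and omits the outer IP header, with IP protocol 47 and the tunnel source and destination, that a real tunnel endpoint would prepend. The function names are hypothetical.

```python
import struct

GRE_PROTO_IPV4 = 0x0800  # EtherType of the encapsulated payload


def gre_encapsulate(inner_packet: bytes) -> bytes:
    """Prefix a cleaned IPv4 packet with a basic GRE header."""
    header = struct.pack("!HH", 0, GRE_PROTO_IPV4)  # flags/version = 0
    return header + inner_packet


def gre_decapsulate(gre_payload: bytes) -> bytes:
    """Strip the basic GRE header, as the tunnel endpoint on the CPE would."""
    flags_version, proto = struct.unpack("!HH", gre_payload[:4])
    if flags_version != 0 or proto != GRE_PROTO_IPV4:
        raise ValueError("unsupported GRE header in this sketch")
    return gre_payload[4:]


if __name__ == "__main__":
    original = b"\x45\x00 placeholder for a cleaned IPv4 packet"
    tunneled = gre_encapsulate(original)
    print(tunneled[:4].hex(), gre_decapsulate(tunneled) == original)
```

Routers between the TMS and the CPE forward on the outer tunnel destination, so the diverted victim prefix never influences the clean traffic's path; that is what breaks the loop.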

(105)

On-Ramping/Re-injection via GRE/mGRE

Benefits

  Easy to avoid routing loops

  Redundant tunnels can be configured

  Proven On-Ramping method

Drawbacks

  Per-customer mitigation prefix configuration (not dynamic)

  Provisioning Engineers / Systems must touch Peakflow system

  Per-TMS tunnel configuration (more work for distributed model)

(106)

VRF - VPN

[Diagram: Peakflow SP, TMS, Customer Aggregation, Customer CPE, MPLS/IP Core, VRF (x4); flows: BGP announce, Direct Off-Ramp, Direct On-Ramp, Clean Traffic]

One-to-one mapping of off-ramp port to on-ramp port

TMS advertises the victim prefix, which is propagated throughout the network.

Routing loop is avoided because traffic is being forwarded inside the VPN (label switched)

Static route to customer’s protected CIDR is preconfigured
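Why a separate route table avoids the loop can be shown with a small longest-prefix-match sketch in Python. The prefixes, next-hop labels, and table names below are hypothetical; the only point is that the diverted /32 exists in the global table but not in the VRF used for clean traffic.

```python
import ipaddress
from typing import Dict, Optional

RouteTable = Dict[ipaddress.IPv4Network, str]


def lookup(table: RouteTable, dst: str) -> Optional[str]:
    """Longest-prefix match: return the next hop of the most specific route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Global table: the TMS-advertised /32 for the victim wins over the
# customer's covering prefix, so traffic is diverted to the TMS.
GLOBAL_TABLE: RouteTable = {
    ipaddress.ip_network("192.0.2.0/24"): "customer-cpe",
    ipaddress.ip_network("192.0.2.10/32"): "tms",
}

# Clean-traffic VRF: only the preconfigured static route to the customer's
# protected CIDR. The diversion /32 is never imported here.
CLEAN_VRF: RouteTable = {
    ipaddress.ip_network("192.0.2.0/24"): "customer-cpe",
}
```

Looking up the victim `192.0.2.10` in `GLOBAL_TABLE` returns `"tms"`, which would send re-injected traffic straight back to the scrubber. The same lookup in `CLEAN_VRF` returns `"customer-cpe"`, so clean traffic forwarded inside the VPN reaches the customer.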

(107)

On-Ramping/Re-injection via VRF-VPN

Benefits

  Easy to avoid routing loops

  Leverages built in network redundancy

  Simple static route required for each “protected” customer prefix. Most provisioners know how to do this.

Drawbacks

  Per-customer mitigation prefix configuration

  MPLS must be used throughout network

  Multiple technologies in use could cause operational

(108)

Direct L2 Onramping/Re-injection

[Diagram: Peakflow SP, TMS, L3 Switch, Core, POP B, Customer CPE; flows: BGP announce, Direct Off-Ramp, Direct On-Ramp, Clean Traffic]

Off-ramp and on-ramp routers must be different L3 devices

To avoid a routing loop, you restrict the more specific victim prefix to only specific routers

Customer CPE processes GRE packet on Tunnel interface and forwards original packet to victim IP

TMS advertises the victim prefix, which is propagated throughout the network.

(109)

On-Ramping/Re-Injection via Direct L2

Benefits

  Leverages built in network redundancy

  No special router configurations

  New “protected” customer prefixes can be dynamically on-ramped. No new configuration required.

Drawbacks

  Harder to avoid routing loops. Special care of BGP announcements and route distribution is required.

  Difficult to mitigate customer-to-customer attacks, especially if both customers are on the same router.

  Static in nature, not easily re-configured/scaled

(110)

Shunts in the Data Center

  All devices on the same subnet

– Either TMS-driven or configured in router

– May use remotely-triggered shunt trick

– All traffic in core to target goes to the TMS

  Optionally, you can use VLANs to avoid loops

– Bypassing the “modified” router is trivial with

(111)

Hosting/SP Data Center

[Diagram: Cisco IOS® Router, Cisco Catalyst® Switch, Firewalls, GEnet, Arbor TMS, Alert, Backbone, Backbone Switches, Arbor Peakflow SP]

(112)

Shunts in the Data Center

[Diagram: ISP A, ISP B, IDC Edge (x2), Arbor Peakflow SP, Arbor TMSs (Cleaning; flows: Dirty Traffic, Clean Traffic, BGP Peering for Diversion]

(113)

Shunts in the IP Core: GRE Injection

  Core routes target IP to the TMS

– Either TMS-driven or configured in router

– May use remotely triggered shunt trick

–  All traffic in core to target goes to the TMS

  Injection into GRE tunnel

– Bypassing the “modified” core routing

– GRE starts on TMS-attached router, terminates on

(114)

Shunts in the IP Core: GRE Injection

[Diagram: Target (1.1.1.1), TMS (2.2.2.2), Attack, GRE (Preconfigured)]

1. BGP: I’m next-hop for 1.1.1.1

2. Redistribution into Core

3. Rerouting to 1.1.1.1

4. Injection to Target

(115)

Shunts with MPLS VPNs

  Easy to deploy:

– Core remains untouched, injection VPN preconfigured

– VPN invisible to core

  No performance impact

  No need to touch CPE

(116)

MPLS VPN Shunt

[Diagram: Target (1.1.1.1), TMS (2.2.2.2), Attack, MPLS VPN (Preconfigured), VPN (x2)]

1. BGP: I’m next-hop for 1.1.1.1

2. Redistribution into Core

3. Rerouting to 1.1.1.1

4. Injection to VPN

(117)

Packet Cleaning Issues

(118)

Packet Cleaning in the Core: The Cleaning Center

[Diagram: SP Core with Core, PE, ASBR and CE routers; Customer A, Customer B; Peering SP1, Peering SP2; Cleaning Center with Arbor TMS (Customer A) and Arbor TMS (Customer B); NOC with Arbor Peakflow SP; Out-of-Band WAN Connection; flows: Dirty Traffic, NetFlow]

(119)

Scaling a Cleaning Center: Clustering Topology with ECMP/CEF Load-leveling

[Diagram: Attack traffic enters a Load-Leveling Router that feeds a TMS Mitigation Cluster of multiple TMSes]

Load-Leveling Router: up to 16 TMSes, 160 Gb/sec with N7K
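The load-leveling idea can be sketched in Python as a per-flow hash over equal-cost paths. This is a generic illustration only: real CEF/ECMP hashing uses its own algorithm and inputs, and the function names and the choice of SHA-256 here are assumptions made for the sketch.

```python
import hashlib


def pick_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              protocol: int, num_paths: int) -> int:
    """Map a flow's 5-tuple to one of num_paths equal-cost paths.

    Hashing the whole 5-tuple keeps every packet of a flow on the same
    TMS while spreading different flows across the cluster.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def per_path_capacity_gbps(cluster_gbps: float, num_paths: int) -> float:
    """Capacity each path must carry if load is spread perfectly evenly."""
    return cluster_gbps / num_paths
```

With the figures on the slide, `per_path_capacity_gbps(160, 16)` gives 10 Gb/sec per TMS. Calling `pick_path` repeatedly with the same 5-tuple always returns the same path index, which is what lets each TMS see complete flows.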

(120)

Backbone Option: Cleaning Centers

  Question is: how many?

– Most national providers have decided to start with two

– Geographic redundancy

– Adequate incoming bandwidth in key locations

– Limit the backhaul of traffic across expensive links

(121)

Packet Spoofing

  What can be spoofed?

– Any field in a packet header (well, almost)

– Spoofing most often happens in combinations, with several fields being spoofed

  Spoofing is used to:

– Hide the source so the attacker or resource is not revealed

(122)

TMS Mitigation Processing

  DDoS attacks consist of undesirable traffic mixed in with some amount of desirable traffic

–  Undesirable traffic may come in large quantities or it could come shaped in a way designed to disrupt normal processing

  The TMS allows desirable traffic through while lowering the impact of undesirable traffic

  The TMS uses various countermeasures (defense mechanisms) to target and remove the most egregious attack traffic to allow the network to continue operating

–  Different countermeasures are designed to stop different types of attack traffic

–  The countermeasures as a whole provide defense in depth

(123)

DDoS Attacks

  DDoS attacks can consist of just about anything

–  Large quantities of raw traffic designed to overwhelm a resource or infrastructure

–  Application specific traffic designed to overwhelm a particular service, sometimes stealthy in nature

–  Traffic formatted in such a way to disrupt a host from normal processing

–  Traffic reflected and/or amplified through legitimate hosts

–  Traffic from compromised sources or from spoofed IP addresses

–  Pulsed attacks: start/stop attacks

(124)

TCP Stack Flood Attacks

  Description

–  Flood a certain aspect of the TCP connection process to keep the host from being able to respond to legitimate connections

–  May be spoofed or non-spoofed

  Peakflow SP Detection Capabilities

–  Misuse TCP SYN, RST, Total Traffic detection

  Peakflow TMS Mitigation Countermeasures

–  TCP SYN authentication, zombie army, white list/black list

  Common names

(125)

Generic Flood Attacks

–  Flood of traffic for one or more protocols or ports

–  Designed to look like normal traffic

–  Reflection attacks

–  May be spoofed or non-spoofed

–  Misuse UDP, ICMP, Total Traffic detection

–  Profiled anomaly detection for managed object

  Peakflow TMS Mitigation Countermeasures

–  White list/blacklist, zombie army, baseline enforcement, rate limiting, payload filtering

(126)

Fragmentation Attacks

–  A flood of TCP or UDP fragments is sent to a victim, overwhelming the victim’s ability to re-assemble the streams and severely reducing performance

–  Fragments may also be malformed in some way

–  May be a result of a network misconfiguration

  Peakflow SP Detection Capabilities

–  Misuse IP Fragment detection

  Peakflow TMS Mitigation Capabilities

–  White list/blacklist, zombie army

  Common names

(127)

Application Attacks

–  Attacks designed to overwhelm components of specific applications

–  Commonly seen against HTTP, DNS and SIP in particular

–  May be stealthy by mixing with a much higher traffic volume on the same protocol/port

–  Requires TMS systems deployed in span mode doing appID and feeding SP systems with application level managed objects defined.

  Peakflow TMS Mitigation Countermeasures

–  HTTP malformed, HTTP rate limiting, HTTP payload filtering, SIP malformed, SIP request rate limiting, DNS authentication, DNS malformed, Payload regex filtering, Regex filtering

(128)

Connection Attacks

–  Attacks that maintain a large number of either half-open TCP connections or fully open idle connections with the victim, impeding new connections from forming

–  Limited

  Peakflow TMS Mitigation Capabilities

–  TCP SYN authentication, TCP Idle Reset

(129)

Vulnerability Exploit Attacks

–  Attacks designed to exploit a vulnerability in the victim’s operating system

–  Some are single packet or very low level attacks

–  Many of these are obsolete in modern operating systems

  Peakflow SP Detection Capabilities

–  Limited: ATF based fingerprint detection

  Peakflow TMS Mitigation Capabilities

–  Limited: Malformed HTTP, SIP, DNS, White list/blacklist

–  The most effective method of stopping these attacks is to patch hosts on your network

(130)

The Core Functional Components of an Anti-DDoS Packet Scrubber: Putting All This Together to Stop DDoS

  Destination detection

  Source verification (via anti-spoofing)

  Source detection (via anomalies)

  Source/attack blocking/filtering

(131)

Packet Cleaning via Shunts

  Advantages:

– Not on critical path during normal operation

– Anomaly-based detection with baselining

– Optimized for high-performance blocking

– Is resistant to state limitations of most other devices

  Limitations:

– Not designed to stop single-packet exploits, but regexp countermeasure can be used if the exploit packet signature is known

– Inherent assumption of a “destination” to be protected, i.e., a server

– Resource utilization: finite resources in the scrubber complex (160 Gb/sec max per cluster with Cisco CEF-based load-leveling of 16 paths with N7K; multiple clusters via IPv4 anycast addressing)
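The regexp countermeasure mentioned under Limitations can be illustrated with a short Python sketch of payload matching against a known signature. The signature below is entirely hypothetical (an over-long `cmd` parameter sent to a made-up CGI path), and `should_drop` is an illustrative name, not a TMS function.

```python
import re

# Hypothetical signature for a known single-packet exploit: a request to
# a made-up CGI path with a 64+ byte "cmd" parameter.
EXPLOIT_SIGNATURE = re.compile(rb"GET /vulnerable\.cgi\?cmd=[^ ]{64,}")


def should_drop(payload: bytes, signature: "re.Pattern[bytes]" = EXPLOIT_SIGNATURE) -> bool:
    """Return True if the packet payload matches the exploit signature."""
    return signature.search(payload) is not None
```

A payload such as `b"GET /vulnerable.cgi?cmd=" + b"A" * 80 + b" HTTP/1.1\r\n"` would be dropped, while an ordinary `GET /index.html` request passes. The approach only helps once the signature is known, which is the caveat the slide states.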

(132)

(133)