Defending the
Network:
Agenda
Six Phases of Incident Response
Reacting with the Data Plane
Reacting with the Control Plane
Six Phases of Incident Response
Preparation
– Prep the network
– Create tools
– Test tools
– Prep procedures
– Train team
– Practice
Identification
– How do you know about the attack?
– What tools can you use?
– What’s your process for communication?
Classification
– What kind of attack is it?
Traceback
Reaction
– What options do you have to remedy?
Post Mortem
– What was done?
– Can anything be done to prevent it?
– How can it be less painful in the future?
Preparation—Develop and Deploy a Solid
Security Foundation
Preparation
Includes technical and non-technical components
Encompasses best practices
The hardest yet most important phase
Without adequate preparation, you are
destined to fail
The midst of a large attack is not the time to be implementing foundational best practices and processes
Preparation
Know the enemy
– Understand what drives the miscreants
– Understand their techniques
Create the security team and plan
– Who handles security during an event: the security folks or the networking folks?
– A good operational security professional needs to be a cross between the two: silos are useless
Preparation
Establish upstream/downstream contacts
– Understand their capabilities
– Establish a relationship and contact procedures
– An attack is no time to figure out how to contact an upstream or understand how they could potentially assist you
Infrastructure security
– All of the techniques talked about today also assume
that the infrastructure is available to route and forward
Are You Pushing the Envelope?
Know the performance envelope of all your equipment (routers, switches, workstations, etc.). You need to know what your equipment is really capable of doing.
Know the capabilities of your network. If possible, test it. Surprises are not kind during a security incident.
PPS vs. BPS and how enabling features impacts them
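The PPS-versus-BPS point can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes Ethernet framing (20 bytes of preamble and inter-frame gap per frame) and a hypothetical helper name, max_pps, and it says nothing about what a given router can actually forward once features are enabled.

```python
# Illustrative sketch: worst-case packets per second for a given line rate.
# Assumption: Ethernet framing, where each frame costs an extra 20 bytes
# on the wire (8-byte preamble/SFD plus 12-byte inter-frame gap).

ETHERNET_OVERHEAD_BYTES = 20


def max_pps(line_rate_bps: int, frame_bytes: int) -> float:
    """Return the maximum number of frames per second the link can carry."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


if __name__ == "__main__":
    # The same 1 Gbps link is a very different workload at different sizes.
    for frame in (64, 512, 1518):
        print(f"1 Gbps, {frame}-byte frames: {max_pps(10**9, frame):,.0f} pps")
```

A small-packet flood at the same bit rate presents roughly eighteen times as many forwarding decisions as full-size frames, which is why the envelope has to be known in PPS and not only in BPS.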
Are You Pushing the
Envelope? Get Real!
Operator, “I tried to push my aircraft to 70,000 ft and it
stalled.”
Vendor, “But the aircraft was only designed for a 50,000
ft ceiling.”
Operator, “I need it to go to 70,000 ft, so you should
make that happen.”
Vendor, “But that is not going to happen; 50,000 ft is
the only thing it can do. You knew that when you bought it.”
Operator, “Your equipment sucks if you cannot exceed
Identification, Classification, and Traceback
All of this assumes you can detect and understand
the attack
Reacting to attacks depends, in a lot of ways, on how you detect the attacks
Time of reaction is often a critical factor
– Once stateful devices fail, the restoration path is
Reaction
Many varying reaction mechanisms
No one tool or technique is applicable in all circumstances
– Think ‘toolkit’
– Automate where possible
– Don’t forget about the operational costs
It is critical to identify and classify an attack so you can choose the most appropriate mitigation tool
– Not every problem calls for a hammer; simplicity is key
Attack Vectors
Infrastructure attacks
– Attacks against vulnerability or protocol weakness
– (D)DoS attacks directed at infrastructure
Downstream customer attacks
– Attack that only impacts target IP
– Attack that impacts downstream customer
– Attack where collateral damage impacts multiple
customers
Downstream customer sourced attacks
Post Mortem
The step everyone forgets or doesn’t make time
to conduct
What can you do to make it faster, easier, less painful in the future?
Complete the loop
Post Mortem—Analyzing What Just Happened.
What Can Be Done to Build Resistance to the
Attack Happening Again
RFC3704/BCP84 Ingress Packet Filtering
Packets should be sourced from valid, allocated address space, consistent with the topology and space allocation
– Our goal here is to bound the problem and reduce the requirements for implementing security
No BCP84 means that:
– Devices can (wittingly or unwittingly) send traffic with spoofed and/or
randomly changing source addresses out to the network
– Complicates traceback immensely
– Sending bogus traffic is not free!
BCP 84 Packet Filtering Principles
Filter as close to the edge as possible
Filter as precisely as possible
Filter both source and destination where possible
Can be implemented in various ways
– Infrastructure ACLs (iACLs)
– Unicast reverse path forwarding (uRPF)
– Cable source verify DHCP
Where to React?
[Figure: network topology with Peer A, Peer B, IXP-W, IXP-E, Upstream A, Upstream B, the sinkhole network, the NOC, and the target POP]
The Proper Reaction is the Easiest,
Fastest, and Simplest One That Can
Minimize the Collateral Damage
Reacting with the Data Plane: Access Control List (ACL)
Reacting to an Attack with ACLs
Traditional method for stopping attacks
Scaling issues encountered:
– Operational difficulties
– Changes on the fly
– Multiple ACLs per interface
– Performance concerns
ACLs: Deployment Considerations
How does the ACL load into the router? Does it
interrupt packet flow?
How many ACEs can be supported in hardware?
In software?
How does ACL depth impact performance?
How do multiple concurrent features affect
Working the PPS Engineering
[Figure: PPS performance envelope; PPS axis from 1,000 to 10,000,000, with 100 Mbps, 1 Gbps, and CPU-limit lines]
ACL Update and Reaction Speed
Internet speeds above the OC-12/48 range require specialized forwarding/feature ASICs to provide services at line rate
ACLs loaded into these ASICs require special processing:
1. Load ACL into router from mgmt app or ftp server (transfer time for big ACLs)
2. Commit ACL to “active”
3. Pre-process (compile) ACL
4. Push to Line Card(s) (if distributed architecture)
5. Process for loading into Line Card ASIC
6. Load into Line Card ASIC and activate
Modular design, simplicity, and ‘amplification effect’ principles apply to ACL design
Example: 100Kb file for 5,000-line ACL
• Access speed- and memory card-dependent; can be slow, e.g., “minutes”
• Small
• Can be lengthy: tens of seconds to minutes
• Milliseconds
• Small
Filtering Fragments
Fragments can be explicitly denied
Fragment handling is enabled via fragments keyword
Default behavior permits fragments that match the L3 entries of an ACE
The following entries deny fragments and classify them by protocol:
– access-list 110 deny tcp any any fragments
– access-list 110 deny udp any any fragments
– access-list 110 deny icmp any any fragments
Packet Filtering Viewed Horizontally
[Figure: customer traffic and spoofed source addresses passing through Packet Shields #1 to #4; labels include “Targeting the Infrastructure”, “Application Filters: Policy Enforcement”, and “Targeting the Customer”]
The best ACL may actually be multiple ACLs at
ACL Construction
Most common problems:
– Poorly-constructed ACLs
– Ordering matters!
– Understand platform specifics (e.g., 6500/7600 LOUs, masks)!
Scaling and maintainability issues with ACLs are
commonplace
Make your ACLs as modular and simple as possible
ACL Categories: Hybrid Philosophy
Anti-spoofing
Anti-bogon (source)
Infrastructure
Explicit deny specific L3
Explicit deny specific L4
Incident reaction
Explicit permit L3 (good traffic)
Explicit permit L4 (good traffic)
Explicit deny everything else
Layered/Modular ACLs
[Figure: the ACL categories listed above stacked as the layers of a hybrid permit/deny ACL, each layer annotated with how often it changes (rarely, sometimes, every day)]
Operational Cost Impact
[Figure: the same hybrid permit/deny layers annotated with relative operational cost ($ to $$$$$$), ranging from more static to very dynamic]
ACL Summary
ACLs are widely deployed as a primary containment
tool
Prerequisites: identification and classification—need to
know what to filter
Apply as specific an ACL as possible
ACLs are good for static attacks, not as effective for
rapidly changing attack profiles
Understand ACL performance limitations before
an attack occurs
The Pros and Cons of ACLs
ACLs - key strengths:
– Detailed packet filtering
(ports, protocols, ranges, fragments, etc.)
– Relatively static filtering environment
– Clear filtering policy
ACLs can have issues when faced with:
– Dynamic attack profiles
(different sources, different entry points, etc.)
– Frequent changes
– Quick, simultaneous deployment on a multitude of devices
– Operationally hard to remove
Reacting with the Data Plane: Committed Access Rate (CAR)
[Figure: Layer-3 CAR filter placed between the Internet and customers]
Reacting to an Attack with CAR
Layer-3 input and output rate limits—specifically input rate limits
Security filters use the input rate limit to drop packets before they are forwarded through the network
Aggregate and granular limits
– Port, MAC address, IP address, application, precedence, QOS_ID
Excess burst policies
A CAR Example: SMURF
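! Smurf: rate-limit ICMP echo-replies headed to the target
! (burst and action parameters on this rate-limit line are assumed to mirror the SYN example)
access-list 199 permit icmp any <target> echo-reply
access-list 199 deny ip any any
interface POS2/0
 rate-limit output access-group 199 256000 64000 64000 conform-action transmit exceed-action drop
!
! SYN flood: rate-limit TCP SYNs headed to the target's web port
access-list 199 permit tcp any <target> eq www syn
access-list 199 deny ip any any
interface POS2/0
 rate-limit output access-group 199 256000 64000 64000 conform-action transmit exceed-action drop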
Syn-floods are generally high volume
If attack is 99% of traffic, legitimate traffic has a small chance of making it through the rate-limit
Are we really achieving anything?
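The question above can be checked with rough numbers. This is a minimal sketch, assuming the rate limiter drops excess packets without regard to who sent them, so legitimate and attack packets matching the ACL are dropped in the same proportion; the function name and the traffic figures are hypothetical.

```python
# Illustrative sketch: how much legitimate traffic survives a rate limit.
# Assumption: the policer cannot tell good packets from bad ones, so every
# packet matching the ACL has the same chance of being dropped.


def legit_throughput_bps(total_bps: float, attack_fraction: float,
                         limit_bps: float) -> float:
    """Return the legitimate bits per second that pass the rate limit."""
    if total_bps <= 0:
        return 0.0
    legit_offered = total_bps * (1.0 - attack_fraction)
    pass_fraction = min(1.0, limit_bps / total_bps)
    return legit_offered * pass_fraction


if __name__ == "__main__":
    # 100 Mbps of matching traffic, 99% of it attack, limited to 256 kbps.
    passed = legit_throughput_bps(100e6, 0.99, 256000)
    print(f"legitimate traffic passed: {passed:,.0f} bps of 1,000,000 offered")
```

Under these assumptions the rate limit protects the link behind it, but the legitimate users of the targeted service still lose almost all of their traffic.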
Routers Drop Data, Often
[Figure: routers A to H across AS 100, AS 65530, and AS 65531; prefixes 10.1/16, 10.1.0.0/19, 10.1.32.0/19, and 10.1.64.0/19; traffic labeled “Scans, Backscatter, Other Garbage”]
Reacting with the Control Plane: Destination-Based
Black Hole Filtering
Black hole filtering or black hole routing forwards a
packet to a router’s bit-bucket
– Also known as “route to Null0”
Works only on destination addresses, since it is really
part of the forwarding logic
Forwarding ASICs are designed to work with routes to
Null0—dropping the packet with minimal to no performance impact
Used for years as a means to ‘blackhole’
Remotely Triggered Black Hole Filtering
We will use BGP to trigger a network-wide
response to an attack
A simple static route and BGP will enable a
network-wide destination address black hole as fast as iBGP can update the network (msecs)
This provides a tool that can be used to respond to
security-related events and forms a foundation for other remotely triggered uses
Remotely Triggered Black Hole (RTBH)
Configure all edge routers with static route to Null0
(must use some “reserved” network)
Configure trigger router
– Part of iBGP mesh
– Dedicated router recommended
Activate black hole
– Redistribute host route for victim into BGP with next-hop
set to 192.0.2.1
– Route is propagated using BGP to all BGP speakers and
Step 1: Prepare All the Routers with Trigger
Select a small block that will not be used for anything other than black hole filtering; Test-Net (192.0.2.0/24) is optimal since it should not be in use
Put a static route for Test-Net (192.0.2.0/24) to Null0 on every edge router on the network
[Figure: edge routers facing Peer A, Peer B, IXP-W, IXP-E, Upstream A, and Upstream B, each with a Test-Net route to Null0; Sinkhole Network 171.16.61.0/24]
Step 2: Prepare the Trigger Router
Should be part of the iBGP mesh—but does not
have to accept routes
Can be a separate router (recommended)
Can be a production router
Can be a workstation with Zebra/Quagga
(interface with Perl scripts and other tools)
Can be Arbor Peakflow SP – GUI interface,
integrated with detection, cleanup timer, etc.
The Trigger Router Is the Device that Will
Inject the iBGP Announcement into the
ISP’s Network
Trigger Router’s Config
router bgp 65535
 ! Redistribute static with a route-map
 redistribute static route-map static-to-bgp
!
route-map static-to-bgp permit 10
 ! Match static route tag
 match tag 66
 ! Set next-hop to the trigger
 set ip next-hop 192.0.2.1
 ! Set local-pref
 set local-preference 200
 set community no-export
 set origin igp
!
route-map static-to-bgp permit 20
Step 3: Activate the Black Hole
Add a static route to the destination to be black
holed; the static is added with the “tag 66” to keep it separate from other statics on the router
ip route 172.16.61.1 255.255.255.255 null0 tag 66
BGP advertisement goes out to all
BGP-speaking routers
Routers receive BGP update and “glue” it to the
existing static route; due to recursion, the next-hop is now Null0
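The “glue” step is ordinary recursive next-hop resolution. The sketch below models it with a plain dictionary standing in for the routing table; it is an illustration only, not how any router implements its FIB, and the helper names (longest_match, resolve) and the POS2/0 default route are assumptions made for the example.

```python
import ipaddress

NULL0 = "Null0"


def longest_match(table, address):
    """Return the target of the most specific prefix covering address."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, target in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return None if best is None else best[1]


def is_ip(value):
    """True when value is an IP next-hop, not an interface name."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def resolve(table, destination, max_depth=8):
    """Follow IP next-hops until an interface (or Null0) is reached."""
    target = longest_match(table, destination)
    for _ in range(max_depth):
        if target is None or not is_ip(target):
            return target
        target = longest_match(table, target)
    return None


if __name__ == "__main__":
    table = {
        "0.0.0.0/0": "POS2/0",          # hypothetical default route
        "192.0.2.0/24": NULL0,          # static Test-Net route on every edge router
        "172.16.61.1/32": "192.0.2.1",  # host route learned from the trigger via BGP
    }
    print(resolve(table, "172.16.61.1"))  # Null0
    print(resolve(table, "172.16.61.2"))  # POS2/0
```

Only the triggered host route resolves to Null0; its neighbors in the same subnet keep following the normal path.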
[Figure: BGP best path selection on an edge router; the BGP RIB holds routes from AS 65000, AS 65535, and AS 65536, including 172.16.61.1 with next-hop = 192.0.2.1 and no-export; the FIB glues 172.16.61.1’s next-hop to Null0, triggering the black hole]
BGP sent: 172.16.61.1, next-hop = 192.0.2.1
Static route in edge router: 192.0.2.1 = Null0
Next-hop of 172.16.61.1: 172.16.61.1 = 192.0.2.1 = Null0
[Figure: Customer is DoSed (after): packet drops pushed to the edge. iBGP advertises the list of black-holed prefixes to edge routers A to E, which face Peer A, Peer B, IXP-W, IXP-E, Upstream A, and Upstream B; routers F and G, the NOC, and the target POP sit behind them]
Trigger Router Config
Can use multiple tags
One tag to redirect attack to sinkhole
Another tag to redirect attack to anycast sinkhole
Multiple tags to black hole for different reasons
– Tag #1 is for ongoing (d)DoS attack
– Tag #2 is for black holing botnet command and control
– Tag #3 is for phishing site
– Tag #4 is for SPAM
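A per-reason tag plan lends itself to a small helper on the trigger workstation. The sketch below only builds the static-route command text in the form used in Step 3; the tag numbers other than 66, the reason names, and the function names are hypothetical, and each tag would still need a matching route-map clause on the trigger router.

```python
import ipaddress

# Hypothetical tag plan: only tag 66 appears in the slides.
REASON_TAGS = {
    "ddos": 66,
    "botnet-cc": 67,
    "phishing": 68,
    "spam": 69,
}


def blackhole_route(victim: str, reason: str) -> str:
    """Build the tagged static route that triggers a black hole."""
    host = ipaddress.IPv4Address(victim)  # raises ValueError on bad input
    tag = REASON_TAGS[reason]             # raises KeyError on unknown reason
    return f"ip route {host} 255.255.255.255 null0 tag {tag}"


def withdraw_route(victim: str) -> str:
    """Build the command that removes the trigger route again."""
    host = ipaddress.IPv4Address(victim)
    return f"no ip route {host} 255.255.255.255 null0"


if __name__ == "__main__":
    print(blackhole_route("172.16.61.1", "ddos"))
    print(withdraw_route("172.16.61.1"))
```

Validating the address before emitting the command is a cheap guard against a typo black holing the wrong destination.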
An Alternative: Community-Based Trigger
BGP community-based triggering allows for more
fine-tuned control over where you drop the packets
Three parts to the trigger:
– Static routes to Null0 on all the routers
– Trigger router sets the community
– Reaction router (on the edge) matches community
Allows for More Control on the Attack Reaction
Why Community-Based Triggering?
Trigger community #1 can be for all routers in the network
Trigger community #2 can be for all peering routers; no customer routers—allows for customers to talk to the DOSed customer within your AS
Trigger community #3 can be for all customers; used to push an inter-AS traceback to the edge of your network
Trigger communities per ISP peer can be used to only black hole on one ISP peer’s connection; allows for the DOSed customer to have partial service
Trigger Router Config
(Community-Based Approach)
router bgp 65535
 ! Redistribute static with a route-map
 redistribute static route-map static-to-bgp
!
route-map static-to-bgp permit 10
 ! Match static route tag
 match tag 123
 ! Set community (no-export on the same line so neither value replaces the other)
 set community 65535:123 no-export
 ! Set local-pref
 set local-preference 200
 set origin igp
!
route-map static-to-bgp permit 20
 match tag 124
 set community 65535:124 no-export
 set local-preference 200
 set origin igp
Drop Router Config
(Community-Based Approach)
router bgp 65535
 neighbor <ibgp peer> route-map ibgp-peers in
!
! My region
ip community-list 1 permit 65535:123
! Other region
ip community-list 2 permit 65535:124
!
! This router drops: set next-hop to the trigger
route-map ibgp-peers permit 10
 match community 1
 set ip next-hop 192.0.2.1
 set local-preference 200
 set community no-export
 set origin igp
!
route-map ibgp-peers deny 20
 match community 2
Two RTBH Approaches
Tag-based approach:
– Concentrates configuration complexity on one
“trigger” router
– Edge devices require simple static route to Null0
– Monitoring (OpEx)—Prefixes which are being dropped
(and why) best viewed on “trigger” router (e.g., “show run | include tag”)
Community-based approach:
– Configuration complexity spread equally to all devices
– Allows greater flexibility for drop control (e.g., regional)
– Monitoring (OpEx)—Prefixes which are being dropped on a particular
device (and why) can be determined by reviewing the output of “sh ip bgp community” on that device
Reacting with the Control Plane:
Service Provider Support for Customer Initiated Destination-Based Black Hole Filtering
Customer-Initiated RTBH
Many service providers offer their customers a
customer triggered version of RTBH
– “We’ll accept /32s with community <AS>:666 and
we’ll black hole them in our network for you”
It’s critical to understand which of your upstreams/peers support this
– How many prefixes will they accept?
– What community triggers it?
router bgp 65535
 neighbor <customer> route-map customer-RTBH in
 neighbor <customer> prefix-list customerA-filter in
!
ip community-list 1 permit 65535:666
!
route-map customer-RTBH permit 10
 match community 1
 set ip next-hop 192.0.2.1
 set local-preference 200
 set community no-export
 set origin igp
!
! Deny BOGONs
route-map customer-RTBH deny 20
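The acceptance logic that the prefix-list and community match express can be sketched in a few lines. This is an illustration of the policy only, not a BGP implementation; the function name, the customer block 172.16.61.0/24, and the rule that only host routes are accepted are assumptions made for the example.

```python
import ipaddress


def accept_blackhole(prefix, communities, customer_blocks,
                     trigger="65535:666"):
    """Decide whether a customer announcement may trigger a black hole.

    Accept only host routes that carry the trigger community and fall
    inside address space assigned to that customer.
    """
    if trigger not in communities:
        return False
    net = ipaddress.ip_network(prefix)
    if net.prefixlen != net.max_prefixlen:
        return False  # not a host route (/32 for IPv4)
    for block in customer_blocks:
        allowed = ipaddress.ip_network(block)
        if net.version == allowed.version and net.subnet_of(allowed):
            return True
    return False


if __name__ == "__main__":
    blocks = ["172.16.61.0/24"]  # hypothetical space assigned to the customer
    print(accept_blackhole("172.16.61.1/32", {"65535:666"}, blocks))
    print(accept_blackhole("192.0.2.5/32", {"65535:666"}, blocks))
```

The second call is rejected because the address is outside the customer's own space, which is the case the prefix-list exists to catch.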
Customer-Initiated RTBH
Issues:
Must ensure prefix-list-based filtering
– We wouldn’t want your customers to be able to black hole some important website, now would we?
Restrict the black hole to the PE router only with no-advertise, restrict to network only with no-export, or pass along to peers and upstreams?
Use the same eBGP session to customers or build
a dedicated eBGP session
– If using the same session, be careful with
Reacting with the Control Plane: Source- and Destination-Based
S/RTBH: Triggered Source Drops
Dropping on destination is very important
– Dropping on source is often what we really need
Reacting using source address provides some
interesting options:
– Stop the attack without taking the destination offline
– Filter command and control servers
– Filter (contain) infected end stations
Must be rapid and scalable
Strict uRPF Check
router(config-if)# ip verify unicast source reachable-via rx
[Figure: strict uRPF check on a router with interfaces i/f 1 to i/f 3; the FIB entry for the packet’s source S is compared with the interface the packet arrived on]
Loose uRPF Check (Unicast Reverse Path Forwarding)
[Figure: loose uRPF check on the same router; a source S reachable through any interface is forwarded, while a source that is not in the FIB or is routed to Null0 is dropped]
Source-Based Remotely-Triggered
Black Hole Filtering (S/RTBH)
Uses the same architecture as destination-based
filtering and Unicast RPF
Edge routers must have static in place
They also require Unicast RPF
BGP trigger sets next-hop—in this case the
Source-Based Remotely Triggered
Black Hole Filtering
What do we have?
– Black Hole Filtering—if the destination address
equals Null0, we drop the packet
– Remotely Triggered—trigger a prefix to equal Null0
on routers across the network at iBGP speeds
– uRPF Loose Check: if the source address equals Null0, we drop the packet
Put them together and we have a tool to trigger a
drop for any packet coming into the network whose source or destination equals Null0
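The combined behavior can be modeled in a few lines. The sketch below is a simplification under stated assumptions: the FIB is a dictionary of already-resolved routes, loose uRPF is applied to every packet, and the interface names and prefixes are hypothetical; how a real router treats a default route in loose mode is platform configuration that this model leaves out.

```python
import ipaddress

NULL0 = "Null0"


def lookup(fib, address):
    """Longest-prefix match; returns the outgoing target or None."""
    addr = ipaddress.ip_address(address)
    matches = []
    for prefix, target in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, target))
    if not matches:
        return None
    return max(matches, key=lambda match: match[0])[1]


def forward(fib, src, dst):
    """Apply loose uRPF to the source, then black hole filtering to the destination."""
    src_route = lookup(fib, src)
    if src_route is None or src_route == NULL0:
        return "drop (loose uRPF)"
    dst_route = lookup(fib, dst)
    if dst_route is None or dst_route == NULL0:
        return "drop (black hole)"
    return f"forward via {dst_route}"


if __name__ == "__main__":
    fib = {
        "203.0.113.0/24": "POS2/0",   # hypothetical Internet-side route
        "172.16.61.0/24": "Gi0/1",    # hypothetical customer route
        "198.51.100.7/32": NULL0,     # attacking source, triggered via BGP
    }
    print(forward(fib, "198.51.100.7", "172.16.61.1"))
    print(forward(fib, "203.0.113.9", "172.16.61.1"))
```

The destination stays reachable for every source except the one that was triggered, which is the advantage over destination-based black holing.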
[Figure: Customer is DoSed (after): packet drops pushed to the edge. Edge routers facing Peer A, Peer B, IXP-W, IXP-E, Upstream A, and Upstream B drop incoming packets based on their source address, as signaled over iBGP]
Source-Dropping Caution
Caution: you will drop all packets with that source
and/or destination
Remember spoofing!
– Don’t let the attacker spoof the true target and trick
you into black holing it for them
– Whitelist important sites which should never be blocked (e.g., root & TLD nameservers) via prefix-lists
Source-Based RTBH - S/RTBH
Advantages:
– No ACL update
– No change to the router’s configuration
– Drops happen in the forwarding path
– Frequent changes when attacks are dynamic
(for multiple attacks on multiple customers)
Limitations:
– Source detection and enumeration
– Attack termination detection (reporting)
– Resource utilization: finite resources
– Affects all traffic, on all triggered interfaces,
Given Everything Said, What Remains?
We have discussed techniques that are very
effective at limiting the collateral damage
Raise the bar; stop only bad traffic
In asymmetric environments, especially across
peers, packet spoofing is still problematic
Detection of exactly who is attacking is problematic
Doing all this in the core requires specialized hardware, which has scaling and availability problems
Network IDS/’IPS’ Terminology
False positives: system mistakenly reports
certain benign activity as malicious; also called false alarms
False negatives: system does not detect and report actual malicious activity
False positives, performance, need for traffic symmetry, and increased risk of DDoS due to capacity/state are the banes of IDS technology!
Additionally, you require a signature in order to stop the attack – what if this is a new attack?
Modern Stateful Firewalls: The Inadequate Security Default
What It Is
Sometimes called a hybrid
Combines features of other firewall approaches such as:
– Access control lists
– Application-specific proxies/inspections
– Stateful inspection
Plus features of other devices:
– Web (HTTP) cache
– Specialized servers: SSH, SOCKS, NTP
– Most include VPN, some include IDS/’IPS’
Pros and Cons
Pro: Maintains most of the speed advantage of a simple stateful firewall
Pro: Application layer gateway services provide application security while resolving the NAT issue
Con: Does not provide complete session termination, as would a full proxy
Con: Actively tracks the state of incoming connections, a DoS issue
Con: Performance, a DoS issue
Con: ‘Inspectors’ are an attack surface
Formal Requirements for a Core
Security Device
Need to avoid state
– Constant state tracking leaves us vulnerable to
DDoS attacks
Doesn’t rely on signatures
– If I get an attack with no signature, I cannot block it
– Possibly can use signature-like filters, however, after the fact
Doesn’t have to be in-line when it isn’t needed
Scales easily
Firewalls and IDS/’IPS’ don’t help!
It’s time to put the firewall and IDS/’IPS’ myth to rest!
Firewalls are policy-enforcement devices – they can’t help with DDoS, and in most cases, the policies applied to the firewalls have been devised with no visibility into network traffic, so the firewall rules bear little relation to what should actually be permitted and denied.
IDS/’IPS’ are by definition always behind the attackers – in order to have a signature for something, you must have seen it before.
IDS/’IPS’ have proven to be totally ineffective at dealing with application-layer compromises, which is how most hosts are botted and used for DDoS, spam, corporate espionage, identity theft, theft of intellectual property, etc.
Firewalls & IDS/’IPS’ output reams of syslog which lacks context, and which nobody analyzes. It is almost impossible to relate this syslog output to network behaviors.
End-customers subscribe to traditional managed security services based on firewalls and IDS/’IPS’, and still get compromised!
Firewall & IDS/’IPS’ deployments cause performance & usability problems, don’t scale, and shouldn’t be deployed in front of servers!
Core Design Philosophies
Scale by using traffic shunting
Core packet cleaning requirements
– 1) Validate incoming traffic to make sure it comes from
the source IPs that are in the SRC IP field of the packet
– 2) Evaluate these validated sources against a baseline
and then recommend either further processing or dropping for sources that misbehave
Don’t need to stop every bad packet—instead, focus on
not stopping any good packets
– Pad thresholds to reduce likelihood of false positives
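The second requirement and the padding advice can be illustrated with a simple threshold check. This is a sketch under stated assumptions, not how any mitigation product scores traffic: rates are packets per second per validated source, the baseline is a plain dictionary, and the function name, the default baseline, and the pad factor are all hypothetical.

```python
def misbehaving_sources(observed_pps, baseline_pps,
                        default_baseline=100.0, pad=2.0):
    """Return sources whose rate exceeds their padded baseline.

    pad > 1 widens the allowed band so that ordinary bursts from good
    sources are not flagged (fewer false positives, at the cost of
    letting some bad traffic through).
    """
    flagged = []
    for source, rate in observed_pps.items():
        allowed = baseline_pps.get(source, default_baseline) * pad
        if rate > allowed:
            flagged.append(source)
    return sorted(flagged)


if __name__ == "__main__":
    observed = {"198.51.100.7": 5000.0, "203.0.113.9": 150.0}
    baseline = {"203.0.113.9": 100.0}
    print(misbehaving_sources(observed, baseline))
```

With a pad of 2.0 the source running at 1.5 times its baseline is left alone, which matches the stated priority of not stopping good packets over stopping every bad one.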
Packet Cleaning Issues
Shunting the Packets
Traffic Shunts
Intercept and shunt traffic to the mitigation device (the “scrubber”)
Return good traffic back to the customer
Need to avoid forwarding loops—means some sort
Arbor DDoS Solution: Diversion/Offramping
[Figure: NetFlow is exported to Arbor Peakflow SP; an Arbor TMS sits alongside Protected Zone 1 (Web), Protected Zone 2 (Name Servers), and Protected Zone 3; arrows labeled “BGP Announcement”, “Traffic Destined to the Target”, and “Legitimate Traffic to Target”]
1. Detect
2. Activate: auto/manual
3. Divert only target’s traffic
4. Identify and filter the malicious
5. Forward the legitimate
6. Non-targeted traffic flows freely
Design Considerations
Network chokepoints
– Backhauling attack traffic across potentially costly or congested links
SLAs being offered
– Availability
– Guaranteed mitigation capacity
Provisioning and operation
– Simpler is better
Existing networking technology
– Often limited by what capabilities exist in the network today
– Number of mitigation devices and capacity
Deployment strategies
Distributed mitigation
– Regional deployment strategy
– Per PoP or per Peering center location
[Figure: distributed mitigation; a Peakflow SP TMS in each of POP A and POP B, attached through an L3 switch between the peering routers (P1, P2) and core routers (C1, C2), with switches S1 and S2, the core, and the Internet]
Distributed Mitigation
Benefits
Keeps attack traffic at edge
Limited backhauling of attack traffic
Limits exposure of internal infrastructure
Easier capacity planning
– Not as worried about how much attack traffic would have to be backhauled
Good shared mitigation
Drawbacks
Power and space requirements
Scalability: how much mitigation capacity can you add at each location?
Harder to dedicate mitigation capacity per customer
Potentially more equipment to purchase upfront
Potential to backhaul customer-to-customer attack traffic
Distributed Mitigation – Bottom line
This can be a workable strategy for Tier-Two, -Three, and MSOs
Strategically-consolidated Internet access points. Backbone capacity is already focused on these locations.
Customer-to-customer attacks are likely to be small in size. Not many large-bandwidth customers.
Backbone capacity and infrastructure likely to be vulnerable to large scale attacks. Better ROI to keep attack traffic off of backbone.
Fewer business customers likely to pay for dedicated mitigation capacity.
Deployment Strategies
Centralized mitigation
– AKA ‘Cleaning Center’
[Figure: centralized mitigation; a cleaning center in the data center (D1, D2) attached to the IP core, with peers and transit at the edge]
Centralized Mitigation
Benefits
Can start small
– 2 TMS devices for redundancy
Can be located where power and space allow easy growth
Possibly fits in more with other hosted service offerings
Easier to troubleshoot
– You know where traffic should go to and from
Drawbacks
Backhauls attack traffic
Potential for backbone infrastructure to be impacted by attack traffic
Must plan for capacity
– How much attack traffic could customers potentially pull to the Cleaning Center?
Limited topological/geographical diversity; regional Cleaning Centers are the answer
Centralized Mitigation – Bottom Line
This is a good strategy for most deployments
Very distributed Peering locations and limited or no purchased Transit.
Customer-to-customer attacks can be quite large.
– Think of an MSO-based zombie attacking a large bank.
Many large-bandwidth customers.
Backbone and Internet Data Center capacity readily
available.
Offramping/Diversion
Goal – Get attack traffic to correct TMS and Port
– BGP Next-hop Anycast
Get attack traffic to closest TMS
Load balance distributed attacks geographically
Built-in redundancy by leveraging routing protocols
– ECMP (Equal Cost Multi-Path)
Multiple equal routes pointing to same advertised TMS BGP next-hop IP to achieve multi-gigabit performance
‘CEF-based load-leveling’ is Cisco terminology
– SLA-based Next-hops or Communities
Provide dedicated mitigation ports and capacity to meet customer SLA
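The ECMP idea of spreading diverted traffic over several TMS ports can be illustrated with a flow hash. This sketch is not the hash any router uses (vendors apply their own algorithms in hardware); the SHA-256 choice, the function name, and the port names are assumptions made purely to show that each flow sticks to one port while many flows spread across all of them.

```python
import hashlib


def pick_port(src_ip: str, dst_ip: str, ports):
    """Map a source/destination pair onto one of several equal-cost ports."""
    key = f"{src_ip}->{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(ports)
    return ports[index]


if __name__ == "__main__":
    tms_ports = ["tms-port-1", "tms-port-2"]  # hypothetical port names
    victim = "172.16.61.1"
    counts = {port: 0 for port in tms_ports}
    for host in range(1, 201):
        counts[pick_port(f"198.51.100.{host}", victim, tms_ports)] += 1
    print(counts)
```

Because the split depends on the mix of flows, a single very large flow still lands on one port, which is worth keeping in mind when sizing per-port capacity.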
BGP Next-hop Anycast
[Figure: Peakflow SP TMS devices attached to the peering/core routers (P1, P2) and switches (S1, S2) in POP A and POP B, both facing the Internet]
1. TMS advertises off-ramp (victim) prefix with next hop of TMS L0 (virtual IP)
2. P1 and P2 do a recursive lookup on TMS L0, which matches a static route to the directly connected interface(s)
3. Since the victim prefix and next-hop are learned in both PoP-A and PoP-B, attack traffic is sent to
ECMP (Equal Cost Multi-Path)
[Figure: Peakflow SP TMS attached to the peering/core routers in POP B, with a direct off-ramp and a direct on-ramp toward the customer CPE; BGP announce]
1. TMS advertises off-ramp (victim) prefix with next hop of 1.1.1.1
2. Equal static routes to 1.1.1.1 (BGP next-hop) load balance the off-ramp across two or more ports
Note: You can also use port bonding (logical ports) to treat
SLA-based – Dedicated Mitigation Capacity
[Figure: Peakflow SP TMS attached to the peering/core routers in POP B]
1. TMS advertises off-ramp prefix with next hop of 1.1.1.1 and a customer-specific off-ramp community
2. Router’s BGP policy takes the customer-specific community and changes the next-hop to a dedicated point-to-point interface, or to a next-hop whose recursive lookup matches ECMP routes
Off-ramping/Diversion – Bottom Line
Use lessons learned from Sinkhole and Blackhole
usage. Keep it simple with next-hop changes or add granularity by utilizing BGP communities,
multiple next-hops and ECMP routes to next-hops.
To achieve multi-gigabit mitigation capability, traffic must transit multiple TMS ports.
Think about where routes are being heard and
how. Then run through failure scenarios. Is attack traffic still going to make it to a TMS or will it go to customer dirty?
On-Ramping/Re-injection
Goal – Avoid Routing loops
– GRE/mGRE Tunnels
Routing loop avoided by forwarding decision being performed on tunnel endpoint
– VRF VPN
Avoid routing loop by utilizing non-global route table
– L2 Forwarding
Routing loops avoided by selective route advertisement and distribution control. Requires hierarchical logical network design.
Tunnel – GRE (Generic Routing Encapsulation)
[Figure: Peakflow SP TMS attached to an L3 switch in the core of POP B, with a direct off-ramp, a direct on-ramp, BGP announce, and clean traffic returned to the customer CPE]
One-to-one mapping of off-ramp port to on-ramp port
TMS advertises victim prefix, which is propagated throughout the network.
To avoid a routing loop, TMS encapsulates the packet in a GRE packet with tunnel source and destination IP
Customer CPE processes the GRE packet on the tunnel interface and forwards the original packet to the victim IP
On-Ramping/Re-injection via GRE/mGRE
Benefits
Easy to avoid routing loops
Redundant tunnels can be configured
Proven on-ramping method
Drawbacks
Per-customer mitigation prefix configuration (not dynamic)
Provisioning engineers/systems must touch the Peakflow system
Per-TMS tunnel configuration (more work for distributed model)
VRF - VPN
[Figure: Peakflow SP TMS with a direct off-ramp and a direct on-ramp; clean traffic crosses VRFs over the MPLS/IP core to customer aggregation and the customer CPE; BGP announce]
One-to-one mapping of off-ramp port to on-ramp port
TMS advertises victim prefix, which is propagated throughout the network.
Routing loop is avoided because traffic is forwarded inside the VPN (label switched)
Static route to customer’s protected CIDR is preconfigured
On-Ramping/Re-injection via VRF-VPN
Benefits
– Easy to avoid routing loops
– Leverages built-in network redundancy
– Simple static route required for each “protected” customer prefix. Most provisioners know how to do this.
Drawbacks
– Per-customer mitigation prefix configuration
– MPLS must be used throughout network
– Multiple technologies in use could cause operational
Direct L2 On-Ramping/Re-injection
[Diagram: Peakflow SP, TMS, L3 Switch, CORE, POP B, Customer CPE; BGP announce, Direct Off-Ramp, Direct On-Ramp, Clean Traffic]
Off-ramp and on-ramp routers must be different L3 devices
To avoid a routing loop, restrict the more specific victim prefix to only specific routers
Customer CPE processes GRE packet on Tunnel interface and forwards original packet to victim IP
TMS advertises victim prefix, which is propagated throughout the network.
On-Ramping/Re-Injection via Direct L2
Benefits
– Leverages built-in network redundancy
– No special router configurations
– New “protected” customer prefixes can be dynamically on-ramped. No new configuration required.
Drawbacks
– Harder to avoid routing loops. Special care of BGP announcements and route distribution is required.
– Difficult to mitigate customer-to-customer attacks, especially if both customers are on the same router.
– Static in nature, not easily re-configured/scaled
Shunts in the Data Center
All devices on the same subnet
– Either TMS-driven or configured in router
– May use remotely-triggered shunt trick
– All traffic in core to target goes to the TMS
Optionally, you can use VLANs to avoid loops
– Bypassing the “modified” router is trivial with
Hosting/SP Data Center
[Diagram: Cisco IOS® Router, Cisco Catalyst® Switch, Firewalls, GEnet, Arbor TMS, Alert, Backbone, Backbone Switches, Arbor Peakflow SP]
Shunts in the Data Center
[Diagram: ISP A, ISP B, Dirty Traffic, Clean Traffic, BGP Peering for Diversion, Arbor Peakflow SP, Arbor TMSs (Cleaning, IDC Edge, IDC Edge]
Shunts in the IP Core: GRE Injection
Core routes target IP to the TMS
– Either TMS-driven or configured in router
– May use remotely triggered shunt trick
– All traffic in core to target goes to the TMS
Injection into GRE tunnel
– Bypassing the “modified” core routing
– GRE starts on TMS-attached router, terminates on
Shunts in the IP Core: GRE Injection
[Diagram: Target (1.1.1.1), TMS (2.2.2.2), Attack, GRE (Preconfigured)]
1. BGP: I’m next-hop for 1.1.1.1
2. Redistribution into Core
3. Rerouting to 1.1.1.1
4. Injection to Target
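Steps 1 to 3 work because a more specific route wins the forwarding lookup. The sketch below uses the slide's addresses (target 1.1.1.1, TMS 2.2.2.2); the covering /24 and the "customer-edge" next-hop name are assumptions added for illustration.

```python
import ipaddress


def best_route(rib, dst):
    """Longest-prefix match: return the next hop of the most specific
    route covering dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, next_hop) for net, next_hop in rib if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Normal operation: only the customer's covering prefix is present.
rib = [(ipaddress.ip_network("1.1.1.0/24"), "customer-edge")]
before = best_route(rib, "1.1.1.1")

# During mitigation: the TMS announces itself as next hop for the /32.
rib.append((ipaddress.ip_network("1.1.1.1/32"), "2.2.2.2"))
during = best_route(rib, "1.1.1.1")
neighbour = best_route(rib, "1.1.1.2")
```

Only traffic to the target is rerouted to the TMS; other hosts in the same prefix keep their normal path. Step 4 then needs the preconfigured GRE tunnel, since a plain lookup for 1.1.1.1 would lead straight back to the TMS.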
Shunts with MPLS VPNs
Easy to deploy:
– Core remains untouched, injection VPN preconfigured
– VPN invisible to core
No performance impact
No need to touch CPE
[Diagram: Target (1.1.1.1), TMS (2.2.2.2), Attack, MPLS VPN (Preconfigured)]
1. BGP: I’m next-hop for 1.1.1.1
2. Redistribution into Core
3. Rerouting to 1.1.1.1
4. Injection to VPN
MPLS VPN Shunt
[Diagram: VPN, VPN]
Packet Cleaning Issues
Packet Cleaning in the Core: The Cleaning Center
[Diagram: SP Core, Customer A, Customer B, Peering SP1, Peering SP2, PE, Core, ASBR, CE, Dirty Traffic, Arbor TMS (Customer A), Arbor TMS (Customer B), Arbor Peakflow SP, Out-of-Band WAN Connection, Cleaning Center, NOC, NetFlow]
Scaling a Cleaning Center: Clustering Topology with ECMP/CEF Load-leveling
Load-leveling router – up to 16 TMSes, 160 Gb/s with N7K
[Diagram: Attack, Load-Leveling Router, TMS Mitigation Cluster (TMS x5)]
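The load-leveling idea can be sketched as per-flow hashing across equal-cost paths, so that every packet of a flow reaches the same TMS. The SHA-256 hash here is a stand-in chosen for the sketch; it is not the hash a real CEF implementation uses, and the TMS names and flow tuple are hypothetical.

```python
import hashlib


def pick_tms(flow, tms_list):
    """Map a flow 5-tuple to one TMS deterministically, so all packets
    of the same flow take the same equal-cost path."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(tms_list)
    return tms_list[index]


def per_tms_share(cluster_gbps, members):
    """Average load per TMS if traffic is spread perfectly evenly."""
    return cluster_gbps / members


cluster = [f"tms{i}" for i in range(16)]
# (source IP, destination IP, protocol, source port, destination port)
flow = ("203.0.113.5", "1.1.1.1", 6, 40000, 80)
chosen = pick_tms(flow, cluster)
share = per_tms_share(160, 16)  # 10.0 Gb/s per TMS
```

With the slide's figures, 160 Gb/s over 16 paths works out to 10 Gb/s per TMS, and only if the hash spreads flows evenly; a single very large flow still lands on one TMS.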
Backbone Option: Cleaning Centers
Question is: how many?
– Most national providers have decided to start with two
– Geographic redundancy
– Adequate incoming bandwidth in key locations
– Limit the backhaul of traffic across expensive links
Packet Spoofing
What can be spoofed?
– Any field in a packet header (well, almost)
– Spoofing most often happens in combinations, with several fields being spoofed
Spoofing is used to:
– Hide the source so the attacker or resource is not revealed
TMS Mitigation Processing
DDoS attacks consist of undesirable traffic mixed in with some amount of desirable traffic
– Undesirable traffic may come in large quantities or it could
come shaped in a way designed to disrupt normal processing
The TMS allows desirable traffic through while lowering the impact of undesirable traffic
The TMS uses various countermeasures – defense
mechanisms – to target and remove the most egregious attack traffic to allow the network to continue operating
– Different countermeasures are designed to stop different types
of attack traffic
– The countermeasures as a whole provide defense in depth
DDoS Attacks
DDoS attacks can consist of just about anything
– Large quantities of raw traffic designed to overwhelm a
resource or infrastructure
– Application specific traffic designed to overwhelm a particular
service – sometimes stealthy in nature
– Traffic formatted in such a way to disrupt a host from normal
processing
– Traffic reflected and/or amplified through legitimate hosts
– Traffic from compromised sources or from spoofed IP addresses
– Pulsed attacks – start/stop attacks
TCP Stack Flood Attacks
Description
– Flood a certain aspect of the TCP connection
process to keep the host from being able to respond to legitimate connections
– May be spoofed or non spoofed
Peakflow SP Detection Capabilities
– Misuse TCP SYN, RST, Total Traffic detection
Peakflow TMS Mitigation Countermeasures
– TCP SYN authentication, zombie army, white list/black list
Common names
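A simple threshold check gives a feel for rate-based SYN misuse detection. This is a generic sketch, not the Peakflow SP algorithm: the packet tuples, window and threshold are hypothetical.

```python
from collections import Counter

TCP_SYN = 0x02
TCP_ACK = 0x10


def syn_rate_alerts(packets, window_seconds, threshold_pps):
    """Return destinations whose rate of initial SYNs (SYN set, ACK clear)
    over the window exceeds the threshold.

    packets: iterable of (destination_ip, tcp_flags) seen in the window.
    """
    syn_counts = Counter()
    for dst, flags in packets:
        if flags & TCP_SYN and not flags & TCP_ACK:
            syn_counts[dst] += 1
    return sorted(
        dst
        for dst, count in syn_counts.items()
        if count / window_seconds > threshold_pps
    )


# One second of traffic: 500 SYNs and 10 SYN-ACKs to one host,
# 5 SYNs to another.
sample = (
    [("1.1.1.1", 0x02)] * 500
    + [("1.1.1.1", 0x12)] * 10
    + [("1.1.1.2", 0x02)] * 5
)
alerts = syn_rate_alerts(sample, window_seconds=1, threshold_pps=100)
```

A fixed threshold like this cannot tell a flash crowd from a flood, which is why detection is paired with countermeasures such as TCP SYN authentication.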
Generic Flood Attacks
Description
– Flood of traffic for one or more protocols or ports
– Designed to look like normal traffic
– Reflection attacks
– May be spoofed or non spoofed
Peakflow SP Detection Capabilities
– Misuse UDP, ICMP, Total Traffic detection
– Profiled anomaly detection for managed object
Peakflow TMS Mitigation Countermeasures
– White list/black list, zombie army, baseline enforcement, rate limiting, payload filtering
Common names
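Rate limiting, one of the countermeasures listed above, is commonly built on a token bucket. The sketch below is the generic technique, not the TMS implementation; the rate and burst values are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow up to rate_pps packets per second with a burst allowance."""

    def __init__(self, rate_pps, burst):
        self.rate = float(rate_pps)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if a packet arriving at time now (seconds) may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_pps=10, burst=5)
burst_results = [bucket.allow(0.0) for _ in range(7)]
later_result = bucket.allow(0.5)
```

Seven packets arriving at the same instant see the first five pass and the last two dropped; half a second later the bucket has refilled and traffic passes again.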
Fragmentation Attacks
Description
– A flood of TCP or UDP fragments is sent to a victim, overwhelming the victim’s ability to reassemble the streams and severely reducing performance
– Fragments may also be malformed in some way
– May be a result of a network misconfiguration
Peakflow SP Detection Capabilities
– Misuse IP Fragment detection
Peakflow TMS Mitigation Capabilities
– White list/black list, zombie army
Common names
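Whether a packet is a fragment can be read from the flags and fragment-offset field of the IPv4 header. A minimal sketch, assuming the caller passes at least the first 20 bytes of the header:

```python
import struct

MORE_FRAGMENTS = 0x2000        # MF bit in the flags/offset field
FRAGMENT_OFFSET_MASK = 0x1FFF  # low 13 bits: offset in 8-byte units


def is_fragment(ipv4_header: bytes) -> bool:
    """True if the packet is any fragment of a fragmented datagram:
    either more fragments follow, or it sits at a non-zero offset."""
    (flags_offset,) = struct.unpack("!H", ipv4_header[6:8])
    more_fragments = bool(flags_offset & MORE_FRAGMENTS)
    offset = flags_offset & FRAGMENT_OFFSET_MASK
    return more_fragments or offset > 0
```

A packet with only the Don't Fragment bit set is not a fragment. Counting packets for which this returns True, per destination, is one simple input to fragment misuse detection.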
Application Attacks
Description
– Attacks designed to overwhelm components of specific applications
– Commonly seen against HTTP, DNS and SIP in particular
– May be stealthy by mixing with a much higher traffic volume on the
same protocol/port
Peakflow SP Detection Capabilities
– Requires TMS systems deployed in span mode doing appID and
feeding SP systems with application level managed objects defined.
Peakflow TMS Mitigation Countermeasures
– HTTP malformed, HTTP rate limiting, HTTP payload filtering, SIP
malformed, SIP request rate limiting, DNS authentication, DNS malformed, Payload regex filtering, Regex filtering
Common names
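Payload regex filtering can be sketched with Python's standard `re` module applied to raw payload bytes. The pattern shown is a made-up attack signature for illustration only; it does not describe any real attack or TMS configuration syntax.

```python
import re


def build_payload_filter(patterns):
    """Compile byte-string regexes; the returned function reports whether
    a payload matches any of them and should be dropped."""
    compiled = [re.compile(pattern) for pattern in patterns]

    def should_drop(payload: bytes) -> bool:
        return any(regex.search(payload) for regex in compiled)

    return should_drop


# Hypothetical signature: GET requests for "/?" followed by 32 hex digits.
should_drop = build_payload_filter(
    [rb"^GET /\?[0-9a-f]{32} HTTP/1\.[01]\r\n"]
)

attack = b"GET /?" + b"a" * 32 + b" HTTP/1.1\r\nHost: example\r\n\r\n"
normal = b"GET /index.html HTTP/1.1\r\nHost: example\r\n\r\n"
```

Here `should_drop(attack)` is true and `should_drop(normal)` is false. The filter is only as good as the signature: it needs a known, stable pattern in the attack payload.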
Connection Attacks
Description
– Attacks that maintain a large number of either half-open TCP connections or fully open idle connections with the victim, impeding new connections from forming
Peakflow SP Detection Capabilities
– Limited
Peakflow TMS Mitigation Capabilities
– TCP SYN authentication, TCP Idle Reset
Common names
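The idea behind an idle-reset countermeasure can be sketched as tracking the last activity per connection and selecting those idle past a timeout. This is a generic illustration, not the TMS implementation; the connection keys and the 30-second timeout are hypothetical, and sending the actual TCP RST is not shown.

```python
def idle_connections(last_seen, now, idle_timeout):
    """Return connections with no activity for at least idle_timeout
    seconds; these are the candidates for a reset.

    last_seen: dict mapping a connection key to its last-activity time.
    """
    return sorted(
        conn
        for conn, timestamp in last_seen.items()
        if now - timestamp >= idle_timeout
    )


# Keys are (client IP, client port); times are seconds.
last_seen = {
    ("203.0.113.5", 40000): 0.0,   # silent since it opened
    ("203.0.113.6", 40001): 55.0,  # recently active
}
to_reset = idle_connections(last_seen, now=60.0, idle_timeout=30.0)
```

Only the connection that has been silent for 60 seconds is selected; the recently active one is left alone.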
Vulnerability Exploit Attacks
Description
– Attacks designed to exploit a vulnerability in the victim’s
operating system
– Some are single packet or very low level attacks
– Many of these are obsolete in modern operating systems
Peakflow SP Detection Capabilities
– Limited: ATF based fingerprint detection
Peakflow TMS Mitigation Capabilities
– Limited: Malformed HTTP, SIP, DNS, White list/blacklist
– The most effective method of stopping these attacks is to patch
hosts on your network
Common names
The Core Functional Components of an
Anti-DDoS Packet Scrubber:
Putting All This Together to Stop DDoS
Destination detection
Source verification (via anti-spoofing)
Source detection (via anomalies)
Source/attack blocking/filtering
Packet Cleaning via Shunts
Advantages:
– Not on critical path during normal operation
– Anomaly-based detection with baselining
– Optimized for high-performance blocking
– Is resistant to state limitations of most other devices
Limitations:
– Not designed to stop single-packet exploits, but regexp countermeasure can be used if the exploit packet signature is known
– Inherent assumption of a “destination” to be protected, i.e., a server
– Resource utilization: finite resources in the scrubber complex (160 Gb/s max per cluster with Cisco CEF-based load-leveling of 16 paths with N7K; multiple clusters via IPv4 anycast addressing)