Copyright © 2013 Splunk, Inc.
Using Big Data to
Align IT Security with
Business Risk
Mark Seward, Senior Director,
Security and Compliance
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be
incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, the engine for machine data, is a registered trademark or trademark of Splunk Inc. and/or its subsidiaries and/or affiliates in the United States and/or other jurisdictions. All other brand names, product names or trademarks belong to their respective holders.
A little about Splunk
• Over 5200 paying customers
• Over 300 free apps and templates
• Over 2300 security use case customers
• Half of all Fortune 100 companies are customers
• Named #1 in Big Data and #4 Most Innovative Company in the world by Fast Company Magazine
According to the Verizon Data Breach Investigations Report
(DBIR), 92% of breaches are made public by someone other
than the one who’s been breached.
An indictment of SIEM technologies - only 1% of breaches
are detected by log analysis.
The Way Cyber Adversaries Think
Where is the most important and valuable data?
What are the typical security defenses?
What structural information silos
exist for the security team?
What’s the typical patch cycle for applications and operating systems?
How does the IT team prioritize
vulnerabilities?
Are ‘normal’ IT service user activities routinely monitored
and correlated? Who in the organization has
access to the most valuable data and credentials I can
Attack Vectors Have Gotten Personal
50% of attacks are based on compromised passwords.
Where’s evidence of the attack?
• In log data most companies already have
• Many of these companies had a SIEM
• “So if I have log data and a SIEM, why am I still breached?”
The Way Some Security Folks Think
“I hope my AV, IPS,
Firewall, (name your
technology) vendor
catches these guys.”
“I have 300 rules
on my SIEM. One
of them will catch
the attacker.”
Attackers know that if you have a static correlation engine, you are
likely trusting it, because often "no news is good news."
Why the Disconnect Between Attacker and Security
Professional?
• Vendors have convinced security folks -- the reactive approach is the only
approach
• Solutions don’t reinforce skills -- SIEMs don’t nourish the security person’s
‘inner hacker’
• Security people have been taught a ‘data reduction’ strategy that
compromises data fidelity and limits investigations
• Current SIEMs can’t accept enough data for long term pattern analysis
Security has outgrown the traditional SIEM
Security Relevant Data
SIEM
Security Relevant Data
(IT infrastructure logs / Physical Security
/ Communication systems logs /
Application data / non-traditional data
sources)
“Normal” user and machine
generated data – credentialed
activities – ‘unknown threats’ are
behavior based – require analytics
‘Known’ events as seen by current
security architecture – vendor
supplied – hampered by lack of
Current Architecture Issues
Traditional SIEM
Data Reduction Model
Correlation Reporting Must Fit a Schema Selective Supported Raw Data
Data Discard Leak Mostly traditional
The amount of data generated only gets bigger
Volume | Velocity | Variety | Variability
GPS,
RFID,
Hypervisor,
Web Servers,
Email, Messaging
Clickstreams, Mobile,
Telephony, IVR, Databases,
Sensors, Telematics, Storage,
Servers, Security Devices, Desktops
Fastest growing, most
complex, most valuable area
of big data
Stuxnet and Zeus
mark the beginning
of industrial,
vertical attacks
coming from
machine data
What Does Machine Data Look Like?
Sources: Community, Care IVR, Middleware Error, Patient Portal
Fields highlighted across sources: Patient ID, Community ID, Time Waiting On Hold
The ‘Love Child of Google and Excel’
Time Index Ingestion
Text Base Search
Nested Search
Cross Data-type Search
Append
Abstract
Cluster
Bucket
Multikv
Scrub
Join
Rare
Cluster
Associate
Stats
AVG
Transaction
Addtotals
Delta
Eval
Stddev
Rare
Outlier
Streamstats
Timechart
Over 150 data manipulation and visualization commands
Moving to a data inclusion model
Data Inclusion Model
Specific behavior based pattern
modeling for humans and machines
Based on combinations of:
• Location
• Role
• Data/Asset type
• Data/Asset criticality
• Time of day
• Action type
• Action length of time
No up-front normalization | Time-indexed Data | Analytics and Statistics Commands | Correlation | Pattern Analysis
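One way to picture behavior-based pattern modeling over the combinations listed above is to count how often each combination occurs and surface the rare ones. The Python sketch below is illustrative only: the field names (location, role, asset_type, hour_band, action), the 5% cutoff, and the sample events are assumptions, not part of the presentation or of any Splunk API.

```python
from collections import Counter

# Hypothetical field names, loosely following the dimensions on the slide.
FIELDS = ("location", "role", "asset_type", "hour_band", "action")

def rare_combinations(events, max_share=0.05):
    """Return combinations of FIELDS that make up at most max_share of all events."""
    counts = Counter(tuple(event[f] for f in FIELDS) for event in events)
    total = sum(counts.values())
    return {combo: n for combo, n in counts.items() if n / total <= max_share}

# Made-up events: two common behaviors and one unusual one.
events = (
    [{"location": "HQ", "role": "analyst", "asset_type": "crm",
      "hour_band": "day", "action": "read"}] * 40
    + [{"location": "HQ", "role": "admin", "asset_type": "payroll",
        "hour_band": "day", "action": "read"}] * 9
    + [{"location": "remote", "role": "analyst", "asset_type": "payroll",
        "hour_band": "night", "action": "export"}]
)

flagged = rare_combinations(events)
for combo, n in flagged.items():
    print(n, dict(zip(FIELDS, combo)))
```

Because nothing is normalized or discarded up front, the same raw events can be recounted later with a different set of fields.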
New way of thinking:
The big data security process
“The true sign of intelligence is not knowledge but
imagination.”
A. Einstein
The new weapons of a security warrior
+
+
Creativity
Using Statistical Analysis
Action | Phase | Source | Search Type | Splunk Search | Why
SQL Injection | Infiltration | Web Logs | Outlier and exception | len(_raw) +2.5stddev | Hacker puts SQL commands in the URL; URL length is standard deviations higher than normal
Password Brute Force | Infiltration | Auth Logs | Outlier and exception | short delta _time | Automated password guessing tools enter credentials much faster than humanly possible
DNS Exfil | Exfiltration | DNS Logs / FW Logs | Outlier and exception | count +2.5stddev | Hackers exfiltrate data in DNS packets; a single IP makes standard deviations more DNS requests than normal
Web Crawling | Reconnaissance | Web / FTP Logs | Outlier and exception | count(src_ip) +2.5stddev | Web crawlers (copying the web site for comments, passwords, email addresses, etc.) will be the source IP behind a number of page requests standard deviations higher than normal
Port Knocking | Exfil / CnC | Firewall | Outlier and exception | count outbound (deny) by ip | Threat does an inside-out port scan to identify exfiltration paths
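The two recurring ideas in the table, "+2.5stddev" and "short delta _time", can be sketched outside of Splunk. The Python below is a minimal illustration and not the Splunk searches themselves: the function names, the one-second gap, and the sample data are assumptions made for the example.

```python
from statistics import mean, pstdev

def stddev_outliers(values_by_key, threshold=2.5):
    """Return the entries whose value exceeds mean + threshold * stddev."""
    values = list(values_by_key.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over all keys
    if sigma == 0:
        return {}  # every key behaves identically, so nothing stands out
    cutoff = mu + threshold * sigma
    return {key: v for key, v in values_by_key.items() if v > cutoff}

def rapid_attempts(timestamps, min_gap_seconds=1.0):
    """Count consecutive events that arrive closer together than min_gap_seconds."""
    ordered = sorted(timestamps)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a < min_gap_seconds)

# Made-up page-request counts: twenty ordinary clients and one crawler.
requests_by_ip = {f"10.0.0.{i}": 100 for i in range(1, 21)}
requests_by_ip["203.0.113.9"] = 5000
crawlers = stddev_outliers(requests_by_ip)
print(crawlers)

# Made-up failed-login times in seconds: four guesses in under a second.
login_times = [0.0, 0.2, 0.4, 0.6, 30.0]
print(rapid_attempts(login_times))
```

The same outlier function applies to URL lengths or DNS request counts; only the value being measured changes.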
A Process for Using Big Data for Security:
Identify the Business Issue
• What does the business care about?
• What could cause loss of service or financial harm?
• Performance degradation
• Unplanned outages (security related)
• Intellectual property access
• Data theft
A Process for Using Big Data for Security:
Construct a Hypothesis
• How could someone gain access to data that should be kept private?
• What could cause a mass system outage that the business cares about?
• What could cause performance degradation resulting in an increase in customer dissatisfaction?
A Process for Using Big Data for Security:
It’s about the Data
• Where might our problem be in
evidence?
• For data theft start with
unauthorized access issues…
• Facility access data, VPN, AD,
Wireless, Applications, others…
• Beg, Borrow, SME from system
A Process for Using Big Data for Security:
Data Analysis
► For data theft start with what’s normal and what’s not (create a statistical model)
► How do we ‘normally’ behave?
► What patterns would we see to identify outliers?
► Patterns based on ToD, Length of time, who, organizational role, IP geo-lookups, the order in which things happen, how often a thing normally happens, etc.
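A simple statistical model of "how often a thing normally happens" compares today's activity for one user against that same user's history. The sketch below assumes daily access counts are already available; the function name, the sample numbers, and the 2.5 cutoff (borrowed from the statistical analysis slide) are illustrative assumptions.

```python
from statistics import mean, stdev

def deviation_from_own_baseline(history, today):
    """Return how many standard deviations today's count sits from the user's own mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two days
    if sigma == 0:
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma

# Made-up daily counts of sensitive-file reads for one user.
history = [10, 12, 11, 9, 13, 10, 12]

for today in (12, 40):
    z = deviation_from_own_baseline(history, today)
    label = "outlier" if z > 2.5 else "normal"
    print(today, round(z, 2), label)
```

Per-user baselines like this complement peer comparisons: a count that is ordinary for an administrator can still be far outside normal for a particular analyst.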
A Process for Using Big Data for Security:
Interpret and Identify
• What are the mitigating factors?
• Does the end of the quarter cause
increased access to financial data?
• Does our statistical model need to
change due to network architecture
changes, employee growth, etc.?
• Can we gather vacation information to
know when it is appropriate for HPA
users to access data from foreign soil?
• What are the changes in attack patterns?
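A mitigating factor such as the end of the quarter can be folded into the model by relaxing the alert threshold during that window. The sketch below assumes calendar quarters (a fiscal calendar may differ), a five-day window, and hypothetical thresholds of 2.5 and 4.0 standard deviations; none of these values come from the presentation.

```python
from datetime import date, timedelta

def in_quarter_end_window(day, window_days=5):
    """True if `day` falls within the last `window_days` days of a calendar quarter."""
    quarter_last_month = ((day.month - 1) // 3 + 1) * 3
    if quarter_last_month == 12:
        next_quarter_start = date(day.year + 1, 1, 1)
    else:
        next_quarter_start = date(day.year, quarter_last_month + 1, 1)
    quarter_last_day = next_quarter_start - timedelta(days=1)
    return (quarter_last_day - day).days < window_days

def alert_threshold(day, normal=2.5, quarter_end=4.0):
    """Pick the stddev threshold for financial-data access on a given day."""
    return quarter_end if in_quarter_end_window(day) else normal

for day in (date(2013, 2, 15), date(2013, 3, 29)):
    print(day, alert_threshold(day))
```

The same pattern could cover other known factors, such as approved travel for privileged users, by swapping the date check for a lookup against vacation data.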
Short form - Example
The Steps | The Response
Business issue | Service degradation causes monetary damage and customer satisfaction issues.
Construct one or more hypotheses (team creativity required) | Unwanted bots can degrade service and steal content.
Gather data sources and expertise | What combinations of data would be considered definitive evidence? What might be the first signs of trouble? List all data in which this might be reflected.
Determine the analysis to be performed | Determine the types of data searches appropriate and automation requirements.
Interpret the results | Do the results represent false positives or false negatives? Are there good bots and bad bots?
Big Data Platform: Insight for Business Risk
LDAP, AD Watch Lists
App Monitoring Data
Security Data
IT Operations Data
Business
Process
Data
Distribution
System
Data
Business Analytics Web Intelligence IT Operations Management
Security & Compliance
Business Risk and Security
Looking Beyond IT for Business Risk
HVAC data
Personnel Data
Industrial Control System
Data
Facility Security Data
Manufacturing
Shipping Data (when/who
loaded the truck)
Parts/Ingredients (RFID)
Data
Raw Materials Data
Distribution Monitoring
What manufacturing questions could you ask of Splunk?
Who is accessing company data from outside the company but is sitting at their desk?
Is the product quality compromised due to an increase in ambient temperature in the plant?
What pattern of user activity did we see before
they attacked the website?
Who is stealing company property?
Are the large file exchanges between these
two employees normal?
What’s the real-time ongoing drop off rate in sales after a specific sales
promotion ends?
Are there employees that surf to the same website at exactly the same time every
day?
Are terms like virus, bot, and Anonymous trending upward?
Why Big Data and Analytics are the Future of Security
• Confidentiality, Integrity and Availability: a holistic view of business security and risk
mitigation, growing beyond traditional IT data sources
• Security is being redefined: monitoring and mitigating threats that compromise business
reputation, service delivery, or confidential data, or result in loss of intellectual property
• Security folks need more data – not less – for accurate root cause analysis
• Complexity of threats will continue to grow and cross from IT to less traditional data /
devices / sources
• A single investigation will include data from all parts of the business – beyond IT data
• Using statistical analysis for Base-lining and understanding outliers is the way to detect
Thank You
Questions?