© 2014 Interset, a FileTrek Company
What’s Behind “Big Data”
and “Behavioral Analytics”
STEPHAN JOU, CTO
ISSA TORONTO
Hey. I’m Stephan Jou
• CTO at Interset
• Previously: IBM’s Business Analytics CTO Office
• Big data analytics, visualization, cloud, predictive analytics, data mining, neural networks, mobile, dashboarding and semantic search
• M.Sc. in Computational Neuroscience and Biomedical Engineering, and a dual B.Sc. in Computer Science and Human Physiology, all from the University of Toronto
• Email: [email protected]
• Twitter: @eeksock
Threat
Detection
(Insider and Compromised Machine Attack)
Through the Science of
Behavioral Analytics
Catching Bad Guys With Math
Lessons:
• There were limited systems in place and we still do not know all that he took
• His actions were highly anomalous
- Volumes of data
- Access to improper accounts
- Usage of USB storage devices
There was plenty of evidence and time, if only it was visible!
Who Is This?
Who Are These Two?
Lessons:
• Disgruntled insider employees can be a risk
• What were the anomalies?
– Copied 16,000 documents within five days of receiving severance
There was plenty of evidence and time, if only it was visible!
And This Guy?
Lessons:
• Most attacks are from users/identities with proper access
• Attacker stayed under the radar for years
• Third parties (US Intelligence) most often uncover the attack
• What were the anomalies?
– Accessing data not related to his job
– Moving data, over time, in ways that same-role users were not
– Money problems
There was plenty of evidence and time, if only it was visible!
And these guys?
Lessons:
• Make sure your partners are secure
– Hacked (SQL injection) a partner with a weak network
– Stole user names and passwords
• Identities & machines are “entities”
– They acted in highly anomalous ways
– Moved large amounts of data
– Moved data to exfiltration points
– At four companies and the US Army!
There was plenty of evidence and time, if only it was visible!
“if we do this right, we will make a million dollars each…” “we could have already sold them for Bitcoins which would have been untraceable if we did it right. It could have
How Do You Catch the Authorized User?
75% of material loss via insiders with approved access
In 70% of IP theft cases, insiders steal information within 30 days of announcing their resignations
62% of employees believe it acceptable to transfer work documents to personal devices or cloud-based file sharing services, even if a company policy prohibits it
60% of employees believe information they had been involved in developing is theirs regardless of the IP protection policy of the company
51% of employees say their company does not strictly enforce policies, so they feel it is more than OK to take corporate data
20% of loss involved collaboration with one or more employees
Source: Symantec & 2011 Cyber Watch Survey, Carnegie Mellon University CERT Program
Kung Fu Move #1: Big Data
Source: OliverMunday.com
The Four V’s of Big Data (Sorry)
[Figure labels: Volume, Velocity, Variety, Veracity; Transactional, Machine, Social, Reputation]
Kung Fu Move #2: Math
[Figure: analytics capability chart. Source: Competing on Analytics, Davenport and Harris, 2007]
Groupings: Traditional, New Data, New Methods
Capabilities: Standard Reporting; Ad hoc Reporting; Query/Drill Down; Alerts; Forecasting; Simulation; Predictive Modeling; Optimization; Optimization under Uncertainty; Adaptive Analysis; Continual Analysis; Entity Resolution; Annotation and Tokenization; Relationship, Feature Extraction
Annotations: In memory data, fuzzy search, geo spatial; Causality, probabilistic, confidence levels; High fidelity, games, data farming; Larger data sets, nonlinear regression; Rules/triggers, context sensitive, complex events; Query by example, user defined reports; Real time, visualizations, user interaction; Decision complexity, solution speed; Quantifying or mitigating risk; Responding to local change/feedback; Responding to context; People, roles, locations, things; Rules, semantic inferencing, matching; Automated, crowd sourced
Venn Diagram of Data Science
Source: Drew Conway,
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Hacking – meaning computer science skills
The problem – if you choose the wrong math, you will have false positives and an ineffective system
Standard Thresholds Approach
A Pattern for Increased Monitoring for Intellectual Property Theft by Departing Insiders, Andrew Moore et al., Carnegie Mellon, 2011
Behavioral Analytics – A simple example
User: Edward Snowden was a contractor, a sysadmin with privileged access
Activity: The volume of copying is large, compared to Snowden’s past 30 days, and compared to other analysts
Asset: These files have a high risk and importance value
Method: USB drives are marked as high-risk channels
Edward Snowden is copying an unusually large number of sensitive files to an external USB drive.
Use Appropriate Math to Assemble the Data
• Risk scores are percentages between 0% (no risk) and 100% (extreme risk)
• P(event | y) is the probability that the behavior occurred, either observed or predicted
• Aggregate risk values combine risks associated with the activity, people, assets and end points
• Model based on Expected Utility Theory and standard risk model (Risk = Probability * Impact)
• Mathematical weighting is used to tune and train the model for specific activities, people, assets and end points on a per-behavior pattern basis
Terms: Activity (y), User (u), File (f), Method (m)

$$R_{\text{behavior}} = P(\text{event} \mid y) \times w_y \times \frac{w_u \sum_{u \in U} 2^{-i} R_u[i] + w_f \sum_{f \in F} 2^{-j} R_f[j] + w_m \sum_{m \in M} 2^{-k} R_m[k]}{w_u + w_f + w_m}$$
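A minimal Python sketch of the aggregate risk formula on this slide may help make the weighting concrete. The slide does not say how the indices i, j and k are assigned, so this sketch assumes they rank each entity's risk from highest to lowest, starting at 1, which keeps every sum below 1. The function names, the default weights and the sample values are illustrative assumptions, not Interset's implementation.

```python
def decayed_sum(risks):
    """Sum risks with weights 2**-i, highest risk first.

    Assumption: i = 1, 2, ... is the rank of each entity's risk (descending).
    """
    ordered = sorted(risks, reverse=True)
    return sum(r * 2.0 ** -(rank + 1) for rank, r in enumerate(ordered))


def behavior_risk(p_event, w_y, user_risks, file_risks, method_risks,
                  w_u=1.0, w_f=1.0, w_m=1.0):
    """R_behavior = P(event | y) * w_y * weighted average of the entity sums.

    All risks and probabilities are expected to lie in [0, 1].
    """
    weighted = (w_u * decayed_sum(user_risks)
                + w_f * decayed_sum(file_risks)
                + w_m * decayed_sum(method_risks))
    return p_event * w_y * weighted / (w_u + w_f + w_m)


# Hypothetical inputs: one risky user, two files, one high-risk channel.
score = behavior_risk(
    p_event=0.9, w_y=1.0,
    user_risks=[0.8], file_risks=[0.9, 0.5], method_risks=[1.0],
)
print(f"behavior risk: {score:.1%}")
```

Under these assumptions the geometric weights mean that the riskiest user, file or method dominates its term, while additional entities add progressively less.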
Important Questions
Where is my important, at risk stuff?
Who or what is behaving abnormally?
Who is stealing my stuff?
Who is going to leave the company?
Some Simple Anomaly Models
Where is my important, at risk stuff?
§ Riskiest Files
Who or what is behaving abnormally?
§ Person Name is accessing information during unusual working hours.
§ Person Name accessed a storage volume, path, an unusually large number of times
§ Person Name accessed an important file type an unusually large number of times
Who is going to steal my stuff?
§ Riskiest Users
§ Person Name accessed an abnormally large amount of data.
§ Person Name performed an abnormally large number of file exits.
Who is going to leave the company?
More Sophisticated Anomaly Models
Where is my important, at risk stuff?
§ Highest at-risk machines, file shares, and source code repositories
§ The file, Filename, is highly valuable compared to similar files.
§ The following source code projects are most at-risk.
§ Similar users visualization
§ Similar files visualization
§ Similar machines visualization
Who or what is behaving abnormally?
§ Person Name is using an unexpected file, filename.
§ Person Name is touching an unexpected set of files.
§ Person Name is consistently accessing higher amounts of data than similar users.
§ Person Name is consistently accessing an important file type more than similar users.
§ Person Name is accessing information during different working times compared to similar users.
§ An application accessed an unexpected file type.
Who is going to steal my stuff?
§ Person Name has accessed an unusual amount of total file value.
§ Person Name is consistently performing more file exits than similar users.
§ Person Name's amount of file exits varies more than similar users.
§ Person Name has replicated a large amount of source code
Who is going to leave the company?
§ Person Name is hoarding an unusual amount of source code.
§ Person Name has been accessing unexpected source code repositories
§ Person Name is engaging in job search activities.
§ The proportion of time spent by Person Name on non-work activities has changed.
§ Person Name has emailed themselves.
Computing Probability of an Anomalous Event
§ Each term in the aggregate behavior risk equation has analytics behind it
§ Highly anomalous activities, compared to baseline, should result in a high value
§ How to compute the probability of an anomalous event?
$$R_{\text{behavior}} = P(\text{event} \mid y) \times w_y \times \frac{w_u \sum_{u \in U} 2^{-i} R_u[i] + w_f \sum_{f \in F} 2^{-j} R_f[j] + w_m \sum_{m \in M} 2^{-k} R_m[k]}{w_u + w_f + w_m}$$
Model: Unusual volumes
• Computes probability that a value in a given hour is anomalous
- Bayesian approach
• Explicitly models both normal and abnormal distributions
- Gaussian, Gamma
• Estimators for both normal and abnormal based on observation
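The Bayesian idea on this slide can be sketched in a few lines of Python. The slide names Gaussian and Gamma distributions but does not say which one models which class, so this sketch assumes a Gaussian for normal hourly volumes and a Gamma for abnormal ones. The prior of 1%, the method-of-moments Gamma fit and the sample volumes are likewise assumptions made for illustration.

```python
import math
import statistics


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def fit_gaussian(values):
    """Estimate (mean, std) from observed normal volumes."""
    return statistics.mean(values), statistics.pstdev(values)


def fit_gamma(values):
    """Estimate (shape, scale) by the method of moments."""
    mean = statistics.mean(values)
    var = statistics.pvariance(values)
    return mean * mean / var, var / mean


def p_anomalous(x, normal, abnormal, prior=0.01):
    """Posterior P(abnormal | x) via Bayes' rule; `prior` is P(abnormal)."""
    num = prior * gamma_pdf(x, *abnormal)
    den = num + (1.0 - prior) * gaussian_pdf(x, *normal)
    return num / den if den > 0 else 0.0


# Hypothetical files-copied-per-hour observations.
normal = fit_gaussian([40, 50, 60, 45, 55])
abnormal = fit_gamma([300, 500, 800])
for volume in (50, 400):
    print(volume, round(p_anomalous(volume, normal, abnormal), 4))
```

Modeling the abnormal class explicitly, rather than only thresholding on the normal one, is what lets the score come out as a probability instead of a hard yes/no.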
Example: Modeling unusual times
• Monitor, for each user, start times of when a file or window is brought into focus
• Active times used as input into Gaussian kernel density estimators
• Times that contain 95% of activity deemed to be “normal”
• P(y is bad) at a given time is ratio of expected activity to 95% activity line
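One way to read the bullets above is sketched below in plain Python. The slide does not spell out the exact ratio, so this sketch assumes that the "95% activity line" is the density level whose highest-density region holds 95% of a user's activity, and that the anomaly score is 1 minus the ratio of the estimated density to that level, clipped at 0. The wrap-around at midnight, the one-hour bandwidth and the sample focus times are also assumptions.

```python
import math


def circular_kde(samples, bandwidth=1.0):
    """Gaussian kernel density over hour-of-day, wrapping around at 24h."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(t):
        total = 0.0
        for s in samples:
            d = abs(t - s) % 24.0
            d = min(d, 24.0 - d)  # shortest distance around the clock
            total += math.exp(-0.5 * (d / bandwidth) ** 2)
        return total / norm

    return density


def level_for_mass(density, mass=0.95):
    """Density level whose highest-density region holds `mass` of activity."""
    grid = [m / 60.0 for m in range(0, 24 * 60, 5)]  # every 5 minutes
    values = sorted((density(t) for t in grid), reverse=True)
    target = mass * sum(values)
    running = 0.0
    for v in values:
        running += v
        if running >= target:
            return v
    return values[-1]


def p_unusual_time(t, density, level):
    """0 inside the 'normal' region, approaching 1 where activity is never seen."""
    return max(0.0, 1.0 - density(t) / level)


# Hypothetical focus-start times (hours) for one user.
focus_hours = [9.0, 9.2, 9.5, 10.0, 10.5, 11.0, 13.0, 14.0, 15.0, 16.0, 16.5, 17.0]
density = circular_kde(focus_hours)
level = level_for_mass(density)
for hour in (10.0, 3.0):
    print(hour, round(p_unusual_time(hour, density, level), 3))
```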
Model: Unusual Working Days
User 1
• Regularly works six days a week (takes Sundays off)
• Slight dip during lunches
User 2
• Works five days a week
• Particularly active on Thursdays
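A per-user weekday baseline like the two profiles above could be approximated with a smoothed histogram. This is only a sketch: the smoothing constant, the surprise score (1 minus the day's share relative to the busiest day) and the generated activity dates are assumptions, and the deck's actual model is not described.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_profile(active_dates, smoothing=1.0):
    """Share of activity per weekday (Monday=0 ... Sunday=6), Laplace-smoothed."""
    counts = Counter(d.weekday() for d in active_dates)
    total = len(active_dates) + 7 * smoothing
    return [(counts.get(day, 0) + smoothing) / total for day in range(7)]


def weekday_surprise(day, profile):
    """0 for the user's busiest weekday, closer to 1 for days they rarely work."""
    return 1.0 - profile[day] / max(profile)


# Hypothetical history: four weeks of weekday-only activity (like User 2).
start = date(2014, 6, 2)
history = [start + timedelta(days=n) for n in range(28)]
history = [d for d in history if d.weekday() < 5]

profile = weekday_profile(history)
print("Monday surprise:", round(weekday_surprise(0, profile), 2))
print("Sunday surprise:", round(weekday_surprise(6, profile), 2))
```

Because the baseline is built per user, Saturday activity would score as unremarkable for a six-day worker like User 1 and as surprising for User 2.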
Model: Unusual Working Hours
User 1
• Starts work fairly early in morning
• Early lunch break
• Sometimes works past midnight
User 2
• Doesn’t work as long hours as User 1
• 9 to 5’er
• Has occasionally worked a little bit after 8pm
Model: Clustering Unusual Entities
• Clusters are created based on observed behaviors of a target set of entities
- Users, Machines, Assets
• Clusters are created for “like behaviors” & outliers are anomalous
- User actions
- Access to data
- Applications open/run
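The slide does not name a clustering algorithm, so the sketch below swaps in a simple neighbor-count rule in the spirit of density-based clustering: an entity with too few similar peers is treated as an outlier. The feature vectors, the distance threshold `eps` and `min_neighbors` are hypothetical, and the features are assumed to be already scaled to comparable ranges.

```python
import math


def find_outliers(features, eps=1.0, min_neighbors=2):
    """Return entities with fewer than `min_neighbors` peers within distance `eps`.

    `features` maps an entity name to a tuple of numeric behavior features.
    """
    outliers = []
    for name, vec in features.items():
        neighbors = sum(
            1 for other, other_vec in features.items()
            if other != name and math.dist(vec, other_vec) <= eps
        )
        if neighbors < min_neighbors:
            outliers.append(name)
    return outliers


# Hypothetical scaled features: (data accessed, applications run).
behavior = {
    "alice": (1.0, 1.0),
    "bob": (1.2, 0.9),
    "carol": (0.9, 1.1),
    "dave": (8.0, 9.0),
}
print(find_outliers(behavior))
```

The same function applies unchanged to machines or assets, since it only sees feature vectors.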
§ John Sneakypants is accessing an unusual, important network share (risk score: 25)
§ … at a time of day he was almost never active at before (risk score: 46)
§ … and just copied an unusual number of sensitive files to a USB drive (risk score: 80)
§ … and took from a source code project that has been inactive for months (risk score: 96)
• Increase risk of an entity (e.g. user) based on probability, severity, risk and recency of observed behavioral events (anomalies, violations, exfiltrations)
• Allows real-time aggregation or “correlation” of multiple event models
• Reduces false positives and noise
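How an entity's score might climb as events accumulate can be illustrated with a small Python sketch. The deck does not give its aggregation rule, so this sketch assumes a noisy-OR combination in which each event contributes probability × severity, discounted by an exponential recency decay. The 30-day half-life and the event values are made up, and the sketch does not attempt to reproduce the numbers shown on the slide.

```python
def entity_risk(events, half_life_days=30.0):
    """Combine events into one risk score in [0, 1].

    Each event is (probability, severity, age_days), with probability and
    severity in [0, 1]. Older events count for less; every extra event can
    only raise the score.
    """
    survive = 1.0
    for probability, severity, age_days in events:
        recency = 0.5 ** (age_days / half_life_days)
        survive *= 1.0 - probability * severity * recency
    return 1.0 - survive


# Hypothetical event stream for one user, all observed today.
events = [
    (0.5, 0.5, 0.0),  # unusual network share
    (0.8, 0.5, 0.0),  # unusual time of day
    (0.9, 0.9, 0.0),  # large copy to USB
]
for n in range(1, len(events) + 1):
    print(n, "events ->", f"{entity_risk(events[:n]):.0%}")
```

A single weak anomaly stays low on its own, which is one way correlated scoring can cut noise compared with alerting on each event separately.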
• Analyzed a large semiconductor developer community (>20,000 developers) to look for behavioral indicators of risk
• Identified 2 known source code thieves and leavers
• Identified 11 previously unknown threats
- 2 confirmed: terminated
- 1 confirmed: currently under investigation
- 8 Chinese employees replicating 600,000 to nearly 15,000,000 files per day; currently under investigation
Dots = source code projects
Lines connecting dots = developers using those projects
Visualization of Interset Cluster – Leaver 1
Effective Behavioral Analytics
Bad
• Rules-based alerts alone
• Classification systems alone
• Simple mean/standard deviation based thresholds, generic anomaly detection
• Hard decision boundaries
→ Flood of alerts, hard to deploy, scale and maintain
Good
• Probability-based anomaly + cost-based models
• Machine learning models
• Robust models (handle outliers, big data, respond to change)
• Numerical scores
→ Less noise, easier to deploy and scale, ability to focus on top n
Big Data Analytics in Security
[Figure: analytics capability chart (Source: Competing on Analytics, Davenport and Harris, 2007), annotated “We are here.”]
Future of Big Data Analytics in Security
Advanced Threat Detection and Response
• What happened?
• How many, how often?
• Where is the risk and threat?
• How can this threat be contained?
• How can we prevent this?
• What will happen next?
• What is the best possible response to this threat?
Intelligent Sensors and Ubiquitous Data Sources
• Desktops and Servers
• Mobile
• Cloud
• Social Networks
• Open Data, External Data, IOCs
• Reputation and Risk Services
• Enterprise to Global Systems
Behavioral and Threat Analytics Platform
• Forensic Analysis
• Risk Modeling
• Anomaly Detection
• Entity Resolution
• Behavioral Simulation
• Behavioral Prediction