
© 2014 Interset, a FileTrek Company

What’s Behind “Big Data” and “Behavioral Analytics”

STEPHAN JOU, CTO

ISSA TORONTO


Hey. I’m Stephan Jou

CTO at Interset

Previously: IBM’s Business Analytics CTO Office

Big data analytics, visualization, cloud, predictive analytics, data mining, neural networks, mobile, dashboarding and semantic search

M.Sc. in Computational Neuroscience and Biomedical Engineering, and a dual B.Sc. in Computer Science and Human Physiology, all from the University of Toronto

Email: [email protected]

Twitter: @eeksock


Catching Bad Guys With Math

Threat Detection (Insider and Compromised Machine Attack) Through the Science of Behavioral Analytics


Who Is This?

Lessons:

There were limited systems in place, and we still do not know all that he took

His actions were highly anomalous:
- Volumes of data
- Access to improper accounts
- Usage of USB storage devices

There was plenty of evidence and time, if only it was visible!


Who Are These Two?

Lessons:

Disgruntled insider employees can be a risk

What were the anomalies?

Copied 16,000 documents within five days of receiving severance

There was plenty of evidence and time, if only it was visible!


And This Guy?

Lessons:

Most attacks are from users/identities with proper access

The attacker stayed under the radar for years

Third parties (US Intelligence) most often uncover the attack

What were the anomalies?

Accessing data not related to his job

Moving data in ways that users in the same role were not, over time

Money problems

There was plenty of evidence and time, if only it was visible!


And these guys?

Lessons:

Make sure your partners are secure

Hacked (SQL injection) a partner with a weak network

Stole user names and passwords

Identities & machines are “entities”

They acted in highly anomalous ways

Moved large amounts of data

Moved data to exfiltration points

At four companies and the US Army!

There was plenty of evidence and time, if only it was visible!

“if we do this right, we will make a million dollars each”

“we could have already sold them for Bitcoins which would have been untraceable if we did it right. It could have


How Do You Catch the Authorized User?

75% of material loss is via insiders with approved access

70% of IP theft cases involve insiders stealing information within 30 days of announcing their resignations

62% of employees believe it is acceptable to transfer work documents to personal devices or cloud-based file sharing services, even if a company policy prohibits it

60% of employees believe information they had been involved in developing is theirs, regardless of the company’s IP protection policy

51% of employees say their company does not strictly enforce policies, so they feel it is more than OK to take corporate data

20% of loss involved collaboration with one or more employees

Source: Symantec & 2011 Cyber Watch Survey, Carnegie Mellon University CERT Program


Kung Fu Move #1: Big Data

Source: OliverMunday.com


The Four V’s of Big Data (Sorry)

[Figure labels: Transactional, Machine, Social, Reputation; Volume, Velocity, Variety, Veracity]


Kung Fu Move #2: Math

[Figure: analytics capabilities chart (Source: Competing on Analytics, Davenport and Harris, 2007). Traditional methods: standard reporting, ad hoc reporting, query/drill down, alerts, forecasting, simulation, predictive modeling, optimization. New data and new methods: optimization under uncertainty, adaptive analysis, continual analysis, entity resolution, annotation and tokenization, relationship and feature extraction.]


Venn Diagram of Data Science

Source: Drew Conway, http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Hacking: meaning computer science skills

The problem: if you choose the wrong math, you will have false positives and an ineffective system


Standard Thresholds Approach

A Pattern for Increased Monitoring for Intellectual Property Theft by Departing Insiders, Andrew Moore et al., Carnegie Mellon, 2011


Behavioral Analytics – A simple example

User: Edward Snowden was a contractor, a sysadmin with privileged access

Activity: The volume of copying is large compared to Snowden’s past 30 days, and compared to other analysts

Asset: These files have a high risk and importance value

Method: USB drives are marked as high-risk channels

Edward Snowden is copying an unusually large number of sensitive files to an external USB drive.
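The activity check in this example compares one user’s copy volume with two baselines: his own recent history and his peers. A minimal sketch of that idea uses two empirical percentiles, assuming daily copy counts are already collected per user. The function names and all numbers are hypothetical, not Interset’s method.

```python
from bisect import bisect_left


def empirical_rank(value: float, baseline: list[float]) -> float:
    """Fraction of baseline observations strictly below `value` (0.0 to 1.0)."""
    if not baseline:
        return 0.0  # no baseline yet: treat as not unusual
    ordered = sorted(baseline)
    return bisect_left(ordered, value) / len(ordered)


def copy_volume_ranks(today: float, own_last_30_days: list[float],
                      peers_today: list[float]) -> tuple[float, float]:
    """Rank today's copy volume against the user's own history and against peers."""
    return (empirical_rank(today, own_last_30_days),
            empirical_rank(today, peers_today))


# Hypothetical numbers: files copied per day.
own_history = [20.0] * 29 + [35.0]
peer_volumes = [15.0, 30.0, 40.0, 25.0]
print(copy_volume_ranks(500.0, own_history, peer_volumes))  # (1.0, 1.0)
```

A rank near 1.0 on both axes matches the slide’s reading: unusual for this user *and* unusual for the role.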


Use Appropriate Math to Assemble the Data

Risk scores are percentages between 0% (no risk) and 100% (extreme risk)

$P(\text{event} \mid y)$ is the probability that the behavior occurred, either observed or predicted

Aggregate risk values combine risks associated with the activity, people, assets and end points

Model based on Expected Utility Theory and standard risk model (Risk = Probability * Impact)

Mathematical weighting is used to tune and train the model for specific activities, people, assets and end points on a per-behavior-pattern basis

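[Diagram labels: Activity, User, File, Method]

$$R_{\text{behavior}} = P(\text{event} \mid y) \times w_y \times \left( \frac{w_u \sum_{i} R_{u[i]} + w_f \sum_{j} R_{f[j]} + w_m \sum_{k} R_{m[k]}}{w_u + w_f + w_m} \right), \quad u \in U,\ f \in F,\ m \in M$$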
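The aggregate risk equation can be written directly as a function. This is a minimal sketch: the parameter names are hypothetical, the per-entity risks are assumed to already be on a 0 to 1 scale, and the weights are assumed to come from the tuning step described above. Because the formula sums over entities, several risky users, files or methods can push the result above 1; a production system would presumably normalize or cap it.

```python
from typing import Sequence


def behavior_risk(p_event: float, w_y: float,
                  user_risks: Sequence[float],
                  file_risks: Sequence[float],
                  method_risks: Sequence[float],
                  w_u: float = 1.0, w_f: float = 1.0, w_m: float = 1.0) -> float:
    """R = P(event|y) * w_y * (w_u*sum R_u + w_f*sum R_f + w_m*sum R_m) / (w_u + w_f + w_m)."""
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("p_event must be a probability in [0, 1]")
    total_weight = w_u + w_f + w_m
    if total_weight <= 0:
        raise ValueError("entity weights must sum to a positive value")
    weighted = (w_u * sum(user_risks)
                + w_f * sum(file_risks)
                + w_m * sum(method_risks))
    return p_event * w_y * weighted / total_weight


# Hypothetical Snowden-style event: one user, one file set, one method (USB).
print(round(behavior_risk(0.9, 1.0, [0.6], [0.9], [0.8]), 4))  # 0.69
```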


Important Questions

Where is my important, at-risk stuff?

Who or what is behaving abnormally?

Who is stealing my stuff?

Who is going to leave the company?


Some Simple Anomaly Models

Where is my important, at-risk stuff?

§ Riskiest Files

Who or what is behaving abnormally?

§ Person Name is accessing information during unusual working hours.

§ Person Name accessed a storage volume or path an unusually large number of times.

§ Person Name accessed an important file type an unusually large number of times.

Who is going to steal my stuff?

§ Riskiest Users

§ Person Name accessed an abnormally large amount of data.

§ Person Name performed an abnormally large number of file exits.

Who is going to leave the company?
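Count-based models, such as flagging a user who touches a storage volume, path or file type an unusually large number of times, need a way to turn a count into a probability. The slide does not name a distribution, so this sketch swaps in a Poisson baseline as an assumption: estimate the user’s usual hourly rate, then ask how likely a count at least this large would be. Function names and data are hypothetical.

```python
import math
from statistics import mean


def poisson_tail(k: int, rate: float) -> float:
    """P(X >= k) for X ~ Poisson(rate)."""
    if k <= 0:
        return 1.0
    term = math.exp(-rate)   # P(X = 0)
    cdf = term
    for i in range(1, k):    # accumulate P(X = 1) .. P(X = k - 1) iteratively
        term *= rate / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def count_anomaly(observed: int, hourly_history: list[int]) -> float:
    """Score in [0, 1]: 1 - P(a count at least this large under the user's baseline)."""
    rate = max(mean(hourly_history), 1e-6)  # guard against an all-zero history
    return 1.0 - poisson_tail(observed, rate)


history = [3, 5, 4, 4, 2, 6, 4, 4]   # hypothetical accesses per hour
print(count_anomaly(15, history))     # very close to 1.0
print(count_anomaly(4, history))      # an ordinary hour scores well below 1
```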

 


More Sophisticated Anomaly Models

Where is my important, at-risk stuff?

§ Highest at-risk machines, file shares, and source code repositories

§ The file, Filename, is highly valuable compared to similar files.

§ The following source code projects are most at risk.

§ Similar users visualization

§ Similar files visualization

§ Similar machines visualization

Who or what is behaving abnormally?

§ Person Name is using an unexpected file, Filename.

§ Person Name is touching an unexpected set of files.

§ Person Name is consistently accessing higher amounts of data than similar users.

§ Person Name is consistently accessing an important file type more than similar users.

§ Person Name is accessing information during different working times compared to similar users.

§ An application accessed an unexpected file type.

Who is going to steal my stuff?

§ Person Name has accessed an unusual amount of total file value.

§ Person Name is consistently performing more file exits than similar users.

§ Person Name’s number of file exits varies more than that of similar users.

§ Person Name has replicated a large amount of source code.

Who is going to leave the company?

§ Person Name is hoarding an unusual amount of source code.

§ Person Name has been accessing unexpected source code repositories.

§ Person Name is engaging in job search activities.

§ The proportion of time spent by Person Name on non-work activities has changed.

§ Person Name has emailed themselves.
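A user “touching an unexpected set of files” can be approximated by asking what share of today’s files neither the user nor any similar user has touched before. This is a minimal sketch under stated assumptions: how “similar users” are selected is assumed to happen elsewhere, and the file names and helper are hypothetical.

```python
def unexpected_files(todays: set[str], own_history: set[str],
                     peer_histories: list[set[str]]) -> tuple[float, set[str]]:
    """Return (fraction of today's files that are novel, the novel files)."""
    known = set(own_history)
    for peer in peer_histories:
        known |= peer            # any file a similar user touched counts as expected
    novel = todays - known
    fraction = len(novel) / len(todays) if todays else 0.0
    return fraction, novel


score, novel = unexpected_files(
    {"design.doc", "budget.xls", "roadmap.ppt", "payroll.csv"},
    own_history={"design.doc", "budget.xls"},
    peer_histories=[{"roadmap.ppt"}],
)
print(score, novel)  # 0.25 {'payroll.csv'}
```

Measuring novelty relative to today’s files, rather than overall set similarity, avoids flagging a user who simply works on a small subset of their usual files.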

 


Computing Probability of an Anomalous Event

§ Each term in the aggregate behavior risk equation has analytics behind it

§ Highly anomalous activities, compared to baseline, should result in a high value

§ How to compute the probability of an anomalous event?

$$R_{\text{behavior}} = P(\text{event} \mid y) \times w_y \times \left( \frac{w_u \sum_{i} R_{u[i]} + w_f \sum_{j} R_{f[j]} + w_m \sum_{k} R_{m[k]}}{w_u + w_f + w_m} \right), \quad u \in U,\ f \in F,\ m \in M$$


Model: Unusual volumes

Computes the probability that a value in a given hour is anomalous
- Bayesian approach

Explicitly models both normal and abnormal distributions
- Gaussian, Gamma

Estimators for both normal and abnormal based on observation
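A minimal sketch of this two-distribution Bayesian model assumes a Gaussian for normal hourly volumes, a Gamma for abnormal ones, method-of-moments estimators, and a hypothetical 1% prior that any given hour is abnormal. Where the “abnormal” training observations come from (labeled incidents, injected tests) is also an assumption; the slide only says both estimators are based on observation.

```python
import math
from statistics import mean, pvariance


def fit_gaussian(samples: list[float]) -> tuple[float, float]:
    """Estimate (mu, sigma) for the 'normal' hourly volumes."""
    mu = mean(samples)
    sigma = max(math.sqrt(pvariance(samples, mu)), 1e-9)
    return mu, sigma


def fit_gamma(samples: list[float]) -> tuple[float, float]:
    """Method-of-moments (shape, scale) for the 'abnormal' volumes."""
    m = mean(samples)
    v = max(pvariance(samples, m), 1e-9)
    return m * m / v, v / m


def log_gaussian(x: float, mu: float, sigma: float) -> float:
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def log_gamma_pdf(x: float, shape: float, scale: float) -> float:
    if x <= 0:
        return -math.inf
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def p_abnormal(x: float, normal: tuple[float, float],
               abnormal: tuple[float, float], prior: float = 0.01) -> float:
    """Posterior P(abnormal | x) by Bayes' rule, in log space to avoid underflow."""
    z = ((math.log(prior) + log_gamma_pdf(x, *abnormal))
         - (math.log(1 - prior) + log_gaussian(x, *normal)))
    if z >= 0:                       # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


normal = fit_gaussian([10, 12, 9, 11, 13, 10, 12, 11])   # hypothetical MB per hour
abnormal = fit_gamma([80, 120, 200, 150])                # hypothetical flagged hours
print(p_abnormal(11, normal, abnormal), p_abnormal(150, normal, abnormal))
```

Modeling the abnormal class explicitly, instead of only thresholding distance from the normal mean, lets the prior and the abnormal estimator be tuned separately.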


Example: Modeling unusual times

Monitor, for each user, the start times of when a file or window is brought into focus

Active times are used as input into Gaussian kernel density estimators

Times that contain 95% of activity are deemed to be “normal”

P(y is bad) at a given time is the ratio of expected activity to the 95% activity line
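The kernel density approach can be sketched as follows. This is one plausible reading, not the product’s formula: the “95% activity line” is taken to be the density level above which 95% of observed start times lie, and P(y is bad) is taken to be 1 minus the ratio of expected activity to that line, floored at 0. The bandwidth, helper names and start times are hypothetical. Kernels wrap around midnight so that 23:30 counts as close to 00:30.

```python
import math
from typing import Callable


def gaussian_kde_hours(samples: list[float], bandwidth: float = 1.0) -> Callable[[float], float]:
    """Return a Gaussian kernel density over hours of the day, wrapping at midnight."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(t: float) -> float:
        total = 0.0
        for s in samples:
            for shift in (-24.0, 0.0, 24.0):
                total += math.exp(-0.5 * ((t - s - shift) / bandwidth) ** 2)
        return norm * total

    return density


def activity_line(density: Callable[[float], float], samples: list[float],
                  coverage: float = 0.95) -> float:
    """Density level above which roughly `coverage` of observed start times lie."""
    levels = sorted(density(s) for s in samples)
    return levels[int((1.0 - coverage) * len(levels))]


def p_bad(t: float, density: Callable[[float], float], line: float) -> float:
    """0 inside the 95% region; rises toward 1 as expected activity drops below the line."""
    return max(0.0, 1.0 - density(t) / line)


start_hours = [8.0, 8.5, 8.75, 9.0, 9.0, 9.25, 9.5, 10.0, 13.0, 14.5]  # hypothetical
f = gaussian_kde_hours(start_hours)
line = activity_line(f, start_hours)
print(p_bad(9.0, f, line), round(p_bad(3.0, f, line), 3))  # 0.0 1.0
```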


Model: Unusual Working Days

User 1

Regularly works six days a week (takes Sundays off)

Slight dip during lunches

User 2

Works five days a week

Particularly active on Thursdays


Model: Unusual Working Hours

User 1

Starts work fairly early in the morning

Early lunch break

Sometimes works past midnight

User 2

Doesn’t work hours as long as User 1

9-to-5er

Has occasionally worked a little after 8pm
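Both the working-day and working-hour profiles above can be captured by one per-user table of activity counts per (weekday, hour) slot. This is a minimal sketch with hypothetical names: the rarity score compares a slot’s smoothed count to the user’s busiest slot. That scoring rule and the add-one smoothing are assumptions, not the slides’ method.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_profile(timestamps: list[datetime]) -> Counter:
    """Count observed activity per (weekday, hour) slot; Monday is weekday 0."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def slot_rarity(profile: Counter, ts: datetime, alpha: float = 1.0) -> float:
    """1 - smoothed count of this slot relative to the busiest slot (0 = typical)."""
    peak = max(profile.values(), default=0) + alpha
    count = profile[(ts.weekday(), ts.hour)] + alpha   # Counter returns 0 for unseen slots
    return 1.0 - count / peak


# Hypothetical "User 1": four weeks, Monday to Saturday, 9:00 to 16:59, Sundays off.
start = datetime(2014, 1, 6)  # a Monday
history = [start + timedelta(days=d, hours=h)
           for d in range(28) for h in range(9, 17)
           if (start + timedelta(days=d)).weekday() != 6]
profile = build_profile(history)
print(slot_rarity(profile, datetime(2014, 2, 3, 10)))  # Monday 10:00 -> 0.0
print(slot_rarity(profile, datetime(2014, 2, 9, 10)))  # Sunday 10:00 -> 0.8
```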


Model: Clustering Unusual Entities

Clusters are created based on observed behaviors of a target set of entities
- Users, Machines, Assets

Clusters are created for “like behaviors” & outliers are anomalous
- User actions
- Access to data
- Applications open/run
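The slide does not name a clustering algorithm, so this sketch uses plain k-means as a stand-in to show how outliers become anomalous. Each entity is a feature vector; the two hypothetical features below are files accessed per day and applications run per day. The anomaly score is the distance to the nearest cluster with at least a few members, so a lone outlier that captures its own centroid still scores high. In practice, features would be normalized first.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]

    def assign(cs):
        return [min(range(k), key=lambda c: math.dist(p, cs[c])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster empties out.
            new.append(tuple(sum(d) / len(members) for d in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(centroids)


def outlier_scores(points, k=2, min_size=2, seed=0):
    """Distance from each entity to the nearest cluster with >= min_size members."""
    centroids, labels = kmeans(points, k, seed=seed)
    sizes = Counter(labels)
    real = [c for c in range(k) if sizes[c] >= min_size] or list(range(k))
    return [min(math.dist(p, centroids[c]) for c in real) for p in points]


users = [(10, 2), (11, 2), (9, 3), (10, 3),      # hypothetical role A
         (50, 8), (52, 7), (49, 9), (51, 8),     # hypothetical role B
         (200, 30)]                              # one unusual entity
scores = outlier_scores(users)
print(max(range(len(users)), key=scores.__getitem__))  # 8
```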


§ John Sneakypants is accessing an unusual, important network share

§ … at a time of day when he was almost never active before

§ … and just copied an unusual amount of sensitive files to a USB drive

§ … and took from a source code project that has been inactive for months

Risk score as each event is added: 25, 46, 80, 96

Increase the risk of an entity (e.g. user) based on the probability, severity, risk and recency of observed behavioral events (anomalies, violations, exfiltrations)

Allows real-time aggregation or “correlation” of multiple event models

Reduces false positives and noise
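One way to raise an entity’s risk based on the probability, severity and recency of its events, while keeping the score bounded, is a noisy-OR over recency-decayed contributions. This is a hedged sketch, not Interset’s formula. The noisy-OR combination, its independence assumption, the 72-hour half-life and the `Event` fields are all assumptions, and the example events are hypothetical, so the printed scores will not match the slide’s numbers.

```python
from dataclasses import dataclass


@dataclass
class Event:
    probability: float  # P(event is anomalous), 0..1
    severity: float     # impact/importance weight, 0..1
    age_hours: float    # how long ago the event was observed


def entity_risk(events: list[Event], half_life_hours: float = 72.0) -> float:
    """Noisy-OR of recency-decayed event contributions, scaled to 0..100."""
    no_threat = 1.0
    for e in events:
        decay = 0.5 ** (e.age_hours / half_life_hours)   # older events count less
        contribution = min(1.0, max(0.0, e.probability * e.severity * decay))
        no_threat *= 1.0 - contribution                   # assumes independent events
    return 100.0 * (1.0 - no_threat)


# Hypothetical events resembling the John Sneakypants sequence.
timeline = [
    Event(0.5, 0.6, age_hours=6),   # unusual, important network share
    Event(0.7, 0.5, age_hours=4),   # unusual time of day
    Event(0.9, 0.9, age_hours=1),   # unusual USB copy of sensitive files
    Event(0.8, 0.9, age_hours=0),   # dormant source code project
]
for n in range(1, len(timeline) + 1):
    print(round(entity_risk(timeline[:n]), 1))   # non-decreasing as events accumulate
```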


Analyzed a large semiconductor developer community (>20,000 developers) to look for behavioral indicators of risk

Identified 2 known source code thieves and leavers

Identified 11 previously unknown threats
- 2 confirmed: terminated
- 1 confirmed: currently under investigation
- 8 Chinese employees replicating 600,000 to nearly 15,000,000 files per day: currently under investigation

Visualization of Interset Cluster – Leaver 1

Dots = source code projects

Lines connecting dots = developers using those projects
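The cluster picture, where dots are projects and lines are developers who use them, corresponds to a project co-usage graph. This is a minimal sketch that assumes access logs have already been reduced to a hypothetical mapping from each developer to the set of projects they touched. Each edge weight counts how many developers link the two projects.

```python
from collections import Counter
from itertools import combinations


def project_graph(usage: dict[str, set[str]]) -> Counter:
    """Edge (p, q) weighted by the number of developers who used both projects."""
    edges: Counter = Counter()
    for _developer, projects in usage.items():
        for p, q in combinations(sorted(projects), 2):   # sorted: stable edge keys
            edges[(p, q)] += 1
    return edges


usage = {"dev1": {"A", "B"}, "dev2": {"A", "B", "C"}, "dev3": {"C"}}  # hypothetical
print(project_graph(usage))  # Counter({('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1})
```

A developer whose edges suddenly span projects outside their usual region of the graph is the kind of pattern such a visualization makes visible.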

 


Effective Behavioral Analytics

Bad
- Rules-based alerts alone
- Classification systems alone
- Simple mean/standard deviation based thresholds, generic anomaly detection
- Hard decision boundaries

→ Flood of alerts, hard to deploy, scale and maintain

Good
- Probability-based anomaly + cost-based models
- Machine learning models
- Robust models (handle outliers, big data, respond to change)
- Numerical scores

→ Less noise, easier to deploy and scale, ability to focus on top n
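The contrast between hard decision boundaries and numerical scores shows up in what the analyst receives. This is a hedged sketch with hypothetical users and scores. A fixed cutoff produces however many alerts happen to cross it, while ranking by score always produces a fixed, reviewable top n.

```python
import heapq


def threshold_alerts(risk: dict[str, float], cutoff: float) -> list[str]:
    """Hard decision boundary: every entity above the cutoff becomes an alert."""
    return [name for name, score in risk.items() if score > cutoff]


def top_n(risk: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Numerical scores: hand analysts only the n riskiest entities, highest first."""
    return heapq.nlargest(n, risk.items(), key=lambda item: item[1])


risk = {"alice": 12.0, "bob": 96.0, "carol": 46.0, "dave": 80.0, "erin": 51.0}
print(threshold_alerts(risk, 40.0))  # ['bob', 'carol', 'dave', 'erin']
print(top_n(risk, 2))                # [('bob', 96.0), ('dave', 80.0)]
```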



Big Data Analytics in Security

[Figure: analytics capabilities chart from Competing on Analytics, Davenport and Harris, 2007, annotated “We are here.”]


Future of Big Data Analytics in Security

Advanced Threat Detection and Response

Behavioral and Threat Analytics Platform
• Forensic Analysis
• Risk Modeling
• Anomaly Detection
• Entity Resolution
• Behavioral Simulation
• Behavioral Prediction

Intelligent Sensors and Ubiquitous Data Sources
• Desktops and Servers
• Mobile
• Cloud
• Social Networks
• Open Data, External Data, IOCs
• Reputation and Risk Services
• Enterprise to Global Systems

What happened?

How many, how often?

Where is the risk and threat?

How can this threat be contained?

How can we prevent this?

What will happen next?

What is the best possible response to this threat?


Thank You! Questions?

Upload your logs, try out our math

Cloud-hosted Threat
