Combating Web Fraud with Predictive Analytics. Dave Moore Novetta Solutions

(1)

Combating Web Fraud with Predictive Analytics

Dave Moore

Novetta Solutions

(2)

Novetta Solutions

• Formerly International Biometric Group (IBG)

Consulting

• DoD, DHS, DRDC

IR&D

• Identity
• Cyber

(3)
(4)

Fundamental problem

• Machines are the proxies of personal identity.

Attributing machine activity to a person is difficult, even when the session is authenticated.

• Contrast this to the pre-Internet society,

(5)

Fundamental problem

Old question

• “Are you who you claim to be?”

New question

• “Are you what you claim to be?”

Both questions are equally relevant in our

(6)

Machine-enabled anonymity

• Account takeover
• Click & impression fraud
• Content scraping
• Espionage
• Fake account registration
• Identity theft
• Spam
• Vandalism
• Vulnerability scanning
• Vulnerability exploitation

(7)

Machine-enabled anonymity

• Edward Snowden acquired ~1.7 million NSA files using a Web crawler.
• Bradley Manning used a simple Web client to acquire files.

Sanger, David E. and Eric Schmitt, “Snowden Used Low-Cost Tool to Best N.S.A.,” The New York Times, 8 Feb 2014, <http://www.nytimes.com/2014/02/09/us/snowden-used-low-cost-tool-to-best-nsa.html?_r=1>.

Fisher, Max, “The free Web program that got Bradley Manning convicted of computer fraud,” The Washington Post, 30 Jul 2013, <http://www.washingtonpost.com/blogs/worldviews/wp/2013/07/30/the-free-web-program-that-got-bradley-manning-convicted-of-computer-fraud/>.

(8)

How can we distinguish humans from bots?

• Bot traps
• Challenge-response
• IP address reputation
• Device fingerprinting

(10)
(11)

What is it, really?

• Predictive analytics (PA) is the application of software and statistical modeling to predict the outcome of an unknown future event based on prior knowledge.

Why is it a buzzword?

• PA describes any software that uses statistical models to make decisions. Most applications of machine learning (ML) do this. Everyone is now “predictive.”
• PA and authentication are identical in our use case, where the “future event” in question is whether a user agent will commit fraud.

(12)
(13)

What’s a user agent?

A user agent is an application that requests content from the Web on behalf of a person.

• Web browsers: Internet Explorer, Firefox, Chrome, Safari, …
• Search engine crawlers: GoogleBot, BingBot, YandexBot, Slurp, …

(14)

User agents make assertions of identity.

Firefox 27.0, Windows 7

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Host: www.google.com
DNT: 0
Connection: keep-alive
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
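
To make the "assertion of identity" concrete, here is a minimal Python sketch that reads the headers from the slide and extracts what the client claims to be. The helper names (`parse_headers`, `asserted_identity`) and the small Windows lookup table are assumptions made for this illustration, not part of the talk, and real User-Agent strings vary far more than this pattern covers.

```python
import re

# Headers as shown on the slide.
RAW_HEADERS = """\
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Host: www.google.com
DNT: 0
Connection: keep-alive
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
"""

# Hypothetical, deliberately tiny lookup table (illustration only).
WINDOWS_NT = {"6.1": "Windows 7", "6.2": "Windows 8", "6.3": "Windows 8.1"}


def parse_headers(raw):
    """Split 'Name: value' lines into an ordered list of (name, value) pairs."""
    pairs = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        pairs.append((name.strip(), value.strip()))
    return pairs


def asserted_identity(user_agent):
    """Return the browser and OS that the User-Agent string claims."""
    browser = re.search(r"(Firefox|Chrome)/(\d+(?:\.\d+)*)", user_agent)
    nt = re.search(r"Windows NT (\d+\.\d+)", user_agent)
    return {
        "browser": f"{browser.group(1)} {browser.group(2)}" if browser else None,
        "os": WINDOWS_NT.get(nt.group(1)) if nt else None,
    }


headers = dict(parse_headers(RAW_HEADERS))
claim = asserted_identity(headers["User-Agent"])
print(claim)  # {'browser': 'Firefox 27.0', 'os': 'Windows 7'}
```

The result is only a claim: nothing in the request proves that the client really is Firefox on Windows 7.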

(15)

User agents make assertions of identity.

This is true for all major desktop and mobile

(16)

User agents make assertions of identity.

User agents can claim to be anything.

Spoofing is trivial.

Rightfully, Web security experts often advise
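
As an illustration of how little effort spoofing takes, the following Python sketch builds a request from a script that claims to be Firefox 27.0 on Windows 7. It uses only the standard library; the target URL is a placeholder, and the request is constructed but never sent.

```python
from urllib.request import Request

FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) "
    "Gecko/20100101 Firefox/27.0"
)

# Build (but do not send) a request whose User-Agent header claims Firefox.
# The URL is a placeholder for illustration.
req = Request("http://www.example.com/", headers={"User-Agent": FIREFOX_UA})

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

One dictionary entry is all it takes, which is why the header value alone cannot be trusted.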

(17)

User agents make assertions of identity.

Novetta computer scientists have discovered it is entirely possible to harness those assertions to detect bots and combat Web fraud.

(18)

Basic concept

• Gather statistics on the behaviors of user agents.
• Train an ML classifier (e.g., a neural network) to learn the behaviors of known user agents.
• Deploy the classifier to detect false assertions of
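
The gather, train, and deploy steps above can be sketched in a few lines of Python. This is not Novetta's system: it swaps the neural network mentioned on the slide for a small categorical naive Bayes classifier so the example stays dependency-free. The feature names, the "script" class, and the training rows are all synthetic assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Synthetic observations: (user agent family, observed behaviour features).
# Feature names are hypothetical stand-ins for "statistics on behaviours".
TRAINING = [
    ("firefox", {"accept_encoding": "gzip, deflate", "keep_alive": True, "runs_js": True}),
    ("firefox", {"accept_encoding": "gzip, deflate", "keep_alive": True, "runs_js": True}),
    ("firefox", {"accept_encoding": "gzip, deflate", "keep_alive": True, "runs_js": True}),
    ("script", {"accept_encoding": "identity", "keep_alive": False, "runs_js": False}),
    ("script", {"accept_encoding": "gzip, deflate", "keep_alive": False, "runs_js": False}),
    ("script", {"accept_encoding": "identity", "keep_alive": False, "runs_js": False}),
]


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values_seen = defaultdict(set)       # feature -> distinct values

    def fit(self, samples):
        for label, features in samples:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.value_counts[(label, name)][value] += 1
                self.values_seen[name].add(value)
        return self

    def posterior(self, features):
        """Return P(label | features) for every known label."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.value_counts[(label, name)]
                vocab = len(self.values_seen[name] | {value})
                score += math.log((seen[value] + 1) / (count + vocab))
            log_scores[label] = score
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}


def claim_is_plausible(model, claimed, features, threshold=0.5):
    """Accept the claim only if the observed behaviour supports it."""
    return model.posterior(features).get(claimed, 0.0) >= threshold


model = NaiveBayes().fit(TRAINING)
genuine = {"accept_encoding": "gzip, deflate", "keep_alive": True, "runs_js": True}
spoofed = {"accept_encoding": "identity", "keep_alive": False, "runs_js": False}
print(claim_is_plausible(model, "firefox", genuine))
print(claim_is_plausible(model, "firefox", spoofed))
```

Both test requests claim to be Firefox. Only the one whose observed behaviour matches what the model learned for Firefox passes the threshold.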

(19)

Feature selection

Device features             Human features
Packet headers              Keystroke dynamics
Capability test results     Mouse dynamics
Geolinguistic validation    Touch and swipe dynamics

(20)

How it performs

~0.15% equal error rate (EER) when the claim is a desktop or mobile Web browser.

Higher error rates for lesser-known user agents.
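
For readers unfamiliar with the metric, the equal error rate is the error rate at the decision threshold where the false accept rate (spoofed claims accepted) equals the false reject rate (genuine claims rejected). The Python sketch below estimates it by sweeping thresholds over classifier scores. The scores are synthetic and unrelated to the reported 0.15% result.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) where FAR and FRR are closest.

    genuine:  scores for truthful claims (higher means more plausible)
    impostor: scores for spoofed claims
    A claim is accepted when its score is >= threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Synthetic scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.35]
impostor_scores = [0.1, 0.2, 0.3, 0.4, 0.65]

eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)  # 0.2 0.6
```

At a threshold of 0.6, one of five spoofed claims is accepted and one of five genuine claims is rejected, so both error rates are 20%. An EER of 0.15% means both rates are about 15 in 10,000 at the equivalent operating point.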

(21)

How it performs

Fast, efficient

• We can confidently determine the likelihood of spoofing in the first request of a session.

Robust

(22)

Policies for effective implementation

Allow

• Standard desktop and mobile Web browsers verified by the proposed system.
• Standard search engine crawlers verified by hostname lookups.
• Custom exceptions.
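
Verifying a crawler by hostname lookup is commonly done as a forward-confirmed reverse DNS check: resolve the client IP to a hostname, check that the hostname falls under the search engine's domain, then resolve that hostname back and confirm it maps to the same IP. The Python sketch below assumes this approach. The suffix table is an assumption to confirm against each search engine's own documentation, and the resolvers are injectable so the example runs without network access.

```python
import socket

# Assumed suffix table for illustration; check each engine's documentation.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def verify_crawler(ip, claimed,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler."""
    suffixes = CRAWLER_DOMAINS.get(claimed.lower())
    if not suffixes:
        return False
    try:
        hostname = reverse(ip)[0]                 # IP -> hostname
        if not hostname.lower().endswith(suffixes):
            return False
        return ip in forward(hostname)[2]         # hostname -> IPs
    except OSError:                               # lookup failed
        return False


# Stub resolvers with made-up values so the example needs no network.
def stub_reverse(ip):
    return ("crawl-203-0-113-7.googlebot.com", [], [ip])


def stub_forward(hostname):
    return (hostname, [], ["203.0.113.7"])


print(verify_crawler("203.0.113.7", "Googlebot", stub_reverse, stub_forward))
print(verify_crawler("198.51.100.9", "Googlebot", stub_reverse, stub_forward))
```

The second call fails because the forward lookup does not return the connecting IP. This catches a client that controls its own reverse DNS record but not the search engine's zone.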

Deny

(23)

Applications

• Breach prevention
• Fraud prevention
• Scraping prevention
• Spam prevention
• Threat intelligence

Implementations

• Web (HTTP)
• Email (SMTP)
• VoIP (SIP)

(24)
(25)

Takeaways

• Personal identity and user agent identity are equally important in establishing trust on the Internet.
• User agent assertions are verifiable, especially for the everyday Web browsers.
• User agent verification enhances privacy

(26)

Questions?
