Combating Web Fraud
with Predictive Analytics
Dave Moore
Novetta Solutions
Formerly, International Biometric Group (IBG)
Consulting
DoD, DHS, DRDC
IR&D
Identity Cyber
Fundamental problem
Machines are the proxies of personal identity.
Attributing machine activity to a person is difficult, even when the session is authenticated.
Contrast this to the pre-Internet society,
◦ Old question: “Are you who you claim to be?”
◦ New question: “Are you what you claim to be?”
◦ Both questions are equally relevant in our
Machine-enabled anonymity
Account takeover
Click & impression fraud
Content scraping
Espionage
Fake account registration
Identity theft
Spam
Vandalism
Vulnerability scanning
Vulnerability exploitation
Edward Snowden acquired ~1.7 million NSA files using a Web crawler.
Bradley Manning used a simple Web client to acquire files.
Sanger, David E. and Eric Schmitt, “Snowden Used Low-Cost Tool to Best N.S.A.,” The New York Times, 8 Feb 2014, <http://www.nytimes.com/2014/02/09/us/snowden-used-low-cost-tool-to-best-nsa.html?_r=1>.
Fisher, Max, “The free Web program that got Bradley Manning convicted of computer fraud,” The Washington Post, 30 Jul 2013, <http://www.washingtonpost.com/blogs/worldviews/wp/2013/07/30/the-free-web-program-that-got-bradley-manning-convicted-of-computer-fraud/>.
How can we distinguish humans from bots?
Bot traps
Challenge-response
IP address reputation
Device fingerprinting
What is it, really?
Predictive analytics (PA) is the application of software and statistical modeling to predict the outcome of an unknown, future event based on prior knowledge.
Why is it a buzzword?
PA describes any software that uses statistical models
to make decisions. Most applications of Machine Learning (ML) do this. Everyone is now “predictive.”
PA and authentication are identical in our use case, where the “future event” in question is whether a user agent will commit fraud.
What’s a user agent?
◦ A user agent is an application that requests content from the Web on behalf of a person.
Web browsers
Internet Explorer, Firefox, Chrome, Safari, …
Search engine crawlers
GoogleBot, BingBot, YandexBot, Slurp, …
User agents make assertions of identity.
Firefox 27.0, Windows 7
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Host: www.google.com
DNT: 0
Connection: keep-alive
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
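The request headers above carry the user agent's claim. As a minimal sketch of how that claim might be read out of a request, the Python below parses the header block and pulls the asserted browser, version, and platform from the User-Agent string. The function names `parse_headers` and `asserted_identity` are hypothetical, and the patterns cover only the Firefox example on this slide, not User-Agent strings in general.

```python
import re

def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def asserted_identity(user_agent: str) -> dict:
    """Extract the claimed browser, version, and platform from a User-Agent string.

    Illustrative only: the patterns match the Firefox example from the slide.
    """
    browser = re.search(r"(Firefox)/(\d+(?:\.\d+)*)", user_agent)
    platform = re.search(r"\(([^;)]+)", user_agent)
    return {
        "browser": browser.group(1) if browser else None,
        "version": browser.group(2) if browser else None,
        "platform": platform.group(1) if platform else None,
    }

RAW = """User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Host: www.google.com
DNT: 0
Connection: keep-alive
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate"""

headers = parse_headers(RAW)
claim = asserted_identity(headers["user-agent"])
print(claim)
```

The platform token `Windows NT 6.1` is how Windows 7 identifies itself, which matches the slide's caption. Nothing here checks whether the claim is true; it only records what is being asserted.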
◦ This is true for all major desktop and mobile
◦ User agents can claim to be anything. Spoofing is trivial.
◦ Rightfully, Web security experts often advise
◦ Novetta computer scientists have discovered it is entirely possible to harness those assertions to detect bots and combat Web fraud.
Basic concept
Gather statistics on the behaviors of user agents.
Train an ML classifier (e.g., a neural network) to learn the behaviors of known user agents.
Deploy the classifier to detect false assertions of
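The gather, train, and deploy steps above can be sketched in a few lines of Python. This is a toy stand-in under loud assumptions: it uses a nearest-centroid classifier rather than the neural network the slide names as an example, the three features (header count, whether Accept-Language is sent, whether a capability test passes) are hypothetical, and the training values are made up for illustration. It is not Novetta's system.

```python
import math
from collections import defaultdict

def train_centroids(samples):
    """samples: list of (label, feature_vector). Returns label -> mean feature vector."""
    sums, counts = {}, defaultdict(int)
    for label, vec in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

def is_false_assertion(centroids, claimed_label, vec):
    """Flag a request whose observed behavior does not match its claimed user agent."""
    return predict(centroids, vec) != claimed_label

# Made-up training data: [header_count, sends_accept_language, passes_capability_test]
TRAINING = [
    ("firefox", [6, 1, 1]),
    ("firefox", [7, 1, 1]),
    ("script", [2, 0, 0]),
    ("script", [3, 0, 0]),
]

centroids = train_centroids(TRAINING)

# A client that claims to be Firefox but behaves like a bare script is flagged.
print(is_false_assertion(centroids, "firefox", [2, 0, 0]))
# A client that claims to be Firefox and behaves like one is not.
print(is_false_assertion(centroids, "firefox", [6, 1, 1]))
```

The point of the sketch is the shape of the check: the classifier learns what each known user agent looks like, and the claim in the User-Agent header is compared against what the observed behavior most resembles.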
Feature selection
Device features
Packet headers
Capability test results
Geolinguistic validation
Human features
Keystroke dynamics
Mouse dynamics
Touch and swipe dynamics
How it performs
◦ ~0.15% equal error rate (EER) when the claim is a desktop or mobile Web browser.
◦ Higher error rates for lesser-known user agents.
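For readers unfamiliar with the metric, the equal error rate is the operating point at which the false accept rate (spoofed claims accepted) equals the false reject rate (truthful claims rejected). The Python below is a minimal sketch of how an EER might be estimated from classifier scores by sweeping thresholds. The function name `equal_error_rate` and the score lists are illustrative assumptions, not the data behind the figure on this slide.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of scores (higher score = more trusted).

    genuine:  scores for truthful user agent assertions
    impostor: scores for spoofed assertions
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        # False accept rate: spoofed claims scoring at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False reject rate: truthful claims scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Made-up scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]

eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)
```

With only a handful of scores the two rates may never be exactly equal, so the sketch reports the average of FAR and FRR at the closest threshold.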
◦ Fast, efficient
We can confidently determine the likelihood of spoofing in the first request of a session.
◦ Robust
Policies for effective implementation
◦ Allow
Standard desktop and mobile Web browsers verified by the proposed system.
Standard search engine crawlers verified by hostname lookups.
Custom exceptions.
◦ Deny
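The hostname lookup named in the Allow policy for search engine crawlers is commonly done as a reverse DNS lookup followed by a confirming forward lookup. The Python below is a minimal sketch of that check using the standard `socket` module. The function name `verify_crawler`, the suffix list, and the stub resolvers in the demo are assumptions for illustration; consult each search engine's published guidance for the domains its crawlers actually use.

```python
import socket

# Illustrative suffixes only; an assumption, not an authoritative list.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def verify_crawler(ip, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex,
                   suffixes=CRAWLER_SUFFIXES):
    """Return True if ip reverse-resolves to a crawler domain and that
    hostname forward-resolves back to the same ip."""
    try:
        hostname = reverse(ip)[0]
        if not hostname.lower().endswith(suffixes):
            return False
        # Forward lookup guards against a spoofed reverse DNS record.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False

# Offline demo with stub resolvers (hypothetical hostnames, documentation IPs).
def stub_reverse(ip):
    names = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com",
             "192.0.2.99": "host.example.net"}
    if ip not in names:
        raise socket.herror("no PTR record")
    return (names[ip], [], [ip])

def stub_forward(hostname):
    return (hostname, [], ["192.0.2.10"])

print(verify_crawler("192.0.2.10", stub_reverse, stub_forward))
print(verify_crawler("192.0.2.99", stub_reverse, stub_forward))
```

The resolvers are passed in as parameters so the check can be exercised without network access; in production the defaults call the system resolver.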
Applications
Breach prevention
Fraud prevention
Scraping prevention
Spam prevention
Threat intelligence
Implementations
Web (HTTP)
Email (SMTP)
VoIP (SIP)
Takeaways
Personal identity and user agent identity are equally important in establishing trust on the Internet.
User agent assertions are verifiable, especially for everyday Web browsers.
User agent verification enhances privacy