(1)

ADVANCED ANALYTICS AND FRAUD DETECTION

THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE

(2)
(3)

Big Data

What tax agencies are or will be seeing!

• Big Data

Large and increased data volumes

New and emerging data types/sources

New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights.

Examples: web logs, sensor networks, social networks, text.

Increased reporting requirements such as Merchant cards (Form 1099-K) and Cost Basis Reporting on Securities Sales (Form 1099-B)

Key Points

Analyze all the data, not just random samples

The need for fast processing to detect and prevent fraud
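
The point about analyzing all the data rather than a sample can be made concrete with a little arithmetic. The sketch below uses purely hypothetical figures (10,000 returns, 5 of them fraudulent, a 1% simple random sample) to estimate how often such a sample contains no fraud case at all. The function name is illustrative and does not come from the presentation.

```python
from math import comb


def prob_sample_misses_all(population: int, fraud_cases: int, sample_size: int) -> float:
    """Probability that a simple random sample (drawn without replacement)
    contains none of the fraud cases: the hypergeometric 'zero hits' term."""
    if sample_size > population - fraud_cases:
        return 0.0  # the sample is too large to avoid every fraud case
    return comb(population - fraud_cases, sample_size) / comb(population, sample_size)


# Hypothetical figures, chosen only for illustration.
p_miss = prob_sample_misses_all(population=10_000, fraud_cases=5, sample_size=100)
print(f"Chance the sample misses every fraud case: {p_miss:.3f}")
```

With rare fraud and a small sample, the sample will usually contain no fraudulent case, which is the practical argument for scanning the full data set.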

(4)

More’s Law …

(as in more data)

(5)

Big Data Challenges are More Than Data Size

“CIOs face significant challenges in addressing the issues surrounding big data… New technologies and applications are emerging and should be investigated to understand their potential value.”

Source: CEO Advisory: ‘Big Data’ Equals Big Opportunity, Gartner, 31 March 2011.

The Four Axes of Big Data

(6)

Data in a Tax Agency

Big Box Retailers/Corporations

Seller/Retailer Data

i.e.

Audit Leads

Nexus Payments

Structured and Unstructured Data

(7)

Data in a Tax Agency

Correspondence & Emails

Web Logs

i.e.

Audit Leads

Nexus Payments

Structured and Unstructured Data

Case Notes

Customs Data

Work Papers

(8)

Leveraging data for Taxpayer Education, Compliance and Service Enhancement

Humans are social by nature; social media is just an enabler

Untapped social network data is EVERYWHERE!

- Existing consumer/taxpayer transaction data & interaction data

- You are not constrained to Twitter and Facebook feeds to obtain taxpayer (TP) behavior and/or data

What if… you could determine, by applying text analytics, that a taxpayer who claimed no income in 2011 bought three motorcycles in 2011?

What if… you could be ‘notified’ that a taxpayer claimed on a blog, on Facebook, etc. that he cheated your tax department?

(9)

Statistical Modeling

• The most powerful method is to use statistical models to assess fraud risk

• To build a predictive model, you need to identify some historical known cases

• Clustering can also be used to find cases with similar characteristics. This won’t predict fraud, but can identify unusual groupings of cases

[Figure: cluster plot of Transactions against Login Time, showing clusters C1, C2, and C3]

Cluster analysis can help find cases that have similar profiles

Decision trees can help identify drivers of fraud and high risk cases

Response modeling can provide rankings on overall fraud risk

Various modeling
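
As a rough illustration of the clustering idea on this slide, here is a minimal k-means sketch in plain Python. The data points are hypothetical standardized (login time, transactions) pairs invented for the example, and the naive "first k points" initialization is a simplification; a production tool would choose starting centroids more carefully.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [tuple(p) for p in points[:k]]  # naive, deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical standardized (login time, transactions) pairs.
cases = [(-1.0, -1.1), (1.0, 0.9), (1.4, -1.3), (-0.9, -1.0),
         (1.1, 1.0), (-1.1, -0.9), (0.9, 1.1)]
labels, centroids = kmeans(cases, k=3)
print(labels)
```

In this toy data the third case ends up alone in its own cluster. As the slide says, that does not predict fraud, but it does single the case out as an unusual grouping worth a closer look.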

(10)

One Analytic Data Solution

Pattern Analysis, Path Analysis, Graph Analysis

Strategic & Operational Intelligence

Text, Social media, Machine data, SCM, ERP, Trans, 3rd Party, CRM

Big Data Insight

Web logs

Aster Data Analytic Platform

Teradata Integrated Data Warehouse

SQL Analytics

SQL-MapReduce Analytics

Structure, Multi-Structure

Ad Hoc/OLAP, Predictive Analytics, Spatial/Temporal, Active Execution

(11)

In-Database Analytic Processing

Enabling Better, Faster Insight

Advanced Visualization

Text Analytics

Reporting and OLAP

Advanced Analytics

Parallel Performance

(12)

Who is Teradata?

Global Leader in Enterprise Data Warehousing

Headquartered in Ohio

9,200+ associates

Analytic Solutions and Consulting Services

The leader in Gartner’s Leaders Quadrant since 1999

U.S. publicly-traded software company

S&P 500 Member, Listed NYSE: “TDC”

Founded in 1979, public launch in 2007

Global presence and world-class customer list

More than 1,300 customers, more than 2,500 installations

28 Federal and State partners

Extended Appliance Family

Launched 2008

Simple, Powerful, Affordable!

Teradata Tax Team

Deep tax domain Compliance

Customer service

(13)

Teradata is THE Leader and has been since 1999!

GARTNER MAGIC QUADRANT: DATA WAREHOUSE DBMS, 2012

Source: Magic Quadrant for Data Warehouse Database Management Systems, Mark Beyer, Donald Feinberg, Merv Adrian, Roxanne Edjlali, Gartner, 2/6/12

(14)

Teradata Workload-Specific Platform Family

• Data Mart Appliance (560): up to 12TB. Workloads: Test/Development or Smaller Data Marts
• Extreme Data Appliance (1650): up to 186PB. Workloads: Analytical Archive, Deep Dive Analytic
• Data Warehouse Appliance (2690): up to 315TB. Workloads: Strategic Intelligence, Decision Support System, Fast Scan
• Extreme Performance Appliance (4600): up to 18TB. Workloads: Operational Intelligence, Lower Volume, High Performance
• Active Enterprise Data Warehouse (66XX): up to 92PB. Workloads: Strategic & Operational Intelligence, Real Time Update, Active workloads
• Aster MapReduce Appliance: up to 5PB. Workloads: Discovery Platform for Big Data Analytics with embedded SQL MapReduce for new data types & sources

(15)

8/14/2012 Teradata Confidential

Scalability Across Multiple Dimensions

Dimensions: Data Volume (Raw, User Data), Schema Sophistication, Query Freedom, Query Complexity, Data Freshness, Query Data Volume, Query Concurrency, Workload Management

Competition Scales One Dimension at the Expense of Others: Limited by Technology!

Teradata can Scale Simultaneously Across Multiple Dimensions: Driven by Business!

(16)

Automatic Built-In Functionality

• Fast Query Performance: “Parallel Everything” design and smart Teradata optimizer enable fast query execution across platforms
• Quick Time to Value: Simple setup steps with automatic “hands off” distribution of data, along with integrated load utilities, result in rapid installations
• Simple to Manage: DBAs never have to set parameters, manage table space, or reorganize data
• Responsive to Business Change: Fully parallel MPP “shared nothing” architecture scales linearly across data, users, and applications, providing consistent and predictable performance and growth

Easy “Set & Go” Optimization Options

• Powerful, Embedded Analytics: In-database data mining, virtual OLAP/cubes, pre-built and custom application objects (User Defined Functions) drive efficient and differentiated business insight
• Advanced Workload Management: Workload management options by user, application, time of day and CPU exceptions
• Intelligent Scan Elimination: “Set and Go” options reduce full file scanning (Primary, Secondary, Multi-level Partitioned Primary, Aggregate Join Index, Sync Scan)
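
The “hands off” distribution of data in a “shared nothing” design can be pictured as hashing each row's key to pick the node that owns it. The sketch below is a generic illustration of that idea only: the SHA-256 hash, the key format, and the four-node setup are assumptions made for the example, not Teradata's actual hashing scheme.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Map a row key to a node by hashing it; the same key always lands
    on the same node, so no central lookup table is needed."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Hypothetical taxpayer IDs spread across 4 hypothetical nodes.
keys = [f"TP-{n:06d}" for n in range(10_000)]
rows_per_node = Counter(node_for(k, 4) for k in keys)
print(sorted(rows_per_node.items()))
```

Because a good hash spreads keys roughly evenly, each node holds a similar share of the rows and can scan its share in parallel without coordinating with the others.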

(17)

Analytical Ecosystem

The Ecosystem Is The Warehouse

[Diagram: platforms 1650, 2650, 560, 66XX, 66XX, 2650 and Aster Data SQL-MapReduce]

(18)

Teradata Aster

Unified Big Data Architecture for the Enterprise

Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP

Engineers, Data Scientists, Quants, Business Analysts

Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

Capture, Store, Refine

Discovery Platform

Integrated Data Warehouse

(19)

Aster SQL-MapReduce:

What Is It and Why Is It Important to In-Database Analytics?

Patented framework for advanced analytics that are hard to define in SQL
- Couples SQL (relational) with MapReduce (SQL-MapReduce)
- Invoked from SQL (automatically parallelized)
- Includes a library of pre-packaged Analytic Modules

Architecture for diverse, embedded analytics processing
- Supports custom analytics written in a variety of languages, e.g. Java

Combines SQL & visual tools

[Diagram: Aster Data nCluster serving multiple apps through SQL and SQL-MapReduce]

(20)

Ease of Development and Reuse

Analytic Foundation: 50+ out-of-the-box modules

Modules: Business-ready SQL-MapReduce Functions

Path Analysis: Discover patterns in rows of sequential data
• nPath: complex sequential analysis for time series analysis and behavioral pattern analysis
• Sessionization: identifies sessions from time series data in a single pass over the data
• Attribution: operator to help ad networks and websites to distribute “credit”

Statistical Analysis: High-performance processing of common statistical calculations
• Histogram: function to provide capability of generating
• Decision Trees: native implementation of parallel random forests
• Approximate percentiles and distinct counts: calculate percentiles and counts within specific variance
• Correlation: calculation that characterizes the strength of the relation between different data fields
• Regression: performs linear or logistic regression between an output variable and a set of input variables
• Averages: calculate moving, weighted, exponential or volume-weighted averages over a window of data

Relational Analysis: Discover important relationships among data
• Graph analysis: finds shortest path from a distinct node to all other nodes in a graph
• Tokenization: splits strings into individual words to assist text
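
To show what "identifies sessions from time series data in a single pass" can look like, here is a minimal sessionization sketch in plain Python. It is not the Aster module itself: the 30-minute inactivity timeout and the sample timestamps are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Assign a session number to each timestamp in one pass over
    time-ordered data: a gap longer than `timeout` starts a new session."""
    session_ids = []
    current, previous = 0, None
    for ts in timestamps:
        if previous is not None and ts - previous > timeout:
            current += 1
        session_ids.append(current)
        previous = ts
    return session_ids


# Hypothetical click times for one user, already sorted by time.
clicks = [
    datetime(2012, 8, 14, 9, 0),
    datetime(2012, 8, 14, 9, 10),
    datetime(2012, 8, 14, 10, 30),
    datetime(2012, 8, 14, 10, 45),
]
print(sessionize(clicks))  # → [0, 0, 1, 1]
```

The function only ever compares a row with the one before it, which is why a single ordered pass is enough.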

(21)

Ease of Development and Reuse

Modules: SQL-MapReduce Analytic Functions

Text Analysis: Derive patterns in textual data
• Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
• Text Partition: analyzes text data over multiple rows
• Levenshtein Distance: computes the distance between two words

Cluster Analysis: Discover natural groupings of data points
• k-Means: clusters data into a specified number of groupings
• Canopy: partitions data into overlapping subsets within which k-means is performed
• Minhash: buckets high-dimensional items for cluster analysis
• Basket analysis: creates configurable groupings of related items from transaction records in a single pass
• Collaborative Filter: predicts the interests of a user by collecting interest information from many users

Data Transformation: Transform data for more advanced analysis
• Unpack: extracts nested data for further analysis
• Pack: compresses multi-column data into a single column
• Antiselect: returns all columns except for a specified column
• Multicase: case statement that supports row match for multiple cases
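
The Levenshtein distance listed among the text modules is simple enough to sketch directly. The version below is the textbook dynamic-programming formulation in plain Python, offered as an illustration of the measure rather than as the Aster implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

A small distance between two names or addresses (for example, a distance of 1 between "JON SMITH" and "JOHN SMITH") is a common signal that two records may refer to the same party.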

(23)

Aster SQL-MapReduce and Hadoop MapReduce

Aster SQL-MapReduce
• Customized MapReduce
• Deployed via SQL-MR and BI and Visualization tools
• Easy to manage database
• 50+ Packaged SQL-MapReduce Analytics
• SQL – “language of business”
• Integrated Development Environment (IDE)

Hadoop MapReduce
• Customized MapReduce
• Deployed via application code and people
• File System
• Batch Processing
• Requires lots of coding

(24)

Aster SQL-MapReduce and Hadoop

SELECT *
FROM nPath (
    ON (…)
    PARTITION BY sba_id
    ORDER BY datestamp
    MODE (NONOVERLAPPING)
    PATTERN ('(OTHER_EVENT|FEE_EVENT)+')
    SYMBOLS (
        event LIKE '%REVERSE FEE%' AS FEE_EVENT,
        event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT)
    RESULT (…)
);
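
For readers unfamiliar with nPath, the following plain-Python sketch mimics what the query's clauses do: partition rows by sba_id, order them by datestamp, label each row with a symbol, and match the pattern over the resulting sequence. It only approximates nPath's semantics. The sample rows, the single-letter symbol encoding, and the case-sensitive substring test standing in for LIKE are all assumptions made for illustration.

```python
import re
from itertools import groupby
from operator import itemgetter


def fee_paths(rows):
    """rows: (sba_id, datestamp, event) tuples.
    Returns {sba_id: [matched symbol paths]}."""

    def symbol(event):
        # "F" stands for FEE_EVENT, "O" for OTHER_EVENT.
        return "F" if "REVERSE FEE" in event else "O"

    pattern = re.compile(r"(?:O|F)+")  # stands in for '(OTHER_EVENT|FEE_EVENT)+'
    result = {}
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY sba_id ORDER BY datestamp
    for sba_id, group in groupby(ordered, key=itemgetter(0)):
        sequence = "".join(symbol(event) for _, _, event in group)
        # finditer yields non-overlapping matches, like MODE (NONOVERLAPPING)
        result[sba_id] = [m.group() for m in pattern.finditer(sequence)]
    return result


# Hypothetical account events, deliberately out of order.
rows = [
    (1, "2012-01-03", "REVERSE FEE"),
    (1, "2012-01-01", "DEPOSIT"),
    (2, "2012-01-02", "WITHDRAWAL"),
    (1, "2012-01-02", "OVERDRAFT FEE"),
]
print(fee_paths(rows))  # → {1: ['OOF'], 2: ['O']}
```

Because every row is either a FEE_EVENT or an OTHER_EVENT, this particular pattern matches each account's whole event history as one path. A narrower pattern would pick out only the sequences of interest.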

(27)

Teradata Aster Software Only
• Purpose: Complex, High Speed Analytics For Emerging Big Data
• Scalability: Flexible
• Sub Segment: Massively parallel software solution with embedded SQL-MapReduce analytics for new data types and sources

Teradata Aster Cloud Edition
• Purpose: Teradata Aster nCluster for Amazon Web Services, AppNexus, Dell’s Data Cloud and Terremark
• Scalability: Elastic
• Sub Segment: On-demand extreme scaling with no downtime, always-on data cloud availability for high performance next-generation analytics for big data

Aster MapReduce Appliance
• Purpose: Integrated Discovery Platform
• Scalability: Up to 5PB
• Sub Segment: Embedded SQL-MapReduce analytics on Teradata hardware.

(28)

Value Proposition: Comparing the Aster Appliance vs. Aster Software-Only

Appliance: Customer wants a ready-to-run integrated solution with:
• Teradata Server Management
• Teradata support

SW-Only:
• Customer wants to use commodity hardware
• Wants to run in the cloud

Who Supports    Appliance    SW-Only
Hardware        Teradata     Customer
Software        Teradata     Teradata
OS              Teradata     Customer
Network         Teradata     Customer
Set up          Teradata     Customer

(29)

Thank You!!

Questions??

What will you do differently TOMORROW?
