Taking Data Analytics to the Next Level

Academic year: 2021


(1)

Taking Data Analytics to the Next Level

Implementing and Supporting Big Data Initiatives

(2)

What Is Big Data and How Is It Applicable to Anti-Fraud Efforts?

(3)

Definition

Gartner: Big data is high-volume, -velocity, and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

(4)

©2013 Association of Certified Fraud Examiners, Inc.


Why Big Data?

Fact gathering on an investigation or proactive compliance program draws on three sources:

• Interviews
• Document analysis (unstructured data)
   • Email & user documents
   • Social media
   • Corporate document repositories
   • News feeds & research
• Financial & operational analysis (structured data)
   • Sales records
   • Payment or expense details
   • Selected general ledger accounts
   • Financial reports and analysis

Interviews pull from document analysis and financial and operational analysis.

(5)

IBM Projection: Massive Explosion of Data

The Dawn of Big Data: The uncertainty of new information is growing alongside its complexity

(6)

MapReduce

• MapReduce is built on the proven concept of divide and conquer: it is much faster to break a massive task into smaller chunks and process them in parallel.

• In 2004, Google decided to apply the power of parallel, distributed computing to digest the enormous amounts of data produced in its daily operations. The result was a group of technologies and architectural design philosophies known as MapReduce.
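The divide-and-conquer idea can be sketched in a few lines of plain Python. This is an illustration only, not Hadoop or Google code: the payment records, the vendor names, and the helper names (`map_chunk`, `shuffle`, `reduce_group`, `map_reduce`) are all hypothetical, and a thread pool stands in for a cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(records):
    """Map step: emit one (vendor, amount) pair per payment record."""
    return [(r["vendor"], r["amount"]) for r in records]


def shuffle(mapped_chunks):
    """Shuffle step: gather every emitted value under its key."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def reduce_group(values):
    """Reduce step: collapse one key's values into a single total."""
    return sum(values)


def map_reduce(records, chunk_size=2, workers=4):
    # Break the task into small chunks and map them in parallel.
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return {key: reduce_group(vals) for key, vals in shuffle(mapped).items()}


# Hypothetical payment records, invented for this sketch.
payments = [
    {"vendor": "Acme", "amount": 100},
    {"vendor": "Globex", "amount": 250},
    {"vendor": "Acme", "amount": 50},
    {"vendor": "Initech", "amount": 75},
    {"vendor": "Globex", "amount": 25},
]

totals = map_reduce(payments)
print(totals)
```

Because each chunk is mapped independently, adding more workers (or machines, in a real cluster) shortens the map phase without changing the result.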

(7)

Hadoop

• The Hadoop implementation of MapReduce was created by Doug Cutting and is written in Java.

• After it was created, Hadoop was turned over to the Apache Software Foundation.

• It is now maintained as an open-source, top-level project with a global community of contributors.

• Original deployments include some of the most well-known, technologically advanced organizations, such as Yahoo, Facebook, and LinkedIn.

(8)

Pig and Hive

• To build applications on Hadoop, one normally employs a popular programming interface such as Java, Pig, or Hive.

• Pig: A specialized higher-level MapReduce language

• Hive: A specialized SQL-based MapReduce language

• Many other programming interfaces exist

(9)

IBM Survey—Big Data Sources

Transactions              88%
Log data                  73%
Events                    59%
Emails                    57%
Social media              43%
Sensors                   42%
External feeds            42%
RFID scans or POS data    41%
Free-form text            41%
Geospatial                40%
Audio                     38%
Still images/video        34%

(10)

IBM Survey—Big Data Analytics Activities

Query and reporting       91%
Data mining               77%
Data visualization        71%
Predictive modeling       67%
Optimization              65%
Simulation                56%
Natural language text     52%
Geospatial analytics      43%
Streaming analytics       35%
Video analytics           26%
Voice analytics           25%

(11)

Fraud Detection Requires a Comprehensive Approach

Platform (Analytics): Analyze, Forecast, Plan, Collaborate, Simulate, Survey, Govern, Discover, Model, Predict, Mine, Report, Score, Visualize, Decide

For fraud detection, any direction can, and should, be taken when applying analytics to our platform.

(12)

Recall the Forensic Analytics Maturity Model

Axes: false positive rate (high to low) and detection rate (low to high).

Structured data
• “Traditional” rules-based queries & analytics: matching, grouping, ordering, joining, filtering
• Statistical-based analysis: anomaly detection, clustering, risk ranking

Unstructured data
• Traditional keyword searching: keyword search
• Data visualization and text mining: data visualization, drill-down, text mining
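The difference between a rules-based query and statistical-based anomaly detection on structured data can be shown with a small sketch. Everything here is an assumption made for illustration: the expense amounts, the 5,000 approval limit, and the choice of a median-based (modified z-score) test with a 3.5 cutoff. The slides do not prescribe a specific statistical method.

```python
from statistics import median


def rule_based_hits(amounts, limit):
    """Rules-based filter: flag any amount above a fixed limit."""
    return [i for i, a in enumerate(amounts) if a > limit]


def mad_outliers(amounts, cutoff=3.5):
    """Statistical test: flag amounts far from the median.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation. The 3.5 cutoff is a common convention.
    """
    med = median(amounts)
    deviations = [abs(a - med) for a in amounts]
    mad = median(deviations)
    if mad == 0:
        return []  # No spread in the data, so nothing can be scored.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > cutoff]


# Hypothetical expense claims for one employee.
expenses = [120, 135, 110, 128, 140, 131, 2900, 125]

print(rule_based_hits(expenses, limit=5000))  # fixed rule finds nothing
print(mad_outliers(expenses))                 # statistical test flags index 6
```

The 2,900 claim sits under the fixed limit, so the rule never fires, but it stands out clearly against this employee's own spending pattern.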

(13)

Big Data and Anti-Fraud

Sources feeding the analysis platform: email and instant message, 3rd-party data feeds, social media, ERP systems, transactional data

Structured and unstructured data… is organized and “risk scored”

(14)

A More Human Way to Look at Data

Data Points Are Represented as Objects, With Logical Relationships

• View supporting documents as dynamic objects
• Graphical representation of relationships between seemingly discrete entities
• Epicenters of activity become immediately discernible

(15)

Search-Around Functionality

Rapidly Build Networks of Interest and Tie In Multiple Data Sources

Easily find entities, documents, events, etc. that are directly related to your selection
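One way to picture search-around is as a neighbor lookup in a graph of linked objects. The sketch below is not the tool shown on the slide. The entity labels, the link list, and the function names `build_graph` and `search_around` are hypothetical, chosen only to show the idea of expanding outward from a selection.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Build an undirected graph from (entity, entity) link pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def search_around(graph, start, hops=1):
    """Return every entity reachable from `start` within `hops` links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # Do not expand beyond the requested distance.
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical links drawn from several data sources.
links = [
    ("employee:jdoe", "vendor:Acme"),      # vendor master file
    ("vendor:Acme", "invoice:1001"),       # accounts payable
    ("employee:jdoe", "email:555"),        # email archive
    ("invoice:1001", "bank:XX12"),         # payment records
]

graph = build_graph(links)
print(search_around(graph, "employee:jdoe"))
print(search_around(graph, "employee:jdoe", hops=2))
```

Raising `hops` widens the network of interest one link at a time, which is how a two-step path from an employee to an invoice becomes visible.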

(16)

Geocoding and Heat Maps

Identify Global Epicenters of Activity, As Well As Anomalies

Hotspots of activity are easily identified
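A heat map of this kind boils down to counting events per map cell. The sketch below assumes geocoding has already turned addresses into latitude and longitude pairs. The coordinates, the one-degree cell size, and the function name `grid_counts` are illustrative assumptions, not details from the slide.

```python
import math
from collections import Counter


def grid_counts(points, cell_degrees=1.0):
    """Count points per grid cell; each cell is cell_degrees on a side."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        counts[cell] += 1
    return counts


# Hypothetical geocoded transaction locations (latitude, longitude).
points = [
    (40.71, -74.00),
    (40.73, -73.99),
    (40.75, -73.98),
    (51.51, -0.13),
    (1.35, 103.82),
]

counts = grid_counts(points)
hotspot, hits = counts.most_common(1)[0]
print(hotspot, hits)
```

Cells with the highest counts are the hotspots, and a cell with a single unexpected hit can be just as interesting as an anomaly.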

(17)

Employee-Risk Ranking

Scored by Custodian and Time Period Based on Multiple Criteria

1. Keywords: Percentage of EY-ACFE Fraud Triangle keywords around pressure, opportunity and rationalization in email and IM communications. Scaling: 3

2. T&E analysis: Ranking of T&E out-of-compliance hits and overall email scoring. Scaling: 3

4. User Activity: Percentage of instances within that week where the custodian sends or receives ESI involving those outside of their peer group, as identified through hierarchies. Scaling: 2

5. 3rd Party Risk: Instances where the employee is linked to high-risk 3rd parties (e.g., customers, vendors, state-owned entities) as determined by hits on OFAC, sanctions, PEP, or adverse media lists, whether in email, T&E, or sales activity. Scaling: 2

6. Alias Clustering: Percentage of instances within that week where the custodian sends or receives ESI involving at least one (1) of their identified communicative aliases. Scaling: 3

7. Emotive Tone: Percentage of instances where the employee sends or receives ESI with negative emotions (angry, frustrated, secretive, etc.) identified through linguistic analyses. Scaling: 5

Custodian     C1  C2  C3  C4  C5  C6  C7   Score
Scaling        3   3   4   2   2   3   5
A, Week 1      1   3   3   4   6   2   3      45
A, Week 2      2   2   4   5   3   4   2      37
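The slide does not say how the criterion values and scalings combine into the final score. The sketch below assumes the simplest reading, a weighted sum of each criterion value times its scaling, using the C1 to C7 values and scalings from the table. That assumption yields 65 and 66 for the two weeks rather than the 45 and 37 shown on the slide, so the original model evidently applies some further step that is not described here. The function name `weighted_score` is hypothetical.

```python
# Scaling factors per criterion, as listed in the slide's table.
SCALING = {"C1": 3, "C2": 3, "C3": 4, "C4": 2, "C5": 2, "C6": 3, "C7": 5}


def weighted_score(values, scaling=SCALING):
    """Assumed formula: sum of criterion value times its scaling factor."""
    return sum(values[c] * scaling[c] for c in scaling)


# Criterion values for custodian A, taken from the slide's table.
week1 = {"C1": 1, "C2": 3, "C3": 3, "C4": 4, "C5": 6, "C6": 2, "C7": 3}
week2 = {"C1": 2, "C2": 2, "C3": 4, "C4": 5, "C5": 3, "C6": 4, "C7": 2}

score1 = weighted_score(week1)
score2 = weighted_score(week2)
print(score1, score2)
```

Whatever the exact formula, the design point is the same: criteria with a higher scaling, such as emotive tone, move a custodian's rank more than criteria with a lower one.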

(18)

Employee-Risk Scoring

Risk Scoring Model—Peer Stratification Dashboard Review

Peer Stratification

Dots represent clusters of high-risk communications that can be reviewed by clicking.

(19)

Course Recap

(20)

Contacts

Vincent Walden, CFE, CPA
Ernst & Young LLP
Partner, Assurance Services
Fraud Investigation & Dispute Services
New York, NY
(212) 773-3643
