The Application of Exploratory Data Analysis (EDA) in Auditing
Qi Liu
Ph.D. Candidate (A.B.D.)
Dept. of Accounting & Information Systems Rutgers University
28th WCARS
November 9, 2013
Outline
Introduction
An overview of EDA concept
EDA in Auditing
An application of EDA in auditing – A credit card retention case
Future Research
Motivation
Auditing is a data-intensive process, and data analysis plays an important role in it.
Current data analysis approaches in auditing focus on validating predefined audit objectives, so they cannot uncover risks in the data that the auditor is not already aware of.
EDA is often linked to detective work, and one of its objectives is to identify outliers.
Even though some EDA techniques have been used in some auditing procedures, EDA has never been systematically employed in auditing.
Contribution
This research contributes to the auditing literature by taking a first step toward using exploratory data analysis in auditing and by illustrating a real-world application in the audit process.
Definition of EDA
Exploratory data analysis (EDA) is a data analysis approach that emphasizes pattern recognition and hypothesis generation.
Exploratory Data Analysis (EDA) vs. Confirmatory Data Analysis (CDA)
Reasoning type
• EDA: Inductive
• CDA: Deductive
Goal
• EDA: Pattern recognition and hypothesis generation
• CDA: Estimation, modeling, hypothesis testing
Applied data
• EDA: Observational data (data collected without a well-defined hypothesis)
• CDA: Experimental data (data collected through formally designed experiments)
Techniques
• EDA: Descriptive statistics, data visualization, clustering analysis, process mining…
• CDA: Traditional statistical techniques of inference, significance, and confidence
Advantages
• EDA: No assumptions required; promotes deeper understanding of the data
• CDA: Precise; well-established theory and methods
Disadvantages
• EDA: No conclusive answers; difficult to avoid bias produced by overfitting
• CDA: Requires unrealistic assumptions; difficult to notice unexpected results
Confirmatory Data Analysis (CDA) is a widely used data analysis approach that emphasizes experimental design, significance testing, estimation, and prediction (Good, 1983).
Since the 1980s, EDA has been applied in diverse disciplines such as interior design, marketing, industrial engineering, and geography (Chen et al., 2011; Nayaka and Yano, 2010; Koschat and Sabavala, 1994; Wesley et al., 2006; De Mast and Trip, 2007, 2009).
A framework for applying EDA to practical problem solving includes three steps: (1) display the data; (2) identify salient features; (3) interpret salient features (De Mast and Kemper, 2009).
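The three steps can be made concrete with a small numeric sketch. It is only an illustration under stated assumptions: the fee values are invented, and Tukey's 1.5 × IQR fences stand in for "identify salient features"; the cited framework does not prescribe this particular rule.

```python
import statistics

def describe(values):
    """Step 1 (display): a compact numeric summary of one field."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}

def tukey_outliers(values, k=1.5):
    """Step 2 (identify salient features): points outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical fee amounts; 95 is a planted anomaly.
fees = [10, 12, 11, 13, 12, 11, 95]
summary = describe(fees)
suspects = tukey_outliers(fees)
# Step 3 (interpret) is left to the analyst: why is 95 so far from the rest?
```

The summary plays the role of the display step and the fence rule surfaces the candidate feature; interpretation remains a human judgment.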
Framework to apply EDA in auditing
• Step 1: Display the distribution of related fields
• Step 2: Identify salient features from the distribution
• Step 3: Perform CDA to test possible explanations
• Step 4: Identify suspicious cases
• Step 5: Explore the causes of abnormal cases
• Step 6: Perform CDA to confirm the relationship
• Step 7: Report the risks and recommend improvement suggestions
Purpose
Demonstrate the benefits of applying EDA in audit process
Provide a real example to support the proposed guidelines
Scenario:
Clients call the bank asking for a reduction of their card fees. Bank representatives offer discounts to clients to retain their accounts.
Objectives:
Identify situations in which bank representatives cause a loss of revenue when negotiating fees, such as:
• Bank representatives offer higher discounts than allowed
• Bank representatives usually offer the highest allowable discount without making enough effort to negotiate a lower one
• Bank representatives offer discounts without any negotiation with the clients
Data Description
Data (Retention Dataset)
• Each record represents a customer call
• 195,694 records
• 162 fields
• Time frame: January 2012
Selected Attributes
• Original fee (VLR_ANUIDADE_G)
• Actual fee (_Valor da Anuidade de Saída)
• Agent identification (Funcional do Agente)
• Supervisor identification (Funcional do Supervisor)
• Location of the customer service center (Polo de Atendimento)
• Call duration (Tempo de Atendimento de Retenti)
Methodology
Data Preprocessing
• Discount Calculation
Applied EDA techniques
• Descriptive Statistics
• Data Visualization
• Data Transformation
Bank policy allows bank representatives to offer discounts of up to 100% of the annual fee (anuidade) to retain the customer.
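The discount calculation and the binned distribution can be sketched as follows. The slides do not spell out the formula, so this assumes discount = (original fee - actual fee) / original fee; the function names, the boundary handling (exactly 100% goes in the top bin), and the sample calls are all hypothetical.

```python
from collections import Counter

def discount_rate(original_fee, actual_fee):
    """Fraction of the original fee waived; negative if the actual fee is higher."""
    if original_fee <= 0:
        raise ValueError("original fee must be positive")
    return (original_fee - actual_fee) / original_fee

def discount_bin(rate):
    """Map a discount rate to a bin label: <0, 0-10%, ..., 90%-100%, >100%."""
    pct = rate * 100
    if pct < 0:
        return "<0"
    if pct > 100:
        return ">100%"
    index = min(int(pct // 10), 9)  # assumption: exactly 100% falls in the top bin
    low, high = index * 10, index * 10 + 10
    return "0-10%" if index == 0 else f"{low}%-{high}%"

def discount_distribution(records):
    """Share of calls per bin; records are (original_fee, actual_fee) pairs."""
    counts = Counter(discount_bin(discount_rate(o, a)) for o, a in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical calls: a 65% discount, a full waiver, a fee increase, a 4% discount.
calls = [(100.0, 35.0), (100.0, 0.0), (100.0, 120.0), (50.0, 48.0)]
shares = discount_distribution(calls)
```

Keeping the "<0" and ">100%" bins separate from the regular range matters here, because those are the values that fall outside the stated policy.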
Results Analysis (1/8)
Policy-violating bank representatives and negative discounts
[Figure: frequency distribution of discounts; x-axis: Discount, y-axis: percentage (0% to 35%)]
<0: 0.15%; 0-10%: 0.59%; 10%-20%: 0.69%; 20%-30%: 1.21%; 30%-40%: 4.60%; 40%-50%: 5.71%; 50%-60%: 15.66%; 60%-70%: 31.34%; 70%-80%: 13.39%; 80%-90%: 15.14%; 90%-100%: 11.51%; >100%: 0%
Descriptive statistics of discounts
Results Analysis (2/8)
Policy-violating bank representatives and negative discounts
[Figure: bar chart; x-axis: Discount, y-axis: Count (0 to 200). Bar values in extraction order: 27, 0, 0, 0, 0, 0, 7, 50, 12, 0, 190; bin labels were not preserved.]
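A minimal sketch of how out-of-policy calls could be isolated per representative. It assumes the same discount formula, (original fee - actual fee) / original fee, and treats any rate above 100% or below 0% as a case to review; the agent IDs and fee amounts are hypothetical.

```python
from collections import defaultdict

def policy_violations(calls):
    """Group out-of-policy calls by agent.

    Each call is (agent_id, original_fee, actual_fee), with original_fee > 0.
    Returns {agent_id: [(kind, rate), ...]} for flagged calls only.
    """
    flagged = defaultdict(list)
    for agent_id, original_fee, actual_fee in calls:
        rate = (original_fee - actual_fee) / original_fee
        if rate > 1.0:
            flagged[agent_id].append(("above_limit", rate))
        elif rate < 0.0:
            flagged[agent_id].append(("negative", rate))
    return dict(flagged)

# Hypothetical calls: A1 exceeds the 100% limit, A2 charges more than the
# original fee (a negative discount), A3 stays within policy.
calls = [("A1", 100.0, -20.0), ("A2", 100.0, 150.0), ("A3", 100.0, 40.0)]
violations = policy_violations(calls)
```

Grouping by agent keeps the output aligned with the audit question, which is about the representatives rather than the individual calls.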
Results Analysis (3/8)
Policy-violating bank representatives and negative discounts
New Audit Objective:
Actual fees are recorded correctly.
Original fees reflect the number of cards in an account.
Results Analysis (4/8)
Effortless bank representatives and inactive representatives
Distribution of bank representatives who offered 100% discounts, in the whole retention dataset and in the 100% discount subset
Bank representatives who always offer 100% discounts can be regarded as not making enough effort to negotiate a lower discount with the clients.
[Figure: frequency of 100% discounts per bank representative; x-axis: Bank Representative ID, y-axis: Frequency (0 to 700)]
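The "always offers 100%" criterion can be sketched as a per-representative count. The minimum call volume (`min_calls`) is an assumption added here so that a representative with one or two calls is not flagged; the slides do not state such a threshold, and the IDs are hypothetical.

```python
from collections import Counter

def always_full_discount(calls, min_calls=5):
    """Return agents whose every call received a 100% discount.

    calls: iterable of (agent_id, discount_rate) pairs, where 1.0 means
    the whole fee was waived (actual fee of zero gives exactly 1.0).
    min_calls: hypothetical volume threshold, not taken from the slides.
    """
    total = Counter()
    full = Counter()
    for agent_id, rate in calls:
        total[agent_id] += 1
        if rate == 1.0:
            full[agent_id] += 1
    return sorted(a for a, n in total.items()
                  if n >= min_calls and full[a] == n)

# Hypothetical data: R1 always waives the fee, R2 has too few calls,
# R3 negotiated a lower discount once.
calls = ([("R1", 1.0)] * 6 + [("R2", 1.0)] * 3
         + [("R3", 1.0)] * 5 + [("R3", 0.6)])
flagged = always_full_discount(calls)
```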
Results Analysis (5/8)
Effortless bank representatives and inactive representatives
Descriptive statistics of frequency distribution of bank representatives
[Figure: pie chart] Active representatives: 65%; remaining 35%: inactive representatives 32%, supervisors 3%
Results Analysis (6/8)
Effortless bank representatives and inactive representatives
Comparison of active and inactive representatives on frequency distribution of discounts
[Figure: bar chart; y-axis: Percentage (0% to 35%); legend: frequency of active representatives, frequency of inactive representatives; values shown: 11.28%, 25.97%]
Distribution of bank representatives
[Figure: bar chart by service center location (Sao Paulo, Rio de Janeiro, Salvador, Nao Especificado); y-axis: 0% to 100%; legend: frequency of active representatives, frequency of inactive representatives; values in extraction order: 47.33%, 12.30%, 26.47%, 13.90% and 85.95%, 7.84%, 4.32%, 1.89%]
Results Analysis (7/8)
Non-negotiation bank representatives and short calls
Frequency distribution of call duration less than 600 seconds
Discounts offered without negotiation are usually associated with short call durations.
One possible explanation for these unreasonably short calls is that they were forced to terminate because of a bad network connection.
New Audit Objective:
All the discounts are given in calls long enough to offer discounts.
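One way to test this objective is to list the calls in which a discount was granted but the call was shorter than some minimum. The 30-second cutoff below is purely an assumption for illustration; the slides do not say how long a call must be, and the call IDs are hypothetical.

```python
def discounts_in_short_calls(calls, min_seconds=30):
    """Return IDs of calls that granted a discount in under min_seconds.

    calls: iterable of (call_id, duration_seconds, discount_rate).
    min_seconds: hypothetical cutoff, to be set by the auditor.
    """
    return [call_id for call_id, duration, rate in calls
            if rate > 0 and duration < min_seconds]

# Hypothetical calls: C1 is short with a discount, C2 is short without one,
# C3 is long enough for a negotiation to have taken place.
calls = [("C1", 12, 1.0), ("C2", 8, 0.0), ("C3", 240, 0.7)]
suspicious = discounts_in_short_calls(calls)
```

Calls flagged this way would still need follow-up, since a dropped connection and a skipped negotiation look the same in the duration field.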
Results Analysis (8/8)
Non-negotiation bank representatives and short calls
Relationship between call duration and discounts
[Figure: average discount versus call duration; x-axis: Call duration (0 to 600), y-axis: Average discounts (about 45 to 80)]
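A relationship of this kind can be tabulated by averaging the discount within call-duration buckets. The sketch below assumes 100-second buckets, chosen only to match the tick spacing on the duration axis, and uses invented calls; it is not the author's procedure.

```python
from collections import defaultdict

def average_discount_by_duration(calls, bucket_seconds=100):
    """Mean discount rate per call-duration bucket.

    calls: iterable of (duration_seconds, discount_rate).
    Returns {bucket_start_seconds: mean_rate}, ordered by bucket start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of rates, call count]
    for duration, rate in calls:
        bucket = int(duration // bucket_seconds) * bucket_seconds
        sums[bucket][0] += rate
        sums[bucket][1] += 1
    return {b: total / n for b, (total, n) in sorted(sums.items())}

# Hypothetical calls: two in the 0-99 s bucket, one each in 100-199 s and 500-599 s.
calls = [(20, 0.5), (80, 0.75), (150, 1.0), (590, 0.25)]
averages = average_discount_by_duration(calls)
```

A table like this supports the exploratory step; confirming the relationship would still call for a formal test, as in Step 6 of the framework.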