• No results found

Data Visualization for Data QC

N/A
N/A
Protected

Academic year: 2021

Share "Data Visualization for Data QC"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Copyright © 2011 Deloitte Development LLC. All rights reserved.

Data Visualization for Data QC

March, 2012 Steve Berman

(2)

Agenda

Overview

Reasons to Use Data Visualization in Data Cleansing Examples

Tools

(3)

2 Copyright © 2011 Deloitte Development LLC. All rights reserved. • Checking data for reasonability is an important part of any modeling

process

• Can’t be avoided – no such thing as “perfect data”

• Can be time intensive – the “preprocess” portion of any project takes 60-80% of the total project time

• QC often relies on individual queries that test for specific conditions, or filters in Excel

How QC has worked up to now

SELECT policy_no, pol_eff_date, tot_incurred_loss FROM claim_data

WHERE tot_incurred_loss) > 0

ORDER BY DESCENDING tot_incurred_loss;

proc summary data=pol_info; by company state;

var pol_ct earned_prem written_prem;

output out=pol_info_sum (drop=_type_ _freq_) sum=; run;

(4)

Data Visualization is the

communication of information using graphical representations

Introducing data visualization

“A picture is worth a thousand words.”

“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey

• Data visualization is an alternative and a supplement to the more traditional querying methods

• Use is becoming more prevalent in presentation

• Not yet widely used by actuaries in QC process – more used to seeing numbers

(5)

4 Copyright © 2011 Deloitte Development LLC. All rights reserved.

(6)

• Amount of data is growing

• Information harder to be expressed numerically in simple spreadsheets

• Improved computing power

• Improved tools and visualization methods, which are easier to use

• Increased demand from clients / management

(7)

6 Copyright © 2011 Deloitte Development LLC. All rights reserved.

Benefits of data visualization

(8)

• Data visualization allows users see several different perspectives of the data.

• Data visualization makes it possible to interpret vast amounts of data

• Data visualization offers the ability to note exceptions in the data

• Data visualization allows the user to analyze visual patterns in the data

• Data visualization equips users with the ability to see influences that would otherwise be difficult to find

• Packages allow drag and drop functionality, limited need for coding

• Also allows for easy drill down, filtering, grouping

• Data visualization allows a natural progression for QC to insight in modeling

(9)

8 Copyright © 2011 Deloitte Development LLC. All rights reserved. • QC, Data Mining, Exploratory Data Analysis (EDA)

• Model interpretation

• Results / Presentation

(10)

Examples - Histogram

• Mass points • Extreme values • Unreasonable values 0 10,000 20,000 30,000 40,000 50,000 60,000 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00

Adjusted to Actual Premium

Histogram after correction – but is it fixed?

• Still some outliers

• For extremes, higher loss ratio indicates potential problems with adjusted premium

(11)

10 Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples - Histogram

• Incorrectly calculated exposure is one of the causes

(12)

• Descriptive of median, standard deviation, outliers

• Particularly useful in

comparing groups within a variable

(13)

12 Copyright © 2011 Deloitte Development LLC. All rights reserved.

• Split into subgroups as well

(14)

Examples – Line Charts

• Unusual spikes by month? • Seasonal activity consistent with business knowledge? • Consistent behavior over time?

(15)

14 Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples – Line Charts

• Back to insurance: Total premium average

premium, policy counts by month

• Is peak at January renewal (both counts and average premium) consistent with business?

(16)
(17)

16 Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples – Scatterplots

(18)

Correlations between variables Look for:

• Very high correlations (near +1 or -1

• Low correlations for variables that should be related

(19)

18 Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples - Geospatial

Customers in a five state area compared to sales locations

• Customers in area that are not

within driving distance

• Customers mapped well outside

(20)

Examples - Geospatial

Sales and gross margin rate by zip code Large mass point in sales, also large negative GMR

(21)

20 Copyright © 2011 Deloitte Development LLC. All rights reserved.

Software

(not an exhaustive list!)

Excel Third party add-ins available

PowerPivot Microsoft add-in to Excel

R Open source, multitudes of packages

Tableau

SAS SAS/GRAPH module

Spotfire

Microstrategy

(22)

Wrapping Up

• Data Visualization allows analysts to communicate with clarity, precision, and efficiency

• However, there are benefits in utilizing visualizations early in the process, as they may uncover data issues more easily than individual queries

• Standard tools, like histograms, scatterplots, maps, are becoming more available and easier to use

References

Related documents

20) To consider final adoption of an ordinance authorizing the Mayor to accept the low bid submitted for renovations to snorkel truck building and driveway for Fire Station #4. 21)

Results and Conclusion: The collaborative approach to simulation-based training design described here can facilitate genuine stakeholder participation in the development of

Health and Safety Mobility Aging Population Healthcare Education Climate Change Air Quality Human Migration Employment Social Contribution Congestion

Figure 2.3 Enhancement of soil water content for elevated CO 2 levels (A) under different management systems; B) under different vegetation types; and (C) under different

• to develop geothermal energy (electricity) to complement hydro and other sources of power to meet the energy demand of rural areas in sound environment,.. • to mitigate rural

The Halmos College of Natural Sciences and Oceanography (HCNSO) offers degree and certificate programs in biology, ocean science, marine biology, mathematics, chemistry, physics,

The main substrate surface processes are schematically illustrated in Fig. Figure 11: Schematic illustration of the surface processes occurring during MBE growth. These

Kijkend naar de resultaten van het onderzoek kan geconcludeerd worden dat de manier waarop competentiemanagement binnen Fabory uitgevoerd wordt geen toegevoegde waarde heeft