© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 1
Data Warehousing
Data Warehouses, OLAP and Visualization
By Kristian Staalø Knudsen and David Flenstrup, Netcompany A/S
Aalborg, 4th of April 2011
Contact information:
David Flenstrup, Senior Consultant dfl@netcompany.com
About us (David and Kristian) - David
• M.Sc. in Computer Science from the Technical University of
Denmark
• Senior Consultant at Netcompany A/S since 2009
• Microsoft Certified Business Intelligence Developer
• Worked with the following clients:
– Bankdata
– Maersk
– Sundhedsstyrelsen
– TDC
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 3
About us (David and Kristian) - Kristian
• M.Sc. in Computer Science from the Technical University of
Denmark
• Senior Consultant at Netcompany A/S since 2007
• Microsoft Certified Business Intelligence Developer
• Worked with the following clients:
– Flügger
– Forca
– PFA
– Unipension
Netcompany facts
•
Netcompany delivers IT and business consulting services
•
Dedicated to ”old deeds” with high professional standards and focus on solutions which create
both technical and business value for our customers
•
Deep knowledge and experience within business, technology, process optimization, change
management and project management
•
Customer base consists of well-established Danish and international companies and
organizations
•
Awarded by Børsen as “The best IT consultants in Denmark” in 2005, 2006, 2007 and 2008
•
In 2005, 2006 and 2007 respectively awarded ”System Integrator of the year”, ”Dynamics CRM
partner of the year” and ”Information Worker partner of the year” by Microsoft Denmark.
•
Microsoft Gold Certified Business Intelligence partner (among other areas)
•
Average employee age 31
•
Etablished in 1999
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 5
Please interrupt us!
Agenda
Analysis Chaos
Locating performance bottlenecks
KISS architecture
Visualization – Representing the data
Case – Maersk Drilling
Future technologies – potential in the DSS and BI realm
Microsoft Business Intelligence
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 7
Analysis Chaos
•
Introduction
• What happened
• Lessons learned
Pension Administration Company
Pension Administration Company
Pension for
newly educated
computer
geeks
Pension for
redheaded
highschool
teachers
Pension for Ill
tempered
cleaning ladies
Customer Service Economysystems IT Development IT Maintenance© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 9
New ERP system with reports
Employee
Employee
Manager
Enterprise Resource Planning
(ERP)
Analysis Chaos
• Introduction
•
What happened
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 11
Økonomi m
realtid
HR
ProjektLøn
Analysis
Design
Implementation
Test
Scope at first
Netcompany Consultant
Analysis
Design
Implementation
Test
Requirements
specification
Positive
Negative
People asked
30-40 People
Netcompany Consultant Netcompany ConsultantScope Explosion
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 13
Kreditor
Core
Analysis
Design
Implementation
Test
HR
ProjektKreditor
Debitor
Pension
Løn
Requirements
specification
Project plan
Scope (8 cubes)
Ressourcer (1700)
Deadline (1/6-2011)
Limitations
Assumptions/Prereq
uisites
Final Scope
Signed off by the business
Signed off by the business
Name Account Amount Jensen Wage 50
Analysis
Design
Implementation
Test
< 5 min
Real-time requirement at
Monthly balancing
Near Real-time requirement
Manager
Employee
Name Account Amount Jensen Wage 100
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 15
Design
Implementation
Test
Analysis
Design
Implementation
Test
ERP
Nightly Extract-Transform-Load
F
ull
Proc
es
s
in
g
SQL Server Agent
Every 5 minutes
Delta Load since last run
No deleted Records
Last Modified Date
Partition for current day
SSIS Dataflow Destination:
Partition Processing Add
EDW
Near Real-time Design
Analysis
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 17
Design
Implementation
Test
BI
Design
Implementation
Test
BI
ERP
Project plan
Scope (8 cubes)
Ressourcer (1700)
Deadline (1/6-2011)
Limitations
Assumptions/Prer
equisites
Status (weekly)
Riscs
ERP
Integration points change
Solution
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 19
Design
Implementation
Test
Design
Implementation
Test
Lack of Testdata
Analysis
• Garbage In -> Garbage out
Project plan
Scope (8 cubes)
Ressourcer (1700)
Deadline (1/6-2011)
Limitations
Assumptions/Prer
equisites
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 21
Analysis Chaos
• Introduction
• What happened
Design
Implementation
Test
Evaluation
• Things have different names in different companies
• Things have different names inside the company
• Things have different names in the same room
• Don’t be afraid to ask questions about the business!
Understanding the business
Analysis
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 23
Design
Implementation
Test
Evaluation
• Proposed in October
by Netcompany
• Demanded in April
by the Business
Niveau 3
Niveau 2
Niveau 1
All Projects Project 1 Project 2 Maintenance CleaningDoes the business know what they want?
Analysis
Design
Implementation
Test
Evaluation
Place responsibility in the business
• Who is the buyer?
• Make heads of areas
responsible
• Better overview
• Easier to plan workshops
Analysis
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 25
Design
Implementation
Test
Evaluation
Don’t build BI on systems that are not in
production or completely specified yet.
• Hard to ”guess” the right
table
• Documentation!
• Reimplementation when
functionality or interface
change
• Testdata!
1700 tables in source
BI
ERP
Analysis
Design
Implementation
Test
Evaluation
Top-Down design
Løn
Progr
es
s
• Ensures
business
requirement
s are met.
Analysis
KPI’s Charts, Reports
Cubes
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 27
Analysis
Design
Implementation
Test
Evaluation
Prototype / Iterations
• Show end users the reports -> better analysis / more value
• Find technical limitations early
• Performance testing early
• Discover heavy administrative processes in company
A
D
I
T
Analysis
Design
Implementation
Test
Creation of SSIS user
account takes 1 month
Status now
• End of build phase
• Total estimate of 3900 hours (vs. 1700)!
• Ok because:
– Assumptions.
– Limitations.
– Riscs in status reports.
Plan BI as part of your business releases
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 29
Analysis Chaos
Locating performance bottlenecks
KISS architecture
Visualization – Representing the data
Case – Maersk Drilling
Future technologies – potential in the DSS and BI realm
Microsoft Business Intelligence
Case: Understand the requirements and challenge the business
Requirement: A Top 100 most expensive events report (OLAP candidate?)
•
Customers are placed in parent/child hierarchy
•
About 400 million events a year
•
The report should be fast! (1-2 sec max)
•
Business requirements would result in a design with complex ETL, high requirements for
hardware platform.
•
Candidate for Outlier report, data analysis showed the following:
•
Challenged the business and ended with a reduction in data of 95%
•
Result: Simple ETL, customer saves money.
Price
No. of events
Reduction
All
7.150.683
0,00%
> 1,-
4.171.859
41,66%
> 5,-
796.349
88,86%
> 10,-
338.048
95,27%
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 31
Locating performance bottlenecks in SSIS
Locate the bottleneck
•
Source (source component -> row count task)
•
Destination (flat file -> destination)
•
Network (local source vs. remote source)
•
Transformations (enable/disable transformations)
How to fix a slow source: Multiple reads from slow source (union in DF)
How to fix a slow destination: Multiple writes to slow destination (split destination table into
multiple tables and create view joining the tables)
How to fix a slow network: jumbo frames, network adapters…
Demo: Analyzing and fixing a slow query
•
Execute query in SSMS and Include Actual Execution Plan
•
Use SQL Profiler to log the activity on SQL Server
Analysis Chaos
Locating performance bottlenecks
KISS architecture
Visualization – Representing the data
Case – Maersk Drilling
Future technologies – potential in the DSS and BI realm
Microsoft Business Intelligence
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 33
Architecture KISS
• Plan:
–
Existing Architecture
– Alternative Architecture
– Lessons learned
Existing Architecture
Names Insurance Customer Service Contibution Ext Ext Ext Ext Hist Hist Hist Hist 3NF EDW Source systems 1:1 Historical changes40 minutes
Custom component:
Slowly Changing
Dimension on facts!
Aktuelle Views
Afledt periodisering
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 35
Afledt Periodisering
Sag nr 127
Jens Hansen
Sag nr 127
(gammel)
Jens Hansen
(gammel)
Sag nr 127
Jens Jensen
• Possibly cascading through large parts of Datawarehouse
• Slow
Existing Architecture
Ext Ext Ext Ext Hist Hist Hist DSH 3NF EDW DMV DMV Insurance Cube Source systems 1:1 Historical changes Member CubeActual Views
Afledt periodisering
40 minutes
Custom component:
Slowly Changing
Dimension on facts!
Views on views
Denormalizing
Processing
SQL Query Optimizer
works bad on these
complicated views
Names Insurance Customer Service Contibution© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 37
Dimensions in Insurance Cube
Architecture KISS
• Plan:
– Existing Architecture
–
Alternative Architecture
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 39
Alternative design
DSH DIM EDW Insurance Cube Member Cube No historic changes in source: DeltaLoad using Created date
Historic changes in source: Delta Load using Last Modified date + delete old rows
Historic changes in source and no Last Modified Date Source systems
NO Actual Views
NO Afledt periodisering
Names Insurance Customer Service ContibutionArchitecture KISS
• Plan:
– Existing Architecture
– Alternative Architecture
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 41
Lessons learned
• Do the requirements come from the business (Extract,
Historical changes, 3NF EDW)?
• 3NF vs. Dimensional EDW
• Pragmatic approcah
• Maintenance / further development
– New/changed field:
– Original architecture: 8 places.
– Alternative architecture: 3 places
• Smaller cubes and fewer in each dimensions (Perspective)
• KISS
Analysis Chaos
Locating performance bottlenecks
KISS architecture
Visualization – Representing the data
Case – Maersk Drilling
Future technologies – potential in the DSS and BI realm
Microsoft Business Intelligence
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 43
Dashboards
• A dashboard is
”A visual display of the most information needed to achieve one or
more objectives which fits entirely on a single computer screen so
it can be monitored at a glance”
• A dashboard must provide an acurate overview of the data in seconds
• Dashboards should always provide unambigious up-to-date data.
• Dashboards can effectively impact the strategy of a corporation by
visually presenting goals and status
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 45
Pie-charts – a pet peeve
Pie-charts are rarely ever effective at presenting
data
Pie-chart of Market
share
Same data but using simple bar
charts
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 47
Dashboard challenges
Be carefull to avoid
• Scrolling – prevents users of ever getting an overview
• Space wasters – fancy dashboard designs
• Pie charts – non-intuitive data presentation
• Misused terminology – make sure to use terms that the entire
business agrees on
• Data errors – Both from source data, but also from in our own
calculations
Once a user looses trust in your dashboards, it is very hard
to gain it back!
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 49
Analysis Chaos
Locating performance bottlenecks
KISS architecture
Visualization – Representing the data
Case – Maersk Drilling
Future technologies – potential in the DSS and BI realm
Microsoft Business Intelligence
Analysis Chaos
Locating performance bottlenecks
KISS architecture
Visualization – Representing the data
Case – Maersk Drilling
Future technologies – potential in the DSS and BI realm
Microsoft Business Intelligence
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 51
Trends in Business Intelligence
• Search in Business Intelligence
– WolframAlpha
• Enrich data warehouse with data from online sources
– Azure datamarket (
http://datamarket.azure.com
)
– Amazon AWS (
http://aws.amazon.com
)
– Digitalisér (
http://digitaliser.dk/
)
– Etc.
• True real time Business Intelligence
– StreamInsight
• Self-serviced Business Intelligence
– PowerPivot
Trends in Business Intelligence: Search
•
Search in Business Intelligence
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 53
Trends in BI: Add GPS coordinates from address (1/2)
Trends in BI: Add GPS coordinates from address (2/2)
Use for store location, sale heat map, etc.
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 55
Trends in BI: : True Realtime BI with Microsoft StreamInsight
•
Applicable when data process is: Event driven, time sensitive, high volumes of data, no need
for persisting data
•
Used in call centers, financial markets, etc.
Future Technologies - Pervasive BI
• How do you administer 1.000 users with their own analytical tools?
• What happened to ”
single version of the truth
”?
• Visualization for
non-analysts
• Data warehouse still the main
centralized datastore (MDM?)
• How long will I have a job?
© 2011 Netcompany A/S Netcompany A/S · Grønningen 19 · 1270 København K · Tlf. 70131440 Side 57