1.
The Big Picture
2.
Data Warehouse Philosophy
3.
Data Warehouse Concepts
4.
Warehousing Applications
5.
Warehouse Schema Design
6.
Business Intelligence Reporting
7.
On-Line Analytical Processing
8.
OLAP Applications
9.
Data Warehouse Implementation
1.
The Big Picture
2.
Data Warehouse Philosophy
3.
Data Warehouse Concepts
4.
Warehousing Applications
5.
Warehouse Schema Design
6.
Business Intelligence Reporting
7.
On-Line Analytical Processing
8.
OLAP Applications
9.
Data Warehouse Implementation
10.
Warehousing Software
Data Warehouses & OLAP 4
What is OLAP?
On-Line Analytical Processing is not a
definition…
It gives no help in deciding if a product is an
OLAP tool or not!
Since late 1994, many vendors claim to have
OLAP compliant products
It is not possible to rely on the vendors’ own
description
Membership of the OLAP council is not a good
Data Warehouses & OLAP 5
What is OLAP?
Researchers were forced to create
their own
definition
…
It had to be
simple
,
memorable
and
product-independent
The
FASMI
test is one of the most converging
definition efforts for detecting OLAP
compliance
It defines the
characteristics of an OLAP
application
in a specific way…
FASMI
F
ast
A
nalysis of
S
hared
M
ultidimensional
FAST
The system is targeted to deliver responses to
users within about 5 seconds
◦ Simplest analysis ~ no more than 1 second
◦ Most complicated analysis ~ no more than 20 seconds
End-users assume that a process has failed if
results are not received within 30 seconds
Unless the system warns that the report will
Data Warehouses & OLAP 7
FAST
The OLAP response speed is not easy to
achieve…
… especially when on-the-fly and ad hoc
calculations are required
Vendors resort to many techniques to achieve
this goal:
◦ Specialized forms of data storage, ◦ Extensive pre-calculations,
FAST
None of the existent products is fully
optimized
… an area of developing technology
◦ The full pre-calculation approach fails with large and sparse data
◦ Doing everything on-the-fly is much too slow with large data
According to surveys, slow query response is
Data Warehouses & OLAP 9
ANALYSIS
The system can cope with any business
logic and statistical analysis relevant for the
application and the user
In some OLAP product some
pre-programming may be needed…
◦ Without having to program, it is necessary to allow the user to define new ad hoc
ANALYSIS
Analysis could include specific features like:
◦ Time series analysis,◦ Cost allocations,
◦ Currency translation, ◦ Goal seeking,
◦ Ad hoc multidimensional structural changes, ◦ Non-procedural modeling,
◦ Exception alerting, ◦ Data mining.
These capabilities differ between products,
Data Warehouses & OLAP 11
SHARED
The system implements all the
security
requirements for confidentiality
If multiple write access is needed,
concurrent update locking at an appropriate
level should be implemented
The system should be able to handle multiple
updates in a timely and secure manner
This is a major area of weakness in many
OLAP products
… assuming that OLAP applications will be
read-only
Even products with multi-user read-write have
MULTIDIMENSIONAL
Is the
key requirement
for all OLAP
applications
The system must provide a multidimensional
conceptual view including:
◦ Full support for hierarchies ◦ Multiple hierarchies
… This is the most
logical way
to analyze
Data Warehouses & OLAP 13
INFORMATION
Is all of the
data and derived information
needed
, wherever it is and however much is
relevant for the application
The capacity if handling data differ between
OLAP products
◦ The largest OLAP products can hold at least a thousand times as much as the smallest
Many considerations must be taking:
The
FASMI
test is a reasonable and
understandable definition of the
goals OLAP is meant to achieve
Researches encourage users and
vendors to adopt this definition, which
Data Warehouses & OLAP 15
The Codd rules
In 1993, Codd et al. published a white paper
“Providing OLAP to User-Analysts: An IT
Mandate”
Codd was very well known as a respected
database researcher from the 1960s till the
late 1980s
He is credited with being the
inventor of the
relational database model in 1969
Unfortunately
, his OLAP rules proved to be
controversial due to being
The Codd rules
The OLAP white paper included 12 rules,
which are now well known
They were followed by another 6 rules in
1995
Codd restructured the rules into four groups,
calling them “features”
◦ Basic Features
◦ Special Features
Data Warehouses & OLAP 17
The Codd rules
Basic Features
1. Multidimensional Conceptual View
◦ Few would argue with this feature
◦ Codd believes this to be the central core of OLAP
The Codd rules
Basic Features
2. Intuitive Data Manipulation
◦ Data
manipulation through direct actions on
cells in the view
◦
Without recourse to menus or multiple
actions, we assume that this is by using a
mouse
Data Warehouses & OLAP 19
The Codd rules
Basic Features
3. Accessibility: OLAP as a Mediator
◦ OLAP engines are considered as middleware, sitting between heterogeneous data sources and an OLAP front-end
◦ Most products can achieve this, but often with
The Codd rules
Basic Features
4. Batch Extraction vs Interpretive
◦ This rule effectively required that products offer
both their own staging database for OLAP data as well as offering live access to external data
Data Warehouses & OLAP 21
The Codd rules
Basic Features
5. OLAP Analysis Models
◦ Codd required that OLAP products should support all four analysis models :
Categorical: parameterized static reporting ~ All OLAP tools
Exegetical: slicing and dicing with drill down ~ All OLAP tools
Contemplative: « what if? » analysis ~ Most OLAP
tools
Formulaic: goal seeking models ~ Very few OLAP
The Codd rules
Basic Features
6. Client/Server Architecture
◦ The OLAP server component of an OLAP product
should be sufficiently intelligent that various clients could be attached with minimum effort and
programming for integration
◦ Relatively few OLAP products are qualified for this test
Data Warehouses & OLAP 23
The Codd rules
Basic Features
7. Transparency
◦ This test, dealing with openness, is also a tough
but valid one
◦ A spreadsheet user should be able to get full
values from an OLAP engine and not even be aware of where the data comes from
◦ OLAP products must allow live access to
heterogeneous data sources from a full function
spreadsheet add-in, with the OLAP server engine in between
The Codd rules
Basic Features
8. Multi-User Support
◦ OLAP tools must provide concurrent access (retrieval and update), integrity and security
◦ Many OLAP applications are still read-only
Data Warehouses & OLAP 25
The Codd rules
Special Features
9. Treatment of Non-Normalized Data
◦ Refers to the integration between an OLAP engine and denormalized source data
◦ Any data updates performed in the OLAP
environment should not be allowed to alter stored denormalized data in feeder systems
The Codd rules
Special Features
10. Storing OLAP Results: Keeping them
Separate from Source Data
◦ This is really an implementation rather than a product issue
◦ But few would disagree with it
◦ Read-write OLAP applications should not be implemented directly on live transaction data
◦ OLAP data changes should be kept distinct from transaction data
Data Warehouses & OLAP 27
The Codd rules
Special Features
11. Extraction of Missing Values
◦ All missing values are cast in the uniform
representation defined by the Relational Model
◦ Missing values are to be distinguished from
zero values
The Codd rules
Special Features
12. Treatment of Missing Values
◦ All missing values are to be ignored by the OLAP analyzer regardless of their source
Data Warehouses & OLAP 29
The Codd rules
Reporting Features
13. Flexible Reporting
◦ The dimensions can be laid out in any way that the user requires in reports
◦ Most products are capable of this in their
formal report writers
◦ It is preferable that analysis and reporting facilities
The Codd rules
Reporting Features
14. Uniform Reporting Performance
◦ Reporting performance be not significantly
degraded by increasing the number of dimensions or database size
◦ There are differences between products
◦ The principal factor that affects performance is the
degree to which the calculations are performed
Data Warehouses & OLAP 31
The Codd rules
Reporting Features
15. Automatic Adjustment of Physical
Level
◦ OLAP system must adjust its physical schema
automatically to adapt to the type of model, data volumes and sparsity
◦ Most vendors fall far short of this noble ideal
◦ Since 1996, users can benefit from it in
The Codd rules
Dimension Control
16. Generic Dimensionality
◦ Each dimension must be equivalent in both its structure and operational capabilities
◦ This has proven to be one of the most controversial Codd’s rules
Data Warehouses & OLAP 33
The Codd rules
Dimension Control
17. Unlimited Dimensions & Aggregation
Levels
◦ Technically, no product can possibly comply with this feature
◦ There is no such thing as an unlimited entity on a limited computer
◦ Few applications need more than about eight or
ten dimensions
◦ Few hierarchies have more than about six
consolidation levels
The Codd rules
Dimension Control
18. Unrestricted Cross-dimensional
Operations
◦ All forms of calculation must be allowed across all dimensions, not just the “measures” dimension
◦ Many products which use only relational storage are weak in this area
Data Warehouses & OLAP 35
Data Warehouses & OLAP 37
Data Warehouses & OLAP 39
Data Warehouses & OLAP 41
Data Warehouses & OLAP 43
Data Warehouses & OLAP 45
Data Warehouses & OLAP 47