OLAP
Business Intelligence
OLAP definition & application
Multidimensional data representation
Business Intelligence
Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
There are two main types of access tools available to meet this demand, namely Online Analytical Processing
Business Intelligence
OLAP and Data Mining differ in what they offer the user and because of this they are complementary technologies.
An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and /or data mining are collectively referred to as Business
Intelligence (BI) technologies.
BI = DW + OLAP BI = DW + DM
OLAP definition & application
The dynamic synthesis, analysis, and
consolidation of large volumes of multi- dimensional data, Codd (1993).
Describes a technology that uses a multi-dimensional view of aggregate
data to provide quick access to strategic information for the purposes of
OLAP definition & application
Can easily answer ‘who?’ and ‘what?’
questions, however, ability to answer
‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-
purpose query tools.
Types of analysis ranges from basic navigation and browsing (slicing and
dicing) to calculations, to more complex analyses such as time series and
complex modeling.
OLAP definition & application
Examples of OLAP applications
OLAP definition & application
Although OLAP applications are
found in widely divergent
functional areas, they all have the
following key features:
multi-dimensional views of data
support for complex calculations
time intelligence
OLAP - multi-dimensional
views of data
Core requirement of building a ‘realistic’
business model.
Provides basis for analytical processing through flexible access to corporate
data.
The underlying database design that
provides the multi-dimensional view of data should treat all dimensions equally.
OLAP - support for complex
calculations
Must provide a range of powerful
computational methods such as that required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.
Mechanisms for implementing
computational methods should be clear and non-procedural.
OLAP – time intelligence
Key feature of almost any analytical application as performance is almost always judged over time.
Time hierarchy is not always used in the same manner as other hierarchies.
Concepts such as year-to-date and
period-over-period comparisons should be easily defined.
Representation of Multi-
dimensional Data
Example of two-dimensional query.
What is the total revenue generated by
property sales in each city, in each quarter of 2004?’
Choice of representation is based on
types of queries end-user may ask.
Compare representation - three-field
relational table versus two-
dimensional matrix.
Multi-dimensional Data as
Three-field table versus Two-
dimensional Matrix
Representation of Multi-
dimensional Data
Example of three-dimensional
query.
‘What is the total revenue generated by property sales for each type of property (Flat or House) in each city, in each
quarter of 2004?’
Compare representation - four-
field relational table versus
three-dimensional cube.
Multi-dimensional Data as
Four-field Table versus Three-
dimensional Cube
Representation of Multi-
dimensional Data
A sample data cube
Representation of Multi-
dimensional Data
Cube represents data as cells in an array.
Relational table only represents multi- dimensional data in two dimensions.
Multi-dimensional structures are best visualized as cubes of data, and cubes
within cubes of data. Each side of a cube is a dimension.
A cube can be expanded to include other dimensions.
Representation of Multi-
dimensional Data
As number of dimensions increases,
number of the cube’s cells
increases exponentially.
However, majority of multi-
dimensional queries use
summarized, high-level data.
Solution is to pre-aggregate
(consolidate) all logical subtotals
and totals along all dimensions.
Representation of Multi-
dimensional Data
Pre-aggregation is valuable, as typical dimensions are hierarchical in nature.
(e.g. Time dimension hierarchy - years, quarters, months, weeks, and days)
Predefined hierarchy allows logical pre- aggregation and, conversely, allows for a logical ‘drill-down’.
Representation of Multi-
dimensional Data
Supports common analytical
operations
Consolidation
Drill-down
Slicing and dicing
Representation of Multi-
dimensional Data
Pivoting
select two dimensions used to aggregate a measure
e.g., the Sales for a Product in a particular City
Product City Date Sales
Juice NY 10 NY CA MN Cola CA 20 Juice 40 x x Juice NY 30 Cola 10 20 x Cola NY 10 Cream x x 10 Cream MN 10
Representation of Multi-
dimensional Data
Rollup
taking a current data and group-by further on another dimension
e.g., rollup the total sales on city by product
Drill-down
Representation of Multi-
dimensional Data
Slice_and_dice
reducing the dimensionality of data
the table (city, date) of a specific product
Representation of Multi-
dimensional Data
Consolidation - aggregation of data
such as simple ‘roll-ups’ or complex
expressions involving inter-related
data.
Drill-Down - is the reverse of
consolidation and involves displaying
the detailed data that comprises the
consolidated data.
Representation of Multi-
dimensional Data
Slicing and Dicing - (also called
pivoting) refers to the ability to
look at the data from different
viewpoints.
Representation of Multi-
dimensional Data
Can store data in a compressed form by dynamically selecting physical storage
organizations and compression techniques that maximize space utilization.
Dense data (that is, data that exists for a high percentage of cells) can be stored separately from sparse data (that is, a
significant percentage of cells are empty).
Representation of Multi-
dimensional Data
Ability to omit empty or repetitive
cells can greatly reduce the size of
the cube and the amount of
processing.
Allows analysis of exceptionally large
amounts of data.
Representation of Multi-
dimensional Data
In summary, pre-aggregation, dimensional hierarchy, and sparse data management
can significantly reduce the size of the
cube and the need to calculate values ‘on- the-fly’.
Removes need for multi-table joins and provides quick and direct access to arrays of data, thus significantly speeding up
execution of multi-dimensional queries.