OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

(1)

OLAP

Business Intelligence

OLAP definition & application

Multidimensional data representation

(2)

Business Intelligence

Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.

There are two main types of access tools available to meet this demand, namely Online Analytical Processing

(3)

Business Intelligence

OLAP and Data Mining differ in what they offer the user and because of this they are complementary technologies.

An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and /or data mining are collectively referred to as Business

Intelligence (BI) technologies.

BI = DW + OLAP BI = DW + DM

(4)

OLAP definition & application

The dynamic synthesis, analysis, and

consolidation of large volumes of multi- dimensional data, Codd (1993).

Describes a technology that uses a multi-dimensional view of aggregate

data to provide quick access to strategic information for the purposes of

(5)

OLAP definition & application

Can easily answer ‘who?’ and ‘what?’

questions, however, ability to answer

‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-

purpose query tools.

Types of analysis ranges from basic navigation and browsing (slicing and

dicing) to calculations, to more complex analyses such as time series and

complex modeling.

(6)

OLAP definition & application

Examples of OLAP applications

(7)

OLAP definition & application

Although OLAP applications are

found in widely divergent

functional areas, they all have the

following key features:

multi-dimensional views of data

support for complex calculations

time intelligence

(8)

OLAP - multi-dimensional

views of data

Core requirement of building a ‘realistic’

business model.

Provides basis for analytical processing through flexible access to corporate

data.

The underlying database design that

provides the multi-dimensional view of data should treat all dimensions equally.

(9)

OLAP - support for complex

calculations

Must provide a range of powerful

computational methods such as that required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.

Mechanisms for implementing

computational methods should be clear and non-procedural.

(10)

OLAP – time intelligence

Key feature of almost any analytical application as performance is almost always judged over time.

Time hierarchy is not always used in the same manner as other hierarchies.

Concepts such as year-to-date and

period-over-period comparisons should be easily defined.

(11)

Representation of Multi-

dimensional Data

Example of two-dimensional query.

What is the total revenue generated by

property sales in each city, in each quarter of 2004?’

Choice of representation is based on

types of queries end-user may ask.

Compare representation - three-field

relational table versus two-

dimensional matrix.

(12)

Multi-dimensional Data as

Three-field table versus Two-

dimensional Matrix

(13)

Representation of Multi-

dimensional Data

Example of three-dimensional

query.

‘What is the total revenue generated by property sales for each type of property (Flat or House) in each city, in each

quarter of 2004?’

Compare representation - four-

field relational table versus

three-dimensional cube.

(14)

Multi-dimensional Data as

Four-field Table versus Three-

dimensional Cube

(15)

Representation of Multi-

dimensional Data

A sample data cube

(16)

Representation of Multi-

dimensional Data

Cube represents data as cells in an array.

Relational table only represents multi- dimensional data in two dimensions.

Multi-dimensional structures are best visualized as cubes of data, and cubes

within cubes of data. Each side of a cube is a dimension.

A cube can be expanded to include other dimensions.

(17)

Representation of Multi-

dimensional Data

As number of dimensions increases,

number of the cube’s cells

increases exponentially.

However, majority of multi-

dimensional queries use

summarized, high-level data.

Solution is to pre-aggregate

(consolidate) all logical subtotals

and totals along all dimensions.

(18)

Representation of Multi-

dimensional Data

Pre-aggregation is valuable, as typical dimensions are hierarchical in nature.

(e.g. Time dimension hierarchy - years, quarters, months, weeks, and days)

Predefined hierarchy allows logical pre- aggregation and, conversely, allows for a logical ‘drill-down’.

(19)

Representation of Multi-

dimensional Data

Supports common analytical

operations

Consolidation

Drill-down

Slicing and dicing

(20)

Representation of Multi-

dimensional Data

Pivoting

select two dimensions used to aggregate a measure

e.g., the Sales for a Product in a particular City

Product City Date Sales

Juice NY 10 NY CA MN Cola CA 20 Juice 40 x x Juice NY 30 Cola 10 20 x Cola NY 10 Cream x x 10 Cream MN 10

(21)

Representation of Multi-

dimensional Data

Rollup

taking a current data and group-by further on another dimension

e.g., rollup the total sales on city by product

Drill-down

(22)

(23)

(24)

Representation of Multi-

dimensional Data

Slice_and_dice

reducing the dimensionality of data

the table (city, date) of a specific product

(25)

(26)

(27)

Representation of Multi-

dimensional Data

Consolidation - aggregation of data

such as simple ‘roll-ups’ or complex

expressions involving inter-related

data.

Drill-Down - is the reverse of

consolidation and involves displaying

the detailed data that comprises the

consolidated data.

(28)

Representation of Multi-

dimensional Data

Slicing and Dicing - (also called

pivoting) refers to the ability to

look at the data from different

viewpoints.

(29)

Representation of Multi-

dimensional Data

Can store data in a compressed form by dynamically selecting physical storage

organizations and compression techniques that maximize space utilization.

Dense data (that is, data that exists for a high percentage of cells) can be stored separately from sparse data (that is, a

significant percentage of cells are empty).

(30)

Representation of Multi-

dimensional Data

Ability to omit empty or repetitive

cells can greatly reduce the size of

the cube and the amount of

processing.

Allows analysis of exceptionally large

amounts of data.

(31)

Representation of Multi-

dimensional Data

In summary, pre-aggregation, dimensional hierarchy, and sparse data management

can significantly reduce the size of the

cube and the need to calculate values ‘on- the-fly’.

Removes need for multi-table joins and provides quick and direct access to arrays of data, thus significantly speeding up

execution of multi-dimensional queries.