• No results found

Data Warehousing OLAP

N/A
N/A
Protected

Academic year: 2021

Share "Data Warehousing OLAP"

Copied!
50
0
0

Loading.... (view fulltext now)

Full text

(1)

Data Warehousing

OLAP

(2)

References

2

Wei Wang. A Brief MDX Tutorial Using Mondrian. School of

Computer Science & Engineering, University of New South Wales.

Toon Calders. Querying OLAP Cubes.

Wolf-Tilo Balke, Silviu Homoceanu. Data Warehousing & Data Mining, part 6: OLAP Operations & Queries.

(3)

Outline

3

OLAP Operations

Roll-up, Drill-down, Slice and dice, Pivot

OLAP Data Visualization

From OLAP Operations to the Data

MDX

Motivation

Basic Concepts

Advanced Operations

FILTER, ORDER, HEAD, TOPCOUNT, CROSSJOIN, NONEMPTY, WITH MEMBER

(4)

Data Warehouse Queries

4

DW queries are big queries

Imply a large portion of the data

Mostly read queries

Redundancy a necessity

Materialized views, special-purpose indexes, de-normalized schemas

Data is refreshed periodically

Daily or weekly

Their purpose is to analyze data

OLAP (OnLine Analytical Processing)

OLAP usage fields

Management Information

Sales per product group / area / year

Government

Population census

Scientific databases

Geo-, Bio-Informatics

Goal: Response time of seconds / few minutes

(5)

OLAP Operations

5

Roll-up

Drill-down

Slice and dice

Pivot (rotate)

(6)

Roll-up

6

Taking the current aggregation level of fact values and doing a further aggregation

Summarize data by

Climbing up hierarchy (hierarchical roll-up)

By dimensional reduction

Used for obtaining an increased generalization

E.g., from Time.Week to Time.Year

(7)

Hierarchical roll-ups

7

(8)

Dimensional roll-ups

8

(9)

Drill-down

9

(10)

Roll-up Drill-down Example

10

(11)

Slice

11

(12)

Slice

12

(13)

Dice

13

(14)

Dice

14

(15)

Pivot

15

(16)

Pivot

16

(17)

Pivot

17

(18)

Typical Analytical Requests

18

OLAP operations are hard to express in query languages

Most analysts and decision makers won’t enjoy it

SELECT f.region, z.month, sum(a.price * a.volume) FROM Order a, Time z, PoS f

WHERE a.pos = f.name AND a.date = z.date GROUP BY f.region, z.month

OLAP clients allow operations to be performed through GUIs

(19)

OLAP Data Visualization

How do these operations look like for the user?

2 dimensions … is trivial

E.g. Products by Store

19

(20)

20

3 dimensions: we can

visualize sold quantity on 3 dimensions as layers

Another way is by nesting on the same axis

OLAP Data Visualization

(21)

OLAP Data Visualization

21

OLAP reporting has to be very flexible

The IBM Infosphere - OLAP web based report

(22)

OLAP Data Visualization

Drill-down operation

Can be performed easy by going down on the hierarchy and choosing the granularity

22

(23)

OLAP Data Visualization

23

Trends Visualization

With the help of charts

(24)

From Presentation to Data

24

Client/server architecture

The client displays reports and allows interaction with the end user to perform the OLAP operations and other custom queries

The server is responsible for providing the requested data. How? It depends on whether it is MOLAP, ROLAP, HOLAP, etc.

(25)

OLAP Server

25

High-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data

structures

Optimized for fast, flexible calculation and transformation of raw data based on formulaic relationships

Either

Physically stage the processed multidimensional information to deliver consistent and rapid response times to end users (MOLAP)

Store data in relational databases and simulate multidimensionality with special schemas (ROLAP)

Or offer a choice of both (HOLAP)

(26)

Getting from OLAP Operations to the Data

26

(27)

Typical OLAP Queries

27

The idea is to

Select by Attributes of Dimensions

region = „Europe“

Group by Attributes of Dimensions

region, month, quarter

Aggregate on measures

sum(price * volume)

OLAP queries in SQL

SELECT d1.x, d2.y, d3.z, sum(f.t1), avg(f.t2) FROM Fact f, Dim1 d1, Dim2 d2, Dim3 d3 WHERE a < d1.field < b AND d2.field = c GROUP BY d1.x, d2.y, d3.z;

(28)

MDX - MultiDimensional eXpressions

28

Developed by Microsoft

Not really brilliant

But adopted by major OLAP providers due to Microsoft's market leader position

Used in specifications and an industry standards for multi- dimensional data processing

OLE DB for OLAP (ODBO) with API support

XML for Analysis (XMLA): specification of web services for OLAP

Supported by many data warehousing systems

MS SQL Server

SAS OLAP Server

drivers for MDX for Oracle OLAP

For ROLAP to support MDX, it is usually translated into SQL

(29)

Motivation

29

(30)

Pivot and Unpivot

30

(31)

MDX Syntax

31

Similar to SQL syntax

SELECT {Germany, Niedersachsen, Bayern, Frankfurt} ON COLUMNS, {Qtr1.CHILDREN, Qtr2, Qtr3} ON ROWS

FROM SalesCube

WHERE (Measures.Sales, Time.[2011], Products.[All Products]);

SELECT

Dimensions, on columns and rows

FROM

Data source cube specification

If joined, data cubes must share dimensions

WHERE

Slicer - restricts the data area

Specifies the measures to return

(32)

MDX Syntax

32

(33)

Example

33

(34)

Example: Cross Tabulation

34

(35)

Basic Concepts

35

(36)

Tuples

36

Tuple is a combination of members from one or more dimensions

When a tuple has more than one dimension, it has only one member from each dimension

([Customer].[Chicago, IL], [Time].[Jan, 2005], [Time].[Feb, 2005]) – not valid

Any dimensions can be part of a tuple, including measures

() is not a valid tuple

Tuple can not be composed from other tuples

([Time].[2004], ([Customer].[Chicago, IL], [Product].[Tools])) – not valid

In calculations and queries, cell identification is based on tuples

when a tuple is used in an expression where a number or string might be used, the default behavior is to reference the value in the cell that the tuple specifies

([Product].[Leather Jackets], [Time].[June-2005], [Store].[Fifth Avenue NYC], [Measures].[Dollar Sales]) may define a cell with a value of $13,000

(37)

Axis Specification

37

(38)

Axis Specification

38

(39)

Sets of Members

39

(40)

FoodMart

40

(41)

CHILDREN and MEMBERS

41

(42)

Slicer Dimension vs. Filter

42

(43)

ORDER (I)

43

(44)

ORDER (II) – Example of result

44

SELECT { [Measures].[Dollar Sales] } on columns, Order ( [Product].[Product Category].Members,

[Measures].[Dollar Sales], BDESC)) on rows FROM [Sales]

WHERE [Time].[2004]

(45)

HEAD

45

(46)

TOPCOUNT, TOPPERCENT, TOPSUM

46

10 products that have the highest Internet Sales Amount

SELECT {[Measures].[Internet Sales Amount]} ON COLUMNS,

TOPCOUNT([Product].[Product].Members ,10 , [Measures].[Internet Sales Amount]) ON ROWS FROM [Adventure Works]

Top 10 percent of Products with the highest Sales Amount

SELECT {[Measures].[Sales Amount]} ON COLUMNS,

TOPPERCENT([Product].[Product].[Product].Members, 10,[Measures].[Sales Amount]) ON ROWS FROM [Adventure Works]

A set of Products with the highest values whose cumulative Sales Amount total is greater than or equal to 10,000,000

SELECT {[Measures].[Sales Amount]} ON COLUMNS,

TOPSUM([Product].[Product].[Product].Members, 10000000, [Measures].[Sales Amount]) ON ROWS

FROM [Adventure Works]

(47)

CROSSJOIN

47

SELECT

CROSSJOIN (

{ [Time].[Q1, 2005], [Time].[Q2, 2005]},

{ [Measures].[Dollar Sales], [Measures].[Unit Sales] } ) ON COLUMNS ,

{ [Product].[Tools], [Product].[Toys] } ON ROWS FROM Sales

(48)

NON EMPTY (I)

48

SELECT [Time].[1997].CHILDREN ON COLUMNS,

CROSSJOIN([Store].[Store State].MEMBERS, [Product].[Product Family].MEMBERS) ON ROWS

FROM [Sales]

WHERE (Measures.[Profit])

Problem with the query: many members in the ROW axis is empty, hence many empty rows

It needs a simple filtering - removing empty members from the axis

Solution: NON EMPTY (CROSSJOIN(…) )

(49)

NON EMPTY (II)

49

SELECT

{ [Time].[Jan,2005], [Time].[Feb,2005] } ON COLUMNS , NON EMPTY

{ [Product].[Toys], [Product].[Toys].Children } ON ROWS FROM Sales

WHERE ([Measures].[Dollar Sales], [Customer].[TX])

(50)

[Advanced MDX].[Calculate Member]

50

References

Related documents

Whether grown as freestanding trees or wall- trained fans, established figs should be lightly pruned twice a year: once in spring to thin out old or damaged wood and to maintain

by mass spectrometry, by the shape of their spectral peak in in more detail in the Example Section. time-of-?ight mass spectrometry and by their binding char- 25 The biomarkers

„ A physically separate store of data transformed from the operational environment.. „ Operational update of data does not occur in the data warehouse

Keywords: Data Mining, Data Warehouse, Decision Support System(DSS), Desktop OLAP (DOLAP), Hybrid OLAP (HOLAP), Multidimensional Data, Multidimensional OLAP (MOLAP),

  Involves the creation of large, consolidated data repositories called data warehouses..   Warehouses exploited using sophisticated

California USA North West Oregon Washington South West. California USA North West Oregon Washington

3 Thermostat Operation Thermostat Operation 2 LCD Display Program Buttons System Button Temperature Setpoint Buttons.. See page 6 for details about this

The specificity of the approach that I suggest in this essay lies in the fact that the starting point of our analysis is neither the history of feminism nor the development of