• No results found

Data Warehouse: Introduction

N/A
N/A
Protected

Academic year: 2021

Share "Data Warehouse: Introduction"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

INTRODUCTION - 1

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG Elena Baralis Politecnico di Torino

Data warehouse

Introduction

Elena Baralis Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

INTRODUCTION - 2

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Decision support systems

• Huge operational databases are available in most companies

Öthese databases may provide a large wealthof useful information

• Decision support systems provide means for

Öin depth analysis of a company’s business

Öfasterand betterdecisions

INTRODUCTION - 3

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Strategic decision support

• Demand evolution analysis and forecast

• Critical business areas identification

• Budgeting and management transparency

– reporting, practices against frauds and

money laundering

• Identification and implementation of winning

strategies

Ö

cost reduction and profit increase

INTRODUCTION - 4

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Business Intelligence

• BI provides support to strategic decision

support in companies

• Objective: transforming company data into

actionable information

– at different detail levels – for analysis applications

• Users may have heterogeneous needs

• BI requires an appropriate hardware and

software infrastructure

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Applications

• Manifacturing companies: order management, client support

• Distribution: user profile, stock management • Financial services: buyer behavior (credit cards) • Insurance: claim analysis, fraud detection • Telecommunication: call analysis, churning, fraud

detection

• Public service: usage analysis

• Health: service analysis and evaluation • ... and many more...

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Example

Bank clients with a loan

x: “bad” clients owing periodic payments to the bank after due date

o: “good” clients respecting periodic paymentdue date

Loan Amount x o x x x x x x x x x o o oo o o o o o o o o o x Income

(2)

INTRODUCTION - 7

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Example

If Income < k€ then “bad” client Loan Amount Income x o x x x x x x x x x o o oo o o o o o o o o o x k INTRODUCTION - 8

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG Elena Baralis Politecnico di Torino

Example

Loan Amount Income x o x x x x x x x x x o o oo o o o o o o o o o x INTRODUCTION - 9

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Data management

• Traditional DBMS usage, characterized by – detailed data, relational representation – snapshot of current data state

– well-known, structured and repetitive operations – read/write access to few records

– short transactions

– isolation, reliability and integrity (ACID) are critical – database size≈100MB-GB

INTRODUCTION - 10

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Data analysis

• Data processing for decision support,

characterized by

– “historical” data

– consolidated and integrated data – ad hoc applications

– read access to millions of record – complex queries

– data consistency before and after periodical loads – database size≈100GB-TB

INTRODUCTION - 11

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Data warehouse

• Database devoted to decision support, which is kept separatefrom company operational databases • Data which is

– integrated

– time dependent, non volatile – devoted to a specific subject

used for decision support in a company W. H. Inmon, Building the data warehouse, 1992

INTRODUCTION - 12

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Why separate data?

• Performance

– complex queries reduce performance of operational transaction management

– different access methods at the physical level • Data management

– missing information (e.g., history) – data consolidation

(3)

INTRODUCTION - 13

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Data warehouse: architecture

(External) data sources Data Analysis DW management Metadata Back-end

tools Analysistools

Data warehouse

Data marts OLAP servers

INTRODUCTION - 14

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Data warehouse and data mart

Company data warehouse:it contains allthe information on the company business

– extensive functional modelling process – design and implementation require a long time

Data mart:departimental information subset focused on a given subject

– two architectures

• dependent, fed by the company data warehouse • independent, fed directly by the sources

– faster implementation

– requires careful design, to avoid subsequent data mart integration problems

INTRODUCTION - 15

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Back-end tools

• Feed the data warehouse

(ETL = Extraction Transformation Loading) – data extraction from data sources

– data cleaning (errors, missing or duplicated data)

– format trasformations and conversions – data loading and periodical refresh

INTRODUCTION - 16

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Multidimensional representation

• Data are represented as an (hyper)cube with three or more dimensions

• Measures on which analysis is performed: cells at dimension intersection

• Data warehouse for tracking sales in a supermarket chain:

– dimensions: product, shop, time – measures: sold quantity, sold amount, ...

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Multidimensional representation

date product shop 2-3-2000 SupShop MilkTTT 3

From Golfarelli, Rizzi,”Data warehouse, teoria e pratica della progettazione”, McGraw Hill 2006

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Data analysis tools

• OLAP analysis: complex aggregate function computation

– support to different types of aggregate functions (e.g., moving average, top ten)

• Data analysis by means of data mining techniques – various analysis types

(4)

INTRODUCTION - 19

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

• Presentation

– separate activity: data returned by a query may be rendered by means of different presentation tools

• Motivation search

– Data exploration by means of progressive, “incremental” refinements (e.g., drill down)

Data analysis tools

INTRODUCTION - 20

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

OLAP analysis

• Slicing and dicing

• Aggregation

date prod uct sh op SupShop date prod uct sh op year=2000 city=‘Turin’' category=‘food products'

From Golfarelli, Rizzi,”Data warehouse, teoria e pratica della progettazione”, McGraw Hill 2006

date product shop category year city INTRODUCTION - 21

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Types of data mining activities

Classification and regression:predictive model generation

– requires a previously labeled data set

Association rules:extraction of data correlations

Clustering:data partioned in “homogeneous” groups – requires the notion of distance between two

elements

INTRODUCTION - 22

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Example: classification

Age Car type Risk category

40 SW low 65 sport high 20 utility high 25 sport high 50 utility low Age < 26 high high low

Car type = sport

Decision tree

INTRODUCTION - 23

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Example: association rules

• Given a collection of counter transactions in a supermarket (receipts)

• Association rules

diapers⇒beer – 2% of transactions contains both elements

– 30% of transactions containing diapers also contains beer

INTRODUCTION - 24

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Servers for Data Warehouses

• ROLAP (Relational OLAP) server – extended relational DBMS

• compact representation for sparse data

– SQL extensions for aggregate computation

– specialized access methods which implement efficient OLAP data access

• MOLAP (Multidimensional OLAP) server

– data represented in proprietary (multidimensional) matrix format

• sparse data require compression

– special OLAP primitives • HOLAP (Hybrid OLAP) server

(5)

INTRODUCTION - 25

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Relational representation:

star model

• (Numerical) measures stored in thefact table

– attribute domain is numeric

Dimensionsdescribe the context of each measure in the fact table

– characterized by many descriptive attributes • Example: Data warehouse for tracking sales in a

supermarket chain

Shop Sale Product

Date

INTRODUCTION - 26

Copyright – All rights reserved

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Data warehouse size

• Time dimension: 2 years x 365 days • Shop dimension: 300 shops

• Product dimension: 30.000 products, of which 3.000 sold every day in every shop

• Number of rows in the fact table: 730 x 300 x 3000 = 657 millions

ÖSize of the fact table ≈21GB

INTRODUCTION - 27

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Meta data

• Different types of meta data:

– for data transformation and loading:

• describe data sources and needed transformation operations

– for data management:

• describe the structure of the data in the data warehouse (also for materialized view) – for query management:

• data on query structure and execution

– SQL code for the query – execution plan – memory and CPU usage

INTRODUCTION - 28

Copyright – All rights reserved

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Elena Baralis Politecnico di Torino

Textbooks

• Data warehousing

Golfarelli, Rizzi, Data warehouse: teoria e pratica della progettazione, McGraw-Hill 2006

Kimbal et al., textbooks on metodology and case studies, Wiley

• Data mining

Han, Kamber, Data mining: concepts and techniques, Morgan Kaufmann 2006

Tan, Steinbach, Kumar, Introduction to data mining, Pearson 2006

Database and data mining group, Politecnico di Torino

DataBase and Data Minin g Gro up of Politecnico di Torino

DBMG

Useful links

• Data warehouse http://www.dwinfocenter.org http://www.dwreview.com http://kimballuniversity.com • Data mining http://www.kdnuggets.com/

References

Related documents

Shot peening was applied to TRIP 780 steel specimens (2 mm thick, 15% of initial residual austenite) by a Wheelabrator turbine machine for different peening conditions using steel

The Research Institute for the Languages of Finland (RILF) 1 will publish − probably in 2010 − in digital and book form a list of standardized Finnish exonyms, or more exactly a

ISSN 1337-0960 (online) SOMATIC CELL COUNT IN MILK OF INDIVIDUAL LACAUNE EWES UNDER PRACTICAL CONDITIONS IN SLOVAKIA: POSSIBLE EFFECT ON MILK YIELD.. AND

Note how these ruins sit directly on top of outcrop 1 (see details in this figure). b) Detail of the lowermost part of outcrop 1 showing pedogenized floodplain mud (VGU C) capped

C.9 Post Implementation Dynamic Power Consumption per entities in Programmable Logic with a clock frequency of 80 MHz and integer 32 PEs.. LIX C.10 Post Implementation Dynamic

The seven layers of the ISO OSI reference model, shown in Figure 3.3 are: application layer, presentation layer, session layer, transport layer, network layer, data link layer

Entonces, yo sencillamente me conozco todos los toques del complejo rítmico de la rumba y estos como forma, como punto de partida, como raíz la uso hasta en formas

Whilst reasonable care has been taken to ensure accuracy of the information presented in the research, LOLC Securities Limited does not give a guarantee on the accuracy of