• No results found

DATA WAREHOUSE DESIGN

N/A
N/A
Protected

Academic year: 2021

Share "DATA WAREHOUSE DESIGN"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

1

DATA WAREHOUSE DESIGN

DATA WAREHOUSE DESIGN

ICDE 2001 Tutorial

ICDE 2001 Tutorial

Stefano Rizzi, Matteo Golfarelli

DEIS - University of Bologna, Italy

Motivation

Motivation

Building a data warehouse for an enterprise is a huge and complex task, which requires an accurate planning aimed at devising satisfactory answers to organizational and architectural questions. Despite the pushing demand for working solutions coming from enterprises and the wide offer of advanced technologies from producers, few attempts towards devising a specific methodology for data warehouse design have been made. On the other hand, the statistic reports related to DW project failures state that a major cause lies in the absence of a global view of the design process: in other terms, in the absence of a design methodology.

Summary

Summary

Ë Introduction to Data Warehousing

Ë Conceptual design of Data Warehouses

Ë Workload-based logical design for ROLAP

(2)

3

Introduction

Introduction

to Data Warehousing

to Data Warehousing

Stefano Rizzi

Ë

Information systems are rooted in the relationship

between information, decision and control.

Ë

An IS should

collect

collect

and

classify

classify

the information, by

means of

integrated

integrated

and

suitable

suitable

procedures, in

order to produce

in time

in time

and at the

right levels

right levels

the

synthesis to be used to support the decisional

process, as well as to administrate and globally

control the enterprise activity.

Information Systems: profile and role

(3)

5 Manufacturing system Information system Information Finished product

Information as a resource

Information as a resource

Ë

Information is an increasing value resource,

required from managers to schedule and monitor

effectively the enterprise activities.

Ë

Information is the first matter which is transformed

by information systems like unfinished products are

transformed by manufacturing systems.

Amount

Value

Strategic directions

Reports

Selected information

Primary information sources

Value of information

Value of information

Ë

Information is an enterprise resource like capital, first

matters, plants and people; thus, it has a cost.

Ë

Hence, understanding the

value

of information is

important.

(4)

7

Different kinds of information systems

Different kinds of information systems

Sales Sales and and marketing marketing

Manufacturing

Manufacturing FinanceFinance AccountingAccounting HumanHuman resources resources Operational Operational Operational Operational managers managers

TPS

Knowledge Knowledge Knowledge Knowledge and and data

data workersworkers

OAS

KWS

Management Management Middle Middle managers managers

MIS

DSS

Strategic Strategic Senior Senior managers managers

ESS

The

The

Data Warehouse

Data Warehouse

phenomenon

phenomenon

Ë

Ë

Usual complaints:

Usual complaints:

We have tons of data but we cannot access

them!

How can people playing the same role

produce substantially different results?

We want to slice and dice data in any

possible way!

Show me only what is important!

Everyone knows some data are incorrect...

(R. Kimball, The Data Warehouse Toolkit)

(5)

9

Data Warehousing

Data Warehousing

Ì

A collection of technologies and tools supporting the

knowledge worker (executive, manager, analyst) in

analysing data aimed at decision making and at

improving the knowledge assets of the enterprise.

Data Warehouse

At the core of the architecture of modern information systems,

it is a data

repository

:

Á

Oriented to subjects

Á

Integrated and consistent

Á

Representing temporal evolution

Á

Non volatile

The data warehouse is regularly refreshed, permanently growing, The data warehouse is regularly refreshed, permanently growing, logically centralised and easily accessed by users, essentially read-only logically centralised and easily accessed by users, essentially read-only

External data Operational data (relational, legacy)

Reporting

Warehouse

Warehouse

Summary Summary data data Access Access

Data Warehouse

Data Warehouse

What-If analysis ETL tools

(6)

11

Data

Data

Marts

Marts

Data

Data

Warehouse

Warehouse

Data mart

Client Client management management Geographical Geographical regions regions Supplier Supplier management management Marketing Marketing Finance Finance

Replication and broadcasting

Subject

Subject

vs

vs

Process

Process

reservations charge Medical reports admissions

Emphasis on applications

patient region consumption

Emphasis on subjects

(7)

13

Integration and consistency

Integration and consistency

DB

DW

External

data

Text files

Schema Integration Extraction Transformation Cleaning Validation Filtering Loading wrappers mediators loaders

Temporal evolution

Temporal evolution

OLTP

DW

Restricted historical content,

Often time is not included in keys,

Data are updated

Rich historical content, Time is included in keys, Snapshots cannot be updated

(8)

15

Non-volatility

Non-volatility

OLTP

insert delete update

DW

load

Huge data volumes: from 20 GBs to some TBs

in a few years

Ë In a DW, no advanced techniques for transaction management are required (differently from OLTP systems)

Ë Key issues are the query throughput and the resilience

acce ss

DW

DW

vs

vs

. OLTP

. OLTP

• 90%

ad hoc

queries

• Mostly read access

• Hundreds users

• Denormalised

• Supports historical

versions

• Optimised for accesses

involving most

database

• Based on summary

data

• 90% predefined

transactions

• Read/write access

• Thousands users

• Normalised

• Does not support historical

versions

• Optimised for accesses

involving a small database

fraction

(9)

17

ROLAP (Relational OLAP)

ROLAP (Relational OLAP)

Ë Intermediate level server between a relational back- end server and the front-end client

Ë Specialised middleware

Ë Generation of SQL multi-statements for the back-end server

Ë Query scheduling

MOLAP (Multidimensional OLAP)

MOLAP (Multidimensional OLAP)

Ë Direct support of multi-dimensional views

Ë Special data structures (e.g., multi-dimensional arrays)

Ë Compression techniques

Ë Intelligent disk/memory caching

Ë Pre-computation

Ë Complex analysis

The technological progress

The technological progress

data data knowledge knowledge 1970 1980 1990 2000 Statistics

Statistics && reporting reporting Data Data Warehousing Warehousing OLAP OLAP Data Data Mining Mining Pattern Pattern Warehousing Warehousing Refinement Source: Information Discovery

(10)

19

The Data

The Data

Warehouse

Warehouse

Market

Market

0 5 0 0 1 0 0 0 1 5 0 0 2 0 0 0 2 5 0 0 3 0 0 0 3 5 0 0 4 0 0 0 4 5 0 0 1 9 9 8 1 9 9 9 2 0 0 0 2 0 0 1 2 0 0 2 RDBMS OLAP 0 5 0 1 0 0 1 5 0 2 0 0 2 5 0 3 0 0 3 5 0 4 0 0 1 9 9 8 1 9 9 9 2 0 0 0 2 0 0 1 2 0 0 2 Data Marts ETL Data Quality Metadata

Source: Shilakes, Tylman -Enterprise Information Portals

The DW life-cycle

The DW life-cycle

Objective definition and

planning Clearly determine the scopes,define the borders, estimate dimensions, choose the approach to design, evaluate the benefits

Infrastructure design Choose the technologies and the tools, analyse the architectural solutions, solve the management problems

Design and implementation

of applications Add iteratively new data marts and applications to the warehouse

(11)

21

Bibliography

Bibliography

Ë R. Barquin, S. Edelstein. Planning and Designing the Data Warehouse. Prentice Hall (1996).

Ë S. Chaudhuri, U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record 26,1 (1997).

Ë G. Colliat. OLAP, relational and multidimensional database systems. SIGMOD Record 25, 3 (1996).

Ë M. Demarest. The politics of data warehousing.

Http://www.hevanet.com/demarest/marc/dwpol.html

Ë U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth. Data mining and knowledge discovery in databases: an overview. Comm. of the ACM 39, 11 (1996).

Ë W.H. Inmon. Building the data warehouse. John Wiley & Sons (1996).

Ë S. Kelly. Data Warehousing in Action. John Wiley & Sons (1997).

Ë R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).

Ë R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The data Warehouse Lifecycle Toolkit. John Wiley & Sons (1998).

Ë C. Shilakes, J. Tylman. Enterprise Information Portals.

Http://www.sagemaker.com/company/downloads/eip/indepth.pdf

Ë P. Vassiliadis. Gulliver inthe land of data warehousing: practical experiences and observations of a researcher. Proc. DMDW’2000 (2000).

Ë J. Widom. Research Problems in Data Warehousing. Proc. CIKM (1995).

Conceptual modelling

Conceptual modelling

for Data Warehousing

for Data Warehousing

Stefano Rizzi

(12)

23

Why a new conceptual model?

Why a new conceptual model?

Ë

While it is universally recognised that a DW leans on a

multidimensional model, there is no agreement on the

approach to conceptual modelling.

Ë

On the other hand, an accurate conceptual design is

the necessary foundation for building a “good”

information system.

Ë

The Entity/Relationship model is widespread in the

enterprises, but….

"Entity relation data models [...] cannot be understood

by users and they cannot be navigated usefully by DBMS

software. Entity relation models cannot be used as the

basis for enterprise data warehouses.”

(Kimball, 96)

Sales

Sales

Store Store

Product

Product TimeTime

The multidimensional data model

The multidimensional data model

Number of Coke

cans sold at BIGSTORES in London on 10/10/99

Number of Pepsi cans sold at all BIGSTORES on 10/10/99

Number of Fanta cans globally sold

(13)

25

Basic

Basic

terminology

terminology

Ë

Fact (cube, target). It is a focus of interest for the

decision-making process; typically, it models an event occurring in the enterprise world (sales, shipments, purchases). It is essential for a fact to have some dynamic aspects, i.e., to evolve somehow across time.

Ë

Measures

(attributes, variables, metrics, properties). They are continuously valued (typically numerical) attributes which describe a fact from different points of view. For instance, each sale is measured by its revenue.

Ë

Dimensions. They are discrete attributes which determine the

minimum granularity adopted to represent facts. Typical dimensions for the sale fact are product, store and date.

Ë

Hierarchies

(dimensions). They contain

dimension

attributes

(levels, parameters) connected in a tree-like structure by many-to-one relationships (functional dependencies).

DW

DW

modelling

modelling

in the

in the

literature

literature

Gyssens, Lakshmanan 97 Agrawal et al. 95 Li, Wang 96 Cabibbo, Torlone 98 Datta, Thomas 97 Vassiliadis 98 Tryfona et al. 99 Hüsemann et al. 00 Sapia et al. 98 Franconi, Sattler 99 Golfarelli et al. 98

(14)

27

LOGICAL

LOGICAL

CONCEPTUAL

CONCEPTUAL

DW

DW

modelling

modelling

in the

in the

literature

literature

Gyssens, Lakshmanan 97 Agrawal et al. 95 Li, Wang 96 Cabibbo, Torlone 98 Datta, Thomas 97 Vassiliadis 98 Tryfona et al. 99 Hüsemann et al. 00 Sapia et al. 98 Franconi, Sattler 99 Golfarelli et al. 98

GRAPHICAL

GRAPHICAL

FORMAL

FORMAL

DW

DW

modelling

modelling

in the

in the

literature

literature

Gyssens, Lakshmanan 97 Agrawal et al. 95 Li, Wang 96 Cabibbo, Torlone 98 Datta, Thomas 97 Vassiliadis 98 Tryfona et al. 99 Hüsemann et al. 00 Sapia et al. 98 Franconi, Sattler 99 Golfarelli et al. 98

(15)

29

ALGEBRA

ALGEBRA

DW

DW

modelling

modelling

in the

in the

literature

literature

Gyssens, Lakshmanan 97 Agrawal et al. 95 Li, Wang 96 Cabibbo, Torlone 98 Datta, Thomas 97 Vassiliadis 98 Tryfona et al. 99 Hüsemann et al. 00 Sapia et al. 98 Franconi, Sattler 99 Golfarelli et al. 98 DESIGN DESIGN

DW

DW

modelling

modelling

in the

in the

literature

literature

Gyssens, Lakshmanan 97 Agrawal et al. 95 Li, Wang 96 Cabibbo, Torlone 98 Datta, Thomas 97 Vassiliadis 98 Tryfona et al. 99 Hüsemann et al. 00 Sapia et al. 98 Franconi, Sattler 99 Golfarelli et al. 98

(16)

31

Conceptual models

Conceptual models

Ë

Sapia, Blaschka, Höfling, Dinter (1998)

dimension level roll-up relationship fact relationship attribute

Conceptual models (2)

Conceptual models (2)

Ë

Franconi, Sattler (1999)

dimension target property level aggregated entity

(17)

33

Conceptual models (3)

Conceptual models (3)

Ë

Hüsemann, Lechtenbörger, Vossen (2000)

dimension

dimension level

measure property attribute

optional property attribute

optional

aggregation path fact

The Dimensional Fact Model

The Dimensional Fact Model

The Dimensional Fact ModelDimensional Fact Model (DFM) is a graphical conceptual model for DWs, aimed to:

Á Effectively support conceptual design;

Á Provide an environment where user queries can be formulated intuitively;

Á Enable communication between the designer and the final user in order to refine requirement specification;

Á Supply a stable platform for logical design;

Á Provide an expressive and non-ambiguous documentation. The DFM is independent of the target logical model

(18)

35

Ë

Three levels of conceptual documentation are provided:

Á Fact scheme: represents a fact of interest and the associated

measures, dimensions and hierarchies.

Á Data Mart scheme: summarizes the fact schemes which constitute each data mart and emphasize the feasible connections between them.

Á Data Warehouse scheme: shows the different data marts emphasizing their overlaps, the different profiles of the users accessing them, and the operational sources which feed them.

The Dimensional Fact Model (2)

The Dimensional Fact Model (2)

Ë

Each documentation level is integrated by

glossaries

which explain the names adopted within the schemes,

define a connection between the DW data and the

operational sources, express data volumes.

Ë

Data mart schemes are associated to the

workload

specification.

hierarchy

Fact schemes

Fact schemes

A fact expresses a many-to-many relationship between its dimensions

state SALE category type quarter month store store city county sales manager year sale district date holidayday of week

marketing group department brand qty sold revenue unit price no. of customers brand city product week dimension attribute measure fact dimension

(19)

37 address non-dimension attribute phone manager diet manager promotion price reduction cost end date begin date ad type optionality state SALE category type quarter month store store city county sales manager year sale district date holidayday of week

marketing group department brand qty sold revenue unit price no. of customers brand city product week

Fact schemes (2)

Fact schemes (2)

Á A non-dimension attribute contains additional information about a dimension attribute, and is typically connected to it by a one-to-one relationship.

It cannot be used for aggregation.

Á Some links between attributes can be optional.

Fact schemes (3)

Fact schemes (3)

Á Convergence Á Cross-dimension attributes Á Additivity, non-additivity, non-aggregability Á Overlap begin date end date store state diet marketing group brand city store county store city SALE product qty sold revenue unit price no. of customers category type department brand store promotion ad type price reduction fiscal week fiscal quarter fiscal month fiscal year date week day of week quarter month year manager sale district phone address V.A.T. non-aggregability cross-dimension attribute convergence

(20)

39

The SHIPMENTS fact scheme

The SHIPMENTS fact scheme

marketing group brand city store state store city warehouse state warehouse city SHIPMENT TO STORES product qty shipped shipping cost category type warehouse department brand store mode type carrier fiscal week fiscal quarter fiscal month fiscal year date week day of week quarter month year

FACT SCHEME: SHIPMENT TO STORES

The INVENTORY fact scheme

The INVENTORY fact scheme

marketing group brand city warehouse nation warehouse city INVENTORY product level category type warehouse department brand fiscal week fiscal quarter fiscal month fiscal year date week day of week quarter month year

FACT SCHEME: INVENTORY

units per pallet package type package size weight

AVG, MIN

(21)

41

The

The

supply chain

supply chain

date product factory MANUFACTURING date component factory COMPONENT INVENTORY date product factory package type PACKAGING date product warehouse WAREHOUSE INVENTORY PRODUCTION OF COMPONENTS date component factory date component to factory COMPONENT DELIVERY from factory date product factory warehouse SHIPMENT TO WAREHOUSE mode date product store SALES promotion date product store warehouse SHIPMENT TO STORES mode

Glossaries

Glossaries

name description domain card. query product products 5000 brand brands 800 brand city Where brands are manufactured cities 50 type (pasta, soft drink, …) pr. types 200 category (food, clothing, music,…) pr. categories 10 department Deps. managing categories deps. 5 marketing group Responsible for product types groups 20

select prodName,brandName, cityName,…

from PRODUCTS P,BRANDS B, CITIES C,…

where P.brandId = B.brandId

and B.cityId = C.cityId and . . . . . .

stores stores 100 store city cities 80 store state states 5

select storeName,cityName, stateName from STORES S,CITIES C

where S.cityId = C.cityId

... ... ... ... . . . .

ATTRIBUTE GLOSSARY: SHIPMENT TO STORES

name description type query qty shipped Quantity of each product being

shipped

INTEGER select SUM(PS.qty)

from PRODUCTS P,SHIP S,PRODSHIP PS,…

where P.prodId = PS.prodId and PS.shipId = S.shipId and . . . . group by P.prodId,S.date, . . .

shipping cost Cost of the shipment MONEY . . . . MEASURE GLOSSARY: SHIPMENT TO STORES (sparsity = 0.01)

(22)

43

Data mart schemes

Data mart schemes

Ë

The data mart scheme is used to summarize the fact

schemes which constitute the data mart and to show

drill-across connections between them.

Ë

It is a graph whose nodes are elemental and

overlapped fact schemes; the arcs are directed to

each overlapped scheme from its component

schemes, which in turn may be overlapped.

MANUFACTURING COMPONENT INVENTORY PACKAGING WAREHOUSE INVENTORY PRODUCTION OF COMPONENTS COMPONENT DELIVERY SHIPMENT TO WAREHOUSE SALE SHIPMENT TO STORES

DATA MART SCHEME: SUPPLY CHAIN

PRODUCTION AND DELIVERY DELIVERY AND INVENTORY MANUFACTURING AND PACKAGING SHIPMENT AND SALE DISTRIBUTION CYCLE PRODUCT CYCLE

The workload

The workload

Ë

In principle, the workload for a data mart is dynamic

and unpredictable.

Ë

In some commercial tools, the actual workload is

monitored while the DW is operating and the logical

and physical schemes are dynamically tuned.

Ë

We claim that a core workload can, and should, be

determined a priori:

Á The user typically knows in advance which kind of data analysis (s)he will carry out more often for decisional or statistical purposes;

Á A substantial amount of queries are aimed at extracting summary data to fill standard reports.

(23)

45

The workload (2)

The workload (2)

marketing group brand city store state store city warehouse state warehouse city SHIPMENT TO STORES product qty shipped shipping cost category type warehouse department brand store mode type carrier fiscal week fiscal quarter fiscal month fiscal year date week day of week quarter month year

FACT SCHEME: SHIPMENT TO STORES

Data warehouse schemes

Data warehouse schemes

Ë

At the highest abstraction level, the data warehouse

scheme shows the different data marts emphasizing

the fact schemes duplicated on two or more of them,

the different profiles of the users accessing them, and

the operational sources which feed them.

DEMAND CHAIN SUPPLY CHAIN SALES RENOVATION personnel manager administrative manager sale buyer incentives personnel database purchases restoration works PERSONNEL SALES product orders data mart user fact scheme operational db file transfer manual input

(24)

47

Stefano Rizzi

Conceptual design

Conceptual design

of Data Warehouses

of Data Warehouses

Designing the DW

Designing the DW

²

Within a successful approach to DW design, top-down

and bottom-up strategies should be mixed.

Á When planning a DW, a bottom-up approach should be followed.

Á One data mart at a time is identified and prototyped.

Á Each data mart is designed in a top-down fashion by building a conceptual scheme for each fact of interest.

(25)

49

Data Mart prototyping

Data Mart prototyping

Prototype first the data mart which:

Ë plays the most strategic role for the enterprise;

Ë can convince the final users of the potential benefits;

Ë leans on available and consistent data sources.

DM1 Source 1 DM2 DM3 Source 2 DM4 DM5 Source 3

Reference architecture

Reference architecture

Reconciled data heterogeneous operational dbs DW

Problem of designing

the reconciled data

(

integration of

(26)

51

Methodological framework

Methodological framework

analysis of the

operational db

requirement

specification

conceptual

design

workload

refinement

logical

design

physical

design

final user

designer

db administrator

DWs are based on a pre-existing

information system

Methodological framework

Methodological framework

(2)

(2)

Logical Logical Scheme Scheme LOGICAL DESIGN Workload Target logical model Physical Physical Scheme Scheme PHYSICAL DESIGN Workload Target DBMS E/R E/R Scheme Scheme

chiave negozio negozio cittàregione indirizzo resp. vendite

N1 ….….….…………… N2

chiave tempochiave negozio chiave_prodottoquant venduta incasso num_clienti T1 N1 P1 101000000 2 T1 N1 P2 8 1200000 8 T1 N2 P5 151500000 5 … ….. …… ……. Relational Relational Scheme Scheme Conceptual Conceptual Scheme Scheme CONCEPTUAL DESIGN Facts Preliminary workload

(27)

53

Conceptual design of the data mart

Conceptual design of the data mart

Ë

Design is based on the documentation of the

underlying operational information system:

Á E/R schemes

Á Relational schemes

Golfarelli, Maio, Rizzi 98; Cabibbo, Torlone 98;

Moody, Kortink 00; Hüsemann, Lechtenbörger, Vossen 00

Ë

Steps:

Á Find facts

Á For each fact:

• Navigate functional dependencies • Drop useless attributes

• Define dimensions and measures

Finding facts

Finding facts

Á Within an E/R scheme, a fact is represented by either an entity F or an n-ary relationship between entities E1...En

Á Within a relational scheme, a fact is represented by a relation F.

The entities and relationships representing frequently

updated archives are good candidates to define facts;

those representing nearly-static archives are not.

(28)

55

Navigating functional dependencies

Navigating functional dependencies

Á Build a tree in which each vertex corresponds to an attribute of the scheme;

Á The root corresponds to the identifier (key) of F;

Á For each vertex v, the corresponding attribute

functionally determines all the attributes corresponding to the descendants of v.

Example (from the E/R scheme):

Example (from the E/R scheme):

TYPE PRODUCT CATEGORY STORE CITY (1,1) (0,N) (1,1) (1,N) (1,1) (0,N) (1,1) (1,N) (0,N) date qty unit price PURCHASE TICKET (1,N) type category product sales manager

ticket number store city of sale of in in weight diet (0,1) address MARKETING GROUP (1,1) (1,N) marketing group for manager DEPARTM. (1,1) (1,N) department for manager phone COUNTY (1,1) (1,N) county of STATE (1,1) (1,N) state of BRAND brand (1,1) (1,N) (1,1) (1,N) of WAREHOUSE (1,N) (1,N) from warehouse SALE DISTRICT district no. (1,1) in(1,N) (1,1) (1,N) of address produced in size

(29)

57 district no unit price qty ticket number date store sales manager city state product brand type category address diet city weight dept. manager mark. grp. manager phone county district no+state size city state county sale

Example (from the E/R scheme):

Example (from the E/R scheme):

Dropping useless attributes

Dropping useless attributes

Ë

Some attributes in the tree may be uninteresting for

the DW. In order to drop useless levels of detail, it is

possible to apply the following operators:

Á

Á PruningPruning: delete a vertex and its subtree.

ticket number date store sales manager city state address ticket number date store sales manager address address date store sales manager Á

Á GraftingGrafting: delete a vertex and move its subtree. It is useful when an attribute is not interesting but the attributes it determines must be preserved.

(30)

59

Defining dimensions

Defining dimensions

Ë

The choice of dimensions determines the fact

granularity

granularity

.

Ë

Dimensions must be chosen among the root children

in the attribute tree.

Ë

Time should always be a dimension.

unit price qty date store sales manager city state product brand type category address diet city weight dept. manager mark. grp. manager phone district no+state city county sale

Defining measures

Defining measures

Ë Measures must be chosen among the children of the root.

Ë Typically, measures are computed either by counting the number of instances of F, or by summing (averaging, …) expressions which involve numerical attributes.

Ë An attribute cannot be both a measure and a dimension.

Ë A fact may have no measures.

unit price qty date store sales manager city state product brand type category address diet city weight dept. manager mark. grp. manager phone district no+state city county sale

(31)

61

Granularity

Granularity

Ë

Defining the

granularity

of data is a primary issue in

determining performance. Granularity depends on the

queries users are interested in, and represents a

trade-off between query response time and detail of

information to be stored.

Á It may be worth adopting a finer granularity than that required by users, provided that this does not slow down the system too much.

Á Constrained by the maximum time frame for loading.

Ë

Choosing granularity includes defining the refresh

interval.

Á Issues to be considered:

• Availability of operational data • Workload characteristics

• The total time period to be analysed

W

WAN

AN

D

D

a CASE

a CASE

tool for

tool for

data

data

warehouse

warehouse

design

design

Ë

A design methodology is almost useless, if no CASE tool to

support it is provided.

Á Acquire the relational db scheme via ODBC

Á Carry out conceptual design

Á Define the workload

Á Calculate data volume

Á Carry out logical design

(32)

63

Bibliography

Bibliography

(1)

(1)

Ë K. Aberer, K. Hemm. A methodology for building a data warehouse in a scientific environment. Proc. 1st Int. Conf. on Cooperative Inf. Systems, Brussels (1996).

Ë R. Agrawal, A. Gupta, S. Sarawagi Modeling multidimensional

databases. IBM Research Report, IBM Almaden Research Center (1995).

Ë M. Blaschka et al. Finding your way through multidimensional data models. Proc. DEXA’98 (1998).

Ë L. Cabibb o , R. Torlone. A logical approach to multidimensional databases. EDBT 98 (1998).

Ë A. Datta, H. Thomas. A conceptual model and algebra for on-line analytical processing in data warehouses. Proc. WITS’97 (1997).

Ë E. Franconi, U. Sattler. A data warehouse conceptual model for

multidimensional aggregation. Proc. DMDW’99 (1999).

Ë M. Golfarelli , D. Maio, S. Rizzi The Dimensional Fact Model: a conceptual model for data warehouses. Int. Jour. of Cooperative Inf. Systems 7, 2&3 (1998).

Ë M. Golfarelli, S. Rizzi. Designing the data warehouse: key steps and crucial issues. Jour. of Computer Science and Information Management 2, 3 (1999).

Bibliography

Bibliography

(2)

(2)

Ë M. Gyssens, L.V.S. Lakshmanan. A foundation for multi-dimensional databases. Proc. 23rd VLDB, Athens, Greece (1997).

Ë B. Hüsemann , J. Lechtenbörger, G. Vossen. Conceptual data

warehouse design. Proc. DMDW’00 (2000).

Ë R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996). Ë D. Moody, M. Kortink. From enterprise models to dimensional models:

a methodology for data warehouse and data mart design. Proc. DMDW’00 (2000).

Ë T. Bach Pedersen, C. Jensen. Multidimensional data modelling for complex data. Proc. 15th ICDE, Sydney (1999).

Ë C. Sapia et al. Extending the E/R model for the multidimensional paradigm. Proc. ER’98 (1998).

Ë N. Tryfona, F. Busborg, J. Christiansen. starER: A Conceptual Model for Data Warehouse Design. Proc. DOLAP’99 (1999).

Ë P. Vassiliadis. Modeling multidimensional databases, cubes and cube operations. Proc. 10th SSDBM Conf., Capri, Italy (1998).

References

Related documents

Pedestrians usually prefer to choose a quick way to get to their destination rather than choosing safety way. These factors will bring up several problems and dangerous situation

A user granted access to individual disaster assistance information cannot access data from a public assistance data mart unless the public assistance source

Phonetic analysis of unsupervised state-based interpolation between Regional Standard Austrian German (RSAG) and Bad Goisern dialect (GOI)..

Often, Spain sought to close these loopholes after each new treaty with Britain, However, constant shifting in Spanish governance and diplomacy often left an unclear foreign

The above Proposition points out that the probability of facing a tough authority plays a key role in setting the ranking between the covenant equilibrium and the late monopoly

TIEC Entrepreneurs SMEs MNCs Universities incubators VCs Angel Networks policies Education Regulations Infrastructure.. TIEC's main role and objective is to act as a catalyst and a

Move your right foot about one step forward, bend your legs and raise your hands up to the chest area, right hand in front of left.. This posture is performed facing to

Historically, the nation-state building processes were an important phase in the period of transition from the governmental traditions in Europe and the establishment of a