• No results found

Technology. Data Warehouse Design. Part 06. Markku Suni

N/A
N/A
Protected

Academic year: 2021

Share "Technology. Data Warehouse Design. Part 06. Markku Suni"

Copied!
40
0
0

Loading.... (view fulltext now)

Full text

(1)

Data

Warehouse Design

Part 06

Technology

Data Warehouse Design

Part 06

(2)

Data

Warehouse Design

Part 06

(3)

Data

Warehouse Design

Part 06

Technology

Data Warehouse Design

Goal:

To introduce the students to the idea of data warehouse

To show how it can be designed To give an example

To point some of the difficulties and challenges involved

(4)

Data

Warehouse Design

Part 06

Technology

Data Warehouse Design – Part 06

Contents:

Metadata

(5)

Data

Warehouse Part 06

Technology

Metadata

What you wanted to know about metadata

but never dared to ask

What do we know:

We know Metadata

Rhymes with Petadata

(6)

Data Warehouse Part 06 Technology

Metadata

Meta means

”on higher level”

”definitions of ”

”description of ”

(7)

Data Warehouse Part 06 Technology

Metadata

A database system has metadata in system tables.

(8)

Data

Warehouse Part 06

Technology

Metadata

E.F. Codd’s 12 rules . . .

Rule 1: The Information Rule

All data should be presented to the user in table form.

Rule 4: Dynamic On-Line Catalog Based on the Relational Model.

A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by

storing the structure definition within special system tables.

(9)

Data

Warehouse Part 06

(10)

Data

Warehouse Part 06

Metadata

The term metadata is a bit vague. It usually includes

• Description about the data in the DW

• type, format, name, description of each item It could include a lot more

(11)

Data

Warehouse Part 06

Metadata - could include

Information about source systems and the meaning of data from them

• Business rules to be applied when reading, transforming, joining data and calculating derived values

• Business rules used in reporting and analysis

• Quality assessment: how complete is any source, how well validated, how reliable

• Who is the owner of data

(12)

Data

Warehouse Part 06

Metadata

Metadata can and should be used in all the application of the DW

• The applications should be metadata-driven

• The amount of metadata can be very large.

(13)

Data

Warehouse Part 06

Technology

There’s always a time for discussion

A time to be sowing

A time to be growing

A time to be courting

A girl of your own

Twas so good to be young then

In the wake of the earth

And to stand by your wife

At the moment of birth

(14)

Data

Warehouse Part 06

Technology

(15)

Data

Warehouse Part 06

Technology

So. Let us build a Data Warehouse

- And survive

(16)

Data

Warehouse Part 06

Technology

So. Let us build a Data Warehouse

(17)

Data

Warehouse Part 06

Technology

So. Let us build a Data Warehouse

The concept is over 10 years old. It is widely done.

(18)

Data

Warehouse Part 06

Technology

So. Let us build a Data Warehouse

Before we begin the project

Make sure the normal prequisities are there •The support from the top

•Clearly defined responsibilities •Genuine business need

•And so on But:

(19)

Data

Warehouse Part 06

Technology

Political problems

Is the project a Data Warehouse project?

It must be called that or it must not be called that Why “must not”:

• There is already a “Data Warehouse” • The previous DW project failed

• Similar projects have failed elsewhere

(20)

Data Warehouse Part 06 Technology

Political problems

It could be called

• Analytic and Reporting Data Store • Detailed Extract Database

• Integrated Detail Layer

• Common History Datastore • Information Database

• Reporting Database • and so on . . .

(21)

Data

Warehouse Part 06

Technology

For whom should it be

• A Data Warehouse is for rather general purpose. • Not for one specific use for a small group.

(22)

Data

Warehouse Part 06

Technology

Business issues

At the beginning it might be a good idea to do something more at the same time (or before)

• Process redesign of information delivery • Any isolated islands

• Improved management decision systems • Customer focus – pay attention to this • Any new applications around / needed ? • Risk management

• Any new management visions

(23)

Data

Warehouse Part 06

We need it now

• A large, expensive, and long project may not be popular

• Deliver results (and money) quickly

• Does this spell “A quick little data mart”

• With a forethought the project can solve the

immediate need and prepare for the future at the same time

•Try to avoid building multiple independent data marts • Better be consistent

• Build in stages, based on an overall vision and plan for the longer term

(24)

Data

Warehouse Part 06

We need it now

• It is a good idea to begin with a limited selection of the data

A1 A2 A3 A4 A5 A6 A7

(25)

Data

Warehouse Part 06

We need it now

• It is a good idea to begin with a limited selection of the data

A1 A2 A3 A4 A5 A6 A7

(26)

We need it now

• It is a good idea to begin with a limited selection of the data

May be difficult to limit

Data

Warehouse Part 06

Technology

(27)

We need it now

• It is a good idea to begin with a limited selection of the data

May be difficult to limit

Data

Warehouse Part 06

Technology

(28)

Data

Warehouse Part 06

What information will there be?

• Good to already have conceptual and logical data models

• these can be used to understand the operational data sources

• Starting from scratch one has to construct a logical data model of the existing situation

• Gather subjects, entities and attributes

(29)

Data

Warehouse Part 06

About attributes

Give each attribute a unique name

• Define the attributes

• type, domain, field length, . . .

• Identify the data source for this attribute

• Define the transformation needed for this attribute • Define the checking needed for this attribute

• Understand the timing aspects of this attribute • update cycle, time constraints, retrospection • Document that you have found and done

• This will form the metadata later on

(30)

Data

Warehouse Part 06

Being consistent

Try to design and define standards applicable for the whole data warehouse environment

• Naming the items

• E.G. Hungarian notation

• First characters describe the use or the contents • ID – some sort of Id or key: IDCUSTOMER

• NA – name sort of: NACUSTOMER

• EU – monetary attribute: EUCUSTOMER

• Checking routines • Input routines

(31)

Data

Warehouse Part 06

How big is too big?

How much can the project handle at one time • Some alarming points might be

• More than 15 source data tables

• More than 2 source application systems • Multiple complex transformations

• Many additive measures • more than 3 or 4 facts Try to concentrate.

Find trouble before it finds you.

(32)

Data

Warehouse Part 06

What to do with too big steps?

Some things to consider

• Break a large work into smaller sections • Deliver each one in turn

• Limit the subjects

• Delay some difficult reporting to a later stage • Often requirements change anyway

• Prioritize based on business value

• The highest value may be the highest project risk

• In that case secure considerable commitment from the sponsor

(33)

Data

Warehouse Part 06

Technology

Deliver value on the way

If at all possible provide something for the needy.

Some useful part of the total may be extremely valuable

Once you have some live users you have a production Data Warehouse. The deliverables should be right

(34)

Data

Warehouse Part 06

Technology

Deliver value on the way

If at all possible provide something for the needy.

Some useful part of the total may be extremely valuable

Once you have some live users you have a production Data Warehouse. The deliverables should be right

(35)

Data

Warehouse Part 06

On practical issues and constraints

There are some very practical issues

1. Job scheduling

• How to schedule the data capture • What cycle for the DW ?

• What cycles there are in the operational systems • Understand the dependencies

• What effect retrospection ?

• The date the DW has new data

• correct and ready to use or about such

(36)

Data

Warehouse Part 06

On practical issues and constraints

There are some very practical issues

2. Disk Space

• “One cannot be too rich or have too much disk space” • Figure the size of your tables, factor in the growth

• How about • work files

• summary files

• Multiple dimension data bases (MDDB’s) • Areas needed for exploitation tools

• metadata

(37)

Data

Warehouse Part 06

On practical issues and constraints

There are some very practical issues

3. Political issues

• “Every file has an owner” • Find the owner

• Who is responsible for data being correct ? • The Data Warehouse is for everyone

• should be treated as such • Usage rights

• who is allowed this or that

(38)

Data

Warehouse Design

Part 06

Technology

(39)

Data

Warehouse Design

Part 06

Technology

(40)

Data

Warehouse Design

Part 06

Technology

I get up and I see the sunrise. And I feel good, yeah

For my life that begun. You and me were free We went out to see From morning, ooh

Till the end of the day.

The Kinks This is the end

References

Related documents

The purpose of this study was to investigate the mother-child (FI-child) interactions in the children with ADHD compared to the TD children using a qualitative

In this paper, we propose architecture for XML-based data and metadata integration in data warehousing system with Common Warehouse Metamodel (CWM) as a standard for modeling

Figure 6.6 (A) shows the applied pattern of load while Figure 6.6 (B) gives how the task changes its location according to the load state of the worker. The decision of moving the

To be a major contributor to the development, delivery and monitoring of the Council’s People Strategy, service plan and other related strategies and take responsibility for

The Business review of the Group can be found in the consolidated financial statements and Annual Report and Accounts of the Company for the year to December 31, 2013, prepared

And how do the research ideas relate to the candidate’s own educational and professional background, to one of the academic disciplines of the graduate schools of Leiden

Hot water cylinder Heat exchanger in radiator heating system RAVK RAVK Technical data Type RAVK k v (m 3 /h) at a P-band °C