5.2 Reasons for Many To Many Relationships

Within the Data Vault modeling constructs, a Link is formed any time there is a 1-to-1, 1-to-many, many-to-1, or many-to-many relationship between data elements (business keys). The resulting physical Data Vault can capture “what the relationship was” and “what the relationship is”, and can adapt to “what the relationship will be in the future.”

Many-to-Many relationships provide the following benefits:

1. Flexibility
2. Granularity
3. Dynamic adaptability
4. Scalability

Many-to-many relationships allow the physical model to absorb data changes and business rule changes with little [...] impact to both existing data sets (history) and existing processes ([...] query). Businesses must change at the speed of business, and IT must become more agile [...] responsive to handling those changes. More and more business rules are changing, faster and faster.

Through the Link entity the Data Vault mitigates the need to restructure/redesign the EDW [...] because the relationship changes. For example: today the business states “1 portfolio can [...] many customers, but each customer must be handled by 1 and only 1 portfolio.” If the model is designed in a rigid fashion (that is to say with parent-child dependencies) then it represents current business rules quite well. All is well until the business (tomorrow, next year, or 2 years from now) decides to change their business rule: “now, a customer may be handled by 3 or 4 different portfolios.” Figure 5-1 demonstrates relationship change over time.

Figure 5-1: Relationship Changes Over Time

Super Charge Your Data Warehouse
© Dan Linstedt 2010-2011, all rights reserved
http://LearnDataVault.com

One of the problems of modeling today’s [...] in any data warehouse [...] structures static. It forces the structures to represent today’s relationships in the [...] changed in the past, and will change again in the future. This [...] the business: grow, change, or die. The problem with introducing static relationships into the model is that it also re-introduces business rules to the loading processes. It also introduces relationship enforcement into the loading routines. When the relationship does change, IT must re-engineer the loading routines, the modeling architecture, and the queries to get the data out of the Data Warehouse. This is an unacceptable and un-maintainable cost going forward. The Data Vault must remain flexible, and not introduce the need for re-engineering as the model grows. By modeling the Links as a many-to-many relationship, we accomplish [...] Link table functions to [...] the model and provide maximum flexibility. Figure 5-2 demonstrates the reason for [...] table:

Figure 5-2: [...] Structure Housing Multiple Relationships

Many-to-many relationships ensure that the business associations (past, present, and future) can be added to the warehouse without altering the model or the load routines. The metadata that is currently lost is the nature of the relationship (e.g., 1:1, 1:M, M:1) as documented in the source system (what exactly did the operational model look like?). This must be documented in the metadata of the Link table, hopefully in the Meta Vault. By capturing the metadata in the Meta Vault (including computational functions that create the relationship, along with how it’s used) the business can begin to track changes to business knowledge as they relate to the data set and operational systems over time.

The resulting power of this capture mechanism enables the business to monitor the impact of its decisions. Data mining on the Meta Vault and the data set can then perform gap analysis with regard to the quality of the decision and the resulting impact (pre and post decision process). If the business adapts its business process, a new Link table can be added easily and quickly without reengineering the entire existing data warehouse. Load routines are isolated from the impact, as are queries and BI processes.

If the model is rigid, then the loading (ETL) designs are also rigid. If a business rule changes and meets a rigid architecture, the result is forced re-engineering. The impact may cascade into other “child tables”; thus, the larger the EDW model grows, the larger the possibility for impact. The less agile IT can be in response to business rule changes, the more it costs (over time) to continue adjusting the EDW architecture to meet business needs.

This is the common design pattern that occurs in traditionally modeled warehouses. This impact is completely mitigated by building a Link entity into the Data Vault. The Data Vault therefore is highly scalable, flexible, and agile. The Link entity allows the structure to handle changes to business rules without the impact of re-engineering (aka re-factoring), and without the ever-increasing cost curve. However, it is suggested that the business rule itself, along with any calculation that produces this data set, be recorded within a Meta Vault. To learn more about Meta Vault, check out the one-on-one coaching area at: http://danLinstedt.com
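The portfolio example from this section can be sketched in a few lines. This is a minimal illustration, not the book's implementation: it uses SQLite through Python's standard `sqlite3` module as a stand-in for the warehouse platform, and every table and column name (`hub_customer`, `hub_portfolio`, `link_customer_portfolio`, `load_relationship`, and so on) is a hypothetical chosen for the sketch. The Link is reduced to its keys plus a load date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer  (customer_sk  INTEGER PRIMARY KEY, customer_bk  TEXT UNIQUE);
    CREATE TABLE hub_portfolio (portfolio_sk INTEGER PRIMARY KEY, portfolio_bk TEXT UNIQUE);
    -- The Link only constrains the key pair, so it is many-to-many by
    -- construction; it does not encode "one portfolio per customer".
    CREATE TABLE link_customer_portfolio (
        customer_sk  INTEGER NOT NULL REFERENCES hub_customer (customer_sk),
        portfolio_sk INTEGER NOT NULL REFERENCES hub_portfolio (portfolio_sk),
        load_date    TEXT NOT NULL,
        UNIQUE (customer_sk, portfolio_sk)
    );
""")


def load_relationship(customer_bk, portfolio_bk, load_date):
    """Insert-only load: register both business keys, then the Link row."""
    conn.execute("INSERT OR IGNORE INTO hub_customer (customer_bk) VALUES (?)", (customer_bk,))
    conn.execute("INSERT OR IGNORE INTO hub_portfolio (portfolio_bk) VALUES (?)", (portfolio_bk,))
    conn.execute(
        """
        INSERT OR IGNORE INTO link_customer_portfolio (customer_sk, portfolio_sk, load_date)
        SELECT c.customer_sk, p.portfolio_sk, ?
        FROM hub_customer AS c, hub_portfolio AS p
        WHERE c.customer_bk = ? AND p.portfolio_bk = ?
        """,
        (load_date, customer_bk, portfolio_bk),
    )


def portfolios_for(customer_bk):
    """Return the portfolio business keys linked to one customer."""
    rows = conn.execute(
        """
        SELECT p.portfolio_bk
        FROM link_customer_portfolio AS l
        JOIN hub_customer  AS c ON c.customer_sk  = l.customer_sk
        JOIN hub_portfolio AS p ON p.portfolio_sk = l.portfolio_sk
        WHERE c.customer_bk = ?
        ORDER BY p.portfolio_bk
        """,
        (customer_bk,),
    ).fetchall()
    return [row[0] for row in rows]


# Today's rule: each customer is handled by exactly one portfolio.
load_relationship("CUST-1", "PORT-A", "2010-01-01")
load_relationship("CUST-2", "PORT-A", "2010-01-01")

# Tomorrow's rule: a customer may be handled by several portfolios.
# Same table, same load routine; only the incoming data differs.
load_relationship("CUST-1", "PORT-B", "2011-06-01")
load_relationship("CUST-1", "PORT-C", "2011-06-01")

print(portfolios_for("CUST-1"))
print(portfolios_for("CUST-2"))
```

Under these assumptions the rule change arrives as new rows only; neither the table definition nor `load_relationship` is touched. A parent-child design with a single `portfolio_sk` column on the customer table would have needed a structural change at the same point.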

5.3 Flexibility

Many-to-Many relationships provide maximum flexibility and agility. The more flexible the model is, the faster it is to adapt or change. The faster the model can adapt, the less time it takes IT to respond to business changes. The less time it takes for IT to respond to business changes, the more work can be done in a shorter amount of time, leading to increased productivity of the IT staff in the data warehousing environment. Adding new tables (especially Link tables) to the Data Vault is easy.

Figure 5-3: Starting Model Before Changes

Suppose the model starts out with an order tracking system that knows the customer, order, product, and line-items (see Figure 5-3). Time passes and now the business wishes to add a set of product categories and suppliers. How does the model evolve?

Figure 5-4: Data Vault After Modification
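One way to picture this evolution is as a purely additive schema change. The sketch below is an assumption-laden illustration rather than a transcription of Figures 5-3 and 5-4: the table names and columns are hypothetical, Satellites are omitted, and SQLite (via Python's standard `sqlite3` module) stands in for the warehouse database. It compares the stored definitions of the original tables before and after the extension.

```python
import sqlite3

# Hypothetical starting model: customer, order, product, and line-items.
STARTING_MODEL = """
    CREATE TABLE hub_customer (customer_sk INTEGER PRIMARY KEY, customer_bk TEXT UNIQUE);
    CREATE TABLE hub_order    (order_sk    INTEGER PRIMARY KEY, order_bk    TEXT UNIQUE);
    CREATE TABLE hub_product  (product_sk  INTEGER PRIMARY KEY, product_bk  TEXT UNIQUE);
    CREATE TABLE link_customer_order (
        customer_sk INTEGER NOT NULL REFERENCES hub_customer (customer_sk),
        order_sk    INTEGER NOT NULL REFERENCES hub_order (order_sk),
        UNIQUE (customer_sk, order_sk)
    );
    CREATE TABLE link_line_item (
        order_sk   INTEGER NOT NULL REFERENCES hub_order (order_sk),
        product_sk INTEGER NOT NULL REFERENCES hub_product (product_sk),
        UNIQUE (order_sk, product_sk)
    );
"""

# Hypothetical extension: product categories and suppliers arrive later.
# Only CREATE TABLE statements are needed; nothing existing is altered.
EXTENSION = """
    CREATE TABLE hub_supplier (supplier_sk INTEGER PRIMARY KEY, supplier_bk TEXT UNIQUE);
    CREATE TABLE hub_category (category_sk INTEGER PRIMARY KEY, category_bk TEXT UNIQUE);
    CREATE TABLE link_product_supplier (
        product_sk  INTEGER NOT NULL REFERENCES hub_product (product_sk),
        supplier_sk INTEGER NOT NULL REFERENCES hub_supplier (supplier_sk),
        UNIQUE (product_sk, supplier_sk)
    );
    CREATE TABLE link_product_category (
        product_sk  INTEGER NOT NULL REFERENCES hub_product (product_sk),
        category_sk INTEGER NOT NULL REFERENCES hub_category (category_sk),
        UNIQUE (product_sk, category_sk)
    );
"""


def table_ddl(conn):
    """Map each table name to the CREATE statement SQLite stored for it."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(STARTING_MODEL)
before = table_ddl(conn)

conn.executescript(EXTENSION)
after = table_ddl(conn)

# Every original table still has exactly the definition it started with.
unchanged = all(after[name] == ddl for name, ddl in before.items())
added = sorted(set(after) - set(before))

print("existing tables unchanged:", unchanged)
print("tables added:", added)
```

The new Links carry the foreign keys to `hub_product`, so `hub_product` itself gains no columns. In a design that puts a supplier or category key directly on the product table, the same requirement would call for an `ALTER TABLE` and a revisit of the product load.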

As seen in Figure 5-4, changing the model or adding new entities is a simple process. Not much time or effort is required to make the changes occur [...] impact on the existing portions of the warehouse. We can add Links to represent the new relationships without having to revise existing structures (to add new foreign key columns) or [...] data. Do not confuse this with the time and effort required to find and establish appropriate business keys. Creating the [...] structures is the first step, and the most important step [...]. Suppose the business now [...] add sales regions to manage both customers and orders as a combined component, how [...] might it be to extend the model again?

Figure 5-5: Additional Data Vault [...] More Changes

It is easy to add new entities to the Data Vault model due to the Link structure (see Figure 5-5). There is a discussion in the Business of Data Vault Modeling book about classified information systems. This type of model flexibility lends itself well to protected environments as the [...] encrypted data can be stored in separate entities and easily Linked to unsecured data entities. Keep in mind that adding new Links across global environments is also possible. This is especially helpful in managing distributed (yet connected) systems. As indicated in Figure 5-6, the Link [...] stored in different global systems; it is up to the application to keep the keys synchronized for query use.

Figure 5-6: Global Data Vault Linking

In this situation, there are many different applications that are synchronizing the operational Data Vault. Most likely the application loading data to the global Data Vault is comprised of web-services, and a business rules engine. It is quite possible to have more than one global controller, and to farm out different components of the access depending on the security, geo-location, or other criteria. The control over the data, the loading and querying are beyond the scope of this book, and will be discussed in the book titled: Data Vault Implementation.

5.4 Granularity

Granularity is vital to an EDW; the Data Vault is no different. Grain can be measured by the number of “parent” tables a Link contains. For each parent, there is a new (lower) level of grain introduced. The same mode of thinking applies when considering fact tables in a Star Schema. For example, what is the grain of the following fact table (see Figure 5-7), and how can it be accurately described?

Figure 5-7: Uncovering Fact Table Grain

The grain of this fact table can be read as: Customer by Product by Sales by Week/Year/Month, etc. Each dimensional key creates a new level of grain for the facts. Grain as defined by this example simply means detailed level of data. Grain in the Data Vault Link tables is no different. The Link tables represent the level of detail that the data is stored at. After converting this Star Schema to a Data Vault, the grain would look like Figure 5-8.

Figure 5-8: Data Vault Grain, Representing Star Schema

When the business requirements indicate a need to record data at a different grain, new Links should be added to the existing Data Vault; old ones are simply no longer fed incoming data (but are retained as they contain historical data). The alternative option is to re-engineer the existing Link to add the new Hub-surrogate-key. Re-engineering is the enemy of flexibility and auditability, and can quickly cause an EDW project to scale out of control. With regard to auditability, the question is: once a new Hub-surrogate-key is added to the existing table, how should it be defined to the business, especially if the definition has to apply to “past-historical data” that is already stored in the Link?
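The "add a new Link, stop feeding the old one" approach can be sketched as follows. This is an illustration under stated assumptions, not the book's design: the tables (`link_sale`, `link_sale_region`, `hub_region`) and the sample rows are hypothetical, and SQLite via Python's `sqlite3` module stands in for the warehouse. The helper `grain` reads the grain of a Link as the list of parent Hubs it references, using SQLite's `PRAGMA foreign_key_list`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (customer_sk INTEGER PRIMARY KEY, customer_bk TEXT UNIQUE);
    CREATE TABLE hub_product  (product_sk  INTEGER PRIMARY KEY, product_bk  TEXT UNIQUE);
    CREATE TABLE hub_region   (region_sk   INTEGER PRIMARY KEY, region_bk   TEXT UNIQUE);
    INSERT INTO hub_customer VALUES (1, 'CUST-1'), (2, 'CUST-2');
    INSERT INTO hub_product  VALUES (10, 'PROD-10');
    INSERT INTO hub_region   VALUES (100, 'REGION-100');

    -- Original grain: customer by product. History stays here.
    CREATE TABLE link_sale (
        customer_sk INTEGER NOT NULL REFERENCES hub_customer (customer_sk),
        product_sk  INTEGER NOT NULL REFERENCES hub_product (product_sk),
        load_date   TEXT NOT NULL,
        UNIQUE (customer_sk, product_sk)
    );
    INSERT INTO link_sale VALUES (1, 10, '2010-03-01'), (2, 10, '2010-04-01');

    -- New requirement: customer by product by region. A new Link is
    -- added; link_sale is left as-is and simply receives no more rows.
    CREATE TABLE link_sale_region (
        customer_sk INTEGER NOT NULL REFERENCES hub_customer (customer_sk),
        product_sk  INTEGER NOT NULL REFERENCES hub_product (product_sk),
        region_sk   INTEGER NOT NULL REFERENCES hub_region (region_sk),
        load_date   TEXT NOT NULL,
        UNIQUE (customer_sk, product_sk, region_sk)
    );
    INSERT INTO link_sale_region VALUES (1, 10, 100, '2011-01-15');
""")


def grain(link_table):
    """Return the parent Hub tables a Link references; their count is its grain."""
    rows = conn.execute(f"PRAGMA foreign_key_list({link_table})").fetchall()
    return sorted(row[2] for row in rows)  # column 2 holds the parent table name


old_grain = grain("link_sale")
new_grain = grain("link_sale_region")
old_rows = conn.execute("SELECT COUNT(*) FROM link_sale").fetchone()[0]

print("old grain:", " by ".join(old_grain))
print("new grain:", " by ".join(new_grain))
print("historical rows still in old Link:", old_rows)
```

Each Link keeps one unambiguous definition: every row in `link_sale` means customer by product, and every row in `link_sale_region` means customer by product by region. No historical row has to be reinterpreted under a key it was never loaded with.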

The very same question plagues changes to star-schema fact tables: adding a dimensional surrogate to a fact table causes the grain of all the data to change. The business then asks the next question: can we reproduce a report from last year and compare it to data from this year? Of course, the answer is: technically yes, but what has to happen to the code that drives that report?

It has to split into two parts: one part for grabbing history, and a second part for grabbing “current” data with the new key. Now the project is beginning to take on a much greater cost in terms of maintenance. As changes continue to alter the structure, more code forks are necessary to satisfy the business users’ desire for reporting, until one day the business wakes up and says to IT: we can’t afford any more changes, and why is the system such a mess already? This is one of the reasons we advocate using the Data Vault model for your core EDW instead of a dimensional architecture: this kind of change will not break a Data Vault.

What lurks in the shadows is even more troubling. Suppose it’s the first change; all is well, and everyone is happy (as long as access to each data set is governed). Then one day, another business unit decides they need to “roll up” the data, or summarize the recent data that has the new key. They then combine these results with the old data that doesn’t have the new key, and the numbers no longer match. Now they ask IT: why is the data reporting “bad” numbers?
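One way such a mismatch can arise is sketched below. The scenario is an assumption made for illustration, not a case taken from the book: a hypothetical `fact_sales` table has been re-engineered to carry a `region_sk` key, so rows loaded before the change have no value for it. The table names and amounts are invented, and SQLite via Python's `sqlite3` module stands in for the reporting database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_sk INTEGER PRIMARY KEY, region_name TEXT);
    INSERT INTO dim_region VALUES (100, 'EMEA'), (200, 'APAC');

    -- Re-engineered table: region_sk was added after the first two rows
    -- were loaded, so the historical rows carry no region key.
    CREATE TABLE fact_sales (
        customer_sk INTEGER,
        product_sk  INTEGER,
        region_sk   INTEGER,
        amount      INTEGER
    );
    INSERT INTO fact_sales VALUES
        (1, 10, NULL, 500),   -- history, loaded at the old grain
        (2, 10, NULL, 300),   -- history, loaded at the old grain
        (1, 10, 100,  200),   -- current, loaded at the new grain
        (2, 11, 200,  100);   -- current, loaded at the new grain
""")

# Report A: total sales straight from the table.
grand_total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]

# Report B: roll-up by region. The join silently drops every row that
# has no region key, which is all of the historical data.
by_region = dict(
    conn.execute(
        """
        SELECT r.region_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS r ON r.region_sk = f.region_sk
        GROUP BY r.region_name
        """
    ).fetchall()
)
rolled_up_total = sum(by_region.values())

print("grand total:   ", grand_total)
print("sum of roll-up:", rolled_up_total)
print("unexplained:   ", grand_total - rolled_up_total)
```

Both queries are individually reasonable, yet they disagree by exactly the amount stored at the old grain. Keeping the two grains in separate Links makes that boundary explicit in the structure instead of leaving it to each report writer to remember.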

Accountability has just been destroyed. As stated above, in the situation of new relationships, and with the added needs of a data warehouse, it is best to always create new Links for these changes and leave the old ones be. A hint from the implementation book: as data degrades in value (gets older), there’s a good chance that the old Link and its data will be backed up and the old Link will no longer be necessary within the warehouse. This is the beginning of a Data Vault model that truly changes with the business needs.