5.2 Reasons for Many To Many Relationships
Within the Data Vault modeling constructs, a Link is formed any time there is a 1-to-1, 1-to-many, many-to-1, or many-to-many relationship between data elements (business keys). The resulting physical Data Vault can capture “what the relationship was” as well as “what the relationship is”, and can adapt to “what the relationship will be in the future.”
Many-to-Many relationships provide the following benefits:
1. Flexibility
2. Granularity
3. Dynamic
4. Scalability

Many-to-many relationships allow the physical model to absorb data changes and business rule changes with little to no impact to both existing data sets (history) and existing processes ([...] query). Businesses must change at the speed of business, and IT must become more agile and responsive to handling those changes. More and more business rules are changing, faster and faster.

Through the Link entity the Data Vault mitigates the need to restructure/redesign the EDW because the relationship changes. For example: today the business states “1 portfolio can handle many customers, but each customer must be handled by 1 and only 1 portfolio.” If the model is designed in a rigid fashion (that is to say with parent-child dependencies) then it represents current business rules quite well. All is well until the business (tomorrow, next year, or 2 years from now) decides to change their business rule: “now, a customer may be handled by 3 or 4 different portfolios.” Figure 5-1 demonstrates relationship change over time.

Figure 5-1: Relationship Changes Over Time

Super Charge Your Data Warehouse © Dan Linstedt 2010-2011, all rights reserved

One of the problems of modeling today’s relationships in any data warehouse is that it makes the structures static. It forces the structures to represent today’s relationships; relationships that have changed in the past, and will change again in the future. This is the nature of the business: grow, change, or die. The problem with introducing static relationships in the model is that it also re-introduces business rules to the loading processes. It also introduces relationship enforcement into the loading routines. When the relationship does change, IT must re-engineer the loading routines, the modeling architecture, and the queries to get the data out of the Data Warehouse. This is an unacceptable and un-maintainable cost going forward. The Data Vault must remain flexible, and not introduce the need for re-engineering as the model grows. By modeling the Links as a many-to-many relationship, we accomplish this. The Link table functions to future-proof the model and provide maximum flexibility. Figure 5-2 demonstrates the reason for the Link table:

Figure 5-2: Structure Housing Multiple Relationships
Many-to-many relationships ensure that business associations (past, present, and future) can be added to the warehouse without altering the model or the load routines. The metadata that is currently lost is the nature of the relationship (e.g., 1:1, 1:M, M:1) as documented in the source system (what exactly did the operational model look like?). This must be documented in the metadata of the Link table, preferably in the Meta Vault. By capturing the metadata in the Meta Vault (including the computational functions that create the relationship, along with how it is used), the business can begin to track changes to business knowledge as they relate to the data set and operational systems over time.
The resulting power of this capture mechanism enables the business to monitor the impact of its decisions. Data mining on the Meta Vault and the data set can then support gap analysis of the quality of a decision and its resulting impact (before and after the decision process). If the business adapts its business process, a new Link table can be added easily and quickly without re-engineering the entire existing data warehouse. Load routines are isolated from the impact, as are queries and BI processes.
If the model is rigid, then the loading (ETL) designs are also rigid. If a business rule changes and meets a rigid architecture, the result is forced re-engineering. The impact may cascade into other “child tables”; thus, the larger the EDW model grows, the larger the possibility for impact. The less agile IT can be in response to business rule changes, the more it costs (over time) to continue adjusting the EDW architecture to meet business needs.
This is the common design pattern in traditionally modeled warehouses. This impact is completely mitigated by building a Link entity into the Data Vault. The Data Vault is therefore highly scalable, flexible, and agile. The Link entity allows the structure to handle changes to business rules without the impact of re-engineering (aka re-factoring) and without the ever-increasing cost curve. However, it is suggested that the business rule itself, along with any calculation that produces this data set, be recorded within a Meta Vault. To learn more about Meta Vault, check out the one-on-one coaching area at: http://danLinstedt.com
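The contrast between a rigid parent-child design and a Link entity can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, the table and column names (rigid_customer, link_portfolio_customer, portfolio_key, customer_key) are hypothetical, and plain text keys stand in for whatever surrogate keys a real Data Vault would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rigid design (hypothetical): the "1 portfolio per customer" rule
    -- is baked into the structure of the table itself.
    CREATE TABLE rigid_customer (
        customer_id  TEXT PRIMARY KEY,
        portfolio_id TEXT NOT NULL
    );

    -- Link design (hypothetical): one row per portfolio/customer association.
    CREATE TABLE link_portfolio_customer (
        portfolio_key TEXT NOT NULL,
        customer_key  TEXT NOT NULL,
        load_date     TEXT NOT NULL,
        PRIMARY KEY (portfolio_key, customer_key)
    );
""")

# Today's rule: customer C1 is handled by exactly one portfolio.
conn.execute("INSERT INTO rigid_customer VALUES ('C1', 'P1')")
conn.execute("INSERT INTO link_portfolio_customer VALUES ('P1', 'C1', '2010-01-01')")

# Tomorrow's rule: C1 may be handled by several portfolios.
# The rigid table rejects the second association outright.
try:
    conn.execute("INSERT INTO rigid_customer VALUES ('C1', 'P2')")
    rigid_accepts_change = True
except sqlite3.IntegrityError:
    rigid_accepts_change = False

# The Link table absorbs the new rule without any structural change.
conn.execute("INSERT INTO link_portfolio_customer VALUES ('P2', 'C1', '2011-01-01')")
conn.execute("INSERT INTO link_portfolio_customer VALUES ('P3', 'C1', '2011-01-01')")

portfolios_for_c1 = [
    row[0]
    for row in conn.execute(
        "SELECT portfolio_key FROM link_portfolio_customer "
        "WHERE customer_key = 'C1' ORDER BY portfolio_key"
    )
]
print(rigid_accepts_change)
print(portfolios_for_c1)
```

In the rigid table, supporting the new rule would mean changing the key or adding columns, and therefore changing the load routines and queries as well. In the Link table the new rule is just more rows.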
5.3 Flexibility
Many-to-Many relationships provide maximum flexibility and agility. The more flexible the model is, the faster it is to adapt or change. The faster the model can adapt, the less time it takes IT to respond to business changes. The less time it takes for IT to respond to business changes, the more work can be done in a shorter amount of time, leading to increased productivity of the IT staff in the data warehousing environment. Adding new tables (especially Link tables) to the Data Vault is easy.
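As a rough illustration of why adding a Link table is cheap, the sketch below builds a small order-tracking model, then adds a supplier Hub and a product-to-supplier Link, and checks that none of the existing table definitions changed. It assumes Python's built-in sqlite3 module; every table and column name is hypothetical, and Satellites and load-date columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Starting model (hypothetical names): three Hubs and one Link.
conn.executescript("""
    CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
    CREATE TABLE hub_order    (order_key    TEXT PRIMARY KEY);
    CREATE TABLE hub_product  (product_key  TEXT PRIMARY KEY);
    CREATE TABLE link_order_line (
        customer_key TEXT NOT NULL REFERENCES hub_customer,
        order_key    TEXT NOT NULL REFERENCES hub_order,
        product_key  TEXT NOT NULL REFERENCES hub_product,
        PRIMARY KEY (customer_key, order_key, product_key)
    );
""")


def table_definitions(connection):
    """Return {table name: CREATE statement} as recorded by SQLite."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    )
    return dict(rows)


before = table_definitions(conn)

# The business change: start tracking which suppliers provide which products.
# Only new tables are created; no existing table is altered.
conn.executescript("""
    CREATE TABLE hub_supplier (supplier_key TEXT PRIMARY KEY);
    CREATE TABLE link_product_supplier (
        product_key  TEXT NOT NULL REFERENCES hub_product,
        supplier_key TEXT NOT NULL REFERENCES hub_supplier,
        PRIMARY KEY (product_key, supplier_key)
    );
""")

after = table_definitions(conn)
unchanged = all(after[name] == sql for name, sql in before.items())
added = sorted(set(after) - set(before))
print(unchanged, added)
```

Because the change consists only of CREATE TABLE statements, load routines and queries written against the original four tables keep working as they are.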
Figure 5-3: Starting Model Before Changes

Suppose the model starts out with an order tracking system that knows the customer, order, product, and line-items (see Figure 5-3). Time passes and now the business wishes to add a set of product categories and suppliers. How does the model evolve?

Figure 5-4: Data Vault After Modification

As seen in Figure 5-4, changing the model or adding new entities is a simple process. Not much time or effort is required to make the changes occur, and there is little impact on the existing portions of the warehouse. We can add Links to represent the new relationships without having to revise existing structures (to add new foreign key columns) or [...] data. Do not confuse this with the time and effort required to find and establish appropriate business keys. Creating the structures is the first step, and the most important step. Suppose the business now wants to add sales regions to manage both customers and orders as a combined component; how difficult might it be to extend the model again?

Figure 5-5: Additional Data Vault More Changes

It is easy to add new entities to the Data Vault model due to the Link structure (see Figure 5-5). There is a discussion in the Business of Data Vault Modeling book about classified information systems. This type of model flexibility lends itself well to protected environments, as the secure or encrypted data can be stored in separate entities and easily Linked to unsecured data entities. Keep in mind that adding new Links across global environments is also possible. This is especially helpful in managing distributed (yet connected) systems. As indicated in Figure 5-6, the Links can be stored in different global systems; it is up to the application to keep the keys synchronized for query use.
Figure 5-6: Global Data Vault Linking

In this situation, there are many different applications that are synchronizing the operational Data Vault. Most likely the application loading data to the global Data Vault is comprised of web-services and a business rules engine. It is quite possible to have more than one global controller, and to farm out different components of the access depending on the security, geo-location, or other criteria. The control over the data, the loading, and querying are beyond the scope of this book, and will be discussed in the book titled: Data Vault Implementation.

5.4 Granularity
Granularity is vital to an EDW; the Data Vault is no different. Grain can be measured by the number of “parent” tables a Link contains. For each parent, there is a new (lower) level of grain introduced. The same mode of thinking applies when considering fact tables in a Star Schema. For example, what is the grain of the following fact table (see Figure 5-7), and how can it be accurately described?

Figure 5-7: Uncovering Fact Table Grain

The grain of this fact table can be read as: Customer by Product by Sales by Week/Year/Month, etc. Each dimensional key creates a new level of grain for the facts. Grain as defined by this example simply means detailed level of data. Grain in the Data Vault Link tables is no different. The Link tables represent the level of detail that the data is stored at. After converting this Star Schema to a Data Vault, the grain would look like Figure 5-8.

Figure 5-8: Data Vault Grain, Representing Star Schema
When the business requirements indicate a need to record data at a different grain, new Links
should be added to the existing Data Vault; the old ones are simply no longer fed incoming data (but are retained, as they contain historical data). The alternative is to re-engineer the existing Link to add the new Hub surrogate key. Re-engineering is the enemy of flexibility and auditability, and can quickly cause an EDW project to grow out of control. With regard to auditability, the question is: once a new Hub surrogate key is added to the existing table, how should it be defined to the business, especially if the definition has to apply to the historical data already stored in the Link?
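The "add a new Link, retire the old one" approach can be sketched with a small in-memory model. This is an illustrative sketch only: the Link class, the add_grain helper, and all table and key names are hypothetical, and grain is measured simply as the number of parent Hub keys a Link carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    hub_keys: tuple      # parent Hub keys referenced by this Link
    active: bool = True  # False once the Link is no longer fed new data

    @property
    def grain(self):
        # Grain is measured here by the number of parent Hubs.
        return len(self.hub_keys)


def add_grain(links, old_name, new_name, extra_hub_key):
    """Stop feeding the old Link (keeping it for history) and add a finer one."""
    result = []
    for link in links:
        if link.name == old_name:
            # The old Link keeps its original definition and its history.
            result.append(Link(link.name, link.hub_keys, active=False))
            # The new Link carries the additional Hub key from its first row.
            result.append(Link(new_name, link.hub_keys + (extra_hub_key,)))
        else:
            result.append(link)
    return result


model = [Link("link_customer_product", ("customer_key", "product_key"))]
model = add_grain(
    model,
    "link_customer_product",
    "link_customer_product_region",
    "region_key",
)

for link in model:
    print(link.name, link.grain, "active" if link.active else "history only")
```

Each Link keeps one unambiguous definition for its whole lifetime, so the question of what the new key means for historical rows never has to be answered.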
The very same question plagues changes to star-schema fact tables: adding a dimensional surrogate key to a fact table changes the grain of all the data. Then the business asks the next question: can we reproduce a report from last year and compare it to data from this year? The answer is technically yes, but what has to happen to the code that drives that report?
It has to split into two parts: one part for retrieving history, and a second part for retrieving “current” data with the new key. Now the project is beginning to take on a much greater maintenance cost. As changes continue to alter the structure, more code forks are necessary to satisfy the business users’ desire for reporting, until one day the business wakes up and says to IT: we can’t afford any more changes, and why is the system such a mess already? This is one of the reasons we advocate using the Data Vault model for your core EDW instead of a dimensional architecture: this kind of change will not break a Data Vault.
What lurks in the shadows is even more troubling. Suppose it is the first change; all is well and everyone is happy (as long as access to each data set is governed). Then one day, another business unit decides it needs to “roll up” the data, or summarize the recent data that has the new key. They then combine these results with the old data that does not have the new key, and the numbers no longer match. Now they ask IT: why is the data reporting “bad” numbers?
Accountability has just been destroyed. As stated above, when new relationships arise in a data warehouse, it is best to always create new Links for these changes and leave the old ones alone. A hint from the implementation book: as data degrades in value (gets older), there is a good chance that the old Link and its data will be backed up and the old Link will no longer be necessary within the warehouse. This is the beginning of a Data Vault model that truly changes with the business needs.