5.5 Dynamic Adaptability
Link structures enable dynamic adaptability; that is: the ability to “define associations or correlated data sets on the fly.” Dynamic adaptability leads to a fluid modeling structure. A data mining tool with specialized algorithms that mine both the metadata (data model ontology) and the data set is capable of discovering new relationships that are not yet represented in the model. The data mining algorithm must include the metadata definitions of terminology that explain data models in order to apply appropriate context when deciding to Link different data sets (i.e., Hubs) together.
Relationships (i.e., new Links in the Data Vault) created in this fashion must include two additional attributes: confidence and strength. In other words, how confident is the mining engine (neural network) that this relationship actually exists and is real, and how strong are the correlations across the data sets? These two metrics are applied to every row of data that is loaded to the newly formed Link.
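A minimal sketch of what a row in such a dynamically created Link might carry. The class name, the field names, and the assumption that both metrics are normalized to the range 0 to 1 are illustrative only; the text does not prescribe a layout or a scale.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DiscoveredLinkRow:
    """One row of a Link proposed by a mining process (hypothetical layout)."""
    link_sqn: int        # surrogate sequence of the Link row
    hub_a_sqn: int       # sequence key of the first Hub
    hub_b_sqn: int       # sequence key of the second Hub
    load_dts: datetime   # load date stamp
    record_source: str   # which process proposed the relationship
    confidence: float    # how sure the engine is that the relationship is real
    strength: float      # how strongly the two data sets correlate

    def __post_init__(self):
        # Assumption: both metrics are normalized to the range 0..1.
        for name in ("confidence", "strength"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def rows_for_review(rows, min_confidence=0.8):
    """Return the rows whose confidence is too low to accept unreviewed."""
    return [row for row in rows if row.confidence < min_confidence]
```

Carrying both metrics on every row, rather than once per Link, lets a reviewer filter out weak associations without discarding the Link itself. The 0.8 threshold is an arbitrary placeholder.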
A fluid Data Vault model is constantly adapting and self-learning. As with any neural network, the alterations and learning must be a guided and corrected process; otherwise the neural network may drive the model to an undesired (possibly unusable) state. Before these notions are dismissed as theoretical, they should be considered as reality. A company known as NetQuote in Denver, Colorado applied this technique (human based mining) to build an up-sell Linkage, resulting in a 40% profitability increase in the first week.
Learning systems, intelligence systems, and military grade systems may actually see the most benefit from this technique. It allows “testing” of hypotheses without losing any of the historical data which has been captured. Taking advantage of the fluid model requires automated changes to the loading and querying routines; it also requires automated changes to the data marts down-stream.
It is possible to create a learning system that is capable of discovering relationships across data sets where none existed previously. It is possible to create a system that adapts to newly arriving elements on the XML feeds, or web-service transactions. It is possible to create a system that arrives at potential high impact information without the need of up-front human intervention. How to build these systems is way beyond the scope of this book, but a well-designed Data Vault is a prerequisite to even starting down this path.
Super Charge Your Data Warehouse
© Dan Linstedt 2010-2011, all rights reserved
http://LearnDataVault.com
5.6 Link Scalability
The Link entity also provides the model with unlimited scale-out, the same way MPP (massively parallel processing) relies on scale-out (adding new independent processing nodes) to reach larger data sets. The Link entity enables data to sit in different geographical locations and be linked or associated at run-time (see Figure 5-6 above). The Data Vault scalability is limited by imagination and the hardware components applied underneath the model.
What is MPP?
Short for Massively Parallel Processing, a type of computing that uses many separate CPUs running in parallel to execute a single program. MPP is similar to symmetric processing (SMP), with the main difference being that in SMP systems all the CPUs share the same memory, whereas in MPP systems, each CPU has its own memory. MPP systems are therefore more difficult to program because the application must be divided in such a way that all the executing segments can communicate with each other. On the other hand, MPP don't suffer from the bottleneck problems inherent in SMP systems when all the CPUs attempt to access the same memory at once. http://www.webopedia.com/TERM/M/MPP.html
Physical location of the tables on specific storage devices can be optimized for maximum performance. Figure 5-9 indicates a traditional starting point for the Data Vault architecture (SAN or NAS disk). This type of architecture provides the lowest cost entry point for a single system. The Data Vault model is flexible enough to grow with the […] needs. As performance grows, as data sets grow, as real-time data arrives, the Data Vault model can be […].
Figure 5-9: Traditional Data Vault Storage
When the performance of this architecture falls below expectations, it can be easily adjusted to a new physical architecture as shown in Figure 5-10. Assuming in this case that Link-Customer-Order and Satellite-Customer-Order are growing at an unprecedented rate, or that they contain a massive volume of information, they can be split off physically on to a specialized DASD (direct attached storage disk) with multiple I/O channels. This allows the business management and IT to provide an SLA (service level agreement) which specifies performance driven metrics around certain queries or processes that load down-stream marts.
Figure 5-10: Performance Physical Split Version 1
The tables can then be further partitioned across multiple I/O channels, additional disk components, or hardware. The architecture allows the performance to be tightly coupled to the physical storage, while allowing the model to be de-coupled from the physical layers. This is an optimal situation for an MPP design. For further performance, additional RAID 0+1 configurations and DASD can be introduced to other table structures, as seen in Figure 5-11.
Figure 5-11: Performance Physical Split Version 2
This process can be repeated again and again across each individual table, and down to the partition level of each individual table. This enables full scale-out MPP style architecture to be executed at the physical level of the Data Vault. This type of design is geared for extremely large systems, and for the flexibility of breaking off parts of the model on to slower equipment, while other parts of the model are placed on high-speed, high-cost equipment.
Figure 5-12: Performance Physical Split Version 3
Additional I/O channels can be added to each disk device for additional parallel access capacity. Partitioning of the tables further enhances performance and parallelism. Further discussions of physical table structuring can be found in the Data Vault implementation book.
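To make the idea of partitioning a Link table across devices or nodes concrete, here is a small sketch that spreads Link rows over a fixed number of partitions. The modulo-on-sequence scheme and the names `partition_for`, `split_rows`, and `link_sqn` are assumptions chosen for illustration; the text does not say which partitioning scheme to use, and real platforms offer their own.

```python
from collections import defaultdict


def partition_for(link_sqn: int, partition_count: int) -> int:
    """Pick a partition for a Link row from its surrogate sequence.

    Assumption: a plain modulo spreads sequential keys evenly enough
    for illustration.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    return link_sqn % partition_count


def split_rows(rows, partition_count):
    """Group Link rows (dicts with a 'link_sqn' key) by target partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row["link_sqn"], partition_count)].append(row)
    return dict(partitions)


# Usage: eight rows spread over four hypothetical I/O channels.
rows = [{"link_sqn": n} for n in range(1, 9)]
layout = split_rows(rows, partition_count=4)
```

Because each partition holds an independent subset of rows, loads and scans can run against the partitions in parallel, which is the property the physical splits in Figures 5-10 through 5-12 rely on.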
5.7 Link Entity Structure
The Link entity structure consists of basic required elements: surrogate sequence id, multiple business sequence keys, load date stamp, and record source. There are additional components that are necessary and helpful in order to meet the applied needs of the data set. Items such as last seen date, confidence rating, strength rating, encryption key, and possibly other metadata elements may be added for query purposes, performance purposes, and discovery purposes as the business requires. As technology advances, items such as last seen dates, metadata (including record source), and encryption may be “swallowed” by the database functionality.
The Link entity must NEVER contain business keys, or begin and end dates. If a Link structure is compromised, then the flexibility of the model is immediately compromised. If the structure of the Link is compromised then you are sure to need reengineering in the future. Adding business […] Linked table ensures that it depends on the business logic for loading; this raises the complexity of the loading routines. Links must contain two or more key sequence fields (from either Hubs or Links) in order to be considered valid; a Link with a single Hub sequence key is considered […] Link and is invalid. Figure 5-13 is an example of the Link Entity Structure.
Figure 5-13: Sample Link Structure
Warning: Any compromise made in the structure will lead directly to re-engineering, high maintenance costs, difficulty in growth, lack of flexibility, and problematic real-time in the near future. Never alter the raw structural definitions of the Data Vault.
5.8 Link Driving Key
In every Link there is a notion called a driving key. The driving key is the main key that drives […] of the relationship. The driving key is necessary to identify so that Satellites based on […] be appropriately “end-dated” when the relationship changes. In Figure 5-14, the driving key has arbitrarily been assigned to CUST_SQN.
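The structural rules for a Link stated in section 5.7 (required elements, at least two key sequence fields, no business keys, no begin or end dates) are mechanical enough to check in code. The sketch below is one hypothetical way to do that; the role names and the column names `ACCT_SQN`, `LOAD_DTS`, `REC_SRC`, and `CUST_NUM` are assumptions for illustration and are not taken from the figures.

```python
REQUIRED_ROLES = ("link_sequence", "load_date", "record_source")
FORBIDDEN_ROLES = ("business_key", "begin_date", "end_date")


def validate_link_structure(columns):
    """Check a Link definition given as {column_name: role}.

    Returns a list of problems; an empty list means the structure
    satisfies the rules encoded here.
    """
    roles = list(columns.values())
    problems = []
    for role in REQUIRED_ROLES:
        if role not in roles:
            problems.append(f"missing required element: {role}")
    for name, role in columns.items():
        if role in FORBIDDEN_ROLES:
            problems.append(f"{name}: a Link must never contain a {role}")
    if roles.count("key_sequence") < 2:
        problems.append("a Link needs two or more key sequence fields")
    return problems


# A Link shaped like the customer/account/employee example.
good_link = {
    "LINK_SQN": "link_sequence",
    "CUST_SQN": "key_sequence",
    "ACCT_SQN": "key_sequence",
    "EMP_SQN": "key_sequence",
    "LOAD_DTS": "load_date",
    "REC_SRC": "record_source",
}

# A compromised Link: one Hub key, a business key, and an end date.
bad_link = {
    "LINK_SQN": "link_sequence",
    "CUST_SQN": "key_sequence",
    "CUST_NUM": "business_key",
    "END_DTS": "end_date",
    "LOAD_DTS": "load_date",
    "REC_SRC": "record_source",
}
```

Optional elements such as confidence or strength ratings are deliberately not checked, since the text treats them as additions driven by business needs.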
Figure 5-14: Example Driving Key for Link
What this means in this example is: the account and employee sequences can be re-assigned to a specific customer. For instance, when the warehouse sees CUST=11, ACCT=25, and EMP_SQN=12 on October 14, 2000, it’s the first time for this relationship. An insert occurs to the Link, establishing Link_SQN = 1. Figure 5-15 adds a Satellite (discussed in chapter 6) for illustrative purposes.
Figure 5-15: Example of Link Satellite with Driving Key
In this case, the Link record 1 has 1 Satellite record. What happens when the operational system changes the account number that the customer is associated with? What if the operational system changes the employee that deals with the customer? In each of these cases, we see the following insert (intermediate step) occur. Figure 5-16 depicts the “post-insert” of the new row in the Link and Satellite.
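The driving-key behaviour can be sketched in a few lines: when a new Link row arrives whose driving key matches an existing Link row, the open Satellite rows of the older Link rows are end-dated and a new open Satellite row is started. This is only an illustration under assumptions. The in-memory lists, the function name `apply_link_change`, and the columns `ACCT_SQN`, `BEGIN_DATE`, and `END_DATE` are hypothetical, and the second load date is invented for the example. Note that the dates live on the Satellite, never on the Link.

```python
from datetime import date


def apply_link_change(links, satellites, new_link, driving_key, load_date):
    """Insert new_link and end-date Satellite rows it supersedes.

    driving_key is a tuple of column names, so a composite driving key
    is handled the same way as a single-column one.
    """
    new_key = tuple(new_link[col] for col in driving_key)
    # Existing Link rows that share the driving key are superseded.
    superseded = {
        row["LINK_SQN"]
        for row in links
        if tuple(row[col] for col in driving_key) == new_key
    }
    for sat in satellites:
        if sat["LINK_SQN"] in superseded and sat["END_DATE"] is None:
            sat["END_DATE"] = load_date
    links.append(new_link)
    satellites.append(
        {"LINK_SQN": new_link["LINK_SQN"], "BEGIN_DATE": load_date, "END_DATE": None}
    )


# First load: customer 11, account 25, employee 12 (Link_SQN = 1).
links = [{"LINK_SQN": 1, "CUST_SQN": 11, "ACCT_SQN": 25, "EMP_SQN": 12}]
satellites = [{"LINK_SQN": 1, "BEGIN_DATE": date(2000, 10, 14), "END_DATE": None}]

# Later load: the same customer is moved to a different account.
apply_link_change(
    links,
    satellites,
    {"LINK_SQN": 2, "CUST_SQN": 11, "ACCT_SQN": 26, "EMP_SQN": 12},
    driving_key=("CUST_SQN",),
    load_date=date(2000, 11, 1),
)
```

The sketch does not handle a relationship that reverts to an earlier combination of keys, nor an incoming row that duplicates an existing Link row; a real loading routine would need rules for both.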
Figure 5-16: Insert to Link/Sat Based on Driving Key
To restore order to the data, row 1 in the Satellite requires that it be end-dated. Why? Because the customer sequence in the Link is the driving key, and the Link received a new record that supersedes the old version. The ETL processing must take into account the driving key in order to make the proper determination of Link records and associations to Satellite rows.
Note: the Driving Key may be a composite. It may be representative of the source system Primary Key, but not always. Sometimes it is a combined view (super-set view) of multiple systems. Choose what makes sense to the business.
The final Figure 5-17 below shows the Satellite row properly end-dated by using the Driving Key to detect changes.
Figure 5-17: Link Driving Key/Satellite End Dated
5.9 Link Examples
For the examples of the Links we have used several different models including Microsoft Adventure Works data model and a health-care model. These Link structures do not carry last seen dates nor strength/confidence ratings. Figure 5-18 contains example Link structures found in the current version of the Adventure Works 2008 Data Vault.
Figure 5-18: Example of Link Tables From Adventure Works 2008 Data Vault
In each of these examples, focus on the particular grain of the data set. The last Link (seen […]) has a grain of 3 different Hubs: Hub Product, Hub Category, Hub Sub Category. In business terminology this would be read as: Product by Category by Sub Category. These would be known as dimensions. In technical terms, this is deemed to […] of the data which is represented by the Link table.
The combination of Hub Sequences (composite) must have a unique index. This unique index […] match 1 to 1 with the generated Link Sequence. The Link Sequence is the primary key of the Link table. This standard is enforced so that the sequence […] Data Vault model can be re-built at any time.
Examine the Link Lnk_WOID_LocID (which stands for: Work Order ID by Location ID). Notice its […] associated keys. It contains Hub Work Order ID sequence, and Hub Location ID sequence, but it also contains an Oper_Seq. The Oper Seq turns out to be an operational sequencing number used by the operational system (in this case Adventure Works Application if there was one) to order all the […] records. It also serves to make the combination of the two keys unique. This particular element of a Link table is called a degenerate field.
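The relationship between the composite of Hub sequences and the generated Link sequence can be illustrated with a small in-memory sketch. The class name `LinkSequencer` and its method are hypothetical, and a dictionary stands in for the unique index. The degenerate field (here `oper_seq`) is treated as part of the composite, following the remark that it makes the combination of the two keys unique.

```python
class LinkSequencer:
    """Hand out one Link sequence per unique composite key (illustrative)."""

    def __init__(self):
        # Plays the role of the unique index on the composite key.
        self._sequence_by_key = {}

    def sequence_for(self, work_order_sqn, location_sqn, oper_seq):
        """Return the Link sequence for a composite, creating it if new."""
        key = (work_order_sqn, location_sqn, oper_seq)
        if key not in self._sequence_by_key:
            # New combination: generate the next surrogate sequence.
            self._sequence_by_key[key] = len(self._sequence_by_key) + 1
        return self._sequence_by_key[key]


# Usage: the same Hub pair with a different Oper_Seq gets its own Link row.
sequencer = LinkSequencer()
first = sequencer.sequence_for(work_order_sqn=500, location_sqn=7, oper_seq=1)
second = sequencer.sequence_for(work_order_sqn=500, location_sqn=7, oper_seq=2)
repeat = sequencer.sequence_for(work_order_sqn=500, location_sqn=7, oper_seq=1)
```

Because every composite maps to exactly one sequence and every sequence to exactly one composite, either side can be regenerated from the other, which is what allows the Link to be rebuilt.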