5.5 Dynamic Adaptability

Link structures enable dynamic adaptability, that is, the ability to “define associations or correlated data sets on the fly.” Dynamic adaptability leads to a fluid modeling structure. A data mining tool with specialized algorithms that mine both the metadata (data model ontology) and the data set is capable of discovering new relationships that are not yet represented in the model. To apply appropriate context when deciding to Link different data sets (i.e., Hubs) together, the data mining algorithm must include the metadata definitions of the terminology that explains the data models.

Relationships (i.e., new Links in the Data Vault) created in this fashion must include two additional attributes: confidence and strength. In other words, how confident is the mining engine (neural network) that this relationship actually exists and is real, and how strong are the correlations across the data sets? These two metrics are applied to every row of data that is loaded to the newly formed Link.
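As a minimal sketch of what such a row might look like, the Python below models a mined Link row that carries the two extra metrics. The class name, the field names, and the 0.0 to 1.0 scale for both metrics are assumptions made for illustration; the text does not prescribe a layout or a scale.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DiscoveredLinkRow:
    """One row of a Link created by a mining process (hypothetical layout)."""
    link_sqn: int          # surrogate sequence of the Link row
    hub_a_sqn: int         # sequence key of the first Hub
    hub_b_sqn: int         # sequence key of the second Hub
    load_dts: datetime
    record_source: str
    confidence: float      # assumed scale: 0.0 (no trust) to 1.0 (certain)
    strength: float        # assumed scale: 0.0 (no correlation) to 1.0

    def __post_init__(self) -> None:
        # Reject rows whose metrics fall outside the assumed scale.
        for name in ("confidence", "strength"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0, 1], got {value}")


def usable_rows(rows, min_confidence, min_strength):
    """Keep only the rows whose two metrics both reach the given thresholds."""
    return [r for r in rows
            if r.confidence >= min_confidence and r.strength >= min_strength]


now = datetime.now(timezone.utc)
rows = [
    DiscoveredLinkRow(1, 11, 25, now, "mining-engine", confidence=0.92, strength=0.80),
    DiscoveredLinkRow(2, 11, 31, now, "mining-engine", confidence=0.40, strength=0.95),
]
print([r.link_sqn for r in usable_rows(rows, 0.75, 0.5)])  # prints [1]
```

Because the metrics are stored on every row, a query can choose its own thresholds instead of relying on a single cut-off decided at load time.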

A fluid Data Vault model is constantly adapting and self-learning. As with any neural network, the alterations and learning must be a guided and corrected process; otherwise the neural network may drive the model to an undesired (possibly unusable) state. Before these notions are dismissed as theoretical, they must be considered as reality. A company known as NetQuote in Denver, Colorado applied this technique (human-based mining) to build an up-sell Linkage, resulting in a 40% profitability increase in the first week.

Learning systems, intelligence systems, and military grade systems may actually see the most benefit from this technique. It allows “testing” of hypotheses without losing any of the historical data that has been captured. Taking advantage of the fluid model requires automated changes to the loading and querying routines; in addition, it requires automated changes to the data marts downstream.

It is possible to create a learning system that is capable of discovering relationships across data sets where none existed previously. It is possible to create a system that adapts to newly arriving elements on the XML feeds or web-service transactions. It is possible to create a system that arrives at potentially high-impact information without the need for up-front human intervention. How to build these systems is well beyond the scope of this book, but a well-designed Data Vault is a prerequisite to even starting down this path.

Super Charge Your Data Warehouse. © Dan Linstedt 2010-2011, all rights reserved. http://LearnDataVault.com

5.6 Scalability

The Link entity also provides the model with unlimited scale-out, the same way MPP (massively parallel processing) relies on scale-out (adding new independent processing nodes) to reach larger data sets. The Link entity enables data to sit in different geographical locations, linked or associated at run-time (see Figure 5-6 above). The Data Vault scalability is limited […] imagination and the hardware components applied underneath the model.

What is MPP?
Short for Massively Parallel Processing, a type of computing that uses many separate CPUs running in parallel to execute a single program. MPP is similar to symmetric processing (SMP), with the main difference being that in SMP systems all the CPUs share the same memory, whereas in MPP systems, each CPU has its own memory. MPP systems are therefore more difficult to program because the application must be divided in such a way that all the executing segments can communicate with each other. On the other hand, MPP don't suffer from the bottleneck problems inherent in SMP systems when all the CPUs attempt to access the same memory at once. http://www.webopedia.com/TERM/M/MPP.html

Physical location of the tables on specific storage devices can be optimized for maximum performance. Figure 5-9 indicates a traditional starting point for the Data Vault architecture ([…] SAN or NAS disk). This type of architecture provides the lowest cost entry point for a single […] system. The Data Vault model is flexible enough to grow with the […] needs. As performance grows, as data sets grow, as real-time data arrives, the Data Vault model can […].

Figure 5-9: Traditional Data Vault Storage

When the performance of this architecture falls below expectations, it can be easily adjusted to a new physical architecture as shown in Figure 5-10. Assuming in this case that Link-Customer-Order and Satellite-Customer-Order are growing at an unprecedented rate, or that they contain a massive volume of information, they can be split off physically on to a specialized DASD (direct attached storage disk) with multiple I/O channels. This allows the business management and IT to provide an SLA (service level agreement) which specifies performance driven metrics around certain queries or processes that load down-stream marts.

Figure 5-10: Performance Physical Split Version 1

The tables can then be further partitioned across multiple I/O channels, additional disk components, or hardware. The architecture allows the performance to be tightly coupled to the physical storage, while allowing the model to be de-coupled from the physical layers. This is an optimal situation for an MPP design. For further performance, additional RAID 0+1 configurations and DASD can be introduced to other table structures, as seen in Figure 5-11.

Figure 5-11: Performance Physical Split Version 2

This process can be repeated again, and again across each individual table, and down to the partition level of each individual table. This enables full scale-out MPP style architecture to be executed at the physical level of the Data Vault. This type of design is geared for extremely large systems, and for the flexibility of breaking off parts of the model on to slower equipment, while other parts of the model are placed on high-speed, high-cost equipment.

Figure 5-12: Performance Physical Split Version 3

Additional I/O channels can be added to each disk device for additional parallel access capacity. Partitioning of the tables further enhances performance and parallelism. Further discussions of physical table structuring can be found in the Data Vault implementation book.

5.7 Link Entity Structure

The Link entity structure consists of basic required elements: surrogate sequence id, multiple business sequence keys, load date stamp, and record source. There are additional components that are necessary and helpful in order to meet the applied needs of the data set. Items such as last seen date, confidence rating, strength rating, encryption key, and possibly other metadata elements may be added for query purposes, performance purposes, and discovery purposes […] business requires. As technology […], items such as last seen dates, metadata (including record source), and encryption […] “swallowed” by the database functionality.

The Link entity must NEVER contain business keys, or begin and end dates. If a Link structure is compromised, then the flexibility of the model is immediately compromised. If the structure of the Link is compromised then you are sure to need reengineering in the future. Adding business […] Linked table ensures that it depends on the business logic for loading; this raises the complexity of the loading routines. Links must contain two or more key sequence fields (from either Hubs or other Links) in order to be considered valid; a Link with a single Hub sequence key is considered […] and is invalid. Figure 5-13 is an example of the Link Entity Structure.

Figure 5-13: Sample Link Structure

Warning: Any compromise made in the structure will lead directly to re-engineering, high maintenance costs, difficulty in growth, lack of flexibility, and problematic real-e[…] time in the near future. Never alter the raw structural definitions of the Data Vault.

5.8 Link Driving Key

For every Link there is a notion called a driving key. The driving key is the main key that drives […] of the relationship. The driving key is necessary to identify so that Satellites based on […] be appropriately “end-dated” when the relationship changes. In Figure 5-14, the driving key has arbitrarily been assigned to CUST_SQN.

Figure 5-14: Example Driving Key for Link

What this means in this example is: the account and employee sequences can be re-assigned to a specific customer. For instance: when the warehouse sees: CUST=11, ACCT=25, and EMP_SQN=12 on October 14, 2000, it’s the first time for this relationship. An insert occurs to the Link, establishing Link_SQN = 1. Figure 5-15 adds a Satellite (discussed in chapter 6) for illustrative purposes.

Figure 5-15: Example of Link Satellite with Driving Key

In this case, the Link record 1 has 1 Satellite record. What happens when the operational system changes the account number that the customer is associated with? What if the operational system changes the employee that deals with the customer? In each of these cases, we see the following insert (intermediate step) occur. Figure 5-16 depicts the “post-insert” of the new row in the Link and Satellite.
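The re-assignment scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists stand in for the Link and Satellite tables, the class and function names are hypothetical, CUST_SQN is taken as the driving key as in the example, and the second load (account 26 on November 1, 2000) is an invented change. The sketch inserts the new Link row and then closes any open Satellite row that belongs to an older Link row with the same driving key.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LinkRow:
    link_sqn: int
    cust_sqn: int   # driving key (assumed, as in the example)
    acct_sqn: int
    emp_sqn: int


@dataclass
class SatRow:
    link_sqn: int
    load_date: date
    end_date: Optional[date] = None   # None means the row is still current


def load_relationship(links: List[LinkRow], sats: List[SatRow],
                      cust_sqn: int, acct_sqn: int, emp_sqn: int,
                      load_date: date) -> int:
    """Insert-only load of one relationship; returns its Link sequence."""
    # 1. Find the Link row for this exact combination, or insert a new one.
    for link in links:
        if (link.cust_sqn, link.acct_sqn, link.emp_sqn) == (cust_sqn, acct_sqn, emp_sqn):
            current = link
            break
    else:
        current = LinkRow(len(links) + 1, cust_sqn, acct_sqn, emp_sqn)
        links.append(current)

    # 2. End-date open Satellite rows of other Links sharing the driving key.
    superseded = {l.link_sqn for l in links
                  if l.cust_sqn == cust_sqn and l.link_sqn != current.link_sqn}
    for sat in sats:
        if sat.link_sqn in superseded and sat.end_date is None:
            sat.end_date = load_date

    # 3. Open a Satellite row for the current Link if none is open yet.
    if not any(s.link_sqn == current.link_sqn and s.end_date is None for s in sats):
        sats.append(SatRow(current.link_sqn, load_date))
    return current.link_sqn


links: List[LinkRow] = []
sats: List[SatRow] = []
load_relationship(links, sats, 11, 25, 12, date(2000, 10, 14))
load_relationship(links, sats, 11, 26, 12, date(2000, 11, 1))  # hypothetical account change
for sat in sats:
    print(sat)
```

Keying the end-dating step on the driving key alone is what lets the loader recognise that the second Link row replaces the first rather than sitting beside it.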

Figure 5-16: Insert to Link/Sat Based on Driving Key

To restore order to the data, row 1 in the Satellite requires that it be end-dated. Why? Because the customer sequence in the Link is the driving key, and the Link received a new record that supersedes the old version. The ETL processing must take into account the driving key in order to make the proper determination of Link records and associations to Satellite rows.

Note: the Driving Key may be a composite. It may be representative of the source system Primary Key, but not always. Sometimes it is a combined view (super-set view) of multiple systems. Choose what makes sense to the business.

The final Figure 5-17 below shows the Satellite row properly end-dated by using the Driving Key to detect changes.

Figure 5-17: Link Driving Key/Satellite End Dated

5.9 Link Examples

For the examples of the Links we have used several different models including Microsoft Adventure Works data model and a health-care model. These Link structures do not carry last seen dates nor strength/confidence ratings. Figure 5-18 contains example Link structures found in the current version of the Adventure Works 2008 Data Vault.

Figure 5-18: Example of Link Tables From Adventure Works 2008 Data Vault

In each of these examples, focus on the particular grain of the data set. The last Link (seen […]) has a grain of 3 different Hubs: Hub Product, Hub Category, Hub Sub Category. In business terminology this would be read as: Product by Category by Sub Category. These would be known as dimensions. In technical terms, this is deemed to […] of the data which is […] the Link table.

The combination of Hub Sequences (composite) must […] unique index. This unique index […] 1 to 1 with the generated Link Sequence. The […] sequence is the primary key of […]. This standard is enforced so that the sequence […] Data Vault model can be […] time.

Examine the Link Lnk_WOID_LocID (which stands for: Work Order ID by Location ID). Notice its associated keys. It contains Hub Work Order ID sequence, and Hub Location ID sequence, but it also contains an Oper_Seq. The Oper Seq turns out to be an operational sequencing number used by the operational system (in this case Adventure Works Application if there was one) to order all the […] records. It also serves to make the combination of the two keys unique. This particular element of a Link table is called a degenerate field.
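The uniqueness rule for a Link such as Lnk_WOID_LocID can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions, not the book's physical design: the column names, the index name, the text-typed load date, and the choice of SQLite are all illustrative. It shows a generated Link sequence as primary key, a unique index over the two Hub sequences plus the Oper_Seq, and a loader that skips a combination that already exists.

```python
import sqlite3


def create_link_table(conn: sqlite3.Connection) -> None:
    """Create a Link with a surrogate key and a unique composite index."""
    conn.execute(
        """
        CREATE TABLE lnk_woid_locid (
            link_sqn      INTEGER PRIMARY KEY,  -- generated Link sequence
            wo_sqn        INTEGER NOT NULL,     -- Hub Work Order ID sequence
            loc_sqn       INTEGER NOT NULL,     -- Hub Location ID sequence
            oper_seq      INTEGER NOT NULL,     -- operational sequencing number
            load_dts      TEXT    NOT NULL,
            record_source TEXT    NOT NULL
        )
        """
    )
    # The composite of Hub sequences (plus Oper_Seq) may occur only once.
    conn.execute(
        "CREATE UNIQUE INDEX ux_lnk_woid_locid "
        "ON lnk_woid_locid (wo_sqn, loc_sqn, oper_seq)"
    )


def insert_link(conn, wo_sqn, loc_sqn, oper_seq, load_dts, record_source):
    """Insert one Link row; return its sequence, or None if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO lnk_woid_locid "
                "(wo_sqn, loc_sqn, oper_seq, load_dts, record_source) "
                "VALUES (?, ?, ?, ?, ?)",
                (wo_sqn, loc_sqn, oper_seq, load_dts, record_source),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


conn = sqlite3.connect(":memory:")
create_link_table(conn)
print(insert_link(conn, 1, 10, 1, "2008-01-01", "AdventureWorks"))  # prints 1
print(insert_link(conn, 1, 10, 2, "2008-01-01", "AdventureWorks"))  # prints 2
print(insert_link(conn, 1, 10, 1, "2008-01-02", "AdventureWorks"))  # prints None
```

Without `oper_seq` in the index, the second insert would collide with the first, which is the role the text assigns to that extra field.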