4.14 Modeling Rules and Standards for Hub Tables
The Data Vault model is a repeatable, consistent, scalable and flexible technique. There are rules and standards around each of the table structures that must be followed, or the resulting model will not qualify as a Data Vault model and will be subject to the risks it was designed to avoid. Below are the modeling rules and standards that surround a Hub Table.
• A Hub must have at least 1 business key
• A Hub should not contain a composite set of business keys. ** exception below
• A Hub SHOULD have at least one Satellite. Hubs without Satellites usually indicate "bad source data", poorly defined source data, or business keys that are missing valuable metadata. However, a Hub's Satellites may be hidden because of security restrictions or information hiding paradigms
• A Hub Business Key CAN be composite when: two of the same operational systems are using the same keys to mean different things AND these keys collide when integrated back together again. In this case, the record source becomes part of the business key. Please be aware: BAD DATA CAUSES BREAKS IN THESE RULES - THESE ARE GUIDING PRINCIPLES. Exceptions to this rule should not happen (but do). Also be aware that bad architecture in source systems causes breaks in these rules too.
• A Hub Business Key MAY also be composite because the key is utilized as a composite key within the business
• A Hub's business key must stand alone in the environment: it must be either a system-created key or a true business key that is the single basis for "finding" information in the source system. A true business key is often referred to as a NATURAL KEY
• A Hub should contain a surrogate sequence key (if the database doesn't work well with natural keys).
• A Hub's load-date-time stamp or observation start date must be an attribute in the Hub, and not a part of the Hub's primary key structure
• A Hub's PRIMARY KEY cannot contain a record source (though the business key may as noted above).
• A Hub may contain a Last-Seen-Date if that grain of tracking is needed
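The rules above can be sketched as a table definition. The following is a minimal illustration only, using Python's built-in sqlite3 module; the table name hub_customer, the column names, and the load_hub helper are hypothetical and do not come from the Data Vault standards. It assumes a single-column business key, a surrogate sequence key as the primary key, and the load date-time stamp and record source kept as plain attributes outside the primary key. The "insert only new keys" loading behavior is likewise an assumption made for the example.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical Hub for a customer number; all names here are illustrative.
DDL = """
CREATE TABLE hub_customer (
    customer_sqn   INTEGER PRIMARY KEY,      -- surrogate sequence key
    customer_num   TEXT    NOT NULL UNIQUE,  -- business (natural) key, not composite
    load_dts       TEXT    NOT NULL,         -- load date-time stamp: attribute, not in the PK
    record_source  TEXT    NOT NULL          -- record source: attribute, not in the PK
);
"""

def load_hub(conn, business_keys, record_source):
    """Insert business keys not yet present; existing keys are left untouched."""
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer (customer_num, load_dts, record_source) "
        "VALUES (?, ?, ?)",
        [(key, now, record_source) for key in business_keys],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
load_hub(conn, ["C-100", "C-200"], "CRM")
load_hub(conn, ["C-200", "C-300"], "BILLING")  # C-200 already exists and is skipped
rows = conn.execute(
    "SELECT customer_num, record_source FROM hub_customer ORDER BY customer_sqn"
).fetchall()
print(rows)
```

In this sketch no descriptive columns and no foreign keys appear in the Hub; descriptive data would live in a Satellite that references customer_sqn.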
The rules for Data Vault modeling have not changed (architecturally) since 1997, which makes the architecture itself stable and easy to use. The rules and standards for modeling are kept up to date on the following website: http://DanLinstedt.com.
Super Charge Your Data Warehouse Page 75 of 152
© Dan Linstedt 2010-2011, all rights reserved http://LearnDataVault.com
4.15 What Happens when the Hub Standards Are Broken
The standards, the design, and the architecture of the Hub are based on mathematics, including finite complexity, measurable maintenance effort, and the number of rows per block. If the Hub standards are broken (such as by introducing a foreign key directly into the Hub), then the flexibility of the model breaks. The adaptability to future business requirements breaks. The ability to load past history (which may not match the relationship definition) breaks. Breaking the rules and standards also introduces high levels of re-engineering upstream of the Data Warehouse, because it forces business requirements to creep back into the upstream loads. Eventually the business requirements change, and thus force re-engineering in the loading, querying and structuring of the Data Vault. The current architecture of the Data Vault avoids all re-engineering if the rules and standards are adhered to.
If descriptive data is introduced to a Hub, then data over time becomes more difficult to manage. The complexity of the loading cycle increases. The staging area requires additional "copies" of the data set to synchronize it with the final image. It becomes impossible to split data by rate of change or type of information.
It is not recommended nor condoned to break the standards of the Data Vault. The engineering work has been done in order to avert pitfalls encountered on typical enterprise data warehousing projects. In fact, if the standards are broken, the model will not qualify as a Data Vault model.
The only risk a "pure" Hub design has is the width of the business key. If the business key is comprised of multiple fields (is a composite business key), then the rows become wider and the number of rows per block may fall below the desired count. When this happens, the number of I/Os needed to search through the Hub structure and locate the proper business key increases dramatically.
The average Hub row size is accounted for as follows:

Field                  Average Bytes
Sequence               8
Business Key           25
Load Date Time Stamp   8
Record Source          12
TOTAL                  53 bytes
Figure 4-10: Typical Hub Row Sizing
If the block size is 16,384 bytes (16k) then it can fit approximately 309 rows per disk I/O. If the block size is 32k, then the Hub can fit approximately 618 rows per disk I/O. With a block size at 64k the Hub can fit approximately 1236 rows per disk I/O. The best average is around 1000 rows per block. The Data Vault implementation book covers the mathematics in detail, along with the loading mechanisms, block sizes, and row widths.
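The figures above follow from integer division of the block size by the average row size. The short sketch below reproduces them; it ignores block headers and per-row storage overhead (a simplifying assumption), and the 100-byte composite key at the end is a hypothetical value chosen only to show how a wider business key reduces the rows per block.

```python
# Average field widths in bytes, taken from Figure 4-10.
HUB_ROW_BYTES = {
    "sequence": 8,
    "business_key": 25,
    "load_date_time_stamp": 8,
    "record_source": 12,
}

def rows_per_block(block_bytes, row_bytes):
    """Whole rows that fit in one block, ignoring block and row overhead."""
    return block_bytes // row_bytes

row_size = sum(HUB_ROW_BYTES.values())  # 8 + 25 + 8 + 12 = 53 bytes
for kib in (16, 32, 64):
    print(f"{kib}k block: {rows_per_block(kib * 1024, row_size)} rows")

# Hypothetical composite business key of 100 bytes instead of 25.
wide_row = row_size - HUB_ROW_BYTES["business_key"] + 100
print(f"16k block, 100-byte key: {rows_per_block(16 * 1024, wide_row)} rows")
```

With the 53-byte row this yields 309, 618 and 1236 rows for 16k, 32k and 64k blocks; with the hypothetical 100-byte key the row grows to 128 bytes and a 16k block holds only 128 rows.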
NOTE: THIS INFORMATION IS TECHNICAL IN NATURE, AND WILL BE COVERED IN DEPTH IN THE DATA VAULT IMPLEMENTATION BOOK, AND IN THE COACHING AREA. THIS INFORMATION IS HERE TO CLARIFY THE PRESENTED TOPIC.
Do not break the rules of the design or architecture. If the rules are broken, the design will suffer re-engineering in the near future. It also breaks the ability to keep costs down from a maintenance perspective. The Data Vault model is based on scalability mathematics involved in computing near-linear scalability from an MPP (massively parallel processing) perspective.