performance
2.2.2 Primary and foreign keys
2.4.2.2 Fact tables
In general fact tables contain two field types, namely numeric fact values and foreign key reference attributes to dimension tables. Figure 2.24 shows a more detailed version of the snowflake schema shown in Figure 2.15. The shipment, group, contents, and voyage specific table fields are named as such simply to indicate their relational table origins. All of these attributes are measurements of one form or another. The remaining attributes for the shipment table shown in Figure 2.24 are all dimensional table foreign keys. The two voyage departure and arrival dates could be removed to a time- based table and replaced with the appropriate foreign keys.
Data warehouse fact tables can be loosely classified into various areas. Once again classifications may vary depending on the data warehouse. Fact table types are as follows:
Fact tables can contain detailed fact record entries.
Fact tables can be summary tables, which are aggregations of details. Detail rows are not stored in the data warehouse, probably having been summarized in transportation from transactional databases.
Fact tables can be simply added to or changed, but are preferably only partially changed, or better still static. The least intrusive method of adding new data is during low usage times and appended to the end of existing data.
Fact table attribute types are another significant point to note with respect to fact tables. Facts are often known as additive, semi-additive, or non-additive. For example, accounting amounts can be added together (additive) but averages cannot (non-additive). Semi-additive facts can be accumulated for some but not all facts.
In Figure 2.24 the shipment.shipment_gross_weight is additive across all dimensions. In other words there is a gross weight for a sender (the per-
son or company shipping the container), the consignee (the person or com- pany receiving the container), a vessel, and a container. The shipment.contents_contents attribute is semi-additive because contents are only specific to each container and thus meaningless to sender, consignee, or vessel. Multiple senders and consignees can have their individual ship- ments grouped in a single container or even spread across multiple contain- ers. The shipment.voyage# and shipment.voyage_leg# attributes are non- additive since these values simply do not make sense to be accumulated. However, an average number of legs per voyage, per vessel might make some sense as an accumulation.
Figure 2.24
A detailed fact table
It is always preferable that fact tables are only added to, and also best appended to. Less desirable situations are fact tables with updatable cumulative fields such as summaries or even fact tables requiring dele- tions. Even worse are fact tables containing directly updatable numerical values. For example, if a table contains a history of stock movements and all prices have increased, it is possible that a large portion of those stock movement entries must be updated. It might be best in a situation such as this to maintain stock item prices outside of the stock movements table, perhaps as part of a dimension.
Granularity, Granularity, and Granularity! The most significant factor with respect to fact tables is granularity—how much data to keep, to what level of detail. Do you store every transaction or do you summarize transac- tions and only store totals for each day, month, per client, and so on?
Granularity is deciding how much detail the data warehouse will need in the future. This is all about requirements. Firstly, do the end-users know how much detail they will need? Do you understand and trust that they know what they need? Will they change their minds? Probably. If there is any doubt, and if possible, keep everything! Now there’s a simple of rule of thumb. Disk space is cheap unless your data warehouse is truly humun- gous. If that is the case you have plenty of other problems to deal with as well, You also have a good reason why particular details might be missing say one year down the line when some really irritated executive comes to ask you why the data he wants isn’t there. Too much data is your answer.
So the most important design issue with fact tables is the level of granu- larity. Simply put this means, does one save all the data or summarize it? Storing all data can lead to very large fact tables and thus very large data- bases. However, after data has been deleted from your transactional data sources it might be costly to discover that all the archived dimensional-fact combinations are required for reports at a later date. From a planning per- spective it might be best to begin by retaining all facts down to the smallest detail if at all possible. Data warehouses are expected to be large and disk space is cheap.
The time factor and how long should data be retained? How long do you keep data in the data warehouse for? Some data warehouses retain data in perpetuity and others discard data over a few years old. Expired data removed from a data warehouse can always be copied to backup copies. However, remember that if a data warehouse is extremely large, removing older data may cause serious performance problems for appending of new data, and most especially for end users. Many modern databases operate on
a global twenty-four hour time scale; there simply is no room for down time or even slow time for that matter.