• No results found

Chapter 10. Populating the Data Warehouse

A.6 CelDial Metadata - Proposed Solution

No model is complete without its metadata. We include here a sample of metadata that could be used for our proposed solution. It is not complete, but it provides much of the needed metadata. It is left as an exercise for the reader to analyze the sample and try to determine additional metadata required for a complete solution.

MODEL METADATA

Name: Inventory

Definition: This model contains inventory data for each product model in each manufacturing plant, on a daily basis.

Purpose: The purpose of this model is to facilitate the analysis of inventory levels.

Contact Person: Plant Manager

Dimensions: Manufacturing, Product, and Time

Facts: Inventory

Measures: Quantity on hand, Reorder level, Total cost, and Total Revenue

Name: Sales

Definition: This model contains sales data for each product model, on each order, on a daily basis.

Purpose: The purpose of this model is to facilitate the analysis of product sales.

Contact Person: Regional Sales Manager

Dimensions: Customer, Manufacturing, Product, Seller, and Time

Facts: Sale

Measures: Total cost, Total revenue, Total quantity sold, and Discount amount

FACT METADATA

Name: Inventory

Definition: This fact contains inventory data for each product model in each manufacturing plant, on a daily basis.

Alias: None

Load Frequency: Daily Load Statistics:

Last Load Date: N/A

Number of Rows Loaded: N/A Usage Statistics:

Average Number of Queries/Day: N/A

Average Rows Returned/Query: N/A

Average Query Runtime: N/A

Maximum Number of Queries/Day: N/A

Maximum Rows Returned/Query: N/A

Maximum Query Runtime: N/A

Archive Rules: Data will be archived after 36 months, on a monthly basis.

Archive Statistics:

Last Archive Date: N/A

Date Archived to: N/A

Purge Rules: Data will be purged after 48 months, on a monthly basis.

Purge Statistics:

Last Purge Date: N/A

Date Purged to: N/A

Data Quality: Inventory levels may fluctuate throughout the day as more stock is received into inventory from production and stock is shipped out to retail stores and customers. The

measures for this fact are collected once per day and thus reflect the state of inventory at that point in time, which is the end of the working day for a plant.

Data Accuracy: The measures of this fact are 97.5% accurate at the point in time they represent. This is based on the results of physical inventories matched to recorded inventory levels.

No inference can be made from these measures as to values at points in time not recorded.

Grain of Time: The measures of this fact represent inventory levels on a daily basis.

Key: The key to an inventory fact is the combination of the keys of its dimensions: Manufacturing, Product, and Time.

Appendix A. The CelDial Case Study 171

Key Generation Method: The time portion of the key is simply the date the inventory level is being recorded for. The product key is retrieved from the product translation table using the product ID, model ID, and the inventory level date. The manufacturing key is retrieved from the manufacturing translation table using the region ID and plant ID.

Source:

Name: Inventory Table

Conversion Rules: Each row in each inventory table is copied into the inventory fact on a daily basis.

Selection Logic: All rows are selected from the inventory table in each plant.

Name: Sale Fact

Conversion Rules: The rows representing sales of an individual model produce from an individual plant are summarized for each day and the result appended to the relevant inventory fact.

Selection Logic: All rows for the day the inventory fact is being loaded with are selected from the sale fact.

Measures: Quantity on hand, Reorder level, Total cost, Total revenue Dimensions: Manufacturing, Product, and Time

Subsidiary Facts: None

Contact Person: Plant Manager

Name: Sale

Definition: This fact contains sale data for each order which has been recorded in the sales systems at each retail store and corporate sales office.

Alias: None

Load Frequency: Daily

Load Statistics:

Last Load Date: N/A

Number of Rows Loaded: N/A Usage Statistics:

Average Number of Queries/Day: N/A

Average Rows Returned/Query: N/A

Average Query Runtime: N/A

Maximum Number of Queries/Day: N/A

Maximum Rows Returned/Query: N/A

Maximum Query Runtime: N/A

Archive Rules: Data will be archived after 36 months, on a monthly basis.

Archive Statistics:

Last Archive Date: N/A

Date Archived to: N/A

Purge Rules: Data will be purged after 48 months, on a monthly basis.

Purge Statistics:

Last Purge Date: N/A

Date Purged to: N/A

Data Quality: It is possible for errors to be made by staff completing an order. However, the numbers recorded represent what is actually contracted with the customer and must be honored.

Data Accuracy: The measures of this fact are 100% accurate in that they represent what was actually sold.

Grain of Time: The measures of this fact represent sales of a given product on a given order.

Key: The key to a sale fact is the combination of the keys of its dimensions: Customer, Manufacturing, Order, Product, Seller, and Time. Note that Order does not have any additional attributes and thus does not physically exist as a dimension. Its primary function is to logically group product sales for analysis.

Key Generation Method: The time portion of the key is simply the date the sale takes place. The product key is retrieved from the product translation table using the product ID, model ID, and the sale date. The manufacturing key is retrieved from the manufacturing translation table using the region ID and plant ID. The seller key is retrieved from the seller translation table using the region ID, outlet ID, and salesperson ID. The customer key is retrieved from the customer translation table using the customer ID and customer location ID. The order ID from the original order is used as the order key for the sale fact.

Source:

Name: Order Table

Conversion Rules: Rows in each order table are copied into the sale fact on a daily basis.

Selection Logic: Only rows for the current transaction date are selected.

Name: Product Dimension

Conversion Rules: The product dimension is used to calculate the cost for the product model on an order.

As well, the negotiated unit price on the order is compared to the suggested retail or wholesale price (based on the outlet type) to determine if a discount was given. If so, the discount amount is calculated. If a discount is given for a product not eligible for

discounting, a message is printed on an error report.

Selection Logic: For each row being inserted into the sale fact, the product data is accessed.

Measures: Total cost, Total revenue, Total quantity sold, and Discount amount

Dimensions: Customer, Manufacturing, Order, Product, Seller, and Time Subsidiary Facts: Inventory: the inventory fact contains daily aggregates for

cost and revenue at the product and plant level of granularity.

Contact Person: Plant Manager

DIMENSION METADATA

Name: Customer

Definition: A customer is any person or organization who purchases goods from CelDial. A customer may be associated with many business locations.

Alias: None

Appendix A. The CelDial Case Study 173

Hierarchy: Data can be summarized at two levels for a customer. The lowest level of summarization is the customer ship-to address. Data from each location (ship-to address) can be further rolled up to summarize for an entire customer.

Change Rules: New customer locations are inserted as new rows into the dimension. Changes to existing locations are updated in place.

Load Frequency: Daily Load Statistics:

Last Load Date: N/A

Number of Rows Loaded: N/A Usage Statistics:

Average Number of Queries/Day: N/A

Average Rows Returned/Query: N/A

Average Query Runtime: N/A

Maximum Number of Queries/Day: N/A

Maximum Rows Returned/Query: N/A

Maximum Query Runtime: N/A Archive Rules: Customer data is not archived.

Archive Statistics:

Last Archive Date: N/A

Date Archived to: N/A

Purge Rules: Customers who have not purchased any goods from CelDial in the past 48 months will be purged on a monthly basis.

Purge Statistics:

Last Purge Date: N/A

Date Purged to: N/A

Data Quality: When a new customer is added a search is done to determine if we already do business with another location.

In rare cases separate branches of a customer are recorded as separate customers because this check fails.

Until such time as the customer notices separate locations dealing with us such occurrences remain as originally recorded.

Data Accuracy: Incorrect association of locations of a common customer occur in less than .5% of our customer data.

Key: The key to the customer dimension consists of a system generated number.

Key Generation Method: When a customer is copied from the operational system, the translation table is checked to determine if the

customer already exists in the warehouse. If not, a new key is generated and the key along with the customer ID and location ID are added to the translation table. If the customer and location already exist, the key from the translation table is used to determine which customer in the warehouse to update.

Source:

Name: Customer Table

Conversion Rules: Rows in each customer table are copied on a daily basis. For existing customers, the name is updated. For new customers, once a location is determined, the key is generated and a row

inserted. Before the update/insert takes place a check is performed for a duplicate customer name. If a

duplicate is detected, a sequence number is appended to the name. This check is repeated until the name and sequence number combination are determined to be unique. Once uniqueness has been confirmed, the update/insert takes place.

Selection Logic: Only new or changed rows are selected.

Name: Customer Location Table

Conversion Rules: Rows in each customer location table are copied on a daily basis. For existing

customer locations, the ship-to address is updated. For new customer locations, the key is generated and a row inserted.

Selection Logic: Only new or changed rows are selected.

Attributes:

Name: Customer Key

Definition: This is an arbitrary value assigned to guarantee uniqueness for each customer and location.

Alias: None

Change Rules: Once assigned, the values of this attribute never change.

Data Type: Numeric

Domain: 1 - 999,999,999

Derivation Rules: A system generated key of the highest used customer key +1 is assigned when creating a new customer and location entry.

Source: System Generated

Name: Name

Definition: This is the name by which a customer is known to CelDial.

Alias: None

Change Rules: When a customer name changes it is updated in place in this dimension.

Data Type: Character(30)

Domain:

Derivation Rules: To ensure the separation of data for customers who have the same name but are not part of the same organization, a number will be appended to names where duplicates exist.

Source: Name in Customer Table

Name: Ship-to Address

Definition: This is an address where CelDial ships goods to a corporate customer. It is possible for a corporate customer to have multiple ship-to locations.

For retail customers no ship-to address is kept.

Therefore, there can only be one entry in the customer dimension for a retail customer.

Alias: None

Change Rules: When a ship-to address changes it is updated in place in this dimension.

Data Type: Character(60)

Appendix A. The CelDial Case Study 175

Domain: All valid addresses within CelDial′s service area.

Derivation Rules: The ship-to address is a direct copy of the source.

Source: Ship-to Address in Customer Location Table

Facts: Sale

Measures: Total cost, Total revenue, Total quantity sold, and Discount amount

Subsidiary Dimensions: None

Contact Person: Vice-president of Sales and Marketing

Name: Manufacturing

Definition: The manufacturing dimension represents the

manufacturing plants owned and operated by CelDial.

Plants are grouped into geographic regions.

Alias: None

Hierarchy: Data can be summarized at two levels for manufacturing.

The lowest level of summarization is the manufacturing plant. Data from each plant can be further rolled up to summarize for an entire geographic region.

Change Rules: New plants are inserted as new rows into the dimension.

Changes to existing plants are updated in place.

Load Frequency: Daily Load Statistics:

Last Load Date: N/A

Number of Rows Loaded: N/A Usage Statistics:

Average Number of Queries/Day: N/A

Average Rows Returned/Query: N/A

Average Query Runtime: N/A

Maximum Number of Queries/Day: N/A

Maximum Rows Returned/Query: N/A

Maximum Query Runtime: N/A Archive Rules: Manufacturing plant data is not archived.

Archive Statistics:

Last Archive Date: N/A

Date Archived to: N/A

Purge Rules: Manufacturing plants that have been closed for at least 48 months will be purged on a monthly basis.

Purge Statistics:

Last Purge Date: N/A

Date Purged to: N/A

Data Quality: There are no opportunities for error or misinterpretation of manufacturing plant data.

Data Accuracy: Manufacturing plant data is 100% accurate.

Key: The key to the manufacturing plant dimension consists of a system generated number.

Key Generation Method: When a manufacturing plant is copied from the operational system, the translation table is checked to determine if the plant already exists in the warehouse. If not, a new key is generated and the key along with the plant ID and region ID are added to the translation table. If the plant and region already exist, the key from the

translation table is used to determine which plant in the warehouse to update.

Source:

Name: Manufacturing Plant Table

Conversion Rules: rows in each plant table are copied on a daily basis. For existing plants, the plant name is updated. For new plants, once a region is determined, the key is generated and a row inserted.

Selection Logic: Only new or changed rows are selected.

Name: Manufacturing Region Table

Conversion Rules: Rows in each region table are copied on a daily basis. For existing regions, the region name is updated for all plants in the region.

For new regions, the key is generated and a row inserted.

Selection Logic: Only new or changed rows are selected.

Attributes:

Name: Manufacturing Key

Definition: This is an arbitrary value assigned to guarantee uniqueness for each plant and region.

Alias: None

Change Rules: Once assigned, the values of this attribute never change.

Data Type: Numeric

Domain: 1 - 999,999,999

Derivation Rules: system generated key of the highest used manufacturing key + 1 is assigned when

creating a new plant and region entry.

Source: System Generated

Name: Region Name

Definition: This is the name CelDial uses to identify a geographic region for the purpose of grouping manufacturing plants.

Alias: None

Change Rules: When a region name changes it is updated in place in this dimension.

Data Type: Character(30)

Domain:

Derivation Rules: The region name is a direct copy of the source

Source: Name in Manufacturing Region Table

Name: Plant Name

Definition: This is the name CelDial uses to identify an individual manufacturing plant.

Alias: None

Change Rules: When a plant name changes it is updated in place in this dimension.

Data Type: Character(30)

Domain:

Derivation Rules: The plant name is a direct copy of the source

Appendix A. The CelDial Case Study 177

Source: Name in Manufacturing Plant Table

Facts: Inventory and Sale

Measures: Quantity on hand, Reorder level, Total cost, Total revenue, Total quantity sold, and Discount amount

Subsidiary Dimensions: None

Contact Person: Plant Manager

Name: Time

Definition: The time dimension represents the time frames used by CelDial for reporting purposes.

Alias: None

Hierarchy: The lowest level of summarization is a day. Data for a given day can be rolled up into either weeks or months.

Weeks cannot be rolled up into months.

Change Rules: Once a year the following year′s dates are inserted as new rows into the dimension. There are no updates to this dimension.

Load Frequency: Annually Load Statistics:

Last Load Date: N/A

Number of Rows Loaded: N/A Usage Statistics:

Average Number of Queries/Day: N/A

Average Rows Returned/Query: N/A

Average Query Runtime: N/A

Maximum Number of Queries/Day: N/A

Maximum Rows Returned/Query: N/A

Maximum Query Runtime: N/A Archive Rules: Time data is not archived.

Archive Statistics:

Last Archive Date: N/A

Date Archived to: N/A

Purge Rules: Time data more than 4 years old will be purged on a yearly basis.

Purge Statistics:

Last Purge Date: N/A

Date Purged to: N/A

Data Quality: There are no opportunities for error or misinterpretation of time data.

Data Accuracy: Time data is 100% accurate.

Key: The key to the time dimension is a date in YYYYMMDD (year-month-day) format.

Key Generation Method: The date in a row is used as the key.

Source:

Name: Calendar spreadsheet maintained by database administrator.

Conversion Rules: Rows in the calendar spreadsheet represent one calendar year. All the rows in the spreadsheet are loaded into the dimension annually.

Selection Logic: All rows are selected.

Attributes:

Name: Time Key

Definition: This is the date in YYYYMMDD format.

Alias: None

Change Rules: Once assigned, the values of this attribute never change.

Data Type: Numeric

Domain: valid dates

Derivation Rules: This date is a direct copy from the source.

Source: Numeric Date in Calendar spreadsheet

Name: Date

Definition: This is the descriptive date equivalent to the numeric date used as the key to this dimension. It is the date used on reports and to limit what data appears on a report. It is in the format MMM DD, YYYY.

Alias: None

Change Rules: Once assigned, the values of this attribute never change.

Data Type: Character(12)

Domain: valid dates in descriptive format

Derivation Rules: This date is a direct copy from the source.

Source: Descriptive Date in Calendar spreadsheet

Name: Week of Year

Definition: Each day of the year is assigned to a week for reporting purposes. Because years don′t divide evenly into weeks it is possible for a given day near the beginning or end of a calendar year to fall into a different year for weekly reporting purposes. The format is WW-YYYY.

Alias: None

Change Rules: Once assigned, the values of this attribute never change.

Data Type: Character(7)

Domain: WW is 1-52. YYYY is any valid year.

Derivation Rules: This date is a direct copy from the source.

Source: Week of Year in Calendar spreadsheet

Facts: Inventory and Sale

Measures: Quantity on hand, Reorder level, Total cost, Total revenue, Total quantity sold, and Discount amount

Subsidiary Dimensions: None

Contact Person: Data Warehouse Administrator

MEASURE METADATA

Name: Total Cost

Definition: This is the cost of all components used to create product models that have been sold.

Alias: None

Data Type: Numeric (9,2) Domain: $0.01 - $9,999,999.99.

Derivation Rules: The total cost is the product of the unit cost of a product model and quantity of the product model sold.

Appendix A. The CelDial Case Study 179

Usage Statistics:

Average Number of Queries/Day: N/A

Maximum Number of Queries/Day: N/A

Data Quality: This figure only represents the cost of components. No attempt is made to record labor or overhead costs. As well, cost is calculated using the current cost at the time a product model is sold. No attempt is made to determine when the model was produced and the cost at that time.

Data Accuracy: We estimate that the cost reported for a product model is accurate to within +/- .5%.

Facts: Inventory and Sale

Dimensions: Customer, Manufacturing, Product, Seller, and Time

Name: Total Revenue

Definition: This is the amount billed to customers for product models that have been sold.

Alias: None

Data Type: Numeric (9,2)

Data Type: Numeric (9,2)

Related documents