Lecture 03
Data Mart
Aniqa Naeem
TEXT BOOKS FOR DATA WAREHOUSING
1. “Building the Data Warehouse” By Inmon2. “Database Systems: Models, Languages , Design and Application Programming” By Ramez Elmasri and Shamkant B. Navathe – 6th Edition”
3. “Advance Data Management” By Lena Wiese
REFERENCE BOOKS FOR DATA WAREHOUSING
4. “Data Warehousing (Design, Development and Best Practices)” By Soumendra Mohanty
5. “Mastering Data Warehouse Design” By Claudia Imhoff, Nicolas Galemmo, Jonathan G. Geiger
Books In Library
•
“Fundamentals of Database
Systems", 7th Ed, by Ramez Elmasri
and Shamkant B. Navathe. 2017
•
“Database System: A Practical
Approach to design,
Implementation and
Management” , 4
thEd, by Thomas
Outline
• What is data mart?
• Reason for creating datamart
• Types of Data Mart – Independent – Logical
– Dependent
• Data Mart architecture
Data Mart
Chapter 31– (31.5)
Book: Database System: A Practical Approach to design, Implementation and
Management” , 4th Ed, by Thomas Connolly
6
Data Mart
•
A
subset
of a data warehouse that supports the
requirements of a particular department
or
business function
.
•
Characteristics include:
– Focuses on only the requirements of one
department or business function.
– Do not normally contain detailed operational data
unlike data warehouses.
Data Mart
• A data mart is focused on a single functional area of an
organization and contains a subset of data stored in a Data Warehouse.
• A data mart is a condensed version of Data Warehouse
and is designed for use by a specific department, unit or set of users in an organization. E.g., Marketing, Sales, HR or finance. It is often controlled by a single department in an organization.
• Data Mart usually draws data from only a few sources
.
8
Reasons for Creating a Data Mart:
• To give users access to the data they need to analyze most often.
• To provide data in a form that matches the collective view of the data by a group of users in a department or business
function area.
• To improve end-user response time due to the reduction in the
volume of data to be accessed.
• Building a data mart is simpler compared with establishing a corporate data warehouse.
• There are 3 types of data mart
1. Independent - created directly from operational systems to a separate physical data store.
2. Logical - exists as a subset of existing data
warehouse.
3. Dependent - created from data warehouse to a
separate physical data store.
9
Data Warehouse Data Warehouse
Operational System Operational
System
Independent Data Mart Independent
Data Mart Dependent
Data Mart Dependent
Data Mart
Independent Data Mart
11
Pros
• Easy to implement
• Can eventually become a DW
Cons
• Not an enterprise-wide solution
• Costly as more Data Marts are added
• More than one version of the truth
Logical Data Mart
12 Pros
• Single version of the truth
• No historical data limits
• Allows drill downs, trend analysis • No transformations needed
Cons
Dependent Data Mart
13 Pros
• All advantages of Logical Data Mart • Allows for physical control over data
Cons
• Data mart architectures can be built as two-tier or
three-tier database applications.
• First Tier: The data warehouse is the optional first
tier (if the data warehouse provides the data for the data mart)
• Second Tier: The data mart is the second tier,
• Third Tier: The end-user workstation is the third tier
as shown in the next slide.
• Data is distributed among the tiers.
14
11
Typical Data Warehouse and Data Mart
Architecture
First Tier
Data Warehouse Information Flows
• In this section, we examine the activities associated
with the processing (or flow) of data within a data warehouse.
• Data warehousing focuses on the management of
five primary data flows which are described below: 1. Inflow
2. Up-flow 3. Down-flow 4. Out-flow 5. Meta-Flow
18
Inflow
Meta-flow
Outflow
Upflow
Downflow
Processes associated with the extraction, cleaning, and loading
of the data from the source systems into the data warehouse.
• The inflow is concerned with taking data from the source systems
to load into the data warehouse.
• As the source data is generated predominantly by OLTP systems,
the data must be reconstructed for the purpose of the data warehouse.
• The reconstruction of data involves:
1)cleaning dirty data.
2)restructuring data to suit the new requirements of the data warehouse including, for example, adding and/or removing fields, and denormalizing data.
3)ensuring that the source data is consistent with itself and with the data already in the warehouse.
1. Inflow
2. Up-flow
Processes associated with adding value to the data in the
warehouse through summarizing, packaging, and distribution of the data.
• The activities associated with the up-flow include:
1. Summarizing: Summarizing the data by selecting, projecting,
joining and grouping relational data into views that are more convenient and useful to the end-users.
2. Packaging: Packaging the data by converting the detailed or
summarized data into more useful format, such as spreadsheets,
text documents, charts, other graphical presentations, private databases and animations.
3. Down-Flow
Processes associated with archiving and
backing-up/recovery of data in the warehouse.
• Archiving old data plays an important role in maintaining the effectiveness and performance of the warehouse by taking backup to a storage archive such as tape , magnetic or optical disk.
• If the correct partitioning scheme is selected for the
database, the amount of data should not affect.
• Partitioning is a useful design option for very large databases that enables the fragmentation of a table storing enormous numbers of tables into several
smaller tables.
• The downflow of data includes processes to ensure that
4. Outflow:
Processes associated with making the
data available
to the end- users
.
• The outflow is where the real value of warehousing is
realized by the organization.
• The two key activities involved in the outflow include:
1. Accessing: It is concerned with satisfying the end-users’ requests (ad hoc or routine) for the data they need. The main issue is to create an environment so that users can effectively use the query tools to access the most appropriate data source.
2. Delivering: It is concerned with proactively delivering information to the end-users workstations. This is relatively new area, and is referred to as a type of ‘ publish-and-subscribe’ process. The warehouse publishes various ‘business objects’ that are revised periodically by
monitoring usage patterns. Users subscribe to the set of business objects that best meet their needs.
5. Meta-flow
Processes associated with the management of the metadata.
• Meta-data is the log of ETL and user queries through query manager.
• Meta-data is a description of the data contents of the data warehouse,
what is in it , where it came from originally and what has been done to it by the way of cleaning, integrating and summarizing.
• Meta-data associated with the data transformation and loading must
describe the source data and any changes that were made to the data.
For example, for each source field there should be a unique identifier, original field name, source data type and original location including the system and object name, along with the destination data type and
destination table name.
18
Inflow
Meta-flow
Outflow
Upflow
Downflow
Data Warehouse Information Flows
1. Inflow: extraction, cleansing and loading of source data
2. Up-flow: Adding values to the data in the
warehouse through summarizing, packaging, and distribution of the data
3. Down-flow: archiving and backing-up/recovery
of data in the warehouse.
4. Out-flow: making the data available to the end- users.
Summary
• Data Mart subset of a data warehouse that supports the
requirements of a particular department or business function.
• 3 types of data mart: independent, logical and dependent.
• Data mart architectures can two-tier or three-tier
database applications, 1st tier contain backup/recovery,
2nd tier have data summaries and 3rd tier contain end-user
workstation.
• Dataflow in data warehouse are as follows: inflow,