ADBMS-lec 3-Data Mart.pptx

(1)

Lecture 03

Data Mart

Aniqa Naeem

(2)

TEXT BOOKS FOR DATA WAREHOUSING

1. “Building the Data Warehouse” By Inmon

2. “Database Systems: Models, Languages , Design and Application Programming” By Ramez Elmasri and Shamkant B. Navathe – 6th Edition”

3. “Advance Data Management” By Lena Wiese

REFERENCE BOOKS FOR DATA WAREHOUSING

4. “Data Warehousing (Design, Development and Best Practices)” By Soumendra Mohanty

5. “Mastering Data Warehouse Design” By Claudia Imhoff, Nicolas Galemmo, Jonathan G. Geiger

(3)

Books In Library

• _{“Fundamentals of Database}

Systems", 7th Ed, by Ramez Elmasri

and Shamkant B. Navathe. 2017

• _{“Database System: A Practical}

Approach to design,

Implementation and

Management” , 4

th

_{Ed, by Thomas}

(4)

Outline

• What is data mart?

• Reason for creating datamart

• _{Types of Data Mart} – _Independent – _Logical

– _Dependent

• _{Data Mart architecture}

(5)

Data Mart

Chapter 31– (31.5)

Book: Database System: A Practical Approach to design, Implementation and

Management” , 4th_{Ed, by Thomas Connolly}

(6)

6

Data Mart

• _A

_subset

_{of a data warehouse that supports the}

requirements of a particular department

or

business function

.

• _{Characteristics include:}

– _{Focuses on only the}_{requirements of one}

department or business function.

– _{Do not}_{normally contain}_{detailed operational}_data

unlike data warehouses.

(7)

Data Mart

• _{A data mart is focused on a}_{single functional area}_{of an}

organization and contains a subset of data stored in a Data Warehouse.

• _{A data mart is a condensed version of Data Warehouse}

and is designed for use by a specific department, unit or set of users in an organization. E.g., Marketing, Sales, HR or finance. It is often controlled by a single department in an organization.

• _{Data Mart usually draws data from only a}_{few sources}

(8)

.

8

Reasons for Creating a Data Mart:

• To give users access to the data they need to analyze most often.

• To provide data in a form that matches the collective view of the data by a group of users in a department or business

function area.

• _To_{improve end-user response}_{time due to the reduction in the}

volume of data to be accessed.

• Building a data mart is simpler compared with establishing a corporate data warehouse.

(9)

• _{There are 3 types of data mart}

1. Independent - created directly from operational systems to a separate physical data store.

2. Logical - exists as a subset of existing data

warehouse.

3. Dependent - created from data warehouse to a

separate physical data store.

9

(10)

Data Warehouse Data Warehouse

Operational System Operational

System

Independent Data Mart Independent

Data Mart _Dependent

Data Mart Dependent

Data Mart

(11)

Independent Data Mart

11

Pros

• Easy to implement

• Can eventually become a DW

Cons

• _Not_an_{enterprise-wide}_solution

• Costly as more Data Marts are added

• More than one version of the truth

(12)

Logical Data Mart

12 Pros

• Single version of the truth

• No historical data limits

• _{Allows drill downs, trend analysis} • _No _{transformations}_needed

Cons

(13)

Dependent Data Mart

13 Pros

• _{All advantages of}_{Logical Data Mart} • _{Allows for}_{physical control over data}

Cons

(14)

• _{Data mart architectures can be built as}_two-tier _or

three-tier database applications.

• _{First Tier}_:_The_{data warehouse}_{is the optional first}

tier (if the data warehouse provides the data for the data mart)

• _{Second Tier}_:_The_{data mart}_{is the second tier,}

• _{Third Tier:}_The_{end-user workstation}_{is the third tier}

as shown in the next slide.

• _{Data is distributed among the tiers.}

14

(15)

11

Typical Data Warehouse and Data Mart

Architecture

First Tier

(16)

Data Warehouse Information Flows

• _{In this section, we examine the activities associated}

with the processing (or flow) of data within a data warehouse.

• _{Data warehousing focuses on the management of}

five primary data flows which are described below: 1. Inflow

2. Up-flow 3. Down-flow 4. Out-flow 5. Meta-Flow

(17)

18

Inflow

Meta-flow

Outflow

Upflow

Downflow

(18)

Processes associated with the extraction, cleaning, and loading

of the data from the source systems into the data warehouse.

• _{The inflow is concerned with taking data}_{from the source systems}

to load into the data warehouse.

• _{As the source data is generated predominantly by}_{OLTP systems}_,

the data must be reconstructed for the purpose of the data warehouse.

• _{The reconstruction of data involves:}

1)cleaning dirty data.

2)restructuring data to suit the new requirements of the data warehouse including, for example, adding and/or removing fields, and denormalizing data.

3)ensuring that the source data is consistent with itself and with the data already in the warehouse.

1. Inflow

(19)

2. Up-flow

Processes associated with adding value to the data in the

warehouse through summarizing, packaging, and distribution of the data.

• The activities associated with the up-flow include:

1. Summarizing: Summarizing the data by selecting, projecting,

joining and grouping relational data into views that are more convenient and useful to the end-users.

2. Packaging: Packaging the data by converting the detailed or

summarized data into more useful format, such as spreadsheets,

text documents, charts, other graphical presentations, private databases and animations.

(20)

3. Down-Flow

Processes associated with archiving and

backing-up/recovery of data in the warehouse.

• Archiving old data plays an important role in maintaining the effectiveness and performance of the warehouse by taking backup to a storage archive such as tape , magnetic or optical disk.

• _{If the correct partitioning scheme is selected for the}

database, the amount of data should not affect.

• _Partitioning _{is a useful design option for very large} databases that enables the fragmentation of a table storing enormous numbers of tables into several

smaller tables.

• _{The downflow of data includes processes to ensure that}

(21)

4. Outflow:

Processes associated with making the

data available

to the end- users

.

• _{The outflow is where the real value of warehousing is}

realized by the organization.

• _{The two key activities involved in the outflow include:}

1. Accessing: It is concerned with satisfying the end-users’ requests (ad hoc or routine) for the data they need. The main issue is to create an environment so that users can effectively use the query tools to access the most appropriate data source.

2. Delivering: It is concerned with proactively delivering information to the end-users workstations. This is relatively new area, and is referred to as a type of ‘ publish-and-subscribe’ process. The warehouse publishes various ‘business objects’ that are revised periodically by

monitoring usage patterns. Users subscribe to the set of business objects that best meet their needs.

(22)

5. Meta-flow

Processes associated with the management of the metadata.

• Meta-data is the log of ETL and user queries through query manager.

• Meta-data is a description of the data contents of the data warehouse,

what is in it , where it came from originally and what has been done to it by the way of cleaning, integrating and summarizing.

• Meta-data associated with the data transformation and loading must

describe the source data and any changes that were made to the data.

For example, for each source field there should be a unique identifier, original field name, source data type and original location including the system and object name, along with the destination data type and

destination table name.

(23)

18

Inflow

Meta-flow

Outflow

Upflow

Downflow

(24)

Data Warehouse Information Flows

1. Inflow: extraction, cleansing and loading of source data

2. Up-flow: Adding values to the data in the

warehouse through summarizing, packaging, and distribution of the data

3. Down-flow: archiving and backing-up/recovery

of data in the warehouse.

4. Out-flow: making the data available to the end- users.

(25)

Summary

• _{Data Mart}_subset_{of a data warehouse that supports the}

requirements of a particular department or business function.

• _{3 types of data mart: independent, logical and dependent.}

• _{Data mart architectures can}_two-tier _or_three-tier

database applications, 1st tier contain backup/recovery,

2nd tier have data summaries and 3rd tier contain end-user

workstation.

• _{Dataflow in data warehouse are as follows: inflow,}

(26)