• No results found

ADBMS-lec 3-Data Mart.pptx

N/A
N/A
Protected

Academic year: 2020

Share "ADBMS-lec 3-Data Mart.pptx"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

Lecture 03

Data Mart

Aniqa Naeem

(2)

TEXT BOOKS FOR DATA WAREHOUSING

1. “Building the Data Warehouse” By Inmon

2. “Database Systems: Models, Languages , Design and Application Programming” By Ramez Elmasri and Shamkant B. Navathe – 6th Edition”

3. “Advance Data Management” By Lena Wiese

REFERENCE BOOKS FOR DATA WAREHOUSING

4. “Data Warehousing (Design, Development and Best Practices)” By Soumendra Mohanty

5. “Mastering Data Warehouse Design” By Claudia Imhoff, Nicolas Galemmo, Jonathan G. Geiger

(3)

Books In Library

“Fundamentals of Database

Systems", 7th Ed, by Ramez Elmasri

and Shamkant B. Navathe. 2017

“Database System: A Practical

Approach to design,

Implementation and

Management” , 4

th

Ed, by Thomas

(4)

Outline

• What is data mart?

• Reason for creating datamart

Types of Data MartIndependentLogical

Dependent

Data Mart architecture

(5)

Data Mart

Chapter 31– (31.5)

Book: Database System: A Practical Approach to design, Implementation and

Management” , 4th Ed, by Thomas Connolly

(6)

6

Data Mart

A

subset

of a data warehouse that supports the

requirements of a particular department

or

business function

.

Characteristics include:

Focuses on only the requirements of one

department or business function.

Do not normally contain detailed operational data

unlike data warehouses.

(7)

Data Mart

A data mart is focused on a single functional area of an

organization and contains a subset of data stored in a Data Warehouse.

A data mart is a condensed version of Data Warehouse

and is designed for use by a specific department, unit or set of users in an organization. E.g., Marketing, Sales, HR or finance. It is often controlled by a single department in an organization.

Data Mart usually draws data from only a few sources

(8)

.

8

Reasons for Creating a Data Mart:

• To give users access to the data they need to analyze most often.

• To provide data in a form that matches the collective view of the data by a group of users in a department or business

function area.

To improve end-user response time due to the reduction in the

volume of data to be accessed.

• Building a data mart is simpler compared with establishing a corporate data warehouse.

(9)

There are 3 types of data mart

1. Independent - created directly from operational systems to a separate physical data store.

2. Logical - exists as a subset of existing data

warehouse.

3. Dependent - created from data warehouse to a

separate physical data store.

9

(10)

Data Warehouse Data Warehouse

Operational System Operational

System

Independent Data Mart Independent

Data Mart Dependent

Data Mart Dependent

Data Mart

(11)

Independent Data Mart

11

Pros

• Easy to implement

• Can eventually become a DW

Cons

Not an enterprise-wide solution

• Costly as more Data Marts are added

• More than one version of the truth

(12)

Logical Data Mart

12 Pros

• Single version of the truth

• No historical data limits

Allows drill downs, trend analysisNo transformations needed

Cons

(13)

Dependent Data Mart

13 Pros

All advantages of Logical Data MartAllows for physical control over data

Cons

(14)

Data mart architectures can be built as two-tier or

three-tier database applications.

First Tier: The data warehouse is the optional first

tier (if the data warehouse provides the data for the data mart)

Second Tier: The data mart is the second tier,

Third Tier: The end-user workstation is the third tier

as shown in the next slide.

Data is distributed among the tiers.

14

(15)

11

Typical Data Warehouse and Data Mart

Architecture

First Tier

(16)

Data Warehouse Information Flows

In this section, we examine the activities associated

with the processing (or flow) of data within a data warehouse.

Data warehousing focuses on the management of

five primary data flows which are described below: 1. Inflow

2. Up-flow 3. Down-flow 4. Out-flow 5. Meta-Flow

(17)

18

Inflow

Meta-flow

Outflow

Upflow

Downflow

(18)

Processes associated with the extraction, cleaning, and loading

of the data from the source systems into the data warehouse.

The inflow is concerned with taking data from the source systems

to load into the data warehouse.

As the source data is generated predominantly by OLTP systems,

the data must be reconstructed for the purpose of the data warehouse.

The reconstruction of data involves:

1)cleaning dirty data.

2)restructuring data to suit the new requirements of the data warehouse including, for example, adding and/or removing fields, and denormalizing data.

3)ensuring that the source data is consistent with itself and with the data already in the warehouse.

1. Inflow

(19)

2. Up-flow

Processes associated with adding value to the data in the

warehouse through summarizing, packaging, and distribution of the data.

• The activities associated with the up-flow include:

1. Summarizing: Summarizing the data by selecting, projecting,

joining and grouping relational data into views that are more convenient and useful to the end-users.

2. Packaging: Packaging the data by converting the detailed or

summarized data into more useful format, such as spreadsheets,

text documents, charts, other graphical presentations, private databases and animations.

(20)

3. Down-Flow

Processes associated with archiving and

backing-up/recovery of data in the warehouse.

• Archiving old data plays an important role in maintaining the effectiveness and performance of the warehouse by taking backup to a storage archive such as tape , magnetic or optical disk.

If the correct partitioning scheme is selected for the

database, the amount of data should not affect.

Partitioning is a useful design option for very large databases that enables the fragmentation of a table storing enormous numbers of tables into several

smaller tables.

The downflow of data includes processes to ensure that

(21)

4. Outflow:

Processes associated with making the

data available

to the end- users

.

The outflow is where the real value of warehousing is

realized by the organization.

The two key activities involved in the outflow include:

1. Accessing: It is concerned with satisfying the end-users’ requests (ad hoc or routine) for the data they need. The main issue is to create an environment so that users can effectively use the query tools to access the most appropriate data source.

2. Delivering: It is concerned with proactively delivering information to the end-users workstations. This is relatively new area, and is referred to as a type of ‘ publish-and-subscribe’ process. The warehouse publishes various ‘business objects’ that are revised periodically by

monitoring usage patterns. Users subscribe to the set of business objects that best meet their needs.

(22)

5. Meta-flow

Processes associated with the management of the metadata.

• Meta-data is the log of ETL and user queries through query manager.

• Meta-data is a description of the data contents of the data warehouse,

what is in it , where it came from originally and what has been done to it by the way of cleaning, integrating and summarizing.

• Meta-data associated with the data transformation and loading must

describe the source data and any changes that were made to the data.

For example, for each source field there should be a unique identifier, original field name, source data type and original location including the system and object name, along with the destination data type and

destination table name.

(23)

18

Inflow

Meta-flow

Outflow

Upflow

Downflow

(24)

Data Warehouse Information Flows

1. Inflow: extraction, cleansing and loading of source data

2. Up-flow: Adding values to the data in the

warehouse through summarizing, packaging, and distribution of the data

3. Down-flow: archiving and backing-up/recovery

of data in the warehouse.

4. Out-flow: making the data available to the end- users.

(25)

Summary

Data Mart subset of a data warehouse that supports the

requirements of a particular department or business function.

3 types of data mart: independent, logical and dependent.

Data mart architectures can two-tier or three-tier

database applications, 1st tier contain backup/recovery,

2nd tier have data summaries and 3rd tier contain end-user

workstation.

Dataflow in data warehouse are as follows: inflow,

(26)

References

Related documents

As you may recall, last year Evanston voters approved a referendum question for electric aggregation and authorized the city to negotiate electricity supply rates for its residents

We configure an inhibition-augmented COSFIRE filter by using two different types of prototype patterns, namely one positive pattern and one or more negative pattern(s), in order

Null Hypothesis (H 0 4): Patient adherence to antidiabetic medication (Morisky 8- item Adherence Scale) and HbA1c levels are not statistically significant predictors of the severity

Advances in medical treatment, management, diagnosis and surgical palliation have improved the quality and longevity of children born with Congenital Heart Disease. As

Initially, I had difficulty understanding how it was that students were integrating the various disciplinary perspectives in their pursuit of the question, “What does it mean to

Players can create characters and participate in any adventure allowed as a part of the D&D Adventurers League.. As they adventure, players track their characters’

The purpose of this two hour CE course is to provide an overview of the professional aspects of the Certified Nursing Assistant's (CNAs) role and to explore the importance

In the conclusion (Section 5), the processing of mobile phone data, by offering new maps of site practices and information on temporary populations and city usage