• No results found

Decision Support Systems

N/A
N/A
Protected

Academic year: 2020

Share "Decision Support Systems"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

Data warehouse

section 2

Decision Support Systems

5-1

(2)

Introduction

• Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business strategies.

• Emphasis is on complex, interactive, exploratory analysis of very large datasets created by integrating data from across all parts of an enterprise; data is fairly static.

• Contrast such On-Line Analytic Processing (OLAP) with

traditional On-line Transaction Processing (OLTP): mostly long queries, instead of short update.

(3)

Three Complementary Trends

• Data Warehousing: Consolidate data from many sources in one large repository.

• Loading, periodic synchronization of replicas.

• Semantic integration.

• OLAP:

• Complex SQL queries and views.

• Queries based on spreadsheet-style operations and

“multidimensional” view of data.

• Interactive and “online” queries.

• Data Mining

: discover patterns/knowledge in

data( classification, clustering, prediction, association, sequence techniques).

(4)

Data Warehousing

• Integrated data spanning long time periods, often augmented with summary information.

• Several gigabytes to terabytes common.

• Interactive response times

expected for complex queries;

ad-hoc updates uncommon.

EXTERNAL DATA SOURCES

EXTRACT TRANSFORM LOAD

REFRESH

DATA WAREHOUSE

Metadata Repository

SUPPORTS

DATA OLAP

MINING

(5)

What is a Data Warehouse?

Defined in many different ways, but not rigorously.

A decision support database that is maintained separately from the organization’s operational database

Support information processing by providing a solid platform of consolidated, historical data for analysis.

“A data warehouse is a subject-oriented, integrated, time- variant, and nonvolatile collection of data in support of

management’s decision-making process.”—W. H. Inmon

Data warehousing:

• The process of constructing and using data warehouses

(6)

What is a Data Warehouse?

• Collection of diverse data

• subject oriented

• aimed at executive, decision maker

• often a copy of operational data.

• with value-added data (e.g., summaries, history)

• integrated

• time-varying

• non-volatile

6

(7)

What is a Warehouse?

• Collection of tools

• gathering data

• cleansing, integrating, ...

• querying, reporting, analysis

• data mining

• monitoring, administering warehouse

7

(8)

Data Warehouse characteristics:

Subject-oriented

• Organized around major subjects, such as customer, product, sales.

• Focusing on the modeling and analysis of data for

decision makers, not on daily operations or transaction processing

• Provide a simple and concise view around particular

subject issues by excluding data that are not useful in the decision support process

(9)

Data Warehouse characteristics: Integrated

Constructed by integrating multiple, heterogeneous data sources

• relational databases, files, on-line transaction records.

• Data cleaning and data integration techniques are applied.

Ensure consistency in naming conventions, encoding structures.

• When data is moved to the warehouse, it is converted.

(10)

Data Warehouse characteristics : Time Variant

• The time horizon for the data warehouse is significantly longer than that of operational systems.

• Operational database: current value data.

• Data warehouse data: provide information

from a historical perspective (e.g., past 5-10

years)

(11)

Data Warehouse characteristics : Non-volatile

• A physically separate store of data transformed from the operational environment.

• Operational update of data does not occur in the data warehouse environment

• Requires only two operations in data accessing:

• initial loading of data and access of data

(12)

Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration: A query driven approach

Build wrappers/mediators on top of heterogeneous databases.

When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual

heterogeneous sites involved, and the results are integrated into a global answer set

Complex information filtering, compete for resources

Data warehouse: update-driven, high performance

Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

(13)

DW Modeling

A data warehouse is based on a multidimensional data model which views data in the form of a data cube

A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions

Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)

Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables

(14)

Example: AllElectronics DB

(15)

From tables to data cubes

(16)

16

From tables to data cubes

(17)

From tables to data cubes

(18)

From tables to data cubes

(19)

Cuboids Corresponding to the Cube

all

product date country

product,date

product, country

date, country

product, date, country

0-D(apex) cuboid 1-D cuboids

2-D cuboids

3-D(base) cuboid In data warehousing literature, an n-D base cube is called a base

cuboid. The top most 0-D cuboid, which holds the highest-level of

summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Each cuboid represents a different degree of summarization, or group by

(20)

Cuboids Corresponding to the

Cube

(21)

Cube: A Lattice of Cuboids

time,item

time,item,location

time, item, location, supplier

all

time item location supplier

time,location

time,supplier

item,location

item,supplier

location,supplier

time,item,supplier

time,location,supplier

item,location,supplier

0-D(apex) cuboid

1-D cuboids

2-D cuboids

3-D cuboids

4-D(base) cuboid

(22)

Modeling of Data Warehouses

Modeling data warehouses: dimensions & measures

Multidimensional modeling:

Star schema: A fact table in the middle connected to a set of dimension tables

Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake

Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

(23)

Example of Star Schema

time_key

day

day_of_the_week month

quarter year

time

location_key street

city

state_or_province country

location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures

item_key item_name brand

type

supplier_type

item

branch_key branch_name branch_type

branch

(24)

Example of Snowflake Schema

time_key day

day_of_the_week month

quarter year

time

location_key street

city_key

location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures

item_key item_name brand

type

supplier_key

item

branch_key branch_name branch_type

branch

supplier_key supplier_type

supplier

city_key city

state_or_province country

city

(25)

Example of Fact Constellation

time_key day

day_of_the_week month

quarter year

time

location_key street

city

province_or_state country

location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures

item_key item_name brand

type

supplier_type

item

branch_key branch_name branch_type

branch

Shipping Fact Table time_key

item_key shipper_key from_location to_location dollars_cost units_shipped

shipper_key shipper_name location_key shipper_type

shipper

(26)

Data warehouse vs. data mart

• A data warehouse collects information about subjects that

span the entire organization, such as customers, items, sales, assets, and personnel.

Its scope is enterprise-wide

For data warehouses, the fact constellation schema is commonly used, since it can model multiple, interrelated subjects.

A data mart is a department subject of the data warehouse that focuses on selected subjects

Its scope is department-wide

Star or snowflake schema are commonly, with the former more popular

(27)

Data Mart

Dependent

 Created from warehouse

 Replicated

 Functional subset of warehouse

Independent

 Scaled down, less expensive version of data warehouse

 Designed for a department.

 Organization may have multiple data marts

 Difficult to integrate.

(28)

A Concept Hierarchy: Dimension (location)

all

Europe North_America

Mexico Canada

Spain Germany

Vancouver

M. Wind L. Chan

...

...

...

... ...

...

all region

office country

Toronto Frankfurt

city

(29)

A Concept Hierarchy in partial order: time

Year Quarter

Month Week Day

In the location example, attributes of a dimension are related by a total order, forming a concept hierarchy.

Alternatively, attributes of a dimension can be organized in a partial order,

forming a lattice.

(30)

How are concepts hierarchies useful in OLAP?

In the multidimensional model, data are organized into

multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies.

This organization provides users with flexibility to view data from

different

perspectives.

(31)

Multidimensional Data

• For each dimension, the set of values can be organized in a hierarchy:

• Sales volume as a function of product, month, and region

Product

Region

Month

Dimensions: Product, Location, Time Hierarchical summarization paths

Industry Region Year Category Country Quarter

Product City Month Week Office Day

References

Related documents

A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site. " Greatly simplifies querying,

Data warehouse, A multi-Dimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, Further Development of Data Cube Technology, From data

•Introduction •Motivation •Objectives Proposal Multidimensional Concepts •Example •Conclusions •Future Work. Next, let us present the FactEntity, FE, model, which is

In the core of any data warehouse is a concept of a multidimensional data cube. The data in the cube is stored in specialized relations, called fact and dimen- sion relations.

4.2 Data Warehouse Modeling: Data Cube and OLAP 141 time Dimension table sales Fact table item Dimension table branch Dimension table location Dimension table supplier Dimension

Heterogeneous – Multiple business areas and disparate applications Data Preparation Granular views, No data movement Data warehouse model with. transactional detail, persisted

Active and Real-time Data Warehousing, Cube, Data mart, Data mining, Data warehouse, Data warehouse life-cycle and design, Data warehouse maintenance, evolution and versioning,

• Data cube & rollup operators. • Data warehouse