• No results found

Decision Support Systems5-1

N/A
N/A
Protected

Academic year: 2020

Share "Decision Support Systems5-1"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

Data warehouse

section 2

Decision Support Systems

(2)

Introduction

• Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business strategies.

• Emphasis is on complex, interactive, exploratory analysis of very large datasets created by integrating data from across all parts of an enterprise; data is fairly static.

• Contrast such On-Line Analytic Processing (OLAP) with

traditional On-line Transaction Processing (OLTP): mostly long queries, instead of short update.

(3)

Three Complementary Trends

• Data Warehousing: Consolidate data from many sources in one large repository.

• Loading, periodic synchronization of replicas. • Semantic integration.

OLAP:

• Complex SQL queries and views.

• Queries based on spreadsheet-style operations and “multidimensional” view of data.

• Interactive and “online” queries.

• Data Mining: discover patterns/knowledge in data(

classification, clustering, prediction, association, sequence techniques).

(4)

Data Warehousing

• Integrated data spanning long time periods, often augmented with summary information.

• Several gigabytes to terabytes common.

• Interactive response times

expected for complex queries; ad-hoc updates uncommon.

EXTERNAL DATA SOURCES EXTRACT TRANSFORM LOAD REFRESH DATA WAREHOUSE Metadata Repository SUPPORTS OLAP DATA MINING

(5)

What

is a Data Warehouse?

• Defined in many different ways, but not rigorously.

• A decision support database that is maintained separately

from the organization’s operational database

• Support information processing by providing a solid platform of consolidated, historical data for analysis.

• “A data warehouse is a subject-oriented, integrated,

time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon

• Data warehousing:

(6)

What

is a Data Warehouse?

Collection of diverse

data

•subject oriented

•aimed at executive, decision maker •often a copy of operational data.

•with value-added data (e.g., summaries, history) •integrated

time-varying •non-volatile

(7)

What

is a Warehouse?

Collection of

tools

•gathering data

•cleansing, integrating, ...

•querying, reporting, analysis •data mining

•monitoring, administering warehouse

(8)

Data Warehouse characteristics:

Subject-oriented

• Organized around major subjects, such as customer, product, sales.

• Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing

• Provide a simple and concise view around particular

subject issues by excluding data that are not useful in the decision support process

(9)

Data

Warehouse

characteristics:

Integrated

Constructed by integrating multiple, heterogeneous data sources

• relational databases, files, on-line transaction records. • Data cleaning and data integration techniques are applied.

• Ensure consistency in naming conventions, encoding structures.

(10)

Data

Warehouse

characteristics

:

Time

Variant

• The time horizon for the data warehouse is significantly longer than that of operational systems.

• Operational database: current value data.

• Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)

(11)

Data

Warehouse

characteristics

:

Non-volatile

• A physically separate store of data transformed from the operational environment.

• Operational update of data does not occur in the data warehouse environment

• Requires only two operations in data accessing:

(12)

Data Warehouse vs. Heterogeneous DBMS

• Traditional heterogeneous DB integration: A query driven approach • Build wrappers/mediators on top of heterogeneous databases. • When a query is posed to a client site, a meta-dictionary is used

to translate the query into queries appropriate for individual

heterogeneous sites involved, and the results are integrated into a global answer set

• Complex information filtering, compete for resources • Data warehouse: update-driven, high performance

• Information from heterogeneous sources is integrated in advance

(13)

DW Modeling

• A data warehouse is based on a multidimensional data model which views data in the form of a data cube

• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions

• Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)

• Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables

(14)
(15)
(16)

16

(17)
(18)
(19)

Cuboids

Corresponding

to

the

Cube

all

product date country

product,date

product, country

date, country

product, date, country

0-D(apex) cuboid 1-D cuboids

2-D cuboids

3-D(base) cuboid In data warehousing literature, an n-D base cube is called a base

cuboid. The top most 0-D cuboid, which holds the highest-level of

summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

(20)
(21)

Cube: A Lattice of Cuboids

time,item

time,item,location

time, item, location, supplier

all

time item location supplier

time,location time,supplier item,location item,supplier location,supplier time,item,supplier time,location,supplier item,location,supplier 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D(base) cuboid

(22)

Modeling of Data Warehouses

• Modeling data warehouses: dimensions & measures • Multidimensional modeling:

• Star schema: A fact table in the middle connected to a set of dimension tables

• Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake

• Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called

(23)

Example of Star Schema

time_key day day_of_the_week month quarter year time location_key street city state_or_province country location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch

(24)

Example of Snowflake Schema

time_key day day_of_the_week month quarter year time location_key street city_key location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_key item branch_key branch_name branch_type branch supplier_key supplier_type supplier city_key city state_or_province country city

(25)

Example of Fact Constellation

time_key day day_of_the_week month quarter year time location_key street city province_or_state country location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch

Shipping Fact Table time_key item_key shipper_key from_location to_location dollars_cost units_shipped shipper_key shipper_name location_key shipper_type shipper

(26)

Data warehouse vs. data mart

• A data warehouse collects information about subjects that span the entire organization, such as customers, items, sales, assets, and personnel.

• Its scope is enterprise-wide

• For data warehouses, the fact constellation schema is commonly used, since it can model multiple, interrelated subjects.

• A data mart is a department subject of the data warehouse that focuses on selected subjects

• Its scope is department-wide

(27)

Data Mart

▪Dependent

▪ Created from warehouse ▪ Replicated

▪ Functional subset of warehouse

▪Independent

▪ Scaled down, less expensive version of data warehouse

▪ Designed for a department.

▪ Organization may have multiple data marts

(28)

A Concept Hierarchy: Dimension (location)

all Europe North_America Mexico Canada Spain Germany Vancouver M. Wind L. Chan ... ... ... ... ... ... all region office country Toronto Frankfurt city

(29)

A Concept Hierarchy in partial order: time

Year

Quarter

Month Week

Day

■ In the location example, attributes of a

dimension are related by a total order, forming a concept hierarchy.

■ Alternatively, attributes of a dimension

can be organized in a partial order, forming a lattice.

(30)

How are concepts hierarchies useful in OLAP?

■ In the multidimensional model, data are organized into

multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies.

■ This organization provides users with flexibility to view data

(31)

Multidimensional Data

• For each dimension, the set of values can be organized in a hierarchy:

• Sales volume as a function of product, month, and region

Product

Region

Month

Dimensions: Product, Location, Time

Hierarchical summarization paths

Industry Region Year

Category Country Quarter

Product City Month Week

References

Related documents

• Each country report includes data for product groups, business segments, distribution channels and customer segments separately as well as aggregated for the total market of

Dowkrxjk rxu prgho vshflfdwlrq lv forvho| uhodwhg wr wkdw ri Ju qhu dqg Khihnhu +4<<<,/ wkh phfkdqlvp ehklqg wkh uhvxowv lv frpsohwho| glhuhqw1 Lq wkhlu prgho/

At this level it is difficult to identify the cause of this enhanced uncertainty factors: working at room temperature in APSIS should avoid the large temperature extrapolation

The adaptive selection threshold approach used in FEAST also illustrated a number of interesting properties of event-based visual classification and demonstrated the ability to

Logic Delay Logic Delay Leakage per Leakage per instruction instruction Energy Efficiency Energy Efficiency Voltage Supply [V] Voltage Supply [V] Power Power Consumption

Hence, a SAGE receiver has to receive multiple pixel streams from different application instances or different application nodes (on multiple rendering nodes) in one instance and

Last but not least, el duque de Montalto-cardenal Moncada, cuñado tanto del marqués de Aytona - se había casado en segundas nupcias con su hermana Caterina - como del

In addition to controlling the access to the network itself, in order to control the access of users to the multicast session, an Authentication, Authorization and Accounting