• No results found

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

N/A
N/A
Protected

Academic year: 2021

Share "Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

DW Source Integration, Tools,

and Architecture

Original slides were written by Torben Bach Pedersen

Aalborg University 2007 - DWML course 2

Overview

DW Front End Tools

Source Integration

DW architecture

End User Applications (EUA)

The business impact of the DW!

Canned reports

End user application templates Provide answers to common questions

Can be used as (quality-assured) building blocks for other reports

Two extremes

Ad hoc strategic analysis, power users, DIY query tools

Fixed operational analysis, report consumers, operational reporting

EUA fills the gap

“Tactical” analysis, push-button knowledge workers

EUA Concepts

Templates

Layout/structure + parameters

Compare sales per product in <area> for <period1> and <period2>

Parameters - chosen at run-time

Come from any level of the given dimension – drill-down Time (All time, 2002, 2002 4Q, 2002 Dec, 2002 Dec 1) possible Many different

Identify report candidates

Produce a list of candidates

Consolidate candidate list

(2)

Aalborg University 2007 - DWML course 5

What Templates to Choose?

“Analytical Cycle” Steps (repeats)

1) How’s business? – current performance 2) What are the trends? – performance over time

3) What’s unusual? – quick identification of exceptions (+/-) 4) What is driving the exceptions? – find causes for exceptions 5) What if …? – play around with parameters and see effect 6) Make a business decision – small as well as big decisions 7) Implement the decision – feed analysis results into op. systems

Prioritize template list

Rank or group templates – implement 15 most important at first

Aalborg University 2007 - DWML course 6

Overview

DW Front End Tools

Source Integration

DW architecture

Data Integration Research Projects

Focus on source integration and update propagation

Wrapper: convert source data into a standard format

Information Manifold

Sources: databases, SGML docs, unstructured files,… Relational integration data model

TSIMMIS

Wrapper/mediator

Semi-structured OEM integration data model

Squirrel

Powerful “integration mediator”

WHIPS

Wrapper/mediator

Relational integration data model

Views on DW Metadata

• Most DW projects: DW architecture as a “stepwise flow” of information from source to analyst

• No conceptual domain model used for integration

Some questions cannot be answered • DWQ project: extended metamodel to

(3)

Aalborg University 2007 - DWML course 9

Using DW Metadata in the Enterprise

(1)

(2)

(3) (4) (5)

• Analyst wants to analyze data

Gather data from operational departments through OLTP Question travels through (1)-(5) • Traditional DW (previous slide)

only describes step (3)-(4)

Cannot solve problems like “why can’t I answer quest. X?” • Conceptual relationships between

enterprise model, operational models + DW must be captured

Everything is a view on the enterprise model ! (“local as view”) – unlike previous slide

Aalborg University 2007 - DWML course 10

Analyst: “Why can’t I answer question X?”

Possible reasons

Certain measures not included in fact table

Granularity of facts too coarse

Particular dimensions not in DW

Descriptive attributes missing from dimensions

Meaning of attributes/measures deviate from the

analyst’s expectation

……

DWQ Metadata

• Three metadata perspectives

must be captured

Conceptual (enterprise) Logical (data model) Physical (data flow) • Framework instantiated by

conceptual, logical, and physical information models

• DW quality heavily depends on DW processes rather than schemas

• A process meta model is needed to capture process definitions, and the relationships to DW quality

Source integration practice

Focus on information integration in

databases

(schema and data)

Two main approaches

Constructing integrated enterprise model Focus on mappingsbetween sources and DW

Tools for DW management

Schema integration Metadata management Based on modeling tools

Tools for data integration

Mapping specification ETL tools – like last lecture

(4)

Aalborg University 2007 - DWML course 13

Schema Integration

• Producing one global schema (one-shot or incremental) • Pre-integration

Analyzing and annotating source schemata

Semantic enrichment of schema, often in richer data model • Schema comparison

Determine correlations/conflicts among schema concepts Heterogeneity conflicts – different source data models Naming conflicts – homonyms and synonyms Semantic conflicts –different abstraction levels Structural – different constructs

• Schema conforming

Conform/align schemas to make them compatible Typically semi-automatic process

• Schema merging and restructuring

“Superimpose” conformed schemas

Quality: completeness, correctness, minimality, understandability

Aalborg University 2007 - DWML course 14

Virtual Data Integration

• Only data definitionis integrated

Data only in sources, queries on views, queries shipped to sources Not suited for DW?

• Carnot

Individual schemata mapped onto rich GCL ontology (1. order logic) Articulation axioms specify mappings, queries mapped to GCL • SIMS

Creates common class-based domain model to describe sources Sources are dynamicallychosen and integrated at query time Query reformulation, access planning, optimization, execution • Information Manifold

Relational world view + information source description + correspondences Metamodel enriched using description logic/Datalog rules

Datalog queries, optimized by choosing ”minimal” sources • TSIMMIS

Wrapperswrap sources using semi-structured OEM model

Mediator performs its own integration – no global integration (global as view)

Materialized Data Integration

Views on source data are

materialized

in integrated DB

Squirrel

Integration mediators incrementally maintain materialized views Cooperation of sources required

WHIPS

Relational SPJ + aggregation views specified in view tree View manager computes view and handles updates Integrator ensures view maintainability

Global query processorqueries sources using wrappers/mediators

In combination with virtual integration?

DWQ Source Integration

Current DW tools cannot fully support DW quality

No support for validation of interschema assertions and other

specified relationships, i.e., the DW design process

Conceptual perspective

Domain model = enterprise model + source models

Consolidated and reconciled description of important concepts ◆ Not all enterprise data captured (at first, incremental approach) Logic-based formalism allows reasoning over metadata Intermodel assertions capture interdependencies

Logical perspective

Source schemata + DW schema in logical data model (relational) Defined as queries over the corresponding conceptual component

Physical perspective

(5)

Aalborg University 2007 - DWML course 17

DWQ Source Integration Architecture

Note explicit mappings!

Aalborg University 2007 - DWML course 18

DWQ Source Integration Methodology

Source-driven integration

Enterprise and source model construction Source model integration (into the domain model) Source and DW schema specification (+ mappings) Data integration and reconciliation

Quality analysis steps in all phases above

Client-driven integration

New client query considered

Reasoning determines whether query can be answered by

materialized views already in DW

◆ Query containment reasoning

If DW not sufficient, materialize new concepts in domain model? Otherwise, new sources must be added using source-driven integr.

Overview

DW Front End Tools

Source Integration

DW architecture

Technical Architecture

Design

Technical Architecture

Design

Product Selection& Installation

Product Selection& Installation

End-User Application Specification

End-User Application Specification

End-User Application Development

End-User Application Development

Business Requirements

Definition

Business Requirements

Definition Deployment

Deployment Maintenance

and Growth

Maintenance and Growth

Project Management

Project Management

Dimensional Modeling

Dimensional

Modeling PhysicalDesign

Physical Design

Data Staging Design & Development

Data Staging Design & Development

Project Planning

Project Planning

(6)

Aalborg University 2007 - DWML course 21 DB DB DB DB DB Appl. Appl. Appl. ETL DW DM DM DM OLAP Visua-lization Appl. Appl. Data mining Data Marts Data Warehouse Existing databases

and systems (OLTP) New databases and systems (OLAP)

Technical DW Architecture

How to organize DW and DMs?

Clients

Aalborg University 2007 - DWML course 22

Central DW

• All data in one, central DW

• All client queries directly on the central DW

• Pros

Simplicity Easy to manage • Cons

Bad performance due to no redundancy/ workload distribution

Central

DW

Source

Source

Clients

Federated DW

• Data stored in separate data marts, aimed at special departments

• Logical DW (i.e., virtual)

• Data marts contain detail data

• Pros

Performance due to distribution • Cons More complex

Logical

DW

Source

Source

Clients

Finance

mart

Mrktng

mart

Distr.

mart

Tiered Architecture

• Central DW is materialized

• Data is distributed to data marts in one or more tiers

• Only aggregated data in cube tiers

• Data is aggregated/reduced as it moves through tiers

• Pros

Best performance due to redundancy+distribution • Cons

Most complex Hard to manage

2000 2001 Aalborg Copenhagen Milk Bread 123 127 57 45 56 67 211 2000 2001 Aalborg Copenhagen Milk Bread 123 127 57 45 56 67 211 2000 2001 Aalborg Copenhagen Milk Bread 123 127 57 45 56 67 211

Central

DW

2000 2001 Aalborg Copenhagen Milk Bread 123 127 57 45 56 67 211 2000 2001 Aalborg Copenhagen Milk Bread 123 127 57 45 56 67 211

(7)

Aalborg University 2007 - DWML course 25

Coordination w. Development Strategy

Different development strategies pose different demands

to the architecture elements

Example: Kimball Dimensional Modeling

Centralized design of (conforming) dimensions First, design of a single-source data mart Later, design of multi-source data marts

Integration of existing data marts into new data marts The DW is just the union of the marts it is composed of Entails top-down (“Bus Architecture”) and bottom-up elements

Consequences

No initial design of DW, from which data marts are extracted Data is extracted directly from sources to data marts Allows distribution of data marts and computation on them

Aalborg University 2007 - DWML course 26

Operational Data Store (ODS)

DB DB DB DB DB Appl.

Appl.

Appl.

ETL

DW

DM

DM

DM

OLAP

Visua-lization Appl.

Appl.

Data mining Existing databases

and systems (OLTP)

ODS New DB

Operational Data Store I

a “

subject oriented, integrated, volatile, current valued data

store containing only corporate detailed data

(Inmon et al.)

A database which integrates and accumulates operational

data in a subject-oriented structure

Not dimensional, but ordinary relational

An extra level between operational systems and

dimensional structures

Two benefits sought

Integration of operational systems Basis for data warehouse

Operational Data Store II

ODS - pros

More modeling choices

◆ The dimensional “straightjacket” can force sub-optimal design decisions hiding the true semantics of data

◆ No need to choose a granularity, and no need to exclude data ◆ In summary, no need to make design decisions that cannot be

changed subsequently This means extra flexibility

ODS – cons

Not feasible to do analysis directly on ODS extra complexity

(8)

Aalborg University 2007 - DWML course 29

MS Analysis Services

Cheap

Easy to use

(R/M/H)OLAP technology

Data placement as desired

Intelligent pre-aggregation

Server and client parts

Reporting Services a separate tool

Built-in data mining

Decision trees Clustering

MS OLE DB for OLAP interface

Aalborg University 2007 - DWML course 30

IBM DB2 OLAP Server

“Light” version of Hyperion Essbase (OLAP market leader)

Extra product “on top of” DB2

(R/M/H)OLAP

Data in DB2 or in multidimensional structures

Interfaces

Hyperion Essbase API OLE DB for OLAP (promised)

DB2 can also handle aggregates

Automatic summary tables Used by DB2 optimizer

Automatic maintenance by DB2

Oracle 10g BI

Based on Express OLAP product

On the market since 1970!

(R/M/H)OLAP

Flexible data placement

Integrates ROLAP strategy and Express OLAP

Total integration with Oracle 10g RDBMS

Storage, security, management,… Best integration compared to MS and IBM

Add-on data mining (10g Data Mining)

Associations, classification, prediction, clustering

Architecture Alternatives

Cubes are smart

Intuitive model Better overview

Better suited for data analysis

But

logical

cubes suffice

Implementation hidden from user

Architecture alternatives

MS, IBM, Oracle

Virtual cubes, physical cubes ROLAP, MOLAP

Separate relational DW, cubes directly on source data Client tools

(9)

Aalborg University 2007 - DWML course 33

MS vs. IBM vs. Oracle

All

Good scalability Good analysis facilities

Flexible storage (MOLAP, ROLAP, HOLAP) Incremental update

Many client tools

MS Analysis Server

Built-in mining + good integration with MS SQL Server

DB2 OLAP Server

Good integration with DB2

Oracle

Best RDBMS/MOLAP integration of the three

All three products are good

Dependent on the other choices + existing technical architecture

Aalborg University 2007 - DWML course 34

Virtual vs. Physical Cubes

Virtual cubes

Logical cube specification directly on source data ROLAP implementation without aggregates + flexible, design can be changed quickly - performance, constant load on source DB

Physical cubes

Data for cube extracted and stored on OLAP server Several implementation choices possible

+ good performance, only source DB load at creation/update - harder to change design

MOLAP vs. ROLAP

MOLAP

Data in specialized data structure, optimized for OLAP + best performance, least space consumption

- changing design requires rebuilding, scalability at detail level?,

detail data stored several times

ROLAP

Data in RDBMS

+ more flexible change of design, scalable for detail data - not as good performance, larger space consumption

HOLAP

Detail data in RDBMS (can be source DB) Aggregates in multidimensional structure

+ good performance for higher-level queries, detail data only stored

once

- handling design changes, operational complexity

Separate Data Warehouse?

Separate DW

Integration of source data in DW Cubes built from DW

Sometimes the only solution

+ better integration and cleansing, less load on existing servers - larger complexity, design changes, updating DW

Cubes directly on source data

Cubes built directly from source data Cannot handle all cases

+ less complexity, easier to change design, no update of DW - cannot handle all forms of integration and cleansing, more load

(10)

Aalborg University 2007 - DWML course 37

Choosing Client Tools

Many OLAP clients on the market, e.g.,

Hyperion, Targit, Oracle MS Reporting Services

Client and server communicates via an API

MS OLE DB for OLAP

De facto standard

Supported by almost all client tools

Hyperion Essbase API

Supported by many client tools

Some criteria

Functionality (web distribution, analysis, reporting, …) Support

Price

Aalborg University 2007 - DWML course 38

Architecture Alternatives - Conclusion

Architecture alternatives, their pros and cons

No simple general choices

Choices dependent on the concrete situation

Look at books

Look at requirements specs Look at the latest products Think about prototyping

Summary

DW Front End Tools

Source Integration

DW architecture

Mini Project

New subtask

Build a few reports in Reporting Services to answer important

business questions you proposed in part (1a)

Discuss the architecture of your DW system Discuss source integration in your system

MS Reporting Services Tutorial

References

Related documents

In December 2004, the “Swedish Code for Corporate Governance” (the Code) was pre- sented. The Code is available as an offprint in the State ownership policy 2006. The over-

practices of DE.  Faculty are permitted to use other technologies, provided they 

The influence of substrate temperature on the grain size, phase composition, texture quality, CSD size, crystal lat- tice parameters, chemical composition, transmission coefficient,

If one’s native language does have a long-term effect on how one thinks about time, then Mandarin speakers should be faster to answer purely tempo- ral target questions (e.g.,

In this paper, we present a system called OBOME (Ontology-Based Opinion Mining Engine) for analyzing a domain-specific opinion corpus by first assisting the user

Any person who, knowingly and with intent to defraud any insurance company or other person, files an Application for insurance containing any materially false information, or

K EY WORDS critical discourse analysis; Finland; folk church; freedom of religion; ideology; national identity; religious equality; state church.. We have come in this country

Assessment of Healthcare Providers and Systems (CAHPS) suite of surveys and quality improvement tools supports the systematic collection of data on patient experience.. Collecting