• No results found

21741638 the Crm Data Warehouse

N/A
N/A
Protected

Academic year: 2021

Share "21741638 the Crm Data Warehouse"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

CHAPTER 4

(2)

WHAT IS A DATA WAREHOUSE?

A large reservoir of detailed and summary data

that describes the organization and its activities,

organized by the various business dimensions in

a way to facilitate easy retrieval of information

describing activities

data mart

–a subset of the data warehouse,

tailored to meet the specialized needs of a

particular group of users

Top-down approach

bottom-up approach to data warehouse

development—the data marts are created first

and then integrated.

(3)

Data Warehousing Objectives

(1)

keep the warehouse data current;

(2)

ensure that the warehouse data is accurate;

(3)

keep the warehouse data secure;

(4)

make the warehouse data easily available to

authorized users;

(5)

maintain descriptions of the warehouse data

so that users and system developers can

understand the meaning of each element

(4)

Data Warehouse vs. DBMS

OLTP (on-line transaction processing)

 Major task of traditional relational DBMS

 Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

OLAP (on-line analytical processing)

 Major task of data warehouse system

 Data analysis and decision making

Distinct features (OLTP vs. OLAP):

 User and system orientation: customer vs. market

 Data contents: current, detailed vs. historical, consolidated

 Database design: ER + application vs. star + subject

 View: current, local vs. evolutionary, integrated

(5)

OLTP OLAP

users clerk, IT professional knowledge worker

function day to day operations decision support

DB design application-oriented subject-oriented

data current, up-to-date detailed, flat relational isolated

historical,

summarized, multidimensional integrated, consolidated

usage repetitive ad-hoc

access read/write

index/hash on prim. key

lots of scans

unit of work short, simple transaction complex query

# records accessed tens millions

#users thousands hundreds

DB size 100MB-GB 100GB-TB

(6)

DATA WAREHOUSE ARCHITECTURE

staging area

— data is prepared to be

moved into the warehouse data repository

and the metadata repository

metadata

— data about data, or

descriptions of the data in the data

warehouse

Exhibit 4.1: A Data Warehouse System

Model

(7)

EXHIBIT 4.1 A DATA WAREHOUSE SYSTEM MODEL Data gathering system Staging area Warehouse data repository Information Delivery system Management and control Metadata repository

Data Warehouse System

Legend : Data flow Control flow

(8)

A Data Warehouse System Model

Management and Control

management and control component—like a traffic

officer standing in the middle of a street intersection,

controlling the flow of traffic through the intersection

Staging Area

ETL

— extraction, transformation, and loading as the

activities of this staging area

extraction

— obtaining data from the internal databases

and files of systems, accomplished according to a

schedule

transformation

— a process that includes cleaning,

standardizing, reformatting, and summarizing

(9)

A Data Warehouse System Model

WAREHOUSE DATA REPOSITORY

 where the warehouse data is stored within the computer system or systems

Data Content

 customer picture—a compilation of geographic, demographic, activity, psychographic, and behavioral data

Data Characteristics

 the types of data to be processed, including considerations of data granularity, data hierarchies, and data dimensions

Data Types

 fixed-length format

(10)

A Data Warehouse System Model

Data Granularity

the degree of detail that is represented by the data, where the greater the detail, the finer the granularity

Data Hierarchies

since multiple attributes can describe a single entity, an attribute is a data element that identifies or describes an occurrence of a data entity (i.e., a particular customer is identified by a customer number attribute)

Exhibit 4.2: An Example of a Data Hierarchy

Data Dimensions

 for example, a manager can query the data warehouse for a display of data according to salesperson, customer, product, and time

(11)

EXHIBIT 4.2 AN EXAMPLE OF A DATA HIERARCHY

Customer

Customer number

Customer age

Customer gender

Customer marital status

Customer number of dependents

Customer education

Customer dwelling type

Customer state

Customer city

Customer zip code

(12)

EXHIBIT 4.3 EVERY DATA RECORD CONTAINS THE TIME ELEMENT

Warehouse shipping order

Sales order date Statement date Date shipped Customer Payment date Invoice date

Customer sales order

Customer payment Customer statement

(13)

A Data Warehouse System Model

METADATA REPOSITORY

 describes the flow of data from the time that the data is captured until it is archived, i.e., metadata in the metadata repository for the customer number attribute would describe its format, editing rules, and so on

TYPES OF METADATA

Metadata for Users

 (analysis) identification of the source systems, the time of the last update, the different report formats that are available, and how to find data in the data warehouse

Metadata for Systems Developers

 data to allow developers to maintain, revise, and reengineer the data warehouse system, including the various rules that were employed in creating the warehouse data repository, and the rules for extraction, cleansing, transforming, purging, and archiving

(14)

A Data Warehouse System Model

Data and Process Models

object diagrams and entity-relationship diagrams

use cases, use case diagrams, and data flow

diagrams

CASE Tools

stands for computer-aided system engineering and is

a way to use the computer to develop systems

DBMS Systems

include a data dictionary component, which contains

excellent descriptions of the data in the database or

data warehouse.

(15)

HOW DATA IS STORED IN THE DATA

WAREHOUSE

dimension table

— a list of all of the

attributes that identify and describe a

particular entity

Exhibit 4.4: A Sample Dimension Table

fact table

— a list of all the facts that relate

to some type of the organization’s activity

(16)

EXHIBIT 4.4 A SAMPLE DIMENSION TABLE

Customer

Customer number

Customer name

Customer phone number

Customer e-mail address

Customer territory

Customer credit code

Customer standard industry code

Customer city

Customer state

(17)

EXHIBIT 4.5 A SAMPLE FACT TABLE

Commercial Sales Facts

Actual sales units

Budgeted sales units

Actual sales amount

Budgeted sales amount

Sales discount amount

Net sales amount

Sales commission amount

Sales bonus amount

(18)

INFORMATION PACKAGES

a table that is maintained in the data warehouse

repository that identifies both the dimensions

and the facts that relate to a business activity

Exhibit 4.6: Information Package Format

key—a number, such as a customer number,

that identifies a particular occurrence of the

dimension

(19)

EXHIBIT 4.6 INFORMATION PACKAGE FORMAT

Subject : Name of business activity being measured

Dimension Name Dimension Name Dimension Name Dimension Name

Dimension Key Dimension Key Dimension Key Dimension Key

Dimension 1 Dimension 1 Dimension 1 Dimension 1

Dimension 2 Dimension 2 Dimension 2 Dimension 2

Dimension 3 Dimension 3 Dimension n Dimension 3

Dimension 4 Dimension n Dimension 4

Dimension n Dimension n

(20)

EXHIBIT 4.7 A SAMPLE INFORMATION PACKAGE

Subject : Commercial sales

Time Salesperson Customer Product

Time Key Salesperson key Customer key Product key Hour Salesperson name Customer name Product name Day Sales branch Customer territory Product model Month Sales region Customer credit code Product brand Quarter Subsidiary Product line

Year

Facts : Actual sales units, budgeted sales units, actual sales amount, budgeted sales amount, sales discount amount, net sales amount, sales commission amount, sales bonus amount, sales tax amount

(21)

STAR SCHEMAS

the arrangement of an information package that usually

identifies multiple dimension tables for a single fact table

and has the appearance of a star, with the fact table in

the center and the dimension tables forming the points

Exhibit 4.8: Star Schema Format

foreign keys— a means of linking the fact table to the

dimension tables by means of the keys identified at the

top of the fact table where the keys identify other,

―foreign‖ tables as opposed to the fact table

(22)

EXHIBIT 4.8 STAR SCHEMA FORMAT Dimension 1 name Dimension 2 name Dimension n name Dimension 1 key Dimension 1 hierarchy Dimension 2 key Dimension 2 hierarchy Dimension 1 key Dimension 2 key Dimension n key Measurable fact 2 Measurable fact 4 Measurable fact 5 Measurable fact n Dimension n key Dimension n hierarchy

(23)

EXHIBIT 4.9 A SAMPLE STAR SCHEMA

Customer key Customer name Customer type

Customer credit code Salesperson number Sales territory

Standard industry code

Product key Customer key Salesperson key Time key

Sales units

Gross sales amount Sales discount amount Net sales amount

Sales commission amount Salesperson key Salesperson name Sales region Sales branch Product key Product name Product unit price Product quantity Time key Day Month Quarter Year Customer Customer payment Product sales facts

Time

(24)

Example of Star Schema

time_key day day_of_the_week month quarter year time location_key street city province_or_street country location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type

item

branch_key branch_name branch_type

branch

(25)

Example of Snowflake Schema

time_key day day_of_the_week month quarter year time location_key street city_key location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_key

item

branch_key branch_name branch_type

branch

supplier_key supplier_type

supplier

city_key city province_or_street country city

(26)

Example of Fact

Constellation

time_key day day_of_the_week month quarter year time location_key street city province_or_street country location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch

Shipping Fact Table time_key item_key shipper_key from_location to_location dollars_cost units_shipped shipper_key shipper_name location_key shipper_type shipper

(27)

DATA WAREHOUSE NAVIGATION

summary information— preprocessed data that provides

the user with exactly the content that is needed

top-down

navigation— the user seeks more detail in an

effort to understand the summary information

roll up

navigation— the user summarizes data to ―see

the forest rather than the trees‖ or to prepare summary

graphs

drill across

navigation— the user moves from one data

hierarchy to another, i.e., information on customer sales,

salesperson sales, and then product sales

(28)

EXHIBIT 4.10 NAVIGATION PATHS

Summary information

(Net sales for the Western sales region)

Hierarchy 1 (customer) Hierarchy 2 (salesperson) Hierarchy n (product) Detailed information

(Net sales for salesperson 3742)

Detailed data

(Sales units for salesperson 3742)

Roll up

Drill across

Drill down

(29)

DATA WAREHOUSE SECURITY

 information systems security— damage, destruction, theft, and

misuse

Exhibit 4.11: The Security Action Cycle

The Corporate Security Environment

 deterrence— security policies and procedures that are intended to

deter security violations, such as guidelines for proper system use and the requirement that users change their passwords periodically

 prevention— measures aimed at those persons who ignore

deterrence, and include such things as locks on computer rooms, user passwords, file permissions

 detection— proactive actions include system audits, reports of

suspicious activity, and virus scanning software and reactive actions take the form of investigations

 remedies— respond with warnings, reprimands, termination of

(30)

EXHIBIT 4.11 THE SECURITY ACTION CYCLE 1. Deterrence 2. Prevention 3. Detection 4. Remedies Maximize Deterred abuse Maximize Prevented abuse Maximize Undetected abuse Maximize Unpunished abuse Deterred abuse Prevented abuse Undetected abuse Unpunished abuse Deterrence feedback

(31)

DATA WAREHOUSE SECURITY

Data Warehouse Security Measures

network security

— using procedures such as

firewalls

to

restrict access to the network that houses the servers

and data files, databases, data warehouses, and data

marts

data security

— obtaining access to data once access to

the network has been achieved; where, data files may be

located on

multiple servers

on the network, and the user

must provide a

second password

database or data warehouse security

— the security

checks of the database management system (DBMS)

that may include a

third password

,

verification of user

name

, and also verification of access to particular data

tables, records, and even record fields

References

Related documents

Trinity Way (Right) Cheetham Hill Road (Left) Corporation Street - Angel Street (Right) Rochdale Road (Ahead) Shudehill to Interchange Stand B 33 FM/SC Manchester Worsley

It is a fact that it is important for women to get involved in social, economic and political policy development because, in our society, women have different concerns than

Last but not least, el duque de Montalto-cardenal Moncada, cuñado tanto del marqués de Aytona - se había casado en segundas nupcias con su hermana Caterina - como del

Outcomes such as personal health indicators, utilization of health care services, and changes to the environment help to measure change and establish accountability for use of

Over the period, we see stagnant or rising informality in the lower or intermediate educational levels whereas informality is seen to fall at the higher educational levels.. (Table

Not only education subsidies, but also postponing retirement timing and an increase in pension benefit increase the effort to become a skilled laborer.. Therefore

Med LCF jämfördes spektra från jordproven med de spektra som tagits fram för vanadinet i referensjorden (d.v.s. det ”ursprungliga” vanadinet i jorden) samt för vanadin bundet

This paper proposes a fault tolerant, four-phase inverter PMSM drive topology which can be used to track the saturation saliency in PM motors and rotor slotting saliency