• No results found

INFO 321, Database Systems, Semester

N/A
N/A
Protected

Academic year: 2021

Share "INFO 321, Database Systems, Semester"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

Intro DSS DW DW design OLAP

INFO 321

Chapter 3: Decision Support Systems

Department of Information Science Semester 2, 2012

Semester 2, 2012 INFO 321 1

Intro DSS DW DW design OLAP

References

General

I Kifer Chapter 17

I Silberschatz (5th ed.) Chapter 18

I Data Warehousing for Cavemen

(see Blackboard.Other Documents.Decision Support Systems)

I Coronel, Morris & Rob (9th ed.) Chapter 13

I Mannino (3rd ed.) Chapter 16

I http://www.searchcrm.com/

Semester 2, 2012 INFO 321 3

Intro DSS DW DW design OLAP

References

Oracle11g documentation

IData Warehousing Guide

IOLAP User’s Guide

IOLAP DML Reference

Semester 2, 2012 INFO 321 4

Intro DSS DW DW design OLAP

What is decision support?

(Kifer §1.4, 17.1; Silberschatz §18.1)

IData→timely, relevant, well-visualised information.

ITune information and presentation to specific purposes.

Semester 2, 2012 INFO 321 5

Intro DSS DW DW design OLAP

Decision-making occurs at the operational level

(see also Figure 3–1)

I Very short-term.

I Well-defined inputs.

I Produced by existing applications or simple front-end tools.

I Line managers.

Semester 2, 2012 INFO 321 6

Intro DSS DW DW design OLAP

Decision-making occurs at the tactical level

(see also Figure 3–1)

IShort-term.

ILess well-defined inputs.

IMiddle managers.

Semester 2, 2012 INFO 321 7

(2)

Intro DSS DW DW design OLAP

Decision-making occurs at the strategic level

(see also Figure 3–1)

ILong-term.

IIll-defined inputs.

IOften cannot use pre-existing applications⇒Decision

Support Systems (DSS); or Executive Information Systems (EIS).

ISenior managers.

Semester 2, 2012 INFO 321 8

Intro DSS DW DW design OLAP

Operational vs. decision support queries

Operational

I How many brass reciprocating hammers do we have in stock?

I How much electrical twine did we sell yesterday?

Decision support

I How many brass reciprocating hammers were sold to

customers aged 18–25 in large North Island towns over each of the last six months?

I If we double the advertising budget for electrical twine, how

might that affect revenues for the next six months?

Semester 2, 2012 INFO 321 9

Intro DSS DW DW design OLAP

There is a strong need for DSS

IModern business very complex.

IShrinking time frame for decision-making.

IData from multiple sources:(see Figure 3–1) I internal vs. external

I “formal” vs. “informal”

⇒ must be sensibly integrated

Semester 2, 2012 INFO 321 10

Intro DSS DW DW design OLAP

Components of a DSS

(adapted from Coronel, Morris & Rob, Figure 13.2)

Decision Support System Data extraction, transformation and loading End-user query tool Data

store End-user presentation

and visualisation tool Operational data Decision support data Business data analysis models External data

Sales Expenses Profits 14 000 9 500 4 500 17 000 11 000 6 000 21 000 14 000 7 000 0 5000 10000 15000 20000 25000 Semester 2, 2012 INFO 321 11

Intro DSS DW DW design OLAP

There are many types of decision support tool

Basic

I Ad hoc query tools (SQL?).

I Graph and report generators.

I Spreadsheets (small data sets only!).

More advanced

I Data warehouses.

I Online analytical processing (OLAP).

Semester 2, 2012 INFO 321 12

Intro DSS DW DW design OLAP

Operational vs. decision support data

(Coronel, Morris & Rob, Table 13.4; see also Mannino §16.1.1)

Characteristic Operational data Decision support data Data currency Current operations Historic data

Real-time data Snapshot of company data Time component (week/month/year) Granularity Atomic, detailed data Summarised data

Summarisation level Low; some aggregation High; heavily aggregated Data structure Highly normalised Non-normalised

Mostly RDBMS Complex structures

Some relational; mostly multidimensional Transaction type Mostly updates Mostly queries

Transaction volumes High update volume Periodic loads and summary calculations Transaction speed Updates are critical Retrievals are critical

Query activity Low to medium High Query scope Narrow range Broad range Query complexity Simple to medium Very complex Data volumes Hundreds of MiB→GiB Hundreds of GiB→TiB

Semester 2, 2012 INFO 321 13

(3)

Intro DSS DW DW design OLAP

Timespan is a key difference

Operational

IVery short (current transactions).

Decision support

ILong (past and future).

IData may not be current.

Semester 2, 2012 INFO 321 14

Intro DSS DW DW design OLAP

Granularity is a key difference

Operational

I Represent specific transactions (atomic).

Decision support

I Varying levels of aggregation (atomic→highly summarised).

I Drilling down vs. rolling up.

Semester 2, 2012 INFO 321 15

Intro DSS DW DW design OLAP

Dimensionality is a key difference

(Kifer §17.2; Silberschatz §18.2.1)

Operational

I“Flat” (tables of atomic transactions). **

Decision support

IMany dimensions.

IView orders by region per quarter (2D).

ICompare sales of products during the last six months by

region, city, store & customer (4D).

Semester 2, 2012 INFO 321 16

Intro DSS DW DW design OLAP

Dimensionality is a key difference

(adapted from Coronel, Morris & Rob, Figure 13.13; see also Kifer §17.2 & Silberschatz §18.2.1)

Time Product Loca tion April May June DUD AKL CHC B al l Club B at Conceptual three-dimensional cube of sales by product, location and time.

Sales facts are stored in the cells of the intersection of each product, time and location dimension value.

Semester 2, 2012 INFO 321 17

Intro DSS DW DW design OLAP

Data warehouses store decision support data

(Mannino §16.1.2; Coronel, Morris & Rob Table 13.7)

I Designed and optimised for decision support data.

I Internal structure quite different from operational databases: I aggregated

I denormalised

I data from multiple internal/external sources

Semester 2, 2012 INFO 321 18

Intro DSS DW DW design OLAP

Data warehouse data are integrated

(Coronel, Morris & Rob Table 13.7)

Operational database data

IMostly internal sources.

IMultiple representations.

Data warehouse data

IBoth internal and external sources.

ITransformed, cleaned and summarised during integration.

Semester 2, 2012 INFO 321 19

(4)

Intro DSS DW DW design OLAP

Data warehouse data are subject-oriented

(Coronel, Morris & Rob Table 13.7)

Operational database data

IFunctional or process-oriented (invoices, payments, products).

Data warehouse data

IFacts or measures organised by major subject areas (sales, marketing, etc.).

IHeld according to dimensions or variables of interest: product, customer, region, . . .

IAggregated data from many operational tables.

IQueries tuned to specific decision-making needs.

Semester 2, 2012 INFO 321 20

Intro DSS DW DW design OLAP

Data warehouse data are time-variant

(Coronel, Morris & Rob Table 13.7)

Operational database data

I Current transactions with precise time stamps.

Data warehouse data

I Time an important dimension for almost all subject areas.

I Data aggregated by time, e.g., sales by week, month, quarter,

year. . .

I Historical focus (past and future).

Semester 2, 2012 INFO 321 21

Intro DSS DW DW design OLAP

Data warehouse data are non-volatile

(Coronel, Morris & Rob Table 13.7)

Operational database data

IFrequent changes⇒dynamic.

IOften archived periodically.

Data warehouse data

IRead only (occasional batch updates)⇒static.

IHistorical data retained⇒always growing (GiB→. . . ).

Semester 2, 2012 INFO 321 22

Intro DSS DW DW design OLAP

Defining a data warehouse in more detail

(Silberschatz §18.3.1; Table 3–2)

IRead-only database optimised for data analysis and query

processing. IData from: I “legacy”/archived databases I operational databases I other sources IOptimisation includes: I decisions on aggregations I important dimensions

I appropriate indexing and physical design

Semester 2, 2012 INFO 321 23

Intro DSS DW DW design OLAP

Data marts are small, specialised data warehouses

I Focused subset of data.

I “Clusters” of data marts surrounding central enterprise data warehouse?

Semester 2, 2012 INFO 321 24

Intro DSS DW DW design OLAP

Data warehouse analysis is more demanding

(Mannino §16.3.1)

ISome queries may be impossible if not designed for.

INot as flexible for ad hoc queries.

IUsers must identify intended use.

IData derived from both internal and external sources (e.g., Internet: NZX, Dow Jones, NASDAQ).

Semester 2, 2012 INFO 321 25

(5)

Intro DSS DW DW design OLAP

The difficulty of data warehouse design

(The Standish Group (1997), “The Meta Myth”; http://standishgroup.com/)

Interviewer: How many data warehouses have you had? Data warehouser: We have had eight.

Interviewer: To what do you attribute so many warehouses? Data warehouser: Seven mistakes . . .

Semester 2, 2012 INFO 321 26

Intro DSS DW DW design OLAP

Facts are a key design aspect

(Kifer pp. 713–715; Mannino §16.3.2)

I A value that we are interested in.

I Examples: revenue, profits, cost, number of sales.

I Also known as measures.

Semester 2, 2012 INFO 321 27

Intro DSS DW DW design OLAP

Dimensions are a key design aspect

(Kifer pp. 713–715; Mannino §16.3.2)

IA factor/variable that influences the facts.

IExamples: time, product, customer, salesrep, location.

IEach has attributes.

Semester 2, 2012 INFO 321 28

Intro DSS DW DW design OLAP

Time as a dimension

(see also Mannino §16.2.3)

INot as simple as it seems!

IGranularity (unit size): year, month, week, day, hour.

IAlternate units (periodicity): season, financial year, quarter.

Semester 2, 2012 INFO 321 29

Intro DSS DW DW design OLAP

Star schemas for relational data warehouses

(Kifer pp. 715–717; Silberschatz §18.3.2; Figure 3–3; see alsoData Warehousing Guidech. 2)

I Central fact table.

I Cluster of related dimension tables.

I Needed because of inadequate physical data independence?

(denormalised)

I Partial normalisation⇒“snowflake” or “starflake” structure

(also “constellation”).

Semester 2, 2012 INFO 321 30

Intro DSS DW DW design OLAP

Star schemas for relational data warehouses

(Kifer pp. 715–717; Silberschatz §18.3.2; Figure 3–3; see alsoData Warehousing Guidech. 2)

Dimension table Fact table Dimension table Dimension table Dimension table Semester 2, 2012 INFO 321 31

(6)

Intro DSS DW DW design OLAP

Three steps to populate a data warehouse

(Kifer §17.6; Mannino §16.4)

Extraction: obtaining data from sources.

Transformation: altering form of data (includes cleaning). Loading: adding data to warehouse.

IPossibly intermediate data staging steps.

ICritical for successful data warehouses.

Semester 2, 2012 INFO 321 32

Intro DSS DW DW design OLAP

Performance tuning for data warehouses

(Kifer §17.5;Data Warehousing Guidech. 3; see also INFO 321 Chapter 1)

I Complex queries⇒denormalisation (fewer joins).

I Mostly read-only + complex queries⇒index heavily.

I Other techniques:

I normalise dimension tables

I multiple fact tables for different aggregation levels I physical tuning: partitioning, replication, etc.

Semester 2, 2012 INFO 321 33

Intro DSS DW DW design OLAP

Performance tuning for data warehouses

(Kifer §17.5;Data Warehousing Guidech. 3; see also INFO 321 Chapter 1)

IB-tree indexes and hashing generally useful.

IBitmap indexes particularly for “counting by category” queries.

IIntegrated indexes for dimension tables?

IFunction-based indexes could be useful? (Time queries?)

Semester 2, 2012 INFO 321 34

Intro DSS DW DW design OLAP

Oracle11

g

supports data warehouses

(Oracle11g Data Warehousing Guide) The simple approach

IUse distribution and replication services.

IScales poorly.

Oracle data mart suite

IAdd-on for constructing Oracle data marts.

IGUI interface; design & ETL modules; third-party tools.

Oracle Data Integrator Enterprise Edition

IBuild & manage high-end, complex data warehouses.

ICombines Oracle Data Integrator and Oracle Warehouse

Builder.

Semester 2, 2012 INFO 321 35

Intro DSS DW DW design OLAP

Oracle11

g

supports data warehouses

(Oracle11g Data Warehousing Guide)

I Bitmap & function-based indexes, index-organised tables.

I Bitmap join indexes.

I Other relevant tools:

I SQL*Loader (possibly in conjunction with Transparent

Gateways)

I export and import (basic)

I Also see Oracle’s web site (good luck!).

Semester 2, 2012 INFO 321 36

Intro DSS DW DW design OLAP

OLAP tools enable complex data processing

(Silberschatz §18.2; Figure 3–4)

IComplex analysis of multidimensional data.

ISpreadsheet-like “simplicity”.

IData stored in warehouse or tool’s internal proprietary database.

Semester 2, 2012 INFO 321 37

(7)

Intro DSS DW DW design OLAP

OLAP tools have many capabilities

IData transformation.

IBusiness modelling.

IStatistical analysis.

IPowerful GUI query facility.

IVisualisation (graphics).

Semester 2, 2012 INFO 321 38

Intro DSS DW DW design OLAP

A simple OLAP example using Excel

(Kifer §17.3; see also Silberschatz §18.2.3–18.2.5)

I Sales subject area dimensions: customer, salesreps, product, region, time, . . .

I View sales aggregated by dimensions.

I Dynamically alter presentation:

I drill down/roll up

I “slice and dice” (see next slide) I “pivot” the table (see demo)

I highlight exceptions (e.g., high loss products) I invent new columns (e.g., % sales revenue)

Semester 2, 2012 INFO 321 39

Intro DSS DW DW design OLAP

“Slice and dice” enables dynamic visualisation

(adapted from Coronel, Morris & Rob, Figure 13.14; see also Kifer §17.3.1)

Time Product Loca tion April May June DUD AKL CHC B al l Club B at Store manager’s view of sales data

Product manager’s view of sales data

Semester 2, 2012 INFO 321 40

Intro DSS DW DW design OLAP

Another example: Quicken

Intuit’s Quicken provides some simple

OLAP-like features.

Semester 2, 2012 INFO 321 41

Intro DSS DW DW design OLAP

Another example: Quicken

Drill down to expand a summarised

category.

Semester 2, 2012 INFO 321 42

Intro DSS DW DW design OLAP

Another example: Quicken

Drill down to expenses by month

for a particular category.

Semester 2, 2012 INFO 321 44

(8)

Intro DSS DW DW design OLAP

Another example: Quicken

Drill down to individual category transactions within a

month.

Semester 2, 2012 INFO 321 45

Intro DSS DW DW design OLAP

OLAP data may be stored in different ways

(Kifer §17.4; Silberschatz §18.2.2)

I Internal proprietary database (often MDD).

I Access external databases (data warehouses?):

I relational (ROLAP) I multidimensional (MOLAP) I both (HOLAP)

Semester 2, 2012 INFO 321 46

Intro DSS DW DW design OLAP

Oracle11

g

SQL has extensive OLAP support

(Oracle11g SQL Language Reference: SELECT; see also Kifer §17.3.2 & Silberschatz §18.2.3)

IGROUP BY CUBE (<columns>).

IGROUP BY ROLLUP (<columns>).

IGROUPING SETS(different from SQL:1999’sGROUPING

function).

IMODELclause.

IVarious analytic functions, includingRANK,PARTITION BY. (seeOracle11g SQL Language Reference: Analytic Functions)

ICrosstabs usingPIVOTandUNPIVOT.

I. . .

Semester 2, 2012 INFO 321 48

Intro DSS DW DW design OLAP

Data mining may find hidden trends

(Kifer §17.7; Silberschatz §18.4)

IOLAP & data warehousing help identify trends and relationships.

BUT: Some relationships too complex or subtle to easily notice.

IData mining tools claim to sift through databases and find

unrecognised relationships and trends.

Semester 2, 2012 INFO 321 49

Intro DSS DW DW design OLAP

There are many data mining techniques

(Kifer §17.8–17.11; Silberschatz §18.4)

I Neural networks.

I Complex visualisation.

I Genetic algorithms (evolve a solution).

I Advanced statistical analysis (traditional).

I See INFO 331 for many of these.

Semester 2, 2012 INFO 321 50

Intro DSS DW DW design OLAP

Some examples of data mining

IBeer and nappies (probably apocryphal).

IFraud detection (phone, credit card).

IMCI’s statistical profiles.

IRisk assessment for car insurance (FIG).

INBA strategy analysis.

But data mining is not foolproof . . .

Semester 2, 2012 INFO 321 51

(9)

Figure 3–1: Sources of information

SOURCE: Unknown

Table 3–2: Twelve rules that define a data warehouse

Bill Inmon is widely referred to as the “father of data warehouses”. In 1994, he and Chuck Kelley defined a list of twelve rules defining a data warehouse.

1. The data warehouse and operational databases are separated. 2. Data warehouse data are integrated.

3. A data warehouse contains historical data over a long time horizon. 4. Data warehouse data are a snapshot captured at a particular point in time. 5. Data warehouse data are subject-oriented.

6. A data warehouse is mainly read-only with periodic batch updates from opera-tional data. No online updates are allowed.

7. The data warehouse development cycle is data-driven, whereas the classical

sys-tems development approach is process-driven.1

8. A data warehouse contains data at several levels of detail: current detail data, old detail data, lightly summarised and highly summarised data.

9. Database operations in a data warehouse are typically read-only transactions on very large data sets, whereas in an operational database, there are typically many update transactions to a few data entities at a time.

10. A data warehouse has a system that tracks data sources, transformations and storage.

11. Metadata are critical for a data warehouse, as they identify and define all data elements. Metadata provide the source, transformation, integration, storage,

us-age, relationships and history of each data element.2

12. A data warehouse contains a charge-back mechanism for resource usage, in

or-der to enforce optimal use of data by end users.3

1There are many who would argue with the latter claim.

(10)

Figure 3–3: Star schemas

(a) Orders star schema

TIME ORDER PRODUCT VENDOR PRODUCT_ID DESCRIPTION PROD_TYPE_ID BRAND COLOUR SIZE PACKAGE VENDOR_ID VENDOR_NAME TIME_ID PRODUCT_ID VENDOR_ID QUANTITY PRICE AMOUNT TIME_ID YEAR QUARTER MONTH WEEK DAY 3000 rows 50 rows 85 000 rows 1827 rows (5 years) Daily aggregates

by product and vendor

SOURCE: adapted from Rob & Coronel, Figure 13.18

(b) Sales star schema

TIME SALE LOCATION PERSON LOCATION_ID DESCRIPTION REGION_ID STATE CITY GENDER PERSON_ID NAME TIME_ID LOCATION_ID PERSON_ID QUANTITY PRICE AMOUNT TIME_ID YEAR QUARTER MONTH WEEK DAY 25 rows 125 rows 3 000 000 rows 1827 rows Daily aggregates by store, person and product PRODUCT_ID PRODUCT PRODUCT_ID DESCRIPTION PROD_TYPE_ID BRAND COLOUR SIZE PACKAGE 3000 rows

SOURCE: adapted from Rob & Coronel, Figure 13.17

Figure 3–4: Operational vs. multidimensional data

INVOICE_HEADER

INV_NUM INV_DATE CUST_NO

INVOICE_HEADER

INV_NUM PROD_NUM LINE_PRICE LINE_QTY

Customer Dimension Time Dimension Totals Totals 2034 2035 2036 2037 2038 15-May-2007 15-May-2007 16-May-2007 16-May-2007 16-May-2007 12345 82739 12345 82739 35348 (Circuit Central) (Small Bytes) (Circuit Central) (Small Bytes) (Computer House) 2034 2034 2035 2036 2037 2037 2037 2038

(Microsoft Wireless Mouse)

(Microsoft Wireless Mouse) (Microsoft Wireless Mouse) (SanDisk USB Flash Drive, 8 GB) (Seagate Hard Drive, 2 TB)

(Seagate Hard Drive, 2 TB) (D-Link 4-Port Ethernet Switch) (Creative Speaker System) M34661 M34661 M34661 D99280 D44346 D44346 C74316 S64371 $50.00 $50.00 $50.00 $30.00 $165.00 $165.00 $35.00 $60.00 20 30 10 10 6 10 5 8 Circuit Central Small Bytes Computer House 15-May-2007 16-May-2007 $1,300.00 $990.00 $1,500.00 $2,325.00 $480.00 $2,290.00 $4,305.00 $480.00 $2,800.00 $3,315.00 $6,595.00

Sales are located at the intersection of a customer row and a time column

Aggregrations are calculated for both dimensions

Two-dimensional View of Sales Operational View of Sales

SOURCE: adapted from Coronel, Morris & Rob, Figure 13.5

References

Related documents

This kind of spot-modelling analysis can therefore be used to obtain information on the stellar flux distribution across the disc and the limb-darkening parameters more efficiently

Types of Signals Time Frequency Sine wave Time Frequency Square wave Time Frequency Transient Time Frequency Ideal Impulse Infinite duration in Time Finite bandwidth in Frequency

The six-month window in the wake of a successful targeted killing strike presents a valuable opportunity to use some of these techniques to create more significant long term damage

According to the results obtained from ANOVA, the type of vegetable oil, the control parameters of the burner (airflow and fuel flow rates) together with most of their

Let us know whenever your home is shown by any agent who does not use the lockbox and leave the showing agents name, real estate office, day and time of showing. Step 5: Paint

The second approach, which we adopt here, is to directly determine the min-bias trigger cross section by comparison to the reliably calculated [5] deuteron dissociation process

operated as free trade zone under the control of Almacenes Universales S.A. On the east side of the bay are a port, cement works and a power station. The new project at Mariel will

The examples above have several implications: first, that the table in the computer could not accommodate frequent vital sign recordings; second, that it took less