Intro DSS DW DW design OLAP
INFO 321
Chapter 3: Decision Support Systems
Department of Information Science Semester 2, 2012
Semester 2, 2012 INFO 321 1
Intro DSS DW DW design OLAP
References
General
I Kifer Chapter 17
I Silberschatz (5th ed.) Chapter 18
I Data Warehousing for Cavemen
(see Blackboard.Other Documents.Decision Support Systems)
I Coronel, Morris & Rob (9th ed.) Chapter 13
I Mannino (3rd ed.) Chapter 16
I http://www.searchcrm.com/
Semester 2, 2012 INFO 321 3
Intro DSS DW DW design OLAP
References
Oracle11g documentation
IData Warehousing Guide
IOLAP User’s Guide
IOLAP DML Reference
Semester 2, 2012 INFO 321 4
Intro DSS DW DW design OLAP
What is decision support?
(Kifer §1.4, 17.1; Silberschatz §18.1)
IData→timely, relevant, well-visualised information.
ITune information and presentation to specific purposes.
Semester 2, 2012 INFO 321 5
Intro DSS DW DW design OLAP
Decision-making occurs at the operational level
(see also Figure 3–1)
I Very short-term.
I Well-defined inputs.
I Produced by existing applications or simple front-end tools.
I Line managers.
Semester 2, 2012 INFO 321 6
Intro DSS DW DW design OLAP
Decision-making occurs at the tactical level
(see also Figure 3–1)
IShort-term.
ILess well-defined inputs.
IMiddle managers.
Semester 2, 2012 INFO 321 7
Intro DSS DW DW design OLAP
Decision-making occurs at the strategic level
(see also Figure 3–1)
ILong-term.
IIll-defined inputs.
IOften cannot use pre-existing applications⇒Decision
Support Systems (DSS); or Executive Information Systems (EIS).
ISenior managers.
Semester 2, 2012 INFO 321 8
Intro DSS DW DW design OLAP
Operational vs. decision support queries
Operational
I How many brass reciprocating hammers do we have in stock?
I How much electrical twine did we sell yesterday?
Decision support
I How many brass reciprocating hammers were sold to
customers aged 18–25 in large North Island towns over each of the last six months?
I If we double the advertising budget for electrical twine, how
might that affect revenues for the next six months?
Semester 2, 2012 INFO 321 9
Intro DSS DW DW design OLAP
There is a strong need for DSS
IModern business very complex.
IShrinking time frame for decision-making.
IData from multiple sources:(see Figure 3–1) I internal vs. external
I “formal” vs. “informal”
⇒ must be sensibly integrated
Semester 2, 2012 INFO 321 10
Intro DSS DW DW design OLAP
Components of a DSS
(adapted from Coronel, Morris & Rob, Figure 13.2)
Decision Support System Data extraction, transformation and loading End-user query tool Data
store End-user presentation
and visualisation tool Operational data Decision support data Business data analysis models External data
Sales Expenses Profits 14 000 9 500 4 500 17 000 11 000 6 000 21 000 14 000 7 000 0 5000 10000 15000 20000 25000 Semester 2, 2012 INFO 321 11
Intro DSS DW DW design OLAP
There are many types of decision support tool
Basic
I Ad hoc query tools (SQL?).
I Graph and report generators.
I Spreadsheets (small data sets only!).
More advanced
I Data warehouses.
I Online analytical processing (OLAP).
Semester 2, 2012 INFO 321 12
Intro DSS DW DW design OLAP
Operational vs. decision support data
(Coronel, Morris & Rob, Table 13.4; see also Mannino §16.1.1)
Characteristic Operational data Decision support data Data currency Current operations Historic data
Real-time data Snapshot of company data Time component (week/month/year) Granularity Atomic, detailed data Summarised data
Summarisation level Low; some aggregation High; heavily aggregated Data structure Highly normalised Non-normalised
Mostly RDBMS Complex structures
Some relational; mostly multidimensional Transaction type Mostly updates Mostly queries
Transaction volumes High update volume Periodic loads and summary calculations Transaction speed Updates are critical Retrievals are critical
Query activity Low to medium High Query scope Narrow range Broad range Query complexity Simple to medium Very complex Data volumes Hundreds of MiB→GiB Hundreds of GiB→TiB
Semester 2, 2012 INFO 321 13
Intro DSS DW DW design OLAP
Timespan is a key difference
Operational
IVery short (current transactions).
Decision support
ILong (past and future).
IData may not be current.
Semester 2, 2012 INFO 321 14
Intro DSS DW DW design OLAP
Granularity is a key difference
Operational
I Represent specific transactions (atomic).
Decision support
I Varying levels of aggregation (atomic→highly summarised).
I Drilling down vs. rolling up.
Semester 2, 2012 INFO 321 15
Intro DSS DW DW design OLAP
Dimensionality is a key difference
(Kifer §17.2; Silberschatz §18.2.1)
Operational
I“Flat” (tables of atomic transactions). **
Decision support
IMany dimensions.
IView orders by region per quarter (2D).
ICompare sales of products during the last six months by
region, city, store & customer (4D).
Semester 2, 2012 INFO 321 16
Intro DSS DW DW design OLAP
Dimensionality is a key difference
(adapted from Coronel, Morris & Rob, Figure 13.13; see also Kifer §17.2 & Silberschatz §18.2.1)
Time Product Loca tion April May June DUD AKL CHC B al l Club B at Conceptual three-dimensional cube of sales by product, location and time.
Sales facts are stored in the cells of the intersection of each product, time and location dimension value.
Semester 2, 2012 INFO 321 17
Intro DSS DW DW design OLAP
Data warehouses store decision support data
(Mannino §16.1.2; Coronel, Morris & Rob Table 13.7)
I Designed and optimised for decision support data.
I Internal structure quite different from operational databases: I aggregated
I denormalised
I data from multiple internal/external sources
Semester 2, 2012 INFO 321 18
Intro DSS DW DW design OLAP
Data warehouse data are integrated
(Coronel, Morris & Rob Table 13.7)
Operational database data
IMostly internal sources.
IMultiple representations.
Data warehouse data
IBoth internal and external sources.
ITransformed, cleaned and summarised during integration.
Semester 2, 2012 INFO 321 19
Intro DSS DW DW design OLAP
Data warehouse data are subject-oriented
(Coronel, Morris & Rob Table 13.7)
Operational database data
IFunctional or process-oriented (invoices, payments, products).
Data warehouse data
IFacts or measures organised by major subject areas (sales, marketing, etc.).
IHeld according to dimensions or variables of interest: product, customer, region, . . .
IAggregated data from many operational tables.
IQueries tuned to specific decision-making needs.
Semester 2, 2012 INFO 321 20
Intro DSS DW DW design OLAP
Data warehouse data are time-variant
(Coronel, Morris & Rob Table 13.7)
Operational database data
I Current transactions with precise time stamps.
Data warehouse data
I Time an important dimension for almost all subject areas.
I Data aggregated by time, e.g., sales by week, month, quarter,
year. . .
I Historical focus (past and future).
Semester 2, 2012 INFO 321 21
Intro DSS DW DW design OLAP
Data warehouse data are non-volatile
(Coronel, Morris & Rob Table 13.7)
Operational database data
IFrequent changes⇒dynamic.
IOften archived periodically.
Data warehouse data
IRead only (occasional batch updates)⇒static.
IHistorical data retained⇒always growing (GiB→. . . ).
Semester 2, 2012 INFO 321 22
Intro DSS DW DW design OLAP
Defining a data warehouse in more detail
(Silberschatz §18.3.1; Table 3–2)
IRead-only database optimised for data analysis and query
processing. IData from: I “legacy”/archived databases I operational databases I other sources IOptimisation includes: I decisions on aggregations I important dimensions
I appropriate indexing and physical design
Semester 2, 2012 INFO 321 23
Intro DSS DW DW design OLAP
Data marts are small, specialised data warehouses
I Focused subset of data.
I “Clusters” of data marts surrounding central enterprise data warehouse?
Semester 2, 2012 INFO 321 24
Intro DSS DW DW design OLAP
Data warehouse analysis is more demanding
(Mannino §16.3.1)
ISome queries may be impossible if not designed for.
INot as flexible for ad hoc queries.
IUsers must identify intended use.
IData derived from both internal and external sources (e.g., Internet: NZX, Dow Jones, NASDAQ).
Semester 2, 2012 INFO 321 25
Intro DSS DW DW design OLAP
The difficulty of data warehouse design
(The Standish Group (1997), “The Meta Myth”; http://standishgroup.com/)
Interviewer: How many data warehouses have you had? Data warehouser: We have had eight.
Interviewer: To what do you attribute so many warehouses? Data warehouser: Seven mistakes . . .
Semester 2, 2012 INFO 321 26
Intro DSS DW DW design OLAP
Facts are a key design aspect
(Kifer pp. 713–715; Mannino §16.3.2)
I A value that we are interested in.
I Examples: revenue, profits, cost, number of sales.
I Also known as measures.
Semester 2, 2012 INFO 321 27
Intro DSS DW DW design OLAP
Dimensions are a key design aspect
(Kifer pp. 713–715; Mannino §16.3.2)
IA factor/variable that influences the facts.
IExamples: time, product, customer, salesrep, location.
IEach has attributes.
Semester 2, 2012 INFO 321 28
Intro DSS DW DW design OLAP
Time as a dimension
(see also Mannino §16.2.3)
INot as simple as it seems!
IGranularity (unit size): year, month, week, day, hour.
IAlternate units (periodicity): season, financial year, quarter.
Semester 2, 2012 INFO 321 29
Intro DSS DW DW design OLAP
Star schemas for relational data warehouses
(Kifer pp. 715–717; Silberschatz §18.3.2; Figure 3–3; see alsoData Warehousing Guidech. 2)
I Central fact table.
I Cluster of related dimension tables.
I Needed because of inadequate physical data independence?
(denormalised)
I Partial normalisation⇒“snowflake” or “starflake” structure
(also “constellation”).
Semester 2, 2012 INFO 321 30
Intro DSS DW DW design OLAP
Star schemas for relational data warehouses
(Kifer pp. 715–717; Silberschatz §18.3.2; Figure 3–3; see alsoData Warehousing Guidech. 2)
Dimension table Fact table Dimension table Dimension table Dimension table Semester 2, 2012 INFO 321 31
Intro DSS DW DW design OLAP
Three steps to populate a data warehouse
(Kifer §17.6; Mannino §16.4)
Extraction: obtaining data from sources.
Transformation: altering form of data (includes cleaning). Loading: adding data to warehouse.
IPossibly intermediate data staging steps.
ICritical for successful data warehouses.
Semester 2, 2012 INFO 321 32
Intro DSS DW DW design OLAP
Performance tuning for data warehouses
(Kifer §17.5;Data Warehousing Guidech. 3; see also INFO 321 Chapter 1)
I Complex queries⇒denormalisation (fewer joins).
I Mostly read-only + complex queries⇒index heavily.
I Other techniques:
I normalise dimension tables
I multiple fact tables for different aggregation levels I physical tuning: partitioning, replication, etc.
Semester 2, 2012 INFO 321 33
Intro DSS DW DW design OLAP
Performance tuning for data warehouses
(Kifer §17.5;Data Warehousing Guidech. 3; see also INFO 321 Chapter 1)
IB-tree indexes and hashing generally useful.
IBitmap indexes particularly for “counting by category” queries.
IIntegrated indexes for dimension tables?
IFunction-based indexes could be useful? (Time queries?)
Semester 2, 2012 INFO 321 34
Intro DSS DW DW design OLAP
Oracle11
g
supports data warehouses
(Oracle11g Data Warehousing Guide) The simple approach
IUse distribution and replication services.
IScales poorly.
Oracle data mart suite
IAdd-on for constructing Oracle data marts.
IGUI interface; design & ETL modules; third-party tools.
Oracle Data Integrator Enterprise Edition
IBuild & manage high-end, complex data warehouses.
ICombines Oracle Data Integrator and Oracle Warehouse
Builder.
Semester 2, 2012 INFO 321 35
Intro DSS DW DW design OLAP
Oracle11
g
supports data warehouses
(Oracle11g Data Warehousing Guide)
I Bitmap & function-based indexes, index-organised tables.
I Bitmap join indexes.
I Other relevant tools:
I SQL*Loader (possibly in conjunction with Transparent
Gateways)
I export and import (basic)
I Also see Oracle’s web site (good luck!).
Semester 2, 2012 INFO 321 36
Intro DSS DW DW design OLAP
OLAP tools enable complex data processing
(Silberschatz §18.2; Figure 3–4)
IComplex analysis of multidimensional data.
ISpreadsheet-like “simplicity”.
IData stored in warehouse or tool’s internal proprietary database.
Semester 2, 2012 INFO 321 37
Intro DSS DW DW design OLAP
OLAP tools have many capabilities
IData transformation.
IBusiness modelling.
IStatistical analysis.
IPowerful GUI query facility.
IVisualisation (graphics).
Semester 2, 2012 INFO 321 38
Intro DSS DW DW design OLAP
A simple OLAP example using Excel
(Kifer §17.3; see also Silberschatz §18.2.3–18.2.5)
I Sales subject area dimensions: customer, salesreps, product, region, time, . . .
I View sales aggregated by dimensions.
I Dynamically alter presentation:
I drill down/roll up
I “slice and dice” (see next slide) I “pivot” the table (see demo)
I highlight exceptions (e.g., high loss products) I invent new columns (e.g., % sales revenue)
Semester 2, 2012 INFO 321 39
Intro DSS DW DW design OLAP
“Slice and dice” enables dynamic visualisation
(adapted from Coronel, Morris & Rob, Figure 13.14; see also Kifer §17.3.1)
Time Product Loca tion April May June DUD AKL CHC B al l Club B at Store manager’s view of sales data
Product manager’s view of sales data
Semester 2, 2012 INFO 321 40
Intro DSS DW DW design OLAP
Another example: Quicken
Intuit’s Quicken provides some simple
OLAP-like features.
Semester 2, 2012 INFO 321 41
Intro DSS DW DW design OLAP
Another example: Quicken
Drill down to expand a summarised
category.
Semester 2, 2012 INFO 321 42
Intro DSS DW DW design OLAP
Another example: Quicken
Drill down to expenses by month
for a particular category.
Semester 2, 2012 INFO 321 44
Intro DSS DW DW design OLAP
Another example: Quicken
Drill down to individual category transactions within a
month.
Semester 2, 2012 INFO 321 45
Intro DSS DW DW design OLAP
OLAP data may be stored in different ways
(Kifer §17.4; Silberschatz §18.2.2)
I Internal proprietary database (often MDD).
I Access external databases (data warehouses?):
I relational (ROLAP) I multidimensional (MOLAP) I both (HOLAP)
Semester 2, 2012 INFO 321 46
Intro DSS DW DW design OLAP
Oracle11
g
SQL has extensive OLAP support
(Oracle11g SQL Language Reference: SELECT; see also Kifer §17.3.2 & Silberschatz §18.2.3)
IGROUP BY CUBE (<columns>).
IGROUP BY ROLLUP (<columns>).
IGROUPING SETS(different from SQL:1999’sGROUPING
function).
IMODELclause.
IVarious analytic functions, includingRANK,PARTITION BY. (seeOracle11g SQL Language Reference: Analytic Functions)
ICrosstabs usingPIVOTandUNPIVOT.
I. . .
Semester 2, 2012 INFO 321 48
Intro DSS DW DW design OLAP
Data mining may find hidden trends
(Kifer §17.7; Silberschatz §18.4)
IOLAP & data warehousing help identify trends and relationships.
BUT: Some relationships too complex or subtle to easily notice.
IData mining tools claim to sift through databases and find
unrecognised relationships and trends.
Semester 2, 2012 INFO 321 49
Intro DSS DW DW design OLAP
There are many data mining techniques
(Kifer §17.8–17.11; Silberschatz §18.4)
I Neural networks.
I Complex visualisation.
I Genetic algorithms (evolve a solution).
I Advanced statistical analysis (traditional).
I See INFO 331 for many of these.
Semester 2, 2012 INFO 321 50
Intro DSS DW DW design OLAP
Some examples of data mining
IBeer and nappies (probably apocryphal).
IFraud detection (phone, credit card).
IMCI’s statistical profiles.
IRisk assessment for car insurance (FIG).
INBA strategy analysis.
But data mining is not foolproof . . .
Semester 2, 2012 INFO 321 51
Figure 3–1: Sources of information
SOURCE: Unknown
Table 3–2: Twelve rules that define a data warehouse
Bill Inmon is widely referred to as the “father of data warehouses”. In 1994, he and Chuck Kelley defined a list of twelve rules defining a data warehouse.
1. The data warehouse and operational databases are separated. 2. Data warehouse data are integrated.
3. A data warehouse contains historical data over a long time horizon. 4. Data warehouse data are a snapshot captured at a particular point in time. 5. Data warehouse data are subject-oriented.
6. A data warehouse is mainly read-only with periodic batch updates from opera-tional data. No online updates are allowed.
7. The data warehouse development cycle is data-driven, whereas the classical
sys-tems development approach is process-driven.1
8. A data warehouse contains data at several levels of detail: current detail data, old detail data, lightly summarised and highly summarised data.
9. Database operations in a data warehouse are typically read-only transactions on very large data sets, whereas in an operational database, there are typically many update transactions to a few data entities at a time.
10. A data warehouse has a system that tracks data sources, transformations and storage.
11. Metadata are critical for a data warehouse, as they identify and define all data elements. Metadata provide the source, transformation, integration, storage,
us-age, relationships and history of each data element.2
12. A data warehouse contains a charge-back mechanism for resource usage, in
or-der to enforce optimal use of data by end users.3
1There are many who would argue with the latter claim.
Figure 3–3: Star schemas
(a) Orders star schemaTIME ORDER PRODUCT VENDOR PRODUCT_ID DESCRIPTION PROD_TYPE_ID BRAND COLOUR SIZE PACKAGE VENDOR_ID VENDOR_NAME TIME_ID PRODUCT_ID VENDOR_ID QUANTITY PRICE AMOUNT TIME_ID YEAR QUARTER MONTH WEEK DAY 3000 rows 50 rows 85 000 rows 1827 rows (5 years) Daily aggregates
by product and vendor
SOURCE: adapted from Rob & Coronel, Figure 13.18
(b) Sales star schema
TIME SALE LOCATION PERSON LOCATION_ID DESCRIPTION REGION_ID STATE CITY GENDER PERSON_ID NAME TIME_ID LOCATION_ID PERSON_ID QUANTITY PRICE AMOUNT TIME_ID YEAR QUARTER MONTH WEEK DAY 25 rows 125 rows 3 000 000 rows 1827 rows Daily aggregates by store, person and product PRODUCT_ID PRODUCT PRODUCT_ID DESCRIPTION PROD_TYPE_ID BRAND COLOUR SIZE PACKAGE 3000 rows
SOURCE: adapted from Rob & Coronel, Figure 13.17
Figure 3–4: Operational vs. multidimensional data
INVOICE_HEADER
INV_NUM INV_DATE CUST_NO
INVOICE_HEADER
INV_NUM PROD_NUM LINE_PRICE LINE_QTY
Customer Dimension Time Dimension Totals Totals 2034 2035 2036 2037 2038 15-May-2007 15-May-2007 16-May-2007 16-May-2007 16-May-2007 12345 82739 12345 82739 35348 (Circuit Central) (Small Bytes) (Circuit Central) (Small Bytes) (Computer House) 2034 2034 2035 2036 2037 2037 2037 2038
(Microsoft Wireless Mouse)
(Microsoft Wireless Mouse) (Microsoft Wireless Mouse) (SanDisk USB Flash Drive, 8 GB) (Seagate Hard Drive, 2 TB)
(Seagate Hard Drive, 2 TB) (D-Link 4-Port Ethernet Switch) (Creative Speaker System) M34661 M34661 M34661 D99280 D44346 D44346 C74316 S64371 $50.00 $50.00 $50.00 $30.00 $165.00 $165.00 $35.00 $60.00 20 30 10 10 6 10 5 8 Circuit Central Small Bytes Computer House 15-May-2007 16-May-2007 $1,300.00 $990.00 $1,500.00 $2,325.00 $480.00 $2,290.00 $4,305.00 $480.00 $2,800.00 $3,315.00 $6,595.00
Sales are located at the intersection of a customer row and a time column
Aggregrations are calculated for both dimensions
Two-dimensional View of Sales Operational View of Sales
SOURCE: adapted from Coronel, Morris & Rob, Figure 13.5