
Chapter 5. Warehousing, Data Acquisition, Data Visualization


Academic year: 2021


(1)

Turban, Aronson, and Liang

Decision Support Systems and Intelligent Systems,

Seventh Edition

Chapter 5

Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

(2)

Learning Objectives

• Describe the issues in management of data.

• Understand the concepts and use of DBMS.

• Learn about data warehousing and data marts.

• Explain business intelligence/business analytics.

• Examine how decision making can be improved through data manipulation and analytics.

• Understand the interaction between the Web and database technologies.

• Explain how database technologies are used in business analytics.

• Understand the impact of the Web on business intelligence and analytics.

(3)

Information Sharing: A Principal Component of the National Strategy for Homeland Security (Vignette)

• Network of systems that provide knowledge integration and distribution

• Horizontal and vertical information sharing

• Improved communications

• Mining of data stored in Web-enabled warehouse

(4)

Data

• Raw data collected manually or by instruments

• Quality is critical

– Quality determines usefulness

• Contextual data quality

• Intrinsic data quality

• Accessibility data quality

• Representation data quality

– Often neglected or casually handled

– Problems exposed when data is summarized

(5)

Data Management


(6)

Data, Information, Knowledge

• Data

– Items that are the most elementary descriptions of things, events, activities, and transactions

– May be internal or external

• Information

– Organized data that has meaning and value

• Knowledge

– Processed data or information that conveys understanding or learning applicable to a problem or activity

(7)
(8)

Data Problems

• Data quality

• Data integrity

• Data access

• Data integration

– XML

– Integration software

(9)

Data Quality

• Sources of data quality problems

– Data entry by employees

– Changes to source systems

– Data migration / conversion

– Mixed expectations by users

– External data

– System errors

(10)

Data Quality Problems (Categories)

• Contextual (content-related): relevancy

• Intrinsic (inherent): accuracy

• Accessibility (access): access security

• Representation (outward form): ease of understanding

(11)

Data Integrity

• Uniformity: data are within specified limits

• Version: format of the original data is preserved

• Completeness check: summaries are correct

• Conformity check: correlation

• Drill down: tracing back to the source
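
The integrity checks listed above can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the records, the field names (`region`, `amount`), the valid range, and the three function names are hypothetical and do not come from the slides.

```python
# Hypothetical detail records; field names and limits are illustrative assumptions.
records = [
    {"id": 1, "region": "North", "amount": 120.0},
    {"id": 2, "region": "South", "amount": 80.0},
    {"id": 3, "region": "North", "amount": -5.0},  # outside the assumed limits
]

def uniformity_check(rows, low=0.0, high=10_000.0):
    """Return the ids of rows whose amount falls outside [low, high]."""
    return [r["id"] for r in rows if not (low <= r["amount"] <= high)]

def completeness_check(rows, summary_total):
    """Return True when a stored summary equals the total recomputed from detail rows."""
    return abs(sum(r["amount"] for r in rows) - summary_total) < 1e-9

def drill_down(rows, region):
    """Trace a regional summary back to the detail rows that produced it."""
    return [r for r in rows if r["region"] == region]

out_of_range = uniformity_check(records)
summary_ok = completeness_check(records, summary_total=195.0)
north_rows = drill_down(records, "North")
```

Here the uniformity check flags record 3, the completeness check confirms that the stored summary matches the detail rows, and the drill-down returns the two North records behind the regional figure.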

(12)

Data

• Cleanse data

– When populating warehouse

– Data quality action plan

– Best practices for data quality

– Measure results

• Data integrity issues

– Uniformity

– Version

– Completeness check

– Conformity check

– Genealogy or drill-down

(13)

Data Access

• Data integration

• Access needed to multiple sources

– Often enterprise-wide

– Disparate and heterogeneous databases

– XML becoming language standard

(14)

External Data Sources

• Web

– Intelligent agents

– Document management systems

– Content management systems

• Commercial databases

– Sell access to specialized databases

(15)
(16)

Data Store & Data Warehouse

Characteristics

(17)

Database Management Systems

• Software program

• Supplements operating system

• Manages data

• Queries data and generates reports

• Data security

• Combines with modeling language for construction of DSS

(18)

Database Models

• Hierarchical

– Top down, like inverted tree

– Fields have only one “parent”, each “parent” can have multiple “children”

– Fast

• Network

– Relationships created through linked lists, using pointers

– “Children” can have multiple “parents”

– Greater flexibility, substantial overhead

• Relational

– Flat, two-dimensional tables with multiple access queries

– Examines relations between multiple tables

– Flexible, quick, and extendable with data independence

• Object oriented

– Data analyzed at conceptual level

– Inheritance, abstraction, encapsulation
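
To make the relational model concrete, the sketch below relates two flat tables through a shared key, using Python's standard `sqlite3` module. The `customer` and `orders` tables and their contents are hypothetical examples, not taken from the slides.

```python
import sqlite3

# In-memory database with two flat tables linked by customer_id (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# The join examines the relation between the two tables; GROUP BY summarizes per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()
```

The query never refers to how the rows are physically stored, which is the data independence the relational bullet mentions.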

(19)
(20)

Database Models, continued

• Multimedia based

– Multiple data formats

• JPEG, GIF, bitmap, PNG, sound, video, virtual reality

– Requires specific hardware for full feature availability

• Document based

– Document storage and management

• Intelligent

– Intelligent agents and ANN

• Inference engines (Oracle query optimizer)

(21)
(22)

Data Warehouse

• Subject oriented

• Scrubbed so that data from heterogeneous sources are standardized

• Time series; no current status

• Nonvolatile

– Read only

• Summarized

• Not normalized; may be redundant

• Data from both internal and external sources is present

• Metadata included

– Data about data

• Business metadata

• Semantic metadata

(23)
(24)
(25)

Architecture

• May have one or more tiers

– Determined by warehouse, data acquisition (back end), and client (front end)

• One tier, where all run on same platform, is rare

• Two tier usually combines DSS engine (client) with warehouse

– More economical

(26)
(27)
(28)

Migrating Data

• Business rules

– Stored in metadata repository

– Applied to data warehouse centrally

• Data extracted from all relevant sources

– Loaded through data-transformation tools or programs

– Separate operation and decision support environments

• Correct problems in quality before data is stored

– Cleanse and organize in consistent manner
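
The migration steps above (extract from several sources, apply centrally held business rules, cleanse before storing) can be sketched as a tiny extract-transform-load routine. This is an assumption-laden illustration: the `RULES` dictionary stands in for a metadata repository, and the sources, field names, and country mappings are all hypothetical.

```python
# Hypothetical business rules kept in one place, standing in for a metadata repository.
COUNTRY_CODES = {"usa": "US", "u.s.": "US", "us": "US"}
RULES = {
    "country": lambda v: COUNTRY_CODES.get(v.strip().lower(), v.strip().upper()),
    "revenue": lambda v: round(float(v), 2),  # raises ValueError on non-numeric text
}

def extract():
    """Stand-in for pulling rows from two operational sources."""
    source_a = [{"country": " usa ", "revenue": "100.456"}]
    source_b = [{"country": "U.S.", "revenue": "n/a"},
                {"country": "de", "revenue": "50"}]
    return source_a + source_b

def transform(rows):
    """Apply every rule to every row; rows that fail a rule are set aside, not loaded."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append({field: RULES[field](value) for field, value in row.items()})
        except ValueError:
            rejected.append(row)
    return clean, rejected

warehouse = []  # stand-in for the warehouse table
clean_rows, rejected_rows = transform(extract())
warehouse.extend(clean_rows)  # only cleansed rows reach the warehouse
```

Keeping the rules in one structure means every source is standardized the same way, and rows with quality problems are caught before they are stored.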

(29)
(30)

Data Warehouse Design

• Dimensional modeling

– Retrieval based

– Implemented by star schema

• Central fact table

• Dimension tables

• Grain

– Highest level of detail

– Drill-down analysis
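
A star schema can be sketched with Python's standard `sqlite3` module: one central fact table whose rows point at dimension tables. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the data are hypothetical; the grain assumed here is one fact row per product per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Central fact table: one row per product per month (the assumed grain).
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Books', 'Atlas'), (2, 'Books', 'Novel'), (3, 'Games', 'Chess');
    INSERT INTO dim_date VALUES (1, 2021, 1), (2, 2021, 2);
    INSERT INTO fact_sales VALUES (1, 1, 5), (2, 1, 3), (2, 2, 4), (3, 2, 7);
""")

# Summary level: units per product category.
by_category = conn.execute("""
    SELECT p.category, SUM(f.units)
    FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Drill-down: the same measure, one level more detailed, for a single category.
books_by_product = conn.execute("""
    SELECT p.name, SUM(f.units)
    FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id
    WHERE p.category = 'Books'
    GROUP BY p.name ORDER BY p.name
""").fetchall()
conn.close()
```

The drill-down can go no deeper than the grain of the fact table, which is why the grain is fixed at design time.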

(31)

Data Warehouse Development

• Data warehouse implementation techniques

– Top down

– Bottom up

– Hybrid

– Federated

• Projects may be data centric or application centric

• Implementation factors

– Organizational issues

– Project issues

– Technical issues

• Scalable

(32)

Data Marts

• Dependent

– Created from warehouse

– Replicated

• Functional subset of warehouse

• Independent

– Scaled down, less expensive version of data warehouse

– Designed for a department or SBU

– Organization may have multiple data marts

• Difficult to integrate

(33)
(34)
(35)
(36)
(37)

Business Intelligence and Analytics

• Business intelligence

– Acquisition of data and information for use in decision-making activities

• Business analytics

– Models and solution methods

• Data mining

– Applying models and methods to data to identify patterns and trends

(38)

OLAP

• Activities performed by end users in online systems

– Specific, open-ended query generation

• SQL

– Ad hoc reports

– Statistical analysis

– Building DSS applications

• Modeling and visualization capabilities

• Special class of tools

– DSS/BI/BA front ends

– Data access front ends

– Database front ends

– Visual information access systems

(39)

Data Mining

• Organizes and employs information and knowledge from databases

• Statistical, mathematical, artificial intelligence, and machine-learning techniques

• Automatic and fast

• Tools look for patterns

– Simple models

– Intermediate models

(40)

Data Mining

• Data mining application classes of problems

– Classification

– Clustering

– Association

– Sequencing

– Regression

– Forecasting

– Others

• Hypothesis or discovery driven

• Iterative

• Scalable
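
As one illustration of the problem classes listed above, the association class can be sketched in a few lines of Python using the standard support and confidence measures. The market baskets and the function names are hypothetical and are not taken from the slides.

```python
# Hypothetical market baskets; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
```

With these baskets, bread and milk appear together in half of the transactions, and two thirds of the baskets containing bread also contain milk.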

(41)

Tools and Techniques

• Data mining

– Statistical methods

– Decision trees

– Case based reasoning

– Neural computing

– Intelligent agents

– Genetic algorithms

• Text mining

– Hidden content

– Group by themes

(42)

Data Mining Applications

• Marketing

• Banking

• Retail and sales

• Manufacturing / production

• Airline

• Health

• …

(43)

Knowledge Discovery in Databases (KDD)

• Data mining used to find patterns in data

– Identification of data

– Preprocessing

– Transformation to common format

– Data mining through algorithms

– Evaluation

(44)

Data Visualization

• Technologies supporting visualization and interpretation

– Digital imaging, GIS, GUI, tables, multidimensions, graphs, VR, 3D, animation

– Identify relationships and trends

• Data manipulation allows real time look at performance data

(45)

Multidimensionality

• Data organized according to business standards, not analysts

• Conceptual

• Factors

– Dimensions

– Measures

– Time

• Significant overhead and storage

• Expensive
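
The relationship between dimensions and measures can be sketched with a small roll-up in plain Python. This is a minimal illustration only: the dimension names (`region`, `product`, `quarter`), the sales measure, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Each cell: three dimension values (region, product, quarter) and one measure (sales).
DIMENSIONS = ("region", "product", "quarter")
cells = [
    ("East", "Widget", "Q1", 100),
    ("East", "Widget", "Q2", 150),
    ("East", "Gadget", "Q1", 70),
    ("West", "Widget", "Q1", 90),
]

def roll_up(rows, keep):
    """Sum the measure over every dimension not named in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)

by_region = roll_up(cells, ["region"])
by_region_quarter = roll_up(cells, ["region", "quarter"])
```

Every combination of dimensions yields its own summary, which suggests why precomputing and storing all of them carries the overhead and storage cost noted above.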

(46)

Analytic Systems

• Real-time queries and analysis

• Real-time decision making

• Real-time data warehouses updated daily or more frequently

– Updates may be made while queries are active

– Not all data updated continuously

• Deployment of business analytic applications

(47)

GIS

• Computerized system for managing and manipulating data with digitized maps

– Geographically oriented

– Geographic spreadsheet for models

– Software allows web access to maps

– Used for modeling and simulations

(48)
(49)

Web Analytics/Intelligence

• Web analytics

– Application of business analytics to Web sites

• Web intelligence

– Application of business intelligence techniques to Web sites

(50)

Case Study

• A university DSS for budgeting

• A university DSS for location analysis

• A university DSS for classroom assignment

• How data warehouse / data mining will help university staff
