• No results found

Database Trends

In document Database Fundamentals Handout (Page 106-113)

Learning Objectives

At the end of this Chapter you will be familiar with terms such as:

??OLAP

??Data warehousing

??Data mining

Overview

Most database applications are characterized by a large number of transactions, each transaction making only small changes to operational data (On-Line Transaction Processing OLTP).

Organizational decision-making requires comprehensive view of all aspects of an enterprise. Therefore many organizations have consolidated data warehouses containing data drawn from several OLTP systems managed by different business units, together with history and summary Information.

Trend of building data warehouses is complemented by an increased emphasis on powerful analysis tools:

a. On-Line Analytical Processing (OLAP) dominated by stylized queries that typically involves group-by and aggregate operators.

b. Exploratory Data Analysis or Data Mining users look for interesting patterns in the data, queries are typically difficult to formulate, amount of data is to large to permit manual or even traditional statistical analysis.

Important for industry, data warehousing is a big business:

??$3.5 billion in 1997,over $8 billion in 1998

??Wal Mart maintains largest data warehouse,> 6Tb data, several hundred Mb added per day.

Data Warehousing

©Copyright 2004, Cognizant Academy, All Rights Reserved

107

DW stores collection of diverse data

??“Solution ”to the data integration problem

??Single repository of all business related information

Highly subject-oriented

??Data collection occurs by subject, not by application

??Collected data is used for OLAP and/or data mining

Typical workloads involve ad hoc, fairly complex read-only queries;

??Optimized differently from OLTP system

User interfaces are aimed at executives and decision makers DW contains large volume of data (Gb,Tb)

Information is non-volatile,i.e.,

??Contents are stable for a long period of time

©Copyright 2004, Cognizant Academy, All Rights Reserved

108

??Often updates are append only

Time variant kept: history (set of snapshots) and time attributes are essential

Creating and Maintaining a Data Warehouse

Identify warehouse data and source data needed.

??Grocery store chain: cashier sales DB, inventory DB, promotion history

??Insurance Company: policy information,claims processing DB

Creating a DW essentially is a data integration problem

??Data extracted from operational databases is cleaned (minimize errors, fill missing information), and transformed to reconcile semantic mismatches.

??Transforming is accomplished by defining a view over the tables in the data sources. But unlike a standard view, the view is stored (materialized in the DW.

Data are loaded into the DW. Additional processing such as sorting, generating summary of data, data partitioning, and building indexes is performed. Because of the huge amount of data, loading can be a slow process (_ parallelism)

Loaded data is periodically refreshed to reflect updates on the data sources.

The problem of efficiently refreshing warehouse tables (which are materialized views over tables in the sources databases) is an important research topic.

(_ Incremental maintenance of materialized views)

Periodically purge data from the data warehouse that is too old (perhaps onto archival media).

System catalogs associated with DW are used to keep track of data currently stored in DW; can be very large; typically managed by a separate database called metadata repository.

While there have been attempts to make a relational database do online analytical processing (termed ROLAP), the fact is that the relational and OLAP engines are quite distinct in how they store and access data. Most often, a data warehouse uses a relational engine for its data management. The reasons for this are many, but the central driving force is that relational engines have the maturity and capability to handle the heavy load and storage requirements of very large data warehouses.

©Copyright 2004, Cognizant Academy, All Rights Reserved

109

The OLAP engines differ from their relational cousins in that they use a different object foundation. While the two-dimensional table is the main logical storage structure for the relational database, the OLAP engine uses (primarily) a three-dimensional cube structure with the third dimension most often being time. While OLAP databases can handle fairly large databases, their capabilities do not match that of relational engines when it comes to managing hundreds of gigabytes of data. They do, however, make excellent data mart candidates.

OLAP – Online Analytical Processing

OLAP applications are dominated by ad hoc, complex queries (involving group by and aggregation operators)

Typical way to think about OLAP queries is in terms of a multidimensional model of data

Multidimensional Data Model

Focus is on collection of numeric measures each measure depends on a set of dimensions

Example: A multidimensional dataset for Sales

Underlying star schema Store (store id, name, address) Product (prod id, name, category) Date (time id, day, month, year)

©Copyright 2004, Cognizant Academy, All Rights Reserved

110

View of data as multidimensional array is readily generalized to more than three dimensions.

Every company conducting business inputs valuable information into transactional- oriented data stores. The distinguishing traits of these online transaction processing (OLTP) databases are that they handle very detailed, day- to-day segments of data, are very write-intensive by nature and are designed to maximize data input and throughput while minimizing data contention and resource-intensive data lookups.

By contrast, a data warehouse is constructed to manage aggregated, historical data records, is very read- intensive by nature and is oriented to maximize data output. Usually, a data warehouse is fed a daily diet of detailed business data in overnight batch loads with the intricate daily transactions being aggregated into more historical and analytically formatted database objects. Naturally, since a data warehouse is a collection of business entity’s historical information, it tends to be much larger in terms of size than its OLTP counterpart.

Data Mining

Data Mining (DM) seeks to discover knowledge automatically, in the form of statistical rules and patterns or trends from (very) large databases. Can be combined with OLAP.

It differs from machine learning in that it deals with large amounts of data, stored primarily on disk (rather than in main memory).

Knowledge discovered from a database can be represented by a set of rules. Such rules can be discovered using one of two methods:

??User is involved directly in the process of knowledge discovery

??The DM system is responsible for automatically discovering knowledge from the database, by detecting patterns and correlations in the data.

SUMMARY

??Data warehousing is collection of data under a unified schema at a single site in advance of queries

While a solid foundational design is extremely important to a data warehouse, it ranks second to clean, reliable data. Perhaps the most difficult task in initially creating, and maintaining, a data warehouse is ensuring the validity of the information stored within the database itself. The collecting and cleansing of data from many disparate enterprise systems is not an easy task, and a data warehouse project usually fails because the data cannot be validated and relied upon by the decision-makers using the warehouse itself.

©Copyright 2004, Cognizant Academy, All Rights Reserved

111

??OLAP engine usually uses (primarily) a three-dimensional cube structure with the third

dimension most often being time.

Test your Understanding

1. How is a data warehouse different from a normal database? 2. How do data warehousing and OLAP differ?

©Copyright 2004, Cognizant Academy, All Rights Reserved

112

REFERENCES

WEBSITES Internet sites: http://www.datawarehouse.com/home http://utenti.lycos.it/yanorel6/2/ch43.htm http://www.dhruvraj.com BOOKS

??Case*Method: Entity Relationship Modeling - Richard Barker

??Data & Databases – Joe Celko

??An Introduction to Database Systems – C. J. DateBook

??The Data Modeling Handbook - Reingruber and Gregory Book

??Data Modeling for Information Professionals – Bob SchmidtBook

??Data Model Patterns – David C. Hay, Richard Barker

PRESENTATION

©Copyright 2004, Cognizant Academy, All Rights Reserved

113

STUDENT NOTES:

In document Database Fundamentals Handout (Page 106-113)

Related documents