NOTES 1.4.3 Techniques used in Data Mining
1.9 OLAP: ON-LINE ANALYTICAL PROCESSING
1.9.1 Introduction:
The term OLAP (On-Line Analytical Processing) was coined by E.F. Codd in 1993 to refer a type of application that allows a user to interactively analyze data. An OLAP system is often contrasted to an OLTP (On-Line Transaction Processing) system that focuses on processing transactions such as orders, invoices or general ledger transactions.
OLAP is now acknowledged as a key technology for successful management in the 90’s. It describes a class of applications that require multidimensional analysis of business data. OLAP systems enable managers and analysts to rapidly and easily examine key performance data and perform powerful comparison and trend analyses, even on very large data volumes. They can be used in a wide variety of business areas, including sales and marketing analysis, financial reporting, quality tracking, profitability analysis, manpower and pricing applications and many others.
OLAP technology is being used in an increasingly wide range of applications. The most common are sales and marketing analysis; financial reporting and consolidation; and budgeting and planning. OLAP is being used for applications such as product profitability and pricing analysis, activity based costing, manpower planning; quality analysis, in fact for any management system that requires a flexible top down view of an organization.
1.9.2 What is OLAP?
On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including:
• calculations and modeling applied across dimensions, through hierarchies and/
or across members
• trend analysis over sequential time periods
NOTES
• slicing subsets for on-screen viewing• drill-down to deeper levels of consolidation
• reach-through to underlying detail data
• rotation to new dimensional comparisons in the viewing area
OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various “what-if” data model scenarios.
This is achieved through use of an OLAP Server.
OLAP allows business users to slice and dice data at will. Normally data in an organization is distributed in multiple data sources and are incompatible with each other. A retail example: Point-of-sales data and sales made via call-center or the Web are stored in different location and formats. It would a time consuming process for an executive to obtain OLAP reports such as - What are the most popular products purchased by customers between the ages 15 to 30?
Part of the OLAP implementation process involves extracting data from the various data repositories and making them compatible. Making data compatible involves ensuring that the meaning of the data in one repository matches all other repositories. An example of incompatible data: Customer ages can be stored as birth date for purchases made over the web and stored as age categories (i.e. between 15 and 30) for in store sales.
It is not always necessary to create a data warehouse for OLAP analysis. Data stored by operational systems, such as point-of-sales, are in types of databases called OLTPs. OLTP, Online Transaction Process, databases do not have any difference from a structural perspective from any other databases. The main difference, and only, difference is the way in which data is stored.
Examples of OLTPs can include ERP, CRM, SCM, Point-of-Sale applications, Call Center.
OLTPs are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transactions to occur instantaneously. With a database design, call data modeling, optimized for transactions the record ‘Consumer name, Address, Telephone, Order Number, Order Name, Price, Payment Method’ is created quickly on the database and the results can be recalled by managers equally quickly if needed.
NOTES
Data Model for OLTP
NOTES
Data are not typically stored for an extended period on OLTPs for storage cost and transaction speed reasons.OLAPs have a different mandate from OLTPs. OLAPs are designed to give an overview analysis of what happened. Hence the data storage (i.e. data modeling) has to be set up differently. The most common method is called the star design.
Star Data Model for OLAP
The central table in an OLAP start data model is called the fact table. The surrounding tables are called the dimensions. Using the above data model, it is possible to build reports that answer questions such as:
• The supervisor that gave the most discounts.
• The quantity shipped on a particular date, month, year or quarter.
• In which zip code did product A sell the most.
To obtain answers, such as the ones above, from a data model OLAP cubes are created. OLAP cubes are not strictly cuboids - it is the name given to the process of linking
NOTES
data from the different dimensions. The cubes can be developed along business units such as sales or marketing. Or a giant cube can be formed with all the dimensions.
OLAP Cube with Time, Customer and Product Dimensions
OLAP can be a valuable and rewarding business tool. Aside from producing reports, OLAP analysis can aid an organization evaluate balanced scorecard targets.
Steps in the OLAP Creation Process
1.9.3 OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located and accessed based on the intersection of the dimension members which define that item. The design of the server and the structure of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. The OLAP Server may either physically stage the processed multi-dimensional information to deliver consistent and rapid response times to end users, or it may populate its data structures in real-time from relational or other databases, or offer a choice of both.
Given the current state of technology and the end user requirement for consistent and rapid response times, staging the multi-dimensional data in the OLAP Server is often the preferred method.