• No results found

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

N/A
N/A
Protected

Academic year: 2021

Share "Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

TDWI 1

1

Emerging Technologies

Emerging Technologies

Shaping the Future of

Shaping the Future of

Data Warehouses & Business Intelligence

Data Warehouses & Business Intelligence

John O’Brien President and Executive Architect Zukeran Technologies

Appliances and DW Architectures

(2)

TDWI 2

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 2

Agenda

What is an “appliance”?

Data warehouse appliances

Current BI appliances

Data Warehouse Architectures

(3)

TDWI 3

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 3

What is an “appliance”?

How do you make toast and waffles?

Appliance-oriented General Purpose

(4)

TDWI 4

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 4

An appliance is born…

• Horizontal versus Vertical Layers

• When does it flip?

Computing Resources (CPU, memory)

Storage Resources (Persistence)

Operating System (Windows, Unix, Linux, etc) Applications (Applications, Operational, BI, DSS)

Networking Resources Applications (Databases)

Compatibility Purpose Specific Appliance

Purpose

(5)

TDWI 5

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 5

Characteristics of a good appliance

1. Personalization vs. Configuration

– “How dark do you like your toast?”

2. Few or no settings, options or configuration 3. Clear specific purpose

– Designed and architected for a specific purpose

4. Not general purpose 5. Easy of use

– “Just plug it in and it works…”

6. Compatible with existing infrastructure

(6)

TDWI 6

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 6

Proven appliances

• Google search appliances

• Symantec VPN appliance

• Decru Security appliance

• Barracuda Email appliance

– block viruses, spam, phishing attacks and other mail-borne security threats.

(7)

TDWI 7

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 7

Pre-configured vs. Appliance

• Appliances are not a “pre-configuration” or assembly of tuned products and technologies

• An appliance is builtfor a specific purpose and can rely on pre existing products or commodity parts

• Appliances have no user serviceable parts inside

Test: Does it matter what’s inside the appliance?

• IBM Balanced Configuration Unit (BCU) is not an appliance but the benefits come from standardized preconfigured and pre- tested units.

(8)

TDWI 8

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 8

IBM Balance Configuration Unit

Pre-configured stack of IBM Technologies for Business Intelligence workloads (IBM DB2 Universal Database, IBM eServer p5 575, IBM TotalStorage DS4500)

IBM website:

"With the BCU, IBM has "one-upped" the data warehouse appliance makers..."

All the appliance advantages of price, performance and simplicity PLUS:

•Fully functional top-of-the-line DB2 data server enhanced for data warehousing, the IBM DB2 Data Warehouse Edition

•Built completely on industry-standard components and open systems

•Designed for modular scalability

• The IBM Data Warehousing Balanced Configuration Unit (BCU) reduces the complexity, cost and risk of designing, implementing, growing and maintaining a data warehouse and BI infrastructure.

A Balanced Configuration Unit (BCU) is composed of software and hardware that IBM has integrated and tested as a pre-configured building block for data

warehousing systems. A single BCU contains a balanced amount of disk,

processing power and memory to optimize cost-effectiveness and throughput. IT departments can use BCUs to reduce design time, shorten deployments and maintain strong price/performance ratio as they add building blocks to enlarge their BI systems.

(9)

TDWI 9

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 9

Future data warehouse appliances?

Will we see these specific appliances in the future?

Do they meet the criteria for success?

ETL or ELT appliance ?

BI appliance ?

EII appliance ?

OLAP appliance ?

Audit appliance ?

(10)

TDWI 10

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 10

Appliances shaping the future of DW

Diagram and Notes from:

Inmon Data Systems http://www.inmoncif.com

Operational Systems are the internal and external core systems that support the day-to-day business operations. They are accessed through application program interfaces (APIs) and are the source of data for the data warehouse and operational data store. (Encompasses all operational systems including ERP, relational and legacy.)

Data Acquisition is the set of processes that capture, integrate, trans-form, cleanse, reengineer and load source data into the data warehouse and operational data store. Data reengineering is the process of investigating, standardizing and providing clean consolidated data.

The Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used to support the strategic decision-making process for the enterprise. It is the central point of data integration for business intelligence and is the source of data for the data marts, delivering a common view of enterprise data.

Primary Storage Management consists of the processes that manage data within and across the data warehouse and operational data store. It includes processes for backup and recovery, partitioning, summarization, aggregation, and archival and retrieval of data to and from alternative storage.

Alternative Storage is the set of devices used to cost-effectively store data warehouse and exploration warehouse data that is needed but not frequently accessed. These devices are less expensive than disks and still provide adequate performance when the data is needed.

Data Delivery is the set of processes that enable end users and their supporting IS group to build and manage views of the data warehouse within their data marts. It involves a three-step process consisting of filtering, formatting and delivering data from the data warehouse to the data marts.

The Data Mart is customized and/or summarized data derived from the data warehouse and tailored to support the specific analytical requirements of a business unit or function. It utilizes a common enterprise view of strategic data and provides business units more flexibility, control and responsibility. The data mart may or may not be on the same server or location as the data warehouse.

The Operational Data Store (ODS) is a subject-oriented, integrated, current, volatile collection of data used to support the tactical decision-making process for the enterprise. It is the central point of data integration for business management, delivering a common view of enterprise data.

Meta Data Management is the process for managing information needed to promote data legibility, use and administration.

Contents are described in terms of data about data, activity and knowledge.

The Exploration Warehouse is a DSS architectural structure whose purpose is to provide a safe haven for exploratory and ad hoc processing. An exploration warehouse utilizes data compression to provide fast response times with the ability to access the entire database.

The Data Mining Warehouse is an environment created so analysts may test their hypotheses, assertions and assumptions developed in the exploration warehouse. Specialized data mining tools containing intelligent agents are used to perform these tasks.

Activities are the events captured by the enterprise legacy and/or ERP systems as well as external transactions such as Internet interactions.

Statistical Applications are set up to perform complex, difficult statistical analyses such as exception, means, average and pattern analyses. The data warehouse is the source of data for these analyses. These applications analyze massive amounts of detailed data and require a reasonably performing environment.

Analytic Applications are pre-designed, ready-to-install, decision sup-port applications. They generally require some customization to fit the specific requirements of the enterprise. The source of data is the data warehouse. Examples of these applications are risk analysis, database marketing (CRM) analyses, vertical industry "data marts in a box," etc.

External Data is any data outside the normal data collected through an enterprise's internal applications. There can be any number of sources of external data such as demographic, credit, competitor and financial information. Generally, external data is purchased by the enterprise from a vendor of such information.

(11)

TDWI 11

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 11

TDWI Exhibitors

(12)

TDWI 12

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 12

BI Specific Appliance

or

Analytics Appliance

(13)

TDWI 13

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 13

How do you make a BI specific appliance?

Optimized for BI workloads

Store lots of data

Perform analytical queries

Combination of Server + Database + Storage

Ease of use

Low maintenance and TCO

Integrates with existing infrastructure

BI tools with ANSI SQL

Ability to load data easily

Compatible with standard operating procedures

(14)

TDWI 14

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 14

BI appliances are here…

 High Performance & Price Performance

 On-Demand

 TCO and DW Budgets

 Scalability to petabytes

 Backup and Recovery

 Fast Loading and Unloading

 Licensing model

 Operations and Administration

(15)

TDWI 15

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 15

BI Appliance: High Performance

• Massive Parallelism inherent to the architecture

• Improved I/O throughput rates with effective I/O

Host Database Ethernet

ODBC JDBC

CPU CPU CPU CPU

CPU

Backplane

(16)

TDWI 16

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 16

BI Appliance: TCO & DW Budgets

• Licensing costs

– Appliance license replaces separate sever and database licenses

• Reduced DBA expertise and administration

– No tablespaces, extents to manage – No archive, redo logs to manage – No partitioning management

• Reduce need for storage expertise

– SAN storage architects – LUNs, meta volumes

(17)

TDWI 17

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 17

BI Appliance: Large Scale Databases

• 2TB to 100TB appliances are available today

How do you grow from 2TB to 100TB with appliances in the DW Architecture?

• Federated Architecture

• Data Redistribution

(18)

TDWI 18

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 18

Scalability: Federated Architecture

• Logical areas of the data warehouse architecture move to new appliances

– data marts, ODS, staging

• Subject areas or conformed dimensions stay

together unless the capability to perform database joins across platforms exist

– EII tool, remote tables

• Logical groups such as North America DW, Euro DW or corporate entities

(19)

TDWI 19

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 19

Scalability: Data Redistribution

• Adding additional appliances or upgrading to larger appliance may cause data to be redistributed

• Newer data can be loaded to the new appliance and allow the older appliance to become historical data.

• However, this is not an effective use of the appliance price-performance with relatively dormant data on high performance storage

• Appliance vendors should be able to evenly

distribute data over more than one appliance and leverage a single host database architecture.

(20)

TDWI 20

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 20

Scalability: Data Redistribution

Host DB Ethernet

ODBC JDBC

CPU CPU CPU Backplane CPU

Host DB

CPU CPU CPU Backplane CPU

Existing BI Appliance

Adding new BI Appliance Second

Host DB not used

(21)

TDWI 21

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 21

Maintaining performance with scalability

Host DB Ethernet

ODBC JDBC

CPU CPU CPU Backplane CPU

Host DB

CPU CPU CPU Backplane CPU

Existing BI Appliance

Adding new BI Appliance

(22)

TDWI 22

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 22

BI Appliance: Backup and Recovery

• Built-in RAID 1+0 mirroring and striping

• The challenge is that large systems take massive amount of computing and network resources to backup changes in a high volume environment

• Recommended to compress and save load files since loads are faster than recoveries

• In a fully mirrored environment, chances of going to need to restore from a backup are low

• Veritas or other standard APIs are available backups

(23)

TDWI 23

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 23

BI Appliance: Fast Load/Unload

• 500GB per hour load rates

• Near physical limitations disk I/O and Ethernet connection to appliance

• Utilizes high performance ODBC drivers and special loader utilities

(24)

TDWI 24

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 24

BI Appliance: Licensing cost

• Appliance license cost

– Annual maintenance cost as % of appliance cost

--Versus ---

• Server manufacturer annual maintenance cost

• Operating System license cost

– Operating System vendor may be different from hardware

• Database license cost

– Per CPU license

– Per Concurrent user or Named user license

• Database options cost

– Increases database maintenance cost as a %

(25)

TDWI 25

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 25

BI Appliance: Operations

• How do they address:

– User activity tracking – Query activity tracking – Explain Plans

– Management Console – Data distribution profiles – Users, Roles, Privileges

(26)

TDWI 26

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 26

What a fast database can do for you

• Before ETL tools

– ETL was hand coded programs

– ETL was code in database procedural languages

• ETL tools offered

– Faster development with 3GL

– Better performance than database code & SQL

• High Performance database appliance

– Faster queries on large data sets –  High Performance SQL

(27)

TDWI 27

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 27

ELT – A paradigm shift

• Transformations typically are performed after

Extraction and before Loading limiting the database workload which is thought to be query intense for data warehouses.

• High Performance BI appliances are being used to transform the data inside the high performance database and the results stored in the appliance or in other databases

– Some companies point to ETL license savings as part of the ROI for appliances

– Sunopsis is a tool being used in these cases

(28)

TDWI 28

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 28

ROLAP versus MOLAP

• Multidimensional database engines were created to make up for the performance deficiencies of relational OLTP databases at that time.

• MOLAP cubes are known for their high performance interactive capabilities, complex calculations and dimensional and

hierarchical analysis but are challenged in:

– Real-time environments – Large datasets

– High degree of dimensional and hierarchical branches

• MOLAP cubes are also a duplication a data in another database format, the multidimensional database.

• ROLAP leverages on the underlying database for performance characteristics

• ROLAP metrics can be atomic level, pre-calculated, pre-rolled up depending on what works best

(29)

TDWI 29

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 29

Oracle 10g RAC/GRID

(30)

TDWI 30

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 30

Appliances don’t do everything

• Data Architecture and Modeling

– High performance databases shouldn’t make up for poor data analysis and modeling efforts

• Good Requirements, Analysis and Reports

– Make sure that the wrong answer doesn’t just come back faster…

(31)

TDWI 31

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 31

What the business wants from DW

• On-Demand

– New products and projects

• Lower Cost

– Overall TCO and long term maintainability

• Flexibility

– Simpler infrastructure to build and go

• New capabilities

– Scalability

– Ability to store more data with less cost

(32)

TDWI 32

Copyright © 2004-2006 Zukeran Technologies Corp., All Rights Reserved 32

Class Discussion

References

Related documents

With a choice of target residues for mutagenesis, one has to consider many factors. These mutations must not disrupt, or largely change, the protein

Enterprise Performance Custom Reporting Packaged Applications Business Intelligence Analytics Data Federation Data Warehousing Custom Data Marts Data Access Data

Among the respondents supporting an enterprise business intelligence platform, 63 percent said a data warehouse was a primary data repository, but data marts, operational data

The significant advantage of using KLT and KT rotary joints is the fact that there is no coolant leakage from the drain port since TESS technology ensures that the seal faces are

Funding: Black Butte Ranch pays full coost of the vanpool and hired VPSI to provide operation and administra- tive support.. VPSI provided (and continues to provide) the

The 28-day compressive strength   f c ' 28 of GGBS- based mortars activated by the combination of sodium silicate and sodium hydroxide was generally.. lower than that of

Checker Plate Cover only Class A Series 600 Galv.. Sewerage

Voice regimes can be direct or representative in nature and can be delivered in a number of ways; via a union, through management led initiatives or as part of some dual channel