• No results found

Cloud Computing. With MySQL and Pentaho Data Integration. Matt Casters Chief Data Integration at Pentaho Kettle project founder

N/A
N/A
Protected

Academic year: 2021

Share "Cloud Computing. With MySQL and Pentaho Data Integration. Matt Casters Chief Data Integration at Pentaho Kettle project founder"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

Cloud Computing

With MySQL and Pentaho Data Integration

Matt Casters

Chief Data Integration at Pentaho

Kettle project founder

(2)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Agenda

Introduction to Kettle

Introduction

Use-cases + load demo

Performance / Scalability Kettle Slave Servers

What & How

Kettle Cluster Schemas

What & How

The Cloud

Cloud Examples Q & A

(3)

Pentaho Data Integration - Kettle

PDI is the product associated with the KETTLE open source project

KETTLE provides open source software PDI is a “whole product”

Member of the Pentaho BI Suite Kettle = PDI CE

PDI EE

Management Services Console Knowledge Base Documentation Portal Support License Indemnification ...

(4)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Introduction – Kettle : Kettle

K

ettle

E

xtraction

T

ransportation

T

ransformation

L

oading

E

nvironment

(5)

Introduction - Kettle : Extraction

Extract data from :

35+ database types

MySQL, PostgreSQL, SQLite, ... Oracle, SQL Server, etc

Text files XML files XLS files

Xbase files (dBase, Foxpro, etc) File systems information

Generated data MS Access files LDAP

Geo-data ...

(6)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Introduction - Kettle : Transportation

Transportation of data

Engine based data transfer (no code generator)

Very flexible pathways:

splitting

partitioning

merging

joining

duplicating

clustering (MPP)

(7)

Introduction - Kettle : Transformation

Flexibly transform data

Looking up data databases files memory... Calculating Scripting JavaScript, SQL, RegExp Splitting Mapping Selecting Filtering Pivotting ...

(8)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Introduction - Kettle : Loading

Load data into a target format

Database loads

Data warehouse population Partitioned loading

Bulk loading Parallel loading Clustering

(9)

Introduction - Kettle : Environment

Full GUI called “Spoon” to edit every option in Kettle

Drag & Drop Debugger Rich GUI

Command line tools

execute jobs

execute transformations

Web server

clustering

remote execution

Programming API for Java Plugin eco-system

(10)

© 2006-2007 Pentaho Corporation. All Rights Reserved

(11)
(12)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Introduction – Kettle : User community

Paying Pentaho customers Large and small corporations

All possible sectors

Lone rangers & Hobbiests All regions on Earth

Meet on our Forum : +30,000 posts in 3 years Use our JIRA case tracking systems

Download more than 10,000 copies of Kettle per month

http://www.ohloh.net/projects/3624?p=Kettle

(13)

Typical use-cases

Load data from text files and store it into a database [demo]

Export data from database to text-file or more other databases Data migration between database applications

Exploration of data in existing databases (tables, views, etc.) Information improvement using lookups

Data cleaning

Application integration

Data warehouse population Application integration

Report data generation ...

(14)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Agenda

Introduction to Kettle

Introduction

Use-cases + load demo

Performance / Scalability

Kettle Slave Servers

What & How

Kettle Cluster Schemas

What & How

The Cloud

Cloud Examples Q & A

(15)

Party time!!!!

You're preparing for a big event You bought plenty of food:

500 hot-dogs 500 burgers

Distaster strikes: invitees are not showing up!

(16)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Call Joey Chestnut!!

Fastest eater in the world ** Local boy from Vallejo

103 burgers in 8 minutes 66 hot dogs in 12 minutes

(17)

Problems with the chosen solution

You would need at least 10 Joey Chestnuts to get the “job” done Getting Joeys to show up costs money $

(18)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Performance / Scalability

(19)

Performance / Scalability

MySQL bulk loading limitations...

Memory backed B-tree grows and need to swap out to disk Kills performance

MySQL Performance Blog (Percona)

Predicting how long data load would take

(20)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Performance / Scalability

Limits

Single threaded limits Multi-threaded limits The weakest link

Solutions? Optimizing Tweaking Prodding Removing bottlenecks ... Scaling out

Scaling out with Pentaho Data Integration

Clustering Partitioning

(21)

Performance / Scalability

The “Ideal” architecture has linear scalability

N

(22)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Scalability

The answer from PDI to this challenge is multi-layered:

Clustering : multiple servers running in parallel Partitioning : directing data

Database sharding : scaling the database

Sales table Year 2003 Partition Year 2004 Partition Year 2005 Partition Year 2006 Partition Sales table 2003 2004 2005 2006 Sales 2003 2004 2005 2006 Sales 2003 2004 2005 2006 Sales 2003 2004 2005 2006 Sales DB1 DB2 DB3 DB4

(23)

Agenda

Introduction to Kettle

Introduction

Use-cases + load demo

Scalability

Kettle Slave Servers

What & How

Kettle Cluster Schemas

What & How

The Cloud

Cloud Examples Q & A

(24)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Slave Servers

Building block of the PDI clustering offering Small embedded webserver (Jetty)

Controlled over HTTP

Spits out XML or HTTP [demo]

Easy to start / configure / use Available HTTP services:

Start / Stop transformation or Job Pauze transformation

Adding (posting) transformation or job

Get status of server, transformation or job Cleanup of transformation

Allocate a socket port Register slave

(25)

Slave Server Configuration

Simple configuration : Hostname & HTTP Control port

sh carte.sh localhost 8080

XML configuration:

Optionally look at network interface to grab address Optionally report to a central server

(26)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Agenda

Introduction to Kettle

Introduction

Use-cases + load demo

Scalability

Kettle Slave Servers

What & How

Kettle Cluster Schemas

What & How

The Cloud

Cloud Examples Q & A

(27)

Cluster

Clustering Schema

Consists of a collection of one or more slave servers Is made up of

Master : at least one per cluster Slaves : parallel worker nodes

Simple local sample [demo]

Slave Slave Master Slave Slave Slave Slave

(28)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Agenda

Introduction to Kettle

Introduction

Use-cases + load demo

Scalability

Kettle Slave Servers

What & How

Kettle Cluster Schemas

What & How

The Cloud

Cloud Examples Q & A

(29)

The Cloud

Wikipedia:

“Cloud computing is a style of computing in

which dynamically scalable and often

virtualised resources are provided as a

service over the Internet”

(30)

© 2006-2007 Pentaho Corporation. All Rights Reserved

The Cloud : types

Infrastructure as a Service (IaaS) : Amazon EC2,

Eucalyptus, GoGrid, Nymbus, ...

Platform as a Service (PaaS) : Amazon Web Services,

AppJet (ex Google guys), Azure Services Platform (MS),

Force.com (SalesForce), ...

Software as a Service (SaaS) : SalesForce, Google,

(31)

The Cloud : white paper

Bayon Technologies (Pentaho Partner)

http://www.bayontechnologies.com

(32)

© 2006-2007 Pentaho Corporation. All Rights Reserved

The Cloud : Demo time

Start a dynamic cluster on EC2

Job:

Execute DDL on all slave servers

Design a first transformation :

Load a file on the cloud in parallel in 10 MySQL

dabases in 10 tables, split the rows

(33)

The Kettle Cloud : links

My blog : http://www.ibridge.be

(34)

© 2006-2007 Pentaho Corporation. All Rights Reserved

Q & A

Our homepage: http://kettle.pentaho.org

Our Forum: http://forums.pentaho.org/forumdisplay.php?f=69

Our case tracker: http://jira.pentaho.org/browse/PDI

Our wiki : http://wiki.pentaho.org/

http://wiki.pentaho.com/display/EAI

Our IRC Channel: ##pentaho (on Freenode) Developers mailing list:

http://groups.google.com/group/kettle-developers

My humble blog: http://www.ibridge.be

References

Related documents

Given the elevated exposure of miners and residents in artisanal mining communities to Hg vapor and MeHg accumulated in fish, the aim of this study is to quantitatively as sess the

If the water utility provider cannot determine if there are lead service lines, or the home owner uses their own private water system, the home owner can contact a local public

Here you will learn how these cranes are able to lift thousands of pounds using hydraulics, and we ll climb into the cab to show you just how these machines are operated.... It s

heel and trim 5° 2° Slewing angle endless Dead weight 24800 kg Operating pressure 300 bar Recommended pump capacity 2x250 l/min.. Indicated loads in

Code 2 (atypical) if symptoms were not typical but there was (a) one or more of atypical pain, acute left ventricular failure, shock, and syncope and (b) the absence of cardiac

To answer this ques- tion, we consider the following realistic policy instruments: a tax-transfer schedule with parame- ters λ and τ as in equation ( 5 ); uncapped social

The SAT-MDG approach including boolean encoding of directed formula, CNF conversion with correctness formula generation and using SAT solver for decision took about 0.0422 seconds

“Mersault's revolt consummates a series of rejections : of resignation in the face of inevitability of death, of acceptance of the meaninglessness of a life