• No results found

Getting Started & Successful with Big Data

N/A
N/A
Protected

Academic year: 2021

Share "Getting Started & Successful with Big Data"

Copied!
27
0
0

Loading.... (view fulltext now)

Full text

(1)

Getting Started & Successful

with Big Data

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

(2)

Your Hosts Today

Paul Brook

Cloud EMEA Program Manager

Dell

2

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Davy Nys

VP EMEA & APAC

Pentaho

Chuck Yarbrough

Technical Solutions Marketing

Pentaho

(3)

Pentaho Webinar Series

3

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

(4)

Goals for Today

4

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

To Understand:

How to get a Hadoop

cluster up and running

Where Hadoop and

other pieces fit into the

architecture

How you can easily get

data in & out Hadoop

How to leverage

Hadoop with Pentaho

(5)

Complete Analytics and Visual Data Management

Hadoop

NoSQL Databases

Data Discovery

&

Visualization

Enterprise

&

Ad Hoc Reporting

Predictive Analytics

&

Machine Learning

Data Ingestion, Manipulation

&

Integration

Analytic Databases

(6)

Data Warehouse Optimization

Data Sources

Big Data Architecture

Data Warehouse

(Master & Transactional Data)

ERP

CRM

CDR

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Logs

Logs

Other Data

Raw Data Parsed Data Analytic Datasets Master Data

Tape

Archive

(7)

Steps To Start with Hadoop

7

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Hadoop

Installation

• 

Install locally – as Pseudo-Distributed

mode

• 

Leverage tools like Dell Crowbar

• 

Cloud sandbox

• 

Easy download & installation

• 

Start on desktop

Pentaho

Installation

1

2

• 

Extract or access data from source

systems

• 

Load it (in its raw form) into Hadoop

• 

Tokenize & parse as required

• 

Transform & enrich

• 

Load into destination

Start

Loading

(8)

Data Architecture and Integration Challenges

Data Sources

Big Data Architecture

Data Warehouse

(Master & Transactional Data)

ERP

CRM

CDR

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Logs

Logs

Other Data

Raw Data Parsed Data Analytic Datasets Master Data

NOSQL

(9)

Data Architecture and Integration Challenges

Data Sources

Big Data Architecture

Data Warehouse

(Master & Transactional Data)

ERP

CRM

CDR

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Logs

Logs

Other Data

Raw Data Parsed Data Analytic Datasets Master Data

NOSQL

Extract

Transform

Load

Orchestration & Integration

MR

(10)

Data Architecture and Integration Challenges

Data Sources

Big Data Architecture

Data Warehouse

(Master & Transactional Data)

ERP

CRM

CDR

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Analytic

Data Mart(s)

Logs

Logs

Other Data

Raw Data Parsed Data Analytic Datasets Master Data

NOSQL

Extract

Transform

Load

Orchestration & Integration

MR

(11)

Solution Architecture & Demo

11

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Fast and easy way to deploy

Hadoop clusters with Dell

(12)

Global Marketing

Fast and easy way to deploy

Hadoop clusters with Dell

(13)

Global Marketing

Well we are ready, but

how will the Hardware

Team know how to size

and design the Hadoop

cluster……..?

I don’t know….and it

may take a long time to

build the Hadoop cluster

Time is a critical

factor, we need to get

this project moving

(14)

Global Marketing

Reduce time to

Cluster Sizing, Design & Deployment

Faster

time to productive operations

Optimize

and adapt for your needs

Deliver

the best return on investment

Reduce risk &

increase

flexibility with

Dell

(15)

Global Marketing

Dell | Hadoop Solution

“Dell … was one of the first of

the hardware vendors to grasp

the fact that

cloud is about provisioning

services, not about the hardware.”

Maxwell Cooter, Cloud Pro

Excels at supporting complex big data analyses

across large collections of structured and unstructured

data

Hadoop handles a variety of workloads, including

search, log processing, data warehousing,

recommendation systems and video/image analysis

Work on the most modern scale-out architectures

using a clean-sheet design data framework

Without vendor lock-in

Apache Hadoop software

Crowbar software framework with a Hadoop

barclamp

PowerEdge C8000 Series, C6220, R720, R720XD

Force10 or PowerConnect switches

Reference Architecture

Deployment Guide

Joint Service and Support

Proven solutions

Proven components

(16)

Global Marketing

Crowbar

•  Accelerates

multi-node deployments

•  Simplifies

maintenance

•  Streamlines

ongoing updates

Built with DevOps

Provides an operational model for managing big data clusters

and cloud

Field-proven technologies

Build on locally deployed Chef Server

Raw servers to full cluster in <2 hours

Hardened with more than a year of deployments

Apache 2 open source

Multi-apps (Hadoop & OpenStack)

Multi-OS (Ubuntu, RHEL, CentOS, SUSE)

NOT limited to Dell hardware

Crowbar Software Framework

(17)

Global Marketing

Deploy a Hadoop cluster in

~2 hours

Reduce software

licensing fees

100%

1. Dell case study: 7-Eleven; 2, Salesforce.com

Use Crowbar to:

• 

Automate the deployment and

configuration of a Hadoop cluster

• 

Quickly provision bare-metal servers

from box to cluster with minimal

intervention

• 

Maintain, upgrade and evolve your

Hadoop cluster over time

• 

Leverage an open source framework

backed by a growing global developer

ecosystem

Reduce

development time

4-6 mo.

Crowbar

software

framework

(18)

Global Marketing

(19)

Global Marketing

Leverage developer expertise

worldwide

Download the open source software: https://

github.com/dellcloudedge/crowbar

Participate in an active community http://

lists.us.dell.com/mailman/listinfo/crowbar

Get resources on the Wiki: https://github.com/

dellcloudedge/crowbar/wiki

(20)

Global Marketing

(21)

pentaho.com/download

Install on a local desktop – no need for a

cluster

“Managed Code” no additional installations

Pentaho will write to the Hadoop Distributed

Cache for execution

Pentaho Installation

21

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Scheduling

Integration

Manipulation

Orchestration

(22)

Start Loading

Loading into HDFS & HIVE

Hadoop Copy Files

Specify source files / destination

22

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

3

Loading into HBASE

Zookeeper host & port

(23)

Solution Architecture & Demo

23

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

(24)

Maximize Performance

24

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

As much as

15x faster than

hand-written

code.

Parallel

execution as

MapReduce

in the Hadoop

cluster.

(25)

Additional Best Practices

25

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Leverage

Hadoop

Don’t do database lookups inside a Mapper/

Reducer – bring the data set into HDFS

Don’t transfer data between two clustering

technologies – network overload

Start with a small data set and validate logic &

performance outside the cluster

Gradually increase volumes and fine tune the

application, cluster, data stores & network

Don’t Boil

the Ocean

Leverage the various technologies available

A combination of easy to use tools, powerful

scripting and custom coding provides the best mix

It’s AND…

AND

(26)

Solution Architecture & Demo

26

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

(27)

27

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Contact Us or Sign-up at:

References

Related documents

Fuller caps his natural law inquiry by acknowledging man's nature as a communicative being, but this is only the starting point for contemporary continental

We present evidence on whether workers have social preferences by comparing workers' productivity under relative incentives, where individual effort imposes a negative externality

Und, wenn man den App-Begriff weiter fasst und sagt, dass muss jetzt nicht unbedingt eine native App sein, sondern es kann auch eine Web-App sein, dann kann das durch aus sein,

In the Trachiniae , Deianira's authoritative position in relation to speech, and particularly the efficacious praise and blame speech which is used for

8 Схема взаємозв’язку партнерів, які входять до коаліції Ко – брендингові програма лояльності – це програма, яка дозволяє не

Area 44 (Pars Opercularis) (left column in figure 3) in the left hemisphere showed positive RSFC with bilateral ventrolateral and dorsal frontal regions (including areas 45

F IGURE 4.37 T IME DOMAIN RESULTS OF ANISOTROPIC CYLINDER WITH INCLINED IMPACT AT THE MIDDLE CONSIDERING TIMBER POLE SITUATION ( DOWN GOING WAVE )

Introduction to creativity was the second most liked topic for 29.4% of the participants, Business Environment and Entrepreneurship, 23.5%; while Entrepreneurship and Innovation: