HITACHI DATA SYSTEMS HADOOP SOLUTION JUNE 12, 2012


Customers are seeing exponential growth of unstructured data from sources ranging from social media websites to operational systems. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data.

Hadoop, the latest software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, is giving customers a new option to tackle data growth and deploy big data analysis to better understand their business.
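To make "distributing the workload" concrete, here is a minimal single-process sketch of the MapReduce pattern Hadoop uses to spread work across a cluster. It is plain Python, not Hadoop's Java API; the word-count task and the function names are illustrative assumptions, and each input split stands in for a data block that a real cluster would map on a separate node in parallel.

```python
from collections import defaultdict
from typing import Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    """Run map, shuffle and reduce over input splits.

    Each split stands in for a data block that a real cluster would map on a
    separate node; here the splits are processed sequentially in one process.
    """
    mapped = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count([["big data big"], ["data clusters"]]))
```

Because the map step touches only one line at a time and the reduce step only one key at a time, both can be spread over many servers; only the shuffle needs to move data between them.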

Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pre-tested with the Cloudera Hadoop distribution to provide faster time to market for customers deploying Hadoop applications. HDS, Cloudera and Hitachi Consulting will present together and explain how to get there.

Attend this WebTech and learn how to:

• Solve big-data problems with Hadoop.

• Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data.

• Implement Hadoop using HDS Hadoop reference architecture.


PRESENTERS

Shankar Radhakrishnan, Solutions Manager, Hitachi Data Systems

Sai Saiprabhu, Director, Specialized Services, Hitachi Consulting

Art Vancil, Big Data Senior Manager, Hitachi Consulting

ASK BIGGER QUESTIONS

DANIEL TEMPLETON, PROGRAM MANAGER AT CLOUDERA

Enterprise Data Evolution

Figure: amount of data versus enterprise goals.

IMPROVE
• Data collection & reporting
• Process data faster
• Store data more cost-effectively
• Simplify infrastructure

CREATE COMPETITIVE ADVANTAGE
• Combine data from across the business
• Ask new questions immediately
• Enable new real-time applications

Data Has Changed in the Last 30 Years

Figure: data growth from 1980 to 2012, driven by end-user applications, the Internet, mobile devices, and sophisticated machines. Structured data: 10%. Unstructured data: 90%.

Data Management Strategies Have Stayed the Same

Raw data on SAN, NAS and tape

Data moved from storage to compute

Relational models with

Too Much Data, Too Many Sources

Can’t ingest fast enough

Costs too much to store

Exists in different places

Can’t Use It The Way You Want To

Analysis and processing takes too long

Data exists in silos

Can’t ask new questions

Can’t analyze

Transform The Way You Think About Data

The Cloudera Approach

Meet enterprise demands with a new way to think about data.

THE OLD WAY: COMPLEX, FRAGMENTED, COSTLY
• Multiple platforms
• Data silos by department or LOB
• Lots of data stored in expensive specialized systems
• Analysts pull select data into EDW
• No one has a complete view

THE CLOUDERA WAY: SIMPLIFIED, UNIFIED, EFFICIENT
• Single data platform to support BI, Reporting & App Serving
• Bulk of data stored on scalable, low-cost platform
• Perform end-to-end workflows
• Specialized systems reserved for specialized workloads
• Provides data access across departments or LOB

Hadoop complements the Data Warehouse

Figure: Cloudera alongside the data warehouse. Labels: OLTP, enterprise applications, business intelligence, operational BI, data warehouse (query; high $/byte), and Cloudera (store, query, transform, ETL, math, load, archive) for archival data, exploration, and analytics.

Cloudera Enterprise: The Platform for Big Data

A Revolutionary Solution Built on Apache Hadoop

• BRINGS STORAGE & COMPUTE TOGETHER
• WORKS WITH EVERY TYPE OF DATA
• CHANGES THE ECONOMICS OF DATA MANAGEMENT

INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE

CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT

CDH4

Big Data Storage, Processing & Analytics Based on Apache Hadoop

1. Store: Land structured and unstructured data in a scalable, cost-effective repository
2. Process & Analyze: Transform data in parallel and query at the speed of thought
3. Integrate: Interoperate with existing platforms, systems and applications

Cloudera Manager

End-to-End Administration for CDH

1. Deploy: Install, configure & start your cluster in 3 simple steps
2. Configure & Optimize: Ensure optimal settings for all hosts & services
3. Monitor, Diagnose & Report: Find & fix problems quickly; view current & historical activity & resource usage

Cloudera Navigator

Data Management Layer for Cloudera Enterprise

1. Audit & Access Control (AVAILABLE NOW): Ensuring appropriate permissions and reporting on data access for compliance
2. Exploration & Lineage (COMING SOON): Finding out what data is available, what it looks like and where it came from
3. Lifecycle Management (COMING SOON): Migration of data based on policies

Cloudera Support

Our Team of Experts on Call to Help You Meet Your SLAs

1. Extend Your Team: Get a dedicated team at your disposal to help you solve problems quickly
2. Leverage the Experts: Take advantage of our expertise to make sure your cluster operates at its best
3. Influence Roadmaps: Get advocacy with the open source community to build the features and functionality you need

The Best Hadoop-Based Platform: Cloudera Enterprise

CDH4
• The only solution with real-time query (Impala)
• The only solution with HDFS high availability
• The most widely deployed & proven
• The broadest ecosystem of certified partners
• 100% open source & built for the enterprise

Cloudera Manager
• Management for the complete Hadoop system
• The most mature & functionally advanced
• The easiest to use, with built-in intelligence
• Integration with enterprise monitoring tools

Cloudera Navigator
• The only data management tool for Hadoop
• Cloudera Navigator 1.0: Data audit & access control

Cloudera Support
• Dedicated team with a global presence
• Contributors and committers for every part of CDH
• Tens of thousands of nodes under management across industries

A Complete Solution

CLOUDERA UNIVERSITY: developer training, administrator training, data science training, certification programs

INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE

CDH, CLOUDERA MANAGER, CLOUDERA SUPPORT, CLOUDERA NAVIGATOR


CHOOSING THE RIGHT INFRASTRUCTURE FOR HADOOP

SHANKAR RADHAKRISHNAN, SOLUTIONS PRODUCT MANAGER – ORACLE, SAP HANA AND BIG DATA SOLUTIONS

HADOOP APPLICATION EXAMPLE: GENOME ANALYSIS

National Institute of Genomics – Japan

Challenge: Accelerate the speed of analysis for genome data from next-generation sequencers (4 PB of data)

Solution
‒ 115-node Hadoop cluster using Hitachi Compute Rack servers
‒ Reliable and scalable solution

PROACTIVE MAINTENANCE AT HITACHI SERVER DIVISION

Figure: data sources include user inquiries, hardware auditing logs, call center logs, maintenance reports, CRM customer data, sales/financial data, distribution/stock data, location information, server logs, operation history, BOM data, and production data of business systems.

Challenge
• Proactive hardware maintenance from logs, call center data, and product information
• Leverage historical data for future product development

INFRASTRUCTURE REQUIREMENTS FOR HADOOP

DATA GROWTH, COST, COMPLEXITY

• Scale out to handle petabytes of unstructured and semi-structured data
• Keep data closer to CPU
• Cost-effective for low-fidelity data
• Increase efficiency and utilization of resources and meet required service levels
• Hardware less prone to failures
• Easy to manage

HADOOP IN THE ENTERPRISE: ARCHITECTURE

One Platform for All Data, All Applications

Figure: Hadoop and the data warehouse, fed by business apps, RDBs, other big data sources (email, audio, documents, etc.), and outside services (e.g., connecting to Facebook for CRM) through data connectors and real-time (streaming) computers, and serving a business intelligence dashboard for CxOs, data scientists, and business users/customers.

INTRODUCING HITACHI REFERENCE ARCHITECTURE FOR HADOOP

ENTERPRISE-READY INFRASTRUCTURE FOR HADOOP

• Pretested and validated for interoperability, performance, and scalability
• Flexible: customize to fit the application
• Pre-validated using Cloudera, the leading Hadoop distribution (certification in progress)
• Complementary to existing Hitachi platforms for block, file, and object storage
• Seamless management integration with other Hitachi solutions

Figure: name node + job tracker, secondary name node, and data nodes (HDFS DataNode + TaskTracker), connected by a LAN and a separate management LAN.
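The division of labor in this architecture, where the name node keeps the metadata and the data nodes hold the blocks, can be sketched as a small planning function. This is a simplified model, not HDFS code: `plan_blocks` is a hypothetical name, the 128 MB block size and replication factor of 3 are assumed settings, and replicas are placed round-robin on distinct nodes, whereas real HDFS uses a rack-aware placement policy.

```python
def plan_blocks(file_size_bytes: int,
                data_nodes: list[str],
                block_size: int = 128 * 1024 * 1024,
                replication: int = 3) -> dict[int, list[str]]:
    """Map each block index of a file to the data nodes holding its replicas.

    Simplified model of the name node's block map: round-robin placement on
    distinct nodes (real HDFS placement is rack-aware).
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    # Ceiling division: a partial final block still needs its own block entry.
    num_blocks = -(-file_size_bytes // block_size)
    block_map: dict[int, list[str]] = {}
    for block in range(num_blocks):
        # Rotate the starting node so blocks spread evenly across the cluster;
        # consecutive offsets modulo the node count are always distinct nodes.
        block_map[block] = [data_nodes[(block + r) % len(data_nodes)]
                            for r in range(replication)]
    return block_map


if __name__ == "__main__":
    for block, nodes in plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, nodes)
```

A 300 MB file splits into three blocks, and losing any single data node still leaves two copies of every block, which is why the data nodes can be plain servers with internal disks rather than RAID-protected storage.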

REFERENCE ARCHITECTURE: HARDWARE COMPONENTS

Qty | Form factor | Component | Description
1 | 1U | Management node | Hitachi server CR 210H; 2 x quad-core E2600 series; 64GB main memory; 2 x GigE (onboard); 5 x 3.5-inch 3TB NL-SAS 7200 RPM
1 | 2U | HDFS master name node (name node, job tracker) | Hitachi server CR 220S; 2 x quad-core E2600 series; 64GB main memory; 2 x GigE (onboard); 12 x 3.5-inch 3TB NL-SAS 7200 RPM
1 | 2U | Secondary name node | Hitachi server CR 220S; 2 x quad-core E2600 series; 64GB main memory; 2 x GigE (onboard); 12 x 3.5-inch 3TB NL-SAS 7200 RPM
As needed | 2U | Data nodes (data node, task tracker) | Hitachi server CR 220S; 2 x quad-core E2600 series; 64GB main memory; 2 x GigE (onboard); 12 x 3.5-inch 3TB NL-SAS 7200 RPM
2 | 1U or 2U | Ethernet switches (10 GbE network) | Cisco Nexus 5548 (48 x GigE/10GigE) or Brocade VDX 6720-60 (40 x GigE/10GigE; 2U form factor)
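As a rough sizing aid, the data node specification implies 12 × 3 TB = 36 TB of raw disk per node. The sketch below turns that into usable HDFS capacity. It assumes the HDFS default replication factor of 3 and a hypothetical 25% reserve for MapReduce intermediate and temporary data; both are tunable assumptions, not figures from the reference architecture, and the function names are illustrative.

```python
import math


def usable_tb_per_node(disks: int = 12, disk_tb: float = 3.0,
                       replication: int = 3, temp_reserve: float = 0.25) -> float:
    """Usable HDFS capacity contributed by one data node, in TB.

    Raw disk is first reduced by a reserve for intermediate/temporary data,
    then divided by the replication factor, since every block is stored
    that many times across the cluster.
    """
    raw = disks * disk_tb
    return raw * (1 - temp_reserve) / replication


def data_nodes_needed(target_usable_tb: float, **node_spec) -> int:
    """Smallest number of data nodes whose combined usable capacity meets the target."""
    return math.ceil(target_usable_tb / usable_tb_per_node(**node_spec))


if __name__ == "__main__":
    print(usable_tb_per_node())      # TB of usable space per data node
    print(data_nodes_needed(90))     # data nodes for 90 TB of usable space
```

Under these assumptions each data node contributes about 9 TB of usable space, so 90 TB of usable data calls for 10 data nodes; lowering the reserve or the replication factor raises the per-node figure accordingly.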

Figure: 42U rack layout with Switch-1, Switch-2, and 2U CR 220S servers with internal HDDs.

Why Compute Rack Servers?

• High density (2U), high processing power (2 CPU sockets), large data storage (12 HDD)
• Redundant power supplies
• Eco-friendly power-saving capabilities

REFERENCE ARCHITECTURE: SOFTWARE COMPONENTS

Tested Software

Component | Version | Description
Operating system | 6.3 | Red Hat or CentOS 64-bit Linux distribution
Hadoop distribution | CDH4 | Cloudera Hadoop distribution
Hadoop management | 4.0.1 | Cloudera Manager
Management framework | n/a | Hitachi Compute Systems Manager

Figure: name node + job tracker, HA name node, and data nodes (HDFS DataNode + TaskTracker), with a management LAN.

Reference Architecture White Paper Targeted for June 2013

WHY HITACHI FOR HADOOP INFRASTRUCTURE

Enterprise-ready reliability, availability, and serviceability (RAS) for Hadoop
‒ Less worry about hardware failure, more focus on business value

Seamless management integration with Hitachi solutions
‒ Lower opex

Competitive pricing with commodity hardware
‒ Lower capex

One platform solution for all your data volumes, velocities, and types

HITACHI CONSULTING

SAI SAIPRABHU, DIRECTOR, SPECIALIZED SERVICES

ART VANCIL, BIG DATA SENIOR MANAGER

HITACHI CONSULTING

As the global consulting company of Hitachi, Ltd., Hitachi Consulting brings business visions to life through in-depth industry expertise combined with innovative technology solutions and services.

From articulating strategy through deploying and maintaining applications, Hitachi Consulting helps clients quickly realize measurable business value and achieve sustainable ROI.

The Hitachi Consulting client base includes 35 percent of the Fortune 100 and 25 percent of the Fortune Global 100, along with many mid-market leaders. With offices in North America, Europe, the Middle East, and Asia, the company employs more than 5,000 professionals, with delivery centers in India and China for global delivery scale.

WHAT DO WE SEE WITH OUR CLIENTS?

• Business Objectives Refinement
• Technology Adoption Without Disruption
• Data Science Practice Adoption
• Business Intelligence Jump Start With Big Data Technologies
• Emerging Businesses
• Business Intelligence Practice Adoption

DO YOU NEED AN EXECUTIVE SPONSOR?

The Internet has driven most businesses to demand better information much faster than ever before, across almost every industry.

Examples: Retailers can influence the next shopping visit based on analytics; Amazon can tailor a shopping visit on a variety of dimensions (personalization, price incentives, product combinations, etc.). How will similar dynamics impact your company?

Perhaps your company has not yet started using Hadoop for big data initiatives. Or perhaps you are stuck in "discovery mode," trying to find that golden-nugget big idea in big data. If your company is like mine, you will not be given permission to simply play with Hadoop for months on end.

In most companies, the time you spend on a project needs to be backed by someone with a budget who wants to get something done. Let's look at successful methods to secure executive sponsorship for your big data effort.

HOW DO I GET STARTED?

Award-winning luck #1: Your executive brings you the justification for big data.

Award-winning luck #2: Your subject matter expert and your data scientist pore over the data until they find the “golden nugget” of justification.

If you have no budget for big data, then perhaps you are waiting for a stroke of luck?

Stop waiting, and begin now to collaborate with your business consultant to discover the value of your data and the “essence” of your big data business opportunity.

THE NITTY-GRITTY DETAILS

Hitachi helps you choose your big data solution by targeting the message to your sponsor’s role and asking the BIG QUESTIONS:

• CEO/CSO: Predict the Future
• COO: Optimize the Business Process
• CMO: Nurture the Customer Relationship
• CFO/CTO: Deliver Faster and Cheaper

FOR EXAMPLE

A high-end disk storage manufacturer collects daily performance data from its customers’ storage devices but cannot effectively analyze it BECAUSE OF THE VOLUME.

The big questions to ask: If we stored the data in Hadoop, then

• Could we detect operational patterns that predict device failure worldwide?
• Could we anticipate the failure AND suggest a replacement without downtime?
• Could we sell the data analysis back to the customer for a fee?
• Could we reduce the support effort by delivering proactive notifications?
• How much revenue would we gain, and what costs would we eliminate?
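Detecting patterns that precede device failure often starts with something much simpler than a predictive model: flagging devices whose latest reading departs sharply from their own history. The sketch below is a deliberately simple stand-in for that first step, not a failure-prediction model. It assumes hypothetical per-device daily latency readings and a z-score threshold of 3; with the full history stored in Hadoop, the same per-device computation could run in parallel across the cluster.

```python
from statistics import mean, stdev


def flag_anomalous_devices(history: dict[str, list[float]],
                           threshold: float = 3.0) -> list[str]:
    """Return devices whose latest reading is far above their own baseline.

    `history` maps a (hypothetical) device ID to its daily readings, oldest
    first. The last reading is compared with the mean and sample standard
    deviation of the earlier ones; devices with fewer than three readings
    are skipped because the baseline would be too short.
    """
    flagged = []
    for device, readings in history.items():
        if len(readings) < 3:
            continue
        baseline, latest = readings[:-1], readings[-1]
        mu, sigma = mean(baseline), stdev(baseline)
        # Only upward spikes are flagged; a flat baseline (sigma == 0) is ignored.
        if sigma > 0 and (latest - mu) / sigma > threshold:
            flagged.append(device)
    return flagged


if __name__ == "__main__":
    daily_latency_ms = {
        "array-01": [5.0, 5.2, 4.8, 5.1, 4.9, 12.0],
        "array-02": [5.0, 5.1, 4.9, 5.0, 5.2, 5.1],
    }
    print(flag_anomalous_devices(daily_latency_ms))
```

Flagged devices would be candidates for the proactive notifications and replacement suggestions raised in the questions above.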

SOLUTION SELECTION FRAMEWORK

The solution discovery and evaluation process is a top-down survey of organizational leadership, followed by prioritization and ranking based on business value and organizational priorities.

Figure: a funnel narrowing from all possible solutions and purposes, to feasible solutions, to a prioritized big data solution selection.
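The filter-then-rank step of this framework can be expressed as a small scoring function. This is a sketch under stated assumptions: the `Candidate` fields, the 0 to 10 rating scales, and the 60/40 weighting between business value and fit with organizational priorities are hypothetical choices for illustration, not part of the framework as presented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate big data solution gathered from the leadership survey."""
    name: str
    feasible: bool
    business_value: float  # assumed 0-10 rating
    priority_fit: float    # assumed 0-10 rating of fit with organizational priorities


def prioritize(candidates: list[Candidate], value_weight: float = 0.6) -> list[Candidate]:
    """Drop infeasible candidates, then rank the rest by weighted score, highest first."""
    def score(c: Candidate) -> float:
        return value_weight * c.business_value + (1 - value_weight) * c.priority_fit

    feasible = [c for c in candidates if c.feasible]
    return sorted(feasible, key=score, reverse=True)


if __name__ == "__main__":
    ranked = prioritize([
        Candidate("churn-analysis", True, 8, 5),
        Candidate("failure-prediction", True, 6, 9),
        Candidate("moonshot", False, 10, 10),
    ])
    print([c.name for c in ranked])
```

Keeping feasibility as a hard filter, rather than folding it into the score, mirrors the funnel: an infeasible idea is dropped no matter how valuable it looks.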

SPONSOR CONVERSATIONS: ESTABLISHED BUSINESS INTELLIGENCE ENVIRONMENT

Specific use cases that address chosen pain points to be tackled using big data capabilities

Measures that show how the use cases alleviate current pain points

External expertise needed to augment your big data jump start

Action plan to implement prioritized use cases and evaluate larger adoption of big data capabilities

Executive sponsor buy-in

Executive sponsor oversight

LEVERAGE BIG DATA CAPABILITIES

• Extend Historical Transactions Availability
• Extend Data Staging, Volume Processing and Complex Data Processing
• Extend Complex Data Processing
• Ability to Process Large Volumes
• Flexibility and Complexity Management
• Leverage Emerging Capabilities
• Extends Existing Data Management Environment
• Introduces New Analytic Capabilities

BIG DATA TECHNOLOGIES: ADOPTION STRATEGY

Protect existing investments that are already in the right place. Introduce big data technologies to enable new and evolving business needs.

Figure: a staged adoption roadmap toward enterprise analytics. Stages: protect investments as needed; streamline as the environment matures; expand as demand grows; introduce new capabilities; introduce, consolidate and expand new capabilities. Elements: existing transactional sources, social media sources, existing analytic capabilities, structured data management (with current, limited augmentation), a big data appliance fed by batch or stream ("stream and organize"), sporadic analytic capabilities, and big volume, high velocity, and unstructured data analyses.

SPONSOR CONVERSATIONS: EMERGING BUSINESS INTELLIGENCE ENVIRONMENT

Business intelligence competencies needed to attain and sustain a competitive edge

Measures that help monitor how well business operations align with business strategies

External expertise needed to augment your big data and business intelligence jump start

Action plan to implement and evaluate larger adoption of big data business intelligence capabilities

Executive sponsor buy-in

Executive sponsor oversight

NEXT STEPS

Hitachi Unified Compute Platform for Business Analytics web page

• http://www.hds.com/products/hitachi-unified-compute-platform/business-analytics.html

QUESTIONS AND DISCUSSION

UPCOMING WEBTECHS

WebTechs

‒ Take SAP HANA From Proof of Value Through Production Deployment, June 20, 9 a.m. PT, noon ET

‒ A Cloud You Can Trust–Improve Datacenter Efficiency and Agility, June 26, 9 a.m. PT, noon ET

Check www.hds.com/webtech for links to the recording, the presentation, and Q&A (available next week).
