More Data in Less Time

Academic year: 2021

(1)

More Data in Less Time

Leveraging Cloudera CDH as an Operational Data Store

(2)

Goals of an Operational Data Store

Ingest Data

Prepare Data

Store Data

[Diagram: Traditional Architecture. Labels: Applications, Data Sources (Structured, Unstructured), Ingest, Operational Data Store, Enterprise Data Warehouse]

(3)

[Diagram: Traditional Architecture. Labels: Applications, Data Sources (Structured, Unstructured), Ingest, Process, Load, ETL, Storage #1, Storage #2, Storage N, Operational Data Store, Enterprise Data Warehouse, ELT, Archive, Serve, BI System Modeling Reporting]

Challenges with a Traditional Architecture

1) Limited Data Ingest

Unstructured Data Challenge

Data Siloes

Limit Data Collection

2) Inefficient Data Processing

Resource Intensive ELT

Transforming Unstructured Data

Meeting SLAs

3) Data Archived

(4)

A New Way Forward

1) Ingest More Data

Collect Any Data Volume

Collect Data in Full Fidelity

Diverse Data

2) Optimize Data Processing

ELT Offload

Parallel Processing

Scalable Storage

3) Automated Secure Archive

Historic Data Access

Cost Effective Data Storage

Compliance-Ready

[Diagram: Modern Architecture. Labels: Applications, Data Sources (Structured, Unstructured), Operational Data Store, Enterprise Data Warehouse]

(5)

Customer Spotlight

Challenge

Traditional system could not process omni-channel data fast enough…

Limiting customers to monthly reports

Forcing decisions to be made with stale data

Leading to poor consumer experience due to latency

Solution

Cloudera provided a landing zone where Experian could process and store large amounts of disparate data at scale.

Benefit

Process 28K records per second

Process data 50X faster

Increase consumer report frequency from monthly to weekly

“We needed to leap forward in our processing ability. We wanted to process data orders of magnitude faster so we could react to tomorrow’s consumer.”

(6)

How Cloudera Helps

[Diagram: Cloudera’s Enterprise Data Hub. Labels: Batch Processing, Analytic SQL, Search Engine, Machine Learning, Stream Processing, 3rd Party Apps; Workload Management; Storage for Any Type of Data (Filesystem, Online NoSQL): Unified, Elastic, Resilient, Secure; Data Management; System Management]

1. Scalable Storage & Ingest

2. ETL Tool Integration

3. Data Modeling

4. Parallel Processing

5. Data Security & Governance

6. High Availability Administration

(7)

Store and Ingest More Data

Data Storage

Store any volume or type of data in full fidelity

Storage for Replay

Data Ingestion

Easily integrate data from existing systems (relational, EDW, NoSQL, etc.)

Quickly ingest multiple data types (schema on read vs schema on write)
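The contrast between schema on write and schema on read can be illustrated with a minimal Python sketch. This is not Cloudera code: `ORDER_SCHEMA`, `write_validated`, `write_raw`, and `read_with_schema` are hypothetical names, and plain lists stand in for storage. Under schema on write, a record is checked before it is stored. Under schema on read, raw lines are stored as-is and a schema is applied only when the data is queried.

```python
import json

# Hypothetical schema used only for this illustration.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def write_validated(store, record):
    """Schema on write: reject a record that does not match the schema."""
    for field, ftype in ORDER_SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    store.append({field: record[field] for field in ORDER_SCHEMA})


def write_raw(store, line):
    """Schema on read: store the raw line untouched, in full fidelity."""
    store.append(line)


def read_with_schema(store, schema):
    """Apply a schema at query time; skip lines that do not fit it."""
    for line in store:
        try:
            record = json.loads(line)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            # The line stays in storage; only this reader ignores it.
            continue


raw_store = []
write_raw(raw_store, '{"order_id": 1, "amount": "9.5", "note": "gift"}')
write_raw(raw_store, "free-form log line, not JSON")
rows = list(read_with_schema(raw_store, ORDER_SCHEMA))
```

In this sketch the raw store accepts both lines, and a different schema could be applied to the same data later without re-ingesting it. That flexibility is the trade-off against the up-front validation of `write_validated`.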

“The NetApp Open Solution for Hadoop system offers us the scalability and flexibility we need to effectively support our growing client base and rapidly expanding data stores…”

(8)

Integrate with Existing Tools

ETL Partners

(9)

Model Structured & Unstructured Data Faster

Data Management

Use lineage to discover, track, and validate new and old data to ensure proper use

Analytic SQL

(10)

Parallel Process Data Volumes

Batch Processing

Fault-tolerant processing of large volumes of diverse data

Stream Processing

Process data as it’s made available

“The Orbitz Worldwide sites process millions of searches and transactions every day... Hadoop was selected to provide a solution to the problem of long-term storage and processing…”

- Jonathan Seifman, Lead Engineer for the Intelligent Marketplace Team
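The difference between the batch and stream modes named on this slide can be sketched in a few lines of Python. This is an illustration only, not how any specific Hadoop component works: `batch_count` and `stream_count` are hypothetical helpers, and the event names are made up. The batch function needs the whole data set before it produces a result. The stream function yields an updated result as each event arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_count(events: List[str]) -> Counter:
    """Batch: the full data set is available before processing starts."""
    return Counter(events)


def stream_count(events: Iterable[str]) -> Iterator[Counter]:
    """Stream: update a running result as each event is made available."""
    running: Counter = Counter()
    for event in events:
        running[event] += 1
        # Yield a snapshot so the caller can act before the stream ends.
        yield running.copy()


events = ["search", "booking", "search"]  # made-up sample events
batch_result = batch_count(events)
snapshots = list(stream_count(iter(events)))
```

Once the stream has delivered every event, its last snapshot equals the batch result. The earlier snapshots are what a stream consumer can use without waiting for the full data set.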

(11)

Protect and Govern Your Data

Enterprise Security & Governance

End-to-end protection with integrated authentication, role-based authorization, encryption, key management, audit, and lineage

Native platform solution ensures unified data management for easy reporting and discovery of data

Compliance-ready to meet stringent regulatory requirements, out of the box

“We selected Cloudera because of its short deployment time and breadth of mission-critical features, which satisfy the strict security and reliability requirements of our business.”

— Stefan Apitz, VP of Operations

(12)

Manage Overall System Performance

High Availability Administration

Simple, centralized system view from ingest to analysis

Supports mission-critical workloads with necessary enterprise features (BDR, Proactive Support, Security)

Zero downtime rolling upgrades

Natively deploy and manage ETL tools

“Cloudera Enterprise gives our operations team the confidence that we are ahead of the curve in terms of keeping our cluster running with peak performance.”

—Nick Halstead, Founder

(13)

Keep Services Running

Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.

Unique Capabilities:

Unified configuration, management and monitoring across all services

Online installation and upgrades

Direct connection to Cloudera Support

3rd Party Extensibility

(14)

Traditional vs Modern Architectures

[Diagram: Modern Architecture. Labels: Applications, Data Sources (Structured, Unstructured), Ingest, EDH, Operational Data Store, Active Structured Data, ETL, Load, Enterprise Data Warehouse, ELT, Archive, Serve, BI System Modeling Reporting. Callouts: Ingest More Data, Optimize Data Processing, Automated Secure Archive]

[Diagram: comparison architecture. Labels: Applications, Data Sources (Structured, Unstructured), Ingest, Operational Data Store, Enterprise Data Warehouse]

(15)

The Road to Success

Security Integration

Audit architecture in light of security policies and best practices

Implement custom security to authenticate users, admins, and apps

Administrator Training

Configure, install, and monitor clusters for optimal performance

Implement security measures and multi-user functionality

Data Analyst Training

Apply SQL to much larger data sets with Impala, Hive, and Pig

Master advanced techniques that boost Hadoop accessibility

ETL Ingestion

Reference implementation to 3 sources, 5 transforms, 1 target

Create, execute, test, and review a custom ingestion/ETL plan

(16)

Disrupt the Industry Not Your Business

Proposed Evolution of Cloudera Enterprise Deployment

Estimated Data in Production

Proposed Services Timeline

Implement Full Governance, Privacy, and Compliance

Align Systems, Operations, & Strategy to Best-in-Class

Enable Big Data Processing and Applications Development

Activate All Your Data in One Place

Administrator Training: 4 Days

ETL Ingestion Pilot: 2 Weeks

Security Integration: 1-2 Weeks

Data Analyst Training: 3 Days

Cluster Setup &

(17)
(18)

Why Cloudera?

Enterprise-Grade Hadoop

Differentiated performance, security, management, and governance.

Expertise

No one knows Hadoop better than Cloudera.

Enablement

Support, Training, and Professional Services enable and deliver success.

Ecosystem

Cloudera ensures that Hadoop works with the platforms, tools, and integrators you rely on.

Sustainable Innovation

(19)

The Most Complete Ecosystem

[Diagram: partner ecosystem. Labels: Data Systems, Infrastructure, Applications, Operational Tools, System Integration; Enterprise Data Hub (Security and Administration; Unlimited Storage; Process, Discover, Model, Serve)]

More than 1,200 partners

(20)

The Journey to a Data Strategy

Operational Efficiency

New Business Value

[Diagram: Security and Administration; Unlimited Storage; Process, Discover, Model, Serve]

Optimize your architecture. (IT)

Discover the value in your data. (Analysts and data scientists)
