More Data in Less Time
Leveraging Cloudera CDH as an Operational Data Store
Goals of an Operational Data Store
Ingest Data
Prepare Data
Store Data
[Figure: Operational Data Store. Labels: Applications and Data Sources (Structured, Unstructured); Ingest; Operational Data Store; Enterprise Data Warehouse]
Traditional Architecture
[Figure: traditional architecture. Labels: Applications and Data Sources (Structured, Unstructured); Ingest; Operational Data Store; ETL; Process; Load; Enterprise Data Warehouse; ELT; Modeling; Reporting; BI System; Serve; Archive; Storage #1, Storage #2, Storage N]
Challenges with a Traditional Architecture
1) Limited Data Ingest
Unstructured Data Challenge
Data Silos
Limited Data Collection
2) Inefficient Data Processing
Resource Intensive ELT
Transforming Unstructured Data
Meeting SLAs
3) Data Archived
A New Way Forward
1) Ingest More Data
Collect Any Data Volume
Collect Data in Full Fidelity
Diverse Data
2) Optimize Data Processing
ELT Offload
Parallel Processing
Scalable Storage
3) Automated Secure Archive
Historic Data Access
Cost Effective Data Storage
Compliance-Ready
[Figure: modern architecture. Labels: Applications and Data Sources (Structured, Unstructured); Operational Data Store; Enterprise Data Warehouse]
Customer Spotlight
Challenge
Traditional system could not process omni-channel data fast enough…
• Limiting customers to monthly reports
• Forcing decisions to be made with stale data
• Leading to poor consumer experience due to latency
Solution
Cloudera provided a landing zone where Experian could process and store large amounts of disparate data at scale.
Benefit
Process 28K records per second
Process data 50X faster
Increase consumer report frequency from monthly to weekly
“We needed to leap forward in our processing ability. We wanted to process data orders of magnitude faster so we could react to tomorrow’s consumer.”
How Cloudera Helps
[Figure: Cloudera’s Enterprise Data Hub. Labels: Batch Processing; Analytic SQL; Search Engine; Machine Learning; Stream Processing; 3rd Party Apps; Workload Management; Storage for Any Type of Data (Filesystem, Online NoSQL); Unified, Elastic, Resilient, Secure; Data Management; System Management]
1. Scalable Storage & Ingest
2. ETL Tool Integration
3. Data Modeling
4. Parallel Processing
5. Data Security & Governance
6. High Availability Administration
Store and Ingest More Data
Data Storage
• Store any volume or type of data in full fidelity
• Storage for Replay
Data Ingestion
• Easily integrate data from existing systems (relational, EDW, NoSQL, etc.)
• Quickly ingest multiple data types (schema on read vs schema on write)
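The "schema on read vs schema on write" contrast in the ingestion bullet can be illustrated with a small Python sketch. This is not Cloudera code: the sample events, the function names `write_with_schema` and `read_with_schema`, and the column choices are all hypothetical, picked only to show the idea. Schema on write checks records against fixed columns before storing them, so records that do not fit are turned away at ingest; schema on read keeps every raw record and applies a schema only when the data is queried.

```python
import json

# Hypothetical raw events as they might arrive from different channels.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "amount": "7.50", "channel": "store", "coupon": "SPRING"}',
]


def write_with_schema(raw_lines, columns):
    """Schema on write: only records matching the fixed columns are stored."""
    stored, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        if set(record) == set(columns):
            stored.append(tuple(record[c] for c in columns))
        else:
            rejected.append(line)  # does not fit the table, so it is not loaded
    return stored, rejected


def read_with_schema(raw_lines, schema):
    """Schema on read: raw lines are kept as-is; the schema is applied per query."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        })
    return rows


stored, rejected = write_with_schema(RAW_EVENTS, ["user", "amount", "channel"])
rows = read_with_schema(RAW_EVENTS, {"user": str, "amount": float, "channel": str})

print(len(stored), "stored and", len(rejected), "rejected with schema on write")
print(len(rows), "rows available with schema on read")
```

In this sketch the raw lines stay untouched under schema on read, so a later query could apply a different schema (for example one that includes the `coupon` field) without re-ingesting anything.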
“The NetApp Open Solution for Hadoop system offers us
the scalability and flexibility we need to effectively
support our growing client base and rapidly expanding
data stores…”
Integrate with Existing Tools
ETL Partners
Model Structured & Unstructured Data Faster
Data Management
• Use lineage to discover, track, and validate new and old data to ensure proper use
Analytic SQL
Batch Processing
• Fault-tolerant processing of large volumes of diverse data
Stream Processing
• Process data as it’s made available
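As a rough illustration of the batch processing idea above, the following Python sketch splits a dataset into partitions, processes each partition independently, and merges the partial results. It is a toy, single-machine stand-in for the pattern, not how Hadoop or any Cloudera component is implemented: the partitions, the `count_channels` and `parallel_count` names, and the use of a thread pool are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of one dataset, e.g. one list per file or block.
PARTITIONS = [
    [{"channel": "web"}, {"channel": "store"}],
    [{"channel": "web"}, {"channel": "mobile"}],
    [{"channel": "web"}],
]


def count_channels(partition):
    """'Map' step: summarize one partition without looking at any other."""
    return Counter(record["channel"] for record in partition)


def parallel_count(partitions, workers=4):
    """Run the map step on each partition concurrently, then merge the results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_channels, partitions):
            total.update(partial)  # 'reduce' step: combine partial counts
    return total


totals = parallel_count(PARTITIONS)
print(dict(totals))
```

Because each partition is processed independently, a failed partition could in principle be retried on its own, which is the property that makes this style of processing amenable to fault tolerance at cluster scale.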
Parallel Process Data Volumes
“The Orbitz Worldwide sites process millions of searches and transactions every day... Hadoop was selected to provide a solution to the problem of long-term storage and processing…”
— Jonathan Seifman, Lead Engineer for the Intelligent Marketplace Team
Enterprise Security & Governance
• End-to-end protection with integrated authentication, role-based authorization, encryption, key management, audit, and lineage
• Native platform solution ensures unified data management for easy reporting and discovery of data
• Compliance-ready to meet stringent regulatory requirements, out-of-the-box
Protect and Govern Your Data
“We selected Cloudera because of its short deployment time and breadth of mission-critical features, which satisfy the strict security and reliability requirements of our business.”
— Stefan Apitz, VP of Operations
High Availability Administration
• Simple, centralized system view from ingest to analysis
• Supports mission-critical workloads with necessary enterprise features (BDR, Proactive Support, Security)
• Zero-downtime rolling upgrades
• Natively deploy and manage ETL tools
Manage Overall System Performance
“Cloudera Enterprise gives our operations
team the confidence that we are ahead of
the curve in terms of keeping our cluster
running with peak performance.”
—Nick Halstead, Founder
Focus on the solution, not the
cluster, with the only complete,
zero-downtime administration
tool for Apache Hadoop.
Unique Capabilities:
• Unified configuration, management and monitoring across all services
• Online installation and upgrades
• Direct connection to Cloudera Support
• 3rd Party Extensibility
Keep Services Running
Traditional vs Modern Architectures
[Figure: modern architecture. Labels: Applications and Data Sources (Structured, Unstructured); Ingest; Operational Data Store; EDH; ETL; Load; Archive; Enterprise Data Warehouse; Active Structured Data; ELT; Modeling; Reporting; BI System; Serve]
Ingest More Data
Optimize Data Processing
Automated Secure Archive
The Road to Success
Security Integration
• Audit architecture in light of security policies and best practices
• Implement custom security to authenticate users, admins, and apps
Administrator Training
• Configure, install, and monitor clusters for optimal performance
• Implement security measures and multi-user functionality
Data Analyst Training
• Apply SQL to much larger data sets with Impala, Hive, and Pig
• Master advanced techniques that boost Hadoop accessibility
ETL Ingestion
• Reference implementation to 3 sources, 5 transforms, 1 target
• Create, execute, test, and review a custom ingestion/ETL plan
Disrupt the Industry, Not Your Business
Proposed Evolution of Cloudera Enterprise Deployment
Estimated Data in Production
Proposed Services Timeline
Implement Full Governance, Privacy, and Compliance
Align Systems, Operations, & Strategy to Best-in-Class
Enable Big Data Processing and Applications Development
Activate All Your Data in One Place
Administrator Training: 4 Days
ETL Ingestion Pilot: 2 Weeks
Security Integration: 1-2 Weeks
Data Analyst Training: 3 Days
Cluster Setup &
Why Cloudera?
Enterprise-Grade Hadoop
Differentiated performance, security, management, and governance.
Expertise
No one knows Hadoop better than Cloudera.
Enablement
Support, Training, and Professional Services enable and deliver success.
Ecosystem
Cloudera ensures that Hadoop works with the platforms, tools, and integrators you rely on.
Sustainable Innovation
The Most Complete Ecosystem
[Figure: ecosystem. Labels: Data Systems; Infrastructure; Applications; Operational Tools; Enterprise Data Hub]