Greenplum Database
Getting Started with Big Data Analytics
Ofir Manor
Agenda
• Introduction to Greenplum
• Greenplum Database Architecture
• Flexible Database Configuration
• Beyond SQL – Flexible Analytics
• Flexible Deployment
“Big Data Is Less About Size, And More About Freedom”
― TechCrunch
“Findings: ‘Big Data’ Is More Extreme Than Volume”
― Gartner
“Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World”
― IDC
“Total data: ‘bigger’ than big data”
― 451 Group
THE ERA OF BIG DATA
Industries Are Broadly Embracing Big Data
Retail
• CRM – Customer Scoring
• Store Siting and Layout
• Fraud Detection / Prevention
• Supply Chain Optimization
Advertising & Public Relations
• Demand Signaling
• Ad Targeting
• Sentiment Analysis
• Customer Acquisition
Financial Services
• Algorithmic Trading
• Risk Analysis
• Fraud Detection
• Portfolio Analysis
Media & Telecommunications
• Network Optimization
• Customer Scoring
• Churn Prevention
• Fraud Prevention
Manufacturing
• Product Research
• Engineering Analytics
• Process & Quality Analysis
• Distribution Optimization
Energy
• Smart Grid
• Exploration
Government
• Market Governance
• Counter-Terrorism
• Econometrics
• Health Informatics
Healthcare & Life Sciences
• Pharmaco-Genomics
• Bio-Informatics
• Pharmaceutical Research
• Clinical Outcomes Research
Extreme Performance for Analytics
• Optimized for BI and analytics
  – Deep integration with statistical packages
  – High performance parallel implementations
• Simple and automatic
  – Just load and query like any database
  – Tables are automatically distributed across nodes
• Extremely scalable
  – MPP shared-nothing architecture
  – All nodes can scan and process in parallel
  – Linear scalability by adding nodes
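The distribution and parallel-scan points above can be illustrated with a small conceptual sketch. It is not Greenplum code: the segment count, the CRC32 hash, the sample `rows`, and all function names are assumptions chosen for illustration. Each row is placed on exactly one segment by hashing a distribution key, every segment computes a partial aggregate over its own rows only, and a coordinator merges the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, for illustration only


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment using a stable hash.

    CRC32 is an assumption for this sketch, not the hash a real system uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment (shared-nothing)."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key], num_segments)].append(row)
    return segments


def segment_scan(segment_rows):
    """Local work on one segment: partial SUM(amount) grouped by region."""
    partial = defaultdict(float)
    for row in segment_rows:
        partial[row["region"]] += row["amount"]
    return partial


def parallel_sum_by_region(segments):
    """Merge per-segment partial sums, as a coordinator node would.

    The segment scans are independent of each other, so they could run
    concurrently; they run one after another here to keep the sketch short.
    """
    total = defaultdict(float)
    for partial in map(segment_scan, segments):
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)


# Made-up sample data: 100 orders split evenly between two regions.
rows = [
    {"order_id": i, "region": "EMEA" if i % 2 else "APAC", "amount": 10.0}
    for i in range(1, 101)
]
segments = distribute(rows, key="order_id")
result = parallel_sum_by_region(segments)
```

Because no segment needs another segment's rows to compute its partial sum, adding segments shrinks the work per segment, which is the intuition behind the scalability bullet. Real workloads add costs this sketch ignores, such as data skew and moving rows between nodes.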
A Mature Enterprise Platform
CLIENT ACCESS & TOOLS
• Client Access: ODBC, JDBC, OLEDB, MapReduce, etc.
• 3rd Party Tools: BI Tools, ETL Tools, Data Mining, etc.
• Admin Tools: Greenplum Command Center, Greenplum Package Manager
PRODUCT FEATURES
• Loading & Ext. Access: Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access
• Storage & Data Access: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning, Indexes (Btree, Bitmap, etc.), External Table Support
• Language Support: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics, Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)
GREENPLUM DATABASE ADAPTIVE SERVICES
• Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost)
• Online System Expansion
• Workload Management
CORE MPP ARCHITECTURE
• Shared-Nothing MPP
• Parallel Query Optimizer
• Polymorphic Data Storage™
• Parallel Dataflow Engine
• gNet™ Software Interconnect
• Scatter/Gather Streaming™ Data Loading
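The "Row- & Column-Oriented" storage item can be made concrete with a toy comparison. This is a conceptual sketch only and says nothing about how Greenplum lays out data on disk; the `sales` table and the function names are made up for illustration. It counts how many stored values a single-column aggregate has to touch under each layout.

```python
# Hypothetical three-row table, stored row by row.
sales = [
    {"sale_id": 1, "store": "A", "amount": 20.0},
    {"sale_id": 2, "store": "B", "amount": 5.5},
    {"sale_id": 3, "store": "A", "amount": 4.5},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_oriented(rows, column):
    """Sum one column from row storage; every field of every row is read."""
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row[column]
    return total, values_read


def sum_column_oriented(columns, column):
    """Sum one column from column storage; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


row_total, row_reads = sum_row_oriented(sales, "amount")
col_total, col_reads = sum_column_oriented(to_columns(sales), "amount")
```

Both layouts return the same total, but the column layout touches one value per row instead of every field. That is why column orientation tends to suit analytic scans over a few columns, while row orientation tends to suit fetching or updating whole records.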