© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
The Vertica Database
Simply fast!
Mastering Big Data with HP Software
Lior Tzabari - Regional Sales Manager
There is strategic value in big data; with real-time analytics, organizations are able to maximize business value and efficiencies
Welcome to the World of Big Data
– Compliance: Sarbanes-Oxley, HIPAA, Basel II
– Enterprise: ERP, CRM, Products, Customers, Suppliers, Partners
– Financial Services: High-frequency Trading, Algorithmic Trading
– Healthcare: Electronic Patient Record, Gene Sequencing, Medical Imaging
– Mobility
– Technology: Sensors, LOBs, XML
– Social Media
– Communications: Call Detail Records
– Geophysical Exploration
50%
98%
34%
35%
HP survey responses - senior business and technology executives
HP Customers Big Data Concerns
Do not have an effective information strategy in place
Cannot deliver the right information at the right time to support enterprise outcomes all of the time
Say that half of their information is unconnected, undiscovered and unused
Are not effective at accessing enterprise information as and when needed for compliance or operational needs
How Big is Big Data?
• Storage capacity growing 23% per annum
• Computing capacity growing 54% per annum
• 60% of the world’s population used mobile phones in 2010
• 30 billion pieces of content shared every month on Facebook
• 30 million network sensor nodes in 2010 – annual growth rate > 30%
• 40% projected growth in global data generated per year vs. 5% growth in global IT spending
Source: Big Data – The Next Frontier for Innovation, Competition and Productivity, McKinsey Global Institute
Figure 1: The Digital Universe 2009 - 2020: 0.8 ZB* in 2009 to 35 ZB* in 2020, growing by a factor of 44
*Zettabyte = 1 trillion gigabytes
Extreme Information: volume, velocity, variety and complexity
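As a quick sanity check on Figure 1, the growth factor and the implied compound annual growth rate can be worked out from the two endpoints. The sketch below assumes steady year-over-year growth between 2009 and 2020, which the figure itself does not state; the function name is illustrative only.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoints (assumes steady growth)."""
    return (end_zb / start_zb) ** (1 / years) - 1

# Endpoints taken from the figure: 0.8 ZB in 2009, 35 ZB in 2020
growth_factor = 35 / 0.8
cagr = implied_annual_growth(0.8, 35, 2020 - 2009)
print(f"growth factor: {growth_factor:.1f}x, implied annual growth: {cagr:.0%}")
```

The factor comes out just under 44, and the implied rate is roughly 41% a year, which is in line with the 40% projected annual growth quoted from McKinsey.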
What is Big Data?
BIG DATA: datasets whose velocity and/or volume is beyond the capability of typical database tools to collect, store, manage and analyze
Pattern-Based Analytics, Targeted Engagement, Contextual Relevance
BIG DATA: Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, IT/OT, Docs, Search Engine, Images
• IT Value:
– Big Data and analytics projects offer higher ROI than any other IT projects
– Opportunity for IT, analysts, and business users to come together (Moneyball!)
– Leverage previous skills and investments in IT projects that collect and store information
It Can Be Monetized!
Why Should You Care About Big Data?
• Business Value Examples:
– $300B in annual U.S. Healthcare value
– Retailers can increase operating margin by 60% using Big Data
– Governments could save more than $149B annually (Europe alone) through improved operational efficiency
Data volumes growing faster than people, skills, disk, plant and power
The Big Data Paradox:
• Outdated Technology:
– Traditional DBMSs were never designed for today’s volume, velocity, complexity
– Ad hoc questions come from all users, even customers directly
– Detailed data is where the interesting things happen
• Shortage of People:
– U.S. alone faces shortage of 150,000+ people with deep analytic skills
– U.S. missing 1.5M managers and analysts to analyze data and make decisions
Mobile, Sensor, Individual, Services, Cloud, Better Decisions, Analysis, Statistics, Real Time, Monetize
Vertica Analytics Platform – Real Time Big Data
• SOFTWARE-based Real-Time Analytics Platform
• SQL & NoSQL analytics capabilities
• Industry-Leading LOAD & QUERY Performance
• SIMPLE installation & use with AUTOMATIC setup and tuning
• Highly SCALABLE, ELASTIC and full parallelism – MPP
• MONETIZE 100% of your data
Healthcare, Financial Services, Communications, Consumer Marketing, Online Web & Gaming, Retail
750+ customers
Next Generation Administration and Design Tools
A Platform Designed for Big Data
True Column Store - RDBMS
Native and Performance Optimized High Availability
Real Time Massively Parallel Processing
Columnar Compression, Concurrent Load & Query, Elastic Cluster, SQL Analytics, User-Defined Analytics
Optimized
Graphing with Vertica – It’s not just Social!
Visualize the Power of relationships
Relationships can be people, products, markets, compounds, etc.
Scale, performance, and elasticity are
Structured, Unstructured, Semi-structured
Big Data Analytics – Not Only SQL & Structured
Monetize 100% of your data
– All data sources
– Internal / External
– More data points = greater insight
Common Platform – Uncommon Results
– Real-time analytics with both SQL & NoSQL
– Dynamically add / change sources
– Scale, elasticity, and simplicity – all with predictable performance
How HP/Vertica Predicted the Oscars from Twitter Sentiment
Understand the Past, Predict the Future…
• Loaded raw tweets from Twitter into Vertica prior to Oscars
• Performed text parsing and sentiment analysis in Vertica
• Scored each film category based on positive/negative mentions
• Accurately predicted winners in nearly every category!
Vertica Analytics Platform - Monetizing Big Data
Telecommunications
• 7 of the top 10 global telecommunications firms run their business on Vertica
• Revenue & Service Assurance and Fraud Detection
• Sensor & Device management and performance monitoring
• Subscriber insights and targeted marketing and advertising
“Vertica opened doors to analyses that otherwise were too time-intensive or impossible. A larger team of business managers now have faster, easier access to more information. That knowledge is invaluable in an aggressively competitive market like ours.”
Internet Gaming/Web 2.0
• Predictive & targeted engagement for every individual
• Pattern recognition, sentiment, and social media
• Capture, analyze, and store PBs of data – no pruning
• Real-time analysis for actionable insights – NOW!
“…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing”
Financial Services
• Revolutionize catastrophe and risk management
• Real-time measurement and management to maximize asset performance
• Integrated offerings for financial services – Institutional, Retail, Liquidity, Risk, etc.
• Comprehensive structured and unstructured data capabilities
“…with 100’s of clients and 1000’s of analyses understanding our portfolio used to take 3 months – with Vertica it doesn’t even take an hour. We’ve not only saved millions, but made even more…”
Healthcare
• Re-think health care in its entirety – payer, provider, and PMP
• $300BN annual value creation opportunity – two thirds in the form of reductions to national health care expenditure
• Emergence of new business models powered by Big Data (e.g. Blue Health Intelligence)
• Four distinct health care data silos
– Pharmaceutical R&D
– Clinical
– Activity (claims) and cost
– Patient behavior and sentiment
• Patient safety, protocol effectiveness, fraud detection and cost reduction are all Big Data opportunities
“…we went from waiting days to waiting seconds – the impact on every aspect of our business has been transformational…”
Built from the Ground Up: The Four C’s of Vertica
• Columnar storage and execution
– Achieve best data query performance with the unique Vertica column store
• Clustering
– Linear scaling by adding more resources on the fly
• Capacity Optimization
– Store more data, provide more views, use less hardware
• Continuous performance
– Query and load 24x7 with zero administration
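To make the columnar storage and capacity optimization points concrete, here is a minimal sketch in Python. It illustrates the general idea only and is not Vertica's actual storage format: rows are pivoted into per-column lists, and a sorted, low-cardinality column is run-length encoded. The table contents and function names are hypothetical.

```python
from itertools import groupby

# Hypothetical row-oriented records: (region, product, amount)
rows = [
    ("EMEA", "widget", 10.0),
    ("EMEA", "widget", 12.5),
    ("EMEA", "gadget", 7.0),
    ("APAC", "gadget", 3.0),
    ("APAC", "gadget", 4.5),
]

def to_columns(records):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*records)]

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

region, product, amount = to_columns(rows)

# An aggregate such as SUM(amount) only needs to read one column.
total = sum(amount)

# Sorted, low-cardinality columns compress well under run-length encoding.
encoded_region = run_length_encode(region)
print(total, encoded_region)
```

Reading a single column for an aggregate, and storing five region values as two (value, count) pairs, is the intuition behind "store more data, use less hardware".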
Ecosystem Integration – Hadoop / M.R.
• Vertica Approach
– Support and leverage the Hadoop ecosystem rather than reinventing the MR wheel
• Technology
– Hadoop connector
– Squeal optimizing compiler for Pig programs
• Use cases
– Hadoop for exploratory analysis
• Existing MR, Pig scripts
– Vertica for stylized, interactive analysis
• With shared features, often faster than Hadoop with a fraction of the HW
Automated / Unified Platform Management
Cloud, On Premise, Virtualized Cloud, HADOOP
• Visualize
– Analytic resources
– Health / Status
• Provision
– Dynamically deploy
– Distribute resources
• Manage
– Unlimited cluster sizes
– Geographically distributed
SQL Analytics+ - Built for Big Data
Features
– Time series gap filling and interpolation
– Event window functions and sessionization
– Social Graphing
– Pattern matching
– Event series join
– Statistical functions
– Geospatial functions
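Time series gap filling and interpolation, the first feature above, can be illustrated outside the database with a short Python sketch. This is not Vertica's SQL syntax; it only shows the idea of emitting a row for every time slice and linearly interpolating values for slices that have no reading. The sample ticks and the function name are hypothetical.

```python
def fill_gaps(points, step):
    """Linearly interpolate (timestamp, value) pairs onto a regular grid.

    `points` must be non-empty and sorted by timestamp; `step` is the slice
    width in the same units as the timestamps.
    """
    filled = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        t = t0
        while t < t1:
            # Straight-line estimate between the two surrounding readings
            filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
            t += step
    filled.append(points[-1])  # keep the final observed point
    return filled

# Hypothetical tick data: (seconds since open, price); seconds 1, 2 and 4 are missing
ticks = [(0, 10.0), (3, 13.0), (5, 12.0)]
print(fill_gaps(ticks, step=1))
```

A constant (last-value-carried-forward) fill would be the other common choice; linear interpolation is used here purely as an example.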
Benefits
– High performance (Keep Data close to CPU)
– Low cost (Industry Standard building blocks)
– Ease of use (Automated + Available)
Use Cases
– Tickstore data cleanups
– CDR/VOD data analysis
– Clickstream sessionization
– Data aggregation and compression
– Monte Carlo simulation
– Graph algorithms
– Sensor Data
– Process Control
– Time Series
– SmartGrid
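Clickstream sessionization, one of the use cases listed above, boils down to starting a new session whenever the gap between a user's consecutive events exceeds a timeout. The sketch below shows that rule in plain Python rather than SQL; the 30-minute timeout, the event tuples, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def sessionize(events, timeout_seconds=1800):
    """Assign a per-user session number to (user, timestamp) events.

    A new session starts when the gap since the user's previous event
    exceeds `timeout_seconds`.
    """
    last_seen = {}
    session_no = defaultdict(int)
    labeled = []
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        if user in last_seen and ts - last_seen[user] > timeout_seconds:
            session_no[user] += 1
        last_seen[user] = ts
        labeled.append((user, ts, session_no[user]))
    return labeled

# Hypothetical clicks: (user_id, unix_seconds)
clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100), ("u2", 200)]
print(sessionize(clicks))
```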
Geospatial Analytics
• Store and query using SQL:
– Locations as Points of Interest
– Networks, e.g. roads, utilities, etc., as Line Segments
– Regions, e.g. sales territories, high risk zones, etc.
• Use cases
– Mobile check-in and gaming services (e.g. Foursquare, SCVNGR)
– Asset management, insurance
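A mobile check-in service typically needs to know which points of interest lie within some distance of a user. The sketch below uses the haversine great-circle formula in Python as a stand-in for a database geospatial function; it does not reproduce Vertica's geospatial API, and the coordinates, radius, and names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(user, places, radius_km):
    """Return names of places within `radius_km` of the user's position."""
    lat, lon = user
    return [name for name, (plat, plon) in places.items()
            if haversine_km(lat, lon, plat, plon) <= radius_km]

# Hypothetical points of interest keyed by name
places = {"cafe": (40.7130, -74.0060), "museum": (40.7794, -73.9632)}
print(nearby((40.7128, -74.0060), places, radius_km=1.0))
```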
Statistical Modeling Extensions
• Use Cases
– Loan default prediction
– Customer labeling on purchasing behavior
• Technology
– Classification – logistic regression and decision trees
• Native Vertica implementation is MPP and
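For readers unfamiliar with the classification technique named on this slide, here is a minimal logistic regression trained by batch gradient descent in plain Python. It is a teaching sketch, not the native Vertica implementation; the single feature (a standardized debt-to-income score), the toy labels, and the hyperparameters are all invented for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit p(default) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Toy data: standardized debt-to-income score, 1 = loan defaulted
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
defaulted = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(scores, defaulted)
high_risk = sigmoid(w * 1.5 + b)   # applicant with a high score
low_risk = sigmoid(w * -1.5 + b)   # applicant with a low score
print(high_risk > 0.5, low_risk > 0.5)
```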
Vertica Analytics Platform SDK
• A framework for Open Source and 3rd-Party plug-in Analytics
– Simple: concise APIs and examples accelerate deployment
– Flexible: operate on Structured and Unstructured data sets
– Efficient: In-process, fully parallel
• Fully leverage CPUs, Disks, Memory investments
• Present Data in Business-Friendly OLAP Form
– Transform data in-database for maximum efficiency and scale
– Present it in a form readily consumable by Business users and their favorite Business Intelligence tools
• Fast and Efficient
– Eliminate latency and storage of multiple copies
– MPP: tackles data sets at scale – impractical or impossible on a workstation
– Visualize insights within a timeframe that empowers decisions
– Servers belong in a data room – use a mobile device and retire those noisy workstations
AES Encryption
• Secure sensitive data, even from DBAs
– Secure – Applies standard AES libraries
– Protect without impacting manageability
– Encrypt entire columns or individual cells
• Fast and Efficient
– Executes in parallel, in process, on multiple nodes
– Little to no net increase in storage requirements
In-Database Location GeoCoding
• Understand the position of any Address or Place Name
– Flatten arbitrary address formats to simple Latitude and Longitude
– Segment by boundary or proximity in Vertica’s built-in Geospatial library
• Simple Lookups, or Complex Analytics In-Database
– Identify valuable regional or social activity trends
Web Server Log & Click Stream Analysis
• Scalable library functions for IIS and W3C log formats
– Extracts all fields from each web server log format
– Executes in parallel on multiple nodes, cores
• Bolsters Vertica’s optimized in-database sessionization, pattern matching, and event series join capabilities
– Implemented as extensions to familiar SQL analytic syntax
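Field extraction from a W3C extended log can be sketched in a few lines of Python. This is an illustration of what "extracts all fields" means, not the Vertica library functions themselves. It assumes whitespace-separated values with no quoted fields, and the sample log lines are hypothetical.

```python
def parse_w3c_log(lines):
    """Yield one dict per entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue  # other directives (#Version, #Date, ...) are skipped
        yield dict(zip(fields, line.split()))

# Hypothetical sample in W3C extended log format
sample = """#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2013-02-24 18:01:02 10.0.0.5 GET /index.html 200
2013-02-24 18:01:09 10.0.0.7 GET /missing.html 404
""".splitlines()

entries = list(parse_w3c_log(sample))
not_found = [e["cs-uri-stem"] for e in entries if e["sc-status"] == "404"]
print(not_found)
```

Once each entry is a set of named fields, the sessionization and pattern matching steps can operate on columns such as client IP and timestamp.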
Sentiment Analysis Package
• Mine customer interactions and online comments
– Scoring (negative/neutral/positive) on any text string
– Score customer service case notes and transcripts
– Score tweets and blogs mentioning your brand or products (or your competitor’s)
• Manage a complete Business Communication Strategy
– Stay informed of customer sentiment from all internal and external sources
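The negative/neutral/positive scoring described above can be approximated with a simple word-count approach. The Python sketch below is a deliberately naive illustration, not the algorithm used by the Sentiment Analysis Package; the word lists, sample notes, and function name are hypothetical.

```python
import re

# Tiny hypothetical lexicon; a real package would use a far richer model.
POSITIVE = {"love", "great", "amazing", "fast"}
NEGATIVE = {"hate", "slow", "broken", "terrible"}

def score_sentiment(text):
    """Return 'positive', 'negative' or 'neutral' from simple word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"

# Hypothetical customer service case notes
notes = [
    "Love the new release, queries are amazing",
    "Support was slow and the loader is broken",
    "Ticket closed after the scheduled upgrade",
]
print([score_sentiment(n) for n in notes])
```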
XML Parsing & Transformation
• XML within Vertica
– Store and Transform XML documents in-database
– Generate XML documents from queries
– Query external Web Services directly from Vertica
– MPP scale: parse more documents at lower latency
• Avoid complexity
– In-Database processes are more maintainable
– Inherently High Availability: no investment in redundant external transformation software or gateway servers
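The two directions mentioned above, transforming XML into rows and generating XML from query results, can be sketched with Python's standard library. This runs outside the database and is not the Vertica XML package; the document structure, element names, and functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document
document = """
<orders>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>99.50</total></order>
</orders>
""".strip()

def orders_to_rows(xml_text):
    """Flatten each <order> element into an (id, customer, total) tuple."""
    root = ET.fromstring(xml_text)
    return [
        (int(order.get("id")), order.findtext("customer"), float(order.findtext("total")))
        for order in root.findall("order")
    ]

def rows_to_xml(rows):
    """Generate an XML document from row tuples (the reverse direction)."""
    root = ET.Element("orders")
    for order_id, customer, total in rows:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = customer
        ET.SubElement(order, "total").text = f"{total:.2f}"
    return ET.tostring(root, encoding="unicode")

rows = orders_to_rows(document)
print(rows)
print(rows_to_xml(rows))
```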
• Acquire data on demand from within Vertica
– No external infrastructure to maintain
– Low latency access to critical information
• Twitter Access API
– Highly maintainable: store keywords in the database for visibility and easy maintenance
• Google Analytics connection, query, and record extraction
– Detailed data on demand for real-time analysis and value
Document Relevance Comparison
• Cluster and Tag documents for search and comparison
– Quickly isolate the collection of documents surrounding a topic of interest
• Compute relevance vectors with scalable performance
– Scores the relevance of a word or sentence vs. another
– Runs in parallel on multiple nodes, multiple cores
• Includes “Tag Cloud” Example
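One common way to score the relevance of one piece of text against another is cosine similarity between term-frequency vectors. The slide does not say which measure the package uses, so the Python sketch below is an assumption chosen for illustration; the documents and function names are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def relevance(text_a, text_b):
    """Cosine similarity between two term vectors, from 0.0 to 1.0."""
    a, b = term_vector(text_a), term_vector(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical document collection
docs = {
    "doc1": "column store compression improves query performance",
    "doc2": "query performance depends on column store design",
    "doc3": "sentiment scoring of customer tweets",
}
topic = "column store query performance"
ranked = sorted(docs, key=lambda d: relevance(topic, docs[d]), reverse=True)
print(ranked)
```

Ranking documents by this score against a topic string is one simple way to isolate the collection of documents surrounding a topic of interest.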
Natural Language Processing Functions
• Common “Generalized” Functions for Machine Processing of Natural Language
– Optimized for performance and scale
– Used in many common search algorithms
– Suitable for low latency, high volume text streams in a variety of languages
– Used across multiple industries: Online Gaming, Telco, Security, Insurance (to name a few)
Send SMS Messages from Vertica
• Invoke SMS Messages from ordinary SQL
– Run direct marketing as the result of a SQL query
– Notify end users of important information in real time
• Automate administrative alerts
– Notify users of batch completion
Shell Command Framework
• Secure – Accessible only where privileges are specifically granted
– Leverages Vertica’s Role Based security model
• Powerful and Flexible
– Invoke shell commands as SQL functions
• Results captured and transformed for use in query
• Easily automate administrative tasks