
(1)

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

The Vertica Database

simply fast!

Mastering Big Data with HP Software

Lior Tzabari - Regional Sales Manager

(2)

There is strategic value in big data; with real-time analytics, organizations are able to maximize business value and efficiencies.

Welcome to the World of Big Data

• Compliance: Sarbanes-Oxley, HIPAA, Basel II

• Enterprise: ERP, CRM; Products, Customers, Suppliers, Partners

• Financial Services: High-frequency Trading, Algorithmic Trading

• Healthcare: Electronic Patient Record, Gene Sequencing, Medical Imaging

• Mobility

• Technology: Sensors, LOBs, XML

• Social Media

• Communications: Call Detail Records

• Geophysical Exploration

(3)

HP Customers’ Big Data Concerns

HP survey responses – senior business and technology executives

• 50% do not have an effective information strategy in place

• 98% cannot deliver the right information at the right time to support enterprise outcomes all of the time

• 34% say that half of their information is unconnected, undiscovered and unused

• 35% are not effective at accessing enterprise information as and when needed for compliance or operational needs

(4)

How Big is Big Data?

• Storage capacity growing 23% per annum

• Computing capacity growing 54% per annum

• 60% of the world’s population used mobile phones in 2010

• 30 billion pieces of content shared every month on Facebook

• 30 million network sensor nodes in 2010 – annual growth rate > 30% a year

• 40% projected growth in global data generated per year vs. 5% growth in global IT spending

Source: McKinsey Global Institute, Big Data – The Next Frontier for Innovation, Competition and Productivity

Figure 1: The Digital Universe 2009-2020, growing by a factor of 44: from 0.8 ZB* in 2009 to 35 ZB* in 2020

*Zettabyte = 1 trillion gigabytes
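The "factor of 44" can be checked with a short worked calculation using only the two endpoints from Figure 1: 35 ZB / 0.8 ZB ≈ 43.75, and spreading that over the 11 years from 2009 to 2020 implies compound growth of roughly 41% per year, consistent with the 40% annual data growth projected by McKinsey.

```python
# Endpoints taken from Figure 1 (zettabytes).
size_2009_zb = 0.8
size_2020_zb = 35.0
years = 2020 - 2009  # 11 years of growth

# Overall multiplication factor between the two endpoints.
growth_factor = size_2020_zb / size_2009_zb
# Compound annual growth rate (CAGR) implied by the endpoints.
cagr = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}")
print(f"implied annual growth: {cagr:.1%}")
```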

(5)

Extreme Information: volume, velocity, variety and complexity

What is Big Data?

BIG DATA: datasets whose velocity and/or volume is beyond the capability of typical database tools to collect, store, manage and analyze

Pattern-Based Analytics • Targeted Engagement • Contextual Relevance

Data sources: Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, IT/OT, Docs, Search Engine, Images

(6)

Why Should You Care About Big Data?

It Can Be Monetized!

• IT Value:

• Big Data and analytics projects offer higher ROI than any other IT project

• Opportunity for IT, analysts, and business users to come together (Moneyball!)

• Leverage previous skills and investments in IT projects that collect and store information

• Business Value Examples:

• $300B in annual value to U.S. healthcare

• Retailers can increase operating margin by 60% using Big Data

• Governments could save more than $149B annually (Europe alone) through improved operational efficiency

(7)

The Big Data Paradox:

Data volumes are growing faster than people, skills, disk, plant and power

Outdated Technology:

– Traditional DBMSs were never designed for today’s volume, velocity and complexity

– Ad hoc questions come from all users, even directly from customers

– Detailed data is where the interesting things happen

Shortage of People:

– The U.S. alone faces a shortage of 150,000+ people with deep analytic skills

– The U.S. is short 1.5M managers and analysts able to analyze data and make decisions

(8)

Vertica Analytics Platform – Real Time Big Data

Mobile, Sensor, Individual, Services, Cloud, Better Decisions, Analysis, Statistics, Real Time, Monetize

• SOFTWARE-based Real-Time Analytics Platform

• SQL & NoSQL analytics capabilities

• Industry-leading LOAD & QUERY performance

• SIMPLE installation & use with AUTOMATIC setup and tuning

• Highly SCALABLE, ELASTIC and full parallelism – MPP

MONETIZE 100% of your data

(9)

750+ customers across Healthcare, Financial Services, Communications, Consumer Marketing, Online/Web & Gaming, and Retail

(10)

A Platform Designed for Big Data

• Next Generation Administration and Design Tools

• True Column Store RDBMS

• Native High Availability

• Real-Time Massively Parallel Processing

• Columnar Compression

• Concurrent Load & Query

• Elastic Cluster

• SQL Analytics

• User-Defined Analytics

• Performance Optimized
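Columnar compression works well because values within one column tend to be similar, especially when the column is stored sorted. Run-length encoding (RLE) is one widely used columnar scheme; the sketch below is a generic illustration of the idea, not Vertica’s on-disk format, and the column contents are made up.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# A low-cardinality column, kept sorted as a column store might store it.
state_column = sorted(["MA", "NY", "MA", "CA", "NY", "MA", "CA", "CA"])
encoded = rle_encode(state_column)
print(encoded)  # [('CA', 3), ('MA', 3), ('NY', 2)]

# A per-value count can be answered from the runs without decompressing.
counts = {value: n for value, n in encoded}
```

Eight stored values shrink to three pairs here; the saving grows with longer runs, which is why sort order matters for a column store.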

(11)

Graphing with Vertica – It’s not just Social!

Visualize the Power of relationships

Relationships can link people, products, markets, compounds, etc.

Scale, performance, and elasticity are
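A typical graph computation of this kind is triangle counting (for example, how many groups of three users all follow one another), which in SQL is usually written as a three-way self-join over an edge table. The Python sketch below shows the same logic in memory; the edge list is made up and the approach is a generic illustration, not Vertica’s implementation.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles in an undirected graph given as (a, b) edge pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    triangles = 0
    for u in adjacency:
        for v in adjacency[u]:
            if u < v:
                # Only count common neighbours w > v, so each triangle
                # u < v < w is counted exactly once.
                triangles += sum(1 for w in adjacency[u] & adjacency[v] if w > v)
    return triangles

# Hypothetical "follows" relationships between four users.
follows = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"),
           ("cat", "dan"), ("dan", "bob")]
print(count_triangles(follows))  # 2
```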

(12)

Big Data Analytics – Not Only SQL & Structured

Structured • Unstructured • Semi-structured

Monetize 100% of your data

– All data sources

– Internal / External

– More data points = greater insight

Common Platform – Uncommon Results

– Real-time analytics with both SQL & NoSQL

– Dynamically add / change sources

– Scale, elasticity, and simplicity – all with predictable performance

(13)

How HP/Vertica Predicted the Oscars from Twitter Sentiment

Understand the Past, Predict the Future…

• Loaded raw tweets from Twitter into Vertica prior to the Oscars

• Performed text parsing and sentiment analysis in Vertica

• Scored the films in each category based on positive/negative mentions

• Accurately predicted winners in nearly every category!
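The scoring step can be sketched as lexicon-based counting: each tweet scores +1 per positive word and -1 per negative word, and the film with the highest net score in a category is the predicted winner. The word lists, film names, and tweets below are hypothetical, and the deck does not say which sentiment method HP/Vertica actually used.

```python
import re

# Hypothetical word lists; a production sentiment model would be far richer.
POSITIVE = {"love", "brilliant", "deserves", "best", "great"}
NEGATIVE = {"boring", "overrated", "worst", "snub"}

def score(text):
    """Net sentiment of one tweet: positive word count minus negative."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def predict_winner(tweets_by_film):
    """Pick the film with the highest total net sentiment in a category."""
    totals = {film: sum(score(t) for t in tweets)
              for film, tweets in tweets_by_film.items()}
    return max(totals, key=totals.get)

best_picture = {
    "Film A": ["Loved it, brilliant and deserves the win", "a bit boring"],
    "Film B": ["overrated", "worst pacing of the year", "great score though"],
}
print(predict_winner(best_picture))  # Film A
```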

(14)

Vertica Analytics Platform - Monetizing Big Data

Real Time, Monetize, Better Decisions, Analysis, Statistics

(15)

Telecommunications

• 7 of the top 10 global telecommunications firms run their business on Vertica

• Revenue & Service Assurance and Fraud Detection

• Sensor & Device management and performance monitoring

• Subscriber insights and targeted marketing and advertising

“Vertica opened doors to analyses that otherwise were too time-intensive or impossible. A larger team of business managers now have faster, easier access to more information. That knowledge is invaluable in an aggressively competitive market like ours.”

(16)

Internet Gaming/Web 2.0

• Predictive & targeted engagement for every individual

• Pattern recognition, sentiment, and social media

• Capture, analyze, and store PBs of data – no pruning

• Real-time analysis for actionable insights – NOW!

“…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing”

(17)

Financial Services

• Revolutionize catastrophe and risk management

• Real-time measurement and management to maximize asset performance

• Integrated offerings for financial services – Institutional, Retail, Liquidity, Risk, etc.

• Comprehensive structured and unstructured data capabilities

“…with 100’s of clients and 1000’s of analyses understanding our portfolio used to take 3 months – with Vertica it doesn’t even take an hour. We’ve not only saved millions, but made even more…”

(18)

Healthcare

• Re-think health care in its entirety – payer, provider, and PMP

• $300B annual value creation opportunity – two thirds in the form of reductions to national health care expenditure

• Emergence of new business models powered by Big Data (e.g. Blue Health Intelligence)

• Four distinct health care data silos:

– Pharmaceutical R&D

– Clinical

– Activity (claims) and cost

– Patient behavior and sentiment

• Patient safety, protocol effectiveness, fraud detection and cost reduction are all Big Data opportunities

“…we went from waiting days to waiting seconds – the impact on every aspect of our business has been transformational…”

(19)

Built from the Ground Up: The Four C’s of Vertica

• Columnar storage and execution: achieve the best query performance with Vertica’s unique column store

• Clustering: scale linearly by adding more resources on the fly

• Capacity Optimization: store more data and provide more views while using less hardware

• Continuous performance: query and load 24x7 with zero administration
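Clustering rests on spreading rows across nodes so that each node works on its own slice in parallel. A minimal sketch of hash-based segmentation, assuming a simple hash-modulo scheme (this is not Vertica’s actual segmentation algorithm; production systems typically use schemes that move less data when nodes are added). The table and column names are hypothetical.

```python
import hashlib

def node_for(key, node_count):
    """Map a segmentation key to a node using a stable (non-randomized) hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def segment(rows, key, node_count):
    """Distribute rows across nodes by hashing the chosen key column."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key], node_count)].append(row)
    return nodes

# Hypothetical call records segmented by subscriber across 4 nodes.
calls = [{"subscriber": i, "minutes": i % 7} for i in range(1000)]
nodes = segment(calls, "subscriber", node_count=4)
print([len(n) for n in nodes])  # roughly 250 rows per node

# Each node sums its own slice; the partial sums add up to the global total.
partials = [sum(r["minutes"] for r in n) for n in nodes]
total = sum(partials)
```

Because every row with the same key lands on the same node, per-subscriber aggregates can be computed locally and only the small partial results need to be combined.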

(20)

Ecosystem Integration – Hadoop / M.R.

Vertica Approach

– Support and leverage the Hadoop ecosystem rather than reinventing the MR wheel

Technology

– Hadoop connector

– Squeal optimizing compiler for Pig programs

Use cases

– Hadoop for exploratory analysis

• Existing MR, Pig scripts

– Vertica for stylized, interactive analysis

• With shared features, often faster than Hadoop with a fraction of HW

(21)

Automated / Unified Platform Management

Deployment options: Cloud, On Premise, Virtualized, Hadoop

• Visualize

– Analytic resources

– Health / Status

• Provision

– Dynamically deploy

– Distribute resources

• Manage

– Unlimited cluster sizes

– Geographically distributed

(22)

SQL Analytics – Built for Big Data

Features

– Time series gap filling and interpolation

– Event window functions and sessionization

– Social graphing

– Pattern matching

– Event series join

– Statistical functions

– Geospatial functions
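Gap filling with interpolation means producing a value for every regular time slice even when the raw data arrives irregularly. In Vertica this is expressed in SQL through the TIMESERIES clause; the Python sketch below only illustrates the linear-interpolation logic, assuming integer timestamps in seconds and made-up tick data.

```python
from bisect import bisect_left

def interpolate_at(times, values, t):
    """Linearly interpolate the series at time t (None outside its range)."""
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return values[i]                  # an actual sample falls on t
    if i == 0 or i == len(times):
        return None                       # before the first or after the last sample
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def gap_fill(samples, start, stop, step):
    """Resample irregular (time, value) samples onto a regular grid of slices."""
    samples = sorted(samples)
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    return [(t, interpolate_at(times, values, t))
            for t in range(start, stop + 1, step)]

# Irregular ticks at seconds 0, 3 and 7, resampled into 2-second slices.
ticks = [(0, 10.0), (3, 13.0), (7, 9.0)]
print(gap_fill(ticks, start=0, stop=8, step=2))
# [(0, 10.0), (2, 12.0), (4, 12.0), (6, 10.0), (8, None)]
```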

Benefits

– High performance (Keep Data close to CPU)

– Low cost (Industry Standard building blocks)

– Ease of use (Automated + Available)

Use Cases

– Tickstore data cleanups

– CDR/VOD data analysis

– Clickstream sessionization

– Data aggregation and compression

– Monte Carlo simulation

– Graph algorithms

– Sensor data

– Process control

– Time series

– SmartGrid
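Clickstream sessionization splits each user’s clicks into sessions wherever the gap between consecutive clicks exceeds a timeout. In Vertica SQL this is typically done with event-based window functions such as CONDITIONAL_TRUE_EVENT; the Python sketch below shows only the logic, with a 30-minute timeout and click data that are assumptions for illustration.

```python
from datetime import datetime, timedelta

def sessionize(clicks, timeout=timedelta(minutes=30)):
    """Label each (user, timestamp) click with a per-user session number.

    A user's session number increases whenever the gap since that user's
    previous click exceeds the timeout.
    """
    last_seen = {}   # user -> timestamp of previous click
    session = {}     # user -> current session number
    labeled = []
    for user, ts in sorted(clicks):
        previous = last_seen.get(user)
        if previous is None:
            session[user] = 0
        elif ts - previous > timeout:
            session[user] += 1
        last_seen[user] = ts
        labeled.append((user, ts, session[user]))
    return labeled

t = datetime(2013, 2, 24, 9, 0)
clicks = [("u1", t), ("u1", t + timedelta(minutes=10)),
          ("u1", t + timedelta(minutes=60)), ("u2", t + timedelta(minutes=5))]
for user, ts, session_id in sessionize(clicks):
    print(user, ts.time(), session_id)
```

Here u1’s third click comes 50 minutes after the second, so it opens session 1; u2’s single click is its own session 0.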

(23)

Geospatial Analytics

Store and query using SQL:

– Locations as Points of Interest

– Networks, e.g. roads, utilities, etc., as Line Segments

– Regions, e.g. sales territories, high risk zones, etc.

Use cases

– Mobile check-in and gaming services (e.g. Foursquare, SCVNGR)

– Asset management, insurance
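Assigning a location to a region, such as a sales territory or a high-risk zone, comes down to a point-in-polygon test. A minimal ray-casting sketch in planar coordinates follows; it is a generic technique, Vertica’s geospatial library may implement this differently, and the territory shapes are made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon?

    polygon is a list of (x, y) vertices in order; the ring closes
    implicitly. Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square_territory = [(0, 0), (10, 0), (10, 10), (0, 10)]
l_shaped_zone = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
print(point_in_polygon(5, 5, square_territory))  # True
print(point_in_polygon(8, 8, l_shaped_zone))     # False
```

Each crossing of the ray with an edge flips inside/outside, so the test works for non-convex regions like the L-shaped zone.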

(24)

Statistical Modeling Extensions

Use Cases

–Loan default prediction

–Customer labeling on purchasing behavior

Technology

–Classification – logistic regression and decision trees

Native Vertica implementation is MPP and
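Logistic regression, the first classifier listed, models the probability of an outcome such as loan default as a sigmoid of a weighted sum of features. The sketch below fits it by plain batch gradient descent on a tiny made-up dataset; the feature names, data, learning rate, and epoch count are hypothetical, and this single-machine loop does not show how an MPP implementation distributes the work. It does show why distribution is natural: each gradient is a sum over rows, so nodes can compute partial sums that are then added.

```python
import math

def predict_proba(weights, x):
    """Probability of the positive class; the last weight is the bias."""
    z = sum(w * xj for w, xj in zip(weights, x)) + weights[-1]
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        # The gradient is a sum over rows, so it could be computed as
        # per-partition partial sums and then added together.
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            for j, xj in enumerate(x):
                gradient[j] += error * xj
            gradient[-1] += error
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

# Hypothetical loans: [debt-to-income ratio, missed payments]; 1 = defaulted.
loans = [[0.1, 0], [0.2, 0], [0.3, 1], [0.8, 3], [0.9, 2], [0.7, 4]]
defaulted = [0, 0, 0, 1, 1, 1]
weights = train_logistic(loans, defaulted)
print(round(predict_proba(weights, [0.15, 0]), 3))  # low default probability
print(round(predict_proba(weights, [0.85, 3]), 3))  # high default probability
```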

(25)

Vertica Analytics Platform SDK

A framework for Open Source and 3rd Party plug-in Analytics

– Simple: concise APIs and examples accelerate deployment

– Flexible: operate on structured and unstructured data sets

– Efficient: in-process, fully parallel

Fully leverage CPU, disk, and memory investments

(26)
(27)

Present Data in Business-Friendly OLAP Form

– Transform data in-database for maximum efficiency and scale

– Present it in a form readily consumable by business users and their favorite Business Intelligence tools

Fast and Efficient

– Eliminate latency and the storage of multiple copies

– MPP: tackles data sets at a scale that is impractical or impossible on a workstation

– Visualize insights within a timeframe that empowers decisions

– Servers belong in a data room: use a mobile device and retire those noisy workstations

(28)

AES Encryption

Secure sensitive data, even from DBAs

Secure

– Applies standard AES libraries

– Protect without impacting manageability

– Encrypt entire columns or individual cells

Fast and Efficient

– Executes in parallel, in process, on multiple nodes

– Little to no net increase in storage requirements

(29)

In-Database Location GeoCoding

Understand the position of any address or place name

– Flatten arbitrary address formats to simple latitude and longitude

– Segment by boundary or proximity in Vertica’s built-in Geospatial library

Simple Lookups, or Complex Analytics In-Database

– Identify valuable regional or social activity trends
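Once addresses are reduced to latitude and longitude, segmenting by proximity needs a distance on the globe. A sketch using the haversine great-circle formula follows; it assumes a spherical Earth with a 6371 km mean radius, is not necessarily what Vertica’s library uses, and the store names and coordinates are sample values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(places, center, radius_km):
    """Names of geocoded places lying within radius_km of center."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in places.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]

# Sample geocoded points (approximate coordinates in Manhattan, Brooklyn and Boston).
places = {"store_a": (40.7128, -74.0060),
          "store_b": (40.6782, -73.9442),
          "store_c": (42.3601, -71.0589)}
print(within_radius(places, center=(40.7128, -74.0060), radius_km=10))
# ['store_a', 'store_b']
```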

(30)

Web Server Log & Click Stream Analysis

Scalable library functions for IIS and W3C log formats

– Extracts all fields from each web server log format

– Executes in parallel on multiple nodes and cores
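W3C extended log files declare their column layout in a "#Fields:" directive, and data lines are space-separated, with "-" marking an empty value. A minimal parser along those lines follows; it is a simplified sketch that ignores quoting rules and the other directives, it is not the Vertica library function, and the sample log lines are made up.

```python
def parse_w3c_log(lines):
    """Parse W3C extended-format log lines into dicts keyed by field name.

    The most recent '#Fields:' directive defines the column order; other
    directives are skipped and '-' is read as a missing value.
    """
    fields = []
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        records.append({name: (None if value == "-" else value)
                        for name, value in zip(fields, values)})
    return records

sample_log = [
    "#Software: Microsoft Internet Information Services",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2013-02-24 20:15:01 10.0.0.5 GET /index.html 200",
    "2013-02-24 20:15:07 10.0.0.9 GET /favicon.ico -",
]
for record in parse_w3c_log(sample_log):
    print(record["time"], record["cs-uri-stem"], record["sc-status"])
```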

Bolsters Vertica’s optimized in-database sessionization, pattern matching, and event series join capabilities

– Implemented as extensions to familiar SQL analytic syntax
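An event series join matches rows from two time series whose timestamps do not line up exactly, for example attaching to each order the page the user was viewing at that moment. Vertica SQL expresses this with the INTERPOLATE PREVIOUS VALUE join predicate; the Python sketch below shows only the underlying "latest value at or before" logic on hypothetical clickstream data.

```python
from bisect import bisect_right

def as_of_join(events, series):
    """Pair each (time, event) with the latest series value at or before that time."""
    series = sorted(series, key=lambda point: point[0])
    times = [t for t, _ in series]
    joined = []
    for t, event in sorted(events, key=lambda e: e[0]):
        i = bisect_right(times, t)  # number of series points with time <= t
        value = series[i - 1][1] if i > 0 else None
        joined.append((t, event, value))
    return joined

# Hypothetical clickstream: which page was the user on when each order happened?
page_views = [(1, "/home"), (5, "/product/42"), (9, "/cart")]
orders = [(0, "order-1"), (5, "order-2"), (7, "order-3")]
print(as_of_join(orders, page_views))
# [(0, 'order-1', None), (5, 'order-2', '/product/42'), (7, 'order-3', '/product/42')]
```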

(31)

Sentiment Analysis Package

Mine customer interactions and online comments

– Scoring (negative/neutral/positive) on any text string

– Score customer service case notes and transcripts

– Score tweets and blogs mentioning your brand or products (or your competitor’s)

Manage a complete Business Communication Strategy

– Stay informed of customer sentiment from all internal and external sources

(32)

XML Parsing & Transformation

XML within Vertica

– Store and transform XML documents in-database

– Generate XML documents from queries

– Query external Web Services directly from Vertica

– MPP scale: parse more documents at lower latency
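Storing XML relationally usually means "shredding" nested elements into rows, and generating XML from query results is the reverse. A small sketch of both directions with Python’s standard ElementTree follows; the order/item document structure is hypothetical, and this is not the interface of Vertica’s XML package.

```python
import xml.etree.ElementTree as ET

ORDERS_XML = """
<orders>
  <order id="1001" customer="acme">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
  <order id="1002" customer="globex">
    <item sku="A1" qty="5"/>
  </order>
</orders>
"""

def shred_orders(xml_text):
    """Flatten nested <order>/<item> elements into one row per item."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for order in root.iter("order"):
        for item in order.iter("item"):
            rows.append((order.get("id"), order.get("customer"),
                         item.get("sku"), int(item.get("qty"))))
    return rows

def rows_to_xml(rows):
    """Generate an XML document from query-style rows (the reverse direction)."""
    root = ET.Element("items")
    for order_id, customer, sku, qty in rows:
        ET.SubElement(root, "item", {"order": order_id, "sku": sku, "qty": str(qty)})
    return ET.tostring(root, encoding="unicode")

rows = shred_orders(ORDERS_XML)
print(rows)
print(rows_to_xml(rows))
```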

Avoid complexity

– In-Database processes are more maintainable

– Inherently High Availability: no investment in redundant external transformation software or gateway servers

(33)

Acquire data on demand from within Vertica

– No external infrastructure to maintain

– Low latency access to critical information

Twitter Access API

– Highly maintainable: store keywords in the database for visibility and easy maintenance

Google Analytics connection, query, and record extraction

– Detailed data on demand for real-time analysis and value

(34)

Document Relevance Comparison

Cluster and Tag documents for search and comparison

– Quickly isolate the collection of documents surrounding a topic of interest

Compute relevance vectors with scalable performance

– Scores the relevance of a word or sentence vs. another

– Runs in parallel on multiple nodes, multiple cores

Includes “Tag Cloud” Example
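Scoring the relevance of one text against another is commonly done by turning each into a weighted term vector and comparing the vectors. The sketch below uses TF-IDF weights and cosine similarity, a standard choice assumed here for illustration (the slide does not say which weighting Vertica’s package uses); the documents are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append({term: (count / total) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = ["column store compression for fast queries",
        "fast column store query performance",
        "predicting the oscars from twitter sentiment"]
v = tfidf_vectors(docs)
print(round(cosine(v[0], v[1]), 3), round(cosine(v[0], v[2]), 3))
```

The first two documents share weighted terms ("column", "store", "fast") and score above zero; the third shares none with the first and scores 0.0.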

(35)

Natural Language Processing Functions

Common “Generalized” Functions for Machine Processing of Natural Language

– Optimized for performance and scale

– Used in many common search algorithms

– Suitable for low latency, high volume text streams in a variety of languages

– Used across multiple industries: Online Gaming, Telco, Security, Insurance (to name a few)

(36)

Send SMS Messages from Vertica

Invoke SMS Messages from ordinary SQL

– Run direct marketing as the result of a SQL query

– Notify end users of important information in real time

Automate administrative alerts

– Notify users of batch completion

(37)

Shell Command Framework

Secure – Accessible only where privileges are specifically granted

– Leverages Vertica’s role-based security model

Powerful and Flexible

– Invoke shell commands as SQL functions

• Results captured and transformed for use in queries

• Easily automate administrative tasks
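The pattern described here (a privilege check, then a command whose output comes back as rows) can be sketched as below. The role names and the grants set are hypothetical stand-ins for Vertica’s role-based privileges, and this is not the framework’s actual interface.

```python
import subprocess
import sys

# Hypothetical grants: roles that have been given the shell-command privilege.
GRANTED_ROLES = {"dbadmin", "ops_role"}

def run_shell_as_rows(role, argv, timeout=10):
    """Run a command for a privileged role and return its stdout lines as rows."""
    if role not in GRANTED_ROLES:
        raise PermissionError(f"role {role!r} has no shell-command privilege")
    # argv is passed as a list (no shell), so arguments cannot inject commands.
    completed = subprocess.run(argv, capture_output=True, text=True,
                               timeout=timeout, check=True)
    return [line for line in completed.stdout.splitlines() if line]

# Example: a privileged role runs a small command; each output line is one row.
rows = run_shell_as_rows("dbadmin",
                         [sys.executable, "-c", "print('disk ok'); print('load done')"])
print(rows)  # ['disk ok', 'load done']
```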

(38)
