How to make BIG DATA work for you.
Faster results with Microsoft SQL Server PDW
Roger Breu – PDW Solution Specialist Microsoft Western Europe
Marcus Gullberg – PDW Partner Account Manager Microsoft Sweden
SQL Server PDW - Your socket to Big Data
HP AppSystem for SQL 2012 Parallel Data Warehouse (PDW)
Agenda
• The Modern Data Warehouse and socket to Big Data
• SQL Server Parallel Data Warehouse (PDW) Overview
• Fast, scalable, lowest TCO
• Interesting customer cases
• Q&A
The world of data has changed
Data explosion: 10x increase every five years; 85% from new data types
Consumerization of IT: 4.3 connected devices per adult; 27% using social media input
The Large Hadron Collider produces 1 PB/sec
But hold on, I’m not CERN and I don’t have a Large Hadron Collider…
But you do have…
• Sensors
• Clicks
• Logs
• Transactional records
• Call centers
• Images
• Documents
• Signals from social media
• Simulations
Big Data is All This and More:
volume, variety, velocity
(and volatility & variability)
Traditional Data is Highly Structured
Traditional databases are organized around planned queries
A Definition
• Web app optimization
• Smart meter monitoring
• Equipment monitoring
• Advertising analysis
• Life sciences research
• Fraud detection
• Healthcare outcomes
• Weather forecasting
• Social network analysis
• Churn analysis
• Traffic flow optimization
• IT infrastructure optimization
• Legal discovery
• Natural resource exploration
GAIN COMPETITIVE ADVANTAGE BY MOVING FIRST AND FAST IN THEIR INDUSTRY
Common Big Data customer scenarios
Twitter Analytics with Microsoft Excel
Demo
Is this Big Data?
Is this Enterprise Ready?
Big Data is more than just new BI!
Big Data is more than just Hadoop!
The Big (Data) picture
Data Insights Value
Traditional data sources
“New” data sources
$$$
€€€
The Traditional Data Warehouse
Data sources
1. Increasing data volumes
2. Real-time data
3. New data sources & types
4. Cloud-born data
Non-Relational Data
Data sources Non-Relational Data
The Modern Data Warehouse
Combining structured and semi-structured data with SQL Server PDW and PolyBase
Demo
And remember, it’s not just
working with Twitter data
SQL Server PDW
APPLICATIONS DATA SYSTEMS
Microsoft Applications
DATA SOURCES
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (web logs, email, sensor data, social media)
HDInsight
Polybase
Feature in SQL PDW
CREATE TABLE Customers ([user_id] INT, name NVARCHAR(50)…
CREATE EXTERNAL TABLE ClickEvent
(url VARCHAR(50), event_date DATE, user_ID VARCHAR(50))
WITH (LOCATION = 'hdfs://MyHadoop:5000/clickstream/click.txt');
SELECT COUNT(*)
FROM Customers c JOIN ClickEvent e ON c.[user_ID] = e.[user_ID]
WHERE c.Name = 'Jones';
Socket for BigData: Hadoop access made simple
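The query above joins a relational table with a delimited file that lives in Hadoop. As a rough local illustration of that idea (this is not PolyBase, and nothing here talks to HDFS), the Python sketch below parses a small delimited "click" file and joins it to a relational table in an in-memory SQLite database. The sample rows, the `|` delimiter and the function name `count_clicks_for` are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the click file stored in HDFS.
CLICK_FILE = """\
/home|2013-05-01|1
/cart|2013-05-01|1
/home|2013-05-02|2
"""

# Hypothetical rows for the relational Customers table.
CUSTOMERS = [(1, "Jones"), (2, "Smith")]


def count_clicks_for(name: str) -> int:
    """Join relational customers with file-based click events and count matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (user_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?)", CUSTOMERS)

    # Stand-in for the external table: parse the delimited file into rows.
    conn.execute(
        "CREATE TABLE ClickEvent (url TEXT, event_date TEXT, user_id INTEGER)"
    )
    reader = csv.reader(io.StringIO(CLICK_FILE), delimiter="|")
    conn.executemany(
        "INSERT INTO ClickEvent VALUES (?, ?, ?)",
        [(url, day, int(uid)) for url, day, uid in reader],
    )

    # Same shape as the slide's query: join, filter by name, count.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Customers c "
        "JOIN ClickEvent e ON c.user_id = e.user_id "
        "WHERE c.name = ?",
        (name,),
    ).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print(count_clicks_for("Jones"))
```

One difference worth noting: the sketch copies the file into the database before joining, whereas the slides present the external table as a way to query the Hadoop data from SQL directly.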
Parallel or not parallel?
SQL Instance
Storage
Scale up (SMP)
» Traditional approach
» Build for specific requirement
» Build HA etc. additionally
» Maintain and Tune (Load/File Distribution)
» Unknown Future workloads
» Still a very good data mart solution in a Hub and Spoke architecture with SQL Server PDW
Scale out (MPP)
» Modern way of data warehousing
» Resilient & Predictable
» Big data / DW Best Practices in a box
» Deploy Fast and Drive Value
» Built-in HA
» Scalable (start small/grow when needed)
SQL Instance #1, SQL Instance #2, SQL Instance #3, SQL Instance #4, SQL Instance #8, SQL Instance #....
Storage (one per SQL instance)
Performance comparison: Traditional Data Warehouse vs. Modern Data Warehouse (SQL Server PDW)
Demo
Interesting customer cases
BENEFITS
Further scale existing SQL Server based solution and get ready for Big Data
Build on existing SQL Server knowledge
SQL Server PDW is a hosted solution from their infrastructure outsourcer
One of the largest fresh-food retailers in Scandinavia uses SQL Server PDW as the back end for a Microsoft BI application that handles billions of rows of point-of-sale (POS) data. The customer will now be able to run customer basket analysis, campaign management and receipt line analysis at lightning speed.
Large fresh-food retailer in Scandinavia uses SQL Server PDW to improve customer basket analysis and optimize marketing campaigns
BENEFITS
Lower TCO due to standardization (previously had DWH on Oracle and SQL Server)
Lower price/TB compared to existing SAN-based solutions
Expand/meet future capacity requirements
The customer operates in Denmark and Sweden, is a large player in the Nordic advertising market and a leading operator of logistics services to, from and within the Nordic region. PDW will be used as the primary data warehouse platform. Previously, the customer operated multiple data warehouse environments on different platforms. SQL Server PDW will become the new standard while also letting the customer embrace big data in the future.
Leading Scandinavian logistics company chooses SQL Server PDW to lower its TCO
BENEFITS
Cuts Storage Costs Through Data Compression
Reduces Support Work Needed by an Estimated 90 Percent
Improves Data-Loading Performance
In testing semiconductor wafers, AMD uses a data warehouse to process and analyze one terabyte of data each week. When its data warehouse began foundering under the load, AMD switched to Microsoft SQL Server 2008 R2 Parallel Data Warehouse. With the new system, AMD has increased performance and enabled a sustainable and scalable solution.
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000001887
AMD Boosts Data Warehouse performance with PDW
BENEFITS
Large US stock exchange needed an MPP appliance to improve performance and scale and to provide a complete BI solution
PDW Appliance is the BI backbone for Direct Edge
PDW delivered 142X Query Performance gain, linear scale and complete BI Solution
Direct Edge, one of the largest equities exchanges in the world, wanted a better, faster business intelligence (BI) solution for its financial analysts to use to create reports. Direct Edge implemented a data warehouse and BI solution based on Microsoft SQL Server 2008 R2 Parallel Data Warehouse, which has given the company more visibility into its data. The firm can also provide analytical reports in seconds rather than hours and can better drive business growth.
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000002540
Direct Edge gets 142X performance gain with Parallel Data Warehouse (PDW) Appliance
BENEFITS
Boosts Query Performance by 100 Times
Scales to Meet Future Data Growth
Gets Critical Business Data to Analysts Faster
SQL Server Parallel Data Warehouse solution helps Hy-Vee recognize changes in their customers’ buying habits. They can then respond to those changes before competitors do, and that gives them a huge business advantage.
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2008-R2-Enterprise/Hy-Vee/Hy-Vee-Boosts-Performance-Speeds-Data-Delivery-and-Increases-Competitiveness/710000000776
Upgrading SQL Server to PDW gains 100x Improvement
With PDW: From batch to real time insights
A Classic Dark Data Example: Audit & Fraud Detection at a large Oil & Gas Company
Goals:
• Collect data from all audited source systems (ETL Performance)
• Expand data window to address regulatory requirements
• Create Auditing Reports and Datasets faster (Reporting Performance)
• Run multiple Auditing Reports in parallel + multiple types + concurrency (Query / Reporting Scalability)
• Run multiple Auditing Reports in parallel with expanded dataset (Validate future growth / Reporting Scalability)
• Drill down into SSAS OLAP Cubes faster (Processing Performance / Validate future growth)
Results:
• 6h 12min improvement
• 5x-192x faster
• Concurrency handled without impact
• 100x more data, min. increase in runtimes
• 20x faster
• Reporting Scalability: processing time only increased by 47%
Summary
PDW is the SQL Server Scale Out solution
Massively Parallel Processing (MPP) engine
MPP Provides Near Linear Scale Out
• Massively Parallel Processing (MPP) Architecture
• Scale Out: Incrementally add HW for Near Linear Scale
• Shared Nothing
Scale Out
10X Faster Than SMP DW
Compute Heavy Tasks
Near Linear Scale
Easy to Scale (No forklift)
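The shared-nothing idea behind these claims can be sketched in a few lines of Python: rows are spread across nodes by hashing a distribution key, each node aggregates only its own rows, and the partial results are merged. This is a minimal conceptual sketch, not how PDW itself distributes data or plans queries; the node count, the CRC32 hash and all function names are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 4  # hypothetical number of compute nodes


def node_for(key: str, node_count: int = NODE_COUNT) -> int:
    """Pick a node by hashing the distribution key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count: int = NODE_COUNT):
    """Split (key, amount) rows so that each node owns a disjoint slice."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for(key, node_count)].append((key, amount))
    return nodes


def local_sum(node_rows):
    """Work done independently on one node: aggregate only the local rows."""
    totals = defaultdict(int)
    for key, amount in node_rows:
        totals[key] += amount
    return dict(totals)


def parallel_sum(rows, node_count: int = NODE_COUNT):
    """Run the per-node aggregation in parallel and merge the partial results."""
    nodes = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = {}
    for partial in partials:
        # Keys never overlap between partials: each key lives on exactly one node.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    sales = [("store-1", 10), ("store-2", 5), ("store-1", 7), ("store-3", 1)]
    print(parallel_sum(sales))
```

Because every key hashes to exactly one node, no node needs another node's rows to finish its part of the aggregate, which is the property that lets capacity grow by adding nodes.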
Start small with a few TB and linearly Scale Out
Seamlessly add capacity
Smallest (0TB) To Largest (6PB)
• Start small with a warehouse of a few terabytes
• Add capacity up to 6 Petabytes
0 TB to 6 PB (Add Capacity)
Largest Warehouse PB
Start Small And Grow
Minimal Downtime
Lightning Fast Data Query Processing
Columnstore gives next-gen performance
Customer, Sales, Country, Supplier, Products
Columnstore Provides Dramatic Performance
• Updateable and clustered columnstore
• Stores data in columnar format
• Memory-optimized for next-generation performance
• Updateable to support bulk and/or trickle loading
Save Time and Costs
Real-Time DW
Up to 50X Faster
Up to 15x compression
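Why a columnar layout helps with both compression and scan speed can be shown with a toy Python example: values of one column are stored together, so repeated values compress well and an aggregate reads only the column it needs. Run-length encoding is used here as one generic columnar compression technique; the slides do not say which encodings PDW's columnstore uses, and the sample rows and function names are made up for illustration.

```python
from itertools import groupby

# Hypothetical sales rows: (country, product, amount).
ROWS = [
    ("SE", "milk", 12),
    ("SE", "milk", 9),
    ("SE", "bread", 20),
    ("DK", "milk", 11),
    ("DK", "milk", 14),
    ("DK", "milk", 8),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    country, product, amount = zip(*rows)
    return {
        "country": list(country),
        "product": list(product),
        "amount": list(amount),
    }


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def total_amount(columns):
    """An aggregate that reads only the single column it needs."""
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(rle_encode(columns["country"]))
    print(total_amount(columns))
```

In this toy data the six-value country column collapses to two runs, and the sum touches the amount column without reading country or product at all.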
PolyBase for transparent access to Hadoop
Fundamental breakthrough in data processing
Single Query; Structured and Unstructured
• Query and join Hadoop tables with Relational Tables
• Use Standard SQL language
• SELECT, FROM, WHERE
Existing SQL Skillset
No IT Intervention
Save Time and Costs
Analyze All Data Types
Database, HDFS (Hadoop), SQL
SQL Server 2012 PDW Powered by PolyBase
Additional resources
» SQL Server Parallel Data Warehouse (PDW) Landing Page: www.microsoft.com/PDW
» Introduction to PolyBase: http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
» Price/TB comparison: http://www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf
» HP QuickSpecs: http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html and http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.pdf
» Brand New SQL Server PDW Overview Whitepaper: http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26-FEF9550EFD44/SQL Server 2012 Parallel Data Warehouse - A Breakthrough Platform.docx
» Modernize your Data Warehouse: www.upgradetoPDW.com
Thank you very much for your attention!
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.