Use the mobile app to complete a session survey
1. Access “My schedule”
2. Click on this session
3. Go to “Rate & review”
If the session is not on your schedule, find it via the session scheduler, click on this session, and then go to “Rate & review”.
Thank you for providing your feedback, which helps us enhance content for future events.
Session BB4089. Speakers: Claude Lorenson, Ph.D., and Wendy Harms
Please give us your feedback
Unlock your insights with Big Data in a box
Claude Lorenson, Ph.D.
Cloud & Enterprise Marketing Group, Microsoft
Wendy Harms
HP Converged Systems
Agenda
• The modern data warehouse
• Insights from your data
• Microsoft Analytics Platform System
– Big Data
– Performance
– Value
The traditional data warehouse
[Diagram: data sources feeding the traditional data warehouse, then the same diagram with non-relational data sources added]
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Insights from all your data
Enrich and optimize your data from non-traditional sources
Roadblocks to evolving to a modern data warehouse
• Keep legacy investment: limited scalability and ability to handle new data types
• Buy new tier-one hardware appliance: high acquisition and migration costs
• Acquire Big Data solution: significant training and data silos
• Acquire business intelligence: complex with low adoption
Introducing the Microsoft Analytics Platform System
The turnkey modern data warehouse appliance
• Relational and non-relational data in a single appliance
• Enterprise-ready Hadoop
• Integrated querying across Hadoop and PDW using T-SQL
• Direct integration with Microsoft BI tools such as Microsoft Excel
• Near real-time performance with In-Memory Columnstore
• Ability to scale out to accommodate growing data
• Removal of data warehouse bottlenecks with MPP SQL Server
• Concurrency that fuels rapid adoption
• Industry’s lowest data warehouse appliance price per terabyte
• Value through a single appliance solution
• Value with flexible hardware options using commodity hardware
Microsoft Analytics Platform System: Big Data
What is Big Data, and why is it valuable to the business?
Evolution in the nature and use of data in the enterprise
[Chart: value to the business rising with data complexity (variety and velocity, reaching petabytes) as analysis moves from historical analysis to insight analysis, predictive analytics, and predictive forecasting]
What is Hadoop?
• Core services: HDFS, YARN, MapReduce
• Data services: Hive & HCatalog, Pig, HBase; load & extract with Sqoop, Flume, NFS, and WebHDFS
• Operational services: Oozie, Ambari, Falcon
[Diagram: the Hadoop stack running on a cluster of nodes, each providing compute & storage]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
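The map/shuffle/reduce pattern behind Hadoop's distributed data processing can be sketched in a few lines. This is a minimal single-process Python illustration (a word count), not Hadoop's API: the in-memory lists are assumed stand-ins for HDFS blocks stored on different nodes, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit (word, 1) for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into a single count.
    return key, sum(values)

def word_count(splits):
    # Each split stands in for a data block that a different node would map locally.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

splits = [["big data in a box", "data warehouse"], ["big insights from data"]]
print(word_count(splits))
# {'big': 2, 'data': 3, 'in': 1, 'a': 1, 'box': 1, 'warehouse': 1, 'insights': 1, 'from': 1}
```

On a real cluster the map calls run next to the data on many nodes at once; only the grouped intermediate results move across the network to the reducers.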
Hadoop alone is not the answer to all Big Data challenges
• Steep learning curve, slow and inefficient
• Data must be moved from HDFS into the warehouse through ETL before analysis
• Teams must learn new skills beyond T-SQL
• IT must build, integrate, manage, maintain, and support a separate Hadoop ecosystem
[Diagram: new data sources flowing into a Hadoop ecosystem alongside the data warehouse]
APS delivers enterprise-ready Hadoop with HDInsight
Manageable, secured, and highly available Hadoop integrated into the appliance
• High performance and tuned within the appliance
• End-user authentication with Active Directory
• Accessible insights for everyone
• Managed and monitored using System Center
• 100-percent Apache Hadoop
[Diagram: SQL Server Parallel Data Warehouse and Microsoft HDInsight in one appliance, connected by PolyBase]
Connecting islands of data with PolyBase
Bringing Hadoop point solutions and the data warehouse together for users and IT
• Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL
• Uses the power of MPP to enhance query execution performance
• Supports Windows Azure HDInsight to enable new hybrid cloud scenarios
• Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
[Diagram: a single SELECT through PolyBase returns one result set drawn from SQL Server Parallel Data Warehouse, Microsoft HDInsight, Microsoft Azure HDInsight, Hortonworks for Windows and Linux, and Cloudera]
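Conceptually, PolyBase lets one T-SQL query read Hadoop data where it lives and join it with warehouse tables, instead of loading that data through ETL first. The plain-Python sketch below illustrates only that idea: the `customers` table, the `hdfs_file` text, and the helper functions are hypothetical, and PolyBase itself runs such queries in parallel across the appliance (MPP) rather than in a single loop. The result matches what an inner join with `GROUP BY` on the customer name would return.

```python
import csv
import io

# Hypothetical relational table already in the warehouse.
customers = [
    {"customer_id": 1, "name": "Contoso"},
    {"customer_id": 2, "name": "Fabrikam"},
]

# Hypothetical delimited file as it might sit in HDFS: customer_id,clicks
hdfs_file = "1,120\n2,45\n1,30\n3,7\n"

def read_external(text):
    # Parse the delimited text at query time instead of loading it first (no ETL step).
    for customer_id, clicks in csv.reader(io.StringIO(text)):
        yield int(customer_id), int(clicks)

def join_clicks(table, external_rows):
    # Hash join: build a lookup on the relational side, probe it with the external rows.
    names = {row["customer_id"]: row["name"] for row in table}
    totals = {}
    for customer_id, clicks in external_rows:
        if customer_id in names:  # inner join: rows with no matching customer are dropped
            name = names[customer_id]
            totals[name] = totals.get(name, 0) + clicks
    return totals

print(join_clicks(customers, read_external(hdfs_file)))
# {'Contoso': 150, 'Fabrikam': 45}
```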
Use cases where PolyBase simplifies using Hadoop data
Bringing islands of Hadoop data together
Running high performance queries against Hadoop data
Archiving data warehouse data to Hadoop (move)
Exporting relational data to Hadoop (copy)
Importing Hadoop data into a data warehouse (copy)
Big Data insights for anyone
New insights with familiar tools through native Microsoft BI integration
• Minimizes IT intervention for discovering data with tools such as Microsoft Excel
• Enables DBAs and power users to join relational and Hadoop data with T-SQL
• Offers Hadoop tools like MapReduce, Hive, and Pig for data scientists
• Takes advantage of the high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services
[Diagram: audiences served: power users, data scientists, and everyone else using Microsoft BI tools]
Shinsegae Corporation, a major department store chain in Korea, needed better performance for customer data mining and basket purchase analysis. Shinsegae took advantage of the integration of PDW and Hadoop to combine 40 terabytes of data, and was pleased to see PolyBase performing nearly twice as fast as their best Hive/Hadoop environment.
#1 retail company in Korea
“We are really satisfied with the performance of PolyBase to allow us to join relational and Hadoop data (weather data, board data, text data) faster and easier. PolyBase is a really powerful feature of PDW to deploy a Big Data system. PolyBase is one of the reasons we selected PDW as our Big Data platform.”
Microsoft Analytics Platform System: Performance
Performance limitations and scale with a traditional data warehouse
• Scale up: diminishing scale as requirements grow, and more capacity means a “forklift” upgrade
• Rowstore: sub-optimal performance for many data warehouse queries
[Diagram: rowstore layout, with whole rows R1 to R6 (columns C1 to C4) packed into data pages 1 to 3, so data is queried row by row]
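The cost of querying data by row can be made concrete by counting how many values a scan touches. In this hypothetical Python sketch, a six-row, four-column table is laid out either as row pages (two rows per page, mirroring the three data pages in the rowstore diagram) or as one segment per column. Summing a single column reads every value in the row layout but only that column's values in the column layout. Real storage engines add compression, caching, and I/O granularity that the sketch ignores.

```python
# Hypothetical table: six rows (R1..R6), four columns (C1..C4).
rows = [(r, f"name{r}", r * 10, r % 2) for r in range(1, 7)]

def rowstore_pages(rows, rows_per_page=2):
    # Rowstore: each page holds complete rows, so all columns travel together.
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]

def columnstore_segments(rows):
    # Columnstore: each column is stored contiguously as its own segment.
    return [list(col) for col in zip(*rows)]

def scan_rowstore(pages, col):
    # Summing one column still reads every value on every page.
    touched = sum(len(row) for page in pages for row in page)
    total = sum(row[col] for page in pages for row in page)
    return total, touched

def scan_columnstore(segments, col):
    # Only the requested column's segment is read.
    segment = segments[col]
    return sum(segment), len(segment)

pages = rowstore_pages(rows)
segments = columnstore_segments(rows)
print(scan_rowstore(pages, 2))        # (210, 24)
print(scan_columnstore(segments, 2))  # (210, 6)
```

Both layouts return the same sum; the difference is that the row layout touches four times as many values to get it.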
Scaling out your data to petabytes
Scale-out technologies in the Analytics Platform System
• Scale out: multiple nodes with dedicated CPU, memory, and storage
• Ability to incrementally add hardware for near-linear scale to multiple petabytes
• Ability to handle query complexity and concurrency at scale
• No “forklift” of prior warehouse to increase capacity
• Ability to scale out HDInsight and PDW
[Diagram: capacity growing from 0 terabytes to 6 petabytes as PDW and HDInsight units are added]
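Near-linear scale-out rests on each node working only on its own slice of the data. One common MPP approach, hash-distributing rows on a column and aggregating locally before a final merge, is sketched below in Python. It is an illustration under stated assumptions, not PDW's query engine: nodes are simulated as lists, `zlib.crc32` is an arbitrary choice of deterministic hash, and the merge step stands in for the node that combines partial results.

```python
import zlib
from collections import Counter

def node_for(key, node_count):
    # Deterministic hash so a given key always lands on the same node.
    return zlib.crc32(str(key).encode("utf-8")) % node_count

def distribute(rows, node_count):
    # Hash-distribute (store, amount) rows on the store column.
    nodes = [[] for _ in range(node_count)]
    for store, amount in rows:
        nodes[node_for(store, node_count)].append((store, amount))
    return nodes

def local_aggregate(node_rows):
    # Each node sums only its own slice; on real hardware these run in parallel.
    totals = Counter()
    for store, amount in node_rows:
        totals[store] += amount
    return totals

def mpp_sum_by_store(rows, node_count):
    partials = [local_aggregate(slice_) for slice_ in distribute(rows, node_count)]
    # Combine the per-node partial results into the final answer.
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)

sales = [("north", 5), ("south", 3), ("north", 2), ("east", 7), ("south", 1)]
print(sorted(mpp_sum_by_store(sales, 4).items()))  # [('east', 7), ('north', 7), ('south', 4)]
```

Adding nodes changes only how the rows are spread, not the answer: `mpp_sum_by_store(sales, 1)` and `mpp_sum_by_store(sales, 8)` return the same totals, while each node's share of the work shrinks.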
Blazing-fast performance
MPP and In-Memory Columnstore for next-generation performance
• Store data in columnar format for massive compression
• Load data into or out of memory for next-generation performance, with up to 60% improvement in data loading speed
• Up to 100x faster queries with an updateable clustered columnstore vs. a table with customary indexing
• Up to 15x more compression
[Diagram: columnstore index representation feeding parallel query execution and query results]
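Columnar storage compresses well because values in one column tend to repeat, especially once sorted. The Python sketch below shows one such technique, run-length encoding, on a hypothetical low-cardinality `region` column. Production columnstore engines combine several encodings, so this sketch neither reproduces nor explains the “up to 15x” figure above.

```python
from itertools import groupby

def rle_encode(column):
    # Collapse each run of identical adjacent values into a (value, run_length) pair.
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    # Expand the pairs back into the original column.
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical low-cardinality column, sorted so equal values sit together.
region = sorted(["EU", "US", "EU", "APAC", "US", "EU", "US", "US"])
encoded = rle_encode(region)
print(encoded)  # [('APAC', 1), ('EU', 3), ('US', 4)]
print(len(region), "values stored as", len(encoded), "pairs")  # 8 values stored as 3 pairs
```

Sorting first is the design choice that matters here: the same eight values in their original order would form seven runs and barely compress.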
Rapid adoption fueled by concurrency and mixed workloads
Great performance under stress
[Diagram: the Analytics Platform System, with PDW and HDInsight joined by PolyBase, serving mixed workloads at once: ETL/ELT with SSIS, DQS, and MDS from ERP, CRM, and LOB apps; ETL/ELT with DWLoader; Hadoop and Big Data; ad-hoc queries; BI tools; SSRS and SSAS; and SQL Server SMP]
MEC, a global media agency, uses SQL Server PDW with in-memory technology to cut query time, helping marketers unlock the value of their data.
“SQL Server Parallel Data Warehouse gives us massively parallel advantages. Whereas it would take up to four hours to run queries scaling across multiple nodes, now it takes just minutes.”
Microsoft Analytics Platform System: Value
Giving you what you need, when you need it
HP’s rapid data warehouse appliance evolution
2011: HP Enterprise Data Warehouse Appliance
• First HP/Microsoft DW appliance
2013: HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse
• Massive jump in scalability and functionality
• Up to 6 PB with Hadoop connector
2014: HP ConvergedSystem 300 for Microsoft Analytics Platform
• New high-performance, high-availability platform
• Fully integrated Hadoop
[Chart: scalability and functionality increasing with each generation]
Base Parallel Data Warehouse (PDW) components
HP ConvergedSystem 300 for Microsoft Analytics Platform
• InfiniBand (data network) and Ethernet (management network) connectivity
• Virtualized control and management node; failover node for high availability (HA)
• PDW: massively parallel scale-out query processing
APS Rack & Network
• 1 x HP 642 Shock Intelligent Series Rack
• 2 x FDR InfiniBand
• 2 x HP 5120 switches
• 2 x power distribution units
– Choice of single-phase or multi-phase (priced separately)
PDW Region Base Scale Unit (Control)
• 1 x Orchestration Server (PDW)
• 1 x Failover Server (PDW)
• 1 x Optional Failover Server (PDW)
PDW Region Base Scale Unit (Processing)
• 2 x PDW Data Servers
• 1 x DAS Storage Block
– 1 TB, 2 TB, or 3 TB choice
[Diagram: rack with the PDW Region Base Scale Unit, its orchestration server, and failover servers]
PLUS integrated analysis with Hadoop-based HDInsight
Manage data of any size or type
• Relational or non-relational
Perform more complex analysis faster
• SQL Server PDW’s PolyBase
• One appliance for integrated analysis of HDInsight non-relational and SQL Server PDW relational data
• 100% Apache Hadoop-based data platform
Scale quickly and easily
• Add up to 3 x HDI Data Scale Units and 1 x HDI Failover Server per rack
HDI Region Base Scale Unit (Processing)
• 2 x HDI Data Servers
• 1 x DAS Storage Block
HDI Region Base Scale Unit (Control)
• 1 x Orchestration Server (HDInsight)
• 1 x Failover Server (HDInsight)
[Diagram: rack holding a PDW Region Base Scale Unit and an HDI Region Base Scale Unit, each with its own orchestration server]
Flexible, mix-and-match PDW and HDInsight
From the factory or expandable in the field
• Modular
• Flexible
• Cost-effective
Example configurations (each region has its own orchestration server):
• PDW Region Base Scale Unit + 2 x PDW Data Scale Units + HDI Region Base Scale Unit
• PDW Region Base Scale Unit + 1 x PDW Data Scale Unit + HDI Region Base Scale Unit + 1 x HDI Data Scale Unit
• PDW Region Base Scale Unit + HDI Region Base Scale Unit + 2 x HDI Data Scale Units
Massive scalability
HP ConvergedSystem 300 for Microsoft Analytics Platform
[Diagram: from a base rack to a fully populated base rack (8 nodes), then easily expand by adding racks]
• Up to 6 PB (PDW Region)
• Up to 1.2 PB (HDInsight Region)
• Up to 64 nodes per workload region
Simplified management for increased ROI
Exclusive to HP: HP Support Pack
Unique HP tools ensure the solution runs at optimal performance and delivers the expected return on investment (ROI)
• Validation of Microsoft Reference Architecture (MRA) compliance
• Reports all serial numbers and rack locations of all devices
• Diagnostic tools to validate configuration
• Proactive Care reporting for support
• ConvergedSystem firmware/driver update package
The Royal Bank of Scotland, the leading UK provider of corporate banking services, needed a powerful analytics platform to improve performance and customer services. The bank implemented a Microsoft SQL Server 2012 Parallel Data Warehouse appliance to increase productivity by 40 percent for faster response to business needs.
“I knew that it would be easy for my team to transition from managing SQL Server databases to SQL Server 2012 PDW, and the solution cost about 85 percent less than products from other vendors.”
Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Meeting today’s Big Data analytics requirements
Enterprise-ready Hadoop with HDInsight and the simplicity of PolyBase
Optimized performance with MPP technology and In-Memory Columnstore
Providing value with a low TCO
For more information
Hewlett-Packard
• HP ConvergedSystem 300 for Microsoft Analytics Platform: hp.com/go/convergedsystem/cs300aps
• HP ActiveAnswers: hp.com/solutions/activeanswers
Microsoft
• Microsoft Analytics Platform System: microsoft.com/aps