Big Data Analytics Using SAP HANA Dynamic Tiering
Balaji Krishna – SAP Labs
How Dynamic Tiering reduces the TCO of a HANA solution
Data aging concepts using in-memory and on-disk storage
Single Install/Admin/Monitoring
IDC predictions for 2014
Data explosion
Data volumes will continue to explode to 6 billion petabytes
Social networking
Social networking will become embedded in cloud platforms and most enterprise apps and processes
Cloud
Cloud spending will surge by 25%, reaching over $100 billion. There will be a doubling of cloud data centers.
Internet of Things
30 billion devices and sensors in 2020, driving $8.9 trillion in revenue
Big Data
[Figure: word cloud of business data sources: mobile, CRM data, planning, opportunities, transactions, customer, sales order, things, instant messages, demand, inventory]
SAP End-to-End Data Management for Real-Time Business
SAP Data Management Portfolio
End-to-End Data Management & App Platform for Real-Time Business
[Figure: SAP data management portfolio, summarized below]
Business & Consumer Applications: Custom Development, ISVs & OEMs, ERP, Internet of Things, Workforce of the Future, Cloud, Industries, Big Data
SAP DATA MANAGEMENT: STORE, TRANSACT, ANALYZE, PREDICT
SAP HANA PLATFORM: Real-time transactions + end-to-end analytics
Platform components: Processing Engine, Application Function Lib. & Data Models, Integration Services, Extended Application Services, Database
REAL-TIME ANALYTICS: Operational Analytics, Big Data Warehousing, Predictive, Spatial & Text Analytics
REAL-TIME APPLICATIONS: Sense & Respond, Planning & Optimization, Consumer Engagement
Portfolio products: SAP ESP, SAP ASE, Replication Server, SAP SQL Anywhere, SAP IQ, SAP Data Services
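Time Value of Data
[Figure: value of data plotted against time. The value of immediate data access declines after the last time the data was accessed, until an archive access event occurs, when you need it again.]
Archive access events:
• Regulatory audit
• Business critical reference data
• Source data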
Warm/Cold Data Management
Questions about SAP HANA dynamic tiering
Why is warm data management important for SAP HANA?
• Size and cost constraints may prohibit an all-in-memory solution
• Not all data has the same value
• Warm data has less stringent latency requirements than hot data
Why is SAP HANA dynamic tiering the best solution for warm data management?
• SAP HANA dynamic tiering excels at ad hoc queries on structured data from terabyte to petabyte scale
• SAP HANA dynamic tiering is a deeply integrated, high performance solution in a single system
• SAP HANA dynamic tiering utilizes disk backed, smart column store technology based on SAP IQ
What about Hadoop for warm data storage and processing?
• Hadoop has unlimited capacity for raw data processing
• Hadoop is best suited for batch processing of raw, unstructured data
• Hadoop is an external data store with technical integration into HANA, with higher TCO in order to manage the additional system
Introducing SAP HANA dynamic tiering
• Manage data cost effectively, yet with desired performance based on SLAs
• Handle very large data sets, from terabytes to petabytes
• Update and query all data seamlessly via HANA tables
• Application defines which data is “hot” and which data is “warm”
• Native Big Data solution to handle a large percentage of enterprise data needs without Hadoop
[Figure: SAP HANA system with dynamic tiering option. A HANA application works against one HANA database. Worker hosts hold the hot store (column tables and row tables); an ES host holds the warm store (extended tables). Fast data movement and optimized push-down query processing connect the two stores.]
Data Qualities and Data Temperatures
How to think about it
• Hot: data for daily reporting, other high-priority data
• Warm: other data required to operate the application
• NLS (externalized): data that is (normally) not updated and is infrequently accessed
• Traditional archive (externalized): data that is kept for legal reasons or similar
SAP HANA Platform
Data in the database: different data temperatures
• Maximum access performance: hot data, always in memory
• Reduced access performance: warm data, not (always) in memory
• All part of the database’s data image
Data moved out of the database: different data qualities
• Available for read access: near-line storage
• Not accessible without IT process: traditional archive
• Data is stored and managed outside of the application database
SAP HANA dynamic tiering
Map data priorities to data management
[Figure: SAP HANA database, all in one database. Hot store (hot data): primary image in memory (RAM), with disk used for durability. Warm store (warm data, dynamic tiering): primary image on disk, with RAM used for cache and processing.]
Hot Store - Classic HANA tables
• Primary data image in memory
• DB algorithms optimized for in-memory data
• Persistence on disk to guarantee durability
Warm Store - Extended Tables
• Primary data image on disk
• Data processing using algorithms optimized for disk-based data
• Main memory used for caching and processing
Implementation choices
SAP HANA dynamic tiering: one database, one experience for HANA application developers and admins
SAP HANA dynamic tiering
• Reduced TCO
• Optimized for performance
• Single database experience
• Centralized operational control
[Figure: SAP HANA dynamic tiering, surrounded by its capabilities]
• Centralized monitoring / admin
• High speed data ingest
• Common installer and licensing model
• Unified backup and restore
• Integrated security
• Optimized query processing
SAP HANA dynamic tiering
The overall system layout
SAP HANA with dynamic tiering consists of two types of hosts:
• Regular worker hosts (running the classical HANA processes: indexserver, nameserver, daemon, xsserver, …)
  • HANA hosts can be single-node or scale-out; appliance or TDI
• “ES hosts” (running nameserver, daemon, and esserver)
  • esserver is the database process of the warm store
[Figure: SAP HANA system with dynamic tiering service. A client application connects to the worker hosts, which hold the hot store (column tables and row tables). An ES host (controller) and further ES hosts hold the warm store (extended tables). Fast data movement and optimized push-down query processing connect the two stores. Standby hosts are not shown.]
• One single SAP HANA database: one SID, one instance number
• All client communication happens through index server / XS server
HANA Extended Tables
• The HANA extended table schema is part of the HANA database catalog
• HANA extended table data resides in the warm store
• A HANA extended table is a first-class database object with full ACID compliance
[Figure: HANA database. The database catalog holds the table definitions of both table types. The data of a classical HANA column/row table resides in the hot store; the data of an extended table (warm table) resides in the warm store.]
High Speed Data Ingest
[Figure: HANA database with a hot HANA column table and a warm extended table. CSV data is loaded with IMPORT FROM CSV FILE 'data.csv' INTO t_extended; materialization and data movement take place between the hot and warm store.]
Import from CSV files:
IMPORT FROM CSV FILE 'bigfile.csv' INTO t1
Bulk array insert:
INSERT INTO t1 (col1, col2, col3...) VALUES (val1, val2, val3...)
High-speed data movement between HANA tables and HANA extended tables:
INSERT INTO t_extended SELECT c1 FROM t_hana
Concurrent inserts from multiple connections:
A HANA extended table may be a DELTA enabled table, which allows multiple concurrent writes
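The ingest and movement statements above can be tried out in miniature. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in, so two ordinary tables play the roles of t_hana and t_extended, and no real hot or warm store is involved. The order_year column, the cutoff rule, and the move_to_warm helper are hypothetical and do not come from the slides.

```python
import sqlite3

# Stand-in only: SQLite has no hot/warm stores. Two ordinary tables play the
# roles of a HANA column table (t_hana) and an extended table (t_extended).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_hana (c1 INTEGER, order_year INTEGER)")
conn.execute("CREATE TABLE t_extended (c1 INTEGER, order_year INTEGER)")

# Bulk array insert: one statement, many parameter rows.
rows = [(1, 2011), (2, 2012), (3, 2013), (4, 2014)]
conn.executemany("INSERT INTO t_hana (c1, order_year) VALUES (?, ?)", rows)


def move_to_warm(conn, cutoff_year):
    """Move rows older than cutoff_year from t_hana to t_extended.

    Hypothetical helper: the copy and the delete run in one transaction,
    so a failure leaves both tables unchanged.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO t_extended SELECT c1, order_year FROM t_hana "
            "WHERE order_year < ?",
            (cutoff_year,),
        )
        cur = conn.execute(
            "DELETE FROM t_hana WHERE order_year < ?", (cutoff_year,)
        )
    return cur.rowcount


moved = move_to_warm(conn, 2013)
hot = [r[0] for r in conn.execute("SELECT c1 FROM t_hana ORDER BY c1")]
warm = [r[0] for r in conn.execute("SELECT c1 FROM t_extended ORDER BY c1")]
```

The INSERT ... SELECT followed by a DELETE mirrors the movement statement on the slide; whether the source rows are deleted afterwards is an application decision that the slides do not prescribe.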
Optimized Query Processing
Parallel query processing
• Data is pulled from HANA hot store into HANA warm store query processing engine using multiple streams, and processed in parallel
Push/Pull query optimization and transformation
• Query operations ship to hot or warm store as appropriate for native performance
Extended tables may be used in HANA CALC views
• HANA Calc engine and HANA SQL engine share extended table query performance optimizations
Example Query Plan
[Figure: query plan tree with joining, grouping, and ordering operators over tables T1 to T4]
Customer is a native HANA table in HANA memory
Product is a HANA extended table in the warm store
select "account_num", count(*) as account_count
from VXM_FOODMART.CUSTOMER C
where "lname" >= 'Ga' and "lname" < 'Gb'
  and exists ( select * from VXM_IQSTORE.PRODUCT P where "product_id" = "customer_id" )
group by "account_num"
order by "account_num";
HANA Monitoring and Administration
HANA Cockpit:
• New, web based monitoring and administration console for HANA Extended Storage
• HANA Studio will be used for design and modeling of HANA extended tables
• HANA Cockpit displays status, CPU/memory/storage resource utilization, table usage statistics
• Provides access to and search of server logs and custom traces
• Shows alerts triggered by extended storage
• Enables administration of extended storage: add and drop storage, or increase size of file
[Figure: HANA Cockpit view of user tables by top usage]
• HANA backup manages backup of both hot and warm store
• Point in Time Recovery (PITR) is supported
[Figure: backup and restore timeline (t1, t2, t3) for HANA and extended storage, showing data backups (manual or scheduled), log backups (automatic, or none) from the log area, a system crash, and the backup history]
• Data backups with log backups allow restore to a point in time or to the most recent state: t1 -> t3
• Data backups alone allow restore to a specific backup only: t1 or t2
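The two restore rules can be written down as a small check. This is a simplified model, not HANA's recovery logic: it assumes an unbroken chain of log backups from the earliest data backup up to latest_log_time, and the BackupCatalog type and can_restore_to function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCatalog:
    data_backup_times: tuple   # times of completed data backups, e.g. (1, 2)
    log_backups_enabled: bool  # True if log backups are taken


def can_restore_to(catalog, target_time, latest_log_time):
    """Return True if the system can be restored to target_time.

    Simplified model: assumes the log backups form an unbroken chain from
    the earliest data backup up to latest_log_time.
    """
    if not catalog.data_backup_times:
        return False  # nothing to restore from
    if not catalog.log_backups_enabled:
        # Data backups alone: only the exact backup states are reachable.
        return target_time in catalog.data_backup_times
    # Data plus log backups: any point from the first data backup onwards,
    # up to the most recent logged state.
    return min(catalog.data_backup_times) <= target_time <= latest_log_time


with_logs = BackupCatalog(data_backup_times=(1, 2), log_backups_enabled=True)
data_only = BackupCatalog(data_backup_times=(1, 2), log_backups_enabled=False)
```

With t1 = 1, t2 = 2 and t3 = 3, the catalog with log backups accepts any target between 1 and 3, while the data-only catalog accepts only 1 or 2.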
High Availability and Disaster Recovery
• High availability
  • Compute node failure will result in failover to standby node (manual for warm store nodes)
  • Storage failure will depend on inherent storage vendor disk mirroring and fault tolerance capabilities
  • Hot and warm store should use the same storage to facilitate auto-failover in the future
• Disaster recovery
  • HANA without dynamic tiering supports continuous replication to maintain a disaster recovery site
  • HANA with dynamic tiering will maintain a disaster recovery site through backup and restore capabilities only
  • Disaster recovery through system replication is planned for a future release
  • Disaster recovery through storage replication may be added independently from software releases
[Figure: compute nodes running the classical HANA services (hot store) fail over automatically to a standby node; the compute node running the warm store service fails over manually to a standby node. Hot store and warm store storage is mirrored.]
SAP HANA Multitenant Database Containers
Each extended store is dedicated to exactly one tenant database.
[Figure: HANA cluster of compute nodes hosting tenant databases. Two tenant databases each have their own extended store; a third tenant database has no ES.]
Hardware Layout View
Recommended Option: Use Homogeneous Hardware for All Hosts
ES may be added to certified HANA storage, or may use individual storage.
[Figure: one HANA system (one SID) in a scale-out layout on certified hardware boxes: Node 1, Node 2, a standby node, and an ES DB node. HANA clients (DB clients, Studio, ...) connect over the client network; the hosts communicate over the intra-node network; a storage network for HANA and ES connects to the storage holding hot data, warm data, redo logs, and ES DB logs. Non-certified storage serves /hana/shared/ (binaries, traces, core dumps).]
Hardware Layout View
Alternative Option: Use Individual Hardware
[Figure: one HANA system (one SID) in a scale-out layout: Node 1, Node 2, and a standby node on certified hardware boxes, plus an ES DB node on a non-certified hardware box. HANA clients (DB clients, Studio, ...) connect over the client network; the hosts communicate over the intra-node network. A HANA storage network connects to certified storage for the data (hot data) and redo logs of HANA; a separate ES storage network connects to non-certified storage for ES (warm data, ES DB logs). Non-certified storage serves /hana/shared/ (binaries, traces, core dumps).]
SAP BW and native HANA applications
© 2014 SAP SE or an SAP affiliate company. All rights reserved. Public 25
SAP NetWeaver BW powered by SAP HANA
Data Classification by Object Type
Data Categories in a BW System
• Frequent reporting and/or HANA-native operations
• Limited reporting, limited HANA-native operations
• “Old”, “out-of-use” data: archive, read-only, different SLAs
[Figure: BW operational data, arranged in layers: Staging Layer, EDW Transformation, EDW Propagation, Business Transformation, Analytic Mart, Corporate Memory, Archive/NLS]
Extended Tables in HANA BW
Use Case: Staging and Corporate Memory
Object Classification in BW
Data Sources and write-optimized DSOs can have the property “Extended Table”
• Generated tables are of type “Extended”
• All BW standard operations supported, no changes
• Only minor temporary RAM required in HANA
InfoCubes and Regular or Advanced DSOs
• Generate standard column table
[Figure: BW system on the SAP HANA database. The database catalog holds the table schemas. In the warm store: the PSA table of a Data Source (staging area) and the active table of a write-optimized DSO (corporate memory). In the hot store: the fact table of an InfoCube (data mart).]
SAP HANA dynamic tiering for Big Data
SAP HANA with Dynamic Tiering provides a native Big Data solution
Hot data: SAP HANA
• Cutting edge, in-memory platform
• Transact/analyze in real-time
• Native predictive, text, and spatial algorithms
Petascale, warm structured data: HANA extended tables
• Petascale extension to HANA with disk backed, columnar database technology
• Expand HANA capacity with warm/cool structured data in HANA warm store
• Tight integration between HANA hot store and HANA warm store for optimal performance
HANA with Dynamic Tiering
Native Big Data solution for a multitude of use cases
SAP HANA Dynamic Tiering for Big Data Use Cases across Industries
Telecommunications: Network service data in HANA extended tables analyzed and correlated with customer loyalty data in SAP HANA, to anticipate customer churn and initiate customer retention response activities.
Financial services: Stock tick data streamed into SAP HANA for immediate price fluctuation analysis and trading actions, with historical stock price data stored in HANA extended tables for trend analysis and portfolio management.
Public utilities: Enterprise data stored in SAP HANA and large amounts of smart meter data stored in HANA extended tables, to identify operational problems and establish incentive pricing for more efficient energy use.
Airline route profitability analysis: SAP HANA analyzes revenue, variable operating costs (fuel, landing fees...), and fixed operating costs in real time to make decisions on network, pricing, and marketing to determine where to fly, when, and how often. All data must be analyzed in real time.
Future Direction
SAP HANA dynamic tiering roadmap
FUTURE
• HANA ES host auto-failover (HA)
• SAP HANA system replication for disaster recovery
• Enhanced backup and restore (BACKINT and storage snapshots)
• Hybrid extended tables with rule based automatic data movement / aging
• Further performance optimizations for HANA Calculation Engine
• Series data support in extended tables
• Support of extended tables in Core Data Services (CDS)
PLANNED
• SAP HANA dynamic tiering available to be used by any HANA application
• Common installer
• Unified administration and monitoring using HANA Cockpit
• Extended Storage (ES) engine is part of HANA topology
• Single authentication model
• Single licensing model
• Combined error log / trace handling
• Fully integrated backup/restore
Hybrid extended tables:
• Single HANA table that spans hot and warm stores
• Hot partitions in HANA memory; remaining partitions in warm store
• Automatic, rules-based, asynchronous data movement between hot and warm stores
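To make the idea of rule-based placement concrete, here is a minimal sketch of one possible aging rule. It is an assumption for illustration only: the slides do not define the rule format, and the assign_partitions function, the partition names, and the hot_days threshold are all hypothetical.

```python
from datetime import date


def assign_partitions(partitions, today, hot_days):
    """Map each partition name to 'hot' or 'warm'.

    Hypothetical rule: a partition stays hot while its newest row is at
    most hot_days old; otherwise it belongs in the warm store.
    """
    placement = {}
    for name, newest_row_date in partitions.items():
        age_days = (today - newest_row_date).days
        placement[name] = "hot" if age_days <= hot_days else "warm"
    return placement


# Partition name -> date of the newest row in that partition (made-up data).
partitions = {
    "p2014_q4": date(2014, 12, 31),
    "p2014_q3": date(2014, 9, 30),
    "p2013": date(2013, 12, 31),
}
placement = assign_partitions(partitions, today=date(2015, 1, 15), hot_days=120)
```

In a real system the movement would run asynchronously in the background; this sketch only computes where each partition should live under the assumed rule.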