ABS Developer Days
January 17th and 18th, 2013
Unterföhring
metafinanz / Carsten Herbe
Database Performance
The goal of this presentation is to give you an understanding of in-memory databases, when they can be used and when better to use other technologies to store your data.
Today we will look at …
disk-based and in-memory databases
in-memory database products
other data storage options
©
More than 8 years data warehousing experience
Strong technical skills in Oracle & OWB
Certified Hadoop Developer
Oracle partnership manager Data Quality
(Profi-ling & Cleansing)
Project Management
Big Data (Hadoop, NoSQL) Training (Oracle, DWH, Hadoop, etc.) Performance Tuning DWH Architectures & Dimensional Modeling Databases (Oracle, in-mem., columnar)
ETL Processes & Tools (OWB & SAS DI)
Within our business line Business Intelligence & Risk, there are five groups: Risk,
Insurance Reporting, Insurance Analytics, Customer Intelligence and Data Warehousing.
Data Warehousing
Technologies Capabilities
Your contact
We support the complete data warehouse lifecycle from requirement gathering to tuning existing ETL processes. Our team includes skilled architects, developers and project managers with broad data warehousing experience. Turning Data into Information Carsten Herbe
Contents Introduction 1 About databases 2 Oracle TimesTen 3 IBM solidDB 4
SAS Visual Analytics
6
Architectures
7
Alternative storage technologies
8
Conclusions
9
SAP HANA
©
Introduction
The main reason for introducing an in-memory database is performance!
©
In-Memory databases store all their data in the RAM. Hard-disk drives are only used for log-files and backups.
In-memory databases & disks
The RAM price is continuously decreasing. Systems with 1 terabyte of RAM are affordable today.
Decreasing RAM prices
0 0,05 0,1 0,15 0,2 0,25 0,3 0,35 0,4 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 €/MB Jahr
©
About databases
Relational databases are the standard way of storing all kinds of data securely and
consistently. Data is queried and manipulated using SQL (Structured Query Language).
Relational databases
table PRODUCTS
PROD_ID NAME PRICE CAT_ID
101 MyPhone 1 499,00 1 102 MyPad 1 799,00 2 103 MyPhone 2 699,00 1 … … … … table CATEGORIES CAT_ID NAME 1 Mobiles 2 Tablets … …
SQL
SELECT
INSERT, UPDATE, (MERGE)
DELETE
SELECT p.name, c.name, p.price
FROM products p JOIN categories c ON p.CAT_ID = c.CAT ID WHERE p.price < 500; Primary Key Foreign Key
©
Databases never are 100% compatible.
Databases and compatibility
Database performance with in-memory solutions 11 The interface to a relational database is SQL
The ANSI SQL standard is usually supported (ca. 99%)
• many vendors add custom functions
• Scripting language support (like PL/SQL on Oracle)
Some database claim to be Oracle compatible (ca. 99%)
Databases
... means
SQLs can be “compiled” and executed on different systems
Databases are compatible from a user/application developer perspective
BUT: SQLs on different systems are executed differently
• Results are identical
• But runtime may vary!
Database tuning and administration is different and requires different skills
Backup Server
Not only data (including indexes) is causing disk i/o. UNDO is required for read-consistency, REDO for restoring data.
Database architecture: disk-based
RDBMS DB processes UNDO DATA DATA cache DATA backup REDO REDO backup DATA
User & system data:
tables, indexes, programs
REDO
Required for durability:
Data restore uses backup and redo.
UNDO
Required for read-consistency: used to „undo“ changes made by other sessions after your own query has started.
©
Backup Server
An in-memory database holds all data in the RAM. But REDO is still written to disk to guarantee durability. SELECTs are processed in-memory only.
Database architecture: in-memory
RDBMS DB processes DATA snapshot DATA DATA backup REDO REDO backup DATA
Data is stored in the RAM and is asynchronously written to snapshot on disk (only used for a restore).
REDO
Required for durability: Data restore uses data snapshot and redo.
UNDO
UNDO is in-memory only.
UNDO
column-orientated row orientated
Row-orientated databases are good for handling single rows of a transactional system, columnar-orientated databases good for analyzing huge data sets in data warehouses.
Row- and column-oriented databases
1 Müller Karl 45000 2 Bauer Fritz 62000 3 Meier Hans 54000 4 Schmidt Paul 70000 1 Müller Karl 45000 2 Bauer Fritz 62000 3 Meier Hans 54000 4 Schmidt Paul 70000
PersNo Last Name First Name Salary PersNo Last Name First Name Salary
storage 1, Müller, Karl, 45000 2, Bauer, Fritz, 62000 3, Meier, Hans, 54000 4, Schmidt, Paul, 70000 storage 1, 2, 3, 4
Müller, Bauer, Meier, Schmidt Karl, Fritz, Hans, Paul
©
Oracle TimesTen
TimesTen Server
Client
Client Application
Oracle TimesTen is a relational in-memory database acquired in 2005. It‘s compatibility with the Oracle database has been improved ever since, including PL/SQL support.
TimesTen architecture
Oracle Server Cache Agent Network Oracle DB In-memorydata Client Driver
Server
Process Network
Direct linked Application
Shared Libraries Log Files Checkpoint Files DB Processes
©
TimesTen Cache
Cached tables do not need to contain all attributes or all rows!
TimesTen
Tables can be cached partially. If TimesTen cannot process a query, it is passed through to the underlying Oracle database
Oracle DB Tab1 Tab3 Tab2 Tab4 Cache Group 1 Tab1 Tab2 Tab4 passthrough Cache Group 2 Client Application Server Process
IBM solidDB
©
IBM acquired solidDB in 2007. It works similarly to Oracle TimesTen, but it also supports disk-based tables. Direct linking from application is supported using libraries.
IBM solidDB
19 source: IBM redbook „IBM solidDB Delivering Data with Extreme Speed“, 2011
IDB solidDB can be used as a cache for a disk-based RDBMS. Contrary to TimesTen, different databases are supported.
IBM solidDB Universal Cache
Caching for Oracle, DB2, IDS, Sybase ASE, and Microsoft SQL Server
Caches can be read-only or read/write
©
SAP HANA
SAP HANA is being developed by SAP and was first released in 2010. It makes use of the latest processor technology and therefore requires hardware certified by SAP.
SAP HANA
High-Performance Analytic Appliance
In-memory Database
• column-based storage and/or
• row-based storage
supports SQL and MDX
©
This Fujitsu PRIMERGY RS 900 S2 was used for a PoC by metafinanz and AMOS in 2012.
This is what 1 TB of RAM looks like
1.024 GB (= 1 TB) RAM
80 cores
4.500 GB disk storage for data
1.280 GB SSD-based storage with
Fusion I/O for logging
8 sockets 8-HE Rack Server
X86-based, Intel Xeon
E7-8800-Processor-Family
SAS Visual Analytics
©
SAS Visual Analytics is not a relational database, but an analytical in-memory solution that includes data storage and a graphical front-end for analysis and reporting.
SAS Visual Analytics architecture
SAS Visual Analytics includes graphical tools for reporting, data exploration and a mobile iOS app.
SAS Visual Analytics
Central Entry Point Integration Role-based Views
MOBILE BI DATA PREPARATION EXPLORER DESIGNER
• Native iOS application that delivers interactive reports created in the designer
• Monitor SAS® LASR™ Analytic server • Load and join data • Create calculated
columns
• Perform ad-hoc analysis and data discovery
• Create dashboard style reports for web
or mobile
©
Architectures
classical architecture in-memory dedicated in-memory shared
A classical disk-based database is replaced by an in-memory database.
Complete replacement
database server application server network database disk-based Java Application JDBC database server application server network database in-memory Java Application JDBC application + db server database in-memory Java Application JDBC direct access©
server cache classical architecture
On the application server, an in-memory database is used to cache currently used data.
Server Cache
database server application server network database disk-based Java Application JDBC application + db server database in-memory Java Application JDBC direct access database server network database disk-baseddistributed cache
In a distributed environment, the in-memory database acts as a local cache. Concurrency problems must be handled by the application.
Distributed cache
distributed architecture location 0 location 1 slow or unreliable network database disk-based Java Application JDBC location n Java Application JDBC location 1 database in-memory Java Application JDBC direct access location 2 database in-memory Java Application JDBC direct access location 0 slow or unreliable network database disk-based©
in-memory BI in-memory data mart
classical BI architecture
A data mart, i.e. an (aggregated) subset of a data warehouse, is stored in the in-memory database.
In-memory data mart
DB server DB server data warehouse disk-based data mart disk-based BI server JDBC BI Application JDBC DB server DB server data warehouse disk-based data mart in-memory BI server JDBC BI Application JDBC DB server data warehouse disk-based BI server JDBC BI Application integrated in-memory
Alternative storage technologies
©
During the last couple of years, a lot new data storage technology have emerged. But still, relational databases are the most general purpose option and most widely used.
Others ways to store your data
Hadoop
= HDFS (distributed file system)
+ MapReduce (parallel programming framework)
Data (big files like 64MB) is written only once and never modified
Runs on clusters of commodity hardware
OLAP
Stores data in cubes (similar to Excel pivot)
For analytics (slice & dice)
Data can be added but not modified
NoSQL
“Not only SQL”
Often only “eventually consistent”
Typical: Key-value- or wide-column store (like a hash table with multiple values)
All types of special purpose databases: documents, graphs, …
Columnar
DBs
Data is stored in columns instead of rows
Good compression and good for analytics
For big servers or clusters of commodity hardware
All product and vendor lists are only examples and are therefore incomplete. 33 Database performance with in-memory solutions
First analyse the requirements, than chose an appropriate technology for your problem.
Classification
data usage
data
st
ruct
ure
operational analytical Strong struct ered disk-based & in-memory RDBMS W eak struct ered OLAP Hadoop columnar RDBMS key-value stores SAS Visual Analytics©
Conclusions
An in-memory database can boost your performance, but there are some points to consider.
In-memory DBs can boost your performance, but …
Understand
… your current system!
Don‘t use your database as a black box!
Learn how your database works!
Tune
… the existing system!
Tune the SQL!
Tune the database instance!
Tune the application (design)!
Plan
... the migration, it is never easy!
Understand the new technology and use it properly!
New technology adds complexity both in dev & op!
© 37
We offer open group trainings or customized trainings for individual companies.
metafinanz training
Einführung Oracle in-memory Datenbank TimesTen
Big Data mit Hadoop
Data Warehousing & Dimensionale Modellierung
Oracle SQL Tuning
OWB Skripting mit OMB*Plus
Oracle Warehousebuilder 11.2 New Features
Einführung in Oracle: Architektur, SQL und PL/SQL
More details about our trainings can be found at http://www.metafinanz.de/arbeitsweise/was-trainieren-wir All trainings are also available in English on request.
NEW 2012
NEW 2013/Q2
If you have any questions …
ask now ...
Carsten Herbe
Head of Data Warehousing
mail [email protected] phone +49 89 360531 5039
Thank you for your attention!
metafinanz Informationssysteme GmbH Leopoldstr. 146 80804 München Phone: +49 89 360531-0 Fax: +49 89 350531-5015 Email: [email protected] www.metafinanz.deDatabase Performance
with In-Memory Solutions
Fachblog und Forum zu Solvency II:
http://solvencyii.metafinanz.de