PostgreSQL
Business Intelligence
& Performance
The research leading to these results has
received funding from the European Union's
Seventh Framework Programme
(FP7/2007-2013) under grant agreem ent num ber
318633
AXLE Project
• Analytics on Xtremely Large European data
– Secure – Big
– Fast
– Hardware
optimised
Topics
• Business Intelligence & Architecture
• BI Performance Feature Effectiveness • Benchmark Analysis & Opportunities • New Features in Progress
Business Intelligence
• ETL
• Reporting
• Ad-hoc queries • Data Mining
• Many query types • Counting
• Summarisation
• Strategic Analysis • Analytics
BI Architecture
• SQL was invented for Business Intelligence • Classic DW
– DB2 v Teradata
• Specialist Databases
Specialist OLTP Problems
• M ongoDB
– Joins don't scale!
• V oltDB
– No concurrency
– All SQL must run in same duration
Specialist DW Problems
• Second specialist system required
• ETL middleware also often required for loading • Data delayed on route to second system
• Frequently highly compressed, so read only or
Get Real
• Big Data
– 99% of databases are <100GB
• Real Time results
– Business Intelligence required 24x7
– Closed loop processing requires fast response
• SQL is much easier to use than alternatives
– More expressive and easier to use – Already the de facto standard for BI
Minimal Approach
• Emphasise that additional BI technology will
not reduce costs and may not offer solutions
• Keep Business Intelligence on PostgreSQL
• Use Hot Standby to expand capacity and
isolate Business Intelligence workloads
• Minimise ETL whenever possible
• Gain benefits of SQL and concurrency
Things To Learn
• Query performance is important
• Custom/special data structures are important
in increasing performance
• Stale answers are acceptable for many
BI Feature Effectiveness
• Problem 1: Get the work into the database
• Problem 2: Speed up the work in the database
+++++ Work Avoidance
++ Algorithmic Improvement
Orange Data Mining
• Orange 3.0 generates SQL for all data flows • Directly utilises the power of databases • Integrates with PostgreSQL
BI Tuning Opportunities
• COPY batch optimisations defeated
• Btree insert bottlenecks on large data loads • Aggregate Optimisation
– Use sum()/count() not avg()
• Join Estimate/Actual Mismatch – Use enable_nestloops = off
• Plan Pushdown
Speed Up: Work Avoidance
• Cacheing
– Result Cache
– Materialized Views
• Approximation
• Partition Elimination
Speed Up: Algorithms
• Compression
• Column Orientation
• Vectors
Speed Ups: Brute Force
• Parallel Query is a brute force approach
• Gains in performance come from additional utilisation of resources, not from being smarter
– Reduces overall concurrency
– Still requires extensive optimizer changes • The industry thinks we need it
• Some queries do require it • PostgreSQL should do this,
9.4 BI Features In-Progress
• Min Max Indexes
• Parallel Sort & Parallel Query infrastructure • Materialized Views++
• Multi-core scalability gains (lwlocks)
• (DDL Locking impact reductions)
Min Max Indexes (9.4)
• Automatic Partitioning
– Store min and max tuples for each page range
– Use theorem proving to avoid sections of scan
– Covers all columns, not just defined partition key
Min Max Index results
• 2 GB table MinMax B-Tree • Index build time 11s 96s
• Index size 24kB 1.1GB • Load time w index 1 x2-3 • Index SEL (1 row) x2-3 1
MinMax Indexes
• Does not require complex DDL • Generate almost no index inserts
– Fits in RAM even for Petabytes of data • Generate almost no additional WAL
– Works well with Hot Standby data warehousing • Only works with some data distributions
PostgreSQL BI Roadmap
Advanced Business Intelligence 9.4 10.0 10.1
High Security
Online Change
2ndQuadrant
Consulting, Migration
Training Support,
RemoteDBA Open Source Development