• No results found

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

N/A
N/A
Protected

Academic year: 2021

Share "PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor"

Copied!
24
0
0

Loading.... (view fulltext now)

Full text

(1)

PostgreSQL

Business Intelligence

& Performance

(2)

The research leading to these results has

received funding from the European Union's

Seventh Framework Programme

(FP7/2007-2013) under grant agreem ent num ber

318633

(3)

AXLE Project

• Analytics on Xtremely Large European data

– Secure – Big

– Fast

– Hardware

optimised

(4)

Topics

• Business Intelligence & Architecture

• BI Performance Feature Effectiveness • Benchmark Analysis & Opportunities • New Features in Progress

(5)

Business Intelligence

• ETL

• Reporting

• Ad-hoc queries • Data Mining

• Many query types • Counting

• Summarisation

• Strategic Analysis • Analytics

(6)

BI Architecture

• SQL was invented for Business Intelligence • Classic DW

DB2 v Teradata

• Specialist Databases

(7)

Specialist OLTP Problems

M ongoDB

Joins don't scale!

V oltDB

No concurrency

All SQL must run in same duration

(8)

Specialist DW Problems

• Second specialist system required

• ETL middleware also often required for loading • Data delayed on route to second system

• Frequently highly compressed, so read only or

(9)

Get Real

• Big Data

99% of databases are <100GB

• Real Time results

– Business Intelligence required 24x7

– Closed loop processing requires fast response

• SQL is much easier to use than alternatives

– More expressive and easier to use – Already the de facto standard for BI

(10)

Minimal Approach

• Emphasise that additional BI technology will

not reduce costs and may not offer solutions

• Keep Business Intelligence on PostgreSQL

• Use Hot Standby to expand capacity and

isolate Business Intelligence workloads

• Minimise ETL whenever possible

• Gain benefits of SQL and concurrency

(11)

Things To Learn

• Query performance is important

• Custom/special data structures are important

in increasing performance

• Stale answers are acceptable for many

(12)

BI Feature Effectiveness

• Problem 1: Get the work into the database

• Problem 2: Speed up the work in the database

+++++ Work Avoidance

++ Algorithmic Improvement

(13)

Orange Data Mining

Orange 3.0 generates SQL for all data flows • Directly utilises the power of databases • Integrates with PostgreSQL

(14)

BI Tuning Opportunities

• COPY batch optimisations defeated

• Btree insert bottlenecks on large data loads • Aggregate Optimisation

Use sum()/count() not avg()

• Join Estimate/Actual Mismatch – Use enable_nestloops = off

• Plan Pushdown

(15)

Speed Up: Work Avoidance

• Cacheing

– Result Cache

– Materialized Views

• Approximation

• Partition Elimination

(16)

Speed Up: Algorithms

• Compression

• Column Orientation

• Vectors

(17)

Speed Ups: Brute Force

Parallel Query is a brute force approach

Gains in performance come from additional utilisation of resources, not from being smarter

Reduces overall concurrency

Still requires extensive optimizer changes • The industry thinks we need it

• Some queries do require it • PostgreSQL should do this,

(18)

9.4 BI Features In-Progress

• Min Max Indexes

• Parallel Sort & Parallel Query infrastructure • Materialized Views++

• Multi-core scalability gains (lwlocks)

• (DDL Locking impact reductions)

(19)

Min Max Indexes (9.4)

• Automatic Partitioning

– Store min and max tuples for each page range

– Use theorem proving to avoid sections of scan

– Covers all columns, not just defined partition key

(20)

Min Max Index results

• 2 GB table MinMax B-Tree • Index build time 11s 96s

• Index size 24kB 1.1GB • Load time w index 1 x2-3 • Index SEL (1 row) x2-3 1

(21)

MinMax Indexes

• Does not require complex DDL • Generate almost no index inserts

– Fits in RAM even for Petabytes of data • Generate almost no additional WAL

– Works well with Hot Standby data warehousing • Only works with some data distributions

(22)

PostgreSQL BI Roadmap

Advanced Business Intelligence 9.4 10.0 10.1

High Security

Online Change

(23)

2ndQuadrant

Consulting, Migration

Training Support,

RemoteDBA Open Source Development

(24)

The research leading to these results has

received funding from the European Union's

Seventh Framework Programme

(FP7/2007-2013) under grant agreem ent num ber

318633

References

Related documents

All services for which compensation is requested were performed by the undersigned in connection with the Debtor’s bankruptcy case and were not for services in any other matter..

Pretreatment of Vero cells with supernatants from p53-expressing EJp53 infected with SeV was also more eff ective at preventing repli- cation of recombinant Newcastle disease

CSM and Capitol Tech agree that students from CSM, under the articulation agreement, may transfer credits earned for the Associate of Applied Science in Software Development towards

# Brake cylinder piston takes longer time to release after each application of brakes because single train pipe.. # successive brake application on gradients are not

Magician dips the erasor of his pencil in some itching powder as he walks towards the audience and touches people

The influence of Japanese kimono on European bustles and their representation in the paintings of the late nineteenth century.. I RIA R OS P IÑEIRO (University of Valencia,

Align process analysis and improvement with the voice-of-the-customer (VOC) using a value stream map (VSM) of the process workflow or a floor layout of the work area associated

Through a landslide inventory and morphometric analysis of the subaqueous slope failures at the coastal areas of the populated distal basin of Lago Nahuel Huapi (where the 1960