Technical Deep-Dive in a Column-Oriented In-Memory Database

141  Download (0)

Full text

(1)

Technical Deep-Dive in a

Column-Oriented

In-Memory Database

Martin Faust

Martin.Faust@hpi.uni-potsdam.de

Research Group of Prof. Hasso Plattner

Hasso Plattner Institute for Software Engineering University of Potsdam

(2)

Research Group of Prof. Dr. h.c. Hasso Plattner

§  Research focuses on the technical aspects of enterprise software and

design of complex applications

§  In-Memory Data Management for Enterprise Applications

§  Human-Centered Software Design and Engineering

§  Programming Model for Enterprise

Applications

§  Maintenance and Evolution of

Service-Oriented Enterprise Software

§  RFID Technology in Enterprise Platforms

§  Teaching activities:

§  Lectures, Exercises, Seminars

§  Bachelor/Master projects

§  Master thesis

Enterprise Platform and Integration Concepts Group - Topics

(3)

The Design Thinking

Methodology

Viability Business Factors Feasibility Technical Factors Solutions Needs & Opportunities Desirability Human Factors

(4)

Demand Prediction Supply Chain Innovation Database Technology Massachusetts Institute of Technology Software Hardware Cloud Computing UC Berkeley Design Research In-Memory Data Management Stanford University Hasso Plattner Institute,

Enterprise Systems & Applications Chair

End-Users

Setup    

Enables  Leading  Research  in  the  Field     of  In-­‐Memory  Data  Management

(5)

Learning Map of our

Online Lecture @ openHPI.de

Founda'ons  for  a   New  Enterprise   Applica'on   Development  Era   Founda'ons  of   Database  Storage   Techniques   The  Future  of  

Enterprise   Compu'ng   Advanced   Database   Storage   Tech-­‐ niques   In-­‐Memory   Database   Operators  

(6)

Goals

Deep  technical  understanding  of  a  column-­‐oriented,  dicAonary-­‐ encoded  in-­‐memory  database  and  its  applicaAon  in  enterprise   compuAng  

 

Chapters  

□  The  future  of  enterprise  compuAng  

□  FoundaAons  of  database  storage  techniques  

□  In-­‐memory  database  operators  

□  Advanced  database  storage  techniques  

□  ImplicaAons  on  ApplicaAon  Development  

(7)

Chapter 1:

The Future of Enterprise

Computing

(8)

New Requirements for

Enterprise Computing

□  Sensors  

□  Events  

□  Structured  +  unstructured  data  

□  Social  networks  

□  ...  

(9)

Enterprise Application

Characteristics

(10)

¨  Modern  enterprise  resource  planning  (ERP)  systems  are   challenged  by  mixed  workloads,  including  OLAP-­‐style   queries.  For  example:    

§  OLTP-­‐style:  create  sales  order,  invoice,  accounAng  documents,  

display  customer  master  data  or  sales  order  

§  OLAP-­‐style:  dunning,  available-­‐to-­‐promise,  cross  selling,  

operaAonal  reporAng  (list  open  sales  orders)  

¨  But:  Today’s  data  management  systems  are  opAmized   either  for  daily  transac'onal  or  analy'cal  workloads   storing  their  data  along  rows  or  columns  

O

n

l

ine  

T

ransacAon    

P

rocessing  

O

P

rocessing  

n

l

ine  

A

nalyAcal  

OLTP vs. OLAP

(11)

Drawbacks of the Separation

¨  OLAP  systems  do  not  have  the  latest  data  

¨  OLAP  systems  only  have  predefined  subset  of  the  data  

¨  Cost-­‐intensive  ETL  processes  have  to  sync  both  systems    

¨  SeparaAon  introduces  data  redundancy  

¨  Different  data  schemas  introduce  complexity  for   applicaAons  combining  sources  

(12)

Enterprise Workloads are

Read Dominated

0 % 10 % 20 % 30 % 40 % 50 % 60 % 70 % 80 % 90 % 100% OLTP OLAP W orkload 0 % 10 % 20 % 30 % 40 % 50 % 60 % 70 % 80 % 90 % 100% TPC-C W orkload Select Insert Modification Delete Write: Read: Lookup Table Scan Range Select Insert Modification Delete Write: Read:

§  Workload  in  enterprise  applicaAons  consAtutes  of    

§  Mainly  read  queries  (OLTP  83%,  OLAP  94%)  

§  Many  queries  access  large  sets  of  data  

(13)

Few Updates in OLTP

Pe rc en tag e   of  ro w s   up date d  

(14)

Combine  OLTP  and  OLAP  data    

using  

modern

 hardware  and  database  systems  

to  create  a  

single  source  of  truth,

 

enable  

real-­‐'me  analy'cs  

and    

simplify

 applicaAons  and  database  structures.    

 

AddiAonally,  

¨  ExtracAon,  transformaAon,  and  loading  (ETL)  processes    

¨  Pre-­‐computed  aggregates  and  materialized  views    

become  (almost)  obsolete

.  

Vision

(15)

¨  Many  columns  are  not  used  even  once  

¨  Many  columns  have  a  low  cardinality  of  values  

¨  NULL  values/default  values  are  dominant  

¨  Sparse  distribuAon  facilitates  high  compression  

Standard  enterprise  so`ware  data  is  sparse  and  wide.  

(16)

Many Columns are not

Used Even Once

55%  unused  columns  per  company  in  average    

40%  unused  columns  across  all  12  analyzed  companies    

0 % 10% 20% 30% 40% 50% 60% 70% 80% 1 - 32 33 - 1023 1024 - 100000000 13 % 9 % 78 % 24 % 12 % 64 %

Number of Distinct Values

Inventory Management Financial Accounting

% of Columns

(17)
(18)

A  

Changes  in  Hardware…  

…  give  an  opportunity  to  re-­‐think  the  assump'ons  of   yesterday  because  of  what  is  possible  today.  

 

§  Main  Memory  becomes  cheaper  and  larger  

■  MulA-­‐Core  Architecture    

(128  cores  per  server)  

■  Large  main  memories:  2TB  /

blade  

■  One  blade  ~$50.000  =    

1  enterprise  class  server  

■  Parallel  scaling  across  blades  

■  64-­‐bit  address  space  

■  12TB  in  current  servers  

■  Cost-­‐performance  raAo  rapidly  

declining  

■  Memory  hierarchies  

(19)

In the Meantime

Research has come up with…

§  Column-­‐oriented  data  organizaAon    

(the  column  store)  

§  Sequen'al  scans  allow  best  bandwidth  uAlizaAon  

between  CPU  cores  and  memory    

§  Independence  of  tuples  allows  easy  parAAoning  and  

therefore  parallel  processing  

§  Lightweight  Compression  

§  Reducing  data  amount  

§  Increasing  processing  speed  through  late  materializaAon  

§  And  more,  e.g.,  parallel  scan/join/aggregaAon  

…  several  advances  in  soRware  for  processing  data  

(20)
(21)

SanssouciDB: An In-Memory

Database for Enterprise Applications

Main Memory at Blade i

Log Snapshots

Passive Data (History)

Non-Volatile Memory Recovery Logging Time travel Data aging

Query Execution Metadata TA Manager

Interface Services and Session Management

Distribution Layer at Blade i

Main Store Differential Store Active Data M er ge C ol um n C ol um n C om bi ne d C ol um n C ol um n C ol um n C om bi ne d C ol um n Indexes Inverted Object Data Guide In-­‐Memory  Database  (IMDB)  

¨  Data  resides  permanently  in  

main  memory  

¨  Main  Memory  is  the  primary  

“persistence”  

¨  SAll:  logging  and  recovery              

to/from  flash  

¨  Main  memory  access  is    

the  new  boTleneck  

¨  Cache-­‐conscious  algorithms/  

data  structures  are  crucial  

(locality  is  king)    

Main Memory at Server i

Distribution Layer at Server i

(22)

Chapter 2:

Foundations of Database

Storage Techniques

(23)

Learning Map of our

Online Lecture @ openHPI.de

Founda'ons  for  a   New  Enterprise   Applica'on   Development  Era   Founda'ons  of   Database  Storage   Techniques   The  Future  of  

Enterprise   Compu'ng   Advanced   Database   Storage   Tech-­‐ niques   In-­‐Memory   Database   Operators  

(24)
(25)

Memory Basics (1)

□  Memory  in  today’s  computers  has  a  linear  address  layout:  

addresses  start  at  0x0  and  go  to  0xFFFFFFFFFFFFFFFF  for  64bit  

□  Each  process  has  its  own  virtual  address  space  

□  Virtual  memory  allocated  by  the  program  can  distribute  over  

mulAple  physical  memory  locaAons  

(26)

Memory Basics (2)

□  Memory  layout  is  only  linear  

□  Every  higher-­‐dimensional  access  (like  two-­‐dimensional  

database  tables)  is  mapped  to  this  linear  band  

26  

0x0 0xFFFF FFFF FFFF FFFF

FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF

(27)

Memory Hierarchy

Memory Page

Nehalem Quadcore

Core 0 Core 1 Core 2 Core 3

L3 Cache L2

L1 TLB

Main Memory Main Memory

QPI

Nehalem Quadcore

Core 0 Core 1 Core 2 Core 3

L3 Cache L2 L1 TLB QPI L1 Cacheline L2 Cacheline L3 Cacheline

(28)

Physical Data Representation

§  Row  store:    

§  Rows  are  stored  consecuAvely  

§  OpAmal  for  row-­‐wise  access  (e.g.  SELECT  *)  

§  Column  store:    

§  Columns  are  stored  consecuAvely  

§  OpAmal  for  akribute  focused  access  (e.g.  SUM,  GROUP  BY)  

§  Note:  concept  is  independent  from  storage  type  

+

28   Doc Num Doc Date Sold-To Value Status Sales Org Row 4 Row 3 Row 2 Row 1 Row-Store Column-store

(29)

Row Data Layout

□  Data  is  stored  tuple-­‐wise  

□  Leverage  co-­‐locaAon  of  akributes  for  a  single  tuple  

□  Low  cost  for  tuple  reconstrucAon,  but  higher  cost  for  

sequenAal  scan  of  a  single  akribute  

C B A C B A C B A C B A C B A C B A C B A C B A Column Operation Row Operation

(30)

Columnar Data Layout

□  Data  is  stored  akribute-­‐wise  

□  Leverage  sequenAal  scan-­‐speed  in  main  memory    

for  predicate  evaluaAon  

□  Tuple  reconstrucAon  is  more  expensive  

C C C C B B B B A A A A C C C C B B B B A A A A Column Operation Row Operation

Column Column Column

Column Column Column

(31)

Row-oriented storage

A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4

(32)

Row-oriented storage

32   A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4

(33)

Row-oriented storage

A1 B1 C1 A2 B2 C2

A3 B3 C3 A4 B4 C4

(34)

Row-oriented storage

34  

A1 B1 C1 A2 B2 C2 A3 B3 C3

(35)

Row-oriented storage

(36)

Column-oriented storage

36   A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4

(37)

Column-oriented storage

B1 C1 B2 C2 B3 C3 B4 C4 A1 A2 A3 A4

(38)

Column-oriented storage

38   A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4

(39)

Column-oriented storage

(40)
(41)

Motivation

□  Main  memory  access  is  the  new  bokleneck  

□  Idea:  Trade  CPU  Ame  to  compress  and  decompress  data  

□  Compression  reduces  number  of  memory  accesses  

□  Leads  to  less  cache  misses  due  to  more  informaAon  on  a  

cache  line  

□  OperaAon  directly  on  compressed  data  possible  

□  Offseong  with  bit-­‐encoded  fixed-­‐length  data  types  

(42)

Dictionary Encoding Example

8  billion  humans   □  Akributes   §  first  name     §  last  name     §  gender   §  country   §  city   §  birthday   à  200  byte  

□  Each  akribute  is  stored  

dicAonary  encoded   42  

(43)

Sample Data

rec  ID   fname   lname   gender   city   country   birthday   …   …   …   …  

39   John   Smith   m   Chicago   USA   12.03.1964   40   Mary   Brown   f   London   UK   12.05.1964   41   Jane   Doe   f   Palo  Alto   USA   23.04.1976   42   John   Doe   m   Palo  Alto   USA   17.06.1952   43   Peter   Schmidt   m   Potsdam   GER   11.11.1975  

(44)

Dictionary Encoding a Column

□  A  column  is  split  into  a  dicAonary  and  an  akribute  vector  

□  DicAonary  stores  all  disAnct  values  with  implicit  value  ID  

□  Akribute  vector  stores  value  IDs  for  all  entries  in  the  column  

□  PosiAon  is  implicit,  not  stored  explicitly  

□  Enables  offseong  with  fixed-­‐length  data  types  

44  

Rec  ID   fname  

…   …   39   John   40   Mary   41   Jane   42   John   43   Peter   …   …  

Dic'onary  for  “fname”   Value  ID   Value  

…   …   23   John   24   Mary   25   Jane   26   Peter   …   …  

ATribute  Vector  for  “fname”   posi'on   Value  ID  

…   …   39   23   40   24   41   25   42   23   43   26   …   …  

(45)

Sorted Dictionary

□  DicAonary  entries  are  sorted  either  by  their  numeric  value  or  

lexicographically  

§  DicAonary  lookup  complexity:  O(log(n))  instead  of  O(n)  

□  DicAonary  entries  can  be  compressed  to  reduce  the  amount  

of  required  storage  

□  SelecAon  criteria  with  ranges  are  less  expensive  (order-­‐

(46)

Data Size Examples

46  

Column   Cardi-­‐

nality   Needed  Bits   Item  Size   Plain  Size   (Dic'onary  +  Column)  Size  with  Dic'onary   Compression  Factor  

First  

name   5  million   23  bit   50  Byte   373  GB   238.4  MB  +  21.4  GB   ≈  17  

Last  

name   8  million   23  bit   50  Byte   373  GB   381.5  MB  +  21.4  GB   ≈  17  

Gender   2   1  bit   1  Byte   7  GB   2  bit  +  953.7  MB   ≈  8  

City   1  million   20  bit   50  Byte   373  GB   47.7  MB  +  18.6  GB   ≈  20  

Country   200   8  bit   47  Byte   350  GB   9.2  KB  +  7.5  GB   ≈  47  

Birthday   40,000   16  bit   2  Byte   15  GB   78.1  KB  +  14.9  GB   ≈  1  

(47)

Chapter 3:

In-Memory

(48)

Learning Map of our

Online Lecture @ openHPI.de

Founda'ons  for  a   New  Enterprise   Applica'on   Development  Era   Founda'ons  of   Database  Storage   Techniques   The  Future  of  

Enterprise   Compu'ng   Advanced   Database   Storage   Tech-­‐ niques   In-­‐Memory   Database   Operators  

(49)

Scan Performance (1)

8  billion  humans   ¨  Akributes   §  First  Name   §  Last  Name     §  Gender   §  Country   §  City   §  Birthday   è  200  byte  

¨  QuesAon:  How  many  men/women?  

(50)

Scan Performance (2)

50   Row 1 Row 2 Row 3 ... Row 8 x 10 First Name Last Name Gender Country City Birthday Table: humans 9

Row Store – Layout

¨  Table size =

8 billion tuples x 200 bytes per tuple à ~1.6 TB

¨  Scan through all rows

with 2 MB/ms/core à ~800 seconds with 1 core

(51)

Scan Performance (3)

Row 1 Row 2 Row 3 ... First Name

Last Name Country

City Birthday Table: humans Row 8 x 109 Gender

Row Store – Full Table Scan

¨  Table size =

8 billion tuples x 200 bytes per tuple à ~1.6 TB

¨  Scan through all rows

with 2 MB/ms/core à ~800 seconds with 1 core

(52)

Scan Performance (4)

52   Row 1 Row 2 Row 3 ... First Name

Last Name Country

City Birthday Table: humans Data not loaded Data loaded and used Data loaded but not used Row

8 x 109

Gender

¨  8 billion cache accesses à

64 byte

à ~512 GB

¨  Read with 2 MB/ms/core

à ~256 seconds with 1 core

(53)

Scan Performance (5)

First Name

Last Name Country

City

Birthday

Table: humans

Gender

Column Store – Layout

¨  Table size

§  Attribute vectors: ~91 GB

§  Dictionaries: ~700 MB

à Total: ~92 GB

(54)

Scan Performance (6)

54  

Column Store – Full Column Scan on “Gender”

First Name

Last Name Country

City Birthday Table: humans Data not loaded Data loaded and used Data loaded but not used Gender

¨  Size of attribute vector

“gender” =

8 billion tuples x 1 bit per tuple à ~1 GB

¨  Scan through

attribute vector with 2 MB/ms/core à

(55)

Scan Performance (7)

Column Store – Full Column Scan on “Birthday”

First Name

Last Name Country

City

Birthday

Table: humans

Gender

¨  Size of attribute vector

“birthday” =

8 billion tuples x 2 Byte per tuple à ~16 GB

¨  Scan through

column with

2 MB/ms/core à

(56)

Scan Performance – Summary

56  

¨  How many women, how many men?

Column

Store Row Store

Full table

scan access Stride Time in

seconds 0.5 800 256

1,600x

(57)
(58)

Tuple Reconstruction (1)

¨  All  akributes     are  stored   consecuAvely   ¨  200  byte  à  4  cache   accesses  à  64  byte   à  256  byte     ¨  Read  with     2  MB/ms/core     à  ~0.128  μs     with  1  core    

Accessing  a  record  in  a  row  store  

58  

(59)

Tuple Reconstruction (2)

¨  All  akributes  are   stored  in  separate   columns  

¨  Implicit  record  IDs   are  used  to  

reconstruct  rows    

Virtual  record  IDs  

(60)

Tuple Reconstruction (3)

¨  1  cache  access  for   each  akribute   ¨  6  cache  accesses     à  64  byte     à  384  byte     ¨  Read  with     2  MB/ms/core     à  ~0.192  μs     with  1  core   60  

Virtual  record  IDs  

(61)
(62)

SELECT Example

SELECT fname, lname FROM world_population

WHERE country="Italy" and gender="m"

fname   lname   country   gender  

Gianluigi   Buffon   Italy   m   Lena   Gercke   Germany   f   Mario   Balotelli   Italy   m   Manuel   Neuer   Germany   m   Lukas   Podolski   Germany   m   Klaas-­‐Jan   Huntelaar   Netherlands   m  

(63)

Query Plan

¨  MulAple  plans  exist  to  execute  query  

¨  Query  OpAmizer  decides  which  is  executed  

¨  Based  on  cost  model,  staAsAcs  and  other  parameters  

¨  AlternaAves  

¨  Scan  “country”  and  “gender”,  posiAonal  AND  

¨  Scan  over  “country”  and  probe  into  “gender”  

¨  Indices  might  be  used  

¨  Decision  depends  on  data  and  query  parameters  like  e.g.  selecAvity  

SELECT fname, lname FROM world_population

(64)

Query Plan (i)

PosiAonal  AND:  

¨  Predicates  are  evaluated  and  generate  posiAon  lists  

¨  Intermediate  posiAon  lists  are  logically  combined  

¨  Final  posiAon  list  is  used  for  materializaAon  

64  

country = "Italy" gender = "m"

fname, lname position list position list positional AND π  

(65)

Query Execution (i)

Posi'on   0   2   Posi'on   0   2   3   4   5   Posi'on   0   2   country  =    3  ("Italy")   gender  =  1  ("m")   AND  

Value  ID   Dic'onary   for  “country”   0   Algeria   1   France   2   Germany   3   Italy   4   Netherlands   …   Value  

ID   Dic'onary  for   “gender”  

0   f   1   m  

fname   lname   country   gender  

Gianluigi   Buffon   3   1   Lena   Gercke   2   0   Mario   Balotelli   3   1   Manuel   Neuer   2   1   Lukas   Podolski   2   1   Klaas-­‐Jan   Huntelaar   4   1  

(66)

country = "Italy" Gender fname, lname position list Probe Positional Filter gender = "m"

Query Plan (ii)

Based  on  posiAon  list  produced  by  first  selecAon,  

gender  column  is  probed.  

66  

(67)
(68)

Insert

□  Insert  is  the  dominant  modificaAon  operaAon    

§  Delete/Update  can  be  modeled  as  Inserts  as  well  (Insert-­‐only  

approach)  

□  InserAng  into  a  compressed  in-­‐memory  persistence  can  be  

expensive  

§  UpdaAng  sorted  sequences  (e.g.  dicAonaries)  is  a  challenge  

§  InserAng  into  columnar  storages  is  generally  more  expensive  than  

inserAng  into  row  storages  

(69)

Insert Example

rowID   fname   lname   gender   country   city   birthday   0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955  

1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970  

2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968  

3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992  

4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977  

...   ...   ...   ...   ...   ...   ...  

world_populaAon  

INSERT  INTO  world_populaAon  

(70)

INSERT (1) w/o new Dictionary entry

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   ...   ...   ...   ...   ...   ...   ...  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

0   0   1   1   2   3   3   2   4   3   0   Albrecht   1   Berg   2   Meyer   3   Schulze   D   AV  

Akribute  Vector  (AV)   DicAonary  (D)  

(71)

0   Albrecht  

1   Berg  

2   Meyer  

3   Schulze  

1.  Look-­‐up  on  D  à  entry  found  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

Akribute  Vector  (AV)   DicAonary  (D)  

D  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   ...   ...   ...   ...   ...   ...   ...   AV   0   0   1   1   2   3   3   2   4   3  

(72)

0   0   1   1   2   3   3   2   4   3   5   3  

1.  Look-­‐up  on  D  à  entry  found  

2.  Append  ValueID  to  AV  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

Akribute  Vector  (AV)   DicAonary  (D)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   ...   ...   ...   ...   ...   ...   ...   AV   0   Albrecht   1   Berg   2   Meyer   3   Schulze   D  

INSERT (1) w/o new Dictionary entry

(73)

INSERT (2) with new Dictionary Entry I/II

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze  

...   ...   ...   ...   ...   ...   ...  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

0   0   1   0   2   1   3   2   4   3   D   AV   0   Berlin   1   Hamburg   2   Innsbruck   3   Potsdam  

Akribute  Vector  (AV)   DicAonary  (D)  

(74)

1.  Look-­‐up  on  D  à  no  entry  found  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)   D   AV   0   0   1   0   2   1   3   2   4   3   0   Berlin   1   Hamburg   2   Innsbruck   3   Potsdam  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)  

INSERT (2) with new Dictionary Entry I/II

(75)

1.  Look-­‐up  on  D  à  no  entry  found  

2.  Append  new  value  to  D  (no  re-­‐sorAng  necessary)  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

0   Berlin   1   Hamburg   2   Innsbruck   3   Potsdam   4   Rostock   D  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   AV   0   0   1   0   2   1   3   2   4   3  

(76)

1.  Look-­‐up  on  D  à  no  entry  found  

2.  Append  new  value  to  D  (no  re-­‐sorAng  necessary)  

3.  Append  ValueID  to  AV  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   0   0   1   0   2   1   3   2   4   3   5   4   0   Berlin   1   Hamburg   2   Innsbruck   3   Potsdam   4   Rostock   D   AV  

INSERT (2) with new Dictionary Entry I/II

(77)

0   Anton  

1   Hanna  

2   MarAn  

3   Michael  

4   Sophie  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)   D   AV   0   2   1   3   2   1   3   0   4   4  

Akribute  Vector  (AV)   DicAonary  (D)  

(78)

1.  Look-­‐up  on  D  à  no  entry  found  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   0   Anton   1   Hanna   2   MarAn   3   Michael   4   Sophie   D   AV   0   2   1   3   2   1   3   0   4   4  

INSERT (2) with new Dictionary Entry II/II

(79)

1.  Look-­‐up  on  D  à  no  entry  found  

2.  Insert  new  value  to  D  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   0   Anton   1   Hanna   2   Karen   3   MarAn   4   Michael   5   Sophie   D   AV   0   2   1   3   2   1   3   0   4   4  

(80)

1.  Look-­‐up  on  D  à  no  entry  found  

2.  Insert  new  value  to  D  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   0   Anton   1   Hanna   2   MarAn   3   Michael   4   Sophie   D  (old)   AV   0   2   1   3   2   1   3   0   4   4   0   Anton   1   Hanna   2   Karen   3   MarAn   4   Michael   5   Sophie   D  (new)  

INSERT (2) with new Dictionary Entry II/II

(81)

1.  Look-­‐up  on  D  à  no  entry  found  

2.  Insert  new  value  to  D  

3.  Change  ValueIDs  in  AV  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   AV  (old)   0   Anton   1   Hanna   2   Karen   3   MarAn   4   Michael   5   Sophie   D  (new)   AV  (new)   0   3   1   4   2   1   3   0   4   5   0   2   1   3   2   1   3   0   4   4  

(82)

1.  Look-­‐up  on  D  à  no  entry  found  

2.  Insert  new  value  to  D  

3.  Change  ValueIDs  in  AV  

4.  Append  new  ValueID  to  AV  

INSERT  INTO  world_populaAon  VALUES  (Karen,  Schulze,  f,  GER,  Rostock,  11-­‐15-­‐2012)  

fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Sophie   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Karen   Schulze   Rostock  

...   ...   ...   ...   ...   ...   ...   Akribute  Vector  (AV)   DicAonary  (D)   0   Anton   1   Hanna   2   Karen   3   MarAn   4   Michael   5   Sophie   D   AV   0   3   1   4   2   1   3   0   4   5   5   2  

INSERT (2) with new Dictionary Entry II/II

82  

Changed   Value  IDs  

(83)

Result

rowID   fname   lname   gender   country   city   birthday   0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955  

1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970  

2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968  

3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992  

4   Ulrike   Schulze   f   GER   Potsdam   09-­‐03-­‐1977  

5   Karen   Schulze   f   GER   Rostock   11-­‐15-­‐2012  

world_populaAon    

INSERT  INTO  world_populaAon  

(84)

Chapter 4:

Advanced Database

Storage Techniques

(85)

Learning Map of our

Online Lecture @ openHPI.de

Founda'ons  for  a   New  Enterprise   Applica'on   Development  Era   Founda'ons  of   Database  Storage   Techniques   The  Future  of  

Enterprise   Compu'ng   Advanced   Database   Storage   Tech-­‐ niques   In-­‐Memory   Database   Operators  

(86)
(87)

Motivation

□  InserAng  new  tuples  directly  into  a  compressed  structure  can  be  

expensive  

§  Especially  when  using  sorted  structures  

§  New  values  can  require  reorganizing  the  dicAonary  

§  Number  of  bits  required  to  encode  all  dicAonary  values  can  change,  

(88)

Differential Buffer

¨  New  values  are  wriken  to  a  dedicated  differenAal  buffer  (Delta)  

¨  Cache  SensiAve  B+  Tree  (CSB+)  used  for  fast  search  on  Delta  

Table Main Store

Differential Buffer

Data Modifying Operations

Read Operations Table DicAonary   Akribute  Vector   fname   …   (compr esse d)   Main  Store   DicAonary   Akribute  Vector   CSB+   fname   …  

DifferenAal  Buffer/  Delta   Write   Read   world_population 0 0   1 1   2 1   3 3   4 2   5 1   0 Anton   1 Hanna   2 Michael   3 Sophie   0   Angela   1   Klaus   2   Andre   0 0   1 1   2 1   3 2  

8  Billion  entries   up  to  50,000  entries  

(89)

Differential Buffer

□  Inserts  of  new  values  are  fast,  because  dicAonary  and  

akribute  vector  do  not  need  to  be  resorted  

□  Range  selects  on  differenAal  buffer  are  expensive  

§  Unsorted  dicAonary  allows  no  direct  comparison  of  value  IDs  

§  Scans  with  range  selecAon  need  to  lookup  values  in  dicAonary  for  

comparisons  

□  DifferenAal  Buffer  requires  more  memory:  

§  Akribute  vector  not  bit-­‐compressed  

(90)

DifferenAal  Buffer   Main  Store  

Tuple Lifetime

recId   fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Ulrike   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Sophie   Schulze   f   GER   Rostock   06-­‐20-­‐2012   ...   ...   ...   ...   ...   ...   ...   8  *  109   Zacharias   Perdopolus   m   GRE   Athen   03-­‐12-­‐1979  

Main  Table:  world_populaAon  

Michael  moves  from  Berlin  to  Potsdam  

UPDATE    "world_populaAon"   SET    city="Potsdam"  

WHERE    fname="Michael"  AND  lname="Berg"  

(91)

DifferenAal  Buffer   Main  Store  

Tuple Lifetime

recId   fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Ulrike   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Sophie   Schulze   f   GER   Rostock   06-­‐20-­‐2012   ...   ...   ...   ...   ...   ...   ...   8  *  109   Zacharias   Perdopolus   m   GRE   Athen   03-­‐12-­‐1979  

UPDATE    "world_populaAon"   SET    city="Potsdam"  

WHERE    fname="Michael"  AND  lname="Berg"   Main  Table:  world_populaAon  

(92)

Tuple Lifetime

recId   fname   lname   gender   country   city   birthday  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   4   Ulrike   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   5   Sophie   Schulze   f   GER   Rostock   06-­‐20-­‐2012   ...   ...   ...   ...   ...   ...   ...   8  *  109   Zacharias   Perdopolus   m   GRE   Athen   03-­‐12-­‐1979  

UPDATE    "world_populaAon"   SET    city="Potsdam"  

WHERE    fname="Michael"  AND  lname="Berg"  

DifferenAal  Buffer   Main  Store  

0   Michael   Berg   m   GER   Potsdam   03-­‐05-­‐1970  

92  

Main  Table:  world_populaAon  

(93)

Tuple Lifetime

□  Tuples  are  now  available  in  Main  Store  and  DifferenAal  Buffer  

□  Tuples  of  a  table  are  marked  by  a  validity  vector  to  reduce  the  

required  amount  of  reorganizaAon  steps  

§  AddiAonal  akribute  vector  for  validity  

§  1  bit  required  per  database  tuple  

□  Invalidated  tuples  stay  in  the  database  table,  unAl  the  next  

reorganizaAon  takes  place  

□  Query  results  

§  Main  and  delta  have  to  be  queried  

(94)

recId   fname   lname   gender   country   city   birthday   valid  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   0   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   1   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   1   4   Ulrike   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   1   5   Sophie   Schulze   f   GER   Rostock   06-­‐20-­‐2012   1   ...   ...   ...   ...   ...   ...   ...  

8  *  109   Zacharias   Perdopolus   m   GRE   Athen   03-­‐12-­‐1979   1  

Tuple Lifetime

UPDATE    "world_populaAon"   SET    city="Potsdam"  

WHERE    fname="Michael"  AND  lname="Berg"  

0   Michael   Berg   m   GER   Potsdam   03-­‐05-­‐1970   1   DifferenAal    

Buffer   Main  Store  

94  

Main  Table:  world_populaAon  

(95)

Tuple Lifetime

UPDATE    "world_populaAon"   SET    city="Potsdam"  

WHERE    fname="Michael"  AND  lname="Berg"  

recId   fname   lname   gender   country   city   birthday   valid  

0   MarAn   Albrecht   m   GER   Berlin   08-­‐05-­‐1955   1   1   Michael   Berg   m   GER   Berlin   03-­‐05-­‐1970   0   2   Hanna   Schulze   f   GER   Hamburg   04-­‐04-­‐1968   1   3   Anton   Meyer   m   AUT   Innsbruck   10-­‐20-­‐1992   1   4   Ulrike   Schulze   f   GER   Potsdam   09-­‐03-­‐1977   1   5   Sophie   Schulze   f   GER   Rostock   06-­‐20-­‐2012   1   ...   ...   ...   ...   ...   ...   ...  

8  *  109   Zacharias   Perdopolus   m   GRE   Athen   03-­‐12-­‐1979   1  

0   Michael   Berg   m   GER   Potsdam   03-­‐05-­‐1970   1   DifferenAal    

Buffer   Main  Store   Main  Table:  world_populaAon  

(96)
(97)

Handling Write Operations

¨  All  Write  operaAons  (INSERT,  UPDATE)  are  stored  within  a   differenAal  buffer  (delta)  first  

¨  Read-­‐operaAons  on  differenAal  buffer  are  more  expensive  than   on  main  store  

□  DifferenAal  buffer  is  merged  periodically  with  the  main  store  

§  To  avoid  performance  degradaAon  based  on  large  delta  

(98)

Merge Overview I/II

¨  The  merge  process  is  triggered  for  single  tables  

¨  Is  triggered  by:  

§  Amount  of  tuples  in  buffer  

§  Cost  model  to  

§  Schedule  

§  Take  query  cost  into  account  

§  Manually  

(99)

Merge Overview II/II

¨  Working  on  data  copies  allows  asynchronous  merge  

¨  Very  limited  interrupAon  due  to  short  lock  

¨  At  least  twice  the  memory  of  the  table  needed!  

Main Store Differ-ential Buffer Before Data Modifying Operations Read Operations Main Store Differential Buffer (new)

During the Merge Process

Data Modifying Operations Read Operations Main Store (new) Differ-ential Buffer (new) After Data Modifying Operations Read Operations Main Store (new) Differ-ential Buffer Merge Operation

(100)

Attribute Merge

Step  1:  Dic'onary  Merge  

¨  Merge  main  and  delta  dicAonary  

¨  OpAonally  remove  unused  values  

¨  Merge  of  two  sorted  arrays  

¨  Create  mapping  if  valueIDs  changed  

 

Step  2:  Update  ATribute  Vector  

¨  Create  new  merged  main  parAAon  

¨  Update  valueIDs  reflecAng  changed  dicAonary  

(101)

Delta  

Main  Store  

Example

valueID   fname   city  

0   Albert   Berlin   1   Michael   London   2   Nadja   recID   valid   0   1   1   1   2   1  

recID   fname   city  

0   2   0  

1   1   1  

2   0   0  

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city   recID   fname   city   recID   valid  

(102)

Delta   Main  Store  

Example

102   recID   valid   0   1   1   0   2   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city  

0   Michael   Berlin  

recID   valid  

0   1  

recID   fname   city  

0   0   0  

Dic'onaries   ATribute  Vectors   Validity  Vector  

Michael  moves  from  London  to  Berlin  

valueID   fname   city  

0   Albert   Berlin  

1   Michael   London  

2   Nadja  

recID   fname   city  

0   2   0  

1   1   1  

(103)

Main  Store   Delta  

Example

recID   valid   0   0   1   0   2   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city  

0   Michael   Berlin  

1   Nadja   Potsdam  

recID   valid  

0   1  

1   1  

recID   fname   city  

0   0   0  

1   1   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

Nadja  moves  from  Berlin  to  Potsdam  

valueID   fname   city  

0   Albert   Berlin  

1   Michael   London  

2   Nadja  

recID   fname   city  

0   2   0  

1   1   1  

(104)

Main  Store   Delta  

Example

104   recID   valid   0   0   1   0   2   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city  

0   Michael   Berlin   1   Nadja   Potsdam   recID   valid   0   0   1   1   2   1  

recID   fname   city  

0   0   0  

1   1   1  

2   0   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

Michael  moves  from  Berlin  to  Potsdam  

valueID   fname   city  

0   Albert   Berlin  

1   Michael   London  

2   Nadja  

recID   fname   city  

0   2   0  

1   1   1  

(105)

Main  Store   Delta  

First Step:

Validity Detection

recID   valid   0   0   1   0   2   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city  

0   Michael   Berlin   1   Nadja   Potsdam   recID   valid   0   0   1   1   2   1  

recID   fname   city  

0   0   0  

1   1   1  

2   0   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

Will  be  deleted  /  moved  into  history  parAAon  

valueID   fname   city  

0   Albert   Berlin  

1   Michael   London  

2   Nadja  

recID   fname   city  

0   2   0  

1   1   1  

(106)

Main  Store  

Delta  

First Step:

Dictionary Merge

valueID   fname   city  

0   Albert   Berlin  

1   Michael   London  

2   Nadja  

Dic'onaries  

valueID   fname   city  

0   Michael   Berlin  

1   Nadja   Potsdam  

Dic'onaries  

city:  old  to  new  Main    

valueID  (old)   0   1   valueID  (new)   0   NA  

fname:  old  to  new  Main  

valueID  (old)   0   1   2   valueID  (new)   0   1   2  

city:  Delta  to  Main  

valueID  (Delta)   0   1   valueID  (Main)   0   1  

fname:  Delta  to  Main  

valueID  (Delta)   0   1   valueID  (Main)   1   2  

Mappings  

New  Combined  Main  Store  

valueID   fname   city  

0   Albert   Berlin  

1   Michael   Potsdam  

2   Nadja  

(107)

Main  Store  

Delta  

Second Step:

New Main Column

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city  

0   Michael   Berlin  

1   Nadja   Potsdam  

recID   valid  

0   1  

1   1  

recID   fname   city  

0   1   1  

1   0   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

recID   valid  

0   1  

recID   fname   city  

0   0   0  

fname:  old  to  new  Main  

valueID  (old)   0   1   2  

fname:  Delta  to  Main  

valueID  (Delta)   0   1  

Mappings  

city:  old  to  new  Main    

valueID  (old)   0   1  

city:  Delta  to  Main  

valueID  (Delta)   0   1   valueID   fname   city  

0   Albert   Berlin  

1   Michael   London  

(108)

Main  Store  

Delta  

Second Step:

New Main Column

valueID   fname   city  

0   Albert   Berlin   1   Michael   Potsdam   2   Nadja   recID   valid   0   1   1   1   2   1  

recID   fname   city  

0   0   0  

1   2   1  

2   1   1  

Dic'onaries   ATribute  Vectors   Validity  Vector  

valueID   fname   city   recID   fname   city   recID   valid  

Dic'onaries   ATribute  Vectors   Validity  Vector  

(109)

Chapter 5:

Implications on

(110)

Learning Map of our

Online Lecture @ openHPI.de

Founda'ons  for  a   New  Enterprise   Applica'on   Development  Era   Founda'ons  of   Database  Storage   Techniques   The  Future  of  

Enterprise   Compu'ng   Advanced   Database   Storage   Tech-­‐ niques   In-­‐Memory   Database   Operators  

(111)

How does it all come together?

1. Mixed Workload combining OLTP and analytic-style queries

■  Column-Stores are best suited for analytic-style queries

■  In-memory databases enable fast tuple re-construction

■  In-memory column store allows aggregation on-the-fly

2. Sparse enterprise data

■  Lightweight compression schemes are optimal

■  Increases query execution

■  Improves feasibility of in-memory databases

3. Mostly read workload

■  Read-optimized stores provide best throughput

■  i.e. compressed in-memory column-store

■  Write-optimized store as delta partition

to handle data changes is sufficient

Changed Hardware Advances in data processing (software) Complex Enterprise Applications Our focus 111  

Figure

Updating...

References

Updating...

Related subjects :