• No results found

Designing Access Methods:

N/A
N/A
Protected

Academic year: 2021

Share "Designing Access Methods:"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Designing  Access  Methods:  

The  RUM  Conjecture

Manos  Athanassoulis1,      Michael  S.  Kester1,      Lukas  M.  Maas1,  

Radu Stoica2,      Stratos Idreos1,      Anastasia  Ailamaki3,      Mark  Callaghan4 1Harvard  University              2IBM  Research              3EPFL              4Facebook  Inc.

(2)

memory  wall

storage  wall

(3)

3

(4)
(5)

Memory  Wall

5

(6)

Storage  Wall

HDD ü capacity ü sequential  access × random  access × latency  plateaus

SSD  (Single  Level  Cell)

ü random  reads ü low  latency × capacity × endurance

× read/write  asymmetry

SSD  (Multi  Level  Cell)

ü capacity

× endurance  (worse)  

HDD  (Shingled  Magnetic  Rec.)

ü capacity

(7)

every  byte  we  read  counts

every  byte  we  allocate counts

every  byte  we  read  counts every  byte  we  update  counts every  byte  we  allocate counts

7

(8)
(9)

we  build  

access  methods

(10)

...  since  a  long  time  ago!

Can  we  stop  worrying  about

building  access  methods?

(11)

why  do  we  keep  building?

Tree Index Proj ec tio n Ha sh  In de x

every  access  method …  is  optimizing  for  the  tradeoff  between

Reads

Updates

Memory

(12)

The  RUM  Conjecture

every  access  method  has  a  (quantifiable) • read  overhead

update  overhead • memory  overhead

the  three  of  which  form  a  competing  triangle

Read

Update Memory

we  can  optimize  for  two  of  

the  overheads  at  the  

expense  of  the  third

max min

min min

(13)

The  RUM  Conjecture

every  access  method  has  a  (quantifiable) • read  overhead

update  overhead • memory  overhead

the  three  of  which  form  a  competing  triangle

Read

Update Memory

we  can  optimize  for  two  of  

the  overheads  at  the  

expense  of  the  third

13 read-­‐optimized max min min min

(14)

The  RUM  Conjecture

every  access  method  has  a  (quantifiable) • read  overhead

update  overhead • memory  overhead

the  three  of  which  form  a  competing  triangle

Read

Update Memory

we  can  optimize  for  two  of  

the  overheads  at  the  

expense  of  the  third

update  &  memory optimized

max min

min min

(15)

The  RUM  Conjecture

every  access  method  has  a  (quantifiable) • read  overhead

update  overhead • memory  overhead

the  three  of  which  form  a  competing  triangle

Read

Update Memory

we  can  optimize  for  two  of  

the  overheads  at  the  

expense  of  the  third

15 memory-­‐optimized min min min max

(16)

what  would  be  an  

optimal  read

behavior?

X data

read(X)

oracle

read(x) accesses  only  the  bytes  of  object  X

R

(17)

what  would  be  an  

optimal  read  

behavior?

X data

read(X)

oracle

read(x) accesses  only  the  bytes  of  object  X how  free  can  an  oracle  be?

17

R

U M

(18)

what  would  be  an  

optimal  read  

behavior?

1 3 4 5 8

unique,  positive  integers

(19)

what  would  be  an  

optimal  read  

behavior?

19

1 3 4 5 8

(20)

what  would  be  an  

optimal  read  

behavior?

1 2 4 5 8

insert  2

delete  8

minimum  read  overhead

bound  update  overhead

update  4  -­‐>  3

(21)

what  would  be  an  

optimal  read  

behavior?

21

1 2 4 5 8 17

insert  17

minimum  read  overhead

bound  update  overhead

unbounded  memory  overhead

3

(22)

X

what  would  be  an  

optimal  update  

behavior?

data

always  append,  and  invalidate  on  update

R ? ? A B X X Always  scan

(23)

X

what  would  be  an  

optimal  update  

behavior?

data

always  append,  and  invalidate  on  update

23 R U M ? ? A B X X Always  scan

update  (X)  changes  the  minimal  number  of  bytes

C D

(24)

what  would  be  an  

optimal  memory  

overhead?

X data scan  and  find

scan  and  in-­‐place  updates

R

?

?

(25)

what  would  be  an  

optimal  memory  

overhead?

X data scan  and  find

scan  and  in-­‐place  updates

25

R

U M

?

?

no  metadata  whatsoever,  would  result  in  the  smallest  memory  footprint

do  we  need  to  reach  the  optimal(s)?

No!

(26)

a  tight  column:

reads have  to  scan

no  memory  overhead

in-­‐place  updates and  efficient  inserts

RUM  Conjecture:  an  example

8 2 1 7 6 9 3

R

(27)

a  tight  column:

reads have  to  scan

no  memory  overhead

in-­‐place  updates and  efficient  inserts

RUM  Conjecture:  an  example

8 2 1 7 6 9 3 a  tight  sorted  column:

very  efficient  reads (logarithmic  search)no  memory overhead

updates  &  inserts  reorganization

1 2 3 6 7 8 9

27

R

U M U M

(28)

a  tight  column:

reads have  to  scan

no  memory  overhead

in-­‐place  updates and  efficient  inserts

adding  clustering:

efficient  reads

small  memory overhead

updates &  inserts:  reorganization

RUM  Conjecture:  an  example

8 2 1 7 6 9 3 a  tight  sorted  column:

very  efficient  reads (logarithmic  search)no  memory overhead

updates  &  inserts  reorganization

1 2 3 6 7 8 9 R U M U M R 2 1 3 7 6 9 8 U M R

(29)

a  tight  column:

reads have  to  scan

no  memory  overhead

in-­‐place  updates and  efficient  inserts

adding  clustering:

efficient  reads

small  memory overhead

updates &  inserts:  reorganization

RUM  Conjecture:  an  example

8 2 1 7 6 9 3

2 1 3 7 6 9 8

a  tight  sorted  column:

very  efficient  reads (logarithmic  search)no  memory overhead

updates  &  inserts  reorganization

…  and  ghost  values:

efficient  reads

small  memory overhead  (but  increased)

updates:  reorganization  (but  inserts  for  free)

1 2 3 6 7 8 9 29 R U M U M R U M R 2 1 3 7 6 9 8 U M R

(30)

RUM-­‐aware  access  methods

Can  we

…  add  flexibility to  existing  access  methods? ...  have  arbitrary  RUM  balance?

(31)

some  active  research  directions

üclassify  existing  access  methods  [Tutorial  -­‐ SIGMOD2016]

üadd  more  metadata,  to  optimize  for  updates/reads o Bitmap  indexing  [SIGMOD2016] and  LSM-­‐Trees  in  the  works

üpre-­‐partition  data:  minimize  both  read  and  update  cost

üshape-­‐shifting  index:  match  workload  without  offline  optimizations

(32)

Thanks!

http://daslab.seas.harvard.edu/rum/

daslab.seas.harvard.edu

References

Related documents

a chronic condition or pre- palliative state Observations may be delayed due to night-time ward management reasons Level of compliance with scheduled observation is used as

The IDMEF standard relies on the communication protocol for integrity and authentication. There is a long and diverse discussion within the groups concerning the IODEF standard, in

A limited number of tickets are available at $65 each for the pre-show talk with James Dunn and reception, including musical performances, hors d’oeuvres, preferred seating at

[r]

At the Issues and options consultation stage of the City Centre Action Plan the Council notified the organisations and individuals within the following 3 groups which it considers

Which of the following does not belong to type of competency area in cloud business analytics?. Business intelligence and

A mathematically equivalent formula (McCulloch 1996a) has already been used by McCulloch (1985, 1987) and Hales (1997) to evaluate options on bonds and foreign exchange rates,

Results from experiments using externally generated packets will show the actual performance of the CBE-based firewall.. The network topology consisted of the PS3, a 1 Gbps switch, and