
(1)

30,000 Foot View of Databases

[Figure: the big data problem: ingest data, organize data on disks, query your data.]

Indexing Big Data

Michael A. Bender

(2)

Goal: Index big data so that it can be queried quickly.


(4)

Setting of Talk: “Type II streaming”

Type II streaming: collect and store streams of data to answer queries. Data is stored on disks.

• Dealing with disk latency has similarities to dealing with network latency.

We’ll discuss indexing.

• We want to store data so that it can be queried efficiently.

• To discuss: relationship with graphs.


(5)

For on-disk data, one traditionally sees funny tradeoffs in the speeds of data ingestion, query speed, and freshness of data.

[Figure: ingest data, organize data on disks, query your data.]

(6)

Don’t Thrash: How to Cache Your Hash in Flash

[Figure: data ingestion feeds an indexing layer; a query processor answers queries.]

Funny tradeoff in ingestion, querying, freshness

• “Select queries were slow until I added an index onto the timestamp field... Adding the index really helped our reporting, BUT now the inserts are taking forever.”

‣ Comment on mysqlperformanceblog.com

• “I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the table but 10 days to build indexes on it.”

‣ MySQL bug #9544

• “They indexed their tables, and indexed them well, And lo, did the queries run quick!

But that wasn’t the last of their troubles, to tell– Their insertions, like treacle, ran thick.”



(12)

Tradeoffs come from different ways to organize data on disk

Like a librarian? (“Indexing”) Fast to find stuff. Slow to add stuff.

Like a teenager? Fast to add stuff. Slow to find stuff.

(13)

This talk: we don’t need tradeoffs

Write-optimized data structures:

• Faster indexing (10x-100x)

• Faster queries

• Fresh data

These structures efficiently scale to very big data sizes.

Examples: Fractal-tree® index, LSM-tree, Bɛ-tree.


(15)

Our algorithmic work appears in two commercial products: Tokutek’s high-performance MySQL and MongoDB.

[Diagram: Application → MySQL Database (SQL processing, query optimization) → TokuDB → libFT → File System → Disk/SSD; Application → Standard MongoDB (drivers, query language, and data model) → TokuMX → libFT → File System → Disk/SSD.]

The Fractal Tree engine (libFT) implements the persistent structures for storing data on disk.


(17)

Write-Optimized Data Structures

Would you like them with an algorithmic performance model?

(18)

An algorithmic performance model [Aggarwal+Vitter ’88]

How computation works:

• Data is transferred in blocks of size B between RAM (size M) and disk.

• The number of block transfers dominates the running time.

Goal: Minimize the number of block transfers.

• Performance bounds are parameterized by block size B, memory size M, data size N.
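As a small illustration of the model (the values of B and N here are hypothetical, not from the talk), counting only block transfers makes the payoff of block-friendly access patterns concrete: a scan of N records costs about N/B transfers, while N random point accesses cost about N.

```python
# Sketch of cost accounting in the DAM model [Aggarwal+Vitter '88]:
# only block transfers between RAM and disk are counted.
import math

B = 1024       # records per block (illustrative)
N = 10**9      # records in the data set (illustrative)

# A full scan reads ceil(N/B) consecutive blocks.
scan_transfers = math.ceil(N / B)

# N random point accesses each touch a different (uncached) block.
random_transfers = N

print(scan_transfers)    # 976563
print(random_transfers)  # 1000000000
```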


(21)

Memory and disk access times

Disks: ~6 milliseconds per access.

RAM: ~60 nanoseconds per access.

Analogy: disk = walking speed of the giant tortoise (0.3 mph); RAM = escape velocity from Earth (25,000 mph).

Analogy: disk = elevation of Sandia peak.
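The ratio behind these analogies is easy to check: with the figures above, a disk access is roughly 100,000 times slower than a RAM access.

```python
# Disk vs. RAM latency gap, using the figures on this slide.
disk_access = 6e-3    # ~6 milliseconds per disk access
ram_access = 60e-9    # ~60 nanoseconds per RAM access

print(round(disk_access / ram_access))  # 100000
```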


(23)

The traditional data structure for disks is the B-tree.

Adding a new datum to an N-element B-tree uses O(log_B N) block transfers in the worst case. (Even paying one block transfer is too expensive.)

(24)

Write-optimized data structures perform better

If B = 1024, then the insert speedup is B/log B ≈ 100. Hardware trends mean bigger B, bigger speedup. Less than 1 I/O per insert.

Insert/delete cost:

• B-tree: O(log_B N) = O(log N / log B)

• Some write-optimized structures: O((log N)/B)

Data structures: [O'Neil, Cheng, Gawlick, O'Neil 96], [Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook 00], [Arge 03], [Graefe 03], [Brodal, Fagerberg 03], [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson 07], [Brodal, Demaine, Fineman, Iacono, Langerman, Munro 10], [Spillane, Shetty, Zadok, Archak, Dixit 11].
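Plugging in B = 1024 reproduces the claimed speedup. A quick check (illustrative, with O-notation constants dropped):

```python
# Insert-cost comparison from the bounds above, constants dropped.
import math

B = 1024        # block size, in elements (illustrative)
N = 10**9       # data size (illustrative)

btree_insert = math.log2(N) / math.log2(B)  # O(log_B N) I/Os per insert
wods_insert = math.log2(N) / B              # O((log N)/B) I/Os per insert

speedup = btree_insert / wods_insert        # = B / log2(B)
print(round(speedup, 1))  # 102.4
```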

(25)

Optimal Search-Insert Tradeoff [Brodal, Fagerberg 03]

                                          insert                         point query
Optimal tradeoff (function of ɛ = 0...1): O((log_{1+B^ɛ} N) / B^(1-ɛ))  O(log_{1+B^ɛ} N)
B-tree (ɛ = 1):                           O(log_B N)                    O(log_B N)
ɛ = 1/2:                                  O((log_B N) / √B)             O(log_B N)
ɛ = 0:                                    O((log N) / B)                O(log N)

10x-100x faster inserts.
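The whole curve can be tabulated. This hypothetical script (ours, not the talk's) evaluates the tradeoff at the three highlighted points:

```python
# Evaluating the [Brodal, Fagerberg 03] tradeoff at eps = 1, 1/2, 0.
# Costs are in I/Os with constants dropped; B and N are illustrative.
import math

B, N = 1024, 10**9

def insert_cost(eps):
    # O( (log_{1+B^eps} N) / B^(1-eps) )
    return math.log(N, 1 + B**eps) / B**(1 - eps)

def query_cost(eps):
    # O( log_{1+B^eps} N )
    return math.log(N, 1 + B**eps)

for eps in (1.0, 0.5, 0.0):
    print(f"eps={eps}: insert~{insert_cost(eps):.3f}, query~{query_cost(eps):.3f}")
```

With these values, moving from ɛ = 1 (a B-tree) to ɛ = 1/2 roughly doubles the point-query cost while cutting the insert cost by a factor of about √B/2, which is where the 10x-100x figure comes from.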


(27)

Illustration of Optimal Tradeoff [Brodal, Fagerberg 03]

[Figure: point-query speed vs. insert speed. Logging is fast to insert but slow to query; a B-tree is the reverse; the optimal curve connects them.]

Insertions improve by 10x-100x with almost no loss of point-query performance.

(28)

Performance of write-optimized data structures

Write performance on large data:

• MongoDB: 16x faster

• MySQL: >100x faster


(30)

How to Build Write-Optimized Structures


(32)

A simple write-optimized structure

O(log N) queries and O((log N)/B) inserts: a balanced binary tree with buffers of size B.

Inserts + deletes:

• Send insert/delete messages down from the root and store them in buffers.

• When a buffer fills up, flush it one level down.

(34)

Don’t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure

O(log

N

) queries and O((log

N

)/

B

) inserts:

A balanced binary tree with bu

ers of size

B

Inserts + deletes:

Send insert/delete messages down from the root and store

them in bu

ers.

When a bu

er fills up, flush.

(35)

Don’t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure

O(log

N

) queries and O((log

N

)/

B

) inserts:

A balanced binary tree with bu

ers of size

B

Inserts + deletes:

Send insert/delete messages down from the root and store

them in bu

ers.

When a bu

er fills up, flush.

(36)

Don’t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure

O(log

N

) queries and O((log

N

)/

B

) inserts:

A balanced binary tree with bu

ers of size

B

Inserts + deletes:

Send insert/delete messages down from the root and store

them in bu

ers.

When a bu

er fills up, flush.

(37)

Don’t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure

O(log

N

) queries and O((log

N

)/

B

) inserts:

A balanced binary tree with bu

ers of size

B

Inserts + deletes:

Send insert/delete messages down from the root and store

them in bu

ers.

When a bu

er fills up, flush.

(38)

Analysis of writes

An insert/delete costs amortized O((log N)/B):

• A buffer flush costs O(1) I/Os and sends B elements down one level.

• So it costs O(1/B) to send an element down one level of the tree.

• There are O(log N) levels in the tree.

(41)

Analysis of point queries

To search: examine each buffer along a single root-to-leaf path. This costs O(log N).
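The buffered-tree scheme analyzed above can be sketched in code. This is a toy in-memory model (the class, names, and parameters are ours, not the talk's): it counts one simulated I/O per buffer flush and performs the root-to-leaf search just described.

```python
# Toy buffered binary tree: inserts queue as messages in per-node
# buffers of size B and are flushed one level down when a buffer
# fills; each flush is counted as one simulated I/O.

class BufferedTree:
    def __init__(self, height, B):
        self.height = height      # leaves cover keys 0 .. 2**height - 1
        self.B = B
        self.buffers = {}         # (depth, index) -> pending keys
        self.leaves = {}          # leaf index -> stored keys
        self.io_count = 0

    def insert(self, key):
        self._push(0, 0, [key])

    def _push(self, depth, index, keys):
        if depth == self.height:              # reached a leaf
            self.leaves.setdefault(index, []).extend(keys)
            return
        buf = self.buffers.setdefault((depth, index), [])
        buf.extend(keys)
        if len(buf) >= self.B:                # buffer full: flush
            self.io_count += 1
            span = 2 ** (self.height - depth) # keys under this node
            left = [k for k in buf if k % span < span // 2]
            right = [k for k in buf if k % span >= span // 2]
            buf.clear()
            self._push(depth + 1, 2 * index, left)
            self._push(depth + 1, 2 * index + 1, right)

    def search(self, key):
        # Examine each buffer on one root-to-leaf path: O(log N) nodes.
        index = 0
        for depth in range(self.height):
            if key in self.buffers.get((depth, index), []):
                return True
            span = 2 ** (self.height - depth)
            index = 2 * index + (1 if key % span >= span // 2 else 0)
        return key in self.leaves.get(index, [])

tree = BufferedTree(height=16, B=64)
for k in range(10_000):
    tree.insert(k % 2 ** 16)
print(tree.search(1234))        # True
print(tree.io_count / 10_000)   # amortized flushes per insert
```

Because each flush moves at least B pending messages one level down, the flush count per insert stays well below one, matching the O((log N)/B) amortized bound.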

(42)

Obtaining optimal point queries + very fast inserts

Use fanout √B, with a buffer of size B at each node.

Point queries cost O(log_√B N) = O(log_B N).

• This is the tree height.

Inserts cost O((log_B N)/√B).

• Each flush costs O(1) I/Os and flushes √B elements.
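A quick sanity check on the height claim (illustrative values, constants dropped): halving the exponent of the fanout only doubles the height, so point queries stay O(log_B N).

```python
# Height with fanout sqrt(B) vs. fanout B, per the slide above.
import math

B, N = 1024, 10**9
root_b = math.isqrt(B)                 # sqrt(B) = 32

height_sqrtB = math.log(N, root_b)     # log_{sqrt(B)} N
height_B = math.log(N, B)              # log_B N
print(round(height_sqrtB / height_B))  # 2: log_{sqrt(B)} N = 2 log_B N

insert_cost = height_sqrtB / root_b    # each flush moves sqrt(B) elements
print(round(insert_cost, 3))
```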

(43)

Powerful and Explainable

Write-optimized data structures are very powerful. They are also not hard to teach in a standard algorithms course.

(44)

What the world looks like

Insert/point-query asymmetry:

• Inserts can be fast: >50K high-entropy writes/sec/disk.

• Point queries are necessarily slow: <200 high-entropy reads/sec/disk.

We are used to reads and writes having about the same cost, but writing is easier than reading. Reading is hard. Writing is easier.


(46)

The right read optimization is write optimization

The right index makes queries run fast. Write-optimized structures maintain indexes efficiently.

Fast writing is a currency we use to accelerate queries. Better indexing means faster queries.

[Figure: data ingestion feeds an indexing layer; a query processor answers queries.]

(47)

The right read optimization is write optimization

If we can insert faster, we can afford to maintain more indexes (i.e., organize the data in more ways). Adding more indexes leads to faster queries. Index maintenance has been notoriously slow.

[Figure: query I/O load on a production server, MongoDB vs. TokuMX.]


(53)

Summary Slide

We don’t need to trade off ingestion speed, query speed, and data freshness. The right read optimization is write optimization.

How do write-optimized data structures help us with graph queries?

Can we use write-optimization on large Sandia machines? There are similar challenges with network and I/O latency.

