Turbocharging the DBMS Buffer Pool using an SSD

(1)

Microsoft Jim Gray Systems Lab & University of Wisconsin–Madison

Turbocharging the DBMS Buffer Pool using an SSD

Jaeyoung Do, Donghui Zhang, Jignesh M. Patel, David J. DeWitt, Jeffrey F. Naughton, Alan Halverson

(2)

Memory Hierarchy

[Diagram: the memory hierarchy — Cache, DRAM, Disk (HDD).]

For over three decades, this hierarchy has been the norm. Now, a disruptive change: where does the SSD fit?

Fast random I/Os, but expensive. SSD wisdom:

- Store hot data.

- Store data with random-I/O access.

(3)


Take Home Message

• Use an SSD to extend the Buffer Pool.

• Implemented in Microsoft SQL Server 2008R2.

• Evaluated with TPC-C, E, and H.

• Up to 9X speedup.

(4)

Prior Art

• [Holloway09] A. L. Holloway. Chapter 4: Extending the Buffer Pool with a Solid State Disk. In Adapting Database Storage for New Hardware, UW-Madison Ph.D. thesis, 2009.

• [KV09] Koltsidas and Viglas. The Case for Flash-Aware Multi-Level Caching. University of Edinburgh Technical Report, 2009.

• [KVSZ10] B. M. Khessib, K. Vaid, S. Sankar, and C. Zhang. Using Solid State Drives as a Mid-Tier Cache in Enterprise Database OLTP Applications. TPCTC’10.

• [CMB+10] M. Canim, G. A. Mihaila, B. Bhattacharjee, K. A. Ross, and C. A. Lang. SSD Bufferpool Extensions for Database Systems. In VLDB’10.

State-of-the-art: [CMB+10].

(5)


Research Issues

• Page flow

• SSD admission policy

• SSD replacement policy

• Implications for checkpointing

(6)

Implemented Designs

• Temperature-Aware Caching (TAC)

• Dual-Write (DW)

• Lazy-Cleaning (LC)

(7)


Page Flow

BP Operations: read → evict → read → modify → evict

TAC writes a clean page to the SSD right after reading it from the disk.

[Diagram: a clean page (C) flowing among the buffer pool, SSD BP, and disk, shown side by side for TAC, Dual-Write, and Lazy-Cleaning.]

(8)

Page Flow

BP Operations: read → evict → read → modify → evict

DW and LC write a clean page to the SSD upon its eviction from the BP.

[Diagram: a clean page (C) flowing among the buffer pool, SSD BP, and disk, shown side by side for TAC, Dual-Write, and Lazy-Cleaning.]

(9)


Page Flow

BP Operations: read → evict → read → modify → evict

Read from the SSD: the same for all three designs.

[Diagram: a clean page (C) read from the SSD BP into the buffer pool, shown side by side for TAC, Dual-Write, and Lazy-Cleaning.]

(10)

Page Flow

BP Operations: read → evict → read → modify → evict

Upon dirtying a page, TAC does not reclaim the SSD frame.

[Diagram: page states — clean (C), dirty (D), and invalidated (I) — in the buffer pool, SSD BP, and disk, shown side by side for TAC, Dual-Write, and Lazy-Cleaning.]

(11)


Page Flow

BP Operations: read → evict → read → modify → evict

Upon evicting a dirty page:

- TAC and DW are write-through;

- LC is write-back.

[Diagram: dirty (D), clean (C), and invalidated (I) pages in the buffer pool, SSD BP, and disk for TAC, Dual-Write, and Lazy-Cleaning; the LC panel shows a "lazy cleaning" arrow from the SSD BP to the disk.]
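The page-flow rules on the last few slides can be condensed into a small sketch. This is illustrative Python, not the SQL Server 2008R2 implementation; every class and method name is invented for the example.

```python
# Illustrative page-flow sketch for the three designs. Invented names;
# not the actual SQL Server implementation.

class Design:
    """A disk, an SSD BP, and the read/eviction rules of one design."""
    def __init__(self):
        self.ssd = {}            # page_id -> "clean" | "dirty"
        self.disk_writes = []    # pages written to disk, in order

    def read(self, page_id):
        if page_id in self.ssd:  # reads are served from the SSD if possible
            return "ssd"
        self.on_read_from_disk(page_id)
        return "disk"

    def on_read_from_disk(self, page_id):
        pass                     # default: nothing extra at read time

class TAC(Design):
    def on_read_from_disk(self, page_id):
        # TAC writes a clean copy to the SSD right after the disk read.
        self.ssd[page_id] = "clean"

    def evict(self, page_id, dirty):
        if dirty:                # write-through on dirty eviction
            self.disk_writes.append(page_id)

class DualWrite(Design):
    def evict(self, page_id, dirty):
        # DW writes the page to the SSD upon eviction; a dirty page also
        # goes to disk at the same time (hence "dual write").
        self.ssd[page_id] = "clean"
        if dirty:
            self.disk_writes.append(page_id)

class LazyCleaning(Design):
    def evict(self, page_id, dirty):
        # LC is write-back: dirty pages stay on the SSD until cleaned.
        self.ssd[page_id] = "dirty" if dirty else "clean"

    def lazy_clean(self):
        for pid, state in self.ssd.items():
            if state == "dirty":
                self.disk_writes.append(pid)
                self.ssd[pid] = "clean"
```

After `read(1)` followed by `evict(1, dirty=True)`: TAC and DW have written page 1 through to disk, while LC holds it dirty on the SSD until `lazy_clean()` runs.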

(12)

SSD Admission/Replacement Policies

• TAC

– Admission: if warmer than the coldest SSD page.

– Replacement: the coldest page.

• DW/LC

– Admission: if loaded from disk using a random I/O.

– Replacement: LRU-2.
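The DW/LC policies above can be sketched as follows: admit only pages loaded by a random I/O, and on replacement evict the page whose second-most-recent reference is oldest (LRU-2), with once-referenced pages evicted first. Names are invented for this sketch; the real SSD manager tracks far more state.

```python
# Sketch of the DW/LC SSD admission and replacement policies. LRU-2 evicts
# the page with the oldest second-most-recent reference; pages referenced
# only once are evicted before any twice-referenced page. Invented names.

class LRU2SSDCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.history = {}        # page_id -> last one or two reference times
        self.clock = 0

    def reference(self, page_id):
        self.clock += 1
        times = self.history.setdefault(page_id, [])
        times.append(self.clock)
        del times[:-2]           # keep only the two most recent times

    def penultimate(self, page_id):
        times = self.history[page_id]
        # A once-referenced page gets -inf, so it is the first victim.
        return times[0] if len(times) == 2 else float("-inf")

    def admit(self, page_id, random_io):
        """Admission: only pages loaded from disk by a random I/O.
        Returns the evicted victim page, if any."""
        if not random_io:
            return None
        victim = None
        if len(self.history) >= self.capacity:
            victim = min(self.history, key=self.penultimate)
            del self.history[victim]
        self.reference(page_id)
        return victim
```

`admit(p, random_io=False)` rejects sequentially loaded pages outright, matching the admission rule on this slide.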

(13)


Implications for Checkpointing

• TAC/DW

– No change, because every page in the SSD is clean.

• LC

– Needs changes, to handle the dirty pages in the SSD.

(14)

Experimental Setup

Configuration:

– Machine: HP ProLiant DL180 G6 server

– Processor: Intel® Xeon® L5520 2.27GHz (dual quad-core)

– Memory: 20GB

– Disks: 8× SATA 7200RPM 1TB

– SSD: 140GB Fusion-io ioDrive 160 SLC

– OS: Windows Server 2008 R2

(15)


TPC-C

[Chart: speedup relative to noSSD for TAC, DW, and LC at 100GB, 200GB, and 400GB database sizes; y-axis 0–10.]

Q: Why is LC so good?

A: Because TPC-C is update-intensive. In LC, dirty pages in the SSD are frequently re-referenced: 83% of the SSD references are to dirty SSD pages.

LC is 9X better than noSSD, or 5X better than DW/TAC.

(16)

TPC-E

[Chart: speedup relative to noSSD for TAC, DW, and LC at 100GB, 200GB, and 400GB database sizes; y-axis 0–10.]

Q: Why do the three designs have similar speedups?

A: Because TPC-E is read-intensive.

Q: Why does the highest speedup occur for the 200GB database?

A: For 400GB, a smaller fraction of the data is cached in the SSD; for 100GB, a larger fraction of the data is cached in the memory BP.

(17)


TPC-H

[Chart: speedup relative to noSSD for TAC, DW, and LC at 45GB and 160GB database sizes; y-axis 0–10.]

Q: Why are the speedups smaller than in TPC-C or TPC-E?

A: Because most I/Os are sequential.

- For random I/Os, the Fusion-io SSD is 10X faster than the disks;

- For sequential I/Os, the 8 disks are 1.4X faster than the SSD.
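The small TPC-H speedups follow from an Amdahl-style estimate built on the two ratios above (SSD 10X faster on random I/Os; the 8-disk array 1.4X faster on sequential I/Os). The fractions below are illustrative choices, not measured figures:

```python
# Amdahl-style bound on SSD speedup, using the slide's ratios: random I/Os
# run 10X faster on the SSD, sequential I/Os run 1.4X slower there (the
# 8-disk array wins sequentially). Assumes, pessimistically, that every
# I/O is redirected to the SSD. Fractions are illustrative only.

def ssd_speedup(random_fraction):
    """random_fraction: share of total disk I/O *time* spent on random I/Os."""
    seq_fraction = 1.0 - random_fraction
    ssd_time = random_fraction / 10.0 + seq_fraction * 1.4
    return 1.0 / ssd_time

print(round(ssd_speedup(0.2), 2))  # mostly sequential (TPC-H-like): 0.88
print(round(ssd_speedup(0.9), 2))  # mostly random (OLTP-like): 4.35
```

This is also why the admission policy caches only pages loaded by random I/Os: sequential I/Os are better left on the disks.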

(18)

Disks are the Bottleneck

As long as the disks are the bottleneck, the SSD has bandwidth to spare.

[Charts: I/O bandwidth (MB/s, 0–50) over time (hours) for reads and writes, to the 8 disks and to the SSD, for TPC-E 200GB. The SSD traffic levels off once its capacity is reached, at about half of its bandwidth capacity.]

(19)


Long Ramp-up Time

If restarts are frequent, restarting from the SSD may reduce ramp-up time.

[Chart: throughput (tpsE, transactions/sec, 0–120) over time (hours, 0–10) for TAC, DW, and LC, TPC-E 200GB.]

Q: Why does ramp-up take 10 hours?

A: Because the SSD is being filled slowly, gated by the random-read speed of the disks.
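A rough estimate shows why filling the SSD takes hours when it is gated by the disks' random reads. Only the SSD size comes from the setup slide; the page size and per-disk IOPS are assumed typical values, not figures from the paper:

```python
# Rough ramp-up estimate: filling the SSD BP is gated by random reads from
# the disks. SSD size is from the setup slide; page size and per-disk IOPS
# are assumed typical values (8KB pages, ~120 IOPS for a 7200RPM disk).

ssd_bytes     = 140 * 10**9       # 140GB Fusion-io ioDrive
page_bytes    = 8 * 1024          # assumed 8KB pages
disks         = 8
iops_per_disk = 120               # assumed random-read rate per disk

fill_rate_bps = disks * iops_per_disk * page_bytes   # bytes/second
hours = ssd_bytes / fill_rate_bps / 3600
print(f"{hours:.1f} hours to fill the SSD")          # ~4.9 hours
```

With these assumed rates the estimate lands at a few hours, the same order of magnitude as the observed 10-hour ramp-up.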

(20)

Conclusions

• SSD buffer pool extension is a good idea.

– We observed a 9X speedup (OLTP) and a 3X speedup (DSS).

• The choice of design depends on the update frequency.

– For update-intensive (TPC-C) workloads: LC wins.

– For read-intensive (TPC-E or H) workloads: DW/LC/TAC have similar performance.

• Mid-range SSDs may be good enough.

– With 8 disks, only half of FusionIO’s bandwidth is used.

• Caution: rampup time may be long.

– If restarts are frequent, the DBMS should restart from the SSD.

(21)


Backup Slides

(22)

Architectural Change

[Diagram: before — the Buffer Manager (with its BP) issues I/O through the I/O Manager to the disk. After — an SSD Manager (with the SSD BP) sits alongside the Buffer Manager, and the I/O Manager now talks to both the disk and the SSD.]
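With the SSD Manager layered in, a page read checks the memory BP, then the SSD BP, and only then goes to disk. A minimal sketch of that read path (invented names, not SQL Server code):

```python
# Minimal read path with an SSD Manager between the Buffer Manager and the
# I/O Manager. All names are invented for this sketch.

class SSDManager:
    def __init__(self):
        self.ssd_bp = {}                    # page_id -> page contents

    def try_read(self, page_id):
        return self.ssd_bp.get(page_id)     # None on SSD miss

class BufferManager:
    def __init__(self, io_read):
        self.bp = {}                        # main-memory buffer pool
        self.ssd = SSDManager()
        self.io_read = io_read              # I/O Manager disk-read callback

    def get_page(self, page_id):
        if page_id in self.bp:              # 1. memory BP hit
            return self.bp[page_id], "bp"
        page = self.ssd.try_read(page_id)   # 2. SSD BP hit
        if page is None:
            page = self.io_read(page_id)    # 3. miss: read from disk
            source = "disk"
        else:
            source = "ssd"
        self.bp[page_id] = page
        return page, source
```

`get_page` returns the page plus where it was found, so a hit in the SSD BP avoids the disk entirely.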

(23)


Data Structures

(24)

Further Issues

• Aggressive filling

• SSD throttle control

• Multi-page I/O request

• Asynchronous I/O handling

• SSD partitioning

• Gather write
