• No results found

LightKV: A Cross Media Key Value Store with Persistent Memory to Cut Long Tail Latency

N/A
N/A
Protected

Academic year: 2021

Share "LightKV: A Cross Media Key Value Store with Persistent Memory to Cut Long Tail Latency"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

LightKV: A Cross Media Key Value Store with Persistent

Memory to Cut Long Tail Latency

Shukai Han,

Dejun Jiang, Jin Xiong

Institute of Computing Technology, Chinese Academy of Sciences University of Chinese Academy of Sciences

(2)

Outline

ü

Background & Motivation

• Design

• Evaluation

• Conclusion

(3)

Key-Value Store

• Key-Value (KV) stores are widely deployed in data centers.

• KV stores are latency-critical applications.

(4)

Log-Structured Merge Tree (LSM-Tree)

Immutable MemTable MemTable

LOG

SSTable Level0 Level1 Level2 Level k-1 Level k

……

……

……

……

3.Flush

KV Pair

1.WAL

2.Write

KV data

(sorted)

Metedata

(bloom filter, index, etc)

SSTable Structure

4.Compaction

4

(5)

Limitations of Persistent KV Store

Inefficient indexing for cross-media

Immutable MemTable MemTable

LOG

SSTable Level0 Level1 Level2 Level k-1 Level k

……

……

……

……

4.Compaction

3.Flush

KV Pair

1.WAL

2.Write

4

4

ü On one hand, LSM-Tree adopts

skiplist to index in-memory data.

ü On the other hand, LSM-Tree

builds manifest files to record key range of each on-disk SSTable.

Read

R

ea

(6)

Limitations of Persistent KV Store

High write amplification

Immutable MemTable MemTable

LOG

SSTable Level0 Level1 Level2 Level k-1 Level k

……

……

……

……

4.Compaction

3.Flush

KV Pair

1.WAL

2.Write

4

4

Write log and data transfer between levels increase write amplification.

Write (read) amplification is defined as the ratio between the amount of data written to (read from) the underlying storage device and the amount of data requested by the user.

(7)

8.80 10.47 11.42 12.29 12.90 13.28 13.57 14.01 14.41 14.64 6.84 7.78 8.69 9.38 10.14 10.60 10.75 10.90 11.23 11.68 5.16 6.30 7.80 9.22 9.43 9.72 10.16 10.56 10.54 10.95 0 2 4 6 8 10 12 14 16 18 20 10G 20G 30G 40G 50G 60G 70G 80G 90G 100G w ri te a m pl ifi ca tio n LevelDB HyperLevelDB RocksDB

Limitations of Persistent KV Store

High write amplification

10X+

The write amplification of LSM-Tree can reach 10x, and with the continuous increase of data amount, the write amplification

(8)

Limitations of Persistent KV Store

Heavy tailed read latency under mixed workload

ü We first warm up LevelDB with

100 GB data.

ü We measure the average latency

as well as 99th and 99.9th

percentile read latencies every 10 seconds.

The maximum 99th and 99.9th percentile read latencies can reach

13 and 28 times than the average read latency.

t1: Run a mixed workload of randomly reading 50 GB existing data and

randomly inserting another 50 GB data.

(9)

Limitations of Persistent KV Store

Heavy tailed read latency under mixed workload

t2: Run read-only workload.

After the compaction finishes, the read tail latency is significantly reduced. Reducing write amplification is not only helpful for reducing the total write

amount of the disk, increasing system throughput, but also helping to reduce the read tail latency under mixed read and write loads.

(10)

1.NVM can persist data after power off

Non-Volatile Memory

• The first PM product, Intel Optane DC Persistent Memory (PM), was

announced [19] in April 2019.

2.The write latency of Optane DC PM is close to DRAM, while its read

latency is 3 to 4 times that of DRAM.

3.The write and read bandwidths of Optane DC PM are around 2GB/s and 6.5GB/s, which is about 1/8 and 1/4 that of DRAM separately.

• Non-Volatile Memories (NVMs) provide low latency and byte addressable

features.

(11)

Outline

Background & Motivation

ü

Design

• Evaluation

• Conclusion

(12)

LightKV System Overview

1.Radix Hash Tree

(RH-Tree)

index

Segment

……

flush

2.Persistent Write Buffer (PWB) SSTable ……

Partition1 Partition2 Partition N

……

……

compaction

3.Main Data Store

Persistent

Memory

DRAM

(13)

Challenges

• How does Radix Hash tree index KV items across media?

• How does Radix Hash tree balance performance and data

growth?

• How does Radix Hash tree conduct well-controlled data

compaction to reduce write amplification?

(14)

Radix Hash Tree Structure

Prefix Search Tree

HashTable HashTable

……

HashTable

[0,32] [128,255] [0,64] [96,255]

[64,255]

Prefix Search Tree

SSTable or Semgnet

K

V KV KV KV pointer

HashTable

signature cache kv offset

HashTable Bucket (64B)

(15)

RH-Tree split

IN1 LN1

[0,3]

……

IN1 LN1

[0,127]

……

IN1 LN1 [0,63] LN2 [64,127]

……

normal split

IN1

[0,3]

……

IN2 [0,127] [128,255]

level split

(16)

Linked hash leaf node

LN1

Segment1

index

LN1’

persist

SSTable

flush

index

LN2

Segment2

link

link

LN2

LN1’ Segment2 SSTable

index

index

stage1

LN1

index

Segment1

stage2

stage3

DRAM

PM

SSD

(17)

RH-Tree placement

Prefix Search Tree

Segment

……

Persistent Write Buffer SSTable …… …… ……

Radix Hashing Tree

……

……

DRAM

Persistent

Memory

SSD

……

……

……

……

……

(18)

Partition-based data compaction

S5 (1)

S5 (1)

S10 (1)

compaction

t1

t2

t3

t4

t5

t6

1

log

k

N

S21 (2)

Compaction Size (CS) is 4

S1 (0)

S2 (0)

S3 (0)

S4 (0)

S6 (0)

S7 (0)

S8 (0)

S9 (0)

S5 (1)

S10 (1)

S15 (1)

S110)

S12(0)

S13(0)

S14 (0)

S16 (0)

S17 (0)

S18(0)

S19(0)

S5 (1)

S10 (1)

S15 (1)

S20 (1)

(19)

Recovery

Prefix Search Tree

Segment

……

……

……

Persistent Write Buffer SSTable …… …… ……

Radix Hashing Tree

DRAM

Persistent

Memory

SSD

(20)

Outline

Background & Motivation

• Design

ü

Evaluation

(21)

Experiment Setup

System and hardware configuration

– Two Intel Xeon Gold 5215 CPU (2.5GHZ), 64GB memory and one Intel DC

P3700 SSD of 400GB.

– CentOS Linux release 7.6.1810 with 4.18.8 kernel and use ext4 file system.

Compared systems

– LevelDB

RocksDB

– NoveLSM

SLM-DB

Workloads

– db_bench as microbenchmark

Workload YCSB Workload Description

A 50% reads and 50% updates B 95% reads and 5% updates C 100% reads

D 95% reads for latest keys and 5% inserts E 95% scan and 5% inserts

(22)

Reducing write amplification

LightKV are reduced by

7.1x, 5.1x, 2.9x and

2.3x

compared to that of LevelDB, RocksDB, NoveLSM, and SLM-DB respectively.

When the total amount of written data increases, the write amplification of LightKV remains stable (e.g. from 1.6 to 1.8 when the data amount increases from 50 GB to 100 GB).

(23)

Basic Operations

Thanks to the global index and partition compaction, LightKV can effectively

(24)

Basic Operations

The performance of LightKV in short range query is low. This is because it needs to search all SSTables in one or more partitions when performing a short range query.

(25)

Tail latency under read-write workload

Thanks to lower write amplification and global indexing, LightKV provides a lower

and stable read and write tail latency.

lower and stable

(26)

Results with YCSB

(27)

Outline

Background & Motivation

• Design

• Evaluation

(28)

Conclusion

• LSM-Tree based on traditional storage devices faces problems such as read-write amplification

• At the same time, the emergence of non-volatile memory provides opportunities and challenges for building efficient key-value storage systems

• In this paper, we propose LightKV a cross media key-value store with persistent memory. LightKV effectively reduces the read-write amplification of the system by establishing a RH-Tree and adopting a column-based partition compaction.

• The experiment results show that LightKV reduces write amplification by up to 8.1x and improves read performance by up to 9.2x. It also reduces read tail latency by up to 18.8x under read-write mixed workload.

(29)

THANK YOU !

Q & A

(30)

Sensitivity analysis

As the maximum number of partitions increases, the read and write performance

of LightKV increases, but the NVM capacity consumption also increase.

(31)

Sensitivity analysis

As the compaction size increases, the merging frequency is reduced, and the write amplification is reduced, which is beneficial to improve the write performance, but is

References

Related documents

As shown in this study, loyalty to the organization resulting from merger or acquisition has different intensity level for employees in different hierarchical

This spell brings into being a shimmering, swordlike plane of force. The sword strikes at any opponent within its range, as the caster de- sires, at the end of each round, including

Proudhon would have agreed that key social relations of power are left unmolested by republican constitutionalism – indeed the latter is the enforcement of the former – but his

The INTEGRAL technology consists of software and a hardware shell with applications coupled to the grid to ICT networks and business systems in a tighter or looser

Greater noida authority and my complaint infrastructure limited are properties in cases against habitech panch tatva is to use cookies to entertain the claim made by greater noida..

public class XorSocket extends Socket { private final byte pattern;}. private InputStream in

A critical theory is ad­ dressed to the members of this particular social group in the sense that it describes their epistemic principles and their ideal of the 'good life' and

If the client’s transaction does not read any value that was writ- ten by the winning transactions for log position k, the client begins Step 1 of the commit protocol for log position