Counting Problems in Flash Storage Design

Bongki Moon

Department of Computer Science

University of Arizona

Tucson, AZ 85721, U.S.A.

Flash Talk

NVRAMOS’09, Jeju, Korea, October 2009

Counting Problems

• Counting is

 One of the fundamental problems, found in most applications

 The study of approximate counting has a long history in computer science

• Goal of this talk is

 Bring attention to the issues, provide a new angle to the design of flash storage systems

Counting Problems in Flash

• Wear Leveling

 Keep track of block erase counts

 For a 128 GB SSD with 1 million blocks

• A (sub)word for each block requires 2~4 MB RAM

• Garbage Collection

 Keep track of # of valid pages in blocks

• For example, FMAX and FAST

 For a 128 GB SSD with 1 million blocks

• A byte for each block requires 1 MB RAM
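The RAM figures above follow from simple arithmetic. A minimal sketch, assuming binary units (1 million = 2^20) and a (sub)word of 2 to 4 bytes, neither of which the slide states explicitly; the function name is hypothetical:

```python
def counter_table_bytes(num_entries: int, bytes_per_counter: int) -> int:
    """RAM needed to keep one fixed-width counter per entry."""
    return num_entries * bytes_per_counter


MIB = 1 << 20
GIB = 1 << 30

num_blocks = 1 * MIB                      # "1 million blocks", assumed to be 2**20
block_size = 128 * GIB // num_blocks      # block size implied by a 128 GB device

erase_table_small = counter_table_bytes(num_blocks, 2)  # 2-byte subword per block
erase_table_large = counter_table_bytes(num_blocks, 4)  # 4-byte word per block
valid_page_table = counter_table_bytes(num_blocks, 1)   # 1 byte per block

print(block_size // 1024, "KiB per block")
print(erase_table_small // MIB, "to", erase_table_large // MIB, "MiB for erase counts")
print(valid_page_table // MIB, "MiB for valid-page counts")
```

The same function applies to any per-block or per-page counter table; only the entry count and counter width change.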

Counting Problems in Flash

• Hot / Cold Data Separation

 Keep track of update counts for all pages(?)

 For a 128 GB SSD with 64 million pages

• A (sub)word for each page requires 128~256 MB RAM

• Counting Bloom Filter [Hsieh TOST’06]

 Approximate update frequencies by storing relative frequencies in a fixed number of fixed-length bit-vectors

 Aging (divide-by-2) occurs when any counter overflows

 Accuracy will be degraded seriously in the presence of a few very hot items, because aging kicks in too often

• Almost impossible to separate cold data pages from warm ones
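The aging problem can be illustrated with a small counter filter. This is a simplified sketch of the idea as described on the slide, not the exact design of Hsieh et al.; the class name, the 4-bit counter width, the table size, and the use of BLAKE2b as the hash are all assumptions made for illustration:

```python
import hashlib


class AgingCounterFilter:
    """Shared saturating counters; every counter is halved when one overflows."""

    def __init__(self, num_counters=1024, num_hashes=2, bits=4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.max_value = (1 << bits) - 1   # 15 for 4-bit counters
        self.agings = 0                    # number of divide-by-2 passes so far

    def _slots(self, page):
        """Counter positions for a page, one per hash function."""
        for k in range(self.num_hashes):
            digest = hashlib.blake2b(f"{k}:{page}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % len(self.counters)

    def _age(self):
        """Divide every counter by 2 (the aging step)."""
        self.counters = [c >> 1 for c in self.counters]
        self.agings += 1

    def record_update(self, page):
        for slot in set(self._slots(page)):
            if self.counters[slot] == self.max_value:
                self._age()                # overflow would occur: age first
            self.counters[slot] += 1

    def estimate(self, page):
        """Relative update frequency: the minimum over the page's counters."""
        return min(self.counters[slot] for slot in self._slots(page))


def demo():
    """One very hot page, one warm page updated once per 50 hot updates."""
    f = AgingCounterFilter()
    for i in range(1000):
        f.record_update("hot")
        if i % 50 == 0:
            f.record_update("warm")
    return f.agings, f.estimate("hot"), f.estimate("warm"), f.estimate("cold")
```

Calling `demo()` shows the effect: the hot page alone forces an aging pass every few updates, so the warm page's counters are halved back toward zero between its own updates and, barring hash collisions, end up about as low as those of a page that was never updated.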

Approx Counting

• Numerous approaches

 Unbiased vs. biased

 Memory use: linear vs. logarithmic

 …

• Log Count

• Lossy Counting

• AMS Sketch

Counting Heavy Hitters

• Manku & Motwani [VLDB’02]

 Biased estimation of frequencies for heavy hitters

• For all heavy hitters ($f_{true} > \tau = sN$), the estimated frequency $f_{est}$ is reported, without false negatives

• Always underestimates, but by a margin of at most $\varepsilon N$, that is, $f_{est} \le f_{true} \le f_{est} + \varepsilon N$

• $N$ = # of objects seen so far, $0 < \varepsilon \ll s < 1$

• Sticky Sampling and Lossy Counting

 LC is deemed superior to SS, and uses O(log N) memory in the worst case
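Lossy Counting can be sketched in a few lines. This is a minimal illustration of the algorithm as commonly described (buckets of width ⌈1/ε⌉, one (count, Δ) pair per tracked item, pruning at bucket boundaries); the class and method names are hypothetical:

```python
import math


class LossyCounter:
    """Lossy Counting: estimates satisfy f_est <= f_true <= f_est + epsilon * N."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)   # bucket width
        self.n = 0                            # N: number of objects seen so far
        self.entries = {}                     # item -> [count, delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)   # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # delta bounds how often the item may have been seen before tracking began
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # bucket boundary: drop entries that cannot be frequent
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def heavy_hitters(self, s):
        """Items whose estimated count is at least (s - epsilon) * N."""
        threshold = (s - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


lc = LossyCounter(epsilon=0.01)
for i in range(1000):
    lc.add("hot" if i % 2 == 0 else f"cold{i}")   # one hot item among one-off items
print(lc.heavy_hitters(s=0.1))
```

Reporting everything above (s − ε)N rather than sN is what rules out false negatives: an item with true frequency above sN can be undercounted by at most εN.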

AMS Sketch

• Alon, Matias and Szegedy [STOC’96]

 Randomized linear projection of a frequency vector $F = (f_1, \ldots, f_n)$ for a set of $n$ objects in stream $S$ ($|Dom(S)| = n$)

• Randomized linear projection

 $f_i$ denotes the frequency of $i$ in $S$

 $v_i$ is a four-wise independent binary random variable

• $v_i = +1$ or $-1$ with equal probability, i.e., $\Pr(v_i = +1) = \Pr(v_i = -1) = 1/2$

• For any four distinct indices, $\Pr$(4-tuple of $v_i$ = a given 4-tuple in $\{-1,+1\}^4$) $= 1/16$

Unbiased Estimation by AMS

• To compute the sketch $X$

 Initialize $X = 0$

 Add $v_i$ to $X$ each time $i$ appears in $S$

• $X = \sum_{i=1}^{n} f_i v_i$

• To estimate $f_q$ (the frequency of $q$)

 $E(X \cdot v_q) = E((f_1 v_1 + \cdots + f_n v_n)\, v_q) = E(f_q v_q^2) = f_q$

• Linearity of expectation

• $E(v_i^2) = 1$, because $v_i^2$ is either $(-1)^2$ or $(+1)^2$

• $E(v_i v_j) = 0$ for $i \ne j$, because $\Pr(v_i v_j = +1) = \Pr(v_i v_j = -1) = 1/2$

 Thus, $X \cdot v_q$ is an unbiased estimator of $f_q$
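The update and estimation steps can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: the signs $v_i$ come from the low bit of a random cubic polynomial modulo the Mersenne prime 2^61 − 1 (a common way to obtain four-wise independent values, with a negligible bias in the bit), and all names are hypothetical:

```python
import random

P = (1 << 61) - 1  # prime modulus for the hash polynomial (an assumption)


class AMSSketch:
    """A single AMS counter X = sum_i f_i * v_i with pseudo-random signs v_i."""

    def __init__(self, rng):
        # A random cubic polynomial over GF(P) gives four-wise independent values.
        self.coef = [rng.randrange(P) for _ in range(4)]
        self.x = 0

    def sign(self, item):
        """Return v_item in {+1, -1}, taken from the low bit of the hash value."""
        h = 0
        for c in self.coef:            # Horner evaluation of the polynomial
            h = (h * item + c) % P
        return 1 if h & 1 else -1

    def update(self, item, count=1):
        """Add count * v_item to X; a negative count models a deletion."""
        self.x += count * self.sign(item)

    def estimate(self, item):
        """Return X * v_item, an unbiased estimate of f_item."""
        return self.x * self.sign(item)


def mean_estimate(freqs, query, trials, seed=0):
    """Average the estimate of `query` over many independent sketches."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        sketch = AMSSketch(rng)
        for item, f in freqs.items():
            sketch.update(item, f)
        total += sketch.estimate(query)
    return total / trials


print(mean_estimate({1: 50, 2: 30, 3: 20}, query=1, trials=2000))
```

A single estimate here is 50 ± 30 ± 20, so it can be far off; only the average over many independent sketches settles near the true count of 50, which is what unbiasedness promises.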

How Accurate?

• Not so accurate

 $\mathrm{Var}(X \cdot v_q) \le |SJ(S)| = \sum_i f_i^2$, where $|SJ(S)|$ is the self-join size of $S$

• To improve the accuracy

 Maintain $s_1 \times s_2$ independent and identically distributed instances of $X$, say $X_{ij}$

 Compute $s_2$ (column-wise) averages of $X_{ij} \cdot v_q$, each instance using its own $v_q$

• $Y_j = (X_{1j} \cdot v_q + \cdots + X_{s_1,j} \cdot v_q) / s_1$

 Then, take the median of them as an estimate
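The averaging-then-median step can be sketched as below. As before, this is an illustration under assumptions rather than the talk's code: signs come from random cubic polynomials modulo 2^61 − 1, and the class name, grid layout, and parameter values are hypothetical:

```python
import random
import statistics

P = (1 << 61) - 1  # prime modulus for the hash polynomials (an assumption)


def make_sign(rng):
    """Build one sign function v(item) in {+1, -1} from a random cubic polynomial."""
    coef = [rng.randrange(P) for _ in range(4)]

    def sign(item):
        h = 0
        for c in coef:                 # Horner evaluation
            h = (h * item + c) % P
        return 1 if h & 1 else -1

    return sign


class AMSGrid:
    """s1 x s2 independent AMS counters X_ij, combined by median of means."""

    def __init__(self, s1, s2, seed=0):
        rng = random.Random(seed)
        self.s1, self.s2 = s1, s2
        self.signs = [[make_sign(rng) for _ in range(s2)] for _ in range(s1)]
        self.x = [[0] * s2 for _ in range(s1)]

    def update(self, item, count=1):
        for i in range(self.s1):
            for j in range(self.s2):
                self.x[i][j] += count * self.signs[i][j](item)

    def estimate(self, q):
        # Y_j: average of X_ij * v_q over the s1 instances in column j
        means = [
            sum(self.x[i][j] * self.signs[i][j](q) for i in range(self.s1)) / self.s1
            for j in range(self.s2)
        ]
        return statistics.median(means)  # the median over the s2 columns


grid = AMSGrid(s1=100, s2=5, seed=1)
grid.update(0, 200)                    # one frequently erased block
for block in range(1, 101):
    grid.update(block, 10)             # 100 blocks with 10 erases each
print(grid.estimate(0))
```

Averaging over $s_1$ instances divides the variance by $s_1$, and the median over $s_2$ columns makes a badly wrong answer unlikely; the price is $s_1 \times s_2$ counter updates per event.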

Memory Usage by AMS

• $s_1$ controls accuracy, and $s_2$ controls confidence

 The values of $s_1$ and $s_2$ are determined by target error bound and confidence level

 Still, the total memory use is $O(\log |S| + \log n)$ for fixed $s_1$ and $s_2$

Approx Counting for Wear Leveling

• AMS sketch adopted for wear leveling

 Store synopsis of block erase counts instead of actual counts, consuming much less memory

 Patent KR 10-0817204 [Min & Moon, March 2008]

Problem Solved?

• Far from it!

 AMS returns an approximate count for a query

 But it keeps neither a hot list nor a cold list

• Dynamic wear leveling

 Select the youngest block (the one with the fewest erases) from a free list

 AMS may work well if the free list is short

• By submitting each block in the free list as a query

• Static wear leveling

 Need to find blocks storing cold data

 AMS will be extremely slow!

• By submitting ALL blocks as queries

SW Leveler

• Chang et al. [DAC 2007]

 Use a bit vector (BET) to mark blocks erased during a certain period; BET is just a set of Boolean flags

 Space-efficient

• For an SSD of 128 GB (1 M blocks), the size of BET is just $128\,\text{KB} / 2^k$ (for $k \ge 0$)

• However, serious drawbacks

 Any block un-erased during the period can be randomly chosen as a cold data block; very inaccurate

 Memoryless, because BET is reset between the periods

• No long-term information on block erasures is accumulated
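The BET and its size can be sketched as follows, assuming one flag covers 2^k consecutive blocks, which is how the 128 KB / 2^k figure arises. The class and method names are hypothetical and this is not the code of Chang et al.:

```python
class BlockErasingTable:
    """Bit vector with one flag per group of 2**k consecutive blocks."""

    def __init__(self, num_blocks, k=0):
        self.num_blocks = num_blocks
        self.k = k
        num_flags = (num_blocks + (1 << k) - 1) >> k   # ceil(num_blocks / 2**k)
        self.bits = bytearray((num_flags + 7) // 8)    # 8 flags per byte

    def mark_erased(self, block):
        """Set the flag of the group that contains this block."""
        flag = block >> self.k
        self.bits[flag >> 3] |= 1 << (flag & 7)

    def is_marked(self, block):
        flag = block >> self.k
        return bool(self.bits[flag >> 3] & (1 << (flag & 7)))

    def unerased_blocks(self):
        """Blocks whose group saw no erase in the current period."""
        return [b for b in range(self.num_blocks) if not self.is_marked(b)]

    def reset(self):
        """Start a new period; all erase history is lost (the 'memoryless' drawback)."""
        self.bits = bytearray(len(self.bits))

    def size_bytes(self):
        return len(self.bits)


print(BlockErasingTable(1 << 20, k=0).size_bytes() // 1024, "KB")   # 1 M blocks, k = 0
print(BlockErasingTable(1 << 20, k=3).size_bytes() // 1024, "KB")   # 1 M blocks, k = 3
```

With k > 0 a single erase marks the whole group, so the table shrinks at the cost of even coarser information about which blocks are actually cold.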

Hot Separation by AMS

• Hot data blocks can be identified by AMS easily

 AMS can handle deletions

 Heavy hitters can be managed in a heap separately

• For low frequency items

 Accuracy of approximation improves without heavy hitters
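A worked example of why removing heavy hitters helps, based on the variance of the AMS estimator. The erase counts are hypothetical numbers chosen for illustration, and the exact variance formula relies on the pairwise independence of the $v_i$:

$$
\mathrm{Var}(X \cdot v_q) = E(X^2) - f_q^2 = \sum_{i \ne q} f_i^2
$$

Suppose one hot block has been erased $10^4$ times and 1,000 cold blocks 10 times each. For a cold block $q$:

$$
\sum_{i \ne q} f_i^2 = (10^4)^2 + 999 \cdot 10^2 = 100{,}099{,}900, \qquad \sqrt{100{,}099{,}900} \approx 10{,}005
$$

If the hot block is kept exactly in the heap and its contribution is deleted from the sketch:

$$
\sum_{i \ne q} f_i^2 = 999 \cdot 10^2 = 99{,}900, \qquad \sqrt{99{,}900} \approx 316
$$

The standard deviation of a single estimate for a cold block drops by a factor of about 30, before any averaging over $s_1$ instances.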

Static Wear Leveling

• Still, a big question remains for Static WL

 Identify cold data blocks, not only the hot ones

 Biased (or underestimating) approximations are useless

• AMS + BET?

 When a triggering condition is satisfied, look for un-erased blocks from BET

 Query the AMS sketch for the approximate erase count of each un-erased block

 Quickly find cold data blocks if the number of un-erased blocks is small; further evaluation is in order
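The AMS + BET idea can be sketched as a two-stage filter. This is only an illustration of the proposal, which the talk itself marks as needing further evaluation; the function name and its parameters are hypothetical, and the estimator is passed in as a plain callable so that any sketch (or an exact table, as in the usage lines) can stand behind it:

```python
from typing import Callable, Iterable, List


def find_cold_blocks(
    erased_flags: Iterable[bool],
    estimate_erase_count: Callable[[int], float],
    how_many: int,
) -> List[int]:
    """Pick the un-erased blocks with the lowest estimated erase counts."""
    # Stage 1: the BET narrows the candidates to blocks not erased this period.
    candidates = [block for block, erased in enumerate(erased_flags) if not erased]
    # Stage 2: one sketch query per candidate, instead of one per block on the device.
    candidates.sort(key=estimate_erase_count)
    return candidates[:how_many]


# Usage with made-up data: blocks 1, 3 and 4 were not erased in this period.
flags = [True, False, True, False, False]
long_term_counts = {1: 40, 3: 2, 4: 7}   # stand-in for sketch estimates
print(find_cold_blocks(flags, lambda b: long_term_counts[b], how_many=2))
```

The number of sketch queries equals the number of un-erased blocks, so the approach is only fast when the BET leaves few candidates.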

Concluding remarks

• TB-scale flash storage devices are on the horizon

 Scalable and economical designs are a must

• Algorithmic innovations will make a difference

 That’s where real competitiveness comes from!
