
Implementation and Challenging Issues of Flash-Memory Storage Systems

Department of Computer Science & Information Engineering, National Taiwan University

Tei-Wei Kuo

Agenda

Introduction

Management Issues

Performance vs Overheads

Other Challenging Issues

Conclusion


3/29/2007 Embedded Systems and Wireless Networking Lab. 3

Introduction – Why Flash Memory

Diversified Application Domains

Portable Storage Devices, Consumer Electronics, Industrial Applications

SoC and Hybrid Devices

Critical System Components

Trends in VLSI Technology


Introduction – Trends in Storage Technology

Source: Using multilevel cell NAND flash technology in consumer applications, Electronic Engineering Times, July 2005

Samsung 2GB USB 2.0 Flash Drive: Price $49.99, Less Rebate -$25.00, Final Price $24.99*

T-One 2GB Microdrive/3600RPM: $144.99

September 2006

Introduction – Trends in Storage Technology

Source: Using multilevel cell NAND flash technology in consumer applications,

Transcend 8GB CompactFlash Card Price: $84.85

Microdrive 4GB Compact Flash Type II

SanDisk 4GB CompactFlash Card Price: $55.99

March 2007


The Ultimate Limit on Mechanical Devices

[Figure: "Fly By Night": a Boeing 747 at 2,000,000 Miles Per Hour with a 1/100” Flying Height]

Source: Richard Lary, The New Storage Landscape: Forces shaping the storage economy, 2003.

Source: http://www.hitachigst.com/

* This slide was from the ASP-DAC’06 talk delivered by Prof. Sang L. Min from the Seoul National University.

A Microdrive Example

Introduction – The Characteristics of Storage Media

Access time

Media        Read                         Write                        Erase
DRAM         60ns (2B), 2.56us (512B)     60ns (2B), 2.56us (512B)     -
NOR FLASH    150ns (1B), 14.4us (512B)    211us (1B), 3.52ms (512B)    1.2s (16KB)
NAND FLASH   10.2us (1B), 35.9us (512B)   201us (1B), 226us (512B)     2ms (16KB)
DISK         12.4ms (512B) (average)      12.4ms (512B) (average)      -

[Slide annotations: 15X, 100X, 400X, 50X]

[Reference] DRAM: 2-2-2 PC100 SDRAM. NOR FLASH: Intel 28F128J3A-150. NAND FLASH: Samsung K9F5608U0M. Disk: Seagate Barracuda ATA II.


Introduction – Single-Level Cell (SLC)

[Figure: cell array showing the selected cell, its current I_DS, and the control gate, drain, and source of a cell]

Each Word Line is connected to control gates.

Each Bit Line is connected to the drain.

Introduction – Multi-Level Cell (MLC) vs SLC


1-bit/Cell SLC NAND Flash

100,000 Program/Erase cycles (with ECC) [1]

10 years Data Retention [1]

2-bits/Cell MLC NAND Flash

10,000 Program/Erase cycles (with ECC) [2]

10 years Data Retention [2]

4-bits/Cell MLC NAND FLASH Developers (2006)

M-systems, Intel, Samsung, and Toshiba

[1] ST Micro-electronics NAND SLC large page datasheet (NAND08GW3B2A)

[2] ST Micro-electronics NAND MLC large page datasheet (NAND04GW3C2A)

* USD34.65 per GB for NOR, USD6.79 per GB for NAND

Comparison of SLC and MLC

Introduction – Consumer Applications


Bandwidth Requirements – Video


Electronic Engineering Times, July 2005

Bandwidth Requirements – Audio


Introduction – Challenges in Flash-Memory Storage Designs

Requirements in Good Performance

Limited Cost per Unit

Strong Demands in Reliability

Increasing in Access Frequencies

Tight Coupling with Other Components

Low Compatibility among Vendors

Agenda

Introduction

Management Issues

Performance vs Overheads

Other Challenging Issues

Conclusion


Management Issues – System Architectures

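[Figure: layered system architecture. Applications (AP) sit on top of the File-System Layer, which sits on the Block Device Layer (FTL emulation), which sits on the MTD drivers and Flash Memory.]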

Management Issues – Flash-Memory Characteristics

[Figure: flash memory divided into Block 0, Block 1, Block 2, and Block 3]

1 Page = 512B

1 Block = 32 pages (16KB)

Write one page

Erase one block


Management Issues – Flash-Memory Characteristics

Write-Once

No writing on the same page unless its residing block is erased!

Pages are classified into valid, invalid, and free pages.

Bulk-Erasing

Pages are erased in a block unit to recycle used but invalid pages.

Wear-Leveling

Each block has a limited lifetime in erasing counts.
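The three characteristics above can be sketched as a small model. This is only an illustration: the class and method names (`Block`, `program`, `invalidate`, `erase`) are hypothetical and do not come from the slides or from any real flash driver.

```python
from enum import Enum


class PageState(Enum):
    FREE = "F"
    LIVE = "L"   # a valid page
    DEAD = "D"   # an invalid page


class Block:
    """Hypothetical model of one flash block (not a real driver API)."""

    def __init__(self, pages_per_block=32):
        self.pages = [PageState.FREE] * pages_per_block
        self.erase_count = 0  # wear-leveling needs this per-block counter

    def program(self, page_index):
        # Write-once: a page can only be written while it is free.
        if self.pages[page_index] is not PageState.FREE:
            raise ValueError("page already written; erase the block first")
        self.pages[page_index] = PageState.LIVE

    def invalidate(self, page_index):
        # An updated page is not overwritten; its old copy becomes dead.
        self.pages[page_index] = PageState.DEAD

    def erase(self):
        # Bulk-erasing: the whole block is reset at once.
        self.pages = [PageState.FREE] * len(self.pages)
        self.erase_count += 1


def demo_out_of_place_update():
    block = Block(pages_per_block=8)
    block.program(0)      # first copy of some data
    block.invalidate(0)   # the data is updated: old copy becomes dead
    block.program(1)      # new copy goes to another free page
    try:
        block.program(0)  # writing the same page again is refused
        rewritten = True
    except ValueError:
        rewritten = False
    return "".join(p.value for p in block.pages), rewritten
```

Calling `demo_out_of_place_update()` leaves one dead page, one live page, and six free pages in the block, and the second write to page 0 is rejected until the block is erased.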

Management Issues – Flash-Memory Characteristics

Example 1: Out-place Update

Live pages Free pages

A B C D

Suppose that we want to update data A and B…


Management Issues – Flash-Memory Characteristics

Example 1: Out-place Update

[Figure: the new copies of A and B are written to free pages (A B C D A B); the old copies of A and B become dead pages.]

Management Issues – Flash-Memory Characteristics

Example 2: Garbage Collection

Legend: L = a live page, D = a dead page, F = a free page

L D D L D D L D   <- This block is to be recycled. (3 live pages and 5 dead pages)
L L D L L L F D
L F L L L L D F
F L L F L L F D


Management Issues – Flash-Memory Characteristics

Example 2: Garbage Collection

Live data are copied to somewhere else.

[Figure: the live pages of the block to be recycled are copied into free pages of other blocks; the pages left in that block are dead. Legend: a live page, a dead page, a free page.]

Management Issues – Flash-Memory Characteristics

Example 2: Garbage Collection

The block is then erased.

[Figure: after the erase, all pages of the recycled block are free. Legend: a live page, a dead page, a free page.]

Overheads:

• live data copying

• block erasing
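The garbage collection example can be sketched in a few lines. The page layout below is the four-block grid from the example (L = live, D = dead, F = free). The greedy "most dead pages" victim policy and the function names are assumptions made for illustration; the slides only state which block is recycled.

```python
def pick_victim(blocks):
    """Assumed greedy policy: recycle the block with the most dead pages."""
    return max(range(len(blocks)), key=lambda i: blocks[i].count("D"))


def recycle(blocks, victim):
    """Copy live pages out of the victim, erase it, and report the overheads."""
    live_pages = blocks[victim].count("L")
    remaining = live_pages
    for i, block in enumerate(blocks):
        if i == victim:
            continue
        for j, page in enumerate(block):
            if remaining and page == "F":
                block[j] = "L"       # live data copied to a free page
                remaining -= 1
    if remaining:
        raise RuntimeError("not enough free pages for live data")
    blocks[victim] = ["F"] * len(blocks[victim])  # block erase
    return live_pages, 1             # (pages copied, blocks erased)


def demo_gc():
    blocks = [list("LDDLDDLD"), list("LLDLLLFD"),
              list("LFLLLLDF"), list("FLLFLLFD")]
    victim = pick_victim(blocks)
    copied, erased = recycle(blocks, victim)
    return victim, copied, erased, ["".join(b) for b in blocks]
```

In this sketch the first block is chosen, three live pages are copied, and one block is erased, which are the two overheads named on the slide.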


Management Issues – Flash-Memory Characteristics

Example 3: Wear-Leveling

Legend: L = a live page, D = a dead page, F = a free page

L D D L D D L D
L L D L L L F D
L F L L L L D F
F L L F L L F D

[Figure: the four blocks are labeled A, B, C, and D; the erase cycle counts shown are 100, 10, 20, and 15.]

Wear-leveling might interfere with the decisions of the block-recycling policy.
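A small sketch of how wear-leveling can override the recycling choice. The assignment of the erase counts 100, 10, 20, 15 to the four blocks in that order, the `max_gap` threshold, and both policies are assumptions for illustration only; the slide does not specify them.

```python
def dead_pages(block):
    return block.count("D")


def pick_greedy(blocks):
    """Recycling policy alone: the block with the most dead pages."""
    return max(range(len(blocks)), key=lambda i: dead_pages(blocks[i]))


def pick_wear_aware(blocks, erase_counts, max_gap):
    """Skip blocks whose erase count exceeds the least-worn block by more than max_gap."""
    least_worn = min(erase_counts)
    by_dead = sorted(range(len(blocks)),
                     key=lambda i: dead_pages(blocks[i]), reverse=True)
    for i in by_dead:
        if erase_counts[i] - least_worn <= max_gap:
            return i
    return by_dead[0]  # every block is over the gap: fall back to greedy


BLOCKS = [list("LDDLDDLD"), list("LLDLLLFD"),
          list("LFLLLLDF"), list("FLLFLLFD")]
ERASE_COUNTS = [100, 10, 20, 15]   # assumed mapping of counts to blocks
```

Under these assumptions `pick_greedy(BLOCKS)` selects the first block (5 dead pages), while `pick_wear_aware(BLOCKS, ERASE_COUNTS, max_gap=50)` selects the second block, which has only 2 dead pages and 5 live pages to copy. That extra copying is the interference the slide refers to.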

Management Issues – Challenges

The write throughput drops significantly after garbage collection starts!

The capacity of flash-memory storage systems increases so quickly that main-memory space requirements for management also grow quickly.

Reliability becomes more and more critical when the manufacturing capacity increases!

The significant increase in flash-memory access rates seriously exacerbates the Read/Program Disturb Problems!


Agenda

Introduction

Management Issues

Performance vs Overheads – FTL vs NFTL

Other Challenging Issues

Conclusion

System Architecture

[Figure: Applications (processes) call fwrite(file,data) on the File Systems (FAT, EXT2, NTFS...). The file system issues Block write (LBA,size) requests to the Flash-Memory Storage System, where the FTL/NFTL Layer performs Address Translation and Garbage Collection. The FTL/NFTL Layer sends Flash I/O Requests to the Device Driver, which sends Control signals to the Physical Devices (Flash Memory Banks).]


Management Issues – Flash-Memory Characteristics

*FTL: Flash Translation Layer, MTD: Memory Technology Device

Policies – FTL

FTL adopts a page-level address translation mechanism.

The main problem of FTL is the large memory space required to store the address translation information.


Policies – NFTL (Type 1)

A logical address under NFTL is divided into a virtual block address and a block offset.

e.g., LBA=1011 => virtual block address (VBA) = 1011 / 8 = 126 and block offset = 1011 % 8 = 3

[Figure: "Write data to LBA=1011" under NFTL. The Address Translation Table (in main-memory) entry for VBA=126 holds (9), the address of a chain block. Chain blocks at addresses 9, 23, and 50 are shown, each with 8 pages marked Free or Used. For Block Offset=3: if the page has been used, write to the page with block offset=3 of a later chain block.]
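The address split and the chain-block write rule can be sketched as follows. This is a simplified model under stated assumptions: `ChainNFTL` and its methods are hypothetical names, chain blocks are modeled as Python lists rather than physical block addresses, and a new chain block is appended whenever the page at the target offset is already used in every existing chain block.

```python
def split_lba(lba, pages_per_block=8):
    """Return (virtual block address, block offset) for a logical address."""
    return divmod(lba, pages_per_block)


class ChainNFTL:
    """Hypothetical sketch of NFTL (Type 1) with per-VBA block chains."""

    def __init__(self, pages_per_block=8):
        self.pages_per_block = pages_per_block
        self.chains = {}  # VBA -> list of chain blocks (None marks a free page)

    def write(self, lba, data):
        vba, offset = split_lba(lba, self.pages_per_block)
        chain = self.chains.setdefault(vba, [])
        for block in chain:
            if block[offset] is None:   # page at this offset is still free
                block[offset] = data
                return
        new_block = [None] * self.pages_per_block
        new_block[offset] = data        # page was used in every chain block
        chain.append(new_block)

    def read(self, lba):
        vba, offset = split_lba(lba, self.pages_per_block)
        latest = None
        for block in self.chains.get(vba, []):
            if block[offset] is None:
                break                   # later chain blocks are free here too
            latest = block[offset]
        return latest
```

For example, `split_lba(1011)` gives `(126, 3)`, matching the slide. Writing LBA 1011 twice in this model leaves two chain blocks for VBA 126, and a read has to walk the chain to find the most recent copy.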

Policies – NFTL (Type 2)

A logical address under NFTL is divided into a virtual block address and a block offset.

e.g., LBA=1011 => virtual block address (VBA) = 1011 / 8 = 126 and block offset = 1011 % 8 = 3

[Figure: "Write data to LBA=1011" under NFTL. The Address Translation Table (in main-memory) entry holds (9,23): a Primary Block at address 9 and a Replacement Block at address 23, with pages marked Free or Used. If the page in the primary block has been used, write to the replacement block.]


Policies – NFTL

NFTL is proposed for large-scale NAND flash storage systems because it adopts block-level address translation.

However, the address translation performance of read and write requests might deteriorate because of linear searches of address translation information in primary and replacement blocks.

Policies – FTL or NFTL

                              FTL      NFTL
Memory Space Requirements     Large    Small
Address Translation Time      Short    Long
Garbage Collection Overhead   Less     More
Space Utilization             High     Low

The Memory Space Requirements for one 1GB NAND (512B/Page, 4B/Table Entry, 32 Pages/Block)

FTL: 8,192KB (= 4*(1024*1024*1024)/512)

NFTL: 256KB (= 4*(1024*1024*1024)/(512*32))

Remark: Each page of small-block(/large-block) SLC NAND can store 512B(/2KB) data, and there are 32(/64) pages per block.
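The two table sizes follow from one formula: the number of mapped units times the entry size. A minimal sketch of the arithmetic, using a hypothetical helper name `table_size_kb`:

```python
def table_size_kb(capacity_bytes, bytes_per_mapped_unit, entry_bytes=4):
    """Size of a mapping table with one entry per mapped unit, in KB."""
    entries = capacity_bytes // bytes_per_mapped_unit
    return entries * entry_bytes // 1024


ONE_GB = 1024 * 1024 * 1024
PAGE_BYTES = 512
PAGES_PER_BLOCK = 32

ftl_kb = table_size_kb(ONE_GB, PAGE_BYTES)                      # one entry per page
nftl_kb = table_size_kb(ONE_GB, PAGE_BYTES * PAGES_PER_BLOCK)   # one entry per block
```

Here `ftl_kb` is 8192 and `nftl_kb` is 256, a factor of 32 that equals the number of pages per block.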


Address Translation Time - NFTL

The address translation performance of read and write requests can deteriorate because of linear searches for physical addresses.

1. Assume that each block contains 8 pages.

2. Let LBA A, B, C, D, and E be written 5, 5, 1, 1, and 1 times, respectively. Their data distribution could look like the one in the slide's figure.

3. For example, it might be necessary to scan 9 spare areas for LBA B.
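A rough sketch of why lookups become linear searches. It assumes a primary block indexed by block offset and a replacement block written sequentially, with each spare area recording which offset the page belongs to; the lookup scans the replacement block from the newest page backwards and then checks the primary page. The class name and the scan direction are assumptions, and the exact count for a given LBA depends on the data layout in the slide's figure, which this sketch does not try to reproduce.

```python
class PrimaryReplacementPair:
    """Hypothetical model of one primary block plus one replacement block."""

    def __init__(self, pages_per_block=8):
        self.pages_per_block = pages_per_block
        self.primary = [None] * pages_per_block  # indexed by block offset
        self.replacement = []                    # (offset, data) in write order

    def write(self, offset, data):
        if self.primary[offset] is None:
            self.primary[offset] = data
        elif len(self.replacement) < self.pages_per_block:
            self.replacement.append((offset, data))
        else:
            raise RuntimeError("replacement block full; garbage collection needed")

    def read(self, offset):
        """Return (data, number of spare areas scanned)."""
        scanned = 0
        for logged_offset, data in reversed(self.replacement):
            scanned += 1
            if logged_offset == offset:
                return data, scanned
        return self.primary[offset], scanned + 1  # finally check the primary page


def demo_lookup_cost():
    pair = PrimaryReplacementPair()
    # A..E use offsets 0..4 and are written 5, 5, 1, 1, and 1 times.
    for offset, times in [(0, 5), (1, 5), (2, 1), (3, 1), (4, 1)]:
        for version in range(1, times + 1):
            pair.write(offset, f"{'ABCDE'[offset]}{version}")
    return pair.read(0), pair.read(1), pair.read(2)
```

With this write order, reading A scans 5 spare areas, reading B scans 1, and reading C scans all 8 replacement pages plus the primary page, 9 in total.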

Garbage Collection Overhead - NFTL

1. Copy the most-recent content to the new primary block.

2. Erase the old primary block and the replacement block.
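The two steps can be sketched as a merge. The data layout (a primary block indexed by block offset and a replacement block kept as an ordered log of `(offset, data)` pairs) and the function name `merge_blocks` are assumptions for illustration.

```python
def merge_blocks(primary, replacement):
    """Build the new primary block and report the garbage collection overhead.

    primary: list indexed by block offset (None marks a free page).
    replacement: list of (offset, data) pairs in write order.
    Returns (new_primary, valid_pages_copied, blocks_erased).
    """
    new_primary = list(primary)
    for offset, data in replacement:   # later entries are more recent
        new_primary[offset] = data
    copied = sum(1 for page in new_primary if page is not None)
    return new_primary, copied, 2      # old primary + replacement are erased


old_primary = ["A1", "B1", "C1", None, None, None, None, None]
old_replacement = [(0, "A2"), (1, "B2"), (0, "A3")]
new_primary, copied, erased = merge_blocks(old_primary, old_replacement)
```

In this example three valid pages are copied and two blocks are erased, even though only three of the logged writes were still needed.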


Space Utilization - NFTL

3 free pages are wasted.

Agenda

Introduction

Management Issues

Performance vs Overheads – An Adaptive Two-Level Mapping Mechanism

Other Challenging Issues

Conclusion


Motivation

An adaptive two-level management design of a flash translation layer, called AFTL.

Exploit the advantages of the fine-grained address mechanism and the coarse-grained address mechanism.

                              FTL      AFTL                        NFTL
Memory Space Requirements     Large    A little larger than NFTL   Small
Address Translation Time      Short    Much Better than NFTL       Long
Garbage Collection Overhead   Less     Much Better than NFTL       More
Space Utilization             High     Much Better than NFTL       Low

AFTL – Coarse-to-Fine Switching

[Figure: a coarse-grained slot (VBA, PPBA, RPBA) pointing to a primary block (PPBA) and a replacement block (RPBA); fine-grained slots are added for the pages at RPBA+5 and RPBA+7.]

1. AFTL doesn’t erase the two blocks immediately.

2. AFTL moves the mapping information of the replacement block to the fine-grained hash table by adding fine-grained slots.

3. The RPBA field of the corresponding mapping information is nullified.


AFTL – Fine-to-Coarse Switching

The number of the fine-grained slots is limited.

Some least recently used mapping information of fine-grained slots should be moved to the coarse-grained hash table.

[Figure: a coarse-grained slot (VBA, PPBA, RPBA) and fine-grained slots for the pages at RPBA+5 and RPBA+7; a slot labeled (F, PBAE) is also shown.]

1. Assume that this fine-grained slot is to be replaced.

2. Data stored in the page with the given (physical) address are copied to the primary or replacement block of the corresponding coarse-grained slot, as defined by NFTL.

3. If there does not exist any corresponding coarse-grained slot, a new one is created.
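The bounded pool of fine-grained slots with least-recently-used replacement can be sketched as below. The class name, the use of `OrderedDict`, and the idea of returning evicted mappings to the caller are assumptions for illustration; the slides describe the policy but not a data structure. The copying of evicted data into a primary or replacement block is left out.

```python
from collections import OrderedDict


class FineGrainedSlots:
    """Hypothetical LRU-bounded table of page-level mappings."""

    def __init__(self, max_slots):
        self.max_slots = max_slots   # upper bound on fine-grained slots
        self.slots = OrderedDict()   # LBA -> physical page, least recent first

    def lookup(self, lba):
        if lba not in self.slots:
            return None              # fall back to coarse-grained translation
        self.slots.move_to_end(lba)  # mark as most recently used
        return self.slots[lba]

    def insert(self, lba, physical_page):
        """Add a slot; return the evicted (LBA, physical page) mappings."""
        if lba in self.slots:
            self.slots.move_to_end(lba)
        self.slots[lba] = physical_page
        evicted = []
        while len(self.slots) > self.max_slots:
            evicted.append(self.slots.popitem(last=False))
        return evicted


def demo_eviction():
    table = FineGrainedSlots(max_slots=2)
    table.insert(10, 100)
    table.insert(11, 101)
    table.lookup(10)                 # LBA 10 becomes most recently used
    return table.insert(12, 102)     # evicts the least recently used slot
```

Each evicted mapping corresponds to one fine-to-coarse switch: its data would be copied to the primary or replacement block of the matching coarse-grained slot.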

AFTL – Fine-to-Coarse Switching

Coarse-to-fine switches would introduce fine-to-coarse switches and overhead in valid page copying.

This is because the number of fine-grained slots is limited.

Stop any coarse-to-fine switch when some frequency bound on coarse-to-fine switches is reached.

We set a parameter in the experiments to control the frequency of switches to explore the behavior of the proposed mechanism.


The Advantages of AFTL

Improve the address translation performance.

This is because mapping information is moved to the fine-grained hash table.

Reduce the garbage collection overhead.

The delayed recycling of any replacement block reduces the potential number of valid data copies and blocks erased.

Improve the space utilization.

The delayed recycling of any primary block makes it likely that free pages of the primary block are used in the future.

[Figure: a coarse-grained slot δcg = (δcg.VBA, δcg.PPBA, δcg.RPBA) with fine-grained slots for the pages at δcg.RPBA+5 and δcg.RPBA+7.]

Chin-Hsien Wu and Tei-Wei Kuo, 2006, “An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems,”

IEEE/ACM 2006 International Conference on Computer-Aided Design (ICCAD), November 5-9, 2006.

Agenda

Introduction

Management Issues

Performance vs Overheads – Performance Evaluation

Other Challenging Issues

Conclusion


Performance Evaluation

Performance Setup

The experiment trace was collected over a 20GB disk.

CPU                           Intel Celeron 750MHz
RAM                           320 MB
OS                            Windows XP
File Systems                  NTFS
Applications                  Web Applications, E-mail Clients, MP3 Player, MSN Messenger, Word, Excel, PowerPoint, Media Player, Programming, and Virtual Memory Activities
Durations                     One week
Total Write / Read Requests   13,198,805 / 2,797,996 sectors
Different LBA’s               1,669,228

Performance Evaluation

Performance Setup

The maximum number of fine-grained slots is controlled by a parameter MFS.

A parameter ST controls the frequency of switches between the two address translation mechanisms – n/ST.

ST=0 => No constraint on the number of switches.

Smaller ST => More switches.

Larger ST => Fewer switches.


Memory Space Requirements

1. MFS ranged from 2,500, 5,000, 7,500, 10,000, 12,500, to 15,000.

2. AFTL uses a little more memory space than NFTL.

[Figure: Memory Space (K), axis 0 to 1,200, of AFTL for MFS=2,500, 5,000, 7,500, 10,000, 12,500, and 15,000, compared with NFTL.]

Address Translation Performance

[Figure: Address Translation Time (ms), axis 600,000 to 1,000,000, for MFS=2,500 to 15,000 with ST=64, ST=32, ST=16, and ST=0, compared with NFTL.]

1. Larger MFS => shorter address translation time

- More address translations go through the fine-grained address translation mechanism.

2. Smaller ST => longer address translation time


Garbage Collection Overhead

[Figure: Average Number of Valid Pages Copied, axis 4 to 10, for MFS=2,500 to 15,000 with ST=64, ST=32, ST=16, and ST=0, compared with NFTL.]

[Figure: Number of Blocks Erased, axis 350,000 to 550,000, for MFS=2,500 to 15,000 with ST=64, ST=32, ST=16, and ST=0, compared with NFTL.]

AFTL outperforms NFTL.

- Coarse-to-fine switches can avoid immediate recycling of their primary and replacement blocks and the related copying of valid data.

Space Utilization

[Figure: Average Number of Free Pages Left, axis 0 to 3, for MFS=2,500 to 15,000 with ST=64, ST=32, ST=16, and ST=0, compared with NFTL.]

The space utilization might be better under AFTL.

- Coarse-to-fine switches can delay the recycling of replacement blocks.

- Free pages of primary blocks might be used in the future.


Summary

AFTL is proposed to

exploit the advantages of fine-grained/coarse-grained address translation mechanisms, and to

switch the mapping information dynamically and adaptively between the two address translation mechanisms.

AFTL does provide good performance in address mapping and space utilization, and keeps garbage collection overhead and memory space requirements under proper management.


Agenda

Introduction

Management Issues

Performance vs Overheads

Other Challenging Issues

Conclusion


Challenging Issues – Reliability

[Figure: cell array showing the selected cell, its current I_DS, and the control gate, drain, and source of a cell]

Each Word Line is connected to control gates.

Each Bit Line is connected to the drain.

Challenging Issues – Reliability

Read Operation

When the floating gate is not charged with electrons, there is a current I_D (100uA) if a reading voltage is applied. (“1” state)

[Figure: cell under read with applied voltages of 5V and 1V]

Program Operation

Electrons are moved into the floating gate, and the threshold voltage is thus raised.


Challenging Issues – Reliability

Over-Erasing Problems

Fast Erasing Bits → All of the cells connected to the same bit line of a depleted cell would be read as “1”, regardless of their values.

Read/Program Disturb Problems

DC erasing of a programmed cell, DC programming of a non-programmed cell, drain disturb, etc.

Flash memory with thin gate oxide makes disturb problems more serious!

Data Retention Problems

Electrons stored in a floating gate might be lost, and the loss of electrons will sooner or later affect the charging status of the gate!

Challenging Issues – Observations

The write throughput drops significantly after garbage collection starts!

The capacity of flash-memory storage systems increases so quickly that main-memory space requirements for management also grow quickly.

Reliability becomes more and more critical when the manufacturing capacity increases!

The significant increase in flash-memory access rates seriously exacerbates the Read/Program Disturb Problems!

Wear-leveling technology is even more critical when flash memory is adopted in many system components or might survive in products for a long lifetime!


Wear Leveling versus Product Lifetime

Settings

File system: FAT16 file system with 8KB cluster size

Flash memory: 256MB small-block flash memory with 100K erase cycles

A 16MB file is updated repeatedly at a throughput of 0.1MB/s

The file requires 2K clusters (= 16MB ÷ 8KB cluster size)

The FAT size of this file is 4KB (= 2K clusters × 2 bytes)

20% of the blocks in flash memory join dynamic wear leveling

Data of a 16MB file are stored in 1K blocks (= 16MB ÷ 16KB block size)

Suppose flash memory is managed at the block level

The file system updates the FAT on each cluster write, so the FAT is updated 2K times for a 16MB file

Writing a 16MB file incurs 1K block erases because of the reclaiming of invalid space.

Wear Leveling versus Product Lifetime

Ways in Data Updates

In-Place-Updates: Rewriting on the Same Page

Dynamic Wear Leveling: Rewriting over Another Free Page with Erasing over Blocks with Dead Pages

Static Wear Leveling: Rewriting over Another Free Page with Erasing over Any Blocks

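Expected Lifetime

\[
\text{No wear leveling (in-place update)} = \frac{16\,\text{(MB)}}{0.1\,\text{(MB/second)}} \times \frac{100\text{K}}{2\text{K} \times 24 \times 60 \times 60} \approx 0.09\ \text{(days)}
\]

\[
\text{Dynamic wear leveling} = \frac{16\,\text{(MB)}}{0.1\,\text{(MB/second)}} \times \frac{100\text{K} \times 16\text{K (blocks)} \times 20\%}{(2\text{K} + 1\text{K}) \times 24 \times 60 \times 60} \approx 197.5\ \text{(days)}
\]

\[
\text{Static wear leveling} = \frac{16\,\text{(MB)}}{0.1\,\text{(MB/second)}} \times \frac{100\text{K} \times 16\text{K (blocks)} \times 100\%}{(2\text{K} + 1\text{K}) \times 24 \times 60 \times 60} \approx 987.5\ \text{(days)}
\]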
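The lifetime estimates can be checked with a few lines of arithmetic. This sketch takes 100K erase cycles as 100,000 and K as 1,024 for block and cluster counts; both readings are assumptions, and the function name `lifetime_days` is hypothetical. Each pass over the 16MB file takes 16 / 0.1 seconds, and the number of passes is the total erase budget divided by the erases one pass causes.

```python
SECONDS_PER_DAY = 24 * 60 * 60
K = 1024


def lifetime_days(file_mb, throughput_mb_s, erase_cycles,
                  erases_per_pass, blocks_sharing_wear):
    """Days until the erase budget of the participating blocks is used up."""
    seconds_per_pass = file_mb / throughput_mb_s
    passes = erase_cycles * blocks_sharing_wear / erases_per_pass
    return seconds_per_pass * passes / SECONDS_PER_DAY


TOTAL_BLOCKS = 256 * 1024 // 16   # 256MB of 16KB blocks = 16K blocks

# In-place update: the FAT block absorbs 2K erases per pass on its own.
no_wear_leveling = lifetime_days(16, 0.1, 100_000, 2 * K, 1)
# Dynamic: 2K FAT updates + 1K data block erases spread over 20% of blocks.
dynamic = lifetime_days(16, 0.1, 100_000, 3 * K, 0.2 * TOTAL_BLOCKS)
# Static: the same erases spread over all blocks.
static = lifetime_days(16, 0.1, 100_000, 3 * K, TOTAL_BLOCKS)
```

Under these assumptions the three values come out at roughly 0.09, 197.5, and 987.7 days. The last figure differs slightly from the slide's 987.5, which is consistent with the slide multiplying the rounded dynamic result by five.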


Use a counter for each block

The garbage collector always finds the block with the least erase count.

Some heuristic approach erases a block to maintain 2 free blocks when the garbage collector finds the erase count of the block is over a given threshold.

Problems:

High extra block erases and live-page copying

High main-memory consumption

High computation cost

Wear Leveling versus Product Lifetime

Static Wear Leveling – Block-Level Mapping

[Figure: a Flash Memory of 16 blocks (0 to 15) with a Counter per block. Legend: free block; dead block; block contains (some) valid data; index in the selection of a victim block; index to the selected free block. Counter values shown: 5 5 5 4 5 5 4 4 4 5 5 5 4 4 4 5. Steps shown: update data in block 3; write new data to block 4; GC starts → find a victim block; update block 15; GC starts.]

Conclusion

Summary

The Characteristics of Flash Memory and Management Issues

Popular Implementations: FTL vs NFTL

Adaptive Two-Level Address Translation

Performance, Cost, and Reliability

Challenges


Conclusion

What Is Happening?

Solid-State Storage Devices

New Designs in the Memory Hierarchy

More Applications in System Components and Products

Challenging Issues: Performance, Cost, and Reliability

Scalability Technology

Reliability Technology

Customization Technology

Contact Information

• Professor Tei-Wei Kuo

  • [email protected]

  • URL: http://csie.ntu.edu.tw/~ktw

  • Flash Research: http://newslab.csie.ntu.edu.tw/~flash/

  • Office: +886-2-23625336-257

  • Fax: +886-2-23628167

  • Address: Dept. of Computer Science & Information Engr., National Taiwan University, Taipei, Taiwan 106


Q & A
