Managing the evolution of Flash
: beyond memory to storage
Tony Kim
Director, Memory Marketing Samsung Semiconductor I nc.
Nonvolatile Memory Seminar Hot Chips Conference
August 22, 2010 Memorial Auditorium
Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
NAND Technology Shifts Smoothly
Density has doubled every year since 2004
There are breaking points on key technology beyond 2013 Lithography shrink slow s, NAND Reliability degrades
MLC SLC TLC 10K 100K 50K ~ 100K 1.5K 5K < Reliability> 4Gb 8Gb 64Gb 16Gb 256Mb 128Gb < Density> 64Mb 256Gb 16Gb 8Gb 1K 0.5K 3K TLC MLC SLC
4 / 34
Reliability Tolerance by Technology
I nfluence on Scaling- dow n of Floating Gate
Increase of F-Poly interference
Cell interference↑
Decrease of coupling ratio
VPGM/ERS↑
Reduction of charge loss damage tolerance Charge loss↑
JEDEC Standard : Cycle & Data Retention
Density Write operations should occur across device life time 10 year retention after lifetime w rite cycle is unpractical
Endurance cycle Data retention
Endurance cycle Data retention
Max. spec cycle
Max. spec cycle 10years
Until 2006 As of 2007
100% END Cycle + 10y DTN 10% END Cycle + 10y DTN 100% END Cycle + 1y DTN
6 / 34
Controller : Critical for Maintaining Reliability
More intelligence of controller can offset some of the generic degradation of NAND reliability from scaling
Te ch n ology Target Endurance 1 K 3 K 4 x n m 3 x n m 2 x n m En du r a n ce 3bit 2bit Role of Controller Endurance of cell
NAND E C C E n g in e F T L (S / W ) Legacy Controller Vth R1 R2 R3 R4 R5 R6 R7 3 bit MLC Bit errors
Technology Engine w ith ECC/ Bit Error/ FTL
Legacy controller only w ith ECC cannot reliably handle 2 bit in 3xnm and beyond, let alone 3 bit
Optimization w ith Flash cell characteristics in mind is crucial New metric for reliability measurement is needed such as lifetime data amount w ith standard pattern
H o s t I/ F E C C E n g in e F T L (S / W ) H o s t I/ F B it E rr o r E n g in e NAND New Controller Melded together
8 / 34
Samsung’s I nnovative Flash Technology
Samsung is exploring new technology to break status- quo
Samsung believes 3D- NAND is the most likely successor for Planar NAND in the coming future
Planar NAND 3D- NAND
Cell architecture
Advantages Easy to produce with
simple process
High reliability Process reuse
Challenges Hard to shrink under
3D- NAND details
Potential benefit
• Better reliability/ endurance than planar since cell design rule is
much more relaxed.
I ssues in future scaling
• Bit cost reduction is done by increasing the stacking layer, thereby
increasing by 2X per each generation becomes more difficult and unlikely
• Block size w ill be larger than the planar- equivalent
CSL Control Gate (W/L) GSL Gate SSL Gate n + n + p-Sub n + 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 8 16 24 32 W/ L share 8 W/ L share 6 W/ L share 4 Stacking layer # Block Size ( MB)
Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
Why Softw are I s Needed for Flash?
Small data unit in Program and the large unit in Erase requires another block new ly allocated through mapping
Logical block #1 is mapped to physical block #1 and #N after updating
Block #1 page page page page page …. Block #N Block #1 New Data Invalidated Valid Data Copy Logical #1 #1 #1 Log . Phy. Map Table #1 #1 #N Log. Phy. Map Table (updated) updating New Data New Data
12 / 34
FTL ( Flash Translation Layer)
Manages mapping from logical to physical address Detects and maps out Bad Blocks
Does Wear- leveling for life extension
Managed NAND File System Application FTL File System Application FTL NAND Flash Host Driver NAND Sector Read/Write Page Read/ Program/Erase Sector Read/Write
Reclaiming Valid Data : Garbage Collection
Over time Flash is mixed w ith valid and invalid data Free space needs to be reclaimed to w rite new data
Garbage collection merges the valid data from the scattered blocks Data Data updated Data updated [Block #1] [Block #2] Data [Block #3] Data Data Data updated Data Data Invalid Data Data Data updated Physical Block Logical Block #1 #2 #3 #4 ??? No more free block Merge Erased #1 Data updated Data Data #2
Allocated for Logical Block #3
Invalid
To be updated
Data Data
14 / 34
What is WAI ( Wear Acceleration I ndex) ?
WAI : The index that represent how much FTL accelerate the w ear- out of NAND
WriteSize BlockSize EraseCount WAI Bytes Bytes Total× =
FTL
Write Request (2 blocks)
Write (4 blocks) + Erase (4 blocks)
2 256 128 4× = = KB KB
WAI
* Block Size: 128KBWAI NAND Life time Meaning
High Short Bad
Low Long Good
Example:
What is Wear- leveling ?
Hot data like FAT can w ear out certain portion of cell array Wear- leveling maximizes the life span of NAND flash as each cell is used evenly
Before Wear-leveling After Good Wear-leveling
Er a se Cou n t Er a se Cou n t Block N u m be r Block N u m be r Ev e r y ce ll is u se d e v e n ly FAT Data Fr e qu e n t Upda t e
Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
NAND Flash Market Outlook ( ` 10 ~ ` 15)
Samsung expected NAND Market CAGR of 53% betw een 2010 and 2015 Key applications for NAND market grow th for next decade are
: Flash Card, Smart Phone, Tablet PC and SSD
W / W A n n u a l U n it @ 1 6 G b E q u iv . [ B il li o n s ] Tablet PC SSD Smart Phone 2010- 2015, 5- Yr CAGR: 53% ( Source : Samsung ) Year Flash Card 6.7 Billion 56 Billion
18 / 34
Performance Multiplied w ith Multi- Chips
Flash storage performance can be easily expanded w ith multi- w ay w rite interleaving along w ith multi- channels
During program Busy, data can be loaded and programmed into other NAND devices on the same bus
Flash Controller nCE1 nCE2 nCE3 nCE4 NAND Controller No.1 NAND Controller No.2 I / O’s I / O’s Load PROGRAM nCE1 Load Load Load nCE2 PROGRAM CH 1 0 1000 2000 3000 4000 5000 1 2 4 8 16 Write I OPS Number of chips
Application Segmented by Technology
Applications w ill be fragmented by performance and reliability
Closer communications needed to understand user requirements and tailor appropriate solutions
500 50 1K 3K 1.5K 30K 100K 300K E n d u ra n c e 2.5MB/ s 5MB/ s 8MB/ s 13MB/ s Write Performance TLC MLC SLC Consumer Card, UFD SD Class2 Consumer Card, UFD SD Class4, OEM UFD SD Class4 MP3 , PND MP3 , PND SD Class6 SD Class6 Consumer SSD Mobile Phone, PMP Mid- End Enterprise SSD Low - End Server SSD
High- End Card High- End Server SSD
Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
Storage Architecture Evolution : Technical Trend
New environment on the storage drives higher performance
Multi tasking with Swap IO
High performance App with fast data IO High bandwidth Network with User Data IO
2009 2010 2011 2012 2013 20 40 60 80 100 Former eMMC eMMC 4.4 150 UFS 11~ 40Mbps 802.11b 1Gbps 802.11ac 54Mbps 802.11g 600Mbps 802.11n 300Mbps 802.11n~ g 1.3Gbps UWB Netw ork Environment System Environment Data Environment - Single Task - Multi Task -App install -Web access -Multi Task
-Full brow sing
-Multimedia play
-Async I O & Sync I O -Multimedia Data -App Data -Data base -Sw ap Data -Web Cache -HD Media File -Small Random I O -Big Size Sequential Write -Text Data -Photo Data
Storage Bandw idth [ MB/ sec]
22 / 34
Architecture for high Performance and Low Latency
Architecture of “OneNAND + moviNAND” can be unified to “moviNAND” only w ith fast random w rite and low latency I mplementing HPI can resolve the problem
CTRL MLC On eNAND eM M C SLC CPU SLC MLC SLC CTRL CPU CPU MLC SLC CTRL DRAM eM M C M B/ S ( W r it e) TI M E
2 – channel storage 1 – channel storage 1 – channel storage
Preparing Host for New Features of eMMC & UFS
The enhancements in read/ w rite performance, data integrity at sudden system- pow er failure and low er pow er consumption at idle are only realized w ith the relevant support of file- system, and OS in some cases
Linux open source community is far behind in aligning to them Chipset and handset vendors should w ork out the responsibility details
Features Purpose eMMC UFS
Trim Write speed up √ √ Reliable write Data integrity at power-loss √ √ HPI Write-suspend for fast read access √ √ Command queuing Read/ Write speed up √ Sleep Power Lower power at idle √ √
24 / 34
Data Attributes in Mobile System
Data in mobile device has different data attribute
System data/ Code data / Swap data
-Syst em working, Mult i- t asking, Syst em applicat ion working
-High speed random I/ O perform ance & dat a reliabilit y are m ainly
required
User data
-High densit y Mult im edia dat a read & writ e
-Sequent ial I/ O is m ainly required
code System Meta Sw ap User Read Centric Sequential √ Random √ √ √ Write Centric Sequential √ Random √ √ Reliability √ √ √
New Feature for e- MMC 4.4 : Multi Partition
Flexibility to host to manage eMMC 4.4
4 general purpose partitions and enhanced user data area can be set in normal user data area
Partitions NAND type Default Size Remarks
Boot Area Partition 1 SLC Mode 128KB Size as multiple of 128KB (max. 32MB) Boot Area Partition 2 SLC Mode 128KB Size as multiple of 128KB (max. 32MB) RPMB Area Partition SLC Mode 128KB Size as multiple of 128KB (max. 32MB) General Purpose Partitions MLC “or”
Enhanced Area 0KB
Available size can be seen by following:
(EXT_CSD[145]* 82+ EXT_CSD[144]* 81+ EXT_CSD[143]) *
HC_WP_GPR_SIZE*HC_ERASE_GPR_SIZE * 512KB byte User Data
Area
Enhanced Area SLC Mode 0KB Start address multiple of Write Protect Group size
Default Area MLC 93.1%
Normal User Data Area
Boot 1 Boot 2 RPMB (Secure Data)
General Purpose Partition
Enhanced User data area(SLC mode)
26 / 34
Data Usage Model in eMMC4.4
SLC and MLC partitions to be tailored by use scenarios
Boot Area Partition RPMB Area Partition
eMMC 4.4
General Purpose Area Partition 1
General Purpose Area Partition 2
User Data Area Partition
Enhanced User Data Area Partition
Host Side Data
Boot Code System Code System File Security Data Sw ap Data Application Code User Data SLC MLC Read I O, Reliability Read I O, Reliability Read/ Write I O, Sequential Pattern
Better Multi- tasking Performance w ith Trim
Trim reduces long w rite latency and in turn read latency at multi- tasking. Mobile phones very likely having some free space benefit from Trim
No Im pact as File Delet ed
Perform ance Increase as Files Delet ed (Source : Microsoft )
Thread 1 (4k RD) Thread 2
(64K WR) Merge Program Merge Program
RD RD RD
Program Program Program Program
Thread 1 (4k RD) Thread 2 (64K WR) RD RD RD RD
28 / 34
How Write Latency Affects for Multi- tasking
PROCESS A PROCESS B Traditional eMMC HOST Device
Dow nload Data MP3 PLAY Write Read APP STOP M B/ s ( W r it e) TI M E I nternal Write ( Merge)
Multi- tasking needs low w rite latency in a time- out value for uninterrupted audio/ video play- back
I nterrupting Write- Busy Sustains Real- time Task
Long w rite- busy of eMMC should be interrupted for high priority real- time task such as audio/ video play- back
PROCESS A PROCESS B MoviNAND Start Write I O HPI CMD I O Resume Write I O of Process B Request
HPI Time- Out
Point Time- Out Point HOST DEVI CE Cache Data Write Video Data Playing Suspend CMD 0 STOP CMD 0 & I nterrupt CM D 0 CM D 1 RESUME CMD 0
30 / 34
Next Generation Embedded Storage : UFS
Universal Flash Storage based on serial M- Phy
Serial interface : 300MB/s bandwidth
Native Command Queuing : Support parallel NAND Flash working for Random/Sequential IO
Page mapping with DRAM : Reduce internal Merge operation
CTRL DRAM NCQ w / Q depth 4 CMD1 CMD2 CMD3 CMD4 HOST 300MB/ s Serial I nterface System Data Code data Sw ap data User data
Page Mapping & Data Buffer w ith DRAM
UFS Command Queuing Enhance Performance
Normal data buffered and w ritten to different NAND dies in parallel Host system should indicate critical data like file- system meta to be w ritten synchronously CMD ① CMD ③ CMD ② CMD ④ CMD ⑤ CMD ⑥ UFS CTRL NAND chip no.1 NAND chip no.2 Operate CMD①③ Operate CMD②⑤ Operate CMD ④⑥ Command Queuing Command Reordering & Parallel Request C M D ① C M D ③ C M D ② C M D ④ C M D ⑤ C M D ⑥
32 / 34
Queue Depth & Random I O Performance
* IOPS(Input/Output Operation Per Second) is a common benchmark for computer storage media Write
IOPS
Read IOPS
UFS w/o UCQ
UFS w/ UCQ Q Depth : 2
Q Depth : 1 Q Depth : 2 Q Depth : 4
Random Performance depends on parallel I O number
Optimized Command Queue Depth
Depends on the number of NAND channel or way in UFS Device Consider Data loss while sudden power off
Register for Dynamic setting of Command Queue Depth
UFS Standardization Roadmap
UFS version 1.0 is only the baseline
Additions for performance, low pow er and reliability w ill be put into the follow - up spec
2010 2011 Q3 Q4 Q1 Q2 Q3 Q4 UFS Specification e- MMC features ERASE/ TRI M/ Reliable w rite… NCQ Hybrid Pow er
Version 1.0 Version 1.x Final
1st showing
(Aug. interim meeting)
2nd showing
2nd showing
34 / 34
Conclusions
Various NAND Flash technology target different markets w ith different focus in architecture and performance, resulting in w ider product portfolio
High performance mobile systems should tailor the use of different NAND Flash technology of single storage w ith
regards to performance and reliability for various application scenarios
NAND- friendly system- level solutions such as Trim, HPI as w ell as Reliable Write on eMMC w ill play a critical role in managing high reliability and performance
Command queue architecture on UFS provides a path to
future high performance by significantly increasing effective I OPS