S. Vassiliadis et al. (Eds.): SAMOS 2007, LNCS 4599, pp. 46–54, 2007. © Springer-Verlag Berlin Heidelberg 2007
Model and Validation of Block Cleaning Cost for Flash Memory*,**
Seungjae Baek1, Jongmoo Choi1, Donghee Lee2, and Sam H. Noh3
1 Division of Information and Computer Science, Dankook University, Korea, Hannam-Dong, Yongsan-Gu, Seoul, 140-714 Korea
{ibanez1383,choijm}@dankook.ac.kr
2 Department of Computer Science, University of Seoul, Korea, Jeonnong-Dong, Dongdaemun-Gu, Seoul, 130-743 Korea
3 School of Computer and Information Engineering, Hongik University, Korea, Sangsu-Dong, Mapo-Gu, Seoul, 121-791, Korea
Abstract. Flash memory is a storage medium that is becoming more and more popular. Though not yet fully embraced in traditional computing systems, Flash memory is prevalent in embedded systems, materialized as commodity appliances such as the digital camera and the MP3 player that we enjoy in our everyday lives. The cost of block cleaning is an important factor that strongly influences Flash memory file system performance, analogous to the seek time in disk storage based systems. We show that three performance parameters, namely, the utilization, invalidity, and uniformity characteristics of Flash memory, strongly affect this block cleaning cost, and we present a model for the block cleaning cost based on these parameters. We validate this model using synthetic workloads on commercial Flash memory products.
Keywords: Flash memory, model, validation, block cleaning.
1 Introduction
Recent developments in Flash memory technology have brought about numerous products that make use of Flash memory. While still controversial, optimists envision that Flash memory will take over much of the territory that disk storage has been occupying. Whether this will happen remains to be seen [1]. However, one sure thing is that Flash memory is a storage medium that is being more and more widely used in everyday commodity embedded systems and is bringing about significant changes to the computing environment.
In view of these developments, in this paper, we explore and identify the characteristics of Flash memory and analyze how they influence the latency of data access. We identify the cost of block cleaning as the key characteristic that influences latency. A performance model for analyzing the cost of block cleaning is presented based on three parameters that we derive, namely, utilization, invalidity, and uniformity, which we define clearly later. We find that the cost of block cleaning is strongly influenced by uniformity, just as seek time is a strong influence on disk based storage.
* This work was supported in part by grant No. R01-2004-000-10188-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
**
The rest of the paper is organized as follows. In Section 2, we elaborate on the characteristics of Flash memory and on block cleaning, in particular. Then, we present the block cleaning cost model in Section 3. In Section 4, we present the experimental setting and results that are used to validate the model. We briefly discuss related works in Section 5 and conclude in Section 6.
2 Flash Memory and Block Cleaning
Flash memory that is most widely used today is either of the NOR type or the NAND type. Other types of Flash memory such as OR type or AND type do exist, but are not popular. One key difference between NOR and NAND Flash memory is the access granularity. NOR Flash memory supports word-level random access, while NAND Flash memory supports page-level random access. Hence, in embedded systems, NOR Flash memory is usually used to store code, while NAND Flash memory is used as storage for the file system. NOR and NAND Flash memory also differ in density, operational time, and bad block marking mechanisms.
Flash memory is organized as a set of blocks, each block consisting of a set of pages. According to the block size, NAND Flash memory is further divided into two classes, that is, small block NAND and large block NAND. In small block NAND Flash memory, each block has 32 pages, where the page size is 528 bytes. A 512-byte portion of these bytes is the data area used to store data, while the remaining 16-byte portion is called the spare area, which is generally used to store ECC and/or bookkeeping information. In large block NAND Flash memory, each block has 64 pages of 2112 bytes (2048 bytes for data area and 64 bytes for spare area).
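The page and block sizes above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original work; the class and constant names (NandGeometry, SMALL_BLOCK, LARGE_BLOCK) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandGeometry:
    """Hypothetical helper describing one class of NAND Flash memory."""
    pages_per_block: int
    data_bytes_per_page: int
    spare_bytes_per_page: int

    @property
    def page_bytes(self) -> int:
        # Total page size is the data area plus the spare area.
        return self.data_bytes_per_page + self.spare_bytes_per_page

    @property
    def block_data_bytes(self) -> int:
        # Usable data capacity of one block (spare area excluded).
        return self.pages_per_block * self.data_bytes_per_page


# Figures taken from the text above.
SMALL_BLOCK = NandGeometry(pages_per_block=32, data_bytes_per_page=512,
                           spare_bytes_per_page=16)
LARGE_BLOCK = NandGeometry(pages_per_block=64, data_bytes_per_page=2048,
                           spare_bytes_per_page=64)

for name, geom in (("small block", SMALL_BLOCK), ("large block", LARGE_BLOCK)):
    print(name, geom.page_bytes, "bytes per page,",
          geom.block_data_bytes // 1024, "KiB of data per block")
```

Under these figures, a small block holds 16 KiB of data and a large block holds 128 KiB, which is the unit that a single erase operation affects.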
Flash memory as a storage medium has characteristics that are different from traditional disk storage. These characteristics can be summarized as follows [2].
1. Access time in Flash memory is location independent similar to RAM. There is no “seek time” involved.
2. Overwrite is not possible in Flash memory. Flash memory is a form of EEPROM (Electrically Erasable Programmable Read Only Memory), so it needs to be erased before new data can be written.
3. Execution time for the basic operations in Flash memory is asymmetric. Traditionally, three basic operations, namely, read, write, and erase, are supported. An erase operation is used to clean used pages so that they may be written to again. In general, a write operation takes an order of magnitude longer than a read operation, while an erase operation takes another order of magnitude or more longer than a write operation.
4. The unit of operation is also different for the basic operations. While the erase operation is performed in block units, read/write operations are performed in page units.
5. The number of erasures possible on each block is limited, typically, to 100,000 or 1,000,000 times.
Let us now consider the specific operations used in Flash memory. Reading data from Flash memory is simply like reading from disk. The distinction from a disk is that all reads take a constant amount of time. For a write operation, a distinction has to be made between a new write and a write that is modifying existing data. When totally new data is being written, this is almost identical to a disk write, that is, a page is allocated and written to. However, there are occasions when no free pages are available to be written to. In such a case, an erase operation must precede the write operation. This will result in considerable delay in writing out the data.
For writes that update existing data, the story is totally different. As overwrite to the updated page is not possible, various mechanisms for non-in-place update have been developed [3,4,5,6]. Though specific details differ, the basic mechanism is to allocate a new page, write the updated data onto the new page, and then, invalidate the original page that holds the (now obsolete) original data. The original page now becomes a dead or invalid page. Likewise, in this situation, an erase operation may have to precede the write operation.
Fig. 1. Page state transition diagram
Note from the above discussion that a page can be in one of three states. A page holding legitimate data is in the valid state. If the page is deleted or updated, it becomes an invalid page and transitions into the invalid state. Note that a page in this state cannot be written to until the block it resides in is first erased. Finally, if the page has not been written to in the first place, or the block in which the page resides has just been erased, then the page is in the clean state. Figure 1 shows the state transition diagram of pages in Flash memory.
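The three page states can be sketched as a small state machine. This is an illustrative reading of the text, not a reproduction of Figure 1; the event names ("write", "update", "delete", "erase_block") and the function next_state are assumptions made for this sketch.

```python
from enum import Enum


class PageState(Enum):
    CLEAN = "clean"
    VALID = "valid"
    INVALID = "invalid"


# (current state, event) -> next state. Event names are hypothetical labels
# for the transitions described in the text.
TRANSITIONS = {
    (PageState.CLEAN, "write"): PageState.VALID,
    (PageState.VALID, "update"): PageState.INVALID,   # new copy goes elsewhere
    (PageState.VALID, "delete"): PageState.INVALID,
    # Erasing a block returns every page in it to the clean state. A file
    # system copies valid data out of the block before erasing it.
    (PageState.CLEAN, "erase_block"): PageState.CLEAN,
    (PageState.VALID, "erase_block"): PageState.CLEAN,
    (PageState.INVALID, "erase_block"): PageState.CLEAN,
}


def next_state(state: PageState, event: str) -> PageState:
    """Return the state after `event`, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value}") from None


state = PageState.CLEAN
for event in ("write", "update", "erase_block"):
    state = next_state(state, event)
    print(event, "->", state.value)
```

Note that there is no ("invalid", "write") entry: an invalid page can only be reused after its whole block has been erased.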
Note from the tri-state characteristics that the number of clean pages diminishes not only as new data is written, but also as existing data is updated. In order to store more data, and even to make updates to existing data, it is imperative that invalid pages be continually cleaned. Since cleaning is done via the erase operation, which is performed in block units, valid pages in the block to be erased must first be copied to a clean block. This exacerbates the already large erase overhead needed for cleaning a block.
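The copy-then-erase sequence described above can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are modeled as plain lists of page states, the function name clean_block is hypothetical, and no real Flash file system interface is used.

```python
CLEAN, VALID, INVALID = "clean", "valid", "invalid"


def clean_block(victim: list, target: list) -> dict:
    """Copy valid pages of `victim` into clean pages of `target`, then erase
    `victim`. Returns the number of basic operations that were needed."""
    copies = 0
    for page_state in victim:
        if page_state == VALID:
            # Each copy is one page read from the victim block and one page
            # write into a clean page of the target block.
            free_index = target.index(CLEAN)  # ValueError if target is full
            target[free_index] = VALID
            copies += 1
    # A single erase operation resets every page of the victim block.
    victim[:] = [CLEAN] * len(victim)
    return {"reads": copies, "writes": copies, "erases": 1}


victim = [VALID, INVALID, INVALID, VALID]
target = [CLEAN] * 4
print(clean_block(victim, target))
print(victim, target)
```

In this example two valid pages are copied, so reclaiming two invalid pages costs two reads and two writes in addition to the erase.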
3 Block Cleaning Cost Model
In this section, we identify the parameters that affect the cost of block cleaning. We formulate a cost model based on these parameters and analyze their effects on the cost.
3.1 Performance Parameters
Two types of block cleaning are possible in Flash memory. The first is when valid and invalid pages coexist within the block that is to be cleaned. Here, the valid pages must first be copied to a clean page in a different block before the erase operation on the block can happen. We shall refer to this type of cleaning as ‘copy-erase cleaning’. The other kind of cleaning is where no valid page exists in the block to be erased. All pages in this block are either invalid or clean. This cleaning imposes only a single erase operation, and we shall refer to this type of cleaning as ‘erase-only cleaning’.
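The distinction between the two kinds of cleaning can be expressed as a simple classification over the page states of a block. This is a minimal sketch; the function name cleaning_type and the treatment of blocks with no invalid pages (returning None because nothing can be reclaimed) are assumptions of this illustration.

```python
from typing import Optional, Sequence

CLEAN, VALID, INVALID = "clean", "valid", "invalid"


def cleaning_type(block: Sequence[str]) -> Optional[str]:
    """Classify how a block would be cleaned, based on its page states."""
    has_valid = VALID in block
    has_invalid = INVALID in block
    if has_valid and has_invalid:
        # Valid pages must be copied elsewhere before the erase.
        return "copy-erase"
    if has_invalid:
        # Only invalid (and possibly clean) pages: a single erase suffices.
        return "erase-only"
    # No invalid pages: erasing this block would not reclaim any space.
    return None


print(cleaning_type([VALID, INVALID, CLEAN, CLEAN]))      # copy-erase
print(cleaning_type([INVALID, INVALID, CLEAN, CLEAN]))    # erase-only
print(cleaning_type([VALID, VALID, CLEAN, CLEAN]))        # None
```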
Observe that for copy-erase cleaning the number of valid pages is a key factor that affects the cost of cleaning as all the valid pages need to be moved to another block before cleaning may happen. For erase-only cleaning, the way in which the invalid pages are distributed plays a key role. From these observations, we identify three parameters that affect the cost of block cleaning. They are defined as follows:
• Utilization (u): the fraction of valid pages in Flash memory
• Invalidity (i): the fraction of invalid pages in Flash memory
• Uniformity (p): the fraction of uniform blocks in Flash memory,
where a uniform block is a block that does not contain both valid and invalid pages simultaneously.
Fig. 2. Situation where utilization (u = 0.2) and invalidity (i = 0.2) remain unchanged, while uniformity (p) changes: (a) p = 0.2, (b) p = 0.6, (c) p = 1
Figure 2 shows three page allocation situations where utilization and invalidity are the same, but uniformity is different. Since there are four valid pages and four invalid pages among the 20 pages in all three cases, utilization and invalidity are both 0.2. However, there are one, three, and five uniform blocks in Figure 2(a), (b), and (c), hence uniformity is 0.2, 0.6, and 1, respectively.
Utilization determines, on average, the number of valid pages that need to be copied for copy-erase cleaning. Invalidity determines the number of blocks that are candidates for cleaning. Finally, uniformity refers to the fraction of uniform blocks. A uniform block is a block with zero or more clean pages and the remainder of the pages in the block are uniformly valid or uniformly invalid pages. Another definition of uniformity would be “1 – the fraction of blocks that have both valid and invalid pages.” Of all the uniform blocks, only those blocks containing invalid pages are candidates for erase-only cleaning.
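The three parameters can be computed directly from a page layout. The sketch below is illustrative only: the layouts are hypothetical arrangements of five blocks with four pages each, chosen so that they yield the parameter values quoted for Figure 2; they are not taken from the figure itself. The function name parameters is likewise an assumption.

```python
def parameters(blocks):
    """Return (utilization, invalidity, uniformity) for a list of blocks.

    Each block is a string of page states: 'V' valid, 'I' invalid, 'C' clean.
    """
    pages = [state for block in blocks for state in block]
    utilization = pages.count("V") / len(pages)
    invalidity = pages.count("I") / len(pages)
    # A uniform block does not hold valid and invalid pages at the same time.
    uniform_blocks = sum(1 for block in blocks
                         if not ("V" in block and "I" in block))
    uniformity = uniform_blocks / len(blocks)
    return utilization, invalidity, uniformity


# Hypothetical layouts: 20 pages, four valid and four invalid in each case.
layout_a = ["VICC", "VICC", "VICC", "VICC", "CCCC"]
layout_b = ["VVII", "VVII", "CCCC", "CCCC", "CCCC"]
layout_c = ["VVVV", "IIII", "CCCC", "CCCC", "CCCC"]

for name, layout in (("a", layout_a), ("b", layout_b), ("c", layout_c)):
    u, i, p = parameters(layout)
    print(f"({name}) u={u:.1f} i={i:.1f} p={p:.1f}")
```

In layout (c) the only block containing invalid pages holds no valid data, so it could be reclaimed with erase-only cleaning, whereas in layout (a) every block that holds invalid pages also holds a valid page.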
From these observations, we can formulate the cost of block cleaning as follows:
Cost of Block Cleaning = Cost of copy-erase cleaning + Cost of erase-only cleaning

\[
\text{Cost} = (1-p)\cdot\min(B,\ i\cdot P)\cdot\left((r_t + w_t)\cdot\frac{P}{B}\cdot u + e_t\right) \;+\; p\cdot B\cdot i\cdot e_t
\]

where
u: utilization (0 ≤ u ≤ 1)
i: invalidity (0 ≤ i ≤ 1 - u)
p: uniformity (0 ≤ p ≤ 1)
P: number of pages in Flash memory (P = capacity / size of page)
B: number of blocks in Flash memory (P/B: number of pages in a block)
r_t: read operation execution time
w_t: write operation execution time
e_t: erase operation execution time
The formula for the cost of block cleaning consists of two terms. The first term is the cost of copy-erase cleaning of non-uniform blocks, where (P/B)·u denotes the average number of valid pages in a block, which need to be copied. Each copy involves a read and a write operation, with costs r_t and w_t, respectively, after which an erase operation with cost e_t is performed. (Some Flash memory controllers support a copy-back operation, that is, copying of a page to another page internally [7]. On such devices, the copy-back execution time could be used in place of (r_t + w_t) as the copy overhead. In general, the copy-back execution time is similar to the write operation execution time.) This cleaning action is executed only on blocks that have invalid pages, denoted by min(B, i·P), and of those, only on the non-uniform blocks, represented by (1-p). The second term is the cost of cleaning uniform blocks that have invalid pages, denoted by p·B·i, at a cost of e_t erase time for each block.
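The cost formula translates directly into a short function. The following is a minimal sketch of that translation; the function name, the argument order, and the example values (a 20-page, 5-block memory with read, write, and erase times of 1, 2, and 10 in arbitrary time units) are assumptions chosen for illustration, not figures from the experiments.

```python
def block_cleaning_cost(u, i, p, P, B, rt, wt, et):
    """Estimated block cleaning cost, in the same time unit as rt, wt, et.

    u, i, p: utilization, invalidity, uniformity
    P, B:    number of pages and number of blocks in Flash memory
    rt, wt, et: read, write, and erase operation execution times
    """
    if not (0 <= u <= 1 and 0 <= i <= 1 - u and 0 <= p <= 1):
        raise ValueError("need 0 <= u <= 1, 0 <= i <= 1 - u, 0 <= p <= 1")
    pages_per_block = P / B
    # Copy-erase cleaning: non-uniform blocks among those with invalid pages.
    copy_erase = (1 - p) * min(B, i * P) * ((rt + wt) * pages_per_block * u + et)
    # Erase-only cleaning: uniform blocks that hold invalid pages.
    erase_only = p * B * i * et
    return copy_erase + erase_only


# Hypothetical example: 20 pages in 5 blocks, arbitrary time units.
for p in (0.2, 0.6, 1.0):
    cost = block_cleaning_cost(u=0.2, i=0.2, p=p, P=20, B=5, rt=1, wt=2, et=10)
    print(f"p={p:.1f}: cost={cost:.2f}")
```

With these example values the estimate falls as uniformity rises, because the copy-erase term shrinks with (1-p) while the erase-only term involves no page copies.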
4 Model Validation
In this section, we discuss the experimental environment used to validate the model and also present the validation results.
4.1 Platform and Workload
We use an embedded hardware platform to validate the block cleaning cost model. Hardware components of the system include a 400MHz XScale PXA CPU, 64MB SDRAM, 0.5MB NOR Flash memory, and embedded controllers. A small block 64MB NAND Flash memory that has 128K pages and 4096 blocks is used for Flash memory [7]. Table 1 summarizes the hardware components and their specifications [8].
Table 1. Hardware components and specifications
Hardware Component    Specification
CPU                   400MHz XScale PXA 255
RAM                   64MB SDRAM
Flash                 64MB NAND, 0.5MB NOR
Interface             CS8900, USB, RS232, JTAG
The Linux kernel 2.4.19 was ported to the hardware platform and YAFFS is used to manage the NAND Flash memory [3]. YAFFS uses the open(), read(), write() interface provided by the VFS layer. Below the YAFFS layer, the MTD layer uses the readchunkfromnand(), writechunktonand(), and eraseblockinnand() interface to actually access and control Flash memory [3].
The workload that we use in the experiments is the Postmark benchmark. This is a popular benchmark widely used for measuring file system performance [18,19]. This benchmark creates a large number of randomly sized files. It then executes read, write, delete, and append transactions on these files. We create 500 files (the default number) and perform 500 transactions (again, the default value) for our experiments.
To measure the block cleaning cost, we developed a tool that sets the state of Flash memory based on the three parameter values, which may be manually designated. Based on these settings, block cleaning is performed. Another tool that we developed is used to measure the actual cost of block cleaning. Both tools are implemented within YAFFS to validate our model. For space reasons, we do not elaborate on the details of these tools.
Another issue that must be clarified is the level at which the model is to be validated. In the model, the read, write, and erase times are used to estimate the block cleaning cost. The times used for these operations will strongly influence the model's estimate. The simplest way to determine these values is by using the data sheet provided by the Flash memory chip vendor. However, through experiments we observe that the values reported in the datasheet and the actual times seen at various levels of the system differ considerably. Figure 3 shows these results. The results show that while the datasheet reports read, write, and erase times of 0.01ms, 0.2ms, and 2ms, respectively, for the Flash memory used in our experiments, the times observed when directly accessing Flash memory at the device driver level are 0.19ms, 0.3ms, and 1.7ms, respectively. Furthermore, when observed just above the MTD layer, the read, write, and erase times are 0.2ms, 1.03ms, and 1.74ms, respectively. Which values are used in the model will strongly influence the accuracy of the model. In our study, we use the observations made just above the MTD layer as this level is where the block cleaning cost is measured.
Fig. 3. Execution time at each level
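To see how much the choice of timing level matters, the three sets of read, write, and erase times can be plugged into the cost formula of Section 3. The sketch below does this for the 64MB small block NAND described above (128K pages, 4096 blocks), assuming all three parameters are 0.5 purely for illustration. The resulting figures are model estimates computed for this sketch, not measurements, and the names used are hypothetical.

```python
def block_cleaning_cost(u, i, p, P, B, rt, wt, et):
    """Cost formula of Section 3; result is in the time unit of rt, wt, et."""
    copy_erase = (1 - p) * min(B, i * P) * ((rt + wt) * (P / B) * u + et)
    erase_only = p * B * i * et
    return copy_erase + erase_only


P, B = 128 * 1024, 4096  # pages and blocks of the 64MB small block NAND

# (read, write, erase) times in milliseconds at each level, from the text.
TIMES_MS = {
    "datasheet": (0.01, 0.2, 2.0),
    "device driver": (0.19, 0.3, 1.7),
    "above MTD": (0.2, 1.03, 1.74),
}

# Illustrative setting: u = i = p = 0.5.
estimates = {
    level: block_cleaning_cost(0.5, 0.5, 0.5, P, B, rt, wt, et)
    for level, (rt, wt, et) in TIMES_MS.items()
}
for level, cost_ms in estimates.items():
    print(f"{level:>13}: {cost_ms / 1000:.1f} s")
```

Under these assumptions the estimate is roughly 13.0 s with datasheet times, 21.3 s with device driver times, and 45.6 s with times observed above the MTD layer, which illustrates why the level chosen for the timing values has to match the level at which the cost is measured.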
4.2 Validation Results
Figure 4 compares the measured block cleaning cost and the cost estimated by the model. In each figure, the initial values of the three parameters are all set to 0.5. Then, we decrease utilization in Figure 4(a), increase uniformity in Figure 4(b), and decrease invalidity in Figure 4(c). The measured values and the estimated values show similar results as well as similar trends. The results indicate that the block cleaning model that we derived is fairly accurate.
Also, from these figures, we find that the impact of utilization and uniformity on block cleaning cost is higher than that of invalidity. Since utilization is not controllable by the system, this implies that to keep cleaning cost down, keeping uniformity high may be a better approach.
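The relative impact of the three parameters can also be read off the cost formula of Section 3. The sketch below varies one parameter at a time from the common starting point of 0.5, using the times observed above the MTD layer. It reports model estimates only and does not reproduce the measured curves of Figure 4; the end points of each sweep (0.1 and 0.9) and the helper names are assumptions of this illustration.

```python
def block_cleaning_cost(u, i, p, P, B, rt, wt, et):
    """Cost formula of Section 3; result is in the time unit of rt, wt, et."""
    copy_erase = (1 - p) * min(B, i * P) * ((rt + wt) * (P / B) * u + et)
    erase_only = p * B * i * et
    return copy_erase + erase_only


P, B = 128 * 1024, 4096        # 64MB small block NAND
RT, WT, ET = 0.2, 1.03, 1.74   # ms, observed just above the MTD layer


def cost(u=0.5, i=0.5, p=0.5):
    """Model estimate in ms with unspecified parameters left at 0.5."""
    return block_cleaning_cost(u, i, p, P, B, RT, WT, ET)


base = cost()
sweeps = {
    "utilization 0.5 -> 0.1": cost(u=0.1),
    "uniformity  0.5 -> 0.9": cost(p=0.9),
    "invalidity  0.5 -> 0.1": cost(i=0.1),
}
for label, value in sweeps.items():
    print(f"{label}: {100 * (base - value) / base:.0f}% lower than baseline")
```

In the model, lowering utilization or raising uniformity in this way reduces the estimate by roughly 71% and 74%, respectively, while lowering invalidity reduces it by only about 3%. The reason is that for this geometry min(B, i·P) stays at B over the whole invalidity sweep, so invalidity only affects the comparatively small erase-only term.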
5 Related Works
The issue of block cleaning has been considered for both Flash memory and disk based systems. For disk based systems, segment cleaning in the Log-structured File System (LFS) is closely related to block cleaning for Flash memory. LFS writes data to a clean segment and performs segment cleaning to reclaim space occupied by obsolete data [9,10,11,12,13].
In the Flash memory arena, block cleaning has been examined in many studies [4,14,15,16,17]. Among these, Kawaguchi et al. propose using two separate segments for cleaning: one for newly written data and the other for data to be copied during cleaning [4]. Wu and Zwaenepoel present a hybrid cleaning scheme that combines the FIFO algorithm for uniform access and the locality gathering algorithm for highly skewed access distributions [16].
These works, however, generally take an algorithmic approach to improve block cleaning. The focus of this paper is on identifying and modeling the key ingredients that affect block cleaning cost in Flash memory.
6 Conclusion
In this paper, we identify three performance parameters from features of Flash memory and derive a performance model for block cleaning cost based on these parameters. We validate the model through experimental measurements of block cleaning cost of a 64MB NAND Flash memory chip on an embedded board. The results show that the model that we propose accurately captures the block cleaning cost observed at the MTD layer.
Using this model, we are able to observe the factors that strongly influence block cleaning cost. These observations form the basis for our next step, which is to develop a new page allocation scheme. The new page allocation scheme should take into consideration the factor that most strongly influences block cleaning, namely, the uniformity factor.
References
1. Goldstein, H.: Too little, too soon [solid state flash memories]. IEEE Spectrum 43(1), 30–31 (2006)
2. Sharma, A.K.: Advanced Semiconductor Memories: Architectures, Designs, and Applications, WILEY Interscience (2003)
3. Aleph One, YAFFS: Yet another Flash file system, www.aleph1.co.uk/yaffs/
4. Kawaguchi, A., Nishioka, S., Motoda, H.: A Flash-memory based file system. In: Proceedings of the 1995 USENIX Annual Technical Conference, pp. 155–164 (1995)
5. Gal, E., Toledo, S.: A transactional Flash file system for microcontrollers. In: Proceedings of the 2005 USENIX Annual Technical Conference, pp. 89–104 (2005)
6. Woodhouse, D.: JFFS: The journaling Flash file system, Ottawa Linux Symposium (2001), http://source.redhat.com/jffs2/jffs2.pdf
7. Samsung Electronics, NAND Flash Data Sheet, http://www.samsung.com/Products/Semiconductor/NANDFlash
8. EZ-X5, www.falinux.com/zproducts
9. Rosenblum, M., Ousterhout, J.K.: The design and implementation of a log-structured file system. ACM Transactions on Computer Systems 10(1), 26–52 (1992)
10. Blackwell, T., Harris, J., Seltzer, M.: Heuristic cleaning algorithms in log-structured file systems. In: Proceedings of the 1995 Annual Technical Conference, pp. 277–288 (1995)
11. Matthews, J., Roselli, D., Costello, A., Wang, R., Anderson, T.: Improving the performance of log-structured file systems with adaptive methods. In: ACM Symposium on Operating Systems Principles (SOSP), pp. 238–251 (1997)
12. Wang, J., Hu, Y.: WOLF - a novel reordering write buffer to boost the performance of log-structured file systems. In: Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pp. 46–60 (2002)
13. Wang, W., Zhao, Y., Bunt, R.: HyLog: A High Performance Approach to Managing Disk Layout. In: Proceedings of the USENIX Conference on File and Storage Technologies (FAST), pp. 145–158 (2004)
14. Gal, E., Toledo, S.: Algorithms and Data Structures for Flash Memories. ACM Computing Surveys 37(2), 138–163 (2005)
15. Chiang, M-L., Lee, P.C.H., Chang, R-C.: Using data clustering to improve cleaning performance for Flash memory. Software: Practice and Experience 29(3), 267–290 (1999)
16. Wu, M., Zwaenepoel, W.: eNVy: a non-volatile, main memory storage system. In: Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 86–97 (1994)
17. Chang, L.P., Kuo, T.W., Lo, S.W.: Real-time garbage collection for Flash memory storage systems of real time embedded systems. ACM Transactions on Embedded Computing Systems 3(4), 837–863 (2004)
18. PostMark, http://www.netapp.com/ftp/postmark-1_5.c
19. Katcher, J.: PostMark: A New File System Benchmark. Technical Report TR3022, Network Appliance Inc. (1997)