Storage Channels with Write Errors: Two-dimensional Magnetic Recording and Advanced Memory Systems



Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in the

Department of Electrical and Computer Engineering

Euiseok Hwang

B.S., Nuclear Engineering, Seoul National University

M.S., Mechanical Design and Production Engineering, Seoul National University

M.S., Electrical and Computer Engineering, Carnegie Mellon University

Carnegie Mellon University, Pittsburgh, PA


Carnegie Mellon University

CARNEGIE INSTITUTE OF TECHNOLOGY

THESIS

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF Doctor of Philosophy

TITLE Storage Channels with Write Errors:

PRESENTED BY Euiseok Hwang

ACCEPTED BY THE DEPARTMENT OF

Electrical and Computer Engineering

CO-ADVISOR, MAJOR PROFESSOR DATE

DEPARTMENT HEAD DATE

APPROVED BY THE COLLEGE COUNCIL


THESIS COMMITTEE

Prof. Vijayakumar Bhagavatula, Co-advisor

Prof. Rohit Negi, Co-advisor

Prof. James Bain


Storage Channels with Write Errors:

Copyright © 2011

by Euiseok Hwang


Abstract

For emerging data storage systems, allowing errors during writing and correcting them with customized coding and signal processing during reading may provide higher customer densities than the conventional approach of trying to avoid write errors completely. Two high-density storage channels with write errors are investigated in this thesis: two-dimensional magnetic recording (TDMR) and advanced memory systems.

The main contents of this thesis are (1) channel models for channels with write errors; (2) investigation of the information-theoretic limits of channels with write errors; and (3) coding schemes with side information for TDMR and advanced memory systems. Channels with write errors are modeled by state-dependent channel models with side information, where the side information about write errors can be used during encoding, decoding, or both. The information-theoretic study shows that performance improvements can potentially be achieved by taking write errors into account as side information during coding. Several coding schemes are proposed for TDMR and advanced memory systems, where the encoder can use complete or partial side information about the channel states associated with write errors. Numerical evaluations show that the proposed coding schemes using side information outperform conventional schemes by taking advantage of write errors in these emerging storage channels.

The contributions of this thesis to the field of coding and signal processing for emerging data storage systems can be summarized as follows. Channel capacity bounds are derived (1) for TDMR channels with write errors based on a random Voronoi grains media model with idealized write and readback and (2) for memory channels with write errors where error correcting code (ECC)-provided partial side information is available at the encoder. Then, for TDMR, (3) additive encoding with low-density parity-check (LDPC) codes using complete or partial side information is proposed for ideal write and readback TDMR. In addition, (4) coding and signal processing for channels with write errors and other read channel impairments are investigated, including the effect of position and timing uncertainty. For memory systems, (5) a scrubbing scheme using side information is proposed for radiation-tolerant memory systems and (6) an iterative encoding scheme with side information based on the cross-entropy method is proposed for advanced memory systems. Also, (7) an endurance coding scheme is proposed for flash memory systems, which can enhance the lifetime of flash memory systems by reducing the amount of charge travelling through the tunneling barrier in flash memory cells. These coding schemes are evaluated numerically using MATLAB and C, and simulation results show significant improvements in customer bit storage density with the proposed coding schemes.


Acknowledgements

I would like to express my sincere appreciation to my advisors, Prof. Vijayakumar Bhagavatula and Prof. Rohit Negi, for their invaluable advice, constant and tremendous help, and encouragement through the course of my research. They provided brilliant guidance with a well-balanced blend of theory and practice for my thesis.

I am grateful to the members of my doctoral committee, Dr. Roger Wood (Hitachi Global Storage Technologies) and Prof. James A. Bain, for their valuable comments on the thesis. I had a very pleasant experience working with Dr. Wood during my research in two-dimensional magnetic recording (TDMR). Prof. Bain's course on storage systems helped me a great deal in my research on emerging memory systems.

The majority of my thesis work was supported by the Data Storage Systems Center (DSSC) at CMU, the Information Storage Industry Consortium (INSIC), and the Jet Propulsion Laboratory (JPL). I am extremely grateful for their financial support. I also want to thank Prof. Jian-Gang (Jimmy) Zhu, Dr. Michael K. Cheng from JPL, and Dr. Kheong Sann Chan from the Data Storage Institute (DSI), who contributed resources, expertise, and/or guidance to my thesis research.

It was a great time for me to be with my colleagues Andres, Andrew, Hyunggi, Jon, Joseph, Kathy, Kyuhwan, Lakshmi, Murali, Naresh, Qi, Qiao, Ramu, Satashu, Seungjune, Soowoong, Sheida, Sungchul, Vinay, Xinde, and Yibin. I thank them for sharing happiness and sadness in these years with me; they will be my unforgettable friends. I also thank Marilyn L. Patete and Elaine Lawrence for their patience and warm support for me and other students.

My deepest thanks go to my family. Their support and encouragement had a significant influence in my life. I have been blessed to have the companionship of my wife, Yuenha, and my daughter, Alicia. Their love, support and patience through my Ph.D. program have been essential in seeing its completion.


1 Introduction
  1.1 Motivation: high capacity storage systems by taking into account write errors
  1.2 Introduction to two-dimensional magnetic recording (TDMR) with write errors
  1.3 Introduction to advanced memory systems with write errors
  1.4 Thesis contributions
  1.5 Thesis organization

2 Information-Theoretic Limits on Storage Channels with Write Errors
  2.1 State-dependent channel model with side information
  2.2 Information-theoretic limits for TDMR channels with write errors
    2.2.1 Review of capacity bounds for TDMR channels
    2.2.2 Capacity bounds of an ideal write and readback TDMR channel
  2.3 Information-theoretic limits for advanced memory channels with write errors
    2.3.1 Review of capacity bounds for memory channels
    2.3.2 Capacity bounds for a memory channel for space applications using error correcting code (ECC)-provided partial side information
  2.4 Summary

3 Conventional Approaches for Channels with Write Errors
  3.1 State-agnostic approaches
  3.2 Coding with side information at the decoder (CSID)
  3.3 Coding with side information at the encoder (CSIE)
    3.3.1 Random binning scheme
    3.3.2 Partitioned linear block code (PLBC)
  3.4 Numerical evaluation
  3.5 Summary

4 Coding for Two-dimensional Magnetic Recording (TDMR) with Write Errors
  4.1 Coding with side information for an ideal write and readback TDMR channel
    4.1.1 Additive encoding with low-density parity-check (AE-LDPC) code for an ideal write and readback TDMR channel
    4.1.2 Numerical evaluation
  4.2 Coding and signal processing for TDMR channel with write errors and read channel impairments
    4.2.1 TDMR channel model with write errors and read channel impairments
    4.2.2 Position and timing estimation for TDMR
    4.2.3 Read channel signal processing and coding for TDMR
  4.3 Summary

5 Coding for Advanced Memory Systems with Write Errors
  5.1 Coding for memory systems in space
    5.1.1 Structured binning schemes
    5.1.2 Coding with ECC-provided partial side information at the encoder in memory scrubbing channels
  5.2 Iterative encoding with side information for memory systems
    5.2.1 Cross-entropy encoding with side information
  5.3 Coding for Flash memory endurance
    5.3.1 Endurance coding with side information

6 Conclusions
  6.1 Thesis contributions
  6.2 Future work

A Capacity and symmetric information rate (SIR) for a 2D channel

B Capacity of a binary asymmetric channel (BAC)


List of Figures

1.1 Illustration of shingled magnetic recording for two-dimensional magnetic recording (TDMR) [70]. The corner head writes a bit to the area wider and longer than the bit cell, while sequential writing with heavy overlap between tracks enables an extremely dense recording.

2.1 State-dependent channel model with side information about the channel states. $S$ represents the channel state, and $S_e$ and $S_d$ denote the side information about the channel state available at the encoder and decoder, respectively. A message $W$ is encoded with side information $S_e$ to a codeword $X$. After the channel, the transmitted codeword becomes $Y$ and is decoded with side information $S_d$ to the estimated message $\hat{W}$.

2.2 A state-dependent channel model for TDMR approximated by a simple binary discrete memoryless channel (DMC) model. The state of the cell, $s$, can be $\circ$ or $\times$, corresponding to normal and write error states, respectively. The write error probability is $q = \Pr(s = \times)$, where the output of a write error bit-cell is either 0 or 1, depending on the probability of the input $x$. The random channel error probability is $p$.

2.3 Illustrative example of an ideal write and readback TDMR channel. (a) Input channel-bit array where the dashed line denotes the bit-cell boundaries and white and gray cells represent $-1$ and $1$, respectively, (b) random Voronoi grains generated with random offsets from a von Mises distribution with $\kappa = 1$, where crosses denote grain cell nuclei and the asterisks correspond to the bit-cell centers, (c) recorded grains, and (d) ideal readback channel-bit outputs.

2.4 Equivalent grain shapes of an ideal write and readback TDMR channel. (a) Random Voronoi grains and (b) their equivalent grain shapes.

2.5 Histograms of a posteriori probabilities (APPs) based on (a) a row,

2.6 A state-dependent channel model for memory systems with hard errors [19]. The state of the cell, $s$, can be $\alpha$, 1, and 0, corresponding to normal, stuck-at 1 and stuck-at 0 states, respectively. The hard error probability is $q = \Pr(s \neq \alpha)$, where all hard errors are assumed to be either stuck-at 0 or stuck-at 1 with equal probability. The soft error probability is $p$.

2.7 Channel capacities of coding with and without side information. The soft error probability $p$ is fixed at $10^{-6}$ and the hard error probability $q$ is varied from $10^{-4}$ to $10^{-1}$.

2.8 Memory scrubbing with partial side information at the encoder with a memory scrubbing period of duration $T_s$.

2.9 A binary asymmetric channel (BAC) model for the modified states $S'$. The state transition probabilities are listed in Table 2.3.

2.10 Channel capacities of coding with and without side information. The soft error probability $p_T$ is fixed at $10^{-6}$. The curves $C_{\max}$ and $C_{\min}$ are the channel capacities for the original channel of Fig. 2.6 when complete side information is available at the encoder and when no side information is available at all, respectively. The $C_{\max}\{S'\}$ curve is the capacity upper bound of the channels with ECC-provided partial side information available at the encoder.

3.1 Block error rates (BLERs) of the readdressing scheme ($n = 1068$, $k = 1024$, $t_e = 4$) and state-agnostic BCH code ($n = 1023$, $k = 983$, $t = 4$) with coding overheads of 0.041 and 0.039, respectively. The x-axis is the number of stuck cells out of $n$ cells.

3.2 BLERs of various write error handling schemes with varying overheads. The length of a block is $n \sim 2^{10}$ and the hard error probability is $q = 2 \times 10^{-3}$. No random channel (soft) errors are considered in the simulations.

4.1 The means and standard deviations of residual bit error rates (BERs) of the 4k-bit AE-LDPC encoded blocks as a function of the coding overhead $k_r/n$ devoted to additive encoding with complete side information (CSI) and partial side information (PSI).

4.2 The means and standard deviations of residual bit error rates (BERs) of the 16k-bit AE-LDPC encoded blocks as a function of the coding overhead $k_r/n$ devoted to additive encoding with CSI and PSI.

4.3 Block error rates (BLERs) of the 4k-bit AE-LDPC coding schemes with complete (CSI) and partial side information (PSI) as a function of customer density (user-bit per grain).

4.4 Block error rates (BLERs) of the 16k-bit AE-LDPC coding schemes with CSI

4.5 Illustration of a data block with position and timing fields written upon a random Voronoi grains medium. Separate timing fields are written for odd and even tracks, allowing clearer signals to be picked up by the relatively wide read head and allowing the relative timing of adjacent tracks to be discerned. In this model, grains are written at their centroids using an ideal rectangular head.

4.6 An example of TDMR readback with a 2D Gaussian read sensitivity function (left) and fixed position and timing offsets. The scanned data (middle) is blurred with continuously varying levels, rather than red and blue (binary) in Fig. 4.5. The readback samples (right) experience 2D interference, jitters, and AWGN in the advanced TDMR channel.

4.7 Conditional probability density functions (pdfs) of equalized output, $z$. The $\hat{f}_x(z)$'s are empirical pdfs and the $\hat{f}_x^{G}(z)$'s and $\hat{f}_x^{Gm}(z)$'s are pdfs fitted to Gaussian and Gaussian mixture noise models, respectively.

4.8 Log likelihood ratios (LLRs) based on empirical, single Gaussian, and Gaussian mixture pdfs for an advanced TDMR channel.

4.9 ECC corrected error rates as a function of net customer areal-density (Tb/in$^2$) including all the overhead required for timing and position fields. An error-rate near $10^{-5}$ is taken as the threshold below which reliable performance can be achieved.

5.1 Simulation results of structured binning schemes in a channel with only hard errors, $q_T = 10^{-3}$ and $p_T = 0$. Block error rates are evaluated as a function of coding overhead with the length of codeword selected to be $n = 1024$.

5.2 Simulation of ECC concatenated with structured binning schemes in a channel with hard ($q_T = 10^{-3}$) and soft ($p_T = 10^{-6}$) errors. Performances of a $t = 4$-error correcting (1023, 983) BCH code and a $t = 5$-error correcting (1023, 973) BCH code without CSIE but with erasing the hard errors are provided for comparison.

5.3 Simulation of a (1024, 983) BCH code concatenated with coding with partial side information. The partial side information is derived from the BCH decoder. The hard and soft error rates are $\lambda_h = \lambda_s = 10^{-6}$ and the scrubbing interval is 2 hours.

5.4 Histograms of number of residual errors from $l = 40$ hard errors from initial and several intermediate steps of cross-entropy encoding. An ($n = 1023$, $k = 823$) cross-entropy coding with N = 105, $\beta = 0.2$ is applied for $l = 40$ hard errors and histograms of

5.5 Numerical evaluations of residual errors after cross-entropy encoding with varying $N$ and $\beta$. The length of codeword is fixed to $n = 1023$ and randomly generated 40 hard errors are tested.

5.6 Numerical simulations of block error rates (BLERs) with cross-entropy coding schemes. The length of codeword is fixed to $n = 1023$ and randomly generated 40 hard errors are tested. No soft errors are considered in the simulations.

5.7 Numerical simulations of block error rates (BLERs) with cross-entropy coding schemes. The length of codeword is fixed to $n = 1023$ and randomly generated 40 hard errors are tested with soft errors of probability $p = 10^{-4}$.

5.8 Illustration of Waterfall coding [47] for $q = 16$-level MLC and $b = 2$-bit per write. By Waterfall coding, the 16-level MLC can be used 5 times as a 2-bit storage before erase.

5.9 Wear cost distributions of Waterfall coding and endurance coding with side information. The mean and variance of the wear cost can be significantly reduced by the proposed endurance coding schemes using side information.

5.10 Endurance gain with endurance coding with side information for $q = 16$ and $n = 64$. By endurance coding with side information, the endurance can be improved significantly by trading the instantaneous capacity $C_i$.

5.11 Lifetime capacity gain with endurance coding with side information for $q = 16$ and $n = 64$. By endurance coding with side information, the lifetime capacity can be more than 7 times higher than the capacity without endurance coding, which is nearly twice the sum capacities of the Waterfall codes.

B.1 Binary asymmetric channel (BAC) with cross-over probabilities $\Pr(Y = 0 \mid X = 1) = p_{10}$ and $\Pr(Y = 1 \mid X = 0) = p_{01}$.


List of Tables

2.1 Ten possible grain shapes and their corresponding probabilities in the idealized TDMR channel of half grains per channel-bit model (mean area of Voronoi cell is 2 channel-bit units). The grains are generated with random offsets from a von Mises distribution with $\kappa = 0$, 1, and 2, where the standard deviations of the area of the cells are 0.488, 0.352, and 0.249 channel-bit units, respectively.

2.2 Lower bound of the SIR (information bits / grain) of the idealized TDMR channel based on APPs, $\{p(x_{i,j} \mid y_{i,1}^{i,N})\}$ and $\{p(x_{i,j} \mid y_{i-1,1}^{i+1,N})\}$. Voronoi grains generated for Table 2.1 are used for computing information rates.

2.3 Transition probabilities for the state-dependent channel model illustrated in Fig. 2.9. There are four possible states and the crossover probabilities vary depending on the channel state.

3.1 An example of mapping table for an ($n = 8$, $k = 6$) random binning scheme

4.1 TDMR channel simulation results with a perturbed lattice random Voronoi grains model


Chapter 1

Introduction

1.1 Motivation: high capacity storage systems by taking into account write errors

A recent International Data Corporation (IDC) study on the Digital Universe, which measures and forecasts the data growth rate, indicates an exponential increase in the amount of digital information being created and an emerging gap between created information and available storage [15]. In order to meet the ever-increasing demand for data storage, significant effort has been devoted to increasing the storage capacity of magnetic recording and semiconductor memory systems. In this thesis, aggressively scaled storage channels with possible write errors are investigated, where the overall performance of the storage systems can be further improved by advanced coding and signal processing that take into account the write errors. As processing technology improves, the additional computation needed for advanced coding and signal processing can become less expensive than trying to avoid write errors.

In traditional data storage systems, the writing process is assumed to be perfect. Thus, most research is focused on read channel signal processing without much concern for writing impairments. Nevertheless, a complete writing strategy (i.e., a write strategy that attempts to avoid any write errors) may use the limited resources inefficiently or may not even be possible (e.g., if the media grain sizes are larger than the intended bit sizes). Magnetic recording systems use a recording density (grains per channel-bit) high enough to avoid possible failures in writing [67], where each bit corresponds to many grains. Memory systems reserve a part of the memory cells for re-mapping defective cells, or blocks containing such cells, that cannot be written [6; 61]. In contrast, this thesis investigates a novel approach that allows write errors, while utilizing the resulting freed resources for aggressive channel coding and signal processing. Although the increased channel impairments resulting from write errors degrade the quality of the raw readback signals, it is hoped that aggressive channel coding and signal processing schemes using additional resources can ensure sufficient reliability of user data. By carefully designing efficient but powerful data protection schemes that take into account write errors, the overall performance of the storage systems can be improved. Two such storage channels with write errors are investigated in this thesis: two-dimensional magnetic recording (TDMR) [7; 8; 25; 26; 28; 40; 41; 45; 65; 70] and advanced memory systems with possible defective cells [4; 14; 19; 51; 55].

The question this thesis investigates is whether using resources for aggressive coding and signal processing while allowing channel write errors is beneficial in terms of overall performance, quantified by either storage capacity (user-bit density) or reliability (bit or block error rate). The channels with write errors can be described by a state-dependent channel model, where the writing impairments depend on the channel states [19; 43; 46]. The side information, also called channel state information, may be available at the encoder or decoder, and can increase the capacity of the channels [16; 19]. For example, defective cells can be detected by memory cell tests, and their locations can be used as side information during encoding or decoding for memory systems. These problems are well known as channel coding problems with side information at the encoder (CSIE) or decoder (CSID), also denoted as side information at the transmitter (CSIT) or receiver (CSIR). For TDMR and advanced memory systems, the side information about write errors may be available to the encoder or decoder after special tests such as read-after-write tests. Alternatively, side information may be partially available to the encoder or decoder with negligible additional overhead. Using this state-dependent channel model with complete or partial side information, this thesis investigates the information-theoretic limits of TDMR and advanced memory systems to quantify the possible performance gain. Then, in order to determine the actual performance gain provided by finite-complexity algorithms, practical schemes for obtaining the side information and using it during coding and signal processing are proposed for TDMR and advanced memory systems.
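The state-dependent channel described above can be illustrated with a short simulation. The following Python sketch is only a toy model under stated assumptions, not the channel model used in the later chapters: the function names, the use of `None` to mark a normal cell, the equal split between stuck-at 0 and stuck-at 1, and the choice that soft errors affect only normal cells are all illustrative assumptions.

```python
import random

def draw_states(n, q, rng):
    """Draw n cell states: None = normal cell, 0 or 1 = cell stuck at that value.

    Assumption: each cell is defective with probability q, and a defective
    cell is stuck at 0 or 1 with equal probability.
    """
    return [rng.randrange(2) if rng.random() < q else None for _ in range(n)]

def write_and_read(bits, states, p, rng):
    """Write bits into the cells and read them back.

    Assumption: a stuck cell always returns its stuck value; a normal cell
    returns the written bit, flipped with soft error probability p.
    """
    out = []
    for x, s in zip(bits, states):
        if s is not None:
            out.append(s)
        else:
            out.append(x ^ 1 if rng.random() < p else x)
    return out

def write_error_positions(bits, states):
    """Positions where a stuck value disagrees with the data to be written.

    This is what an encoder holding complete side information could
    determine before writing.
    """
    return [i for i, (x, s) in enumerate(zip(bits, states))
            if s is not None and s != x]

if __name__ == "__main__":
    rng = random.Random(2011)
    n, q, p = 1024, 1e-2, 1e-4  # illustrative parameters only
    states = draw_states(n, q, rng)
    data = [rng.randrange(2) for _ in range(n)]
    readback = write_and_read(data, states, p, rng)
    print("defective cells:", sum(s is not None for s in states))
    print("write errors visible to the encoder:",
          len(write_error_positions(data, states)))
    print("raw readback errors:", sum(x != y for x, y in zip(data, readback)))
```

Only about half of the defective cells produce a write error for a given data block, since a stuck value agrees with a random data bit half of the time; side information lets the encoder or decoder tell these cases apart.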

1.2 Introduction to two-dimensional magnetic recording (TDMR) with write errors

Two-dimensional magnetic recording (TDMR) has been proposed to achieve user-bit density beyond 1 Tb/in$^2$ by recording a channel-bit on as few as 0.5 magnetic grains in conventional media, while advanced two-dimensional (2D) signal processing and powerful error correcting codes (ECCs) may provide such high storage density with sufficient reliability [70]. Shingled magnetic recording (SMR) [17; 70] is employed for TDMR to record such small bits with overlap in the layout of data tracks, as illustrated in Fig. 1.1. During sequential SMR writing, some channel-bits (as many as 50%) may be overwritten by the recording of subsequent channel-bits when a channel-bit shares its grain with others, causing write errors. Because of its aggressive scaling of the channel-bit size, TDMR suffers from severe 2D (i.e., down-track and cross-track) interference and uncertainty in channel-bit positions. On the other hand, such a dense recording (with write errors) may provide enough overhead for coding and signal processing to recover data reliably. Consequently, it is hypothesized that the overall storage capacity of TDMR may be higher than that of the conventional magnetic recording approaches.

Figure 1.1. Illustration of shingled magnetic recording for two-dimensional magnetic recording (TDMR) [70]. The corner head writes a bit to the area wider and longer than the bit cell, while sequential writing with heavy overlap between tracks enables an extremely dense recording.

Therefore, the initial TDMR research is aimed at investigating whether TDMR can achieve such a high user-bit density, based on channel models. Theoretical channel capacities of TDMR have been approximated using simple binary discrete memoryless channel (DMC) models, such as the binary symmetric channel (BSC) and the binary erasure and error channel (BEEC) [70]. In addition, simplified grain models have been investigated for computing channel capacity bounds by employing idealized one-dimensional (1D) grains or 2D rectangular grains [40; 41; 70], where the grains are perfectly aligned to the channel bit-cell boundaries.
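For reference, the capacities of the two DMC approximations named above have simple closed forms. The Python sketch below evaluates them under an assumed parameterization: a BSC with crossover probability `p`, and a BEEC in which each input is erased with probability `e`, flipped with probability `p`, and received correctly otherwise. The function names and this parameterization are illustrative assumptions and may differ from the one used in [70].

```python
from math import log2

def h2(p):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def bsc_capacity(p):
    """Capacity (bits per channel use) of a BSC with crossover probability p."""
    return 1.0 - h2(p)

def beec_capacity(e, p):
    """Capacity of a symmetric binary erasure-and-error channel.

    Assumption: erasure probability e and error probability p with e + p <= 1.
    A uniform input achieves C = (1 - e) * (1 - h2(p / (1 - e))).
    """
    if e >= 1.0:
        return 0.0
    return (1.0 - e) * (1.0 - h2(p / (1.0 - e)))

if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1):
        print(f"BSC  p={p}: C = {bsc_capacity(p):.4f} bits/channel-bit")
    for e, p in ((0.2, 0.0), (0.2, 0.05)):
        print(f"BEEC e={e}, p={p}: C = {beec_capacity(e, p):.4f} bits/channel-bit")
```

With `p = 0` the BEEC expression reduces to the erasure channel capacity `1 - e`, and with `e = 0` it reduces to the BSC capacity, which provides a quick consistency check on the formula.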

For more accurate assessments of the customer density of TDMR, two TDMR channel models with random Voronoi grains are investigated in this thesis. An idealized write and readback TDMR channel model is introduced to find a tighter lower bound of the TDMR channel capacity in Section 2.2, and a new coding with side information at the encoder (CSIE) scheme for this idealized channel is developed in Section 4.1. Obtaining complete side information for TDMR may be challenging due to the high write error probability (around 1/2) and severe read channel impairments. Thus, coding with partial side information is also investigated, where partial side information can be extracted from the distribution of the grains, instead of complicated read-after-write tests. In addition, an advanced TDMR channel model incorporating write errors, read channel impairments, and positioning and timing uncertainty is investigated in Section 4.2 for numerical evaluation of the storage capacity of the TDMR channel.

1.3 Introduction to advanced memory systems with write errors

Memory devices experience errors in the form of permanent defects (hard errors causing write errors) and temporary upsets (soft errors). Because of the aggressive scaling of the technology node and challenging workload environments, memory cells are becoming more vulnerable to write errors. For example, multi-level cell (MLC) Flash memory shows write errors after a limited number of program/erase (P/E) cycles [4; 5; 51; 72]. Phase change memory (PCM) cells have also been observed to get stuck in one of the states (low or high resistance) after a number of P/E cycles [14; 44; 54]. Semiconductor memory cells in a space radiation environment may experience a similar increase in cell defects over time, due to the accumulated effect of radiation [14; 55]. In order to investigate memory channels with such write errors, a state-dependent channel model with side information was introduced and information-theoretic limits were investigated [16; 19; 46]. The results of these information rate investigations show that using side information during coding offers a potential performance improvement by taking advantage of write errors. In order to achieve this performance gain from the channels with write errors, several coding schemes using side information during encoding or decoding have been discussed [12; 18; 48; 75], and are summarized in Chapter 3.
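The size of the potential gain from side information can be seen in the simplest special case of the stuck-at model. The Python sketch below assumes no soft errors and hard errors that are stuck-at 0 or stuck-at 1 with equal probability; under these assumptions, a system that ignores the defects sees a binary symmetric channel with crossover probability q/2, whereas the classical stuck-at memory result gives a capacity of 1 - q when the defect locations and values are known at the encoder (the same value is obtained when the defect locations are known at the decoder and treated as erasures). The function names are illustrative and the sketch does not reproduce the capacity computations of Chapter 2.

```python
from math import log2

def h2(p):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def capacity_without_side_info(q):
    """Stuck-at channel with no side information and no soft errors.

    Assumption: a cell is stuck with probability q, at 0 or 1 with equal
    probability, so a written bit is read back wrongly with probability q / 2.
    """
    return 1.0 - h2(q / 2.0)

def capacity_with_side_info(q):
    """Same channel with the defect information available at the encoder."""
    return 1.0 - q

if __name__ == "__main__":
    for q in (1e-4, 1e-3, 1e-2, 1e-1):
        c_min = capacity_without_side_info(q)
        c_max = capacity_with_side_info(q)
        print(f"q={q:g}: without side info {c_min:.6f}, "
              f"with side info {c_max:.6f}, "
              f"overhead ratio {(1.0 - c_min) / (1.0 - c_max):.2f}")
```

The printed ratio compares the minimum redundancy needed without and with side information; it grows as q becomes smaller, which is one way to see why side information is attractive even when defects are rare.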

This thesis further addresses efficient ways to use side information for memory systems with write errors, focused on coding with side information at the encoder (CSIE). In order to alleviate the burden of the memory cell tests for acquiring side information, channels with error correcting code (ECC)-provided partial side information are investigated in Sections 2.3 and 5.1. For the special case of memory systems in space, scrubbing periodically updates the memory contents with ECC decoding, and coding with ECC-provided partial side information can yield performance improvement without additional cell tests. In addition, a new iterative encoding scheme is developed in Section 5.2 for memory systems with severe write errors, where conventional CSIE schemes become grossly inefficient. Finally, CSIE schemes can also be applied to Flash memory systems to suppress the wear of memory cells and to improve the endurance performance, which is discussed in Section 5.3.

1.4 Thesis contributions

The contributions of this thesis to the field of channel coding and signal processing for emerging data storage systems can be summarized as follows. Storage channels with write errors are modeled by state-dependent channels with side information and channel capacity bounds are derived (1) for TDMR channels with write errors based on a random Voronoi grains media model and idealized write and readback and (2) for memory channels with write errors where ECC-provided partial side information is available at the encoder. Then, for TDMR, (3) an additive encoding scheme with low-density parity-check codes (AE-LDPC) using complete or partial side information is proposed for the idealized TDMR channel. In addition, (4) TDMR channels with write errors are investigated in conjunction with read channel impairments, including the effect of position and timing uncertainty. For memory systems, (5) a scrubbing scheme using partial side information is proposed for radiation-tolerant memory systems and (6) an iterative additive encoding scheme is proposed based on the cross-entropy method for memory systems suffering from both hard (stuck-at) and soft (transient) errors. Also, (7) an endurance coding scheme is proposed for multi-level cell (MLC) flash memory, where coding with side information can enhance the lifetime of flash memory systems by reducing the amount of charge travelling through the tunneling barrier in the cell. These coding schemes are evaluated numerically using MATLAB and C, where simulation results show that the proposed CSIE schemes provide favorable performance gains in customer storage density while ensuring sufficient reliability.

1.5 Thesis organization

In Chapter 2, information-theoretic limits of the channels with write errors are investigated. State-dependent channel models with side information or channel state information are introduced in Section 2.1, where the side information represents the state of the bit-cell (for TDMR) or the memory-cell (in memory systems), i.e., whether the cell can be written as desired or not. Channel capacity bounds of the two channels with write errors are discussed in Sections 2.2.1 and 2.3.1, respectively. Furthermore, capacity bounds of two specific channel models with write errors, an ideal write and readback TDMR channel and a memory scrubbing channel in a space radiation environment, are investigated in Sections 2.2.2 and 2.3.2, respectively.

Chapter 3 reviews conventional coding and signal processing schemes proposed for the channels with write errors, particularly focused on memory channels with hard errors, since magnetic recording systems with write errors have not received much research attention until now. Section 3.1 summarizes state-agnostic schemes, where coding and signal processing avoid or ignore write errors in the channel. Alternatively, side information about write errors can be used during encoding or decoding. Section 3.2 reviews the erasure decoding scheme, which can be viewed as coding with side information at the decoder (CSID) since side information about defective cells is used to assign erasures during decoding. On the other hand, Section 3.3 describes several coding with side information at the encoder (CSIE) schemes proposed for memory systems with write errors. In Section 3.4, the conventional coding and signal processing schemes are numerically evaluated using a memory channel model with hard errors. Bose, Chaudhuri and Hocquenghem (BCH) codes and their variations are implemented for performance comparison.

In Chapter 4, coding and signal processing schemes for TDMR channels are investigated and the storage density of TDMR channels is numerically evaluated. In Section 4.1, a new CSIE scheme is proposed for an ideal write and readback TDMR channel, where the proposed scheme uses side information for additive encoding jointly with LDPC codes. Since acquiring complete side information may be challenging for TDMR, CSIE using partial side information requiring negligible additional overhead is also investigated in Section 4.1. In Section 4.2, an advanced TDMR channel model with other channel impairments (in addition to write errors) is investigated to assess the storage capacity of TDMR. The overheads for coding and positioning are numerically evaluated for the advanced TDMR channel model.

In Chapter 5, several CSIE schemes are proposed for emerging memory systems with possible write errors and numerically evaluated. In Section 5.1, new CSIE schemes for memory systems in a space radiation environment are proposed, where radiation-induced hard errors are dominant due to the accumulated radiation effect for space missions. For memory scrubbing systems that periodically update the memory contents with ECC decoding, a CSIE scheme using ECC-provided partial side information is also investigated. In Section 5.2, a CSIE scheme with iterative encoding is developed for memory systems with severe write errors where soft errors are also non-negligible. Additive encoding schemes with linear block codes (LBCs) introduced in Sections 3.3 and 4.1 are generalized with a finite complexity coding based on the cross-entropy method. In Section 5.3, an endurance coding scheme is also proposed for multi-level cell (MLC) Flash memory, where the average number of charging and discharging cycles in a Flash cell device is reduced using additive encoding with side information. This reduction of wear stresses effectively enhances the endurance of MLC Flash memory.

Chapter 6 concludes the thesis with a summary of thesis contributions in Section 6.1 and discussion of future work in Section 6.2.

Chapter 2

Information-Theoretic Limits on Storage Channels with Write Errors

In this chapter, information-theoretic limits of storage channels with write errors are discussed, focusing on two-dimensional magnetic recording (TDMR) and advanced memory systems. These limits provide useful measures of the potential storage densities of the channels and can be used for assessing performance of channel coding and signal processing schemes. In order to model storage channels with write errors, a state-dependent channel model with side information or channel state information is introduced in Section 2.1. The side information models the state of the bit-cell for TDMR or the memory-cell for semiconductor memory, i.e., whether the cell can be written as intended or not. Information-theoretic limits of TDMR and advanced memory systems are investigated in Sections 2.2 and 2.3, respectively. Channel capacity bounds for the TDMR channel are reviewed in 2.2.1. These bounds are derived using discrete memoryless channel (DMC) approximations [41; 70] and a simplified channel with rectangular grains model [41; 45]. In 2.2.2, an ideal write and readback TDMR channel with random Voronoi grains is introduced and the lower bound of the symmetric information rate (SIR) of the channel is investigated [26]. The relations between SIR and channel capacity for a 2D channel are summarized in Appendix A. Then, the channel capacity bounds for memory systems with hard errors are reviewed in 2.3.1. In 2.3.2, an upper bound on the channel capacity of space-borne memory systems is investigated, where the previous stage error correcting code (ECC) decoding provides partial side information to the next stage encoding [22].

2.1 State-dependent channel model with side information

Storage channels with write errors can be modeled by a state-dependent channel model, where side information or channel state information may be available at the encoder or decoder, as illustrated in Fig. 2.1; S represents the channel state, and $S_e$ and $S_d$ are the side information about the channel state available at the encoder and decoder, respectively. W, X, and Y represent a message, codeword, and received word, respectively. $\hat{W}$ denotes the message estimate. For example, semiconductor memories in a space radiation environment may have defective stuck cells or hard errors, causing write errors [19]. The cell state S can be α, 0, or 1, denoting a normal, stuck-at 0, or stuck-at 1 cell. After conducting cell tests, information about defective cells may be available at the encoder or decoder, represented as side information $S_e$ and $S_d$, respectively. Similarly, a TDMR channel bit-cell, or unit area assigned for one bit, can be simply divided into two types: a normal response bit-cell, S = ◦, and an abnormal response bit-cell, S = ×, suffering from the write error [70]. Information about abnormal response bit-cells may be available to the encoder ($S_e$) or decoder ($S_d$) after conducting read-after-write tests. This side information can be used during encoding or decoding or both to reduce the effects of write errors in the channel.

The study of channel coding with side information was initiated by Shannon [60] and applied to various problems, as summarized in [43]. In particular, this thesis uses coding with side information at the encoder (CSIE), where the channel states are available at the encoder completely, for example $S_e = S$, or partially, for example $\{\Pr(S_e = S)\}$. Similar problems in the area of coding with side information at the transmitter (CSIT) have been studied in [11; 43; 60].

Figure 2.1. State-dependent channel model with side information about the channel states. S represents the channel state, and $S_e$ and $S_d$ denote the side information about the channel state available at the encoder and decoder, respectively. A message W is encoded with side information $S_e$ to a codeword X. After the channel, the transmitted codeword becomes Y and is decoded with side information $S_d$ to the estimated message $\hat{W}$.

2.2 Information-theoretic limits for TDMR channels with write errors

In this section, previous efforts on channel capacity bounds of TDMR are summarized in 2.2.1 and capacity bounds of the ideal write and readback TDMR are investigated in 2.2.2. A TDMR channel with write errors is modeled by a state-dependent channel model, where the channel states depend on the bit locations relative to the corresponding magnetic grains. When one grain is used to record more than one channel-bit during shingled writing, the last bit actually determines what is written on the grain and the previously written bits are replaced by the last bit, resulting in write errors. Therefore, a channel state S can be S ∈ {◦, ×}, denoting a normal and a write error bit-cell, respectively [25; 65; 70], or S ∈ {◦, →, ↓, ↘, ↙}, where ◦ denotes the bit-cell recorded by the corresponding bit and the other states denote the bit-cell replaced by one of its neighbors pointed to by the arrow [26; 41]. Based on the state-dependent channel models, channel capacity bounds are discussed in the following subsections.

2.2.1 Review of capacity bounds for TDMR channels

The channel capacity of the TDMR channel is difficult to compute since the channel state of a bit is correlated with the states of its neighbors. In addition, acquiring side information from a TDMR channel by write-and-verify tests is challenging since read channel impairments are significant, and the write error probability can be as high as 1/2. The theoretical capacity of a TDMR channel has been approximated based on simple binary discrete memoryless channel (DMC) models shown in Fig. 2.2. The input x and output y of the channel are binary, x, y ∈ {0, 1}, and the state s can be ◦ or ×, corresponding to the normal and write error bit-cell, respectively. The channel output y can be represented by

$$
y = \begin{cases} x + z_r, & s = \circ \ \text{(no write error)} \\ z_w, & s = \times \ \text{(write error)}, \end{cases} \tag{2.1}
$$

where $z_r$ denotes the random channel error with error probability $p = \Pr(z_r = 1 \mid s = \circ)$, and the write error probability is $q = \Pr(s = \times)$. The output of the write error bit-cell, $z_w$, is either 0 or 1 depending on the distribution of the input x, $\Pr(z_w = x \mid s = \times) = \Pr(x)$. The channel capacities with no side information and with complete side information at the encoder and decoder are given by $C_{\min}$ and $C_{\max}$, respectively, based on the results from memory systems with cell defects [19],

$$C_{\min} = 1 - h\big(q/2 + (1-q)p\big) \quad \text{and} \tag{2.2}$$

$$C_{\max} = (1-q)\big(1 - h(p)\big), \tag{2.3}$$

where h(v) is the binary entropy function,

$$h(v) = -v \log_2 v - (1-v) \log_2 (1-v). \tag{2.4}$$

When there is no side information, the channel corresponds to a binary symmetric channel (BSC) with cross-over probability q/2 + (1−q)p, since half of the mis-writes result in errors and the other half do not. On the other hand, when complete side information is available at the encoder and decoder, write error bits can be set as erasures, and the channel corresponds to a binary erasure and error channel (BEEC) with erasure probability q and cross-over probability (1−q)p. The half grain per channel-bit TDMR without other channel errors (q = 0.5 and p = 0) yields $C_{\min}$ = 0.1887 and $C_{\max}$ = 0.5 user-bits per channel-bit, which correspond to 0.3774 and 1 user-bits per grain, respectively [70].

Figure 2.2. A state-dependent channel model for TDMR approximated by a simple binary discrete memoryless channel (DMC) model. The state of the cell, s, can be ◦ or ×, corresponding to normal and write error states, respectively. The write error probability is q = Pr(s = ×), where the output of a write error bit-cell is either 0 or 1, depending on the probability of the input x. The random channel error probability is p.
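The two capacity expressions of the binary DMC approximation are easy to check numerically. The following is a minimal sketch, not part of the original study: the function names are illustrative, and the only inputs are the write error probability q and the random error probability p defined above.

```python
import math

def binary_entropy(v):
    """Binary entropy h(v) in bits, with h(0) = h(1) = 0."""
    if v <= 0.0 or v >= 1.0:
        return 0.0
    return -v * math.log2(v) - (1.0 - v) * math.log2(1.0 - v)

def c_min(q, p):
    """No side information: BSC with cross-over probability q/2 + (1-q)p."""
    return 1.0 - binary_entropy(q / 2.0 + (1.0 - q) * p)

def c_max(q, p):
    """Complete side information at the encoder and decoder."""
    return (1.0 - q) * (1.0 - binary_entropy(p))

# Half grain per channel-bit, no other channel errors.
q, p = 0.5, 0.0
print(round(c_min(q, p), 4), c_max(q, p))  # 0.1887 0.5
```

Multiplying both values by two channel-bits per grain gives the per-grain figures quoted in the text.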

Alternatively, simplified grain models have been developed for computing channel capacity with correlated channel states, based on idealized one-dimensional (1D) grains or 2D rectangular grains [41; 45; 70]. In the idealized models, the grains are modeled by rectangles of 3 or 4 different shapes and assumed to be perfectly aligned to the bit-cell boundaries. The lower bounds of channel capacity can be as high as 0.75 and 0.6 user-bits per grain for the 1D and 2D rectangular grains models, respectively [41; 70]. For the rectangular grains model investigations, the dominant impairment is a write error rate of approximately 50%, and Markov process optimization was used to compute the bound.

2.2.2 Capacity bounds of an ideal write and readback TDMR channel

For further investigation of information-theoretic limits of the TDMR channel with write errors, an ideal write and readback TDMR channel with random Voronoi grains media is discussed in this section [26]. The dominant impairment in this channel is a write error rate of approximately 50%, while many more grain shapes are taken into account based on random Voronoi grains. The investigation results show that the lower bound of the symmetric information rate (SIR) of the idealized TDMR channel can be as high as 0.5 user-bits per grain, which is nearly 15 times higher than that of the conventional magnetic recording system. The SIR is a lower bound of the channel capacity, and their relations are summarized in Appendix A. In this investigation, the a posteriori probability (APP) of the channel states is extracted from the channel outputs, where statistical information of dominant grain shapes is used. Thus, an ideal write and readback TDMR channel can be interpreted as a channel with partial side information at the decoder, where using partial side information at the decoder enhances the storage capacity of the channel.

The channel state for each bit-cell depends on the cell location relative to the magnetic grains near the cell. Magnetic grains are modeled by the perturbed lattice based random Voronoi cells [65; 70], where the nuclei of the cells are obtained by adding random offsets to the ideal position of grains. The normalized random offsets ∆ are generated based on Von Mises distributions [65], with a parameter κ, where the probability density function (pdf) of ∆ is given by

$$f_\kappa(\Delta) = \frac{e^{\kappa \cos(2\pi\Delta)}}{I_0(\kappa)}, \tag{2.5}$$

where −1/2 ≤ ∆ ≤ 1/2, and $I_0$ is the modified Bessel function of order zero. This distribution is also called the Tikhonov or circular normal distribution, where the parameter κ controls the randomness of ∆, from a uniform random variable (κ = 0) to a constant (κ = ∞), which results in varying grain size variance. During the recording, a grain modeled by a Voronoi cell is magnetized by the channel-bit whose center falls in that Voronoi cell. When the Voronoi cell covers more than one bit-cell center, that grain acquires the magnetization of the last recorded bit in the sequential recording. The readback is assumed to be ideal so that the channel-bit output corresponds to the magnetization of the grain covering the center of the bit-cell. As a result, both channel-bit input and output are binary, represented with {−1, 1} for a magnetic recording channel, where some of the channel-bits are over-written by neighboring bits.
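The write and readback rules described above can be sketched in a few lines. This is an illustrative toy and not the simulator used in this study: it assumes a square grain lattice with pitch √2 channel-bit units (so that the mean grain area is 2 bit-cells), independent Von Mises offsets along each axis, and raster-order writing; the names `von_mises_offset` and `simulate_page` are hypothetical.

```python
import math
import random

def von_mises_offset(kappa, rng):
    """Normalized offset in [-1/2, 1/2) with pdf proportional to exp(kappa*cos(2*pi*d))."""
    theta = rng.vonmisesvariate(0.0, kappa)                 # angle in [0, 2*pi)
    theta = (theta + math.pi) % (2.0 * math.pi) - math.pi   # wrap to [-pi, pi)
    return theta / (2.0 * math.pi)

def simulate_page(rows, cols, kappa, seed=0):
    """Write random bits onto perturbed-lattice Voronoi grains and read them back."""
    rng = random.Random(seed)
    pitch = math.sqrt(2.0)  # assumed grain lattice pitch (mean grain area of 2 bit-cells)
    n_r, n_c = int(rows / pitch) + 3, int(cols / pitch) + 3
    # Grain nuclei: ideal lattice positions plus random offsets, with a margin around the page.
    nuclei = {(a, b): ((a + 0.5 + von_mises_offset(kappa, rng)) * pitch,
                       (b + 0.5 + von_mises_offset(kappa, rng)) * pitch)
              for a in range(-1, n_r) for b in range(-1, n_c)}

    def grain_of(i, j):
        """Nucleus closest to the center of bit-cell (i, j), i.e. its Voronoi cell."""
        r, c = i + 0.5, j + 0.5
        a0, b0 = int(r / pitch), int(c / pitch)
        best, best_d = None, float("inf")
        for a in range(a0 - 2, a0 + 3):
            for b in range(b0 - 2, b0 + 3):
                if (a, b) in nuclei:
                    d = (nuclei[(a, b)][0] - r) ** 2 + (nuclei[(a, b)][1] - c) ** 2
                    if d < best_d:
                        best, best_d = (a, b), d
        return best

    x = [[rng.choice((-1, 1)) for _ in range(cols)] for _ in range(rows)]
    magnetization, last_writer = {}, {}
    for i in range(rows):                # sequential (raster-order) recording
        for j in range(cols):
            g = grain_of(i, j)
            magnetization[g] = x[i][j]   # a later bit overwrites an earlier one
            last_writer[g] = (i, j)
    # Ideal readback: each bit-cell returns the magnetization of the grain under its center.
    y = [[magnetization[grain_of(i, j)] for j in range(cols)] for i in range(rows)]
    overwritten = sum(last_writer[grain_of(i, j)] != (i, j)
                      for i in range(rows) for j in range(cols))
    return x, y, overwritten / (rows * cols)

x, y, frac = simulate_page(32, 32, kappa=1.0)
print(f"fraction of overwritten channel-bits: {frac:.2f}")
```

With about two bit-cell centers per grain, roughly half of the channel-bits are expected to be overwritten, which is the write error regime discussed in this section.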

An example of an idealized TDMR channel with a half grain per channel-bit is shown in Fig. 2.3. A binary input array of the channel, denoted by $\{x_{i,j}\}$, is shown in Fig. 2.3 (a), where the dashed line denotes the bit-cell boundaries of 1 channel-bit unit, and white and gray cells represent the binary inputs, −1 and 1, respectively. Random Voronoi grains generated with random offsets from κ = 1 in (2.5) are shown in Fig. 2.3 (b), where the crosses and asterisks denote grain cell nuclei and bit-cell centers, respectively. The regions divided by solid lines are Voronoi cells, where the points inside a Voronoi cell are closer to its nucleus than to any other nucleus, and the average cell area is 2 channel-bit units. Fig. 2.3 (c) and (d) show recorded grains and their ideal readback, respectively. Note that about half of the channel-bits are rewritten during sequential recording; for example, the bit (1,2) is rewritten by the bit (1,3) since the corresponding grain covers two bit-cells.

Figure 2.3. Illustrative example of an ideal write and readback TDMR channel. (a) Input channel-bit array, where the dashed line denotes the bit-cell boundaries and white and gray cells represent −1 and 1, respectively, (b) random Voronoi grains generated with random offsets from a Von Mises distribution of κ = 1, where crosses denote grain cell nuclei and the asterisks correspond to the bit-cell centers, (c) recorded grains, and (d) ideal readback channel-bit outputs.

In the idealized TDMR channel, each bit-cell can be assigned to one of five states, depending on whether the bit is the last one recording that grain, denoted by s = ◦, or is overwritten by one of its following neighbors, denoted by arrows s ∈ {→, ↓, ↘, ↙} pointing to the overwriting bit-cell. Then, the Voronoi grain can be mapped to a combination of bit-cells, as illustrated in Fig. 2.4 (b). For example, the top central grain covers the bit-cells (1,2) and (1,3) and is equivalent to the 1×2 rectangular grain; the bit-cell (1,2) is a write error bit-cell with the state $s_{1,2}$ = →, and the bit-cell (1,3) is a normal bit-cell with the state $s_{1,3}$ = ◦. Depending on the κ of the random offset ∆ in (2.5), the variance of the cell area (grain size) changes and the distributions of equivalent grain shapes also vary. For example, when κ = 0 (uniform distribution), κ = 1, and κ = 2, ten pages of 724×724 grains are generated, and the standard deviations of the cell areas, $\sigma_A$, are observed to be 0.488, 0.352, and 0.249 channel-bit units, respectively. Table 2.1 shows the probabilities of the 10 dominant combinations, covering nearly 99% of all equivalent Voronoi shapes. The cells which do not correspond to any of the ten listed shapes are ignored in the following investigation.

Figure 2.4. Equivalent grain shapes of an ideal write and readback TDMR channel. (a) Random Voronoi grains and (b) their equivalent grain shapes.

Based on a finite set of channel-bit states and discretized grain shapes, the idealized TDMR channel can be described by a finite state trellis along the row (down-track direction). Let $g_{i,j}$ be the grain state of the bit-cell (i, j), $g_{i,j}$ ∈ {◦, →, ↓, ↘, ↙}. Then, the channel-bit output, $y_{i,j}$, is one of $\{x_{i,j}, x_{i,j+1}, x_{i+1,j}, x_{i+1,j-1}, x_{i+1,j+1}\}$, depending on $g_{i,j}$. In addition, the probabilities of the grain states $\Pr(g_{i,j+1})$ are uniquely determined by the grain states of the preceding neighbors, $G_{i,j+1} = \{g_{i,j}, g_{i-1,j}, g_{i-1,j+1}, g_{i-1,j+2}\}$, when the biggest equivalent rectangular grain is 2×2 bit-cells. Let $m_{i,j}$ be the combined state, including the set of the channel-bit inputs, $\{x_{i,j}, x_{i+1,j-1}, x_{i+1,j}\}$, and the grain states of $G_{i,j+1}$. Then, the joint probability of the i-th row channel inputs and outputs in the idealized TDMR channel can be expressed as below,

$$p\big(y_{i,1}^{i,n}, m_{i,1}^{i,n}\big) = p(m_{i,1}) \prod_{j=2}^{n} p\big(y_{i,j}, m_{i,j} \mid m_{i,j-1}\big), \tag{2.6}$$

where $y_{i,1}^{i,n}$ and $m_{i,1}^{i,n}$ represent $\{y_{i,j}\}$ and $\{m_{i,j}\}$, respectively, for j = 1, 2, ..., n, and the transition probability, $p(y_{i,j}, m_{i,j} \mid m_{i,j-1})$, does not depend on the bit location (i, j). Based on the discretized grain shapes in Table 2.1, a grain set $G_{i,j+1}$ has 278 possible combinations of grain states. Since the combined state $m_{i,j}$ also includes three binary channel-bit inputs, it has 278 × 2³ = 278 × 8 = 2224 states.

Now, a lower bound of the SIR is derived based on the a posteriori probability (APP) of the idealized TDMR channel, where the APP can be computed from the modified Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm based on the estimated grain states and their transition probabilities. The information rate of a finite state channel can be computed using the forward sum-product recursion of the BCJR algorithm from a large sample sequence [1].

Table 2.1. Ten possible grain shapes and their corresponding probabilities in the idealized TDMR channel of the half grain per bit model (mean area of a Voronoi cell is 2 channel-bit units). The grains are generated with random offsets from Von Mises distributions of κ = 0, 1, and 2, where the standard deviations of the cell areas are 0.488, 0.352, and 0.249 channel-bit units, respectively. Grain shapes are written row by row (top row / bottom row), with · marking a bit-cell outside the grain.

index    grain shape      κ=0           κ=1           κ=2
                          (σA=0.488)    (σA=0.352)    (σA=0.249)
1        ◦                0.275         0.284         0.306
2        → ◦              0.218         0.228         0.225
3        ↓ / ◦            0.218         0.228         0.225
4        · ↙ / ◦ ·        0.0188        0.0105        4.70×10^-3
5        ↘ · / · ◦        0.0184        0.0104        4.62×10^-3
6        · ↓ / → ◦        0.0512        0.0474        0.0409
7        ↘ ↓ / · ◦        0.0512        0.0476        0.0409
8        ↘ · / → ◦        0.0513        0.0476        0.0409
9        ↓ ↙ / ◦ ·        0.0514        0.0475        0.0409
10       ↘ ↓ / → ◦        0.0345        0.0476        0.0713
others                    0.0126        1.34×10^-3    5.51×10^-5

However, due to the spatial variations of the 2D interference in the TDMR channel, direct computation of the information rate is difficult even for a simplified 2D TDMR channel model [41]. In this study, the lower bound of the information rate with a symmetric input is derived for the idealized TDMR channel model.

For a 2D page of M×N bit-cells, the information rate of the 2D channel can be represented as below,

$$I(X;Y) \equiv \lim_{N,M\to\infty} I_{M,N} = \lim_{N,M\to\infty} \frac{1}{MN}\, I\big(X_{1,1}^{M,N}; Y_{1,1}^{M,N}\big), \tag{2.7}$$

where the computational complexity of $I_{M,N}$ is exponential in MN. Here, $X_{1,1}^{M,N}$ and $Y_{1,1}^{M,N}$ represent $\{X_{i,j}\}$ and $\{Y_{i,j}\}$, respectively, for 1 ≤ i ≤ M and 1 ≤ j ≤ N. For the symmetric input, the information rate can be bounded as below,

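$$
\begin{aligned}
I_{M,N} &= \frac{1}{MN} \sum_{i,j} I\big(X_{i,j}; Y_{1,1}^{M,N} \mid X_{1,1}^{i,j-1}\big) && (2.8)\\
&= \frac{1}{MN} \sum_{i=1,j=1}^{M,N} \Big\{ H\big(X_{i,j} \mid X_{1,1}^{i,j-1}\big) - H\big(X_{i,j} \mid Y_{1,1}^{M,N}, X_{1,1}^{i,j-1}\big) \Big\} && (2.9)\\
&\ge \frac{1}{MN} \sum_{i=1,j=1}^{M,N} \Big\{ H(X_{i,j}) - H\big(X_{i,j} \mid Y_{1,1}^{M,N}\big) \Big\} && (2.10)\\
&= \frac{1}{MN} \sum_{i=1,j=1}^{M,N} I\big(X_{i,j}; Y_{1,1}^{M,N}\big), && (2.11)
\end{aligned}
$$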

where H(·) denotes an entropy function. The inequality is obtained from $H(X_{i,j} \mid Y_{1,1}^{M,N}, X_{1,1}^{i,j-1}) \le H(X_{i,j} \mid Y_{1,1}^{M,N})$, since conditioning reduces entropy, while $H(X_{i,j} \mid X_{1,1}^{i,j-1}) = H(X_{i,j})$ due to the independent input assumption. Using a subset of the channel-bit outputs, $Y_S \subset Y_{1,1}^{M,N}$, for computing mutual information, the information rate can be further lower bounded as below,

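$$
\begin{aligned}
\lim_{N,M\to\infty} I_{M,N} &\ge \lim_{N,M\to\infty} \frac{1}{MN} \sum_{i=1,j=1}^{M,N} I\big(X_{i,j}; Y_{1,1}^{M,N}\big) && (2.12)\\
&\ge \lim_{N,M\to\infty} \frac{1}{MN} \sum_{i,j} I\big(X_{i,j}; Y_S\big). && (2.13)
\end{aligned}
$$

As Arnold et al. [1] showed, since $\{X_{i,j}\}$ and $\{Y_{i,j}\}$ are stationary and ergodic, the average mutual information can be computed using a sample average. Based on this result, the lower bound of the SIR can be computed from $\{x_{i,j}\}$ and $\{y_{i,j}\}$ as below,

$$
\begin{aligned}
\lim_{N,M\to\infty} I_{M,N} &\ge \lim_{N,M\to\infty} \frac{1}{MN} \sum_{i,j} \log_2\!\left(\frac{p(x_{i,j} \mid y_S)}{p(x_{i,j})}\right) && (2.14)\\
&= 1 + \lim_{N,M\to\infty} \frac{1}{MN} \sum_{i=1,j=1}^{M,N} \log_2 p(x_{i,j} \mid y_S), && (2.15)
\end{aligned}
$$

where $p(x_{i,j}) = 1/2$.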
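The sample-average form of the bound in (2.15) reduces to a one-line computation once the APP of each written bit is available. The sketch below is only an illustration: the function name and the APP values are hypothetical, and every APP is assumed to be strictly positive so that the logarithm is defined.

```python
import math

def sir_lower_bound(app_of_written_bits):
    """Sample average 1 + mean(log2 p(x|y_S)), in information bits per channel-bit."""
    n = len(app_of_written_bits)
    return 1.0 + sum(math.log2(p) for p in app_of_written_bits) / n

# Hypothetical APPs assigned to the bits that were actually written.
apps = [0.9, 0.6, 0.75, 0.5, 0.95]
bound = sir_lower_bound(apps)
# Half grain per channel-bit: two channel-bits per grain.
print(f"{bound:.3f} bits per channel-bit, {2 * bound:.3f} bits per grain")
```

An APP of 1 for every bit gives the maximum of 1 bit per channel-bit, and an uninformative APP of 1/2 for every bit gives 0.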

Now, the APP $p(x_{i,j} \mid y_S)$ of the bit-cell (i, j) given a subset of the channel-bit outputs needs to be computed. Since the idealized TDMR channel can be described by the trellis for a row, the BCJR algorithm can provide an exact inference of the APP based on a row, $p(x_{i,j} \mid y_{i,1}^{i,N})$, as below,

$$p\big(x_{i,j} \mid y_{i,1}^{i,N}\big) = \sum_{m_{i,j} \setminus x_{i,j}} p\big(m_{i,j} \mid y_{i,1}^{i,N}\big), \tag{2.16}$$

where $m_{i,j} \setminus x_{i,j}$ denotes all the variables of the combined state $m_{i,j}$ except the channel input bit $x_{i,j}$. The histograms of $p(x_{i,j} = -1 \mid y_{i,1}^{i,N})$ from the previous example with κ = 1 are shown in Fig. 2.5 (a), where white bars are the occurrences for the set of bits $x_{i,j} = -1$, and blue bars are those for $x_{i,j} = 1$. Note that $p(x_{i,j} = y_{i,j} \mid y_{i,1}^{i,N})$ is always larger than $p(x_{i,j} \ne y_{i,j} \mid y_{i,1}^{i,N})$ in the idealized TDMR channel, due to the non-zero probability of the bit recording the grain, $p(g_{i,j} = \circ \mid y_{i,1}^{i,N}) > 0$.

In order to take into account the 2D memory nature of the TDMR channel, the BCJR algorithm is modified to include the channel-bit outputs of neighboring rows (tracks): the transition matrix used for computing messages becomes $p(y_{i-1,j}^{i+1,j}, m_{i,j} \mid m_{i,j-1})$. The modified BCJR algorithm computes $p(x_{i,j} \mid y_{i-1,1}^{i+1,N})$, which provides better separation in the resulting histograms, as shown in Fig. 2.5 (b). When considering neighboring rows, some of the bits are identified to be recorded, or to have a grain state ◦, which enables $p(x_{i,j} = -1 \mid y_{i-1,1}^{i+1,N})$ to be either zero or one, as shown in the leftmost and the rightmost bars of the plot. This can provide a tighter lower bound of the SIR than one based only on a single row, which will be shown by the following numerical simulations.
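The marginalization in (2.16) can be illustrated with a toy posterior. This sketch assumes the BCJR pass has already produced a posterior over combined states; the representation of a combined state as a tuple (x, rest) and the numerical values are hypothetical.

```python
from collections import defaultdict

def bit_app(state_posterior):
    """Sum p(m | y) over every component of the combined state m except the input bit x."""
    app = defaultdict(float)
    for (x, _rest), prob in state_posterior.items():
        app[x] += prob
    return dict(app)

# Hypothetical posterior over combined states m = (x, rest) for one bit-cell.
posterior = {(-1, "a"): 0.10, (-1, "b"): 0.25, (1, "a"): 0.40, (1, "b"): 0.25}
print(bit_app(posterior))
```

In the actual channel model, the second tuple component would stand for the remaining inputs and grain states of the combined state.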

Figure 2.5. Histograms of a posteriori probabilities (APPs) based on (a) a row, $\{p(x_{i,j} \mid y_{i,1}^{i,N})\}$, and (b) three rows, $\{p(x_{i,j} \mid y_{i-1,1}^{i+1,N})\}$. The horizontal axes show the APP of $x_{i,j} = -1$ and the vertical axes show the number of occurrences, plotted separately for $x_{i,j} = -1$ and $x_{i,j} = 1$.

The lower bound of the SIR of the idealized TDMR is investigated numerically. The page size is fixed to N = M = 1024, and binary inputs, $x_{1,1}^{M,N}$, are generated randomly. Random Voronoi grains of size 724×724 are generated with κ = 0, 1, and 2, where a grain corresponds to 2 channel-bits on average. The resulting grain states for the bit-cells, $g_{1,1}^{M,N}$, are extracted, and the channel-bit outputs, $y_{1,1}^{M,N}$, are obtained from the grain states and the channel-bit inputs. The transition probabilities used in the BCJR algorithms are obtained from numerical simulations by counting the row-wise (down-track direction) transitions of the grain states, from $G_{i,j+1}$ to $G_{i,j+2}$, in ten sets of grains generated for each κ. For each set of channel-bit inputs, grain states, and outputs, a lower bound on the SIR of the channel is computed from the APPs based on a row, $\{p(x_{i,j} \mid y_{i,1}^{i,N})\}$, and three rows, $\{p(x_{i,j} \mid y_{i-1,1}^{i+1,N})\}$. The resulting lower bounds on the SIR are shown in Table 2.2. They range from 0.40 (κ = 0, one row) to 0.51 (κ = 2, three rows) information bits per grain, corresponding to 8.0 to 10.2 Tb/in² of user bit density, based on an assumed grain density of 20 Teragrain/in² [70]. Note that increasing κ reduces the randomness of the grains, which enables a higher information rate for the idealized TDMR channel. In addition, the SIR based on three rows provides a tighter lower bound than that of one row, since the modified BCJR algorithm provides better estimates of side information by taking into account the channel-bit output of neighboring rows.

Table 2.2. Lower bound of the SIR (information bits / grain) of the idealized TDMR channel based on APPs, $\{p(x_{i,j} \mid y_{i,1}^{i,N})\}$ (one row) and $\{p(x_{i,j} \mid y_{i-1,1}^{i+1,N})\}$ (three rows). Voronoi grains generated for Table 2.1 are used for computing information rates.

Voronoi grains        one row    three rows
κ=0 (σA=0.488)        0.401      0.461
κ=1 (σA=0.352)        0.429      0.488
κ=2 (σA=0.249)        0.449      0.511
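The user bit densities quoted above follow from multiplying the SIR in information bits per grain by the assumed grain density. A short check, with an illustrative function name:

```python
GRAIN_DENSITY_TG_PER_IN2 = 20.0  # assumed grain density quoted in the text [70]

def user_density_tb_per_in2(sir_bits_per_grain, grain_density=GRAIN_DENSITY_TG_PER_IN2):
    """User bit density in Tb/in^2 for a given SIR in information bits per grain."""
    return sir_bits_per_grain * grain_density

for label, sir in [("kappa=0, one row", 0.401), ("kappa=2, three rows", 0.511)]:
    print(label, round(user_density_tb_per_in2(sir), 1), "Tb/in^2")
```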

2.3 Information-theoretic limits for advanced memory channels with write errors

This section reviews the capacity bounds for the memory channels with write errors in 2.3.1, and a channel capacity bound for a special memory channel with ECC-provided partial side information is derived in 2.3.2.

2.3.1 Review of capacity bounds for memory channels

Memory systems with hard errors were first modeled by Kuznetsov and Tsybakov, using a state-dependent channel model with non-causal side information at the transmitter (encoder), and achievable rates were derived [46]. The channel capacity of a state-dependent channel with complete non-causal side information at the encoder was derived by Gelfand and Pinsker [16]. Heegard and El Gamal [19] investigated achievable rates of memory systems with partial side information at the encoder or decoder based on a binary DMC model. A state-dependent channel model for memory systems with hard errors is shown in Fig. 2.6, where q and p denote hard and soft error probabilities, respectively. In this thesis, all hard errors of memory systems are assumed to be either stuck-at 1 or stuck-at 0 with equal probability; the cell state, denoted by s, is in {α, 1, 0}, where α, 1, and 0 are the normal, stuck-at 1, and stuck-at 0 states, with probabilities 1−q, q/2, and q/2, respectively. The channel output of the cell, y, depends on its input x and state s, and can be expressed as,

$$
y = \begin{cases} x + z, & s = \alpha \ \text{(no hard error)} \\ s, & s \ne \alpha \ \text{(hard error)}, \end{cases} \tag{2.17}
$$

where z ∈ {0, 1} is a soft error with p = Pr(z = 1 | s = α). The channel capacities with no side information and with complete side information at the encoder and decoder are given by $C_{\min}$ and $C_{\max}$, as represented in (2.2) and (2.3), respectively, based on the BSC and BEEC models [19]. In addition, the channel capacity $C_{enc}$ of the system where only the encoder has complete side information ($S_e = S$, $S_d = \emptyset$) is equal to $C_{\max}$ [19]. Channel capacities of coding without and with side information, $C_{\min}$ and $C_{\max}$, are plotted in Fig. 2.7, where the soft error probability p is fixed at $10^{-6}$ and the hard error probability q is changed from $10^{-4}$ to $10^{-1}$. Using side information at the encoder can increase the capacity, while the difference between $C_{\min}$ and $C_{\max}$ is significant only for a large q, when the state dependency of the channel is significant.

Figure 2.6. A state-dependent channel model for memory systems with hard errors [19]. The state of the cell, s, can be α, 1, and 0, corresponding to normal, stuck-at 1, and stuck-at 0 states, respectively. The hard error probability is q = Pr(s ≠ α), where all hard errors are assumed to be either stuck-at 0 or stuck-at 1 with equal probability. The soft error probability is p.
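The stuck-at channel in (2.17) can be simulated directly to confirm that, without side information, it behaves like a BSC with cross-over probability q/2 + (1−q)p. The following Monte Carlo sketch is illustrative only: it interprets the addition in (2.17) as modulo-2 addition, assumes equiprobable input bits, and uses hypothetical function names and parameter values.

```python
import random

def stuck_at_channel(x, q, p, rng):
    """One use of the channel in (2.17): returns (y, s) for the input bit x."""
    u = rng.random()
    if u < q / 2.0:
        return 1, "1"                    # stuck-at 1: the output ignores the input
    if u < q:
        return 0, "0"                    # stuck-at 0
    z = 1 if rng.random() < p else 0     # soft error on a normal cell
    return x ^ z, "alpha"

def empirical_error_rate(q, p, n, seed=0):
    """Fraction of channel uses with y != x for equiprobable inputs."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x = rng.getrandbits(1)
        y, _state = stuck_at_channel(x, q, p, rng)
        errors += (y != x)
    return errors / n

q, p = 0.1, 0.01
print(empirical_error_rate(q, p, 200_000), q / 2 + (1 - q) * p)
```

The two printed numbers should agree up to Monte Carlo noise, since a stuck cell disagrees with an equiprobable input half of the time.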

2.3.2 Capacity bounds for a memory channel for space applications using error correcting code (ECC)-provided partial side information

Figure 2.7. Channel capacities of coding with and without side information ($C_{\max} = C_{enc}$ and $C_{\min}$), plotted as capacity (information bits per cell) versus the hard error probability q. The soft error probability p is fixed at $10^{-6}$ and the hard error probability q is changed from $10^{-4}$ to $10^{-1}$.

Memory devices aboard spacecraft are susceptible to radiation-induced errors. Some memory cells will only experience temporary bit reversals called soft errors. But if the radiation dose is strong enough, a memory cell can become permanently damaged and remain stuck at a fixed value [55]. The radiation dosage is not uniform in space and is very high during periods of sunspots or in certain regions, e.g., near Jupiter [55]. To protect against radiation effects, memory controllers aboard spacecraft periodically “scrub” memory devices for errors and update memory contents with the ECC decoder output [59; 74]. By scrubbing a memory device, soft errors can be removed by rewriting the cells; however, stuck-at bits remain and accumulate over time. In legacy memory systems, the defective cells are treated simply as soft errors, which makes error correction inefficient because stuck-at bits cannot be corrected. Alternatively, an entire memory block can be marked unusable even if only a few cells in the block are defective. Although these approaches have low overhead, they do not make the best use of the available storage area. If the location of the stuck-at bit, modeled by side information, is available at the decoder, the stuck-at bits can be set as erasures to make ECC decoding more efficient [31; 32; 33].

In memory devices where the controller periodically scrubs for errors, the ECC decoder can provide partial side information to the encoder, which can be used to improve the overall error correcting performance. The bit positions where error correction occurred can be assumed to be the locations of hard errors. This assumption is not entirely correct because not all errors are hard errors, and those stuck-at bits that agree with the information are not identified as hard errors. Therefore, comparing the decoder output and input only provides partial and not complete side information. However, hard errors will accumulate over time and become the dominant error source, thereby making our assumption increasingly accurate. The capacity of channels with partial side information is unknown in general [63]; however, a tight upper bound can be computed as follows. The procedure of obtaining partial side information from the decoder and forwarding this information to the encoder during scrubbing is illustrated in Fig. 2.8. At the m-th scrubbing, $T = mT_s$ with a scrubbing interval $T_s$, comparing the ECC decoder input $Y^{(m-1)}$ and output $\hat{X}^{(m-1)}$ provides partial side information about the cell state $S^{(m)}$, i.e., $\Pr\big(S^{(m)} \mid S'^{(m-1)}\big)$, where $S'^{(m-1)} \equiv \langle Y^{(m-1)} \hat{X}^{(m-1)} \rangle$ denotes a state descriptor.

Figure 2.8. Memory scrubbing with partial side information at the encoder with a memory scrubbing period of duration $T_s$.

A state-dependent channel model for memory systems in a space radiation environment is similar to Fig. 2.6, where the hard and soft error probabilities depend on a time period T, defined as [32]

$$q_T = 1 - e^{-\lambda_h T} \quad \text{and} \tag{2.18}$$

whereλh and λs are hard and soft error rates (errors/bit/day). Then, this “memory scrub-bing channel” can be decomposed into a set of binary asymmetric channels (BAC) each with transition probabilities that depend on the side information provided by comparing the de-coder output to its input. We truncate the scrubbing time index (m) from the following steps. The modified state-dependent channel is shown in Fig. 2.9, by introducing a state descriptor S′ that can take on one of four possible states {YXˆ = 11,10,01,00} depending on the action of the decoder. The input to the decoder is denoted by Y and output by ˆX. So S′ = {11} or {00} indicates that a bit remained the same before and after the decoder while S′ ={10} or {01} indicates that the bit has changed. Again, the number of errors is assumed to be within the correction bound of the ECC. One of two possible scenarios would lead to a bit remaining constant before and after the decoder. One is that the bit is not stuck at a value and did not experience any error and this event occurs with a probability of rT(1−pT), where rT = (1−qT). The other scenario is that the bit is stuck at ‘1’ or ‘0’ and this event occurs with a probability ofqT/2. One of two possible scenarios would lead to a bit being changed after decoding. One is that the bit is not stuck at a value and did experience an error and this event occurs with a probability ofrTpT. The other scenario is that the bit became stuck at a value before decoding and this event occurs with a proba-bility of qT/2. From this description, the Pr(S′) column of Table 2.3 can be filled. Then, the channel transition probabilities can be computed as follows. Without loss of generality, assume that the channel is in state S′ = {11} and a ‘1’ is written to this bit position but this bit became a ‘0’ on the memory device. 
This scenario, marked by transition $p_{10}$, occurs only when the bit is not stuck at ‘1’ and experiences a soft error, and has probability $[(r_T(1-p_T)/2)/\Pr(S' = \{11\})]\,p_T$. Continuing similarly, assume the same state $S' = \{11\}$, but now consider the scenario where a ‘0’ is written to this bit location but the bit became a ‘1’ on the memory device. This event, marked by transition $p_{01}$, occurs when the bit is stuck at ‘1’, or when the bit is not stuck at ‘1’ and experiences a soft error, and has probability $[(q_T/4)/\Pr(S' = \{11\})] + [(r_T(1-p_T)/2)/\Pr(S' = \{11\})]\,p_T$. With similar reasoning, the remaining entries in Table 2.3 can be filled out.
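The construction above can be sketched numerically. The following is a minimal Python sketch, not the thesis implementation: the function names are hypothetical, the soft error probability is assumed to take the same exponential form as Eq. (2.18), and the rate and period values in the usage lines are illustrative only.

```python
import math


def error_probabilities(lam_h, lam_s, T):
    """Hard and soft error probabilities over a period T (in days)."""
    q_T = 1.0 - math.exp(-lam_h * T)  # Eq. (2.18)
    p_T = 1.0 - math.exp(-lam_s * T)  # assumption: same form for soft errors
    return q_T, p_T


def modified_state_table(q_T, p_T):
    """Return {state: (Pr(S'), p10, p01)} for S' in {11, 10, 01, 00}.

    Assumes q_T > 0 or p_T > 0 so that no state has zero probability.
    """
    r_T = 1.0 - q_T
    same = r_T * (1.0 - p_T) / 2.0  # not stuck, no soft error (per state)
    changed = r_T * p_T / 2.0       # not stuck, soft error (per state)
    stuck = q_T / 4.0               # stuck-at share (per state)

    pr_same = same + stuck          # Pr(S' = 11) = Pr(S' = 00)
    pr_changed = changed + stuck    # Pr(S' = 10) = Pr(S' = 01)

    # Crossover caused by a soft error only, or by stuck-at plus soft error.
    soft_same = same * p_T / pr_same
    both_same = (same * p_T + stuck) / pr_same
    soft_chg = changed * p_T / pr_changed
    both_chg = (changed * p_T + stuck) / pr_changed

    return {
        "11": (pr_same, soft_same, both_same),
        "10": (pr_changed, soft_chg, both_chg),
        "01": (pr_changed, both_chg, soft_chg),
        "00": (pr_same, both_same, soft_same),
    }


if __name__ == "__main__":
    # Illustrative (hypothetical) rates in errors/bit/day and a 10-day period.
    q_T, p_T = error_probabilities(lam_h=1e-3, lam_s=1e-6, T=10.0)
    for state, (pr, p10, p01) in modified_state_table(q_T, p_T).items():
        print(state, pr, p10, p01)
```

The four state probabilities returned by `modified_state_table` sum to one, which is a quick sanity check on the bookkeeping.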

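[Figure 2.9: BAC diagram with input $X \in \{1, 0\}$, output $Y \in \{1, 0\}$, and state-dependent crossover probabilities $p_{10}^{(S')}$ (1 → 0) and $p_{01}^{(S')}$ (0 → 1).]

Figure 2.9. A binary asymmetric channel (BAC) model for the modified states {S′}. The state transition probabilities are listed in Table 2.3.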

When there is no side information available at the encoder and decoder, the modified channel is equivalent to the original channel of Fig. 2.6 and the channel capacities are equal, i.e., $C_{\min}\{S'\} = C_{\min}$. With complete side information, the capacity of the modified channel, denoted by $C_{\max}\{S'\}$, can be computed as

$C_{\max}\{S'\} = \sum_{S'} \Pr(S') \max_{p(x|S')} [I(X;Y)\,|\,S']$, (2.20)

where $\max_{p(x|S')} [I(X;Y)\,|\,S']$ is the capacity of the BAC [53] and can be computed as summarized in Appendix B. To compare the channel capacities for specific error rate parameters, the soft error probability is fixed at $p_T = 10^{-6}$ and the hard error probability is varied over the range $10^{-4} \le q_T \le 10^{-1}$, leading to the capacity curves in Fig. 2.10. The curves $C_{\max}$ and $C_{\min}$ are the channel capacities for the original channel of Fig. 2.6 when complete side information is available at the encoder and when no side information is available at all, respectively. The $C_{\max}\{S'\}$ curve is the capacity of the modified channel with complete side information available at the encoder. The capacity of the modified channel when only partial side information is available at the encoder would lie between the $C_{\max}\{S'\}$ and $C_{\min}$ curves.
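As a rough illustration of Eq. (2.20), the sketch below evaluates the weighted sum of per-state BAC capacities in Python. It is not the procedure of Appendix B: here the BAC capacity is found by a numerical ternary search over the input distribution (mutual information is concave in Pr(X = 1)), and all function names are hypothetical. The state probabilities and crossover probabilities follow Table 2.3.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mutual_information(a, p10, p01):
    """I(X;Y) in bits for a BAC with Pr(X = 1) = a."""
    y1 = a * (1.0 - p10) + (1.0 - a) * p01  # Pr(Y = 1)
    return h2(y1) - a * h2(p10) - (1.0 - a) * h2(p01)


def bac_capacity(p10, p01, iters=200):
    """Maximize I(X;Y) over Pr(X = 1) by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mutual_information(m1, p10, p01) < mutual_information(m2, p10, p01):
            lo = m1
        else:
            hi = m2
    return mutual_information((lo + hi) / 2.0, p10, p01)


def state_rows(q_T, p_T):
    """(Pr(S'), p10, p01) for S' = 11, 10, 01, 00, following Table 2.3.

    Assumes q_T > 0 or p_T > 0 so that the denominators are nonzero.
    """
    r_T = 1.0 - q_T
    d_same = 2.0 * r_T * (1.0 - p_T) + q_T
    d_chg = 2.0 * r_T * p_T + q_T
    n_same = 2.0 * r_T * (1.0 - p_T) * p_T
    n_chg = 2.0 * r_T * p_T ** 2
    return [
        (d_same / 4.0, n_same / d_same, (n_same + q_T) / d_same),  # S' = 11
        (d_chg / 4.0, n_chg / d_chg, (n_chg + q_T) / d_chg),       # S' = 10
        (d_chg / 4.0, (n_chg + q_T) / d_chg, n_chg / d_chg),       # S' = 01
        (d_same / 4.0, (n_same + q_T) / d_same, n_same / d_same),  # S' = 00
    ]


def modified_capacity(q_T, p_T):
    """C_max{S'} of Eq. (2.20): state-weighted sum of BAC capacities."""
    return sum(pr * bac_capacity(p10, p01)
               for pr, p10, p01 in state_rows(q_T, p_T))


if __name__ == "__main__":
    # Sweep the hard error probability with p_T fixed at 1e-6, as in the text.
    for q_T in (1e-4, 1e-3, 1e-2, 1e-1):
        print(q_T, modified_capacity(q_T, 1e-6))
```

A numerical search is used here only because it is easy to verify against known special cases such as the binary symmetric channel; a closed-form BAC capacity expression would serve equally well.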


Table 2.3. Transition probabilities for the state-dependent channel model illustrated in Fig. 2.9. There are four possible states and the crossover probabilities vary depending on the channel state.

S′    $\Pr(S')$    $p_{10}^{(S')}$    $p_{01}^{(S')}$
11    $\frac{2r_T(1-p_T)+q_T}{4}$    $\frac{2r_T(1-p_T)p_T}{2r_T(1-p_T)+q_T}$    $\frac{2r_T(1-p_T)p_T+q_T}{2r_T(1-p_T)+q_T}$
10    $\frac{2r_Tp_T+q_T}{4}$    $\frac{2r_Tp_T^2}{2r_Tp_T+q_T}$    $\frac{2r_Tp_T^2+q_T}{2r_Tp_T+q_T}$
01    $\frac{2r_Tp_T+q_T}{4}$    $\frac{2r_Tp_T^2+q_T}{2r_Tp_T+q_T}$    $\frac{2r_Tp_T^2}{2r_Tp_T+q_T}$
00    $\frac{2r_T(1-p_T)+q_T}{4}$    $\frac{2r_T(1-p_T)p_T+q_T}{2r_T(1-p_T)+q_T}$    $\frac{2r_T(1-p_T)p_T}{2r_T(1-p_T)+q_T}$
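As a quick consistency check on the $\Pr(S')$ column of Table 2.3, the four state probabilities should sum to one. The short derivation below is a sketch that uses only the table entries and the definition $r_T = 1 - q_T$:

```latex
\begin{align*}
\sum_{S'} \Pr(S')
  &= 2 \cdot \frac{2 r_T (1 - p_T) + q_T}{4}
   + 2 \cdot \frac{2 r_T p_T + q_T}{4} \\
  &= r_T (1 - p_T) + r_T p_T + q_T \\
  &= r_T + q_T = 1 .
\end{align*}
```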

2.4 Summary

In this chapter, a state-dependent channel model is introduced in Section 2.1, which can be applied to both two-dimensional magnetic recording (TDMR) and advanced memory systems. In the channel model, channel states are associated with write errors, and side information about channel states may be available at the encoder or decoder for coding. The capacity bounds of TDMR and memory systems are reviewed at the beginning of Sections 2.2 and 2.3. Capacity bounds of two special channels with write errors are also investigated in the later parts of these sections. In Section 2.2, an ideal write and readback TDMR channel with random Voronoi grains is introduced and a lower bound on the symmetric information rate (SIR) of the channel is derived. The SIR is itself a lower bound on the channel capacity. Numerical evaluations for varying degrees of grain size variation show that the lower bound on the SIR can be as high as 0.401 to 0.511 user bits per grain [26]. In Section 2.3, for memory scrubbing systems that periodically update the memory contents with error correcting code (ECC) decoding, the previous-stage ECC decoding can provide partial side information for the next-stage encoding. An upper bound on the channel capacity of this scrubbing memory system is derived. Coding with partial side information provides an intermediate capacity between coding with no side information and coding with complete side information [22]. In the following chapters, coding schemes using side information will be discussed, which efficiently handle the write errors of TDMR and advanced memory systems.

[Figure 2.10: plot of capacity (info. bits/cell, 0.7 to 1) versus hard error probability $q_T$ ($10^{-4}$ to $10^{-1}$), with curves $C_{\max} = C_{\mathrm{enc}}$, $C_{\min}$, and $C_{\max}\{S'\}$.]

Figure 2.10. Channel capacities of coding with and without side information. The soft error probability $p_T$ is fixed at $10^{-6}$. The curves $C_{\max}$ and $C_{\min}$ are the channel capacities for the original channel of Fig. 2.6 when complete side information is available at the encoder and when no side information is available at all, respectively. The $C_{\max}\{S'\}$ curve is the capacity upper bound of the channels with ECC-provided partial side information available at the encoder.


Chapter 3

Conventional Approaches for Channels with Write Errors

This chapter reviews the coding and signal processing schemes that underlie the proposed schemes of this thesis. It mainly considers memory channels with hard errors, since magnetic recording systems with write errors have not received much research attention until now. Section 3.1 summarizes state-agnostic schemes, in which coding and signal processing avoid or ignore write errors in the channel [6; 49]. Alternatively, side information about write errors can be used during encoding or decoding. Section 3.2 reviews the erasure decoding scheme, which can be viewed as a coding with side information at the decoder (CSID) scheme where side information about defective cells is used to assign erasures [19; 68]. On the other hand, Section 3.3 describes coding with side information at the encoder (CSIE) schemes such as the random binning scheme [19] and partitioned