44. Data Integrity and Protection

(1)

44. Data Integrity and Protection

Operating System: Three Easy Pieces

(2)

Disk Failure Modes



Common and worthy of failures are frequency of latent-sector errors(LSEs) and block corruption.

Cheap Costly

LSEs 9.40% 1.40%

Corruption 0.50% 0.05%

Frequency of LSEs and Block Corruption

(3)

Disk Failure Modes (Cont.)



Frequency of latent-sector errors(LSEs)

 Costly drives with more than one LSE are as likely to develop additional.

 For most drives, annual error rate increases in year two

 LSEs increase with disk size

 Most disks with LSEs have less than 50

 Disks with LSEs are more likely to develop additional LSEs

 There exists a significant amount of spatial and temporal locality

 Disk scrubbing is useful (most LSEs were found this way)

(4)

Disk Failure Modes (Cont.)



Block corruption:

 Chance of corruption varies greatly across different drive models

 Within the same drive class

 Age affects are different across models

 Workload and disk size have little impact on corruption

 Most disks with corruption only have a few corruptions

 Corruption is not independent with a disk or across disks in RAID

 There exists spatial locality, and some temporal locality

 There is a weak correlation with LSEs

(5)

Handling Latent Sector Errors



Latent sector errors are easily detected and handled.



Using redundancy mechanisms:

 In a mirrored RAID or RAID-4 and RAID-5 system based on parity, the system should reconstruct the block from the other blocks in the parity group.

(6)

Detecting Corruption: The Checksum



How can a client tell that a block has gone bad?



Using Checksum mechanisms:

 This is simple the result of a function that takes a chunk of data as input and computes a function over said data, producing a small summary of the contents of the data.

(7)

Common Checksum Functions (Cont.)

 Different functions are used to compute checksums and vary in strength.

 One simple checksum function that some use is based on exclusive or(XOR).

 XOR is a reasonable checksum but has its limitations.

 Two bits in the same position within each checksumed unit changed the checksum will not detect the corruption.

365e c4cd ba14 8a92 ecef 2c3a 40be f666

0011 0110 0101 1110 1100 0100 1100 1101 1011 1010 0001 0100 1000 1010 1001 0010 1110 1100 1110 1111 0010 1100 0011 1010 0100 0000 1011 1110 1111 0110 0110 0110

0010 0000 0001 1011 1001 0100 0000 0011

If we view them in binary, we get the following:

It is easy to see what the resulting checksum will be:

The result, in hex, is 0x201b9403.

(8)

Common Checksum Functions (Cont.)

 Addition Checksum

 This approach has the advantage of being fast.

 Compute 2’s complement addition over each chunk of the data

 ignoring overflow

 Fletcher Checksum

 Compute two check bytes, s1 and s2.

 Assuming a block D consists of bytes d1…dn; s1 is simply in turn is

 s1 = s1 + di mod 255(compute over all di);

 s2 = s2 + s1 mod 255(again over all di);

 Cyclic redundancy check(CRC)

 Treating D as if it is a large binary number and divide it by an agreed upon value.

 The remainder of this division is the value of the CRC.

(9)

Checksum Layout



The disk layout without checksum:



The disk layout with checksum:

 Store the checksums packed into 512-byte blocks.

D0 D1 D2 D3 D4 D5 D6

D0 D1 D2 D3 D4

C[D0] C[D1] C[D2] C[D3] C[D4]

D0 D1 D2 D3 D4

C[D0] C[D1] C[D2] C[D3] C[D4]

(10)

Using Checksums



When reading a block D, the client reads its checksum from disk Cs(D), stored checksum



Computes the checksum over the retrieved block D, computed checksum Cc(D).



Compares the stored and computed checksums;

 If they are equal (Cs(D) == Cc(D)), the data is in safe.

 If they do not match (Cs(D) != Cc(D)), the data has changed since the time it was stored (since the stored checksum reflects the value of the data at that time).

(11)

A New Problem: Misdirected Writes



Modern disks have a couple of unusual failure-modes that require different solutions.

 Misdirected write arises in disk and RAID controllers which the data to disk correctly, except in the wrong location

C[D0] disk=1 block=0

D0 _C^[D0^]

disk=1 block=1

D1 _C^[D0^]

disk=1 block=2

D2

C[D0] disk=1 block=0

D0 _C[D0] disk=1 block=1

D1 _C[D0] disk=1 block=2

D2 Disk 1Disk 0

(12)

One Last Problem: Lost Writes



Lost Writes, occurs when the device informs the upper layer that a

write has completed but in fact it never is persisted.

(13)

Scrubbing



When do these checksums actually get checked?

 Most data is rarely accessed, and thus remain unchecked.



To remedy this problem, many systems utilize disk scrubbing.

 By periodically reading through every block of the system

 Checking whether checksum are still valid

 Reduce the chances that all copies of certain data become corrupted

(14)

Overhead of Checksumming



Two distinct kinds of overheads : space and time



Space overheads

 Disk itself: A typical ratio might be an 8byte checksum per 4KB data block, for a 0.19% on-disk space overhead.

 Memory of the system: This overhead is short-lived and not much of a concern.



Time overheads

 CPU must compute the checksum over each block

 To reducing CPU overheads is to combine data copying and checksumming into one streamlined activity.

(15)

 Disclaimer: This lecture slide set was initially developed for Operating System course in Computer Science Dept. at Hanyang University. This lecture slide set is for OSTEP book written by Remzi and Andrea at University of Wisconsin.