Block Chunking and Block-Layout - Motivating Example


4.2.2 Block Chunking and Block-Layout

The key challenge is to determine the right chunk size for erasure coding and to place the chunks in blocks without increasing the metadata size. With the fixed layout (described in sub-section 4.2.1), the chunk locations of the B/C/D blocks can be determined from the chunk locations of A, the original block size, and the erasure code being used. Based on the erasure code and the type of request (write or update), the FINGER client performs erasure coding and decides whether the data is written to a new block, appended to an existing block, or updated in place.

Algorithm 1 Block-Chunking and Layout Algorithm
1: k ← length of data stripe
2: m ← length of parity stripe
3: Chunk each data block into k separate chunks
4: Encode each block using Algorithm 2
5: Write first block’s chunk contents to k new-blocks
6: Write first parity block’s chunk contents to m new-parity-blocks
7: for i = 2 to k do
8:     Append k chunks of the i^th block to new-blocks
9:     Append m parity chunks of the i^th block to new-parity-blocks

Assume that an (n, k) erasure code is used, where k is the number of data blocks and m = n − k is the number of parity blocks generated by encoding. The FINGER client splits each large block into k smaller chunks (sub-blocks). For a write operation, the FINGER client erasure codes the first incoming block and produces m parity chunks. It then sends the k + m chunks to the Datanodes to be written as new blocks. When the client writes another block to the FINGER client, that block is erasure coded as well. Instead of writing the new chunks to new blocks, FINGER appends them to the incomplete blocks left by the previous block chunking. Because each parity chunk produced by a block’s erasure coding is the same size as a sub-block, the parity chunks are also appended until the parity blocks become full. This process continues until k blocks have been erasure coded. The (k + 1)^th block written to HDFS is then erasure coded and written into new blocks, and the (k + 2)^th block is appended to them.
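The append-based layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `layout_group` and `xor_parity` are hypothetical names, in-memory byte arrays stand in for blocks on Datanodes, and `xor_parity` is a placeholder for Algorithm 2 (it is not a real (n, k) code and only produces parity chunks of the right size).

```python
from typing import Callable, List, Tuple


def xor_parity(chunks: List[bytes], m: int) -> List[bytes]:
    """Placeholder encoder (assumption): m copies of the XOR of the data chunks."""
    acc = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            acc[i] ^= byte
    return [bytes(acc) for _ in range(m)]


def layout_group(
    blocks: List[bytes],
    k: int,
    m: int,
    encode: Callable[[List[bytes], int], List[bytes]] = xor_parity,
) -> Tuple[List[bytes], List[bytes]]:
    """Lay out up to k incoming blocks as k stored data blocks and m parity blocks."""
    if not 1 <= len(blocks) <= k:
        raise ValueError("a group holds between 1 and k incoming blocks")
    data_blocks = [bytearray() for _ in range(k)]
    parity_blocks = [bytearray() for _ in range(m)]
    for block in blocks:
        if len(block) % k != 0:
            raise ValueError("block size must be divisible by k")
        size = len(block) // k
        # Split the incoming block into k equally sized chunks.
        chunks = [block[j * size:(j + 1) * size] for j in range(k)]
        parities = encode(chunks, m)
        # The first block starts the stored blocks; later blocks are appended.
        for j in range(k):
            data_blocks[j] += chunks[j]
        for j in range(m):
            parity_blocks[j] += parities[j]
    return [bytes(b) for b in data_blocks], [bytes(b) for b in parity_blocks]
```

With k = 4 and m = 2, four incoming blocks of 8 bytes each fill four stored data blocks and two parity blocks of 8 bytes each, so the stored block size matches the incoming block size once the group is complete.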

HDFS and HDFS-RAID both overwrite an entire file whenever any change, small or large, occurs. In contrast, FINGER relaxes this condition by allowing the update size to be a multiple of the individual block size. This eliminates the need to rewrite a large file for small changes, which improves the system’s small write/update performance.

In a FINGER client, the update granularity is the block size. In addition to allowing update operations, FINGER does not treat blocks as immutable, as HDFS and HDFS-RAID do. FINGER can seek to a certain block location and change the contents of that block.

The smallest seek granularity for writes is the chunk (sub-block) size, which is determined by the erasure coding parameters and the block size set by the client.

When a client provides new content for a file, FINGER consults the Namenode’s metadata to determine the blocks used for erasure coding and the number of the block being updated. FINGER then erasure codes the new block and generates the parity chunks.

Based on the block number, the default block size, and the block locations, FINGER determines the seek location and seeks to it to update the content. Suppose the block size is 128 MB, the erasure code is RS(6,4), the file has 8 blocks, and we are updating the 3^rd block. From the erasure coding information in the Namenode’s metadata, we know that the 3^rd block is combined with blocks 1, 2, and 4 to perform erasure coding. Note that the original HDFS-RAID also needs to keep this information in the Namenode; as such, FINGER does not increase the metadata size w.r.t. HDFS-RAID. The four erasure-coded blocks have a single entry in their block locations, and the chunk locations can be inferred from this information. To update block 3, we first seek to the beginning of all 4 blocks (in Figure 35, the beginning of A1, A2, A3, and A4) and then compute the offset using Seek offset = block number × chunk size. We seek using this offset and write the new chunks (sub-blocks) at this location. Since the chunk sizes are fixed for one instance of erasure-coded Hadoop, the new content does not overwrite the adjacent block’s contents.
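As a worked illustration of the seek-offset rule, the following sketch computes the chunk size and offset for the 128 MB, RS(6,4) example and applies an in-place chunk write to in-memory byte arrays. The function names are hypothetical, and the sketch assumes that the block number is a zero-based position within its erasure-coded group (so the 3^rd block has index 2); the text does not state the numbering convention.

```python
from typing import List

MB = 1024 * 1024


def chunk_size(block_size: int, k: int) -> int:
    """Size of one chunk when a block is split into k equal chunks."""
    if block_size % k != 0:
        raise ValueError("block size must be divisible by k")
    return block_size // k


def seek_offset(block_index: int, block_size: int, k: int) -> int:
    """Seek offset = block number x chunk size (block_index assumed zero-based)."""
    return block_index * chunk_size(block_size, k)


def write_chunks_in_place(
    stored_blocks: List[bytearray], new_chunks: List[bytes], offset: int
) -> None:
    """Overwrite one chunk in each stored block, starting at the same offset."""
    for stored, chunk in zip(stored_blocks, new_chunks):
        stored[offset:offset + len(chunk)] = chunk


# 128 MB blocks with RS(6,4): k = 4, so each chunk is 32 MB.
size = chunk_size(128 * MB, 4)
# Updating the 3rd block of its group (index 2 under the zero-based assumption).
offset = seek_offset(2, 128 * MB, 4)
```

Under these assumptions the chunk size is 32 MB and the update starts 64 MB into each of the four stored blocks. Because every chunk has the same fixed size, the write covers exactly one chunk and leaves the neighbouring chunks untouched.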

FINGER can also serve read requests in parallel, because a single block is striped across multiple Datanodes. The disadvantage of this method is that the client needs to open streams to each of the nodes at the specified offsets and then perform the reads. One might argue that read requests spanning multiple blocks lead to fragmented reads; however, the fragmentation is limited. For random block access, the maximum additional seek is (k − 1) × chunk size w.r.t. HDFS-RAID or HDFS. The FINGER client concatenates these sub-blocks and forwards the result to the user client. From the user client’s perspective, the block size has not changed at all.
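The read path can be sketched in the same spirit: fetch one chunk from each of the k stored blocks at the same offset, in parallel, and concatenate the results. This is a minimal sketch, not FINGER’s actual client code. `read_block` and `max_additional_seek` are hypothetical names, in-memory byte strings stand in for streams opened on Datanodes, and the block index is assumed to be zero-based within its group.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def read_block(data_blocks: List[bytes], block_index: int, chunk_size: int) -> bytes:
    """Reassemble one logical block from k stored blocks."""
    offset = block_index * chunk_size

    def read_chunk(stored: bytes) -> bytes:
        # Stand-in for opening a stream to a Datanode and seeking to offset.
        return stored[offset:offset + chunk_size]

    # One worker per stored block; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(data_blocks)) as pool:
        chunks = list(pool.map(read_chunk, data_blocks))
    return b"".join(chunks)


def max_additional_seek(k: int, chunk_size: int) -> int:
    """Upper bound on extra seek for a random block access: (k - 1) x chunk size."""
    return (k - 1) * chunk_size
```

For example, with k = 4 and 32 MB chunks, the bound evaluates to 3 × 32 MB = 96 MB.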

In document Exploration of Erasure-Coded Storage Systems for High Performance, Reliability, and Inter-operability (Page 82-85)