Existing Techniques for Encoding and Reconstruction

The process for encoding coded blocks using network coding, as described in Section 2.1.2, is an O(n×m) operation. The entire file must be processed to create one coded block. In the case of a peer that does not yet have a complete copy of the file, it must process all of the data it has received so far to create one block. For a sufficiently large value of n, peers may not be able to generate coded blocks as fast as they can transmit them. This limitation slows down the transfer of the file to all peers. Even if less network traffic is required for all peers to obtain a copy of the file with network coding than would be required with a standard block-sharing protocol like BitTorrent, if the rate of transmission is limited by the computations required to perform network coding, it may take longer to distribute the file.

2.2.1 Generations

One way to limit the computational cost of generating a coded block is to divide the file into some number of equally sized contiguous subsets called generations [GMR06]. When generating a coded block, instead of using every block of the file, only blocks from a particular generation are used. The resulting coded block contains some information about every block in its generation, but not about every block in the entire file. If the file contains n blocks, and g generations are to be used, then there will be ⌈n/g⌉ blocks in each generation. A peer must collect h = ⌈n/g⌉ coded blocks for a particular generation in order to decode the data in that generation, and a peer needs to decode every generation to have all of the original file data.

Using generations effectively changes the large linear algebra problem that must be solved to decode the original file into g smaller problems. Because the number of elements in each generation is much lower than the number of elements in the entire file, generating coded blocks becomes much cheaper. Whether generations are used or not, if the size of each block remains constant, the number of total blocks necessary to decode the file remains the same. Without generations, a peer must receive n coded blocks to decode the entire file, and with generations, a peer must receive h coded blocks from each of g generations, for a total of h×g = n coded blocks.
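The generation scheme described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes coefficient arithmetic modulo the prime 257 (a stand-in for whatever finite field is actually used), represents each block as a list of m integers, and assumes g divides n so that exactly g generations of h blocks result. The names P, split_into_generations, and encode_from_generation are hypothetical.

```python
import random

P = 257  # assumed prime modulus; an illustrative stand-in for the real field


def split_into_generations(blocks, g):
    """Split n blocks into contiguous generations of h = ceil(n / g) blocks.

    Yields exactly g generations when g divides n (assumed here).
    """
    n = len(blocks)
    h = -(-n // g)  # integer ceiling of n / g
    return [blocks[i:i + h] for i in range(0, n, h)]


def encode_from_generation(generation, rng):
    """Build one coded block from a single generation.

    Only the h blocks of this generation are processed, so the cost is
    O(h x m) rather than O(n x m).
    """
    coeffs = [rng.randrange(P) for _ in generation]
    m = len(generation[0])
    coded = [0] * m
    for c, block in zip(coeffs, generation):
        for j in range(m):
            coded[j] = (coded[j] + c * block[j]) % P
    return coeffs, coded


if __name__ == "__main__":
    file_blocks = [[i, i + 1] for i in range(8)]      # n = 8 blocks, m = 2
    generations = split_into_generations(file_blocks, 4)  # g = 4, h = 2
    print(encode_from_generation(generations[1], random.Random(0)))
```

The coefficient vector returned here has h entries rather than n, which is what makes the per-generation decoding problems smaller.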

Using generations not only reduces the cost of generating a coded block, but it also reduces the cost of decoding the file. Without generations, a coefficient vector contains one element for each block of the file, or n total elements. When it comes time to decode the file, the coefficient matrix will be an n×n matrix, which represents a system of linear equations. Solving this system is an O(n³) operation. However, when using generations, each coefficient vector only contains h elements, one for each block of the generation. When it comes time to decode the file, there are g separate h×h coefficient matrices. Solving each of these is an O(h³) operation, so the cost of decoding the entire file is O(g×h³). Since h×g = n, for g greater than 1, h must be less than n by a factor of g, so O(g×h³) = O(n³/g²) and the cost of decoding the entire file is reduced by a factor of g². So, using generations can speed up both the encoding of coded blocks and the decoding of the original file data.
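As a rough check on the decoding cost, the short sketch below evaluates the cubic cost model for a few values of g. It assumes that g divides n and that solving an h×h system costs exactly h³ units, which ignores constant factors; the function name decode_cost is hypothetical.

```python
def decode_cost(n, g=1):
    """Cost of decoding under a cubic model: g systems of size h = n / g."""
    if n % g != 0:
        raise ValueError("this sketch assumes g divides n")
    h = n // g
    return g * h ** 3


if __name__ == "__main__":
    n = 1000
    baseline = decode_cost(n)  # no generations: one n x n system
    for g in (1, 10, 100):
        cost = decode_cost(n, g)
        print(f"g={g:4d}  cost={cost:12d}  speedup={baseline // cost}")
```

Under this model the speedup relative to g = 1 equals n³/(g×h³), which is g² when h = n/g.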

Furthermore, using generations also allows for progressive decoding of the file. Instead of waiting to receive n coded blocks before any of the file data can be decoded, a peer can decode a generation after receiving as few as h coded blocks. This decoding may allow the peer to start using the beginning of the file while waiting for enough coded blocks to decode the rest of it.

Generations do not come without disadvantages. Using generations makes it more likely for coded blocks to be non-innovative: there are fewer values in each coefficient vector, which makes it more likely for linearly dependent or even identical coefficient vectors to be chosen. If a peer has enough coded blocks to decode several entire generations, only coded blocks generated from elements in the remaining generations of the file are useful. As a result, a problem similar to BitTorrent’s last block problem can arise, wherein a peer nearing completion of its download has trouble locating another peer that can provide blocks from the last generation that it needs to be able to decode all of the original file data. As h decreases and g increases, these issues intensify. Taken to the extreme case that h = 1 and g = n, the advantages that network coding offers over BitTorrent completely disappear.

2.2.2 Density

Another way to reduce the computational cost of generating coded blocks is to add a density parameter [WL06]. If there are n blocks in the entire file, the density parameter, q, where q ≤ n, is used to determine how many of the blocks are to be used to generate a coded block. The use of the density parameter is rather straightforward. Instead of using all n blocks when generating a coded block, q blocks are selected at random. Thus, coded blocks will only contain information about some q out of the n blocks of the original file.

Using the density parameter speeds up the generation of coded blocks, as each block of the file does not need to be processed each time a coded block is created. Using a very low density can make the creation of coded blocks very fast. Encoding a block using the entire file is an O(n×m) operation, but with a density of q, it is reduced to O(q×m). However, now each coded block no longer contains information about each block of the original file, and using a low density increases the number of non-innovative blocks that are generated.
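Encoding with a density parameter can be sketched as follows. As an illustration only, the sketch assumes arithmetic modulo the prime 257 and draws non-zero coefficients for the q selected blocks so that each selected block actually contributes; neither choice is taken from [WL06], and the names P and encode_with_density are hypothetical.

```python
import random

P = 257  # assumed prime modulus; an illustrative stand-in for the real field


def encode_with_density(blocks, q, rng):
    """Build one coded block from q of the n blocks, chosen at random.

    Only q blocks are processed, so the cost is O(q x m) instead of O(n x m).
    The returned coefficient vector still has n entries; the n - q entries
    for unselected blocks are zero.
    """
    n = len(blocks)
    m = len(blocks[0])
    chosen = rng.sample(range(n), q)  # q distinct block indices
    coeffs = [0] * n
    coded = [0] * m
    for i in chosen:
        c = rng.randrange(1, P)  # non-zero coefficient (assumption of this sketch)
        coeffs[i] = c
        for j in range(m):
            coded[j] = (coded[j] + c * blocks[i][j]) % P
    return coeffs, coded


if __name__ == "__main__":
    file_blocks = [[i + 1, 2 * i] for i in range(6)]  # n = 6 blocks, m = 2
    print(encode_with_density(file_blocks, 2, random.Random(1)))
```

Setting q = n recovers ordinary dense encoding, while small q produces sparse coefficient vectors that are cheaper to generate but more likely to be non-innovative.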

2.2.3 Folded Block Encoding

Folded block encoding [Cut07] allows for coded blocks to be created without the need to process the entire file. With folded block encoding, an index, i, is maintained starting at 0. The last transmitted coefficient vector, V, and data vector, E, are also stored, both initialized to all zeros.

When a peer needs to create a coded block to send to another peer, a random number s is generated. The ith row of the peer’s coefficient matrix is multiplied by s and added to V, and the result becomes the coefficient vector associated with the new coded block. Similarly, the ith row of the peer’s data matrix is multiplied by s and added to E, and the result becomes the data vector associated with the new coded block. The coded block is now ready to be sent, and its coefficient vector and data vector are stored as the new V and E. The value of i is incremented, looping back around to 0 once it exceeds the number of rows in the coefficient and data matrices.

Using this method, the first coded block that a peer transmits contains information about only the first block in its data matrix. The second coded block contains information about the first two blocks in its data matrix, the third coded block contains information about the first three blocks, and so on. Once a peer has generated n coded blocks, all subsequent coded blocks transmitted by the peer contain some information about every block of the original file.
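The folding procedure described above can be sketched as follows. This is a minimal illustration, not the implementation from [Cut07]: it assumes arithmetic modulo the prime 257, draws s from the non-zero values so that every step adds a new contribution, and uses the hypothetical names P, FoldedEncoder, and next_coded_block.

```python
import random

P = 257  # assumed prime modulus; an illustrative stand-in for the real field


class FoldedEncoder:
    """Keeps the index i and the last transmitted vectors V and E."""

    def __init__(self, coeff_matrix, data_matrix, rng=None):
        self.coeff_matrix = coeff_matrix  # one coefficient row per held block
        self.data_matrix = data_matrix    # the matching data rows
        self.i = 0
        self.V = [0] * len(coeff_matrix[0])
        self.E = [0] * len(data_matrix[0])
        self.rng = rng or random.Random()

    def next_coded_block(self):
        """Fold row i, scaled by a random s, into V and E; touches one row only."""
        s = self.rng.randrange(1, P)  # non-zero s (assumption of this sketch)
        row_c = self.coeff_matrix[self.i]
        row_d = self.data_matrix[self.i]
        self.V = [(v + s * c) % P for v, c in zip(self.V, row_c)]
        self.E = [(e + s * d) % P for e, d in zip(self.E, row_d)]
        self.i = (self.i + 1) % len(self.coeff_matrix)  # wrap around to 0
        return list(self.V), list(self.E)


if __name__ == "__main__":
    # A seed peer holding the whole file: identity coefficient matrix.
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    data = [[1, 2], [3, 4], [5, 6]]
    encoder = FoldedEncoder(identity, data, random.Random(0))
    for _ in range(4):
        print(encoder.next_coded_block())
```

For the seed peer in the usage example, the kth coded block (for k up to n) has exactly k non-zero coefficients, matching the description above. Each call processes a single row of the matrices, regardless of n.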

This approach has the advantage that, to generate a coded block, a peer only needs to process one block of its file data, as opposed to processing the entire file as described in Section 2.1.2.1. The disadvantage is that early coded blocks generated by a particular peer contain information about a small number of blocks from the entire file, making it more likely for them to be non-innovative. Even so, early blocks generated by a particular peer that does not start with a complete copy of the file are highly likely to contain information about multiple blocks of the original file. The first block a peer receives will become the basis for the first block it generates, so it will contain information about the same blocks of the original file.

In document ABSTRACT. Performance Improvements to Peer-to-Peer File Transfers Using Network Coding. Aaron A. Kelley, M.S. Mentor: William B. Poucher, Ph.D. (Page 49-53)