CHAPTER 5. FEC in bacterial communications
5.6 Luby Transform (LT) coding
5.6.1 Introduction of LT codes
In the channel model proposed in this thesis, symbols from previous time slots have a relatively strong impact on the received symbol. In the previous sections, block codes, including Hamming codes and MECs, were applied to improve the channel performance. For such codes, the code rate must be fixed before transmission in accordance with the error probability. However, in the proposed molecular channel the error probability is difficult to calculate precisely when the transmission distance is very small and the number of molecules emitted at the start of each time slot is large, since it is affected by many previous time slots. In these cases, if the actual error probability is higher than expected, decoding problems arise at the receiver; if it is lower than expected, the code achieves a rate below that which the channel could support. Another disadvantage of powerful block codes is the high computational cost of the overall encoding and decoding processes, which is a severe limitation for bacterial nanomachines.
For these reasons, fountain codes [215], a class of codes designed for reliable transmission of information over a communication channel with unknown error probability, are used as channel codes in the proposed channel model. Fountain codes are rateless codes, which means that the encoder can produce a potentially unlimited number of output symbols. In this section, LT codes, a type of fountain code, are applied to the channel. LT codes were introduced and put into practice by Luby [216] to reduce the encoding and decoding complexity. For LT codes, each encoding symbol can be generated using, on average, $O(\ln(k/\delta))$ symbol operations, and the $k$ input symbols can be recovered from any $k + O(\sqrt{k}\,\ln^{2}(k/\delta))$ encoding symbols with probability $1-\delta$ [216]. Since as many or as few encoding symbols can be generated as needed, and the data can be recovered from any set of encoding symbols that is only slightly larger than the original data, encoding symbols can be generated and sent over the communication channel until enough of them have arrived at the decoder to recover the data, regardless of the channel model [216].
Assuming that the message bits are denoted by Ζ = $(z_1, z_2, \dots, z_k)$ and the codeword is denoted by ℂ = $(c_1, c_2, \dots, c_n)$, where $n$ is the codeword length, the LT encoding process can be expressed as follows:
1. Randomly choose the degree ℓ of the encoding symbol from a degree distribution Φ(ℓ). A suitable choice of degree distribution depends on the message length.
2. Choose ℓ distinct message bits uniformly at random as the neighbours of the encoding symbol, sum these bits modulo 2, and set the result as the value of the encoding symbol.
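The two encoding steps can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: the function name `lt_encode_symbol`, the representation of Φ(ℓ) as a list of weights, the seeded `random.Random` generator and the example values are all assumptions made for the sketch.

```python
import random


def lt_encode_symbol(message_bits, degree_weights, rng):
    """Generate one LT encoding symbol from a list of 0/1 message bits.

    degree_weights[d - 1] is the probability of choosing degree d
    (a hypothetical representation of the degree distribution).
    Returns (neighbour_indices, value).
    """
    k = len(message_bits)
    # Step 1: draw the degree from the degree distribution.
    degree = rng.choices(range(1, k + 1), weights=degree_weights, k=1)[0]
    # Step 2: pick that many distinct message bits uniformly at random ...
    neighbours = sorted(rng.sample(range(k), degree))
    # ... and sum them modulo 2 (bitwise XOR).
    value = 0
    for i in neighbours:
        value ^= message_bits[i]
    return neighbours, value


# Example with placeholder values (not taken from the thesis).
rng = random.Random(1)
message = [1, 0, 1, 1]
weights = [0.25, 0.5, 1 / 6, 1 / 12]
symbols = [lt_encode_symbol(message, weights, rng) for _ in range(6)]
```

Because the encoder can be called as often as required, the list `symbols` can be extended until the receiver has collected enough symbols, which is the rateless property described above.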
The encoding scheme described above is identical for any LT code design and can be combined with many kinds of decoding schemes, such as Gaussian elimination, ML detection or belief propagation. For each kind of degree distribution, a decoding algorithm is chosen that offers fast and accurate decoding. Each encoded bit is then transmitted over the noisy channel, and the decoder receives a corrupted version of this bit. For a diffusion-based MC system, all the nanomachines can be synchronized through bio-inspired approaches [217]. They can therefore share a common random number generator, which means that the decoder knows which ℓ message bits were used to generate any given encoded bit, but not their values. Using the belief propagation algorithm [218], which is a simple and efficient method of decoding LT-coded messages, the decoding process can be described as follows:
1. Find a codeword $c_j$ ∈ ℂ that is connected to only one original message bit $z_i$ ∈ Ζ. If there is no such codeword, suspend the decoding process and report a decoding failure.
2. Set the value of $z_i$ to $c_j$: $z_i = c_j$.
3. Find all the codewords $c_\Lambda$ (Λ ∈ [1, $n$]) that are connected to $z_i$ and update their values: $c_\Lambda = c_\Lambda \oplus z_i$.
4. Remove the connections to $z_i$ from the generator matrix.
5. Repeat the procedure above until no more codewords have degree 1.
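The decoding steps above can be sketched as a simple "peeling" decoder. The sketch assumes that the received codeword values are error-free hard decisions, which the noisy channel considered here does not guarantee; the function name `lt_peel_decode` and the (neighbour indices, value) representation of each received codeword are likewise assumptions made for illustration.

```python
def lt_peel_decode(k, received):
    """Recover k message bits from (neighbour_indices, value) pairs.

    Returns a list of length k; entries left as None could not be
    recovered, which corresponds to a decoding failure.
    """
    # Work on copies so that the caller's data is not modified.
    pending = [[set(nb), val] for nb, val in received]
    decoded = [None] * k
    while True:
        # Find a codeword connected to exactly one message bit.
        ripple = next((p for p in pending if len(p[0]) == 1), None)
        if ripple is None:
            break  # no degree-one codeword left: stop
        i = next(iter(ripple[0]))
        # Copy the codeword value to the message bit.
        decoded[i] = ripple[1]
        # Update every codeword connected to that bit and
        # remove the corresponding connection.
        for p in pending:
            if i in p[0]:
                p[1] ^= decoded[i]
                p[0].discard(i)
    return decoded


# Hypothetical example: four message bits, four received codewords.
received = [([0], 1), ([0, 1], 1), ([1, 2], 1), ([2, 3], 0)]
recovered = lt_peel_decode(4, received)  # [1, 0, 1, 1]
```

If no codeword of degree 1 is available at some point, the remaining entries stay `None`, so the caller can detect the failure and wait for further encoding symbols.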
The key factor in the LT coding process is the degree distribution Φ(ℓ), which gives the probability that an encoding symbol has degree ℓ. Designing a proper degree distribution is of great importance, since it affects the encoding and decoding costs as well as the overhead. The random procedure of the LT process is completely determined by Φ(ℓ), the number of encoded symbols $n$ and the number of information symbols $k$. Here, the Robust Soliton distribution, which Luby showed to perform well [216], is applied as the degree distribution. Before the Robust Soliton distribution is presented, the Ideal Soliton distribution is introduced. It is defined as:
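$$\rho(\ell) = \begin{cases} \dfrac{1}{k}, & \ell = 1 \\[2mm] \dfrac{1}{\ell(\ell - 1)}, & \ell = 2, \dots, k \end{cases} \tag{5.20}$$
To obtain the Robust Soliton distribution $\mu(\ell)$, an auxiliary function $\tau(\ell)$ is defined:
$$\tau(\ell) = \begin{cases} \dfrac{\mathcal{R}}{\ell k}, & \ell = 1, \dots, k/\mathcal{R} - 1 \\[2mm] \dfrac{\mathcal{R}\,\ln(\mathcal{R}/\delta)}{k}, & \ell = k/\mathcal{R} \\[2mm] 0, & \ell = k/\mathcal{R} + 1, \dots, k \end{cases} \tag{5.21}$$
Then:
$$\mu(\ell) = \frac{\rho(\ell) + \tau(\ell)}{\sum_{i=1}^{k} [\rho(i) + \tau(i)]} \tag{5.22}$$
where $\delta$ is the allowable failure probability of the decoder to recover the data for a given number of encoding symbols and $\mathcal{R} = c \ln(k/\delta)\sqrt{k}$ for some suitable constant $c > 0$.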
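The construction in (5.20) to (5.22) can be sketched numerically as follows. The names `rho`, `tau` and `mu` follow Luby's notation for the Ideal Soliton distribution, the auxiliary function and the Robust Soliton distribution, and are assumptions here. Rounding $k/\mathcal{R}$ to the nearest integer is also an assumption, since the ratio is not an integer in general, and the parameter values in the example are illustrative only.

```python
import math


def ideal_soliton(k):
    """Return [rho(1), ..., rho(k)] as in (5.20)."""
    return [1.0 / k] + [1.0 / (l * (l - 1)) for l in range(2, k + 1)]


def robust_soliton(k, c, delta):
    """Return (mu, beta): mu = [mu(1), ..., mu(k)] as in (5.22) and
    beta, the normalising sum of rho + tau."""
    R = c * math.log(k / delta) * math.sqrt(k)
    # Assumption: k/R is rounded to the nearest integer degree.
    pivot = int(round(k / R))
    tau = []
    for l in range(1, k + 1):
        if l < pivot:
            tau.append(R / (l * k))
        elif l == pivot:
            tau.append(R * math.log(R / delta) / k)
        else:
            tau.append(0.0)
    rho = ideal_soliton(k)
    beta = sum(r + t for r, t in zip(rho, tau))
    mu = [(r + t) / beta for r, t in zip(rho, tau)]
    return mu, beta


# Illustrative parameter values only.
mu, beta = robust_soliton(k=11, c=0.03, delta=0.2)
```

For short messages and small $c$, $k/\mathcal{R}$ can exceed $k$; in that case only the first branch of (5.21) contributes, which the loop above handles without special treatment.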
In addition to the BER, the energy efficiency of the code is also of interest. The power consumption of a codeword is the product of the codeword weight and the symbol power, the latter being normalised to unity as in the previous sections. However, it is assumed that each codeword carries ( ) bits of information, which differs from the previous sections.
5.6.2 Analytical results
As in the analysis of MECs, LT codes are compared with the Hamming (7, 4) and (15, 11) codes in terms of both BER and energy consumption. LT codes satisfy the minimum Hamming distance required by Hamming codes, so the respective LT information bit lengths are set to 4 and 11. Furthermore, $\delta$ is 0.2, as chosen in [186], and the number of encoding symbols for LT codes is $n = k \sum_{i=1}^{k} [\rho(i) + \tau(i)]$, where $\rho$ and $\tau$ are the Ideal Soliton distribution and the auxiliary function defined in (5.20) and (5.21). The error correction and energy performances of LT codes and Hamming codes over a 4 μm transmission distance are illustrated in Figure 5.9 and Figure 5.10.
Figure 5.9 BER comparison between LT codes and Hamming codes. (BER versus molecules per bit; curves: uncoded, LT code (c = 0.01), LT code (c = 0.03), (7, 4) Hamming and (15, 11) Hamming.)
Figure 5.10 Average energy per bit comparison between LT codes and Hamming codes. (Energy per bit, J, versus molecules per bit; curves: (7, 4) Hamming, LT code (k = 4, c = 0.01), LT code (k = 4, c = 0.03), (15, 11) Hamming, LT code (k = 11, c = 0.01) and LT code (k = 11, c = 0.03).)
Figure 5.9 shows that although system reliability is improved by both Hamming codes and LT codes, LT codes perform better, achieving a smaller BER, when the number of molecules per bit is smaller than approximately 400. Also, for LT codes, larger values of the constant $c$ achieve a smaller BER. It should be noted that, for LT codes, the BER performance is almost the same for different values of $k$ when the same value of $c$ is applied; these curves are therefore not plotted separately in Figure 5.9. The coding gain can again be determined by taking the ratio of the number of molecules required for a given BER in the uncoded and coded cases. Thus at the 10 BER level, the coding gains for the Hamming codes are 0.94 dB and 1.65 dB for the (7, 4) and (15, 11) codes respectively, and for the LT codes, the figures are 1.37 dB and 2.04 dB for $c$ = 0.01 and $c$ = 0.03 respectively. In general, the system performance is better with a longer codeword. Figure 5.10 shows that LT codes consume more energy per bit on average than Hamming codes when the number of molecules per bit is smaller than approximately 60. The overall shape of the curves conforms to those seen with the other codes. For small numbers of molecules per bit, extra energy is needed to deal with unreliable decoding, but this effect levels out as the number of molecules per bit increases.
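As a worked illustration of the coding gain calculation, the sketch below converts a ratio of molecule counts into decibels. It assumes the usual $10\log_{10}$ convention for the ratio, which is not stated explicitly in the text, and the molecule counts in the example are hypothetical values chosen only to show the arithmetic.

```python
import math


def coding_gain_db(molecules_uncoded, molecules_coded):
    """Coding gain in dB from the molecules per bit needed to reach the
    same BER without and with coding (assumes a 10*log10 convention)."""
    return 10.0 * math.log10(molecules_uncoded / molecules_coded)


# Hypothetical counts: the uncoded system needs 1.6 times as many molecules.
gain = coding_gain_db(160, 100)  # about 2.04 dB
```

Under this assumption, a gain of about 2.04 dB corresponds to the uncoded system needing roughly 1.6 times as many molecules per bit as the coded system to reach the same BER.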
5.7 Summary
Based on the bacterial communication model proposed in Chapter 4, the diffusion-based MC channel relies on free molecular diffusion to transport the information symbols from the transmitter to the receiver. Channel reliability is measured by the BER, which is strongly affected by channel noise caused by ISI. Moreover, owing to the limited storage capabilities and output power range of bio-nanomachines, energy consumption is also a key factor. Error correction coding is considered to be an effective way to enhance the channel reliability and reduce the energy consumption. In this chapter, an overview of the existing related work has been presented, followed by brief summaries of the essential features of FEC techniques. Three categories of error correction codes, namely Hamming codes, MECs and LT codes, have been developed with OOK modulation and applied as channel codes. The proposed MEC coding achieves the minimum codeword energy and guarantees a minimum Hamming distance at the price of lengthy codewords. MECs are also developed by keeping the codeword distance unconstrained but with a minimum average code weight. LT codes are rateless codes, providing an endless stream of codewords for decoding. The parameters chosen conform to the methods used in the existing literature, so the results are believed to be representative of the performance of the coding schemes in the field.
Two key factors, viz. BER and average energy consumption per bit, have been investigated. The results show that all three kinds of codes offer coding gains, which can be several dB. Specifically, when the number of bacteria in the receiver node is 100 and the transmission distance is 4 μm, at the 10 BER level the coding gains for the Hamming codes are 0.94 dB and 1.65 dB for the (7, 4) and (15, 11) codes respectively; for the MECs, the figures are 4.67 dB and 9.14 dB for = 2 and = 2 respectively, indicating that MECs outperform Hamming codes and that the (15, 11) Hamming code, with its larger coding gain, performs better than the (7, 4) Hamming code. For the LT codes, the coding gains are 1.37 dB and 2.04 dB for $c$ = 0.01 and $c$ = 0.03 respectively. In addition, among the three proposed coding schemes, LT codes consume more energy than Hamming codes and MECs, especially when the number of molecules per bit is smaller than approximately 60, whilst MECs exhibit the lowest mean energy consumption per bit. However, it should be noted that the energy analysis is approximate, since it must rely on estimates of the energy dissipation in the nano processing units, as no complete bio-node architecture is yet available.