CiteSeerX — Graph-based iterative decoding algorithms for parity-concatenated trellis codes

(1)

Graph-Based Iterative Decoding Algorithms for Parity-Concatenated Trellis Codes

Qi Wang, Student Member, IEEE, and Lei Wei, Senior Member, IEEE

Abstract—In this paper, we construct parity-concatenated trellis codes in which a trellis code is used as the inner code and a simple parity-check code is used as the outer code. From the Tanner–Wiberg–Loeliger (TWL) graph representation, several iterative decoding algorithms can be derived. However, since the graph of the parity-concatenated code contains many short cycles, the conventional min-sum and sum-product algorithms cannot achieve near-optimal decoding. After some simple modifications, we obtain near-optimal iterative decoders. The modifications include either a) introducing a normalization operation in the min-sum and sum–product algorithms or b) cutting the short cycles which arise in the iterative Viterbi algorithm (IVA). After modification, all three algorithms can achieve near-optimal performance, but the IVA has the least average complexity.

We also show that asymptotically maximum-likelihood (ML) de- coding and a posteriori probability (APP) decoding can be achieved using iterative decoders with only two iterations. Unfortunately, this asymptotic behavior is only exhibited when the bit-energy-to- noise ratio is above the cutoff rate.

Simulation results show that with trellis shaping, iterative de- coding can perform within 1.2 dB of the Shannon limit at a bit error rate (BER) of4 10 ⁵for a block size of 20 000 symbols. For a block size of 200 symbols, iterative decoding can perform within 2.1 dB of the Shannon limit.

Index Terms—Concatenated codes, iterative decoding algo- rithms, trellis-coded modulation (TCM), Tanner–Wiberg–Loeliger (TWL) graph, Viterbi algorithm (VA).

I. INTRODUCTION

T

RELLIS-CODED modulation (TCM) has been widely used as a combined coding and modulation technique for digital transmission over band-limited channels. Ungerboeck has shown that significant coding gains over uncoded modulation can be achieved using TCM with Viterbi decoding without sacrificing bandwidth efficiency on a band-limited channel [1], [2]. In the past decade, many variants of the basic TCM scheme have been developed to obtain higher coding gains [3], [4].

Recently, much interest has been focused on compound codes which are composed of a collection of interacting constituent codes. Examples of such compound codes include:

Manuscript received December 15, 1999; revised September 8, 2000.

Q. Wang was with the Department of Engineering, FEIT, Australian National University, Canberra, ACT 0200, Australia. He is now with the Institute for Telecommunications Research, University of South Australia, Mawson Lakes, SA 5095, Australia (e-mail: [email protected]).

L. Wei was with the Department of Engineering, FEIT, Australian National University, Canberra, ACT 0200, Australia. He is now with the Institute of Telecommunications Research (TITR), School of Electrical, Computer and Telecommunications Engineering, Faculty of Informatics, Iniversity of Wollon- gong, Wollongong, NSW 2522, Australia (e-mail: [email protected]).

Communicated by G. D. Forney, Jr., Guest Associate Editor.

Publisher Item Identifier S 0018-9448(01)01338-4.

serially concatenated codes [5], parallel concatenated codes [7], [6], Gallager’s low-density parity-check codes [8], various product codes [9], parity-concatenated codes [28], turbo TCM [22], and multilevel codes [23], [24]. Many iterative decoding algorithms have been developed, and excellent performance has been achieved. Our research attention has focused on iterative Viterbi decoding for parity-concatenated codes. Cabral, Costello, and Chevillat [25] noted significant similarities between the turbo decoding method and the bootstrap iterative decoding method developed by Jelinek and Cocke [26], [27].

Wei extended the results of [25]–[27] to near-optimally decoded parity-concatenated convolutional codes. One of the simplified bootstrap algorithms, which uses only the Viterbi algorithm (VA), was reported in [28] and is called the iterative Viterbi algorithm (IVA). In [31], the IVA was extended to decode parity-concatenated trellis codes and excellent performance has been achieved. A key benefit of the IVA and the parity-concatenated code (which is an example of a Forney concatenated code) is that concatenated codes and the VA have been widely used in industry for many years and manufacturers may adopt the IVA and parity-concatenated codes without significantly modifying their existing systems.

In [13], Tanner generalized Gallager’s low-density parity- check (LDPC) codes to codes on general bipartite graphs. In [12], Wiberg made a key extension of Tanner’s work to include trellis codes. The graph representation of these coding schemes, known as a Tanner–Wiberg–Loeliger (TWL) graph, provides an important recent unification [11], [12]. With the TWL graph, all the iterative decoding algorithms in [5]–[9] can be classified as one of two types of algorithms: min-sum and sum-product algorithms. In [11], a detailed description of these algorithms is given as well as a review of their rich history.

Complexity versus performance has been one of the major research areas in coding theory for many years (see the special issue of the IEEE TRANSACTIONS ONINFORMATIONTHEORYon Codes and Complexity, November 1996). Recently, much research effort has been focused on understanding the min-sum and sum-product algorithms for graphs with a single cycle (e.g., a tail-biting trellis code) [17]–[20]. It has been shown that nei- ther min-sum nor sum-product algorithm is optimal for a graph with a single cycle, but they are much simpler than optimal decoding algorithms and often achieve near-optimal performance [17]–[20], [37]. In [29], Wei showed that for a very short block (56 information bits) the IVA performance is 1 dB away from the Shannon sphere-packing bound [29], which is better than turbo codes with the same block length. An obvious question is whether the IVA performs better than the min-sum and sum- product algorithms in a graph with short cycles. In this paper,

(2)

Fig. 1. Encoding structure for parity-concatenated trellis codes.

we study how to improve the performance of the min-sum and sum-product algorithms for parity-concatenated trellis codes.

The paper is organized as follows. In Section II, the encoding structure, which concatenates a single-parity-check (SPC) code with an Ungerboeck’s TCM scheme, is described. In Section III, we present the TWL graph representation of parity-concatenated codes. We then present several iterative decoding algorithms for parity-concatenated trellis codes in Section IV, and show that the standard min-sum and sum-product algorithms need to be improved in order to achieve near-optimal decoding.

Section V presents numerical results and Section VI concludes the paper.

In this paper, we focus on additive white Gaussian noise (AWGN) channels.

II. CONSTRUCTION OFPARITY-CONCATENATEDTRELLIS

CODES

Concatenation is a specific method of constructing long codes from shorter codes. This method was first proposed by Forney [45]. A typical code is formed from two codes which are sepa- rated by an interleaver. The so-called parity-concatenated trellis code [31], [32] is an example of a Forney concatenated code.

That is, we use an SPC code as the outer code and an Unger- boeck trellis code as the inner code. The SPC has been used in [51], [28]. The reasons for selecting the SPC code as the outer code can be found in [29] and [23].

In this section, we will discuss the encoder of the parity-con- catenated trellis code in detail. Several concepts (such as full and partial parity protection, single and double parity-check struc- tures) will be introduced.

A block diagram of the parity-concatenated system is shown in Fig. 1. Assume that the output of an information source is a sequence of the binary digits “ ” and “ .” At the encoder, this binary information sequence is segmented into symbols of bits each, where is the number of bits per unit time that enter the inner trellis encoder. symbols are lined up as a vector and then encoded into an -symbol codeword by adding a parity-check symbol at the end of the vector. Define as the

th symbol in the vector and , for

. The parity-check symbol, , is given by (1) where

(2) are the parity-check bits, denotes modulo- addition, and the

notation , for , denotes informa-

tion bits. If , then we refer to the structure as full parity protection. If , then we refer to it as partial parity protec- tion, since only part of symbols are protected by the parity-check outer code. For partial parity protection, the first symbols consist of information bits each and the last symbol of the

Fig. 2. Block interleaver for parity-concatenated trellis codes.

Fig. 3. Schematic representation of a typical trellis code.

codeword consists of information bits. The code rate is .

Then, consecutive codewords are stored by row in an -symbol interleaving buffer, and are read out by column in symbols (as shown in Fig. 2). These symbols are then fed into the trellis encoder one symbol at a time. The trellis encoder consists of a rate- linear convolutional encoder and a mapper (as shown in Fig. 3). As usual, the first bits of each symbol are applied to the convolutional encoder. The output of the encoder consists of bits, called coded bits, which are used to choose one of partitions in the constellation.

This means that the constellation has been partitioned into subsets according to Ungerboeck’s set partitioning principles.

The remaining bits, called uncoded bits, are used to select one of the points in the chosen subset.

Let denote the output symbol corresponding to input

of codeword and . If , the

convolutional encoder is linear and the trellis is tail-biting, then we have [29]

(3) The parity constraints in (3) are very important for constructing iterative decoding algorithms in Section IV, since updating the extrinsic information is based on these constraints.

Three relevant issues will be discussed here: 1) partial versus full parity protection, 2) how to terminate the whole block of consecutive codewords (i.e., tail-biting or not), 3) single versus double parity check.

1) Partial Versus Full Parity Protection: The asymptotic performance of a trellis code is largely determined by the

(3)

minimum squared Euclidean distance between the transition paths determined by the coded bits, denoted as , and the minimum squared Euclidean distance between the parallel transition paths determined by the uncoded bits, denoted as . If we restrict the parity constraints to the coded bits,

i.e., and , then we can further

reduce the error probability of the coded bits. If we apply the parity constraints not only to the coded bits, but also to the uncoded bit(s), i.e.,

and , then both coded and uncoded bits can be further protected. However, the larger the number of parity-check protected bits, the lower the rate of the outer code. The rate loss will result in increasing the gap between the Shannon limit and the code performance. Thus, in terms of approaching the Shannon limit, the performance improvement introduced by the outer code may not be able to compensate for the additional gap caused by the rate loss, if the code is not designed properly.

Many trellis codes have , and erroneous

decisions occur much more frequently among the coded bits than the uncoded bits. Therefore, for these codes, partial parity protection can achieve a better performance in terms of approaching the Shannon limit. In this paper, we will focus on partial parity-check protection (i.e., ) unless otherwise mentioned.

2) Tail-Biting or Not: There are two methods to terminate the encoder after encoding a block of symbols (see Fig. 2). The first method is to force the encoder into a known state. If the encoder consists of shift registers connected in a feed-forward form (called a feed-forward encoder), then such a termination can be done by using zero bits to flush the contents of shift registers, i.e., force the encoder into the all-zero state. One of the advantages of this method is that the decoder knows the terminal state, and thus it can achieve better performance [52], [50]. The significant disadvantage is the rate loss due to the tail bits.

The second method is to terminate the encoder at its starting state (i.e., to form a tail-biting cycle) [21]. For the feed-forward encoder, we can do so by feeding the last information bits of the block into the shift registers before starting the encoding procedure. The advantage of the second method is no rate loss.

The disadvantages can be found in [17]–[21].

For high-rate TCM, a small amount of rate loss will result in a significant reduction in the Shannon limit. Thus, in this paper, we will focus on tail-biting trellis codes except in some specifically mentioned cases. The other reason for selecting the tail-biting structure is to preserve the parity constraints at the outputs of the encoder (see (3) and [29]).

3) Single and Double Parity-Check Structures: Up to this point, we have described a parity-concatenated trellis code, in which the outer code is an SPC code. We call this structure an SPC structure. During the study, we found that a more pow- erful concatenated code can be built through the following procedures.

At first, blocks of symbols are stored by row

in the first rows of a -symbol memory

device, as shown in Fig. 4. Each row is partially protected and encoded by a tail-biting trellis encoder as described in the SPC structure. The th row is constructed as follows. Each

Fig. 4. Double parity-check structure.

coded bit of the symbol in the th row is the parity check of the coded bits of the first rows in the corresponding column. Each uncoded bit of the symbol in the th row carries information like uncoded bits in the SPC structure.

Finally, the symbols are read out by row and mapped into a constellation for transmission.

All of these enhancements (tail-biting, partial protection, and double parity-check structure) try to ensure that the decoding performance can approach the Shannon limit as closely as possible.

III. TWL GRAPH REPRESENTATION FOR

PARITY-CONCATENATEDTRELLISCODES

In this section, we will describe a graph representation for parity-concatenated trellis codes. For simplicity, we will only focus on the SPC structure.

A TWL graph is a graphical representation of a code corresponding to a set of parity checks that specify the code.

Each symbol, parity check, and state are represented by a symbol node, a check node, and a state node, respectively.

It has been shown that turbo codes, LDPC codes, and other compound codes can be represented by TWL graphs [11], [12]. As an example, Fig. 5 illustrates the connection between the conventional trellis and graph representations for a simple four-state Ungerboeck code. In Fig. 5(b), a state node denotes a four-state state space (i.e., and ), a coded symbol node represents four possible outputs ( , , and ) and a parity-check node corresponds to the parity-check constraint between two state nodes.

Fig. 6 illustrates the graph representation of a parity-concatenated trellis code based on the SPC structure with and . In Fig. 6, we introduce a new type of node called the partial parity-check node. Based on the graph representation, we will study iterative decoding algorithms in the next section.

IV. GRAPH-BASEDITERATIVE DECODING FOR

PARITY-CONCATENATEDTRELLISCODES

In this section, we will first review conventional two-way iterative decoding algorithms [11] for a cycle-free graph. We then present iterative decoding algorithms for parity-concatenated codes. Then, we give a graph interpretation of a simplified iterative algorithm called the iterative VA which was introduced by Wei [28]. Fianlly, we show that for parity-concatenated codes the two-way algorithms need to be modified in order to achieve better performance.

(4)

Fig. 5. Trellis and graph representations for a four-state trellis code.

Iterative decoding is a generic term for decoding algorithms whose basic operation is to modify some internal states in small steps until a valid codeword is reached [12]. In our framework, the iterative decoder utilizes the parity-check constraints intro- duced by the outer code to “collect” the extrinsic information and then feed it back for further decoding. Fig. 7 illustrates the concept of an iterative decoder. It shows that iterative decoding includes two parts: decoding the trellis code and up- dating the extrinsic information. Both can be implemented using a two-way algorithm (TWA) [11], [13]. In [11], Forney summarized the TWAs and their rich history. Many standard decoding algorithms for turbo codes, LDPC codes, and other compound codes can be grouped as one of two types of TWAs: min-sum and sum-product algorithms.

A. Conventional Min-Sum and Sum-Product Algorithms for Cycle-Free Graphs

Now we review the conventional min-sum and sum- product algorithms. In [11], [12], conventional min-sum and sum-product algorithms have been explicitly described. In this subsection, we will briefly review these algorithms.

The min-sum algorithm is used to find the most likely decision on the codeword , i.e.,

(4) where represents a set of all possible concatenated codewords, denotes a valid codeword including the parity-check outer code, represents log-likelihood weights , denotes the noise-corrupted obser- vations at the receiver, and denotes the probability density function. For a cycle-free graph, if we cut any edge, the graph separates into two parts, called “upstream” and “downstream.”

The weight of any codeword may correspondingly be

expressed as the sum of the upstream

and downstream weights and , respectively. The

Fig. 6. Graph representation for a parity-concatenated trellis code.

Fig. 7. Block diagram of an iterative decoder.

upstream and downstream weights can be computed separately by the min-sum algorithm. With the sequential updating order, the min-sum algorithm for a finite cycle-free graph can be summarized as follows [11].

The Min-Sum Algorithm for a Finite Cycle-Free Graph

Step (a) Start at the “leaf” nodes, which are connected to the graph via a single edge; the upstream values of the edge connected to the leaf node are simply the values of that leaf node.

Step (b) At each interior node, update the weights of the values of each outgoing edge as soon as all incoming weights are known.

The update rules are as follows.

1) For an edge downstream of a symbol or state node, for each possible value xi of symbol node Vi, the outgoing weight is simply the sum of the incoming weights into the node plus the local weight of the node itself.

(5)

2) For an edge downstream of a parity-check node, for each value xi, the outgoing weight is the minimum of the sum of the incoming weights over the set of all incoming configu- rations consistent with xi.

Step (c) Repeat Step (b) until reaching every leaf node.

Step (d) The sum of the upstream and downstream weights gives the final weights of each value of each edge, and thus of its as- sociated symbol node.

The sum-product algorithm is exactly the same as the min-sum algorithm, except that the “min” is replaced by “sum,”

and the “sum” is replaced by “product” and the weights are replaced by . Thus, we will only focus on the min-sum algorithm in the rest of this section. For a cycle-free graph, both min-sum and sum-product algorithms are optimal [12].

B. Min-Sum Algorithm for Parity-Concatenated Trellis Codes As shown in Fig. 6, there are many short cycles in the graph.

For a graph with cycles, we can apply the min-sum algorithm as follows.

The Min-Sum Algorithm for a Parity-Concaten- ated Code

Step (a) Set n = 1. The weight of each symbol Vi; j is set to 0 log P (Ri; jjVi; j).

Step (b) At the nth iteration, we cut all partial parity-check nodes and apply the min-sum algorithm to an “unwrapped” trellis [17], [20] (as shown in Fig. 8). We obtain a weight for every symbol node. Define !ⁿ_V as the weight of symbol node Vi; j at the nth iteration.

Step (c) We then update the weight of each symbol node using the min-sum algorithm based on !ⁿ_V and the partial parity-check node.

Define !_Vⁿ⁺ as the updated weight of symbol node Vi; j. Fig. 9(b) illustrates the update procedure, which will be discussed in detail later.

Step (d) Set n = n + 1. Repeat Steps (b) and (c) using the updated symbol weights of the previous iteration, i.e., !_V⁽ⁿ⁰¹⁾⁺. The process continues until all parity constraints are satisfied or if n reaches a preset maximum number of iterations.

Only the partial parity-check node is new here. Thus, we go through an example to illustrate how to update the weight of a symbol via its partial parity-check node.

Consider a parity-concatenated code using the SPC structure

with , , , , and . Assume

that a linear rate– convolutional code is used and the output symbols are mapped into an 8PSK constellation (see Fig. 9(a)).

Fig. 8. An “unwrapped” trellis used in the min-sum algorithm for a parity-concatenated trellis code.

(a)

(b)

Fig. 9. Illustration of the updating procedure for the min-sum algorithm.

(6)

Now we focus on updating the weight for the symbol node . After Step (b) of the first iteration, suppose that

for constellation points respectively. The updating procedure can be divided into three stages.

Stage 1: List all subsets for every starting symbol node, i.e., and . Compute the weight of each subset by selecting the smallest weight in the subset. For example, in the subset of there are two constellation points and , with weights equal to and , respectively. Thus, the weight of this subset is .

Stage 2: Calculate the weight for every subset after the par- tial parity-check node is computed via the min-sum algorithm.

For example, for the subset , we have

Stage 3: List all subsets and the weights of the updating symbol, i.e., . Then, the updated weight for each subset is equal to the sum of its own weight and the corresponding subset weight . For example, for subset ,

.

Clearly, the number of comparisons and selections required

for updating each symbol are and

, at Stages 1 and 2, respectively.

The number of additions required per symbol is

and at Stages 2 and 3, respectively. Thus, for the above example, the total number of comparisons and selections per symbol is 20 and the total number of additions per symbol is 40. If we select a quadrature amplitude modulation (QAM) constellation with 256 points, , , and , then the numbers of comparison selections and additions per symbol are 5720 and 1472, respectively, which are far more complicated than the 256-state VA. However, we found that the updating procedure can be significantly simplified. This leads to our IVA.

C. The IVA for Parity-Concatenated Trellis Codes

The IVA in [28], [29] is a simplified min-sum algorithm. It follows the same steps as the above min-sum algorithm, except that the two-way min-sum algorithm in Step (b) is replaced by the one-way VA and the updating procedure in Step (c) is sim- plified significantly as follows. We go through these stages with the assistance of the example given in Fig. 9 and again we focus on updating the weight for the symbol node .

Stage 1: Randomly select a symbol (say ) from the set of all starting symbols, i.e., and in this example. List all subsets for this selected symbol. Compute the weight of each subset by selecting the smallest weight in the subset. Here the weight is the local weight (branch metric), i.e.,

since the standard VA does not produce soft outputs. Suppose

Fig. 10. Illustration of the updating procedure for the IVA.

then the weights for subsets

are and , respectively, as shown in Fig. 10.

Next, list the VA decision for the rest of starting symbols. Sup-

pose the decision for is after the

VA in Step (b).

Stage 2: Stage 2 is divided into two parts (a) and (b).

Stage 2 (a): Copy the weights from the selected symbol and then compute by modulo- addition of the corresponding bits of all starting symbols. For example, by modulo- addition of (of ) and (of ), we have . Similarly, by modulo- addition of (of ) and (of ), we have . Listing , , and copying the weight , we obtain the first row of the weight table in Stage 2 (a) as shown in Fig. 10.

Stage 2 (b): The weights are scaled down by a factor , which is introduced to control error propagation.

Stage 3: Stage 3 is identical to Stage 3 in the min-sum algo- rithm.

The numbers of comparison selection and addition per symbol can be dramatically reduced to and , respectively.

The two-way algorithms of Section IV-B are suboptimal.

But for many codes such as turbo codes and LDPC codes without small cycles, their performance can closely approach the Shannon limit. However, for the parity-concatenated codes the min-sum and sum-product algorithms perform much worse than the IVA, as we will show in Section V. In the next

(7)

subsection, we will study how to modify the iterative decoding algorithms for parity-concatenated codes.

D. Modified Iterative Min-Sum/Sum-Product Algorithms The poor performance is largely due to the large number of short cycles in the graph for parity-concatenated trellis codes. In turbo codes and LDPC codes, such short cycles can and should be avoided in code construction. Consequently, the “spread” interleaver [48] was proposed. However, for parity-concatenated trellis codes we can do nothing to change the code structure.

Can we find a way to fix the iterative decoding algorithms? In this subsection, we will first study the asymptotic behavior of the min-sum and sum-product algorithms, which will indicate how to fix these algorithms.

When is large, where is the energy per bit and denotes the (one-sided) noise power spectral density, error events with minimum distance will commonly dominate the error performance. Here, the error event is defined in the tra- ditional way such that it starts when the error path differs from the correct path and ends when the two paths agree again for the first time. Now we study maximum-likelihood (ML) and a posteriori probability (APP) decoders for a simple parity-con- catenated code which involves only a single error event.

Proposition 1: Considering the parity-concatenated trellis code given in Fig. 11,¹ where all rows transmit either path 1 or path 2 of an identical error event and the number of rows selecting path 2 is even, we can then construct an ML or APP decoder as follows.

Step (a) In the first iteration, apply the ML or APP algo- rithms to compute the metric (which is related to the ML or APP value) of the two paths in each row.

Step (b) Set .

Step (c) All rows other than row contribute their extrinsic information, i.e., the ML or APP metrics obtained in Step (a), to row . The extrinsic information is computed using the same procedure as in the conventional min-sum or sum-product algorithm, ex- cept that the extrinsic ML metrics are scaled down by a factor of and the extrinsic APP weights pass through a th-root device before being sent to row , where is the number of time units in which the symbol in the error path is different from the symbol of the correct path. In other words, can be obtained by subtracting the number of time units in which the two paths have the same symbol from the length of the error event. For parity-concatenated convolutional codes, equals the Hamming distance of the error event. Fig. 12 illustrates these operations.

Step (d) Reapply the ML or APP algorithms to row after the extrinsic information is sent to row and make a final decision.

1This figure can be obtained by rotating the TWL graph in Fig. 6 by90 and then terminating each row to the all-zero state.

Fig. 11. A simple parity-concatenated code.

Fig. 12. A TWL graph with information normalization.

Step (e) Set , and repeat Steps (c) and (d) until .

Proof: Delete the time units in which two paths have the same symbols, since they make no contribution to the decision. Let denote the metric of path in row , where

, denotes the re-

ceived signal sequence of row , and de-

notes the transmitted signal sequence of path in row . Then, the ML decoder for row is

(5)

(8)

where

(6)

(7)

The extra metric for every branch of the error event is . For branches, the extrinsic information has been added to row times; thus, without normalization, the decision for row will be

(8) which is not the ML rule. However, after normalization the decision of the algorithm is the ML decision.

Following a similar procedure, we can also prove that the normalized APP algorithm given in Proposition 1 is optimal.

There are three key differences between the conventional iterative decoding algorithms and those given in Proposition 1:

1) using a normalization factor; 2) only two iterations are re- quired; 3) extrinsic information is based on the first iteration of decoding, i.e., in Step (d), the extrinsic information is not updated. In Section V, we will show that the first modification (i.e., normalization) is very important for near-optimal iterative decoding. Thus, for a TWL graph containing many short cycles, we can achieve near-optimal iterative decoding using the min-sum/sum-product algorithms, except that the extrinsic in- formation in the short cycles must be normalized. We will also show that asymptotically the modified min-sum/sum-product algorithms need only two iterations, which is a very surprising result.

E. Iterative Decoding Algorithms for Double Parity-Check Structure

In the above discussion, we focused on the SPC structure.

These algorithms can be easily extended to the double parity- check structure. Now, we use the min-sum algorithm in Sec- tion IV-B as an example to illustrate how to extend it to the double parity-check structure.

The min-sum algorithm for the double parity-check structure has the same procedures as that for the SPC structure, except that 1) in Step (b) we need to apply the min-sum algorithm to each of blocks; 2) the weight of each symbol should be updated via two partial parity-check nodes, since each symbol node connects two partial parity-check nodes.

Here we go through an example to illustrate how to update the symbol weight via two partial parity-check nodes.

Consider a parity-concatenated code with three blocks (i.e., ). Each block has the same setting as that used in the example given by Figs. 8 and 9. We focus on updating the weight for symbol . Two partial parity-check nodes connect to this symbol. One connects with and , and the other connects with and . Applying Stages (1) and (2) in Fig. 9 to each node at a time, we obtain a set of weights for the subsets.

For two nodes, we obtain two sets of weights. Finally, at Stage

(3), the updated weight of symbol node is the sum of the set of its own weights and the two sets of weights corresponding to the two partial parity-check nodes.

Similarly, we can extend the IVA to the double parity-check structure.

V. NUMERICALRESULTS

In this section, we present simulation results for several types of parity-concatenated trellis codes. The trellis codes are typical Ungerboeck codes and the notation, following that of [1], is in octal form. In all cases, the rate loss due to the parity redundancy has been deducted from the computation.

Each simulation trial was terminated if 100 block errors were obtained, or if the total number of bits processed reached . Since we could not analytically derive the optimal performance of the parity-concatenated code, we will focus on the performance of a parity-concatenated code using the double parity- check structure and trellis-shaping technology [33]. This will allow us to directly compare the performance with the Shannon limit.

A. Conventional Iterative Decoding Versus Modified Iterative Decoding

First, we verify the results of proposition 1. In Fig. 13, we compare the bit error performance of the conventional iterative min-sum algorithm with the modified iterative min-sum algorithm for decoding a parity-concatenated trellis code using a 256-state trellis code with partial parity protection based on a double parity-check structure, in which , , and . Therefore, a total of 20 000 symbols per block are transmitted. The 256-state trellis code is the feed-forward form of

Ungerboeck code [3], mapped

onto a 256-QAM constellation. For the trellis code, , , and . A 16-state trellis-shaping code was applied in the simulation [31], [32].

Fig. 13 shows that the conventional min-sum algorithm without information normalization (i.e., ) performs much worse (about 0.8 dB) than the modified iterative min-sum algorithms. The figure also shows that the conventional min-sum algorithm (i.e., ) with five iterations performs worse than the one with only two iterations, which is due to error propagation caused by the too heavily weighted extrinsic information. For the 256-state trellis code, the length of the minimum-distance error event is , but some of the minimum-distance error events have the same symbol at the second time unit. For these error events, we have . In the other minimum-distance error events, all symbols of the error path are different from the symbols of the correct paths. Therefore, we have for these error events. Thus, we expect that the modified iterative min-sum algorithm with or will achieve the best asymptotic performance.

This figure shows that. When , error propagation is still severe. As approaches , error propagation becomes less of a problem. When reaches , the performance becomes worse again due to the underwighted extrinsic information.

In Fig. 14, we compare the modified iterative min-sum algorithm and the iterative ML algorithm given in Proposition 1.

(9)

Fig. 13. BER performance comparison between the conventional and modified iterative min-sum algorithm with different values ofd.

Fig. 14. BER performance comparison between the modified iterative min-sum algorithm and the iterative ML algorithm at high SNR.

(10)

Fig. 15. Performance of the modified iterative min-sum/sum-product algorithm for the 256-state parity-concatenated trellis code with double parity-check structure at a spectral efficiency of 5.805 bits/T and 20 000 symbols per block.

We use the parity-concatenated trellis code with an SPC structure, in which and . A total of 400 symbols is transmitted in each code block. A 16-state trellis code with using a 256-QAM constellation and 16-state trellis shaping is employed. Since the length of the minimum-distance error event is and the symbols of the two paths at all time units are different, we set . In the mod- ified iterative min-sum algorithm, the extrinsic information is updated following every application of the bidirectional VA algorithm to each row.

Fig. 14 shows that the iterative ML algorithm given in Propo- sition 1 is very close to the modified iterative min-sum algorithm at high SNRs, and the number of iterations required is only two.

At high SNRs, the performance of the ML algorithm with and iterations is close to the modified iterative min-sum algorithm. For low SNRs, a larger number of iterations is needed for near-optimal decoding.

B. Comparison of the Modified Iterative Min-Sum/Sum- Product Algorithms and the IVA

In Fig. 15, we compare the performance of the modified iterative min-sum/sum-product algorithms and the IVA for the same code used in Fig. 13. The peak iteration number was set to 20 for the modified iterative min-sum and sum-product algorithms ( ) and 50 for the IVA, since the IVA often con- verges much more slowly than the modified iterative min-sum

or sum-product algorithms. The 16-state trellis shaping is combined with each of these algorithms.

In Fig. 15, we see that the modified iterative sum-product algorithm achieves the best performance and the IVA gives the worst performance, but the difference in performance achieved by these algorithms is not substantial (less than 0.1 dB).

For comparison, the performance of a 256-state TCM scheme using the VA is also reported. The results show that at BER , about 2.7-dB coding gain can be achieved by the parity-concatenated code decoded using the modified sum-product algorithm compared to the VA with 16-state trellis shaping. In this case, the real spectral efficiency is 5.805 bits/ rather than 6 bits/ . The Shannon limit for this rate is 9.76 dB. Therefore, a performance 1.2 dB away from the Shannon limit at BER is achieved by the 256-state parity-concatenated trellis code using these iterative decoding algorithms. Thus, we can claim these modified iterative decoding algorithms are nearly optimal.

Fig. 16 shows the average number of iterations for each algorithm. From Fig. 16, we note that the modified iterative sum- product algorithm uses the least number of iterations, and the IVA uses the largest number of iterations. When the SNR increases from 11.0 to 11.2 dB, the average number of iterations for the IVA drops dramatically from 27.8 to 5.5. Considering the computational complexity of these algorithms and taking into account the additional “sum” and “product” operations needed for updating each symbol node, we found that the computational

(11)

Fig. 16. Average number of iterations for the iterative decoding algorithms for the simulation in Fig. 15

complexity of the modified iterative sum-product algorithm is still much higher than that of the IVA, especially at high SNRs.

In addition, we note that the error floors in Figs. 13 and 15 are mainly dominated by the errors in the uncoded bits at relatively low SNRs. We found that the error floors can be effectively reduced by using an appropriate Bose–Chaudhuri–Hocquenghem (BCH) outer code, as shown in [32].

In Fig. 17, we present the performance of the modified iterative min-sum/sum-product algorithms and the IVA for decoding the parity-concatenated trellis code based on an SPC structure with and . A 256-state trellis code and 16-state trellis shaping using 256-QAM are assumed. The real spectral efficiency is 5.800 bits/ . The peak iteration number is set to 50 for all algorithms. After deducting the Shannon limit shift caused by the rate loss due to parity-check bits, performance within about 2.1 dB of the Shannon limit is achieved using the IVA for a 200-symbol block. The figure shows that the IVA performs slightly better than the other algorithms and that the performance difference between the IVA and modified iterative min–sum and sum-product algorithms is within 0.15 dB.

Similarly, in Table I, we list the average number of iterations at several SNR values for each algorithm. We see that the average number of iterations is very low.

VI. CONCLUSION ANDDISCUSSION

In this paper, we studied several iterative decoding algorithms based on the TWL graph representation of parity-concatenated

TABLE I

AVERAGENUMBER OFITERATIONS OFSEVERALITERATIVEDECODING ALGORITHMS FOR THESIMULATION INFIG. 17

trellis codes. Since the TWL graph for these parity-concatenated codes contains many short cycles, the conventional iterative min-sum and sum-product algorithms do not achieve good performance. After some simple modifications, we obtain nearly optimal iterative decoders. The modifications include a) introducing a normalization factor in the conventional iterative min-sum and sum-product algorithms, or b) cutting the short cycles which results in an iterative VA. After modification, all three algorithms achieve near-optimal performance, but the IVA has the least average complexity.

We also showed that asymptotically the iterative ML and APP decoding algorithms can achieve near-optimal performance using only two iterations for parity-concatenated codes.

Unfortunately, this asymptotic behavior only shows up when is above the cutoff rate.

It is worth mentioning here that the reasons for selecting the feed-forward encoder as the inner code are twofold: a) the con-

(12)

Fig. 17. BER performance comparison of the iterative algorithms for a 256-state parity-concatenated trellis code with an SPC structure at a spectral efficiency of 5.800 bits/T and 200 symbols per block.

venience in forming a tail-biting trellis ring, and b) its performance. If we select a feedback encoder for the inner code, then a tail-biting trellis ring can be formed only at certain lengths, and not others. However, using a feed-forward encoder, we can construct a tail-biting trellis ring of any length. Also, during the initialization in each iterative decoding stage, the decoder needs to estimate the initial state for each trellis cycle, based on the previous iteration result. For feed-forward encoders, the initial state depends only on the previous decision bits. However, for feedback encoders, the initial state depends on the entire sequence, and one error in the sequence will result in an erroneous estimate. Thus, using feedback encoders the error rate of the system is higher.

The use of a nonrecursive inner encoder seems to contra- dict well-known published results, which show the importance of selecting recursive inner codes [5], [6], but it does not. The bounds in [5] and [6] show that the performance of concate- nated codes using a uniform interleaver can be improved as the interleaver length increases provided that a recursive systematic code (RSC) is selected as the inner code. We cannot generalize these results to claim that all concatenated codes with a nonrecursive feed-forward encoder for the inner code will not perform well. A typical example is a serial concatenated system using a Reed–Solomon code as the outer code and a convolutional code as the inner code. If it could be decoded optimally, then the system would perform better than the parity-concatenated codes. It is still an open question why iterative decoding is

near-optimal one for a wide range of codes and why for others it is not.

ACKNOWLEDGMENT

The authors wish to thank the three anonymous reviewers and G. D. Forney, Jr., for their constructive advice, which significantly improved the quality of this paper.

REFERENCES

[1] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982.

[2] , “Trellis coded modulation with redundant signal sets, part I: In- troduction and Part II: State of the art,” IEEE Commun. Mag., vol. 25, pp. 5–22, Feb. 1987.

[3] E. Biglieri, D. Divsalar, P. J. McLane, and M. K. Simon, Introduction to Trellis-Coded Modulation with Applications. New York: Macmillan, 1991.

[4] F. Q. Wang and D. J. Costello, Jr., “Probabilistic construction of large constraint length trellis codes for sequential decoding,” IEEE Trans.

Commun., vol. 43, pp. 2439–2447, Sept. 1995.

[5] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding,” IEEE Trans. on Inform. Theory, vol. 44, pp. 909–926, May 1998.

[6] S. Benedetto and G. Montorsi, “Unveiling turbo codes: Some results on parallel concatenated coding schemes,” IEEE Trans. Inform. Theory, vol. 42, pp. 409–428, Mar. 1996.

[7] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near shannon limit error-correcting coding and decoding: Turbo codes,” in Proc. IEEE Int. Communications Conf. (ICC), Geneva, Switzerland, 1993, pp.

1064–1070.

(13)

[8] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA:

MIT Press, 1963.

[9] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 429–445, Mar. 1996.

[10] F. R. Kschischang and B. J. Frey, “Iterative decoding of compound codes by probability propagation in graphical models,” IEEE J. Select. Areas Commun., vol. 16, pp. 219–230, Feb. 1998.

[11] G. D. Forney Jr., “On iterative decoding and the two-way algorithm,” in Proc. Int. Symp. Turbo Codes and Related Topics, Brest, France, Sept.

1997.

[12] N. Wiberg, “Codes and decoding on general graphs,” Ph.D. dissertation, Linköping Univ., Linköping, Sweden, 1996.

[13] R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 533–547, Sept. 1981.

[14] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.

[15] D. J. C. MacKay, “Good codes based on very sparse matrices,” IEEE Trans. Inform. Theory, vol. 45, pp. 399–431, Mar. 1999.

[16] R. J. McEliece, D. J. C. MacKay, and J. F. Cheng, “Turbo decoding as an instance of pearl’s ‘belief propagation’ algorithm,” IEEE J. Select.

Areas Commun., vol. 16, pp. 140–152, Feb. 1998.

[17] G. D. Forney Jr., F. R. Kschischang, and B. Marcus, “Iterative decoding of tail-biting trellises,” in Proc. Information Theory Workshop, San Diego, CA, Feb. 1998.

[18] S. M. Aji, G. B. Horn, and R. J. McEliece, “On the convergence of iter- ative decoding on graphs with a single cycle,” in Proc. IEEE Int. Symp.

Information Theory (ISIT’98), Cambridge, MA, 1998, p. 276.

[19] B. J. Frey, R. Koetter, and A. Vardy, “Skewness and pseudo codewords in iterative decoding,” in Proc. IEEE Int. Symp. Information Theory (ISIT’98) Cambridge, MA, USA, 1998, p. 148.

[20] J. B. Anderson and S. M. Hladik, “Tail-biting MAP decoders,” IEEE J.

Select. Areas Commun., vol. 16, pp. 297–302, Feb. 1998.

[21] G. Solomon and H. C. A. van Tilborg, “A connection between block and convolutional coded,” SIMA J. Appl. Math, vol. 37, no. 2, pp. 358–369, Oct. 1979.

[22] P. Robertson and T. Wörz, “Bandwidth-efficient turbo trellis-coded modulation using punctured component codes,” IEEE J. Select. Areas Commun., vol. 16, pp. 206–218, Feb. 1998.

[23] G. D. Forney Jr., M. D. Trott, and S.-Y. Chung, “Approaching AWGN channel capacity with coset codes and multilevel coset codes,” IEEE Trans. Inform. Theory, vol. 46, pp. 820–850, May 2000.

[24] U. Wachsmann and J. Huber, “Power and bandwidth efficient digital communication using turbo codes in multilevel codes,” Euro. Trans.

Telecommun., vol. 6, pp. 557–567, Sept. 1995.

[25] H. A. Cabral, D. J. Costello Jr., and P. R. Chevillat, “Bootstrap hybrid decoding using the multiple stack algorithm,” in Proc. IEEE Int. Symp.

Information Theory (ISIT’97), Ulm, Germany, June 29–July 4, p. 494.

[26] F. Jelinek and J. Cocke, “Bootstrap hybrid decoding for symmetrical binary input channels,” Inform. Contr., vol. 18, pp. 261–298, Apr. 1971.

[27] F. Jelinek, “Bootstrap trellis decoding,” IEEE Trans. Inform. Theory, vol. IT-21, pp. 318–325, May 1975.

[28] L. Wei, “On bootstrap iterative Viterbi algorithm,” in IEEE Int. Commu- nications Conf. (ICC’99), Vancouver, BC, Canada, June 6–9, 1999, pp.

1187–1192.

[29] , “Near Shannon limit iterative Viterbi algorithm for Forney con- catenated systems,” IEEE Trans. Inform. Theory, submitted for publica- tion.

[30] L. Wei and H. Qi, “Near-optimal limited-search detection on ISI/CDMA channels and decoding of long convolutional codes,” IEEE Trans. In- form. Theory, vol. 46, pp. 1459–1482, July 2000.

[31] Q. Wang, L. Wei, and R. A. Kennedy, “Near-optimal decoding for trellis-coded modulation using the BIVA and trellis shaping,” in Applied Algebra, Algebraic Algorithms and Error-Correcting Codes—13th Int.

Symp. (AAECC-13), Honolulu, HI, Nov. 1999, pp. 191–200.

[32] Q. Wang, L. Wei, and R. A. Kennedy, “Iterative Viterbi decoding, trellis shaping and multilevel structure for high-rate concatenated TCM,” IEEE Trans. Commun., submitted for publication.

[33] G. D. Forney Jr., “Trellis shaping,” IEEE Trans. Inform. Theory, vol. 38, pp. 281–300, Mar. 1992.

[34] A. R. Khandani and P. Kabal, “Shaping multidimensional signal spaces—Part I: Optimum shaping, shell mapping and Part II: Shell-ad- dressed constellations,” IEEE Trans. Inform. Theory, vol. 39, pp.

1799–1819, Nov. 1993.

[35] A. R. Calderbank and L. H. Ozarow, “Nonequiprobable signaling on the Gaussian channel,” IEEE Trans. Inform. Theory, vol. 36, pp. 726–740, July 1990.

[36] J. E. Porath, “Algorithm for converting convolutional codes from feed- back to feedforward form and vice versa,” Electron. Lett., vol. 25, no.

15, pp. 1008–1010, July 20, 1989.

[37] R. Johannesson and K. S. Zigangirov, Introduction to Convolutional Coding. Piscataway, NJ: IEEE Press, 1999.

[38] G. D. Forney Jr., “Final report on a coding system design for advance solar mission,” NASA Ames Res. Ctr., Moffett Field, CA, Rep. NAS2- 3637, Contract NASA CR73167, 1967.

[39] H. Imai and S. Hirakawa, “A new multilevel coding method using error- correcting codes,” IEEE Trans. Inform. Theory, vol. IT-23, pp. 371–377, May 1977.

[40] G. J. Pottie and D. P. Taylor, “Multilevel codes based on partitioning,”

IEEE Trans. Inform. Theory, vol. 35, pp. 87–98, Jan. 1989.

[41] G. D. Forney Jr. and G. Ungerboeck, “Modulation and coding for linear Gaussian channels,” IEEE Trans. Inform. Theory, vol. 44, pp.

2389–2415, Oct. 1998.

[42] J. G. Proakis, Digital Communications, 3rd ed. New York: McGraw- Hill, 1995.

[43] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Nu- merical Recipes in C. Cambridge, U.K.: Cambridge Univ. Press, 1992.

[44] G. D. Forney Jr., “Lower bounds on error probability in the presence of large intersymbol interference,” IEEE Trans. Commun. Technol., vol.

COM-20, pp. 76–77, Feb. 1972.

[45] , Concatenated Codes. Cambridge, MA: MIT Press, 1963.

[46] S. Couturier, D. J. Costello Jr., and F.-Q. Wang, “Sequential decoding with trellis shaping,” IEEE Trans. Inform. Theory, vol. 41, pp.

2037–2040, Nov. 1995.

[47] L. F. Wei, “Trellis coded Modulation with multidimensional constella- tions,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 483–501, July 1987.

[48] “Telemetry Channel Coding,” Jet Propulsion Laboratory, Report from CCSDS, Green Book, Oct. 1998.

[49] L. Wei, Z. Li, M. James, and I. Petersen, “A minimax robust decoding algorithm,” IEEE Trans. Inform. Theory, vol. 46, pp. 1158–1167, May 2000.

[50] L. Wei, T. Aulin, and H. Qi, “On the effect of truncation length on the exact performance of a convolutional code,” IEEE Trans. Inform.

Theory, pp. 1678–1682, Sept. 1997.

[51] L. Ping, S. Chan, and K. L. Yeung, “Iterative decoding of multi-dimen- sional concatenated single parity check codes,” in IEEE Int. Communi- cations Conf. (ICC’98), Atlanta, GA, 1998, pp. 131–135.

[52] S. Lin and D. J. Costello Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.