Analysis of Flexibility Requirements - Specialized Processor: The Soft-Output Sphere-Decoding A

6.6 Specialized Processor: The Soft-Output Sphere-Decoding ASIP

6.6.1 Analysis of Flexibility Requirements

In order to identify the functionality common to both depth-first and breadth-first approaches and to obtain the flexibility requirements of both these approaches, the control flow and the data flow are analyzed separately. The following subsections give a short overview about the requirements derived from the control-flow and data-flow analysis.

6.6.1.1 Control Flow Considerations

Abstracted examples of sphere-decoding control-flows for a breadth-first K-best traver- sal and a depth-first traversal are depicted in Figure 6.7. The following discussion briefly analyses these control flows and extracts potential commonalities suitable for ASIP instruction-set extensions.

Breadth-first sphere-decoding approaches traverse the combinatorial tree of candi- date symbols starting at the root by fully processing successively level by level without stepping back to higher tree levels (Figure 6.7a). Based on a parent reference level i+1 and a reference list of partial symbol vector candidates, the candidate symbol nodes for the current level i are computed and stored in an insertion list. For each descent to a lower tree level, these lists are swapped in order to minimize the memory requirements. Two nested loops are required for filling the insertion list. The outer loop needs to step through the reference list of parents while the inner loop generates the child nodes. The expansion degree of the children of a parent node differs among the breadth-first algorithms and thus needs to be kept flexible. For instance, a K-best

132 Chapter 6. Flexibility and Portability Aspects for Sphere-Decoding Implementations

insert root in ref. list

next parent node from ref. list

enumerate child node heap insertion (ins. list) last child? last parent? leaves done? swap lists &

descend done yes no yes no no yes (a)K-best push root enumerate child node pruning check leaf level? push & descend pop & ascend stack empty? enum. done failed passed no yes _no done yes (b)depth first

Figure 6.7: Abstracted exemplary control-flow examples for sphere-decoding algo-

rithms.

implementation needs to select the K best children among all parent nodes while SSFE based approaches have no dependencies between the children of different parent nodes. Therefore, the selection of enumerated nodes either requires no operation in the case of SSFE algorithms or a complex sorting operation for K-best algorithms. This sorting operation is typically realized by a heap-sort based algorithm utilizing a partially sorted balanced binary tree. Inside this tree called heap, every subtree stores its maximum metric at the root entry. With a fixed size of K entries, the final heap stores the iteratively refined list of the K best nodes. By storing the candidate node with the maximum metric at the heap root, a single comparison is required to

6.6. Specialized Processor: The Soft-Output Sphere-Decoding ASIP 133

determine if the heap needs to be updated. Heap insertions have a complexity propor- tional to the tree height and thus to ⌈log₂(K)⌉ in the case of K-best sphere decoding. Pure ASIC solutions typically utilize very low values of K ≤16 allowing single-cycle insertions and replacements. Even some ASIP implementations such as [11] use dedicated sorter units. However, the flexibility should not be limited to such low values of K for the ASIP designed in this work, particularly to provide sufficient support for soft-output.

Depth-first approaches traverse the combinatorial tree of candidate symbols also starting at the root but first descending to a single leaf node with the most promising metric (Figure 6.7b). During the descent, the path to the leaf node is saved as stack of parent nodes and enumeration states (“push” operations in Figure 6.7b). The states on the stack are recalled (“pop” operations in Figure 6.7b) when finishing the processing on a lower tree level and proceeding with the enumeration on a higher tree level. During the depth-first processing, every node needs to be checked against the pruning constraint before a further processing step can be performed. Although the depth- first approach does not exhibit explicit loops, the recursion can be formulated also in a loop with explicit stack management as indicated in Figure 6.7b.

As a similarity between depth-first and breadth-first traversals a generalized loop management can be identified, for instance for the enumeration of nodes and the processing of (stored) lists of nodes. These loops may have more than just a single point for loop continuation: For instance multiple paths lead to the parent-node list processing in Figure 6.7a. Independently from the algorithm, most of the loop continuation conditions originate from characteristic sphere-decoding states, e.g. the enumeration state or the results of pruning checks.

Aside from the commonalities of loop management, the access patterns required for node and state handling in different sphere-decoding algorithms (a stack for depth-first and two lists for breadth-first) differ. However, for both the depth-first and breadth-first traversal strategies, nodes, metrics and states need to be stored in similar data records. Only the access patterns (stack, heap, FIFO) to this storage differ and are thus dedicated for the application-specific flexibility of an ASIP. Therefore, a flexible sphere-decoding ASIP architecture requires

• an efficient loop management. Other than traditional zero-overhead loops (ZOL) with a fixed number of loop executions, the loops in the case of sphere decoding are bound to special conditions originating from constraint checks, tree levels and enumeration states.

• support for a flexible state and data record management bound to the current tree level. Relative references to the previous and/or next tree level are required. • support for flexible node and metric lists and/or history management.

6.6.1.2 Data Flow Considerations

For all sphere-decoding approaches, the data-flow similarities are significantly higher than the control-flow similarities. This is mainly related to the underlying enumera-

134 Chapter 6. Flexibility and Portability Aspects for Sphere-Decoding Implementations

tion process which only operates on the current tree level and hence does not depend on the tree-traversal policy. For enumerating the first child node for a parent node, the best candidate node needs to be enumerated. Once a child node is enumerated, the enumeration state is set up and the next best sibling candidate node can be enumerated. Assuming a column-wise enumeration as introduced in Section 3.5.3.1 and also used in the Cae2_{sar architecture, every enumeration step requires the metric and}

enumeration state update for one column and the selection of the node with the mini- mum metric among all columns. This is identical for different tree-traversal strategies, as long as a sequential execution is targeted as for this ASIP.

The use of an enumerated node is slightly different for depth-first and breadth- first approaches. In the breadth-first case, a partial candidate vector s(i) and its metric MP(s(i)) need to be stored, either in a simple list or on a heap as described in Sec-

tion 6.6.1.1. In the depth-first case, storing the node s_i and the metric MP(s(i)) on a

stack would be sufficient. However, for the sake of a unification with the breadth-first approach, also the full partial vector s(i) can be stored.

Summarizing the data-flow aspects, a flexible sphere-decoding ASIP architecture requires computational units for

• initializing the enumeration process for the current tree level by determining the first child of a parent node. This includes the computation of zi according to (3.37), the quantization to the best child candidate s(_i1) and the computation of its metric M′_C(s_i).

• updating the enumeration state of a given tree level i by computing the next best sibling node.

• computing the corresponding metrics as basis for decisions on the next enumer- ated node s(_ik+1).

• a flexible and efficient storage for data records consisting of partial vectors s(i) and their partial sum metrics MP(s(i)).

In document Efficiency and flexibility trade-offs for soft input soft output sphere decoding architectures (Page 141-144)