2.2 Soft-Input Soft-Output MIMO Detection
2.2.2 Suboptimal Tree Search-Based Algorithms
The worst-case complexity of depth-first sphere decoding, even after applying all the complexity-reduction techniques mentioned in Section 2.2.1, is $\mathcal{O}(2^{M_T Q})$. Moreover, its inherently sequential nature and its variable runtime are significant hurdles on the way to hardware implementation. Therefore, much effort has been spent on reducing and possibly fixing the complexity of sphere decoding and on transforming the algorithm to enable the application of hardware design techniques such as parallelisation and pipelining. As a result, a multitude of tree search-based MIMO detection algorithms are available in the literature. The remainder of this section attempts to summarise the most relevant results for hardware implementation.
The first approach that drew attention as a hardware-friendly alternative to a depth-first tree search is breadth-first sphere decoding, whose most relevant example is the K-best algorithm [172]. This method traverses the tree only once, from the root to the leaves, and at every level only considers the K children with the lowest partial metrics. As a consequence, only K nodes have to be extended at any step in the tree search. K-best sphere decoding is characterised by a fixed complexity, since the number of candidate symbol vectors is decided a priori by fixing K, and it is easier to parallelise and pipeline than a depth-first search due to the one-way traversal and to the fewer dependencies among the nodes. Examples of hardware implementations can be found in [167], [139], [118] and [67], the last of these being the only one to support iterative detection and decoding.
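The breadth-first traversal described above can be illustrated with a minimal sketch. It assumes the usual triangularised system model, in which the received vector has already been rotated so that y ≈ R x with R upper triangular; the names `k_best_detect`, `R`, `y` and `constellation` are illustrative and not taken from [172], and no channel ordering or soft-output computation is included.

```python
def k_best_detect(R, y, constellation, K):
    """Breadth-first K-best search on an upper-triangular system y = R x + noise.

    R: n x n upper-triangular matrix (list of lists), y: length-n list,
    constellation: list of candidate symbols (real or complex).
    Returns up to K (metric, symbol_vector) pairs sorted by ascending metric.
    """
    n = len(y)
    # A survivor is (partial_metric, symbols already fixed on levels i+1 .. n-1).
    survivors = [(0.0, ())]
    for i in range(n - 1, -1, -1):  # one-way traversal, top level down to the leaves
        children = []
        for metric, tail in survivors:
            # Interference from the symbols already fixed on the higher levels.
            interference = sum(R[i][i + 1 + j] * s for j, s in enumerate(tail))
            for s in constellation:
                increment = abs(y[i] - interference - R[i][i] * s) ** 2
                children.append((metric + increment, (s,) + tail))
        # Global sort over all children of this level, then keep the K best.
        children.sort(key=lambda c: c[0])
        survivors = children[:K]
    return survivors
```

The number of extended nodes per level is bounded by K regardless of the channel and noise realisation, which is what makes the runtime fixed; the price is the global sort over all children of a level.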
On the other hand, since the number of considered leaves is artificially restricted and the search only goes in the forward direction, the optimal solution may not be found, resulting in a communication performance penalty. In practice, especially at low SNR and for large QAM constellations, in order to include the optimal solution many candidates have to be considered and the complexity advantage with respect to depth-first sphere decoding decreases quickly. This issue gets worse when soft-output information has to be computed and hence many more candidates have to be considered to have good counter-hypotheses. Therefore, in the context of an iterative system the K-best algorithm achieves very limited performance gains over the IDD iterations, unless K is very large [67].
A similar approach to K-best is fixed-complexity sphere decoding (FSD), first introduced in [22] to target hard-output detection. While K-best sphere decoding considers the overall K best children on any given level, FSD extends from each node the same predefined number of its best children. For instance, near-ML hard-output performance can be achieved by considering all the $2^Q$ nodes on the top tree level $M_T$ and then only the best child of each of them from level $(M_T - 1)$ down to level 1, resulting in a total of $2^Q$ candidate leaves [22].
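The example configuration mentioned above (full expansion on the top level, a single best child on every lower level) can be sketched as follows. This is only an illustration under the same assumed triangularised model y ≈ R x; the function name `fsd_detect` and its arguments are hypothetical, and any channel-matrix ordering or preprocessing that a complete FSD may rely on is left out.

```python
def fsd_detect(R, y, constellation):
    """Fixed-complexity search: all symbols on the top level, best child below.

    R: n x n upper-triangular matrix (list of lists), y: length-n list.
    Returns len(constellation) leaves as (metric, symbol_vector) pairs,
    sorted by ascending metric.
    """
    n = len(y)
    top = n - 1
    leaves = []
    for s_top in constellation:  # full expansion on the top tree level
        metric = abs(y[top] - R[top][top] * s_top) ** 2
        tail = (s_top,)
        for i in range(top - 1, -1, -1):  # one best child per lower level
            interference = sum(R[i][i + 1 + j] * s for j, s in enumerate(tail))
            best_inc, best_s = min(
                ((abs(y[i] - interference - R[i][i] * s) ** 2, s)
                 for s in constellation),
                key=lambda c: c[0],  # compare metric increments only
            )
            metric += best_inc
            tail = (best_s,) + tail
        leaves.append((metric, tail))
    return sorted(leaves, key=lambda c: c[0])
```

Each path through the tree is independent of the others, so only a local minimum search over the children of one node is needed at each step.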
FSD has similar properties to K-best SD when looking at hardware implementation, with the additional advantage that it does not need a global sorting to find the subset of K nodes among all the $K \cdot 2^Q$ children that K-best SD considers on each level. On the other hand, the communication performance is degraded since the algorithm is more prone to missing the globally optimal solution than K-best SD. FSD was extended in [23] to support iterative detection and decoding, with further improvements in [44] and a first VLSI architecture in [43]. Similarly to K-best SD, the soft-input version of the algorithm suffers from a limited performance gain in the context of an iterative receiver, requiring a very large number of candidates to generate reliable soft information.
In order to overcome the drawbacks of breadth-first searches, a hybrid method was proposed in [102] and [104] under the name of tuple search; a corresponding architecture, with gate-level results, was later described in [19]. The basic idea is to traverse the tree in a depth-first manner and save a list, or tuple, of the T leaves with the smallest overall metrics, instead of the bit-wise counter-hypotheses required for max-log MAP performance. The largest metric among those stored in the tuple is used as the radius for tree pruning. The resulting algorithm requires a smaller list of candidates than the previously described breadth-first approaches due to the better quality of the search results. At the same time, complexity is reduced with respect to max-log MAP STS sphere decoding because the tree is pruned more quickly. This complexity advantage comes at the cost of a non-negligible performance loss [19].
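A minimal sketch of the tuple idea is given below, again assuming the triangularised model y ≈ R x. Several details are assumptions made for illustration and are not taken from [102], [104] or [19]: the radius is treated as infinite until T leaves have been collected, siblings are visited in order of increasing metric, and the names `tuple_search`, `R`, `y` and `constellation` are hypothetical. The returned node count is included only to make the effect of pruning visible.

```python
import heapq
import itertools


def tuple_search(R, y, constellation, T):
    """Depth-first search keeping the T leaves with the smallest metrics.

    Returns (leaves, visited): leaves is a list of (metric, symbol_vector)
    sorted by ascending metric, visited counts the nodes that were extended.
    """
    n = len(y)
    heap = []                # max-heap via negated metrics: (-metric, tie, vector)
    tie = itertools.count()  # tie-breaker so symbol vectors are never compared
    visited = 0

    def radius():
        # Largest metric stored in the tuple; infinite while it is not yet full.
        return -heap[0][0] if len(heap) == T else float("inf")

    def descend(i, metric, tail):
        nonlocal visited
        interference = sum(R[i][i + 1 + j] * s for j, s in enumerate(tail))
        children = sorted(
            ((metric + abs(y[i] - interference - R[i][i] * s) ** 2, s)
             for s in constellation),
            key=lambda c: c[0],
        )
        for child_metric, s in children:
            if child_metric >= radius():
                break  # remaining siblings are at least as bad: prune them
            visited += 1
            vector = (s,) + tail
            if i == 0:  # leaf: insert into the tuple, evicting the worst entry
                entry = (-child_metric, next(tie), vector)
                if len(heap) < T:
                    heapq.heappush(heap, entry)
                else:
                    heapq.heapreplace(heap, entry)
            else:
                descend(i - 1, child_metric, vector)

    descend(n - 1, 0.0, ())
    leaves = sorted(((-m, v) for m, _, v in heap), key=lambda c: c[0])
    return leaves, visited
```

With a small T the radius shrinks quickly and large parts of the tree are pruned, whereas a larger T keeps more candidates for soft-output computation at the cost of a slower radius reduction.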
Another algorithm that has led to a hardware implementation is trellis-based MIMO detection, which has many similarities with tree search-based detection. This approach represents the search space as a trellis rather than a tree, with each stage corresponding to an antenna and containing the $2^Q$ nodes in the constellation. Developed in [156] for soft-output detection and later extended in [155] to include soft input, the algorithm searches the trellis for a list of candidates to compute the LLR values, similarly to the previously described sphere-decoding variants. Several heuristics are employed to reduce the search complexity, which is fixed. Although suboptimal, trellis-based detection shows a slightly better performance than K-best approaches.
To summarise, all the algorithms described in this section aim to approach the max-log MAP performance with a lower complexity than the STS sphere decoder introduced in Section 2.2.1. However, the performance gap is typically significant unless the effort of the tree search is substantially increased, e.g., by considerably enlarging the set of considered candidates. As a consequence, the complexity of these suboptimal algorithms approaches or even exceeds that of a max-log MAP depth-first tree search as the communication performance limit gets closer.
Moreover, at high SNR a depth-first search converges to the solution relatively quickly. Therefore, when targeting the same error-rate performance at the same operating point, the methods introduced in this section do not necessarily bring an advantage over depth-first sphere decoding. Furthermore, the efficiency of their implementation can be expected to be relatively low, since many candidates that do not contribute to the final solution have to be included in the search so as not to miss the relevant ones. Intuitively, this property results in many unnecessary computations, which may significantly affect the energy efficiency of a hardware implementation even when parallelisation and pipelining can hide them from the throughput point of view. For these reasons, depth-first sphere decoding was herein preferred to breadth-first and other hybrid algorithms.