2.4 Key algorithms for future 4G SDR systems
2.4.1 MIMO-OFDM system model
OFDM is a block modulation scheme that converts a frequency-selective channel into many frequency-flat sub-carriers [SBM+04, B06]. The sub-carriers are orthogonal in the time domain, yet overlap in the frequency domain, which leads to good bandwidth efficiency. OFDM is realized by an IFFT at the transmitter and an FFT at the receiver. In a MIMO-OFDM system, OFDM block modulation is applied at each of the multiple transmit antennas, and the receiver uses multiple receive antennas. A block diagram of a MIMO-OFDM system is shown in figure 2.12.
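The IFFT/FFT realization described above can be illustrated with a minimal, noise-free sketch. It assumes a single antenna, a cyclic prefix (CP) at least as long as the channel memory, and a hypothetical 3-tap channel; the function names and parameter values are chosen for illustration only and are not taken from any implementation discussed in this work. A naive DFT stands in for the FFT to keep the example self-contained.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT/IDFT; a real system would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def ofdm_modulate(symbols, cp_len):
    """Transmitter: IDFT of one symbol block, then cyclic prefix insertion."""
    time_samples = dft(symbols, inverse=True)
    return time_samples[-cp_len:] + time_samples

def channel(samples, taps):
    """Frequency-selective channel: linear convolution with the taps."""
    return [sum(taps[l] * samples[n - l]
                for l in range(len(taps)) if n - l >= 0)
            for n in range(len(samples))]

def ofdm_demodulate(samples, n_sub, cp_len):
    """Receiver: cyclic prefix removal, then DFT."""
    return dft(samples[cp_len:cp_len + n_sub])

# Hypothetical parameters: 8 sub-carriers, CP of 2 samples, 3-tap channel.
N_SUB, CP_LEN = 8, 2
taps = [1.0, 0.5, 0.25j]
qpsk = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]

tx = ofdm_modulate(qpsk, CP_LEN)
rx = ofdm_demodulate(channel(tx, taps), N_SUB, CP_LEN)

# Each sub-carrier sees one complex gain H[k] (a flat channel),
# so a single division per sub-carrier equalizes it.
h_freq = dft(taps + [0.0] * (N_SUB - len(taps)))
equalized = [r / h for r, h in zip(rx, h_freq)]
```

Because the CP turns the linear convolution into a circular one, each received sub-carrier equals the transmitted symbol multiplied by a single channel coefficient, which is the sense in which the frequency-selective channel becomes a set of flat sub-carriers.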
From a complexity point of view, the receiver side is the most demanding part of the physical layer in a MIMO-OFDM system. The major tasks that have to be performed at the receiver are the OFDM demodulation by an FFT, the detection of the most likely transmitted MIMO symbols (MIMO detection), the decoding of the FEC code (e. g. a turbo or LDPC code), time synchronization, frequency offset estimation and correction, and the estimation of channel parameters (i. e. channel matrix and SNR) [SBM+04].
Time synchronization, frequency offset estimation and correction, and channel estimation are usually performed using a preamble consisting of one or several training sequences [SBM+04]. Once the necessary parameters have been computed, the receiver tracks changes for the duration of a frame, e. g. using pilot signals. None of these tasks depends on the incoming data.

Figure 2.12: Block diagram of transmitter and receiver in a MIMO-OFDM system (TX: FEC encoding, MIMO spatial multiplexing, modulation mapping, IFFT, CP insertion, DAC / RF; RX: ADC / RF, CP removal (CPR), FFT, channel & SNR estimation, MIMO detection, FEC decoding)
OFDM demodulation, MIMO detection, and FEC decoding are tasks that have to be performed continuously on the incoming data, with the throughput requirements defined by the data rate of the MIMO-OFDM system. Hence, the computational complexity of the receiver is mostly defined by these three tasks. Therefore, OFDM-based block (de-)modulation, MIMO detection, and LDPC decoding have been selected to assess the potential performance and the limitations of processing MIMO-OFDM systems on an SIMD-based SDR processor platform.
LDPC codes and turbo codes can both achieve error rates close to the Shannon limit. Yet, LDPC codes have an asymptotically better performance than turbo codes and enable trade-offs between decoding complexity and performance [RSU01, RU01]. Furthermore, LDPC decoding algorithms are well suited for parallel processing. Therefore, LDPC decoding has been investigated instead of turbo decoding.
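To illustrate why LDPC decoding lends itself to parallel processing, the following sketch shows hard-decision bit-flipping decoding, one of the simplest iterative LDPC decoding schemes. It is a stand-in chosen for brevity and is not necessarily the decoding algorithm investigated in this work. The 4x6 parity-check matrix is a hypothetical toy example; practical LDPC codes are much longer and sparser.

```python
def syndrome(H, word):
    """One parity check per row of H; 1 marks an unsatisfied check."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def bit_flip_decode(H, received, max_iter=10):
    """Hard-decision bit-flipping decoding (illustrative sketch)."""
    word = list(received)
    for _ in range(max_iter):
        s = syndrome(H, word)
        if not any(s):
            return word, True  # all parity checks satisfied
        # For every bit, count the unsatisfied checks it takes part in.
        # These counts are independent of each other, so they could be
        # computed for all bits in parallel.
        votes = [sum(s[c] for c in range(len(H)) if H[c][v])
                 for v in range(len(word))]
        worst = max(votes)
        # Flip the bit(s) involved in the most unsatisfied checks.
        word = [b ^ 1 if n == worst else b for b, n in zip(word, votes)]
    return word, not any(syndrome(H, word))

# Hypothetical toy parity-check matrix: every bit takes part in two checks.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
codeword = [1, 1, 0, 1, 0, 0]   # satisfies all four checks
received = [1, 1, 0, 1, 1, 0]   # same word with bit 4 flipped
decoded, ok = bit_flip_decode(H, received)
```

Both the syndrome computation (one check per row) and the vote computation (one count per bit) consist of many small, identical, independent operations, which is the property that makes this class of algorithms attractive for data-parallel hardware.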
MIMO symbol detection is performed based on sphere decoding, because sphere decoding algorithms offer an excellent BER performance (close to the optimum maximum likelihood solution). Furthermore, the high computational complexity of sphere decoding is a challenge for any baseband processing architecture.
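The maximum likelihood reference that sphere decoding approaches can be written down as an exhaustive search. The sketch below is this brute-force detector, not a sphere decoder: it evaluates the Euclidean distance for every candidate symbol vector, whereas a sphere decoder restricts the search to candidates within a given radius. The 2x2 channel matrix, the noise values, and the function names are hypothetical.

```python
import itertools

QPSK = [1+1j, 1-1j, -1+1j, -1-1j]

def mat_vec(H, s):
    """Matrix-vector product y = H s for lists of complex numbers."""
    return [sum(h * x for h, x in zip(row, s)) for row in H]

def ml_detect(H, y, alphabet=QPSK):
    """Exhaustive maximum likelihood detection: minimize ||y - H s||^2."""
    n_tx = len(H[0])
    best, best_metric = None, float("inf")
    # The number of candidates is len(alphabet) ** n_tx, i.e. it grows
    # exponentially with the number of transmit antennas.
    for cand in itertools.product(alphabet, repeat=n_tx):
        metric = sum(abs(yi - ri) ** 2
                     for yi, ri in zip(y, mat_vec(H, cand)))
        if metric < best_metric:
            best, best_metric = cand, metric
    return list(best), best_metric

# Hypothetical 2x2 channel, transmitted QPSK vector, and small noise.
H = [[0.9+0.1j, 0.3-0.4j],
     [-0.2+0.5j, 1.1+0.2j]]
sent = [1-1j, -1+1j]
noise = [0.05-0.02j, -0.03+0.04j]
y = [r + n for r, n in zip(mat_vec(H, sent), noise)]

detected, metric = ml_detect(H, y)
```

With four transmit antennas and 64-QAM, the same loop would have to visit 64^4 (about 16.8 million) candidates per received vector, which indicates why a pruned search such as sphere decoding is needed and why it remains computationally demanding.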
Chapter 3
Scalable SIMD processor architecture
This chapter focuses on the development of a scalable SIMD processor architecture for SDR applications. The scalable SIMD processor architecture is described and design decisions are explained. Furthermore, the modeling in LISA and the methodology for evaluating the scalability of the architecture are explained. The scalable architecture has been synthesized and simulated for four different SIMD vector widths ranging from 128 bits to 1024 bits per vector and four different permutation network configurations.
In section 3.1, design decisions for the scalable SIMD processor architecture are explained. The architecture is developed based on an analysis of baseband algorithms that have been implemented on the EVP [vHM+04, vHM+05, SVPG+10]. Following the description of the processor architecture, section 3.2 explains the processor modeling using the LISA language [HNBM01, Hof02, L04, PHZM99, Pee02]. After briefly introducing LISA, modeling issues for scalable SIMD models are described and a solution based on the GNU M4 [SPVB08] macro language is proposed. The section concludes with a discussion of the effectiveness of LISA for the development of large SIMD processors. The following section (section 3.3) briefly discusses an alternative for LIW SIMD architectures based on vertical-horizontal vector operations [GZYC86]. Section 3.4 explains the methodology used for analyzing area, power, and performance of the different instances of the scalable SIMD processor architecture. The tools and technologies used are described and limitations of the methodology are explained.
3.1 Development of the SIMD processor architecture based on algorithm requirements
The development of any processor architecture requires many design decisions. Instruction set and data type support have to be defined, register files must be dimensioned, and other features have to be specified as well. These decisions are not arbitrary and require careful consideration of the hardware complexity and, especially, the demands of the algorithms that are to be mapped onto the processor architecture.
Hence, as a first step towards the development of a scalable SIMD processor architecture for SDR, the requirements of typical baseband algorithms have been identified. For this purpose, baseband algorithms that have been implemented on the EVP during a research project in cooperation with Nokia Siemens Networks were analyzed. Aspects such as typical word lengths, data types, useful instructions, conditional execution of operations, instruction level parallelism (ILP), register files, and support for permutation operations have all been considered. The implemented baseband algorithms that are the basis of the analysis are listed below:
• A linear minimum mean squared error (LMMSE) chip equalizer for single input, multiple output (SIMO) W-CDMA with two receive antennas and two times oversampling at the receiver [WBAHS08b, WBAHS09]
• A spreader for HSDPA, which comprises the modulation mapping, coding by channelization and scrambling codes, and the combining of different control and data channels [Tec07, WBAHS08b, WBAHS09]
• Matrix algorithms for MIMO OFDM systems [SM06]: QR decomposition, singular value decomposition (SVD) and the QRD-M algorithm for MIMO symbol detection
• Radix-2 and mixed-radix FFTs for single carrier frequency division multiple access (SC-FDMA) and orthogonal frequency division multiple access (OFDM-A) in LTE [WBAHS08a]
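As a reminder of the operations behind the radix-2 FFT named in the last item, the following is a generic textbook sketch of an iterative decimation-in-time radix-2 FFT. It is not the EVP implementation analyzed in this work; the function name and structure are illustrative assumptions. The sketch shows the two ingredients that matter for an SIMD mapping: a data permutation (bit reversal) and stages of mutually independent butterfly operations.

```python
import cmath

def fft_radix2(x):
    """Iterative decimation-in-time radix-2 FFT (textbook sketch)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)

    # Bit-reversal permutation of the input samples.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; the butterflies inside one stage do not depend
    # on each other and could be processed in parallel.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a

# Example input of length 8 (arbitrary values).
samples = [1, 2-1j, 0, -1+2j, 3, 0.5j, -2, 1]
spectrum = fft_radix2(samples)
```

Each butterfly needs one complex multiplication, one addition, and one subtraction, and the access pattern changes from stage to stage, which relates to the analysis aspects listed above (useful instructions and support for permutation operations).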
In the following, the results of the algorithm analysis are presented. Based on this analysis, the scalable SIMD processor architecture has been developed.
In section 3.1.1, word lengths and data types are evaluated. Section 3.1.2 evaluates instructions and conditional execution modes that are useful for the considered baseband algorithms; based on the evaluation, the instruction set of the processor architecture is defined and instructions are mapped on processing units. Sections 3.1.3 and 3.1.4 discuss instruction level parallelism and the dimensioning of the register files. In section 3.1.5, alternatives for the vector permutation network are discussed. Section 3.1.6 lists further DSP features that have been implemented and summarizes the features of the scalable SIMD processor architecture.