The SDR principle describes a wireless communication device that implements the physical layer processing exclusively or mostly on programmable processing architectures, for example digital signal processors (DSPs), application-specific processors (ASPs), or reconfigurable architectures. Classic architectures for wireless communication are based on application-specific integrated circuits (ASICs), assisted by one or a few DSPs [Ram07], and do not offer any flexibility. The programmability of SDR solutions offers some advantages over the traditional hardware solutions [SVPG+10]:
• SDR solutions enable performing software development and hardware design and verification in parallel. Furthermore, new protocols can be quickly implemented in software and mapped onto an existing SDR platform, without a hardware redesign. Therefore, development time and costs are significantly lower than for classical ASIC solutions. Development costs are further reduced by the potentially higher chip volumes, as one SDR platform can be used for different applications.
• SDR solutions enable dynamically executing different wireless communication protocols on the same processor architecture (multi-mode operation). Commonly used functions, such as filters, encoders/decoders, and transforms (FFT), can be adjusted at runtime. Multi-mode capability allows service providers to use one platform for different markets and offer more functionality to the end users. Furthermore, hardware costs are reduced if resources can be shared effectively between wireless protocols.
• The programmability of SDR systems also reduces maintenance costs, as new functionality (e. g. new standard releases) or bug fixes can be applied either over-the-air or by other means of remote programming.
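The multi-mode argument in the list above can be illustrated with a minimal Python sketch: one filter routine is reused for several protocols, and only its coefficient set is exchanged at runtime. The mode names, the coefficient values, and the helper `run_mode` are hypothetical and serve only as an illustration; they are not taken from any wireless standard.

```python
from typing import Dict, List, Sequence

# Hypothetical per-protocol coefficient sets; the values are illustrative only.
FILTER_TAPS: Dict[str, List[float]] = {
    "mode_a": [0.25, 0.5, 0.25],          # 3-tap smoothing filter
    "mode_b": [0.1, 0.2, 0.4, 0.2, 0.1],  # 5-tap smoothing filter
}


def fir_filter(samples: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR filter; inputs before the first sample count as zero."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def run_mode(mode: str, samples: Sequence[float]) -> List[float]:
    """Select the coefficient set at runtime and reuse the same filter routine."""
    return fir_filter(samples, FILTER_TAPS[mode])
```

Feeding a unit impulse through `run_mode` returns the coefficient set of the selected mode, which shows that switching protocols changes only data, not the processing routine.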
The essence of these arguments is that SDR increases the flexibility and reduces the costs of wireless communication devices. Yet, the increased flexibility has its price, as SDR systems will always consume more power than an optimized ASIC-centered solution [Ram07]. The chip area of a programmable SDR architecture for one wireless protocol is also greater than that of an ASIC-based design. However, this can be canceled out by multi-mode support, i. e. multiple wireless protocols are supported on the same SDR architecture, whereas an ASIC-centered approach would require adding further ASICs, which increases the chip area.
One further argument sometimes used against programmable SDR solutions is the notion that many of the algorithms in the physical layer processing of wireless communication protocols could be efficiently realized on dedicated hardware accelerators with limited reconfigurability, e. g. allowing parameter adjustments for filters. However, this approach leads to longer development time and higher costs for designing accelerators that can support the requirements of multiple wireless protocols. Furthermore, an accelerator will only support one algorithm. Some critical physical layer tasks, such as MIMO symbol detection, can be performed by many different (possibly in-house) algorithms, which achieve different trade-offs between algorithm complexity (e. g. runtime, required memory) and algorithm performance (e. g. BER). An SDR architecture that offers sufficient computing power can perform any of these algorithms, allowing companies to implement their own preferred solution.
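As a rough illustration of interchangeable detection algorithms, the following Python sketch implements two textbook detectors behind the same interface. It assumes a real-valued 2 × 2 channel with BPSK symbols (+1/-1), which is a strong simplification. The registry `DETECTORS` and all function names are hypothetical and do not correspond to any of the cited architectures.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Matrix = Sequence[Sequence[float]]
Detector = Callable[[Matrix, Sequence[float]], List[int]]


def zero_forcing(h: Matrix, y: Sequence[float]) -> List[int]:
    """Invert the 2x2 channel, then slice each stream to the nearest BPSK symbol.

    Assumes the channel matrix is invertible (non-zero determinant).
    """
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    x0 = (h[1][1] * y[0] - h[0][1] * y[1]) / det
    x1 = (h[0][0] * y[1] - h[1][0] * y[0]) / det
    return [1 if x0 >= 0 else -1, 1 if x1 >= 0 else -1]


def max_likelihood(h: Matrix, y: Sequence[float]) -> List[int]:
    """Exhaustively test all 2^2 BPSK symbol vectors and keep the closest one."""
    def cost(x: Sequence[int]) -> float:
        r0 = y[0] - (h[0][0] * x[0] + h[0][1] * x[1])
        r1 = y[1] - (h[1][0] * x[0] + h[1][1] * x[1])
        return r0 * r0 + r1 * r1

    return list(min(product((-1, 1), repeat=2), key=cost))


# Hypothetical registry: software can swap detectors without a hardware change.
DETECTORS: Dict[str, Detector] = {"zf": zero_forcing, "ml": max_likelihood}
```

Zero forcing needs a single matrix inversion, whereas the exhaustive search evaluates every candidate vector, so its cost grows exponentially with the number of streams and the constellation size. On a programmable platform, choosing between the two is a software decision.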
Future 4G wireless protocols aim at data rates between 100 Mbps and 1 Gbps [Rep08, Tec09a]. Therefore, SDR solutions need to achieve high throughputs, while still complying with the power restrictions of wireless communication devices. The power budget for baseband processing in mobile devices is approximately 500 mW (see [Lin08, Neu04]), with a power efficiency of approximately 100 million operations per second (MOPS) per milliwatt required for 3G and even steeper requirements for 4G. Therefore, energy efficiency is of essential importance for SDR systems. High energy efficiency can be achieved by parallel processing and by employing application-specific instructions or processing units.
Programmable architectures for SDR can be categorized into two philosophies: reconfigurable architectures and architectures based on SIMD processors [Ram07].
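To put the power budget and efficiency figures quoted above into perspective, the following short Python sketch derives the implied sustained operation rate and the average energy per operation. The two constants are the approximate values from the text. The function names are hypothetical, and the result is a back-of-the-envelope estimate rather than a measured figure.

```python
POWER_BUDGET_MW = 500          # approximate baseband power budget from the text
EFFICIENCY_MOPS_PER_MW = 100   # approximate 3G efficiency requirement from the text


def sustained_mops(power_mw: float, mops_per_mw: float) -> float:
    """Operation rate in MOPS that fits into the given power budget."""
    return power_mw * mops_per_mw


def energy_per_op_pj(mops_per_mw: float) -> float:
    """Average energy per operation in picojoules.

    1 mW = 1e-3 J/s and 1 MOPS = 1e6 op/s, hence 1 MOPS/mW = 1e9 op/J.
    """
    ops_per_joule = mops_per_mw * 1e9
    return 1e12 / ops_per_joule
```

With these values, the budget corresponds to 50,000 MOPS (50 GOPS) at an average of about 10 pJ per operation.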
2.1 Software defined radio
2.1.1 Reconfigurable SDR architectures
The design flow for reconfigurable SDR architectures is as follows: First, algorithms or parts of algorithms that are used by multiple or all targeted wireless protocols are identified. Next, reconfigurable data paths, which provide the necessary processing for this common functionality, are designed. The flexibility of reconfigurable SDR architectures depends on the granularity of the decomposition into common data paths. An example of a fine-grain reconfigurable data path is a small DSP core, which implements elementary functions, such as addition or multiplication. Coarse-grain reconfigurable data paths might implement complete algorithms; for example, a data path might be realized as an ASP for FFT processing.
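The first step of this design flow, identifying functionality shared between the targeted protocols, can be sketched in Python as a simple counting exercise. The protocol names and kernel inventories below are placeholders chosen for illustration, not a survey of real standards, and `shared_kernels` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical kernel inventory per protocol; all names are placeholders.
PROTOCOL_KERNELS: Dict[str, Set[str]] = {
    "protocol_a": {"fir_filter", "fft", "viterbi_decoder"},
    "protocol_b": {"fir_filter", "fft", "turbo_decoder"},
    "protocol_c": {"fir_filter", "correlator", "turbo_decoder"},
}


def shared_kernels(inventory: Dict[str, Set[str]], min_protocols: int = 2) -> List[str]:
    """Return the kernels used by at least `min_protocols` protocols, sorted by name."""
    counts = Counter(k for kernels in inventory.values() for k in kernels)
    return sorted(k for k, c in counts.items() if c >= min_protocols)
```

Kernels returned for a high `min_protocols` threshold are natural candidates for common data paths, while kernels used by a single protocol offer no sharing benefit.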
Examples of fine-grain reconfigurable SDR architectures are University of Bologna's XiRisc processor architecture [LCB+06] and picoChip's picoArray [Pul08, BDT08]. The XiRisc processor is a VLIW RISC processor, with two data paths with arithmetic and DSP-like functional units and a third data path based on a pipelined configurable gate array (PiCoGA) [LTC03]. Application-specific functions can be mapped onto the PiCoGA. The picoArray architecture consists of many independent RISC processors organized in a two-dimensional array. For example, the picoArray PC102 comprises 308 processors. Each processor executes its own instruction stream and processes its own data. Processors are connected by a high-speed time-division multiplexed bus system.
Intel's reconfigurable communication architecture (RCA) [CTC+04] and IMEC's ADRES architecture [MVV+03, SVPG+10] are examples of coarse-grain reconfigurable SDR architectures. Intel's RCA consists of a mesh of heterogeneous, reconfigurable processing elements (PEs), connected by routers. The PEs are optimized for different parts of the baseband processing, e. g. filtering or Turbo coding. The ADRES architecture combines a VLIW processor with a coarse-grained reconfigurable array. The coarse-grained array consists of reconfigurable functional units (FUs) with local memories. Neighboring FUs can exchange data without any register file accesses in between. The VLIW core and the FUs communicate through a global register file and shared memory. A typical ADRES instance consists of a 4 × 4 array [SVPG+10] with 128-bit SIMD FUs (12 reconfigurable units and 4 units in the VLIW part of the architecture).
2.1.2 SIMD-based architectures for SDR
SIMD-based architectures for SDR utilize SIMD processor cores to achieve high energy efficiency. The basics of SIMD processing are discussed in section 2.2. A system on a chip (SoC) for SDR based on SIMD processing consists of many SIMD processor cores and few ASICs [Ram07]. The ASICs accelerate algorithms that do not require programmability and/or cannot be efficiently mapped onto the SIMD processors. Figure 2.1 shows a block diagram of a SIMD-based SoC for SDR.
Chapter 2 Overview of software defined radio principles and architectures
Figure 2.1: SIMD-based SoC for SDR (block diagram: SIMD DSP 1 ... SIMD DSP n, accelerators Acc. 1 ... Acc. m, memory, control DSP)
SIMD processor architectures for SDR can be categorized into short SIMD architectures and wide SIMD architectures. Most of the SIMD processors from both classes support long instruction word (LIW) execution (see chapter 3.1.3), enabling concurrent memory access and arithmetic operations.
Short SIMD architectures support SIMD data paths with few parallel lanes (typically four lanes). Examples are Sandbridge's Sandblaster SB3011 [GI06], Icera's DXP [Kno05], Infineon's MuSIC architecture [GRS07] and Linköping University's single instruction stream, multiple task (SIMT) architecture [Nil07, NTL09].
Wide SIMD architectures utilize 16 or more parallel data lanes. A higher degree of SIMD parallelism leads to better energy efficiency, but higher levels of data parallelism are required to utilize all data lanes effectively. This thesis analyses data parallelism in SDR algorithms for wide SIMD architectures. Hence, relevant wide SIMD architectures are discussed in more detail in section 2.3. The CEVA-XC [CEV09] architecture is a wide SIMD processor architecture, which supports VLIW execution of multiple parallel 256-bit SIMD data paths. Yet, too little information on CEVA-XC is available for a detailed analysis.
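The dependence of lane utilization on the available data parallelism can be made concrete with a small Python sketch. It assumes a simplified model in which a vector of `vector_length` independent elements is processed in groups of `lanes` elements and a partially filled final group still occupies a full SIMD instruction. The function name and the model are illustrative assumptions, not a description of any specific processor.

```python
def lane_utilization(vector_length: int, lanes: int) -> float:
    """Fraction of SIMD lane slots that do useful work for one vector operation."""
    if vector_length <= 0 or lanes <= 0:
        raise ValueError("vector_length and lanes must be positive")
    # Number of SIMD instructions needed (ceiling division on integers).
    issues = (vector_length + lanes - 1) // lanes
    return vector_length / (issues * lanes)
```

Under this model, a 12-element vector fully utilizes a 4-lane data path (utilization 1.0) but only 37.5 % of a 32-lane data path, which is why wide SIMD architectures pay off only for algorithms with sufficiently long vectors.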