Data Processing Accelerator Architecture for Low-Power

We present two hardware accelerators for low-power SoCs in a sensor network. The first accelerator is a data processing and control logic design for a new radiation detection sensor system that can generate data at or above the petabit-per-second level. The logic consists of novel data processing components and operation strategies, including low-power and network-on-wafer solutions. The aim of this design is to achieve subtle data reduction before the information is ferried to the network, and to provide redundant processing and channels that minimize the loss of information. The result is a radiation detection system that can operate at a scan rate of a billion frames per second. Simulation results show that the intended clock rate is achieved within the power target of less than 200 mW.

In the second design, we propose a low-power digital signal processing (DSP) accelerator supporting multiple multiply-and-accumulate (MAC), FFT, FIR, and 3-D cross product operations used in fundamental signal processing for embedded SoCs. Power consumption is a major concern because the demands on the application processor to keep up with a constant stream of sensor data diminish its opportunities for power-conserving sleep. A low-power sensor hub SoC is therefore needed that can manage the various sensors, aggregate and filter sensor signals, and notify the application processor of significant events. Digital signal processors are widely used to support this data processing in sensor hubs because of their flexible programmability and powerful data processing, but the power consumption of DSPs is relatively higher than that of dedicated hardware accelerators. This paper presents a low-power DSP accelerator design that satisfies the requirements of both data processing ability and low power consumption for sensor hub SoCs. In the evaluation of the proposed design, the DSP accelerator synthesized with the TSMC 65 nm worst-case library takes 2080 cycles for a 256-point complex FFT, consuming 12.35 mW of power and occupying 0.167 mm² of area at 333 MHz.
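The reported FFT figures can be turned into a per-transform latency and energy estimate. The following is a minimal sketch, assuming the 12.35 mW figure is the average power drawn while the FFT runs (the text does not state this); the function name is hypothetical.

```python
def fft_latency_and_energy(cycles, clock_hz, power_w):
    """Return (latency in seconds, energy in joules) for one transform.

    Assumes power_w is the average power while the transform runs.
    """
    latency_s = cycles / clock_hz      # cycles divided by cycles-per-second
    energy_j = power_w * latency_s     # energy = power x time
    return latency_s, energy_j


# Figures quoted in the text: 2080 cycles, 333 MHz, 12.35 mW.
latency, energy = fft_latency_and_energy(2080, 333e6, 12.35e-3)
print(f"latency: {latency * 1e6:.2f} us, energy: {energy * 1e9:.1f} nJ")
```

Under that assumption, one 256-point complex FFT takes roughly 6.25 µs and about 77 nJ.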

The rest of this paper is organized as follows: Chapter II discusses the proposed joint source and channel decoding work. Chapter III discusses our proposed NoC designs, which aim to reduce latency. Chapter IV presents data processing accelerator designs and implementations for low-power embedded SoCs. Finally, we conclude and present our future work in Chapter V.

CHAPTER II

JOINT SOURCE CHANNEL DECODING METHOD FOR LOW-POWER PORTABLE AND WIRELESS SOC SYSTEMS

The H.264/AVC standard provides several error resilience features that support unequal error protection (UEP), such as flexible macroblock ordering (FMO), redundant slices, and data partitioning (DP).

Thomos et al. show that the use of FMO together with a UEP scheme outperforms classical H.264/AVC transmission schemes in terms of decoded video quality [15]. The use of DP in H.264/AVC can yield a lower percentage of entirely lost frames [16].

An extensive study of prioritization and layering techniques for H.264/AVC shows that the combination of DP, turbo codes (TC), and flexible modulation techniques outperforms the combination of DP and TC only [17].

Joint source-channel decoding (JSCD) architectures that combine low-density parity-check (LDPC) codes and H.264 video coding for UEP are presented by Guo, Wang, Qi, Kumar, and Yang et al. [18–22]. Guo and Wang et al. propose LDPC-based unequal error protection algorithms using data partitioning [18, 19]. The idea of LDPC-based unequal error protection is that high priority data are assigned a low code rate and low priority data are assigned a high code rate, so that the more important partitions are better protected from channel errors. Qi et al. propose a dynamic rate selection forward error correction (FEC) scheme utilizing LDPC codes and Reed-Solomon (RS) codes for robust video communication [20].
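The rate allocation idea can be illustrated with a short sketch. It is not the algorithm of [18, 19]: the three rates, the priority labels, and the function names are assumptions chosen for illustration, and the partition names A, B, and C follow H.264 data partitioning, with A treated here as the most important.

```python
from fractions import Fraction

# Illustrative code rates only (assumption): a lower rate means more parity
# bits per information bit, hence stronger protection.
RATE_BY_PRIORITY = {
    "high": Fraction(1, 2),
    "medium": Fraction(2, 3),
    "low": Fraction(5, 6),
}


def coded_length(info_bits, rate):
    """Channel bits needed to carry info_bits at the given rate (rounded up)."""
    return -(-info_bits * rate.denominator // rate.numerator)


def allocate(partitions):
    """Assign a code rate to each (name, priority, info_bits) partition."""
    plan = []
    for name, priority, info_bits in partitions:
        rate = RATE_BY_PRIORITY[priority]
        plan.append((name, rate, coded_length(info_bits, rate)))
    return plan


# Hypothetical partition sizes in bits.
plan = allocate([("A", "high", 1200), ("B", "medium", 3000), ("C", "low", 6000)])
for name, rate, channel_bits in plan:
    print(f"partition {name}: rate {rate}, {channel_bits} channel bits")
```

The high priority partition pays the largest redundancy overhead, which is the tradeoff UEP makes: parity is spent where a channel error would hurt decoded video quality most.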

The studies above focus on improving received data quality or robustness of transmission using UEP. In contrast, Wang, Lu, and Zhang et al. consider minimizing both the processing power for JSCD and the transmission energy under a video quality constraint, using RS channel coding [23–25]. Eisenberg et al. propose an unequal iterative decoding approach that minimizes the power consumption of a channel decoder with data partitioning and turbo decoding [26]. A higher number of turbo decoding iterations is used for high priority data to minimize the receiver power while meeting distortion constraints specified by the video decoder at a given channel rate.

Another method to reduce the power consumption of LDPC decoding is presented by Dielissen et al. [27]. That method exploits scalable sub-block parallelism to achieve efficient LDPC decoder implementations for DVB-S2, enabling a lower operating frequency by reducing the parallelism of the LDPC decoder instead of using UEP. However, the scalable parallelism cannot be varied according to the required tradeoff between decoded data quality and low power.

Wang et al. present an LDPC decoder architecture that improves power efficiency by adaptively adjusting the number of LDPC decoding iterations to meet a required quality for each incoming frame [28–30]. The advantage of this approach is that the LDPC encoder/decoder does not require rate adaptation, thereby simplifying the encoder/decoder hardware. The early termination of the iterative process is determined by constraints on check errors during the decoding of each individual frame. This scheme reduces energy compared to a fixed-iteration technique.
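Early termination on check errors can be sketched as follows. This is not the architecture of [28–30]: it swaps in a simple hard-decision bit-flipping decoder, and the toy parity-check matrix, the `allowed_check_errors` threshold, and all names are assumptions made for illustration.

```python
# Toy parity-check matrix (assumption): 4 checks, 6 bits, and every bit
# participates in exactly two checks.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]


def syndrome(H, word):
    """Parity of each check; a 1 marks a failed (unsatisfied) check."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def decode_with_early_termination(H, received, max_iters=10,
                                  allowed_check_errors=0):
    """Bit-flipping decoder that stops once few enough checks fail.

    Returns (word, flip_passes_used, converged).
    """
    word = list(received)
    for passes in range(max_iters + 1):
        s = syndrome(H, word)
        if sum(s) <= allowed_check_errors:
            return word, passes, True   # early exit: quality target met
        if passes == max_iters:
            break                       # iteration budget exhausted
        # For each bit, count the failed checks it participates in.
        votes = [sum(row[j] for row, failed in zip(H, s) if failed)
                 for j in range(len(word))]
        worst = max(votes)
        # Flip every bit that is involved in the most failed checks.
        word = [b ^ 1 if v == worst else b for b, v in zip(word, votes)]
    return word, max_iters, False


codeword = [1, 1, 0, 1, 0, 0]   # satisfies all four checks of H
noisy = [1, 1, 0, 1, 1, 0]      # same codeword with bit 4 flipped
print(decode_with_early_termination(H, codeword))
print(decode_with_early_termination(H, noisy))
```

In this toy case a clean frame exits before any flip pass and a frame with one bit error exits after a single pass, whereas a fixed-iteration decoder would run all `max_iters` passes on both. That difference is the source of the energy saving.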

To mitigate the bandwidth limit, MIMO wireless systems, which offer higher throughput than single input and single output (SISO) wireless systems, have been developed [31]. Thus, to accommodate the increasing capabilities of mobile multimedia devices and services, a video over MIMO joint decoding design using UEP can improve energy efficiency and error robustness in these devices. MIMO-based UEP schemes are presented in [32–35]. These studies demonstrate that MIMO with UEP improves not only the capacity of the system but also its error resilience compared to equal error protection (EEP), and that it overcomes the frequency-selective effects of broadband wireless channels. Yang et al. propose a hybrid MIMO system that uses spatial multiplexing (SM) for low priority data and spatial diversity (SD) for high priority data to achieve better performance in terms of BER and PSNR [32]. Liu et al. similarly divide H.264 information into two parts according to priority [33]. Li et al. utilize two modes: a transmission diversity (TD) mode for high error protection and a spatial multiplexing (SM) mode for high data rate [34]. These studies use unequal numbers of information bits in their channel error correction codes.

The following sections discuss the proposed JSCD schemes using a low-power LDPC decoding architecture, UEP-based configuration set search algorithm, H.264 data partitioning, and DVFS. We also propose a novel MIMO-H.264 JSCD scheme using the UEP-based configuration set search algorithm to reduce power on MIMO detection.

A. Joint Source Channel Decoding Method Using Unequal Error Protection and LDPC

In document LOW-POWER EMBEDDED DESIGN SOLUTIONS AND LOW-LATENCY ON-CHIP INTERCONNECT ARCHITECTURE FOR SYSTEM-ON-CHIP DESIGN. A Dissertation YOON SEOK YANG (Page 25-29)