MPSoC’10 Nagaragawa Convention Center Gifu City, Japan
Efficiency Metrics for Design Space Exploration
of Wireless Baseband Processing
Norbert Wehn //ems.eit.uni-kl.de
Metric - Computational Requirements
Example - Mobile Phone Trends for Channel Decoding
lte
umts hsdpa
umts lte-awimax
10000 gprs wcdma cdma2000 td-scdma td-scdma gsm dmb-t dab dvb-t dvb-s dvb-h dvb-c uwb 802.16e 802.11b 802.11n 802.11a dvd 100 1000 c operations/bit 0.1 GOPS
Source: Kees van Berkel, DATE2009
bluetooth blueray 1 10 0.01 0.1 1 10 100 1000 Datarate Mbps A lgorithmi c 10 GOPS cellular decoders broadcast decoders connectivity decoders
Metric – Energy Efficiency
Example - SODA, DSP and GP Architectures
3
10
LTE TC (150Mbit/s)
Area- and Energy Efficiency Design Space
better energy efficiency better overall efficiency 1 En er gy Ef fi c ien cy : o pe ra ti on /e ne rg y ( o p/ p J ) ( ) LDPC WiMedia (~1Gbit/s) CC (500Mbit/s) Min-Sum 0,1 10 100 1000 o
Area Efficiency: GOPs/mm2
-3-Min
better area efficiency
Assessment
Metrics are computation centric i.e. operations are counted only On the other hand SoC are becoming
more and more interconnect centric e.g. NoC
more and more memory centric Assessment of these metrics
Various channel decoder implementations @ research group
All architectures based on standard synthesis flows
All designs based on 65nm technology@worst case
All data in-house available
5
Overview Channel Decoders
Decoder Flexibility MaxBlock-size Payload Throughput [Mbit/s] Freq. [MHz] Area [mm2] Dynamic Power [mWatt] ASIP Conv. Codes
Binary TC Duo-binary TC N=16k 40 14(6iter) 28(6iter) 385 (P&R) 0.7 (P&R) ~100
LTE Turbo LTE turbo code N=18k 150 (6iter) 300 (P&R) 2.1 (P&R) ~300 LDPC flex R=1/4 to R=9/10 N=16k 150-300(20-10iter) 385 (P&R) 1.172 (P&R) ~389 6 LDPC fixed R=3/4 N=1.2k 480 (6iter) 435 (P&R) 0.583 (P&R) ~202 LDPC
WiMedia 1.5 R=1/2-4/5 N=1.3k 640 960 (R=1/2,5iter)(R=3/4,5iter) 265 0.51 ~193
Algorithmic Throughput Calculations [GOPs]
Code Operations per decodedinformation bit
normalized to ~8bit addition
Infobit-Throughput
Giga operations per second [GOPs]
100Mbit/s 300Mbit/s 1 Gbit/s 100Mbit/s 300Mbit/s 1 Gbit/s CC: states=64 ~200 ~20 ~ 60 ~200 LDPC (Min-Sum) (x3.4 appr. BP) 5 iter 75/R ~7.5/R ~22.5/R ~ 75/R 10 iter 150/R ~15/R ~ 45/R ~ 150/R 20 iter 300/R ~ 30/R ~ 90/R ~ 300/R 40 iter 600/R ~ 60/R ~ 180/R ~ 600/R 7 40 iter 600/R 60/R 180/R 600/R Turbo (Max-Log) 2 iter 280 ~ 28 ~ 84 ~ 280 4 iter 560 ~ 56 ~168 ~ 560 6 iter 840 ~ 84 ~252 ~ 840 10 ASIP TC (14Mbit/s)
Area- and Energy Efficiency
1 E n er g y E ff ici ency: o pe ra ti on /e n e rg y ( o p /pJ ) LTE TC (150Mbit/s) LDPC flexible (~100 Mbit/s) LDPC WiMedia (~1Gbit/s) CC (500Mbit/s) Min-Sum 0.1 10 100 1000
Area Efficiency: GOPs/mm2
o
-3-Min
What about Memory/Data Transfers
Current metric: energy efficiency = only operations/energy
Data transfers/ accesses substantially contribute to the power consumption Example (R=0.5)
150 Mbit/s Turbo: ~126 Gops ~40 Gaccesses
150 Mbit/s LDPC : ~90 Gops ~80 Gaccesses
Efficient data transfer is key for efficient implementation E.g. LTE TC: special interleaver structure to avoid access conflicts
E.g. DVB-S2/WiMAX LDPC: special code structure to minimize access conflicts
9
Efficiency metrics based only on operations are not appropriate
Power includes operations and accesses!
Architectures are favored where operations dominate compared to accesses
Turbo decoder will always be evaluated to be more efficient
Same argumentation for area efficiency
E.g. DVB S2/WiMAX LDPC: special code structure to minimize access conflicts
10
LTE TC (150Mbit/s)
System Oriented Area- and Energy Efficiency
better energy efficiency better overall efficiency 1 En er g y Ef fi ci en cy : c od ed b it /en er g y ( b it /n J) ( ) LDPC WiMedia (~1Gbit/s) CC (500Mbit/s) Min-Sum
Energy efficiency : decoded information bit per energy Area efficiency : decoded information bit/s per area
10
0,1
10 100 1000
De
c
Area Efficiency: (Mbit/s)/mm2
-3-Min
better area efficiency
100
)
ASIP TC (14Mbit/s) LTE TC (150Mbit/s)
Decoders in System Design Space
1 10 En erg y Effi ciency: e code d bit/en ergy (bit /n J ) LTE TC (150Mbit/s) LDPC flexible (~100 Mbit/s) LDPC WiMedia (~1Gbit/s) CC (500Mbit/s) 11 0.1 10 100 1000 10000
Area Efficiency: (Mbit/s)/mm2
d
e
Min-Sum
-3-Min
Observations
Large change in relative and absolute positions can be observed
Relative positions between Min-Sum and λ-3-Min changed
Efficiency difference between Min-Sum and λ-3-Min decreases
LDPC WiMedia decoder has a much larger efficiency than the flexible LDPC decoder
Efficiency of LDPC WiMedia decoder is not far away from CC decoder
But what about communications performance in the comparisons ?
Overall efficiency of a baseband receiver depends on
Communications performance
Implementation performance (area- and energy efficiency)
Comparison of Different Architectures
Scenario 1: Communication driven
Comparison of two iterative decoders (TC versus LDPC)
Identical communications performance
Different parameters (code rate, iterations) impact on implementation efficiency Scenario 2: Implementation driven
Comparison of iterative and non-iterative decoders (LDPC versus CC)
13
p ( )
64-state convolutional code 960 Mbit/s (WiMedia 1.2) and WiMedia 1.5 LDPC decoder Varying communication performance impact on implementations efficiency
Scenario 1: Fixed Communication Performance
14
Turbo Code
LDPC Code
10
J
)
Scenario 1: Implementation Efficiency
0.1 1 E n er gy E ff ic ien c y : ec o d e d bit /en er gy ( b it /n
TC LTE
TC efficiency constant for all code rates, 6.5 iterR=1/3, 40 iter R=1/2, 20 iter R=4/5, 10 iter 15 0.01 10 100 1000
Area Efficiency: (Mbit/s)/mm2
d
TC LTE
flexible LDPC
R 1/3, 40 iter
Scenario 2: Implementation Efficiency
100 4dB worse com. Perf. (1iter)
Identical Throughput 10 E n e rg y E ff ici e n cy: c o ded bi t/ e nergy (bi t/ n J
) 4dB worse com. Perf. (1iter)
5 times higher throughput Identical comm. Perf. (2iter)
Identical throughput (e.g. clock gating)
Identical Throughput (e.g. clock gating)
Identical com. Perf. (2iter) 2.5 times higher throughput
17
1
100 1000 10000 100000
Area Efficiency: (Mbit/s)/mm2
de
c
LDPC W iMedia1.5 CC W iMedia1.5
4dB better com. Perf. (5iter) Identical throughput
Summary
Understanding trade-offs between implementation efficiency,
communications performance and flexibility requirements is mandatory f ffi i b b d i
for efficient baseband receivers
Operation based metrics for energy and area efficiency can be misleading
Memory and data transfers have to be considered in metrics for design space exploration
18 Implementation efficiency metrics have to be linked to application
performance