• No results found

Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations

N/A
N/A
Protected

Academic year: 2021

Share "Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Breaking the Interleaving Bottleneck in Communication Applications

for Efficient SoC Implementations

Norbert Wehn

Norbert Wehn

[email protected]

[email protected]

-

-

kl.de

kl.de

Microelectronic System Design Research Group

Microelectronic System Design Research Group

University Kaiserslautern

University Kaiserslautern

www.eit.uni

www.eit.uni--kl.dekl.de/wehn/wehn

MPSoC’05, 11-15 July 2005

Relais de Margaux Hotel, France

Wireless Implementation Challenges

Wireless Implementation Challenges

(2)

• TurboTurbo--Codes (1993)Codes (1993)

– Revolution in channel coding e.g. UMTS, DVB, WLAN, CCSDRevolution in channel coding e.g. UMTS, DVB, WLAN, CCSD

• LDPC Codes (1996)LDPC Codes (1996)

– Renaissance of Gallagers Work (1960)Renaissance of Gallagers Work (1960)

– Competitor to turboCompetitor to turbo--codes e.g. used in DVBcodes e.g. used in DVB--S2, WiMAXS2, WiMAX

• STM Greenside Chip: 2.5G/3G Wireless Infrastructure Chip (130nm STM Greenside Chip: 2.5G/3G Wireless Infrastructure Chip (130nm technology)technology)

– Supports WCDMA, CDMA2000, EDGESupports WCDMA, CDMA2000, EDGE

– Decodes up to 256 voice channels & up to 8 384Kbps data channelsDecodes up to 256 voice channels & up to 8 384Kbps data channelssimultaneouslysimultaneously

Example: Channel Coding

Example: Channel Coding

² GreenSIDE D e b u g D M A P e r i p h 2 x Ethernet M A C 2x MSP External Memory Interface VIC U T O P I A Core Periph S y s t e m Control PLL 2 x 32-bit Timers 32-bit GPIO 1 x U A R T s C O P R O Convolutional Decoder Engine Turbo Decoder Engine Parallel C o m . Interfaces C r o s s B a r Data C a c h e ARM926 Program Cache E T M D a ta C ac h e ST140 D S P P ro g ra m C ac h e ITC T i m e r s U A R T Bus Switch D a ta C ac h e ST140 D S P P ro g ra m C ac h e ITC T i m e r s U A R T Bus Switch D M A GMM 2 MB (SRAM)

Design Space

Design Space

• MPSoC’04: TradeMPSoC’04: Trade--offs different implementation stylesoffs different implementation styles

– ASIC, FPGA, ASIP, DSP, Multiprocessor...ASIC, FPGA, ASIP, DSP, Multiprocessor...

– Architectural efficiency & design timeArchitectural efficiency & design time

Architecture Implement.Style

(3)

Generic Decoding Structure

Generic Decoding Structure

1.

1.

Subdecoder 1 processes one complete block

Subdecoder 1 processes one complete block

2.

2.

After finishing the calculation, data are sent to subdecoder 2

After finishing the calculation, data are sent to subdecoder 2

3.

3.

Subdecoder 2 starts block processing...

Subdecoder 2 starts block processing...

4.

4.

Iterate until stopping crtierion fullfilled

Iterate until stopping crtierion fullfilled

Information exchange takes place „randomly“

Information exchange takes place „randomly“

Tanner graph (Parity check matrix)/ Interleaver

Tanner graph (Parity check matrix)/ Interleaver

Quality of „randomness“ influences communcations performance

Quality of „randomness“ influences communcations performance

High Throughput/Low Latency achitectures

High Throughput/Low Latency achitectures

Interconnect centric architectures

Interconnect centric architectures

Sub-decoder 1

Sub-decoder 2

PU1 PUN PU1 PUN

Solving Interconnect Bottleneck

Solving Interconnect Bottleneck

Crossbar functionality with output blocking conflict (MPSoC’04)

Crossbar functionality with output blocking conflict (MPSoC’04)

Conflict free by code design

Conflict free by code design •

• Solves the interconnect problem at the „system level“Solves the interconnect problem at the „system level“

• Tanner Graph/Interleaver is designed according to a fixed architTanner Graph/Interleaver is designed according to a fixed architectural template with ectural template with

regular interconnect topology e.g. barrelshifter

regular interconnect topology e.g. barrelshifter

• Architecture imposes constraints on the codeArchitecture imposes constraints on the code

– Impact on communications performanceImpact on communications performance

Run

Run--time conflict resolution (MPSoC’04)time conflict resolution (MPSoC’04) •

• Largest flexibilty, no impact on communications performanceLargest flexibilty, no impact on communications performance

(4)

Algorithm Design Space

Algorithm Design Space

• TurboTurbo--Decoder UMTS compliant, 166MHz, 180nm technology, Throughput 100Decoder UMTS compliant, 166MHz, 180nm technology, Throughput 100MbitMbit

– NoC approach, NoC approach, large flexibiltylarge flexibilty

14 parallel units, area =

14 parallel units, area = 16.84 mm16.84 mm22(14mm(14mm22PUs, 2.8mmPUs, 2.8mm22NoC)NoC)

– Conflict free interleaver design, Conflict free interleaver design, limited flexibilitylimited flexibility

10 parallel units, area=

10 parallel units, area=11.23 mm11.23 mm22(10,3mm(10,3mm22PUs, 0.9mmPUs, 0.9mm22Barrelshifter)Barrelshifter)

• Implementing LDPC Decoding on NetworkImplementing LDPC Decoding on Network--OnOn--Chip, T. Theocharides, G. Link, N. Chip, T. Theocharides, G. Link, N.

Vijaykrishnan, M. J. Irwin, Int. Conference on VLSI Design 2005

Vijaykrishnan, M. J. Irwin, Int. Conference on VLSI Design 2005

– 1024 Bit block size, 1.2Gb/s (1024 Bit block size, 1.2Gb/s (??), R=0.75), R=0.75

– NoC: 5x5 2D mesh, dimensionNoC: 5x5 2D mesh, dimension--order routing, order routing, large flexibilitylarge flexibility

– 160nm CMOS Technology, 1.8V, synthesis, 500 MHz, 160nm CMOS Technology, 1.8V, synthesis, 500 MHz, 110 mm110 mm22, , ~30 Watt~30 Watt

• A. Blanksby, C. Howland, IEEE Journal on SolidA. Blanksby, C. Howland, IEEE Journal on Solid--States CircuitsStates Circuits

– 1024 Bit block size, 1 Gb/s, Rate=0.51024 Bit block size, 1 Gb/s, Rate=0.5

– full hardwired interconnect network, full hardwired interconnect network, no flexibility at allno flexibility at all

– 160nm CMOS Technology, 1.5 V, synthesis, 64 MHz, 160nm CMOS Technology, 1.5 V, synthesis, 64 MHz, 52.5mm52.5mm22, , ~700mW~700mW

22.74 22.74 Total Area [mm²] Total Area [mm²] 0.55 0.55 Shuffling Network Shuffling Network Control logic Control logic Functional Nodes Functional Nodes Area Area [mm²] [mm²] Area [mm²] Area [mm²] (0.13um, 270MHz) (0.13um, 270MHz) 255 255 Throughput[Mbit/s

Throughput[Mbit/s] @30iter] @30iter

0.2 0.2 10.8 10.8 Logic Logic 0.075 0.075 Addresses/Shuffling Addresses/Shuffling 9.117 9.117 Messages Messages 1.997 1.997 Channel LLR Channel LLR RAMs RAMs •

• Blocksize 64800 Bit, R=1/4..9/10, 0.7 dB to Shannon limitBlocksize 64800 Bit, R=1/4..9/10, 0.7 dB to Shannon limit

• Synthesis results for the DVBSynthesis results for the DVB--S2 LDPC code decoderS2 LDPC code decoder

DVB

DVB-

-S2 LDPC implementation

S2 LDPC implementation

Why is this implementation so efficient ?

Why is this implementation so efficient ?

• Code was defined with implementation complexity in mindCode was defined with implementation complexity in mind

(5)

Algorithmic Flexibility

Algorithmic Flexibility

LDPC Decoder: synthesizable IP block, 130nm, 10000

LDPC Decoder: synthesizable IP block, 130nm, 10000 bitsbits

• Full flexibility Full flexibility i.ei.e. supports every LDPC code: 27 mm. supports every LDPC code: 27 mm22

• Limited flexibility (e.g. WiMaxLimited flexibility (e.g. WiMax--WLANWLAN): 5 mm): 5 mm22

This graph is independent of implementation style

This graph is independent of implementation style

Throughput Logic Flexibility Interconnect / Memory Complexity

Analyze carefully the flexibility requirements

Analyze carefully the flexibility requirements

Application/architecture Codesign

Application/architecture Codesign

Conclusion

Conclusion

The Gap can be closed

The Gap can be closed

Application/Architecture Codesign

Application/Architecture Codesign

Match the architecture & application

Match the architecture & application

„Just enough flexibility“

„Just enough flexibility“

Application

Application

Implementation

Implementation

Thank you for listening!

Thank you for listening!

For further information please visit

For further information please visit

References

Related documents