DynaCORE
Dynamically Reconfigurable Coprocessor for Network Processors
Carsten Albrecht, Roman Koch, Christoph Osterloh, Thilo Pionteck, Erik Maehle
Institut für Technische Informatik
Universität zu Lübeck
Head: Prof. Dr.
Overview
Introduction
System Architecture
  Key Components
  Internal Interconnect
Runtime-Adaptive Network-on-Chip
  Architecture
  Buffer Sizes
Fault Tolerance
  Fault Scenarios
  Stepwise Procedure
Modelling DynaCORE
  Principles
  DynaCORE Model
  Simulation
Runtime Reconfiguration
  Point of Reconfiguration
  Technical Aspects
Evaluation and Demonstrator
Publications
Introduction (1/2)
Situation
  In-transit packet processing in edge routers
Processing tasks
  Header processing
  Payload processing
Introduction (2/2)
DynaCORE = Dynamically adaptable COprocessor based on Reconfiguration
Reconfigurable hardware accelerator for payload processing
Allows flexible adaptation to changes in network traffic profile
→ Dynamic partial reconfiguration of FPGA
Combination of
  Network processor (e.g. FlexPath NP) → header processing
  + DynaCORE (in Xilinx Virtex-4 FX) → payload processing
Loose coupling
  Gigabit Ethernet
  Suitable for various network processors
System Architecture (1/3)
[Figure: system architecture block diagram. Labels: Receive-Unit, Dispatcher, Transmit-Unit, Reconfiguration Manager (HW + SW), application-specific Hardware Assists 1 to 4, interfaces (Type H, Type S, Type 0); static partition and dynamic partition.]
System Architecture (2/3)
Components in the Static Partition
Transmit Unit
  Send processed packets back to NP
Receive Unit/Dispatcher
  Recognise requested type of processing
  Assign packets to suitable hardware assists
  Report to reconfiguration manager in case of unassignable packets
Reconfiguration Manager
  Implemented as software running on embedded PowerPC
  Collect utilisation information from hardware assists, decide when and how to reconfigure
  Control actual process of reconfiguration, i.e. send configuration data to reconfiguration logic
Reconfiguration Control Logic
  Write configuration data to FPGA-internal configuration access port (ICAP)
Software-based Hardware Assist
  Backup processing unit
  Utilises additional hard-wired PowerPC cores (UltraController II)
[Figure: I/O interface]
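The dispatcher behaviour listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the packet representation as a dictionary with a `type` key, and the choice of the least-loaded matching hardware assist are all hypothetical and not taken from the DynaCORE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareAssist:
    """Hypothetical stand-in for one hardware assist (HA)."""
    name: str
    processing_type: str
    queue: list = field(default_factory=list)


class Dispatcher:
    """Assigns packets to HAs; reports packets no HA can process."""

    def __init__(self, assists, report_unassignable):
        self.assists = assists
        # Callback standing in for the report to the reconfiguration manager.
        self.report_unassignable = report_unassignable

    def dispatch(self, packet):
        # Recognise the requested type of processing.
        candidates = [ha for ha in self.assists
                      if ha.processing_type == packet["type"]]
        if not candidates:
            self.report_unassignable(packet)
            return None
        # Assumption: pick the matching HA with the shortest queue.
        target = min(candidates, key=lambda ha: len(ha.queue))
        target.queue.append(packet)
        return target.name
```

A packet whose type matches no configured HA ends up in the report callback, which is the information the reconfiguration manager would need to decide on loading a new module.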
System Architecture (3/3)
Hardware Assists
Actual payload processing modules
Equipped with universal, algorithm-independent interface
Embedded off-the-shelf IP cores
Switches
Forward packets from static partition to HAs and back
Runtime-Adaptive Network-on-Chip (1/2)
NoC architecture for runtime reconfigurable FPGAs
Virtual cut-through switches with four equal full-duplex links (16 bit)
Low hardware overhead compared to other NoCs
Switches not needed for a certain setting of processing units can be removed from the network → low latency
Support for QoS
Physical and logical addresses
• Physical addresses: refer to specific switches at specific locations within the NoC topology
• Logical addresses: refer to processing entities inside hardware modules
CoNoChi = Configurable Network on Chip
[Figure: hardware assist attached through an interface, annotated with its logical address (log add) and physical address (phy add).]
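The split between physical and logical addresses can be pictured as a lookup table. The sketch below is hypothetical: the class name, the use of `(x, y)` tuples for switch locations and the string names for processing entities are assumptions made for illustration, not CoNoChi's actual address format.

```python
class AddressTable:
    """Maps logical addresses (processing entities) to physical
    addresses (switch locations in the NoC topology)."""

    def __init__(self):
        self._logical_to_physical = {}

    def bind(self, logical, physical):
        # Called when a module is placed or moved to another switch.
        self._logical_to_physical[logical] = physical

    def resolve(self, logical):
        # Raises KeyError if the entity is not configured at the moment.
        return self._logical_to_physical[logical]


table = AddressTable()
table.bind("HA-crypto", (1, 0))   # entity reachable via switch at (1, 0)
table.bind("HA-crypto", (2, 1))   # after reconfiguration: new location
```

Senders keep using the logical address; only the table entry changes when a module is moved to a different switch.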
Runtime-Adaptive Network-on-Chip (2/2)
[Figure: tiles with interfaces and hardware assists HA 5 and HA 6.]
Topology Adaptation
  Network topology can be adapted at runtime
  Coarse-grained tiles
  Merging/separation of neighbouring tiles
  → Provides space for modules of varying complexity
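Merging of neighbouring tiles can be illustrated with a small grid model. Everything here is an assumption for illustration: tiles are identified by `(x, y)` grid coordinates, a region is a set of tiles, and two regions count as neighbours when any pair of their tiles shares an edge.

```python
def are_neighbours(region_a, region_b):
    """True if any tile of region_a shares an edge with a tile of region_b."""
    return any(abs(ax - bx) + abs(ay - by) == 1
               for (ax, ay) in region_a
               for (bx, by) in region_b)


def merge_tiles(regions, a, b):
    """Return a new region map in which regions a and b form one region."""
    if not are_neighbours(regions[a], regions[b]):
        raise ValueError(f"{a} and {b} are not neighbouring")
    merged = {name: tiles for name, tiles in regions.items()
              if name not in (a, b)}
    merged[f"{a}+{b}"] = regions[a] | regions[b]
    return merged


regions = {
    "t0": frozenset({(0, 0)}),
    "t1": frozenset({(1, 0)}),
    "t2": frozenset({(3, 0)}),
}
larger = merge_tiles(regions, "t0", "t1")  # space for a more complex module
```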
Fault Tolerance (1/3)
Fault scenarios:
  User data
  • Non-permanent fault
  • Huge hardware effort to detect and correct
  • Tolerated by application area
  Processing units and infrastructure
  • Device degradation: fault in hardware structure
  • Single-Event Functional Interrupts (SEFIs): bitflip in configuration data
Approach: Combination of
  Configuration readback
  • Slow (33 ms for one tile)
  • Does not detect hardware faults
  Test packets
  • Do not cover all faults
  Alive messages
  • Missing alive message indicates problem
Permanent faults → need to be corrected
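The alive-message idea ("missing alive message indicates problem") amounts to a timeout check per monitored component. The sketch below is a minimal illustration; the class name, the explicit `now` timestamps and the timeout value are hypothetical.

```python
class AliveMonitor:
    """Flags components whose last alive message is older than a timeout."""

    def __init__(self, components, timeout):
        self.timeout = timeout
        # Assumption: every component counts as seen at time 0.
        self.last_seen = {c: 0.0 for c in components}

    def alive(self, component, now):
        # Record an alive message received at time `now`.
        self.last_seen[component] = now

    def suspects(self, now):
        # Components that have been silent for longer than the timeout.
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = AliveMonitor(["switch1", "switch2"], timeout=1.0)
monitor.alive("switch2", now=1.5)
silent = monitor.suspects(now=2.0)
```

A silent component is only a suspect at this point; test packets and configuration readback are still needed to narrow down the cause.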
Fault Tolerance (2/3)
Fault detection
  Alive messages
  Test packets
  Periodic configuration readback
Fault localization and correction
  Stepwise procedure using test packets
  Test against different assumptions
  • SEU in control registers → tile reset
  • SEFI → reconfiguration (configuration data rewritten)
  • Permanent hardware fault → reorganization
Fault Tolerance (3/3)
Example: no alive message from switch 1
1. Identification of faulty segment
Identify path under test
Known by the reconfiguration manager
Send test packets to all switches along the path under test
If a test packet does not return correctly, the faulty segment has been identified
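Step 1 can be read as a linear probe along the path under test. In the sketch below, `send_test_packet` is a hypothetical callback that returns `True` when the test packet addressed to a switch comes back correctly; the switch names are made up.

```python
def find_faulty_segment(path, send_test_packet):
    """Probe each switch along `path` in order.

    Returns (last_good_switch, first_failing_switch) for the first switch
    whose test packet does not return correctly, or None if all succeed.
    last_good_switch is None when the very first switch already fails.
    """
    previous = None
    for switch in path:
        if not send_test_packet(switch):
            return (previous, switch)
        previous = switch
    return None


# Simulated network in which everything behind s1 is unreachable.
unreachable = {"s2", "s3"}
segment = find_faulty_segment(
    ["s0", "s1", "s2", "s3"],
    lambda switch: switch not in unreachable,
)
```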
2. Assumption: SEU in control registers of switches or routing tables
Reset switches in affected section
Send new routing tables
Repeat test
3. Assumption: SEFI
Readback configuration data for each tile and compare with reference
In case of mismatch, reconfigure tile
If tile contains a switch, send new routing tables
Repeat test
If the fault persists: permanent hardware error → reorganize system
Procedure takes time, does not cover all fault scenarios, yet is hardware efficient
Modelling DynaCORE (1/4)
Dynamically Structured Discrete Event-Based System Network (DSDEVN)
Extends the Discrete Event System Specification (DEVS) formalism
States of controller χ can again be models
"Simple" DEVS simulator sufficient for simulation of DSDEVN
DynaCORE Model:
$\mathrm{DSDEVN}_\Delta = \langle X_\Delta, Y_\Delta, \chi, M_\chi \rangle$
$\Delta$ identifies DynaCORE
$X_\Delta$, valid inputs of the system, and $Y_\Delta$, outputs of the system: messages received from and sent to the NP
$\chi$: DynaCORE-specific controller
Modelling DynaCORE (2/4)
Controller Description as DEVS:
$M_\chi = \langle X_\chi, S_\chi, Y_\chi, \delta^{\mathrm{int}}_\chi, \delta^{\mathrm{ext}}_\chi, \lambda_\chi, \tau_\chi \rangle$
$X_\chi$: Set of valid controller input
$S_\chi$: Controller state space
$Y_\chi$: Set of valid controller output
$\delta^{\mathrm{int}}_\chi$: State transition function for internal events, including "timeouts"
$\delta^{\mathrm{ext}}_\chi$: State transition function for external events
$\lambda_\chi$: Output function
$\tau_\chi$: Timeout function (assigns a timeout value to states from $S_\chi$)
Controller States
Include information on system configuration, i.e. configured HAs
Contain, in turn, models of system components active in respective state
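To make the seven-tuple concrete, here is a toy controller written as an atomic DEVS model together with a very small simulator loop. It is a sketch under stated assumptions: the two phases, the input event format `("reconfigure", config)`, the output format and the reconfiguration time of 2.0 time units are invented for illustration and do not come from the DynaCORE model.

```python
import math


class Controller:
    """Toy atomic DEVS model; state = (phase, configured HAs)."""

    RECONFIGURATION_TIME = 2.0  # hypothetical duration

    def __init__(self):
        self.state = ("idle", ("A", "B", "C"))

    def timeout(self, state):                 # timeout function (tau)
        phase, _ = state
        return self.RECONFIGURATION_TIME if phase == "reconfiguring" else math.inf

    def delta_ext(self, state, elapsed, x):   # external transition
        phase, _ = state
        if phase == "idle" and x[0] == "reconfigure":
            return ("reconfiguring", x[1])
        return state                          # other inputs are ignored

    def delta_int(self, state):               # internal transition (timeout)
        return ("idle", state[1])

    def output(self, state):                  # output function (lambda)
        return ("configured", state[1])


def simulate(model, events):
    """Run `model` on time-sorted (time, input) events; return outputs.

    Simplification: an ignored input restarts the timeout of the current
    state, because elapsed time is not stored in the state.
    """
    outputs, state, t_last = [], model.state, 0.0
    pending = list(events)
    while True:
        t_int = t_last + model.timeout(state)
        t_ext = pending[0][0] if pending else math.inf
        if math.isinf(t_int) and math.isinf(t_ext):
            break
        if t_int <= t_ext:
            # Output is produced just before the internal transition.
            outputs.append((t_int, model.output(state)))
            state, t_last = model.delta_int(state), t_int
        else:
            t, x = pending.pop(0)
            state, t_last = model.delta_ext(state, t - t_last, x), t
    model.state = state
    return outputs


controller = Controller()
trace = simulate(controller, [(1.0, ("reconfigure", ("A", "A", "B")))])
```

The state carries the list of configured HAs, which mirrors the statement above that controller states include information on the system configuration.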
[Figure: plot of input data rate, output data rate and reconfiguration; axis: Bandwidth [Mbit/s].]
Modelling DynaCORE (3/4)
Structure of SystemC Simulation Model
Simulation Stimulus and Output
Input burst
Modelling DynaCORE (4/4)
Influence of Buffer Sizes
[Figure: latency [ms] over switch buffer size [#Pkt] and NoC-interface buffer size [#Pkt].]
[Figure: data rate and packet loss (ratio) and latency (time [ms]) over buffer size [#packets].]
Low impact of buffer sizes between NoC and HA
Large switch buffers:
• Only little advantage for latency
• Increased packet loss in case of reconfiguration
Runtime Reconfiguration (1/3)
Configuration State Space
Three modules
Three types of HA
Possible transitions between configurations
Transition costs (number of HAs to be reconfigured)
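For three modules and three HA types, the state space and the transition costs can be enumerated directly. The sketch assumes that a configuration is an unordered multiset of HA types (the position of an HA among the modules does not matter) and that the cost is the number of modules whose HA type has to change; both are modelling assumptions, as are the function names.

```python
from collections import Counter
from itertools import combinations_with_replacement


def configurations(ha_types, modules):
    """All multisets of `modules` HAs drawn from `ha_types`."""
    return list(combinations_with_replacement(sorted(ha_types), modules))


def transition_cost(src, dst):
    """Number of HAs that must be reconfigured to get from src to dst."""
    unchanged = sum((Counter(src) & Counter(dst)).values())
    return len(src) - unchanged


def neighbours(config, all_configs, max_cost):
    """Configurations reachable from `config` within a cost limit."""
    return [c for c in all_configs
            if c != config and transition_cost(config, c) <= max_cost]


space = configurations("ABC", 3)
cheap = neighbours(("A", "B", "C"), space, max_cost=1)
```

Limiting the admissible cost, as in `neighbours`, is one way to obtain a reduced state space with fewer transitions.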
[Figure: configuration state graph with nodes {A B C}, {A B B}, {A C C}, {B B B}, {A A A}, {A A B}, {A A C}, {C C C}; edges labelled with transition costs 1 to 3.]
Runtime Reconfiguration (2/3)
Reduced Configuration State Space
Transition cost limited
[Figure: reduced configuration state graph with nodes A B C, A B B, A C C, C B C, B B B, B B C, A A A, A B A, A A C, C C C; transitions numbered 1 to 30.]
Reconfiguration Trigger
Configurable per-HA utilisation threshold exceeded multiple times in sequence
[Figure: monitoring data over time compared with threshold T, annotated with controller states Sχu and Sχv.]
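The trigger condition is a counter of consecutive threshold violations. A minimal sketch follows; the class name, the parameter names and the example values (threshold 0.8, three hits) are hypothetical.

```python
class ReconfigurationTrigger:
    """Fires when utilisation exceeds a threshold several times in a row."""

    def __init__(self, threshold, required_hits):
        self.threshold = threshold
        self.required_hits = required_hits
        self.hits = 0

    def observe(self, utilisation):
        """Feed one monitoring sample; return True if the trigger fires."""
        if utilisation > self.threshold:
            self.hits += 1
        else:
            self.hits = 0          # the sequence is broken, start over
        if self.hits >= self.required_hits:
            self.hits = 0          # re-arm after firing
            return True
        return False


trigger = ReconfigurationTrigger(threshold=0.8, required_hits=3)
fired = [trigger.observe(u) for u in (0.9, 0.9, 0.5, 0.9, 0.9, 0.9)]
```

Requiring several violations in sequence keeps a single short burst from causing a reconfiguration.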
Runtime Reconfiguration (3/3)
Merging and Separating Tiles
Changes number and shapes of partially reconfigurable regions
Different sets of bus macros
Technical Aspects
[Figure: Scenario 1 and Scenario 2.]
Static elements in original design as part of hard macro
Bus macros
Reconfiguration Speed
Evaluation/Demonstrator (1/2)
Evaluation/Demonstrator (2/2)
FlexPath NP
  NP with reconfigurable data-path
  Virtex-4 FX 60
DynaCORE
  reconfigurable processing modules (HAs)
  Virtex-4 FX 60
analysis, visualisation
Publications
[PKA09] Pionteck, T.; Koch, R.; Albrecht, C.; Maehle, E.: A Design Technique for Adapting Number and Boundaries of Reconfigurable Modules at Runtime. International Journal of Reconfigurable Computing, vol. 2009, Article ID 942930, Hindawi Publishing Corporation, New York 2009
[PAK08a] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Adaptive Communication Architectures for Runtime Reconfigurable System-on-Chips. Parallel Processing Letters, 2008
[AFK09] Albrecht, C.; Foag, J.; Koch, R.; Maehle, E.; Pionteck, T.: DynaCORE – Dynamically Reconfigurable Coprocessor for Network Processors. To Appear: Dynamically Reconfigurable Systems Architectures: Design Methods and Applications, Springer, 2009
[AKP09] Albrecht, C.; Koch, R.; Pionteck, T.; Glösekötter, P.: Towards a Flexible Fault-Tolerant System-on-Chip. 22nd International Conference on Architecture of Computing Systems - Workshop Proceedings, 83-90, VDE Verlag GmbH, Berlin 2009
[KAP09] Koch, R.; Albrecht, C.; Pionteck, T.: Adaptive Health Monitoring in a Reconfigurable Network-on-Chip. Workshop on Diagnostic Services in Network-on-Chips (DSNOC), Nice 2009
[AOP08] Albrecht, C.; Osterloh, Ch.; Pionteck, T.; Koch, R.; Maehle, E.: An Application-Oriented Synthetic Network Traffic Generator. European Conference on Modelling and Simulation 2008, 299-305, ECMS, Nicosia, Cyprus 2008
[ARK08] Albrecht, C.; Roß, P.; Koch, R.; Pionteck, T.; Maehle, E.: Performance Analysis of Bus-Based Interconnects for a Run-Time Reconfigurable Co-Processor Platform. PDP 08, 200-205, IEEE Computer Society, Toulouse, France 2008
[AWP08] Albrecht, C.; Werner, M.; Pionteck, T.; Fuchsen, R.; Koch, R.; Maehle, E.: WCET Determination Tool for Embedded Systems Software. SIMUTools08 Proceedings, 1, ICST, Marseille, France 2008
[PAK08] Pionteck, T.; Albrecht, C.; Koch, R.; Brix, T.; Maehle, E.: Design and Simulation of Runtime Reconfigurable Systems. IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS 2008), 2008
[PAK08b] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Performance and Reliability Monitoring in Network-on-Chips. To Appear: Workshop on Diagnostic Services in Network-on-Chips (DSNOC), 2008
[PAK08c] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: On the Design Parameters of Runtime Reconfigurable Systems. Accepted for: International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany 2008
[AKP07] Albrecht, C.; Koch, R.; Pionteck, T.; Maehle, E.: Simulation System for Run-Time Reconfigurable Networks-on-Chip. Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, ARGESIM - ARGE Simulation News, Wiedner Hauptstrasse 8-10, 1040 Vienna 2007
[APK07] Albrecht, C.; Pionteck, T.; Koch, R.; Maehle, E.: Modelling Tile-Based Run-Time Reconfigurable Systems Using SystemC. European Conference on Modelling and Simulation 2007, Prague, Czech Republic 2007