An Ultra-low energy asynchronous processor for Wireless Sensor Networks
L. Necchi, L. Lavagno, D. Pandini, L. Vanzago
Politecnico
Wireless Sensor Networks
Application areas: Monitoring, Building automation, Health care, Medical, Emergency response
- Ad-hoc wireless networks
- Sensing
- Computation
- Actuation
Key WSN Requirements
Flexibility (general purpose design)
High energy efficiency (battery powered)
Extremely wide voltage supply range
- Exhausted battery or energy scavenging
Fast and inexpensive wake-up
- Event-driven power management (not predictable)
Sporadic high computational load
- Encryption (security)
Sensor node architecture
Main components of a WSN node:
- Microcontroller
- Memory
- Radio
- Sensors / Actuators
- Power supply
  - Battery (energy storage)
  - Power scavenging
[Figure: sensor node architecture; labels: TI MSP430, Atmel]
Circuit-level Power Management
[Table: management techniques (Adaptive Body Biasing, Dynamic Voltage Scaling, Power Gating, Clock Gating) against scenario (Long Deadlines, Idle Time) and "Save energy while" (Active, Idle); cell marks (X) not recoverable]
DVS can be obtained by:
- Off-line pre-computed voltage/frequency tables
  - High delay margins
- Evaluated on-line:
  - PowerWise
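The off-line option above can be sketched as a table lookup. This is only an illustration, not the authors' implementation: the two end points reuse the operating points reported in the conclusions of this talk, the intermediate point is hypothetical, and `select_voltage` is a name made up for this sketch.

```python
# Off-line DVS sketch: operating points are computed ahead of time
# and only looked up at run time.
# Each entry is (supply voltage in V, maximum throughput in MIPS),
# sorted by increasing voltage.
OPERATING_POINTS = [
    (0.51, 48.0),   # operating point reported in this talk
    (0.80, 100.0),  # HYPOTHETICAL intermediate point, illustration only
    (1.20, 170.0),  # operating point reported in this talk
]


def select_voltage(required_mips):
    """Return the lowest tabulated voltage whose throughput meets the demand."""
    for voltage, max_mips in OPERATING_POINTS:
        if max_mips >= required_mips:
            return voltage
    raise ValueError("required throughput exceeds the fastest operating point")
```

Because the table is fixed before deployment, each entry has to include enough delay margin to cover the worst-case process and temperature corner, which is the "high delay margins" drawback listed above.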
Closed-loop DVS technique
PowerWise:
- Samples, with a high-frequency clock, the output of a digital delay line, and adjusts the voltage supply to deliver the required performance
Razor:
- Detects timing errors by comparing the values stored in duplicated slave latches, the second of which is clocked half a clock cycle later; it then restarts the pipeline and adjusts the voltage supply accordingly
Asynchronous with Dual-Rail encoding:
- (Quasi) delay-insensitive implementation that guarantees correctness for (almost) every voltage supply and process variation
Asynchronous with Bundled Data encoding:
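The closed-loop idea shared by the on-line techniques above (measure how fast the logic currently is, then move the supply) can be sketched as a simple bang-bang regulator. This is a generic illustration, not the PowerWise or Razor algorithm: the delay model, its constants, and the names `delay_line_ns` and `regulate` are all assumptions made for this sketch.

```python
def delay_line_ns(vdd, vt=0.35, alpha=1.5, k=2.0):
    """HYPOTHETICAL replica-path delay model (alpha-power law).

    The delay grows quickly as vdd approaches the threshold voltage vt.
    All constants are placeholders, not measured values.
    """
    return k * vdd / (vdd - vt) ** alpha


def regulate(target_ns, vdd=1.2, v_min=0.45, v_max=1.2, step=0.01, iterations=500):
    """Bang-bang loop: lower vdd while there is slack, raise it otherwise."""
    for _ in range(iterations):
        if delay_line_ns(vdd) > target_ns:
            vdd = min(v_max, vdd + step)  # replica path too slow: raise supply
        else:
            vdd = max(v_min, vdd - step)  # slack available: lower supply
    return vdd


# Example: settle around the lowest supply that still meets a 10 ns target.
settled_vdd = regulate(10.0)
```

Under this model the loop ends up oscillating within one voltage step of the lowest supply that meets the target, so no pre-computed worst-case margin is needed.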
De-synchronization
[Figure: labels: Synchronous, CLK, Asynchronous, Desynchronize]
Design Flow
Obtain an asynchronous implementation from a synchronous specification:
- Think synchronously
- Design synchronously
- De-synchronize (automatically)
- Test synchronously
- Run asynchronously
[Figure: design flow; labels: HDL, RTL Synthesis & Optimization, Netlist, De-synchronization, Netlist, Physical Design, Layout, Library]
Synchronous circuit
[Figure: labels: MS flip-flop, CLK, latches L (0, 1)]
De-synchronization
[Figure: labels: latches L (0, 1), controllers C]
De-synchronization
Distributed micropipeline-style controllers replace the clock network
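Micropipeline-style controllers are classically built around the Muller C-element, whose output changes only when both inputs agree. The behavioral model below is a minimal sketch of that element alone, assuming the controllers here follow that classic style; the class name `CElement` is made up for this illustration, and the talk does not give the controller circuit.

```python
class CElement:
    """Behavioral model of a Muller C-element.

    The output follows the inputs when they agree and holds its
    previous value when they differ.
    """

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


# One handshake cycle as seen by a stage controller:
# 'a' is the incoming request, 'b' is the inverted acknowledge of the next stage.
c = CElement()
history = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
```

The hold behavior is what lets a local handshake stand in for a global clock edge: a latch is opened only when its predecessor has data and its successor is ready.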
Flow equivalence [Guernic, Talpin, Lann, 2003]
Theorem:
Synchronous behavior (CLK):
  A: 1 3 0 2 1 5 3 1 6 0
  B: 5 1 2 3 1 4 2 4 3 1
De-synchronized behavior:
  A: 1 3 0 2 1 5 3 1 6 0
  B: 5 1 2 3 1 4 2 4 3 1
Flow equivalence
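Flow equivalence compares, latch by latch, the sequence of stored values and ignores when each value is stored. A minimal sketch of that check, using the A and B sequences from the slide; the timestamps of both traces are hypothetical, and the function names are made up for this illustration.

```python
A_VALUES = [1, 3, 0, 2, 1, 5, 3, 1, 6, 0]
B_VALUES = [5, 1, 2, 3, 1, 4, 2, 4, 3, 1]


def make_trace(a_times, b_times):
    """Build a time-ordered list of (time, latch, value) store events."""
    events = [(t, "A", v) for t, v in zip(a_times, A_VALUES)]
    events += [(t, "B", v) for t, v in zip(b_times, B_VALUES)]
    return sorted(events)


def value_sequence(trace, latch):
    """Project a trace onto one latch, discarding the timestamps."""
    return [value for (_time, name, value) in trace if name == latch]


def flow_equivalent(trace_1, trace_2, latches=("A", "B")):
    """True if every latch stores the same value sequence in both traces."""
    return all(
        value_sequence(trace_1, latch) == value_sequence(trace_2, latch)
        for latch in latches
    )


# Synchronous: both latches store on the same (hypothetical) clock ticks.
sync_trace = make_trace(range(10), range(10))
# De-synchronized: each latch stores at its own (hypothetical) local times.
desync_trace = make_trace(
    [1.3 * i for i in range(10)],
    [1.3 * i + 0.4 + 0.1 * (i % 3) for i in range(10)],
)
```

Here `flow_equivalent(sync_trace, desync_trace)` holds even though the two traces interleave their events differently, which is the property the slide illustrates.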
De-synchronization Benefits
For the end user:
- Reduced electromagnetic emission
- Process variation tolerance
  - Enables partial average-case design, with respect to process and environment variation (not with respect to data-dependent delay)
- The resulting circuit will be:
  - Ready for frequency and voltage scaling
  - Inherently more robust to delay variations
  - Virtually no performance or area overhead with respect to synchronous
For the designer:
- Conventional EDA tools and design flow
- Limited design time and effort, fully automated
Asynchronous advantages not offered by de-synchronization
Fine-grained power management
- The desynchronized circuit inherits the synchronous clock gating
Fine-grained pipelining
- The pipeline structure is not changed
Data-dependent delays
- Could be exploited by using a datapath with completion detection (work in progress)
Robustness with respect to uncorrelated local variability
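Completion detection, mentioned above for data-dependent delays, is commonly paired with the dual-rail encoding listed earlier among the asynchronous options: each bit travels on two wires, so the receiver can tell when a word has fully arrived. The sketch below assumes a (true rail, false rail) convention and least-significant-bit-first ordering; both choices and all names are illustrative, not taken from the talk.

```python
SPACER = (0, 0)  # neither rail high: no data yet


def encode_bit(bit):
    """Return (true_rail, false_rail) for a valid data bit."""
    return (1, 0) if bit else (0, 1)


def encode_word(value, width):
    """Encode an unsigned integer, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def is_complete(word):
    """Completion detection: every bit has exactly one rail high."""
    return all(t + f == 1 for t, f in word)


def decode_word(word):
    """Decode a fully arrived word back to an unsigned integer."""
    if not is_complete(word):
        raise ValueError("word is still settling")
    return sum(t << i for i, (t, _f) in enumerate(word))


word = encode_word(5, 4)
still_settling = [SPACER] + word[1:]  # bit 0 has not arrived yet
```

A datapath built this way signals completion as soon as the actual computation finishes, rather than after a fixed worst-case delay.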
Synchronous Logic Interfacing
[Figure: stages of C, L0, L1 connected to FAST LOGIC; legend: Data path (not modified), Handshaking line]
Synchronous Logic Interfacing
[Figure: stages of C, L0, L1 with CL blocks, connected to SLOW LOGIC]
- Synchronized with an external slower clock
  - Just low EMI
Synchronous Logic Interfacing
[Figure: stages of C, L0, L1 with CL blocks, connected to SELF TIMED LOGIC]
Our Case Study
Application-independent 8-bit CPU architecture:
- Atmel AVR instruction set (like MICA2-MICAZ) from OpenCores.org, implemented with a 130nm technology
- Toolchain and lots of software are ready to use
  - nesC, TinyOS, TinyDB, Surge, Tossim
Aggressive energy management enabled by de-synchronization, using:
- Dynamic Voltage Scaling
- zero wake
Typical AVR architecture
[Figure: labels: Instruction FETCH, INSTR. Memory, Instruction DECODE, Execution, ALU, MEM Access, DATA Memory, L0, L1, External CLK; legend: Data Path (8 bit), Address bus, Clk distribution]
Design Choices
Main target is energy efficiency (vs speed)
- Large delay margins (100%) to increase robustness at low voltage supply
AVR core is really small (~4500 gates), hence we used a single controller
- Reduced area overhead
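One way to read the 100% delay margin above is that the matched delay chain is sized to twice the critical-path delay of the logic it tracks. The sketch below works under that assumption; the critical-path and delay-cell figures in the example are hypothetical, and `delay_chain_length` is a name made up for this illustration.

```python
import math


def delay_chain_length(critical_path_ns, cell_delay_ns, margin=1.0):
    """Number of delay cells needed to cover the logic delay plus a margin.

    margin=1.0 corresponds to a 100% margin, i.e. a chain twice as slow
    as the critical path.
    """
    target_ns = critical_path_ns * (1.0 + margin)
    return math.ceil(target_ns / cell_delay_ns)


# HYPOTHETICAL numbers: 5 ns critical path, 0.25 ns per delay cell.
cells_with_margin = delay_chain_length(5.0, 0.25)
cells_without_margin = delay_chain_length(5.0, 0.25, margin=0.0)
```

Since the chain and the logic sit on the same die and supply, their delays scale together as the voltage drops, and the margin absorbs the residual mismatch.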
De-synchronized AVR
[Figure: labels: Instruction FETCH, INSTR. Memory, Instruction DECODE, Execution, ALU, MEM Access, DATA Memory, L0, L1; legend: Data Path, Address bus, Handshake signal distribution, Delay chain]
Logic and Delay Line Matching
[Figure: plot; labels: Logic Delay, Leakage per instruction]
Energy Efficiency
[Figure: plot; labels: Voltage Supply [V], Power Consumption, Energy per Instruction]
Some Past Work Comparison
Philips 80c51 (H. van Gageldonk, 1998)
- Asynchronous bundled-data implementation of the 8051 ISA, general purpose.
Lutonium (A. Martin et al., 2003)
- Asynchronous QDI implementation of the 8051 ISA.
Snap/le (V. Ekanayake et al., 2004)
- Asynchronous QDI processor specifically designed for WSN.
Razor (D. Ernst et al., 2004)
- Synchronous processor that estimates the best Vdd by dynamically monitoring the delay of the logic using a redundant latching scheme.
CONCLUSIONS
Aggressive energy management using DVS
- 14 pJ/Instr @ 1.2 V (170 MIPS)
- 2.7 pJ/Instr @ 0.51 V (48 MIPS)
Minimal overhead with respect to the synchronous counterpart
- +6% area (due to FF->latch conversion)
- -20% speed (could be improved by reducing margins)
Future work:
- Analysis with other "SPICE-like" simulators (Hsim)
- Statistical simulations to check robustness with respect to process variability (Monte Carlo)
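The two operating points in the conclusions can be turned into average power with simple arithmetic, since 1 pJ/instruction at 1 MIPS equals 1 µW. The derived values below are not stated in the talk; they assume the core runs continuously at the quoted throughput, and the function name is made up for this illustration.

```python
def average_power_uw(energy_pj_per_instr, mips):
    """Average power in microwatts: pJ/instr * MIPS = 1e-12 J * 1e6 1/s = 1e-6 W."""
    return energy_pj_per_instr * mips


high = average_power_uw(14.0, 170.0)  # at 1.2 V: about 2380 uW (2.38 mW)
low = average_power_uw(2.7, 48.0)     # at 0.51 V: about 129.6 uW

energy_gain = 14.0 / 2.7               # about 5.2x less energy per instruction
speed_loss = 170.0 / 48.0              # about 3.5x lower throughput
ideal_cv2_gain = (1.2 / 0.51) ** 2     # about 5.5x if energy scaled purely as Vdd^2
```

The energy saving per instruction is close to what a pure Vdd-squared scaling of dynamic energy would predict, while throughput drops by a smaller factor than energy.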