www.cea.fr
Grenoble Workshop on Autonomic Computing and Control
May 27, 2014
Suzanne LESECQ, CEA, LETI, DACLE/LIALP
D. Puschini, E. Beigné, W. Lombardi, A. Molnos, J. Mottin, V. Olive,
L. Vincent (post-doc fellow with Persyval-lab), Y. Akgul, M. Altieri, N.-M. Nguyen, M. Becher, T. Ducroux (with STM)
Application of control theory to Many-Processor System-on-Chip (MPSoC) computing platforms
S. Lesecq STAARS workshop | May 27, 2014
© CEA. All rights reserved |2
Context
Computing platforms, embedded systems
Mono-processor, DSP, smart sensor nodes…
Many-Processor System-on-Chip (MPSoC)
Dedicated computing platforms, e.g. H.264 hardware encoder
Main challenge for embedded (mobile) platforms
Reduce power consumption P under performance constraints (Ftarget)
Processing Element (PE)
Supply voltage Vdd, body bias Vbb, clock frequency F
Power consumption, temperature increase
Task to be finished before its deadline DL
[Figure: task execution over time t; running at Ftarget finishes the task at the deadline DL]
[Figure: a global control supervises several power domains; each power domain contains a Processing Element, a Vdd actuator, an F actuator and a local DVFS control for fast local adjustment]
Adaptive architecture to mitigate local but also dynamic PVT variations: need to follow the evolution of T° and V
Information extraction (data fusion)
Main objective …
Objective: reach the most energy-efficient and safe operating point
… With constraints …
Advanced technologies
Power consumption highly depends on temperature: risk of thermal runaway!
Control strategies must have low complexity
… Under Variability …
The dynamics of Process, Voltage and Temperature variations are very different
[Figures: examples of process, voltage and temperature variations, taken from the sources below]
Keng L. Wong et al., “Enhancing Microprocessor Immunity to Power Supply Noise With Clock-Data Compensation”, IEEE Journal of Solid-State Circuits, vol. 41, no. 4, April 2006.
LIRMM
J. Altet et al., “Thermal coupling in integrated circuits: application to thermal testing”, IEEE Journal of Solid-State Circuits, vol. 36, pp. 81–91, 2001.
P. Li et al., “Efficient full-chip thermal modeling and analysis”, IEEE/ACM International Conference on Computer Aided Design (ICCAD-2004), pp. 319–326, 2004.
Rui Zheng et al., “Circuit Aging Prediction for Low-Power Operation”, CICC, 2009.
J. Cain et al., “Electrical linewidth metrology for systematic CD variation characterization and causal analysis”, Metrology, Inspection, and Process Control for Microlithography XVII, Proceedings of SPIE, vol. 5038, pp. 350–361, 2003.
Adaptive architecture: power management
Power consumption
P = Pstat(Vdd, Vbb, T, techno) + Pdyn(Fclk, Vdd², Vbb, T, activity)
Timing faults to be avoided
Temperature measurement/estimation
[Figure: (Vdd, F) plane at temperature T1, between Vmin and Vmax; above the limit curve lies the non-functional zone where timing faults occur]
From previous talks (Alberto, Erik, Ada)
“Modulating control”
Functional system (without our control/observation tools)
Improve power efficiency
Improve performance
[Figure: system block whose inputs and outputs each take values in a finite set]
Back to adaptive architecture: main issues (and FDSOI technology)
[Figure: a global control (scheduler, OS) supervises power domains; each contains a core, V, F and Vbb actuators, a local control for fast local adjustment, and MultiProbe sensors (TSM) whose measurements feed an information extraction (data fusion) block that provides the estimated {V, T}]
[Figure: MultiProbe sensor: ring oscillators #1 to #7 (n stages each), address decoder, 8-to-1 multiplexer, 28-bit counter with overflow bit, 3-bit configuration, scan in/out, start/stop, test]
[Figure: within the deadline DL, power P1 is applied during α·DL and power P2 during (1-α)·DL]
Closed-loop systems
Outline
Platform level
(Process &) Voltage & Temperature estimation
Sensor
Local estimation of V and T
Validation on a hardware platform
Choice of set point (F, Vdd (, Vbb))
Control of set-points
And particular implementations …
[Figure: VT estimation block fed by models (memory) and by the MultiProbe sensor (7 ROs)]
Variability sensors
Monitor local variations of V and T
Integrated (on-chip) sensors
Dedicated sensors
Precise, absolute value, but limited V, T operating range
Analog → large size + ADC
General purpose sensors
Ring oscillator: FRO = f(P, V, T)
⇒ Proposed sensor: MultiProbe, a set of co-located ROs
⇒ Standard cells: easy design
⇒ Small: easily integrated and replicated on chip
⇒ V and T are not directly read
Counter + ROs: 31 µm × 14.4 µm ≈ 450 µm² in 32 nm CMOS
[Figure: MultiProbe architecture: ring oscillators #1 to #7 (n stages each), address decoder, 8-to-1 multiplexer, 28-bit counter with overflow bit, 3-bit configuration, scan in/out, start/stop, test]
VT estimation
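The read-out path of the slide (one RO selected among seven, a 28-bit counter and an overflow bit) could be turned into a frequency value roughly as sketched below. This is only an illustration: the slides do not say how the counting window is defined, so the reference clock, the window length and the function names are assumptions.

```python
COUNTER_BITS = 28  # counter width stated on the slide


def ro_frequency_hz(count, overflow, window_cycles, f_ref_hz):
    """Estimate one RO frequency from a counter read-out.

    Assumption (not from the slides): the counter counts RO periods during
    a window of `window_cycles` cycles of a reference clock at `f_ref_hz`.
    Returns None when the overflow bit is set, since the count is invalid.
    """
    if overflow:
        return None
    if not 0 <= count < 2 ** COUNTER_BITS:
        raise ValueError("count does not fit in the counter")
    return count * f_ref_hz / window_cycles


def max_window_cycles(f_ro_max_hz, f_ref_hz):
    """Longest window (in reference cycles) that cannot overflow the counter."""
    return int((2 ** COUNTER_BITS - 1) * f_ref_hz / f_ro_max_hz)
```

Repeating the read-out for each of the seven ROs would give the seven frequencies used by the VT estimation.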
Multiprobe sensor
7 ring oscillators with different architectures
⇒ Exploit their different behaviours in order to estimate the voltage and temperature values
[Figure: MultiProbe architecture (ring oscillators #1 to #7, address decoder, 8-to-1 multiplexer, 28-bit counter with overflow bit, 3-bit configuration) feeding the VT estimation block together with the MultiProbe models; label: LowTherm]
VT estimation: Principle
Power domain
Observers? Model…
Comparison between models and measurements
[Figure: MultiProbe sensors (7 ROs) placed near the Processing Element feed an information extraction (data fusion) block; a goodness-of-fit test against the models stored in memory yields the estimated {V̂, T̂}]
Based on hypothesis testing
Measurement: 7 frequencies from the MultiProbe (sensor: 7 ROs), followed by pre-treatment and construction of the CDF
Models storage (memory): one model M{Vi, Tj} per operating point, from M{V1, T1} to M{Vp, Tq}
Goodness-of-fit test (Kolmogorov-Smirnov test) between the measurement and each model read from memory, giving a p-value
VT estimation: estimated {V̂, T̂}
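As a rough illustration of the goodness-of-fit step, the sketch below scores each stored model against a measurement with a two-sample Kolmogorov-Smirnov test. It is not the authors' implementation: treating each model as a stored set of frequencies, the asymptotic p-value formula (crude for only 7 samples) and all names are assumptions made for the example.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the gap can only change at sample points
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic Kolmogorov p-value (approximation, assumed here)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:  # the series is numerically 1 in this range
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def score_models(measured, models):
    """Map each (V, T) model key to the p-value of its fit to `measured`."""
    return {vt: ks_pvalue(ks_statistic(measured, freqs),
                          len(measured), len(freqs))
            for vt, freqs in models.items()}
```

A model whose frequencies are close to the measured ones gets a p-value near 1, while a model far from the measurement gets a p-value near 0.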
Simulation results
o Real state: V = 0.83 V, T = 12 °C
x Estimated state: V = 0.831 V, T = 11.66 °C
Estimation errors (mean µ, standard deviation σ)
µεV = 2.42 mV, σεV = 5.00 mV
µεT = -0.58 °C, σεT = 7.46 °C
⇒ Depend on: number of models, statistical test, measurement pre-treatment, …
Aggregation of the p-values into the estimated {V̂, T̂}: weighted mean value
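The aggregation step (weighted mean value) can be sketched as follows, assuming the weights are the p-values returned by the goodness-of-fit test for each (V, T) model. Whether all models or only the best-scoring ones enter the mean is not stated on the slide; the function name and the input layout are hypothetical.

```python
def aggregate_vt(p_values):
    """Weighted mean of the model operating points.

    `p_values` maps each model key (V in volts, T in °C) to its p-value.
    Assumption: every model contributes, weighted by its p-value.
    """
    total = sum(p_values.values())
    if total <= 0:
        raise ValueError("no model fits the measurement")
    v_hat = sum(p * v for (v, _t), p in p_values.items()) / total
    t_hat = sum(p * t for (_v, t), p in p_values.items()) / total
    return v_hat, t_hat
```

For example, two models at (0.8 V, 10 °C) and (0.9 V, 20 °C) with p-values 0.75 and 0.25 give an estimate of about (0.825 V, 12.5 °C), which lies between the stored grid points.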
Temporal performances
Software implementation: 2500 clock cycles per model evaluation ⇒ 605 µs @ 500 MHz
Hardware (dedicated) implementation: 42 cycles per model, 10 kbits of memory, 9k gates ⇒ 10 µs @ 500 MHz
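The two timing figures can be cross-checked with a few lines of arithmetic. The number of models is not given on the slide; the value below is only inferred from 605 µs at 2500 cycles per model, under the assumption that the total time is the per-model cost times the number of models.

```python
F_CLK_HZ = 500e6            # clock frequency from the slide
SW_CYCLES_PER_MODEL = 2500  # software implementation
HW_CYCLES_PER_MODEL = 42    # dedicated hardware implementation
SW_TOTAL_S = 605e-6         # total software time from the slide

# Time to evaluate one model in software: 2500 / 500 MHz = 5 µs.
sw_per_model_s = SW_CYCLES_PER_MODEL / F_CLK_HZ

# Inferred (not stated on the slide): number of models evaluated.
implied_models = round(SW_TOTAL_S / sw_per_model_s)

# Hardware total for the same number of models, close to the quoted 10 µs.
hw_total_s = implied_models * HW_CYCLES_PER_MODEL / F_CLK_HZ

# Per-model speed-up of the dedicated hardware over software.
speedup = SW_CYCLES_PER_MODEL / HW_CYCLES_PER_MODEL
```

Under this reading the software figure corresponds to 121 models, and the hardware version needs about 10.2 µs for the same set, consistent with the 10 µs on the slide.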
STHORM: 4 × (16 cores + 8 MProbes)
Voltage estimation?
Faster version for V estimation
Monitoring/estimation using both methods (VT and V)
[Figure: timeline combining V estimations (x15) with VT estimations]
Validation: performed on the STHORM platform
Measurements performed in an oven
STHORM: 4 × (16 cores + 8 MProbes)
Validation
Measurements on a multiprobe in
What can we do now?
Fast online thermal floorplan construction: know the temperature at any desired location (not only at the sensor locations)
Mitigation using thermal-aware scheduling (e.g. OpenCL) at cluster granularity
DEMO at DAC 2014
References
• L. Vincent, P. Maurine, S. Lesecq and E. Beigné, “Embedding Statistical Tests for on-chip Dynamic Voltage and Temperature Monitoring”, DAC 2012
• L. Vincent, P. Maurine, E. Beigné, S. Lesecq and J. Mottin, “Temperature and Fast Voltage On-Chip Monitoring using Low-Cost Digital Sensors”, VARI 2013
[Figure: implementation flow: VT estimation, then task allocation under thermal constraints; heat dissipated by 1 PE]
Outline
(Process &) Voltage & Temperature estimation
Choice of set point (F, Vdd, Vbb)
Advanced technologies: new “parameter” to be adjusted
Control of set-points
Objective: choose the set point (F, Vdd, Vbb) under performance constraints
A third actuator with large output range?
Power domain (with on-chip actuators)
[Figure: PE with F, Vdd and Vbb actuators driven by a control block (to be defined) from Ftarget; Ptot versus F; in FD-SOI, F versus Vbb ∈ [Vbb,min, Vbb,max] for Vdd,1, Vdd,2 and Vdd,3]
3 continuous actuators
Vdd ∈ [Vdd,min, Vdd,max]
Vbb ∈ [Vbb,min, Vbb,max]
F ∈ [Flow, Fhigh]
2 continuous and 1 discrete
Vdd = Vdd,i, with i = 1..n (implementation constraints)
Vbb ∈ [Vbb,min, Vbb,max]
F ∈ [Flow, Fhigh]
State of the art
Traditionally 2 actuators
F & Vdd ++
F & Vbb +
Vdd & Vbb
Ptot(F) profile is convex [1]
3 actuators
Vbb is modified once; F & Vdd
New opportunities
Dynamic management? [2]
Combination of actuators
Continuous vs discrete actuator
[Figure: Ptot versus F]
[1] R. Rao, et al., “Energy optimal speed control of devices with discrete speed sets”, DAC 2005
[2] F. Firouzi, et al., “Dynamic soft error hardening via joint body biasing and dynamic voltage scaling”, Euromicro 2011
⇒ Choose the appropriate configuration (F, Vdd, Vbb) to minimize power consumption in the case of 2 continuous actuators and 1 discrete actuator
[Figure: power domain (with on-chip actuators): PE with a control block (to be defined) and F, Vdd and Vbb actuators]
Voltage Frequency Island
[Figure: PE with F, Vdd and Vbb actuators; Ptot versus F showing two configurations, (Vdd,2, Vbb,1) and (Vdd,2, Vbb,2), around Ftarget and F1]
Motivations & Assumptions
Several configurations are possible for Ftarget
Applying Ftarget directly is not optimal
Which configuration (F, Vdd, Vbb) should be applied to minimize power consumption under performance constraints?
Assumptions
Vdd discrete, Vbb & F continuous
Ptot known
For given (Vdd, F), Vbb is adjusted to minimize Ptot
Known performance constraints Ftarget
[Figure: Ptot versus F for (Vdd,1, Vbb) and (Vdd,2, Vbb); power Pi at Ftarget = Fi]
Proposition
3 actuators, one of which is discrete (implementation constraints)
Proposed method
Selection phase: from Ptot(F), build the PWCS
Execution phase: is Ftarget in the PWCS?
Yes: Mode 1 (M1), apply the single configuration at Ftarget belonging to the PWCS
No: Mode 2 (M2), apply the 2 closest configurations in the PWCS (hopping execution)
⇒ Ensures optimal power consumption over the whole frequency range
[Figure: Ptot versus F with the PWCS and two example points A and B]
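The selection and execution phases can be illustrated with the sketch below. It assumes that the PWCS can be approximated by the lower convex boundary of the candidate (F, Ptot) points, that work done is proportional to frequency, and that switching between configurations costs nothing. None of this is stated on the slides, and the function names are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_convex_set(points):
    """Selection phase: keep the (F, P) points on the lower convex boundary."""
    best = {}
    for f, p in points:  # keep the cheapest configuration per frequency
        best[f] = min(p, best.get(f, p))
    hull = []
    for pt in sorted(best.items()):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], pt) <= 0:
            hull.pop()
        hull.append(pt)
    return hull


def hopping_plan(hull, f_target):
    """Execution phase: return (mode, [(F, P, time fraction)], mean power)."""
    for f, p in hull:
        if f == f_target:  # M1: one configuration is enough
            return "M1", [(f, p, 1.0)], p
    for (f1, p1), (f2, p2) in zip(hull, hull[1:]):
        if f1 < f_target < f2:  # M2: hop between the two neighbours
            alpha = (f2 - f_target) / (f2 - f1)  # time share spent at f1
            p_mean = alpha * p1 + (1.0 - alpha) * p2
            return "M2", [(f1, p1, alpha), (f2, p2, 1.0 - alpha)], p_mean
    raise ValueError("f_target is outside the covered frequency range")
```

With the time share α chosen this way, the mean frequency α·F1 + (1-α)·F2 equals Ftarget, which matches the α·DL and (1-α)·DL split shown earlier in the deck.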
Results on a DSP in FDSOI technology
DSP in STM 28 nm FD-SOI
From ISSCC 2014: R. Wilson, E. Beigné, et al., “A 460 MHz at 397 mV, 2.6 GHz at 1.3 V, 32b VLIW DSP, embedding Fmax tracking”.
Hopping execution (M2) vs. directly applying Ftarget with min(Ptot)
Vdd = {0.7, 0.9, 1.1, 1.3} V, Vbb ∈ [0, 1.5] V, F ∈ [700, 2560] MHz
Up to 17 % power saving
[Figure: Ptot (mW) versus F (MHz) for Vdd = 0.7 V, 0.9 V, 1.1 V and 1.3 V, with the PWCS and the modes M1 and M2; power saving (%) versus F (MHz)]
Outline
(Process &) Voltage & Temperature estimation
Choice of set point (F, Vdd, Vbb)
Control of set-points
And particular implementation…
[Figure: power domain (with on-chip actuators): PE with a control block (to be defined) and F, Vdd and Vbb actuators]
Dynamic Voltage and Frequency Scaling
“Continuous” clock actuator
“Continuous” voltage actuator
V-F relation
How to ensure that the system stays in the functional zone?
[Figure: Processing Element fed by a frequency actuator (CLK) and a voltage actuator (Vdd); V-F plane split into a functional zone and a non-functional zone (timing faults)]
Stay in safe domain? Coupled Drivers (difficult reuse)
Voltage Controlled-Oscillator (VCO)
Imprecise clock frequency output (with jitter)
Not used in the recent technologies due to PVT variability
Jointly designed actuators
Reuse is difficult
[Figure: energy management unit driving a voltage actuator; Vdd supplies both the VCO, which generates CLK, and the Processing Element]
T. Burd, T. Pering, A. Stratakos and R. Brodersen. “A dynamic voltage scaled microprocessor system”. In
Stay in safe domain? Non-Coupled Drivers (promote reuse!)
Phase- or Frequency-Locked Loop (PLL or FLL)
Predefined sequence
Poor power efficiency
Poor performance during transients
[Figure: energy management unit driving separate voltage (Vdd) and frequency (CLK) actuators for the Processing Element; V-F trajectory relative to the functional and non-functional zones]
Coupled actuators: Joint Control
Coupled Voltage-Frequency Control
[Figure: a top-level controller sends the targets P0 and F0 to the energy management unit, where a joint control block drives the voltage (Vdd) and frequency (CLK) actuators of the Processing Element; V-F trajectory relative to the functional and non-functional zones]
Objective: a mechanism to jointly control V & F during transient periods
Joint Control
[Figure: joint control block with F-Lim and V-Lim limiters, ΔV and ΔF margins and V-Ref, driving the frequency actuator and the voltage actuator]
Hypotheses:
Closed-loop V & F actuators
V & F are measurable without delay
V & F actuators are “black-box models”
Linear relation for V-F
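One possible reading of the joint control block is sketched below: the frequency set-point is capped by what the measured voltage allows (F-Lim), and the voltage set-point is kept above what the measured frequency needs (V-Lim), each with a margin. This is a guess at the principle, not the authors' design. The linear law F = a·V + b, the rate-limited actuator model and all names and numbers are assumptions, and the final reference is assumed to lie inside the functional zone.

```python
def f_max(v, a, b):
    """Assumed linear V-F limit: highest safe frequency at voltage v."""
    return a * v + b


def v_min(f, a, b):
    """Inverse of the assumed linear limit: lowest safe voltage at f."""
    return (f - b) / a


def joint_step(v_meas, f_meas, v_ref, f_ref, a, b, dv, df):
    """Compute the set-points sent to the two (black-box) actuators."""
    f_cmd = min(f_ref, f_max(v_meas, a, b) - df)  # F-Lim with margin ΔF
    v_cmd = max(v_ref, v_min(f_meas, a, b) + dv)  # V-Lim with margin ΔV
    return v_cmd, f_cmd


def move_toward(x, target, max_step):
    """Toy actuator: moves toward its set-point at a bounded rate."""
    return x + max(-max_step, min(max_step, target - x))


def simulate(v, f, v_ref, f_ref, a, b, dv, df, steps, v_rate, f_rate):
    """Return the (V, F) path followed during a transition."""
    path = [(v, f)]
    for _ in range(steps):
        v_cmd, f_cmd = joint_step(v, f, v_ref, f_ref, a, b, dv, df)
        v = move_toward(v, v_cmd, v_rate)
        f = move_toward(f, f_cmd, f_rate)
        path.append((v, f))
    return path
```

In this sketch the frequency follows the voltage when speeding up and the voltage follows the frequency when slowing down, so the path stays below the limit line without a predefined sequence.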
Hardware Implementation?
[Figure: hardware blocks GenV, GenF, CalcVpath, CalcFpath, SetRef and Control, driven by the targets P0 and F0; the equations of the figure are not recoverable]
Results
Nearly the size (area) of a Digital Frequency-Locked Loop¹
(Vdd = 0.9 V; T = -40 °C)
¹ C. Albea, D. Puschini, S. Lesecq and E. Beigné. “Optimal and robust control for a small-area FLL”. In Proc. IEEE Mediterranean Conference on Control & Automation, pp. 1100–1105, 2011.
Non-Coupled
Performance: 3.81K cycles
[Figure: V-F trajectory relative to the functional and non-functional zones]
Coupled
Performance: 5.81K cycles (52.33%)
[Figure: V-F trajectory relative to the functional and non-functional zones]
Comparison
VF plot
Under-clocking (difference between the path and the reference): 91.71%
Improve energy efficiency during transitions
Promote design reuse
Outline
(Process &) Voltage & Temperature estimation
Choice of set point (F, Vdd, Vbb)
Control of set-points
And particular implementation…
Dedicated platforms
e.g. H.264 hardware encoder
Dedicated platforms
VENGME platform (with VNU, Hanoi)
FIFOs between blocks
FIFOs inside blocks
Split into various power domains
Adapt (Vdd, F) in order to
- Meet performance constraints
- Decrease power consumption
Summary
Needs for control in micro-electronics
Hardware digital implementation
Extra power consumption
Extra silicon area
Simple problems become complex ones
Implementation constraints, …
Complex problems per se
MEMS, new DC-DC architectures
Silicon photonics for manycore in future micro-servers
Thermal “tuning” to compensate for intrinsic resonance shift (due to PVT variability)
Centre de Grenoble 17 rue des Martyrs 38054 Grenoble Cedex
Centre de Saclay Nano-Innov PC 172 91191 Gif sur Yvette Cedex