
www.cea.fr

Grenoble Workshop on Autonomic Computing and Control

May 27, 2013

Suzanne LESECQ, CEA, LETI, DACLE/LIALP


D. Puschini, E. Beigné, W. Lombardi, A. Molnos, J. Mottin, V. Olive

L. Vincent (post-doc fellow with Persyval-lab), Y. Akgul, M. Altieri, N.-M. Nguyen, M. Becher, T. Ducroux (with STM)

Application of control theory to Many-Processor System-on-Chip (MPSoC) (computing platforms)


S. Lesecq STAARS workshop | May 27, 2014

© CEA. All rights reserved

Context

Computing platforms (embedded systems):
• Mono-processor, DSP, smart sensor nodes…
• Many-Processor System-on-Chip (MPSoC)
• Dedicated computing platforms, e.g. H.264 hardware encoder

Main challenge for embedded (mobile) platforms: strongly reduce the power consumption P under performance constraints (Ftarget)

Processing Element (PE) actuators: supply voltage Vdd, body bias Vbb, clock frequency F
Task t to be finished before its deadline DL

Figure: power consumption and the resulting temperature increase; task t executed at frequency F with respect to Ftarget and DL.
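A small sketch of how the deadline DL can be turned into Ftarget, assuming the task's workload is known as a number of clock cycles (an assumption; the slide does not say how Ftarget is derived, and `workload_cycles` and the function names are hypothetical). Switching overheads are ignored.

```python
def f_target_hz(workload_cycles: int, deadline_s: float) -> float:
    """Lowest clock frequency that completes the workload by the deadline.

    Assumes the workload is a known number of cycles (hypothetical input).
    """
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return workload_cycles / deadline_s


def meets_deadline(workload_cycles: int, f_hz: float, deadline_s: float) -> bool:
    """True if running at f_hz finishes the workload before the deadline."""
    return workload_cycles / f_hz <= deadline_s


# Example: 2 million cycles to execute within DL = 4 ms
ft = f_target_hz(2_000_000, 4e-3)
print(ft / 1e6, "MHz")
```

Any F at or above Ftarget meets DL; running faster than needed only costs power, which is why the rest of the talk aims at the cheapest configuration that still reaches Ftarget.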


Figure: a global control supervises several power domains; each power domain contains a processing element with a Vdd actuator, an F actuator and a local DVFS control for fast local adjustment, fed by an info-extraction (data fusion) block.

Adaptive architecture to mitigate local and dynamic PVT variations: requires knowledge of how T° and V evolve

Main objective …

Objective: reach the most energy-efficient and safe operating point

… With constraints …

Advanced technologies: power consumption highly depends on temperature ⇒ risk of thermal runaway!

Control strategies must have low complexity


… Under Variability …

The dynamics of process, voltage and temperature (PVT) variations are very different.

Figure: examples of process (P), voltage (V) and temperature (T) variations.

Keng L. Wong et al., "Enhancing Microprocessor Immunity to Power Supply Noise With Clock-Data Compensation," IEEE Journal of Solid-State Circuits, vol. 41, no. 4, April 2006.

LIRMM

J. Altet et al., "Thermal coupling in integrated circuits: application to thermal testing," IEEE Journal of Solid-State Circuits, vol. 36, pp. 81–91, 2001.

P. Li et al., "Efficient full-chip thermal modeling and analysis," IEEE/ACM International Conference on Computer Aided Design (ICCAD-2004), pp. 319–326, 2004.

Rui Zheng et al., "Circuit Aging Prediction for Low-Power Operation," CICC, 2009.

J. Cain et al., "Electrical linewidth metrology for systematic CD variation characterization and causal analysis," Metrology, Inspection, and Process Control for Microlithography XVII, Proceedings of SPIE, vol. 5038, pp. 350–361, 2003.


Adaptive architecture: power management

Power consumption:
$P = P_{stat}(V_{dd}, V_{bb}, T, \text{techno}) + P_{dyn}(V_{dd}^2, V_{bb}, T, F_{clk}, \text{activity}(t))$

Timing faults to be avoided

Temperature measurement/estimation

Figure: (F, Vdd) plane at temperature T1 with Vdd ∈ [Vmin, Vmax]; beyond the timing-fault limit lies the non-functional zone.
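To make the dependencies of this power model concrete, here is a toy instance. It assumes the textbook form P_dyn = activity · C_eff · Vdd² · F and a leakage current that grows exponentially with temperature and with forward body bias; these functional forms and every coefficient are hypothetical, not taken from the slides.

```python
import math


def p_dyn(vdd, f_hz, activity, c_eff=1e-9):
    """Dynamic (switching) power in W: activity * C_eff * Vdd^2 * F."""
    return activity * c_eff * vdd ** 2 * f_hz


def p_stat(vdd, vbb, temp_c, i0=1e-3, t_ref=25.0, t_slope=0.03, bb_slope=1.0):
    """Static (leakage) power in W.

    Hypothetical leakage current: I0 * exp(t_slope*(T - T_ref)) * exp(bb_slope*Vbb),
    i.e. it grows with temperature and with forward body bias.
    """
    i_leak = i0 * math.exp(t_slope * (temp_c - t_ref)) * math.exp(bb_slope * vbb)
    return vdd * i_leak


def p_total(vdd, vbb, temp_c, f_hz, activity):
    """P = P_stat(Vdd, Vbb, T) + P_dyn(Vdd, F, activity)."""
    return p_stat(vdd, vbb, temp_c) + p_dyn(vdd, f_hz, activity)


# Same operating point at two temperatures: only the static part changes.
print(p_total(1.0, 0.0, 25.0, 1e9, 0.2), p_total(1.0, 0.0, 85.0, 1e9, 0.2))
```

Because the static term rises with T and dissipated power in turn heats the die, this feedback is what makes thermal runaway possible in advanced technologies.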


From previous talks (Alberto, Erik, Ada)

“Modulating control”

Functional system (without our control/observation tools)

Improve power efficiency; improve performance

Figure: system whose inputs and outputs (#) take values in a finite set.


Back to the adaptive architecture: main issues (and FD-SOI technology)

Figure: global control (scheduler, OS) over several power domains. In each power domain, a core is driven by V, F and Vbb actuators under local control (fast local adjustment); TSM multi-probe sensors feed an info-extraction (data fusion) block that returns the estimated {V, T}, and a set of power modes {PMi} is available. Inset: MultiProbe sensor (ring oscillators #1 to #7, start/stop and test control, scan in/out, address decoder, 8-to-1 multiplexer, 28-bit counter with overflow bit, 3-bit configuration). Inset: a workload WL executed before the deadline DL at power P1 during α·DL and P2 during (1−α)·DL. Closed-loop systems: Ptot versus F.


Outline (platform level)
• (Process &) Voltage & Temperature estimation
  – Sensor
  – Local estimation of V and T
  – Validation on a hardware platform
• Choice of set point (F, Vdd (, Vbb))
• Control of set-points
• And particular implementations …

Figure: VT estimation using models (memory) and the MultiProbe sensor (7 ROs).


Variability sensors

Monitor local variations of V and T

Integrated (on-chip) sensors

Dedicated sensors: precise, absolute value, but limited V, T operating range; analog ⇒ large size + ADC

General-purpose sensors: ring oscillator (RO), F_RO = f(P, V, T)

Proposed sensor: MultiProbe, a set of co-located ROs
⇒ standard cells: easy design
⇒ small: easily integrated and replicated on chip
⇒ V and T are not directly read

Figure: MultiProbe layout (counter and ROs), 31 µm × 14.4 µm ≈ 450 µm² in 32 nm CMOS. Ring oscillators #1 to #7 (stages 1 to n, start/stop and test control, scan in/out), an address decoder, an 8-to-1 multiplexer, a 28-bit counter with an overflow bit and a 3-bit configuration.
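The MultiProbe reads the selected ring oscillator through a 28-bit counter with an overflow bit. A minimal sketch of turning a counter reading into an RO frequency, assuming the counter is enabled for a known gating window (the window length and function names are hypothetical):

```python
COUNTER_BITS = 28  # counter width given on the slide


def ro_frequency_hz(count: int, overflow: bool, window_s: float) -> float:
    """Convert a gated counter value into an RO frequency: F_RO = count / window."""
    if overflow:
        raise OverflowError("counter overflowed: shorten the gating window")
    if not 0 <= count < 2 ** COUNTER_BITS:
        raise ValueError("count does not fit in the counter width")
    return count / window_s


def max_window_s(f_max_hz: float) -> float:
    """Longest gating window that cannot overflow at the highest expected F_RO."""
    return (2 ** COUNTER_BITS - 1) / f_max_hz


# Example: 25 000 edges counted during a 10 µs window -> 2.5 GHz
print(ro_frequency_hz(25_000, False, 10e-6))
print(max_window_s(2.5e9))
```

A longer window gives a finer frequency resolution (1 / window) but a slower measurement, which matters for tracking fast voltage variations.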


Multiprobe sensor

7 ring oscillators with different architectures: their different behaviours are exploited to estimate the voltage and temperature values.

LowTherm

VT estimation: principle

Power domain: observers? models…

Comparison between models and measurements

Figure: in each power domain, MultiProbe sensors (7 ROs) next to the processing element feed an info-extraction (data fusion) block; the VT estimation compares the measurements with the models stored in memory through a goodness-of-fit test and returns the estimated $\{\hat{V}, \hat{T}\}$.


Figure: VT estimation flow. The 7 RO frequencies are measured and pre-treated; the models $\{M_{V_i,T_j}\}$, $i = 1..p$, $j = 1..q$, are read from memory; the VT estimation returns the estimated $\{\hat{V}, \hat{T}\}$.

Based on hypothesis testing: the models are stored in memory, a CDF is built from the MultiProbe (7 ROs) measurement, and a goodness-of-fit test (Kolmogorov-Smirnov test) returns a p-value for each model.
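One possible reading of this goodness-of-fit step is sketched below, under explicit assumptions that are not stated on the slides: each stored model M(V, T) holds the 7 RO frequencies predicted at (V, T), the measurement noise of each RO is Gaussian with a known standard deviation, and a one-sample Kolmogorov-Smirnov test checks whether the normalised residuals (measured minus predicted) look like N(0, 1). The p-value uses the usual asymptotic Kolmogorov series; the toy RO model and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1), the reference CDF for the residuals


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def ks_pvalue(d, n):
    """Asymptotic p-value Q_KS(lambda), with the usual small-sample correction."""
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d
    if lam < 0.2:
        return 1.0  # Q_KS is ~1 here and the series converges slowly
    total = 0.0
    for k in range(1, 1001):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


def fit_models(measured, models, sigma):
    """Return the best-fitting (V, T) and the p-value of every model.

    models: dict (V, T) -> 7 predicted RO frequencies (hypothetical storage format).
    sigma:  assumed per-RO measurement noise standard deviation.
    """
    pvalues = {}
    for vt, predicted in models.items():
        residuals = [(m - p) / s for m, p, s in zip(measured, predicted, sigma)]
        d = ks_statistic(residuals, STD_NORMAL.cdf)
        pvalues[vt] = ks_pvalue(d, len(residuals))
    best = max(pvalues, key=pvalues.get)
    return best, pvalues


def toy_ro_frequencies(v, t):
    """Hypothetical RO model (MHz): each RO has a different T sensitivity."""
    return [500.0 + 1000.0 * (v - 0.8) - 0.5 * k * t for k in range(1, 8)]


models = {(v, t): toy_ro_frequencies(v, t)
          for v in (0.80, 0.83, 0.86) for t in (0, 12, 24)}
sigma = [0.1] * 7
# Deterministic "noise": the 7 mid-point quantiles of N(0, 1), shuffled.
noise = [STD_NORMAL.inv_cdf((i + 0.5) / 7) for i in (3, 0, 6, 1, 5, 2, 4)]
measured = [f + s * z for f, s, z in zip(toy_ro_frequencies(0.83, 12), sigma, noise)]
best, pvalues = fit_models(measured, models, sigma)
print(best)
```

With only 7 residuals per test, the asymptotic p-value is a rough approximation; it is used here only to rank the models.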


Simulation results: real state (o) V = 0.83 V, T = 12 °C; estimated state (x) V = 0.831 V, T = 11.66 °C

Estimation errors: $\mu_{\varepsilon_V} = 2.42$ mV, $\sigma_{\varepsilon_V} = 5.00$ mV; $\mu_{\varepsilon_T} = -0.58$ °C, $\sigma_{\varepsilon_T} = 7.46$ °C
⇒ They depend on the number of models, the statistical test, the measurement pre-treatment, …

Aggregation: the estimated $\{\hat{V}, \hat{T}\}$ is a weighted mean value computed from the p-values of the Kolmogorov-Smirnov goodness-of-fit test on the models.
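A minimal sketch of the aggregation step and of the error statistics quoted above, assuming the weights are the p-values themselves (the slide only says "weighted mean value") and that the errors are estimate minus true value; function names and data are hypothetical.

```python
from statistics import mean, stdev


def aggregate_vt(pvalues):
    """p-value-weighted mean of the candidate (V, T) models.

    pvalues: dict mapping (V, T) -> p-value of its goodness-of-fit test.
    """
    total = sum(pvalues.values())
    if total == 0:
        raise ValueError("no model fits the measurement")
    v_hat = sum(v * p for (v, _), p in pvalues.items()) / total
    t_hat = sum(t * p for (_, t), p in pvalues.items()) / total
    return v_hat, t_hat


def error_stats(estimates, truths):
    """Mean and sample standard deviation of the estimation errors."""
    errors = [e - t for e, t in zip(estimates, truths)]
    return mean(errors), stdev(errors)


print(aggregate_vt({(0.80, 0.0): 0.9, (0.83, 12.0): 0.9, (0.86, 24.0): 0.0}))
```

Averaging over several well-fitting models lets the estimate fall between grid points, so the resolution is not limited to the spacing of the stored models.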


Temporal performances

Software implementation: 2500 clock cycles per model evaluation ⇒ 605 µs @ 500 MHz

Hardware (dedicated) implementation: 42 cycles per model, 10 kbits of memory, 9k gates ⇒ 10 µs @ 500 MHz
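The two timing figures are consistent if a full estimation evaluates 121 models: 605 µs at 500 MHz is 302,500 cycles, i.e. 121 × 2500 cycles, and 121 × 42 = 5082 cycles is about 10.2 µs at 500 MHz. The model count is inferred from this arithmetic, not stated on the slide. A quick check:

```python
CLOCK_HZ = 500e6


def latency_s(n_models: int, cycles_per_model: int, clock_hz: float = CLOCK_HZ) -> float:
    """Latency of evaluating n_models models one after the other."""
    return n_models * cycles_per_model / clock_hz


n_models = round(605e-6 * CLOCK_HZ / 2500)  # model count implied by the software figure
sw = latency_s(n_models, 2500)
hw = latency_s(n_models, 42)
print(n_models, sw * 1e6, hw * 1e6)  # counts and latencies in µs
```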

STHORM: 4 × (16 cores + 8 MProbes)

10 µs: fast enough for voltage estimation? ⇒ Faster version for V estimation

Temporal performances
Monitoring/estimation using both methods (VT and V); figure: V and T estimates over time (V ×15).


Validation: performed on the STHORM platform

Measurements performed in an oven; STHORM: 4 × (16 cores + 8 MProbes)


Validation

Measurements on a multiprobe in


What can we do now?

Fast online thermal floorplan construction: know the temperature at any desired location (not only at sensor locations)

Mitigation using thermal-aware scheduling (e.g. OpenCL) at cluster granularity

DEMO at DAC 2014
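The slides do not say how temperatures away from the sensors are reconstructed. Purely as an illustration, assuming inverse-distance weighting of the MultiProbe temperature estimates (a swapped-in interpolation technique, not necessarily the one used), a thermal-map query and a coolest-cluster choice for thermal-aware scheduling could look like this. Sensor positions, cluster centres and temperatures are hypothetical.

```python
import math


def idw_temperature(x, y, sensors, power=2.0):
    """Inverse-distance-weighted temperature estimate at (x, y).

    sensors: list of (xs, ys, T) tuples, positions in mm, T in degC.
    """
    num = den = 0.0
    for xs, ys, t in sensors:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return t  # the query point is a sensor location
        w = 1.0 / d ** power
        num += w * t
        den += w
    return num / den


def coolest_cluster(clusters, sensors):
    """Name of the cluster whose centre has the lowest estimated temperature."""
    return min(clusters, key=lambda name: idw_temperature(*clusters[name], sensors))


# Hypothetical sensor readings (x mm, y mm, degC) and cluster centres
sensors = [(0.0, 0.0, 70.0), (10.0, 0.0, 50.0), (0.0, 10.0, 60.0), (10.0, 10.0, 45.0)]
clusters = {"c0": (2.5, 2.5), "c1": (7.5, 2.5), "c2": (2.5, 7.5), "c3": (7.5, 7.5)}
print(coolest_cluster(clusters, sensors))
```

A scheduler could then dispatch the next kernel to the returned cluster; a real floorplan model would also account for heat spreading, which simple interpolation ignores.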

References

• L. Vincent, P. Maurine, S. Lesecq, and E. Beigné, "Embedding Statistical Tests for On-Chip Dynamic Voltage and Temperature Monitoring," DAC 2012.

• L. Vincent, P. Maurine, E. Beigné, S. Lesecq, and J. Mottin, "Temperature and Fast Voltage On-Chip Monitoring using Low-Cost Digital Sensors," VARI 2013.

Figure: implementation of VT estimation and of task allocation under thermal constraints; heat dissipated by 1 PE.

Outline
• (Process &) Voltage & Temperature estimation
• Choice of set point (F, Vdd, Vbb): advanced technologies bring a new "parameter" to be adjusted
• Control of set-points


Objective: choose the set point (F, Vdd, Vbb) under performance constraints

A third actuator with large output range?

Power domain (with on-chip actuators)

Figure: a control block (?) sets the F, Vdd and Vbb actuators of the PE from Ftarget; Ptot versus F; in FD-SOI, F versus Vbb over [Vbb,min, Vbb,max] for Vdd,1, Vdd,2 and Vdd,3.


3 continuous actuators:
• Vdd ∈ [Vdd,min, Vdd,max]
• Vbb ∈ [Vbb,min, Vbb,max]
• F ∈ [Flow, Fhigh]

2 continuous actuators and 1 discrete:
• Vdd = Vdd,i, with i = 1..n (implementation constraints)
• Vbb ∈ [Vbb,min, Vbb,max]
• F ∈ [Flow, Fhigh]

State of the art
• Traditionally 2 actuators: F & Vdd (++), F & Vbb (+), Vdd & Vbb
• Ptot(F) profile is convex [1]
• 3 actuators: Vbb is modified once F & Vdd are set ⇒ new opportunities
• Dynamic management? [2]
• Combination of actuators: continuous vs. discrete actuators

[1] R. Rao et al., "Energy optimal speed control of devices with discrete speed sets," DAC 2005.

[2] F. Firouzi et al., "Dynamic soft error hardening via joint body biasing and dynamic voltage scaling," Euromicro 2011.

Goal: choose the configuration (F, Vdd, Vbb) that minimizes power consumption in the case of 2 continuous actuators and 1 discrete actuator



Voltage Frequency Island

Figure: PE with F, Vdd and Vbb actuators and a control block (?); Ptot versus F around Ftarget for the configurations (Vdd,2, Vbb,1) and (Vdd,2, Vbb,2), with frequency F1.

Motivations & Assumptions

Several configurations can reach Ftarget; directly applying Ftarget is not optimal

Assumptions

Which configuration (F, Vdd, Vbb) should be applied to minimize power consumption under performance constraints?

• Vdd discrete; Vbb and F continuous; Ptot known
• For a given (Vdd, F), Vbb is adjusted to minimize Ptot
• Known performance constraint Ftarget

Figure: Ptot versus F for (Vdd,1, Vbb) and (Vdd,2, Vbb), with power Pi at Ftarget = Fi.


Figure: Ptot(F) and the PWCS, with points A and B.

Mode 1 (M1): apply the configuration at Ftarget belonging to the PWCS.
Mode 2 (M2): apply the 2 closest configurations in the PWCS (hopping execution).

Proposition: 3 actuators, one of which is discrete (implementation constraints). Proposed method:
• Selection phase: build the PWCS from Ptot(F).
• Execution phase: is Ftarget in the PWCS? Yes ⇒ M1; No ⇒ M2.
⇒ Ensures optimal power consumption over the whole frequency range
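A sketch of the two phases, under an assumption the slides do not spell out: that the PWCS is the set of configurations on the lower convex hull of the measured Ptot(F) points (in the spirit of the convexity result of [1]). Hopping between the two neighbouring hull points, with a fraction α of DL at the lower frequency and (1−α) at the higher one so that the average frequency equals Ftarget, then costs the linear interpolation of their powers. The configuration data are hypothetical and switching overheads are ignored.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (F in MHz, Ptot in mW)


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def build_pwcs(configs: List[Point]) -> List[Point]:
    """Selection phase: configurations on the lower convex hull of Ptot(F)."""
    best: Dict[float, float] = {}
    for f, p in configs:  # keep the cheapest configuration for each frequency
        best[f] = min(p, best.get(f, p))
    hull: List[Point] = []
    for pt in sorted(best.items()):  # Andrew's monotone chain, lower hull only
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], pt) <= 0:
            hull.pop()
        hull.append(pt)
    return hull


def execution_plan(pwcs: List[Point], f_target: float) -> List[Tuple[float, float, float]]:
    """Execution phase: (F, Ptot, time fraction) entries averaging to f_target."""
    for f, p in pwcs:
        if f == f_target:
            return [(f, p, 1.0)]  # M1: Ftarget belongs to the PWCS
    for (f1, p1), (f2, p2) in zip(pwcs, pwcs[1:]):
        if f1 < f_target < f2:
            alpha = (f2 - f_target) / (f2 - f1)  # fraction of DL spent at f1
            return [(f1, p1, alpha), (f2, p2, 1.0 - alpha)]  # M2: hopping
    raise ValueError("Ftarget is outside the PWCS frequency range")


def mean_power(plan) -> float:
    """Time-averaged power of an execution plan."""
    return sum(p * share for _, p, share in plan)


configs = [(600.0, 100.0), (1000.0, 300.0), (1400.0, 400.0)]  # hypothetical points
pwcs = build_pwcs(configs)
plan = execution_plan(pwcs, 1000.0)
print(pwcs, plan, mean_power(plan))
```

Here (1000 MHz, 300 mW) lies above the segment joining its neighbours, so it is not kept; hopping between 600 MHz and 1400 MHz for half of DL each gives 250 mW on average instead of 300 mW.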


Figure: Ptot (mW) versus F (MHz) for Vdd = 0.7, 0.9, 1.1 and 1.3 V, with the Ptot(F) curve and the PWCS.

Results on a DSP in FD-SOI technology

DSP in STM 28 nm FD-SOI

Hopping execution (M2) vs. directly applying Ftarget with min(Ptot)

From ISSCC 2014: R. Wilson, E. Beigné, et al., "A 460 MHz at 397 mV, 2.6 GHz at 1.3 V, 32b VLIW DSP, Embedding Fmax Tracking".

Vdd ∈ {0.7, 0.9, 1.1, 1.3} V, Vbb ∈ [0, 1.5] V, F ∈ [700, 2560] MHz ⇒ up to 17 % power saving
Figure: power saving (%) versus F (MHz) for modes M1 and M2.


Outline
• (Process &) Voltage & Temperature estimation
• Choice of set point (F, Vdd, Vbb)
• Control of set-points
• And particular implementation…


Dynamic Voltage and Frequency Scaling

• "Continuous" clock actuator
• "Continuous" voltage actuator
• V-F relation: how to ensure that the PE stays in the functional zone?

Figure: processing element driven by a frequency actuator (CLK) and a voltage actuator (Vdd); the V-F plane is split into a functional zone and a non-functional zone where timing faults occur.


Staying in the safe domain? Coupled drivers (difficult reuse)

• Voltage-controlled oscillator (VCO): imprecise clock frequency output (with jitter); not used in recent technologies because of PVT variability
• Jointly designed actuators: reuse is difficult

Figure: processing element with an energy management unit driving a voltage actuator (Vdd); a VCO supplied by Vdd generates CLK.

T. Burd, T. Pering, A. Stratakos and R. Brodersen. “A dynamic voltage scaled microprocessor system”. In


Staying in the safe domain? Non-coupled drivers (promote reuse!)

• Phase- or frequency-locked loop (PLL or FLL)
• Predefined sequence: poor power efficiency and poor performance during transients

Figure: processing element with an energy management unit driving independent voltage (Vdd) and frequency (CLK) actuators; V-F plane with functional and non-functional zones.
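A minimal sketch of a predefined sequence for non-coupled actuators, assuming a hypothetical linear V-F relation that increases with V (coefficients not from the slides): when the voltage goes up, V is changed first and F only once V has settled; when it goes down, F is lowered first. Every intermediate set-point then stays in the functional zone.

```python
def f_max_mhz(v: float, a: float = 2000.0, v_th: float = 0.35) -> float:
    """Hypothetical linear V-F relation: highest fault-free frequency at Vdd = v."""
    return max(0.0, a * (v - v_th))


def is_safe(v: float, f: float) -> bool:
    """True if (V, F) lies in the functional zone (no timing faults)."""
    return f <= f_max_mhz(v)


def predefined_sequence(v0, f0, v1, f1):
    """Set-point sequence for non-coupled V and F actuators.

    Raising the voltage: change V first (and wait for it to settle), then F.
    Lowering the voltage: change F first, then V.
    """
    if v1 >= v0:
        return [(v0, f0), (v1, f0), (v1, f1)]
    return [(v0, f0), (v0, f1), (v1, f1)]


up = predefined_sequence(0.9, 1000.0, 1.2, 1600.0)
down = predefined_sequence(1.2, 1600.0, 0.9, 1000.0)
print(up, all(is_safe(v, f) for v, f in up + down))
```

The price of this safety is visible in the sequence itself: while V ramps up, the PE keeps running at the old frequency, which is the performance loss during transients mentioned above.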


Coupled actuators: joint control (coupled voltage-frequency control)

Figure: a top-level controller sends the set-point (P0, F0) to the energy management unit, whose joint control drives the voltage actuator (Vdd) and the frequency actuator (CLK) of the processing element; V-F plane with functional and non-functional zones.

Objective: a mechanism to jointly control V and F during transient periods


Figure: joint control block with a voltage reference V-Ref, increments ΔV and ΔF, and limits V-Lim and F-Lim, driving the voltage actuator and the frequency actuator.

Hypotheses:
• Closed-loop V and F actuators
• V and F are measurable without delay
• V and F actuators are "black-box" models
• Linear V-F relation
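A discrete-time sketch of joint V/F control built on these hypotheses: V is measured without delay and the V-F relation is linear, so the frequency limit can be recomputed from the measured V at every step. The step sizes dv and df play the role of ΔV and ΔF, and the voltage actuator is modelled as a first-order lag; this is an illustration under hypothetical parameters, not the hardware control law of the slides.

```python
def f_max_mhz(v: float, a: float = 2000.0, v_th: float = 0.35) -> float:
    """Hypothetical linear V-F relation, used as F-Lim for the measured V."""
    return max(0.0, a * (v - v_th))


def joint_transition(v0, f0, v_target, f_target,
                     dv=0.01, df=20.0, beta=0.5, steps=200):
    """Joint V/F transition; returns the (V, F) trajectory, one point per step.

    dv, df: maximum reference increments per step (role of Delta-V / Delta-F).
    beta:   hypothetical first-order response of the closed-loop voltage actuator.
    """
    v, f, v_ref = v0, f0, v0
    trace = [(v, f)]
    for _ in range(steps):
        # Voltage path: ramp the reference by at most dv per step
        v_ref += max(-dv, min(dv, v_target - v_ref))
        # Voltage actuator modelled as a first-order lag towards v_ref
        v += beta * (v_ref - v)
        # Frequency path: ramp by at most df, never above F-Lim(measured V)
        f = min(f + max(-df, min(df, f_target - f)), f_max_mhz(v))
        trace.append((v, f))
    return trace


up = joint_transition(0.9, 1000.0, 1.2, 1600.0)
down = joint_transition(1.2, 1600.0, 0.9, 1000.0)
print(up[-1], down[-1])
print(all(f <= f_max_mhz(v) for v, f in up + down))
```

In this sketch F rises as soon as the measured V allows it instead of waiting for the end of the voltage transition, while the F-Lim clamp keeps every point of the trajectory in the functional zone.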

Hardware implementation?

Figure: blocks SetRef, Control, CalcVpath, CalcFpath, GenV and GenF, with inputs P0 and F0.


Results

Nearly the same area as a digital frequency-locked loop¹

(Vdd = 0.9 V; T = −40 °C)

¹C. Albea, D. Puschini, S. Lesecq and E. Beigné, "Optimal and robust control for a small-area FLL," in Proc. IEEE Mediterranean Conference on Control & Automation, pp. 1100–1105, 2011.


Non-coupled performance: 3.81K cycles
Figure: transition in the V-F plane, with functional and non-functional zones.


Coupled performance: 5.81K cycles (+52.33%)
Figure: transition in the V-F plane, with functional and non-functional zones.

Comparison

V-F plot; under-clocking (difference between the path and the reference): 91.71%

⇒ Improves energy efficiency during transitions and promotes design reuse


Outline
• (Process &) Voltage & Temperature estimation
• Choice of set point (F, Vdd, Vbb)
• Control of set-points
• And particular implementation…


Dedicated platforms

e.g. H.264 hardware encoder


Dedicated platforms

VENGME platform (with VNU, Hanoi)

FIFOs between blocks; FIFOs inside blocks

Split into several power domains; adapt (Vdd, F) in order to:
• meet performance constraints
• decrease power consumption
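The slide does not describe the control law used on VENGME. As one simple possibility, assuming each power domain exposes the occupancy of its input FIFO and a few discrete (Vdd, F) operating points (both hypothetical here), a hysteresis controller could step the domain up when its input FIFO fills and down when it drains:

```python
# Hypothetical discrete operating points of one power domain: (Vdd in V, F in MHz)
LEVELS = [(0.8, 100.0), (0.9, 150.0), (1.0, 200.0)]


def next_level(level: int, occupancy: float, low: float = 0.25, high: float = 0.75) -> int:
    """Pick the next operating-point index from the input FIFO occupancy (0..1).

    A filling input FIFO means the block is slower than its producer: step up.
    A draining FIFO means it is faster than needed: step down to save power.
    Between the two thresholds the level is kept (hysteresis).
    """
    if occupancy > high and level < len(LEVELS) - 1:
        return level + 1
    if occupancy < low and level > 0:
        return level - 1
    return level


level = 0
for occ in (0.1, 0.8, 0.9, 0.5, 0.2):
    level = next_level(level, occ)
    print(occ, LEVELS[level])
```

The dead band between the two thresholds avoids toggling between operating points on every FIFO sample, which would waste energy in actuator transitions.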

Summary

Needs for control in micro-electronics:
• Hardware digital implementation: extra power consumption, extra silicon area
• Simple problems become complex ones because of implementation constraints
• Complex problems per se: MEMS, new DC-DC architectures
• Silicon photonics for manycore in future micro-servers: thermal "tuning" to compensate for the intrinsic resonance shift (due to PVT variability)


Centre de Grenoble 17 rue des Martyrs 38054 Grenoble Cedex

Centre de Saclay Nano-Innov PC 172 91191 Gif sur Yvette Cedex
