9:00 - 9:10 Section 1 - Basic intro to power and energy
9:10 - 9:30 Section 2 - Devices for measuring power
9:30 - 9:45 Section 3 - Component specific measurement techniques
9:45 - 10:00 Section 4 - Advanced power measurement concepts
10:00 - 10:30 Section 5 - Memory and Compute on various platforms
10:30 - 11:00 Coffee Break (Dining Hall)
11:00 - 11:30 Section 6 - Instruction-based power models
11:30 - 11:50 Section 7 - Open discussion
11:50 - 12:00 Section 8 - Summary and conclusion
Advanced Concepts
• Multicore and uncore
• Sleep states
• Voltage-frequency scaling
• Managing temperature variations
• SKU and manufacturing variability
• Synchronizing power measurements with application phases
Multicore and Uncore
7-10 watts per core
C-States
C-State   Power (watts)   Description
C0        33.2            Normal execution
C1        10.6            Core halted; core state and L1 cache still resident
C3        7.2             Core, L1, and L2 powered down
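One way to use the table above is to estimate average power from the fraction of time spent in each C-state. The sketch below is a minimal illustration, not a measurement method from the slides: the function name `average_power` and the residency profile are hypothetical, and only the per-state wattages come from the table.

```python
# Per-state power from the C-state table above (watts).
C_STATE_POWER_W = {"C0": 33.2, "C1": 10.6, "C3": 7.2}

def average_power(residency):
    """Residency-weighted average power in watts.

    `residency` maps C-state name -> fraction of time spent in that state;
    the fractions must sum to 1.
    """
    total = sum(residency.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"residency fractions sum to {total}, expected 1.0")
    return sum(C_STATE_POWER_W[state] * frac for state, frac in residency.items())

# Hypothetical residency profile, for illustration only.
example = {"C0": 0.25, "C1": 0.25, "C3": 0.50}
print(f"{average_power(example):.2f} W")
```

In practice the residency fractions would come from a tool that reports C-state residency, such as i7z.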
i7z
C-States
Voltage-Frequency Scaling
Haswell 4770K DVFS Settings
P-States
Frequency (GHz)   Voltage (Volts)
3.5               1.012
3.0               0.958
2.5               0.899
2.0               0.845
1.5               0.791
1.0               0.737
[Figure: Core Voltage (Volts) vs. Frequency (GHz); linear fit y = 0.11x + 0.63]
Voltage-Frequency Scaling
[Figure: HSW DVFS Power, AVX. Average Power (Watts) vs. Frequency (GHz); exponential fit y = 8.00e^(0.52x)]
[Figure: HSW Energy Efficiency, AVX. Efficiency (GFLOPS/Watt) vs. Frequency (GHz)]
• Most efficient at 2.0 GHz
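The linear voltage trend can be re-derived from the P-state settings with an ordinary least-squares fit, and the exponential power trend line can be evaluated at the same frequencies. This is a minimal sketch: the helper names `fit_line` and `avx_power_watts` are hypothetical, and the power function only evaluates the trend line quoted on the plot, not measured data.

```python
import math

# P-state settings from the Haswell 4770K DVFS table: (GHz, volts).
P_STATES = [(3.5, 1.012), (3.0, 0.958), (2.5, 0.899),
            (2.0, 0.845), (1.5, 0.791), (1.0, 0.737)]

def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def avx_power_watts(freq_ghz):
    """Exponential trend line from the HSW DVFS power plot: 8.00 * e^(0.52 f)."""
    return 8.00 * math.exp(0.52 * freq_ghz)

slope, intercept = fit_line(P_STATES)
print(f"V = {slope:.2f} * f + {intercept:.2f}")
for freq, _ in P_STATES:
    print(f"{freq:.1f} GHz -> {avx_power_watts(freq):.1f} W (trend line)")
```

Rounded to two decimals, the fitted coefficients match the 0.11 and 0.63 shown on the voltage plot.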
Voltage-Frequency Scaling
Overclocking
[Nick Shih, Sep 2012] http://www.youtube.com/watch?v=968ZQ3a6pBM
Overclocked to 7.136 GHz
Voltage-Frequency Scaling
[Figure: Frequency Scaling, DGEMM. Power (Watts) vs. Frequency (GHz) at 1.2 V, 1.1 V, and 1.0 V, each with a linear fit; annotation: Performance]
[Figure: Voltage-Frequency Scaling, DGEMM. Energy (Joules) at 1.7, 1.9, 2.2, 2.7, 3.1, and 3.5 GHz, split into Static Cost and Dynamic Cost; annotations: Efficient Operations, Less Overhead]
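The static versus dynamic energy trade-off in the DGEMM energy plot can be illustrated with a toy model. This sketch assumes the textbook relation that dynamic power scales as C * V^2 * f, that runtime scales as 1/f for a fixed amount of work, and that voltage follows the linear trend y = 0.11x + 0.63 from the Haswell DVFS slide. The function name and every default constant (`work_gcycles`, `p_static_w`, `c_eff`) are hypothetical and not taken from the measurements.

```python
def energy_breakdown(freq_ghz, work_gcycles=100.0, p_static_w=10.0, c_eff=8.0):
    """Return (static_joules, dynamic_joules) for a fixed amount of work.

    All default parameter values are hypothetical, chosen for illustration.
    """
    volts = 0.11 * freq_ghz + 0.63                # linear V-f trend from the DVFS slide
    runtime_s = work_gcycles / freq_ghz           # assumes runtime scales as 1/f
    p_dynamic_w = c_eff * volts ** 2 * freq_ghz   # textbook C * V^2 * f relation
    return p_static_w * runtime_s, p_dynamic_w * runtime_s

for freq in (1.7, 1.9, 2.2, 2.7, 3.1, 3.5):      # frequencies on the DGEMM plot axis
    static_j, dynamic_j = energy_breakdown(freq)
    print(f"{freq:.1f} GHz: static {static_j:6.1f} J, dynamic {dynamic_j:6.1f} J, "
          f"total {static_j + dynamic_j:6.1f} J")
```

Under these assumptions the static cost shrinks as frequency rises, because the run finishes sooner, while the dynamic cost grows with the higher voltage.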
Temperature Variations
State           Power (watts)   Temperature (C)
Idle - Cold     7.5             40
Idle - Hot      9.2             55
Kernel - Cold   45              51
Kernel - Hot    48              72
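To make the size of the temperature effect concrete, the sketch below computes the extra power drawn when hot relative to cold for each workload in the table. The structure and the function name `hot_vs_cold` are illustrative only; the numbers are the table values.

```python
# (power in watts, temperature in C) from the temperature table.
MEASUREMENTS = {
    "Idle": {"cold": (7.5, 40), "hot": (9.2, 55)},
    "Kernel": {"cold": (45.0, 51), "hot": (48.0, 72)},
}

def hot_vs_cold(workload):
    """Return (extra watts, percent increase over cold, temperature rise in C)."""
    cold_w, cold_c = MEASUREMENTS[workload]["cold"]
    hot_w, hot_c = MEASUREMENTS[workload]["hot"]
    delta_w = hot_w - cold_w
    return delta_w, 100.0 * delta_w / cold_w, hot_c - cold_c

for name in MEASUREMENTS:
    delta_w, percent, delta_c = hot_vs_cold(name)
    print(f"{name}: +{delta_w:.1f} W ({percent:.1f}%) for +{delta_c} C")
```

The idle case shows the larger relative change (about 23%), while the kernel case shows the larger absolute change (3 W).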
SKU and Manufacturing Variability
[Figure: Average Watts (y-axis) for 64 processors ordered by average watts (x-axis)]
NAS MG.C.8 -- Intel Xeon E5-2670
77.7 – 85.4 watts, Range of 10%
Source: Rountree, Barry, et al. "Beyond DVFS: A first look at performance under a hardware-enforced power bound." Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International. IEEE, 2012.
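The quoted 10% range can be checked from the two endpoints. The slide does not say which baseline the percentage uses, so this sketch assumes the spread is taken relative to the lowest-power processor; the function name `relative_spread_percent` is hypothetical.

```python
def relative_spread_percent(min_watts, max_watts):
    """Spread between two power readings as a percentage of the lower one."""
    return 100.0 * (max_watts - min_watts) / min_watts

# Endpoints reported for NAS MG.C.8 on 64 Intel Xeon E5-2670 processors.
spread = relative_spread_percent(77.7, 85.4)
print(f"{spread:.1f}% spread across nominally identical processors")
```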
Impact of manufacturing process
Kenneth Czechowski, Victor W. Lee, Ed Grochowski, Ronny Ronen, Ronak Singhal, Pradeep Dubey, and Richard Vuduc. Improving the energy efficiency of big cores. In Proc. ACM/IEEE Int’l. Symp. on Computer Architecture (ISCA), Minneapolis, MN, USA, June 2014.
Generations of the Intel Core i7
Longitudinal study: Core i7 processor
Processor      Year   Process technology node   Microarchitecture generation   Step
Penryn         2007   45nm                      Core                           -
Nehalem        2009   45nm                      Nehalem                        Tock
Westmere       2010   32nm                      Nehalem                        Tick
Sandy Bridge   2011   32nm                      Sandy Bridge                   Tock
Ivy Bridge     2012   22nm                      Sandy Bridge                   Tick
Haswell        2013   22nm                      Haswell                        Tock
Impact of process technology
Process technology nodes
Impact of 32nm process technology step: NHM (45nm) vs WSM (32nm)
[Figure: WSM Power (Watts) vs. NHM Power (Watts); linear fit y = 0.57x + 7.55, R² = 0.97]
Impact of 22nm process technology step: SNB (32nm) vs IVB (22nm)
[Figure: IVB Power (Watts) vs. SNB Power (Watts); linear fit y = 0.68x - 10.48, R² = 0.87]
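The two linear fits can be read as simple predictors of power on the newer process node given power on the older one. The sketch below only evaluates the fitted lines quoted on the plots; the function name `predict_power`, the dictionary keys, and the 65 W sample input are hypothetical, and it assumes each plotted point pairs the same workload measured on both processors.

```python
# Linear fits quoted on the two scatter plots (power in watts): (slope, intercept).
FITS = {
    "NHM (45nm) -> WSM (32nm)": (0.57, 7.55),
    "SNB (32nm) -> IVB (22nm)": (0.68, -10.48),
}

def predict_power(fit_name, older_watts):
    """Evaluate the fitted line: estimated watts on the newer node."""
    slope, intercept = FITS[fit_name]
    return slope * older_watts + intercept

# Hypothetical 65 W reading on the older part, inside both plotted x-ranges.
for name in FITS:
    print(f"{name}: 65.0 W -> {predict_power(name, 65.0):.1f} W")
```

The fits are only meaningful within the plotted power ranges, and the SNB to IVB fit has the weaker correlation (R² = 0.87 versus 0.97).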