NASA Center for Climate Simulation
SR-IOV in High Performance Computing
Hoot Thompson & Dan Duffy
NASA Center for Climate Simulation, NASA Goddard Space Flight Center
Greenbelt, MD 20771
2/13/2012
NASA Center for Climate Simulation
• Focuses on the research side of climate study (versus NOAA’s operational role)
• Simulations span multiple time scales
– Days for weather prediction
– Seasons to years for short-term climate prediction
– Centuries for climate change projection
• Examples:
– High-fidelity 3.5 km global simulations of cloud and hurricane predictions
– Comprehensive reanalysis of the last thirty years of weather/climate (MERRA)
– Multi-millennium analysis for the Intergovernmental Panel on Climate Change
• Integrated set of supercomputing, visualization and data management technologies
– Discover computational cluster
• 30K traditional Intel cores plus 64 GPUs, roughly 400 TFlops
• DDR/QDR InfiniBand (IB) backbone
• 1 GbE and 10 GbE management infrastructure
• ~4 PBytes RAID-based shared parallel file system (GPFS)
– Tape archive of over 20 PBytes
Discover IB/GPFS Architecture
[Figure: Discover IB/GPFS architecture]
• Base Unit: 512 Dempsey (3.2 GHz)
• SCU1: 1,024 Woodcrest (2.66 GHz)
• SCU2: 1,024 Woodcrest (2.66 GHz)
• SCU3: 3,096 Westmere (2.8 GHz)
• SCU4: 3,096 Westmere (2.8 GHz)
• SCU5: 4,096 Nehalem (2.8 GHz)
• SCU6: 4,096 Nehalem (2.8 GHz)
• SCU7 (7a-7e): 14,400 Westmere (2.8 GHz)
• Data Analysis
• 24 DDR IB uplinks to each unit
• Two groups of 20 GPFS I/O nodes, each with 16 NSD (data) and 4 MDS (metadata) nodes
• Data file systems: Data Direct Networks S2A9500, S2A9550, S2A9900
• Metadata file systems: IBM/Engenio DS4700
• Brocade 48000
• Each circle in the diagram represents a 288-port DDR IB switch; the triangle represents a 2-to-1 QDR IB switch fabric
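As a quick consistency check, the per-unit counts in the Discover architecture figure can be summed and compared with the "30K traditional Intel cores" quoted on the NCCS overview slide. This is a minimal sketch that assumes each listed number is a core count (the figure does not label the unit explicitly) and leaves out the Data Analysis nodes and GPUs, whose sizes are not given.

```python
from collections import Counter

# (unit, processor family, clock in GHz, count) as listed in the Discover
# architecture figure. Assumption: each count is a number of cores.
DISCOVER_UNITS = [
    ("Base Unit", "Dempsey", 3.2, 512),
    ("SCU1", "Woodcrest", 2.66, 1024),
    ("SCU2", "Woodcrest", 2.66, 1024),
    ("SCU3", "Westmere", 2.8, 3096),
    ("SCU4", "Westmere", 2.8, 3096),
    ("SCU5", "Nehalem", 2.8, 4096),
    ("SCU6", "Nehalem", 2.8, 4096),
    ("SCU7", "Westmere", 2.8, 14400),
]

def total_cores(units):
    """Sum the core counts of all units."""
    return sum(cores for _, _, _, cores in units)

def cores_by_family(units):
    """Aggregate core counts per processor family."""
    totals = Counter()
    for _, family, _, cores in units:
        totals[family] += cores
    return totals

total = total_cores(DISCOVER_UNITS)
print(f"Total cores: {total:,}")
for family, cores in cores_by_family(DISCOVER_UNITS).most_common():
    print(f"  {family:9s} {cores:6,d}  ({cores / total:.0%})")
```

The total comes to 31,344 cores, consistent with the roughly 30K figure; the Westmere-based units account for about two thirds of it.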
Nebula – NASA’s Cloud
• Open-source (OpenStack) cloud computing project and service
• Alternative to costly construction of additional data centers
• Sharing portal for NASA scientists and researchers
– Large, complex data sets
– External partners and the public
• Nebula comprises two components/containers
– Nebula West at NASA Ames
– Nebula East at NASA GSFC
• NCCS team evaluating Nebula as an adjunct to Discover-hosted science processing
• Key question: can clouds match the HPC-level capability needed for climate research?
• Potential obstacle: clouds primarily exist in virtualized space
– Overhead or loss due to virtual machine (VM) versus bare metal
– Node-to-node communication is critical: high speed, low latency, RDMA
Intel Developer Forum
Background and Proposition
• Background
– Discover’s performance is tied to its DDR/QDR IB fabric
– Nebula, like clouds in general, is 10 GbE based
• Question: can clouds deliver an HPC level of performance?
– Can 10 GbE compete with high-speed, low-latency IB?
– What network performance is lost due to virtualization?
– What computational performance is lost due to virtualization?
• Proposition: the typical NCCS model
– Build a test bed to investigate the virtualization technologies
– Work with vendors to answer questions and address issues
Methodology and Objectives
• Compare bare metal against virtualized NICs
– Full software virtualization (SW Virt): device emulation
– Virtio: split driver, para-virtualization
– Single Root I/O Virtualization (SR-IOV)
• Direct assignment
• Mapped Virtual Function (VF)
• Determine the overhead of executing within a VM
– VM-to-VM communication
• Base network
• Message passing environment (MVAPICH2)
– Application
• Single node, multi-core
• Multi-node, multi-core
• Draw conclusions and comparisons with Discover and Nebula
Benchmarks
Testing started from basic benchmarks that analyze system performance, then built up toward the application layer.
Benchmark | Version | Description | Download
Nuttcp | nuttcp-7.1.5.c, gcc compiler | Measures raw network bandwidth, similar to netperf | http://lcp.nrl.navy.mil/nuttcp
OSU MPI Benchmarks | MVAPICH2 1.7rc1, Intel compiler | Tests latencies and bandwidths of the most common MPI functions | http://mvapich.cse.ohio-state.edu/
Linpack | 10.2.6, Intel compiler | Intel version of Linpack | http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
NAS PB | 3.3.1, Intel compiler | NAS Parallel Benchmarks; CFD kernel benchmarks | http://www.nas.nasa.gov/Resources/Software/npb.html
Configuration | Bare1 | Bare2 | VM1 | VM2
Processor Type | Intel Nehalem | Intel Nehalem | Intel Nehalem | Intel Nehalem
Processor Number | E5520 | E5520 | E5520 | E5520
Processor Speed | 2.27 GHz | 2.27 GHz | 2.27 GHz | 2.27 GHz
Cores per Socket | 4 | 4 | 4 | 4
Number of Sockets | 2 | 2 | 2 | 2
Cores per Node | 8 | 8 | 8 | 8
Theoretical Peak | 72.64 GF | 72.64 GF | 72.64 GF | 72.64 GF
Main Memory | 48 GB | 48 GB | 16 GB | 16 GB
Operating System | Ubuntu 11.04 | Ubuntu 11.04 | Ubuntu 11.04 | Ubuntu 11.04
Kernel | 2.6.38-10.server | 2.6.38-10.server | 2.6.38-10.server | 2.6.38-10.server
Hypervisor | KVM | KVM | N/A | N/A
Hyperthreading | Off | Off | Off | Off
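The Theoretical Peak row follows from the processor rows. A minimal sketch, assuming 4 double-precision floating-point operations per cycle per core for Nehalem (one 2-wide SSE add plus one 2-wide SSE multiply each cycle); that per-cycle figure is not stated on the slide, but it reproduces the listed 72.64 GF exactly.

```python
def theoretical_peak_gflops(clock_ghz, sockets, cores_per_socket, flops_per_cycle=4):
    """Peak GFLOP/s = clock (GHz) x total cores x double-precision flops per cycle.

    flops_per_cycle=4 is an assumption for Nehalem (2-wide SSE add + 2-wide SSE mul).
    """
    return clock_ghz * sockets * cores_per_socket * flops_per_cycle

node_peak = theoretical_peak_gflops(2.27, sockets=2, cores_per_socket=4)
print(f"Per-node peak: {node_peak:.2f} GF")      # 72.64 GF
print(f"Two-node peak: {2 * node_peak:.2f} GF")  # 145.28 GF
```

The two-node figure of 145.28 GF would be the denominator for percent-of-peak values in any Linpack run that spans both nodes.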
Test Configuration
[Figure: Test configuration. Two Dell R710 servers on the R&D network, each with two E5520 2.27 GHz processors and 48 GB of memory, using Intel 1000 and Intel 82599EB (10 GbE) network adapters; addresses shown are 10.10.10.1/10.10.10.2, 10.10.20.1/10.10.20.2 and 10.0.10.1/10.0.10.2.]
Nuttcp Results
Throughput in Mbps, with TCP retransmissions in parentheses, for ten runs of each configuration:
Run | Bare to Bare | VM to VM SW Virt | VM to VM Virtio | VM to VM SR-IOV
1 | 4418.8401 (0) | 137.3301 (0) | 5864.0557 (212) | 9151.5769 (0)
2 | 8028.6459 (0) | 145.6024 (0) | 5678.0625 (0) | 9408.0323 (0)
3 | 9392.7072 (0) | 145.7500 (0) | 5973.2256 (0) | 8714.4063 (34)
4 | 9415.2675 (0) | 138.5963 (0) | 6309.8478 (0) | 9313.8894 (7)
5 | 9341.4362 (733) | 141.8702 (0) | 6223.4034 (7) | 9251.8453 (0)
6 | 9354.0999 (208) | 146.1092 (0) | 6311.3896 (0) | 9193.1103 (0)
7 | 9414.7318 (0) | 146.3042 (0) | 6316.7924 (0) | 9348.2984 (0)
8 | 9414.8207 (0) | 146.4449 (0) | 5955.8176 (0) | 9101.7356 (73)
9 | 9414.9368 (0) | 146.2758 (0) | 5746.2926 (0) | 8958.5032 (16)
10 | 9415.1618 (0) | 146.1043 (0) | 5692.8146 (0) | 9228.5370 (0)
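To condense the ten nuttcp runs per configuration into one number each, a median is less sensitive than a mean to the two slow early bare-metal runs (4418.8 and 8028.6 Mbps). The sketch below transcribes the nuttcp values from this slide, assuming they are listed run by run within each configuration, and reports each configuration's median throughput as a percentage of the bare-metal median, together with total retransmissions. The summary layout is illustrative, not the authors' analysis.

```python
from statistics import mean, median

# Throughput (Mbps) for runs 1-10 of each configuration, from the nuttcp results.
THROUGHPUT = {
    "Bare to Bare": [4418.8401, 8028.6459, 9392.7072, 9415.2675, 9341.4362,
                     9354.0999, 9414.7318, 9414.8207, 9414.9368, 9415.1618],
    "VM to VM SW Virt": [137.3301, 145.6024, 145.7500, 138.5963, 141.8702,
                         146.1092, 146.3042, 146.4449, 146.2758, 146.1043],
    "VM to VM Virtio": [5864.0557, 5678.0625, 5973.2256, 6309.8478, 6223.4034,
                        6311.3896, 6316.7924, 5955.8176, 5746.2926, 5692.8146],
    "VM to VM SR-IOV": [9151.5769, 9408.0323, 8714.4063, 9313.8894, 9251.8453,
                        9193.1103, 9348.2984, 9101.7356, 8958.5032, 9228.5370],
}
# TCP retransmissions for the same runs, in the same order.
RETRANS = {
    "Bare to Bare": [0, 0, 0, 0, 733, 208, 0, 0, 0, 0],
    "VM to VM SW Virt": [0] * 10,
    "VM to VM Virtio": [212, 0, 0, 0, 7, 0, 0, 0, 0, 0],
    "VM to VM SR-IOV": [0, 0, 34, 7, 0, 0, 0, 73, 16, 0],
}

def summarize(baseline="Bare to Bare"):
    """Median/mean throughput per configuration and median as % of the baseline median."""
    base = median(THROUGHPUT[baseline])
    rows = {}
    for name, runs in THROUGHPUT.items():
        med = median(runs)
        rows[name] = {
            "median_mbps": med,
            "mean_mbps": mean(runs),
            "pct_of_bare": 100.0 * med / base,
            "retrans": sum(RETRANS[name]),
        }
    return rows

for name, row in summarize().items():
    print(f"{name:18s} median {row['median_mbps']:8.1f} Mbps "
          f"({row['pct_of_bare']:5.1f}% of bare)  retrans {row['retrans']}")
```

On these numbers SR-IOV retains roughly 98% of the bare-metal median throughput, Virtio about 63% and full software virtualization under 2%. The bare-metal mean actually falls below the SR-IOV mean because of the two slow runs, which is why the median is the fairer summary here.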
OSU Benchmarks Results – Bandwidth
[Figure: OSU MPI bandwidth. Throughput (MBytes/sec, 0 to 1,000) versus message size (1 byte to 10,000,000 bytes, log scale) for Bare to Bare, VM to VM SR-IOV and VM to VM Virtio; higher is better.]
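nuttcp reports megabits per second, while the OSU bandwidth chart is in MBytes/sec, so dividing by 8 puts the two on one scale. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes) on both sides; the megabyte convention used by this OSU build is an assumption here.

```python
def mbps_to_mbytes_per_sec(mbps):
    """Convert megabits/s to megabytes/s (decimal units: 1 MB = 10**6 bytes)."""
    return mbps / 8.0

# Best bare-metal nuttcp run and the nominal 10 GbE line rate, in MB/s.
print(f"{mbps_to_mbytes_per_sec(9415.2675):.1f} MB/s")  # 1176.9 MB/s
print(f"{mbps_to_mbytes_per_sec(10_000):.1f} MB/s")     # 1250.0 MB/s
```

The best bare-metal nuttcp run of 9,415.27 Mbps corresponds to about 1,177 MB/s, above the 1,000 MB/s top of the bandwidth chart's axis.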
OSU Benchmarks Results – Latency
[Figure: OSU MPI latency. Latency (microseconds, 0 to 12,000) versus message size (MBytes, 0 to 5,000) for Bare to Bare, VM to VM SR-IOV and VM to VM Virtio; lower is better.]
OSU Benchmarks Results – Latency (Small)
[Figure: OSU MPI latency for small messages. Latency (microseconds, 0 to 200) versus message size (bytes, 0 to 10,000) for Bare to Bare, VM to VM SR-IOV and VM to VM Virtio; lower is better.]
Linpack Benchmarks Results
[Figure: Linpack. Percentage of theoretical peak performance (0% to 90%) versus problem size N (0 to 60,000) for Ubuntu1 & Ubuntu2 bare metal, VM to VM SR-IOV and VM to VM Virtio; higher is better.]
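Two quantities frame the Linpack chart: percent of peak is measured GFLOP/s divided by the aggregate theoretical peak, and the N x N double-precision matrix needs 8·N² bytes. A minimal sketch, assuming each run spans both nodes (72.64 GF each, per the configuration slide) and using the common rule of thumb of filling about 80% of memory with the matrix; the 80% fill is an illustrative assumption, not a setting reported in these slides.

```python
import math

NODE_PEAK_GFLOPS = 72.64  # per-node theoretical peak from the configuration slide

def percent_of_peak(measured_gflops, nodes=2):
    """Linpack efficiency: measured GFLOP/s as a percentage of aggregate peak."""
    return 100.0 * measured_gflops / (nodes * NODE_PEAK_GFLOPS)

def matrix_gbytes(n):
    """Memory for the N x N double-precision matrix: 8 * N**2 bytes, in decimal GB."""
    return 8 * n * n / 1e9

def largest_n(total_gbytes, fill=0.8):
    """Largest N whose matrix fits in `fill` of the given memory (decimal GB).

    fill=0.8 is an assumed rule of thumb, leaving room for the OS and buffers.
    """
    return int(math.sqrt(fill * total_gbytes * 1e9 / 8))

print(f"N = 60,000 needs {matrix_gbytes(60_000):.1f} GB")
print(f"Two 16 GB VMs, 80% fill: N <= {largest_n(32):,}")
print(f"Two 48 GB hosts, 80% fill: N <= {largest_n(96):,}")
```

Under those assumptions, N = 60,000 needs 28.8 GB, more than 80% of the 32 GB available across two 16 GB VMs (the sketch gives N ≤ 56,568 there), whereas the two 48 GB hosts allow N up to about 98,000. Whether the VM runs actually reached N = 60,000 cannot be read from the chart.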
Going Forward
• Conclusions to date
– Clear advantages to SR-IOV technology
– Cloud-based HPC is feasible
– Data requires further analysis to understand the implications for Nebula
• Issues/concerns
– Impact of TCP slow start, variability and retransmissions on HPC processing
• Additional testing to close the gap
– More application testing: NAS Parallel and HPCC benchmarks
– Jumbo frames (9000 MTU)
– Bare metal-to-bare metal and VM-to-VM IB
– Different hypervisor: Xen
– Other VM guest types: Red Hat, SUSE
– Multiple VMs running, bandwidth sharing
– Add cloud infrastructure to the test setup: OpenStack, Eucalyptus
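The jumbo-frame item can be motivated with a header-overhead bound: every TCP segment on 10 GbE pays fixed per-frame costs, so larger frames leave more of the line rate for payload. A minimal sketch, assuming IPv4, TCP timestamps enabled (the Linux default), no VLAN tags and back-to-back full-size frames.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12
# IPv4 header (20) + TCP header (20) + TCP timestamp option (12, assumed enabled).
IP_TCP_HEADERS = 20 + 20 + 12

def max_tcp_goodput_mbps(mtu, line_rate_mbps=10_000):
    """Upper bound on TCP payload throughput for full-size frames at a given MTU."""
    payload = mtu - IP_TCP_HEADERS      # application bytes carried per frame
    on_wire = mtu + ETH_WIRE_OVERHEAD   # bytes of line time consumed per frame
    return line_rate_mbps * payload / on_wire

for mtu in (1500, 9000):
    print(f"MTU {mtu}: at most {max_tcp_goodput_mbps(mtu):.1f} Mbps of TCP payload")
```

With a 1500-byte MTU the bound is about 9,414.8 Mbps, within a fraction of a percent of the best bare-metal nuttcp runs (9,415.27 Mbps), which suggests those runs were already at the standard-frame limit; a 9000-byte MTU raises the bound to about 9,900 Mbps.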