© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Deep dive on new 6th generation Amazon EC2 instances powered by the AWS Graviton2
processor
C M P 3 2 2 - R
Sudhir Raman
Sr Product Manager EC2 AWS
Ali Saidi
Principal Engineer AWS
Sudhir Kakulla Vice President Nielsen
Agenda
Amazon Elastic Compute Cloud (Amazon EC2) A1 instances and AWS Graviton processor
Customer success stories and Arm Software ecosystem momentum Next-gen AWS Graviton2 processors: deep dive
New 6th gen Amazon EC2 instances Customer walk-on: Nielsen
Key takeaways
Categories Capabilities Options
Broadest and deepest platform choice
General purpose Burstable
Compute intensive Memory intensive Storage (High I/O)
Dense storage GPU compute Graphics intensive
Elastic Block Store
Elastic Inference
270 +
for virtually every workload and business need Choice of processor
(AWS, Intel, AMD)
Fast processors
(up to 4.0 GHz)
High memory footprint
(up to 12 TiB)
Instance storage
(HDD and NVMe)
Accelerated computing
(GPUs and FPGA)
Networking
(up to 100 Gbps)
Bare Metal Size
(Nano to 32xlarge)
Innovating in the core platform
Broadest choice of processors
AWS Graviton processor (first gen)
Custom AWS silicon with 64-bit Arm Neoverse cores
Rapidly innovate, build, and iterate on behalf of customers
Targeted optimizations for cloud-native workloads
First instance powered by AWS Graviton processor
Optimized cost and performance for scale-out applications
Significant cost savings AWS Graviton Processor with 64-bit Arm Neoverse
cores and custom AWS silicon
Amazon EC2 A1
Run scale-out and Arm-based workloads in the cloud
Applications Configurations Availability
Scale-out workloads
Web tier
Containerized microservices Arm-based software development
6 instance sizes
Up to 16 vCPUs, 32GiB memory Up to 10 Gbps NW, 3.5 Gbps EBS
Bare metal options
Amazon EC2 A1 Instances
9 Regions
US (N. Virginia, Oregon, Ohio) EU (Ireland, Frankfurt)
APAC (Mumbai, Sydney, Tokyo, Singapore)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Customer success stories
over 250 million registered users 58 million downloads
A1 instances
40% cost savings
Customer success stories
Network Acceleration (Hybrid Infra)
Singapore Cluster N. Virginia Cluster
Tokyo Cluster
Chat Payment
Log DB
Games proxy1 proxy2
game1 game2 game3
manager
… …
… … … …
Southeast Asian User East Asian User EU&NA User
Customer success stories
Photo-serving tier on A1 instances PHP, Nginx, HAProxy
40% cost savings, seamless transition
Web traffic
Nginx, .NET core, Docker Significant savings
Data processing, market insights OpenJDK/Java
20% cost savings
K3s, Kubernetes for the edge on Arm CI infrastructure on A1 instances
No cross-compilation/emulation
Arm software ecosystem for Graviton instances
Tools and software
AWS
Marketplace AWS Systems
Manager Amazon
CloudWatch
AWS CodePipeline AWS Cloud9
AWS CodeCommit
Amazon
Inspector AWS Batch
AWS CodeBuild
Amazon Corretto OpenJDK
OSVs and ISVs
Amazon Linux 2
Red Hat Enterprise Linux 7.6 and newer
SUSE Linux Enterprise Server 15
+ Fedora, Debian, FreeBSD and more…
NGINX+ for Arm64 in AWS Marketplace
VMware ESXi + bare metal
Cadence Liberate Trio Characterization Suite Mentor's Questa Sim Supports Arm 64-Bit
Ubuntu 16.04LTS, 18.04LTS and newer
Containers
Amazon Elastic Container Service
(Amazon ECS)
Amazon Elastic Kubernetes Service
(Amazon EKS)
Docker Desktop Community and Docker Enterprise Engine
Micro VMs
Datadog announces GA Agent for Graviton / Arm
• Datadog is the monitoring and analytics platform for
developers, operations, and
business users in the cloud age
• The Datadog Agent is now
generally available for Graviton / Arm instances
• Users can visualize, monitor, and alert on metrics, traces, and logs from these new instance types
CrowdStrike with AWS Graviton
PoC Partner Increased demand
for Arm64
Workload Protection EC2 A1 and other EC2
workloads
Ease of Deployment No reboots, scans Visibility
AWS environments, accounts and instances
Asset inventory Contextual metadata
Interested? Visit Booth #3134 to learn more
Email [email protected] to become a beta customer
Workload optimization journey
I/O Accelerator Card Nitro Card AWS Graviton, Inferentia
AWS Graviton2 processor
AWS Graviton2 processor
4X 5X 7X
6 New instances powered by AWS Graviton2 Processor
General Purpose
4GB DRAM/vCPU
M6g M6gd
Compute Optimized
2GB DRAM/vCPU
C6g C6gd
Memory Optimized
8GB DRAM/vCPU
R6g R6gd
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Leap to Graviton2
AWS Graviton processor
• First Arm processor in AWS
• First-class citizen in EC2
• 16nm
• ~5 Billion transistors
AWS Graviton2 processor
• 4x the vCPUs
• 7x CPU performance
• ~2x performance/vCPU
• 7nm
• ~30 Billion transistors
Graviton2 - Cores
• Arm Neoverse N1 cores
• Arm v8.2 compliant
• Worked closely with Arm on creation of N1
• Large 64KB L1 caches and 1MB L2 cache/vCPU
• Coherent Instruction cache
• Lower overheads of interrupts, virtualization, and context switching
• 4-wide front-end, with 8-wide dispatch/issue
• Dual-SIMD units
• Instructions to accelerate ML inference: int8, fp16
• Every vCPU is a physical core
• No simultaneous multithreading (SMT)
Graviton2 - Interconnect
• 64 cores connected together with a mesh
• ~2TB/s bisection bandwidth
• 32MB LLC
• With the private caches over 100MB of user-accessible caches
• No NUMA concerns
• Every core sees the same path to memory and to other cores
• 64 lanes of PCIe gen4
• Provide flexibility for different instance configurations
Graviton2 - System
• 8x DDR4-3200 channels over 200GB/s
• Always AES-256 encrypted DRAM with ephemeral key
• Uniform memory latency from all CPU cores
• 1Tbit/s of compression accelerators
• 2xlarge and larger instances will have a compression device
• DPDK and Linux kernel drivers will be available ahead of GA
• Data compression at up to 15GB/s and decompression at up to 11GB/s
Graviton2 powered instances
Nitro Hypervisor Nitro Card
Nitro Security Chip Graviton2
Lightweight hypervisor Memory and CPU allocation Bare Metal-like performance Amazon Elastic Block Store,
Elastic Network Adapter Monitoring, and security Integrated into motherboard
Protects hardware resources Industry leading
performance
Broadening workloads and target applications
Amazon EC2 M6g: sizes and specifications
Instance vCPUs Memory (GB)
Network Bandwidth
(Gbps)
EBS
Optimized EBS Bandwidth (Mbps)
EBS Optimized Burst Bandwidth
(Mbps)
m6g.medium 1 4 Up to 10 Yes 315 4,750
m6g.large 2 8 Up to 10 Yes 630 4,750
m6g.xlarge 4 16 Up to 10 Yes 1,188 4,750
m6g.2xlarge 8 32 Up to 10 Yes 2,375 4,750
m6g.4xlarge 16 64 Up to 10 Yes 4,750 4,750
m6g.8xlarge 32 128 12Gbps Yes 9,000 9,000
m6g.12xlarge 48 192 20Gbps Yes 13,500 13,500
m6g.16xlarge 64 256 25Gbps Yes 18,000 18,000
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SPEC cpu2017
• Industry standard CPU intensive benchmark
• Run on all vCPUs concurrently
• Comparing performance/vCPU
* All SPEC scores estimates, compiled with GCC9 -O3 -march=native, run on largest single socket size for each instance type tested.
40%
60%
80%
100%
120%
140%
160%
SPECint2017 Rate SPECfp2017 rate
Performance/vCPU
SPECcpu2017 Rate*
M5 M6G
Load Balancing with Nginx
40%
60%
80%
100%
120%
140%
160%
Performance (Requests/s)
M5 M6G
Load Balancer
(nginx)
NGINX v1.15.9, 512 clients, 128 GET/POST payloads, all HTTPS connections, AES128-GCM-SHA256, OpenSSL 1.1.1, 4 target machines, all tests run on 4xl size; load generator c5.9xl; web servers c5.4xls;
All servers run in a cluster placement group
Memcached
40%
60%
80%
100%
120%
140%
160%
Performance (Requests/s)
Throughput
M5 M6G
Memcached
Memcached v1.5.16, 16B keys, 128B values, 7.8M KV-pairs, 576 connections for load generation from 2x c5.9xlarge instances, 16 additional connections used to measure latency from 1 additional c5.9xlarge,;each connection maintains 4096 outstanding requests; All servers in a cluster placement group
Memcached
Memcached
Memcached v1.5.16, 16B keys, 128B values, 7.8M KV-pairs, 576 connections for load generation from 2x c5.9xlarge instances, 16 additional connections used to measure latency from 1 additional c5.9xlarge, each connection maintains 4 outstanding requests; all servers in a cluster placement group
0 100 200 300 400 500 600 700 800 900 1000
0 100 200 300 400 500 600 700 800
99th Percentile Latency (μs)
Throughput (KRPS)
m5.2xlarge m6g.2xlarge
Media Encoding with x264
• Huge amount of video created daily
• Encoding it reduces bandwidth to deliver and storage of that video
• Using libx264 encoder encoded uncompressed 1080p to h264
40%
60%
80%
100%
120%
140%
160%
Performance (Frames/s)
M5 M6G
Machine Learning
• BERT: The state-of-the-art text encoder
• Feature representation (encoding) is crucial for ML
• Deep neural network for text feature representation
• Graviton2 has fp16 and int8 support to accelerate Machine Learning workloads
• M6g can outperform M5
• M5 with AVX-512 is limited to FP32
• With FP16 support M6g performs better for CPU based inference
40%
60%
80%
100%
120%
140%
160%
Improvement (Inference Latency)
M5 M6G
EDA Performance – Arm and Cadence Xcellium
• Chip development is expensive
• Design needs to be right the first time
• Arm uses millions of hours of CPU time per month to simulate their processor designs
• Demand is highly variable depending on phase of the project
• Perfect use-case for Amazon EC2
• Simulated Arm Cortex-A53 using Cadence Xcellium
• 570 validation simulations on the the DPU RTL of the processor
0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00
Runtime (hours)
Total Runtime
m5.24xlarge m6g.16xlarge
EDA Performance – Arm and Cadence Xcellium
• Chip development is expensive
• Design needs to be right the first time
• Arm uses millions of hours of CPU time per month to simulate their processor designs
• Demand is highly variable depending on phase of the project
• Perfect use-case for Amazon EC2
• Simulated Arm Cortex-A53 using Cadence Xcellium
• 570 validation simulations on the the DPU RTL of the processor
40%
60%
80%
100%
120%
140%
160%
Performance/vCPU
m5.24xlarge m6g.16xlarge
AWS Graviton delivers significantly better performance per vCPU for EDA
How about a database workload?
• Scylla is a high-throughput, low latency Big Data database
• Thread-per-core architecture guarantees full CPU utilization
• I/O Scheduler guarantees peak I/O throughput
• Can easily reach AWS I3’s 15GB/s bandwidth limit
• Can you use Scylla on A1 instance?
• Yes. It works, and it is supported
• But doesn’t reach peak performance
• M6g instances change the game
• All CPU and memory-bound workloads are supported
• 4xlarge 37.5k reads/s/core; 5x improvement over A1!
• 64 vCPU theoretical limit around 2.4M reads/s
• Amazon EBS is still instance storage
M6gd completes the story
With a preview version of M6gd
Over 5GB/s of disk bandwidth Close to 1M IOPS
Enough for demanding storage workloads
Scylla welcomes the M6g series: ready for NoSQL
Lower TCO with Graviton2 powered instances
20% lower cost vs M5 Up to 40% better
price/performance vs comparable instances Highest performance in their
instance families
AWS Services Supporting Multiple Architectures
Offering M6g soon Transitioning to Graviton2
Amazon ElastiCache
Amazon EMR
Elastic Load Balancing
More coming soon!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
About Nielsen
Digital Collections AWS Migration Collections ROI Problem
The Graviton Chapter What’s Next
About Nielsen
TV Measurement
Our TV Ratings presence Measuring for 90 years
Media is traded based on Nielsen ratings
Provide independent 3rd party measurement
Digital Measurement
Measure how people consume media outside TV Measure across screens, Phones, Smart TV, etc Measure how people interact with digital media Provide insights on how consumers buy media
Cable Drama Episode
Gross Avg. Audience Projection
Cross-platform viewership totals that illustrate the full picture
35%
1%
35%
2%
8%
5%
5%
3%
6%
Digital Collections AWS Migration
• What
Collects 10+ Billion requests/day.
Collects 2.5 TB of data/day Currency Critical Platform
24/7 platform with zero downtime
• How
Successfully migrated in 2 months
Fully Automated Blue-Green deployments Dynamic Scaling and RI’s
Immutable Infrastructure
• Why
Large dynamic loads
Controlled blast radius via separate accounts Auto recovery & self healing
Support fast product launches
• Results
Rock solid for 1.5 years
SQS + S3 reduced latency from 120 to 20 ms New region rollouts in less than 2 hours
Agility to support new business initiatives
Collections AWS Migration
Collections ROI Problem
Challenge
Underutilized M5 fleets.
Less than 35% CPU utilization
Less than 57% Memory utilization Less than optimal ROI
2xLarge 8 Cores
32 GB RAM
$0.384/Hr
The Graviton Chapter
2018 re:Invent
A1 is born
46% cheaper than M5 ARM based platform
Ideal for scaleout architectures
2xLarge 8 Cores
16 GB RAM
$0.204/Hr
2 Month Dash
Ported 3rd party monitoring software to A1
Optimized app architecture to run under 13 GB Tested all our memory optimizations
Performed 24/7 dynamic load tests Adopted A1 where available
The Graviton Chapter
Highlights
8 months in production, rock solid performance Running in 4 regions
Fleet of 120 servers (RI & OnDemand)
60% CPU utilization, significant improvement over M5 Achieved substantial cost efficiency
Load Tests (2018) M5 (20ms) A1 (20ms)
Price $0.384/Hr $0.204/Hr
RPS 2200 1800
CPU 30% 70%
Memory 56% 80%
2xLarge 8 Cores
16 GB RAM
$0.204/Hr
What’s Next
M6g Next Generation Graviton
2x performance
suited for demanding loads consistent performance
Load Tests (2019) A1 M6g
RPS 3500 6900 (~2x)
CPU 70% 64%
Memory 80% 56%
What’s Next
Our Journey
Vendor
Technology provider
Collaboration
Partnership
Summary
• AWS Graviton2 processors deliver 7x performance, 4x compute cores, and 5x faster memory versus first gen Graviton processors
• New C6g, M6g, and R6g instances (and disk variants) deliver up to 40%
better price/perf vs comparable instances
• M6g instances are now in preview. Graviton2 instances will be GA in the coming months.
• Learn more and sign up for preview:
https://aws.amazon.com/ec2/instance-types/m6
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Sudhir Raman
[email protected] Ali Saidi
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.