Deep dive on new 6th generation Amazon EC2 instances powered by the AWS Graviton2 processor

(1)

(2)

Deep dive on new 6th generation Amazon EC2 instances powered by the AWS Graviton2

processor

C M P 3 2 2 - R

Sudhir Raman

Sr Product Manager EC2 AWS

Ali Saidi

Principal Engineer AWS

Sudhir Kakulla Vice President Nielsen

(3)

Agenda

Amazon Elastic Compute Cloud (Amazon EC2) A1 instances and AWS Graviton processor

Customer success stories and Arm Software ecosystem momentum Next-gen AWS Graviton2 processors: deep dive

New 6^th gen Amazon EC2 instances Customer walk-on: Nielsen

Key takeaways

(4)

Categories Capabilities Options

Broadest and deepest platform choice

General purpose Burstable

Compute intensive Memory intensive Storage (High I/O)

Dense storage GPU compute Graphics intensive

Elastic Block Store

Elastic Inference

270 +

for virtually every workload and business need Choice of processor

(AWS, Intel, AMD)

Fast processors

(up to 4.0 GHz)

High memory footprint

(up to 12 TiB)

Instance storage

(HDD and NVMe)

Accelerated computing

(GPUs and FPGA)

Networking

(up to 100 Gbps)

Bare Metal Size

(Nano to 32xlarge)

Innovating in the core platform

(5)

Broadest choice of processors

(6)

AWS Graviton processor (first gen)

Custom AWS silicon with 64-bit Arm Neoverse cores

Rapidly innovate, build, and iterate on behalf of customers

Targeted optimizations for cloud-native workloads

(7)

First instance powered by AWS Graviton processor

Optimized cost and performance for scale-out applications

Significant cost savings AWS Graviton Processor with 64-bit Arm Neoverse

cores and custom AWS silicon

Amazon EC2 A1

Run scale-out and Arm-based workloads in the cloud

(8)

Applications Configurations Availability

Scale-out workloads

Web tier

Containerized microservices Arm-based software development

6 instance sizes

Up to 16 vCPUs, 32GiB memory Up to 10 Gbps NW, 3.5 Gbps EBS

Bare metal options

Amazon EC2 A1 Instances

9 Regions

US (N. Virginia, Oregon, Ohio) EU (Ireland, Frankfurt)

APAC (Mumbai, Sydney, Tokyo, Singapore)

(9)

(10)

Customer success stories

over 250 million registered users 58 million downloads

A1 instances

40% cost savings

(11)

Customer success stories

Network Acceleration (Hybrid Infra)

Singapore Cluster N. Virginia Cluster

Tokyo Cluster

Chat Payment

Log DB

Games _proxy1 _proxy2

game1 game2 game3

manager

… …

… … … …

Southeast Asian User East Asian User EU&NA User

(12)

Customer success stories

Photo-serving tier on A1 instances PHP, Nginx, HAProxy

40% cost savings, seamless transition

Web traffic

Nginx, .NET core, Docker Significant savings

Data processing, market insights OpenJDK/Java

20% cost savings

K3s, Kubernetes for the edge on Arm CI infrastructure on A1 instances

No cross-compilation/emulation

(13)

Arm software ecosystem for Graviton instances

Tools and software

AWS

Marketplace AWS Systems

Manager Amazon

CloudWatch

AWS CodePipeline AWS Cloud9

AWS CodeCommit

Amazon

Inspector AWS Batch

AWS CodeBuild

Amazon Corretto OpenJDK

OSVs and ISVs

Amazon Linux 2

Red Hat Enterprise Linux 7.6 and newer

SUSE Linux Enterprise Server 15

+ Fedora, Debian, FreeBSD and more…

NGINX+ for Arm64 in AWS Marketplace

VMware ESXi + bare metal

Cadence Liberate Trio Characterization Suite Mentor's Questa Sim Supports Arm 64-Bit

Ubuntu 16.04LTS, 18.04LTS and newer

Containers

Amazon Elastic Container Service

(Amazon ECS)

Amazon Elastic Kubernetes Service

(Amazon EKS)

Docker Desktop Community and Docker Enterprise Engine

Micro VMs

(14)

Datadog announces GA Agent for Graviton / Arm

• Datadog is the monitoring and analytics platform for

developers, operations, and

business users in the cloud age

• The Datadog Agent is now

generally available for Graviton / Arm instances

• Users can visualize, monitor, and alert on metrics, traces, and logs from these new instance types

(15)

CrowdStrike with AWS Graviton

PoC Partner Increased demand

for Arm64

Workload Protection EC2 A1 and other EC2

workloads

Ease of Deployment No reboots, scans Visibility

AWS environments, accounts and instances

Asset inventory Contextual metadata

Interested? Visit Booth #3134 to learn more

Email [email protected] to become a beta customer

(16)

Workload optimization journey

I/O Accelerator Card Nitro Card AWS Graviton, Inferentia

(17)

(18)

AWS Graviton2 processor

AWS Graviton2 processor

4X 5X 7X

(19)

6 New instances powered by AWS Graviton2 Processor

General Purpose

4GB DRAM/vCPU

M6g M6gd

Compute Optimized

2GB DRAM/vCPU

C6g C6gd

Memory Optimized

8GB DRAM/vCPU

R6g R6gd

(20)

(21)

Leap to Graviton2

AWS Graviton processor

• First Arm processor in AWS

• First-class citizen in EC2

• 16nm

• ~5 Billion transistors

AWS Graviton2 processor

• 4x the vCPUs

• 7x CPU performance

• ~2x performance/vCPU

• 7nm

• ~30 Billion transistors

(22)

Graviton2 - Cores

• Arm Neoverse N1 cores

• Arm v8.2 compliant

• Worked closely with Arm on creation of N1

• Large 64KB L1 caches and 1MB L2 cache/vCPU

• Coherent Instruction cache

• Lower overheads of interrupts, virtualization, and context switching

• 4-wide front-end, with 8-wide dispatch/issue

• Dual-SIMD units

• Instructions to accelerate ML inference: int8, fp16

• Every vCPU is a physical core

• No simultaneous multithreading (SMT)

(23)

Graviton2 - Interconnect

• 64 cores connected together with a mesh

• ~2TB/s bisection bandwidth

• 32MB LLC

• With the private caches over 100MB of user-accessible caches

• No NUMA concerns

• Every core sees the same path to memory and to other cores

• 64 lanes of PCIe gen4

• Provide flexibility for different instance configurations

(24)

Graviton2 - System

• 8x DDR4-3200 channels  over 200GB/s

• Always AES-256 encrypted DRAM with ephemeral key

• Uniform memory latency from all CPU cores

• 1Tbit/s of compression accelerators

• 2xlarge and larger instances will have a compression device

• DPDK and Linux kernel drivers will be available ahead of GA

• Data compression at up to 15GB/s and decompression at up to 11GB/s

(25)

Graviton2 powered instances

Nitro Hypervisor Nitro Card

Nitro Security Chip Graviton2

Lightweight hypervisor Memory and CPU allocation Bare Metal-like performance Amazon Elastic Block Store,

Elastic Network Adapter Monitoring, and security Integrated into motherboard

Protects hardware resources Industry leading

performance

(26)

Broadening workloads and target applications

(27)

Amazon EC2 M6g: sizes and specifications

Instance vCPUs Memory (GB)

Network Bandwidth

(Gbps)

EBS

Optimized EBS Bandwidth (Mbps)

EBS Optimized Burst Bandwidth

(Mbps)

m6g.medium 1 4 Up to 10 Yes 315 4,750

m6g.large 2 8 Up to 10 Yes 630 4,750

m6g.xlarge 4 16 Up to 10 Yes 1,188 4,750

m6g.2xlarge 8 32 Up to 10 Yes 2,375 4,750

m6g.4xlarge 16 64 Up to 10 Yes 4,750 4,750

m6g.8xlarge 32 128 12Gbps Yes 9,000 9,000

(28)

(29)

SPEC cpu2017

• Industry standard CPU intensive benchmark

• Run on all vCPUs concurrently

• Comparing performance/vCPU

* All SPEC scores estimates, compiled with GCC9 -O3 -march=native, run on largest single socket size for each instance type tested.

40%

60%

80%

100%

120%

140%

160%

SPECint2017 Rate SPECfp2017 rate

Performance/vCPU

SPECcpu2017 Rate*

M5 M6G

(30)

Load Balancing with Nginx

40%

60%

80%

100%

120%

140%

160%

Performance (Requests/s)

M5 M6G

Load Balancer

(nginx)

NGINX v1.15.9, 512 clients, 128 GET/POST payloads, all HTTPS connections, AES128-GCM-SHA256, OpenSSL 1.1.1, 4 target machines, all tests run on 4xl size; load generator c5.9xl; web servers c5.4xls;

All servers run in a cluster placement group

(31)

Memcached

40%

60%

80%

100%

120%

140%

160%

Performance (Requests/s)

Throughput

M5 M6G

Memcached

Memcached v1.5.16, 16B keys, 128B values, 7.8M KV-pairs, 576 connections for load generation from 2x c5.9xlarge instances, 16 additional connections used to measure latency from 1 additional c5.9xlarge,;each connection maintains 4096 outstanding requests; All servers in a cluster placement group

(32)

Memcached

Memcached v1.5.16, 16B keys, 128B values, 7.8M KV-pairs, 576 connections for load generation from 2x c5.9xlarge instances, 16 additional connections used to measure latency from 1 additional c5.9xlarge, each connection maintains 4 outstanding requests; all servers in a cluster placement group

0 100 200 300 400 500 600 700 800 900 1000

0 100 200 300 400 500 600 700 800

99th Percentile Latency (μs)

Throughput (KRPS)

m5.2xlarge m6g.2xlarge

(33)

Media Encoding with x264

• Huge amount of video created daily

• Encoding it reduces bandwidth to deliver and storage of that video

• Using libx264 encoder encoded uncompressed 1080p to h264

40%

60%

80%

100%

120%

140%

160%

Performance (Frames/s)

M5 M6G

(34)

Machine Learning

• BERT: The state-of-the-art text encoder

• Feature representation (encoding) is crucial for ML

• Deep neural network for text feature representation

• Graviton2 has fp16 and int8 support to accelerate Machine Learning workloads

• M6g can outperform M5

• M5 with AVX-512 is limited to FP32

• With FP16 support M6g performs better for CPU based inference

40%

60%

80%

100%

120%

140%

160%

Improvement (Inference Latency)

M5 M6G

(35)

EDA Performance – Arm and Cadence Xcellium

• Chip development is expensive

• Design needs to be right the first time

• Arm uses millions of hours of CPU time per month to simulate their processor designs

• Demand is highly variable depending on phase of the project

• Perfect use-case for Amazon EC2

• Simulated Arm Cortex-A53 using Cadence Xcellium

• 570 validation simulations on the the DPU RTL of the processor

0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00

Runtime (hours)

Total Runtime

(36)

EDA Performance – Arm and Cadence Xcellium

• Chip development is expensive

• Design needs to be right the first time

• Arm uses millions of hours of CPU time per month to simulate their processor designs

• Demand is highly variable depending on phase of the project

• Perfect use-case for Amazon EC2

• Simulated Arm Cortex-A53 using Cadence Xcellium

• 570 validation simulations on the the DPU RTL of the processor

40%

60%

80%

100%

120%

140%

160%

Performance/vCPU

AWS Graviton delivers significantly better performance per vCPU for EDA

(37)

How about a database workload?

• Scylla is a high-throughput, low latency Big Data database

• Thread-per-core architecture guarantees full CPU utilization

• I/O Scheduler guarantees peak I/O throughput

• Can easily reach AWS I3’s 15GB/s bandwidth limit

• Can you use Scylla on A1 instance?

• Yes. It works, and it is supported

• But doesn’t reach peak performance

• M6g instances change the game

• All CPU and memory-bound workloads are supported

• 4xlarge  37.5k reads/s/core; 5x improvement over A1!

• 64 vCPU theoretical limit around 2.4M reads/s

• Amazon EBS is still instance storage

(38)

M6gd completes the story

With a preview version of M6gd

Over 5GB/s of disk bandwidth Close to 1M IOPS

Enough for demanding storage workloads

Scylla welcomes the M6g series: ready for NoSQL

(39)

Lower TCO with Graviton2 powered instances

20% lower cost vs M5 Up to 40% ^better

price/performance vs comparable instances Highest performance in their

instance families

(40)

AWS Services Supporting Multiple Architectures

Offering M6g soon Transitioning to Graviton2

Amazon ElastiCache

Amazon EMR

Elastic Load Balancing

More coming soon!

(41)

(42)

Agenda

About Nielsen

Digital Collections AWS Migration Collections ROI Problem

The Graviton Chapter What’s Next

(43)

About Nielsen

TV Measurement

Our TV Ratings presence Measuring for 90 years

Media is traded based on Nielsen ratings

Provide independent 3rd party measurement

Digital Measurement

Measure how people consume media outside TV Measure across screens, Phones, Smart TV, etc Measure how people interact with digital media Provide insights on how consumers buy media

Cable Drama Episode

Gross Avg. Audience Projection

Cross-platform viewership totals that illustrate the full picture

35%

1%

35%

2%

8%

5%

3%

6%

(44)

Digital Collections AWS Migration

• What

Collects 10+ Billion requests/day.

Collects 2.5 TB of data/day Currency Critical Platform

24/7 platform with zero downtime

• How

Successfully migrated in 2 months

Fully Automated Blue-Green deployments Dynamic Scaling and RI’s

Immutable Infrastructure

• Why

Large dynamic loads

Controlled blast radius via separate accounts Auto recovery & self healing

Support fast product launches

• Results

Rock solid for 1.5 years

SQS + S3 reduced latency from 120 to 20 ms New region rollouts in less than 2 hours

Agility to support new business initiatives

(45)

Collections AWS Migration

(46)

Collections ROI Problem

Challenge

Underutilized M5 fleets.

Less than 35% CPU utilization

Less than 57% Memory utilization Less than optimal ROI

2xLarge 8 Cores

32 GB RAM

$0.384/Hr

(47)

The Graviton Chapter

2018 re:Invent

A1 is born

46% cheaper than M5 ARM based platform

Ideal for scaleout architectures

2xLarge 8 Cores

16 GB RAM

$0.204/Hr

2 Month Dash

Ported 3rd party monitoring software to A1

Optimized app architecture to run under 13 GB Tested all our memory optimizations

Performed 24/7 dynamic load tests Adopted A1 where available

(48)

The Graviton Chapter

Highlights

8 months in production, rock solid performance Running in 4 regions

Fleet of 120 servers (RI & OnDemand)

60% CPU utilization, significant improvement over M5 Achieved substantial cost efficiency

Load Tests (2018) M5 (20ms) A1 (20ms)

Price $0.384/Hr $0.204/Hr

RPS 2200 1800

CPU 30% 70%

Memory 56% 80%

2xLarge 8 Cores

16 GB RAM

$0.204/Hr

(49)

What’s Next

M6g Next Generation Graviton

2x performance

suited for demanding loads consistent performance

Load Tests (2019) A1 M6g

RPS 3500 6900 (~2x)

CPU 70% 64%

Memory 80% 56%

(50)

What’s Next

Our Journey

Vendor

Technology provider

Collaboration

Partnership

(51)

Summary

• AWS Graviton2 processors deliver 7x performance, 4x compute cores, and 5x faster memory versus first gen Graviton processors

• New C6g, M6g, and R6g instances (and disk variants) deliver up to 40%

better price/perf vs comparable instances

• M6g instances are now in preview. Graviton2 instances will be GA in the coming months.

• Learn more and sign up for preview:

https://aws.amazon.com/ec2/instance-types/m6

(52)

Thank you!

Sudhir Raman

[email protected] Ali Saidi

[email protected]

(53)