• No results found

Deep dive on new 6th generation Amazon EC2 instances powered by the AWS Graviton2 processor

N/A
N/A
Protected

Academic year: 2021

Share "Deep dive on new 6th generation Amazon EC2 instances powered by the AWS Graviton2 processor"

Copied!
53
0
0

Loading.... (view fulltext now)

Full text

(1)
(2)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Deep dive on new 6th generation Amazon EC2 instances powered by the AWS Graviton2

processor

C M P 3 2 2 - R

Sudhir Raman

Sr Product Manager EC2 AWS

Ali Saidi

Principal Engineer AWS

Sudhir Kakulla Vice President Nielsen

(3)

Agenda

Amazon Elastic Compute Cloud (Amazon EC2) A1 instances and AWS Graviton processor

Customer success stories and Arm Software ecosystem momentum Next-gen AWS Graviton2 processors: deep dive

New 6th gen Amazon EC2 instances Customer walk-on: Nielsen

Key takeaways

(4)

Categories Capabilities Options

Broadest and deepest platform choice

General purpose Burstable

Compute intensive Memory intensive Storage (High I/O)

Dense storage GPU compute Graphics intensive

Elastic Block Store

Elastic Inference

270 +

for virtually every workload and business need Choice of processor

(AWS, Intel, AMD)

Fast processors

(up to 4.0 GHz)

High memory footprint

(up to 12 TiB)

Instance storage

(HDD and NVMe)

Accelerated computing

(GPUs and FPGA)

Networking

(up to 100 Gbps)

Bare Metal Size

(Nano to 32xlarge)

Innovating in the core platform

(5)

Broadest choice of processors

(6)

AWS Graviton processor (first gen)

Custom AWS silicon with 64-bit Arm Neoverse cores

Rapidly innovate, build, and iterate on behalf of customers

Targeted optimizations for cloud-native workloads

(7)

First instance powered by AWS Graviton processor

Optimized cost and performance for scale-out applications

Significant cost savings AWS Graviton Processor with 64-bit Arm Neoverse

cores and custom AWS silicon

Amazon EC2 A1

Run scale-out and Arm-based workloads in the cloud

(8)

Applications Configurations Availability

Scale-out workloads

Web tier

Containerized microservices Arm-based software development

6 instance sizes

Up to 16 vCPUs, 32GiB memory Up to 10 Gbps NW, 3.5 Gbps EBS

Bare metal options

Amazon EC2 A1 Instances

9 Regions

US (N. Virginia, Oregon, Ohio) EU (Ireland, Frankfurt)

APAC (Mumbai, Sydney, Tokyo, Singapore)

(9)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

(10)

Customer success stories

over 250 million registered users 58 million downloads

A1 instances

40% cost savings

(11)

Customer success stories

Network Acceleration (Hybrid Infra)

Singapore Cluster N. Virginia Cluster

Tokyo Cluster

Chat Payment

Log DB

Games proxy1 proxy2

game1 game2 game3

manager

… …

… … … …

Southeast Asian User East Asian User EU&NA User

(12)

Customer success stories

Photo-serving tier on A1 instances PHP, Nginx, HAProxy

40% cost savings, seamless transition

Web traffic

Nginx, .NET core, Docker Significant savings

Data processing, market insights OpenJDK/Java

20% cost savings

K3s, Kubernetes for the edge on Arm CI infrastructure on A1 instances

No cross-compilation/emulation

(13)

Arm software ecosystem for Graviton instances

Tools and software

AWS

Marketplace AWS Systems

Manager Amazon

CloudWatch

AWS CodePipeline AWS Cloud9

AWS CodeCommit

Amazon

Inspector AWS Batch

AWS CodeBuild

Amazon Corretto OpenJDK

OSVs and ISVs

Amazon Linux 2

Red Hat Enterprise Linux 7.6 and newer

SUSE Linux Enterprise Server 15

+ Fedora, Debian, FreeBSD and more…

NGINX+ for Arm64 in AWS Marketplace

VMware ESXi + bare metal

Cadence Liberate Trio Characterization Suite Mentor's Questa Sim Supports Arm 64-Bit

Ubuntu 16.04LTS, 18.04LTS and newer

Containers

Amazon Elastic Container Service

(Amazon ECS)

Amazon Elastic Kubernetes Service

(Amazon EKS)

Docker Desktop Community and Docker Enterprise Engine

Micro VMs

(14)

Datadog announces GA Agent for Graviton / Arm

Datadog is the monitoring and analytics platform for

developers, operations, and

business users in the cloud age

The Datadog Agent is now

generally available for Graviton / Arm instances

Users can visualize, monitor, and alert on metrics, traces, and logs from these new instance types

(15)

CrowdStrike with AWS Graviton

PoC Partner Increased demand

for Arm64

Workload Protection EC2 A1 and other EC2

workloads

Ease of Deployment No reboots, scans Visibility

AWS environments, accounts and instances

Asset inventory Contextual metadata

Interested? Visit Booth #3134 to learn more

Email [email protected] to become a beta customer

(16)

Workload optimization journey

I/O Accelerator Card Nitro Card AWS Graviton, Inferentia

(17)
(18)

AWS Graviton2 processor

AWS Graviton2 processor

4X 5X 7X

(19)

6 New instances powered by AWS Graviton2 Processor

General Purpose

4GB DRAM/vCPU

M6g M6gd

Compute Optimized

2GB DRAM/vCPU

C6g C6gd

Memory Optimized

8GB DRAM/vCPU

R6g R6gd

(20)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

(21)

Leap to Graviton2

AWS Graviton processor

First Arm processor in AWS

First-class citizen in EC2

16nm

~5 Billion transistors

AWS Graviton2 processor

4x the vCPUs

7x CPU performance

~2x performance/vCPU

7nm

~30 Billion transistors

(22)

Graviton2 - Cores

Arm Neoverse N1 cores

Arm v8.2 compliant

Worked closely with Arm on creation of N1

Large 64KB L1 caches and 1MB L2 cache/vCPU

Coherent Instruction cache

Lower overheads of interrupts, virtualization, and context switching

4-wide front-end, with 8-wide dispatch/issue

Dual-SIMD units

Instructions to accelerate ML inference: int8, fp16

Every vCPU is a physical core

No simultaneous multithreading (SMT)

(23)

Graviton2 - Interconnect

64 cores connected together with a mesh

~2TB/s bisection bandwidth

32MB LLC

With the private caches over 100MB of user-accessible caches

No NUMA concerns

Every core sees the same path to memory and to other cores

64 lanes of PCIe gen4

Provide flexibility for different instance configurations

(24)

Graviton2 - System

8x DDR4-3200 channels  over 200GB/s

Always AES-256 encrypted DRAM with ephemeral key

Uniform memory latency from all CPU cores

1Tbit/s of compression accelerators

2xlarge and larger instances will have a compression device

DPDK and Linux kernel drivers will be available ahead of GA

Data compression at up to 15GB/s and decompression at up to 11GB/s

(25)

Graviton2 powered instances

Nitro Hypervisor Nitro Card

Nitro Security Chip Graviton2

Lightweight hypervisor Memory and CPU allocation Bare Metal-like performance Amazon Elastic Block Store,

Elastic Network Adapter Monitoring, and security Integrated into motherboard

Protects hardware resources Industry leading

performance

(26)

Broadening workloads and target applications

(27)

Amazon EC2 M6g: sizes and specifications

Instance vCPUs Memory (GB)

Network Bandwidth

(Gbps)

EBS

Optimized EBS Bandwidth (Mbps)

EBS Optimized Burst Bandwidth

(Mbps)

m6g.medium 1 4 Up to 10 Yes 315 4,750

m6g.large 2 8 Up to 10 Yes 630 4,750

m6g.xlarge 4 16 Up to 10 Yes 1,188 4,750

m6g.2xlarge 8 32 Up to 10 Yes 2,375 4,750

m6g.4xlarge 16 64 Up to 10 Yes 4,750 4,750

m6g.8xlarge 32 128 12Gbps Yes 9,000 9,000

m6g.12xlarge 48 192 20Gbps Yes 13,500 13,500

m6g.16xlarge 64 256 25Gbps Yes 18,000 18,000

(28)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

(29)

SPEC cpu2017

Industry standard CPU intensive benchmark

Run on all vCPUs concurrently

Comparing performance/vCPU

* All SPEC scores estimates, compiled with GCC9 -O3 -march=native, run on largest single socket size for each instance type tested.

40%

60%

80%

100%

120%

140%

160%

SPECint2017 Rate SPECfp2017 rate

Performance/vCPU

SPECcpu2017 Rate*

M5 M6G

(30)

Load Balancing with Nginx

40%

60%

80%

100%

120%

140%

160%

Performance (Requests/s)

M5 M6G

Load Balancer

(nginx)

NGINX v1.15.9, 512 clients, 128 GET/POST payloads, all HTTPS connections, AES128-GCM-SHA256, OpenSSL 1.1.1, 4 target machines, all tests run on 4xl size; load generator c5.9xl; web servers c5.4xls;

All servers run in a cluster placement group

(31)

Memcached

40%

60%

80%

100%

120%

140%

160%

Performance (Requests/s)

Throughput

M5 M6G

Memcached

Memcached v1.5.16, 16B keys, 128B values, 7.8M KV-pairs, 576 connections for load generation from 2x c5.9xlarge instances, 16 additional connections used to measure latency from 1 additional c5.9xlarge,;each connection maintains 4096 outstanding requests; All servers in a cluster placement group

(32)

Memcached

Memcached

Memcached v1.5.16, 16B keys, 128B values, 7.8M KV-pairs, 576 connections for load generation from 2x c5.9xlarge instances, 16 additional connections used to measure latency from 1 additional c5.9xlarge, each connection maintains 4 outstanding requests; all servers in a cluster placement group

0 100 200 300 400 500 600 700 800 900 1000

0 100 200 300 400 500 600 700 800

99th Percentile Latency (μs)

Throughput (KRPS)

m5.2xlarge m6g.2xlarge

(33)

Media Encoding with x264

Huge amount of video created daily

Encoding it reduces bandwidth to deliver and storage of that video

Using libx264 encoder encoded uncompressed 1080p to h264

40%

60%

80%

100%

120%

140%

160%

Performance (Frames/s)

M5 M6G

(34)

Machine Learning

BERT: The state-of-the-art text encoder

Feature representation (encoding) is crucial for ML

Deep neural network for text feature representation

Graviton2 has fp16 and int8 support to accelerate Machine Learning workloads

M6g can outperform M5

M5 with AVX-512 is limited to FP32

With FP16 support M6g performs better for CPU based inference

40%

60%

80%

100%

120%

140%

160%

Improvement (Inference Latency)

M5 M6G

(35)

EDA Performance – Arm and Cadence Xcellium

Chip development is expensive

Design needs to be right the first time

Arm uses millions of hours of CPU time per month to simulate their processor designs

Demand is highly variable depending on phase of the project

Perfect use-case for Amazon EC2

Simulated Arm Cortex-A53 using Cadence Xcellium

570 validation simulations on the the DPU RTL of the processor

0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00

Runtime (hours)

Total Runtime

m5.24xlarge m6g.16xlarge

(36)

EDA Performance – Arm and Cadence Xcellium

Chip development is expensive

Design needs to be right the first time

Arm uses millions of hours of CPU time per month to simulate their processor designs

Demand is highly variable depending on phase of the project

Perfect use-case for Amazon EC2

Simulated Arm Cortex-A53 using Cadence Xcellium

570 validation simulations on the the DPU RTL of the processor

40%

60%

80%

100%

120%

140%

160%

Performance/vCPU

m5.24xlarge m6g.16xlarge

AWS Graviton delivers significantly better performance per vCPU for EDA

(37)

How about a database workload?

• Scylla is a high-throughput, low latency Big Data database

• Thread-per-core architecture guarantees full CPU utilization

• I/O Scheduler guarantees peak I/O throughput

• Can easily reach AWS I3’s 15GB/s bandwidth limit

• Can you use Scylla on A1 instance?

Yes. It works, and it is supported

But doesn’t reach peak performance

• M6g instances change the game

All CPU and memory-bound workloads are supported

4xlarge  37.5k reads/s/core; 5x improvement over A1!

64 vCPU theoretical limit around 2.4M reads/s

Amazon EBS is still instance storage

(38)

M6gd completes the story

With a preview version of M6gd

Over 5GB/s of disk bandwidth Close to 1M IOPS

Enough for demanding storage workloads

Scylla welcomes the M6g series: ready for NoSQL

(39)

Lower TCO with Graviton2 powered instances

20% lower cost vs M5 Up to 40% better

price/performance vs comparable instances Highest performance in their

instance families

(40)

AWS Services Supporting Multiple Architectures

Offering M6g soon Transitioning to Graviton2

Amazon ElastiCache

Amazon EMR

Elastic Load Balancing

More coming soon!

(41)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

(42)

Agenda

About Nielsen

Digital Collections AWS Migration Collections ROI Problem

The Graviton Chapter What’s Next

(43)

About Nielsen

TV Measurement

Our TV Ratings presence Measuring for 90 years

Media is traded based on Nielsen ratings

Provide independent 3rd party measurement

Digital Measurement

Measure how people consume media outside TV Measure across screens, Phones, Smart TV, etc Measure how people interact with digital media Provide insights on how consumers buy media

Cable Drama Episode

Gross Avg. Audience Projection

Cross-platform viewership totals that illustrate the full picture

35%

1%

35%

2%

8%

5%

5%

3%

6%

(44)

Digital Collections AWS Migration

What

Collects 10+ Billion requests/day.

Collects 2.5 TB of data/day Currency Critical Platform

24/7 platform with zero downtime

How

Successfully migrated in 2 months

Fully Automated Blue-Green deployments Dynamic Scaling and RI’s

Immutable Infrastructure

Why

Large dynamic loads

Controlled blast radius via separate accounts Auto recovery & self healing

Support fast product launches

Results

Rock solid for 1.5 years

SQS + S3 reduced latency from 120 to 20 ms New region rollouts in less than 2 hours

Agility to support new business initiatives

(45)

Collections AWS Migration

(46)

Collections ROI Problem

Challenge

Underutilized M5 fleets.

Less than 35% CPU utilization

Less than 57% Memory utilization Less than optimal ROI

2xLarge 8 Cores

32 GB RAM

$0.384/Hr

(47)

The Graviton Chapter

2018 re:Invent

A1 is born

46% cheaper than M5 ARM based platform

Ideal for scaleout architectures

2xLarge 8 Cores

16 GB RAM

$0.204/Hr

2 Month Dash

Ported 3rd party monitoring software to A1

Optimized app architecture to run under 13 GB Tested all our memory optimizations

Performed 24/7 dynamic load tests Adopted A1 where available

(48)

The Graviton Chapter

Highlights

8 months in production, rock solid performance Running in 4 regions

Fleet of 120 servers (RI & OnDemand)

60% CPU utilization, significant improvement over M5 Achieved substantial cost efficiency

Load Tests (2018) M5 (20ms) A1 (20ms)

Price $0.384/Hr $0.204/Hr

RPS 2200 1800

CPU 30% 70%

Memory 56% 80%

2xLarge 8 Cores

16 GB RAM

$0.204/Hr

(49)

What’s Next

M6g Next Generation Graviton

2x performance

suited for demanding loads consistent performance

Load Tests (2019) A1 M6g

RPS 3500 6900 (~2x)

CPU 70% 64%

Memory 80% 56%

(50)

What’s Next

Our Journey

Vendor

Technology provider

Collaboration

Partnership

(51)

Summary

AWS Graviton2 processors deliver 7x performance, 4x compute cores, and 5x faster memory versus first gen Graviton processors

New C6g, M6g, and R6g instances (and disk variants) deliver up to 40%

better price/perf vs comparable instances

M6g instances are now in preview. Graviton2 instances will be GA in the coming months.

Learn more and sign up for preview:

https://aws.amazon.com/ec2/instance-types/m6

(52)

Thank you!

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Sudhir Raman

[email protected] Ali Saidi

[email protected]

(53)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

References

Related documents

The AWS Toolkit for Visual Studio enables you to create and configure security groups to use with Amazon Elastic Compute Cloud (Amazon EC2) instances and AWS CloudFormation.. When

security, including Introduction to Amazon GuardDuty and Deep Dive on Container Security. Learn security with AWS Training

AWS’ compute service, the Elastic Compute Cloud (EC2) offers the most diverse offering splitting its instances into nine families with 38 instance types (with an additional 15

Using AWS CloudFormation you can leverage other services such as such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (Amazon EBS), Amazon Simple

Amazon Elastic Compute Cloud EC2  EC2 = Virtual Machine  Amazon EC2: on-demand compute power  Obtain and boot new server instances in minutes  Quickly scale capacity up or

Amazon Elastic Compute Cloud (Amazon EC2)

High Availability using multiple AWS Availability Zones within a region Failover of Amazon EC2 instances (Web/App) using Amazon Auto Scaling Scalability of Amazon EC2

High Availability using multiple AWS Availability Zones within a region Failover of Amazon EC2 instances (Web/App) using Amazon Auto Scaling Scalability of Amazon EC2