Berlin 2015
Storage, Backup and Disaster Recovery in the Cloud
AWS Customer Case Study: HERE "Maps for Life"
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Robert Schmid, Storage Business Development, AWS
Ali Abbas, Principal Architect, HERE
What we will cover in this session
• Amazon storage options
• Amazon Elastic File System
• Use cases (Backup, Archive, DR)
• Customer use case: HERE
102% year-over-year increase in data transfer to and from S3 (Q4 2014 vs. Q4 2013, not including Amazon use)
Amazon S3
$0.03 per GB-month ($360 per TB per year)
99.999999999% durability
Amazon Glacier
Low-cost archiving service
$0.01 per GB-month ($120 per TB per year)
99.999999999% durability
3–5 hour retrieval time
Amazon EBS
General Purpose (SSD): up to 16 TB, 10,000 IOPS, $0.10 per GB-month
Provisioned IOPS (SSD): up to 16 TB, 20,000 IOPS, $0.125 per GB-month plus $0.065 per provisioned IOPS
Storage Gateway
Your on-ramp to AWS cloud storage:
• Back up into S3
• Archive into Amazon Glacier (one way to script the archive step is sketched below)
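As a minimal illustration of the archive path, a bucket lifecycle rule can move backups from S3 into Glacier automatically. A boto3 sketch; the bucket name, prefix, and 30-day threshold are illustrative assumptions, not values from the talk:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/prefix: transition objects under backups/
# from S3 to Glacier 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-backups",
                "Filter": {"Prefix": "backups/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```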
Summary: AWS Storage Options
• Object Storage (S3, Glacier)
• Elastic Block Store (EBS)
• Storage Gateway (iSCSI, VTL)
Introducing Amazon Elastic File System for EC2 Instances
Pilot availability later this summer
What is EFS?
• Fully managed file system for EC2 instances
• Provides standard file system semantics (NFSv4)
• Grows and shrinks elastically to petabyte scale
• Delivers performance for a wide variety of workloads
• Highly available and durable
Simple. Elastic. Scalable.
Amazon Storage Use Cases: Backup, Archive, Disaster Recovery
[Diagram: customer and colocation data centers connect to the AWS cloud (S3, Glacier) over the Internet or AWS Direct Connect, via AWS Storage Gateway.]
AWS Customer Case Study
Ali Abbas, Principal Architect, HERE: Maps for Life
• High-resolution satellite imagery
• Predictive analytics / machine learning
ali.abbas@here.com
Maps for Life
Web and Mobile App
Offline Maps
Save the maps of your country or state on your phone.
Use your phone offline; explore anywhere without an internet connection.
Unified Route Planning: route alternatives, turn-by-turn navigation
Route Alternatives: step-by-step transit, turn-by-turn walk guidance
Urban Navigation
Collections; easy location sharing
Train schedules; traffic incidents
3D Maps
[Diagram: reality capture (satellite/aerial) → processing → delivery, serving enterprise businesses.]
99.99% availability, 99.999999999% durability
High throughput and good performance for most use cases
Good price-performance ratio
Billions of tiles
• Huge storage requirements due to high-resolution content across zoom levels
• Large numbers of small tiles to keep track of and deliver
• Exponential growth rate (today some billions, tomorrow some trillions)
• Increasing data-volume refresh rate
• Must maintain low-latency requirements and service-level agreements
Behind the curtain
• Specialized spatial file system to deliver tile imagery with sub-ms lookup time over the network
• Simple architecture with CDN caches and core sites (holding the full dataset)
• Remote sites had CDN-type caches with geospatial sharding placement algorithms
• Select cache regions sometimes suffered from intercontinental network latency due to non-optimized routing
[Diagram: core caches over a shared store (specialized spatial blob store) and a singleton store (specialized adaptive spatial blob store with a Mercator-based sharding layer).]
Given the success of S3 usage across HERE and the recent enhancements to the offering, we started to look at S3 to solve two main problems with one solution:
• Simplify the storage handling layer by removing storage compute from our architecture, simplifying operations.
• Reduce network latency from core data to our delivery instances by adding core-data presence in each region.
• Easy lifecycle management for recurring updates
• On-demand big-data storage (eases capacity planning)
• Easy pipeline integration with SQS/SNS for background jobs (see the sketch after this list)
• Good performance out of the box, but it did not fulfill our requirements: too much variation in response time, averaging 150–300 ms
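A minimal sketch of the kind of SQS wiring this refers to: S3 can publish object-created events to a queue that background workers poll. The bucket name, queue ARN, and account ID below are placeholders, not HERE's actual setup:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names: publish object-created events from the tile bucket
# to an SQS queue consumed by background processing jobs. The queue's
# access policy must allow S3 to send messages to it.
s3.put_bucket_notification_configuration(
    Bucket="tile-store",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:tile-ingest",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```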
Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
S3 load constraint + satellite
Keys are stored lexicographically across S3 partitions.
Example satellite tile IDs (z/x/y) and their quadkey representations:
15/18106/11272 → 302013232331232
15/18089/11275 → 302013232321201
17/72409/45094 → 30201323233033003
Each zoom level has 4^z tiles; a quadkey's length equals the level of detail (zoom level) of the corresponding tile.
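For illustration, a small Python sketch of the z/x/y-to-quadkey conversion described above. The slide's examples only reproduce if the tile y coordinate is counted from the bottom of the grid (TMS-style), so the flip below is an assumption inferred from those examples, not a documented part of HERE's scheme:

```python
def tile_to_quadkey(z, x, y):
    """Interleave the bits of x and y into a base-4 quadkey of length z."""
    y = (1 << z) - 1 - y              # assumed bottom-origin y, flipped here
    digits = []
    for i in range(z, 0, -1):         # one quadkey digit per zoom level
        mask = 1 << (i - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)

# The three examples from the slide:
assert tile_to_quadkey(15, 18106, 11272) == "302013232331232"
assert tile_to_quadkey(15, 18089, 11275) == "302013232321201"
assert tile_to_quadkey(17, 72409, 45094) == "30201323233033003"
```

Because nearby tiles share long quadkey prefixes, their keys cluster lexicographically, which is exactly what concentrates load on a single S3 index partition.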
S3 load constraint + satellite
Keys are stored lexicographically across S3 partitions.
Alternative to quadkeys: use a random hash and increase the base of the key encoding (sketched below).
Remaining problem: at satellite scale, the ratio of requests relative to the lexicographic overlap that a random hash still produces was significant and would not scale well. Performance was still unacceptable in light of our requirements, and billions of PUT requests would considerably increase recurring-update costs.
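A minimal sketch of the hashed-key idea, following the AWS guidance quoted earlier; the hash choice, prefix length, and key layout are illustrative assumptions:

```python
import hashlib

def hashed_key(tile_id):
    """Prepend a short hash so keys stop sorting lexicographically by
    tile, spreading load across S3's index partitions. Illustrative
    only; not HERE's actual key scheme."""
    prefix = hashlib.md5(tile_id.encode("ascii")).hexdigest()[:4]
    return prefix + "/" + tile_id

print(hashed_key("15/18106/11272"))  # "<4 hex chars>/15/18106/11272"
```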
S3 load constraint + satellite
Keys are stored lexicographically across S3 partitions.
Better solution: reduce the number of files by packing tiles into binary blobs on S3, index the tiles inside the blobs, and use HTTP range requests for access (see the sketch below).
New challenge: managing updates became more complicated, more logic is required to distribute tiles inside the blobs, and, more importantly, the predicted index size was on the order of terabytes and growing: cost and complexity overhead.
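A minimal boto3 sketch of reading one tile out of a packed blob with an HTTP range request; the bucket, key, and the idea that the index supplies (offset, length) pairs are assumptions for illustration:

```python
import boto3

s3 = boto3.client("s3")

def fetch_tile(bucket, blob_key, offset, length):
    """Read `length` bytes at `offset` from a packed tile blob on S3.
    The (offset, length) pair would come from the tile index."""
    resp = s3.get_object(
        Bucket=bucket,
        Key=blob_key,
        Range="bytes={}-{}".format(offset, offset + length - 1),
    )
    return resp["Body"].read()

# Hypothetical usage: a 42 KB tile stored at byte offset 1048576.
# data = fetch_tile("tile-blobs", "blobs/z15/shard-0001.bin", 1048576, 43008)
```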
New Pseudo-Quad Index (PQI)
• New compact O(1) data structure to work around the performance constraints of S3
• Minimizes the index size needed to keep track of tiles and random hashes
• 194.605% size reduction in comparison to generic optimized hash tables
• Reduces and bounds proximity regions, causing better dispersion in the n-gram load-split algorithm used by S3
• Simplified imagery updates; geometrical consistency across all S3 buckets
• Performance: S3 alone: ~150–300 ms
With S3 and PQI we have simplified our architecture.
[Diagram: PQI backend over S3, with only a tiny reference file to maintain.]
Impact on architecture and on the day-to-day operation of our services:
• Brings us geographically closer to our customers without compromising on design patterns to work around network latencies.
• Allows us to focus only on our core business and technologies while offloading storage operations to AWS.