Berlin 2015
Storage, Backup and Disaster Recovery in the Cloud
AWS Customer Case Study: HERE "Maps for Life"
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Robert Schmid, Storage Business Development, AWS
Ali Abbas, Principal Architect, HERE
What we will cover in this session
• Amazon storage options
• Amazon Elastic File System
• Use cases (Backup, Archive, DR)
• Customer use case: HERE
102% year-over-year increase in data transfer to and from S3 (Q4 2014 vs. Q4 2013, not including Amazon use)
Amazon S3
$0.03 per GB-month ($360 per TB per year)
99.999999999% durability
Amazon Glacier
Low-cost archiving service
$0.01 per GB-month ($120 per TB per year)
99.999999999% durability
3–5 hour retrieval time
Amazon EBS
General Purpose (SSD): up to 16 TB, 10,000 IOPS, $0.10 per GB-month
Provisioned IOPS (SSD): up to 16 TB, 20,000 IOPS, $0.125 per GB-month plus $0.065 per provisioned IOPS
Storage Gateway
Your on-ramp to AWS cloud storage:
• Back up into S3
• Archive into Amazon Glacier (one way to script the archive step is sketched below)
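As a minimal illustration of the archive path, a bucket lifecycle rule can move backups from S3 into Glacier automatically. A boto3 sketch; the bucket name, prefix, and 30-day threshold are illustrative assumptions, not values from the talk:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/prefix: transition objects under backups/
# from S3 to Glacier 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-backups",
                "Filter": {"Prefix": "backups/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```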
Summary: AWS Storage Options
• Object Storage (S3, Glacier)
• Elastic Block Store (EBS)
• Storage Gateway (iSCSI, VTL)
Introducing Amazon Elastic File System for EC2 Instances
Pilot availability later this summer
What is EFS?
• Fully managed file system for EC2 instances
• Provides standard file system semantics (NFSv4)
• Grows and shrinks elastically to petabyte scale
• Delivers performance for a wide variety of workloads
• Highly available and durable
Simple. Elastic. Scalable.
Amazon Storage Use Cases: Backup, Archive, Disaster Recovery
[Diagram: customer and colocation data centers connect to the AWS cloud (S3, Glacier) over the Internet or AWS Direct Connect, via AWS Storage Gateway.]
AWS Customer Case Study
Ali Abbas, Principal Architect, HERE: Maps for Life
• High-resolution satellite imagery
• Predictive analytics / machine learning
ali.abbas@here.com
Maps for Life
Web and Mobile App
Offline Maps
Save the maps of your country or state on your phone.
Use your phone offline; explore anywhere without an internet connection.
Unified Route Planning: route alternatives, turn-by-turn navigation
Route Alternatives: step-by-step transit, turn-by-turn walk guidance
Urban Navigation
Collections; easy location sharing
Train schedules; traffic incidents
3D Maps
[Diagram: reality capture (satellite/aerial) → processing → delivery, serving enterprise businesses.]
99.99% availability, 99.999999999% durability
High throughput and good performance for most use cases
Good price-performance ratio
Billions of tiles
• Huge storage requirements due to high-resolution content across zoom levels
• Large numbers of small tiles to keep track of and deliver
• Exponential growth rate (today some billions, tomorrow some trillions)
• Increasing data-volume refresh rate
• Must maintain low-latency requirements and service-level agreements
Behind the curtain
• Specialized spatial file system to deliver tile imagery with sub-ms lookup time over the network
• Simple architecture with CDN caches and core sites (holding the full dataset)
• Remote sites had CDN-type caches with geospatial sharding placement algorithms
• Select cache regions sometimes suffered from intercontinental network latency due to non-optimized routing
[Diagram: core caches over a shared store (specialized spatial blob store) and a singleton store (specialized adaptive spatial blob store with a Mercator-based sharding layer).]
Given the success of S3 usage across HERE and the recent enhancements to the offering, we started to look at S3 to solve two main problems with one solution:
• Simplify the storage handling layer by removing storage compute from our architecture, simplifying operations.
• Reduce network latency from core data to our delivery instances by adding core-data presence in each region.
• Easy lifecycle management for recurring updates
• On-demand big-data storage (eases capacity planning)
• Easy pipeline integration with SQS/SNS for background jobs (see the sketch after this list)
• Good performance out of the box, but it did not fulfill our requirements: too much variation in response time, averaging 150–300 ms
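A minimal sketch of the kind of SQS wiring this refers to: S3 can publish object-created events to a queue that background workers poll. The bucket name, queue ARN, and account ID below are placeholders, not HERE's actual setup:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names: publish object-created events from the tile bucket
# to an SQS queue consumed by background processing jobs. The queue's
# access policy must allow S3 to send messages to it.
s3.put_bucket_notification_configuration(
    Bucket="tile-store",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:tile-ingest",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```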
Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
S3 load constraint + satellite
Keys are stored lexicographically across S3 partitions.
Example satellite tile IDs (z/x/y) and their quadkey representations:
15/18106/11272 → 302013232331232
15/18089/11275 → 302013232321201
17/72409/45094 → 30201323233033003
Each zoom level has 4^z tiles; a quadkey's length equals the level of detail (zoom level) of the corresponding tile.
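For illustration, a small Python sketch of the z/x/y-to-quadkey conversion described above. The slide's examples only reproduce if the tile y coordinate is counted from the bottom of the grid (TMS-style), so the flip below is an assumption inferred from those examples, not a documented part of HERE's scheme:

```python
def tile_to_quadkey(z, x, y):
    """Interleave the bits of x and y into a base-4 quadkey of length z."""
    y = (1 << z) - 1 - y              # assumed bottom-origin y, flipped here
    digits = []
    for i in range(z, 0, -1):         # one quadkey digit per zoom level
        mask = 1 << (i - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)

# The three examples from the slide:
assert tile_to_quadkey(15, 18106, 11272) == "302013232331232"
assert tile_to_quadkey(15, 18089, 11275) == "302013232321201"
assert tile_to_quadkey(17, 72409, 45094) == "30201323233033003"
```

Because nearby tiles share long quadkey prefixes, their keys cluster lexicographically, which is exactly what concentrates load on a single S3 index partition.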
S3 load constraint + satellite
Keys are stored lexicographically across S3 partitions.
Alternative to quadkeys: use a random hash and increase the base of the key encoding (sketched below).
Remaining problem: at satellite scale, the ratio of requests relative to the lexicographic overlap that a random hash still produces was significant and would not scale well. Performance was still unacceptable in light of our requirements, and billions of PUT requests would considerably increase recurring-update costs.
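A minimal sketch of the hashed-key idea, following the AWS guidance quoted earlier; the hash choice, prefix length, and key layout are illustrative assumptions:

```python
import hashlib

def hashed_key(tile_id):
    """Prepend a short hash so keys stop sorting lexicographically by
    tile, spreading load across S3's index partitions. Illustrative
    only; not HERE's actual key scheme."""
    prefix = hashlib.md5(tile_id.encode("ascii")).hexdigest()[:4]
    return prefix + "/" + tile_id

print(hashed_key("15/18106/11272"))  # "<4 hex chars>/15/18106/11272"
```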
S3 load constraint + satellite
Keys are stored lexicographically across S3 partitions.
Better solution: reduce the number of files by packing tiles into binary blobs on S3, index the tiles inside the blobs, and use HTTP range requests for access (see the sketch below).
New challenge: managing updates became more complicated, more logic is required to distribute tiles inside the blobs, and, more importantly, the predicted index size was on the order of terabytes and growing: cost and complexity overhead.
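A minimal boto3 sketch of reading one tile out of a packed blob with an HTTP range request; the bucket, key, and the idea that the index supplies (offset, length) pairs are assumptions for illustration:

```python
import boto3

s3 = boto3.client("s3")

def fetch_tile(bucket, blob_key, offset, length):
    """Read `length` bytes at `offset` from a packed tile blob on S3.
    The (offset, length) pair would come from the tile index."""
    resp = s3.get_object(
        Bucket=bucket,
        Key=blob_key,
        Range="bytes={}-{}".format(offset, offset + length - 1),
    )
    return resp["Body"].read()

# Hypothetical usage: a 42 KB tile stored at byte offset 1048576.
# data = fetch_tile("tile-blobs", "blobs/z15/shard-0001.bin", 1048576, 43008)
```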
New Pseudo-Quad Index (PQI)
• New compact O(1) data structure to work around the performance constraints of S3
• Minimizes the index size needed to keep track of tiles and random hashes
• 194.605% size reduction in comparison to generic optimized hash tables
• Reduces and bounds proximity regions, causing better dispersion in the n-gram load-split algorithm used by S3
• Simplified imagery updates; geometrical consistency across all S3 buckets
• Performance: S3 alone: ~150–300 ms
With S3 and PQI we have simplified our architecture.
[Diagram: PQI backend over S3, with only a tiny reference file to maintain.]
Impact on architecture and on the day-to-day operation of our services:
• Brings us geographically closer to our customers without compromising on design patterns to work around network latencies.
• Allows us to focus only on our core business and technologies while offloading storage operations to AWS.