• No results found

Designing Apps for Amazon Web Services

N/A
N/A
Protected

Academic year: 2021

Share "Designing Apps for Amazon Web Services"

Copied!
112
0
0

Loading.... (view fulltext now)

Full text

(1)

Designing Apps for

Amazon Web

Services

Mathias Meyer, GOTO Aarhus 2011

(2)
(3)

Me

infrastructure

code

databases

@roidrage

www.paperplanes.de

Montag, 10. Oktober 11

(4)

The Cloud

Montag, 10. Oktober 11

(5)
(6)
(7)
(8)

10,000 feet

(9)

Amazon Web Services

(10)

EC2

(11)

On-demand computing

Montag, 10. Oktober 11

(12)

API

Montag, 10. Oktober 11

(13)

Pay as you go

(14)

Multiple regions

Montag, 10. Oktober 11

Geographically distributed, for all web services. US East, West, EU, Singapore, Japan

(15)

Multiple datacenters

Montag, 10. Oktober 11

Called availability zones.

At least two data centers in each region. Physically separated locations.

(16)

Different instance types

(17)

High CPU

vs.

High memory

Montag, 10. Oktober 11

(18)

1.7 GB

-68 GB

Montag, 10. Oktober 11

(19)

Elastic Block Store

Montag, 10. Oktober 11

Storage on instances is ephemeral.

Goes away when the instance goes away.

EBS allows persisting data for longer than an instance’s lifetime. Bound to a data center.

(20)

Mount to any instance

Montag, 10. Oktober 11

(21)

Snapshots

Montag, 10. Oktober 11

A point in time, atomic snapshot of an EBS volume.

(22)

More AWS Products

S3

CloudFront

CloudFormation

CloudWatch

RDS

Auto Scaling

SimpleDB

Route 53

Load Balancing

Queue Service

Notification Service

Elastic MapReduce

Montag, 10. Oktober 11

(23)

What’s Scalarium?

(24)

Automates:

Setup

Configuration

One-Click Deploy

Montag, 10. Oktober 11

(25)

...for EC2

...on EC2

Montag, 10. Oktober 11

Scalarium uses a couple of Amazon’s Web services EC2 is the most interesting one

(26)

Automation

Montag, 10. Oktober 11

Scalarium automates so that customers don’t have to. Most important part about deploying in the cloud.

(27)

The Dream:

Configure a cluster

Push a button

Boom!

Montag, 10. Oktober 11

(28)

Two ways...

(29)

Create image, boot it.

(30)

Build once, use forever

Montag, 10. Oktober 11

Point in time snapshot of a system, fully configured. App server, database, web server, etc.

(31)

How do you handle

updates?

Montag, 10. Oktober 11

Build a new image, install updates, discard old image. Lather, rinse, repeat, with every update.

(32)

Configure from scratch

(33)

Montag, 10. Oktober 11

Start with a blank slate.

(34)

Montag, 10. Oktober 11

Use a configuration management tool.

(35)

Configuration

+ Cookbooks/Manifests

+ Chef/Puppet/etc.

= Configured cluster

Montag, 10. Oktober 11

(36)

Configuration:

Chef Server

RightScale

JSON

Scalarium

Montag, 10. Oktober 11

There’s an infinite numbers of infrastructure service providers. Just as many ways to store cluster configuration

(37)

Montag, 10. Oktober 11

(38)

In the beginning...

(39)

Scalarium

Montag, 10. Oktober 11

Lasted for a few months on just one instance.

Instance ran RabbitMQ, CouchDB, Redis, background workers, web and application servers. Bootstrapped startup = start small, iterate quickly.

(40)

Montag, 10. Oktober 11

(41)
(42)
(43)

Automate,

automate,

automate!

Montag, 10. Oktober 11

Chef for everything.

Yes, our first isntance was not fully automated. Hypocratic. Vagrant is an excellent tool to test locally.

Scalarium customers can test their own and our cookbooks locally.

(44)

EC2 is not a traditional

datacenter

(45)

Montag, 10. Oktober 11

(46)

Montag, 10. Oktober 11

Still looks like this. But it’s transparent to you. No operational access to you.

(47)

Multi-tenant

Montag, 10. Oktober 11

Shared resources (CPU, memory, network).

(48)

High likelihood of failure

Montag, 10. Oktober 11

It’s still hardware that’s running your servers. Hardware fails.

(49)

Faulty instances

Montag, 10. Oktober 11

They fail, not all the time, but if you have high turnover scaling up and down, they’ll fail. Discard, boot new instance, done.

(50)

Datacenter outage

(51)

Network partition

(52)

More instances

=

Higher chance

of failure

(53)

MTBF

Montag, 10. Oktober 11

Mean Time Between Failures

On EC2 as a whole it’s pretty small. Not an important metric.

(54)

21/04/2011

Montag, 10. Oktober 11

US-East data centers become unavailable. Cascading failures in EBS storage.

(55)

Montag, 10. Oktober 11

(56)

Montag, 10. Oktober 11

(57)

Montag, 10. Oktober 11

(58)

Montag, 10. Oktober 11

(59)

Montag, 10. Oktober 11

(60)

7/8/2011

Montag, 10. Oktober 11

Downtime in EU data centers.

Lightning strike caused power outage.

Again, cascading failure in the EBS storage layer. More than three days ‘til full recovery.

(61)
(62)
(63)

Failure is a good thing

(64)

You can ignore it

(65)

Learn from it

(66)

Design for it

(67)

Don’t fear failure

(68)

Plan for failure

Montag, 10. Oktober 11

Failure becomes a part of your apps’ lifecycle.

(69)

Design for resilience

Montag, 10. Oktober 11

In case of failures, you app should handle them gracefully, not breaking along the way entirely. Serve statics instead of failure notices to the user.

(70)

Plan for recovery

Montag, 10. Oktober 11

Can you re-deploy your app into a different region quickly? If not, why not?

(71)

MTTR

(72)

Disaster recovery plan

Montag, 10. Oktober 11

What do you do when your site goes down?

What do you do when you need to restore data? Plan, verify, one click.

(73)

Multi-datacenter

deployments

Montag, 10. Oktober 11

Deploy App across multiple data centers/availability zones.

Make deploying to different data centers part of the deployment process. Staged deployments: new set of instance, flip load balancer.

(74)

Replication

Montag, 10. Oktober 11

Storage becomes a key part in handling failure.

Everything else is usually much easier to scale and distribute. Replicate data across availability zones, across regions.

(75)

Montag, 10. Oktober 11

Availability zones are geographically distributed

Reading from in between them means increased latency

Replication ensures data is in multiple geographic locations.

Replication allows to recover quickly by moving to different data centers. Not all databases do this well, but they do it

(76)

Multi-region

deployments

Montag, 10. Oktober 11

(77)

$$$

Montag, 10. Oktober 11

Deploying highly distributed is expensive. How distributed is up to your budget.

(78)

Relax consistency

requirements

Montag, 10. Oktober 11

Strong consistency increases need for full availability

(79)

Latency

Montag, 10. Oktober 11

Immediately became an issue when we scaled out.

Network latency adds to EBS latency and made for higher response times from the database. All network traffic on EC2 is firewalled, even internal traffic.

(80)

Keep data local

(81)
(82)

Keep data in memory

(83)

Cache is king

Montag, 10. Oktober 11

(84)

Use RAIDs for EBS

Montag, 10. Oktober 11

Increased performance, better network utilization

EBS performance is okay, but not great. Don’t expect SATA or SAS like performance. RAID 0, 5 or 10.

EBS is redundant, but extra reduncany with striping doesn’t hurt. More likely recovery when one EBS volume fails. RAIDs won’t save you from EBS unavailability.

(85)

Use local storage

Montag, 10. Oktober 11

Don’t use EBS at all.

Local storage requires redundancy.

Instance storage is lost when the instance is lost. RAID across local storage.

More reliable in terms of I/O than EBS.

(86)

Use bigger instances

Montag, 10. Oktober 11

The bigger your instance the less shared it is on the host. Bigger instances have higher I/O throughput.

(87)

What would I do

differently?

(88)

Small services

Montag, 10. Oktober 11

Small services can run independent of each other.

Small services are easy to deploy, easy to reconfigure (Chef).

Don’t have to know about all the other services upfront, leave that to CM tools. Layered system with small services allows failure handling on every layer.

(89)

Frontend vs. Small APIs

(90)

Fail fast

Montag, 10. Oktober 11

When components fail, don’t block waiting for them. Timeout quickly.

(91)

Retry

Montag, 10. Oktober 11

(92)

Don’t just

assume failure

(93)

Test failure

Montag, 10. Oktober 11

Shut off instances randomly, see what happens.

Turn on the firewall, adds network timeouts, see what happens.

(94)

Montag, 10. Oktober 11

(95)

“Think about your

software running.”

Theo Schlossnagle, OmniTI

(96)

Understand your code’s

breaking points

Montag, 10. Oktober 11

Use patterns like circuit breakers and bulkheads to reduce failure points Think about outcome and implications, not just features.

Understand your code’s breaking points and how they handle unavailability, timeouts, and the like. All these are so much more likely in a cloud environment.

(97)

Isn’t all that what you do

at large scale?

Montag, 10. Oktober 11

(98)

Cloud

==

Large scale

Montag, 10. Oktober 11

(99)

You’re a part of it

Montag, 10. Oktober 11

Prepare for the worst, plan for the worst.

(100)

Scalarium today

(101)

Scalarium

runs on

Scalarium

Montag, 10. Oktober 11

(102)

Montag, 10. Oktober 11

(103)
(104)
(105)

Lack of visibility

Montag, 10. Oktober 11

Takes up to an hour still to acknowledge problems.

Amazon is not good at admitting failure happens a lot on EC2.

(106)

Don’t fall for SLAs

(107)

Amazon only handles

infrastructure

(108)

How you build on it

is up to you

(109)

Fun fact

(110)

amazon.com

is served off EC2

Montag, 10. Oktober 11

Since 21/10/2010.

(111)

It’s not the cloud that

matters, it’s how

you use it.

Montag, 10. Oktober 11

EC2 doesn’t make everything harder, quite the opposite, it makes things easier: Adding capacity, automation, responding to failures.

(112)

Thank you!

References

Related documents

In this chapter besides discussing the leading French political parties’ approaches – Partie Socialiste (the French Socialist Party / PS ) on the left and Union pour

We hypothesized that learners’ DLCs can affect language learning behaviors, attitudes, and preferences; that there is a relationship between participants’ motivation to

Using powerful moisture quenching ingredients, a deep facial massage is followed by an application of the ESPA Professional Lifting and Smoothing Mask to leave skin toned

For that, we examined ovarian PRLR expression as well as that of several P4 production- modulating molecules and we found a consistent expression pattern along gestation for

– Installation program that can selectively install all components (Management Server software, Video Recording Manager software, Configuration.. Client software, Operator

While the components in View Layer implement features needed in one frontend applications (web portal, mobile application of power user application), the services in Service Layer

Data Center Layer Data Center Layer Networking Layer Networking Layer Device Layer Device Layer Amazon Amazon Microsoft. Microsoft

Gathered information shows answers as; (34%) of the respondents strongly agree with the view that new online markets switch client's inclinations.. (38%) of the respondents agree