System Architecture. Last time. Today. Cloud Application Development #2. Autonomic Computing

(1)

Cloud Application Development #2

Johan Tordsson

Department of Computing Science

Last time

•  Autonomic Computing

•  Self-* management of services and infrastructure

–  Self-configuration –  Self-optimization –  Self-healing –  (Self-protection)

•  Techniques for autonomic management •  Examples

Today

1.  Common (Web) application architectures

–  N-tier applications

•  Load Balancers •  Application Servers •  Databases

2.  Cloud application guidelines

–  Scalability –  Fault-tolerance –  Some best practices –  A few Amazon examples

3.  DevOps

–  Principles –  Tools

System Architecture

•  The architecture of a computer system is the high-level (most general) design on which the system is based

•  Architectural features include:

–  Components

–  Collaborations (how components interact) –  Connectors (how components communicate)

•  Common examples

–  Client-Server –  Layered –  Peer-to-peer –  Etc.

(2)

Client-server

•  Roles

–  Client: a component that makes requests clients are active initiators of transactions –  Server: a component that satisfies requests

servers are passive and react to client requests

•  The client-server architecture can be thought of as a median between

–  Centralized processing: computation is performed on a central platform, which is accessed using “dumb” terminals –  Distributed processing: computation is

performed on platforms located with the user

Centralized

Client / Server

Distributed

Tiered Web Architectures

•  Web applications are usually implemented with 2-tier, 3-tier, or multitier (N-tier) architectures

•  Each tier is a platform (client or server) with a unique responsibility

1-Tier server Architecture

•  (Tier 0): Client platform, hosting a web browser

•  Tier 1: server platform, hosting all server software components

1-Tier Characteristics

•  Advantage:

–  Inexpensive (single platform)

•  Disadvantages

–  Interdependency (coupling) of

components

–  No redundancy

–  Limited scalability

•  Typical application

–  10-100 users

–  Small company or organization, e.g.,

law office, medical practice, local

non-profit

(3)

2-Tier C-S Architecture

•  Tier 2 takes over part of the server function from Tier 1, typically data management

2-Tier Characteristics

•  Advantages

–  Improved performance, from specialized hardware

–  Decreased coupling of software components –  Improved scalability

•  Disadvantages

–  No redundancy

•  Typical Application

–  100-1000 users

–  Small business or regional organization, e.g., specialty retailer, small college

Multitier C-S Architecture

•  A multitier (N-tier) architecture is an

expansion of the 2-tier architecture, in one of several different possible ways

–  Replication of the function of a tier –  Specialization of function within a tier

Replication

•  Application and data servers are replicated •  Servers share the total workload

(4)

Specialization

•  Servers are specialized

•  Each server handles a designated part of the workload, by function

Multi-Tier Characteristics

•  Advantages

–  Decoupling of software components –  Flexibility to add/remove platforms

in response to load –  Scalability –  Redundancy

•  Disadvantages

–  Higher costs (maintenance, design, electrical load, cooling)

•  Typical Application

–  1000+ users

–  Large business or organization

Characteristics Summary

1-Tier 2-Tier N-Tier 10+ 100+ 1000+ # users

• large e-commerce, business, or organization

• small e-commerce, regional business or organization • local business or organization

capacity

scalability

redundancy

cost

Inside the N-tier application

1.  Load Balancers

2.  Application Servers

(5)

Web Server Load

Balancing

•  Request enters a

router

•  Load balancing

server determines

which web server

should serve the

request

•  Sends the request

to the appropriate

web server

Internet

Router

Load-Balancing Server Web Servers Request Response

Traditional Web Cluster

How do we split up

information?

Content

Server Farm

?

Information Strategies

Replication

_Partition

Load Balancing Approaches

File Distribution

Routing

Content/Locality

Aware

DNS Server

Size Aware

Centralized

Router

Workload Aware Distributed

(6)

Issues

•  Efficiently processing requests with optimizations for load balancing

–  Send and process requests to a web server that has files in cache

–  Send and process requests to a web server with the least amount of requests

–  Send and process requests to a web server determined by the size of the request

•  Sessions complicate load balancing

–  Need to ensure that subsequent user requests are handled by same backend

•  Stickiness

•  Makes fail-over more complicated

Example techniques

•  Random

–  Fast and simple

•  Round-Robin

–  Distributes requests evenly

•  Fastest Replica First

–  Response-time aware

•  Shortest Queue First

–  Load aware

•  Two Random Choices

–  Response-time aware

Load balancing conclusions

•  There is no “best” way to distribute content among servers

•  There is no optimal policy for all website applications

•  Certain strategies are geared towards a particular website application

Sample Load balancers

•  Amazon Elastic Load Balancing

–  Balances load across across EC2 VMs –  Performs VM instance health checks –  Balancing metrics include request count and

request latency

–  Supports sticky sessions

–  Integrates smoothly with other AWS

•  HAProxy

–  Robust & high-performance load balancer for TCP/HTTP

–  Multiple load balancing algorithms –  Open source

(7)

Application Servers -

motivation

•  Key observation made by application

server vendors

–  Most web applications require similar features such as database access, security, scalability, etc.

–  Provide these features that are fully tested in a container to be leveraged by

application developers

•  Similar to programming language libraries –  Allows application programmers to focus

on business logic instead of writing all features from scratch

Features provided by most

application servers

•  Web Services

•  RMI for distributed applications •  Clustering (for load balancing) •  Database integration •  System management •  Message-oriented middleware •  Security •  Dynamic redeployment •  Etc.

Example: Java EE Containers

More examples

•  Java

–  Commercial

•  IBM WebSphere Application Server •  BEA WebLogic

•  Oracle OC4J

–  Open Source

•  JBoss Application Server •  Apache Geronimo •  Sun Glassfish

•  Apache Tomcat (only a web container)

•  Microsoft .Net

•  Ruby on Rails

(8)

Pros and Cons

•  Advantages

–  Many, many features provided to the application developer

–  Shorter development cycle

–  Low cost of entry, especially when using open source application servers

•  Disadvantages

–  Less flexibility on architecture –  Debugging harder

•  Problem may be in app server…

–  Sometimes your application does not need all features

•  This could hinder performance

•  Various open source tools target compose:ability

Databases -

Relational DB Characteristics

•  Data stored in columns and tables •  Relationships represented by data •  Data Manipulation Language (SQL) •  Data Definition Language (SQL) •  Transactions

•  Abstraction from physical layer

Data Manipulation Language

•  Data manipulated with Select, Insert, Update, & Delete statements

–  Select T1.Column1, T2.Column2 … From Table1, Table2 …

Where T1.Column1 = T2.Column1 …

•  Data Aggregation •  Compound statements •  Functions and Procedures •  Explicit transaction control

Data Definition Language

•  Schema defined at the start

•  Create Table (Column1 Datatype1,

Column2 Datatype 2, …)

•  Constraints to define and enforce

relationships

–  Primary Key

–  Foreign Key

–  Etc.

•  Triggers to respond to Insert, Update,

and Delete

•  Alter …

•  Drop …

(9)

Transaction: An Execution of a

DB Program

•  Key concept is transaction, which is an

atomicsequence of database actions (reads/ writes).

•  Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.

–  Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.

–  Beyond this, the DBMS does not really understand the semantics of the data. –  Thus, ensuring that a transaction (run alone)

preserves consistency is ultimately the user’s responsibility!

Transactions – ACID Properties

•  Atomic – All of the work in a transaction completes (commit) or none of it completes •  Consistent – A transaction transforms the

database from one consistent state to another consistent state. Consistency is defined in terms of constraints.

•  Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.

•  Durable – The results of a committed transaction survive failures

CAP Theorem

•  Three properties of a system

–  Consistency (all copies have same value) –  Availability (system can run even if parts

have failed)

–  Partitions (network can break into two or more parts, each with active systems that cannot talk to other parts)

•  Brewer’s CAP “Theorem”: You can have at

most two of these three properties for

any system

•  Very large systems will partition at some

point

–  -> Choose one of consistency or availability –  Traditional database choose consistency –  Most Web applications choose availability

•  Except for specific parts such as order processing

NoSQL – A Scalable Web

alternative

•  Stands for Not Only SQL

•  Class of non-relational data storage

systems

•  Usually do not require a fixed table

schema nor do they use the concept of

joins

•  All NoSQL offerings relax one or more of

the ACID properties

•  Not a backlash/rebellion against RDBMS

•  SQL is a rich query language that cannot

be rivaled by the current list of NoSQL

offerings

(10)

Distributed Key-Value Data Stores

•  Distributed key-value data storage

systems allow key-value pairs to be

stored (and retrieved on key) in a

massively parallel system

–  E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ..

–  Recall Ahmed’s lectures

•  Partitioning, high availability, etc

completely transparent to application

•  Sharding systems (partioned DBs) and

key-value stores do not support many

relational features

–  No join operations

–  No group by, order by, etc

–  No ACID transactions

–  No SQL

Typical NoSQL API

•  Basic API access:

–  get(key)

•  Extract the value given a key

–  put(key, value)

•  Create or update the value given its key

–  delete(key)

•  Remove the key and its associated value

–  execute(key, operation, parameters)

•  Invoke an operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc).

Databases - Summary

•  SQL Databases

–  Predefined Schema

–  Standard definition and interface language –  Tight consistency

–  Well defined semantics

•  NoSQL Databases

–  No predefined Schema

–  Per-product definition and interface language –  Getting an answer quickly is more important

than getting a correct answer

2. Cloud application

guidelines

(11)

Scalable N-tier applications

•  Basic 3-Tier architecture

–  Dedicated server for each tier –  Simple

–  Non-redundant

–  For testing/development –  Not suitable for production

Scaling RDBMS – Master/

Slave

•  Master-Slave

–  All writes are written to the master. All

reads performed against the replicated

slave databases

–  Good for mostly read, very few update

applications

–  Critical reads may be incorrect as

writes may not have been propagated

down

–  Large data sets can pose problems as

master needs to duplicate data to

slaves

Scaling RDBMS - Partitioning

•  Partitioning

–  Divide the database across many machines

•  E.g. hash or range partitioning

•  Handled transparently by parallel

databases

–  but they are expensive

•  “Sharding”

–  Divide data amongst many cheap databases (MySQL/PostgreSQL)

–  Manage parallel access in the application –  Scales well for both reads and writes –  Not transparent, application needs to be

partition-aware

Redundant 3-Tier architecture

•  Redudancy in each tier •  Replicated DB

- May used striped volumes

•  Fault tolerance •  But fixed size

(12)

Multi-center architecture

•  Similar to redundant architecture •  Uses multiple Data Centers

- Ex. EC2 availability zones

•  Protects against Data Center failure •  Redudant architecture protects against server failure… •  Fixed size

Autoscaling Architecture

•  Redundant & fault tolerant server tiers •  Change number of servers dynamically

- Commonly only in application server tier

•  Database tier the bottleneck…

Autoscaling Architecture 2

•  For read-heavy (static) DBs

•  Add read-only caches, e.g.. Memcached •  Does not solve

general problem

Autoscaling Architecture 3

•  Replace Relational DB with NoSQL data store •  Replicate datastore across multiple servers •  Adds Scalability •  At the expense of SQL, consistency, … •  Common in social networks, etc.

(13)

Scalable batch processing

•  For compute-intense workloads

•  Auto-scale application server + worker nodes •  Message queues

for robustness and scaling

More about Message Queues

•  Asynchronous calls between components

•  Enables loose coupling

•  Buffering of load spikes

•  Enables fault tolerance

•  Wanted: Admission control mechanisms

-

Traffic engineering a’la routers, drop requests, etc.

•  Example: Amazon SQS

Scalable architectures - summary

•  Auto-scaling of 3-tier architecture well understood

•  Database layer is bottleneck

–  NoSQL may be an alternative

•  Scaling up is easy

–  Scalable architectures can make use of more servers

•  Scaling down is harder

–  Which server instance to shut down? –  Long-running sessions make it hard to

evacuate workloads

•  Auto-scaling metrics

–  Generic: Server CPU/memory utilization, etc. –  Specific: Numbe of transactions, \\ users, etc.

•  When to scale?

A few words on faults

•  Services fail in operation

–  Disks break, power outages, application logic errors, overload failures, etc.

•  Well-designed services handle faults

(14)

Designing robust services

•  Quick service health checks

–  Detect problems early

•  Develop + test in full environment

–  Unit tests are not enough

•  Do not trust underlying systems

–  Rare failures are not rare at large scale

•  Isolate failures

–  Avoid cross-component fault propagation

•  Allow human intervention

–  Emergency cases

–  Through scripts, not manual processes –  Fixing problems under stress is hard…

•  Keep things simple and robust

–  Optimize only if factor 10 better, not for a few %

Rubust services (cont.)

•  Automate management and provisioning

–  Deployment, updates, restarts, etc. should be simple

–  Make everything configurable

•  Recompiling the system during downtime is hard

•  Intentionally fail the system

–  Conventional shutdown does not test the fault handling

–  Purposely kill components/services at random –  Recall Netflix Chaos-Monkey, etc.

•  Soft deletes only

–  Mark data as removed, but keep it

–  No backup can save a bad SQL delete query

Robust services (cont.)

•  Independent components

–  Asyncronous design: expect latencies –  Expect failures: backoffs, retries, etc. –  Fail fast: release resources unless successful

•  Coupling

–  Careful design to minimize cross-component –  No replicated functionality across components

•  Graceful degradation

–  Extend beyond ”working” or ”failure” –  Cut non-critical load in emergency cases –  Store non-processed work in queues

•  Admission control at component boarders

–  Do not bring more work to overloaded system –  Meter and control admission

–  Allow for tuning rate of users/work units added

Understand your service

•  Instrument and monitor as much as possible

–  Adjustable logging levels, e.g., log4j

•  Data mining and visualization helps understanding your application behaviour

–  Where are bottlenecks, etc.?

•  Workload analysis (trace-based) valuable for stress-testing updated systems

–  Will this actually work in full-scale production?

•  Monitor everthing, but rairly raise alarms

(15)

AWS examples

- Web hosting

1.  Route 53 DNS 2.  ELB 3.  EC2 Web server 4.  Auto scaling group 5.  S3 for static content 6.  Cloudfront for edge

location 7.  Availability zones

AWS content

streaming

1.  S3 for static content 2.  Cloudfront for low

latency (edge caching) 3.  Alt.: Host content in

EC2

4.  Deliver data stream with EC2 + Cloudfront

AWS Batch

Processing

1.  Job manager as EC2 VM 2.  Input data in S3

3.  Jobs inserted to SQS input queue 4.  Job processing in EC2 nodes. Auto-scaling group for elasticity + fault tolerance

5.  Output in S3

6.  Stats in RDS or SimpleDB

7.  Alt: Partial results to SQS output queue

AWS

High

Availability

2. Use of >1 Availability Zones 3. Elastic IP to bind IP to EC2 VMs, can

remap upon failures 4. No critical data on VM instance

storage, use EBS or S3. 1? ELB for load balancing across

(16)

AWS

Big Data

1.  Parallel data upload to S3, or use Amazon Import to send whole storage device 2.  Parallel processing in EC2, may use S3

as /scratch through FUSE 3.  Results written to S3

4.  Alt. Use EBS for input and/or output data sets. Tune TCP streams or use UDP

62 | Battery Ventures Works fine in theory, in practice:

“Death Star” Architecture Diagrams

As visualized by Appdynamics, Boundary.com and Twitter internal tools

Netflix Gilt Groupe (12 of 450) Twitter

3. DevOps

•  Concepts

•  Tools

–  Chef, Puppet, etc.

(17)

DevOps - consequences

DevOps - whatis?

•  "DevOps" is an emerging set of principles,

methods and practices for communication,

collaboration and integration between software development (application/ software engineering) and IT operations (systems administration/infrastructure) professionals

DevOps (cont.)

•  Acknowledges the interdependence of software development and IT operations •  Aims to help organizations rapidly produce

quality software products and services •  Responds to the demands of stakeholders for

an increased rate of production releases •  Supports the use of agile development

processes

DevOps (cont.)

•  The combination of traditional development activities with operations and testing (QA/QE) •  Collaboration, communication and integration is

key

•  Agile development model (sprints, scrum, …) •  Release coordination and automation

(18)

Classic configuration

management

•  Server crafting –  Boring –  Stressful –  Error-prone –  Time-consuming –  Expensive

•  Server configuration time

–  Days? Weeks?

Automated configuration

management

•  Automated configuration management •  Describe systems once, apply as often as you

want

•  Automate repetitive tasks

•  Admins focus on improvements and real problems

•  Predictable results •  No Snowflakes!

•  Server configuration time

–  Seconds

Infrastructure as code & DevOps

•  Eliminate tedious and repetitive work •  Configurations are defined in a machine- and

OS-independent domain language so the manifests are portable and can be reused •  Keep servers in sync and know what is running

on what server

•  Configuration can be used as documentation since configuration is applied by the

documentation is always up-to-date

•  Configuration can be version controlled and managed the same way you manage other code •  Brings development methodology to operations!

–  Sysadmins are (also) developers…

Infrastructure as Code:

Pets vs. Cattle

(19)

73!

AUTOMATING

73

• Declarative configuration language

• Plain-text configuration in source control

• Fully programmatic, no manual

interactions

Chef terminology

•  Node, Server, Workstation

•  Chef-client asks Chef-server about the policy for the node

•  Resources: a component and the desired state •  Recipes: describe resources and desired state •  Cookbooks: sets of recipes grouped together,

also includes templates and source files, etc. •  Run List: recipes from cookbooks you want to

run

•  Role: type of node

Chef architecture

Central Chef Server Multiple Administrators

Chef example recipe

19 | 10 19 | 28 Example Recipe paca Policy says package should be installed This service should be enabled on reboot, and must be running Template specifies the contents required

(20)

Puppet DSL

1.  Describe what you want to be configured

2.  (Don‘t care how it is done) 3.  Describe dependencies

file package service types

win *nix deb rpm POSIX win providers

Puppet overview

_{Puppet example:}

Apache server

10 | 10

10 | 28

Puppet Example |

Apache HTTP Server package { "httpd": name => "httpd.x86_64", ensure => "present", } file { "http.conf": path => "/etc/httpd/conf/httpd.conf", owner => root, group => root, mode => 0644, source => "puppet:///modules/apache/httpd.conf", require => Package["httpd"], } service { "httpd": ensure => running, enable => true, subscribe => File["http.conf"], } Manifests Resources ● Name ● Attributes and values Ordering and dependencies

Puppet building blocks

11 | 10

11 | 28

More on Puppet |

building blocks

Manifests

Variables and (custom) facts Node declarations Classes and Modules Defined resource types Templates Console if $operatingsystem == 'CentOS' node 'www1.example.com' { include common include apache } node 'db1.example.com' { include common include mysql } file { "http.conf": path => "/etc/httpd/conf/httpd.conf", owner => 'root', group => 'root', mode => '0644', content => template('config/httpd.erb'), }

A Puppet run

1.  Retrieve plugins from server 2.  Get “facts“ on client

and send them to master

3.  Compile catalog and send it to the client 4.  Apply catalog on

client 5.  Process report

(21)

DevOps - dangers

•  “To err is human, but to err and deploy it to all systems simultaneously is DevOps” – A. Cockcroft, Netflix Architect •  Recent Twitter outage due to rollout of

broken configuration that crashed all services •  2014 Gmail outage:

–  "An internal system that generates

configurations encountered a software bug and generated an incorrect configuration" –  "The incorrect configuration was sent to live

services over the next 15 minutes, caused users’ requests for their data to be ignored, and those services, in turn, generated errors.”

•  Canary deployment

–  Don’t roll out system-wide until it works

General Conclusions

•  Auto-scaling, fault tolerance, etc. does not happen automagically in the cloud •  Careful design is essential

–  Actual architecture likely to be a mess –  Apply principles of robustness

•  DevOps bridges the gap between development and operations •  Rapid deployment key to success •  Many tools/services out there to help

Assignment 3 - published

•  Course project •  Free form assignment

–  Implement something using the cloud

•  Should be suitable for the cloud –  Use at least 3 existing services •  Should be doable

–  Make a list of features/requirements + prioritize

•  Deadlines and Deliverables:

1.  Wednesday 14/5:

•  Project plan: what will you do + how?

2.  Wednesday 28/5, 13-16

•  Presentation + demo

3.  Thursday 5/6:

•  Report

System Architecture. Last time. Today. Cloud Application Development #2. Autonomic Computing

Cloud Application Development #2

Last time

Today

System Architecture

Client-server

Centralized

Client / Server

Distributed

Tiered Web Architectures

1-Tier server Architecture

1-Tier Characteristics

• Advantage:

– Inexpensive (single platform)

• Disadvantages

– Interdependency (coupling) of

components

– No redundancy

– Limited scalability

• Typical application

– 10-100 users

– Small company or organization, e.g.,

law office, medical practice, local

non-profit

2-Tier C-S Architecture

2-Tier Characteristics

• Advantages

• Disadvantages

• Typical Application

Multitier C-S Architecture

Replication

Specialization

Multi-Tier Characteristics

• Advantages

• Disadvantages

• Typical Application

Characteristics Summary

capacity

scalability

redundancy

cost

Inside the N-tier application

Web Server Load

Balancing

• Request enters a

router

• Load balancing

server determines

which web server

should serve the

request

• Sends the request

to the appropriate

web server

Internet

Router

How do we split up

information?

Content

Server Farm

?

Information Strategies

Replication

Partition

Load Balancing Approaches

File Distribution

Routing

Content/Locality

Aware

DNS Server

Size Aware

Centralized

Router

Workload Aware Distributed

Issues

Example techniques

Load balancing conclusions

Sample Load balancers

Application Servers -

motivation

•  Advantage:

–  Inexpensive (single platform)

•  Disadvantages

–  Interdependency (coupling) of

–  No redundancy

–  Limited scalability

•  Typical application

–  10-100 users

–  Small company or organization, e.g.,

•  Advantages

•  Disadvantages

•  Typical Application

•  Advantages

•  Disadvantages

•  Typical Application

•  Request enters a

•  Load balancing

•  Sends the request

_Partition

•  Key observation made by application

•  Microsoft .Net

•  Advantages

•  Disadvantages

•  Schema defined at the start

•  Create Table (Column1 Datatype1,

•  Constraints to define and enforce

–  Primary Key

–  Foreign Key

–  Etc.

•  Triggers to respond to Insert, Update,

•  Alter …

•  Drop …

•  Three properties of a system

•  Brewer’s CAP “Theorem”: You can have at

•  Very large systems will partition at some

•  Stands for Not Only SQL

•  Class of non-relational data storage

•  Usually do not require a fixed table

•  All NoSQL offerings relax one or more of

•  Not a backlash/rebellion against RDBMS

•  SQL is a rich query language that cannot

•  Distributed key-value data storage

•  Partitioning, high availability, etc

•  Sharding systems (partioned DBs) and

–  No join operations

–  No group by, order by, etc

–  No ACID transactions

–  No SQL

•  Basic API access:

–  get(key)

–  put(key, value)

–  delete(key)

–  execute(key, operation, parameters)