Cloud Application Development #2
Johan Tordsson
Department of Computing Science
Last time
• Autonomic Computing
• Self-* management of services and infrastructure
– Self-configuration – Self-optimization – Self-healing – (Self-protection)
• Techniques for autonomic management • Examples
Today
1. Common (Web) application architectures
– N-tier applications
• Load Balancers • Application Servers • Databases
2. Cloud application guidelines
– Scalability – Fault-tolerance – Some best practices – A few Amazon examples
3. DevOps
– Principles – Tools
System Architecture
• The architecture of a computer system is the high-level (most general) design on which the system is based
• Architectural features include:
– Components
– Collaborations (how components interact) – Connectors (how components communicate)
• Common examples
– Client-Server – Layered – Peer-to-peer – Etc.
Client-server
• Roles
– Client: a component that makes requests clients are active initiators of transactions – Server: a component that satisfies requests
servers are passive and react to client requests
• The client-server architecture can be thought of as a median between
– Centralized processing: computation is performed on a central platform, which is accessed using “dumb” terminals – Distributed processing: computation is
performed on platforms located with the user
Centralized
Client / Server
Distributed
Tiered Web Architectures
• Web applications are usually implemented with 2-tier, 3-tier, or multitier (N-tier) architectures
• Each tier is a platform (client or server) with a unique responsibility
1-Tier server Architecture
• (Tier 0): Client platform, hosting a web browser
• Tier 1: server platform, hosting all server software components
1-Tier Characteristics
• Advantage:
– Inexpensive (single platform)
• Disadvantages
– Interdependency (coupling) of
components
– No redundancy
– Limited scalability
• Typical application
– 10-100 users
– Small company or organization, e.g.,
law office, medical practice, local
non-profit
2-Tier C-S Architecture
• Tier 2 takes over part of the server function from Tier 1, typically data management
2-Tier Characteristics
• Advantages
– Improved performance, from specialized hardware
– Decreased coupling of software components – Improved scalability
• Disadvantages
– No redundancy
• Typical Application
– 100-1000 users
– Small business or regional organization, e.g., specialty retailer, small college
Multitier C-S Architecture
• A multitier (N-tier) architecture is an
expansion of the 2-tier architecture, in one of several different possible ways
– Replication of the function of a tier – Specialization of function within a tier
Replication
• Application and data servers are replicated • Servers share the total workload
Specialization
• Servers are specialized
• Each server handles a designated part of the workload, by function
Multi-Tier Characteristics
• Advantages
– Decoupling of software components – Flexibility to add/remove platforms
in response to load – Scalability – Redundancy
• Disadvantages
– Higher costs (maintenance, design, electrical load, cooling)
• Typical Application
– 1000+ users
– Large business or organization
Characteristics Summary
1-Tier 2-Tier N-Tier 10+ 100+ 1000+ # users• large e-commerce, business, or organization
• small e-commerce, regional business or organization • local business or organization
capacity
scalability
redundancy
cost
Inside the N-tier application
1. Load Balancers
2. Application Servers
Web Server Load
Balancing
• Request enters a
router
• Load balancing
server determines
which web server
should serve the
request
• Sends the request
to the appropriate
web server
Internet
Router
Load-Balancing Server Web Servers Request ResponseTraditional Web Cluster
How do we split up
information?
Content
Server Farm
?
Information Strategies
Replication
Partition
Load Balancing Approaches
File Distribution
Routing
Content/Locality
Aware
DNS Server
Size Aware
Centralized
Router
Workload Aware Distributed
Issues
• Efficiently processing requests with optimizations for load balancing
– Send and process requests to a web server that has files in cache
– Send and process requests to a web server with the least amount of requests
– Send and process requests to a web server determined by the size of the request
• Sessions complicate load balancing
– Need to ensure that subsequent user requests are handled by same backend
• Stickiness
• Makes fail-over more complicated
Example techniques
• Random
– Fast and simple
• Round-Robin
– Distributes requests evenly
• Fastest Replica First
– Response-time aware
• Shortest Queue First
– Load aware
• Two Random Choices
– Response-time aware
Load balancing conclusions
• There is no “best” way to distribute content among servers
• There is no optimal policy for all website applications
• Certain strategies are geared towards a particular website application
Sample Load balancers
• Amazon Elastic Load Balancing
– Balances load across across EC2 VMs – Performs VM instance health checks – Balancing metrics include request count and
request latency
– Supports sticky sessions
– Integrates smoothly with other AWS
• HAProxy
– Robust & high-performance load balancer for TCP/HTTP
– Multiple load balancing algorithms – Open source
Application Servers -
motivation
• Key observation made by application
server vendors
– Most web applications require similar features such as database access, security, scalability, etc.
– Provide these features that are fully tested in a container to be leveraged by
application developers
• Similar to programming language libraries – Allows application programmers to focus
on business logic instead of writing all features from scratch
Features provided by most
application servers
• Web Services
• RMI for distributed applications • Clustering (for load balancing) • Database integration • System management • Message-oriented middleware • Security • Dynamic redeployment • Etc.
Example: Java EE Containers
More examples
• Java
– Commercial
• IBM WebSphere Application Server • BEA WebLogic
• Oracle OC4J
– Open Source
• JBoss Application Server • Apache Geronimo • Sun Glassfish
• Apache Tomcat (only a web container)
• Microsoft .Net
• Ruby on Rails
Pros and Cons
• Advantages
– Many, many features provided to the application developer
– Shorter development cycle
– Low cost of entry, especially when using open source application servers
• Disadvantages
– Less flexibility on architecture – Debugging harder
• Problem may be in app server…
– Sometimes your application does not need all features
• This could hinder performance
• Various open source tools target compose:ability
Databases -
Relational DB Characteristics
• Data stored in columns and tables • Relationships represented by data • Data Manipulation Language (SQL) • Data Definition Language (SQL) • Transactions
• Abstraction from physical layer
Data Manipulation Language
• Data manipulated with Select, Insert, Update, & Delete statements
– Select T1.Column1, T2.Column2 … From Table1, Table2 …
Where T1.Column1 = T2.Column1 …
• Data Aggregation • Compound statements • Functions and Procedures • Explicit transaction control
Data Definition Language
• Schema defined at the start
• Create Table (Column1 Datatype1,
Column2 Datatype 2, …)
• Constraints to define and enforce
relationships
– Primary Key
– Foreign Key
– Etc.
• Triggers to respond to Insert, Update,
and Delete
• Alter …
• Drop …
Transaction: An Execution of a
DB Program
• Key concept is transaction, which is an
atomicsequence of database actions (reads/ writes).
• Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.
– Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
– Beyond this, the DBMS does not really understand the semantics of the data. – Thus, ensuring that a transaction (run alone)
preserves consistency is ultimately the user’s responsibility!
Transactions – ACID Properties
• Atomic – All of the work in a transaction completes (commit) or none of it completes • Consistent – A transaction transforms the
database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
• Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
• Durable – The results of a committed transaction survive failures
CAP Theorem
• Three properties of a system
– Consistency (all copies have same value) – Availability (system can run even if parts
have failed)
– Partitions (network can break into two or more parts, each with active systems that cannot talk to other parts)
• Brewer’s CAP “Theorem”: You can have at
most two of these three properties for
any system
• Very large systems will partition at some
point
– -> Choose one of consistency or availability – Traditional database choose consistency – Most Web applications choose availability
• Except for specific parts such as order processing
NoSQL – A Scalable Web
alternative
• Stands for Not Only SQL
• Class of non-relational data storage
systems
• Usually do not require a fixed table
schema nor do they use the concept of
joins
• All NoSQL offerings relax one or more of
the ACID properties
• Not a backlash/rebellion against RDBMS
• SQL is a rich query language that cannot
be rivaled by the current list of NoSQL
offerings
Distributed Key-Value Data Stores
• Distributed key-value data storage
systems allow key-value pairs to be
stored (and retrieved on key) in a
massively parallel system
– E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ..
– Recall Ahmed’s lectures
• Partitioning, high availability, etc
completely transparent to application
• Sharding systems (partioned DBs) and
key-value stores do not support many
relational features
– No join operations
– No group by, order by, etc
– No ACID transactions
– No SQL
Typical NoSQL API
• Basic API access:
– get(key)
• Extract the value given a key
– put(key, value)
• Create or update the value given its key
– delete(key)
• Remove the key and its associated value
– execute(key, operation, parameters)
• Invoke an operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc).
Databases - Summary
• SQL Databases
– Predefined Schema
– Standard definition and interface language – Tight consistency
– Well defined semantics
• NoSQL Databases
– No predefined Schema
– Per-product definition and interface language – Getting an answer quickly is more important
than getting a correct answer
2. Cloud application
guidelines
Scalable N-tier applications
• Basic 3-Tier architecture
– Dedicated server for each tier – Simple
– Non-redundant
– For testing/development – Not suitable for production
Scaling RDBMS – Master/
Slave
• Master-Slave
– All writes are written to the master. All
reads performed against the replicated
slave databases
– Good for mostly read, very few update
applications
– Critical reads may be incorrect as
writes may not have been propagated
down
– Large data sets can pose problems as
master needs to duplicate data to
slaves
Scaling RDBMS - Partitioning
• Partitioning
– Divide the database across many machines
• E.g. hash or range partitioning
• Handled transparently by parallel
databases
– but they are expensive
• “Sharding”
– Divide data amongst many cheap databases (MySQL/PostgreSQL)
– Manage parallel access in the application – Scales well for both reads and writes – Not transparent, application needs to be
partition-aware
Redundant 3-Tier architecture
• Redudancy in each tier • Replicated DB
- May used striped volumes
• Fault tolerance • But fixed size
Multi-center architecture
• Similar to redundant architecture • Uses multiple Data Centers
- Ex. EC2 availability zones
• Protects against Data Center failure • Redudant architecture protects against server failure… • Fixed size
Autoscaling Architecture
• Redundant & fault tolerant server tiers • Change number of servers dynamically
- Commonly only in application server tier
• Database tier the bottleneck…
Autoscaling Architecture 2
• For read-heavy (static) DBs
• Add read-only caches, e.g.. Memcached • Does not solve
general problem
Autoscaling Architecture 3
• Replace Relational DB with NoSQL data store • Replicate datastore across multiple servers • Adds Scalability • At the expense of SQL, consistency, … • Common in social networks, etc.
Scalable batch processing
• For compute-intense workloads
• Auto-scale application server + worker nodes • Message queues
for robustness and scaling
More about Message Queues
• Asynchronous calls between components
• Enables loose coupling
• Buffering of load spikes
• Enables fault tolerance
• Wanted: Admission control mechanisms
-
Traffic engineering a’la routers, drop requests, etc.• Example: Amazon SQS
Scalable architectures - summary
• Auto-scaling of 3-tier architecture well understood
• Database layer is bottleneck
– NoSQL may be an alternative
• Scaling up is easy
– Scalable architectures can make use of more servers
• Scaling down is harder
– Which server instance to shut down? – Long-running sessions make it hard to
evacuate workloads
• Auto-scaling metrics
– Generic: Server CPU/memory utilization, etc. – Specific: Numbe of transactions, \\ users, etc.
• When to scale?
A few words on faults
• Services fail in operation
– Disks break, power outages, application logic errors, overload failures, etc.
• Well-designed services handle faults
Designing robust services
• Quick service health checks
– Detect problems early
• Develop + test in full environment
– Unit tests are not enough
• Do not trust underlying systems
– Rare failures are not rare at large scale
• Isolate failures
– Avoid cross-component fault propagation
• Allow human intervention
– Emergency cases
– Through scripts, not manual processes – Fixing problems under stress is hard…
• Keep things simple and robust
– Optimize only if factor 10 better, not for a few %
Rubust services (cont.)
• Automate management and provisioning
– Deployment, updates, restarts, etc. should be simple
– Make everything configurable
• Recompiling the system during downtime is hard
• Intentionally fail the system
– Conventional shutdown does not test the fault handling
– Purposely kill components/services at random – Recall Netflix Chaos-Monkey, etc.
• Soft deletes only
– Mark data as removed, but keep it
– No backup can save a bad SQL delete query
Robust services (cont.)
• Independent components
– Asyncronous design: expect latencies – Expect failures: backoffs, retries, etc. – Fail fast: release resources unless successful
• Coupling
– Careful design to minimize cross-component – No replicated functionality across components
• Graceful degradation
– Extend beyond ”working” or ”failure” – Cut non-critical load in emergency cases – Store non-processed work in queues
• Admission control at component boarders
– Do not bring more work to overloaded system – Meter and control admission
– Allow for tuning rate of users/work units added
Understand your service
• Instrument and monitor as much as possible
– Adjustable logging levels, e.g., log4j
• Data mining and visualization helps understanding your application behaviour
– Where are bottlenecks, etc.?
• Workload analysis (trace-based) valuable for stress-testing updated systems
– Will this actually work in full-scale production?
• Monitor everthing, but rairly raise alarms
AWS examples
- Web hosting
1. Route 53 DNS 2. ELB 3. EC2 Web server 4. Auto scaling group 5. S3 for static content 6. Cloudfront for edge
location 7. Availability zones
AWS content
streaming
1. S3 for static content 2. Cloudfront for low
latency (edge caching) 3. Alt.: Host content in
EC2
4. Deliver data stream with EC2 + Cloudfront
AWS Batch
Processing
1. Job manager as EC2 VM 2. Input data in S3
3. Jobs inserted to SQS input queue 4. Job processing in EC2 nodes. Auto-scaling group for elasticity + fault tolerance
5. Output in S3
6. Stats in RDS or SimpleDB
7. Alt: Partial results to SQS output queue
AWS
High
Availability
2. Use of >1 Availability Zones 3. Elastic IP to bind IP to EC2 VMs, can
remap upon failures 4. No critical data on VM instance
storage, use EBS or S3. 1? ELB for load balancing across
AWS
Big Data
1. Parallel data upload to S3, or use Amazon Import to send whole storage device 2. Parallel processing in EC2, may use S3
as /scratch through FUSE 3. Results written to S3
4. Alt. Use EBS for input and/or output data sets. Tune TCP streams or use UDP
62 | Battery Ventures Works fine in theory, in practice:
“Death Star” Architecture Diagrams
As visualized by Appdynamics, Boundary.com and Twitter internal tools
Netflix Gilt Groupe (12 of 450) Twitter
3. DevOps
• Concepts
• Tools
– Chef, Puppet, etc.
DevOps - consequences
DevOps - whatis?
• "DevOps" is an emerging set of principles,
methods and practices for communication,
collaboration and integration between software development (application/ software engineering) and IT operations (systems administration/infrastructure) professionals
DevOps (cont.)
• Acknowledges the interdependence of software development and IT operations • Aims to help organizations rapidly produce
quality software products and services • Responds to the demands of stakeholders for
an increased rate of production releases • Supports the use of agile development
processes
DevOps (cont.)
• The combination of traditional development activities with operations and testing (QA/QE) • Collaboration, communication and integration is
key
• Agile development model (sprints, scrum, …) • Release coordination and automation
Classic configuration
management
• Server crafting – Boring – Stressful – Error-prone – Time-consuming – Expensive• Server configuration time
– Days? Weeks?
Automated configuration
management
• Automated configuration management • Describe systems once, apply as often as you
want
• Automate repetitive tasks
• Admins focus on improvements and real problems
• Predictable results • No Snowflakes!
• Server configuration time
– Seconds
Infrastructure as code & DevOps
• Eliminate tedious and repetitive work • Configurations are defined in a machine- and
OS-independent domain language so the manifests are portable and can be reused • Keep servers in sync and know what is running
on what server
• Configuration can be used as documentation since configuration is applied by the
documentation is always up-to-date
• Configuration can be version controlled and managed the same way you manage other code • Brings development methodology to operations!
– Sysadmins are (also) developers…
Infrastructure as Code:
Pets vs. Cattle
73!
AUTOMATING
73
•
Declarative configuration language
•
Plain-text configuration in source control
•
Fully programmatic, no manual
interactions
Chef terminology
• Node, Server, Workstation
• Chef-client asks Chef-server about the policy for the node
• Resources: a component and the desired state • Recipes: describe resources and desired state • Cookbooks: sets of recipes grouped together,
also includes templates and source files, etc. • Run List: recipes from cookbooks you want to
run
• Role: type of node
Chef architecture
Chef architecture
Central Chef Server Multiple AdministratorsChef example recipe
19 | 10 19 | 28 Example Recipe paca Policy says package should be installed This service should be enabled on reboot, and must be running Template specifies the contents required
Puppet DSL
1. Describe what you want to be configured
2. (Don‘t care how it is done) 3. Describe dependencies
file package service types
win *nix deb rpm POSIX win providers
Puppet overview
Puppet example:
Apache server
10 | 10
10 | 28
Puppet Example |
Apache HTTP Server package { "httpd": name => "httpd.x86_64", ensure => "present", } file { "http.conf": path => "/etc/httpd/conf/httpd.conf", owner => root, group => root, mode => 0644, source => "puppet:///modules/apache/httpd.conf", require => Package["httpd"], } service { "httpd": ensure => running, enable => true, subscribe => File["http.conf"], } Manifests Resources ● Name ● Attributes and values Ordering and dependenciesPuppet building blocks
11 | 10
11 | 28
More on Puppet |
building blocksManifests
Variables and (custom) facts Node declarations Classes and Modules Defined resource types Templates Console if $operatingsystem == 'CentOS' node 'www1.example.com' { include common include apache } node 'db1.example.com' { include common include mysql } file { "http.conf": path => "/etc/httpd/conf/httpd.conf", owner => 'root', group => 'root', mode => '0644', content => template('config/httpd.erb'), }
A Puppet run
1. Retrieve plugins from server 2. Get “facts“ on clientand send them to master
3. Compile catalog and send it to the client 4. Apply catalog on
client 5. Process report
DevOps - dangers
• “To err is human, but to err and deploy it to all systems simultaneously is DevOps” – A. Cockcroft, Netflix Architect • Recent Twitter outage due to rollout of
broken configuration that crashed all services • 2014 Gmail outage:
– "An internal system that generates
configurations encountered a software bug and generated an incorrect configuration" – "The incorrect configuration was sent to live
services over the next 15 minutes, caused users’ requests for their data to be ignored, and those services, in turn, generated errors.”
• Canary deployment
– Don’t roll out system-wide until it works
General Conclusions
• Auto-scaling, fault tolerance, etc. does not happen automagically in the cloud • Careful design is essential
– Actual architecture likely to be a mess – Apply principles of robustness
• DevOps bridges the gap between development and operations • Rapid deployment key to success • Many tools/services out there to help
Assignment 3 - published
• Course project • Free form assignment
– Implement something using the cloud
• Should be suitable for the cloud – Use at least 3 existing services • Should be doable
– Make a list of features/requirements + prioritize
• Deadlines and Deliverables:
1. Wednesday 14/5:
• Project plan: what will you do + how?
2. Wednesday 28/5, 13-16
• Presentation + demo
3. Thursday 5/6:
• Report