Cloud Performance Considerations
Disclaimer
This document represents the author's views and opinions.
It does not necessarily represent IBM's position or strategies.
Agenda
§ Why cloud computing
§ What is cloud computing
§ What are the business perspectives
§ What is different about the cloud
§ Open questions
[Figure: installed base (M units) and spending (US$B) over time, broken down into new server spending, server management and administration costs, and power and cooling costs. Source: IDC, 2008]
¹ WW TB capacity shipped on enterprise disk storage systems
IT Costs are Increasing
§ Costs to manage systems have doubled since 2000
§ Costs to power and cool systems have doubled since 2000
§ Devices accessing data over networks are doubling every 2.5 years
§ Bandwidth consumed is doubling every 1.5 years
§ Data is doubling every 18 months¹
§ Server processing capacity is doubling every 3 years²
§ 10G Ethernet ports tripling
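The doubling times above can be turned into annual growth rates to make the mismatch between data growth and server capacity growth explicit. The following is a minimal sketch; the function names and the dictionary are illustrative and not part of the original slides, and the only inputs are the doubling times quoted in the bullets.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Compound annual growth rate implied by a doubling time in years."""
    return 2 ** (1.0 / doubling_years) - 1.0


def growth_factor(doubling_years: float, horizon_years: float) -> float:
    """Total growth multiple after horizon_years for a given doubling time."""
    return 2 ** (horizon_years / doubling_years)


# Doubling times taken from the bullets above (years).
DOUBLING_YEARS = {
    "devices on networks": 2.5,
    "bandwidth consumed": 1.5,
    "data": 1.5,
    "server processing capacity": 3.0,
}


def data_vs_capacity_gap(horizon_years: float) -> float:
    """How much faster data grows than server capacity over the horizon."""
    data = growth_factor(DOUBLING_YEARS["data"], horizon_years)
    capacity = growth_factor(DOUBLING_YEARS["server processing capacity"], horizon_years)
    return data / capacity


if __name__ == "__main__":
    for name, years in DOUBLING_YEARS.items():
        print(f"{name}: {annual_growth_rate(years):.1%} per year")
    print(f"data/capacity gap after 6 years: {data_vs_capacity_gap(6):.1f}x")
```

Under these figures, data grows at roughly 59% per year while server processing capacity grows at roughly 26% per year, so the gap widens every year unless capacity is used more efficiently.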
What's Driving Cloud Computing?
1. Cost Reduction:
1. Efficiency: virtual resources for hardware utilization (memory, disk, machines)
2. Sharing of hardware/maintenance: multitenancy for cost reduction
3. Automation: automate mundane tasks
4. Commodity hardware for most public clouds
– Cloud: Highly virtualized with many users sharing the same hardware
2. Technology Maturity Cycle
1. New: Wow, it works!
2. Commercialization: Will it make money long term?
3. Good enough: Functionality is good enough for the majority of users. Users have a lower tolerance for poor ease of use, care less about the technical details, etc.
4. Standardization: If users don't care about technical details, we can standardize and virtualize.
5. Business: Focus higher in the solution stack
– Cloud: Companies who are moving to the cloud are focusing on their business, not technology.
3. Payment model: Pay per use to lower the bar to adoption
1. Pay up front for all required capital
2. Finance terms (deferred financial cost)
3. Pay per use (for public cloud).
– Cloud: Pay per use with immediate time to value
Is Cloud Computing Growing
[Figure: mind share vs. market share]
What is Different about the Cloud
Data center
• Customers buy hw and sw
• 10s to 100s of hw servers
• Servers are in silos
• Enterprise applications
• Few failures
• Heterogeneous hw
Cloud
• Customers rent hw and sw
• 1000s to 10,000s of hw servers
• Elastic capacity (+/- servers)
• Enterprise and other apps
• Constant failures
• Commodity hw
• Quality of Experience (QoE) is very important to customers
• Users run on virtualized hw
"By 2012, one out of five businesses will own no IT assets at all." Gartner, 01/18/2010
http://www.gartner.com/it/page.jsp?id=1278413
Grid
• Customers buy hw and sw
• 100s to 1000s of hw servers
• Shared servers
• Mostly batch apps
• Need to account for failures
• Homogeneous hw
Is Performance Important to the Success of the Cloud
§ Five of the 10 obstacles and opportunities for cloud computing are related to quality-of-service aspects such as availability, performance, capacity or scalability.
§ Obstacle #1 “Availability of service” discusses availability risks for cloud computing as a result of, e.g., programming errors, overload of common services or Distributed Denial of Service (DDoS) attacks
§ Obstacle #4 “Data transfer bottlenecks” discusses the growing data intensity of applications and how this impacts data transfer rates and costs in the cloud
§ Obstacle #5 “Performance unpredictability” discusses performance risks caused by, e.g., inefficiencies in I/O sharing and by high performance computing
§ Obstacle #6 “Scalable storage” discusses the difficulties of applying cloud computing to solutions requiring highly scalable persistent storage
§ Obstacle #8 “Scaling quickly” discusses the difficulties of quickly scaling up and down in response to load without violating service level agreements.
IBM offers highly integrated cloud solutions for different client requirements regarding workloads, service levels and delivery models
Workloads determine type and fit of Cloud Services
[Figure: workloads positioned by gain (low to high) and pain (low to high)]
Service Level expectations require different Cloud Management Services (Tier 1 to Tier 4)
• Availability
• Redundancy
• Monitoring
• End to End Process Mgmt
• Core Infrastructure Services
• Server Management
• Storage Management
• Security, Patch, Risk
• Problem/Change
• Audit Checking
• Software License Mgmt
• Application Management
• Compliance Checking
• MW and DBMS Services
• Network Connectivity
• Help Desk
• Business Continuity
Different Cloud Delivery Models accommodate different needs regarding architectural control, operations and asset ownership (Delivery Models 1 to 5)
• Private Cloud: in the enterprise data center
• Managed Private Cloud: in the enterprise data center, IBM operated
• Hosted Private Cloud: IBM owned and operated
• Shared Cloud Services: shared by Enterprise A, Enterprise B, Enterprise C
• Public Cloud Services: used by User A, User B, User C, User D, User E
What are the Layers in the Cloud
Infrastructure as a Service
• Servers, Networking, Storage, Data Center Fabric
• Shared virtualized, dynamic provisioning
Platform as a Service
• Middleware, Database, High Volume Transactions
• Web 2.0 Application Runtime, Java Runtime
• Development Tooling
Software as a Service
• Collaboration, Business Processes, CRM/ERP/HR, Industry Applications
Is the Cloud More Complex: Virtualization
§ Multiple hardware and software queues in a normal server
§ Virtualization adds two new queues (guest OS and hypervisor), forming a network of software queues
§ Memory and disk space are fixed resources that are shared even more
[Figure: a normal server stack (operating system, JVM, application server, application) with a queue at each layer, next to a virtualized server in which a hypervisor hosts three guest stacks (guest OS, JVM, application server, application); the hypervisor and the guest OS are marked as new queues]
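The effect of the two added queues can be illustrated with a deliberately simple model. The sketch below treats every layer as an independent M/M/1 queue in series, which is an assumption made only for illustration (a real stack has feedback and shared resources that this ignores). All service rates and the arrival rate are hypothetical values, not measurements.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 queue (waiting plus service), in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def tandem_response_time(arrival_rate: float, service_rates) -> float:
    """Mean end-to-end time through M/M/1 queues visited one after another."""
    return sum(mm1_response_time(arrival_rate, mu) for mu in service_rates)


# Hypothetical service rates in requests per second for each layer.
NATIVE_STACK = {
    "operating system": 2000.0,
    "JVM": 1500.0,
    "application server": 1000.0,
    "application": 800.0,
}
# Virtualization adds a hypervisor queue and a guest OS queue to the same stack.
VIRTUALIZED_STACK = {"hypervisor": 2500.0, "guest OS": 2000.0, **NATIVE_STACK}

ARRIVAL_RATE = 500.0  # requests per second, hypothetical

native = tandem_response_time(ARRIVAL_RATE, NATIVE_STACK.values())
virtualized = tandem_response_time(ARRIVAL_RATE, VIRTUALIZED_STACK.values())

if __name__ == "__main__":
    print(f"native: {native * 1000:.2f} ms, virtualized: {virtualized * 1000:.2f} ms")
```

Even in this idealized model, every extra queue adds latency that grows sharply as the arrival rate approaches the slowest layer's service rate.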
Is the Cloud More Complex: Scale Out and Network Functions
§ Network is a critical resource for persistent storage, input and output traffic
§ Network attached storage is a shared pool of multiple storage pods
[Figure: three physical servers, each running an operating system and a hypervisor that hosts three guest stacks (guest OS, JVM, application server, application), connected over the network to two network attached storage units; the network and the network attached storage are marked as new queues]
Is the Cloud More Complex: Virtual Machine Mobility
§ VMs leave, appear, move, grow
[Figure: the same three virtualized servers and two network attached storage units, with one additional guest stack (guest OS, JVM, application server, application) drawn outside the servers to represent a VM that is moving or appearing]
IBM CloudBurst Appliance
[Figure: rack layout and internal wiring of the appliance: BladeCenter chassis with HS22 blade servers, an x3650 M2 management node, a DS3400 storage controller with EXP3000 expansion units, 24-port 1 Gbps Ethernet switches, 10G switch modules, Fibre Channel switch modules, and PDUs, connected to the customer network]
§ Compute, Network, and Storage resources are integrated into the appliance
How is Cloud Performance Analysis Done
§ Dynamic modeling is required to characterize non-locality due to feedback between layered subsystems
– Classical queuing theory is not that helpful
– Discrete event simulation approaches are needed
[Figure: servers, switches, and NAS connected in series]
A bottleneck at the NAS may slow the execution at the server due to backpressure
NAS bottleneck shows up at
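The backpressure effect described above can be reproduced with a very small discrete event simulation. The sketch below is an assumption-laden toy model, not the tooling used by the author: one saturated server feeds one NAS through a finite buffer, service times are exponential, and the server stalls whenever the NAS buffer is full. The function name, rates, and buffer size are all hypothetical.

```python
import heapq
import random


def simulate(server_rate, nas_rate, nas_buffer, horizon, seed=1):
    """Simulate a saturated server writing to a NAS with a finite buffer.

    Returns (throughput in jobs per second, fraction of time the server is blocked).
    """
    rng = random.Random(seed)
    events = [(rng.expovariate(server_rate), "server_done")]
    nas_jobs = 0          # jobs queued or in service at the NAS
    blocked_since = None  # time at which the server became blocked, if it is
    blocked_time = 0.0
    completed = 0
    while events:
        now, kind = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "server_done":
            if nas_jobs < nas_buffer:
                nas_jobs += 1
                if nas_jobs == 1:  # NAS was idle, so it starts serving now
                    heapq.heappush(events, (now + rng.expovariate(nas_rate), "nas_done"))
                heapq.heappush(events, (now + rng.expovariate(server_rate), "server_done"))
            else:
                blocked_since = now  # backpressure: the server must wait for the NAS
        else:  # nas_done
            nas_jobs -= 1
            completed += 1
            if blocked_since is not None:
                blocked_time += now - blocked_since
                blocked_since = None
                nas_jobs += 1  # the waiting job enters the freed buffer slot
                heapq.heappush(events, (now + rng.expovariate(server_rate), "server_done"))
            if nas_jobs > 0:
                heapq.heappush(events, (now + rng.expovariate(nas_rate), "nas_done"))
    if blocked_since is not None:
        blocked_time += horizon - blocked_since
    return completed / horizon, blocked_time / horizon


slow_nas = simulate(server_rate=100.0, nas_rate=20.0, nas_buffer=4, horizon=200.0)
fast_nas = simulate(server_rate=100.0, nas_rate=500.0, nas_buffer=4, horizon=200.0)

if __name__ == "__main__":
    print("slow NAS: throughput %.1f/s, server blocked %.0f%%" % (slow_nas[0], slow_nas[1] * 100))
    print("fast NAS: throughput %.1f/s, server blocked %.0f%%" % (fast_nas[0], fast_nas[1] * 100))
```

With the slow NAS, the server spends most of its time blocked even though the server itself is not the bottleneck, which is the kind of non-local symptom that makes diagnosis from server metrics alone misleading.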
The Cloud Performance Challenge
§ Quality of Experience (QoE) depends upon (hybrid) cloud service performance
– Excellent QoE accelerates adoption and is a functional requirement
– QoE crosses boundaries of internet, network, system, application performance and resilience
§ Competitive pressure will require competitive performance from all vendors to keep customers
– IaaS and PaaS paradigms allow customers to move between providers (e.g., for price or QoE)
– e.g., Amazon EC2 and IBM Compute Cloud can run the same software
– QoS and SLAs are an important differentiator
§ Performance of the cloud will evolve to near real-time business
– Communication needs are near real-time for correctness
– Complex event processing needs to be done quickly to be useful
Great engineering comes from creating predictable results at predictable costs…
§ Cloud computing is a new paradigm which will have new performance challenges
– It incorporates prior component performance challenges too
– Hybrid clouds expand this further (e.g., network hops / latency)
– Customer expectations will require education
Open Question: Comparing Cloud Performance
§ It can't be compared today!
§ There aren't any industry-defined benchmarks because the workload classes vary greatly and have dynamic lifetimes
§ And a benchmark needs to include cost and availability as key factors
§ Perhaps a benchmark framework is needed that workloads are plugged into?
§ Perhaps a meta-benchmark analysis is needed to provide a score?
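One way to picture the framework idea raised above is a small harness that workloads plug into, with cost and availability folded into each score. The sketch below is purely hypothetical: the class names, the scoring formula (throughput per unit cost, weighted by availability), and the use of a geometric mean as the meta-score are assumptions chosen for illustration, not an industry definition.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Dict


@dataclass
class Result:
    throughput: float     # operations per second
    cost_per_hour: float  # currency units per hour
    availability: float   # fraction of time the service was up, in [0, 1]

    def score(self) -> float:
        """Hypothetical score: throughput per unit cost, weighted by availability."""
        return self.throughput / self.cost_per_hour * self.availability


class BenchmarkFramework:
    """Registry that workloads plug into; each workload returns a Result."""

    def __init__(self) -> None:
        self._workloads: Dict[str, Callable[[], Result]] = {}

    def register(self, name: str, run: Callable[[], Result]) -> None:
        self._workloads[name] = run

    def run_all(self) -> Dict[str, float]:
        return {name: run().score() for name, run in self._workloads.items()}

    def meta_score(self) -> float:
        """Geometric mean of the workload scores, so no single workload dominates."""
        scores = list(self.run_all().values())
        return prod(scores) ** (1.0 / len(scores))


# Usage with two stand-in workloads that return fixed, made-up measurements.
framework = BenchmarkFramework()
framework.register("web transactions", lambda: Result(1000.0, 2.0, 0.99))
framework.register("batch analytics", lambda: Result(200.0, 0.5, 0.999))

if __name__ == "__main__":
    print(framework.run_all())
    print(f"meta-score: {framework.meta_score():.1f}")
```

A real framework would also have to deal with the dynamic lifetimes of workloads noted above, which a fixed registry like this one does not capture.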
Open Question: Central storage vs. Local Disks vs. Combination vs. New …
Central shared storage (SAN or NFS)
+ Provisioning is fast
+ Live migration is supported
- VM disk I/O is slow due to disk and network contention
[Figure: host machine (guest OS, host OS, hypervisor) whose root disk and data disk live in a virtual disk store on central shared storage, next to an image repository (Image 1, Image 2) from which images are copied]
Local disks
+ VM disk I/O is fast
- Provisioning is slow due to copying images over the network
- Live migration is not supported
[Figure: host machine (guest OS, host OS, hypervisor) with root disk and data disks on local disks; images (Image 1, Image 2) are copied from an image repository on a repository server]
Open Question: Optimal Approaches for Bin Packing and Moving VMs
When deploying services in a cloud, a balance must be found between
performance and capacity of the service, and the memory available on
nodes. This is further complicated if the number of replicas of an
application is limited, for instance by the available number of licenses.
The analysis of interference between services must scale to large
numbers of host nodes, applications, replicas of applications, and classes
of users. This paper combines a multi-dimensional packing heuristic and
network flow optimization to satisfy simultaneous constraints on
throughputs, processor utilizations, memory availability and license
availability, at a minimum cost and with a minimum of host processors.
Jim Zhanwen Li, John Chinneck, Murray Woodside, and Marin Litoiu. 2009. Deployment of Services in a Cloud Subject to Memory and License Constraints. In Proceedings of the 2009
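To make the packing problem concrete, the sketch below places application replicas on hosts under a memory limit per host and a license limit per application. It uses plain first-fit decreasing, which is a much simpler heuristic than the combined packing and network flow optimization described in the cited paper, and it ignores throughput and processor utilization entirely. All names and numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Host:
    memory_gb: float
    used_gb: float = 0.0
    placed: List[str] = field(default_factory=list)


def place_replicas(
    requests: List[Tuple[str, float]],
    licenses: Dict[str, int],
    host_memory_gb: float,
) -> Tuple[List[Host], List[Tuple[str, float]]]:
    """First-fit decreasing placement of (application, memory) replica requests.

    A replica is rejected if its application has no license left or if it
    cannot fit on an empty host. New hosts are opened only when needed.
    """
    hosts: List[Host] = []
    used_licenses: Dict[str, int] = {}
    rejected: List[Tuple[str, float]] = []
    # Largest replicas first, so big items are not left without room.
    for app, mem in sorted(requests, key=lambda r: r[1], reverse=True):
        if used_licenses.get(app, 0) >= licenses.get(app, 0) or mem > host_memory_gb:
            rejected.append((app, mem))
            continue
        target = next((h for h in hosts if h.used_gb + mem <= h.memory_gb), None)
        if target is None:
            target = Host(host_memory_gb)
            hosts.append(target)
        target.placed.append(app)
        target.used_gb += mem
        used_licenses[app] = used_licenses.get(app, 0) + 1
    return hosts, rejected


# Usage: three "db" replicas are requested but only two licenses exist.
requests = [("web", 4), ("web", 4), ("db", 8), ("db", 8), ("db", 8), ("cache", 2)]
licenses = {"web": 2, "db": 2, "cache": 1}
hosts, rejected = place_replicas(requests, licenses, host_memory_gb=16)

if __name__ == "__main__":
    for i, host in enumerate(hosts):
        print(f"host {i}: {host.placed} ({host.used_gb} GB used)")
    print("rejected:", rejected)
```

Moving VMs after placement, which the open question also covers, would need a further step that weighs migration cost against the benefit of a tighter packing.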
Open Question: Performance Fault Diagnosis and Analysis
§ Intermittent backpressure causes lower level hw and/or sw to slow down
§ The problem may appear to move if it is caused by a VM and the VM moves
§ The problem may appear to move if it is caused by a VM and the problem VM dies
§ The problem may appear to move if it is caused by a VM and the problem VM starts up
§ The problem may appear to move if it is caused by hw and the VM moves
§ Several VMs may show the same symptom separated in space and time
§ What data and how much to monitor, with 10^4 to 10^5 elements
§ Expert system / analytics are needed to help in the identification of problems
§ Extend analysis to predict hw failures before they occur
Open Question: The CAP Theorem and Performance
§ Three properties of shared-data, distributed systems
1. Consistency: once an update is made, all observers see it
2. Availability: all database transactions should be processed accurately and promptly
3. Partition tolerance: the system tolerates network partitions
§ CAP Theorem
– Only two properties can be achieved at any time
– Network partitions are a given in distributed systems
– Have to pick one between consistency and availability
§ How will distributed architectures change to optimize for each pair of properties
– Eventual consistency, non-relational databases?
Gilbert, Seth, and Nancy Lynch. “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.” ACM SIGACT News, vol. 33, no. 2, 2002, pp. 51-59.
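The choice between consistency and availability under a partition can be illustrated with quorum replication, a common technique that is not named in the slides and is used here only as an example. With N replicas, a write must reach W of them and a read must consult R of them; the function names below are hypothetical.

```python
def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read quorum overlaps every write quorum (R + W > N)."""
    return r + w > n


def can_serve(reachable: int, r: int, w: int) -> bool:
    """True when a partition side with `reachable` replicas can read and write."""
    return reachable >= r and reachable >= w


N = 3
CONFIGS = {
    "strict quorum (R=2, W=2)": (2, 2),
    "eventual consistency (R=1, W=1)": (1, 1),
}

if __name__ == "__main__":
    # Consider the minority side of a partition, where only 1 replica is reachable.
    for name, (r, w) in CONFIGS.items():
        print(
            name,
            "| consistent reads:", reads_see_latest_write(N, r, w),
            "| minority side available:", can_serve(1, r, w),
        )
```

The strict quorum keeps reads consistent but the minority side of a partition cannot serve requests, while the R=1, W=1 setting stays available on both sides at the price of possibly stale reads.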
[Figure: cloud reference architecture with three roles. Cloud Service Consumer: consumer in-house IT and cloud service integration tools. Cloud Service Provider: cloud services, a Common Cloud Management Platform (OSS, Operational Support Services; BSS, Business Support Services), virtualized infrastructure (server, storage, network, facilities), and security and resiliency. Cloud Service Developer: service development tools]
e.g. Service Activation
• process optimization
e.g. Provisioning
• image copy
• instance creation
• partitioning
e.g. Run Time Performance
• Integration of storage, hypervisor, network components
• Dedicated nodes
© Copyright IBM Corporation 2010. All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.