Performance Management for Cloud-based
Applications
STC 2012
Agenda
Context
Problem Statement
Cloud Architecture
Key Performance Challenges in Cloud
Challenges & Recommendations
Context
Cloud Computing has gained significance mainly because it reduces CapEx and OpEx through characteristics such as Elasticity, On-demand resource provisioning and Pay-per-Use. These characteristics drive organizations to migrate some of their applications, data and infrastructure to Cloud Architectures.
However, organizations remain wary of potential challenges such as application performance management in the Cloud:
How is performance management different for applications in Cloud compared to that in
current architectures?
What are the typical performance management challenges for applications in different
Cloud Service Models (IaaS, PaaS, and SaaS) and Deployment Models (i.e., Public and
Private Clouds)?
What are the ways and means to overcome performance management challenges in Cloud?
The objective of the paper is to highlight the performance management challenges for Cloud-based
Generic Cloud Architecture
The following diagram represents a typical cloud architecture and its components.
[Figure: cloud architecture layers. Labels: Virtual Machines (VMs); Virtualization Layer; Host Operating System; Physical Servers; CPU, Memory, Storage, Network]
Performance of any given IT System (Client/Server, Multi-tiered, Mainframe, SOA, et al.) depends on 3 key aspects:
• Performance of the Application (includes Application Code, Application Design, Software/Middleware/Database and External Systems)
• Hardware (H/W) Infrastructure
• Optimal Software (S/W) Configuration Settings for the given H/W
When compared with traditional
Typical Performance Management Challenges in Cloud
• Bursty load of an Application robs resources from other Applications sharing the hardware infrastructure
• Hypervisor Layer has certain overhead due to resource virtualization
• ‘Timekeeping’ issue impacts time-based performance metrics
Performance Management Challenges & Recommendations
Category: Virtualization / Hypervisor Layer
Challenge: Time measurement is a challenge in a virtualized environment because the timing of a VM is not synchronized with other VMs, or even with the physical host, as VMs get scheduled and de-scheduled based on workload demand.
Recommendation / Best Practice:
• Currently being addressed by different Hypervisor vendors (such as VMware, Microsoft, IBM)
• Architects/developers should be aware of this when designing routines to capture latency at application code level
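As a minimal sketch of the recommendation about capturing latency at application code level, the Python snippet below times a routine with a monotonic clock (`time.perf_counter`) rather than wall-clock time, since wall-clock time can be stepped when a guest clock is re-synchronized. This is one reasonable approach under that assumption, not something prescribed by the hypervisor vendors; `measure_latency` and `sample_routine` are hypothetical names, and even a monotonic clock inside a VM may still be affected by the scheduling effects described above.

```python
import time

def measure_latency(routine, *args, **kwargs):
    """Run routine once and return (result, elapsed_seconds).

    Uses time.perf_counter(), a monotonic clock, so the measured
    interval cannot go negative if the wall clock is adjusted.
    """
    start = time.perf_counter()
    result = routine(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def sample_routine(n):
    # Hypothetical stand-in for a critical application routine.
    return sum(range(n))

value, seconds = measure_latency(sample_routine, 100_000)
print(f"result={value} latency={seconds * 1000:.3f} ms")
```

Comparing such application-level timings with measurements taken outside the VM (for example at a load generator) may help reveal timekeeping drift.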
Challenge: Virtualizing a physical NIC into multiple ‘Virtual NICs’ results in more concurrent network traffic on it, thereby impacting the bandwidth available for the application.
Recommendation / Best Practice: A few VMs should be assigned dedicated physical NICs, depending on the criticality of the workload and the performance SLA.
Category: Shared Physical Environment
Challenge: A sudden and unpredictable load on any application/workload may, because of Elasticity, consume more computing resources than it normally requires, taking resources away from other workloads and impacting their performance SLAs.
Recommendation / Best Practice:
• Cloud Vendors (Public/Private) who manage the underlying hardware infrastructure should have a complete understanding of the tenants/applications/workloads sharing that infrastructure, their load patterns, respective capacity requirements (both Min and Max) and performance SLAs
• Cloud Consumer/Cloud Integrator should insist on getting the VM configurations: virtual and physical resources, and the Resource Sharing model of VMs
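The capacity review described above (Min and Max requirements of the workloads sharing a host) can be illustrated with a small Python sketch. The `Workload` class, the field names and all CPU figures are hypothetical and only serve the illustration; a real provider would use its own capacity model.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    min_cpus: int  # capacity the workload always needs
    max_cpus: int  # capacity it may claim during a burst

def capacity_report(workloads, host_cpus):
    """Compare the summed Min and Max CPU demand against host capacity."""
    total_min = sum(w.min_cpus for w in workloads)
    total_max = sum(w.max_cpus for w in workloads)
    return {
        "total_min": total_min,
        "total_max": total_max,
        # Guaranteed demand alone must fit on the host.
        "min_fits": total_min <= host_cpus,
        # If the summed peaks exceed the host, simultaneous bursts
        # would take resources away from co-located workloads.
        "burst_contention_possible": total_max > host_cpus,
    }

# Hypothetical tenants sharing one 16-CPU host.
workloads = [
    Workload("billing", min_cpus=2, max_cpus=6),
    Workload("reporting", min_cpus=4, max_cpus=8),
    Workload("portal", min_cpus=2, max_cpus=4),
]
report = capacity_report(workloads, host_cpus=16)
print(report)
```

In this made-up case the guaranteed demand fits, but the summed peaks do not, which is the situation in which one bursty workload can affect the SLAs of the others.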
Category: Stateful Workloads
Challenge: For stateful workloads, session management and session replication across multiple VMs is costly due to n-way replication (store and retrieval operations).
Recommendation / Best Practice:
• Employ Distributed Caching solutions such as Oracle Coherence, Memcached or WebSphere eXtreme Scale that do intelligent replication and avoid unnecessary n-way replication, with faster session archival and retrieval.
• Ensure that the amount of data stored in Sessions is kept to a minimum.
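To illustrate the point about session size, the following Python sketch compares the serialized size of a session that embeds full objects with one that keeps only identifiers. The session contents are invented for illustration, and `pickle` is used only as a stand-in for whatever serialization the session replication mechanism actually uses.

```python
import pickle

def session_size(session):
    """Return the number of bytes the session occupies once serialized."""
    return len(pickle.dumps(session))

# Hypothetical session that embeds full cart item details.
fat_session = {
    "user_id": 42,
    "cart": [
        {"sku": i, "name": f"item-{i}", "description": f"details of item {i} " * 10}
        for i in range(50)
    ],
}

# Leaner alternative: keep only identifiers and look up the rest on demand.
lean_session = {
    "user_id": 42,
    "cart_skus": list(range(50)),
}

fat_bytes = session_size(fat_session)
lean_bytes = session_size(lean_session)
print(f"fat={fat_bytes} bytes, lean={lean_bytes} bytes")
```

Every byte saved here is saved again on each replica and on each store and retrieval operation.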
Category: Elasticity Vs Application Scalability
Challenge: Elasticity benefits are realized if and only if a given ‘Application’ is ‘Scalable’ first.
Recommendation / Best Practice: Standard performance engineering activities such as monitoring, performance tuning and application scalability assessment should be carried out even for
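One way to sketch an application scalability assessment is to compare measured throughput at increasing instance counts against ideal linear scaling. The throughput figures below are made up for illustration, and `scaling_efficiency` is a hypothetical helper, not part of any tool named in this document.

```python
def scaling_efficiency(throughput_by_instances):
    """Return {instances: efficiency} relative to ideal linear scaling.

    Efficiency 1.0 means throughput grew in proportion to the number
    of instances; lower values mean added instances yield less.
    """
    base_instances = min(throughput_by_instances)
    base_per_instance = throughput_by_instances[base_instances] / base_instances
    return {
        n: tput / (n * base_per_instance)
        for n, tput in sorted(throughput_by_instances.items())
    }

# Hypothetical load-test results: instances -> requests per second.
measured = {1: 100.0, 2: 190.0, 4: 300.0, 8: 320.0}
efficiency = scaling_efficiency(measured)
for n, eff in efficiency.items():
    print(f"{n} instance(s): efficiency {eff:.2f}")
```

With these invented numbers, efficiency falls off sharply at 8 instances, which would suggest a bottleneck that adding more VMs cannot remove.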
Category: IaaS
Challenge: Cloud Consumer is provided only the required computing resources and hence has control over the OS and the applications deployed on top of it, but not over the underlying hardware infrastructure.
Recommendation / Best Practice: Architects of the cloud consumer group need to understand and review:
• Mapping between Virtual Resources and Physical Resources of VMs
• VM Profile in terms of resource sharing (Shared or Dedicated or Shared with Cap)
• Rules defined for the Resource Management of VMs (ideally defined by Cloud Provider
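The review items above could be captured in a simple record per VM. The sketch below is only an assumption about how a consumer might note down what the provider discloses; `SharingModel`, `VMProfile`, the VM names and the core counts are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class SharingModel(Enum):
    # The three resource sharing profiles named in the text.
    SHARED = "shared"
    DEDICATED = "dedicated"
    SHARED_WITH_CAP = "shared with cap"

@dataclass
class VMProfile:
    name: str
    vcpus: int
    sharing: SharingModel

def vcpu_to_pcpu_ratio(vms, physical_cores):
    """Return total vCPUs allocated per physical core on one host."""
    return sum(vm.vcpus for vm in vms) / physical_cores

# Hypothetical VMs placed on a host with 8 physical cores.
vms = [
    VMProfile("app-1", vcpus=4, sharing=SharingModel.DEDICATED),
    VMProfile("app-2", vcpus=8, sharing=SharingModel.SHARED),
    VMProfile("batch-1", vcpus=4, sharing=SharingModel.SHARED_WITH_CAP),
]
ratio = vcpu_to_pcpu_ratio(vms, physical_cores=8)
print(f"vCPU:pCPU ratio = {ratio:.1f}:1")
```

A ratio above 1:1 means virtual CPUs are oversubscribed on that host, which is worth knowing before committing to a performance SLA.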
Category: PaaS
Challenge:
• Consumer has no visibility into what happens below the Platform
• Consumer does not have access to modify or tune platform-specific configuration to suit the application
• Performance bottleneck identification needs profiling tools such as JProbe, JProfiler, .NET Profiler etc., which are agent-based tools that require the agent to be attached to the platform’s runtime
• Platform-specific performance monitoring can be done using the pre-packaged monitoring capabilities of the platform, if and only if the capability is provided to the Consumer
• Usage of any enterprise monitoring tools such as DynaTrace, HP Diagnostics, HP SiteScope, CA Introscope is restricted by the Platform’s support and compatibility
Recommendation / Best Practice:
• Clearly define contractual agreements with the PaaS Vendor w.r.t. providing OS and Hardware level performance metrics and performance of underlying infrastructure.
• Design the application to have a performance metrics logging feature for critical routines within application code
• Review support provided by various Platform vendors (Google,
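A performance metrics logging feature for critical routines can be sketched without any agent, using only application code. The Python decorator below is a minimal illustration under that assumption; `log_latency`, the logger name `perf` and the routine `place_order` are hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")

def log_latency(func):
    """Log the elapsed time of every call to a critical routine."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the routine returns or raises.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.3f ms", func.__name__, elapsed_ms)
    return wrapper

@log_latency
def place_order(quantity):
    # Hypothetical critical routine.
    return quantity * 2

logging.basicConfig(level=logging.INFO)
result = place_order(21)
```

Because the timing lives in the application itself, it remains available even when the platform does not allow profiler agents to be attached.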
Category: SaaS
Challenge: Consumers have no control over application code, platform and hardware infrastructure, hence application performance management is completely dependent on the Cloud Provider.
Recommendation / Best Practice: Clearly define contractual agreement and penalty clauses with the Cloud Provider for end-to-end application performance SLAs.
Thank You
“The contents of this document are proprietary and confidential to Infosys Technologies Ltd. and may not be disclosed in whole or in part at any time, to any third party without the prior written consent of Infosys Ltd.”