White Paper
Understanding the End-User Experience for Optimal Application Performance
© Copyright Quest® Software, Inc. 2005. All rights reserved.
The information in this publication is furnished for information use only, does not constitute a commitment from Quest Software Inc. of any features or functions discussed and is subject to change without notice. Quest Software, Inc. assumes no responsibility or liability for any errors or inaccuracies that may appear in this publication.
TABLE OF CONTENTS
BUSINESS CHALLENGES FACING IT ENVIRONMENTS TODAY
MINIMIZING INEFFICIENCY
MAXIMIZING LOYALTY
MAXIMIZING PERFORMANCE WITH THE RIGHT SOLUTION TOOL SET
THE RIGHT TOOLS FOR THE JOB
ZERO IMPACT MONITORING
QUEST’S END USER PERFORMANCE MANAGEMENT (EUPM) SOLUTION
QUEST’S END USER PERFORMANCE MANAGEMENT (EUPM) SOLUTION FEATURES
PROACTIVE AVAILABILITY MONITORING FOR KEY BUSINESS SERVICES AND TRANSACTIONS
UNDERSTAND ALL END USER ACTIVITY AS IT RELATES TO IMPORTANT BUSINESS SERVICES AND ACTIVITIES
MONITOR PERFORMANCE OF CERTAIN INDIVIDUAL WEB PAGES OR BUSINESS TRANSACTIONS
DETAILED USER ACTIVITY DIAGNOSTICS
UNDERSTAND THE PERFORMANCE OF THE WEB SERVERS SUPPORTING YOUR APPLICATION OR WEBSITE
UNDERSTANDING PERFORMANCE FOR DIFFERENT END-USER GEOGRAPHICAL LOCATIONS
PERFORMANCE OF SUBNETS
UNDERSTANDING ISP PERFORMANCE USED BY END USERS
OUT-OF-THE-BOX DIAGNOSTICS INFORMATION
UNDERSTANDING END USER BEHAVIOR
HOLISTIC VIEW OF APPLICATION PERFORMANCE AND END-USER EXPERIENCE
WORK FLOW FOR EFFECTIVE END-USER AND APPLICATION PERFORMANCE MANAGEMENT
MAXIMIZING YOUR IT INVESTMENTS
CONCLUSION
NOTES
ABOUT THE AUTHOR
ABOUT QUEST SOFTWARE, INC.
CONTACTING QUEST SOFTWARE
BUSINESS CHALLENGES FACING IT ENVIRONMENTS TODAY
Today's self-service, transaction-dependent enterprises have put overwhelming business pressures on IT executives to manage an increasingly complex array of systems and applications while keeping the business running at peak efficiency. An additional challenge is that most IT organizations are being asked to reduce expenses while supporting new initiatives intended to fuel growth and improve efficiency. Yet maintaining the performance of existing applications and infrastructure while embracing new technologies has significantly increased the complexity of supporting the enterprise and its business applications. The desire to measure and improve the user experience has been gaining momentum as enterprises realize that quality of experience has become a differentiator in an increasingly competitive market as well as a source of productivity gains for internal applications. Visibility into the real impact on customer experience reduces operational inefficiencies by highlighting the issues that need immediate attention, as opposed to problems that do not impact the business.
Business-centric IT organizations recognize that providing an optimal level of service quality starts with understanding the true user experience. By understanding what the real users are doing and experiencing, IT professionals can gain valuable insight into the performance and availability of their critical applications and systems, the cause and effect of seasonality on the business, and the impact on the business when performance degrades. Providing a consistent and excellent customer experience builds customer loyalty and, in turn, the long-term value of customers often allows for more reinvestment in IT.
MINIMIZING INEFFICIENCY
The cost of delivering information and services via a call center is considerably higher than through self-service, browser-based applications. When problems force customers to use call centers to complete their business transactions, fewer customers can be handled, the cost of servicing each customer goes up, hold times increase and customer satisfaction declines. As such, it is imperative to the business that customer-facing applications are available and functioning properly in order to keep operational costs to a minimum and the business running at peak efficiency. Proactively eliminating problems in Web applications before they paralyze the business is critical for both the top and bottom lines, because poorly performing Web applications are time-consuming and expensive for organizations: they interrupt business transactions and decrease the productivity of internal staff. Yet resolving Web application problems in production environments is quite complex. From the network, to Web servers, to application servers, to back-end systems, there are a number of places where problems can occur in these business applications. The real challenge is not determining when a problem occurs, but which underlying systems, resources or application components are truly responsible for the performance degradation or failure.
Since long problem resolution times represent significant costs to the organization, coordination and cooperation between IT managers and domain experts become essential in order to maximize business output. However, the ability to accurately track user experience and correlate it to the domain responsible for the various capacity and performance bottlenecks has been unattainable for many enterprises. The most comprehensive solutions intended to provide this correlation are expensive frameworks that require extensive training, customization and support.
As an alternative, many organizations have simply implemented multiple point solutions that individually monitor and alert on various parts of the application stack or user experience. While the price is more appealing, all too often these disparate tools create more confusion than clarity. No two tools see the problem in the same way, and in many cases only one tool sees the problem at all. At this point, a frantic exchange of phone calls and emails commences, exchanging information about the time of the problem, the description of the problem, and the call to action: open all tools to see what else might have been happening at roughly the same time.
And here is where the finger-pointing and frustration begin. What is truly needed is a single view into the application’s performance (as experienced by the users) as well as correlated and contextual visibility into the performance of the underlying systems.
MAXIMIZING LOYALTY
While the users of an internal Web application are a captive audience, customers using Web storefronts are relatively transient, as it has become easier than ever to find alternatives for buying and selling goods and services. Factors such as storefront availability and quality of experience have become significant influences on consumer buying habits and loyalty. According to a recent study by a Washington-based research organization, 50 percent of all unsatisfied consumers and 25 percent of all businesses with problems never complain to the company involved; they simply take their business elsewhere. When you consider these numbers, it becomes imperative to ensure that your customers’ end-user experience is exceptional. Making sure customers have uninterrupted, free-flowing access to all your Web services helps protect top-line results.
Figure 1.
With this in mind, many organizations strive to determine the business impact of their Web performance problems. This is relatively easy to estimate if you have an idea of the typical number of users, the average sale and the duration of a specific problem. However, given enough problems, dissatisfied customers will turn elsewhere. So it becomes imperative to quantify the exact number of customers affected, identify specifically which customers are experiencing unacceptable performance and know exactly what they are trying to accomplish within the site or application.
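The rough estimate mentioned above (typical number of users, average sale, duration of the problem) can be sketched as a short calculation. This is only an illustration under stated assumptions: the function name, the conversion-rate parameter and every figure below are hypothetical and do not come from a Quest product.

```python
def estimate_lost_revenue(users_per_hour, conversion_rate, average_sale, problem_hours):
    """Rough revenue at risk while a Web performance problem lasts.

    All inputs are business-supplied assumptions. conversion_rate (the share
    of users who would normally complete a purchase) is an added assumption
    that the estimate needs but the text does not name.
    """
    affected_users = users_per_hour * problem_hours
    lost_orders = affected_users * conversion_rate
    return lost_orders * average_sale


# Hypothetical figures: 2,000 users per hour, 3% of them normally buy,
# an $80 average sale and a problem lasting 1.5 hours.
print(round(estimate_lost_revenue(2000, 0.03, 80.0, 1.5), 2))
```

An estimate of this kind says nothing about which customers were affected or what they were trying to do, which is why the exact measurements discussed next matter.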
With the competition “only a mouse-click away”, it is up to the IT staff to ensure a positive customer experience in order to retain customers. The sense of urgency is never more real than at Christmas time, when a buyer’s tolerance for poorly behaving sites is low. Since a high concentration of Web shopping occurs during this time of year, it is easy to build or destroy loyalty based upon a user’s experience with the site. By providing a consistent and exceptional quality of experience, enterprises increase the likelihood of not only the first sale but also repeat business from their existing customers, referrals, and acquisition of new customers. Understanding the long-term value of your customers and being able to accurately identify the specific customers affected can go a long way.
MAXIMIZING PERFORMANCE WITH THE RIGHT SOLUTION TOOL SET
The performance of business-critical applications (including availability and response time) has a direct impact on both the bottom and top lines of a business. Efficiency and profitability are driving most, if not all, IT decisions in the current economic climate. Since IT managers are often required to quantify the impact of their investment with respect to one or more defined business objectives, providing business insight into the impact of poor performance is vital to the decision-making process for IT initiatives.
To address this, many effective IT organizations are attempting to map the performance of critical business processes to system and application performance. However, new application architectures such as Web Services and J2EE exacerbate the performance management challenge. These new applications have far more “moving parts” within an application. There are a large number of places problems can occur in today’s business applications, including the network, application and Web servers and the back-end databases. This leads to organizational barriers since there are many domain experts in the IT organization—system administrators, application administrators, performance engineers and DBAs—who are responsible for maintaining peak performance.
[Figure: layers of the application infrastructure: End User Experience, Network, Web Servers, Application Servers, Databases, OS and Infrastructure]
Problems can occur anywhere in the application infrastructure, making resolution more complex for the specific domain experts.
The real challenge is not determining when a problem occurs, but rather identifying which underlying systems, resources or application components are truly responsible for the performance degradation or failure. Effective management across domains is required to quickly isolate, diagnose and resolve problems while keeping finger-pointing to a minimum. To address this challenge, IT managers need domain-centric products that work together, combining the application expertise required to solve problems and the business-centric visualization of the application and its dependencies.
THE RIGHT TOOLS FOR THE JOB
By now, most enterprises have implemented some form of active response time monitoring. However, most of what is implemented today is merely “good enough” and does not address the real problems facing IT managers. The need for performance management solutions is often defined by application-specific requirements and depth of expertise. Major packaged applications and custom applications have specific requirements that often call for special insight into servlets, services and components that traditional infrastructure and testing companies cannot provide. Traditional approaches take a system-level, or “bottom-up”, view of performance, but give no means to pinpoint and resolve problems once they occur.
The largest systems management vendors have begun moving into the Web application market, but in general they still tend to focus on the overall scope of management needs including event management and software deployment. Testing companies have long viewed performance monitoring as a natural extension of their testing business. However, solutions evolved from testing products are often undesirable for production monitoring. Because these products were architected without regard to their overhead in a production environment, they often impose a “performance penalty” on the application in order to simply monitor it.
The right solutions for today’s complex applications must wrap domain or application expertise around data collection and analysis in order to unequivocally diagnose problems. This empowers less-skilled IT professionals and improves the coordination and hand-off between various IT domains. In fact, a recent study by the META Group (now part of Gartner) recognized that “mature organizations seek centralized consoles for a single application. Although they do not try to overtake an existing event console that may be in use for the entire company, they do seek a console to serve as a midlevel manager to bring the large amount of data under the purview of a single application under a single console.” [1] META further stated, “During 2004-2006, this application level console will be the largest area of investment.” [2]
By now, most enterprises have implemented some form of active response time monitoring and Web analytics. However, most of what is implemented today is merely “good enough” and does not address the real problem facing IT managers: understanding the true user experience. Described below is a list of the common approaches currently used to partially solve the problems associated with managing Web application performance. No single approach solves all of the problems, and each has its own strengths and weaknesses.
ZERO IMPACT MONITORING
Understanding the real user experience is more than just application performance; it also provides insight into the behavior of users when performance degrades beyond acceptable levels. Do users take a different path through the application to complete their task or do they abandon the site altogether? How does the user behavior change over time or when there have been changes to the applications? Does time of day affect performance and user behavior?
These questions cannot be answered without monitoring the real users. For this reason, the shift from “simulated” users to monitoring real users has been validated by many of the leading analyst firms that track the performance management market. A recent survey performed by the 451 Group stated, “Once considered an esoteric luxury, end-user monitoring tools are rapidly becoming a mundane necessity. As more and more enterprises rely on external-facing applications, it's essential to know how these applications are performing for their users…the problem is so complicated and so comparatively new that there's no consensus on exactly how to proceed. You can monitor end-user sessions from inside or outside the firewall, with real or simulated transactions, using agents or an agentless system, or any combination of the above. What you can't do is focus on a single tier, such as the application server.”
The META Group (now part of Gartner) had also previously claimed, “Most organizations have focused on implementing active response-time monitoring. In the coming years, passive response time will increase in importance as it offers drill-down into actual user activity. Organizations must balance overhead in the system with demand for data because measuring too many transactions will impact the very response time being monitored.”
QUEST’S END USER PERFORMANCE MANAGEMENT (EUPM) SOLUTION
Quest believes that the best way to understand the end-user experience is to combine active and passive monitoring and to compare the results of both with the application infrastructure. This provides a comprehensive map of when performance degradation occurs as it relates to the end user, while at the same time showing which part of the application infrastructure contributes to the performance problem.
The active monitoring solution comprises recording and regular playback of key routine transactions and serves to attain the following:
• Assuring availability of key routine business transactions
• Establishing Service Level Agreements with constant factors (consistent locations, consistent connectivity types, consistent browser types, consistent transaction paths)
• Predictive and proactive alerting of performance degradation of the application. This can also be useful after any changes are implemented within the application.
• Reporting on availability for different time frames.
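The record-and-playback idea behind active monitoring can be sketched in a few lines. This is a minimal illustration, not Quest’s implementation: `play_back`, `availability` and the stand-in transaction are hypothetical names, and a real recorded transaction would drive actual HTTP requests from fixed locations.

```python
import time


def play_back(transaction, runs=3, pause_seconds=0.0):
    """Replay a recorded transaction and log (succeeded, elapsed_seconds) per run.

    `transaction` is any callable standing in for a recorded script
    (a hypothetical placeholder for illustration).
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            transaction()
            succeeded = True
        except Exception:
            succeeded = False
        results.append((succeeded, time.perf_counter() - start))
        time.sleep(pause_seconds)  # the regular playback interval
    return results


def availability(results):
    """Percentage of playbacks that completed without error."""
    if not results:
        return 0.0
    return 100.0 * sum(1 for succeeded, _ in results if succeeded) / len(results)


def sample_transaction():
    """Stand-in for a recorded business transaction; always succeeds here."""
    time.sleep(0.01)


runs = play_back(sample_transaction, runs=4)
print(availability(runs))
```

Because the same script runs from the same place each time, the resulting numbers are comparable from one interval to the next, which is what makes them suitable for SLAs with constant factors.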
The passive monitoring solution comprises continuously monitoring all the end-user traffic and servers to attain the following:
• Understand the real/true user experience
• Monitor every combination of user interactions including:
    • All users
    • All connectivity types
    • All browsers
    • All user transaction paths
    • All user locations
• Identify the slowest and most common user interactions
• Drill into the traffic volume, network performance, server utilization
• Provide the complete service delivery experience of any and every user all the time.
Quest’s passive monitoring of HTTP/HTTPS-based Web applications is performed with a non-invasive, agentless approach with zero impact or overhead on your infrastructure. It uses a specialized monitor that plugs into a “span” port on the network to capture real user traffic and report on the response time of the application as well as on network conditions and server performance. The solution has been optimized to install in less than one hour and to begin observing all end-user traffic immediately and in real time, providing real alerts for real problems affecting real users.
Passive monitoring of applications provides the following advantages:
Passive monitoring improves IT efficiency
• Application administrators can quantify, understand and baseline application performance based on actual end user experience. It helps establish realistic Service Level Objectives (SLOs) by providing insight into the performance for a particular application configuration.
• Passive monitoring alerts when user experience falls below defined objectives or historical performance.
• Passive monitoring analyzes the user response time by tier (client, network, processing response times) and server to quickly pinpoint performance bottlenecks in the infrastructure.
• Passive monitoring carries none of the administrative burden associated with writing and maintaining scripts after application changes.
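As an illustration of analyzing response time by tier, the sketch below totals the time spent in each tier across a set of observed requests and reports the largest contributor. The tier names follow the text (client, network, processing); the data layout and the function name are assumptions made for the example.

```python
from collections import defaultdict


def slowest_tier(requests):
    """Return (tier, total_seconds) for the tier contributing the most time.

    Each request is a dict of per-tier seconds, for example
    {"client": 0.2, "network": 0.5, "processing": 1.8}. This layout is an
    assumption for illustration. Returns None when there are no requests.
    """
    totals = defaultdict(float)
    for request in requests:
        for tier, seconds in request.items():
            totals[tier] += seconds
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])


observed = [
    {"client": 0.2, "network": 0.5, "processing": 1.8},
    {"client": 0.3, "network": 0.4, "processing": 2.1},
]
tier, seconds = slowest_tier(observed)
print(tier, round(seconds, 2))  # processing 3.9
```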
Passive monitoring maximizes customer loyalty
• Passive monitoring identifies how many customers are affected by poor performance.
• Passive monitoring identifies how those customer transactions and activities behave when application performance degrades.
• Passive monitoring identifies individual users experiencing poor performance, allowing the enterprise to proactively rebuild loyalty with those affected.
Passive monitoring optimizes technology spending
• Passive monitoring can quantify the amount of site/application traffic relative to performance so that the right bottlenecks can be addressed first.
• Passive monitoring can analyze the effect of infrastructure upgrades with no changes to your monitoring strategy.
QUEST’S END USER PERFORMANCE MANAGEMENT (EUPM) SOLUTION FEATURES
Proactive Availability Monitoring for Key Business Services and Transactions
Key routine transactions or business services can be easily recorded, deployed to different geographical locations and played back at regular intervals to assure availability of the application and to provide a proactive means of identifying performance degradation. For example, if a business service that consistently takes 35 seconds takes 50 seconds to complete from location X, an alert to that effect is a proactive indication that performance may be degrading. This information is mapped to the application infrastructure (when used in conjunction with Quest’s application monitoring solution) to provide further information on application performance.
See exactly which part of the synthetic transaction is problematic.
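The 35-second versus 50-second example can be expressed as a simple baseline comparison. The sketch below is one possible way to do it, not the product’s alerting logic: the median baseline and the 25 percent tolerance are assumptions chosen for illustration.

```python
from statistics import median


def degradation_alert(history_seconds, latest_seconds, tolerance=0.25):
    """Return True when the latest playback exceeds the baseline by more than `tolerance`.

    The baseline is the median of earlier runs. The 25% tolerance is an
    illustrative assumption, not a product default.
    """
    baseline = median(history_seconds)
    return latest_seconds > baseline * (1 + tolerance)


history = [34.0, 35.0, 36.0, 35.0]        # consistent runs of about 35 seconds
print(degradation_alert(history, 50.0))   # True: 50 s is well above the baseline
print(degradation_alert(history, 37.0))   # False: within normal variation
```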
Understand all End User Activity as it Relates to Important Business Services and Activities
Quest’s End User Performance Management solution enables customers to group end-user activities into definable business services in order to collect, analyze, understand and compare performance as it relates to specific business objectives. This gives customers visibility into which part of their business is most affected by poor application performance as experienced by the end users of a particular application. Performance of different business services or applications may be compared against an established or stated service level.
This “Application Service Level Compliance”, or business service level compliance, can also provide details on how the different business services are performing with respect to the number of users performing those services, as well as performance indicators (page download times, processing times, number of pages downloaded, etc.) that indicate where any performance problems occur as they relate to the end users.
Monitor Performance of Certain Individual Web Pages or Business Transactions
Quest’s EUPM solution offers the ability either to monitor specific pages or to group different pages into transactions defined by the customer’s business needs, giving an overall picture of end-user activity as it relates to a series of pages. Performance of different business transactions or individual Web pages may be compared against an established or stated service level.
This “Transaction Service Level Compliance”, or business activity level compliance, can also provide details on how a particular page, or the set of pages that make up a particular business transaction, is performing as experienced by the end users. Important metrics such as how many times a particular transaction has been executed in a given time frame, as well as the processing and download times, can be measured to see how application performance affects business objectives.
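One simple way to express compliance against a stated service level is the percentage of observed executions that met the target. The sketch below assumes that definition; the transaction names, measurements and targets are hypothetical.

```python
def compliance_percent(response_times, target_seconds):
    """Percentage of measured response times at or under the target.

    Returns None when no traffic was observed, since compliance is
    undefined in that case.
    """
    if not response_times:
        return None
    within = sum(1 for t in response_times if t <= target_seconds)
    return 100.0 * within / len(response_times)


# Hypothetical transactions: (observed response times in seconds, target in seconds)
observed = {
    "checkout": ([1.2, 1.9, 3.4, 1.1], 2.0),
    "search": ([0.4, 0.6, 0.5], 1.0),
}
for name, (times, target) in observed.items():
    # Prints the transaction, its execution count and its compliance percentage.
    print(name, len(times), compliance_percent(times, target))
```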
Detailed User Activity Diagnostics
For poorly performing business services or business transactions, the performance of the individual pages that make up those services or transactions can be examined by looking at the response times of those pages. Since the passive monitoring solution captures all the requests (and the related responses), the approach taken here is to identify the pages that are most responsible for the performance degradation. This shows which user activities are being impacted the most and which pages the administrators should fix first. Administrators can also link the performance of these pages to other performance metrics that are being monitored for the application infrastructure (through other solutions or tools).
Once the poorly performing pages grouped under a particular business service or business transaction are identified, further drill-down diagnosis can be performed to see which element or elements contribute to the poor performance of each page. This significantly reduces the time to resolve application performance problems, as the administrators can then investigate the relevant code behind that particular element. More importantly, the administrators will be working on resolving the issues that impact business operations the most.
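Identifying the pages most responsible for degradation can be approached by ranking them. The sketch below scores each page by the total time its requests spent above a target, which is an assumed heuristic rather than a documented product metric; the page URLs and timings are hypothetical.

```python
def rank_pages_by_impact(page_times, target_seconds):
    """Order pages by total seconds spent above the target, worst first.

    `page_times` maps a page URL to its observed response times. Scoring by
    accumulated excess time is an assumed heuristic: it favors pages that
    are both slow and frequently requested.
    """
    impact = {
        page: sum(max(0.0, t - target_seconds) for t in times)
        for page, times in page_times.items()
    }
    return sorted(impact.items(), key=lambda item: item[1], reverse=True)


pages = {
    "/cart": [5.0, 6.0, 5.5],   # slow and requested often
    "/home": [1.0, 1.2],        # within target
    "/report": [9.0],           # very slow but rarely requested
}
for page, excess in rank_pages_by_impact(pages, target_seconds=2.0):
    print(page, excess)
```

With these figures the frequently used cart page ranks above the slower but rarely used report page, which reflects the point that fixes should follow business impact rather than raw response time.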
Understand the Performance of the Web Servers Supporting Your Application or Website
Quest’s EUPM solution collects, analyzes and provides performance information on all the servers that service a particular application or webpage. Performance of different servers may be compared against an established or stated service level.
This “Server Level Compliance” will help prioritize tuning efforts and dedicate resources to where they are needed the most (e.g. some servers may always be more problematic than the others as indicated by the different performance metrics analyzed).
Understanding Performance for Different End-user Geographical Locations
With external monitoring services, you are measured only from the locations where the service provider has placed its robots. Often this geographic representation does not match your user community. To get a true picture of performance, you want your real users measured as they experience the application or Web site.
Performance of Subnets
Subnet performance can be used as the basis for SLAs (Service Level Agreements) with business partners, service providers, remote offices, and network providers. However, knowing the performance is only one part of the equation. Effective service levels are often expressed as performance relative to an agreed-upon traffic volume, number of page views, throughput, etc. For example, “I will provide sub-2 second response time for traffic volumes up to 1,000 connections per hour.” Without knowing whether the traffic volumes exceed expected levels, it is difficult to evaluate service levels relative to the commitment. By tracking both performance and resource utilization, the enterprise can effectively manage both customer expectations and service levels.
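The example commitment above (sub-2 second response time for up to 1,000 connections per hour) can be evaluated hour by hour. The sketch below is a minimal illustration; the tuple layout of the hourly statistics and the function name are assumptions.

```python
def sla_breaches(hourly_stats, max_response_seconds=2.0, committed_volume=1000):
    """Split problem hours into real breaches and out-of-scope hours.

    `hourly_stats` is a list of (hour, connections, response_seconds) tuples,
    an assumed layout. An hour is a breach only if response time was not
    under the limit while traffic stayed within the committed volume.
    Hours above the committed volume fall outside the commitment.
    """
    breaches, out_of_scope = [], []
    for hour, connections, response_seconds in hourly_stats:
        if connections > committed_volume:
            out_of_scope.append(hour)
        elif response_seconds >= max_response_seconds:
            breaches.append(hour)
    return breaches, out_of_scope


stats = [
    ("09:00", 800, 1.4),    # within volume, within target
    ("10:00", 950, 2.6),    # within volume, too slow: a breach
    ("11:00", 1400, 3.1),   # slow, but traffic exceeded the commitment
]
print(sla_breaches(stats))  # (['10:00'], ['11:00'])
```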
Understanding ISP Performance Used by End Users
The primary reason companies use hosted monitoring services with a distributed network of transactional robots is for availability monitoring and some amount of insight into the performance experienced by their customers and delivered by their service providers. Unfortunately, your real traffic may be coming from different locations and networks than those where the robots are placed. Understanding where your users are coming from allows you to pay attention to the details that will ultimately satisfy your customers. For example, if 30 percent of your traffic comes from AOL users, you would likely want to know what their typical performance is for a day, week or month. Why focus your efforts on areas that do not affect your customers? The performance information gathered about the ISPs can also help companies standardize their application performance based on certain ISPs and communicate the same to their end users or customers.
Out-of-the-Box Diagnostics Information
Quest’s passive monitoring solution provides numerous reports out-of-the-box that can be effectively used to proactively monitor and resolve any performance issues with the application.
These reports provide instant visibility into the end user transactions and will help application owners concentrate on those end-user activities that have the most negative impact on the business operation.
Understanding End User Behavior
Information collected on end users helps show how users are experiencing the application, considering factors such as stickiness, page stop rate and access speed.
In addition, comparing the information gathered about user behavior with the infrastructure metrics as they relate to the end-user experience gives both the application owner and the application administrator an instant overview of what is contributing to poor application performance: Is it the users, the network, the infrastructure or the activity itself?
Holistic View of Application Performance and End-user Experience
Quest’s flagship monitoring solution, Foglight®, allows active (synthetic) monitoring as well as aggregated response times for key elements of all user activity (passive monitoring) to be managed from one point of view, including proactive alerting for both.
This holistic view gives application owners and administrators a “one stop” dashboard for comparing the end-user experience with application infrastructure problems (when used in conjunction with Quest’s Foglight for Application Infrastructure Performance Monitoring). This approach indicates whether the application or key business transactions are available (active/synthetic monitoring) and what the real end-user experience is (passive monitoring) as it relates to certain business services or transactions (or locations or servers). In addition, during periods of application performance degradation affecting the end users, it will pinpoint where the problem lies within the application infrastructure.
WORK FLOW FOR EFFECTIVE END-USER AND APPLICATION PERFORMANCE MANAGEMENT
1. Monitor all end-user activity (relate it to business transactions and activities)
2. Quantify, analyze and baseline application performance as it relates to the end users and establish realistic Service Level Objectives.
3. Record custom transactions and monitor them from different strategic locations
   a) Monitor service levels for continuous business services availability and performance
4. Monitor the different components that comprise the application infrastructure (Web and application servers, database, batch, network)
5. Identify the four “Ws”
   a) When there is a problem – as indicated by poor end-user response times
   b) Where the problem occurs – as indicated by end-user response times for client, network and back-end processing as well as by monitoring of the application infrastructure tiers
   c) What is causing the problem – by mapping it to the metrics collected for the end users as well as those collected within the application infrastructure tiers
   d) Who is causing (or affected by) the problem – by investigating specific end-user (location) activities when needed
   e) A combination of two or more of the above four “Ws” often holds the key to the most elusive question: “Why” is the performance degradation occurring?
MAXIMIZING YOUR IT INVESTMENTS
While enterprises continue to strive for optimal performance of critical applications, the days of simply throwing additional hardware and bandwidth at problems are long gone. IT managers are required to justify the impact of new investments with respect to one or more defined business objectives. Unfortunately, all too often, the gain from a new investment in technology or infrastructure is not nearly what was anticipated or promised. Further complicating matters, as new technology and infrastructure are deployed, the cost of managing and administering these new components competes for budget with the tools required to maintain the service levels of existing applications and their underlying infrastructure. The real challenge has become optimizing the budget: spending only on those initiatives that actually deliver on their promise to improve performance and productivity and forgoing those that don’t. Did a caching solution really improve performance or bandwidth utilization? Did hosting your site improve its availability? Did the new server upgrade really make things worse?
While IT wants to hold vendors touting large performance boosts accountable for actually delivering the results they claim, it often lacks the tools to accurately assess the real impact of these initiatives. IT managers need to be able to quantify performance before and after changes have been made in the infrastructure in order to justify the investment in new performance-enhancing technology. Only by measuring the real impact on the end user will it become evident whether things have really improved. By quantifying the true impact on the end users, organizations will quickly determine which products provide value to the business and allow for more efficient use of their budget. Additionally, it becomes imperative to identify the areas of the application and infrastructure that cause the most pain to your end users and those that have the greatest exposure to your end users, and to improve the performance of those areas first in order to have the greatest impact on the majority of your customers. For example, the most common business transactions, most frequented pages, longest queries, etc. are the right areas on which to focus your performance optimization efforts.
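Quantifying performance before and after an infrastructure change can be as simple as comparing a summary statistic of end-user response times over the two periods. The sketch below uses the median, which is an assumption (a percentile or an average could be substituted); the function name and sample values are hypothetical.

```python
from statistics import median


def compare_before_after(before_seconds, after_seconds):
    """Summarize the change in median end-user response time after a change.

    Returns (median_before, median_after, percent_change). A negative
    percent_change means response times improved.
    """
    m_before = median(before_seconds)
    m_after = median(after_seconds)
    percent_change = 100.0 * (m_after - m_before) / m_before
    return m_before, m_after, percent_change


# Hypothetical response times (seconds) measured from real user traffic.
before = [4.0, 5.0, 6.0]
after = [3.0, 4.0, 5.0]
print(compare_before_after(before, after))  # (5.0, 4.0, -20.0)
```

Running the same comparison on the measurements for each initiative gives a consistent basis for deciding which investments delivered on their promise.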
CONCLUSION
Despite conservative fiscal management and tightening IT budgets, enterprises are still investing in new technologies to improve efficiency and create opportunities for growth. In order to support new initiatives and maintain current service levels, IT managers must make more efficient use of existing resources, improve expertise, and promote better coordination among teams. Application-centric solutions providing real-time insight and reporting into the end-to-end service levels of critical packaged and custom applications are becoming a requirement for enterprises serious about their bottom line. If the true value of the application is to be measured, it has to be quantified in terms of how the end users are utilizing the application and how that relates to the business objectives. This will determine the priority in which application performance problems need to be resolved.
NOTES
1. META Group, “Web Application Management Tools, Service Management Strategies, Operations Strategies,” by Corey Ferengul, April 2003
2. META Group, “Web Application Management Tools, Service Management Strategies, Operations Strategies,” by Corey Ferengul, April 2003
ABOUT THE AUTHOR
Sunil G. Shenoy is a senior product marketing manager at Quest Software with responsibility for marketing and strategic direction of Quest’s monitoring product line. Prior to joining Quest, Sunil was with BMC Software where he focused on application performance and tuning. Sunil specializes in Enterprise Applications and Databases. He brings several years of consulting, project management, software implementation and custom development experience to Quest.
ABOUT QUEST SOFTWARE, INC.
Quest Software, Inc. delivers innovative products that help organizations get more performance and productivity from their applications, databases and infrastructure. Through a deep expertise in IT operations and a continued focus on what works best, Quest helps more than 18,000 customers worldwide meet higher expectations for enterprise IT. Quest Software can be found in offices around the globe and at www.quest.com.
Contacting Quest Software
Mail: Quest Software, Inc.
World Headquarters
5 Polaris Way
Aliso Viejo, CA 92656 USA
Web site: www.quest.com
Email: [email protected]
Phones: 1.800.306.9329 (Inside U.S.) 1.949.754.8000 (Outside U.S.)
Please refer to our Web site for regional and international office information. For more information on Quest Software solutions, visit www.quest.com.
Trademarks
All trademarks and registered trademarks used in this guide are property of their respective owners.