Rigorous Performance
Testing on the Web
Grant Ellis
Who is Instart Logic?
§ Software company focused on Application Delivery
§ We work with globally known brands whose business depends on performance, and make their sites and apps really fast
§ Team includes big data, virtualization, and web performance experts from Google, Facebook, Akamai, Cisco, Citrix, VMware, and Aster Data
§ How was the data collected? Aggregated? Normalized?
§ What is response time? What does that mean for the users?
§ Did any actual human beings see this response time?
§ What devices/browsers were used? Laptop? Phone? Tablet?
§ Where were the users located?
1. Methodology matters more than results
2. Statistical analysis can (and sometimes does) lie.
   - It is really easy to make great results look poor, or to make poor results look great, either deliberately or accidentally.
Table of Contents
§ The Internet, The Bottleneck, and The Test: A brief history
§ Last-Mile Performance Tools (It’s dangerous to go alone!)
§ Now I have data… Lots of data
§ But, wait, there’s more (data)!
§ Need more? Meet the CDF.
Need For Speed: Packet Edition, created by Raphaël Luta
http://www.aptiwan.com/packetstory/
The Dawn of the (World Wide) Web
§ Web adoption became viable for commerce and business
§ Performance detractors:
- Weak server hardware
- Clumsy scaling technology
- Poor first-mile connectivity
§ Primary Bottlenecks:
- Hardware
- First-mile connectivity
The Internet, The Bottleneck, and The Test: A brief history
[Diagram: last mile, middle mile, first mile, and hardware, with the first mile and the hardware marked as bottlenecks]
The test: repeatedly load whole pages. Measured performance accounts for the page, its embedded objects, and the server latency introduced by the then-traditional three-tier architecture.
Data center scale was conquered.
§ Adoption on the web increased again:
- Google, Facebook, fully-baked e-commerce, others
- Governments digitized records and moved vital functions to the Web
§ Performance detractors:
- Middle-mile copper
- Congested switches
- Poorly maintained peering points
§ Primary Bottlenecks:
- Middle-mile
[Diagram: last mile, middle mile, first mile, and hardware, with the middle mile marked as the bottleneck]
Backbone products from Gomez and Keynote
These enable ongoing performance testing (i.e., monitoring) from multiple geographies at the same time.
Beware: Some content delivery networks have taken care to place their nodes on the same network, or even in the same rack, as synthetic testing nodes. Look for unrealistically low response times in your embedded objects!
[Diagram: last mile, middle mile, first mile, and hardware, with the last mile marked as the bottleneck]
§ Last mile latency, packet loss
§ Browser mechanics
The Application Delivery Challenge Today
High Performance Browser Networking by Ilya Grigorik, Figures 7-16 and 10-6
Available for free online: http://chimera.labs.oreilly.com/books/1230000000545/index.html
[Chart: latency in ms (axis 0 to 250) for Wired, LTE, WiFi, 4G, and 3G connections]
Last-Mile Performance Tools
(It’s dangerous to go alone!)
• JMeter and LoadRunner measure:
  - From a single geography (usually on-premise)
  - With a single browser
• Keynote backbone / Gomez backbone:
  - Report only on averages
  - Use fixed (backbone) connectivity
  - Still simulate data
• None of the above measure:
  - Multiple devices
  - Multiple connection types
  - True user experience
  - Impact from wireless technologies
Synthetic Testing
Pros
• User Experience metrics
• Open source!
• Multiple device types
• Multiple connection types (traffic shaping)
• Great reports
• Captures waterfall diagrams
Cons
• Limited analysis tools
• Difficult to monitor performance
• Platform stability
• It’s still synthetic
Real User Monitoring (RUM)
Pros
• True user experience
• Easy set-up
• Great browser support
• Multiple device types
• Multiple connection types
• Open source tools available
Cons
• Requires live traffic: responsive, not preemptive
• Measurement impacts results
• Safari data is limited
• Outliers can be extreme and must be removed
boomerang.js
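Because RUM outliers can be extreme, they need a consistent removal rule. One common choice (an assumption here, not the method of any specific RUM tool) is Tukey's fences: drop samples lying more than `k` interquartile ranges outside the middle half of the data. A minimal sketch, assuming load times arrive as a plain list of seconds; the name `trim_outliers` and the default `k=1.5` are illustrative.

```python
from statistics import quantiles


def trim_outliers(samples, k=1.5):
    """Drop samples outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumes `samples` is a list of page load times (seconds) from RUM beacons.
    The 1.5 multiplier is a convention, not a law; keep it fixed across runs.
    """
    if len(samples) < 4:
        return list(samples)  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles(samples, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


# One beacon from a stalled tab (95 s) would dominate any average.
print(trim_outliers([2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 95.0]))
```

Whatever rule you pick, apply the same rule to every data set being compared and report how many samples it dropped; otherwise the trimming itself becomes one more way to make results look better or worse than they are.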
First: New vocab for last-mile tools
§ Fully Loaded
- Entire page has been loaded
- Including asynchronous functions like analytics beacons.
- The browser hasn’t utilized the Internet Connection for a while
- Generally transparent from a user's perspective
For a long time, Fully Loaded was all we had. With mature client-side technologies, the Fully Loaded metric is much less relevant:
• It does not take into account browser mechanics
• It fires after the connection falls idle, which has nothing to do with the user's experience
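Fully Loaded is defined by network inactivity, not by anything the user sees. A sketch of how such a metric could be computed, under a hypothetical model: each request is a `(start, end)` pair in seconds from navigation start, and "a while" means a quiet window of 2 seconds after onload. The 2-second default and the function name `fully_loaded` are assumptions, not a standard.

```python
def fully_loaded(requests, onload, quiet_window=2.0):
    """Return the Fully Loaded time, in seconds from navigation start.

    Hypothetical model, not any specific tool's algorithm:
    - `requests` is a list of (start, end) tuples, one per network request.
    - `onload` is the Document Complete time.
    - The page is fully loaded once, after onload, the connection has been
      idle for at least `quiet_window` seconds; the result is the end of
      the last request before that idle period.
    """
    done = onload
    for start, end in sorted(requests):  # process requests in start order
        if start - done >= quiet_window:
            break  # idle long enough: later requests are not counted
        done = max(done, end)
    return done


# A late analytics beacon (3.5 s to 4.0 s) pushes Fully Loaded past onload,
# even though nothing changes on screen.
print(fully_loaded([(0.0, 1.0), (0.5, 3.0), (3.5, 4.0), (7.0, 7.2)], onload=2.5))  # 4.0
```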
§ Document Complete (or Onload)
- The page is assembled by the browser and ready for the user.
- (Almost) always visually complete
- User can use the scroll bars, click links, or search.
- The browser may still be doing things in the background.
• Some sites defer loading of prominent content until after document complete.
• Some Front-End Optimization (FEO) packages defer script execution until after document complete. In this case, an interactive site may look visually complete at document complete, but won't actually be responsive or usable until after those scripts execute!
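In RUM, Document Complete and First Byte are usually derived from the browser's Navigation Timing data. A minimal sketch, assuming the beacon carries the W3C Navigation Timing (Level 1) attribute names with values in epoch milliseconds; the dict layout and the name `derive_metrics` are hypothetical, and the field names your RUM tool actually reports may differ.

```python
def derive_metrics(timing):
    """Derive last-mile metrics (ms) from one Navigation Timing record.

    Assumes `timing` is a dict keyed by Navigation Timing Level 1 attribute
    names (navigationStart, responseStart, loadEventStart), in epoch ms.
    """
    nav = timing["navigationStart"]
    return {
        # Network latency plus server latency, as this user experienced it.
        "first_byte": timing["responseStart"] - nav,
        # Document Complete (onload): when the load event begins.
        "document_complete": timing["loadEventStart"] - nav,
    }


beacon = {"navigationStart": 1000, "responseStart": 1350, "loadEventStart": 4200}
print(derive_metrics(beacon))
```

As the caveats above warn, a fast `document_complete` does not prove the page is usable: deferred scripts may still be running after the load event.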
§ Start Render (or Render Start)
- Browser paints something (anything) on the screen.
- May be all or most of the page, or a single image, or a single paragraph, or a single pixel.
- The moment your user knows that the web site is actually working.
Load Time
§ Otherwise known as Document Complete.
First Byte
§ Network latency plus server latency.
§ Critical path for all browser functions.
Start Render
§ Otherwise known as Render Start.
Visually Complete
§ All visual components of the page are painted on the screen.
§ BEWARE: Visually complete is not the same as functional. Some Front-End Optimizations defer JavaScript execution to make the page look visually complete sooner, but users may not be able to click links, scroll the window, or search!
Speed Index
§ Loosely, the average time for visual components to be painted on the screen.
§ More technically: the area above the visual-progress curve when all paint events are plotted over time (lower is better).
§ The same warnings around visual completeness apply. Sites with great Speed Index scores are not necessarily functional as soon as they are visible.
Fully Loaded
§ Same as Fully Loaded above: the browser stops using the connection.
§ Transparent for users.
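The Speed Index definition can be made concrete. Writing VC(t) for the fraction of the page that is visually complete at time t, Speed Index is the area above the visual-progress curve: the sum over time of (1 - VC(t)), up to visual completion. A minimal sketch, assuming visual progress is sampled as `(time_ms, percent_complete)` pairs (for example, from video frames) and held constant between samples; the sample format and function name are assumptions.

```python
def speed_index(progress):
    """Area above the visual-progress curve, in ms (lower is better).

    Assumes `progress` is a list of (time_ms, percent_complete) samples,
    sorted by time, starting at (0, 0) and ending at 100 percent. Progress
    is treated as a step function that holds until the next sample.
    """
    total = 0.0
    for (t0, pct0), (t1, _) in zip(progress, progress[1:]):
        # Fraction still incomplete during [t0, t1), times the interval length.
        total += (1 - pct0 / 100) * (t1 - t0)
    return total


print(speed_index([(0, 0), (1000, 100)]))             # all at once: 1000.0
print(speed_index([(0, 0), (500, 50), (1000, 100)]))  # progressive: 750.0
```

Both example pages are visually complete at 1000 ms, but the one that paints half its content at 500 ms scores lower (better). Neither number says when the page becomes functional.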
Now I have data… lots of data
Over 6,000 data points.
Possible interpretations…
Series   Average   Median   Standard Deviation
blue     8.947     7.323    4.792
red      9.239     7.168    5.357
green    8.155     6.977    4.844
purple   14.104    13.109   4.397
→ A gross oversimplification.
It may be useful.
But look at how the graph changes with slightly different cuts of the same data.
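A table of averages, medians, and standard deviations hides the shape of the data. A small sketch with made-up numbers (hypothetical, not the data set above) showing how a single slow page load moves some summary figures and not others:

```python
from statistics import mean, median, stdev


def summarize(samples):
    """Return (average, median, standard deviation) for a list of load times."""
    return mean(samples), median(samples), stdev(samples)


# Hypothetical load times in seconds, for illustration only.
steady = [3, 4, 5, 6, 7]
one_slow_user = [3, 4, 5, 6, 30]  # identical except for one slow page load

print(summarize(steady))
print(summarize(one_slow_user))
```

The one slow user moves the average from 5 to 9.6 seconds and raises the standard deviation more than sevenfold, while the median stays at 5. Reporting either figure alone tells a very different story about the same users.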
But, wait! There’s more (data)!
None of these representations capture the whole picture!
There are hundreds of permutations of variability, from differences in:
• Internet connection types
• Devices
• Browsers
• Geographies
• Wireless connection quality
• Computing power
And then, there's the natural variability of the Internet.
Plots over time usually aren't that relevant for web performance:
• Oversimplification, sometimes misleading!
§ We can’t take all these things and distill them into one number, or even one number plotted over time. Enter the histogram:
The histogram expresses how many users experienced a particular page load time.
Taller bars mean that more users saw the load time in that interval.
Shorter bars mean that fewer users saw the load time in that interval.
Faster transaction times are on the left side of the histogram.
When the taller bars are on the left side, it means that more users saw a fast experience.
If you are comparing two experiences, plot the histograms on the same chart!
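Two histograms on the same chart are only comparable if they share identical bucket edges. A minimal sketch in plain Python, assuming two non-empty lists of load times in seconds and 1-second buckets; the data and the name `shared_histograms` are hypothetical.

```python
import math
from collections import Counter


def shared_histograms(a, b, width=1.0):
    """Bucket two samples of load times on identical bucket edges.

    Buckets are [k*width, (k+1)*width). Returns (bucket_starts, counts_a,
    counts_b); empty buckets between the extremes are kept so gaps stay visible.
    """
    ca = Counter(math.floor(x / width) for x in a)
    cb = Counter(math.floor(x / width) for x in b)
    keys = range(min(min(ca), min(cb)), max(max(ca), max(cb)) + 1)
    return [k * width for k in keys], [ca[k] for k in keys], [cb[k] for k in keys]


# Hypothetical load times in seconds for two experiences.
experience_a = [1.2, 1.8, 2.5, 4.1]
experience_b = [0.5, 1.1, 1.9]
for start, n_a, n_b in zip(*shared_histograms(experience_a, experience_b)):
    print(f"{start:4.1f}s  A {'#' * n_a:<4} B {'#' * n_b}")
```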
Red is definitely faster than blue:
• Fast users got faster
• Medium users got faster
• Slow users got faster
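The claim that fast, medium, and slow users all got faster can be checked numerically by comparing the same percentiles of both experiences. A sketch using the standard library's `statistics.quantiles`; treating the 10th, 50th, and 90th percentiles as "fast", "medium", and "slow" users is an assumption for illustration, and the function names are hypothetical.

```python
from statistics import quantiles


def percentile_report(before, after, points=(10, 50, 90)):
    """Compare load-time percentiles of two experiences.

    Returns {percentile: (before_value, after_value)}. Uses the 'inclusive'
    method, which interpolates between observed samples.
    """
    qb = quantiles(before, n=100, method="inclusive")
    qa = quantiles(after, n=100, method="inclusive")
    # n=100 yields 99 cut points; index p - 1 is the p-th percentile.
    return {p: (qb[p - 1], qa[p - 1]) for p in points}


def faster_everywhere(report):
    """True if every reported percentile improved (lower load time)."""
    return all(after < before for before, after in report.values())


blue = [float(x) for x in range(1, 12)]  # hypothetical load times, seconds
red = [x - 0.5 for x in blue]
report = percentile_report(blue, red)
print(report, faster_everywhere(report))
```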
Need More? Meet the Cumulative Distribution Function (CDF)
§ We all love histograms:
- Everything is represented
- Easy to consume
§ But, they still have shortcomings:
- Finite granularity
- Arbitrary bucket designations
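The "arbitrary bucket designations" problem is easy to demonstrate: the same samples, with the same bucket width, change shape when the bucket edges shift. A sketch with hypothetical data; `bucket_counts` and its `offset` parameter are illustrative.

```python
import math
from collections import Counter


def bucket_counts(samples, width, offset=0.0):
    """Histogram counts with buckets [offset + k*width, offset + (k+1)*width).

    Returns {bucket_start: count}, ordered by bucket start.
    """
    counts = Counter(math.floor((s - offset) / width) for s in samples)
    return {offset + k * width: counts[k] for k in sorted(counts)}


# Hypothetical load times in seconds.
samples = [1.8, 1.9, 2.1, 2.2, 2.9, 3.1]

print(bucket_counts(samples, 1.0))              # edges at 0, 1, 2, 3, ...
print(bucket_counts(samples, 1.0, offset=0.5))  # same width, edges shifted 0.5 s
```

With edges on whole seconds, the data looks like three buckets peaking at 2 to 3 seconds; shift the edges by half a second and it becomes two buckets peaking at 1.5 to 2.5 seconds. Nothing about the users changed, only the bucketing.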
The Cumulative Distribution Function (CDF) expresses the percentage of page loads completed within a given amount of elapsed time.
So, for blue, approximately 20% of page loads were completed in 5 seconds or less.
Slightly less than 70% of transactions were done in 10 seconds or less.
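Unlike a histogram, an empirical CDF needs no buckets: for any time t, it is simply the fraction of page loads that finished at or before t. A minimal sketch using the standard library's `bisect` on sorted samples; the function name `ecdf` and the sample data are hypothetical.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return F(t): the fraction of samples less than or equal to t."""
    data = sorted(samples)
    n = len(data)

    def fraction_done_by(t):
        # bisect_right returns how many sorted samples are <= t.
        return bisect_right(data, t) / n

    return fraction_done_by


# Hypothetical load times in seconds.
blue = ecdf([3.2, 4.8, 6.1, 7.5, 8.0, 9.4, 11.2, 12.9, 15.0, 21.3])
print(blue(5.0))   # 0.2: share of page loads done within 5 s
print(blue(10.0))  # 0.6: share done within 10 s
```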
As with histograms, a better (faster) CDF is one whose curve lies above and to the left of this one.
The red line is higher and more to the left. A greater percentage of users are done with their page load at any given time.
The gap between the lines is the differential. At the marked point, only 80% of blue users were done with their page load. After the same amount of time, more than 90% of red users were done.
The red curve is above and to the left of the blue curve in all cases. Red is faster for all users.
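"Above and to the left in all cases" is a checkable property: at every time t, the share of page loads already complete for one experience is at least the share for the other. A sketch, assuming two lists of load times; it evaluates both empirical CDFs at every observed sample time (the only points where either step curve can change) and also reports the largest vertical gap, which is the two-sample Kolmogorov-Smirnov statistic. The function name `compare_cdfs` is hypothetical.

```python
from bisect import bisect_right


def compare_cdfs(candidate, baseline):
    """Check whether `candidate` is done at least as early at every time t.

    Returns (dominates, max_gap, t_at_max_gap): `dominates` is True if the
    empirical CDF of `candidate` is never below that of `baseline`, and
    `max_gap` is the largest vertical distance between the two curves.
    """
    a, b = sorted(candidate), sorted(baseline)
    dominates, max_gap, t_at_max = True, 0.0, None
    # Both step functions only change at observed sample times.
    for t in sorted(set(a) | set(b)):
        fa = bisect_right(a, t) / len(a)
        fb = bisect_right(b, t) / len(b)
        if fa < fb:
            dominates = False  # the curves cross: someone got slower
        if abs(fa - fb) > max_gap:
            max_gap, t_at_max = abs(fa - fb), t
    return dominates, max_gap, t_at_max


red = [1, 2, 3, 4]   # hypothetical load times, seconds
blue = [2, 3, 4, 5]
print(compare_cdfs(red, blue))
```

If the curves cross, some users got faster while others got slower, which is exactly what a single average would hide.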
Tie it all together…
§ The Internet is a jungle.
§ Methodology matters more than results.
§ Statistics can lie.
§ Pick your tool wisely.
§ Irrelevant metrics mislead.
§ Performance is never a single number.
§ Powerful visualizations trump aggregate figures.