

White Paper

Conversion Testing Best Practices

March 2013


Conversion Testing Best Practices

The key to a successful e-commerce conversion test is to ensure it utilizes a methodology that will provide an accurate conversion impact measurement for the solutions you are testing. To ensure test accuracy, you should consider the following factors when setting up and conducting your test:

1) Is time series or an A/B split best for your site? Specifically, ease vs. accuracy.

2) Are you testing an element that is on a specific page, or site wide?

3) Are you interested in the impact on a single visit, or the lifetime impact on the visitor?

4) How much data do you need to feel comfortable with a business decision?

5) If using a third party testing tool, what data capture and integrity issues should you care about?

1. Is time series or an A/B split best for your site?

Time Series vs. A/B Split Test. There are two general methods for testing website conversion:

a. Time Series: The comparison of website conversion rates from one time period with another, e.g., introducing an element to be tested, tracking conversion for a set period of time, and then comparing the conversion rate for that period with the rate for the period before the element was introduced.

b. A/B Split Testing: This method involves randomly dividing website traffic into two similarly composed groups, introducing a test element to one of the groups, and comparing performance between the two groups. If done correctly, the only difference between the two groups is the test element, thus enabling you to attribute differences in conversion to that element.
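As a minimal sketch of the split described in (b), the Python below randomly assigns each visitor to a control or test group and compares the two conversion rates. The function names, the 50/50 split, and the order counts are illustrative assumptions, not part of any particular testing tool.

```python
import random

def assign_groups(visitor_ids, seed=0):
    """Randomly assign each visitor to 'control' or 'test' (50/50 split, an assumption)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return {vid: rng.choice(["control", "test"]) for vid in visitor_ids}

def conversion_rate(orders, visitors):
    """Orders divided by visitors; 0.0 when a group has no visitors."""
    return orders / visitors if visitors else 0.0

def relative_lift(control_rate, test_rate):
    """Relative change of the test group's rate over the control group's rate."""
    return (test_rate - control_rate) / control_rate

groups = assign_groups(range(10_000))
n_test = sum(1 for g in groups.values() if g == "test")
n_control = len(groups) - n_test

# Hypothetical order counts, for illustration only.
control_rate = conversion_rate(100, n_control)
test_rate = conversion_rate(110, n_test)
print(f"control={n_control} test={n_test} "
      f"lift={relative_lift(control_rate, test_rate):.1%}")
```

Because assignment is random, the two groups should be close in size and similar in composition, so a difference in conversion rate can be attributed to the test element.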

The Simplicity vs. Accuracy Trade-off

The advantages and disadvantages of the two approaches center on simplicity vs. accuracy. The time series approach is very simple to conduct, and can typically be implemented with zero additional technology and measured using conventional conversion measurement tools. A/B testing is more involved, and will typically require specialized measurement capabilities. A/B testing systems, executed correctly, are very accurate in pinpointing which changes to your site are having the most effect. By carefully controlling two separate but otherwise identical groups of site visitors with different experiences, and comparing results, it is easy to evaluate which site experience is optimal. Sales, marketing activities, time of day/week, and general changes in the broader environment that affect performance are controlled for, because they affect both groups equally. In almost every instance, A/B testing will yield far more accurate results than time series.

Time series data is much harder to interpret. Sales, marketing activities and general market trends will affect time periods differently, and are not controlled for, e.g., conversion rates pre-holiday and post-holiday can swing dramatically. Similarly, conversion rates between Monday and Saturday can differ by 20 to 40%. Time series tests that compare data with more ‘Saturdays’ than ‘Mondays’ yield skewed results, which will lead to very flawed decision making. Time series tests are best used when site performance is very stable so that a long baseline track record can be used, when the test can be run for a very long period of time, when the element to be tested will have a very pronounced impact on conversion (i.e., > 20%), and when the cost and complexity of A/B testing outweigh the potential increase in performance.
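The day-of-week skew can be illustrated with simple arithmetic. The sketch below assumes a hypothetical site whose Saturday conversion rate is 30% higher than its Monday rate (within the 20 to 40% range mentioned above), with equal traffic each day. It compares two periods in which nothing on the site changed but the mix of days differs.

```python
def blended_rate(day_counts, day_rates, visitors_per_day=1_000):
    """Overall conversion rate for a period made up of the given days."""
    visitors = sum(n * visitors_per_day for n in day_counts.values())
    orders = sum(n * visitors_per_day * day_rates[day]
                 for day, n in day_counts.items())
    return orders / visitors

# Hypothetical rates: Saturday converts 30% better than Monday.
rates = {"Mon": 0.020, "Sat": 0.026}

baseline = blended_rate({"Mon": 5, "Sat": 4}, rates)     # more Mondays
test_period = blended_rate({"Mon": 4, "Sat": 5}, rates)  # more Saturdays

apparent_lift = (test_period - baseline) / baseline
print(f"baseline={baseline:.4f} test={test_period:.4f} "
      f"apparent lift={apparent_lift:.1%}")
```

Under these assumptions the second period shows an apparent lift of roughly 3% that comes entirely from the calendar, which is enough to obscure a marginal real change.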

In short, A/B testing is much more accurate than time series testing. Further, since most online sellers are testing solutions and site permutations where the change in performance is marginal, there is a significant risk that the change in performance measured is smaller than, and obscured by, the amount of error inherent in a time series test. However, A/B testing is not costless, and requires patience and skill in setting up an accurate test.


2. Are you testing an element that is on a specific page, or site wide?

Single Page vs. Site-wide Testing. There are two approaches to testing site elements:

a. Single Page Testing: Testing approaches that rotate the elements on a specific page, independent of the previous or next pages seen by that visitor. Rotation occurs per page view, e.g., a single visitor may see different elements on the same page after a page refresh. This is a relatively simple form of third party testing and should be used only when you want to measure the impact of one element on increasing click-throughs to the next page.

b. Site Wide Testing: This involves introducing and maintaining elements that will be persistent throughout a visitor’s visit to a site and measuring the overall impact on conversion/sales, e.g., testing free shipping by introducing it on the home page, and re-merchandising it on all subsequent pages. This is more technologically challenging for testing services, as it requires the ability to maintain the experience of an individual visitor across their entire visit, and requires that the testing service be robust to the increasing prevalence of private browsing and cookie blocking/cleansing tools.

If the solution you are testing exists solely on a single page, single page testing can be used. If the solution you are testing is present throughout the site, it is critical to use testing services that can account for this. The most common mistake is to use a testing tool like Google Optimizer, which is primarily designed for single page testing (to test changes to a single page on your site), to measure the impact of site wide solutions. For example, if you have free shipping promoted on all pages of your website, running an A/B test that only removes the promotion from a single page while free shipping is still promoted on all other pages will make the test invalid. This also applies to security seals, trust seals and merchant reliability seals that are present on all or many pages of your site. It can also apply to ratings and reviews, product recommendation engines and social plug-ins. It is critically important to ensure your test is carefully calibrated to be a full site-wide test where each visitor either consistently sees or consistently does not see the solution you are testing. For more details on Google Optimizer, refer to Appendix A of this document.
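One way to check that each visitor consistently sees or consistently does not see the solution is to scan page-view logs for visitors exposed to more than one variant. The following is a minimal sketch; the log format (visitor ID, page, variant) and the variant names are hypothetical.

```python
from collections import defaultdict

def find_mixed_exposure(page_views):
    """Return visitors who saw more than one variant during the test.

    page_views: iterable of (visitor_id, page, variant) tuples
    (a hypothetical log format).
    """
    variants_seen = defaultdict(set)
    for visitor_id, _page, variant in page_views:
        variants_seen[visitor_id].add(variant)
    return sorted(v for v, seen in variants_seen.items() if len(seen) > 1)

log = [
    ("v1", "/home", "with_offer"),
    ("v1", "/product/42", "with_offer"),
    ("v2", "/home", "without_offer"),
    ("v2", "/cart", "with_offer"),  # offer was removed on one page only
]
mixed = find_mixed_exposure(log)
print(mixed)
```

Any visitor returned by such a check had a mixed experience, so their orders cannot be cleanly attributed to either group.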

3. Are you interested in the impact on a single visit, or the lifetime impact on the visitor?

Repeat visit consistency. Many buyers visit a site multiple times before making a purchase; these are called repeat visitors. For consistency of results and user experience, it is critical that users who start in one group remain in that same group throughout the entire test period. For example, suppose you run a free shipping promotion test on your site. On the first visit, the visitor is exposed to the free shipping offer throughout the site. They leave, but come back the next day. On the second visit, the visitor is not exposed to the free shipping offer but makes a purchase. Should that buyer be counted in the test as influenced by the free shipping offer? Maybe, maybe not. It’s impossible to know given the mixed experience, and thus the test results will be invalid.

Testing tools that utilize cookies are negatively impacted by standard cookie blocking and cleansing, resulting in many users who are intermittently exposed to the solution you are testing. This can result in test error rates of up to 15%. The best solutions will also use IP address ranges to randomize visitors into two groups as they create a much more stable user experience that does not solely rely on cookies.
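A minimal sketch of keeping a returning visitor in the same group: assign groups with a deterministic hash of a stable key rather than a fresh random draw on each visit. The salt, the function names, and the choice of a /24 IPv4 range as the fallback key are assumptions made for illustration; real testing services may work differently.

```python
import hashlib
import ipaddress

def bucket(key, salt="free-shipping-test"):
    """Deterministically map a key to 'control' or 'test' via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    return "test" if digest[0] % 2 else "control"

def assign(visitor_cookie_id, ip):
    """Use the cookie ID when present; otherwise fall back to the IP range."""
    if visitor_cookie_id:
        return bucket(visitor_cookie_id)
    # Assumption: group IPv4 addresses by /24 network so that a visitor whose
    # cookie was blocked or cleared still lands in the same group.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return bucket(str(network))

# The same visitor gets the same group on every visit.
first_visit = assign(None, "203.0.113.7")
second_visit = assign(None, "203.0.113.200")
print(first_visit, second_visit)
```

Because the group is a pure function of the key, no state has to survive between visits for the assignment to remain consistent.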

4. How much data do you need to feel comfortable with a business decision?

How much data is needed? It’s a critical question, as a test with too little data will likely yield invalid results. A test run for longer than necessary wastes time, keeps you running in a less than optimized state (because not all visitors are exposed to the best option during the test, the opportunity cost of running a test can be quite high), and can keep you from moving on to the next test. The methodology used has a major impact on the answer to this question.


With time series testing, a great deal of data is required, both in terms of data points and in weeks to run the test. Specifically, many weeks and/or months would be required to assess whether the conversion you are seeing is a product of the tested element or of outside forces.

For A/B testing, the amount of data depends on the level of conversion lift the tested element is providing. Essentially, you want to test long enough to establish that the change in conversion is ‘real’ and not simply noise. For example, in a 1,000 order test, if the test group yields 10 more orders than the control group, that appears to be a 1% lift. Statistically speaking, this is not yet meaningful, as the control and test group could actually be identical, but the test group got ‘lucky’ 10 times – which in a 1,000 order test is not that much. However, if after 1,000 orders, the test group has yielded 100 extra orders, it is much more likely that the true lift in performance is 10%. For the control and test group to be ‘even’, the test group would have to have gotten ‘lucky’ 100 times. The bottom line on statistical significance is to have enough data to rule out the possibility that what you are seeing is luck rather than a sustained difference in performance.
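The ‘luck’ argument can be made concrete with a simple significance check. The sketch below is one possible reading of the example, not a prescribed method: it assumes 1,000 total orders and equal traffic in both groups, so that if the groups were truly identical each order would be equally likely to fall in either one. It then uses a normal approximation to estimate how often a gap at least this large would arise by chance. The function name is hypothetical.

```python
import math

def luck_p_value(test_orders, control_orders):
    """Two-sided p-value that an order split this uneven arises by chance.

    Assumes both groups received equal traffic, so under "no real difference"
    each order is equally likely to land in either group (normal approximation
    to the binomial distribution).
    """
    total = test_orders + control_orders
    z = (test_orders - control_orders) / math.sqrt(total)
    return math.erfc(abs(z) / math.sqrt(2))

p_small = luck_p_value(505, 495)  # 10 extra orders out of 1,000
p_large = luck_p_value(550, 450)  # 100 extra orders out of 1,000
print(f"10 extra: p={p_small:.3f}; 100 extra: p={p_large:.4f}")
```

Under these assumptions the 10-order gap is entirely consistent with chance (p well above 0.5), while the 100-order gap is very unlikely to be luck (p below 0.01).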

5. If using a third party testing tool, what data capture and integrity issues should you care about?

Quality Assurance of third party testing tools. There are several factors you should consider when using a third party testing tool to gain confidence in its methodology.

a. Can the testing tool support A/B split tests that are site wide? You’ll quickly find that you’ll want to use site wide A/B split tests for measuring the impact of most things impacting website conversion.

b. Does the testing tool enable you to accurately handle repeat visitors to your site?

c. How does the testing service handle bot traffic, i.e., site traffic from automated services that may show up periodically and in very spiky patterns? If it counts non-human traffic in your test, your results may be highly inaccurate.

d. How does the testing service handle a website’s own customer service orders? For sites where the website is used as an order management system, does the testing service correctly account for and remove customer service and phone orders? If not, those orders will over-weight the test group they fall in and give you misleading results.

e. Do the testing tool’s order values match your own order management system? This is a basic check to make sure that the testing algorithm is capturing all orders that are occurring on the site. Amazingly, sites will sometimes find pretty wide discrepancies between the two systems if they haven’t performed a rigorous comparison before. You want to make sure there is no systematic bias in the results. Sources of discrepancy could include payment options, timing, customer service, and/or order changes.

f. Does the testing service interfere/collide with ROI trackers, typically placed on your order receipt page? Most checkout pages have a large number of ROI trackers (conversion trackers, ad trackers, etc.). These trackers have a tendency to collide with each other, and can disable the underlying JavaScript engines, thereby disrupting the performance of both the ROI trackers and the testing engine.
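The order comparison in (e), together with the removal of customer service and phone orders in (d), can be sketched as a simple reconciliation. The record schema, channel names, and function name below are hypothetical; a real export from a testing tool or order management system will look different.

```python
def reconcile_orders(tool_orders, oms_orders,
                     excluded_channels=("phone", "customer_service")):
    """Compare orders captured by the testing tool with the order system.

    Both inputs map order_id -> {"value": float, "channel": str}
    (a hypothetical schema). Orders placed through excluded channels are
    dropped from the order-system side before comparing.
    """
    web_oms = {oid: o for oid, o in oms_orders.items()
               if o["channel"] not in excluded_channels}
    missing_from_tool = sorted(set(web_oms) - set(tool_orders))
    unexpected_in_tool = sorted(set(tool_orders) - set(web_oms))
    value_mismatches = sorted(
        oid for oid in set(web_oms) & set(tool_orders)
        if abs(web_oms[oid]["value"] - tool_orders[oid]["value"]) > 0.005
    )
    return missing_from_tool, unexpected_in_tool, value_mismatches

oms = {
    "A1": {"value": 50.0, "channel": "web"},
    "A2": {"value": 75.0, "channel": "phone"},
    "A3": {"value": 20.0, "channel": "web"},
    "A4": {"value": 30.0, "channel": "web"},
}
tool = {
    "A1": {"value": 50.0, "channel": "web"},
    "A2": {"value": 75.0, "channel": "web"},  # phone order counted by the tool
    "A3": {"value": 25.0, "channel": "web"},  # order later changed
}
missing, unexpected, mismatched = reconcile_orders(tool, oms)
print(missing, unexpected, mismatched)
```

A non-empty result in any of the three lists is a prompt to investigate before trusting the test's conversion numbers.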

Conclusion

A/B testing is the most effective way to ensure that the conversion optimization solutions you utilize are delivering the ROI you need to justify the investment. When you use the A/B testing best practices described in this document, you will have peace of mind in knowing that you are making the right decisions for your business.


APPENDIX A - Google Optimizer Overview for A/B Testing

Many online merchants utilize the Google Optimizer testing tool for conducting tests to optimize various elements of their website. They do this for good reason, as Google Optimizer is one of the best free tools available for running single page A/B tests. In fact, Google Optimizer supports both A/B testing – to test the performance of two (or more) entirely different versions of a page and Multivariate testing – to test multiple variables (sections of a page, buttons, etc.) on a page simultaneously.

The excerpt under “What kind of testing can I do?” at the end of this appendix is taken directly from the Google Website Optimizer website (www.google.com/websiteoptimizer) and describes the two types of tests Google Optimizer is designed to support.

Can Google Optimizer be used for testing site wide solutions?

While Google Optimizer is a good tool for conducting single page optimization tests, it is not designed for site wide testing of multiple elements. Because of its design as a single page test tool, it is not recommended that merchants utilize Google Optimizer for site wide testing. For example, if you have free shipping promoted on all pages of the website, running an A/B test that only removes the promotion from a single page while it is still present on all other pages will make the test invalid. This also applies to security seals, trust seals and merchant reliability seals that are present on all pages of the site. It could also apply to ratings and reviews, product recommendation engines and social plug-ins.

Testing tools that are designed to support site wide tests over an extended period of time are much better tools to use. Some examples of these tools include Omniture’s Test & Target or Monetate.

What kind of testing can I do?

Website Optimizer uses two types of testing: A/B testing and multivariate testing.

A/B testing experiments enable you to test the performance of two (or more!) entirely different versions of a page. You can change the content of a page, alter the look and feel, or move around the layout of your alternate pages – there's plenty of design freedom with A/B testing. It's the simpler type of test, and works best with pages that don't get a lot of traffic.

Multivariate tests, on the other hand, allow you to test multiple variables – in this case, sections of a page – simultaneously. For example, you could identify the headline, image, and promo text as parts of your page you'd like to improve, and try out three different versions of each one. Website Optimizer would then show users different combinations of those versions (let's say, Headline #2, Image #3, and Promo Text #1) to see what users respond to best. Multivariate tests are more complicated and typically require higher page traffic.