
Popular existing approaches to evaluating and optimizing the usability of web interfaces include expert inspection methods, such as cognitive walkthroughs and heuristic evaluations (Nielsen, 1994; Nielsen, 1995), and different forms of user studies (Insfran and Fernandez, 2008), conducted either in a lab or in a remote setting. These methods seem costly and time-consuming from a company’s point of view, as user testing in particular is “heavily constrained by available time, money and human resources” (Nebeling et al., 2013b). However, the usability of an e-commerce product is a prime factor for ensuring customer satisfaction and loyalty (Sauro, 2010). Specifically, Sauro (2010), who compared System Usability Scale scores (abbreviated “SUS”; Brooke, 1996) of promoters and detractors of e-commerce products, states that “[p]erceptions of usability explain around 1/3 of the changes in customer loyalty”. Moreover, Kuan et al. (2005) found a relation between system quality and conversions, i.e., they identified three dimensions of usability that “explain over 70% of variance of intentions for planned purchase as well as future purchase.” Thus, e-commerce companies need to continuously evaluate the usability of their products, in particular to remain competitive.

Yet, in industry contexts, the maximization of conversions (e.g., Tonkin et al., 2010) is often preferred over traditional usability testing. That is, based on more efficient tools such as Google Analytics¹¹ (Google, 2014a) and Visual Website Optimizer¹² (Wingify, 2014), the goal is to maximize a given target metric, mostly clicks on advertisements or completed checkout processes. For instance, The Complete Guide to A/B Testing¹³ by Visual Website Optimizer states:

“All websites on the web have a goal - a reason for them to exist

— eCommerce websites want visitors buying products

— SaaS web apps want visitors signing up for a trial and converting to paid visitors

— News and media websites want readers to click on ads or sign up for paid subscriptions

Every business website wants visitors converting from just visitors to something else.” (Wingify, 2014)

This is an understandable course of action from a company’s point of view, as the foremost intention of an e-commerce product is the generation of revenue. Conversion maximization is often achieved through split testing (or A/B testing) setups. This means that two slightly different variations of the same web interface are deployed simultaneously.

¹¹ http://www.google.com/analytics/ (Nov. 10, 2014).

¹² https://vwo.com/ (Nov. 10, 2014).

¹³ https://vwo.com/ab-testing/ (Dec. 19, 2014).

4 Chapter 1 Introduction

Then, a predefined fraction of the users is sent to one variation while the rest are sent to the other. Based on this, company analysts try to infer which interface has generated more conversions. A realistic example scenario would be a dating agency that displays a person on its website and investigates via split testing how many users register depending on whether that person is blond or brunet. Split testing setups have two main advantages:

1. They are cheap, i.e., deployment is simple and no usability experts are required to interpret the collected data (Nielsen, 2005), which makes split testing more efficient and cost-effective than traditional usability evaluation.

2. They can be applied to live websites, i.e., actual user behavior under real-world conditions is measured (Nielsen, 2005). This stands in particular contrast to user testing, which in today’s IT industry is mostly applied in lab settings before a new website or major redesign goes live.
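The split testing procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `assign_variation` and `two_proportion_z_test`, the hash-based assignment and the choice of a pooled two-proportion z-test are ours and are not taken from the cited tools, which may work differently.

```python
import hashlib
import math


def assign_variation(user_id: str, fraction_a: float = 0.5) -> str:
    """Deterministically send a predefined fraction of users to variation 'A'.

    Hypothetical helper: hashing the user ID keeps the assignment stable
    across visits without storing any state.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "A" if bucket < fraction_a else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumption: a pooled two-proportion z-test is one common way to compare
    two variations; it is not prescribed by the text above.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration only (not measured data).
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Such a test only indicates whether one variation converts better than the other. It carries no information about why users behaved as they did.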

While these advantages are undeniably attractive for e-commerce companies and explain the popularity of the method in real-world scenarios, split testing also has considerable drawbacks—or, as Nielsen (2005) puts it: “the downsides usually outweigh the upsides”.

1. Although it investigates real users’ actual behavior, the method cannot give insights into the reasons for that behavior (Nielsen, 2005). For instance, the number of completed checkout processes of an online shop might increase due to an incorrectly labeled button (“continue” instead of “buy now”). Therefore, conversion maximization might even contradict usability (Nielsen, 2005) and thus also customer satisfaction and loyalty (Sauro, 2010).

2. Split testing cannot propose optimizations to an interface. This is because both variations have to be present before the test is carried out (Nielsen, 2005). In particular, the potentially optimized version must be fully designed before any quantitative evidence is available.

Because the combination of split testing and conversion maximization is generally used with any revenue-oriented website, it is in particular also applied to advertisement-displaying SERPs. As already described above, SERPs of different providers are evolving in terms of their look & feel and the presented information. Also, competitors with new concepts are entering the search engine market. To give just two examples, DuckDuckGo builds on the principle of tracking-free search (“The search engine that doesn’t track you.”) while Ecosia¹⁴ is a green search engine (“Search the web, save the environment”¹⁵). The continuous evaluation of SERP usability is particularly important for such new competitors introducing a novel look & feel. But well-established providers also need to employ methods for usability evaluation due to evolving needs in terms of the provided information and its presentation.

To summarize the above, in today’s IT industry, existing means for usability evaluation are not sufficiently applied in e-commerce settings. Yet, they are highly important, particularly in the context of SERPs. Besides this, existing methods for usability evaluation (e.g., Insfran and Fernandez, 2008; Matera et al., 2006; Nielsen, 1994; Nielsen, 1995) as well as split testing (Nielsen, 2005) and conversion maximization (e.g., Google, 2014a; Tonkin et al., 2010; Wingify, 2014) are general approaches.

¹⁴ https://www.ecosia.org/ (Nov. 12, 2014).

¹⁵ https://www.ecosia.org/what (Nov. 12, 2014).

1.1 Motivation 5

Fig. 1.2.: Real-world search results.

That is, there exists no holistic approach that is specifically tailored to the evaluation and optimization of SERPs. To give just one example, the usability consulting agency Userfocus provides a checklist that is aimed at search usability. However, their approach is a rather limited, 20-item set of best practices for a general search setting on arbitrary websites.¹⁶ In the following, we illustrate the shortcomings described above using two real-world examples.

Example #1 Figure 1.2 shows web search results by a new real-world search engine (that as of November 13, 2014, is in a closed beta state) for the query “Neues iPhone” (German for “new iPhone”). In total, the interface displays ten search results and nine advertisements (of which six are grouped, i.e., they contain several hyperlinks). Of these nine advertisements, four are displayed above and five are displayed below the actual results. That is, nine out of nineteen “result-like entities” are advertisements, which corresponds to 47.4%. Moreover, 37 out of 47 hyperlinks (78.7%) belong to advertisements due to the presence of grouped advertisements.

When looking at the occupied space, we have a total of 900,015 pixels (435 × 2,069 px), of which 504,165 px are occupied by advertisements. This corresponds to 56% of the total area. The advertisements displayed above the actual results occupy an area of 231,420 px, which corresponds to 25.7% of the total area and 45.9% of the space occupied by advertisements. These metrics are summarized in Table 1.1.
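The percentages stated above can be reproduced with a short calculation. The following is a minimal sketch in which all counts and pixel values are taken from the text, while the variable names and the helper `share` are our own and purely illustrative.

```python
def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Values as stated in the text for the SERP shown in Figure 1.2.
WIDTH_PX, HEIGHT_PX = 435, 2069
total_area = WIDTH_PX * HEIGHT_PX      # total area of the screenshot
ads_area = 504_165                     # area occupied by advertisements
ads_above_area = 231_420               # advertisements above the actual results
num_results, num_ads = 10, 9           # "result-like entities"
ad_links, all_links = 37, 47           # hyperlinks

metrics = {
    "ads among result-like entities (%)": share(num_ads, num_results + num_ads),
    "hyperlinks belonging to ads (%)": share(ad_links, all_links),
    "area occupied by ads (%)": share(ads_area, total_area),
    "ads above results, of total area (%)": share(ads_above_area, total_area),
    "ads above results, of ad area (%)": share(ads_above_area, ads_area),
}

for name, value in metrics.items():
    print(f"{name}: {value}")
```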

The dashed line in Figure 1.2 indicates the lower bound of the initial viewport when the SERP is accessed with the author’s PC, which features a standard 15.6′′ display with a resolution of 1366 × 768 px and a browser viewport of 1349 × 705 px. That is, in this setting, none (0%) of the web search results are visible when accessing the page without scrolling, which is a clear shortcoming from a usability perspective.

Moreover, we applied jQMetrics (Nebeling, 2012, Ch. 7) to the corresponding SERP interface. jQMetrics is a set of static metrics that detect potential problems of an interface with respect to the efficient usage of screen real estate, the ratio between document and viewport size, font size, etc. In this way, we can get a first impression of the SERP’s overall usability. The metrics were applied using the author’s PC already described above as well as a second computer featuring a 22′′ screen with a resolution of 1680 × 1050 px and a browser viewport of 1665 × 913 px. Results are summarized in Table 1.2. In fact, half of the metrics have critical values on both machines.

¹⁶ http://www.userfocus.co.uk/resources/searchchecklist.html (Nov. 12, 2014).

Tab. 1.1.: Analysis of real-world web search results presented in Figure 1.2 concerning the distribution of advertisements and actual results.

                    #   % of entities   area (px)   % of area
actual results     10            52.6     395,850        44.0
advertisements      9            47.4     504,165        56.0
  - above           4            21.1     231,420        25.7
  - below           5            26.3     272,745        30.3
total              19           100.0     900,015       100.0

Tab. 1.2.: jQMetrics (Nebeling, 2012, Ch. 7) for the SERP featuring the web search results presented in Figure 1.2 (** indicates critical values, * indicates borderline values).

15.6′′ screen    22′′ screen

At the company developing the said search engine, we conducted interviews with the personnel in charge of interface testing. From these interviews, we learned that in its current state, the search engine is solely evaluated based on split tests, all of which define conversions (in terms of clickthroughs on advertisements and certain results) as their target metric. This indicates that the SERP interface featuring the results illustrated in Figure 1.2 is, in its current state, clearly company-centered rather than human-centered. It also stresses the need for tailored methods to evaluate and optimize the SERP, so that a reasonable trade-off between revenue and usability can be found.

Example #2 As of November 13, 2014, when accessing Google’s SERP for web results, using the ↓ and ↑ keys on a regular keyboard scrolls the page. This is because the search box is defocused upon submitting the search query. In contrast, DuckDuckGo follows a different approach: pressing ↓ or ↑ steps through the list of search results, and the currently focused result is clearly highlighted with a gray box. Finally, on Ecosia, pressing ↓ or ↑ focuses the search box. That is, after submitting the search query and clicking somewhere on the page to defocus the search box, pressing any of the arrow keys focuses it again. This is counterintuitive from the user’s perspective, as in the context of a SERP, arrow keys suggest different functionality than focusing the search box, i.e., either scrolling or sequentially stepping through the results. While the shortcomings of example #1 might arise out of strategic considerations (e.g., revenue maximization), this example represents a careless mistake. Although such mistakes might not seem to be of top priority, they are annoying for the user and must be avoided by employing appropriate means for usability evaluation and optimization.

Besides the above examples, in a survey¹⁷ with 118 participants we asked users to rate the appeal of different real-world SERP interfaces on a five-point scale (1 = not appealing at all, 5 = very appealing). While Google stood at the top with an average rating of 4.32 (σ = 0.85), the remaining three interfaces clearly fell behind with average ratings of 3.04 (DuckDuckGo, σ = 1.07), 2.66 (Qwant, σ = 1.27) and 2.26 (a non-public interface provided by the cooperating company, σ = 1.12).

All in all, there is a strong need for a novel holistic approach to evaluate and optimize SERP usability, because:

1. existing methods are of a more general nature and do not consider SERPs as a specific case that requires special attention;

2. existing approaches to effective usability evaluation do not seem efficient from an e-commerce company’s point of view;

3. methods applied in industry are mostly not as effective in determining usability; and

4. even up-to-date search interfaces (particularly from novel competitors) show shortcomings with respect to their usability.