
Based on the personas above, this section illustrates a number of scenarios that shed light on typical problems with the design, development and usage of novel SERP interfaces. From each scenario, we derive a specific problem that is converted into a concrete requirement. Using the formulated problems as a starting point, the remainder of this thesis aims to fulfill the corresponding requirements. The scenarios are based on actual observations made in the industry context of this thesis.

2.3.1 Scenario #1: Tracking Interactions

Rey tracks a number of client-side interaction features on NoSpySearch’s SERP interface that she wants to interpret for inferring appropriate optimizations, because she knows that in general, searchers are not keen on answering questionnaires (which is also reflected by the fact that only very few searchers use NoSpySearch’s feedback function). The scenario she has in mind is that, e.g., more and faster cursor movements on certain parts of the interface indicate that those parts are confusing the searcher and thus require adjustments. However, all of this is only based on assumptions and educated guesses that are not backed by any data so far. Hence, her plan is to collect a certain amount of training data for correlation to be able to make well-founded statements about the tracked interactions (e.g., faster cursor speed negatively correlates with searcher confusion). Rey decides to build on the System Usability Scale (abbreviated “SUS”; Brooke, 1996), which is a de facto industry standard (usability.gov, 2013), for collecting interactions and statements about interface usability from a fraction of her searchers. However, she notices that the instrument features a rather complex scoring system (usability.gov, 2013), is not diagnostic (usability.gov, 2013) and does not have the right level of abstraction for what she has in mind. That is, items such as “I felt very confident using the system” (usability.gov, 2013) do not hint at specific issues with the SERP interface.

⁶ https://adblockplus.org/ (Dec. 21, 2015).

⁷ https://www.mozilla.org/en-US/firefox/new/ (Dec. 21, 2015).

⁸ http://www.apple.com/safari/ (Dec. 21, 2015).

⁹ https://www.google.com/chrome/browser/desktop/index.html (Dec. 21, 2015).

¹⁰ http://stackoverflow.com/ (Dec. 21, 2015).

¹¹ https://github.com/ (Dec. 21, 2015).

Finn has set NoSpySearch as his default search engine. Yet, he frequently notices that he falls back on Google because he feels confused by NoSpySearch’s SERP interface and cannot discover what he is looking for quickly enough. One day, after having submitted a search query, he is presented with an SUS questionnaire. Because Finn is aware of the fact that NoSpySearch is still a start-up and wants to help, he answers the questionnaire. However, he finds the questionnaire to be unnecessarily time-consuming and ambiguous and decides not to answer it again in the future.
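To make the “rather complex scoring system” of the SUS mentioned above more tangible, the following minimal sketch computes a SUS score under the standard scoring rule described by Brooke (1996): odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name `sus_score` and the sample responses are assumptions made for illustration only.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten item responses on a 1-5 scale.

    `responses` is ordered by item number, i.e., responses[0] is item 1.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: response - 1.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: 5 - response.
            total += 5 - response
    # The summed contributions (0-40) are scaled to the range 0-100.
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical answers of a single searcher.
    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 1]))  # 82.5
```

The alternating item polarity is what makes manual scoring error-prone, and the resulting single number gives no hint as to which part of a SERP interface caused a low rating.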

From the above scenario, the following problem and corresponding requirement become evident:

Problem 2.1 Searchers do not want to interact with suboptimal SERP interfaces. If they are not able to find what they are looking for, they abandon their search. Hence, SERP interfaces require constant improvement to create loyal searchers. However, corresponding questionnaires for evaluation are usually perceived as time-consuming and cumbersome.

Requirement 2.1 SERP developers require an instrument for meaningful correlations with client-side user interactions. These correlations can serve as the basis for inferring the usability of a SERP interface directly from interactions.
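As an illustration of what such a correlation could look like, the following sketch computes the Pearson correlation coefficient between one tracked interaction feature and a usability rating collected for the same sessions. This is only one possible choice of correlation measure; the function name `pearson`, the feature (cursor speed) and all sample values are hypothetical and not taken from NoSpySearch data.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-session data: mean cursor speed (px/s) and a
    # usability rating obtained from a questionnaire for the same session.
    cursor_speed = [310.0, 450.0, 280.0, 520.0, 390.0]
    usability_rating = [80.0, 55.0, 85.0, 40.0, 65.0]
    print(round(pearson(cursor_speed, usability_rating), 3))
```

A coefficient close to -1 or 1 on sufficient training data would support using the interaction feature as a proxy for the rated aspect of usability, whereas a value near 0 would not.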

2.3.2 Scenario #2: Split Testing

Rey, following the instructions of C-level management, has implemented and deployed a split testing solution for evaluating NoSpySearch’s SERP interface. The applied split testing system aims at maximizing conversions. In the case of NoSpySearch, this means optimizing for clicks on sponsored search results, which is the business model of the novel search engine. Yet, the results of the split tests are skewed due to the widespread use of ad blockers¹². Also, from a few searchers, Rey has received answers to an SUS questionnaire, which indicate that their SERP still has room for improvement. Hence, she would like to additionally use the split testing setup for optimizing the usability of the SERP interface since conversion maximization does not focus on this matter. She is convinced that the client-side user interactions she is already tracking can contribute to this once they can be meaningfully correlated with aspects of usability. What Rey has in mind is a quantitative measure of usability that can be directly inferred from the interaction data and used as a target metric in addition to the number of conversions.

¹² http://www.statista.com/statistics/435252/adblock-users-worldwide/ (Dec. 22, 2015).

Finn has been using NoSpySearch for quite some time now. However, he regularly falls back on Google when he feels he cannot discover what he is looking for on NoSpySearch, because he has not noticed any considerable improvements to the SERP interface lately.

From the above scenario, the following problem and corresponding requirement become evident:

Problem 2.2 Split testing is a highly efficient way of evaluating SERP interfaces that is commonly applied in industry. Yet, split tests mostly aim at conversion maximization, which does not particularly focus on better usability. That is, the performed evaluations are company-centered rather than human-centered and do not lead to noticeable improvements from the searcher perspective.

Requirement 2.2 SERP developers must be able to also use split testing setups for effectively optimizing usability in addition to conversion maximization.
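The idea of a second target metric can be sketched as follows: each session of a split test is summarized per interface variant by its conversion rate and, in addition, by the mean of a usability score inferred for that session. The record layout `(variant, converted, usability)`, the function name `summarize_variants` and the sample values are assumptions for illustration; how the usability score is obtained is deliberately left open here.

```python
from collections import defaultdict
from statistics import mean


def summarize_variants(sessions):
    """Aggregate split test sessions per variant.

    `sessions` is an iterable of (variant, converted, usability) tuples, where
    `converted` is a bool and `usability` a hypothetical score in [0, 1].
    """
    grouped = defaultdict(list)
    for variant, converted, usability in sessions:
        grouped[variant].append((converted, usability))

    summary = {}
    for variant, rows in grouped.items():
        summary[variant] = {
            "sessions": len(rows),
            # Share of sessions with a conversion (company-centered metric).
            "conversion_rate": sum(1 for c, _ in rows if c) / len(rows),
            # Mean inferred usability (human-centered metric).
            "mean_usability": mean([u for _, u in rows]),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical sessions for two SERP interface variants.
    sessions = [
        ("A", True, 0.5), ("A", False, 0.7),
        ("B", False, 0.9), ("B", False, 0.8),
    ]
    for variant, metrics in sorted(summarize_variants(sessions).items()):
        print(variant, metrics)
```

In this made-up sample, variant A wins on conversions while variant B wins on usability, which is exactly the kind of trade-off that a purely conversion-driven split test would hide.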

2.3.3 Scenario #3: Analyzing Interactions

Rey has started to evaluate and interpret the client-side user interactions she is tracking on NoSpySearch’s SERP interface. Due to the lack of an appropriate instrument for correlation with aspects of usability and corresponding training data, this must happen mostly in terms of heat maps. These heat maps illustrate the distribution of mouse cursor positions as well as the clicks performed by the searcher. The analysis of potential usability problems not only requires a considerable amount of manual work, but can also only be based on assumptions and educated guesses. For instance, Rey must assume that regions of the SERP interface receiving more attention in terms of mouse cursor interaction are more relevant to the user than regions receiving less attention. Hence, such regions are candidates for, e.g., being specifically highlighted, either by means of positioning or layout. Rey regularly discusses the evaluations and potential optimizations with NoSpySearch’s design team, which is often time-consuming. Still, the outcomes of these discussions are not based on hard evidence. Thus, Rey desires a more automatic approach to evaluating the interface and proposing adequate adjustments, which would improve communication and the process of SERP optimization in terms of efficiency and objectivity.
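The data behind such a heat map can be sketched as a simple grid of counts: the page is divided into square cells and every recorded cursor position increments the counter of the cell it falls into. The function name `build_heat_map`, the cell size and the sample positions are assumptions for illustration and do not describe NoSpySearch’s actual tooling.

```python
def build_heat_map(positions, width, height, cell_size):
    """Count cursor positions per grid cell.

    `positions` is an iterable of (x, y) pixel coordinates; `width` and
    `height` describe the page in pixels. Returns a list of rows, each a
    list of counts, with grid[row][col] covering a cell_size x cell_size area.
    """
    cols = -(-width // cell_size)   # ceiling division
    rows = -(-height // cell_size)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Positions outside the page are ignored.
        if 0 <= x < width and 0 <= y < height:
            grid[int(y) // cell_size][int(x) // cell_size] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical cursor samples on a 200 x 100 pixel area.
    samples = [(10, 10), (20, 30), (150, 10), (999, 999)]
    print(build_heat_map(samples, width=200, height=100, cell_size=100))
```

Note that the counts only show where the cursor has been. Whether a frequently visited cell marks a relevant or a confusing region still has to be interpreted by a human, which is the limitation described in the scenario.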

Recently, Finn has noticed some slight improvements regarding the usability of the NoSpySearch SERP interface. For instance, the list of related search terms has been moved to a more prominent place on the page, where it is more easily reachable. When consuming textual and/or visual information on the SERP (such as the abstracts of search results or the info box presenting semantic information for the current search query), he usually moves the mouse cursor to one of the corners of the browser window, where it does not obscure his desired information.

From the above scenario, the following problem and corresponding requirement become evident:

Problem 2.3 Manual analysis and inspection of user interactions is a time-consuming process that is based on assumptions and educated guesses rather than hard evidence. In particular, user behavior might contradict the assumptions that these evaluations are based on and hence, the resulting optimizations might not be what searchers need.

Requirement 2.3 For a more efficient and objective process, SERP developers require an automatic approach to evaluating SERP interfaces that in particular also infers adequate optimizations.

2.3.4 Scenario #4: Result Relevance

The ranking function delivering NoSpySearch’s results, which has been partly implemented by Rey, makes use of implicit searcher feedback by applying a model that interprets clicks on results (i.e., a clicked result is assumed to be relevant). In addition, the model takes into account the searcher’s dwell time on a landing page after clicking a result. That is, it assumes that a dwell time of less than 30 seconds indicates an irrelevant result, although a click has happened (cf. Q. Guo and Agichtein, 2012). Recently, NoSpySearch has introduced info boxes displaying semantic information for the current search query, thus intending to answer the query directly on the SERP interface. Yet, this also means that no clicks happen and no dwell time can be measured on a landing page, which makes the aforementioned model inapplicable to this new kind of search result. Rather, Rey assumes that the client-side interactions she is tracking anyway can be used to determine relevance in this respect, e.g., as a replacement for dwell time measurements. She furthermore believes that the additional interactions can complement the clicks used for predicting the relevance of standard search results.

Recently, using NoSpySearch, Finn searched for the release date of “The Ridiculous 6” on Netflix. He clicked on the first result, which led him to the Wikipedia page for the movie¹³. Right away he found the information that it has been available on Netflix since December 11, 2015. This means he returned to the SERP interface after approximately ten seconds, having already found his desired information. Out of interest, Finn skimmed through the remaining search results and found a link to an interview with the director of the movie, Frank Coraci. He clicked the link and read the whole interview, hence returning to the SERP interface after much more than 30 seconds. Based on the above behavior, the model applied by NoSpySearch assumed the link to the interview to be more relevant for the query than the Wikipedia page. Thus, the two results were considered for re-ranking.
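The misjudgment in Finn’s case can be reproduced with a minimal sketch of the click model as described in this scenario: a clicked result counts as relevant only if the dwell time on the landing page reaches 30 seconds. The function name `infer_relevance`, the return value `None` for results without a click, and the concrete dwell time assumed for the interview (240 seconds) are illustrative assumptions, not part of NoSpySearch’s actual implementation.

```python
from typing import Optional

# Threshold taken from the scenario: less than 30 seconds means "irrelevant".
DWELL_THRESHOLD_SECONDS = 30.0


def infer_relevance(clicked: bool, dwell_seconds: Optional[float]) -> Optional[bool]:
    """Judge result relevance from a click and the subsequent dwell time.

    Returns None when the model has no evidence to work with, e.g., for an
    info box that answers the query on the SERP without any click.
    """
    if not clicked or dwell_seconds is None:
        return None
    return dwell_seconds >= DWELL_THRESHOLD_SECONDS


if __name__ == "__main__":
    # Finn's session; the dwell time for the interview is an assumed value.
    print("Wikipedia page:", infer_relevance(True, 10.0))
    print("Interview:", infer_relevance(True, 240.0))
    print("Info box:", infer_relevance(False, None))
```

The Wikipedia page, which actually answered the query, is judged irrelevant, the interview is judged relevant, and the info box receives no judgment at all, which mirrors the two weaknesses described above.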

¹³ https://en.wikipedia.org/wiki/The_Ridiculous_6 (Dec. 23, 2015).


From the above scenario, the following problem and corresponding requirement become evident:

Problem 2.4 Current click models for determining result relevance do not consider novel kinds of search results. Moreover, they are limited in their interpretations due to the low number of considered features (e.g., clicks + dwell time). Hence, searchers might be presented with less relevant results due to wrong interpretations of their behavior in modern search engines.

Requirement 2.4 SERP developers require relevance models that leverage a broader range of interactions beyond clicks, which leads to more accurate predictions and more satisfied searchers.