7.1.2 Context-Based Click Analyses

The content of ads is the primary source of information used in the literature to study click behavior over advertisement links. In the current thesis, factors beyond those extracted from ad content form the targets of study. Variability in click behavior over the advertisement links is therefore studied with respect to the context of search result pages that accommodate these links. This context can include the number of ads displayed on the result page, the positioning of ads on the page, and the user intent underlying the queries with which these ads appear. The intuition behind this idea is that ads positioned at the top of a page may receive more clicks, even if they are less relevant than other ads. Furthermore, a weakly related ad appearing with results of a commercially oriented query may receive more clicks than a strongly related ad appearing with the results of a less commercially oriented query.

Given a group of contextual factors, a notion of context CTR is proposed with respect to the search engine result pages (SERPs) that have these factors in common. The context CTR is defined as the ratio of the total number of clicks recorded for SERPs that have a particular group of contextual factors in common to the total number of appearances of such SERPs. This empirical metric is used for two purposes in the thesis:

• It is first intended to evaluate the performance of SERPs with respect to various contextual factors, and

• When aggregated over the historical impressions of a particular ad, it is intended to infer the expected quality of the ad across these impressions.
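The definition of context CTR above can be illustrated with a minimal sketch. The record layout, the function name `context_ctr`, and the toy data are assumptions made for illustration only; they are not taken from the thesis or its log format.

```python
from collections import defaultdict

def context_ctr(serp_log):
    """Return {context: total clicks / total appearances} over SERP records.

    serp_log is an iterable of (context, ad_clicks) pairs, one per SERP
    appearance; context is any hashable group of contextual factors.
    (Hypothetical layout, assumed for this sketch.)
    """
    clicks = defaultdict(int)
    appearances = defaultdict(int)
    for context, ad_clicks in serp_log:
        clicks[context] += ad_clicks
        appearances[context] += 1
    return {c: clicks[c] / appearances[c] for c in appearances}

# Toy data (invented): context = (number of ads shown, query intent label).
toy_log = [
    ((3, "commercial"), 1),
    ((3, "commercial"), 0),
    ((3, "commercial"), 1),
    ((1, "informational"), 0),
]
ctr = context_ctr(toy_log)
```

In this toy log, SERPs with three ads and commercial intent appear three times and collect two clicks, so their context CTR is 2/3, while the single informational SERP has a context CTR of 0.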

The query intent categories have been obtained from the results of the initial part of the work. The available log data supply the rank of each ad, along with the total number of ads displayed on a page, but they do not record the precise locations of ads. The click probability for different locations on a page is therefore estimated from the log data by statistical analysis. Two approaches are proposed for this purpose in Chapter 4. In the main approach, a probability distribution is defined for the average clickthrough rate with respect to the studied contextual factors, including the location of ads. Since a click on an ad results from user interactions with a result page, the different possible forms of this distribution can be seen as outcomes of the dynamic nature of human interactions. Hence, a solution with maximum randomness is assumed to be reasonable: the entropy of the distribution, as a measure of randomness and uncertainty, is maximized in order to obtain a stable state of the system and a solution for the distribution.
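For orientation, the generic textbook form of a maximum-entropy problem is sketched below. The symbols $p_i$, $f_k$, $c_k$, and $\lambda_k$ are placeholders assumed for this illustration; the actual variables and constraints used in the thesis are those defined in Chapter 4 and are not reproduced here.

```latex
% Generic maximum-entropy problem over a discrete distribution p = (p_1, ..., p_n).
% f_k and c_k are placeholder constraint functions and observed values (assumed).
\begin{aligned}
\max_{p}\quad & H(p) = -\sum_{i=1}^{n} p_i \log p_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} p_i = 1, \qquad p_i \ge 0, \\
& \sum_{i=1}^{n} p_i \, f_k(i) = c_k, \qquad k = 1, \dots, K .
\end{aligned}

% Solving with Lagrange multipliers gives the exponential-family form
p_i = \frac{1}{Z} \exp\!\Big( \sum_{k=1}^{K} \lambda_k f_k(i) \Big),
\qquad
Z = \sum_{j=1}^{n} \exp\!\Big( \sum_{k=1}^{K} \lambda_k f_k(j) \Big),
% where the multipliers \lambda_k are chosen so that the K constraints hold.
```

With no constraints beyond normalization, this solution reduces to the uniform distribution, which is the sense in which the maximum-entropy answer is the one with "maximum randomness" consistent with what the data fix.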

Using the estimate of click probability for different locations, and the empirical estimate of click probability with respect to the other contextual factors, the findings of the work according to the first objective can be summarized as follows:

• The number of ads displayed on result pages appears to correlate with the number of ad clicks recorded for these pages.

• In general, the placement of ads appears to have a substantial impact on the number of clicks they receive. In particular, ad clicks appear to occur mostly at the first and second ranks, and especially at the first rank.

• User click behavior on ads is found to be distinct for different categories of query intent, which indicates that the click behavior is consistent with the classification of queries into the general categories of query intent explained in Chapter 3. Categories that involve commercial intent lead the others, confirming that the commercial categories of queries receive more ad clicks than the rest.

• Certain click behavior of different users can be explained by their query intent. For example, we show that SERPs associated with commercial-navigational queries attract more ad clicks than SERPs associated with commercial-informational queries. A query with commercial-navigational intent may indicate a relatively more focused and goal-directed search (Danaher and Mullarkey, 2003), in which the user knows the retailer of the commercial product that they want, resulting in a higher chance of a click or conversion from the user.

• Among the three sub-categories, only the retailer category shows a consistent effect: when the query intent implies a specific retailer, the number of clicks is always higher, across varying numbers of ads, than when the retailer is unknown. In other words, ads are placed in such a way that those reflecting retailer intent are more often the targets of clicks than the others.

• Further investigation on the placement of ads confirms that ads displayed on top of result pages are more often the targets of clicks than the ads displayed at the side.

• The difference between the clickthrough rates of top ads and side ads becomes lower when it comes to the leading query categories (i.e. commercial-navigational and commercial). This observation may indicate that when the intent underlying the query is commercial, the effect of the location of ads becomes less significant. However, ads at the top are still the main targets of clicks.

Overall, the above findings suggest that contextual factors, such as the intent underlying users’ queries, the total number of ads displayed on a result page, and the rank positions of ads, result in varying click behavior for the associated result pages. These contextual factors are therefore assumed to be effective in estimating the clickthrough rate for an ad that appears within a context. Hence, two models are presented that target ads within the context of the SERPs on which they appear. These models estimate the clickthrough rate of an ad as the overall probability of a click that it is expected to receive across the various contexts in which it has been displayed over the history of its appearances.

In the first model, referred to as the baseline model, the context is defined according to the SERP/ad pair for each appearance of an ad (i.e., impression), where the context is represented by the particular number of ads that are listed on the page and by the rank position of the ad on this page. In the second model, referred to as the query intent model, we study the impact of the identified query intent as an extra factor in the baseline model. Comparing the performance of the baseline model against the intent model suggests that the inclusion of query intent information as a contextual factor provides a better estimation of the ad’s quality.
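The difference between the two context definitions can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the function names, and the choice of a plain average of context CTRs over an ad's impressions are hypothetical and are not the estimators defined in the thesis.

```python
from collections import defaultdict

def context_ctr_table(log, key):
    """Empirical click rate per context, where key(record) extracts the context."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for rec in log:
        k = key(rec)
        clicks[k] += rec["clicked"]
        shown[k] += 1
    return {k: clicks[k] / shown[k] for k in shown}

def expected_ad_ctr(ad_id, log, key):
    """Average the context CTRs over all historical impressions of one ad.

    Assumption: a simple unweighted mean stands in for the thesis's models.
    """
    table = context_ctr_table(log, key)
    contexts = [key(r) for r in log if r["ad"] == ad_id]
    return sum(table[c] for c in contexts) / len(contexts)

# Baseline context: (number of ads on the page, rank of the ad).
def baseline_key(rec):
    return (rec["n_ads"], rec["rank"])

# Query intent context: the baseline context plus the query intent label.
def intent_key(rec):
    return (rec["n_ads"], rec["rank"], rec["intent"])

# Toy impression log (invented values).
log = [
    {"ad": "a", "n_ads": 2, "rank": 1, "intent": "commercial", "clicked": 1},
    {"ad": "b", "n_ads": 2, "rank": 2, "intent": "commercial", "clicked": 0},
    {"ad": "b", "n_ads": 2, "rank": 1, "intent": "informational", "clicked": 0},
    {"ad": "a", "n_ads": 2, "rank": 2, "intent": "informational", "clicked": 0},
]
baseline_estimate = expected_ad_ctr("a", log, baseline_key)
intent_estimate = expected_ad_ctr("a", log, intent_key)
```

On this toy log the two estimates for ad "a" differ (0.25 under the baseline context, 0.5 once intent is included), because adding query intent splits the rank-1 impressions into a clicked commercial context and an unclicked informational one.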

Overall, the findings of context-based ad click analysis suggest that ad clickthrough prediction techniques could benefit from the query intent information and other contextual factors. However, there still remain questions and limitations that need to be addressed. For instance, what if there are other categories of query intent that should be considered? What if there exists a better taxonomy of commercial intent than the sub-categories introduced earlier? Even if an extensive taxonomy is obtained such that a broad range of contexts is covered, queries must still be labeled along the various dimensions of query intent. The context model would then need to be expanded across these dimensions, and training data would need to be collected across the corresponding contexts.

These limitations provided the motivation to use the earlier findings as evidence for modeling user browsing and click behavior in a semi-supervised and online fashion, which is the focus of the last part of the thesis. Instead of employing explicit judgements of query intent, the contextual factors will be modeled through various query-dependent and page-dependent parameters. These parameters are learned and updated in an online fashion.