This section formally states the multiple elements problem. The basic structure is that of a multi-armed bandit problem, as described in Section 4.1, but we also need to characterise the information available about each user, define measures of quality for the arms, and combine both of these into a model that determines whether a given user clicks on a chosen set of arms.
CHAPTER 4. SELECTING MULTIPLE WEBSITE ELEMENTS 79
4.2.1 Topic Preference Vector
Our model for user click behaviour is based on the idea that each user has a latent preference or state. This state provides information about what type of element a user is likely to click on. The state could take many forms. For search advertising, in its simplest form, it can be thought of as a search objective or the intended meaning of an ambiguous search term, but it could also represent more general characteristics of the user, such as their interests or some subgroup of the user population to which they belong. If the state were known then elements appropriate for that state could be chosen. However, we will assume throughout that we are provided only with a probability distribution over the user's states, which we will refer to as a topic preference vector. It represents prior information or beliefs about the user's state. How this information is derived depends on the application and is beyond the scope of this work. For search advertising, however, it would reasonably come from the search history of the user population together with the current search. The available information about searches would be much greater than that available about adverts, since the time frame over which an advert is displayed will be small relative to the number of times a search term has been entered. For this reason we take the topic preference vectors to be unchanging whilst we learn about the adverts.

We characterise the quality of arms using weight vectors corresponding to the topic preference vector, where a high value in a given entry indicates that the arm has high relevance to a user with that state.
The mathematical formulation of the users and arms is as follows. At each time step $t = 1, 2, \ldots$ a user arrives with a state $x_t \in \{1, \ldots, n\}$. This state is hidden but its distribution $X_t$ is observed and is given by a topic preference vector $q_t$ such that $\Pr(X_t = x) = q_{t,x}$. Each distribution $q_t$ is random and we will assume throughout that $\Pr(q_{t,x} = 0) < 1$. In response to this we present $m$ arms as an (ordered) set
clicking) at most one arm. If any arm is clicked then a reward of one is received, with zero reward otherwise.
Each arm $a \in A$ is characterised by a weight vector $w_a$ of length $n$ with each $w_{a,x} \in (0,1)$. Let $w_A$ denote the set of vectors $\{w_a\}$, $a \in A$. The weight $w_{a,x}$ represents the probability that a user with state $x$ will click element $a$. The weights are not known exactly but can be learnt over time as a result of the repeated selection of arms and observation of outcomes. The outcome at each time is given by the reward together with which element (if any) is clicked. Further details on this feedback and learning process, and how it can be used to inform future decisions, are given in Section 4.4.1. This framework is complete if $m = 1$ element is to be displayed and $x$ is known: the click probability if $a$ is presented is simply $w_{a,x}$. However, if $x$ is latent and $m > 1$ we need to build a model which gives the probability of receiving a click on the set of arms $A$ as well as determining which arm (if any) is clicked. As described earlier, we assume that at most one arm is clicked at a given time, so we cannot simply sum the rewards from individual arms.
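As a small illustration of the single-arm case, the sketch below averages the weights of one arm over a topic preference vector, which gives the click probability when $m = 1$ but $x$ is latent. This step is not spelled out in the text; it follows from the law of total probability. The function name and all numbers are hypothetical, and states are indexed from 0 rather than 1.

```python
def single_arm_click_prob(q, w_a):
    """Click probability for one displayed arm when the state is latent.

    q   : topic preference vector, q[x] = Pr(X = x)
    w_a : weight vector of the arm, w_a[x] = Pr(click | state x)
    """
    if len(q) != len(w_a):
        raise ValueError("q and w_a must have the same length")
    # Law of total probability: sum over states of Pr(state) * Pr(click | state).
    return sum(q_x * w for q_x, w in zip(q, w_a))


q = [0.7, 0.3]      # illustrative topic preference vector
w_a = [0.6, 0.2]    # illustrative weights for a single arm
p = single_arm_click_prob(q, w_a)   # 0.7 * 0.6 + 0.3 * 0.2
```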
4.2.2 Click Models
We introduce a statistical model of which arm, if any, a user selects at each time. The click through rate (CTR) will refer to the expected probability over all relevant unknowns (if any) of a click on some arm in a set of arms. The term arm CTR will be used if we are interested in the probability of a click on a specific arm.
A simple and popular model that addresses the question of which arm is clicked is the cascade model (Craswell et al. 2008). Arms are presented in order and the user considers each one in turn until one is clicked or there are no more left. There are issues with this model as the probability that an arm is clicked is not directly affected by its position while, in reality, it is likely that users would lose interest before looking
at arms later in the list. However, it is not the purpose of this work to address these issues and the framework given here can readily be adapted to more complicated models.
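The cascade behaviour described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes that each arm, once examined, is clicked independently with a given probability, and the function names and numbers are hypothetical. Note that position matters here only through the chance that an earlier arm has already been clicked.

```python
import random


def cascade_click(click_probs, rng):
    """Return the index of the clicked arm, or None if no arm is clicked.

    click_probs[i] is the (assumed) probability that the arm in position i
    is clicked once the user examines it.
    """
    for position, p in enumerate(click_probs):
        if rng.random() < p:
            return position      # the user stops at the first click
    return None                  # the user ran out of arms


def arm_ctrs(click_probs):
    """Exact probability that each position receives the click."""
    ctrs, reach = [], 1.0        # reach = Pr(user examines this position)
    for p in click_probs:
        ctrs.append(reach * p)
        reach *= 1.0 - p
    return ctrs


probs = [0.5, 0.5, 0.2]          # illustrative values only
exact = arm_ctrs(probs)          # approximately [0.5, 0.25, 0.05]
rng = random.Random(0)
clicks = [cascade_click(probs, rng) for _ in range(50_000)]
share_first = clicks.count(0) / len(clicks)   # should be close to exact[0]
```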
Using the cascade model we now give models to determine the CTR of a set of arms. Two intuitive models are presented, which are shown to be extreme examples of a general click model. The consequences of the choice of model will be a major focus of this work. Each model is initially specified for known $x_t$ but easily extends to latent $x_t$, as given at the end of this section.
Definition 4.2.1 (Probabilistic Click Model). In the Probabilistic Click Model (PCM) the user considers each arm in $A_t$ independently in turn until they click one or run out of arms. This is closely related to the probabilistic coverage model for document retrieval given in El-Arini et al. (2009). At each step, the click probability for arm $a$ is $w_{a,x_t}$ as in the single arm case. Therefore the CTR of the set $A_t$ for known $w_A$ and $x_t$ is
\[
r_{\mathrm{PCM}}(x_t, A_t, w_{A_t}) = 1 - \prod_{a \in A_t} (1 - w_{a,x_t}).
\]
Although it will be shown later that using this model does encourage diversity in the arm set, an obvious issue is that a set of two identical arms gives a higher CTR than a single such arm, which is inconsistent with our intuition of realistic user behaviour when presented with very similar elements. We present a new model which avoids this problem.
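The duplicate-arm issue can be seen in a few lines. The sketch below evaluates the PCM formula for a known state; the function name and the weight 0.5 are illustrative assumptions.

```python
from math import prod


def ctr_pcm(weights):
    """CTR of an arm set under the PCM for a known state x_t.

    weights[i] is w_{a,x_t} for the i-th arm of A_t.
    """
    return 1.0 - prod(1.0 - w for w in weights)


one_arm = ctr_pcm([0.5])            # 1 - 0.5
two_copies = ctr_pcm([0.5, 0.5])    # 1 - 0.5 * 0.5
```

Here the set containing two identical arms has a CTR of 0.75 against 0.5 for the single arm, which is the behaviour the paragraph above objects to.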
Definition 4.2.2 (Threshold Click Model). In the Threshold Click Model (TCM) each user has a threshold ut drawn independently from distribution U(0,1). They
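CTR of the set $A_t$ for known $w_A$ and $x_t$ is then
\[
\begin{aligned}
r_{\mathrm{TCM}}(x_t, A_t, w_{A_t})
&= \int_0^1 \Big( 1 - \prod_{a \in A_t} \big( 1 - \mathbb{1}_{w_{a,x_t} > u_t} \big) \Big) \, du_t \\
&= \int_0^{\max_{a \in A_t} w_{a,x_t}} 1 \, du_t + \int_{\max_{a \in A_t} w_{a,x_t}}^1 0 \, du_t \\
&= \max_{a \in A_t} w_{a,x_t}.
\end{aligned}
\]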
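The reduction of the integral to a maximum can be checked numerically. The sketch below, with hypothetical function names and illustrative weights, draws thresholds from $U(0,1)$ and counts how often at least one weight exceeds the threshold, then compares the result with the closed form.

```python
import random


def ctr_tcm(weights):
    """Closed-form TCM click-through rate for a known state: the largest weight."""
    return max(weights)


def ctr_tcm_monte_carlo(weights, n_draws, rng):
    """Estimate the TCM integral by sampling thresholds u ~ U(0, 1)."""
    clicks = 0
    for _ in range(n_draws):
        u = rng.random()
        # The integrand is 1 exactly when at least one weight exceeds u.
        if any(w > u for w in weights):
            clicks += 1
    return clicks / n_draws


weights = [0.3, 0.7, 0.5]                      # illustrative values only
exact = ctr_tcm(weights)
estimate = ctr_tcm_monte_carlo(weights, 100_000, random.Random(1))
```

Unlike the PCM, adding a copy of an arm leaves this value unchanged, since the maximum weight is the same.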
The TCM thus represents a user who, with preference $x_t$, will click an element if its relevance $w_{a,x_t}$ exceeds a user-specific threshold $u_t$. These two models form the extreme ends of the following parameterised continuum of models.
Definition 4.2.3 (General Click Model). In the General Click Model (GCM) there is a single parameter $d \in [1, \infty)$. Let $a_t^* \in \operatorname{argmax}_{a \in A_t} w_{a,x_t}$ with ties broken arbitrarily. The click probability for arm $a_t^*$ is $w_{a_t^*,x_t}$ and for all other arms $a \in A_t \setminus a_t^*$ it is $(w_{a,x_t})^d$. Therefore the CTR of the set $A_t$ for known $w_A$ and $x_t$ is
\[
r^d_{\mathrm{GCM}}(x_t, A_t, w_{A_t}, d) = 1 - (1 - w_{a_t^*,x_t}) \prod_{a \in A_t \setminus a_t^*} \big( 1 - (w_{a,x_t})^d \big).
\]
Setting $d = 1$ gives PCM and $d \to \infty$ results in TCM. In Section 4.3.3 we will demonstrate how, with latent $x_t$, the diversity of the arm set with optimal CTR changes with $d$.
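The two limiting cases can be verified with a short sketch of the GCM formula for a known state. The function name and weights are illustrative assumptions; ties for the best arm are broken by taking the first index.

```python
from math import prod


def ctr_gcm(weights, d):
    """CTR of an arm set under the GCM for a known state x_t.

    weights[i] is w_{a,x_t} for the i-th arm; d >= 1 is the model parameter.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    best = max(range(len(weights)), key=weights.__getitem__)  # ties: first index
    others = (w for i, w in enumerate(weights) if i != best)
    return 1.0 - (1.0 - weights[best]) * prod(1.0 - w ** d for w in others)


weights = [0.7, 0.4, 0.2]                         # illustrative values only
pcm_value = 1.0 - prod(1.0 - w for w in weights)  # PCM formula
at_d1 = ctr_gcm(weights, 1)                       # agrees with pcm_value
sweep = [ctr_gcm(weights, d) for d in (1, 2, 5, 20, 100)]
```

The values in `sweep` decrease from the PCM value towards the largest weight as `d` grows, because each $(w_{a,x_t})^d$ shrinks towards zero for weights in $(0,1)$.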
Since $x_t$ is unobserved, a more important quantity is the expected reward over $q$. Since, in any particular instance, $q_t$ and $w_A$ are fixed, we write this, the CTR, as
\[
\mathrm{CTR}_d(A) = \mathbb{E}_{x_t \sim q_t}\big[ r^d_{\mathrm{GCM}}(x_t, A, w_A) \big]. \tag{4.2.1}
\]
The expected reward for PCM is therefore given by $\mathrm{CTR}_1(A)$ and for TCM we denote the expected reward by $\mathrm{CTR}_\infty(A)$. As $q$ defines a discrete distribution, the expectation is a finite weighted sum and is easy to compute for all click models.
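Because the expectation is a finite sum over states, Equation 4.2.1 can be sketched directly. The code below is a minimal illustration under stated assumptions: the function names, the data layout (`arm_weights[a][x]` for $w_{a,x}$), and all numbers are hypothetical, states are indexed from 0, and passing `inf` for `d` is a shortcut for the TCM limit that relies on floating-point powers of values in $(0,1)$ evaluating to zero.

```python
from math import inf, prod


def ctr_gcm(weights, d):
    """GCM click-through rate for a known state (d = inf gives the TCM limit)."""
    best = max(range(len(weights)), key=weights.__getitem__)
    others = (w for i, w in enumerate(weights) if i != best)
    return 1.0 - (1.0 - weights[best]) * prod(1.0 - w ** d for w in others)


def expected_ctr(q, arm_weights, d):
    """CTR_d(A): average the known-state CTR over the topic preference vector q.

    arm_weights[a][x] is w_{a,x}; q[x] is the probability of state x.
    """
    return sum(
        q_x * ctr_gcm([w_a[x] for w_a in arm_weights], d)
        for x, q_x in enumerate(q)
    )


q = [0.6, 0.4]                               # illustrative values only
similar = [[0.8, 0.1], [0.7, 0.1]]           # two arms aimed at state 0
mixed = [[0.8, 0.1], [0.1, 0.6]]             # one arm for each state

ctr1_similar = expected_ctr(q, similar, 1)   # PCM
ctr1_mixed = expected_ctr(q, mixed, 1)
ctr_inf_similar = expected_ctr(q, similar, inf)  # TCM limit
ctr_inf_mixed = expected_ctr(q, mixed, inf)
```

With these particular numbers the mixed set has the higher expected CTR under both parameter values; this is only an example and says nothing about the general case.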
In the full model the arm weights w are not known exactly but are learnt over time by observing clicks. A model and solution for learning the weights will be given in
Sections 4.4 and 4.5. Where arm weights are known the CTR, as given in Equation 4.2.1, is our objective function. The problem of maximising this is studied in Section 4.3 together with an investigation into set diversity.