6.5 l r u f
LRUF [208] is Learning to Rank method based on the theory of learning auto-
mata. Learning automata are adaptive decision-making units that learn the op- timal set of actions through repeated interactions with its environment. Variable structure learning automata can be defined as a triple< β,α,L >, withβbeing the set of inputs, αthe set of actions.L, the learning algorithm, is a recurrence relation that modifies the action probability vector of the actions inα.
Three well-known learning algorithms for variable structure learning auto- mata that can be used as L are linear reward-penalty (LR−P), linear reward-- penalty (LR−P) andlinear reward-inaction(LR−I). LRUF uses theLR−I learning
at the moment that the selected actionai(k)is performed and rewarded by the environment. pj(k+1) = pj(k) +a[1−pj(k)] ifj=i (1−a)pj(k) otherwise (14)
where iandjare indices of action in αandp(k) is the probability vector over the action set at rankk.ais the learning rate of the model.
A variable action-set learning automaton is an automaton in which the num- ber of available actions change over time. It has been shown that a combination of a variable action-set learning automaton with theLR−Ilearning algorithm is absolutely expedient and-optimal [204], which means that it is guaranteed to
approximate the optimal solution to some valueand each update step is guar- anteed not to decrease performance of the model. A variable-action set learning automaton has a finite set ofractionsα=α1, ...,αr.Ais defined as the power set of α, A = A1, ..,Am with m = 2r−1. A(k) is the subset of all action that
can be chosen by the learning automaton at each instant k.ψ(k)is a probability distribution over A such that ψ(k) = p(A(k) = Ai|Ai ∈ A,1 6 i 6 2r−1).
ˆ
pi(k)is the probability of choosing action αigiven that action subsetA(k)has
already been selected andαi∈A(k).
LRUF uses an optimisation problem that can be illustrated as a quintuple
< qi,Ai,di,Ri,fi>, whereqi is a submitted query,Ai is a variable action-set learning automaton, di is a set of documents associated with qi, and Ri = rji|∀dji∈diis a ranking function that assigns rankrji to each document.fiis a feedback set used for the update step described in Equation14. For each rank
k the learning automaton Ai chooses one of its actions following its probabil-
ity vector, jointly forming Ri. LRUF translates the feedback in fi to an under- standable value to use it to converge the action probability vector in optimal configuration. Therefore LRUF defines a gi : fi → R+ to be a mapping from
the feedback set into a positive real number. The LRUF mapping function gi
computes the average relevance score of rankingRi based onfiand is defined as shown in Equation15. gi= 1 Ni X dji∈fi a(rji)−1 (15)
in whichNirefers to the size of feedback setfi, rjiis the rank of documentdji
inRiand, again,adenotes the learning rate. Initially, before any feedback, the
action probability factors are initialised with equal probability.
In case the document set of the search engine changes, i.e. new documents are indexed or old documents are removed, LRUF has an Increase Action-Set Size (IAS) and a Reduce Action-Set Size (RAS) procedure to adapt to the new document set without needing a complete retraining of the model. Because this thesis focusses on model accuracy and scalability the IAS and RAS procedure of LRUF will not be explained in further detail.
6.5 l r u f 55
Torkestani [208] does not provide pseudo-code specifically for the training
phase of the LRUF algorithm. Instead an algorithm including both ranking and automaton update is presented, and included in Algorithm6. Initial training of the algorithm can be performed by using the training data relevance labels in the feedback set of the algorithm. Ti in this algorithm is a dynamic relevance threshold initially set to zero.niis the number of documents to rank usingRi.
Data: Queryqi, Number of resultsni
1 Assume: LetAibe the learning automaton corresponding to queryqiwith
action-setαi
2 Actionαji∈αiis associated with documentdji∈di
3 Letkdenote the stage number 4 LetGbe the total relevance score 5 Initialise:k←1,Ti←0
6 whilek6nido
7 Aichooses one of its actions (e.g.aki) at random
8 Documentdjicorresponding to selected actionαjiis ranked atKth position ofRi
9 Configuration ofAiis updated by disabling actionαji
10 k←k+1 11 end
12 RankingRiis shown to the user
13 Ni←0,fi← ∅,G←0
14 repeat
15 forevery documentdjivisited by userdo 16 fi←fi+dji
17 G←G+a∗(rji)−1 18 end
19 Ni←Ni+1
20 untilquery session is expired; 21 gi← NGi
22 Configuration ofAiis updated by re-enabling all disabled actions
23 ifgi>Tithen
24 Reward the actions corresponding to all visited documents by Equation 14
25 Ti←gi
26 end
27 for∀αji∈αido
28 ifpji< T then
29 djiis replaced by another document of the searched results 30 end
31 end 32 OutputRi