• No results found

The FSMRank learning algorithm, obtained from Lai et al [ 122 ]

6.5 l r u f

LRUF [208] is Learning to Rank method based on the theory of learning auto-

mata. Learning automata are adaptive decision-making units that learn the op- timal set of actions through repeated interactions with its environment. Variable structure learning automata can be defined as a triple< β,α,L >, withβbeing the set of inputs, αthe set of actions.L, the learning algorithm, is a recurrence relation that modifies the action probability vector of the actions inα.

Three well-known learning algorithms for variable structure learning auto- mata that can be used as L are linear reward-penalty (LRP), linear reward-- penalty (LR−P) andlinear reward-inaction(LR−I). LRUF uses theLR−I learning

at the moment that the selected actionai(k)is performed and rewarded by the environment. pj(k+1) =    pj(k) +a[1−pj(k)] ifj=i (1−a)pj(k) otherwise (14)

where iandjare indices of action in αandp(k) is the probability vector over the action set at rankk.ais the learning rate of the model.

A variable action-set learning automaton is an automaton in which the num- ber of available actions change over time. It has been shown that a combination of a variable action-set learning automaton with theLRIlearning algorithm is absolutely expedient and-optimal [204], which means that it is guaranteed to

approximate the optimal solution to some valueand each update step is guar- anteed not to decrease performance of the model. A variable-action set learning automaton has a finite set ofractionsα=α1, ...,αr.Ais defined as the power set of α, A = A1, ..,Am with m = 2r−1. A(k) is the subset of all action that

can be chosen by the learning automaton at each instant k.ψ(k)is a probability distribution over A such that ψ(k) = p(A(k) = Ai|Ai ∈ A,1 6 i 6 2r−1).

ˆ

pi(k)is the probability of choosing action αigiven that action subsetA(k)has

already been selected andαi∈A(k).

LRUF uses an optimisation problem that can be illustrated as a quintuple

< qi,Ai,di,Ri,fi>, whereqi is a submitted query,Ai is a variable action-set learning automaton, di is a set of documents associated with qi, and Ri = rji|∀dji∈diis a ranking function that assigns rankrji to each document.fiis a feedback set used for the update step described in Equation14. For each rank

k the learning automaton Ai chooses one of its actions following its probabil-

ity vector, jointly forming Ri. LRUF translates the feedback in fi to an under- standable value to use it to converge the action probability vector in optimal configuration. Therefore LRUF defines a gi : fi → R+ to be a mapping from

the feedback set into a positive real number. The LRUF mapping function gi

computes the average relevance score of rankingRi based onfiand is defined as shown in Equation15. gi= 1 Ni X dji∈fi a(rji)−1 (15)

in whichNirefers to the size of feedback setfi, rjiis the rank of documentdji

inRiand, again,adenotes the learning rate. Initially, before any feedback, the

action probability factors are initialised with equal probability.

In case the document set of the search engine changes, i.e. new documents are indexed or old documents are removed, LRUF has an Increase Action-Set Size (IAS) and a Reduce Action-Set Size (RAS) procedure to adapt to the new document set without needing a complete retraining of the model. Because this thesis focusses on model accuracy and scalability the IAS and RAS procedure of LRUF will not be explained in further detail.

6.5 l r u f 55

Torkestani [208] does not provide pseudo-code specifically for the training

phase of the LRUF algorithm. Instead an algorithm including both ranking and automaton update is presented, and included in Algorithm6. Initial training of the algorithm can be performed by using the training data relevance labels in the feedback set of the algorithm. Ti in this algorithm is a dynamic relevance threshold initially set to zero.niis the number of documents to rank usingRi.

Data: Queryqi, Number of resultsni

1 Assume: LetAibe the learning automaton corresponding to queryqiwith

action-setαi

2 Actionαji∈αiis associated with documentdji∈di

3 Letkdenote the stage number 4 LetGbe the total relevance score 5 Initialise:k←1,Ti←0

6 whilek6nido

7 Aichooses one of its actions (e.g.aki) at random

8 Documentdjicorresponding to selected actionαjiis ranked atKth position ofRi

9 Configuration ofAiis updated by disabling actionαji

10 k←k+1 11 end

12 RankingRiis shown to the user

13 Ni←0,fi← ∅,G←0

14 repeat

15 forevery documentdjivisited by userdo 16 fi←fi+dji

17 G←G+a∗(rji)−1 18 end

19 Ni←Ni+1

20 untilquery session is expired; 21 gi← NGi

22 Configuration ofAiis updated by re-enabling all disabled actions

23 ifgi>Tithen

24 Reward the actions corresponding to all visited documents by Equation 14

25 Ti←gi

26 end

27 for∀αji∈αido

28 ifpji< T then

29 djiis replaced by another document of the searched results 30 end

31 end 32 OutputRi

Related documents