Probabilistic Model. Search Engine

(1)

Probabilistic Model

(2)

Probability: Definitions & Notations

■ Probability is a numerical measure of the likelihood of a given event.
  ➢ number of desirable outcomes / total number of possible outcomes
■ P(A) denotes the probability that event A will occur
  ➢ 1 − P(A): probability that A will not occur
  ➢ P(A^c) = 1 − P(A), where A^c is the complement of A (the event that A does not occur)
■ 0.00 ≤ P(A) ≤ 1.00
  ➢ The probability that any event A will occur ranges from 0 to 1.
  ➢ P(A) = 0: A never occurs; P(A) = 0.5: A occurs half of the time; P(A) = 1: A always occurs

(3)

Probability: Definitions & Notations

■ P(A or B) ≡ P(A ∪ B) → The probability that either A or B will occur
  ➢ Mutually Exclusive events
    ▪ Events A and B cannot occur simultaneously
    ▪ e.g. A = coin toss turns up a head, B = coin toss turns up a tail
  ➢ Not Mutually Exclusive events
    ▪ Events A and B can occur simultaneously
    ▪ e.g. A = a card drawn is an ace, B = a card drawn is a heart
■ P(A and B) ≡ P(A ∩ B) → The probability that both A and B will occur
  ➢ Independent Events
    ▪ The occurrence of A is independent of the occurrence of B
    ▪ e.g. A = the first coin toss turns up a head, B = the second coin toss turns up a tail
  ➢ Dependent Events
    ▪ The occurrence of A is not independent of the occurrence of B
    ▪ e.g. A = the first card drawn is an ace, B = the second card drawn (without replacement) is a heart
■ P(B|A) → The probability that B will occur given that A has occurred

(4)

Probability: Examples

■ P(A or B) ≡ P(A ∪ B) → The probability that either A or B will occur
  ➢ Mutually Exclusive events
    ▪ Events A and B cannot occur simultaneously
    ▪ P(A ∪ B) = P(A) + P(B)
  ➢ What is the probability of rolling a die and getting 1 or 6?
    ✓ A = The die rolls a 1
    ✓ B = The die rolls a 6
    ✓ P(A) = 1/6
    ✓ P(B) = 1/6
    ▪ P(A ∪ B) = 1/6 + 1/6 = 1/3
  ➢ Not Mutually Exclusive events
    ▪ Events A and B can occur simultaneously
    ▪ P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  ➢ What is the probability that a card drawn from a deck is an ace or a spade?
    ✓ A = The card is an ace
    ✓ B = The card is a spade
    ✓ P(A) = 4/52
    ✓ P(B) = 13/52
    ✓ P(A ∩ B) = 1/52
    ▪ P(A ∪ B) = 4/52 + 13/52 − 1/52 = 16/52 = 4/13
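The two addition-rule examples above can be checked by enumerating the sample space. The following is a minimal Python sketch rather than part of the original slides; the helper `prob` and the `(rank, suit)` encoding of the deck are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def prob(event, sample_space):
    """Classical probability: favourable outcomes / all equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


# Mutually exclusive events: a single die roll shows 1 or 6.
die = range(1, 7)
p_one_or_six = prob(lambda face: face == 1, die) + prob(lambda face: face == 6, die)

# Not mutually exclusive events: a card is an ace or a spade.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def is_ace(card):
    return card[0] == "A"


def is_spade(card):
    return card[1] == "spades"


# Addition rule with the overlap (the ace of spades) subtracted once.
p_ace_or_spade = (
    prob(is_ace, deck)
    + prob(is_spade, deck)
    - prob(lambda c: is_ace(c) and is_spade(c), deck)
)

# Counting the union directly should agree with the addition rule.
p_direct = prob(lambda c: is_ace(c) or is_spade(c), deck)

print(p_one_or_six)    # 1/3
print(p_ace_or_spade)  # 4/13
print(p_direct)        # 4/13
```

Exact fractions are used so the results can be compared with the hand calculation without rounding.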

(5)

Probability: Examples

■ P(A and B) ≡ P(A ∩ B) → The probability that both A and B will occur
  ➢ Independent Events
    ▪ The occurrence of A is independent of the occurrence of B
    ▪ P(A ∩ B) = P(A) P(B)
  ➢ What is the probability that a fair coin will come up heads twice in a row?
    ✓ A = a head on the first toss
    ✓ B = a head on the second toss
    ✓ P(A) = P(B) = 1/2
    ▪ P(A ∩ B) = 1/2 × 1/2 = 1/4
  ➢ Dependent Events
    ▪ The occurrence of A is not independent of the occurrence of B
    ▪ P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
  ➢ What is the probability that two cards drawn from a deck will both be aces?
    ✓ A = the first card is an ace
    ✓ B = the second card is an ace
    ✓ P(A) = 4/52 = 1/13
    ✓ P(B|A) = 3/51 = 1/17
    ▪ P(A ∩ B) = 1/13 × 1/17 = 1/221
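The multiplication-rule examples above can be verified the same way, by listing every equally likely outcome. This is a small Python sketch under assumed encodings (a card as `(rank, copy)` with rank 1 standing for an ace); it is an illustration, not the slides' own method.

```python
from fractions import Fraction
from itertools import permutations, product

# Independent events: two fair coin tosses, both heads.
tosses = list(product("HT", repeat=2))  # 4 equally likely outcomes
p_two_heads = Fraction(sum(1 for t in tosses if t == ("H", "H")), len(tosses))

# Dependent events: two cards drawn without replacement, both aces.
# Only the rank matters here, so a card is encoded as (rank, copy).
deck = [(rank, copy) for rank in range(1, 14) for copy in range(4)]
draws = list(permutations(deck, 2))  # 52 * 51 ordered pairs, no repeats
p_two_aces = Fraction(
    sum(1 for first, second in draws if first[0] == 1 and second[0] == 1),
    len(draws),
)

# The same value from the multiplication rule P(A) * P(B|A).
p_rule = Fraction(4, 52) * Fraction(3, 51)

print(p_two_heads)  # 1/4
print(p_two_aces)   # 1/221
print(p_rule)       # 1/221
```

Using `permutations` models drawing without replacement: the same physical card can never appear in both positions.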

(6)

Combination of Probability

■ Addition
  • Mutually Exclusive
    − P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
  • Not Mutually Exclusive
    − P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P((A ∪ B) ∩ C)
      ▪ AB = A ∪ B
      ▪ P(A ∪ B ∪ C) = P(AB ∪ C) = P(AB) + P(C) − P(AB ∩ C)
        » P(AB) = P(A) + P(B) − P(A ∩ B)
        » P(AB ∩ C) = P((A ∪ B) ∩ C)
■ Multiplication
  • Independent
    − P(A ∩ B ∩ C) = P(A) P(B) P(C)
  • Dependent
    − P(A ∩ B ∩ C) = P(A) P(B|A) P(C|(A ∩ B))
      ▪ AB = A ∩ B
      ▪ P(A ∩ B ∩ C) = P(AB ∩ C) = P(AB) P(C|AB)
        » P(AB) = P(A) P(B|A)

Combination                       Condition                 Formula
OR → addition, P(A ∪ B)           Mutually Exclusive        P(A) + P(B)
                                  NOT Mutually Exclusive    P(A) + P(B) − P(A ∩ B)
AND → multiplication, P(A ∩ B)    Independent               P(A) P(B)
                                  Dependent                 P(A) P(B|A)

[Figure: Venn diagrams. Not Mutually Exclusive Events: regions P(A) and P(B) overlap in P(A ∩ B). Mutually Exclusive Events: regions P(A) and P(B) do not overlap.]
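The three-event addition and multiplication rules above can be sanity-checked on a small finite sample space. The sketch below is an illustration only: the sample space (integers 1 to 30) and the events (multiples of 2, 3 and 5) are assumptions chosen so that the events overlap and are not all independent by construction.

```python
from fractions import Fraction

space = set(range(1, 31))  # 30 equally likely outcomes (assumed example)
A = {n for n in space if n % 2 == 0}
B = {n for n in space if n % 3 == 0}
C = {n for n in space if n % 5 == 0}


def P(event):
    """Probability of an event given as a subset of the sample space."""
    return Fraction(len(event), len(space))


def P_given(event, condition):
    """P(event | condition) = |event ∩ condition| / |condition|."""
    return Fraction(len(event & condition), len(condition))


# Addition, not mutually exclusive, grouping AB = A ∪ B.
union_by_rule = P(A) + P(B) + P(C) - P(A & B) - P((A | B) & C)
union_direct = P(A | B | C)

# Multiplication, dependent form, grouping AB = A ∩ B.
joint_by_rule = P(A) * P_given(B, A) * P_given(C, A & B)
joint_direct = P(A & B & C)

print(union_by_rule, union_direct)  # 11/15 11/15
print(joint_by_rule, joint_direct)  # 1/30 1/30
```

The dependent form of the multiplication rule always holds; when the events happen to be independent, each conditional factor reduces to the plain probability.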

(7)

Probability: Examples

■ P(A|B) → The probability that A will occur given that B has occurred
  ➢ The conditional probability of A given B
    ▪ P(A|B) = P(A ∩ B) / P(B)
    ▪ Derived from P(A ∩ B) = P(B) P(A|B)
  ➢ At a restaurant, 90% of the customers order a burger. If 72% of the customers order a burger and fries, what is the probability that a customer who orders a burger will also order fries?
    ✓ A = a customer orders fries
    ✓ B = a customer orders a burger
    ✓ P(B) = 90/100, P(A ∩ B) = 72/100
    ▪ P(A|B) = (72/100) / (90/100) = 72/90 = 0.8
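The burger-and-fries calculation above is a direct application of P(A|B) = P(A ∩ B) / P(B). A minimal Python sketch follows; the function name `conditional` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction


def conditional(p_joint, p_condition):
    """Return P(A|B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_condition == 0:
        raise ValueError("P(B) must be positive")
    return p_joint / p_condition


p_burger = Fraction(90, 100)            # P(B)
p_burger_and_fries = Fraction(72, 100)  # P(A ∩ B)
p_fries_given_burger = conditional(p_burger_and_fries, p_burger)

print(p_fries_given_burger)  # 4/5
```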

(8)

Probability: Bayes' Theorem

■ Formula for calculating conditional probabilities
  ➢ P(A|B) as a function of P(B|A)

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(B)}
            = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c)}
\]

\[
P(B) = P(B \cap A) + P(B \cap A^c) = P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c)
\]

■ Bayesian Inference
  ➢ Learning about a hypothesis from data
    ▪ Hypothesis
      ✓ Prior probability: initial subjective degree of belief in the hypothesis
    ▪ Learning
      ✓ Posterior probability: the degree of belief in the hypothesis, updated based on evidence
  ➢ H = hypothesis, E = evidence
    ▪ P(H) = prior probability of H
    ▪ P(E) = marginal probability of E
      ✓ Probability of witnessing the evidence E
    ▪ P(E|H) = likelihood of E given H
      ✓ Conditional probability of seeing the evidence E given that hypothesis H is true
    ▪ P(H|E) = posterior probability of H given E

\[
P(H \mid E) = \frac{P(H \cap E)}{P(E)} = \frac{P(H)\,P(E \mid H)}{P(E)}
            = \frac{P(H)\,P(E \mid H)}{P(H)\,P(E \mid H) + P(H^c)\,P(E \mid H^c)}
\]

(9)

Probability: Bayesian Inference

■ Box 1 has 10 oranges and 20 apples. Box 2 has 20 oranges and 10 apples. If you randomly pick a fruit from a random box and get an apple, what is the probability that it came from box 1?
  ➢ H = the fruit (here, the apple) comes from box 1
  ➢ H^c = the fruit (here, the apple) comes from box 2
  ➢ E = the fruit picked is an apple
  ➢ P(H) = P(H^c) = 1/2
  ➢ P(E|H) = 20/30 = 2/3
  ➢ P(E|H^c) = 10/30 = 1/3
  ➢ P(H|E) = (1/2 × 2/3) / (1/2 × 2/3 + 1/2 × 1/3) = (1/3) / (1/2) = 2/3

\[
P(H \mid E) = \frac{P(H \cap E)}{P(E)} = \frac{P(H)\,P(E \mid H)}{P(E)}
            = \frac{P(H)\,P(E \mid H)}{P(H)\,P(E \mid H) + P(H^c)\,P(E \mid H^c)}
\]
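The two-box example above can be reproduced with a few lines of Python. This is a minimal sketch of Bayes' theorem for a hypothesis H and its complement; the function name `posterior` and its parameter names are illustrative assumptions.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_complement):
    """P(H|E) via Bayes' theorem, expanding P(E) over H and H^c.

    prior                  = P(H)
    likelihood             = P(E|H)
    likelihood_complement  = P(E|H^c)
    """
    evidence = prior * likelihood + (1 - prior) * likelihood_complement
    if evidence == 0:
        raise ValueError("P(E) is zero, so P(H|E) is undefined")
    return prior * likelihood / evidence


# Box 1: 10 oranges, 20 apples. Box 2: 20 oranges, 10 apples.
p_box1_given_apple = posterior(
    prior=Fraction(1, 2),
    likelihood=Fraction(20, 30),
    likelihood_complement=Fraction(10, 30),
)

print(p_box1_given_apple)  # 2/3
```

Observing an apple raises the belief in box 1 from the prior 1/2 to the posterior 2/3, because apples are twice as likely to come from box 1.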

(10)

Probability: Odds

■ Probability and odds are two ways of measuring the chances of an event occurring.
■ Probability
  • number of desirable outcomes / total number of possible outcomes
    − occurrence / whole
  • chances for / total chances
  • predictive: long run ratio
■ Odds
  • number of desirable outcomes / number of undesirable outcomes
    − occurrence / non-occurrence
  • chances for / chances against
    − O(A) = 2 / 1 → odds are 2 to 1 in favor of A
    − O(A) = 1 / 2 → odds are 2 to 1 against A
  • payoff to break even
■ Probability and odds have similar values for infrequent events (P < 0.1)

                  Probability                  Odds
Range             0 to 1                       0 to ∞
Transformation    P(A) = O(A) / (1 + O(A))     O(A) = P(A) / (1 − P(A))
Comparison        1                            ∞
                  0.9                          9.00
                  0.8                          4.00
                  0.7                          2.33
                  0.6                          1.50
                  0.5                          1.00
                  0.4                          0.67
                  0.3                          0.43
                  0.2                          0.25
                  0.1                          0.11
                  0                            0.00
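The two transformations between probability and odds can be written as a pair of small functions. The sketch below is an illustration; the function names are assumptions, and the loop should reproduce the comparison values listed on this slide (with infinite odds printed as `inf`).

```python
import math


def prob_to_odds(p):
    """O(A) = P(A) / (1 - P(A)); the odds are infinite when P(A) = 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    return math.inf if p == 1.0 else p / (1.0 - p)


def odds_to_prob(o):
    """P(A) = O(A) / (1 + O(A)); infinite odds map back to P(A) = 1."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    return 1.0 if math.isinf(o) else o / (1.0 + o)


# Probability next to the corresponding odds, rounded to two decimals.
for p in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0):
    print(f"{p:.1f}  {prob_to_odds(p):.2f}")
```

For small probabilities the denominator 1 − P(A) is close to 1, which is why probability and odds nearly coincide for infrequent events.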

(11)

Logarithm

■ Logarithm of x to base b
  • $y = \log_b x$
  • x = b^y
■ Common Logarithm
  • Logarithm to base 10
  • $\log x = \log_{10} x$
■ Natural Logarithm
  • Logarithm to base e, where e = 2.718281828…
  • $\ln x = \log_e x$
■ Properties
  • $\log (x_1 x_2) = \log x_1 + \log x_2$
  • $\log \frac{x_1}{x_2} = \log x_1 - \log x_2$
  • $\log x^k = k \log x$
  • $\log_b x = \frac{\log_c x}{\log_c b}$
    ▪ Log Transformation
      − x = b^y, $y = \log_b x$
      − $\log_c x = \log_c b^y = y \log_c b \;\Rightarrow\; y = \frac{\log_c x}{\log_c b} = \log_b x$
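The logarithm properties listed on this slide can be checked numerically. The following Python sketch uses only the standard `math` module; `log_base` is a hypothetical helper that implements the change-of-base rule with natural logarithms as the intermediate base c.

```python
import math


def log_base(x, b):
    """log_b(x) computed by change of base: ln(x) / ln(b)."""
    return math.log(x) / math.log(b)


x1, x2, k = 8.0, 32.0, 3  # arbitrary positive test values

# log(x1 * x2) = log(x1) + log(x2)
product_rule = math.isclose(math.log(x1 * x2), math.log(x1) + math.log(x2))

# log(x1 / x2) = log(x1) - log(x2)
quotient_rule = math.isclose(math.log(x1 / x2), math.log(x1) - math.log(x2))

# log(x1 ** k) = k * log(x1)
power_rule = math.isclose(math.log(x1 ** k), k * math.log(x1))

# log_2(8) = 3, and the helper agrees with the built-in base-10 logarithm.
change_of_base = math.isclose(log_base(x1, 2), 3.0) and math.isclose(
    log_base(1000.0, 10), math.log10(1000.0)
)

print(product_rule, quotient_rule, power_rule, change_of_base)
```

Comparisons use `math.isclose` because floating-point logarithms are exact only up to rounding error.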



(12)

IR: Probabilistic Model

■ Probability Ranking Principle
  • Assumptions
    − Each document is either relevant or not relevant
    − Relevance of one document does not affect that of another document
  • Ranking documents according to the probability of relevance to the query
    → Maximizes the overall retrieval effectiveness
■ Binary Independence Retrieval Model
  • Represent documents d as binary term vectors
    − $d = (t_1, \ldots, t_n)$, where $t_i = 1$ if term $t_i$ belongs to d, and $t_i = 0$ otherwise
  • Rank documents according to the relevance odds (which give the same ordering as the probability of relevance)
    1. compute the odds that document d will be relevant:

\[
O(R \mid d) = \frac{P(R \mid d)}{P(\bar{R} \mid d)}
\]

       » $P(R \mid d)$: probability that d is relevant
       » $P(\bar{R} \mid d)$: probability that d is not relevant
    2. sort the documents in descending order of $O(R \mid d)$
  • Term Independence Assumption
    − Presence/absence of one term is unrelated to the presence/absence of any other term.
    − i.e., occurrences of terms in documents are assumed to be independent of each other

(13)

IR: BIR Model Derivation

\[
\begin{aligned}
O(R \mid d) &= \frac{P(R \mid d)}{P(\bar{R} \mid d)}
  = \frac{P(R)\,P(d \mid R) \,/\, P(d)}{P(\bar{R})\,P(d \mid \bar{R}) \,/\, P(d)}
  && \text{(apply Bayes' Theorem)} \\
&= O(R) \cdot \frac{P(d \mid R)}{P(d \mid \bar{R})} \\
&= O(R) \cdot \prod_{i=1}^{n} \frac{P(t_i \mid R)}{P(t_i \mid \bar{R})}
  && \text{(assume Term Independence)} \\
&= O(R) \cdot \prod_{t_i = 1} \frac{P(t_i = 1 \mid R)}{P(t_i = 1 \mid \bar{R})}
        \cdot \prod_{t_i = 0} \frac{P(t_i = 0 \mid R)}{P(t_i = 0 \mid \bar{R})}
  && \text{(split by presence/absence of terms)} \\
&= O(R) \cdot \prod_{t_i = 1} \frac{p_i}{q_i} \cdot \prod_{t_i = 0} \frac{1 - p_i}{1 - q_i} \\
&= O(R) \cdot \prod_{i=1}^{n} \left(\frac{p_i}{q_i}\right)^{t_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - t_i}
\end{aligned}
\]

where
  • P(R) = probability that a (randomly selected) document is relevant, and $O(R) = \frac{P(R)}{P(\bar{R})}$
  • Let $p_i = P(t_i = 1 \mid R)$: probability that t_i occurs in a relevant document
  • Let $q_i = P(t_i = 1 \mid \bar{R})$: probability that t_i occurs in a non-relevant document
  • $\left(\frac{p_i}{q_i}\right)^{t_i} = \frac{p_i}{q_i}$ if $t_i = 1$ and $1$ if $t_i = 0$; $\left(\frac{1 - p_i}{1 - q_i}\right)^{1 - t_i} = 1$ if $t_i = 1$ and $\frac{1 - p_i}{1 - q_i}$ if $t_i = 0$

Ignore O(R), which is constant across documents, and take the log:

\[
\begin{aligned}
\log \frac{O(R \mid d)}{O(R)}
&= \log \prod_{i=1}^{n} \left(\frac{p_i}{q_i}\right)^{t_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - t_i} \\
&= \sum_{i=1}^{n} \left[ t_i \log \frac{p_i}{q_i} + (1 - t_i) \log \frac{1 - p_i}{1 - q_i} \right] \\
&= \sum_{i=1}^{n} \left[ t_i \log \frac{p_i}{q_i} - t_i \log \frac{1 - p_i}{1 - q_i} + \log \frac{1 - p_i}{1 - q_i} \right] \\
&= \sum_{i=1}^{n} t_i \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)} + \sum_{i=1}^{n} \log \frac{1 - p_i}{1 - q_i}
\end{aligned}
\]

Ignore $\sum_{i=1}^{n} \log \frac{1 - p_i}{1 - q_i}$, which is constant across documents:

\[
RSV(d) = \sum_{i=1}^{n} t_i \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}
\]

Note: $\log(x_1 x_2 \cdots x_n) = \log x_1 + \log x_2 + \cdots + \log x_n = \sum_{i=1}^{n} \log x_i$
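The last steps of the derivation claim that dropping document-independent terms does not change the ranking. The Python sketch below illustrates this on a toy example; the probabilities `p`, `q` and the four binary document vectors are made-up values, not data from the slides.

```python
import math

# Hypothetical per-term probabilities for a three-term vocabulary.
p = [0.8, 0.5, 0.3]  # p_i = P(t_i = 1 | relevant)
q = [0.2, 0.4, 0.3]  # q_i = P(t_i = 1 | non-relevant)

# Hypothetical documents as binary term vectors (t_1, t_2, t_3).
docs = {"d1": (1, 0, 1), "d2": (1, 1, 0), "d3": (0, 1, 1), "d4": (0, 0, 0)}


def log_odds_without_prior(t):
    """log(O(R|d) / O(R)): the full product over present and absent terms."""
    return sum(
        math.log(pi / qi) if ti else math.log((1 - pi) / (1 - qi))
        for ti, pi, qi in zip(t, p, q)
    )


def rsv(t):
    """RSV(d): sum of log odds ratios over the terms present in d."""
    return sum(
        ti * math.log(pi * (1 - qi) / (qi * (1 - pi)))
        for ti, pi, qi in zip(t, p, q)
    )


# The term dropped in the final step; it does not depend on the document.
constant = sum(math.log((1 - pi) / (1 - qi)) for pi, qi in zip(p, q))

by_rsv = sorted(docs, key=lambda name: rsv(docs[name]), reverse=True)
by_log_odds = sorted(
    docs, key=lambda name: log_odds_without_prior(docs[name]), reverse=True
)

print(by_rsv)       # ranking by RSV
print(by_log_odds)  # ranking by the full log odds (same order)
```

The third term has p_i = q_i, so it contributes a weight of zero: a term that is equally likely in relevant and non-relevant documents carries no evidence.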

(14)

IR: Term Relevance Weight

\[
RSV(d) = \sum_{i=1}^{n} t_i \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}
\]

■ tr_k = Log odds ratio

\[
tr_k = \log \frac{p_k (1 - q_k)}{q_k (1 - p_k)}
     = \log \frac{p_k / (1 - p_k)}{q_k / (1 - q_k)}
     = \log \frac{O(p_k)}{O(q_k)}
     = \log \frac{\text{odds that term } k \text{ appears in a relevant document}}{\text{odds that term } k \text{ appears in a non-relevant document}}
\]

■ Probability Estimation for Term Relevance Weight
  • Without relevance information
    − Known
      ▪ N_d = total number of documents in collection (collection size)
      ▪ d_k = number of documents in which term k appears (postings)
    − Estimated
      ▪ q_k = probability that term k occurs in a non-relevant document
        » q_k ≈ d_k / N_d; (conventional) q_k = (d_k + 0.5) / (N_d + 1)
        » assume that number of non-relevant documents ≈ collection size
      ▪ p_k = probability that term k occurs in a relevant document
        » p_k ≈ 0.5
        » assume random chance
    − Same as idf for large N_d

\[
tr_k = \log \frac{p_k (1 - q_k)}{q_k (1 - p_k)}
     = \log \frac{0.5 \left(1 - \frac{d_k}{N_d}\right)}{\frac{d_k}{N_d}\,(1 - 0.5)}
     = \log \frac{N_d - d_k}{d_k}
     \approx \log \frac{N_d}{d_k} \quad (N_d \gg d_k)
\]

Conventional:

\[
tr_k = \log \frac{p_k (1 - q_k)}{q_k (1 - p_k)}
     = \log \frac{0.5 \left(1 - \frac{d_k + 0.5}{N_d + 1}\right)}{\frac{d_k + 0.5}{N_d + 1}\,(1 - 0.5)}
     = \log \frac{N_d - d_k + 0.5}{d_k + 0.5}
\]
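To see how the weight without relevance information behaves like idf, the estimates above can be plugged into the log odds ratio. This is a minimal Python sketch; the function names and the collection size are illustrative assumptions, and natural logarithms are used (the slides do not fix a base).

```python
import math


def tr_weight(p_k, q_k):
    """Log odds ratio: log[ p_k (1 - q_k) / ( q_k (1 - p_k) ) ]."""
    return math.log(p_k * (1 - q_k) / (q_k * (1 - p_k)))


def tr_without_relevance(n_docs, doc_freq):
    """tr_k with p_k = 0.5 and the conventional q_k = (d_k + 0.5) / (N_d + 1)."""
    q_k = (doc_freq + 0.5) / (n_docs + 1)
    return tr_weight(0.5, q_k)


def idf(n_docs, doc_freq):
    """Inverse document frequency, log(N_d / d_k)."""
    return math.log(n_docs / doc_freq)


n_docs = 1_000_000  # hypothetical collection size N_d
for doc_freq in (10, 1_000, 100_000):
    print(
        doc_freq,
        round(tr_without_relevance(n_docs, doc_freq), 3),
        round(idf(n_docs, doc_freq), 3),
    )
```

Unlike idf, this weight becomes negative once a term occurs in more than half of the collection, since then the estimated q_k exceeds p_k = 0.5.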

(15)

IR: Term Relevance Weight

■ Probability Estimation for Term Relevance Weight
  • With relevance information
    − Known
      ▪ N_d = total number of documents in collection (collection size)
      ▪ d_k = number of documents in which term k appears (postings)
      ▪ N_r = total number of relevant documents in collection
        » N_d − N_r = total number of non-relevant documents in collection
      ▪ r_k = number of relevant documents in which term k appears
        » d_k − r_k = number of non-relevant documents in which term k appears
    − Estimated
      ▪ $p_k = \frac{r_k}{N_r}$; (conventional) $p_k = \frac{r_k + 0.5}{N_r + 1}$
      ▪ $q_k = \frac{d_k - r_k}{N_d - N_r}$; (conventional) $q_k = \frac{d_k - r_k + 0.5}{N_d - N_r + 1}$

                            Relevant Documents    Non-relevant Documents       Total
Documents with Term k       r_k                   d_k − r_k                    d_k
Documents without Term k    N_r − r_k             N_d − N_r − (d_k − r_k)      N_d − d_k
Total                       N_r                   N_d − N_r                    N_d

(16)

IR: Term Relevance Weight

■ Probability Estimation for Term Relevance Weight
  • With relevance information

With $p_k = \frac{r_k}{N_r}$ and $q_k = \frac{d_k - r_k}{N_d - N_r}$:

\[
tr_k = \log \frac{p_k (1 - q_k)}{q_k (1 - p_k)}
     = \log \frac{p_k / (1 - p_k)}{q_k / (1 - q_k)}
     = \log \frac{\dfrac{r_k / N_r}{(N_r - r_k) / N_r}}{\dfrac{(d_k - r_k) / (N_d - N_r)}{(N_d - N_r - d_k + r_k) / (N_d - N_r)}}
     = \log \frac{r_k / (N_r - r_k)}{(d_k - r_k) / (N_d - N_r - d_k + r_k)}
\]

(conventional) With $p_k = \frac{r_k + 0.5}{N_r + 1}$ and $q_k = \frac{d_k - r_k + 0.5}{N_d - N_r + 1}$:

\[
tr_k = \log \frac{p_k (1 - q_k)}{q_k (1 - p_k)}
     = \log \frac{\dfrac{(r_k + 0.5) / (N_r + 1)}{(N_r - r_k + 0.5) / (N_r + 1)}}{\dfrac{(d_k - r_k + 0.5) / (N_d - N_r + 1)}{(N_d - N_r - d_k + r_k + 0.5) / (N_d - N_r + 1)}}
     = \log \frac{(r_k + 0.5) / (N_r - r_k + 0.5)}{(d_k - r_k + 0.5) / (N_d - N_r - d_k + r_k + 0.5)}
\]
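The conventional (0.5-smoothed) weight with relevance information can be computed directly from the four counts N_d, d_k, N_r and r_k. The Python sketch below is an illustration under assumed names and made-up counts; it also cross-checks the closed form against the general log odds ratio.

```python
import math


def tr_with_relevance(n_docs, doc_freq, n_rel, rel_doc_freq):
    """Conventional term relevance weight from the contingency counts.

    n_docs = N_d, doc_freq = d_k, n_rel = N_r, rel_doc_freq = r_k.
    """
    consistent = (
        0 <= rel_doc_freq <= min(doc_freq, n_rel)
        and doc_freq - rel_doc_freq <= n_docs - n_rel
    )
    if not consistent:
        raise ValueError("counts are inconsistent with the contingency table")
    # Odds that the term appears in a relevant document.
    relevant_odds = (rel_doc_freq + 0.5) / (n_rel - rel_doc_freq + 0.5)
    # Odds that the term appears in a non-relevant document.
    nonrelevant_odds = (doc_freq - rel_doc_freq + 0.5) / (
        n_docs - n_rel - doc_freq + rel_doc_freq + 0.5
    )
    return math.log(relevant_odds / nonrelevant_odds)


# Hypothetical counts: 1000 documents, the term occurs in 100 of them,
# 20 documents are relevant, and 15 of those contain the term.
weight = tr_with_relevance(n_docs=1000, doc_freq=100, n_rel=20, rel_doc_freq=15)

# Cross-check against tr_k = log[ p_k (1 - q_k) / ( q_k (1 - p_k) ) ].
p_k = (15 + 0.5) / (20 + 1)
q_k = (100 - 15 + 0.5) / (1000 - 20 + 1)
direct = math.log(p_k * (1 - q_k) / (q_k * (1 - p_k)))

print(round(weight, 4), round(direct, 4))
```

The 0.5 terms keep the weight finite when a cell of the contingency table is zero, for example when no relevant document contains the term.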
