
7.1.1 Model Description

We model the generation of the ratings $R$ as a function of two low-rank latent matrices $U \in \mathbb{R}^{U \times D}$ and $V \in \mathbb{R}^{I \times D}$, where $D \ll \min(U, I)$. The value of $r_{u,i}$ is determined by i) the scalar $\mathbf{u}_u^\top \mathbf{v}_i$, where $\mathbf{u}_u$ is the vector contained in the $u$-th row of $U$ and $\mathbf{v}_i$ is the vector in the $i$-th row of $V$, and ii) a partition of the real line into $R$ contiguous intervals with boundaries $b_{i,0} < \dots < b_{i,R}$, where $b_{i,0} = -\infty$ and $b_{i,R} = \infty$. The value of $r_{u,i}$ is obtained as a function of the interval in which $\mathbf{u}_u^\top \mathbf{v}_i$ lies. Note that the interval boundaries are different for each column of $R$. A simple model would be $r_{u,i} = k$ if $\mathbf{u}_u^\top \mathbf{v}_i \in (b_{i,k-1}, b_{i,k}]$. However, due to noise, there may be no $b_{i,0}, \dots, b_{i,R}$, $U$ and $V$ that guarantee $\mathbf{u}_u^\top \mathbf{v}_i \in (b_{i,r_{u,i}-1}, b_{i,r_{u,i}}]$ for all of the observed ratings in $R_D$. We model this by adding zero-mean Gaussian noise $e_{u,i}$ to $\mathbf{u}_u^\top \mathbf{v}_i$ before generating $r_{u,i}$, and by introducing the latent variable $a_{u,i} = \mathbf{u}_u^\top \mathbf{v}_i + e_{u,i}$. The probability of $r_{u,i}$ given $a_{u,i}$ and $\mathbf{b}_i = (b_{i,1}, \dots, b_{i,R-1})$ is

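$$p(r_{u,i} \mid a_{u,i}, \mathbf{b}_i) = \prod_{k=1}^{r_{u,i}-1} \Theta[a_{u,i} - b_{i,k}] \prod_{k=r_{u,i}}^{R-1} \Theta[b_{i,k} - a_{u,i}] = \prod_{k=1}^{R-1} \Theta\big[\operatorname{sign}[r_{u,i} - k - 0.5]\,(a_{u,i} - b_{i,k})\big], \qquad (7.1)$$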

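where $\Theta$ denotes the Heaviside step function: $\Theta[x] = 1$ for $x \geq 0$ and zero otherwise. Thus, the likelihood (7.1) takes value 1 when $a_{u,i} \in (b_{i,r_{u,i}-1}, b_{i,r_{u,i}}]$ and 0 otherwise. Note that (7.1) depends on all the entries in $\mathbf{b}_i$ and not only on $b_{i,r_{u,i}-1}$ and $b_{i,r_{u,i}}$. The prior for the vector of boundary variables $\mathbf{b}_i$ for the $i$-th column of $R$ is hierarchical Gaussian, $p(\mathbf{b}_i \mid \mathbf{b}_0) = \prod_{k=1}^{R-1} \mathcal{N}(b_{i,k}; b_{0,k}, v_0)$, where $\mathbf{b}_0$ is a vector of base interval boundaries with prior $p(\mathbf{b}_0) = \prod_{k=1}^{R-1} \mathcal{N}(b_{0,k}; m_k^{b_0}, v_0)$. The values $m_1^{b_0}, \dots, m_{R-1}^{b_0}$ and $v_0$ are hyperparameters. Note that although the boundaries may cross a priori, a crossing $b_{i,k} > b_{i,k+1}$ makes the rating $k+1$ impossible under (7.1), because the likelihood depends on all the boundaries. Crossed configurations are therefore penalized by the likelihood, and the posterior means remain in order.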
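The two products in (7.1) can be checked against each other numerically. The following is a minimal Python sketch, not the implementation used in this work. It assumes ratings are coded as integers $1, \dots, R$ and that the finite boundaries $b_{i,1}, \dots, b_{i,R-1}$ of one column are stored, in order, in a 0-indexed list `b`. The function names are hypothetical.

```python
def heaviside(x):
    """Theta[x] = 1 for x >= 0 and 0 otherwise."""
    return 1.0 if x >= 0 else 0.0

def sign(x):
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)

def likelihood_two_products(r, a, b):
    """First form of (7.1): a must lie above b_k for k < r and below b_k for k >= r.

    r: rating in {1, ..., R}; a: latent a_{u,i};
    b: [b_{i,1}, ..., b_{i,R-1}] stored as a 0-indexed list.
    """
    R = len(b) + 1
    p = 1.0
    for k in range(1, r):        # k = 1, ..., r - 1
        p *= heaviside(a - b[k - 1])
    for k in range(r, R):        # k = r, ..., R - 1
        p *= heaviside(b[k - 1] - a)
    return p

def likelihood_sign_form(r, a, b):
    """Second form of (7.1): a single product over all R - 1 finite boundaries."""
    p = 1.0
    for k in range(1, len(b) + 1):
        p *= heaviside(sign(r - k - 0.5) * (a - b[k - 1]))
    return p

b = [-1.0, 0.0, 1.0]                          # R = 4 rating levels
print(likelihood_two_products(3, 0.5, b))     # 1.0, since 0.5 lies in (0, 1]
print(likelihood_sign_form(2, 0.5, b))        # 0.0
```

Because $r_{u,i} - k - 0.5$ is never zero for integer $r_{u,i}$ and $k$, the sign is always $\pm 1$. This is what lets the single product replace the two separate ones.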

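We include heteroscedasticity in the additive noise $e_{u,i}$ across users and items. For this, $e_{u,i}$ follows a priori a zero-mean Gaussian distribution with variance $\gamma_u^{\text{row}} \gamma_i^{\text{col}}$, where $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$ are factors that specify the variance of $e_{u,i}$ for the $u$-th row and $i$-th column of $R$. We define $c_{u,i} = \mathbf{u}_u^\top \mathbf{v}_i$ and assume that the conditional distribution of $a_{u,i}$ given $c_{u,i}$, $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$ is $p(a_{u,i} \mid c_{u,i}, \gamma_u^{\text{row}}, \gamma_i^{\text{col}}) = \mathcal{N}(a_{u,i}; c_{u,i}, \gamma_u^{\text{row}} \gamma_i^{\text{col}})$. To learn the user- and item-specific noise levels, we put inverse Gamma priors on $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$.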
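The heteroscedastic noise model can be illustrated by ancestral sampling. This minimal Python sketch first draws the noise factors from their inverse Gamma priors and then draws $a_{u,i} \sim \mathcal{N}(c_{u,i}, \gamma_u^{\text{row}} \gamma_i^{\text{col}})$. It assumes the shape/scale parameterization of the inverse Gamma, with density proportional to $x^{-a-1} e^{-b/x}$. The helper names and the dictionary keyed by $(u, i)$ are hypothetical.

```python
import random

def sample_inverse_gamma(shape, scale, rng):
    """Draw from InvGamma(shape, scale), density proportional to x^(-shape-1) exp(-scale/x).

    If X ~ Gamma(shape, rate=scale) then 1/X ~ InvGamma(shape, scale).
    random.gammavariate(alpha, beta) treats beta as a *scale*, so beta = 1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)

def sample_noise_factors(n_users, n_items, shape, scale, rng):
    """Draw gamma_row[u] and gamma_col[i] independently from the inverse Gamma prior."""
    gamma_row = [sample_inverse_gamma(shape, scale, rng) for _ in range(n_users)]
    gamma_col = [sample_inverse_gamma(shape, scale, rng) for _ in range(n_items)]
    return gamma_row, gamma_col

def sample_latent_a(c, gamma_row, gamma_col, rng):
    """Draw a[u, i] ~ N(c[u, i], gamma_row[u] * gamma_col[i]) for every (u, i) in c."""
    a = {}
    for (u, i), c_ui in c.items():
        std = (gamma_row[u] * gamma_col[i]) ** 0.5  # rng.gauss expects a standard deviation
        a[(u, i)] = rng.gauss(c_ui, std)
    return a
```

The noise variance of entry $(u, i)$ is the product of a per-row factor and a per-column factor. A noisy user therefore inflates the variance of all of that user's ratings, and a noisy item does the same for its column.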

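To avoid having to fix the prior means and variances of the latent factors by hand, we use a hierarchical Gaussian prior for $U$ and $V$, that is, $p(U \mid \mathbf{m}^U, \mathbf{v}^U) = \prod_{u=1}^{U} \prod_{d=1}^{D} \mathcal{N}(u_{u,d}; m_d^U, v_d^U)$ and $p(V \mid \mathbf{m}^V, \mathbf{v}^V) = \prod_{i=1}^{I} \prod_{d=1}^{D} \mathcal{N}(v_{i,d}; m_d^V, v_d^V)$, where $\mathbf{m}^U$ and $\mathbf{m}^V$ are mean parameters for the rows of $U$ and $V$, respectively. We select factorized standard Gaussian priors for these parameters. Similarly, $\mathbf{v}^U$ and $\mathbf{v}^V$ are variance parameters for the rows of $U$ and $V$, and they are given factorized inverse Gamma priors.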
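Sampling from this hierarchical prior makes the parameter sharing explicit: each latent dimension $d$ has one mean $m_d^U$ and one variance $v_d^U$, and these are shared by all rows of $U$. The Python sketch below illustrates this under stated assumptions. The inverse Gamma hyperparameters `a0` and `b0` are passed in using the shape/scale parameterization, and the function name is hypothetical. The same function applies to $V$ with $I$ rows.

```python
import random

def sample_factor_matrix(n_rows, D, a0, b0, rng):
    """Draw one factor matrix (U or V) from the hierarchical Gaussian prior.

    m[d] ~ N(0, 1)              standard Gaussian prior on the means
    v[d] ~ InvGamma(a0, b0)     shape/scale parameterization (assumed)
    row[n][d] ~ N(m[d], v[d])   every row shares the same m and v
    """
    m = [rng.gauss(0.0, 1.0) for _ in range(D)]
    # 1 / Gamma(a0, rate=b0) is InvGamma(a0, b0); gammavariate takes a scale, hence 1/b0.
    v = [1.0 / rng.gammavariate(a0, 1.0 / b0) for _ in range(D)]
    rows = [[rng.gauss(m[d], v[d] ** 0.5) for d in range(D)] for _ in range(n_rows)]
    return rows, m, v

U, m_U, v_U = sample_factor_matrix(n_rows=100, D=5, a0=5.0, b0=5.0, rng=random.Random(0))
```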

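Lastly, let $C_D$ be the set of variables $c_{u,i}$ for which $r_{u,i}$ is observed; then $p(C_D \mid U, V) = \prod_{(u,i) \in D} \delta(c_{u,i} - \mathbf{u}_u^\top \mathbf{v}_i)$. Similarly, we collect the variables $a_{u,i}$ for the observed ratings in $A_D$, so that $p(A_D \mid C_D, \gamma^{\text{row}}, \gamma^{\text{col}}) = \prod_{(u,i) \in D} \mathcal{N}(a_{u,i}; c_{u,i}, \gamma_u^{\text{row}} \gamma_i^{\text{col}})$. Let $R_D$ denote the set of entries in $R$ that are observed, and let $B$ collect the boundary vectors $\mathbf{b}_1, \dots, \mathbf{b}_I$. Then the likelihood factorizes as $p(R_D \mid A_D, B) = \prod_{(u,i) \in D} p(r_{u,i} \mid a_{u,i}, \mathbf{b}_i)$. Given $R_D$, the posterior distribution over all of the variables $\Xi = \{A_D, C_D, U, V, B, \gamma^{\text{row}}, \gamma^{\text{col}}, \mathbf{b}_0, \mathbf{m}^U, \mathbf{m}^V, \mathbf{v}^U, \mathbf{v}^V\}$ is

$$p(\Xi \mid R_D) = p(R_D \mid A_D, B)\, p(A_D \mid C_D, \gamma^{\text{row}}, \gamma^{\text{col}})\, p(C_D \mid U, V)\, p(U \mid \mathbf{m}^U, \mathbf{v}^U)\, p(V \mid \mathbf{m}^V, \mathbf{v}^V)\, p(B \mid \mathbf{b}_0)\, p(\mathbf{b}_0)\, p(\gamma^{\text{row}})\, p(\gamma^{\text{col}})\, p(\mathbf{m}^U)\, p(\mathbf{m}^V)\, p(\mathbf{v}^U)\, p(\mathbf{v}^V) \left[p(R_D)\right]^{-1}, \qquad (7.2)$$

where $p(R_D)$ is the normalization constant. Conditioning on hyperparameters has been omitted for clarity.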
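The delta factor $p(C_D \mid U, V)$ pins each $c_{u,i}$ to $\mathbf{u}_u^\top \mathbf{v}_i$. As a result, the data-dependent part of (7.2) can be evaluated directly from $U$, $V$, $A_D$, $B$ and the noise factors. The Python sketch below does this for the first two factors only; the prior terms would contribute further log-densities. It is an illustration, not the inference procedure used in this work. The function name and container layout are hypothetical: dictionaries keyed by $(u, i)$, and lists of rows for $U$ and $V$.

```python
import math

def log_data_factors(ratings, A, U, V, B, gamma_row, gamma_col):
    """log p(R_D | A_D, B) + log p(A_D | C_D, gamma_row, gamma_col).

    The delta factor p(C_D | U, V) is used to substitute c[u, i] = u_u . v_i.
    ratings: dict (u, i) -> rating r in {1, ..., R}
    A:       dict (u, i) -> latent a[u, i]
    U, V:    lists of row vectors; B: dict i -> [b_{i,1}, ..., b_{i,R-1}]
    Returns -inf if any observed rating has zero likelihood under (7.1).
    """
    total = 0.0
    for (u, i), r in ratings.items():
        a, b = A[(u, i)], B[i]
        # Ordinal factor (7.1): a >= b_k for all k < r and a <= b_k for all k >= r.
        above = all(a >= b[k - 1] for k in range(1, r))
        below = all(a <= b[k - 1] for k in range(r, len(b) + 1))
        if not (above and below):
            return -math.inf
        c = sum(x * y for x, y in zip(U[u], V[i]))  # c_{u,i} = u_u^T v_i
        var = gamma_row[u] * gamma_col[i]           # heteroscedastic noise variance
        total += -0.5 * math.log(2.0 * math.pi * var) - (a - c) ** 2 / (2.0 * var)
    return total
```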

Hyperparameter Values

All of the Gaussian hyper-priors were given standard Normal distributions. The prior means $m_1^{b_0}, \dots, m_{R-1}^{b_0}$ were set to form an evenly spaced grid on the interval $[-6, 6]$, as suggested in Paquet et al. [2012]. The prior variance $v_0$ for each component of $\mathbf{b}_0$ is initialized to $v_0 = 0.1$.
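The text does not say whether the grid includes the endpoints $-6$ and $6$. The Python sketch below assumes that it does, and places a single mean at the midpoint when $R = 2$. Both choices are assumptions, and the function name is hypothetical.

```python
def boundary_prior_means(R, low=-6.0, high=6.0):
    """Return R - 1 evenly spaced prior means m_1^{b0}, ..., m_{R-1}^{b0} on [low, high].

    Assumption: both endpoints are included in the grid; with a single
    finite boundary (R = 2) the mean is placed at the midpoint.
    """
    n = R - 1
    if n < 1:
        raise ValueError("need at least two rating levels")
    if n == 1:
        return [0.5 * (low + high)]
    step = (high - low) / (n - 1)
    return [low + k * step for k in range(n)]

print(boundary_prior_means(5))  # [-6.0, -2.0, 2.0, 6.0]
```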

The hyperparameters $a_0^\gamma$ and $b_0^\gamma$ for the priors on $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$ are set to $a_0^\gamma = 5$ and $b_0^\gamma = 5\sqrt{10}$. This yields a mean value of 10 for the product of $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$, which is the recommended noise level in the (homoscedastic) ordinal MF model in Paquet et al. [2012]. The other inverse Gamma hyperpriors were given values $a_0^U = a_0^V = 5$ and $b_0^U = b_0^V = 5$. This is equivalent to having seen a random sample of size 10 with unit empirical variance.
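The pseudo-sample reading follows from conjugacy. For Gaussian observations with known mean, an $\text{InvGamma}(a, b)$ prior on the variance becomes $\text{InvGamma}(a + n/2,\, b + S/2)$ after $n$ observations with sum of squared deviations $S$. Starting from the improper limit $a = b = 0$, ten observations with unit empirical variance ($S = 10$) give exactly $a = b = 5$. The Python sketch below assumes a known (zero) mean and uses hypothetical function names.

```python
def inverse_gamma_update(a, b, xs, mean=0.0):
    """Conjugate update of an InvGamma(a, b) prior on a Gaussian variance, known mean."""
    n = len(xs)
    ss = sum((x - mean) ** 2 for x in xs)
    return a + n / 2, b + ss / 2

def pseudo_sample(a, b):
    """Invert the update from a = b = 0: sample size n = 2a, empirical variance S/n = b/a."""
    return 2 * a, b / a

xs = [1.0, -1.0] * 5                       # ten zero-mean values with unit empirical variance
print(inverse_gamma_update(0.0, 0.0, xs))  # (5.0, 5.0)
print(pseudo_sample(5, 5))                 # (10, 1.0)
```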

In document Efficient Bayesian active learning and matrix modelling (Page 136-138)