7.1.1 Model Description
We model the generation of the ratings $\mathbf{R}$ as a function of two low-rank latent matrices $\mathbf{U} \in \mathbb{R}^{U \times D}$ and $\mathbf{V} \in \mathbb{R}^{I \times D}$, where $D \ll \min(U, I)$. The value of $r_{u,i}$ is determined by i) the scalar $\mathbf{u}_u^\top \mathbf{v}_i$, where $\mathbf{u}_u$ is the vector contained in the $u$-th row of $\mathbf{U}$ and $\mathbf{v}_i$ in the $i$-th row of $\mathbf{V}$, and ii) a partition of the real line into $R$ contiguous intervals with boundaries $b_{i,0} < \ldots < b_{i,R}$, where $b_{i,0} = -\infty$ and $b_{i,R} = \infty$. The value of $r_{u,i}$ is obtained as a function of the interval in which $\mathbf{u}_u^\top \mathbf{v}_i$ lies. Note that the interval boundaries are different for each column of $\mathbf{R}$. A simple model would be $r_{u,i} = k$ if $\mathbf{u}_u^\top \mathbf{v}_i \in (b_{i,k-1}, b_{i,k}]$. However, due to noise, there may be no $b_{i,0}, \ldots, b_{i,R}$, $\mathbf{U}$ and $\mathbf{V}$ that guarantee $\mathbf{u}_u^\top \mathbf{v}_i \in (b_{i,r_{u,i}-1}, b_{i,r_{u,i}}]$ for all of the observed ratings in $\mathbf{R}_{\mathcal{D}}$. We model this by adding zero-mean Gaussian noise $e_{u,i}$ to $\mathbf{u}_u^\top \mathbf{v}_i$ before generating $r_{u,i}$, and introducing the latent variable $a_{u,i} = \mathbf{u}_u^\top \mathbf{v}_i + e_{u,i}$. The probability of $r_{u,i}$ given $a_{u,i}$ and $\mathbf{b}_i = (b_{i,1}, \ldots, b_{i,R-1})$ is
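\[
p(r_{u,i} \mid a_{u,i}, \mathbf{b}_i) = \prod_{k=1}^{r_{u,i}-1} \Theta[a_{u,i} - b_{i,k}] \prod_{k=r_{u,i}}^{R-1} \Theta[b_{i,k} - a_{u,i}] = \prod_{k=1}^{R-1} \Theta\left[\operatorname{sign}[r_{u,i} - k - 0.5]\,(a_{u,i} - b_{i,k})\right], \tag{7.1}
\]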
where $\Theta$ denotes the Heaviside step function; $\Theta[x] = 1$ for $x \geq 0$ and zero otherwise. Thus, the likelihood (7.1) takes value 1 when $a_{u,i} \in (b_{i,r_{u,i}-1}, b_{i,r_{u,i}}]$ and 0 otherwise. Note the dependence of (7.1) on all the entries in $\mathbf{b}_i$ and not only on $b_{i,r_{u,i}-1}$ and $b_{i,r_{u,i}}$. The prior for the vector of boundary variables $\mathbf{b}_i$ for the $i$-th column of $\mathbf{R}$ is hierarchical Gaussian, $p(\mathbf{b}_i \mid \mathbf{b}_0) = \prod_{k=1}^{R-1} \mathcal{N}(b_{i,k}; b_{0,k}, v_0)$, where $\mathbf{b}_0$ is a vector of base interval boundaries, with prior $p(\mathbf{b}_0) = \prod_{k=1}^{R-1} \mathcal{N}(b_{0,k}; m_k^{b_0}, v_0)$. The quantities $m_1^{b_0}, \ldots, m_{R-1}^{b_0}$ and $v_0$
are hyperparameters. Note that although the boundaries may cross a priori, crossed boundaries have zero likelihood, so the posterior means remain in order.
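The likelihood (7.1) can be checked numerically with a minimal sketch. The function names `heaviside`, `sign` and `ordinal_likelihood` are illustrative and do not come from the text; ratings are assumed to be coded as integers 1, ..., R, and `b` holds only the R − 1 finite boundaries.

```python
def heaviside(x):
    """Heaviside step function: 1 for x >= 0, 0 otherwise."""
    return 1.0 if x >= 0 else 0.0


def sign(x):
    """Sign of a nonzero number (r - k - 0.5 is never zero for integers)."""
    return 1.0 if x > 0 else -1.0


def ordinal_likelihood(r, a, b):
    """Evaluate p(r | a, b) as the product over all R-1 boundaries in (7.1).

    r: rating in 1..R (assumed integer coding)
    a: latent value a_{u,i}
    b: list of the R-1 finite boundaries b_{i,1}, ..., b_{i,R-1}
    """
    p = 1.0
    for k, b_k in enumerate(b, start=1):
        p *= heaviside(sign(r - k - 0.5) * (a - b_k))
    return p


boundaries = [-1.0, 0.0, 1.0]  # R = 4 rating levels
per_rating = [ordinal_likelihood(r, 0.5, boundaries) for r in range(1, 5)]
```

For ordered boundaries, exactly one rating receives likelihood 1 for a given latent value away from the boundaries. Because every boundary enters the product, a fully reversed vector such as `[1.0, 0.0, -1.0]` gives likelihood 0 to every rating at `a = 0.5`.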
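We include heteroscedasticity in the additive noise $e_{u,i}$ across users and items. For this, $e_{u,i}$ follows a priori a zero-mean Gaussian distribution with variance $\gamma_u^{\text{row}} \gamma_i^{\text{col}}$, where $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$ are factors that specify the variance of $e_{u,i}$ for the $u$-th row and $i$-th column of $\mathbf{R}$. We define $c_{u,i} = \mathbf{u}_u^\top \mathbf{v}_i$ and assume that the conditional distribution of $a_{u,i}$ given $c_{u,i}$, $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$ is $p(a_{u,i} \mid c_{u,i}, \gamma_u^{\text{row}}, \gamma_i^{\text{col}}) = \mathcal{N}(a_{u,i}; c_{u,i}, \gamma_u^{\text{row}} \gamma_i^{\text{col}})$. To learn the user- and item-specific noise levels we put inverse Gamma priors on $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$.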
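To see how the row and column variance factors shape the ratings, one can integrate the Gaussian noise over each interval. When the boundaries are ordered, this gives P(r = k | c) = Φ((b_k − c)/σ) − Φ((b_{k−1} − c)/σ) with σ² equal to the product of the two factors. This closed form is a standard consequence of the model rather than something stated in the text, and the names below (`normal_cdf`, `rating_probabilities`) are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rating_probabilities(c, gamma_row, gamma_col, b):
    """Return [P(r = 1), ..., P(r = R)] given c = u_u^T v_i.

    Assumes the R-1 finite boundaries in b are sorted in increasing order.
    The noise variance is the product gamma_row * gamma_col.
    """
    sigma = math.sqrt(gamma_row * gamma_col)
    edges = [-math.inf] + list(b) + [math.inf]
    cdf = [normal_cdf((e - c) / sigma) for e in edges]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(edges))]


boundaries = [-1.0, 0.0, 1.0]
# Same latent score, low-noise versus high-noise user/item pair.
low_noise = rating_probabilities(0.5, 1.0, 1.0, boundaries)
high_noise = rating_probabilities(0.5, 10 ** 0.5, 10 ** 0.5, boundaries)
```

With the larger variance product, probability mass moves away from the rating whose interval contains `c` and towards the extreme ratings, which is the effect the user- and item-specific noise levels are meant to capture.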
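To be robust to the choice of parameter values, we use a hierarchical Gaussian prior for $\mathbf{U}$ and $\mathbf{V}$, that is, $p(\mathbf{U} \mid \mathbf{m}^U, \mathbf{v}^U) = \prod_{u=1}^{U} \prod_{d=1}^{D} \mathcal{N}(u_{u,d}; m_d^U, v_d^U)$ and $p(\mathbf{V} \mid \mathbf{m}^V, \mathbf{v}^V) = \prod_{i=1}^{I} \prod_{d=1}^{D} \mathcal{N}(v_{i,d}; m_d^V, v_d^V)$, where $\mathbf{m}^U$ and $\mathbf{m}^V$ are mean parameters for the rows of $\mathbf{U}$ and $\mathbf{V}$ respectively. We select factorized standard Gaussian priors for these parameters. Similarly, $\mathbf{v}^U$ and $\mathbf{v}^V$ are variance parameters for the rows of $\mathbf{U}$ and $\mathbf{V}$ and are given factorized inverse Gamma priors.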
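Lastly, let $\mathbf{C}_{\mathcal{D}}$ be the set of variables $c_{u,i}$ for which $r_{u,i}$ is observed; then $p(\mathbf{C}_{\mathcal{D}} \mid \mathbf{U}, \mathbf{V}) = \prod_{(u,i) \in \mathcal{D}} \delta(c_{u,i} - \mathbf{u}_u^\top \mathbf{v}_i)$. Similarly, we collect the variables $a_{u,i}$ for the observed ratings in $\mathbf{A}_{\mathcal{D}}$ and the boundary vectors $\mathbf{b}_1, \ldots, \mathbf{b}_I$ in $\mathbf{B}$. Let $\mathbf{R}_{\mathcal{D}}$ denote the set of entries in $\mathbf{R}$ that are observed. Then the likelihood factorizes as $p(\mathbf{R}_{\mathcal{D}} \mid \mathbf{A}_{\mathcal{D}}, \mathbf{B}) = \prod_{(u,i) \in \mathcal{D}} p(r_{u,i} \mid a_{u,i}, \mathbf{b}_i)$. Given $\mathbf{R}_{\mathcal{D}}$, the posterior distribution over all of the variables $\Xi = \{\mathbf{A}_{\mathcal{D}}, \mathbf{C}_{\mathcal{D}}, \mathbf{U}, \mathbf{V}, \mathbf{B}, \boldsymbol{\gamma}^{\text{row}}, \boldsymbol{\gamma}^{\text{col}}, \mathbf{b}_0, \mathbf{m}^U, \mathbf{m}^V, \mathbf{v}^U, \mathbf{v}^V\}$ is
\[
\begin{aligned}
p(\Xi \mid \mathbf{R}_{\mathcal{D}}) = {} & p(\mathbf{R}_{\mathcal{D}} \mid \mathbf{A}_{\mathcal{D}}, \mathbf{B})\, p(\mathbf{A}_{\mathcal{D}} \mid \mathbf{C}_{\mathcal{D}}, \boldsymbol{\gamma}^{\text{row}}, \boldsymbol{\gamma}^{\text{col}})\, p(\mathbf{C}_{\mathcal{D}} \mid \mathbf{U}, \mathbf{V})\, p(\mathbf{U} \mid \mathbf{m}^U, \mathbf{v}^U)\, p(\mathbf{V} \mid \mathbf{m}^V, \mathbf{v}^V) \\
& \times p(\mathbf{B} \mid \mathbf{b}_0)\, p(\mathbf{b}_0)\, p(\boldsymbol{\gamma}^{\text{row}})\, p(\boldsymbol{\gamma}^{\text{col}})\, p(\mathbf{m}^U)\, p(\mathbf{m}^V)\, p(\mathbf{v}^U)\, p(\mathbf{v}^V)\, [p(\mathbf{R}_{\mathcal{D}})]^{-1},
\end{aligned} \tag{7.2}
\]
where $p(\mathbf{R}_{\mathcal{D}})$ is the normalization constant. Conditioning on hyperparameters has been omitted for clarity.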
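The factors in (7.2) can be read as a recipe for ancestral sampling: draw the hyper-level variables first, then the latent matrices, boundaries and noise factors, and finally the ratings. The sketch below follows that order under several assumptions that are ours rather than the text's: the function name `sample_ratings` is hypothetical, all entries are treated as observed, the default hyperparameters mirror the values reported under Hyperparameter Values, the grid of boundary means includes both endpoints of [−6, 6], and a rating is read off by counting the boundaries below the latent value, which agrees with (7.1) only when the sampled boundaries happen to be ordered.

```python
import random


def inv_gamma(rng, a, b):
    """Draw from an inverse Gamma with shape a and scale b."""
    return 1.0 / rng.gammavariate(a, 1.0 / b)


def normal(rng, mean, var):
    """Draw from a Gaussian parameterized by mean and variance."""
    return rng.gauss(mean, var ** 0.5)


def sample_ratings(n_users, n_items, D, R, rng,
                   a0=5.0, b0=5.0, a_gamma=5.0, b_gamma=5.0 * 10 ** 0.5, v0=0.1):
    """Ancestral sample of a full rating matrix (requires R >= 3)."""
    # Hyper-level: means and variances for the rows of U and V.
    mU = [normal(rng, 0.0, 1.0) for _ in range(D)]
    mV = [normal(rng, 0.0, 1.0) for _ in range(D)]
    vU = [inv_gamma(rng, a0, b0) for _ in range(D)]
    vV = [inv_gamma(rng, a0, b0) for _ in range(D)]
    # Latent factor matrices.
    U = [[normal(rng, mU[d], vU[d]) for d in range(D)] for _ in range(n_users)]
    V = [[normal(rng, mV[d], vV[d]) for d in range(D)] for _ in range(n_items)]
    # Base boundaries b_0 around an evenly spaced grid, then per-item boundaries.
    grid = [-6.0 + 12.0 * k / (R - 2) for k in range(R - 1)]
    base = [normal(rng, m, v0) for m in grid]
    B = [[normal(rng, base[k], v0) for k in range(R - 1)] for _ in range(n_items)]
    # Row and column variance factors.
    g_row = [inv_gamma(rng, a_gamma, b_gamma) for _ in range(n_users)]
    g_col = [inv_gamma(rng, a_gamma, b_gamma) for _ in range(n_items)]
    ratings = []
    for u in range(n_users):
        row = []
        for i in range(n_items):
            c = sum(U[u][d] * V[i][d] for d in range(D))
            a = normal(rng, c, g_row[u] * g_col[i])
            row.append(1 + sum(1 for b_k in B[i] if a > b_k))
        ratings.append(row)
    return ratings


ratings = sample_ratings(n_users=4, n_items=3, D=2, R=5, rng=random.Random(0))
```

Sampling forward like this is only a way to inspect what the prior implies; it says nothing about how the posterior (7.2) is approximated.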
Hyperparameter Values
All of the Gaussian hyper-priors were given standard Normal distributions. The prior means $m_1^{b_0}, \ldots, m_{R-1}^{b_0}$ were set to form an evenly spaced grid on the interval $[-6, 6]$, as suggested in Paquet et al. [2012]. The prior variance $v_0$ for each component of $\mathbf{b}_0$ is initialized to $v_0 = 0.1$.
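As an illustration of the grid of prior means, the following sketch builds R − 1 evenly spaced values on [−6, 6]. Whether the grid includes both endpoints is not stated in the text, so including them is an assumption here, and `boundary_prior_means` is a hypothetical helper name.

```python
def boundary_prior_means(R, low=-6.0, high=6.0):
    """Return R-1 evenly spaced prior means on [low, high], endpoints included.

    R is the number of rating levels; at least 3 levels are needed so that
    the grid has two or more points.
    """
    if R < 3:
        raise ValueError("need at least 3 rating levels for an endpoint grid")
    step = (high - low) / (R - 2)
    return [low + k * step for k in range(R - 1)]


means = boundary_prior_means(5)  # [-6.0, -2.0, 2.0, 6.0]
```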
The hyperparameters $a_0^{\gamma}$ and $b_0^{\gamma}$ for the priors on $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$ are set to $a_0^{\gamma} = 5$ and $b_0^{\gamma} = 5\sqrt{10}$. This yields a mean value of 10 for the product of $\gamma_u^{\text{row}}$ and $\gamma_i^{\text{col}}$, which is the recommended noise level in the (homoscedastic) ordinal MF model in Paquet et al. [2012]. The other inverse Gamma hyperpriors were given values $a_0^u = a_0^v = 5$ and $b_0^u = b_0^v = 5$. This is equivalent to having seen a random sample of size 10 with unit empirical variance.
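The pseudo-observation reading of these values can be made explicit. For a Gaussian variance with an inverse Gamma prior, a shape of n/2 and a scale of n s²/2 correspond to n prior observations with empirical variance s². The helper names below are hypothetical, and applying the same reading to the noise hyperparameters is our interpretation of the stated value of 10, not a claim made in the text.

```python
import math


def inv_gamma_from_pseudo_obs(n, s2):
    """Shape and scale equivalent to n pseudo-observations with variance s2."""
    return n / 2.0, n * s2 / 2.0


def pseudo_obs_from_inv_gamma(a, b):
    """Inverse map: (sample size, empirical variance) from shape a and scale b."""
    return 2.0 * a, b / a


# Priors on the variances of U and V: a = b = 5.
n_uv, s2_uv = pseudo_obs_from_inv_gamma(5.0, 5.0)

# Priors on the noise factors: a = 5, b = 5 * sqrt(10).
n_g, s2_g = pseudo_obs_from_inv_gamma(5.0, 5.0 * math.sqrt(10.0))
noise_product = s2_g * s2_g  # row factor times column factor
```

Under this reading each noise factor has a prior empirical variance of √10, so the product of a row factor and a column factor is centred on 10.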