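$$\log Z(m,V) \equiv \frac{1}{2} m^T V^{-1} m + \frac{1}{2}\log\det V + \frac{n}{2}\log(2\pi) \quad\text{and}\quad \log Z_i(m,V) \equiv \log \int dx\, \mathcal{N}(x|m,V)\, t_i^{\alpha}(U_i x).$$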
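As a sanity check of these definitions, the sketch below (plain Python; all function names are hypothetical, not taken from the paper) evaluates $\log Z(m,V)$ through a Cholesky factorization and compares it, in one dimension, with a numerical integral of $\exp(-\frac{1}{2}x^T V^{-1}x + m^T V^{-1}x)$, whose normalizer it is. It also checks an identity used when $t_i$ depends on $U_i x$: assuming a single projection row $U_i = u^T$ and a Gaussian site approximation $\exp(-\frac{1}{2}\lambda z^2 + \eta z)$ with $z = u^T x$, the difference $\log Z(m^{\setminus i},V^{\setminus i}) - \log Z(m,V)$ equals the same difference computed from the projected moments $U_i m^{\setminus i}$, $U_i V^{\setminus i} U_i^T$ and $U_i m$, $U_i V U_i^T$.

```python
import math

def cholesky(V):
    """Lower-triangular L with V = L L^T (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = V[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    return L

def log_Z(m, V):
    """log Z(m,V) = 1/2 m^T V^{-1} m + 1/2 log det V + n/2 log(2 pi)."""
    n = len(m)
    L = cholesky(V)
    y = []  # forward substitution L y = m, so that m^T V^{-1} m = y^T y
    for r in range(n):
        y.append((m[r] - sum(L[r][k] * y[k] for k in range(r))) / L[r][r])
    log_det = 2.0 * sum(math.log(L[r][r]) for r in range(n))
    return 0.5 * sum(v * v for v in y) + 0.5 * log_det + 0.5 * n * math.log(2.0 * math.pi)

def log_Z_trapezoid_1d(m, v, num=4001, width=12.0):
    """log of the integral of exp(-x^2 / (2 v) + m x / v) over x (trapezoid rule)."""
    s = math.sqrt(v)
    a, h = m - width * s, 2.0 * width * s / (num - 1)
    total = sum((0.5 if k in (0, num - 1) else 1.0)
                * math.exp(-0.5 * (a + k * h) ** 2 / v + m * (a + k * h) / v)
                for k in range(num))
    return math.log(total * h)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(V, u):
    return [dot(row, u) for row in V]

def cavity(m, V, u, lam, eta):
    """Moments of N(m, V) divided by exp(-lam z^2 / 2 + eta z), z = u^T x.
    Sherman-Morrison update; requires 1 - lam * u^T V u > 0."""
    Vu = matvec(V, u)
    d = 1.0 - lam * dot(u, Vu)
    n = len(m)
    V_cav = [[V[r][c] + lam * Vu[r] * Vu[c] / d for c in range(n)] for r in range(n)]
    m_cav = [m[r] + Vu[r] * (lam * dot(u, m) - eta) / d for r in range(n)]
    return m_cav, V_cav

# Closed form versus quadrature in one dimension.
print(log_Z([0.7], [[1.3]]), log_Z_trapezoid_1d(0.7, 1.3))

# Hypothetical 2-D example with a single projection row u (U_i = u^T).
m, V = [0.3, -1.0], [[2.0, 0.5], [0.5, 1.0]]
u, lam, eta = [1.0, -0.5], 0.2, 0.4
m_cav, V_cav = cavity(m, V, u, lam, eta)
full = log_Z(m_cav, V_cav) - log_Z(m, V)
proj = (log_Z([dot(u, m_cav)], [[dot(u, matvec(V_cav, u))]])
        - log_Z([dot(u, m)], [[dot(u, matvec(V, u))]]))
print(full, proj)  # the two differences agree up to rounding
```

The identity holds because the cavity differs from $q$ only by a factor that depends on $z = u^T x$, so both differences equal the log-expectation of that factor under the $z$-marginal of $q$; the $\frac{n}{2}\log(2\pi)$ terms cancel within each difference.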
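Expectation propagation approximates the evidence $p(y|\theta)$ by $Z_{\mathrm{EP}} = Z^{1-n/\alpha}\prod_i Z_i^{1/\alpha}$. Using the notation introduced above, this can be written as
$$\log Z_{\mathrm{EP}} = \log Z(m,V) + \frac{1}{\alpha}\sum_i \Big[\log Z_i\big(m^{\setminus i},V^{\setminus i}\big) + \log Z\big(m^{\setminus i},V^{\setminus i}\big) - \log Z(m,V)\Big],$$
which, in the case when $t_i$ depends on $U_i x$, leads to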
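$$\log Z_{\mathrm{EP}} = \log Z(m,V) + \frac{1}{\alpha}\sum_i \log Z_i\big(U_i m^{\setminus i}, U_i V^{\setminus i} U_i^T\big) + \frac{1}{\alpha}\sum_i \Big[\log Z\big(U_i m^{\setminus i}, U_i V^{\setminus i} U_i^T\big) - \log Z\big(U_i m, U_i V U_i^T\big)\Big].$$

Appendix D. A Summary of the Marginal Approximations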
An explanatory list of the approximation methods in Figure 13.
• LA-TK. The Laplace approximation of Tierney and Kadane (1986). The approximation $\tilde{p}_{\text{LA-TK}}(x_i)$ is computed by using the Laplace method to approximate $c_i(x_i)$ (Section 3.1).
• EP-FULL. The full EP approximation of the marginal. This approximation is computed by using EP to approximate $c_i(x_i)$ (Section 4.1.1).
• EP-L. EP local. The approximation $\tilde{p}_{\text{EP-L}}(x_i) \propto \varepsilon_i(x_i)\, q(x_i)$ is obtained from $c_i(x_i) \approx 1$, where $\varepsilon_i(x_i) = t_i(x_i)/\tilde{t}_i(x_i)$ and $q(x)$ are computed by EP (Section 3).
• LM-L. Laplace method local. The approximation $\tilde{p}_{\text{LM-L}}(x_i) \propto \varepsilon_i(x_i)\, q(x_i)$ is obtained from $c_i(x_i) \approx 1$, where $\varepsilon_i(x_i) = t_i(x_i)/\tilde{t}_i(x_i)$ and $q(x)$ are computed by the Laplace method (Section 3). In this case $\log\varepsilon_i(x_i) = R_2[\log t_i](x_i)$.
• LA-CM. The Laplace approximation with the conditional mode approximated by the conditional mean. The approximation $\tilde{p}_{\text{LA-CM}}(x_i)$ is computed as proposed in Rue et al. (2009), that is, by using the approximation $x^*_{\setminus i}(x_i) \approx E_q[x_{\setminus i}|x_i]$, where $q(x)$ is given by the Laplace method (Section 4.1.2).
• LA-CM2. A similar approximation to LA-CM, but with an additional term added to account for the approximation $x^*_{\setminus i}(x_i) \approx E_q[x_{\setminus i}|x_i]$ (Section 4.1.2).
[Figure 13 (schematic diagram): approximations based on expectation propagation (EP) and on the Laplace method (LM), with the labels "Latent Gaussian model", "Gaussian approximation", "Use global method with some simplifications", "Factorize and use the univariate global method" and "(1st order) Expansions"; methods shown: EP-FULL, LA-TK, EP-1STEP, LA-CM / LA-CM2, EP-FACT, LA-FACT, EP-FACTN, EP-OPW, EP-L and LM-L.]
Figure 13: A schematic view of the approximation methods introduced or referred to in this paper. For details see Section D of the Appendix.
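• EP-1STEP. The one-step EP approximation. The approximation $\tilde{p}_{\text{EP-1STEP}}(x_i)$ is computed by defining $\tilde{\varepsilon}_j(x_j; x_i) \equiv \mathrm{Collapse}\big(q(x_j|x_i)\,\varepsilon_j(x_j)\big)/q(x_j|x_i)$ and using the approximation $c_i(x_i) \approx \int dx_{\setminus i}\, q(x_{\setminus i}|x_i) \prod_{j\neq i}\tilde{\varepsilon}_j(x_j; x_i)$ (see Section 4.1.1). This corresponds to one EP step for computing $c_i(x_i)$ with the initialization $\tilde{\varepsilon}_j(x_j; x_i) = 1$.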
• EP-OPW. The Taylor expansion of Opper et al. (2009). The approximation $\tilde{p}_{\text{EP-OPW}}(x_i)$ is computed by expanding $p(x) \propto p_0(x)\prod_j \varepsilon_j(x_j)$ to first order with regard to $\varepsilon_j(x_j) - 1$ for all $j = 1,\ldots,n$ and integrating with regard to $x_{\setminus i}$. When expanding only for $j \neq i$, the approximation is equal to first order to $\tilde{p}_{\text{EP-FACT}}(x_i)$ (Section 4.3).
• EP-FACT. The factorized EP approximation. The approximation $\tilde{p}_{\text{EP-FACT}}(x_i)$ is computed using the approximation $c_i(x_i) \approx \prod_{j\neq i}\int dx_j\, q(x_j|x_i)\,\varepsilon_j(x_j)$, where the univariate integrals are computed numerically, or analytically when possible. For further details see Section 4.2.
• LA-FACT. A similar approximation to EP-FACT, but here the univariate integrals are computed with the Laplace method, using the approximation $x^*_j(x_i) \approx E_q[x_j|x_i]$, with $q(x)$ being the global approximation resulting from the Laplace method. For further details see Section 4.2.
• EP-FACTN. Higher order approximations obtained by using the factorization recursively. For further details see Section 4.2.
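For the LM-L item, a minimal sketch of $\log\varepsilon_i(x_i) = R_2[\log t_i](x_i)$, reading $R_2[f]$ as the remainder of the second-order Taylor expansion of $f$ around an expansion point $\hat{x}_i$ (assumed here to be the corresponding coordinate of the mode used by the Laplace method). The probit term $t_i(x_i) = \Phi(x_i)$, the value of $\hat{x}_i$ and all function names are hypothetical illustrations.

```python
import math

def remainder2(f, df, d2f, x_hat):
    """R_2[f](x) = f(x) minus the second-order Taylor polynomial of f at x_hat."""
    f0, g0, h0 = f(x_hat), df(x_hat), d2f(x_hat)
    def r2(x):
        d = x - x_hat
        return f(x) - (f0 + g0 * d + 0.5 * h0 * d * d)
    return r2

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def log_Phi(x):
    return math.log(Phi(x))

def d_log_Phi(x):
    """d/dx log Phi(x) = phi(x) / Phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) / Phi(x)

def d2_log_Phi(x):
    """d^2/dx^2 log Phi(x) = -r (x + r), with r = phi(x) / Phi(x)."""
    r = d_log_Phi(x)
    return -r * (x + r)

# Hypothetical probit term t_i(x_i) = Phi(x_i), expanded at an assumed mode coordinate.
x_hat = 0.4
log_eps = remainder2(log_Phi, d_log_Phi, d2_log_Phi, x_hat)
for x in (-2.0, 0.0, 0.4, 2.0):
    print(x, log_eps(x), math.exp(log_eps(x)))  # log eps_i(x_i) and eps_i(x_i)
```

By construction $\varepsilon_i(\hat{x}_i) = 1$ and $\log\varepsilon_i$ vanishes to third order around $\hat{x}_i$, so LM-L corrects $q(x_i)$ only where $\log t_i$ departs from its quadratic approximation.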
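For the EP-1STEP item, the site $\tilde{\varepsilon}_j(x_j; x_i)$ can be sketched for one fixed value of $x_i$, assuming that Collapse denotes moment matching (zeroth, first and second moments) onto an unnormalized Gaussian and that the univariate moments are computed with a trapezoid rule. Here $(\mu_c, v_c)$ stand for the mean and variance of $q(x_j|x_i)$, which change with $x_i$; the function names and the example term are hypothetical.

```python
import math

def log_normal(x, mu, var):
    return -0.5 * (x - mu) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

def tilted_moments(mu, var, eps, num=4001, width=10.0):
    """Mass, mean and variance of N(x | mu, var) * eps(x) by the trapezoid rule
    on mu +/- width standard deviations of the conditional q(x_j | x_i)."""
    s = math.sqrt(var)
    a, h = mu - width * s, 2.0 * width * s / (num - 1)
    z0 = z1 = z2 = 0.0
    for k in range(num):
        x = a + k * h
        w = (0.5 if k in (0, num - 1) else 1.0) * h
        p = w * math.exp(log_normal(x, mu, var)) * eps(x)
        z0, z1, z2 = z0 + p, z1 + p * x, z2 + p * x * x
    mean = z1 / z0
    return z0, mean, z2 / z0 - mean * mean

def collapsed_site(mu_c, var_c, eps):
    """Return x_j -> log eps~_j(x_j; x_i) for q(x_j | x_i) = N(mu_c, var_c).
    Collapse keeps the mass z0: it depends on x_i through (mu_c, var_c),
    so dropping it would change c_i(x_i) as a function of x_i."""
    mass, mean, var = tilted_moments(mu_c, var_c, eps)
    return lambda x: math.log(mass) + log_normal(x, mean, var) - log_normal(x, mu_c, var_c)

def eps_j(x):
    """Hypothetical probit-type term, 2 * Phi(x)."""
    return math.erfc(-x / math.sqrt(2.0))

log_site = collapsed_site(0.2, 1.5, eps_j)
print([round(log_site(x), 4) for x in (-1.0, 0.0, 1.0)])
```

If $\varepsilon_j$ is itself of Gaussian form, Collapse leaves $q(x_j|x_i)\varepsilon_j(x_j)$ unchanged and $\tilde{\varepsilon}_j = \varepsilon_j$, which gives a quick consistency check.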
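The EP-FACT and EP-OPW items can be compared in a small sketch, under the assumptions that $q(x) = \mathcal{N}(x|m,V)$ is the EP approximation, that the base density in the first-order expansion is taken to be $q(x)$, that the marginals have the form $\tilde{p}(x_i) \propto \varepsilon_i(x_i)\, q(x_i)\, c_i(x_i)$ as in the EP-L item, and that the univariate integrals $d_j(x_i) = \int dx_j\, q(x_j|x_i)\,\varepsilon_j(x_j)$ are computed with a trapezoid rule. EP-FACT multiplies the $d_j$; the first-order expansion in $\varepsilon_j - 1$ over all $j$ instead gives $q(x_i)\big[\varepsilon_i(x_i) + \sum_{j\neq i}(d_j(x_i) - 1)\big]$. Function names and the example are hypothetical.

```python
import math

def cond_moments(m, V, i, j, xi):
    """Mean and variance of q(x_j | x_i) when q(x) = N(x | m, V)."""
    mean = m[j] + V[j][i] / V[i][i] * (xi - m[i])
    var = V[j][j] - V[j][i] ** 2 / V[i][i]
    return mean, var

def gauss_expect(f, mu, var, num=2001, width=10.0):
    """E[f(x)] for x ~ N(mu, var): trapezoid rule on mu +/- width standard deviations."""
    s = math.sqrt(var)
    a, h = mu - width * s, 2.0 * width * s / (num - 1)
    total = 0.0
    for k in range(num):
        x = a + k * h
        w = 0.5 if k in (0, num - 1) else 1.0
        total += w * math.exp(-0.5 * ((x - mu) / s) ** 2) * f(x)
    return total * h / (s * math.sqrt(2.0 * math.pi))

def marginal_terms(m, V, eps, i, xi):
    """q(x_i) and the univariate integrals d_j(x_i) = E_q[eps_j(x_j) | x_i], j != i."""
    q_i = math.exp(-0.5 * (xi - m[i]) ** 2 / V[i][i]) / math.sqrt(2.0 * math.pi * V[i][i])
    d = [gauss_expect(eps[j], *cond_moments(m, V, i, j, xi))
         for j in range(len(m)) if j != i]
    return q_i, d

def p_ep_fact(m, V, eps, i, xi):
    """Unnormalized EP-FACT marginal: eps_i(x_i) q(x_i) prod_{j != i} d_j(x_i)."""
    q_i, d = marginal_terms(m, V, eps, i, xi)
    return eps[i](xi) * q_i * math.prod(d)

def p_first_order(m, V, eps, i, xi):
    """Unnormalized first-order expansion in eps_j - 1 for all j, integrated over
    the other coordinates: q(x_i) [eps_i(x_i) + sum_{j != i} (d_j(x_i) - 1)]."""
    q_i, d = marginal_terms(m, V, eps, i, xi)
    return q_i * (eps[i](xi) + sum(dj - 1.0 for dj in d))

def make_tilt(a):
    """Hypothetical eps_j(x) = exp(a x), for which d_j has a closed form."""
    return lambda x: math.exp(a * x)

# Hypothetical 3-dimensional example.
m = [0.0, 0.5, -0.3]
V = [[1.0, 0.3, 0.2], [0.3, 1.5, 0.4], [0.2, 0.4, 0.8]]
eps = [make_tilt(a) for a in (0.2, -0.4, 0.6)]
print(p_ep_fact(m, V, eps, 0, 0.7), p_first_order(m, V, eps, 0, 0.7))
```

Normalizing either function over a grid of $x_i$ values gives an approximate marginal. Unlike the EP-FACT product, the first-order sum is not guaranteed to stay nonnegative when some $d_j(x_i)$ are far from one.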
References
P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886–905, October 1996.
A. Birlutiu and T. Heskes. Expectation propagation for rating players in sports competitions. In Joost N. Kok, Jacek Koronacki, Ramon López de Mántaras, Stan Matwin, Dunja Mladenic, and Andrzej Skowron, editors, Proceedings ECML/PKDD, volume 4702 of Lecture Notes in Computer Science, pages 374–381. Springer, 2007.
L. Csató and M. Opper. Sparse representation for Gaussian process models. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, Cambridge, MA, USA, 2001. MIT Press.
P. Dangauthier, R. Herbrich, T. Minka, and T. Graepel. TrueSkill through time: Revisiting the history of chess. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 337–344. MIT Press, Cambridge, MA, 2008.
A. M. Erisman and W. F. Tinney. On computing certain elements of the inverse of a sparse matrix. Communications of the ACM, 18(3):177–179, 1975. ISSN 0001-0782.
T. Heskes, M. Opper, W. Wiegerinck, O. Winther, and O. Zoeter. Approximate inference techniques with expectation constraints. Journal of Statistical Mechanics: Theory and Experiment, 2005: P11015, 2005.
S. Ingram. Minimum degree reordering algorithms: A tutorial, 2006. URL http://www.cs.ubc.
M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679–1704, 2005. ISSN 1533-7928.
S. Martino and H. Rue. Implementing approximate Bayesian inference using integrated nested Laplace approximation: a manual for the INLA program. Technical report, Department of Mathematical Sciences, NTNU, Norway, 2009.
T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, MIT, 2001.
T. P. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research Ltd., Cambridge, UK, December 2005.
K. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, volume 9, pages 467–475, San Francisco, USA, 1999. Morgan Kaufmann.
I. Murray, R. P. Adams, and D. J. C. MacKay. Elliptical slice sampling. In Y. W. Teh and M. Titterington, editors, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pages 541–548. 2010.
M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural Computation, 21(3):786–792, 2009.
M. Opper and O. Winther. Gaussian processes for classification: Mean-field algorithms. Neural Computation, 12(11):2655–2684, 2000.
M. Opper, U. Paquet, and O. Winther. Improving on expectation propagation. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1241–1248. MIT, Cambridge, MA, US, 2009.
H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications, volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, UK, 2005.
H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society (Series B), 71(2):319–392, 2009.
M. W. Seeger. Bayesian inference and optimal design for the sparse linear model. Journal of Machine Learning Research, 9:759–813, 2008. ISSN 1533-7928.
K. Takahashi, J. Fagan, and M.-S. Chin. Formation of a sparse impedance matrix and its application to short circuit study. In Proceedings of the 8th PICA Conference, 1973.
L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):82–86, 1986.
M. van Gerven, B. Cseke, R. Oostenveld, and T. Heskes. Bayesian source localization with the multivariate Laplace prior. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1901–1909, 2009.
M. van Gerven, B. Cseke, F. de Lange, and T. Heskes. Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior. Neuroimage, 50(1):150–161, March 2010.
O. Zoeter and T. Heskes. Gaussian quadrature based expectation propagation. In Z. Ghahramani and R. Cowell, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence