Morten Mørup
Associate Professor
Section for Cognitive Systems (CogSys) DTU Informatics
Technical University of Denmark
Tensor Decomposition Approaches
for the Modeling of Multi-Graphs
Lars Kai Hansen Mikkel N. Schmidt Tue Herlau Joint work with
Kristoffer H. Madsen Anne-Marie
4-06-2012 TRICAP 2012
2 DTU Informatik, Danmarks Tekniske Universitet
The father of Graph Theory is considered to be
Leonhard Euler
Leonhard Euler
(1707-1783) The bridges of Königsberg
Problem: Find a walk through the city that
cross each bridge once and only once.
Euler proved that the problem has no solution using a graph representation and proved further that a solution to the problem would require all vertices to have even degree
(degree of vertex: number of edges (i.e. bridges) touching the vertex (land mass)).
4-06-2012 TRICAP 2012
3 DTU Informatik, Danmarks Tekniske Universitet
Graphs and their adjacency matrices
A graph G(V,E) with vertices V and edges E can be represented by the corresponding adjacency matrix A such that Aij=1 if there is a link from vertex i to vertex j and
Aij=0 otherwise.
Undirected Graph Directed Graph
G(V,E) V: set of vertices/nodes, E: set of edges/link V={1,2,3,4,5,6}
4-06-2012 TRICAP 2012
4 DTU Informatik, Danmarks Tekniske Universitet
Bipartite Graphs
1 3 2 2 3 1 4V
(1)={1,2,3}
V
(2)={1,2,3,4}
E
Bipartite={(1,2) ,(2,2),(2,3),(2,4),(3,1),(3,4)}
4-06-2012 TRICAP 2012
5 DTU Informatik, Danmarks Tekniske Universitet
Protein Interaction Food web Epidemic Spread Connectome
Friendship network Collaboration network Sexual relations Trade network
Business Organization
Tele communication network Airline connections The internet
Large-scale networks with complex patterns of connections between elements abound naturally in both nature and in man-made systems.
Biology
Social sciences: Economics
Technology
Basic premise: Local connectivity leads to collective behavior and for such collective behavior to be detectable (i.e., by statistical machine learning methods) one needs to analyze large datasets.
Complex networks pervades both science and popular culture
4-06-2012 TRICAP 2012
6 DTU Informatik, Danmarks Tekniske Universitet
Multiple graphs
Multiple graphs G(V,E(1),E(2),...,E(N)) with vertices V and edge sets
{E(1),E(2),...,E(N)} can be represented by the corresponding adjacency tensor
A
such that Aij(n)=1 if there is a link from vertex i to vertex j in relation n and
Aij(n)=0 otherwise.
MIT Press, 2010
Scott A Borrman, Harrison C White, 1976 Social structure from multiple networks. I. Blockmodels of roles and positions
4-06-2012 TRICAP 2012
7 DTU Informatik, Danmarks Tekniske Universitet
Our research focus:
Bayesian non-parametric modeling of relational data
Models we consider can:1. Adapt to the complexity of the relational data ( Bayesian non-parametrics)
2. Device generative model for graphs ( A statistical process for creating networks) 3. Be used to predict missing entries ( Evaluate significance of extracted patterns)
4. Form easy interpretable representations of structure ( Fascilitate human understanding) 5. Are scalable ( models we research grow in the complexity of the edges of the graph) 6. Reflect the statistical properties of the data
( Bernoulli-likelihood(0/1), Poisson-likelihood(integer counts), Gaussian-likelihood(dense))
4-06-2012 TRICAP 2012
8 DTU Informatik, Danmarks Tekniske Universitet
Bayesian learning and the principle of parsimony
Bayesian learning embodies Occam’s razor, i.e. Complex models are penalized.
David J.C. MacKay (among others)
To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize
The explanation of any phenomenon should make as few assumptions as possible,
eliminating those that make no difference in the observable predictions of the
explanatory hypothesis or theory.
Thomas Bayes William of Ockham
4-06-2012 TRICAP 2012
9 DTU Informatik, Danmarks Tekniske Universitet
Bayesian Statistical Learning
Device generative process for the the data, i.e. a recipie for how the observed world could be generated.
Infer parameters from the joint likelihood induced by the generative process
Evaluate model
performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist
hypothesis testing with the generative model as null-distribution.
4-06-2012 TRICAP 2012
10 DTU Informatik, Danmarks Tekniske Universitet
Starting point:
The Infinite Relational Model/Stochastic Blockmodel
(A prominent Bayesian generative model for graphs)
Josh Tenenbaum Thomas Griffith Takeshi Yamada
Charles Kemp Naonori Ueda
Zhao Xu Kai Yu Volker Tresp Hans-Peter Kriegel
Learning Systems of Concepts with an Infinite Relational Model (AAAI2006)
4-06-2012 TRICAP 2012
11 DTU Informatik, Danmarks Tekniske Universitet
The Stochastic Blockmodel and the Infinite
Relational Model
Holland et al. (1983), Wasserman & Andersson (1987), Snijders et al. (1997), Nowicki & Snijders (2001) Wasserman, S., & Faust, K. Social network analysis: Methods and applications. New York, Cambridge University Press, 1994.
Kemp, C. Tenenbaum, J.B., Griffiths, T.L., Yamada, T and Ueda, N ”Learning systems of concepts with an infinite relational model, AAAI 2006.
Kemp, C., Griffiths, T.L. Tenenbaum, J.B. ”Discovering latent classes in relational data”, MIT 2004. Xu, Z, Yu, K, Tresp, V, Kriegel, H.-P., ”Infinite Hidden Relational Models”, UAI 2006
Graph with elements Aij given by the relation j ”defers to” i
4-06-2012 TRICAP 2012
12 DTU Informatik, Danmarks Tekniske Universitet
Bayesian Statistical Learning
Device generative process for the the data, i.e. a recipie for how the observed world could be generated.
Infer parameters from the joint likelihood induced by the generative process
Evaluate model
performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist
hypothesis testing with the generative model as null-distribution.
4-06-2012 TRICAP 2012
13 DTU Informatik, Danmarks Tekniske Universitet
Non-parametric modeling by the
Chinese Restaurant Process (CRP)
Imagine a (Chinese) restaurant with countable infinitely many tables, labeled 1, 2,…… Customers (i.e. vertices) walk in and sit down at some table (cluster). The tables are chosen according to the following random process.
1. The first customer always chooses the first table.
2. The ith customer chooses:
a) first unoccupied table with probability: /(i-1+)
b) an occupied table with probability: mk/(i-1+)
where mk is the number of people already sitting at that table.
1,3, 4, 8
2,5, 10
6
See also tutorial:
4-06-2012 TRICAP 2012
14 DTU Informatik, Danmarks Tekniske Universitet
Generative model of the IRM
(A statistical recipie for simulating networks)
(slide adapted from Elena Zhelev and Galileo Namat ”Stochastic Block Models: A Survey”)
Main Assumption:
Relationships are conditionally independent given cluster
assignments.
• Draw clustering assignemnt
from CRP
• Draw the relations between
the generated clusters
• Draw the graph
Modelling goal: find z that maximizes
4-06-2012 TRICAP 2012
15 DTU Informatik, Danmarks Tekniske Universitet
Different kinds of relational systems
4-06-2012 TRICAP 2012
16 DTU Informatik, Danmarks Tekniske Universitet
IRM extends well to many types of graphs
1 3 2 4 5 1 3 2 4 5
UnDirected Directed Bipartite
1 3 2 2 3 1 4 1 3 2 4 5 1 3 2 4 5 Multi-graph
4-06-2012 TRICAP 2012
17 DTU Informatik, Danmarks Tekniske Universitet
Bayesian Statistical Learning
Device generative process for the the data, i.e. a recipie for how the observed world could be generated.
Infer parameters from the joint likelihood induced by the generative process
Evaluate model
performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist
hypothesis testing with the generative model as null-distribution.
4-06-2012 TRICAP 2012
18 DTU Informatik, Danmarks Tekniske Universitet
4-06-2012 TRICAP 2012
19 DTU Informatik, Danmarks Tekniske Universitet
From the generative model
We can write down the joint likelihood
and we observe that
can be analytically (i.e., exact)
integrated out (collapsed):
4-06-2012 TRICAP 2012
20 DTU Informatik, Danmarks Tekniske Universitet
From the joint distribution
We can use Bayes theorem to obtain the posterior
distribution of z
iAnd infer z by Gibbs sampling each vertex at a time
from the above posterior distribution.
4-06-2012 TRICAP 2012
21 DTU Informatik, Danmarks Tekniske Universitet
Gibbs sampling
v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7 A) Select one vertexat a time
B) Calculate posterior distribution
C) Sample new assignment
4-06-2012 TRICAP 2012
22 DTU Informatik, Danmarks Tekniske Universitet
Metropolis Hastings split-merge moves
v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7v
1v
2v
3v
4v
5v
6v
7 Merge Split4-06-2012 TRICAP 2012
23 DTU Informatik, Danmarks Tekniske Universitet
By sampling we can infer z that maximizes
4-06-2012 TRICAP 2012
24 DTU Informatik, Danmarks Tekniske Universitet
Bayesian Statistical Learning
Device generative process for the the data, i.e. a recipie for how the observed world could be generated.
Infer parameters from the joint likelihood induced by the generative process
Evaluate model
performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist
hypothesis testing with the generative model as null-distribution.
4-06-2012 TRICAP 2012
25 DTU Informatik, Danmarks Tekniske Universitet
How do we evaluate the quality of the
inferred parameters?
• Compare with ground truth if available
• Evaluate how well the model predicts edges/links or account for structure in the data using generative process as null-model.
4-06-2012 TRICAP 2012
26 DTU Informatik, Danmarks Tekniske Universitet
Ground truth evaluation: Use metric that is
permutation invariant such as Mutual
Information
1 2 3 5 4 1 2 3 5 4 6 1 2 3 4 5 6 1 2 3 5 4 0.2 0.2 0.2 0.2 0.1 0.1 0.2 0.2 0.2 0.2 0.2 1 2 3 5 4 6 1 2 3 4 5 6 1 2 3 5 4 0.2 0.2 1 2 3 5 4 1 2 3 4 54-06-2012 TRICAP 2012
27 DTU Informatik, Danmarks Tekniske Universitet
When ground truth is not available
• Evaluate models predictive abilityTwo commonly used measures based on prediction:
1) Generalization error, i.e. Evaluate predictive distribution estimated on hold out data (i.e., missing data using cross-validation)
2) Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) on hold out data (i.e., missing data using cross-validation)
• Frequentist testing of structure in network
Use generative model to draw graphs and evaluate the properties of these graphs to the actual graph using the generative model as null-distribution.
4-06-2012 TRICAP 2012
28 DTU Informatik, Danmarks Tekniske Universitet
Bayesian Statistical Learning
Device generative process for the the data, i.e. a recipie for how the observed world could be generated.
Infer parameters from the joint likelihood induced by the generative process
Evaluate model
performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist
hypothesis testing with the generative model as null-distribution.
4-06-2012 TRICAP 2012
29 DTU Informatik, Danmarks Tekniske Universitet
Relational modeling of animal features
Kemp, C. Tenenbaum, J.B., Griffiths, T.L., Yamada, T and Ueda, N ”Learning systems of concepts with an infinite relational model, AAAI 2006.
Animal
Feature
– 50 Animals
4-06-2012 TRICAP 2012
30 DTU Informatik, Danmarks Tekniske Universitet
Country
Feature
Data fusion: Relational modeling of Countries their
characteristics and relations
– 14 Countries
– 54 Predicates
– 90 Features
Country
Country
Kemp, C. Tenenbaum, J.B., Griffiths, T.L., Yamada, T and Ueda, N ”Learning systems of concepts with an infinite relational model, AAAI 2006.
Relations
Characteristics
Symmetric TUCKER decomposition
coupled with side-information
4-06-2012 TRICAP 2012
31 DTU Informatik, Danmarks Tekniske Universitet
Analysis of functional brain connectivity
Symmetric TUCKER2 decomposition
4-06-2012 TRICAP 2012
32 DTU Informatik, Danmarks Tekniske Universitet
Of the 72 subjects there were 30
normal and 42 multiple sclerosis
subjects
32
Median connectivity between
extracted clusters across subjects Significant differences in connectivity between MS and normal subjects (not corrected for multiple
comparisons)
4-06-2012 TRICAP 2012
33 DTU Informatik, Danmarks Tekniske Universitet
Bayesian modelling of community
structure
Communities/modules: a natural divisions of network nodes into densely connected subgroups (see also Newman & Girvan, 2003 and Fortunato 2010)
G(V,E)
Adjacency Matrix
A
Community detection algorithm
Permuted adjacency matrix
PAPT
Permutation P of graph
from clustering assignment Z
4-06-2012 TRICAP 2012
34 DTU Informatik, Danmarks Tekniske Universitet
(Mørup&Schmidt, ”Bayesian Community Detection”, Neural Computation 2012)
Our Generative Model of Community
Structure
4-06-2012 TRICAP 2012
35 DTU Informatik, Danmarks Tekniske Universitet
Binary Graphs:
Integer Weighted Graphs:
4-06-2012 TRICAP 2012
4-06-2012 TRICAP 2012
4-06-2012 TRICAP 2012
38 DTU Informatik, Danmarks Tekniske Universitet
people
people
Alyawarra tribe, central Australia
– 104 individuals
– 27 binary social relations
Countries inter-relations
– 14 Countries
– 54 Predicates
Country
Country
Symmetric TUCKER2 decomposition
4-06-2012 TRICAP 2012
39 DTU Informatik, Danmarks Tekniske Universitet
Country Country people people people people Co-authorship
4-06-2012 TRICAP 2012
40 DTU Informatik, Danmarks Tekniske Universitet
(Mørup&Schmidt&Hansen, MLSP2011),
(Mørup&Schmidt, NIPS workshop on Bayesian Nonparametric Methods: Hope or Hype?)
Bayesian modelling of overlapping
groups, i.e. Vertices can have multiple
roles (i.e. Belong to multiple groups)
4-06-2012 TRICAP 2012
41 DTU Informatik, Danmarks Tekniske Universitet
(Mørup&Schmidt&Hansen, MLSP2011),
4-06-2012 TRICAP 2012
42 DTU Informatik, Danmarks Tekniske Universitet
(Mørup&Schmidt&Hansen, MLSP2011),
4-06-2012 TRICAP 2012
43 DTU Informatik, Danmarks Tekniske Universitet
(Mørup&Schmidt&Hansen, MLSP2011),
4-06-2012 TRICAP 2012
44 DTU Informatik, Danmarks Tekniske Universitet
Scales linearly in the number of links!
(Mørup&Schmidt&Hansen, MLSP2011),
4-06-2012 TRICAP 2012
45 DTU Informatik, Danmarks Tekniske Universitet
(Mørup&Schmidt&Hansen, MLSP2011),
4-06-2012 TRICAP 2012
46 DTU Informatik, Danmarks Tekniske Universitet
(Herlau , Mørup, Schmidt, Hansen, ” Detecting Hierarchical Structure in Networks” CIP 2012)
Our Generative Model for detecting hierarchical structure:
Clauset et al. Nature 2008 Roy et al. NIPS 2006
Stochastic Block Model/ IRM
4-06-2012 TRICAP 2012
47 DTU Informatik, Danmarks Tekniske Universitet
Our applications so far...
• User recommendation on social/internet services such
as Facebook and Netflix
• Topic modeling of large databases, i.e. PubMed and
NYTimes.
• Music preferential behaviour in large groups of people
attending the Roskilde Festival
• Link prediction in social networks
• Modeling functional and structural connectivity in the
human brain by fMRI and dMRI
4-06-2012 TRICAP 2012
48 DTU Informatik, Danmarks Tekniske Universitet
G(E,V)
Neuron
Neuron
(Braitenberg et al., 1991; Murre et al., 1995; Sporns et al., 2005)
1 mm3 braintissue contains 100.000 neurons.
Between these 100.000 neurons there are1.000.000.000 connections. Our connectome can be represented by a complex network with
approximatey 1011 neurons and 1015 connections.
The hunt for the human connectome
Connectome refers to the scientific effort to capture, map and understand the organization of neural connectivity in the brain.
Vertex (i.e., neuron) Edge (i.e., connetion)
Complex network of neuronal connectivity
4-06-2012 TRICAP 2012
49 DTU Informatik, Danmarks Tekniske Universitet
Functionel MRI (fMRI):
Measures the haemodynamic response related to neural activity in a brain volume (i.e. voxel) and can thereb quantify the working brains
informationprocessing.
Diffusions MRI (dMRI):
Measures waters diffusion that primarily follows the brains neural fiberbundles. These measurements can be aggregated by tractography to estimate the brains structural connectivity.
A voxel define a given brian volume
Relative haemodynamic response over tid.
Functionel connectivity between voxel i and j can estimate how related the haemodynamic respone is over time in the two areas.
Estimated connections btween brain areas by dMRI combined with tractography
fMRI and dMRI have become key non-invasive windows into the mind in terms
of its functionel and structural connections. Existing fMRI and dMRI technologies have a resolution of 105 voxels.
4-06-2012 TRICAP 2012
50 DTU Informatik, Danmarks Tekniske Universitet
“[Understanding brain connectivity] is fundamentally important in cognitive neuroscience and neuropsychology” ”You are more than your genes,
you are your connectome”
Nature, April 2012, ”Brain imaging: fMRI 2.0”
National Institute of Health Human Connectome Project
www.humanconnectome.org (O. Sporns, G. Tononi, R. Kötter, 2005)
(S. Seung, MIT 2011)
4-06-2012 TRICAP 2012
51 DTU Informatik, Danmarks Tekniske Universitet
Current research focus:
Modeling functional and structural connectivity from
functional and diffusion magnetic resonance imaging
(fMRI and dMRI) by Bayesian relational modelling
approaches
4-06-2012 TRICAP 2012
52 DTU Informatik, Danmarks Tekniske Universitet
References
T. Herlau , M. Mørup, M.N. Schmidt, L.K. Hansen, Detecting Hierarchical Structure in Networks, Cognitive Information Processing, 2012
M. Mørup, M. N. Schmidt, Bayesian Community Detection, Neural Computation, 2012
M. Mørup, Applications of tensor (multiway array) factorizations and decompositions in data mining, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1(1), pp. 24-40, Wiley
Interdisciplinary Reviews:, 2011
M. Mørup, K. H. Madsen, A. M. Dogonowski, H. Siebner, L. K.
Hansen, Infinite Relational Modeling of Functional Connectivity in Resting State fMRI, Neural Information Processing Systems 2010
M. Mørup, M. N. Schmidt, L. K. Hansen, Infinite Multiple Membership
Relational Modeling for Complex Networks, Presented first time at NIPS Workshop on Networks Across Disciplines in Theory and Application, published IEEE Machine Learning for Signal Processing 2011
M. Mørup, L. K. Hansen, Learning Latent Structure in Complex Networks, Workshop on Analyzing Networks and Learning with Graphs NIPS 2009 MLSP 2011, 2010