• No results found

[NORMAL] Tensor Decomposition Approaches for the Modeling of Multi-Graphs

N/A
N/A
Protected

Academic year: 2021

Share "Tensor Decomposition Approaches for the Modeling of Multi-Graphs"

Copied!
52
0
0

Loading.... (view fulltext now)

Full text

(1)

Morten Mørup

Associate Professor

Section for Cognitive Systems (CogSys) DTU Informatics

Technical University of Denmark

Tensor Decomposition Approaches

for the Modeling of Multi-Graphs

Lars Kai Hansen Mikkel N. Schmidt Tue Herlau Joint work with

Kristoffer H. Madsen Anne-Marie

(2)

4-06-2012 TRICAP 2012

2 DTU Informatik, Danmarks Tekniske Universitet

The father of Graph Theory is considered to be

Leonhard Euler

Leonhard Euler

(1707-1783) The bridges of Königsberg

Problem: Find a walk through the city that

cross each bridge once and only once.

Euler proved that the problem has no solution using a graph representation and proved further that a solution to the problem would require all vertices to have even degree

(degree of vertex: number of edges (i.e. bridges) touching the vertex (land mass)).

(3)

4-06-2012 TRICAP 2012

3 DTU Informatik, Danmarks Tekniske Universitet

Graphs and their adjacency matrices

A graph G(V,E) with vertices V and edges E can be represented by the corresponding adjacency matrix A such that Aij=1 if there is a link from vertex i to vertex j and

Aij=0 otherwise.

Undirected Graph Directed Graph

G(V,E) V: set of vertices/nodes, E: set of edges/link V={1,2,3,4,5,6}

(4)

4-06-2012 TRICAP 2012

4 DTU Informatik, Danmarks Tekniske Universitet

Bipartite Graphs

1 3 2 2 3 1 4

V

(1)

={1,2,3}

V

(2)

={1,2,3,4}

E

Bipartite

={(1,2) ,(2,2),(2,3),(2,4),(3,1),(3,4)}

(5)

4-06-2012 TRICAP 2012

5 DTU Informatik, Danmarks Tekniske Universitet

Protein Interaction Food web Epidemic Spread Connectome

Friendship network Collaboration network Sexual relations Trade network

Business Organization

Tele communication network Airline connections The internet

Large-scale networks with complex patterns of connections between elements abound naturally in both nature and in man-made systems.

Biology

Social sciences: Economics

Technology

Basic premise: Local connectivity leads to collective behavior and for such collective behavior to be detectable (i.e., by statistical machine learning methods) one needs to analyze large datasets.

Complex networks pervades both science and popular culture

(6)

4-06-2012 TRICAP 2012

6 DTU Informatik, Danmarks Tekniske Universitet

Multiple graphs

Multiple graphs G(V,E(1),E(2),...,E(N)) with vertices V and edge sets

{E(1),E(2),...,E(N)} can be represented by the corresponding adjacency tensor

A

such that Aij(n)=1 if there is a link from vertex i to vertex j in relation n and

Aij(n)=0 otherwise.

MIT Press, 2010

Scott A Borrman, Harrison C White, 1976 Social structure from multiple networks. I. Blockmodels of roles and positions

(7)

4-06-2012 TRICAP 2012

7 DTU Informatik, Danmarks Tekniske Universitet

Our research focus:

Bayesian non-parametric modeling of relational data

Models we consider can:

1. Adapt to the complexity of the relational data ( Bayesian non-parametrics)

2. Device generative model for graphs ( A statistical process for creating networks) 3. Be used to predict missing entries ( Evaluate significance of extracted patterns)

4. Form easy interpretable representations of structure ( Fascilitate human understanding) 5. Are scalable ( models we research grow in the complexity of the edges of the graph) 6. Reflect the statistical properties of the data

( Bernoulli-likelihood(0/1), Poisson-likelihood(integer counts), Gaussian-likelihood(dense))

(8)

4-06-2012 TRICAP 2012

8 DTU Informatik, Danmarks Tekniske Universitet

Bayesian learning and the principle of parsimony

Bayesian learning embodies Occam’s razor, i.e. Complex models are penalized.

David J.C. MacKay (among others)

To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize

The explanation of any phenomenon should make as few assumptions as possible,

eliminating those that make no difference in the observable predictions of the

explanatory hypothesis or theory.

Thomas Bayes William of Ockham

(9)

4-06-2012 TRICAP 2012

9 DTU Informatik, Danmarks Tekniske Universitet

Bayesian Statistical Learning

Device generative process for the the data, i.e. a recipie for how the observed world could be generated.

Infer parameters from the joint likelihood induced by the generative process

Evaluate model

performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist

hypothesis testing with the generative model as null-distribution.

(10)

4-06-2012 TRICAP 2012

10 DTU Informatik, Danmarks Tekniske Universitet

Starting point:

The Infinite Relational Model/Stochastic Blockmodel

(A prominent Bayesian generative model for graphs)

Josh Tenenbaum Thomas Griffith Takeshi Yamada

Charles Kemp Naonori Ueda

Zhao Xu Kai Yu Volker Tresp Hans-Peter Kriegel

Learning Systems of Concepts with an Infinite Relational Model (AAAI2006)

(11)

4-06-2012 TRICAP 2012

11 DTU Informatik, Danmarks Tekniske Universitet

The Stochastic Blockmodel and the Infinite

Relational Model

Holland et al. (1983), Wasserman & Andersson (1987), Snijders et al. (1997), Nowicki & Snijders (2001) Wasserman, S., & Faust, K. Social network analysis: Methods and applications. New York, Cambridge University Press, 1994.

Kemp, C. Tenenbaum, J.B., Griffiths, T.L., Yamada, T and Ueda, N ”Learning systems of concepts with an infinite relational model, AAAI 2006.

Kemp, C., Griffiths, T.L. Tenenbaum, J.B. ”Discovering latent classes in relational data”, MIT 2004. Xu, Z, Yu, K, Tresp, V, Kriegel, H.-P., ”Infinite Hidden Relational Models”, UAI 2006

Graph with elements Aij given by the relation j ”defers to” i

(12)

4-06-2012 TRICAP 2012

12 DTU Informatik, Danmarks Tekniske Universitet

Bayesian Statistical Learning

Device generative process for the the data, i.e. a recipie for how the observed world could be generated.

Infer parameters from the joint likelihood induced by the generative process

Evaluate model

performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist

hypothesis testing with the generative model as null-distribution.

(13)

4-06-2012 TRICAP 2012

13 DTU Informatik, Danmarks Tekniske Universitet

Non-parametric modeling by the

Chinese Restaurant Process (CRP)

Imagine a (Chinese) restaurant with countable infinitely many tables, labeled 1, 2,…… Customers (i.e. vertices) walk in and sit down at some table (cluster). The tables are chosen according to the following random process.

1. The first customer always chooses the first table.

2. The ith customer chooses:

a) first unoccupied table with probability: /(i-1+)

b) an occupied table with probability: mk/(i-1+)

where mk is the number of people already sitting at that table.

1,3, 4, 8

2,5, 10

6

See also tutorial:

(14)

4-06-2012 TRICAP 2012

14 DTU Informatik, Danmarks Tekniske Universitet

Generative model of the IRM

(A statistical recipie for simulating networks)

(slide adapted from Elena Zhelev and Galileo Namat ”Stochastic Block Models: A Survey”)

Main Assumption:

Relationships are conditionally independent given cluster

assignments.

• Draw clustering assignemnt

from CRP

• Draw the relations between

the generated clusters

• Draw the graph

Modelling goal: find z that maximizes

(15)

4-06-2012 TRICAP 2012

15 DTU Informatik, Danmarks Tekniske Universitet

Different kinds of relational systems

(16)

4-06-2012 TRICAP 2012

16 DTU Informatik, Danmarks Tekniske Universitet

IRM extends well to many types of graphs

1 3 2 4 5 1 3 2 4 5

UnDirected Directed Bipartite

1 3 2 2 3 1 4 1 3 2 4 5 1 3 2 4 5 Multi-graph

(17)

4-06-2012 TRICAP 2012

17 DTU Informatik, Danmarks Tekniske Universitet

Bayesian Statistical Learning

Device generative process for the the data, i.e. a recipie for how the observed world could be generated.

Infer parameters from the joint likelihood induced by the generative process

Evaluate model

performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist

hypothesis testing with the generative model as null-distribution.

(18)

4-06-2012 TRICAP 2012

18 DTU Informatik, Danmarks Tekniske Universitet

(19)

4-06-2012 TRICAP 2012

19 DTU Informatik, Danmarks Tekniske Universitet

From the generative model

We can write down the joint likelihood

and we observe that

can be analytically (i.e., exact)

integrated out (collapsed):

(20)

4-06-2012 TRICAP 2012

20 DTU Informatik, Danmarks Tekniske Universitet

From the joint distribution

We can use Bayes theorem to obtain the posterior

distribution of z

i

And infer z by Gibbs sampling each vertex at a time

from the above posterior distribution.

(21)

4-06-2012 TRICAP 2012

21 DTU Informatik, Danmarks Tekniske Universitet

Gibbs sampling

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7 A) Select one vertex

at a time

B) Calculate posterior distribution

C) Sample new assignment

(22)

4-06-2012 TRICAP 2012

22 DTU Informatik, Danmarks Tekniske Universitet

Metropolis Hastings split-merge moves

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7

v

1

v

2

v

3

v

4

v

5

v

6

v

7 Merge Split

(23)

4-06-2012 TRICAP 2012

23 DTU Informatik, Danmarks Tekniske Universitet

By sampling we can infer z that maximizes

(24)

4-06-2012 TRICAP 2012

24 DTU Informatik, Danmarks Tekniske Universitet

Bayesian Statistical Learning

Device generative process for the the data, i.e. a recipie for how the observed world could be generated.

Infer parameters from the joint likelihood induced by the generative process

Evaluate model

performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist

hypothesis testing with the generative model as null-distribution.

(25)

4-06-2012 TRICAP 2012

25 DTU Informatik, Danmarks Tekniske Universitet

How do we evaluate the quality of the

inferred parameters?

• Compare with ground truth if available

• Evaluate how well the model predicts edges/links or account for structure in the data using generative process as null-model.

(26)

4-06-2012 TRICAP 2012

26 DTU Informatik, Danmarks Tekniske Universitet

Ground truth evaluation: Use metric that is

permutation invariant such as Mutual

Information

1 2 3 5 4 1 2 3 5 4 6 1 2 3 4 5 6 1 2 3 5 4 0.2 0.2 0.2 0.2 0.1 0.1 0.2 0.2 0.2 0.2 0.2 1 2 3 5 4 6 1 2 3 4 5 6 1 2 3 5 4 0.2 0.2 1 2 3 5 4 1 2 3 4 5

(27)

4-06-2012 TRICAP 2012

27 DTU Informatik, Danmarks Tekniske Universitet

When ground truth is not available

• Evaluate models predictive ability

Two commonly used measures based on prediction:

1) Generalization error, i.e. Evaluate predictive distribution estimated on hold out data (i.e., missing data using cross-validation)

2) Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) on hold out data (i.e., missing data using cross-validation)

• Frequentist testing of structure in network

Use generative model to draw graphs and evaluate the properties of these graphs to the actual graph using the generative model as null-distribution.

(28)

4-06-2012 TRICAP 2012

28 DTU Informatik, Danmarks Tekniske Universitet

Bayesian Statistical Learning

Device generative process for the the data, i.e. a recipie for how the observed world could be generated.

Infer parameters from the joint likelihood induced by the generative process

Evaluate model

performance by how well the model is able to predict unobserved quantities or account for observed structures using frequentist

hypothesis testing with the generative model as null-distribution.

(29)

4-06-2012 TRICAP 2012

29 DTU Informatik, Danmarks Tekniske Universitet

Relational modeling of animal features

Kemp, C. Tenenbaum, J.B., Griffiths, T.L., Yamada, T and Ueda, N ”Learning systems of concepts with an infinite relational model, AAAI 2006.

Animal

Feature

– 50 Animals

(30)

4-06-2012 TRICAP 2012

30 DTU Informatik, Danmarks Tekniske Universitet

Country

Feature

Data fusion: Relational modeling of Countries their

characteristics and relations

– 14 Countries

– 54 Predicates

– 90 Features

Country

Country

Kemp, C. Tenenbaum, J.B., Griffiths, T.L., Yamada, T and Ueda, N ”Learning systems of concepts with an infinite relational model, AAAI 2006.

Relations

Characteristics

Symmetric TUCKER decomposition

coupled with side-information

(31)

4-06-2012 TRICAP 2012

31 DTU Informatik, Danmarks Tekniske Universitet

Analysis of functional brain connectivity

Symmetric TUCKER2 decomposition

(32)

4-06-2012 TRICAP 2012

32 DTU Informatik, Danmarks Tekniske Universitet

Of the 72 subjects there were 30

normal and 42 multiple sclerosis

subjects

32

Median connectivity between

extracted clusters across subjects Significant differences in connectivity between MS and normal subjects (not corrected for multiple

comparisons)

(33)

4-06-2012 TRICAP 2012

33 DTU Informatik, Danmarks Tekniske Universitet

Bayesian modelling of community

structure

Communities/modules: a natural divisions of network nodes into densely connected subgroups (see also Newman & Girvan, 2003 and Fortunato 2010)

G(V,E)

Adjacency Matrix

A

Community detection algorithm

Permuted adjacency matrix

PAPT

Permutation P of graph

from clustering assignment Z

(34)

4-06-2012 TRICAP 2012

34 DTU Informatik, Danmarks Tekniske Universitet

(Mørup&Schmidt, ”Bayesian Community Detection”, Neural Computation 2012)

Our Generative Model of Community

Structure

(35)

4-06-2012 TRICAP 2012

35 DTU Informatik, Danmarks Tekniske Universitet

Binary Graphs:

Integer Weighted Graphs:

(36)

4-06-2012 TRICAP 2012

(37)

4-06-2012 TRICAP 2012

(38)

4-06-2012 TRICAP 2012

38 DTU Informatik, Danmarks Tekniske Universitet

people

people

Alyawarra tribe, central Australia

– 104 individuals

– 27 binary social relations

Countries inter-relations

– 14 Countries

– 54 Predicates

Country

Country

Symmetric TUCKER2 decomposition

(39)

4-06-2012 TRICAP 2012

39 DTU Informatik, Danmarks Tekniske Universitet

Country Country people people people people Co-authorship

(40)

4-06-2012 TRICAP 2012

40 DTU Informatik, Danmarks Tekniske Universitet

(Mørup&Schmidt&Hansen, MLSP2011),

(Mørup&Schmidt, NIPS workshop on Bayesian Nonparametric Methods: Hope or Hype?)

Bayesian modelling of overlapping

groups, i.e. Vertices can have multiple

roles (i.e. Belong to multiple groups)

(41)

4-06-2012 TRICAP 2012

41 DTU Informatik, Danmarks Tekniske Universitet

(Mørup&Schmidt&Hansen, MLSP2011),

(42)

4-06-2012 TRICAP 2012

42 DTU Informatik, Danmarks Tekniske Universitet

(Mørup&Schmidt&Hansen, MLSP2011),

(43)

4-06-2012 TRICAP 2012

43 DTU Informatik, Danmarks Tekniske Universitet

(Mørup&Schmidt&Hansen, MLSP2011),

(44)

4-06-2012 TRICAP 2012

44 DTU Informatik, Danmarks Tekniske Universitet

Scales linearly in the number of links!

(Mørup&Schmidt&Hansen, MLSP2011),

(45)

4-06-2012 TRICAP 2012

45 DTU Informatik, Danmarks Tekniske Universitet

(Mørup&Schmidt&Hansen, MLSP2011),

(46)

4-06-2012 TRICAP 2012

46 DTU Informatik, Danmarks Tekniske Universitet

(Herlau , Mørup, Schmidt, Hansen, ” Detecting Hierarchical Structure in Networks” CIP 2012)

Our Generative Model for detecting hierarchical structure:

Clauset et al. Nature 2008 Roy et al. NIPS 2006

Stochastic Block Model/ IRM

(47)

4-06-2012 TRICAP 2012

47 DTU Informatik, Danmarks Tekniske Universitet

Our applications so far...

• User recommendation on social/internet services such

as Facebook and Netflix

• Topic modeling of large databases, i.e. PubMed and

NYTimes.

• Music preferential behaviour in large groups of people

attending the Roskilde Festival

• Link prediction in social networks

• Modeling functional and structural connectivity in the

human brain by fMRI and dMRI

(48)

4-06-2012 TRICAP 2012

48 DTU Informatik, Danmarks Tekniske Universitet

G(E,V)

Neuron

Neuron

(Braitenberg et al., 1991; Murre et al., 1995; Sporns et al., 2005)

 1 mm3 braintissue contains 100.000 neurons.

 Between these 100.000 neurons there are1.000.000.000 connections.  Our connectome can be represented by a complex network with

approximatey 1011 neurons and 1015 connections.

The hunt for the human connectome

Connectome refers to the scientific effort to capture, map and understand the organization of neural connectivity in the brain.

Vertex (i.e., neuron) Edge (i.e., connetion)

Complex network of neuronal connectivity

(49)

4-06-2012 TRICAP 2012

49 DTU Informatik, Danmarks Tekniske Universitet

Functionel MRI (fMRI):

Measures the haemodynamic response related to neural activity in a brain volume (i.e. voxel) and can thereb quantify the working brains

informationprocessing.

Diffusions MRI (dMRI):

Measures waters diffusion that primarily follows the brains neural fiberbundles. These measurements can be aggregated by tractography to estimate the brains structural connectivity.

A voxel define a given brian volume

Relative haemodynamic response over tid.

Functionel connectivity between voxel i and j can estimate how related the haemodynamic respone is over time in the two areas.

Estimated connections btween brain areas by dMRI combined with tractography

fMRI and dMRI have become key non-invasive windows into the mind in terms

of its functionel and structural connections. Existing fMRI and dMRI technologies have a resolution of 105 voxels.

(50)

4-06-2012 TRICAP 2012

50 DTU Informatik, Danmarks Tekniske Universitet

“[Understanding brain connectivity] is fundamentally important in cognitive neuroscience and neuropsychology” ”You are more than your genes,

you are your connectome”

Nature, April 2012, ”Brain imaging: fMRI 2.0”

National Institute of Health Human Connectome Project

www.humanconnectome.org (O. Sporns, G. Tononi, R. Kötter, 2005)

(S. Seung, MIT 2011)

(51)

4-06-2012 TRICAP 2012

51 DTU Informatik, Danmarks Tekniske Universitet

Current research focus:

Modeling functional and structural connectivity from

functional and diffusion magnetic resonance imaging

(fMRI and dMRI) by Bayesian relational modelling

approaches

(52)

4-06-2012 TRICAP 2012

52 DTU Informatik, Danmarks Tekniske Universitet

References

T. Herlau , M. Mørup, M.N. Schmidt, L.K. Hansen, Detecting Hierarchical Structure in Networks, Cognitive Information Processing, 2012

M. Mørup, M. N. Schmidt, Bayesian Community Detection, Neural Computation, 2012

M. Mørup, Applications of tensor (multiway array) factorizations and decompositions in data mining, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1(1), pp. 24-40, Wiley

Interdisciplinary Reviews:, 2011

M. Mørup, K. H. Madsen, A. M. Dogonowski, H. Siebner, L. K.

Hansen, Infinite Relational Modeling of Functional Connectivity in Resting State fMRI, Neural Information Processing Systems 2010

M. Mørup, M. N. Schmidt, L. K. Hansen, Infinite Multiple Membership

Relational Modeling for Complex Networks, Presented first time at NIPS Workshop on Networks Across Disciplines in Theory and Application, published IEEE Machine Learning for Signal Processing 2011

M. Mørup, L. K. Hansen, Learning Latent Structure in Complex Networks, Workshop on Analyzing Networks and Learning with Graphs NIPS 2009 MLSP 2011, 2010

References

Related documents