Tangles: from weak to strong clustering

(1)

Tangles: from weak to strong clustering

or: Our adventure in machine learning

(2)

Tangle definition

2

Given a set S of bipartitions (cuts), atangleis a setτ which

contains exactly one side of each bipartition such that

|A∩B∩C| ≥a ∀A, B, C∈τ.

(3)

Stochastic Block Model

3

k blocks of equal size n_k

(4)

Stochastic Block Model

3

k blocks of equal size n_k

Edges within blocks with probability p, between blocks with probability q < p

Consider all cuts up to order Ψ

(5)

4

When are the blocks (distinct) tangles?

When are there no other tangles?

(6)

SBM (Expectation Case)

5

2 blocks of equal size n₂

Edges within blocks withweight p, between blocks with weightq

All cuts up to order Ψ

|A, A{_|_:=P

(7)

SBM (Expectation Case)

5

|A, A{_|_:=P

(8)

SBM (Expectation Case)

5

|A, A{_|_:=P

(9)

SBM (Expectation Case)

5

|A, A{_|_:=P

(10)

SBM (Expectation Case)

5

|A, A{_|_:=P a∈A,b∈A{ w(a,b) n2 4 p(α1−α 2 1+α2−α22) +q(α1+α2−2α1α2)

(11)

SBM (Expectation Case)

5

All cutsup to order Ψ

|A, A{_|_:=P

(12)

SBM (Expectation Case)

5

All cutsup to order Ψ

|A, A{_|_:=P a∈A,b∈A{ w(a,b) X (i ,j) (i_n,j_n)∈AΨ _n 1 i · _n 2 j X (i ,j) (_δni,_δnj)∈AΨ _n 1 i · _n 2 j

(13)

6

(14)

6

How do we sample good cuts?

(15)

6

(16)

Cut finding strategies

7 0/5 1/5 2/5 1/2 Cut balance 0.0 0.1 0.2 0.3 0.4 0.5 Cu t v alu e i n te rm s o f | E|

(17)

Karger’s algorithm

8

(18)

Cut finding strategies

Comparison of sampling strategies, SBM with 5 blocks of 50 nodes, p = 0.5, q = 0.05 _{(Data is fuzzed for readability)}

random (m = 500, took 0.1s) kmodes (m = 44, took 8.6s) karger (m = 500, took 27.6s)

(19)

Cut finding strategies

random (m = 500, took 0.1s) kmodes (m = 44, took 8.6s) karger (m = 500, took 27.6s) 0/5 1/5 2/5 1/2 Cut balance 0.0 0.1 0.2 0.3 0.4 0.5 Cu t v alu e i n te rm s o f | E|

random (m = 500, took 0.1s) kmodes (m = 44, took 8.6s) karger (m = 500, took 27.6s) kneip (m = 513, took 8.4s) kernighan_lin (m = 500, took 73.0s)

(20)

Cut finding strategies

random (m = 500, took 0.1s) kmodes (m = 44, took 8.6s) karger (m = 500, took 27.6s) kneip (m = 513, took 8.4s) kernighan_lin (m = 500, took 73.0s) local_min (m = 500, took 241.9s)

(21)

Cut finding strategies

random (m = 500, took 0.1s) kmodes (m = 44, took 8.6s) karger (m = 500, took 27.6s) kneip (m = 513, took 8.4s) kernighan_lin (m = 500, took 73.0s) local_min (m = 500, took 241.9s) local_min_bounded (m = 500, took 43.1s) fid_mat (m = 500, took 837.1s)

(22)

Cut finding strategies

random (m = 500, took 0.1s) kmodes (m = 44, took 7.0s) karger (m = 500, took 48.5s) kneip (m = 502, took 14.5s) kernighan_lin (m = 500, took 214.1s) local_min (m = 500, took 473.0s) local_min_bounded (m = 500, took 49.0s) fid_mat (m = 500, took 1219.4s)

(23)

The mindset model

10

(24)

The mindset model

10

k mindsets,m questions, npeople

Step 1: Samplek template vectors µ1, . . . , µk ∈ {0,1}m (mindsets)

Step 2: For eachµi, a set of n_k people answers asµi does, but

(25)

The mindset model

10

deviates on each question independently with probabilityp < 1₂

(26)

The mindset model

10

deviates on each question independently with probabilityp < 1₂

Cuts are induced by questions.

When

are

the mindsets tangles?

When are there no other tangles?

(27)

Stochastics

11

Everything’s just Bernoulli random variables. Binomial distributions are well understood.

(28)

Stochastics

11

If 1 −3p > k a/n then with probability at least

1−k mexp(−2n(k a_n −1 + 3p)2 1

9k) every mindset is

(29)

Stochastics

11

1−k mexp(−2n(k a_n −1 + 3p)2 1

a tangle.

Ifp≤a/n then with probability at least

1−mkexp(−2n

k (p−

k a

n)

2₎_{every triple with large}

(30)

Stochastics

11

1−k mexp(−2n(k a_n −1 + 3p)2 1

a tangle.

Ifp≤a/n then with probability at least

1−mkexp(−2n

k (p−

k a

n)

2₎_{every triple with large}

inter-section comes from a mindset.

But how do we turn this into

‘Every tangle is a mindset’ ?

(31)

The problem

12

Suppose we have these mindsets:

(1,1,1,0,0,0,0,0,0, 0,0,0)

(0,0,0,1,1,1,0,0,0, 0,0,0)

(0,0,0,0,0,0,1,1,1, 0,0,0)

(32)

The problem

12

(1,1,1,0,0,0,0,0,0, 0,0,0)

(0,0,0,1,1,1,0,0,0, 0,0,0)

(0,0,0,0,0,0,1,1,1, 0,0,0)

(0,0,0,0,0,0,0,0,0, 1,1,1)

Then we also get a tangle for

(33)

The problem

12

(1,1,1,0,0,0,0,0,0, 0,0,0)

(0,0,0,1,1,1,0,0,0, 0,0,0)

(0,0,0,0,0,0,1,1,1, 0,0,0)

(0,0,0,0,0,0,0,0,0, 1,1,1)

Then we also get a tangle for

(0,0,0,0,0,0,0,0,0, 0,0,0)

Assumption.If τ ∈ {0,1}m _{satisfies that for all}_{x , y , z} _≤_m _there

exists a mindset µ such thatτ(x) =µi(x) as well as τ(y) =µi(y)

(34)

How often is this satisified?

13

(35)

How often is this satisified?

13

andτ(z) =µi(z), thenτ is a mindset, i.e.τ =µj for somej.

(36)

How often is this satisified?

13

Easily holds if every partition of the mindsets is induced by a question.

(37)

How often is this satisified?

13

This is bound to happen asm→ ∞.

(38)

How often is this satisified?

13

This is bound to happen asm→ ∞.

Caveat:This requires m to be exponential ink.

Theorem.Asympotically, m has to be exponential ink, or else the assumption fails with high probability.

(39)

How often is it

really

satisified?

14

(40)

How often is it

really

satisified?

14

(41)

Back to experiments

15

How do we evaluate the quality of our

clustering numerically?

(42)

Back to experiments

15

How do we evaluate the quality of our

clustering numerically?

Turn it into a hard clustering. Count the number of wrongly

(43)

Dimensions

16 k: number of mindsets m: number of questions n: number of people p: noise probability a: tangle agreement

additional noise questions

(44)

10 20 30 40 50 60 70 80 90 100

size of the smallest mindset|V2| 0.25 0.24 0.23 0.22 0.21 0.2 0.19 0.18 0.17 0.16 0.15 0.14 0.12 0.11 0.1 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.0 noise p 0.07 0.15 0.2 0.32 0.34 0.41 0.29 0.51 0.63 0.49 0.15 0.24 0.27 0.33 0.36 0.42 0.54 0.55 0.68 0.74 0.05 0.22 0.24 0.24 0.59 0.66 0.75 0.75 0.6 0.77 0.11 0.27 0.24 0.21 0.55 0.63 0.77 0.77 0.65 0.76 0.09 0.25 0.27 0.33 0.53 0.66 0.79 0.94 0.92 0.98 0.08 0.44 0.36 0.47 0.55 0.68 0.79 0.89 0.93 0.94 0.13 0.28 0.34 0.43 0.83 0.82 0.86 1 0.99 0.94 0.15 0.33 0.4 0.63 0.83 0.91 0.96 1 1 1 0.16 0.33 0.43 0.57 0.81 0.96 0.94 0.95 1 1 0.27 0.39 0.48 0.64 0.91 0.96 0.95 0.96 0.95 1 0.09 0.5 0.39 0.8 0.89 0.97 1 1 1 1 0.28 0.44 0.51 0.83 0.93 1 1 1 1 1 0.53 0.49 0.65 0.9 0.97 1 1 1 1 1 0.41 0.74 0.72 0.96 1 1 1 1 1 1 0.3 0.65 0.77 1 1 1 1 1 1 1 0.5 0.58 0.88 1 1 1 1 1 1 1 0.57 0.63 1 1 1 1 1 1 1 1 0.7 0.6 1 1 1 1 1 1 1 1 0.46 0.9 1 1 1 1 1 1 1 1 0.65 0.97 1 1 1 1 1 1 1 1 0.77 0.92 1 1 1 1 1 1 1 1 0.87 0.96 1 1 1 1 1 1 1 1 0.93 0.98 1 1 1 1 1 1 1 1 0.87 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.2 0.4 0.6 0.8 1.0

(45)

The same goes for the SBM

18 10 20 30 40 50 60 70 80 90 100 Block size|Vi| 100 90 80 70 60 50 40 30 20 10 Agreemen t a 0 0 0 0 0 0 0 0 0 0.9 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0.9 1 1 1 0 0 0 0 0 0.7 1 1 1 1 0 0 0 0 0.57 0.99 1 1 1 0.95 0 0 0 0.25 0.97 0.99 1 1 0.8 0.7 0 0 0 0.95 0.97 0.99 0.85 0.65 0.8 0.7 0 0 0.86 0.87 0.53 0.59 0.8 0.65 0.76 0.65 0 0.74 0.52 0.45 0.35 0.51 0.67 0.61 0.76 0.64 0.0 0.2 0.4 0.6 0.8 1.0

(46)

Visualizing tangles

19

(47)

Visualizing tangles

19

(48)

Visualizing tangles

19

(49)

Visualizing tangles

19

(50)

Visualizing tangles

19

(51)

Dendrogram

(52)

Dendrogram

(53)

(54)

Tangles: from weak to strong clustering