Scaleable correlation clustering algorithms

Academic year: 2021

Edo Liberty

Correlation clustering results

Authors                              approx. constant   running time
Bansal, Blum, Chawla                 ≈ 20,000           Ω(n^2)
Demaine, Emanuel, Fiat, Immorlica    4 log(n)           LP
Charikar, Guruswami, Wirth           4                  LP
Ailon, Charikar, Newman              2.5                LP
Ailon, Charikar, Newman              3                  O(m)
Ailon, Liberty                       < 3                O(n) + cost(OPT)

Input for correlation bi-clustering

Output of correlation bi-clustering

Cost of a correlation bi-clustering solution
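
The slide's figure did not survive extraction. As a minimal sketch, assuming the usual disagreement cost (an edge whose endpoints land in different clusters, or a missing edge between a left node and a right node in the same cluster, each count as one mistake), the cost could be computed as follows. The function name `biclustering_cost` and the `cluster_of` dictionary are hypothetical, chosen for illustration only.

```python
from itertools import product


def biclustering_cost(left, right, edges, cluster_of):
    """Count disagreements between a bipartite graph and a bi-clustering.

    left, right : iterables of node names (assumed distinct across sides)
    edges       : set of (left_node, right_node) pairs
    cluster_of  : dict mapping every node to a cluster id (hypothetical format)
    """
    cost = 0
    for l, r in product(left, right):
        together = cluster_of[l] == cluster_of[r]
        has_edge = (l, r) in edges
        # A mistake is a cut edge or a clustered non-edge.
        if together != has_edge:
            cost += 1
    return cost


# Four nodes with exactly three edges between the two sides.
left, right = ["a", "b"], ["x", "y"]
edges = {("a", "x"), ("a", "y"), ("b", "x")}
one_cluster = {"a": 0, "b": 0, "x": 0, "y": 0}
print(biclustering_cost(left, right, edges, one_cluster))
```

On this four-node example every bi-clustering pays at least one mistake, which is the observation the analysis later in the deck builds on.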

Correlation bi-clustering results

Authors                              approx. constant   running time
Demaine, Emanuel, Fiat, Immorlica    O(log(n))*         LP
Charikar, Guruswami, Wirth           O(log(n))*         LP
Guo, Hüffner, Komusiewicz, Zhang     4                  sequential O(m)
                                     4                  3 message passing rounds, O(n + cost(OPT)) communication

Quick-bi-cluster

Lemma

Let OPT denote the best possible bi-clustering of G. Let B be a random output of quick-bi-cluster. Then:

$$\mathbb{E}[\mathrm{cost}(B)] \le 4\,\mathrm{cost}(\mathrm{OPT})$$

Let's prove this...

Bad squares, bad events, and erroneous edges

A bad square is a set of four nodes (two on each side) between which there are exactly three edges.
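
To make the definition concrete, here is a small brute-force sketch that lists the bad squares of a bipartite graph. The function name `bad_squares` and the edge-set representation are assumptions made for illustration. It checks every pair of left nodes against every pair of right nodes, so it is only meant for tiny examples.

```python
from itertools import combinations


def bad_squares(left, right, edges):
    """Return all (l1, l2, r1, r2) node sets spanning exactly three edges.

    edges is a set of (left_node, right_node) pairs.
    """
    found = []
    for l1, l2 in combinations(left, 2):
        for r1, r2 in combinations(right, 2):
            pairs = [(l1, r1), (l1, r2), (l2, r1), (l2, r2)]
            # Exactly three of the four possible edges are present.
            if sum(p in edges for p in pairs) == 3:
                found.append((l1, l2, r1, r2))
    return found


left, right = ["a", "b", "c"], ["x", "y"]
edges = {("a", "x"), ("a", "y"), ("b", "x"), ("c", "x"), ("c", "y")}
print(bad_squares(left, right, edges))
```

In this example the squares on {a, b} and {b, c} are bad because the pair (b, y) is missing, while the square on {a, c} has all four edges and is not bad.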

Bounding cost(OPT )

OPT must make at least one mistake in every bad square.

$$\mathrm{cost}(\mathrm{OPT}) \ge \min \sum_e x_e \quad \text{s.t.} \quad \forall s:\ \sum_{e \subset s} x_e \ge 1$$

Defining the dual of this problem and using PRIMAL ≥ DUAL (weak duality), we have:

$$\mathrm{cost}(\mathrm{OPT}) \ge \max \sum_s \beta_s \quad \text{s.t.} \quad \forall e:\ \sum_{s \supset e} \beta_s \le 1$$
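
One simple way to see the dual bound at work: any collection of bad squares that share no node pair gives a feasible dual solution (set β_s = 1 on the chosen squares and 0 elsewhere), so the size of the collection is a lower bound on cost(OPT). The sketch below builds such a collection greedily. This greedy packing is an illustration only, not a method taken from the slides, and the names `bad_squares`, `square_pairs`, and `packing_lower_bound` are hypothetical.

```python
from itertools import combinations


def square_pairs(square):
    """The four (left, right) node pairs of a square."""
    l1, l2, r1, r2 = square
    return [(l1, r1), (l1, r2), (l2, r1), (l2, r2)]


def bad_squares(left, right, edges):
    """All four-node sets (two per side) spanning exactly three edges."""
    found = []
    for l1, l2 in combinations(left, 2):
        for r1, r2 in combinations(right, 2):
            square = (l1, l2, r1, r2)
            if sum(p in edges for p in square_pairs(square)) == 3:
                found.append(square)
    return found


def packing_lower_bound(left, right, edges):
    """Greedily pick bad squares that share no node pair.

    Setting beta_s = 1 on the picked squares satisfies every dual
    constraint, so the number of picked squares is at most cost(OPT).
    """
    used = set()
    chosen = []
    for square in bad_squares(left, right, edges):
        pairs = square_pairs(square)
        if not any(p in used for p in pairs):
            used.update(pairs)
            chosen.append(square)
    return len(chosen), chosen


left, right = ["a", "b", "c", "d"], ["x", "y"]
edges = {("a", "x"), ("a", "y"), ("b", "x"),
         ("c", "x"), ("c", "y"), ("d", "x")}
bound, chosen = packing_lower_bound(left, right, edges)
print(bound, chosen)
```

The bound from a greedy packing can be far from the optimal dual value; the proof in the slides uses a fractional dual solution instead.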

A bad event happens to a bad square when two nodes that belong to it are chosen, one on each side.

Property 1

There is a one-to-one mapping between bad events occurring and errors in the output clustering.

$$\mathbb{E}[\mathrm{cost}(B)] = \mathbb{E}\Big[\sum_e z_e\Big] = \mathbb{E}\Big[\sum_s z_s\Big] = \sum_s p_s$$

Here $z_e$ denotes the indicator variable that edge $e$ is erroneous, $z_s$ denotes the indicator variable that a bad event happened to bad square $s$, and $p_s = \Pr(z_s = 1)$.

Property 2

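When a bad event happens to a bad square, each edge in it is erroneous with probability 1/4.

We have that $p_e \le 1$ for every edge $e$, and also:

$$p_e = \sum_{s \supset e} \Pr(z_e \mid z_s = 1) \cdot p_s = \sum_{s \supset e} \tfrac{1}{4} p_s$$

so that

$$\forall e:\ \sum_{s \supset e} \tfrac{1}{4} p_s \le 1$$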

Putting it all together

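From Properties 1 and 2:

$$\mathbb{E}[\mathrm{cost}(B)] = \sum_s p_s, \qquad \forall e:\ \sum_{s \supset e} \tfrac{1}{4} p_s \le 1$$

From the dual bound:

$$\mathrm{cost}(\mathrm{OPT}) \ge \max \sum_s \beta_s \quad \text{s.t.} \quad \forall e:\ \sum_{s \supset e} \beta_s \le 1$$

We finally get...

The values $\beta_s = \tfrac{1}{4} p_s$ give a feasible solution to DUAL, and so

$$\mathrm{cost}(\mathrm{OPT}) \ge \sum_s \tfrac{1}{4} p_s = \tfrac{1}{4}\,\mathbb{E}[\mathrm{cost}(B)],$$

that is, $\mathbb{E}[\mathrm{cost}(B)] \le 4\,\mathrm{cost}(\mathrm{OPT})$, which proves the lemma.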
