Online Correlation Clustering

Ocan Sankur¹,²

March 3, 2010

(Joint work with Claire Mathieu² and Warren Schudy²)

¹ École Normale Supérieure, Paris, France
² Brown University, Providence, RI, USA

Correlation Clustering

Input: complete graph with edges labeled +/- (similarity)

Output: partition of vertices (clustering) that agrees as much as possible with the input: maximize profit = number of ‘+’ edges within clusters plus number of ‘−’ edges between clusters.
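The profit objective above can be made concrete with a small sketch. The representation is an assumption chosen for illustration, not taken from the talk: vertices are 0..n-1, `positive` is a set of `frozenset` pairs holding the ‘+’ edges, every other pair is a ‘−’ edge, and a clustering is a list of disjoint sets.

```python
from itertools import combinations

def profit(n, positive, clustering):
    """Count agreements: '+' edges inside clusters plus '-' edges across clusters."""
    cluster_of = {v: i for i, cluster in enumerate(clustering) for v in cluster}
    total = 0
    for u, v in combinations(range(n), 2):
        same = cluster_of[u] == cluster_of[v]
        plus = frozenset((u, v)) in positive
        if same == plus:  # '+' edge kept together, or '-' edge kept apart
            total += 1
    return total

# Hypothetical 4-vertex instance: a '+' triangle {0, 1, 2}; vertex 3 has only '-' edges.
positive = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2)]}
print(profit(4, positive, [{0, 1, 2}, {3}]))  # 6: every edge agrees
print(profit(4, positive, [{0, 1, 2, 3}]))    # 3: the three '-' edges disagree
```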

Background

Studied by Ben-Dor, Shamir, and Yakhini [BDSY99] to cluster gene expression patterns, and by Bansal, Blum, and Chawla [BBC04] for information retrieval applications.

NP-hard [BBC04].

There is an algorithm that outputs a clustering with profit ≥ (1−ε)·profit(OPT), for any ε > 0 [BBC04].

Our contribution: We study this problem online.

Correlation Clustering: Online setting

Vertices arrive one by one. The size of the input is unknown.

Online clustering algorithm

Upon arrival of a vertex v, an online algorithm can

Create a new cluster {v}.

Add v to an existing cluster.

Merge any pre-existing clusters.

Split a pre-existing cluster.

An online algorithm is c-competitive if on any input I, the algorithm outputs a clustering ALG(I) s.t. profit(ALG(I)) ≥ c·profit(OPT(I)), where OPT(I) is the offline optimum.
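To see the definition of c-competitiveness on a single small input, the following sketch computes profit(OPT(I)) by brute force over all partitions and compares it with the profit of a given clustering. All names here are hypothetical; a clustering is encoded as a list `labels` with `labels[v]` the cluster id of vertex v, and the enumeration is only feasible for very small n. Passing the check on one input does not establish a competitive ratio, which must hold on every input.

```python
from itertools import combinations

def profit(n, positive, labels):
    """Agreements of the clustering given by labels[v] = cluster id of v."""
    return sum(
        (labels[u] == labels[v]) == (frozenset((u, v)) in positive)
        for u, v in combinations(range(n), 2)
    )

def offline_opt(n, positive):
    """Brute-force profit(OPT) over all partitions (tiny n only)."""
    best = 0

    def extend(labels, used):
        nonlocal best
        if len(labels) == n:
            best = max(best, profit(n, positive, labels))
            return
        # The next vertex joins an existing cluster or opens cluster number `used`.
        for c in range(used + 1):
            extend(labels + [c], max(used, c + 1))

    extend([], 0)
    return best

def satisfies_ratio_on(n, positive, alg_labels, c):
    """Check profit(ALG(I)) >= c * profit(OPT(I)) on this one input I."""
    return profit(n, positive, alg_labels) >= c * offline_opt(n, positive)

# Hypothetical input: '+' edges 0-1 and 1-2, '-' edge 0-2.
path = {frozenset((0, 1)), frozenset((1, 2))}
print(offline_opt(3, path))                         # 2
print(satisfies_ratio_on(3, path, [0, 1, 2], 0.5))  # True: singletons have profit 1
```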

Our results for maximizing profit

Results

A greedy algorithm that is 0.5-competitive;

No algorithm has a competitive ratio better than 0.834;

We design a (0.5 + ε₀)-competitive algorithm, where ε₀ is a small constant. How small?

Algorithm Greedy

Algorithm 1 Algorithm Greedy

for each arriving vertex v do
    Put v in new cluster {v}.
    while ∃ clusters C, D s.t. merging C and D improves the profit do
        Merge C and D
    end while
end for
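A minimal Python sketch of the pseudocode above, under stated assumptions. Merging C and D changes the profit by the number of ‘+’ edges between them minus the number of ‘−’ edges between them, so "improves" is read here as a strictly positive gain. The slide does not say which improving pair to merge when several exist; taking the first one found is an assumption. The names `merge_gain` and `greedy` and the `frozenset` edge encoding are illustrative.

```python
from itertools import combinations

def merge_gain(C, D, positive):
    """Profit change from merging C and D: '+' edges between them minus '-' edges."""
    plus = sum(frozenset((u, v)) in positive for u in C for v in D)
    return plus - (len(C) * len(D) - plus)

def greedy(arrivals, positive):
    """Process vertices in arrival order; `positive` holds the '+' edges as frozensets."""
    clusters = []
    for v in arrivals:
        clusters.append({v})                  # put v in a new cluster {v}
        merged = True
        while merged:                         # repeat while some merge improves profit
            merged = False
            for C, D in combinations(clusters, 2):
                if merge_gain(C, D, positive) > 0:
                    clusters.remove(C)
                    clusters.remove(D)
                    clusters.append(C | D)
                    merged = True
                    break                     # assumption: take the first improving pair
    return clusters

# Hypothetical instance: a '+' triangle {0, 1, 2}; vertex 3 has only '-' edges.
triangle = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2)]}
print(greedy([0, 1, 2, 3], triangle))  # [{0, 1, 2}, {3}]
```

Only vertices that have already arrived appear in `clusters`, so each decision uses just the edges revealed so far.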

Algorithm Greedy: Example

[Figure: example run of Algorithm Greedy]

Better than a 0.5-approximation (1)

Result 1

Algorithm Greedy is 0.5-competitive.

If profit(OPT) ≤ (1−α)|E|, Greedy has competitive ratio > 0.5.

Idea: design an algorithm (Dense) with competitive ratio > 0.5 when profit(OPT) > (1−α)|E|.

Better than a 0.5-approximation (2)

Algorithm 2 GreedyOrDense

With probability p, run Greedy,

With probability 1−p, run Dense.

Result 2

Algorithm GreedyOrDense is (0.5 + ε₀)-competitive.
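The random choice in GreedyOrDense can be sketched as a single biased coin flip made before any vertex arrives. This is an illustration only: `run_greedy` and `run_dense` are hypothetical callables standing in for the two algorithms, and the value of p is not given on the slide. On a fixed input, the expected profit of the mixture is p·profit(Greedy) + (1−p)·profit(Dense), which is why it helps for each algorithm to do well on the inputs where the other may not.

```python
import random

def greedy_or_dense(arrivals, positive, run_greedy, run_dense, p, rng=random):
    """Flip one biased coin up front, then run the chosen algorithm on the whole input."""
    chosen = run_greedy if rng.random() < p else run_dense
    return chosen(arrivals, positive)

def expected_profit(p, profit_greedy, profit_dense):
    """Expected profit of the mixture on one fixed input."""
    return p * profit_greedy + (1 - p) * profit_dense

# Stub algorithms (hypothetical) that only report which branch was taken.
stub_greedy = lambda arrivals, positive: "greedy"
stub_dense = lambda arrivals, positive: "dense"
print(greedy_or_dense([0, 1], set(), stub_greedy, stub_dense, p=1.0))  # greedy
print(expected_profit(0.5, 4, 8))                                      # 6.0
```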

Introducing Algorithm Dense

Reminder: focus on instances where profit(OPT) > (1−α)|E|.

Idea of Algorithm Dense: fix τ = 1.10,

Put every new vertex in a singleton. At times ti = τ^i

◮ Compute (near) OPT(ti) using [BBC04],

◮ Use it to run a merging procedure.

Simplification: Suppose we have the exact OPT at times ti.
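The geometric update schedule ti = τ^i can be illustrated with a short sketch. The slide does not say how non-integer values of τ^i map to arrival counts; rounding up and dropping repeated values is an assumption made here, and `update_times` is a hypothetical helper name. The point of the schedule is that the number of updates grows only logarithmically in the number of vertices.

```python
import math

def update_times(n_max, tau=1.10):
    """Distinct integer times ceil(tau**i) not exceeding n_max, in increasing order.

    Requires tau > 1, otherwise the loop would never terminate.
    """
    times = []
    i = 0
    while True:
        t = math.ceil(tau ** i)   # assumption: round tau**i up to an arrival count
        if t > n_max:
            break
        if not times or t > times[-1]:
            times.append(t)       # skip values repeated by rounding
        i += 1
    return times

schedule = update_times(1000)
print(len(schedule), schedule[:5])
```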

Algorithm Dense: The merging procedure by example

At time t2, we run the merging procedure.

First, compute OPT(t2).

Then try to recreate OPT(t2).

The red clustering is the clustering at time t2.

We keep in mind that we obtained B1′, B2′ by adapting our clusters to B1 and B2 of OPT(t2).

Call B1, B2 ∈ OPT(t2) ghost clusters.

This is the clustering at time t3.

We keep ghost clusters C1, C2 ∈ OPT(t3), since we obtained C1′, C2′ by adapting to these.

Analysis of Dense

If τ is large enough, then the clustering of Dense at time ti = τ^i is close to OPT(ti): profit(Dense(ti)) ≥ (1 − O(α)) · (ti choose 2).

If τ is small enough, then the profit of the clustering of Dense stays high between two updates.

Lemma (Main Lemma)

Our results on minimizing cost

Instead of maximizing profit, minimize cost = ‘-’ edges within clusters plus ‘+’ edges between clusters.
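Since every edge either agrees or disagrees with a clustering, profit and cost always sum to the number of edges, n(n−1)/2 on a complete graph. The sketch below checks this on a small hypothetical instance, using the same illustrative encoding as before (`labels[v]` is the cluster id of v, `positive` holds the ‘+’ edges). The two objectives therefore share optimal clusterings, although a multiplicative guarantee for one does not carry over to the other.

```python
from itertools import combinations

def profit(n, positive, labels):
    """'+' edges within clusters plus '-' edges between clusters."""
    return sum(
        (labels[u] == labels[v]) == (frozenset((u, v)) in positive)
        for u, v in combinations(range(n), 2)
    )

def cost(n, positive, labels):
    """'-' edges within clusters plus '+' edges between clusters."""
    return sum(
        (labels[u] == labels[v]) != (frozenset((u, v)) in positive)
        for u, v in combinations(range(n), 2)
    )

# Hypothetical instance: a '+' triangle {0, 1, 2}; vertex 3 has only '-' edges.
triangle = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2)]}
one_cluster = [0, 0, 0, 0]
print(profit(4, triangle, one_cluster), cost(4, triangle, one_cluster))  # 3 3
```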

In the offline case, there is a 2.5-approximation by Ailon, Charikar, and Newman [ACN05].

Theorem

Algorithm Greedy is O(n)-competitive for minimizing cost, and this is

Conclusion

A greedy 0.5-competitive algorithm for maximizing profit.

But 0.5 is not the best we can do: we exhibit a randomized algorithm with competitive ratio 0.5 + 10^−14.

Future work:

- Show a better upper bound - or find a better algorithm.

References

[ACN05] Nir Ailon, Moses Charikar, and Alantha Newman, Aggregating inconsistent information: ranking and clustering, STOC ’05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing (New York, NY, USA), ACM Press, 2005, pp. 684–693.

[BBC04] Nikhil Bansal, Avrim Blum, and Shuchi Chawla, Correlation clustering, Mach. Learn. 56 (2004), no. 1-3, 89–113.

[BDSY99] Amir Ben-Dor, Ron Shamir, and Zohar Yakhini, Clustering gene expression patterns, Journal of Computational Biology 6 (1999), no. 3-4, 281–297.
