
Big Data Graph Algorithms


Academic year: 2021


Institute for Theoretical Informatics, Karlsruhe

Big Data Graph Algorithms

Christian Schulz


Algorithm Engineering

1 Christian Schulz: Big Data Graph Algorithms

Department of Informatics Institute for Theoretical Informatics

[Figure: the algorithm engineering cycle for algorithms: design, analyze, implement, experiment.]

Research Areas

Scalable (parallel) sorting

Scalable (parallel/external) graph partitioning

Scalable (parallel) graph generation

Scalable (parallel/external) matchings

Scalable (shared-mem parallel) graph drawing

Independent sets on large inputs


Huge Complex Networks


Scalable Graph Partitioning

The Common Parallel Approach

Mesh partitioned via dual graph

1. Each volume (data, calculation) represented by a vertex (+edges)

2. Interdependencies represented by edges

All PEs get same amount of work

Communication is expensive

Graph Partitioning Problem:

Partition a graph into (almost) equally sized blocks, such that the number of edges connecting vertices from different blocks is minimal.

ε-Balanced Graph Partitioning

Partition graph $G = (V, E, c : V \to \mathbb{R}_{>0}, \omega : E \to \mathbb{R}_{>0})$ into $k$ disjoint blocks s.t.

total node weight of each block $\le \frac{1+\varepsilon}{k} \cdot$ total node weight

total weight of cut edges as small as possible
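The two conditions above can be checked directly on a small instance. The following is a minimal sketch, assuming a graph given as a weighted edge list and a dictionary mapping each node to its block; the function names `edge_cut` and `is_balanced` are illustrative and not taken from KaHIP.

```python
from collections import defaultdict

def edge_cut(edges, block):
    """Total weight of edges whose endpoints lie in different blocks."""
    return sum(w for u, v, w in edges if block[u] != block[v])

def is_balanced(node_weight, block, k, eps):
    """Check that every block weighs at most (1 + eps) / k of the total."""
    load = defaultdict(float)
    for v, c in node_weight.items():
        load[block[v]] += c
    limit = (1 + eps) / k * sum(node_weight.values())
    return all(load[b] <= limit for b in range(k))

# Toy instance: the path 0-1-2-3 with unit weights, split in the middle.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
node_weight = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
block = {0: 0, 1: 0, 2: 1, 3: 1}

cut = edge_cut(edges, block)                       # one edge (1, 2) is cut
balanced = is_balanced(node_weight, block, k=2, eps=0.03)
```

With ε = 3% and k = 2, each block may weigh at most 2.06 here, so the split into two blocks of weight 2 is feasible, while moving a third node into one block would violate the constraint.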

Relevant Applications:

Multilevel Graph Partitioning

[Figure: multilevel scheme. Input graph → match and contract (repeated) → initial partitioning → uncontract and local improvement (repeated) → output partition.]

Matching-based Coarsening

[Figure: two matched nodes with weights a and b are contracted into one node of weight a+b; the edges with weights A and B that become parallel are merged into one edge of weight A+B.]

1. compute matching

2. perform contraction
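The two steps can be sketched as follows. This is a minimal illustration, assuming nodes numbered 0..n-1 and a weighted edge list; the greedy heaviest-edge-first matching is one common choice and is an assumption here, since the slides do not say which matching algorithm is used.

```python
def heavy_edge_matching(n, edges):
    """Greedy matching: scan edges by decreasing weight, match free endpoints."""
    mate = list(range(n))          # mate[v] == v means v is unmatched
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u != v and mate[u] == u and mate[v] == v:
            mate[u], mate[v] = v, u
    return mate

def contract(n, node_weight, edges, mate):
    """Merge matched pairs; node weights add up, parallel edges are merged."""
    coarse_id, next_id = [None] * n, 0
    for v in range(n):
        if coarse_id[v] is None:
            coarse_id[v] = coarse_id[mate[v]] = next_id
            next_id += 1
    coarse_weight = [0] * next_id
    for v in range(n):
        coarse_weight[coarse_id[v]] += node_weight[v]
    coarse_edges = {}
    for u, v, w in edges:
        cu, cv = coarse_id[u], coarse_id[v]
        if cu != cv:               # edges inside a contracted pair disappear
            key = (min(cu, cv), max(cu, cv))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_weight, coarse_edges, coarse_id

# Toy instance with four unit-weight nodes.
edges = [(0, 1, 5), (0, 2, 1), (1, 2, 2), (2, 3, 4)]
mate = heavy_edge_matching(4, edges)
coarse_weight, coarse_edges, coarse_id = contract(4, [1, 1, 1, 1], edges, mate)
```

Because node weights are summed and parallel edges are merged by adding their weights, a partition of the coarse graph has the same balance and cut as the corresponding partition of the finer graph.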


Local Search

compute gain: $g(v) = d_{\mathrm{ext}}(v) - d_{\mathrm{int}}(v)$

select blocks alternately, move nodes greedily

edge cut: 7

update gain g(v) of neighbors

move a node at most once

repeat until stop criterion reached; an increase in cut is possible

edge cut after successive moves: 7, 6, 5, 5, 6

[Figure: cut plotted against #steps; the cut may increase temporarily.]

final partition with edge cut 5

linear time
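One pass of this local search on two blocks can be sketched as below. It is a simplified illustration under stated assumptions: gains are recomputed from scratch instead of being updated for neighbors only, and the best node is found by a plain scan, so this sketch does not reach the linear running time mentioned above. The names `gain`, `local_search`, and `max_block_size` are hypothetical.

```python
def gain(v, adj, block):
    """g(v) = d_ext(v) - d_int(v): reduction of the cut if v switches blocks."""
    ext = sum(w for u, w in adj[v] if block[u] != block[v])
    return ext - (sum(w for _, w in adj[v]) - ext)

def local_search(adj, block, max_block_size):
    """One pass on two blocks; returns the best partition seen and its cut."""
    block = dict(block)
    size = [sum(1 for b in block.values() if b == i) for i in (0, 1)]
    cut = sum(w for v in adj for u, w in adj[v]
              if v < u and block[u] != block[v])
    best_cut, best_block = cut, dict(block)
    moved, source = set(), 0
    for _ in range(len(adj)):              # each node moves at most once
        for src in (source, 1 - source):   # prefer the block whose turn it is
            cand = [v for v in adj if v not in moved and block[v] == src]
            if cand and size[1 - src] + 1 <= max_block_size:
                break
        else:
            break                          # no feasible move is left
        v = max(cand, key=lambda x: gain(x, adj, block))
        cut -= gain(v, adj, block)         # the cut may increase here
        block[v] = 1 - src
        size[src] -= 1
        size[1 - src] += 1
        moved.add(v)
        source = 1 - src                   # select blocks alternately
        if cut < best_cut:                 # remember the best prefix of moves
            best_cut, best_block = cut, dict(block)
    return best_block, best_cut

# Two triangles joined by the edge (2, 3), starting from a poor partition.
adj = {v: [] for v in range(6)}
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a].append((b, 1))
    adj[b].append((a, 1))
start = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
best_block, best_cut = local_search(adj, start, max_block_size=4)
```

Accepting moves with negative gain lets the search leave local optima; returning the best intermediate partition undoes the moves that did not pay off.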

KaHIP

Karlsruhe High Quality Partitioning

[Figure: overview of KaHIP, built around multilevel graph partitioning (match and contract, initial partitioning, uncontract and local improvement). Labeled components: edge ratings, flows etc., W-/F-/V-cycles a la multigrid, evolutionary algorithm, distributed and parallel variants, highly balanced partitioning, social network partitioning, separators, buffoon. Venues cited in the figure: [IPDPS10], [ESA11], [ALENEX12], [DIMACS12], [SEA12/14], [SEA13], [SEA14, IPDPS15].]

KaHIP

Benchmarks

1. Walshaw Benchmark: runtime neglected

816 instances (ε ∈ {0%, 1%, 3%, 5%})

focus on solution quality

almost all instances improved or reproduced

2. 10th DIMACS Implementation Challenge

best scores in categories:

Matching-based Coarsening

Problem

bad for networks that are highly irregular

substantial reduction is hard using matchings

may contract wrong edges!

Basic Idea

aggressive contraction / simple and fast local search

main idea: contract clusterings

clustering paradigm: internally dense and externally sparse

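Basic Idea

Contraction of Clusterings

[Figure: a cluster of three nodes with weights a, b, c is contracted into one node of weight a+b+c; edges with weights A, B, C that become parallel are merged into one edge of weight A+B+C.]

contraction: respect balance and cut

avoid large blocks: size constraint U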


Label Propagation

Cut-based, Linear Time Clustering Algorithm [Raghavan et al.]

cut-based clustering using size-constrained label propagation

start with singletons

traverse nodes in random order or smallest degree first

move node to cluster having strongest eligible connection

modification: eligible w.r.t. size constraint U
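The steps listed for size-constrained label propagation can be sketched as follows. This is a minimal illustration, assuming an adjacency-list graph with weighted edges; the fixed number of rounds, the seed, and first-found tie-breaking are assumptions of this sketch, not details given on the slides.

```python
import random

def label_propagation(adj, node_weight, U, rounds=3, seed=0):
    """Size-constrained label propagation; returns a cluster ID per node."""
    rng = random.Random(seed)
    cluster = {v: v for v in adj}          # start with singletons
    size = dict(node_weight)               # current weight of each cluster
    for _ in range(rounds):
        order = list(adj)
        rng.shuffle(order)                 # traverse nodes in random order
        for v in order:
            conn = {}                      # edge weight from v into each cluster
            for u, w in adj[v]:
                conn[cluster[u]] = conn.get(cluster[u], 0) + w
            own = cluster[v]
            # Eligible: v's own cluster, or one that stays within U after the move.
            eligible = [c for c in conn
                        if c == own or size[c] + node_weight[v] <= U]
            if not eligible:
                continue
            target = max(eligible, key=lambda c: conn[c])
            if target != own:
                size[own] -= node_weight[v]
                size[target] += node_weight[v]
                cluster[v] = target
    return cluster

# Two triangles joined by the edge (2, 3), unit node and edge weights.
adj = {v: [] for v in range(6)}
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a].append((b, 1))
    adj[b].append((a, 1))
node_weight = {v: 1 for v in adj}
cluster = label_propagation(adj, node_weight, U=3)
```

Each round touches every edge a constant number of times, which is what makes the scheme cheap enough to run on every level of the hierarchy.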

Label Propagation

Iteration   Cut [%]
0           100
1           8.96
2           6.15
3           5.66
4           5.44
5           5.28
6           5.25
7           5.21
8           5.18
...         5.09

Label Propagation

Simple Local Search

Greedy Local Search:

start with partition from coarser level

traverse nodes in random order

move node to cluster having strongest eligible connection

eligible: w.r.t. size constraint $U := (1+\varepsilon)\frac{|V|}{k}$

Graph Distribution over PEs

Graph Distribution: a PE receives n/p vertices and their edges

[Figure: a graph split between Processor I and Processor II.]

Communication

ghost nodes: adjacent nodes on other processor (communication!)

interface nodes: nodes adjacent to ghost nodes
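The three node sets of one PE can be computed as in this minimal sketch. It assumes that vertices are numbered 0..n-1 and assigned to PEs in contiguous ranges of ⌈n/p⌉ IDs; the function name `local_view` is illustrative.

```python
def local_view(n, p, rank, adj):
    """Return (owned, ghost, interface) node sets for PE `rank` out of p PEs."""
    chunk = -(-n // p)                     # ceil(n / p) vertices per PE
    lo, hi = rank * chunk, min(n, (rank + 1) * chunk)
    owned = set(range(lo, hi))
    # Ghost nodes: neighbors of owned nodes that live on another PE.
    ghost = {u for v in owned for u in adj[v] if u not in owned}
    # Interface nodes: owned nodes with at least one ghost neighbor.
    interface = {v for v in owned if any(u not in owned for u in adj[v])}
    return owned, ghost, interface

# The path 0-1-2-3-4-5 distributed over two PEs.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
view0 = local_view(6, 2, 0, adj)
view1 = local_view(6, 2, 1, adj)
```

Only changes at interface nodes need to be sent to other PEs, which is why a distribution with few interface nodes keeps communication low.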

Label Propagation

Distributed Memory

each PE has a static part of the graph, only block IDs can change

Overlap Computation and Communication (PE centric view):

[Figure: the scan over V is split into phases i−1, i, i+1.]

At the end of phase i:

send block ID updates of phase i to neighboring PEs

receive block ID updates from neighboring PEs from phase i−1

* while scanning in phase i, messages are routed through the network

Contraction of Clusterings

The Parallel Case – High Level

[Figure: cluster contraction as before; node weights a, b, c become a+b+c and edge weights A, B, C become A+B+C.]

find mapping $C : \{0, \dots, n-1\} \to \{0, \dots, n'-1\}$ in parallel

exchange subgraphs, compute contracted graph locally

when graph is small: parallel initial partitioning

Parallel Solution Quality

Performance, k = 2 blocks, 32 PEs

instances: meshes and social networks/web graphs

ParMetis has ineffective coarsening (matching-based)

due to memory consumption of coarsest graph (dist. among PEs)

→ could not solve arabic-2005, sk-2005 and uk-2007

solved instances (ParMetis):

fast and eco yield 19.2% and 27.4% improvement

fast and eco slower on average

social networks/web graphs:

fast: 38% fewer cut edges and >2× faster

eco: 45% fewer cut edges and slower

best instance: 18× faster and 61.6% fewer cut edges

improvement over Facebook [Ugander and Backstrom]: 45% fewer cut edges on LiveJournal and much faster (k = 100)

Strong Scaling

[Figure: strong scaling on social networks; total time [s] (10 to 1000) versus number of PEs p (1 to 2K); curves: Fast sk-2007, Fast arabic-2005, Fast uk-2002, Fast uk-2007, Minimal uk-2007.]

uk-2007 can be partitioned in 15.2 seconds (seq. 10.5 min)

72 seconds for random geometric graph with ≈22G edges

more scaling results in the paper

Graph Drawing

Problem

$G = (V, E, d)$, $d : E \to \mathbb{R}$

[Figure: example graph drawing, with labels d and 1.]

Maximal Entropy Stress Model

[Gansner et al.'13]

Entropy H(x):

physics: nodes evenly dispersed

→ nodes as far away as possible

some nodes have predefined distance!

Maximal Entropy Stress Model:

$\max H(x) := \sum_{\{u,v\} \notin E} \ln \|x_u - x_v\|$

subject to $\|x_u - x_v\| = d_{uv}$, $\{u,v\} \in E$

not possible to satisfy all constraints!

Maximal Entropy Stress Model

[Gansner et al.'13]

Compromise: min error, max entropy

$\min \sum_{\{u,v\} \in E} w_{uv} \left( \|x_u - x_v\| - d_{uv} \right)^2 - \alpha H(x)$

α: trade-off parameter

Solve optimization problem by

repeatedly solving Laplacian systems or iterative scheme ...
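To make the objective concrete, the sketch below evaluates it for a given layout. It assumes that H(x) is the sum of ln ||x_u − x_v|| over all non-adjacent pairs, and that each edge is given as a tuple (u, v, d_uv, w_uv); the function name `maxent_stress` is illustrative and not taken from KaDraw.

```python
import math
from itertools import combinations

def maxent_stress(pos, edges, alpha):
    """Stress over the edges minus alpha times the entropy over non-edges."""
    edge_set = {frozenset((u, v)) for u, v, _, _ in edges}
    stress = sum(w * (math.dist(pos[u], pos[v]) - d) ** 2
                 for u, v, d, w in edges)
    entropy = sum(math.log(math.dist(pos[u], pos[v]))
                  for u, v in combinations(pos, 2)
                  if frozenset((u, v)) not in edge_set)
    return stress - alpha * entropy

# The path 0-1-2 drawn on a line with unit edge lengths.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0)]
value = maxent_stress(pos, edges, alpha=0.5)
```

In this layout both edges have exactly their target length, so the stress term is zero and only the non-adjacent pair {0, 2} at distance 2 contributes, giving −0.5 · ln 2.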

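Maximal Entropy Stress Model

[Gansner et al.'13]

... or iterative scheme:

$x_u \leftarrow \frac{1}{\rho_u} \sum_{\{u,v\} \in E} w_{uv} \left( x_v + d_{uv} \frac{x_u - x_v}{\|x_u - x_v\|} \right) + \frac{\alpha}{\rho_u} \sum_{\{u,v\} \notin E} \frac{x_u - x_v}{\|x_u - x_v\|^2}$

→ overall update costs $O(n^2)$ per iteration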
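One sweep of the iterative scheme can be sketched as below. The slide does not define ρ_u; this sketch assumes ρ_u is the sum of w_uv over the edges at u, as in Gansner et al. It also assumes that all positions are updated from the old layout, that no two nodes coincide, and that every node has at least one edge. Edges are tuples (u, v, d_uv, w_uv), and the function name is hypothetical.

```python
import math

def maxent_iteration(pos, edges, alpha):
    """One sweep of the update rule over all nodes; O(n^2) work in total."""
    target = {}                            # (u, v) -> (d_uv, w_uv)
    rho = {u: 0.0 for u in pos}            # assumed: rho_u = sum of w_uv at u
    for u, v, d, w in edges:
        target[(u, v)] = target[(v, u)] = (d, w)
        rho[u] += w
        rho[v] += w
    new_pos = {}
    for u, xu in pos.items():
        acc = [0.0] * len(xu)
        for v, xv in pos.items():
            if v == u:
                continue
            diff = [a - b for a, b in zip(xu, xv)]
            norm = math.hypot(*diff)
            if (u, v) in target:           # edge term: pull towards distance d_uv
                d, w = target[(u, v)]
                for i in range(len(xu)):
                    acc[i] += w * (xv[i] + d * diff[i] / norm)
            else:                          # non-edge term: entropy repulsion
                for i in range(len(xu)):
                    acc[i] += alpha * diff[i] / norm ** 2
        new_pos[u] = tuple(a / rho[u] for a in acc)
    return new_pos

# The path 0-1-2 on a line with unit target distances.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0)]
same = maxent_iteration(pos, edges, alpha=0.0)
pushed = maxent_iteration(pos, edges, alpha=1.0)
```

With α = 0 the layout is a fixed point because every edge already has its target length; with α = 1 the two end nodes are pushed apart by the entropy term. The inner loop over all other nodes is the source of the quadratic cost per iteration.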

Our contributions:

make this usable and fast in practice

multilevel integration

approximate long-range forces

employ parallelism

Multilevel Graph Drawing

[Hadany, Harel'99]

[Figure: multilevel scheme. Input graph → contract (repeated) → initial drawing → uncontract and improve drawing (repeated) → output drawing.]

Multilevel Graph Drawing

Initial Drawing

coarsen until only two nodes left

place them at optimal distance

define distances on coarse graphs, stay tuned

Uncoarsening

Local Improvement

minimize maxent-stress on each level of hierarchy

assume disk with radius $\sqrt{c(u)}$ to draw $c(u)$ vertices

→ define distance $d_{uv} := \frac{\sqrt{c(u)} + \sqrt{c(v)}}{2}$ on current level

Iterative Scheme

$x_u \leftarrow \frac{1}{\rho_u} \sum_{\{u,v\} \in E} w_{uv} \left( x_v + d_{uv} \frac{x_u - x_v}{\|x_u - x_v\|} \right) + \frac{\alpha}{\rho_u} \sum_{\{u,v\} \notin E} \frac{x_u - x_v}{\|x_u - x_v\|^2}$

Local Improvement

Iterative Scheme

$x_u \leftarrow \dots \sum_{\{u,v\} \notin E} \underbrace{\frac{x_u - x_v}{\|x_u - x_v\|^2}}_{=: r(u,v)}$

Approximation

$x_u \leftarrow \dots \sum_{v \neq u,\ M(u) = M(v)} r(u,v) + \sum_{v' \in V',\ v' \neq M(u)} \nu(v') \frac{x_u - x'_{v'}}{\|x_u - x'_{v'}\|^2} - \sum_{\{u,v\} \in E} r(u,v)$

$M(u)$: cluster of $u$

$\nu(v')$: number of finer vertices of $v'$ on current level

$V'$: vertex set of next coarser level

Local Improvement

Additional Enhancements

after each iteration → update barycenter of coarse nodes

vertex computations independent → add parallelism

use approximation multiple (h) levels beneath in hierarchy

Proposition: Assume equal cluster sizes. The running time of one iteration of MulMent_h, h ≥ 0, is $O(m + n^{(h+2)/\ldots})$

Scalability

Running Time [Delaunay, $n = 2^{20}$]

[Figure: total time [s] (10^2 to 10^6) versus number of PEs p (1 to 64); curves: MulMent0, MulMent1, MulMent2, MulMent4, MulMent6, MulMent8, MulMent9, MulMent10.]

Running Times

graph         PMDS    MaxEnt   MulMent0   MulMent1   MulMent10
btree         0.02      1.14       0.10       0.17        0.11
1138_bus      0.04      1.41       0.15       0.13        0.11
USpowerG      0.14      3.82       0.29       0.23        0.18
3elt          0.16      3.45       0.21       0.19        0.17
commanche     0.24      5.42       0.32       0.27        0.18
bcsstk31      3.44     48.48       5.63       3.97        1.82
fe_pwt        1.49     31.60       4.46       2.67        0.66
del16         2.86     61.42      13.63       7.75        1.10
luxembourg    3.10     96.10      40.94      22.87        1.39
nyc           9.03    233.94     216.27     119.33        3.70
auto         41.80    665.67     613.51     329.08       20.70
del20        53.80   1125.03    3303.82    1749.77       27.01

Table: Running times in seconds per graph. Smaller is better. PivotMDS and MaxEnt use one thread (sequential codes), the MulMent* algorithms use 32 cores (64 threads). Running times of MaxEnt are without the time of PMDS (which yields input coordinates to MaxEnt).

Experimental Results

Summary

Influence of h / Comparison

increasing h: not a large impact on solution quality

maxent-stress remains comparable

maxent-stress of MaxEnt and MulMent more or less similar

Dynamic Networks

≈ model: remove x% random edges, insert x% edges (distance ≤ D)

4× faster (h = 0), save 50% time (h = 7)

Example Drawings

[Figures: example drawings of fe_pwt, bcsstk31, and commanche.]

Independent Sets

Definitions

Independent Set:

subset S ⊆ V such that there are no adjacent nodes in S

Maximum Independent Set:

maximum cardinality set S

related to maximum clique and minimum vertex cover

finding a MIS is NP-hard and hard to approximate
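The definition and the link to vertex covers can be illustrated on a small graph. This is a minimal sketch: the greedy smallest-degree-first heuristic below only yields a maximal independent set, not necessarily a maximum one, and it is not one of the algorithms presented on these slides.

```python
def is_independent(adj, S):
    """True if no two nodes of S are adjacent."""
    return all(u not in S for v in S for u in adj[v])

def greedy_maximal_independent_set(adj):
    """Pick nodes by increasing degree, discarding the neighbors of each pick."""
    chosen, blocked = set(), set()
    for v in sorted(adj, key=lambda x: len(adj[x])):
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen

# A star: center 0 with leaves 1, 2, 3.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
S = greedy_maximal_independent_set(adj)
cover = set(adj) - S       # the complement of an independent set covers all edges
```

On the star the heuristic picks the three leaves, and the complement {0} touches every edge, which is the relation to minimum vertex cover noted above.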

Common Approaches

use heuristic algorithms

gradually improve a single solution

node deletions, insertions and swaps

plateau search

diversification & restart rules

Andrade et al. (2012):

ARW Local Search

Node Swaps:

(j, k)-swaps:

remove j solution nodes and insert k new ones

(1, 2)-swaps:

remove a single solution node and insert two new ones

Local Search:

search for (1, 2)-swaps in time O(m)

use data structure that supports fast insertion and removal

[Figure: node array split into solution nodes, free nodes, and non-free nodes.]

ε-Balanced Graph Partitioning

Node Separators

Partition graph $G = (V, E, c : V \to \mathbb{R}_{>0}, \omega : E \to \mathbb{R}_{>0})$ into $k$ disjoint blocks + node separator s.t.

total node weight of each block $\le \frac{1+\varepsilon}{k} \cdot$ total node weight

Evolutionary Algorithms

General Structure

procedure steady-state-EA
  create initial population P
  while stopping criterion not fulfilled
    select parents P1, P2 from P
    combine P1 with P2 to create offspring o
    mutate offspring o
    evict individual in population using o
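The procedure can be turned into a small runnable skeleton. This is a sketch under stated assumptions: parents are chosen uniformly at random, the stopping criterion is a fixed step budget, and the offspring replaces the worst individual if it is at least as fit. The slides do not specify these rules, and all function names are hypothetical. The toy problem maximizes the number of ones in a bit string.

```python
import random

def steady_state_ea(create, combine, mutate, fitness, pop_size, steps, seed=0):
    """Generic steady-state EA; returns the fittest individual at the end."""
    rng = random.Random(seed)
    population = [create(rng) for _ in range(pop_size)]
    for _ in range(steps):                 # stopping criterion: step budget
        p1, p2 = rng.sample(population, 2) # select parents
        offspring = mutate(combine(p1, p2, rng), rng)
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(offspring) >= fitness(population[worst]):
            population[worst] = offspring  # evict an individual using offspring
    return max(population, key=fitness)

# Toy problem: bit strings of length 20, fitness = number of ones.
def create(rng):
    return [rng.randint(0, 1) for _ in range(20)]

def combine(a, b, rng):
    point = rng.randrange(1, len(a))       # one-point crossover
    return a[:point] + b[point:]

def mutate(x, rng):
    i = rng.randrange(len(x))              # flip one random bit
    return x[:i] + [1 - x[i]] + x[i + 1:]

initial_best = steady_state_ea(create, combine, mutate, sum, 10, steps=0)
final_best = steady_state_ea(create, combine, mutate, sum, 10, steps=500)
```

Since only the worst individual is ever replaced, and only by an offspring that is at least as fit, the best fitness in the population never decreases.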

Combine Operations

Graph Partitioning:

exchange whole blocks of solutions

operators for edge separators and node separators

small cut-size vital for efficiency

[Figure: two solutions are combined (+) into one offspring (=).]

Local Search:

resulting independent sets may not be maximal

Node Separator Combine

[Figure: three panels, each showing the blocks V1 and V2 separated by S.]

build node separator $V = V_1 \cup V_2 \cup S$

use node separator as crossover point

combination takes linear time O(n)

Kernelization

[Akiba, Iwata'15]

Reductions:

rules to decrease graph size, while maintaining optimality

solve problem on problem kernel (using EA)

Example:

remove degree 0 or 1 vertices

[Figure: a vertex v of degree 1.]

neighbor of v in MIS → choose v instead, else add v
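The degree-0 and degree-1 rules can be applied exhaustively as in this minimal sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the function name `reduce_low_degree` is illustrative and does not come from KaMIS.

```python
def reduce_low_degree(adj):
    """Exhaustively apply the degree-0 and degree-1 rules.

    Returns the vertices forced into the independent set and the kernel.
    """
    adj = {v: set(ns) for v, ns in adj.items()}    # work on a copy
    in_set = set()
    queue = [v for v in adj if len(adj[v]) <= 1]
    while queue:
        v = queue.pop()
        if v not in adj or len(adj[v]) > 1:
            continue                               # stale queue entry
        in_set.add(v)                              # v joins the solution
        removed = {v} | adj[v]                     # v and its only neighbor
        affected = set()
        for x in removed:
            for y in adj.pop(x):
                if y not in removed:
                    adj[y].discard(x)
                    affected.add(y)
        queue.extend(y for y in affected if len(adj[y]) <= 1)
    return in_set, adj

# The path 0-1-2-3-4 is solved completely by the two rules.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
in_set, kernel = reduce_low_degree(path)

# A triangle has no vertex of degree 0 or 1, so it is left untouched.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
tri_set, tri_kernel = reduce_low_degree(triangle)
```

Taking a degree-1 vertex is safe because any solution that contains its neighbor can swap the neighbor for the vertex without getting smaller. Removing vertices can create new low-degree vertices, hence the work queue.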

Guess “likely candidates”

can we guess vertices that are in MIS?

idea: select small-degree vertices from “fittest” independent set

apply more reductions and recurse!

Near-optimal on “difficult networks”

finds exact MIS faster, when exact algorithm is slow:

Skitter: 48 min → 21 min

Stanford: 13 hours → 5 min

bcsstk30: 8.6 hours → 2.4 sec

Skitter: 2 hours → 28 sec, ...

finds exact MIS, for large networks with known MIS size

consistently finds larger solutions on social and road networks

[Figure: two plots of solution size versus time [s] (10^-1 to 10^5) for ARW, EvoMIS, and ReduMIS.]

Conclusion

apply algorithm engineering to optimization problems

obtain algorithms that scale to large inputs and machines

outperform state-of-the-art

open source implementations

KaHIP: algo2.iti.kit.edu/kahip

KaDraw: algo2.iti.kit.edu/kadraw

KaMIS: algo2.iti.kit.edu/kamis

[Figure: the algorithm engineering cycle: realistic models, design, analysis (deduction, performance guarantees), implementation, and experiments on real inputs, driven by falsifiable hypotheses (induction), with algorithm libraries and applications (application engineering).]
