(1)

Scalable Parallel Distance Field Construction for Large-Scale Applications

Hongfeng Yu (UNL), Jinrong Xie, Kwan-Liu Ma (UC Davis),

(2)

Motivation

• Distance transform
– A fundamental requirement for many applications
  • Image processing
  • Computational geometry
  • Robotics
– A critical role in visualization
  • Reduce visual clutter
  • Index and compress data at extreme resolutions

Image source: reference.wolfram.com

(3)

Challenges

Definition of distance transform: [Figure: query point p]

Challenges at scale:

• High communication cost
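The formula on this slide did not survive as text. For reference, the distance transform is conventionally defined as below. The reading of the symbols is an assumption taken from figure labels on later slides, not from a surviving definition: Ω is taken to be the domain, Γ the set of surface elements, and p a query point. The Euclidean norm is likewise assumed.

```latex
D(p) = \min_{q \in \Gamma} \lVert p - q \rVert_2, \qquad p \in \Omega
```

Under this definition every point p needs its nearest element of Γ, which in a partitioned domain may reside on another processor.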

(4)

Objectives

• Scalable distance transform for large-scale scientific computing

• Support of various data types, structures, and semantics

• Used in an in-situ setting with a parallel computing environment

(5)

Related Work

• Selected existing approaches
– Complete Euclidean distance transformation by parallel operation, H. Yamada, 1984
– Fast hierarchical 3D distance transforms on the GPU, N. Cuntz and A. Kolb, 2007
– Data-parallel octrees for surface reconstruction, K. Zhou et al., 2011

• Limitations
– Scalability remains hard to address
  • Communication overhead and unbalanced workload

(6)

Our Contribution

• Highly scalable parallel distance transform

– Leverage spatial and temporal coherence in simulation data

– Minimize communication cost across processors

– Achieve balanced workload among processors

– Scale up to 69,120 CPU cores on a state-of-the-art supercomputer

(7)

Our Approach

Global Tree for Workload Partition

Local Tree for Distance Computation

(8)

Our Approach

Parallel Distance Tree Construction

• Collect coarse global element distribution
• Construct global distance tree
• Assign leaf octants
• Construct full-grown local distance tree
• Compute distance field
• Update tree and distance field

[Figure: processors P0 to P4, with the Local Tree for Distance Computation shown for P4]

(9)

Our Approach

Workload Partition

[Figure: regions labeled Ω and Γ, partitioned among processors P0 to P4]

(10)

Our Approach

Workload Partition

[Figure: regions labeled Ω and Γ across processors P0 to P4; a Collective Reduction yields a bitmap]

(11)

Our Approach

Workload Partition

[Figure: regions labeled Ω and Γ across processors P0 to P4; a Collective Reduction yields a bitmap]

A bitmap of 128 KB can represent more than 1 million blocks
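The bitmap arithmetic and the reduction can be illustrated with a minimal sketch. It assumes one bit per block and a bitwise OR as the reduction operator; the slides name only "Collective Reduction" and "bitmap", so both choices, and all function names, are hypothetical. Processors are simulated as plain Python lists rather than MPI ranks.

```python
BITMAP_BYTES = 128 * 1024          # 128 KB, the size quoted on the slide
CAPACITY = BITMAP_BYTES * 8        # one bit per block: 1,048,576 blocks

def make_bitmap(occupied_blocks):
    """Local bitmap: bit b is set if block b holds elements (assumed meaning)."""
    bitmap = bytearray(BITMAP_BYTES)
    for b in occupied_blocks:
        bitmap[b >> 3] |= 1 << (b & 7)
    return bitmap

def reduce_or(bitmaps):
    """Stand-in for the collective reduction: bitwise OR of all bitmaps."""
    merged = 0
    for bm in bitmaps:
        merged |= int.from_bytes(bm, "little")
    return bytearray(merged.to_bytes(BITMAP_BYTES, "little"))

def is_set(bitmap, b):
    """True if block b is marked in the bitmap."""
    return bool(bitmap[b >> 3] & (1 << (b & 7)))

# Three hypothetical processors, each marking the blocks it occupies.
local = [make_bitmap([0, 5]), make_bitmap([5, 1000]), make_bitmap([CAPACITY - 1])]
global_bitmap = reduce_or(local)
```

After the reduction every processor would hold the same coarse picture of where elements lie, at a fixed cost of 128 KB per processor regardless of the data size.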

(12)

Our Approach

For each processor do:
Input: bitmap
Output: global distance tree

[Figure: blocks a to p on an x-y grid, distributed among processors P0 to P4]
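Going from an occupancy bitmap to a tree might look like the sketch below. It is a 2D quadtree stand-in for the octree, and the subdivision rule (split only regions with mixed occupancy) is an assumption; the slides do not state the actual criterion. All names are hypothetical.

```python
def build_tree(occ, x0, y0, size):
    """Return a nested dict; subdivide only regions with mixed occupancy."""
    cells = [occ[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    if size == 1 or all(cells) or not any(cells):
        # Uniform region (or single block): keep it as a leaf.
        return {"origin": (x0, y0), "size": size, "occupied": any(cells)}
    half = size // 2
    return {
        "origin": (x0, y0),
        "size": size,
        # Children are visited in Z-order: x varies fastest, then y.
        "children": [
            build_tree(occ, x0 + dx, y0 + dy, half)
            for dy in (0, half) for dx in (0, half)
        ],
    }

def leaves(node):
    """Collect the leaf nodes of the tree in traversal order."""
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]

# A 4 x 4 occupancy grid with a single occupied block in one corner.
occ = [[1, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
tree = build_tree(occ, 0, 0, 4)
```

Because every processor starts from the same bitmap, each can build an identical tree locally without further communication.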

(13)

Our Approach

For each processor do: Z-curve (Morton code)

[Figure: blocks a to p on an x-y grid and the corresponding tree with leaves a to p, across processors P0 to P4]
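A Morton code interleaves the bits of the integer coordinates, and sorting by that code traverses the cells along a Z-curve. The sketch below shows the standard 3D bit interleaving; the function name and the 10-bit default are illustrative choices, not taken from the slides.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z (x in the lowest position)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Sorting octant coordinates by Morton code yields their Z-curve order.
octants = [(1, 1, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]
z_order = sorted(octants, key=lambda c: morton3d(*c))
```

Octants that are close along the curve tend to be close in space, which is why a Z-order is a common basis for cutting a tree's leaves into contiguous per-processor chunks.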

(14)

Our Approach

For each processor do: two-pass task assignment

• Pass 1: Each processor handles its local data
• Pass 2.1: Assign to idle processors
• Pass 2.2: Assign to a processor owning a neighboring data domain

[Figure: blocks a to p on an x-y grid and the corresponding tree with leaves a to p, across processors P0 to P4]
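One possible reading of the two passes is sketched below. The slides do not say which leaves are left over after pass 1 or how "neighboring" is determined, so this sketch assumes that leftover leaves are those whose data no processor holds, and that the neighbor is the owner of the nearest data-holding leaf along the Z-order. Function and parameter names are hypothetical.

```python
def two_pass_assign(leaves, data_owner, num_procs):
    """leaves: leaf ids in Z-order; data_owner: leaf id -> rank holding its data."""
    # Pass 1: a leaf whose data already lives on some processor stays there.
    assignment = {leaf: data_owner[leaf] for leaf in leaves if leaf in data_owner}

    # Pass 2.1: hand remaining leaves to processors that received nothing.
    busy = set(assignment.values())
    idle = [p for p in range(num_procs) if p not in busy]
    remaining = [leaf for leaf in leaves if leaf not in assignment]
    while remaining and idle:
        assignment[remaining.pop(0)] = idle.pop(0)

    # Pass 2.2: give each leftover leaf to the owner of the nearest
    # data-holding leaf along the Z-order (assumed meaning of "neighboring").
    # A leaf stays unassigned if no leaf holds data at all.
    for leaf in remaining:
        i = leaves.index(leaf)
        for offset in range(1, len(leaves)):
            near = [leaves[j] for j in (i - offset, i + offset) if 0 <= j < len(leaves)]
            owners = [data_owner[n] for n in near if n in data_owner]
            if owners:
                assignment[leaf] = owners[0]
                break
    return assignment

# Leaves a and d hold data on ranks 0 and 1; rank 2 starts idle.
result = two_pass_assign(list("abcd"), {"a": 0, "d": 1}, num_procs=3)
```

In this example leaf b goes to the idle rank 2 and leaf c goes to rank 1, the owner of its Z-order neighbor d.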

(15)

Our Approach

For each processor do: Construct Full-grown Local Distance Tree

[Figure: tree with leaves a to p; a local octant of size r within a surrounding region of size 3r]

(16)

Our Approach

For each processor do: Compute Distance on Leaf Node Vertex

[Figure: tree with leaves a to p; distance from a leaf node vertex to local elements]
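The local step can be pictured as a brute-force minimum over the elements a processor holds. The sketch below uses 2D line segments as a stand-in for the real element types (the evaluation mentions volumes and triangles), and it ignores the tree-based pruning that the approach presumably uses; both simplifications are assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the 2D segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: a single point
    else:
        # Projection of p onto the line, clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def local_distances(vertices, segments):
    """Minimum distance from each vertex to any local segment."""
    return [min(point_segment_distance(v, a, b) for a, b in segments)
            for v in vertices]

segments = [((-1.0, 0.0), (1.0, 0.0))]
vertices = [(0.0, 1.0), (2.0, 0.0), (0.0, 0.0)]
dists = local_distances(vertices, segments)
```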

(17)

Our Approach

Exchange Vertices

Exchange the vertices that need to be checked with the remote processors

[Figure: vertices exchanged between P4 and P3]

(18)

Our Approach

For each processor do: Compute Min-distance to Remote Vertices

[Figure: local elements and remote vertices on P3 and P4]

(19)

Our Approach

Processors exchange the results

[Figure: remote distance exchanged between P4 and P3]
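The exchange between two processors can be simulated in a single process as follows. This is a sketch under simplifying assumptions: elements are reduced to points, all of P4's vertices are sent to P3 (the slides exchange only the vertices that need checking, by a criterion not given here), and plain lists stand in for the messages.

```python
import math

def min_distance(vertex, elements):
    """Minimum Euclidean distance from a vertex to a set of point elements."""
    return min(math.dist(vertex, e) for e in elements)

# Hypothetical state of two ranks.
elements = {"P3": [(0.0, 0.0)], "P4": [(10.0, 0.0)]}
vertices_p4 = [(1.0, 0.0), (9.0, 0.0)]

# P4: distances to its own local elements.
local = [min_distance(v, elements["P4"]) for v in vertices_p4]

# P4 sends vertices to P3 (here: all of them, a simplification).
sent_to_p3 = list(vertices_p4)

# P3: min-distance from the received vertices to P3's local elements.
remote = [min_distance(v, elements["P3"]) for v in sent_to_p3]

# P3 returns the remote distances; P4 keeps the smaller value per vertex.
final = [min(l, r) for l, r in zip(local, remote)]
```

The first vertex ends up with the remote distance and the second keeps its local one, which is the case the exchange exists to catch.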

(20)

• Collect coarse global element distribution
• Construct initial coarse global distance tree
• Assign leaf octants
• Construct full-grown local distance tree
• Compute distance field
• Update tree and distance field

Only a marginal portion of the tree structure needs to be updated with respect to the field evolution between two consecutive time steps.

We do not need to update the distance field of a region if there are no field changes within the region’s triple.
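The skip rule can be sketched as a set test. The sketch assumes that a region's "triple" is the region enlarged to three times its size, that is, the region plus its immediate neighbors; this reading is inferred from the r and 3r labels on an earlier figure and is not stated outright. It works on a 2D block grid for brevity, and all names are hypothetical.

```python
def triple(region):
    """Blocks covered by a region enlarged to three times its size (2D)."""
    i, j = region
    return {(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)}

def needs_update(region, changed_blocks):
    """A region is recomputed only if some block in its triple changed."""
    return not triple(region).isdisjoint(changed_blocks)

# Blocks whose field values changed between two consecutive time steps.
changed = {(5, 5)}
regions = [(4, 4), (5, 5), (2, 2), (7, 5)]
to_update = [r for r in regions if needs_update(r, changed)]
```

Here only the regions at (4, 4) and (5, 5) are recomputed; the other two keep their distance values from the previous time step.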

(21)

Integration with Combustion Simulations

H. Yu, C. Wang, R. W. Grout, J. H. Chen, and K.-L. Ma, “In situ visualization for large-scale combustion simulations,” IEEE Computer Graphics and Applications, vol. 30, no. 3, pp. 45–57, 2010.

Based on the APIs developed in our in-situ visualization framework:

• Simulation provides the size and coordinates of each processor’s global domain and local partition
• Simulation provides the pointer to the buffer of the local field data
• Distance field construction module is initialized and invoked by the solver at a given rate
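The three points above suggest an interface along the following lines. Every name in this sketch is hypothetical: the framework's actual API is not shown on the slides, and a Python list stands in for the solver's field buffer.

```python
class DistanceFieldModule:
    """Hypothetical interface; none of these names come from the framework."""

    def __init__(self, global_dims, local_offset, local_dims, rate):
        self.global_dims = global_dims    # size of the global domain
        self.local_offset = local_offset  # coordinates of the local partition
        self.local_dims = local_dims      # size of the local partition
        self.rate = rate                  # invoke every `rate` solver steps
        self.field = None
        self.invocations = []

    def set_field(self, buffer):
        """Keep a reference to the solver's buffer (no copy)."""
        self.field = buffer

    def on_step(self, step):
        """Called by the solver each time step; runs only at the given rate."""
        if step % self.rate != 0:
            return False
        self.invocations.append(step)     # the construction itself would go here
        return True

# A toy solver loop driving the module every fifth step.
field = [0.0] * 8
module = DistanceFieldModule((16, 1, 1), (8, 0, 0), (8, 1, 1), rate=5)
module.set_field(field)
for step in range(12):
    module.on_step(step)
```

Holding a reference rather than a copy mirrors the pointer hand-off described on the slide, so the module reads the field in place.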

(22)

Evaluation

Data Set     Data Type and Scale          Mode                 System
Combustion   Volume (1.3B grid points)    In-situ processing   Hopper
Combustion   Volume (1.6B grid points)    Post-processing      Intrepid
Car          Polygon (3.4M triangles)     Post-processing      Hopper
Boeing       Polygon (350M triangles)     Post-processing      Hopper

System     Configuration
Hopper     A Cray XE6 supercomputer, Lawrence Berkeley National Laboratory; 6384 nodes, 2.1 GHz 12-core CPUs × 2, 32 GB of RAM/node
Intrepid   An IBM Blue Gene/P supercomputer, Argonne National Laboratory

(23)

Performance

[Figure: time (sec.) versus number of cores (4320, 8640, 17280, 34560) for Simulation and Distance Field Construction]

(24)

Performance

Post-processing

[Figure: time (sec.) versus number of cores. First set of panels: Remote distance, Exchange, Local tree, Local distance, Distance volume, Accumulated time. Second set of panels: Local tree, Local distance, Distance volume, Accumulated time]

(25)

Applications

[Figure: Conventional Transfer Function compared with Distance-based Transfer Function]

(26)

Applications

Distance-based Transfer Function

(27)

Applications

Study material types applied on the front hood of the car: temperature distribution at different distances from the car

(28)

Applications

(29)

Conclusion and Future Work

Parallel Distance Field Construction

• Scalability
• Well balanced workload
• Low communication cost

Future work

• Storage?
• GPUs?

(30)

Acknowledgments

• National Science Foundation through grants IIS-1320229, CCF-1025269, and IIS-1423487

• Department of Energy through grants DE-FC02-06ER25777 and DE-FC02-12ER26072 with program managers Lucy Nowell and Ceren Susut-Bennett, and the ExaCT Center for Exascale Simulation of Combustion in Turbulence.

(31)

Thank you!
