• No results found

Understand Java Parallel Streams Internals: Mapping Onto the Common Fork-Join Pool

N/A
N/A
Protected

Academic year: 2021

Share "Understand Java Parallel Streams Internals: Mapping Onto the Common Fork-Join Pool"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

Understand Java Parallel Streams Internals:

Mapping Onto the Common Fork-Join Pool

Douglas C. Schmidt

[email protected]

www.dre.vanderbilt.edu/~schmidt

Professor of Computer Science

Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA

(2)

2

• Understand parallel stream internals, e.g.

• Know what can change & what can’t • Partition a data source into “chunks”

• Process chunks in parallel via the common fork-join pool

• Recognize how parallel streams are mapped onto the common fork-join pool framework

Learning Objectives in this Part of the Lesson

(3)

3

Mapping Parallel Streams

Onto the Java Fork-Join Pool

(4)

4

• Each worker thread in the common fork-join pool runs a loop scanning for tasks to run

(5)

5

• Each worker thread in the common fork-join pool runs a loop scanning for tasks to run

In this lesson, we just care about tasks associated with parallel streams

map(phrase -> searchForPhrase(…))

filter(not(SearchResults::isEmpty))

collect(toList())

45,000+ phrases

Search Phrases

(6)

6

• Each worker thread in the common fork-join pool runs a loop scanning for tasks to run

• Goal is to keep worker threads & cores as busy as possible!

(7)

7

• Each worker thread in the common fork-join pool runs a loop scanning for tasks to run

• Goal is to keep worker threads & cores as busy as possible!

• A worker thread has a “double-ended queue” (aka “deque”) that serves as its main source of tasks

See en.wikipedia.org/wiki/Double-ended_queue

Sub-Task1.1 Sub-Task1.2

Sub-Task1.3 Sub-Task3.3 Sub-Task3.4

WorkQueue WorkQueue WorkQueue

Sub-Task1.4

(8)

8

• The parallel streams framework automatically creates fork-join tasks that are run by worker threads in the common fork-join pool

map(phrase -> searchForPhrase(…)) filter(not(SearchResults::isEmpty)) collect(toList()) 45,000+ phrases Search Phrases Sub-Task1.1 Sub-Task1.2 Sub-Task1.3 Sub-Task3.3 Sub-Task3.4

WorkQueue WorkQueue WorkQueue

Sub-Task1.4

(9)

9

• The AbstractTask super class is used by most fork-join tasks to implement the parallel streams framework

abstract class AbstractTask ... { ... public void compute() {

Spliterator<P_IN> rs = spliterator, ls; boolean forkRight = false; ...

while(... (ls = rs.trySplit()) != null){ K taskToFork;

if (forkRight)

{ forkRight = false; ... taskToFork = ...makeChild(rs); } else

{ forkRight = true; ... taskToFork = ...makeChild(ls); } taskToFork.fork();

} } ...

See openjdk/8-b132/java/util/stream/AbstractTask.java

Manages splitting logic, tracking of child tasks, & intermediate results

(10)

10

• The AbstractTask super class is used by most fork-join tasks to implement the parallel streams framework

abstract class AbstractTask ... { ... public void compute() {

Spliterator<P_IN> rs = spliterator, ls; boolean forkRight = false; ...

while(... (ls = rs.trySplit()) != null){ K taskToFork;

if (forkRight)

{ forkRight = false; ... taskToFork = ...makeChild(rs); } else

{ forkRight = true; ... taskToFork = ...makeChild(ls); } taskToFork.fork();

} } ...

Decides whether to split a task further and/or compute it directly

(11)

11

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#trySplit

• The AbstractTask super class is used by most fork-join tasks to implement the parallel streams framework

abstract class AbstractTask ... { ... public void compute() {

Spliterator<P_IN> rs = spliterator, ls; boolean forkRight = false; ...

while(... (ls = rs.trySplit()) != null){ K taskToFork;

if (forkRight)

{ forkRight = false; ... taskToFork = ...makeChild(rs); } else

{ forkRight = true; ... taskToFork = ...makeChild(ls); } taskToFork.fork();

} } ...

Keep partitioning input source until trySplit() returns null

(12)

12

• The AbstractTask super class is used by most fork-join tasks to implement the parallel streams framework

abstract class AbstractTask ... { ... public void compute() {

Spliterator<P_IN> rs = spliterator, ls; boolean forkRight = false; ...

while(... (ls = rs.trySplit()) != null){ K taskToFork;

if (forkRight)

{ forkRight = false; ... taskToFork = ...makeChild(rs); } else

{ forkRight = true; ... taskToFork = ...makeChild(ls); } taskToFork.fork();

} } ...

Alternate which child is forked to avoid biased spliterators

(13)

13

• The AbstractTask super class is used by most fork-join tasks to implement the parallel streams framework

abstract class AbstractTask ... { ... public void compute() {

Spliterator<P_IN> rs = spliterator, ls; boolean forkRight = false; ...

while(... (ls = rs.trySplit()) != null){ K taskToFork;

if (forkRight)

{ forkRight = false; ... taskToFork = ...makeChild(rs); } else

{ forkRight = true; ... taskToFork = ...makeChild(ls); }

taskToFork.fork(); }

} ...

Fork a new sub-task & continue processing the other in the loop

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html#fork

(14)

14

• The AbstractTask super class is used by most fork-join tasks to implement the parallel streams framework

abstract class AbstractTask ... { ... public void compute() {

Spliterator<P_IN> rs = spliterator, ls; boolean forkRight = false; ...

while(... (ls = rs.trySplit()) != null){ ...

}

task.setLocalResult(task.doLeaf()); } ...

After trySplit() returns null this method typically calls forEachRemaining(), which then processes all elements sequentially by calling tryAdvance()

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#forEachRemaining

(15)

15

• After the AbstractTask.compute() method calls fork() on a task this task is pushed onto the head of its worker thread’s deque

Sub-Task1.1 Sub-Task1.2

Sub-Task1.3 Sub-Task3.3

Sub-Task3.4

Sub-Task2.4

WorkQueue WorkQueue WorkQueue

Sub-Task1.4

See gee.cs.oswego.edu/dl/papers/fj.pdf 2.push() 1.fork()

(16)

16

• Each worker thread processes

its deque in LIFO order Sub-Task

1.1

Sub-Task1.2

Sub-Task1.3 Sub-Task3.3

Sub-Task3.4

Sub-Task2.4

WorkQueue WorkQueue WorkQueue

Sub-Task1.4

2.pop() 1.join()

See en.wikipedia.org/wiki/Stack_(abstract_data_type)

(17)

17

• Each worker thread processes its deque in LIFO order

• A task pop’d from the head of a deque is run to completion

Sub-Task1.1 Sub-Task1.2

Sub-Task1.3 Sub-Task3.3

Sub-Task3.4

Sub-Task2.4

WorkQueue WorkQueue WorkQueue

Sub-Task1.4

2.pop() 1.join()

See en.wikipedia.org/wiki/Run_to_completion_scheduling

(18)

18

• Each worker thread processes its deque in LIFO order

• A task pop’d from the head of a deque is run to completion

• LIFO order improves locality of reference & cache performance

Sub-Task1.1 Sub-Task1.2

Sub-Task1.3 Sub-Task3.3

Sub-Task3.4

Sub-Task2.4

WorkQueue WorkQueue WorkQueue

Sub-Task1.4

See en.wikipedia.org/wiki/Locality_of_reference 2.pop()

1...

(19)

19

• To maximize core utilization, idle worker threads “steal” work from the tail of busy threads’ deques

See upcoming lessons on “The Java Fork-Join Framework”

Sub-Task1.2 Sub-Task1.3 Sub-Task1.4 Sub-Task1.1 Sub-Task3.3 Sub-Task3.4

WorkQueue WorkQueue WorkQueue

poll()

(20)

20

End of Understand Java Parallel

Stream Internals: Mapping Onto

References

Related documents

Nevertheless, the Defendants in both cases resolved to continue to argue that there must be a requirement for a level at which material increase of risk is

In order to determine the optimum processing parameters four factors (percentage of kenaf fibers, processing temperature, time and pressure) were identified as likely

The framework currently supports two different kinds of constraints: Simple UIMA Ruta rules, which express spe- cific expectations concerning the relationship of annotations,

Abstract Real-time interactive virtual classroom is an important type of dis- tance learning. However currently available systems are insufficient in the sup- port of large-scale

Abstract—We experimentally demonstrate a discrete wavelet multitone (DWMT) modulation scheme based on pulse amplitude modulation (PAM) for next generation passive optical

The disk structure and ev luti n are m dified by the presence f the bi- nary. Res nances emit waves and pen disk gaps. The binary’s t tal mass and mass rati as well as rbital

During the month of November a total of 246 names were sold directly to buyers as package deals, including one package that sold for 7-figures to an individual investor in

Interestingly, we often find evangelistic fervour when it comes to supporting Waterfall or Agile software development methodologies and their associated project