Threading Building Blocks - System Architecture

3.2 System Architecture

4.1.2 Threading Building Blocks

Intel TBB flow graph [16] is used in order to express our computation in terms of a flow graph. As Figure 4.1 implies, the computation we execute can be expressed as a flow graph, with nodes that indicate computations and edges between nodes that express the data flow between them. TBB flow graph helps us to express such a computation, designing a graph with different data dependencies between nodes. It also helps us to parallelize our solution, since each independent node can be run using as many threads as a user wishes.

4 . 1 . T E C H N O L O G I E S U S E D

TBB flow graph provides several types of nodes, each with different functionalities and use cases. Here, we showcase the ones we take advantage of:

Source Node The source node is the node where all flow graphs begin. It has no pre- decessors. Also, it is always serial so it can never be executed concurrently. Its important then, to make sure that its body is not a bottleneck and does not starve its successors. The source node works by evaluating a condition which, if true, will keep producing more data. If no successors accept the message it produces, it will be kept in a buffer until another successor is added or requests it.

Function Node This node receives an input, applies its body to it, and outputs the data to its successors. It is the main computation node, as it receives data, performs calculations and broadcasts the output. The (maximum) concurrency of this node can be specified in its constructor. If one sets the concurrency totbb::flow::serial, its

body will never execute concurrently, while if its set to tbb::flow::unlimited, there

will be no concurrency limit. When set totbb::flow::unlimited, it is important to note

that as soon as a message arrives to the function node, a task will be spawned to process it, but this does not mean, however, that a thread will be created. Tasks can only spawn threads from the available library’s thread pool. This means there will be no excess of threads, resulting in hindered performance. One can also set the concurrency level to a specified number such as 8 which means that, at most, 8 tasks will be spawned.

We also iterate upon TBB flow graph and, with the help of the Marrow library in [23] and [9], add additional functionality. We create two additional nodes, that allow us to spread this flow graph through the network, and not be limited to one machine. This is the functionality that Storm offers. We decided to opt with this solution instead of storm, due to the difficulty that lies in implementing a solution that has to constantly load information to and from the GPU, as well as execute algorithms on it. It would introduce unnecessary overheads to design a solution that uses Storm, as a lot of information would have to be loaded to the Storm topology and then fetched from it. The solution we designed turns out to be much simpler to implement, easier to use and very efficient at its task. There was no need to introduce additional complexity to our solution. Its objective was to design two nodes that enable communication between machines, and abstract TBB from the fact that the two nodes connected are, in fact, residing on different machines. As far as TBB is aware, the flow graph on the first machine ends on the node responsible for sending the information to the remote machine. On the second machine, TBB knows nothing about where the data is coming from, it is merely reading it from a node that parses information from the network, just like a typical source node would. Essentially, we split a flow graph in two, distribute it between two different machines and introduce a way to communicate necessary data between them. In order to support this functionality, we introduce two new nodes:

Network Source Node The network source node has a predecessor that resides on another machine. It serves as a typical TBB flow graph source node in order to start a new flow graph in a different machine. It feeds on information sent to it by the client node (described below) through the network. In order to create a network source node, the user needs to specify the port in which the server will run. Afterwards, the node will listen on that port for incoming data and, as soon as it arrives, it feeds it to its successors, just like a source node would. The dashed lines in Figure4.1

illustrate this communicate between two different machines using a network source node and a client node.

Client Node The client node has a predecessor (typically a normal TBB flow graph function node) that sends information to it. It then sends that data, through the network, to a network source node. The successor of this node is always a network source node residing on a different machine. A user need only specify the hostname and port in which the network source node is running, and it will take care of the rest, sending data along, and successfully enabling communication between machines through the use of flow graphs.

In document Image Stream Similarity Search in GPU Clusters (Page 66-68)