RTNeuron - Tide Remote Parallel Rendering
F PS 0 6 12 18 24 30 36 Resolution 2K 4K 8K 256² 512² 1024² 1920x1080 512² Tile Size Resolution 2K 4K 8K 75% 95% 100%
Tile Size JPEG Compression Quality
Benchmark 3.d: Remote RTNeuron - Tide Parallel Rendering
based application used in the Blue Brain Project to analyse results from detailed simulations of neuronal simulations.
FDR InfiniBand Four-Rack BlueGene/Q Visualization Node Tesla K20 Visualization Node Tesla K20 Tesla K20 40 total Tesla K20 Visualization Node GTX 580 GTX 580 GTX 580 Visualization Node GTX 580 GTX 580 GTX 580 10 GBit/s WAN Link Lausanne 10 GE 10 GE QDR IB GPFS Lugano
Figure 3.14: Remote Streaming Scenario
The results show that interactive frame rates are available even at the full na-
tive resolution, that a 5122
tile size is the best op- tion, and that 95% compres- sion delivers the best perfor- mance in most cases. Based on the experiments we set-
tled on a 5122 tile size and
100% compression quality to avoid artefacts as the default settings in Equalizer.
Decoupling the display system and software from the rendering system has many benefits. It increases robustness, provides reliably performance on a shared, collaborative device, facilitates media and device inteagration, and minimises data movement.
3.6
Compositing
In contrast to most other parallel rendering frameworks, Equalizer decouples the compositing algorithm from the task decomposition. This is a key aspect of our architecture, allowing a flexible configuration, often in many unforeseen ways.
The compound tree with its task decomposition, input and output frames, is a specialised description to “program” scalable rendering across parallel resources. Compositing is configured using output frames connected to input frames, compound tasks and eye passes, as well as frame parameters. In its simplest form, a sort-first source compound provides an output frame, which is routed to an input frame using the same name on the destination compound. The source viewport de- composes the task, and the output frame collects this partial result from the source channel to composite it using the correct offset onto the destination channel.
Frame parameters customise pixel handling. They include the transfer buffers (colour, depth), partial channel viewport, pixel zooming (upscale and downscale), and transport method (on-GPU texture or CPU memory). An output frame may be connected to multiple input frames. Frame parameters are used together with compound tasks for parallel compositing, and advanced features such as monitor- ing and dynamic frame resolution, introduced in Chapter 6.
readback
send/receive
composite readback
gather tiles
source1
(destination) source 2 source 3
send/receive
Figure 3.15: Direct Send Compositing
Figure 3.15 shows the pixel flow of direct send compositing for a three- way sort-last decomposition, where the destination chan- nel also contributes to the
rendering. In the first
step, each channel exchanges two colour+depth tiles with its neighbours, and then z- composites its own tile. This yields one complete tile on each channel, of which two colour tiles are then assem- bled on the destination chan- nel, where the third tile is al- ready in place.
The corresponding com- pound tree is shown schemat- ically in Figure 3.16. Each of the three channels has a child compound to execute the rendering and read back two incomplete tiles for sort-last compositing on the corresponding two sibling compounds. These three leaf compounds represent the first step in Figure 3.15. One level up, each channel receives two tiles and assem- bles them onto its partially rendered result, creating a complete tile (middle step in Figure 3.15). For the two source-only channels, a final colour-only output image is connected to the destination channel. The arrows illustrate the pixel flow for one of the tiles.
3.6 Compositing 37
range [0 ⅓]
CLEAR, DRAW, READBACK
outputframe "f1.dest" viewport [ 0 ⅓ 1 ⅓ ] outputframe "f2.dest" viewport [ 0 ⅔ 1 ⅓ ] channel "source1"
ASSEMBLE, READBACK inputframe "f2.dest" inputframe "f2.source2"
outputframe COLOR "frame.source1" viewport [ 0 ⅔ 1 ⅓ ] channel "destination"
buffer [ COLOR DEPTH ] CLEAR, DRAW, ASSEMBLE
channel "source2"
ASSEMBLE, READBACK inputframe "f2.source1" inputframe "f1.dest"
outputframe COLOR "frame.source2" viewport [ 0 ⅓ 1 ⅓ ]
inputframe "f1.source1" inputframe "f1.source2" inputframe "frame.source1" inputframe "frame.source2"
range [⅓ ⅔]
CLEAR, DRAW, READBACK
outputframe "f1.source1" viewport [ 0 0 1 ⅓ ] outputframe "f2.source1" viewport [ 0 ⅓ 1 ⅓ ]
range [⅔ 1]
CLEAR, DRAW, READBACK
outputframe "f1.source2" viewport [ 0 ⅓ 1 ⅓ ] outputframe "f2.source2" viewport [ 0 ⅔ 1 ⅓ ]
Figure 3.16: Direct Send Compound
For most rendering ap- plications even a relatively complex setup such as this example requires very lit- tle programmer involvement. The abstractions provided by the resource description, ren- der context and compounds enable Equalizer to recon- figure the application almost
transparently. For polygo-
nal rendering, it suffices that the application honours the
range parameter of the ren- der context to decompose its rendering. All other tasks, in particular the parallel com- positing and pixel transfers, are fully handled by Equal- izer. Applications which re- quire ordered compositing, for example volume render- ing, overwrite the assemble
callback to reorder the input frames correctly, before passing them on to the com- positing code.
The abstraction through frames is flexible, but still allows many architectural optimisations:
Unordered Compositing: Unless overwritten by the application, Equalizer will composite all input frames by default in the order they become available, not in the order they are configured. In the example in Figure 3.15, the des- tination channel will assemble its four input frames one by one as the output frame is received. Due to asynchronous execution and resulting pipelining of operations, the availability changes each frame depending on the runtime of other tasks.
Parallel Compression, Downloads and Network Transfers: Compressing, trans- mitting and receiving frames between nodes is handled by threads indepen- dent from the render thread. GPU transfers are handled by asynchronous PBO transfers. Pipelining all these operations with the actual rendering sig- nificantly minimises the compositing overhead.
On-GPU Transfers: Occasionally the source and destination channel are on the same GPU. Using textures as pixel buffers minimises overhead for the out- put to input frame transfer.
In Chapter 5 we provide detailed information on our compositing advances.
3.7
Load Balancing
Compounds provide only a static configuration of the parallel rendering setup.
Equalizers are algorithms hooking into a compound and modify parameters of their respective subtree at runtime, to dynamically optimise the resource usage. Each equalizer focuses on tuning one aspect of the decomposition, allowing them to be composited in a configuration. Due to their nature, they are transparent to application developers, but might have application-accessible parameters to tune their behaviour. Resource equalisation is the critical component for scalable par- allel rendering, and therefore the eponym for the Equalizer project name.
Compounds are a static snapshot of a configuration, and equalizers provide dynamic configuration on top. This separation of responsibilities is an important architectural component of our parallel rendering framework. In Chapter 6 we provide an extensive overview over the available equalizers.