5.5 Rendering in RaaS
5.5.3 Post-processing Filters
RaaS provides a number of post-processing techniques with the aim of improving the quality of the final result. Generally these filters either have system-wide application and can be applied to synthesised frames irrespective of which rendering algorithm was used to generate them (e.g. tone mapping), or are tied to a
5. Rendering as a Service (RaaS) 103
Figure 5: Asynchronous decompression for image tiles received from worker processes.
A high-definition image may well occupy 6 MB of memory (1920×1080, 24-bit colour); the bandwidth requirements for interactive rendering are very demanding on the interconnect infrastructure, since multiple images (frames) must be transferred within a single second. This is further exacerbated by techniques that require passing rich buffers (normals, direct and indirect lighting contributions, albedo, etc.) between the workers and the Task Coordinator. To alleviate the effects of communication overheads and bandwidth limitations, all results sent back to the master are compressed using a lossless scheme.
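The 6 MB figure and the resulting bandwidth demand can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and the frame rate of 30 frames per second is an assumed value, not one stated in the text.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def bandwidth_mb_per_s(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Uncompressed transfer rate in megabytes (10^6 bytes) per second."""
    return frame_bytes(width, height, bits_per_pixel) * fps / 1e6


# One 1920x1080 frame at 24-bit colour: 6,220,800 bytes, i.e. roughly 6 MB.
hd_frame = frame_bytes(1920, 1080, 24)

# Assumed frame rate of 30 frames per second (not taken from the text).
hd_rate = bandwidth_mb_per_s(1920, 1080, 24, fps=30)
```

Under these assumptions the colour buffer alone needs about 187 MB/s before compression; each additional rich buffer of the same size adds the same amount again.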
In a straightforward approach, the master would decompress the results as soon as they come in, but this would introduce a bottleneck, forcing the workers to wait for their next task longer than necessary. Moreover, if the workers return all at once, performance would degrade further due to the contention introduced at the master. To minimise worker delays, a slightly different approach was taken, in which decompression of results is decoupled from the receiving thread. Results are received into a circular buffer, and a decompression thread asynchronously expands and orders them into a frame buffer (Fig. 5). This decoupling allows the master to respond to workers’ requests for work more quickly.
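The decoupling described above can be sketched as a producer/consumer pair. This is a minimal illustration under stated assumptions, not the RaaS implementation: a bounded `queue.Queue` stands in for the circular buffer, `zlib` stands in for the unspecified lossless scheme, and `TILE_BYTES`, `decompressor` and `receive_frame` are hypothetical names.

```python
import queue
import threading
import zlib

TILE_BYTES = 16  # hypothetical tile payload size, kept tiny for illustration


def decompressor(inbox: queue.Queue, frame: bytearray) -> None:
    """Expand compressed tiles and place each at its position in the frame buffer."""
    while True:
        item = inbox.get()
        if item is None:              # sentinel: no more tiles for this frame
            break
        tile_index, payload = item
        data = zlib.decompress(payload)
        start = tile_index * TILE_BYTES
        frame[start:start + len(data)] = data


def receive_frame(compressed_tiles, num_tiles: int) -> bytes:
    """Receiving side: hand each result over immediately, never decompress inline."""
    inbox = queue.Queue(maxsize=8)    # bounded queue standing in for the circular buffer
    frame = bytearray(num_tiles * TILE_BYTES)
    worker = threading.Thread(target=decompressor, args=(inbox, frame))
    worker.start()
    for tile_index, payload in compressed_tiles:
        inbox.put((tile_index, payload))  # blocks only if the buffer is full
    inbox.put(None)
    worker.join()
    return bytes(frame)


# Tiles arrive out of order, as they would from independent workers.
tiles = [bytes([i]) * TILE_BYTES for i in range(4)]
arrivals = [(i, zlib.compress(tiles[i])) for i in (2, 0, 3, 1)]
frame = receive_frame(arrivals, num_tiles=4)
```

Because the receiving loop only enqueues, it returns to servicing worker requests straight away; ordering is restored by the tile index rather than by arrival order.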
5.3. Post-processing and filtering
RaaS provides a number of post-processing and filtering techniques with the aim of improving perceived output quality. These are categorised in two classes: those with system-wide application, which can be applied to synthesised frames irrespective of the rendering algorithm used to generate them (e.g., tone mapping), and those which are closely tied to a specific rendering algorithm (e.g., discontinuity buffer). Filters of either class are applied to a frame before it is presented to the user.
5.3.1. Tone mapping
Tone mapping is performed on resultant frames to transform high-dynamic-range colour values into 8-bit RGB triplets. The system supports a number of techniques of varying quality and complexity [27, 10, 9].
Figure 6: Left shows generic post-processing filter configuration; right shows two scenarios where filters are applied at different stages.
The amenability of a technique to parallelisation is an important factor in deciding where it should be applied (i.e., at what stage and by whom; Fig. 6, server-side). Thus, if a technique does not require access to regions of the image besides what is already available at a worker (e.g., a global sigmoid operator), it can be applied using a distributed filter before the region is compressed and sent back to the Task Coordinator; otherwise it has to be applied at the Task Coordinator using a centralised filter.
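A per-pixel operator qualifies as a distributed filter because its result for a pixel depends on that pixel alone. The sketch below illustrates this with an assumed sigmoid of the form L / (1 + L) with fixed parameters; the operator actually used by the system is not specified in the text, and the function names are hypothetical.

```python
def sigmoid_tonemap(luminance: float) -> int:
    """Map a non-negative HDR value to an 8-bit value via L / (1 + L)."""
    mapped = luminance / (1.0 + luminance)
    return min(255, int(mapped * 255.0 + 0.5))


def tonemap_region(region):
    """Apply the operator to every pixel of a 2D region (a list of rows)."""
    return [[sigmoid_tonemap(v) for v in row] for row in region]


image = [[0.0, 0.5, 1.0, 4.0],
         [9.0, 0.25, 3.0, 100.0]]

# Centralised: the operator sees the whole image.
whole = tonemap_region(image)

# Distributed: two workers each tone-map their own 2x2 tile before sending it back.
left = tonemap_region([row[:2] for row in image])
right = tonemap_region([row[2:] for row in image])
assembled = [l + r for l, r in zip(left, right)]
```

The assembled tiles match the centralised result exactly, so nothing is lost by running the operator at the workers. An operator whose parameters depend on image-wide statistics would not have this property without an extra communication step.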
5.3.2. Accumulation and temporal filtering
Interactive rendering imposes demands on frame generation times that limit the quality of global illumination solutions. Accumulation is used in conjunction with a form of progressive rendering to amortise the cost of computing a high quality solution over a number of frames, when the scene and observer are static. When the state of the scene or observer changes, temporal filtering is used to minimise artefacts between frames. The contributions from both components are combined via a weighting function which favours the temporal contribution when the observer or scene is changing and shifts to the accumulation contribution otherwise.
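The text does not give the weighting function itself, so the following is only one plausible shape for it, shown for a single pixel value. It assumes a running mean for accumulation, an exponential blend for temporal filtering, and a linear weight driven by a hypothetical `motion` value in [0, 1]; all names and the default `alpha` are illustrative.

```python
def accumulate(mean: float, sample: float, count: int) -> float:
    """Running mean after adding the count-th sample (count >= 1)."""
    return mean + (sample - mean) / count


def temporal(previous: float, sample: float, alpha: float = 0.5) -> float:
    """Exponential blend of the previous output with the new sample."""
    return alpha * sample + (1.0 - alpha) * previous


def combine(accumulated: float, filtered: float, motion: float) -> float:
    """Blend both contributions; motion = 1 means the view or scene is changing."""
    w = max(0.0, min(1.0, motion))
    return w * filtered + (1.0 - w) * accumulated


# Static view: three noisy samples of the same pixel converge on their mean.
mean = 0.0
for count, sample in enumerate([1.0, 0.0, 0.5], start=1):
    mean = accumulate(mean, sample, count)

static_pixel = combine(accumulated=0.4, filtered=0.9, motion=0.0)
moving_pixel = combine(accumulated=0.4, filtered=0.9, motion=1.0)
```

With `motion` at 0 the output is the accumulated value, and with `motion` at 1 it is the temporally filtered value; intermediate values give a gradual hand-over between the two.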
5.3.3. Client-side post-processing
Computations which cannot be carried out by workers translate into a sequential component which limits scalability. Thus, the kinds of computation carried out at the Task Coordinator are limited, and are performed centrally only if the costs of distributing them would outweigh the benefits of parallel computation. The penalty incurred by such computations is especially evident at high frame rates, where a sequential computation time of 100 ms is enough to limit the maximum frame rate to 10 Hz. To overcome this potential bottleneck, computations which do not incur additional communication overheads can be offloaded onto the client, provided the latter has enough
Figure 5.7: Three post-processing filter configurations are shown; the left configuration is generic, specifying which kind of filter goes where. For instance, distributed filters can execute at a worker process, while centralised filters execute on the Task Coordinator. Centralised filters may also execute at the client, but whether this is feasible or not depends on the communication required to satisfy the data dependencies of the filter. The configurations on the right exemplify two typical scenarios where filters are applied at different stages.
specific rendering algorithm (e.g. discontinuity buffer for interleaved sampling). Either way, filters are applied to a frame before it is presented to the user. In the framework, the application of post-processing filters is categorised into two classes: distributed and centralised (Figure 5.7). Distributed filters are usually local filters and do not require a complete view of the image or additional associated buffers. Workers can apply distributed filters on their assigned tile, after radiance has been computed for the region. The tiles are then assembled at the Task Coordinator. On the other hand, centralised filters require information that is not entirely available to any single worker. For instance, in the case of the application of a spatial filter such as box blur, the filter iterates through each pixel in the image and replaces its colour with the average of the neighbouring pixels. The neighbourhood centred about each processed pixel is called the window, or
Figure 5.8: Centralised filters require more information than what is available at any individual worker. Image blurring is a typical example of a centralised filter; here a kernel is convolved with the image to achieve the blur effect, and it cannot be applied as a distributed filter at the individual workers because each worker lacks a complete view of the image. In cases where the communication overhead associated with aggregating the required data at a Task Coordinator is overly high, the regions marked by tiles can be padded to encompass any required additional information from adjacent tiles, albeit at the cost of duplicating computation.
boundaries of each tile, thus failing to account for logically adjacent pixels that would have fallen within the neighbourhood, had the entire image been processed instead. This introduces artefacts and discontinuities at tile boundaries when the tiles are composited into the final image, as shown in Figure 5.9.
The use of centralised filters prevents the computation from being amortised over the workers, introducing an additional sequential component to the rendering pipeline and a loss of scalability of the system in general. Moreover, communication between workers and master is also increased in cases where rich
(a) Workers, no padding (b) Centralised or workers with padding
(c) Discontinuity at tile boundaries (d) No discontinuities
Figure 5.9: Artefacts from executing centralised filters at workers without padding tile boundaries. Figure 5.9a shows the result of applying a blurring filter at the workers without the necessary padding. Figure 5.9c provides a detail view of the seam running down the middle part of the image, where the horizontal tile boundaries lie. Figures 5.9b and 5.9d show the corresponding results when running the blur as a centralised filter or as a distributed filter on padded tiles.
buffers are required for filtering, limiting system bandwidth. This is the case with geometry-aware post-processing filters, which may require depth and normal buffers in addition to the colour buffers. Conversely, distributed filters minimise communication overheads, as only final colour values need be sent to the master process. Notwithstanding the constraints on distributed filters and their local nature, a trade-off can be made when the communication overheads of a centralised filter are too steep: the filter is executed by the workers instead. Provided that the additional information required by the filter is available in spatially adjacent tiles, the region is enlarged, or padded, to encompass this information, forcing the system to render additional pixels at the borders of a tile. The filter is then applied to the padded tiles, and the padding is discarded once the post-processing work is complete. The tile sent back to the master is free of discontinuities and artefacts. Unfortunately, the padding results in computation overlap, or overdraw, which grows with the number of tasks per frame; as emphasised, this approach is only viable when the loss in scalability or the communication costs become excessive. Currently, the only way of determining the performance of one approach over the other is via empirical tests.
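The effect of padding can be reproduced on a single row of pixels. The sketch below is a simplified one-dimensional illustration rather than the framework's filter code; `box_blur`, `blur_tiled` and their parameters are hypothetical names, and the padding is assumed to be at least as wide as the filter radius.

```python
def box_blur(row, radius):
    """1D box blur; the window is clamped at the ends of the given row."""
    out = []
    for i in range(len(row)):
        lo, hi = max(0, i - radius), min(len(row), i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out


def blur_tiled(row, tile_width, radius, pad):
    """Blur each tile independently, optionally padded by `pad` pixels per side."""
    out, rendered = [], 0
    for start in range(0, len(row), tile_width):
        end = min(len(row), start + tile_width)
        lo, hi = max(0, start - pad), min(len(row), end + pad)
        region = row[lo:hi]                       # pixels this worker must render
        rendered += len(region)
        blurred = box_blur(region, radius)
        out.extend(blurred[start - lo:end - lo])  # discard the padding
    return out, rendered


row = [0.0] * 4 + [1.0] * 4           # hard edge exactly on the tile boundary
reference = box_blur(row, radius=1)   # centralised: complete view of the image
unpadded, cost_unpadded = blur_tiled(row, tile_width=4, radius=1, pad=0)
padded, cost_padded = blur_tiled(row, tile_width=4, radius=1, pad=1)
```

Without padding, the two tiles are blurred in isolation and the edge between them stays sharp, which is the seam seen in Figure 5.9c. With one pixel of padding, the result is identical to the centralised blur, at the cost of rendering 10 pixels instead of 8; this overdraw increases as the image is split into more tiles.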