Post-processing Filters

5.5 Rendering in RaaS

5.5.3 Post-processing Filters

RaaS provides a number of post-processing techniques with the aim of improving the quality of the final result. Generally these filters either have system-wide application and can be applied to synthesised frames irrespective of which rendering algorithm was used to generate them (e.g. tone mapping), or are tied to a

5. Rendering as a Service (RaaS) 103

Figure 5: Asynchronous decompression for image tiles received from worker processes.

A high-definition image may well occupy 6 MB of memory (1920×1080, 24-bit colour); the bandwidth requirements for interactive rendering are very demanding on the interconnect infrastructure since within a single second, multiple images (frames) must be transferred. This is further exacerbated by techniques that require passing rich buffers (normals, direct and indirect lighting contributions, albedo, etc.) between workers and Task Coordinator. In order to alleviate the effects of communication overheads and bandwidth limitations, all results sent back to the master are compressed using a lossless scheme.
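The figure quoted above can be checked with a little arithmetic. The sketch below is only illustrative: the function names and the frame rate of 30 frames per second are assumptions made for the example, not values taken from the text.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def bandwidth_bytes_per_second(width, height, bits_per_pixel, frames_per_second):
    """Raw transfer rate needed to move every frame uncompressed."""
    return frame_bytes(width, height, bits_per_pixel) * frames_per_second


# One 1920x1080 frame at 24-bit colour: roughly 6 MB.
size = frame_bytes(1920, 1080, 24)

# Assumed frame rate of 30 frames per second (hypothetical, for illustration).
rate = bandwidth_bytes_per_second(1920, 1080, 24, 30)

print(f"{size / 1e6:.2f} MB per frame, {rate / 1e6:.1f} MB/s at 30 frames per second")
```

Rich buffers multiply this cost, since each additional buffer sent alongside the colour values adds its own per-pixel payload.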

In a straightforward approach, the master would decompress the results as soon as they come in, but this would introduce a bottleneck, forcing the workers to wait for their next task more than is necessary. Moreover, if the workers return all at once, performance would degrade further due to the contention introduced at the master.

To minimise worker delays, a slightly different approach was taken, where decompression of results was decoupled from the receiving thread. Results are received into a circular buffer and a decompression thread asynchronously expands and orders them into a frame buffer (Fig. 5). This decoupling allows the master to respond to workers’ requests for work more quickly.
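A minimal sketch of this decoupling is given below, under several assumptions that are not taken from the text: `zlib` stands in for the unspecified lossless scheme, a bounded `queue.Queue` stands in for the circular buffer, and the names `decompress_worker` and `receive_results` are hypothetical.

```python
import queue
import threading
import zlib


def decompress_worker(inbox, frame, tile_bytes):
    """Expand compressed tiles and write each one at its slot in the frame buffer."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more results for this frame
            break
        tile_index, payload = item
        data = zlib.decompress(payload)
        offset = tile_index * tile_bytes
        frame[offset:offset + tile_bytes] = data  # ordering by tile index


def receive_results(results, tile_bytes, tile_count, capacity=4):
    """Receiving side: only enqueues results, never decompresses them."""
    inbox = queue.Queue(maxsize=capacity)  # bounded FIFO in place of a circular buffer
    frame = bytearray(tile_bytes * tile_count)
    worker = threading.Thread(target=decompress_worker,
                              args=(inbox, frame, tile_bytes))
    worker.start()
    for tile_index, payload in results:
        inbox.put((tile_index, payload))
    inbox.put(None)
    worker.join()
    return bytes(frame)


# Four 16-byte tiles arriving out of order, as they might from different workers.
tiles = [bytes([i]) * 16 for i in range(4)]
arrivals = [(i, zlib.compress(tiles[i])) for i in (2, 0, 3, 1)]
frame = receive_results(arrivals, tile_bytes=16, tile_count=4)
```

In this sketch `put` blocks when the queue is full, so the receiving thread stalls only if decompression falls behind by more than `capacity` results.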

5.3. Post-processing and filtering

RaaS provides a number of post-processing and filtering techniques with the aim of improving perceived output quality. These are categorised in two classes: those with system-wide application, which can be applied to synthesised frames irrespective of the rendering algorithm used to generate them (e.g., tone mapping), and those which are closely tied to a specific rendering algorithm (e.g., discontinuity buffer). Either class is applied to a frame before it is presented to the user.

5.3.1. Tone mapping

Tone mapping is performed on resultant frames to facilitate the transformation of high dynamic colour ranges into 8-bit RGB triplets. The system supports a number of techniques of varying quality and complexity [27, 10, 9].

Figure 6: Left shows generic post-processing filter configuration (client-side filters at the Client, centralised filters at the Task Coordinator, distributed filters at Workers 1..n); right shows two scenarios where filters are applied at different stages.

The amenability of a technique to parallelisation is an important factor in deciding where it should be applied (i.e., at what stage and by whom, Fig. 6 server-side). Thus, if a technique does not require access to regions of the image besides what is already available at a worker (e.g., global sigmoid operator), it can be applied using a distributed filter before the region is compressed and sent back to the Task Coordinator; otherwise it has to be applied at the Task Coordinator using a centralised filter.
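To illustrate why a per-pixel operator suits a distributed filter, the sketch below applies a simple sigmoid of the assumed form v / (v + sigma) to tiles independently and compares the stitched result with the whole-image result. The operator form, the fixed `sigma` parameter and all function names are assumptions made for the example; the operators actually used by the system are those cited above.

```python
def sigmoid_tonemap(value, sigma=1.0):
    """Map a non-negative radiance value into [0, 1) with v / (v + sigma)."""
    return value / (value + sigma)


def to_8bit(value):
    """Quantise a value in [0, 1] to an 8-bit channel."""
    return min(255, max(0, round(value * 255)))


def tonemap_tile(tile, sigma=1.0):
    """Tone map a tile given as rows of (r, g, b) radiance triplets."""
    return [[tuple(to_8bit(sigmoid_tonemap(c, sigma)) for c in pixel)
             for pixel in row]
            for row in tile]


# A tiny 2x4 high dynamic range image, split into a left and a right tile.
image = [
    [(0.0, 3.0, 0.0), (0.5, 0.5, 0.5), (7.0, 1.0, 0.2), (15.0, 15.0, 15.0)],
    [(0.1, 0.2, 0.3), (2.0, 4.0, 8.0), (0.0, 0.0, 0.0), (100.0, 50.0, 25.0)],
]
left = [row[:2] for row in image]
right = [row[2:] for row in image]

# Each worker maps its own tile; the Task Coordinator only stitches them.
stitched = [l + r for l, r in zip(tonemap_tile(left), tonemap_tile(right))]
whole = tonemap_tile(image)
```

Because each output pixel depends only on the corresponding input pixel and a parameter known to every worker, `stitched` and `whole` are identical. An operator whose parameters depend on whole-image statistics would need those statistics distributed first.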

5.3.2. Accumulation and temporal filtering

Interactive rendering imposes demands on frame generation times that limit the quality of global illumination solutions. Accumulation is used in conjunction with a form of progressive rendering to amortise the cost of computing a high-quality solution over a number of frames, when the scene and observer are static. When the state of the scene or observer changes, temporal filtering is used to minimise artefacts between frames. The contributions from both components are combined via a weighting function which favours the temporal contribution when the observer or scene is changing and shifts to the accumulation contribution otherwise.
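The text does not specify the weighting function, so the following is only a sketch under assumed choices: a running mean for accumulation and a linear ramp over a hypothetical `ramp_frames` parameter for the weight. All names here are illustrative.

```python
def accumulate(mean, sample, count):
    """Running mean after adding one more sample to `count` earlier samples."""
    return mean + (sample - mean) / (count + 1)


def accumulation_weight(frames_since_change, ramp_frames=8):
    """Weight of the accumulation term: 0 right after a change, 1 once static."""
    return min(1.0, frames_since_change / ramp_frames)


def combine(accumulated, temporal, frames_since_change, ramp_frames=8):
    """Blend the two contributions for one pixel channel."""
    w = accumulation_weight(frames_since_change, ramp_frames)
    return w * accumulated + (1.0 - w) * temporal


# Accumulating three noisy estimates of the same pixel while the view is static.
mean = 0.0
for count, sample in enumerate([2.0, 4.0, 6.0]):
    mean = accumulate(mean, sample, count)

# Just after a camera move the temporal term dominates; later it fades out.
just_moved = combine(accumulated=1.0, temporal=0.0, frames_since_change=0)
halfway = combine(accumulated=1.0, temporal=0.0, frames_since_change=4)
settled = combine(accumulated=1.0, temporal=0.0, frames_since_change=8)
```

Any monotonic ramp would serve the same purpose; the linear one is chosen here only because it is the simplest to inspect.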

5.3.3. Client-side post-processing

Computations which cannot be carried out by workers translate into a sequential component which limits scalability. Thus, the kind of computations carried out at the Task Coordinator are limited and performed centrally only if the costs of distributing them would outweigh the benefits of parallel computation. The penalty incurred by such computations is especially evident at high frame rates, where a sequential computation time of 100 ms is enough to limit the maximum frame rate to 10 Hz. To overcome this potential bottleneck, computations which do not incur additional communication overheads can be offloaded onto the client, provided the latter has enough


Figure 5.7: Three post-processing filter configurations are shown; the left configuration is generic, specifying which kind of filter goes where. For instance, distributed filters can execute at a worker process, while centralised filters execute on the Task Coordinator. Centralised filters may also execute at the client, but whether this is feasible or not depends on the communication required to satisfy the data dependencies of the filter. The configurations on the right exemplify two typical scenarios where filters are applied at different stages.

specific rendering algorithm (e.g. discontinuity buffer for interleaved sampling). Either way, filters are applied to a frame before it is presented to the user.

In the framework, the application of post-processing filters is categorised into two classes: distributed and centralised (Figure 5.7). Distributed filters are usually local filters and do not require a complete view of the image or additional associated buffers. Workers can apply distributed filters on their assigned tile, after radiance has been computed for the region. The tiles are then assembled at the Task Coordinator. On the other hand, centralised filters require information that is not entirely available to any single worker. For instance, in the case of the application of a spatial filter such as box blur, the filter iterates through each pixel in the image and replaces its colour with the average of the neighbouring pixels. The neighbourhood centred about each processed pixel is called the window, or


Figure 5.8: Centralised filters require more information than what is available at any individual worker. Image blurring is a typical example of a centralised filter; here a kernel is convolved with the image to achieve the blur effect, and it cannot be applied as a distributed filter at the individual workers because each worker lacks a complete view of the image. In cases where the communication overhead associated with aggregating the required data at a Task Coordinator is overly high, the regions marked by tiles can be padded to encompass any required additional information from adjacent tiles, albeit at the cost of duplicating computation.

boundaries of each tile, thus failing to account for logically adjacent pixels that would have fallen within the neighbourhood, had the entire image been processed instead. This introduces artefacts and discontinuities at tile boundaries when the tiles are composited into the final image, as shown in Figure 5.9.
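The seam can be reproduced on a single row of pixels. The sketch below is a minimal illustration, assuming a one-dimensional box blur whose window is clamped at the edges of whatever region it is given; `box_blur` and the sample values are hypothetical.

```python
def box_blur(row, radius):
    """Replace each value with the mean of its window, clamped at the region edges."""
    out = []
    for i in range(len(row)):
        lo = max(0, i - radius)
        hi = min(len(row), i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


# A hard edge that happens to coincide with the boundary between two tiles.
row = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Centralised: the filter sees the complete row.
centralised = box_blur(row, radius=1)

# At the workers, without padding: each tile is blurred in isolation.
tiled = box_blur(row[:4], radius=1) + box_blur(row[4:], radius=1)

# Indices where the two results disagree.
seam = [i for i, (a, b) in enumerate(zip(centralised, tiled))
        if abs(a - b) > 1e-12]
```

The two results differ only at the pixels on either side of the tile boundary, which is where the window of the tiled version was cut short.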

(a) Workers, no padding (b) Centralised or workers with padding

Figure 5.9: Artefacts from executing centralised filters at workers without padding tile boundaries. Figure 5.9a shows the result of applying a blurring filter at the workers without the necessary padding. Figure 5.9c provides a detail view of the seam running down the middle part of the image, where the horizontal tile boundaries lie. Figures 5.9b and 5.9d show the corresponding results when running the blur as a centralised filter or as a distributed filter on padded tiles.

The use of centralised filters prevents the computation from being amortised over the workers, introducing an additional sequential component to the rendering pipeline and a loss of scalability of the system in general. Moreover, communication between workers and master is also increased in cases where rich buffers are required for filtering, limiting system bandwidth. This is the case with geometry-aware post-processing filters, which may require depth and normal buffers together with the colour buffers, for instance. Conversely, distributed filters minimise communication overheads as only final colour values need be sent to the master process.

Notwithstanding the constraints over distributed filters and their local nature, a tradeoff can be made when the communication overheads of a centralised filter are too steep: the filter can be executed by the workers instead. Under the proviso that additional information required by the filter is available in spatially adjacent tiles, the region is enlarged, or padded, to encompass this information, forcing the system to render additional pixels at the borders of a tile. The filter is then applied on the padded tiles, and the padding discarded once the post-processing work is complete. The tile sent back to the master is free of discontinuities and artefacts. Unfortunately, the padding results in computation overlap, or overdraw, which grows with the number of tasks per frame; as emphasised, this approach is only viable when the loss in scalability or communication costs become excessive. Currently, the only way of determining the performance of one approach over the other is via empirical tests.
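The padding tradeoff can be sketched on a single row of pixels. The code below is an illustration under assumed details (a one-dimensional box blur, padding equal to the filter radius, hypothetical function names), not the system's implementation. It blurs each padded tile, discards the padding, and counts how many pixels had to be rendered.

```python
def box_blur(row, radius):
    """Replace each value with the mean of its window, clamped at the region edges."""
    out = []
    for i in range(len(row)):
        lo = max(0, i - radius)
        hi = min(len(row), i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


def blur_padded_tiles(row, tile_size, radius):
    """Blur tile by tile on padded regions; return the result and pixels rendered."""
    result, rendered = [], 0
    for start in range(0, len(row), tile_size):
        end = min(start + tile_size, len(row))
        lo = max(0, start - radius)        # padded region, clamped to the image
        hi = min(len(row), end + radius)
        blurred = box_blur(row[lo:hi], radius)
        rendered += hi - lo                # padding pixels are rendered too
        result.extend(blurred[start - lo:end - lo])  # discard the padding
    return result, rendered


row = [float(i % 5) for i in range(16)]
reference = box_blur(row, radius=1)        # centralised result

overdraw = {}
for tile_size in (8, 4, 2):
    result, rendered = blur_padded_tiles(row, tile_size, radius=1)
    assert result == reference             # no seams at tile boundaries
    overdraw[tile_size] = rendered - len(row)

print(overdraw)
```

With padding equal to the radius, every tiling reproduces the centralised result, while the number of extra pixels rendered rises as the tiles get smaller and more numerous.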

In document High fidelity graphics using unconventional distributed rendering approaches (Page 119-123)
