Elasticity - High fidelity graphics using unconventional distributed rendering approaches

5.6 Results

5.6.3 Elasticity

The system needs to be elastic enough to respond quickly to user demands when the computation load increases and dispose of unutilised resources when it de- creases. To measure the response of the system with respect to resource provisioning, two tests are carried out. In the first test, two jobs, J0 and J1, are

submitted to the system and their worker allocations are varied using a sinusoidal function of time. The configuration parameters for the two jobs can be seen in Table 5.2, under Elasticity I. The results are shown in Figure 5.19; within ap- proximately 300 seconds, the workers allocated to job J0 scale from 48 up to 64

and down to 32 workers, which is reflected in the frame rate fluctuating between 8 and 12 Hz, while J1’s resources are scaled from 32 to 64 and back to 32, with

the frame rate varying between 15 to 25 Hz. The number of workers W at time

t are given by:

W = 48 + 16 sin 2π(t+φ) 300 , (5.7)

where φ = 0 for J0 and φ = 16 for J1. In the second test (see Figure 5.20), an

equal-split resource partitioning scheme is assumed. Four jobs are admitted into the system with arrival times of 0,100,200 and 400 seconds respectively. The configuration parameters for the four jobs is given in Table 5.2, under Elasticity II. J0 starts with 128 workers; at time t = 100, J1 is started, J0’s allocation is

5. Rendering as a Service (RaaS) 122

Experiment Job Scene Technique Arrival Time

Client Scalability

J0 Sibenikˇ IGI-X 0

J₁ Conference IGI-X 25

J2 Sponza PT 50

Elasticity I J0 Conference IGI-X 0

J1 Sibenikˇ IGI-X 0 Elasticity II J₀ Sibenikˇ Whitted 0 J1 Conference Whitted 100 J2 Conference IGI-X 200 J3 Sibenikˇ Whitted 400

Table 5.2: The table shows the job details for the scalability and elasticity tests. Rendering output was set to a resolution of 512_×512 for all of the scenes listed.

equally split between the two jobs, and so forth with the remaining J2 and J3.

The system responds timely in both scenarios and is able to quickly bind and boot (though object lazy-loading, see §5.4.3) an idle resource to a running job, as well as promptly detach a resource from an active job and move it back to the free pool.

5.7 Discussion

Figure 5.21 shows the parallel efficiency of the system for different resolutions and worker configurations. There is reasonable scalability for up to 64 processors, with an average parallel efficiency of 72% for both resolutions. Given the fixed problem size, it is expected for the parallel efficiency to decrease further as more processors are added. Reducing tile sizes to achieve a finer task granularity increases the relative communication and setup time, which become the dominant time factor. The overhead of synchronisation and the large bandwidth requirements per frame per second constrain the implementation to a maximum frame rate of 30 Hz. Rendering at high-definition (1080p), which nowadays is a common target, would reduce this figure significantly due to the higher bandwidth and computational requirements, especially for higher fidelity techniques. The rendering techniques have been implemented entirely on the CPU; in Figure 5.15b, it can be seen that compared with other works, the implementations are not well-optimised. For instance, Benthin et al. (2003) have recorded frame rates of up to 16 Hz for the Conference room rendered at video resolution (640 _× 480), with 12 VPLs per

5. Rendering as a Service (RaaS) 123 100 ₂₀₀ ₃₀₀ 400 ₅₀₀ ₆₀₀ 700 9 10 11 40 60

Time (in s) Frame Rate (in Hz)

ork

ers

(a) Change in frame rate for job J0 as resource allocation varies sinusoidally with

respect to time. 100 ₂₀₀ ₃₀₀ 400 ₅₀₀ ₆₀₀ 700 15 20 40 60

Time (in s) Frame Rate (in Hz)

ork

ers

(b) Change in frame rate for job J1 as resource allocation varies sinusoidally with

respect to time. 0 100 200 300 400 500 600 700 10 15 20 25 Time (in s) F rame Rate (in Hz) 40 50 60

sinusoidally with respect to time.

Figure 5.19: The resource allocation for two concurrent jobs, J0 andJ1, has been

artificially throttled such that the number of workers assigned to each job varied sinusoidally against time, from 32 to 64. In 5.19a and 5.19b, the frame rate against the allocated number of resources for each job is shown. 5.19c shows the results for both jobs superimposed.

5. Rendering as a Service (RaaS) 124 0 100 200 400 600 800 5 10 15 20 25 30 Time (in s) F rame Rate (in Hz) J0 - ˇSibenik, Whitted J1 - Conference, Whitted J2 - Conference, IGI-X J3 - ˇSibenik, Whitted

Figure 5.20: Effects of even resource distribution on frame rate.

pixel running on 48 cores, while our implementation of IGI runs at 3 Hz and IGI-X at 10 Hz, albeit both use 32 VPLs per pixel. In this case, the generality of the system works against it, because less assumptions can be made on the roles of workers. Benthin et al. (2003) use a well-defined system, where shared memory communication is employed between rendering threads running on the same physical machine. Within RaaS, the individual cores at any given node can be assigned to different jobs, and there is no guarantee that any assigned workers would be running on the same physical machine. Thus, all communication occurs via message passing.

The reprovisioning of resources in figures 5.19 and 5.20 show the system’s ca- pacity for elasticity. Particularly in Figure 5.19, at 145_≤t_≤155, the system has to respond with a second to changes in the resource allocation ofJ₀. Thus, for the particular setup, a guarantee can be given that reprovisioning can occur within a second of a request being issued, provided physical resources are available. Resource migration, which determines how resources are detached from one job and assigned to another, was targeted at interactive systems, and thus, resources are detached at specific checkpoint during synchronisation stage. Hence, once a frame starts being computed, a resource cannot be preempted into idle state. Jobs running with very high-quality settings may degrade into non-interactivity

5. Rendering as a Service (RaaS) 125 8 16 32 48 64 0.7 0.8 0.9 1 Workers Efficiency (%) 512 _× 512 1024 _× 1024

Figure 5.21: Parallel efficiency for different worker configurations.

and increase the time a resource takes to switch to idle. Resources revert to idle state whenever they have been released, even when explicitly released for the purpose of allocation to another job. The release-request cycle is not atomic; thus, a resource released to be allocated to a job, say J₁, may be out run by another job J2, which issues its request, after the release of the resource, but prior

to its allocation to J1. This behaviour is triggered by the way release-request

mechanism works: when a job requires additional resources, a request is made to the Resource Manager. If no available resources are found, a release command may be sent to one or more jobs, to force them to relinquish some resources. The original requester may then try to ask for the resources again. To prevent a job running ahead of another during an allocation, unsatisfied requests could be queued, with the requesting job being informed once the resources are available, via a callback mechanism. This solution could also solve any potential starvation problems that are introduced due to the non-atomic nature of resource allocation. This work was carried out during a period where GPUs were becoming more relevant in the acceleration of high-fidelity rendering; in hindsight, the use of GPUs could have helped reduce computation times and consequently, the number of workers, in turn leading to greater possibilities of batching during both processing and communication. This is especially true in the light of the advances in virtualisation of GPUs and the adoption of GPU cloud computing by services vendors. Furthermore, it is clear that a cost model for the usage of various cloud resources would have given insight on the financial advantages of adopting a cloud service for rendering as opposed to constructing dedicated rendering clusters, if any. Although this fact is acknowledged, the focus of this work was the compu-

5. Rendering as a Service (RaaS) 126

tational aspect of rendering as a service rather, specifically the scalability and elasticity of such a system.

5.8 Summary

This chapter presented RaaS, a novel system for rendering as a service and a first attempt towards a scalable and elastic solution for interactive high-fidelity rendering in the cloud. Multiple reference high-fidelity rendering techniques have been used to measure the performance of the system. Progressive rendering and temporal filtering are used to reduce discontinuities between different levels of solution convergence. The system employs object lazy-loading to ensure fast and efficient resource binding, required for an elastic solution.

RaaS provides a many-to-many connectivity pattern between clients and server resources. It also furnishes resources with rapid elasticity to allow for almost in- stantaneous de-provisioning and reprovisioning to adapt to workload changes. Please refer to Table 9.3 for details.

The results show that even though RaaS addresses but some of the issues that may arise in the realisation of such a system in something as amorphous as the cloud, it is nonetheless a promising first step in the direction of a system for economically providing a step change in the fidelity of rendering solutions achievable to most clients. While the current implementation does not achieve the highest of frame rates it serves to demonstrate the potential of such a method and such levels of interactivity may still be used by the engineer and artist for quasi real-time performance.

CHAPTER 6

Remote Asynchronous Indirect Lighting

(RAIL)

Modelling of global illumination increases the level of realism and immersion in virtual environments. While a large number of methods for computing graphics of higher fidelity have been developed, they typically trade off quality for performance and are incapable of running on all but the machines with the highest specifications. This chapter proposes Remote Asynchronous Indirect Lighting (RAIL), an extension to the rendering as a service paradigm, described in Chap- ter 5, that decouples inexpensive computations, such as the synthesis of the direct lighting component, from the rest of the rendering pipeline, and moves their exe- cution to the client device. The decoupling of different lighting components allows the system to provide an asynchronous computation method that is suitable for low-latency, high-resolution rendering prevalent in simulations, computer games and other real-time applications that require high-fidelity interactive graphics, making it overall faster than the work presented in Chapter 5.

The chapter is structured as follows: §6.1 introduces the chapter and out- lines the respective contributions, §6.2 gives overview of RAIL as a centralised global illumination algorithm, §6.3 discusses the decomposition of RAIL into a distributed rendering pipeline, §6.4 and §6.5 demonstrate results from RAIL fol- lowed by a discussion and §6.6 concludes the chapter.

6.1 Introduction

RAIL has minimal bandwidth and client-side computation requirements. It can achieve client-side updates at 60 Hz or more, at HD and UHD resolutions. A

6. Remote Asynchronous Indirect Lighting (RAIL) 128

component-based approach is used in the computation of the global illumination solution, which is split into direct and diffuse indirect lighting components. The indirect lighting is decoupled from the rest of the rendering and computed re- motely due to its computational expense. Different approaches are used for static and dynamic geometry. In the latter case, a higher-order ambient function is used to coarsely approximate indirect lighting. This service is provided to clients of multi-user environments, thus amortising the cost of computation over all connected consumers of the service. Computation results are stored in an efficient object space representation that does not require the large bandwidths seen in previous work. Another advantage over related work is the resolution agnostic representation of indirect lighting, where increasing client display resolution does not increase bandwidth requirements. The reconstruction of indirect lighting on the client is lightweight and efficient, making this method suitable for any device that supports basic hardware acceleration functionality. The method scales well to many clients connected to the same service, achieving a significant overall boost in performance via amortisation of computation. The contributions of this chapter are:

• a novel fast algorithm for multi-bounce global illumination based on instant radiosity methods

• a scalable asynchronous distributed rendering algorithm with low bandwidth requirements that is resolution-independent and robust to network service fluctuations

• a novel higher order ambient function for approximating diffuse indirect lighting

• the application of RAIL to RaaS for highly interactive, high-fidelity rendering in HD (and higher), over a wide spectrum of devices

6. Remote Asynchronous Indirect Lighting (RAIL) 129

6.2 Method

The rendering method proposed in this work is described by the following se- quence of steps:

1. Generate point cloud for sampling indirect diffuse lighting (offline precom- putation) (§6.2.1)

2. Use light tracing to generate VPLs (§6.2.2)

3. Progressively shade point cloud representation using VPLs (§6.2.2) 4. Reconstruct observer’s view from shaded point cloud (§6.2.2)

• Static geometry is shaded using interpolated irradiance from the point- cloud (§6.2.3)

• Dynamic geometry is shaded using a higher-order ambient function, a coarse approximation of indirect lighting (§6.2.4)

5. Combine reconstructed view with direct lighting and present (§6.2.3) Section 6.2 describes a centralised and synchronous version of the proposed rendering algorithm, elaborating on how a point description of the scene is gener- ated and used to sample the diffuse indirect lighting function on object surfaces and its updating and reconstruction in real-time from these sparse samples. Sub- sequently, in Section 6.3 the asynchronous version of the algorithm is introduced, together with its distribution to a client-server setting which is amenable to the rendering as a service paradigm presented in Chapter 5.

In document High fidelity graphics using unconventional distributed rendering approaches (Page 138-146)