2.6 Adaptivity of the 3D Web
2.6.2 Client Capability Solutions
This section addresses the challenge of letting commodity mobile devices, even old and deprecated Personal Digital Assistants (PDAs), render large and complex 3D models. In other words, how have researchers addressed the challenge of rendering 3D models on devices with limited graphical capability?
2.6.2.1 Remote & Mixed Rendering
The simplest definition of “Remote Rendering (RR)” is rendering all the 2D or 3D graphics on one computing machine, known as the rendering server, while visualising the result on other computing machines. All machines are connected through a network. An “interactive remote rendering” system is one capable of accepting user controls from input devices [367].
RR has been used extensively for 2D graphics in systems that share remote desktops and application elements (for example, Virtual Network Computing (VNC) clients such as TightVNC [165]). 3D remote rendering applications are considerably more complex than their 2D counterparts and usually refresh the screen at a much higher rate [367].
RR has been used as a realistic approach for achieving adaptivity of 3D content on the web [334, 385, 435], in the sense that it addresses the problem of client devices with limited hardware and software resources. Compared to traditional approaches where rendering happens locally on the client machine, the main advantage of remote rendering is that it overcomes the low computational power or graphical capability of “thin client devices” such as mobile devices: with this technique, very complex 3D models can be visualised on such devices, even those with no GPU. The client device in this situation is used only for remote visualisation, acting as a front-end that sends control signals from user input to the server. All the storage and 3D computations happen on the server side [226, 367]. Another advantage is that a remote rendering system can be cross-platform, in the sense that once client programs are developed and deployed, the same quality of 3D rendering can be achieved on all client systems [367]. In lay terms, RR allows mobile devices with very limited capabilities to consume 3D content of a quality similar to that rendered on powerful computers with powerful graphics cards.
A further advantage is that RR helps prevent, a priori, piracy of 3D content by malicious users. Remote rendering provides a good level of copyright protection for 3D models and 3D environments, since users do not have the 3D content stored on their machines and can access only 2D images or video streams rendered from it. Examples of such systems are the ScanView remote rendering system of Koller et al. [228] and DeepView from IBM [225].
RR has several prominent disadvantages. First, it places considerable demand on network bandwidth [367], in addition to the problems of long interaction latency and irregular image or video frame bitrate [402]. “Interaction latency” is defined, in the context of RR, as the time between a user interaction request being sent to the rendering server (from, for example, input devices such as mice, keyboards, touch screens or game controllers) and the first updated frame appearing on the client device screen. Long interaction latency leads to poor performance and lower QoE. For interactive first person shooter games, it has been shown that users consider a latency higher than 100 ms intolerable [39].
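As a minimal sketch of how interaction latency, as defined above, could be measured on a client, the following Python fragment timestamps an interaction request and the first frame that reflects it. The names `InteractionTimer`, `request_sent`, `frame_displayed` and `is_tolerable` are hypothetical and purely illustrative; the 100 ms limit is the figure reported in [39].

```python
import time
from typing import Callable, Dict, Optional

# Tolerance limit reported for first person shooter games [39].
LATENCY_LIMIT_MS = 100.0


class InteractionTimer:
    """Tracks the delay between an input request and its first updated frame."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps a request id to the time (in seconds) at which it was sent.
        self._pending: Dict[int, float] = {}

    def request_sent(self, request_id: int) -> None:
        """Record the moment an interaction request leaves the client."""
        self._pending[request_id] = self._clock()

    def frame_displayed(self, request_id: int) -> Optional[float]:
        """Return the latency in milliseconds, or None for an unknown request."""
        sent = self._pending.pop(request_id, None)
        if sent is None:
            return None
        return (self._clock() - sent) * 1000.0


def is_tolerable(latency_ms: float) -> bool:
    """True when the measured latency does not exceed the limit."""
    return latency_ms <= LATENCY_LIMIT_MS
```

The clock is injectable so that the measurement logic can be exercised without a real network round trip.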
Second, no off-line usage of a 3D model is possible, in contrast to other approaches in which users can interact with the model while off-line once it has been completely downloaded or progressively streamed [226].
Third, remote rendering is not very beneficial in 3D environments where continuous, dynamic and real-time visualisation is required, such as 3D walk-throughs, virtual environments with avatars and VR applications designed for exploring large archaeological sites [226]. However, this has been changing with the advent of on-line cloud gaming services, which can host games and 3D environments. One prominent example is GamingAnywhere [196], an open source cloud gaming service that sends clients video streams of games at 720p High Definition (HD) quality, or higher if needed, while achieving a frame rate above 35 FPS [195].
Fourth, complete remote rendering requires a capable server with a powerful GPU to handle all the processing load and interaction commands from the clients. Scalability and delay can also become serious issues, especially if the RR system has to serve a large number of clients [367].
Table 2.5 summarises the pros and cons of the remote rendering approach for adaptivity of 3D Web content.
Table 2.5: Pros and Cons of Remote Rendering
Advantages                             | Disadvantages
Low computational devices can see 3D   | Consuming a lot of bandwidth
Privacy and copyright protection       | High interaction latency
Same 3D rendering quality              | No offline usage
                                       | Need powerful server
In what follows, a cursory exposition of the types of RR found throughout the literature is presented.
Complete Remote Rendering
Complete remote rendering is a technique proposed by many researchers, such as Lamberti et al. [240], Diepstraten et al. [115], Quax et al. [333] and Doellner et al. [119], which keeps the 3D scene rendering load on the server while the client only receives the rendered images. The idea is similar to the one used by VNC viewers.
RR allows devices with limited resources, such as smartphones and tablets, to visualise large models. This requires a large amount of bandwidth and network resources [335]. According to Räsch et al. [335], latency can be compensated for when viewing 3D artefacts by using Depth-Image-Based Rendering and by transmitting G-Buffers. The drawback of this approach is that G-Buffers are very large, containing at least twice as much data as an image. In this work, Räsch et al. propose a method to compress the G-Buffers that allows efficient decoding, runs directly on the GPU and is suitable for web applications. They used only WebGL for their implementation. The WebSocket protocol was used to stream the compressed G-Buffers to the client device in the form of binary frames.
Their method applies a bespoke compression method in addition to the compression available in the WebSocket protocol. They call the combined method Sampling Compression. The resulting compression ratios (i.e. the size of the compressed data relative to the uncompressed data) are between 8% and 11%. The authors evaluated their G-Buffer compression algorithm on a set of test cases in order to compare compression rates.
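To make the notion of compression ratio concrete, the following is a minimal Python sketch of a two-stage scheme in the same spirit: a sampling step followed by general-purpose compression. It is not the method of Räsch et al.; regular sub-sampling is an assumed placeholder for their bespoke stage, zlib's DEFLATE merely stands in for the per-message compression that WebSocket implementations can negotiate, and all function names are hypothetical.

```python
import zlib


def subsample(buf: bytes, width: int, height: int, channels: int,
              step: int = 2) -> bytes:
    """Keep every `step`-th texel in both directions.

    The buffer is assumed to be row-major with interleaved channels.
    This is a placeholder sampling stage, not the published algorithm.
    """
    out = bytearray()
    for y in range(0, height, step):
        for x in range(0, width, step):
            start = (y * width + x) * channels
            out += buf[start:start + channels]
    return bytes(out)


def compress_gbuffer(buf: bytes, width: int, height: int, channels: int,
                     step: int = 2) -> bytes:
    """Sub-sample the buffer, then apply DEFLATE at the highest level."""
    return zlib.compress(subsample(buf, width, height, channels, step), 9)


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Size of the compressed data relative to the uncompressed data."""
    return len(compressed) / len(raw)
```

With this definition a lower ratio is better: a ratio of 0.10 means that the transmitted payload is one tenth of the original buffer size.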
Nevertheless, in such an approach, bandwidth and latency remain two challenges to address. In addition, the client needs enough computing power to accommodate the decompression overhead.
The data transmitted between the client and the server in RR can take the form of graphics commands, 2D pixels, 2D primitives, 3D vectors, single 3D objects or multiple 3D objects.
Pixels For this type of data, the rendering has to happen on the server. The server renders the 3D scene as 2D pixels and sends them to the client. These pixels can be encoded in different ways for consumption by the client. Lamberti et al. [240], for example, propose an accelerated remote graphics architecture for PDAs in which a cluster of PCs renders parts of the images that are sent to the PDAs. On the PDA side, the user can interact with the rendition of the 3D scene. They encoded the 2D rendered images as a video stream sent to the client. Aranha et al. [21], by contrast, used only 2D still images. In other systems, such as that of Simoens et al. [371], a hybrid approach to complete remote rendering is used. If the 3D content is static, 2D pixels are sent to the client. If the 3D scene content has many animations or much motion, H.264 video is sent to the client. This reduces latency, especially with animated 3D content.
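The hybrid idea described above, still images for static content and video for content in motion, can be illustrated with a small Python sketch. The pixel-difference heuristic, the 5% threshold and the function names are assumptions made for illustration only; they do not describe how Simoens et al. [371] detect motion.

```python
from typing import Sequence


def changed_fraction(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Fraction of pixels whose value differs between two frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    if not curr:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if a != b)
    return changed / len(curr)


def choose_encoding(prev: Sequence[int], curr: Sequence[int],
                    motion_threshold: float = 0.05) -> str:
    """Return "video" when enough pixels changed, otherwise "still".

    The threshold is a hypothetical tuning parameter.
    """
    if changed_fraction(prev, curr) > motion_threshold:
        return "video"
    return "still"
```

A server loop could call `choose_encoding` on consecutive rendered frames and switch encoders accordingly.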
Primitives or Vectors Other scholars have proposed sending primitives or vectors to the client. The client in this situation plays a certain role in rendering. Feature extraction techniques are used to extract vectors from the scene, which are then sent to the client device to be rendered either in 2D, as in the system developed by Diepstraten et al. [115], which generates a simplified model from a group of line primitives for clients that have no graphical capability at all, or in 3D, as in the system of Quillet et al. [334], which uses this rendering approach for remote visualisation of large 3D city models on the web.
Graphics Commands Streaming One RR technique uses graphics commands as the data exchanged between the server and the client. Low-level draw calls or operations between the server application and its own graphics card are intercepted and sent to the client to be rendered there [198, 286].
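The interception and replay of graphics commands can be sketched as follows. This is a toy Python model under stated assumptions: the command names, the JSON wire format and the `CommandRecorder` and `replay` helpers are hypothetical and do not correspond to the systems cited in [198, 286].

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Command = Tuple[str, List[Any]]


class CommandRecorder:
    """Server side: records draw calls instead of executing them locally."""

    def __init__(self) -> None:
        self.commands: List[Command] = []

    def call(self, name: str, *args: Any) -> None:
        """Intercept one graphics command and its arguments."""
        self.commands.append((name, list(args)))

    def flush(self) -> bytes:
        """Serialise the recorded commands for transmission and reset."""
        payload = json.dumps(self.commands).encode("utf-8")
        self.commands = []
        return payload


def replay(payload: bytes, backend: Dict[str, Callable[..., None]]) -> int:
    """Client side: execute each received command on the local backend."""
    executed = 0
    for name, args in json.loads(payload.decode("utf-8")):
        backend[name](*args)
        executed += 1
    return executed
```

On the client, `backend` would map each command name to the corresponding call of the local graphics API.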
Partial, Mixed and Combined Approaches
In partial or mixed rendering, part of the rendering happens on the client side while other rendering processes occur on the server side. In particular, the scene rendering remains on the client side, but processes such as baking, which computes light-maps, textures, lights and reflections, run on the server side.
A pertinent example of such a system can be seen in Spini et al. [385]. In their approach, the baking service runs on the server and is implemented with a Node JS and Express web server, which receives the scene from the client as a JSON file, while Three JS is used for the authoring and exploration tools. The authors introduced a novel 3D work-flow designed to address the needs of VR users. Their work-flow incorporates the authoring and production of quasi photo-realistic 3D content. They use a mixed on-demand remote rendering approach, which differs from traditional approaches in that the scene rendering remains on the client side instead of the server. Geometry data are stored in binary buffer files, and the textures and image maps are represented as base64 URLs in the JSON file.
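The packaging described above, with geometry kept in external binary files and textures embedded as base64 URLs inside a JSON scene description, can be sketched in Python as follows. The field names `geometries`, `uri` and `textures` are hypothetical and are not taken from the scene schema of Spini et al. [385].

```python
import base64
import json
from typing import Dict, List


def to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_scene(geometry_files: List[str], textures: Dict[str, bytes]) -> str:
    """Build a JSON scene description.

    Geometry is referenced by the path of its binary buffer file, whereas
    textures are embedded inline as data URLs.
    """
    scene = {
        "geometries": [{"uri": path} for path in geometry_files],
        "textures": {name: to_data_url(raw) for name, raw in textures.items()},
    }
    return json.dumps(scene)
```

Embedding textures as base64 increases their size by roughly one third, which is one reason why bandwidth remains a concern in this kind of exchange.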
As a side effect of running the baking service on the server, bandwidth demand remains an important issue in the approach of Spini et al. [385].
Another adaptivity approach of interest for CH and the 3D Web is Frame by Frame View Dependent Rendering, since this technique aims to render high resolution 3D digital heritage models in web browsers.
Frame by Frame View Dependent Rendering is a method used by the 3DHOP tool [330–332], which is built on top of the Nexus multi-resolution mesh engine [329]. The system is dedicated to viewing very high resolution 3D meshes and point clouds (above 50M faces) on the web. It is targeted at specialists who need to view such high resolution models for scientific inquiry (i.e. specialists who need high zooming capabilities on rendered frames of hotspots to detect minute features in artefacts). It can also be used in kiosks with large interactive displays in situ inside a physical museum [332].
The 3DHOP system is not a universal platform that can support any visual project; it is specifically designed to meet the needs mentioned above. It cannot manage complex 3D scenes with many 3D objects. It is based on progressive multi-resolution meshes in a format specific to the Nexus framework (.nxs). It can take a large mono-resolution file in the .ply format and, through a series of processing steps, transform it into a multi-resolution progressive .nxs file (which can be further compressed if the resulting file is very large) [358]. Concerning the WebGL viewer, 3DHOP based its implementation on SpiderGL [113], although support was recently added for the Three JS library [116].
3DHOP, on top of the Nexus system, is a visualisation technique based on view dependent rendering. View dependent means that the chunk rendered depends on the observer's viewing position, the orientation of the user camera and the distance from the 3D model. In addition, the technique uses multi-resolution progressive mesh transmission. This type of rendering addresses the problem of the limited graphical capabilities of client devices. The system is not targeted at casual WBVM users, since differences in the fidelity of 3D models above certain thresholds can no longer be perceived, even with extreme zooming on commodity device screens. The tool is targeted rather at specialists for whom the visualisation of extremely high fidelity models is an aim in itself.
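A common way to make the choice of resolution depend on the view is to project the geometric error of each level of detail onto the screen and to pick the coarsest level whose projected error stays below a pixel tolerance. The following Python sketch illustrates this generic heuristic under a simple perspective camera assumption; it is not the actual traversal used by Nexus or 3DHOP, and the function names and the default tolerance are hypothetical.

```python
import math
from typing import Sequence


def projected_error_px(geometric_error: float, distance: float,
                       fov_y_rad: float, viewport_height_px: int) -> float:
    """Approximate on-screen size, in pixels, of a world-space error."""
    if distance <= 0.0:
        raise ValueError("distance must be positive")
    return (geometric_error * viewport_height_px
            / (2.0 * distance * math.tan(fov_y_rad / 2.0)))


def select_lod(lod_errors: Sequence[float], distance: float,
               fov_y_rad: float, viewport_height_px: int,
               tolerance_px: float = 1.0) -> int:
    """Return the index of the coarsest acceptable level of detail.

    `lod_errors` lists the geometric error of each level, ordered from
    coarse to fine. If no level satisfies the tolerance, the finest
    level is returned.
    """
    for index, error in enumerate(lod_errors):
        projected = projected_error_px(error, distance, fov_y_rad,
                                       viewport_height_px)
        if projected <= tolerance_px:
            return index
    return len(lod_errors) - 1
```

As the camera approaches the model the projected error grows, so finer levels are selected only where the viewer can actually perceive the difference.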
Many problems still need fine-tuning in frame by frame view dependent rendering. Although only the small “observed” chunk of the 3D mesh is streamed to the client device, the same chunk may be re-sent in full when viewed again, especially if caching is disabled in the receiving web browser, which increases the demand on bandwidth. In addition, the fidelity of presentation suffers, as many parts of the 3D mesh get stuck at certain LODs, especially low resolution ones. Furthermore, view dependent rendering systems have the serious limitation of increased runtime load, since they choose and extract a certain resolution at runtime, which is a CPU-intensive operation. This also leads to a low frame rate [253].
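One way a client could avoid re-requesting chunks that it has already received, independently of the browser cache, is to keep a small in-memory cache keyed by chunk identifier. The following Python fragment is a minimal sketch under that assumption, using a least-recently-used eviction policy; `ChunkCache` and its `fetch` callback are hypothetical names and are not part of 3DHOP or Nexus.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ChunkCache:
    """Least-recently-used cache for streamed mesh chunks."""

    def __init__(self, capacity: int,
                 fetch: Callable[[Hashable], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._fetch = fetch  # performs the actual network request
        self._chunks: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.network_requests = 0

    def get(self, chunk_id: Hashable) -> bytes:
        """Return a chunk, fetching it only if it is not already cached."""
        if chunk_id in self._chunks:
            # Mark the chunk as most recently used.
            self._chunks.move_to_end(chunk_id)
            return self._chunks[chunk_id]
        data = self._fetch(chunk_id)
        self.network_requests += 1
        self._chunks[chunk_id] = data
        if len(self._chunks) > self._capacity:
            # Evict the least recently used chunk.
            self._chunks.popitem(last=False)
        return data
```

The capacity trades client memory against bandwidth: a larger cache re-sends fewer chunks when the user revisits a region of the model.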
It is pertinent to mention that, at the time of writing, the major on-line social 3D repositories such as Sketchfab, Google Poly and Microsoft Remix 3D support only mono-resolution formats such as Wavefront OBJ, COLLADA, glTF, PLY, et cetera. These platforms do not yet support multi-resolution meshes of any kind.
2.6.2.2 Diverse 3D Mesh Optimisations
Gobbetti et al. [166] introduced a remote rendering approach that converts large textured geometric input meshes into multi-resolution representations of equally sized quad patches, preserving the necessary information about vertices, colours and normals and storing it in an image format. They optimised the rendering of the meshes by using mip-map operations and a tightly packed texture atlas to enable multi-resolution.
The following section surveys a few semantic ontologies pertaining to digital heritage and 3D modelling, which are relevant to the work reported in Chapter 6.