• No results found

Optimizing AAA Games for Mobile Platforms

N/A
N/A
Protected

Academic year: 2021

Share "Optimizing AAA Games for Mobile Platforms"

Copied!
40
0
0

Loading.... (view fulltext now)

Full text

(1)

Optimizing AAA Games for Mobile Platforms

Niklas Smedberg

Senior Engine Programmer, Epic Games

(2)

Who Am I

• A.k.a. “Smedis”

• Epic Games, Unreal Engine

• 15 years in the industry

• 30 years of programming

• C64 demo scene

(3)
(4)

Mobile GPU Architecture

• Traditional GPU

– NVIDIA Tegra

• Tile-based GPU

– Very different from desktop or consoles

– Common on smartphones and tablets

– ARM Mali GPUs fall into this category

(5)

Tile-Based Mobile GPU

TL;DR Summary

• Splits the screen into tiles

– E.g. 16x16 pixels

• The whole tile fits in GPU registers

• Process all drawcalls for one tile

– Repeat for each tile to fill the screen

• Each tile is written to RAM as it finishes

(6)

ARM Mali Rendering Process

Software Command

Buffer

Vertex Processing

Tiling Parameter

Buffer

Pixel Processing

Frame Buffer

(7)

Render Target On Chip

• No bandwidth cost for drawing or alpha-blend

• Cheap depth/stencil testing

• MSAA is cheap

– Have seen 0-5 ms cost for MSAA

– Be wary of buffer restores (color or depth) – Note: Depth textures doesn’t work with MSAA

(8)

Mali 400 vs 600

• Mali 400 precision

– 16-bit floating point in pixelshader (one size fits all) – 24-bit interpolators

• Shader units

– Mali 400: Separate vertex and pixel hardware – Mali 6xx: Unified shader unit

• Alpha-blending

– Mali 400: Separate specialized hardware – Mali 6xx: Can be done in shader

(9)

Architecture Caveats

• Extremely expensive rendertarget switches

• No “free” hidden surface removal

– Not like on ImgTec-based devices – Sort drawcalls front-to-back

– Big occluders first, skybox last

• No texture compression for RGBA textures

– Use uncompressed RGBA or two compressed RGB textures – ASTC will save the day

• OpenGL ES 3.0 not available on Android and iOS

(10)

Rendering Techniques

• How to take advantage of this GPU?

(11)

Render Buffer Management (1 of 2)

• Each render target is a whole new scene

• Avoid switching render target back and forth!

• Can cause a full restore:

– Copies full color/depth/stencil from RAM into Tile Memory at the beginning of the scene

• Can cause a full resolve:

– Copies full color/depth/stencil from Tile Memory into RAM at the end of the scene

(12)

Render Buffer Management (2 of 2)

• Avoid buffer restore

– Clear everything! Color/depth/stencil

– A clear just sets some dirty bits in a register

• Avoid buffer resolve

– Use discard extension (GL_EXT_discard_framebuffer) – See usage case for shadows

• Avoid unnecessarily different FBO combos

– Don’t let the driver think it needs to start resolving and restoring any buffers!

(13)

Mobile Materials

• Full Unreal Engine materials are too complicated

(14)

Mobile Materials

• Initial idea: Pre-render into a single texture

(15)

Mobile Materials

• Current solution:

– Pre-render components into separate textures

– Add mobile-specific settings

– Feature support driven by artists

(16)

Mobile Shaders

• One hand-written ubershader

– Lots of #ifdef for all features

– Exposed as fixed settings in the artist UI – Checkboxes, lists, values, etc

(17)

Material Example: Rim Lighting

(18)

Material Example: Vertex Animation

(19)

Shader Offline Processing

• Run C pre-processor offline

– Reduces in-game compile time

– Eliminates duplicates at off-line time

(20)

Shader Compiling

• Compile all shaders at startup

– Avoids hitching at run-time

– Compile on the GL thread, while loading on Game thread

• Compiling is not enough

– Must issue dummy drawcalls!

– Some states affect shaders! (But not as much on ARM GPUs) – May need experimenting to avoid shader patching

E.g. alpha-blend states, color write masks

(21)

God Rays

(22)

God Rays

• Ported from Xbox 360 to OpenGL ES 2.0

• Moved all math to vertex shader

• Pass down data through interpolators

– But, now we’re out of interpolators 

• Split radial filter into 4 draw calls

– 4 x 8 = 32 texture lookups total (equiv. 256)

• Went from 30 ms to 5 ms

(23)

Original

Shader

(24)

Mobile

Shader

(25)

God Rays

• Original Scene

• No God Rays

(26)

1st Pass

• Downsample Scene

Identify pixels

RGB: Scene color A: Occlusion factor

• Resolve to texture:

“Unblurred source”

(27)

2nd Pass

• Average 8 lookups

From “Unblurred source”

1st quarter vector

Uses 8 .xy interpolators

• Opaque draw call

(28)

3rd Pass

• Average +8 lookups

From “Unblurred source”

2nd quarter vector

Uses 8 .xy interpolators

• Additive draw call

• Resolve to texture:

“Blurred source”

(29)

4th Pass

• Average 8 lookups

From “Blurred source”

1st half vector

Uses 8 .xy interpolators

• Opaque draw call

(30)

5th Pass

• Average +8 lookups

From “Blurred source”

2nd half vector

Uses 8 .xy interpolators

• Additive draw call

• Resolve final result

(31)

6th Pass

• Clear the final buffer

Avoids buffer restore

• Opaque fullscreen

• Screenblend apply

Blend in pixelshader

(32)

Character

Shadows

(33)

Character Shadows

• Ported one type of shadows from Xbox:

– Projected, modulated dynamic shadows

• Fairly standard method

– Generate shadow depth buffer – Stencil potential pixels

– Compare shadow depth and scene depth – Darken affected pixels

(34)

Character Shadows

1. Project character depth from light view

(35)

Character Shadows

2. Reproject into camera view

(36)

Character Shadows

3. Compare with SceneDepth and modulate

(37)

Character Shadows

4. Draw character on top (no self-shadow)

(38)

Shadow Optimizations (1 of 2)

• Shadow depth first in the frame

– Avoids a rendertarget switch (resolve & restore!)

• Resolve SceneDepth just before shadows*

– Write out tile depth to RAM to read as texture – Keep rendering in the same tile

– Unfortunately no API for this in OpenGL ES

(39)

Shadow Optimizations (2 of 2)

• Optimize color buffer usage for shadow

– We only need the depth buffer!

– Unnecessary buffer, but required in OpenGL ES – Clear (avoid restore) and disable color writes – Use glDiscardFrameBuffer() to avoid resolve

– Could encode depth in F16 / RGBA8 color instead

• Draw screen-space quad instead of cube

– Avoids a dependent texture lookup

(40)

Questions?

References

Related documents

Note that if the joint owner is a surviving spouse, the proportionate share of the account would be transferred at ACB unless an election was made in the deceased’s terminal return

Detailing your experience more things to avoid in a personal statement topic is one thing you have written draft. Returned on one to avoid in a personal statement over your

Figure 55 Vendôme Column, Place Vendôme, Paris Photograph by Gwilym Wren, February

If you are a small business owner that is thinking about upgrading your current phone system because it’s outdated, you’re moving offices or because you just want to see if you

Understanding agricultural innovation processes and recognizing the potential for catalysing them is crucial for many countries in sub-Saharan Africa (SSA),

The purpose of this chapter was to investigate parents' awareness and perspectives about the California High School Exit Exam (CAHSEE) and to explore some of the social

HID (Human Interface Device) Stack Interface Kernel mode User mode Applications Software Key-Logger Software Key-Logger Hardware Key-Logger Software Key-Logger.

Technical Housing Assessment Report: Examining Mold and Moisture Conditions of Homes at the Ho-Chunk Nation..