HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS. Peter Messmer, NVIDIA

(1)

HOW TO VISUALIZE YOUR

GPU-ACCELERATED SIMULATION RESULTS

(2)

RANGE OF ANALYSIS AND VIZ TASKS

 Analysis: Focus quantitative

 Visualization: Focus qualitative

(3)

TRADITIONAL HPC WORKFLOW

Workstation Viz Cluster Supercomputer File System Setup Dump, Checkpointing Visualization, Analysis Analysis, Visualization

(4)

TRADITIONAL WORKFLOW: CHALLENGES

Workstation Viz Cluster Supercomputer File System Setup Dump, Checkpointing Visualization, Analysis Analysis, Visualization Lack of interactivity prevents “intuition”

I/O becomes main simulation bottleneck

Viz resources need

to scale with simulation High-end viz

neglected due to workflow complexity

(5)

OUTLINE

 Visualization applications  CUDA/OpenGL interop  Remote viz  Parallel viz  In-Situ viz

High-level overview. Some parts platform dependent. Check with your sysadmin.

(6)

(7)

NON-REPRESENTATIVE VIZ TOOLS SURVEY

OF 25 HPC SITES

Surveyed sites:

(8)

NON-REPRESENTATIVE VIZ TOOLS SURVEY

OF 25 HPC SITES

Surveyed sites:

(9)

VISIT

 Scalar, vector and tensor field data features

— Plots: contour, curve, mesh, pseudo-color, volume,.. — Operators: slice, iso-surface, threshold, binning,..

 Quantitative and qualitative analysis/vis

— Derived fields, dimension reduction, line-outs — Pick & query

 Scalable architecture

 Open source

(10)



Cross-platform

— Linux/Unix, OSX, Windows



Wide range of data formats

— .vtk, .netcdf, .hdf5,..



Extensible

— Plugin architecture



Embeddable



Python scriptable

VISIT

(11)

VISIT’S SCALABLE ARCHITECTURE

 Client-server architecture

 Server MPI parallel

 Distributed filtering

 (multi-)GPU accelerated, parallel rendering*

(12)

PARAVIEW

 Scalar, vector and tensor field data features

— Plots: contour, curve, mesh, pseudocolor, volume,.. — Operators: slice, iso-surface, threshold, binning,..

 Quantitative and qualitative analysis/vis

— Derived fields, dimension reduction, line-outs — Pick & query

 Scalable architecture

 Developed by Kitware, open source

(13)

PARAVIEW’S SCALABLE ARCHITECTURE

 Client-server-server architecture

 Server MPI parallel

 Distributed filtering

 GPU accelerated, parallel rendering*

(14)

SOME OTHER TOOLS

 Wide range of visualization tools

 Often emerged from specialized application domain — Tecplot, EnSight: structural analysis, CFD

— IndeX: seismic data processing & visualization — IDL: image processing

 Early adopters of visual programming

(15)

VISUALIZATION TOOLKIT

 Focus on visualization, not (only) rendering

 Provides more complex operations on data (“filtering”)

 Introduces visualization pipeline

 At the core of many high-level viz tools

(18)

VTK PIPELINE

Source

Filter

Mapper

Renderer

Raw data, shapes

Transform raw data

Map data to geometry

(19)

OpenGL

OPENGL

(20)

OPENGL: API FOR GPU ACCELERATED

RENDERING

• Primitives: points, lines, polygons

• Properties: colors, lighting, textures, ..

• View: camera position and perspective

• Shaders: Rendering to screen/framebuffer

• C-style functions, enums

See e.g. “What Every CUDA Programmer Should Know About OpenGL”

(21)

A SIMPLE OPENGL EXAMPLE

glColor3f(1.0f,0,0);

glBegin(GL_QUADS);

glVertex3f(-1.0f, -1.0f, 0.0f); // The bottom left corner

glVertex3f(-1.0f, 1.0f, 0.0f); // The top left corner

glVertex3f(1.0f, 1.0f, 0.0f); // The top right corner

glVertex3f(1.0f, -1.0f, 0.0f); // The bottom right corner

glEnd(); glFlush(); State-based API (sticky attributes) Drawing Render to screen

(22)

glColor3f(1.0f,0,0); glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); glVertex3f(-1.0f, 1.0f, 0.0f); glVertex3f(1.0f, 1.0f, 0.0f); glVertex3f(1.0f, -1.0f, 0.0f); glEnd(); glFlush(); float* vert={-1.0f, -1.0f, ..}; float* d_vert; cudaMalloc(&d_vert, n); cudaMemcpy(d_vert, vert, n, cudaMemcpyHostToDevice); renderQuad<<<N/128, N>>>(d_vert); flushToScreen<<<..>>>();

?

(23)

CUDA-OPENGL INTEROP: MAPPING MEMORY

• OpenGL: Opaque data buffer object

• Virtex Buffer Object (VBO)

• User has very limited control

• CUDA: C-style memory management

• User has full control

• CUDA-OpenGL Interop:

(24)

RENDERING FROM A VBO

Create VBO Initialize VBO Populate Render

init

()

display()

Display

(25)

RENDERING FROM A VBO WITH CUDA

INTEROP

Create VBO Initialize VBO Populate Render

Register VOB with CUDA

Map VOB to CUDA

Unmap VBO from CUDA

init

()

display()

(26)

MAIN OPENGL-CUDA INTEROP ROUTINES

 Register VBO for CUDA

cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags);

 Mapping of VBO to CUDA

cudaGraphicsMapResources(1, cuda_vbo, 0);

cudaGraphicsResourceGetMappedPointer(&d_ptr, size, *cuda_vbo);

 Unmapping VBO from CUDA

(27)

CAN ALL GPUS SUPPORT OPENGL?

 GeForce : standard feature set including OpenGL 4.3/4.4

 Quadro : + certain highly accelerated features (e.g. CAD)

 Tesla: If GPU is in “All on” operation mode*

nvidia-smi –-query-gpu=gom.current nvidia-smi –q

*requires “m” or “X” class devices

Graphics capabilities disabled

(28)

OPENGL SUMMARY

+ Relatively simple for basic viz

+ Existing/fixed/simple rendering pipeline

- Low level

- Triangles, points, rather than isosurfaces, height-fields

- (currently) depends on Xserver for context creation - What about remote, parallel etc?

(29)

MORE OPENGL INFORMATION AT GTC

S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A

S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E

S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C

S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web, 3/25, 13:30, LL21C

S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C

(30)

(31)

OPENGL CONTEXT

 State of an OpenGL instance

— Incl. viewable surface

— Interface to windowing system

 Context creation: platform specific

— Not part of OpenGL

— Handled by Xserver in Linux/Unix-like systems

(32)

LOCAL RENDERING

Application libGL Xlib GLX X11 Ev ents Comm and s Driver

GPU, monitor attached 2D/3D X Server

(33)

X-FORWARDING: THE SIMPLEST FORM OF

“REMOTE” RENDERING

Application libGL Xlib OpenGL/GLX X11 Events X11 Commands Driver

GPU, monitor attached 2D/3D X Server

On remote system:

export DISPLAY=59.151.136.110:0.0

(34)

X-FORWARDING

+ Simple! + ssh –X

+ No need to run X on remote machine

- Lots of data crosses the network

- All rendering performed on the local Xserver

- Not useful for visualization of remote CUDA Interop apps (X-server doesn’t “see” remote GPU)

(35)

SERVER-SIDE RENDERING + REMOTE VIZ

APPLICATION

Driver GPU 2D/3D X Server Application libGL Xlib GLX OpenGL X11 Events Cmds X11 Driver

GPU, monitor attached

Network

(36)

SERVER-SIDE RENDERING + SCRAPING

Driver GPU 2D/3D X Server Application libGL Xlib GLX OpenGL X11 Events Cmds X11 Driver

Network Sc rape r Images Client

(37)

SERVER-SIDE RENDERING + SCRAPING

+ Full GPU acceleration + No X server on client

Question: when to scrape?

— Xserver not informed about direct rendering — Intercept glxSwapBuffers()

- Not multi-user

(38)

GLX FORKING: OUT-OF-PROCESS

Driver GPU 3D X Server Application libGL Xlib OpenGL Images Images Driver

GPU, monitor attached Client App Network X11 Events _CmdsX11 Proxy X Server OpenGL/ GLX GLX

(39)

GLX FORKING: OUT-OF-PROCESS

+ Full GPU acceleration + No X server on client + Multi-User

- All traffic through Proxy X server

(40)

GLX FORKING WITH INTERPOSER LIBRARY

Driver GPU

3D X Server Application

libGL VirtualGL Xlib

VirtualGL client

GLX

OpenGL Images Images

X11 Events

X11 Commands VGL transport

(compressed)

Driver

GPU, monitor attached 2D X Server

(41)

GLX FORKING WITH INTERPOSER LIBRARY

Driver GPU

3D X Server Application

libGL VirtualGL Xlib

GLX OpenGL Images

Driver

Network X11 Events X11 Cmds Proxy X Server Images Client App

(42)

VIRTUAL GL + TURBOVNC

+ Compressed image transport + Transparent to the application + Fully GPU accelerated OpenGL

+ Client with or without Xserver

- Requires Xserver to access GPU

(43)

HOW TO SET UP VIRTUAL GL + TURBOVNC

 Requires Xserver running on server

— Root privileges for Xserver

 Requires installation of VirtualGL, TurboVNC on server

 Start VirtualGL-accelerated VNC server

vncserver :3

 TurboVNC viewer on client

(44)

(45)

CONNECTING TO REMOTE VNC SERVER VIA

A GATEWAY

 Establish tunnel on client

ssh –L 3333:node01.cluster.net:5903 login.cluster.net

 Connect client to localhost:3333

Gateway

node01.cluster.net login.cluster.net client

(46)

LAUNCHING REMOTE CUDA/OPENGL

APPLICATIONS VIA VGLRUN

vglrun :3 glxgears export DISPLAY=:3 vglrun simpleGL

(47)

REMOTE VIZ IN PARAVIEW/VISIT

 Approach 1: VirtualGL and VNC

export DISPLAY=:3 vglrun paraview

 Approach 2: Local Client

— On remote server:

pvserver

On local workstation:

paraview

Paraview Client & Server under VirtualGL

(48)

REMOTE VISUALIZATION SUMMARY

 Multiple approaches to remote rendering

— X forwarding, remote viz app, scraping, interposer process/library

 Currently requires X server to generate context

(49)

Parallel Viz

PARALLEL VISUALIZATION

(50)

PARALLEL VISUALIZATION

 Parallelism at multiple levels

— Filtering — Rendering

- Both supported by VisIt & Paraview

- Heavy lifting already done!

- Challenge: Setup in parallel environment

- Both tools provide support for most common cases

(51)

BASIC STEPS TO PARALLEL VISUALIZATION

 Launch visualization server processes

— Most likely through queuing system

 Connect client to head node

 Tunnel from workstation

(52)

DOMAIN DECOMPOSITION OF DATA

- Visualization algorithms can work on decomposed data

- May require ghost cells

- May lead to load imbalance No ghost cells required

(53)

PARALLEL COMPOSITING

- Distributed geometry - Render with depth

information

- Composition using IceT

(54)

PARALLEL VISUALIZATION

 Particularly important for large datasets

 Visualization time often determined by filtering

 Rendering important if

— Highly complex visualization — Complex visualization effects — Low-power CPU

(55)

(56)

BENEFIT OF IN-SITU VISUALIZATION

 Pipeline simulation cycle

— Visualization/analysis integral part of simulation

 Immediate feedback

 Reduce pressure on file system

(57)

(58)

(59)

BUILD VISUALIZATION PIPELINE FOR

RUNNING APPLICATION

(60)

BASIC INTERACTION/STEERING WITH

RUNNING APPLICATION

(61)

INTERACTIVE VIZ APPLICATION WITH

LIBSIM

 User implements callbacks for VisIt

— meta-data, mesh, variables and domains

 GetData: VisIt requests data for visualization

— Work directly on simulation data — Transfer data from GPU

 Command server: Interaction VisIt front-end <-> simulation

— “Steering”

http://www.visitusers.org/index.php?title=VisIt-tutorial-in-situ

(62)

The Future

THE FUTURE

(63)

VISUALIZATION DATA FLOW

CPU

GPU

Simulation

(64)

VISUALIZATION WITH GPU ACCELERATED

FILTER

CPU

GPU

Simulation Filtering Rendering Filtering

(65)

VISUALIZATION WITH GPU ACCELERATED

FILTER AND HARDWARE RENDERING

CPU

GPU

Simulation Filtering Rendering Filtering

(66)

FULLY GPU ACCELERATED VISUALIZATION

CPU

GPU

(67)

SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT

SCALE (2011-2016)

 Management, Analysis, Visualization

— In-situ analysis, indexing/compression — I/O-, Viz frameworks

— Supporting application teams

 SDAV tools deployment: Paraview, VisIt, IceT, ..

(68)

SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT

SCALE (2011-2016)

 PISTON (LANL): Data Parallel Visualization Operators

— Isosurface, cut, threshold — Built on top of Thrust

— Support of most Paraview operators — Incorporation into VTK

 DAX (Sandia), DIY (Argonne), EVAL (ORNL)

S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL

4620 - DAX: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

(69)

VIZ TALKS AT GTC2014

 S4571 - Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight Center

 Abel Brown ( Principal Systems Engineer, A.I. Solutions

 S4632 - Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging

 S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL

 S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

 S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD

 S4745 - Now You See It: Unmasking Nuclear and Radiological Threats Around the World

 S4203 - Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies

 S4516 - Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems

 S4599 - An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling

 S4778 - Interactive Processing and Visualization of Geospatial Imagery

 S4140 - Live, Interactive, In-Situ, In-GPU Visualization of Plasma Simulations Running on GPU Supercomputers

 S4400 - Petascale Molecular Ray Tracing: Accelerating VMD/Tachyon with OptiX

(70)

SUMMARY

 Different methods of visualization at different levels

— OpenGL, VTK, VisIt & Paraview

 Remote visualization concepts

— X forwarding, Viz app, scraping, GLX forking

 Parallel rendering and compositing

— Handled transparently in key tools

 In-situ visualization concepts

— Expose simulation variables to visualization tool, interactive viz

 Future directions

— GPU accelerated filtering and rendering

Thanks to Gilles Fourestey (CSCS), Nina Suvanphim (Cray), Jean Favre (CSCS), Adam DeConinck (NVIDIA), Robert Crovella (NVIDIA), Dale Southard (NVIDIA), Hank Childs (U Oregon), Jeremy Meredith (ORNL), Ian Williams (NVIDIA), Steve Parker (NVIDIA), Kitware, Paraview, VisIt and VirtualGL developers for their support!

(71)

ENJOY THE CONFERENCE AND SEE YOU AT

GTC 2015!

(72)

ABSTRACT (FOR REFERENCE ONLY)

Learn how to take advantage of GPUs to visualize results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis

techniques allowing you to investigate your data on the fly. Starting with some basic CUDA/OpenGL interoperability, we will introduce more sophisticated data models allowing you to take advantage of widely used tools like ParaView and VisIt to visualize your GPU resident data. Questions like parallel compositing, remote

visualization and application steering will be addressed in order to allow you to take full advantage of the GPUs installed in your supercomputing system.

HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS. Peter Messmer, NVIDIA