The Things of Shapes: Waveform Generation using 3D Vertex Data

Kevin Schlei

University of Wisconsin-Milwaukee
3223 N. Downer Ave.

Rebecca Yoshikane
University of Wisconsin-Milwaukee
2400 E. Kenwood Blvd.

ABSTRACT

This paper discusses the implementation of a waveform generation system based on 3D model vertices. The system, built with the Metal API, reflects the GPU-transformed vertex data back to the CPU to pass to the audio engine.

Creative manipulation of 3D geometry and lighting changes the audio waveform in real time. The results are evaluated in a piece, ‘The Things of Shapes,’ which uses unfiltered results to demonstrate the textural shifts of model manipulation.

1. INTRODUCTION

Visual-music systems have explored a variety of techniques to translate imagery into sound, in both pre-computational and post-computational contexts [1]. 3D modeling has contributed to this area, from rigid-body simulations to creative user interfaces. This paper outlines a method of connecting 3D model data to audio synthesis, initial results and evaluations, and further avenues for investigation.

The system presented in this paper is not a physical model simulation. Instead, 3D model vertices are treated as a creative stream of audio or control data. This allows for the exploration of odd geometries, impossible shapes, and glitches for imaginative results.

The generated audio is strongly linked to the visual product. By generating audio data directly from model vertices, the system creates an interaction mode where real-time object geometry manipulations alter the sonic result (see Figure 1). Changes to scene characteristics, like lighting, camera position, and object color, can contribute directly to the synthesized output. This allows for instant changes in timbre when switching between fragment shaders in real time.

2. RELATED WORK

3D modeling is often used to create simulations of physical objects [2, 3, 4]. Models represent physical shapes, sizes, and material qualities.

3D models can also be representations of sound, or provide a UI for performance. Sound Sculpting altered synthesis parameters like chorus depth, FM index, and vibrato by manipulating properties of an object like position, orientation, and shape [5].

Copyright: © 2016 Kevin Schlei et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Figure 1. Vertex x and y coordinates create a waveform. The icosphere above is shown crushed below.

Non-geometric qualities of models, like textures and bump maps, can be used to simulate frictional contact, roughness, and impact events [6].

Auralization and spatialization techniques built around 3D model representations of physical space can use model data for physical simulation. swonder3Dq uses wave field synthesis in conjunction with a virtually represented 3D space to model the radiation characteristics of sounding objects [7].

Wave terrain synthesis is a method of interpolating over an arbitrary path that reads from a 2D array of amplitude values, often visualized in a 3D graph [8, 9, 10]. ‘Wave voxel,’ a 3D array lookup system, offers a similar approach with an added axis [11]. These systems of interpolated values between vertex positions highlight how consideration of geometric shape can create variable sonic results.

The viability of GPU audio calculation has been evaluated in a number of studies. Gallo et al. examined the cost of GPU vs. CPU operations on a number of audio tasks, including FFT, binaural rendering, and resampling [12].

GPU hardware limitations, including re-packing data into GPU-recognized data formats, single input operations, and distribution for parallel computation, are addressed with ‘Brook for GPUs,’ a system designed for GPU streaming data calculations [13]. The outcomes are positive and show practical gains from assigning certain algorithms to the GPU rather than the CPU.

Proceedings of the International Computer Music Conference 2016

3. VERTEX DATA MINING

A number of variables were identified as potentially valuable vertex data sources.

The positions {x, y, z} of 3D vertices are a primary data source, and they exist in multiple coordinate spaces. The model contains its own model space, which is projected into a display world space, then flattened onto a viewport. The static model vertices were determined to be of little use, since the goal was to respond to display and user transformations. The projected and viewport coordinates proved to be dynamic and useful.
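The relationship between these spaces can be sketched in a few lines of Python. This is an illustrative pinhole-camera model rather than the Metal shader code used in the system; the function names, the focal length, and the viewport size are all assumptions made for the example.

```python
import math

def model_to_world(v, angle_y, offset):
    """Rotate a model-space vertex about the y axis, then translate it."""
    x, y, z = v
    c, s = math.cos(angle_y), math.sin(angle_y)
    xr, zr = c * x + s * z, -s * x + c * z
    return (xr + offset[0], y + offset[1], zr + offset[2])

def world_to_viewport(v, focal=1.0, width=1024, height=768):
    """Perspective-divide a world-space vertex and map it to pixel coordinates.

    Assumes a camera at the origin looking down +z, so z must be positive.
    """
    x, y, z = v
    ndc_x, ndc_y = focal * x / z, focal * y / z  # normalized device coordinates
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)

vertex = (0.5, 0.25, 0.0)
near = model_to_world(vertex, 0.0, (0.0, 0.0, 2.0))
far = model_to_world(vertex, 0.0, (0.0, 0.0, 4.0))
print(near[:2], far[:2])  # world-space x and y do not depend on depth
print(world_to_viewport(near), world_to_viewport(far))
```

In this sketch, pushing the vertex farther from the camera leaves its world-space x and y untouched, while its viewport position moves toward the screen center.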

A vertex’s normal vector reports which direction it is facing: whether it is pointed towards or away from a light source or eye (camera). In a typical lighting system, normal vectors determine how brightly a face reflects the light source towards the eye.
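As a minimal sketch of these two quantities, the following Python computes the angle between a normal and the eye direction, plus a Lambertian diffuse term as one common example of a lighting calculation. The helper names are hypothetical, and the paper does not state which lighting model its shaders use.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_to_eye(normal, position, eye):
    """Angle in radians between a vertex normal and the direction to the eye."""
    to_eye = normalize(tuple(e - p for e, p in zip(eye, position)))
    cos_theta = max(-1.0, min(1.0, dot(normalize(normal), to_eye)))
    return math.acos(cos_theta)

def diffuse_brightness(normal, position, light):
    """Lambertian term: 1.0 facing the light, 0.0 at 90 degrees or beyond."""
    to_light = normalize(tuple(l - p for l, p in zip(light, position)))
    return max(0.0, dot(normalize(normal), to_light))
```

A normal pointing straight at the eye gives an angle of 0 radians, and one pointing directly away gives pi.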

In some render pipelines, a texture is stretched over the faces of a model. In others, the vertices have their own implicit color value. In both cases, the supplied lighting calculations alter the color values, which could be followed as a data stream.

Second-order vertex properties, including velocity, color shift, brightness shift, etc., are under consideration for future implementations. Interpolation systems, such as those found in Haron et al. [11], may also be explored.

Vertex Component                 Value

Coordinate (projection space)    x, y, z
Coordinate (viewport)            x, y
Normal                           x, y, z
Normal (angle to eye)            radians
Color                            r, g, b
Color (value)                    h, s, b

Table 1. Vertex components identified as potential data sources.
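The two color rows of Table 1 describe the same shaded color in different terms. A small sketch using Python's standard colorsys module shows the conversion; HSB is the same model that colorsys calls HSV, and the function name here is hypothetical.

```python
import colorsys

def color_components(r, g, b):
    """Return (r, g, b, h, s, brightness) for a shaded RGB color in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (r, g, b, h, s, v)

orange = color_components(1.0, 0.5, 0.0)
print(orange)
```

Following only the last element gives a single brightness value per vertex, regardless of hue.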

4. IMPLEMENTATION

4.1 Render Pipeline

The implementation aims to access ‘cooked’ vertex data: the transformed, projected, and fragment shaded result of a render pass of the GPU.

Like many graphics APIs, Metal requires two shaders to form a render pipeline: the vertex shader and the fragment shader. The vertex shader takes the model space vertices and transforms them into view projection space. This is also where user transformations (translate, rotate, scale) are applied.

The fragment shader (also known as pixel shader) is responsible for calculating the fragment (pixel) output after vertex rasterization. It interpolates between the vertex points to fill in the triangle faces seen on the display.

Fragment shaders calculate lighting, texture sampling, and custom color functions. Unlike the vertex shader, which is called three times per triangle (once per vertex), the fragment shader may be called many thousands of times as it interpolates between vertices. This color data was mined to gain insight into which vertices are lit brightly, along with their color response to lighting conditions.

4.2 Accessing Vertex Data

Metal has two methods to retrieve data from the GPU: transform feedback and kernel functions.

Transform feedback is accomplished by passing an extra ‘output’ memory buffer to a vertex shader. The vertex shader writes the transformed vertices to the output buffer, which is then accessible to the CPU. However, the Metal API does not allow vertex shaders to pass data on for rasterization when they return output buffers. This means rasterization and transform feedback are mutually exclusive.

Kernel functions are parallel data computations that can be run on the GPU. They support writing to output buffers for CPU access. They also support multithreading for large data sets.

Unfortunately, neither solution leverages the existing render pipeline. This means two GPU pipelines are necessary: one to compute and reflect the vertex data back to the CPU, and a second that renders to the screen. While not ideal, it is not a doubling of computation time for the GPU, as explained in Section 4.3.

4.3 Compute Pipeline

A compute pipeline was created to calculate and retrieve the vertex data after transformation, projection, and lighting. It is passed identical copies of the vertex buffers that feed the render pipeline, as shown in Figure 2.

A kernel function was chosen to perform the calculation rather than a transform feedback vertex shader, due to its ability to split work into multiple threads. The kernel function body contains the same code as the render pipeline’s vertex and fragment shaders, plus a few additional calculations for values like the viewport position.

Even though the same shader code is run in both pipelines, the GPU does not have twice the workload. A majority of the work of the render pipeline vertex shader is spent on implicit render actions, like depth testing and pre-fragment shader rasterization. Similarly, the fragment shader runs many more times than the number of vertices to calculate each pixel’s color. The kernel function only performs a fraction of this total work, as seen in Table 2.

            Total       Avg.         Std. Dev.

Kernel      4.40 ms     72.10 µs     1.41 µs
Vertex      25.96 ms    432.64 µs    2.28 µs
Fragment    70.28 ms    1.17 ms      8.19 µs

Table 2. GPU shader computation times during 1000 ms of activity. Performed on a 2016 iPad Pro with A9X processor.

4.4 Pathway to Audio

The retrieved vertex data buffer is accessed after the kernel function has completed, at the end of each frame update.

The frame ends by reading through the output buffer to pull the desired vertex component data. The data is written directly to a wavetable, which is continuously oscillated by the audio engine (libpd).
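The read-back step might look like the following Python sketch. The interleaved buffer layout, the constant FLOATS_PER_VERTEX, and the rescaling to [-1, 1] are assumptions made for illustration; the paper does not specify how the output buffer is laid out or whether values are normalized before reaching the wavetable.

```python
import struct

# Hypothetical layout: viewport x, viewport y, normal x, brightness.
FLOATS_PER_VERTEX = 4

def extract_component(buffer_bytes, component):
    """Pull one float per vertex out of an interleaved little-endian float32 buffer."""
    count = len(buffer_bytes) // 4
    values = struct.unpack("<%df" % count, buffer_bytes)
    return list(values[component::FLOATS_PER_VERTEX])

def to_wavetable(values):
    """Rescale values to [-1, 1]; a constant input becomes silence."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

raw = struct.pack("<8f", 100.0, 50.0, 0.0, 0.2,   # vertex 0
                         300.0, 80.0, 1.0, 0.9)   # vertex 1
table = to_wavetable(extract_component(raw, 0))
print(table)
```

Choosing a different component index selects a different waveform from the same frame of vertex data.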


Figure 2. The render and compute pipelines for a frame update. Updated buffer data is passed to both shader pathways. [Diagram labels: Frame Update; Vertex Buffer, Uniforms Buffer, Materials Buffer; Render: Vertex Shader, Depth Testing / Rasterization, Fragment Shader; Compute: Kernel Function, Output Buffer.]

Typically a single vertex component is followed to generate a waveform. For example, the viewport (screen) x-position generates a waveform that changes as the model rotates, scales, or translates, and also when the camera position changes. Other components, such as color brightness and normal direction, created waveforms that reacted to other changes like lighting position or type.

Graphing combinations of components, like the normal x-value multiplied by the color brightness, can provide further variations on waveform responses that pull from more than one environmental change.
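A combination of this kind reduces to an element-wise product over the per-vertex lists. The sketch below uses made-up values and a hypothetical function name.

```python
def combine(normal_x, brightness):
    """Multiply two per-vertex component lists element by element."""
    if len(normal_x) != len(brightness):
        raise ValueError("component lists must cover the same vertices")
    return [n * b for n, b in zip(normal_x, brightness)]

normal_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
brightness = [1.0, 0.5, 0.0, 0.5, 1.0]
print(combine(normal_x, brightness))  # [-1.0, -0.25, 0.0, 0.25, 1.0]
```

Because brightness acts as a per-vertex gain, a change in lighting reshapes the waveform even when the geometry is held still.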

5. EVALUATION

5.1 Meshes App

A test application, ‘Meshes,’ was authored to test the performance and sonic results of the implementation. The user interface allows for 3-axis rotation of a model and translation away from the center point. A simple momentum system lets the 3D model be ‘thrown’ around the world space. A slider sets the wavetable oscillation frequency.
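In the system itself the wavetable is oscillated by libpd. Purely to illustrate what the frequency slider controls, here is a minimal Python sketch of a wavetable oscillator with linear interpolation; the function and its parameters are hypothetical and are not part of the Meshes code.

```python
def render_wavetable(table, frequency, sample_rate, num_samples, phase=0.0):
    """Read `table` cyclically at `frequency` Hz with linear interpolation.

    Returns the samples and the final phase so a later call can continue smoothly.
    """
    size = len(table)
    increment = frequency * size / sample_rate  # table positions per output sample
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a, b = table[i], table[(i + 1) % size]
        out.append(a + frac * (b - a))
        phase = (phase + increment) % size
    return out, phase

samples, end_phase = render_wavetable([0.0, 1.0, 0.0, -1.0], 1.0, 8, 8)
print(samples)
```

Raising the frequency increases the phase increment, so the same vertex-derived table is traversed more times per second.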

5.2 General Observations

The ability to generate audio data from 3D model data shows strong promise in a few areas. First, the offloading of complex parallel data calculation to the graphics card frees the CPU to perform more audio functions. This is especially beneficial on mobile devices.

Second, the direct link from model shapes and lighting systems presents a novel interaction mode for sound generation. The variety of sonic outputs from different 3D model shapes and lighting formulas allows a 3D modeling artist to creatively engage with the sound generation.

5.3 Vertex Component Variations

The different ‘cooked’ vertex components resulted in a variety of wavetable results. Projected coordinate values (x, y, and z position) created subtle harmonic shifts when rotated about their axes. Translating the 3D model away from the camera position, however, produced no change. This is because the projected model data remains steady in its world-space position. Meanwhile, the viewport position (screen x and y) would expand and shrink as the model moved closer to and farther from the camera.

Figure 3. Normal values of vertices often produce areas of visible waveform continuity.

The normal values often created areas of visible waveform continuity. For example, a prosthetic cap model (Figure 3) produced cyclical patterns where its geometric cutouts produced a cylindrical shape. Oscillated slowly, these patterns produced shifting rhythmic cycles.

The color brightness value shows great potential as a way to drastically alter the sonic output of a model. By switching between fragment shaders, shown in Figure 4, the brightness values can be shifted, sloped, and quantized.
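The three kinds of change named here can be sketched as simple functions of a brightness value in [0, 1]. These are illustrative stand-ins, not the fragment shaders used in the system; the function names and default parameters are assumptions.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))

def shifted(brightness, offset=0.2):
    """Add a constant, as an ambient light term would."""
    return clamp01(brightness + offset)

def sloped(brightness, exponent=2.0):
    """Bend the response curve with a power function."""
    return clamp01(brightness) ** exponent

def quantized(brightness, levels=4):
    """Snap to a few discrete bands, as in toon (cel) shading."""
    step = min(int(clamp01(brightness) * levels), levels - 1)
    return step / (levels - 1)

values = [0.0, 0.3, 0.6, 1.0]
for shader in (shifted, sloped, quantized):
    print(shader.__name__, [round(shader(v), 3) for v in values])
```

Applied to every vertex, each function produces a differently shaped waveform from the same geometry.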

Figure 4. The lighting direction or fragment function changes vertices’ brightness values, graphed here using the ‘realship’ model.

5.4 Duplicate Pipeline Performance

The splitting of vertex calculations into two pipelines (render and compute) has some drawbacks, but also some major positives.

One drawback is the lack of access to the depth testing that occurs during the render pipeline (see Figure 2). Faces of objects that fail depth tests, i.e., are ‘behind’ other faces, are not fragment shaded or drawn to screen. Since that process is automatically performed between the vertex shader and fragment shader in a render pipeline, it is not available to the compute kernel function. These changes to the vertex data might have allowed for interesting effect possibilities.

A significant benefit of splitting work between the render and compute pipelines is the possibility of decoupling audio updates from screen updates. Performance tests, such as those in Table 2, indicate that the GPU spends around 95.5% of a frame update on the render pipeline, vs. just 4.5% on the compute pipeline. This shows that the compute pipeline could be run separately at a much faster rate, perhaps called directly from the audio buffer callback.
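The split can be checked directly from the totals in Table 2. The short calculation below uses only those three figures; the variable names are chosen for the example.

```python
# Total GPU time per stage over 1000 ms of activity, from Table 2 (milliseconds).
totals_ms = {"kernel": 4.40, "vertex": 25.96, "fragment": 70.28}

gpu_total = sum(totals_ms.values())
compute_share = totals_ms["kernel"] / gpu_total
render_share = (totals_ms["vertex"] + totals_ms["fragment"]) / gpu_total

print(f"compute: {compute_share:.1%}, render: {render_share:.1%}")
# prints "compute: 4.4%, render: 95.6%"
```

The result agrees with the approximate figures quoted above to within about a tenth of a percentage point.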

5.5 CSIRAC and the ’blurt’

CSIRAC was the first computer to generate digital audio by sending its bitstream directly to an amplifier [14]. The direct mapping of vertex data to a waveform outlined in this research is similar to CSIRAC’s sonification of computer data.

An interesting historical note is the ‘blurt’: a short, recognizable loop of raw pulses which CSIRAC programmers added to the end of a program. Lacking a display terminal, this aural cue helped signify when a program had finished.

The sonification of vertex data also acted as a helpful debugging tool. For example, when listening to the projected x-coordinate of a model translated entirely off-screen, one might expect to hear silence. Surprisingly, the waveform persisted. This led to the realization that the flat viewport coordinates and the model’s projection-space coordinates were separate data.

Furthermore, viewing the generated waveform illustrated how unorganized model vertices could be. Simple geometric shapes, created in commercial 3D modeling software, were shown to have no discernible pattern of face order, as shown in Figure 5. This is not an issue in the implementation, but rather highlights the naturally occurring structures of 3D models.

Figure 5. Four iterations of drawing an icosphere, where consecutive vertices are allowed to be drawn.

5.6 The Things of Shapes

‘The Things of Shapes’ is a piece that uses the unfiltered output from the vertex wavetable to create a collage composition with a frenetic character. The gestures are articulated by both automated and user-driven manipulation of 3D models. The 3D models used include simple geometric shapes (cube and icosphere) and complex models (realship). The majority of the piece used only the x-coordinate property of vertices for waveform generation.

One automated manipulation was a noise function which randomly fluttered vertex positions, with a variable spread, as seen in Figure 6. The result added noise to the form, but even at high levels of flutter a discernible waveform timbre could be maintained and controlled.
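A flutter of this kind can be sketched as a uniform random offset applied to every coordinate. The uniform distribution, the function name, and the seeded generator are assumptions; the paper does not describe the noise function in detail.

```python
import random

def flutter(vertices, spread, rng):
    """Offset every coordinate by a uniform random amount in [-spread, spread]."""
    return [tuple(c + rng.uniform(-spread, spread) for c in v) for v in vertices]

rng = random.Random(2016)  # seeded only to make the sketch repeatable
corners = [(1.0, 1.0, 1.0), (-1.0, 1.0, 1.0)]
noisy = flutter(corners, 0.1, rng)
print(noisy)
```

Calling flutter with a fresh offset on each frame, and varying the spread over time, moves the waveform between its clean shape and a noisier version of it.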

Figure 6. Vertices are shifted by a random amount each frame to add noise to the shape and waveform.

Next, a modulo function was used to crush vertices towards the zero-point of the coordinate space, as seen in Figure 1. This function was cycled to ramp between unaltered model shapes and crushed shapes. This caused some models to pulse from zero to their original scale.
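One possible reading of this crush is a sign-preserving modulo applied to each coordinate, with the modulus ramped in a cycle. The sketch below is a guess at the idea rather than the function used in the piece; crush, ramp_limit, and their parameters are all hypothetical.

```python
import math

def crush(vertices, limit):
    """Wrap each coordinate into (-limit, limit) with a sign-preserving modulo."""
    if limit <= 0.0:
        raise ValueError("limit must be positive")
    return [tuple(math.fmod(c, limit) for c in v) for v in vertices]

def ramp_limit(frame, period, lo=0.05, hi=2.0):
    """Sawtooth that rises from `lo` (crushed) toward `hi` (unaltered) every `period` frames."""
    return lo + (hi - lo) * (frame % period) / period

shape = [(0.9, -0.4, 0.25), (-0.7, 0.6, -0.1)]
print(crush(shape, 2.0))   # limit exceeds every coordinate, so the shape is unchanged
print(crush(shape, 0.25))  # coordinates wrap toward the origin
```

When the limit is larger than the model's extent the vertices pass through untouched, and as it shrinks they fold in around the zero-point.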

User-driven model manipulation was achieved through touch screen interactions and physics simulations. Rotation, offset, and scale of the models were attached to touch panning gestures. These in turn were given momentum and resistance properties to allow for natural deceleration of position and rotation. This formed a major influence on the gestural quality of the final piece. Slowly rotated or shifted shapes produced steadily changing timbres in the waveform. ‘Thrown’ shapes, where the model rotated at some distance around the center of the projected space like a tetherball, brought the vertices into and out of the viewport. This cyclical appearing and disappearing produced a fluttering sound that decelerated towards a steady tone.
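Momentum with resistance can be sketched as a per-frame update in which the velocity decays by a fixed proportion. This is a generic damping model with hypothetical names and values, not the physics code from the application.

```python
def step_momentum(position, velocity, resistance, dt):
    """Advance one frame: move by the velocity, then bleed off some speed.

    `resistance` is a decay rate per second (0 means no deceleration).
    """
    position += velocity * dt
    velocity *= max(0.0, 1.0 - resistance * dt)
    return position, velocity

angle, spin = 0.0, 6.0   # radians and radians per second after a 'throw'
for _ in range(120):     # two seconds at 60 frames per second
    angle, spin = step_momentum(angle, spin, resistance=1.5, dt=1.0 / 60.0)
print(angle, spin)
```

The spin falls off smoothly toward zero, which corresponds to the decelerating flutter described above.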

5.7 Improvements and Future Work

Synchronization of audio buffer callbacks and kernel function calculations is the first priority of future implementations. In addition to providing lower latency than the current display update rate of 60 Hz, synchronization could allow for audio-rate data to be fed into the kernel function. This would allow for smooth audio calculation from within the kernel function, rather than the control-rate updates currently implemented. Another option would be to pursue streaming implementations such as Brook for GPUs [13].

Systems of generating audio that do not rely on direct mapping could create new synthesis possibilities. Instead of using wavetable synthesis, an internal oscillation method could be devised and continuously output. This may be based on interpolating between weighted vertices, or using relational analysis of the entire collection of vertices to drive synthesis parameters.

Model manipulation could be improved with a variety of methods to alter model geometry and fragment calculation. Advanced object deformation, such as fabric or viscosity mesh simulations, could be sonified. More graphics pipeline functions, including masking, blending, bump mapping, etc., could be implemented as creative methods of generating data.


Geometry shaders are relatively new shaders where the GPU can generate new vertices from the originally provided vertices. The Metal API currently does not support geometry shaders. Two-stage vertex calculation has been offered as a workaround for this.

6. CONCLUSIONS

A method for accessing projected and fragment shaded vertex data from a GPU has been outlined. Initial observations show a successful split between rendering and compute pipelines, with the possibility of further decoupling to improve audio calculation latency. A prototype application demonstrated how sonic changes follow the transformation of models fed through the system. ‘The Things of Shapes’ takes that tool and assembles a collage of shape-driven sounds and phrases.

Acknowledgments

The authors would like to thank the Office for Undergraduate Research at the University of Wisconsin-Milwaukee for their support and funding for this project.

7. REFERENCES

[1] G. Levin, “Painterly interfaces for audiovisual performance,” Ph.D. dissertation, Massachusetts Institute of Technology, 2000.

[2] J. F. O’Brien, C. Shen, and C. M. Gatchalian, “Synthesizing sounds from rigid-body simulations,” in Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation. ACM, 2002, pp. 175–181.

[3] N. Raghuvanshi and M. C. Lin, “Interactive sound synthesis for large scale environments,” in Proceedings of the 2006 symposium on Interactive 3D graphics and games. ACM, 2006, pp. 101–108.

[4] K. Van Den Doel, P. G. Kry, and D. K. Pai, “FoleyAutomatic: physically-based sound effects for interactive simulation and animation,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques. ACM, 2001, pp. 537–544.

[5] A. Mulder and S. Fels, “Sound sculpting: Manipulating sound through virtual sculpting,” in Proc. of the 1998 Western Computer Graphics Symposium, 1998, pp. 15–23.

[6] Z. Ren, H. Yeh, and M. C. Lin, “Synthesizing contact sounds between textured models,” in Virtual Reality Conference (VR), 2010 IEEE. IEEE, 2010, pp. 139–146.

[7] M. Baalman, “swonder3Dq: Auralisation of 3D objects with wave field synthesis,” in LAC2006 Proceedings, 2006, p. 33.

[8] Y. Mitsuhashi, “Audio signal synthesis by functions of two variables,” Journal of the Audio Engineering Society, vol. 30, no. 10, pp. 701–706, 1982.

[9] A. Borgonovo and G. Haus, “Sound synthesis by means of two-variable functions: experimental criteria and results,” Computer Music Journal, vol. 10, no. 3, pp. 57–71, 1986.

[10] R. C. Boulanger, The Csound book: perspectives in
