(1)

Touchstone -A Fresh Approach to Multimedia for the PC

Emmett Kilgariff, Martin Randall

Silicon Engineering, Inc

(2)

Presentation Outline

Touchstone Background

Chipset Overview

Sprite Chip

Tiler Chip

Compressed Textures

Advanced Rendering Features

(3)

Touchstone Background

First Implementation of Features in the Talisman Architecture

Integrated 3D/multimedia chipset for the PC

Joint development project:

– Cirrus Logic

– Fujitsu Microelectronics

– Microsoft Corporation

– Samsung Semiconductor

– Silicon Engineering

(4)

Touchstone Goals

Create a new media solution which addresses key limitations in the PC architecture

– Memory bandwidth limitations for 3D graphics

– Advanced 3D rendering features

– Integration of 2D, 3D, audio and video

– Enable programmable media accelerators

– Optimized for DirectX

(5)

Product Differentiation

Integrated Audio, Video, 2D and 3D graphics

Simplifies content development and lowers overall system cost

1024x768 24-bit RGB at 72Hz versus 640x480 16-bit RGB at 30Hz

Sharper images without banding

72Hz frame rate greatly improves immersive experience

Optimized for guaranteed frame rate

Effective polygon rate of 2M triangles/sec

Allows for more complex environments

Advanced rendering features

Provides higher-quality imaging and special effects

(6)

Chipset Overview

Tiler & Sprite Chips: Silicon Engineering/Cirrus Logic

Memory Controller, 3D Rendering Pipeline

Performs GSprite transformation and controls Compositor

Media Signal Processor (MSP) - Samsung Semiconductor

Real-Time Kernel, Graphics Control, Geometry, MPEG, Audio

Compositing DAC: Fujitsu Microelectronics

Compositing Buffer for GSprites into display image

RAMDAC

(7)

3D Graphics

Existing architectures evolved from Mechanical CAD

Animation achieved through brute force

– Re-render entire screen at frame rates

How can we use silicon more efficiently?

(8)

Chipset Overview

[Figure: chipset block diagram. Labels: PCI; Polygon Object Processor (Tiler); Image Layer Compositor (Sprite); Media DSP; two 2Mx8 RDRAM; Compositing DAC with R, G, B video output; Audio Chip with 2 Channel Audio; Modem. Legend: Touchstone VLSI Components; Standard Components; Commodity DRAM Memory.]

(9)

Key Concepts:

DirectDraw Surfaces

Affine Transform and Composite Surfaces in Display Pipeline.

Exploits Temporal Coherency of 3D apps by re-using surfaces

Compressed Textures and Surfaces

Amplifies Memory Capacity and Bandwidth by 10x.

Chunking

Render 32x32 tiles of a surface at a time

No Global Z-buffer or Anti-aliasing buffers needed.

(10)

Rendering Pipeline

MSP performs “Geometry Engine” operations

Transform, Light, Clip, Setup

Separate triangles into bins, depending on viewable area

Tiler renders DirectDraw Surfaces, one chunk at a time

Display list input - pointers to bins needed for each chunk

Renders primitives into chunk-sized buffer (32x32)

Chunk is compressed and stored in memory
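The binning step above can be sketched as follows. This is a minimal illustration, not the MSP's actual algorithm: it assumes screen-space triangles given as three (x, y) vertices and conservatively assigns each triangle to every 32x32 chunk its bounding box overlaps. The names `bin_triangles` and `CHUNK` are hypothetical.

```python
import math
from collections import defaultdict

CHUNK = 32  # chunk edge length in pixels, as stated on the slide


def bin_triangles(triangles, width, height):
    """Return {(chunk_x, chunk_y): [triangle indices]} for a width x height surface.

    Assumption: a triangle is binned into every chunk that its axis-aligned
    bounding box touches. This may over-include chunks, but never misses one.
    """
    bins = defaultdict(list)
    max_cx = (width - 1) // CHUNK
    max_cy = (height - 1) // CHUNK
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Skip triangles that lie entirely outside the surface.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        cx0 = max(0, math.floor(min(xs)) // CHUNK)
        cx1 = min(max_cx, math.floor(max(xs)) // CHUNK)
        cy0 = max(0, math.floor(min(ys)) // CHUNK)
        cy1 = min(max_cy, math.floor(max(ys)) // CHUNK)
        for cy in range(cy0, cy1 + 1):
            for cx in range(cx0, cx1 + 1):
                bins[(cx, cy)].append(index)
    return bins


triangles = [
    ((0, 0), (10, 0), (0, 10)),      # fits inside chunk (0, 0)
    ((30, 30), (40, 30), (30, 40)),  # straddles four chunks
]
bins = bin_triangles(triangles, 64, 64)
```

The Tiler would then render one chunk at a time using only the triangles listed in that chunk's bin.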

(11)

Display Pipeline

MSP creates a surface display list, in front-to-back order

Compositing-DAC signals start of display

Display is composited one band at a time (1344x32)

Sprite Chip fetches surfaces, performs affine transform

Compositing buffer receives pixels from Sprite Chip, composites one band at a time, then displays completed bands
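Front-to-back compositing of one pixel can be sketched as below. This is an illustration under stated assumptions, not the Compositing DAC's documented arithmetic: it assumes non-premultiplied colors with a per-layer alpha, and accumulates each layer weighted by the transparency still remaining in front of it. The function name is hypothetical.

```python
def composite_front_to_back(layers):
    """Composite (color, alpha) layers given in front-to-back order.

    color is an (r, g, b) tuple of floats in [0, 1]; alpha is in [0, 1].
    Returns the accumulated color and the accumulated coverage.
    """
    out = [0.0, 0.0, 0.0]
    accumulated_alpha = 0.0
    for color, alpha in layers:
        # Only the fraction not yet covered by nearer layers can contribute.
        weight = (1.0 - accumulated_alpha) * alpha
        for i in range(3):
            out[i] += weight * color[i]
        accumulated_alpha += weight
        if accumulated_alpha >= 1.0:
            break  # pixel is opaque; layers further back are hidden
    return tuple(out), accumulated_alpha


# Half-transparent red in front of opaque blue.
color, coverage = composite_front_to_back([
    ((1.0, 0.0, 0.0), 0.5),
    ((0.0, 0.0, 1.0), 1.0),
])
```

One appeal of front-to-back order is the early exit: once a pixel is fully covered, surfaces behind it need no further work.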

(12)

Advanced Feature: Anisotropic Texture Filtering

Provides sharper textures (without temporal artifacts) especially at high angles of perspective

(13)

Sprite Chip - Block Diagram

[Figure: Sprite Chip block diagram. Labels: Tiler; Interface Control; Image Layer Header Registers; Image Layer Read Queue; Initial Evaluation; Pre-Rasterizer; Image Layer Queue; Rasterizer; Image Layer Cache Control; Cache Address Map; Compressed Image Layer Cache; Decompress; Image Layer Cache; Image Layer Filter Engine; Compositing Buffer Controller; To Comp. Buffer.]

(14)

Sprite Chip - Functional Overview

Fetches DirectDraw Surfaces from memory at display time

Performs affine transforms on DirectDraw Surfaces

Filtering Modes: Point Sampled, Bilinear, Bicubic

Composites one display band at a time, rendering front to back and writing results to Compositing Buffer

Composites Pixels at 320Mpixels/sec (1.28GBytes/sec)
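An affine transform with bilinear filtering, one of the modes listed above, can be sketched in software as follows. This is a minimal single-channel illustration, not the Sprite Chip's datapath: it assumes the transform is supplied as an inverse mapping from display coordinates to surface coordinates, and that samples outside the surface are transparent (0). All names are hypothetical.

```python
def bilinear(img, x, y):
    """Sample a 2D list of floats at fractional (x, y); outside is 0."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_affine(img, inverse, out_w, out_h):
    """Resample img through an inverse affine map (a, b, c, d, tx, ty).

    For each display pixel (u, v), the source position is
    x = a*u + b*v + tx and y = c*u + d*v + ty.
    """
    a, b, c, d, tx, ty = inverse
    return [
        [bilinear(img, a * u + b * v + tx, c * u + d * v + ty)
         for u in range(out_w)]
        for v in range(out_h)
    ]


surface = [[0, 10], [20, 30]]
same = warp_affine(surface, (1, 0, 0, 1, 0, 0), 2, 2)        # identity
doubled = warp_affine(surface, (0.5, 0, 0, 0.5, 0, 0), 3, 3)  # 2x magnify
```

Stepping through display pixels and mapping back to the surface means every display pixel is written exactly once, which suits band-at-a-time compositing.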

(15)

Tiler Chip - Block Diagram

[Figure: Tiler Chip block diagram. Labels: Media DSP; RAMBUS Channels; Command and Memory Control; Primitive Register; Primitive Queue; Initial Evaluation; Pre-Rasterizer; Rasterizer; Pixel Queue; Texture Read Queue; Texture Cache Control; Cache Address Map; Compressed Texture Cache; Decompress; Texture Cache; Texture Filter Engine; Pixel Engine; Depth/Stencil/Priority Buffer; Fragment Buffer; Color Buffers; Fragment Resolve; Compress.]

(16)

Tiler Chip - Functional Overview

Chunk-based rendering pipe

Display List Processor

Clip triangle to Chunk, and calculate starting point’s attributes

Interpolate coordinates, colors and texture coefficients - Calculate coverage mask

Calculate texture coordinates, lookup samples and filter

Per-Pixel operations: Z-buffering, Anti-aliasing, Transparency, Stencil, etc.

After entire chunk is rendered, it is compressed and written to memory

(17)

Compressed Textures: Challenges

Texture mapping accesses data randomly

– Most compression methods need to read most of image to decompress one pixel

– How do you find the random data in a compressed image?

Long Latency Between Address Generation and Uncompressed Data Available.

– Uncompressed 3-D address needs to be mapped to linear, compressed address

– Data Fetch

– Data Decompress

(18)

Challenge 1: Not Fetching Most of Image.

Solution: Use JPEG-like 8x8 compressed blocks.

Key difference: DC components not differentially encoded.

Cache 8x8 blocks uncompressed for re-use.
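The significance of the DC difference can be shown with a small sketch. In baseline JPEG, each block's DC coefficient is coded as a difference from the previous block's DC, so recovering block k means accumulating every earlier difference. Storing the DC value directly makes each 8x8 block decodable on its own. The function names below are hypothetical and the code models only the DC term, not a full codec.

```python
def encode_differential(dc_values):
    """Baseline-JPEG style: store each DC as a delta from the previous block."""
    previous, deltas = 0, []
    for dc in dc_values:
        deltas.append(dc - previous)
        previous = dc
    return deltas


def dc_from_differential(deltas, k):
    """Recover block k's DC; must read the deltas of blocks 0..k."""
    return sum(deltas[:k + 1])


def dc_from_absolute(dc_values, k):
    """Recover block k's DC when it is stored directly; one read."""
    return dc_values[k]


dcs = [100, 104, 98, 120]
deltas = encode_differential(dcs)
```

With absolute DC values a texture fetch touches only the block containing the requested texel, at the cost of slightly larger compressed data.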

(19)

Challenge 2: Random Access.

Solution: List of pointers to Compressed blocks kept in linear Memory

Compressed data stored in Linked-list memory: Simplifies memory allocation.

Caching of sections of the pointer list, and linked-list elements (128-byte MAUs), in compressed cache increases effective cache size by 10x.
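One way to picture the pointer list and linked-list storage is the sketch below. The layout is an assumption made for illustration: the slides give only the 128-byte MAU size, so the pointer format (MAU index, byte offset, length), the class names and the append-only allocation are all hypothetical.

```python
MAU_SIZE = 128  # bytes per memory allocation unit, from the slide


class MauPool:
    """Pool of fixed-size MAUs, each with a link to the next MAU in its chain."""

    def __init__(self):
        self.data = []  # one bytearray(MAU_SIZE) per MAU
        self.next = []  # index of the following MAU, or -1

    def alloc(self):
        self.data.append(bytearray(MAU_SIZE))
        self.next.append(-1)
        return len(self.data) - 1


class CompressedTexture:
    """Variable-length compressed 8x8 blocks packed into a chain of MAUs."""

    def __init__(self, pool, blocks_per_row):
        self.pool = pool
        self.blocks_per_row = blocks_per_row
        self.pointers = []  # linear list: block index -> (mau, offset, length)
        self.tail = pool.alloc()
        self.used = 0

    def append_block(self, payload):
        self.pointers.append((self.tail, self.used, len(payload)))
        for byte in payload:
            if self.used == MAU_SIZE:  # current MAU is full: link a new one
                new = self.pool.alloc()
                self.pool.next[self.tail] = new
                self.tail, self.used = new, 0
            self.pool.data[self.tail][self.used] = byte
            self.used += 1

    def read_block(self, u, v):
        """Return the compressed bytes of the block containing texel (u, v)."""
        index = (v // 8) * self.blocks_per_row + (u // 8)
        mau, offset, length = self.pointers[index]
        out = bytearray()
        while len(out) < length:
            if offset == MAU_SIZE:  # follow the link into the next MAU
                mau, offset = self.pool.next[mau], 0
            out.append(self.pool.data[mau][offset])
            offset += 1
        return bytes(out)


pool = MauPool()
texture = CompressedTexture(pool, blocks_per_row=2)
for i in range(4):
    texture.append_block(bytes([i]) * 50)  # stand-in for compressed data
```

Because MAUs have a fixed size, allocation reduces to handing out free units, while the linear pointer list keeps the lookup for any texel to a single index calculation.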

(20)

Challenge 3: High Latency.

Solution 1 (Sprite Chip): Two Rasterizers.

– Surface edge equations are fed into Pre-Rasterizer and FIFO to Post-Rasterizer.

– Pre-Rasterizer calculates addresses, forcing compressed cache hits/misses early.

– Post-Rasterizer calculates same address sequence later, forcing uncompressed cache hits/misses.

– Time between pre- and post-rasterization used to fetch and uncompress blocks.

(21)

Challenge 3: High Latency.

Solution 2 (Tiler Chip): One Rasterizer, Big texel address FIFO.

– Rasterizer calculates addresses, forcing compressed cache hits/misses early.

– Rasterizer places addresses in Pixel FIFO

– Addresses are applied to the uncompressed cache later, after blocks have been fetched and uncompressed.
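Both solutions decouple address generation from data use so that fetch and decompress time overlaps with useful work. The toy cycle model below illustrates the effect. It is a simplification under stated assumptions: one address issued and one consumed per cycle, a fixed fetch-plus-decompress latency, unlimited outstanding fetches, and a cache that never evicts. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate(addresses, latency, fifo_depth):
    """Return the cycles needed to consume every block address.

    The early stage issues addresses into a FIFO and starts fetches; the
    late stage consumes the FIFO head once its block has been decompressed.
    """
    ready_at = {}   # block -> cycle at which its uncompressed data is ready
    fifo = deque()
    issued = consumed = cycle = 0
    while consumed < len(addresses):
        # Late stage: use the oldest address if its data has arrived.
        if fifo and ready_at[fifo[0]] <= cycle:
            fifo.popleft()
            consumed += 1
        # Early stage: issue the next address if the FIFO has room.
        if issued < len(addresses) and len(fifo) < fifo_depth:
            block = addresses[issued]
            if block not in ready_at:  # miss: start fetch and decompress
                ready_at[block] = cycle + latency
            fifo.append(block)
            issued += 1
        cycle += 1
    return cycle


blocks = list(range(16))                 # 16 distinct blocks, all misses
serial = simulate(blocks, latency=10, fifo_depth=1)
overlapped = simulate(blocks, latency=10, fifo_depth=16)
```

In this model a FIFO at least as deep as the latency lets every miss after the first hide behind earlier work, so the total cost approaches one block per cycle plus a single latency.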

(22)

Advanced Rendering Features:

Anisotropic Texture Filtering

Shadows

Transparency

Anti-Aliasing

Motion Blur

Depth-of-Field Effects

(23)

Advanced Features: Motion Blur and Depth-of-Field

Render Object in separate GSprite, shrink in one or both dimensions. Expand to normal size during display.
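The shrink-then-expand idea can be illustrated in one dimension. This sketch is only an analogy for the effect: box averaging stands in for rendering the object at reduced resolution, and linear interpolation stands in for the filtered expansion at display time. Both choices, and the function names, are assumptions.

```python
def shrink(row, factor):
    """Reduce resolution by averaging groups of `factor` samples."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def expand(row, factor):
    """Stretch back to full size with linear interpolation between samples."""
    n = len(row)
    out = []
    for i in range(n * factor):
        # Position of output sample i measured in low-resolution samples.
        pos = (i + 0.5) / factor - 0.5
        pos = min(max(pos, 0.0), n - 1.0)
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)
        t = pos - i0
        out.append(row[i0] * (1 - t) + row[i1] * t)
    return out


sharp_edge = [0, 0, 0, 0, 1, 1, 1, 1]
blurred = expand(shrink(sharp_edge, 4), 4)
```

The hard edge becomes a gradual ramp along the shrunken axis, so shrinking in one dimension gives a directional blur and shrinking in both gives a defocus-like blur, while the Tiler renders fewer pixels.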

(24)

Functionality and Performance

Single PCI board with audio, video, 2D and 3D graphics

High-resolution display 1344x1024 @ 75Hz

24-bit true color for all resolutions

Rendering rate of 40M pixels/sec with anisotropic filtering and Z-buffered anti-aliasing

Image compositing at 320M pixels/sec

Equivalent to 2M polygons/sec

Advanced rendering features - subpixel-filtered anti-aliasing, translucent surfaces, shadows, blur, fog, etc.
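A quick back-of-the-envelope check relates these figures to one another. It ignores blanking intervals and any per-surface overhead, and the 4 bytes per pixel is inferred from the 320 Mpixels/sec and 1.28 GBytes/sec figures quoted for the Sprite Chip rather than stated directly.

```python
WIDTH, HEIGHT, REFRESH_HZ = 1344, 1024, 75   # maximum display mode
BAND_HEIGHT = 32                             # display band is 1344x32
COMPOSITE_RATE = 320e6                       # pixels per second
BYTES_PER_PIXEL = 4                          # inferred, see lead-in

# Pixels that must reach the screen each second.
display_rate = WIDTH * HEIGHT * REFRESH_HZ

# Average number of surface layers that can be composited per display pixel.
layers_per_pixel = COMPOSITE_RATE / display_rate

# Bands composited per frame, and the implied compositing bandwidth.
bands_per_frame = HEIGHT // BAND_HEIGHT
composite_bandwidth = COMPOSITE_RATE * BYTES_PER_PIXEL

print(f"{display_rate / 1e6:.1f} Mpixels/sec displayed")
print(f"{layers_per_pixel:.1f} layers per pixel on average")
print(f"{bands_per_frame} bands per frame")
print(f"{composite_bandwidth / 1e9:.2f} GBytes/sec")
```

Under these assumptions the compositing rate leaves room for roughly three overlapping surfaces per display pixel at the highest resolution.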

(25)

Conclusions

>120x overall performance improvement:

– >4x performance improvement through Spriting

– 10x memory capacity and bandwidth improvement through Compression

– 3x memory bandwidth improvement through Chunking

Allows RE2-level performance, image quality and features at PC price points
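The headline figure appears to be the product of the three factors listed above, assuming the gains compound independently (the slides do not state how they combine):

```latex
\underbrace{4}_{\text{spriting}} \times \underbrace{10}_{\text{compression}} \times \underbrace{3}_{\text{chunking}} = 120
```

Since the spriting gain is quoted as a lower bound (>4x), the product is a lower bound as well.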
