(1)

Agile High-Performance Software Development

Chris Mueller and Andrew Lumsdaine

Open Systems Lab/Indiana University

RIDMS-2

(2)

Modern Processors

IBM Cell BE

(3)

* Featuring *

Advanced “make” build system!

Cutting edge “gdb” debugger!

Unparalleled C standard library!

Works with any text editor!

*Auto-parallelizing, auto-simdizing, optimizing compiler not yet available. For maximum SIMD performance, use of assembly may be required.

Void where prohibited, prohibited where void.

(4)

A Brief History of High Performance Computing

(Commodity hardware and language edition)

1950s FORTRAN

John Backus, et al.

Captures and improves common assembly practices for scientific computing

1970s (BCPL)/C

Dennis Ritchie, et al.

Captures and simplifies best assembly practices for systems programming

1990s Java

James Gosling, et al.

Abstract, single-processor machine model + runtime optimizer for all computing tasks; provides rich environment for Web applications

VB/Python/Perl

van Rossum, Wall, et al.

Scripting language + low level language for rapid application development

1980s

Hardware eras, in order: mini/micro computers; personal computers; commodity SIMD, dual processor; heterogeneous multi-core (pushes C to its semantic limits)

(5)

State of the Art for High Performance Computing

(Commodity hardware and language edition)

1950s FORTRAN

John Backus, et al.

Captures and improves common assembly practices for scientific computing

1970s (BCPL)/C

Dennis Ritchie, et al.

Captures and simplifies best assembly practices for systems programming

(6)

State of the Art for High Performance Computing

(Commodity hardware and language edition)

1950s FORTRAN

John Backus, et al.

Captures and improves common assembly practices for scientific computing

1970s (BCPL)/C

Dennis Ritchie, et al.

Captures and simplifies best assembly practices for systems programming

(mini/micro computers)

Is there an alternative?

(7)

Our Approach

Take a modern programming technique…

(8)

Our Approach

Take a modern programming technique…

…provide direct access to the hardware…

(9)

Our Approach

Take a modern programming technique…

…provide direct access to the hardware…

… and let programmers explore the SIMD and

multi-core design spaces.

(10)

CorePy

A layered collection of Python libraries for generating and executing high-performance code at run-time.

Types, Control Flow, and Optimizers: Variables, Iterators, Extended Instructions, Memory Models

Hardware/OS Abstractions: Instruction Set Architecture (ISA), Instruction Stream, Processor, Memory

PPC, AltiVec/VMX, SPU, Linux, OS X

(11)

A Simple Example

c = InstructionStream()
ppc.set_active_code(c)
ppc.addi(gp_return, 0, 31)
ppc.addi(gp_return, gp_return, 11)
p = Processor()
r = p.execute(c)
print r
--> 42

r = ((0 + 31) + 11)
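The build-then-execute flow on this slide can be imitated in plain Python. The sketch below is a toy interpreter, not CorePy: the `add` method, the tuple encoding, and the `GP_RETURN = 3` register number are assumptions made for illustration, and no machine code is generated. It does follow the PowerPC rule that `addi` with source register 0 uses the literal value 0.

```python
# Toy model only: these classes imitate the shape of the slide's example.
# They are NOT the CorePy API and generate no machine code.

class InstructionStream:
    """Collects (opcode, operands) tuples in program order."""
    def __init__(self):
        self.instructions = []

    def add(self, opcode, *operands):
        self.instructions.append((opcode, operands))


class Processor:
    """Interprets a stream over a small register file (hypothetical)."""
    def __init__(self, num_registers=32, return_register=3):
        self.num_registers = num_registers
        self.return_register = return_register

    def execute(self, stream):
        regs = [0] * self.num_registers
        for opcode, (dest, src, imm) in stream.instructions:
            if opcode != "addi":
                raise ValueError("unsupported opcode: " + opcode)
            # PowerPC addi treats source register 0 as the literal value 0.
            base = 0 if src == 0 else regs[src]
            regs[dest] = base + imm
        return regs[self.return_register]


GP_RETURN = 3  # assumed return register number for this toy

code = InstructionStream()
code.add("addi", GP_RETURN, 0, 31)          # r3 = 0 + 31
code.add("addi", GP_RETURN, GP_RETURN, 11)  # r3 = r3 + 11
result = Processor().execute(code)
print(result)  # 42
```

The point of the split is that the stream is ordinary Python data until `execute` is called, so it can be built, inspected, or transformed by normal Python code first.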

(12)

Variables

CorePy Variables encapsulate a register, backing store, and valid operations for a user defined data type.

Scalar example:

a = SignedWord(11)
b = SignedWord(31)
c = SignedWord(0, reg=gp_return)
c.v = (a + b) * 10
--> c = 420

Vector example:

a = VecWord([2,3,4,5])
b = VecWord([3,3,3,3])
c = VecWord(0)
c.v = vmin(a, b) * b + 10
--> c = [16, 19, 19, 19]
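One way to picture how an expression such as `(a + b) * 10` can turn into a sequence of operations is operator overloading. The sketch below is a minimal illustration under that assumption; `Expr`, `Word`, and `trace` are hypothetical names, and the code evaluates values in Python rather than emitting instructions as CorePy does.

```python
class Expr:
    """An unevaluated operation; operands are Exprs, Words, or ints."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __add__(self, other):
        return Expr("add", self, other)

    def __mul__(self, other):
        return Expr("mul", self, other)


class Word(Expr):
    """Hypothetical scalar variable: a name standing in for a register, plus a value."""
    def __init__(self, name, value=0):
        self.name, self.value = name, value
        self.trace = []  # operations performed by assignments to .v

    @property
    def v(self):
        return self.value

    @v.setter
    def v(self, expr):
        self.value = self._eval(expr)

    def _eval(self, node):
        if isinstance(node, Word):
            return node.value
        if isinstance(node, Expr):
            left, right = self._eval(node.left), self._eval(node.right)
            self.trace.append(node.op)
            return left + right if node.op == "add" else left * right
        return node  # a plain Python int acts as an immediate


a = Word("a", 11)
b = Word("b", 31)
c = Word("c")
c.v = (a + b) * 10
print(c.v, c.trace)  # 420 ['add', 'mul']
```

Assigning to `.v` rather than rebinding the name is what lets the variable object intercept the assignment, which matches the `c.v = ...` idiom used on the slide.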

(13)

Iterators

Iterators enable user-defined loop semantics.

# Basic Iteration
a = SignedWord(c, 0)
for i in syn_iter(c, 5):
    for j in syn_iter(c, 5, mode = 'ctr'):
        a.v = a + 1
proc.execute(c)
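A generator can give a Python `for` statement user-defined loop semantics. The sketch below assumes that the loop body is run once at generation time and recorded between loop markers, with the repetition happening only when the recorded program runs. The toy `syn_iter`, the tuple opcodes, and the `run` interpreter are hypothetical and do not reproduce the CorePy implementation.

```python
def syn_iter(code, count):
    """Toy synthetic iterator: the Python loop body runs once, but the
    recorded program repeats it `count` times."""
    code.append(("loop_begin", count))
    yield len(code)      # single pass: the body is recorded exactly once
    code.append(("loop_end",))


def run(code):
    """Interpret the recorded program and return the final counter value."""
    acc = 0
    stack = []  # entries: [index of first body instruction, remaining passes]
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op[0] == "loop_begin":
            stack.append([pc + 1, op[1]])
        elif op[0] == "loop_end":
            stack[-1][1] -= 1
            if stack[-1][1] > 0:
                pc = stack[-1][0]   # jump back to the start of the body
                continue
            stack.pop()
        elif op[0] == "inc":
            acc += op[1]
        pc += 1
    return acc


code = []
for i in syn_iter(code, 5):
    for j in syn_iter(code, 5):
        code.append(("inc", 1))   # recorded once, executed 25 times

total = run(code)
print(len(code), total)  # 5 25
```

Five recorded operations produce twenty-five increments, which is the distinction between the loop that generates code and the loop that the generated code performs.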

(14)

Iterator Examples

# Array iteration
for x in var_iter(c, a): sum.v = sum + x
for x in vec_iter(c, a): sum.v = sum + x

# Data stream merge
for x,y,z,r in zip_iter(c, X,Y,Z,R):
    r.v = vmadd(x,y,z)

# Loop unrolling
for x in unroll(vec_iter(c, a), 3): body(x)

# Auto-parallelization
for x in parallel(vec_iter(c, a)): body(x)
t1 = proc.execute(c, mode='async', params=[0,2,0])
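The unrolling and parallel iterators above wrap two standard transformations. The sketch below shows them by hand in plain Python, assuming a replication factor of 3 and contiguous, near-equal work slices; `sum_unrolled_by_3` and `partition` are hypothetical helpers, and how CorePy actually divides work between processors is not stated on the slide.

```python
def sum_unrolled_by_3(values):
    """Sum with the loop body replicated three times per pass."""
    total = 0
    main_end = len(values) - len(values) % 3
    for i in range(0, main_end, 3):
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
    for i in range(main_end, len(values)):  # leftover 0-2 elements
        total += values[i]
    return total


def partition(n_items, n_workers):
    """Split range(n_items) into contiguous, near-equal (start, stop) slices."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


data = list(range(1, 11))
unrolled_total = sum_unrolled_by_3(data)
slices = partition(len(data), 3)
print(unrolled_total)  # 55
print(slices)          # [(0, 4), (4, 7), (7, 10)]
```

Expressing both as iterator wrappers, as the slide does, keeps the loop body unchanged while the transformation is switched on or off around it.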

(15)

CorePy Research Model

• Use CorePy to develop real applications

• Use Python for coarse-grained application and data flow

• Use CorePy libraries for high-performance code sections

• Identify common implementation patterns

    • esp. SIMD/multi-core

• Generalize patterns into library components

(16)

Example: Particle System

for vel, point in parallel(zip_iter(c, vels, points)):
    # Forces - Gravity and air resistance
    vel.v = vel + gravity
    vel.v = vel + vmadd(vsel(one, negone, (zero > vel)), air, zero)
    point.v = point + vel

    # Bounce off the zero extents (floor and left wall)
    # and positive extents (ceiling and right wall)
    vel.v = vmadd(vel, vsel(one, floor, (zero > point)), zero)
    vel.v = vmadd(vel, vsel(one, negone, (point > extents)), zero)

    # Add a 'floor' at y = 1.0
    point.v = vsel(point, one, (one > point))

Development Iterations:

v1: Numeric Python (~20k particles/sec)

v2: CorePy “asm” (~200k particles/sec)

v3: CorePy variables/iters (~200k particles/sec)
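The update rule for one particle can be checked with a plain-Python reference version. The sketch below assumes AltiVec-style semantics, where `vsel(a, b, mask)` takes `b` wherever the mask is set and `vmadd(a, b, c)` computes `a * b + c` per element. The helper functions and all numeric constants are hypothetical test values, not figures from the talk.

```python
def vsel(a, b, mask):
    """Per element: b where mask is true, otherwise a (AltiVec vec_sel order)."""
    return [bi if m else ai for ai, bi, m in zip(a, b, mask)]

def vmadd(a, b, c):
    """Per element multiply-add: a * b + c."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def gt(a, b):
    """Per element comparison a > b, standing in for the vector compare."""
    return [ai > bi for ai, bi in zip(a, b)]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def step(point, vel, gravity, air, floor, extents):
    """One update of a single particle, mirroring the slide line by line."""
    n = len(point)
    zero, one, negone = [0.0] * n, [1.0] * n, [-1.0] * n
    # Forces: gravity, then air resistance scaled by the sign of the velocity
    vel = add(vel, gravity)
    vel = add(vel, vmadd(vsel(one, negone, gt(zero, vel)), air, zero))
    point = add(point, vel)
    # Bounce off the zero extents and the positive extents
    vel = vmadd(vel, vsel(one, floor, gt(zero, point)), zero)
    vel = vmadd(vel, vsel(one, negone, gt(point, extents)), zero)
    # Clamp components below 1.0 up to 1.0
    point = vsel(point, one, gt(one, point))
    return point, vel

new_point, new_vel = step(
    point=[2.0, 1.5], vel=[1.0, -1.0],
    gravity=[0.0, -0.5], air=[-0.25, -0.25],
    floor=[-0.5, -0.5], extents=[100.0, 100.0],
)
print(new_point, new_vel)  # [2.75, 1.0] [0.75, -1.25]
```

Every branch is written as a select on a comparison mask, which is why the same body maps directly onto SIMD instructions without per-element control flow.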

(17)

Example: BLASTP on the Cell

⇒ Cell SPU support

⇒ Blocked memory components

⇒ “Stream shift” iterator

⇒ Instruction replication

⇒ Python multi-core control components

(18)

Community Projects

Cell SPU Big Num library

(Andrew Friedley)

• ~5G inst/s on 1 SPU

Image processing, fractals

(Ben Martin)

DGEMM/BLAS

(Andrew Lumsdaine)

Generic Convolution Framework

(Alex Breuer)

(19)

Thank You!

Funding: Lilly Endowment

Support and Feedback:

IBM Cell Ecosystem Team, especially: Hema Reddy, Gordon Ellison, Jennifer Turner, Bob Arenburg

Ben Martin, Andrew Friedley, Alex Breuer, Jeremiah Willcock

www.synthetic-programming.org [email protected]
