• No results found

Application of General Purpose HPC Systems in HPEC. David Alexander Director of Engineering SGI, Inc.

N/A
N/A
Protected

Academic year: 2021

Share "Application of General Purpose HPC Systems in HPEC. David Alexander Director of Engineering SGI, Inc."

Copied!
19
0
0

Loading.... (view fulltext now)

Full text

(1)

dca 8/2003

Application of General Purpose

HPC Systems in HPEC

David Alexander

Director of Engineering

SGI, Inc.

(2)

dca 8/2003

(3)

SGI Scalable ccNUMA Architecture

Basic Node Structure and Interconnect

Physical Memory

Physical Memory

(2-16GB)

C A C H E

CPU

Interface

Chip

CPU

C A C H E

NUMAlink Interconnect

(>6GB/sec)

Physical Memory

(2-16GB)

C A C H E

CPU

Interface

Chip

CPU

C A C H E

Memory Interface

(>8GB/sec)

Memory Interface

(>8GB/sec)

Processor Interface

(>6GB/sec)

Processor Interface

(>6GB/sec)

(4)

SGI Scalable ccNUMA Architecture

Scaling to Large Node Counts

Memory CPU Interface Chip CPU Memory CPU Interface Chip CPU Memory CPU Interface Chip CPU Memory CPU Interface Chip CPU

System Interconnect Fabric

Scaling to 100’s of processors

• RT Memory Latency < 600ns worst case (64p config)

• Bi-section bandwidth >25GB/sec (64p config)

(5)

SGI Scalable ccNUMA Architecture

SGI

®

NUMAflex™ Modular Design

System

Interconnect

CPU &

Memory

Storage Expansion

System I/O

High BW I/O

Expansion

Standard I/O

Expansion

Graphics Expansion

Scalability: No central bus or switch; just

modules and NUMAlink™ cables

Scalability:

No central bus or switch; just

modules and NUMAlink™ cables

Investment protection: Add new

technologies as they evolve

Investment protection:

Add new

technologies as they evolve

Flexibility: Tailored configurations for

different dimensions of scalability

Flexibility:

Tailored configurations for

different dimensions of scalability

Performance: High-bandwidth

interconnect with very low latency

Performance:

High-bandwidth

interconnect with very low latency

(6)

Traditional Clusters

SGI

®

Altix

3000

node

+

OS

...

Fast NUMAflex™ interconnect

Global shared memory

node

+

OS

...

Commodity interconnect

mem

mem

mem

mem

mem

mem

node

+

OS

node

+

OS

node

+

OS

node

+

OS

node

+

OS

node

+

OS

node

+

OS

node

+

OS

What is shared memory?

• All nodes operate on one large shared memory space, instead of each node having its own small

memory space

Shared memory is high-performance

• All nodes can access one large memory space efficiently, so complex communication and data

passing between nodes aren’t needed

• Big data sets fit entirely in memory; less disk I/O is needed

Shared memory is cost-effective and easy to deploy

• The SGI Altix 3000 family supports all major parallel programming models

• It requires less memory per node, because large problems can be solved in big shared memory

• Simpler programming means lower tuning and maintenance costs

SGI Scalable ccNUMA Architecture

(7)

dca 8/2003

(8)

SGI

®

Altix™ 3000

Overview

First Linux

®

node with 64 CPUs in

single-OS image

First clusters with global shared

memory across multiple nodes

First Linux solution with HPC

system-and data-management tools

World-record performance for

floating-point calculations, memory

performance, I/O bandwidth, and real

technical applications

(9)

Industry-standard Linux

®

Intel

®

Itanium

®

2 processor family

SGI

®

NUMAflex™

(Third-generation modular supercomputing architecture)

SGI

®

NUMAlink™

(Built-in high-bandwidth, low-latency interconnect)

SGI

®

Altix™ 3000

Fusion of Powerhouse Technologies

SGI

®

supercomputing technology, Intel’s most advanced

processors, and open-source computing

SGI ProPack™ and supercomputing enhancements

Global shared memory

(10)

SGI

®

Altix

3000 Family

Extreme Power, Extreme Potential

Single-node entry offering

4–12 1.30 GHz Itanium

®

2

processors, 3MB L3 cache

Up to 96GB memory

Scalable performance offering

4–64 Itanium 2 processors in a single node

1.30 GHz/3MB L3 cache

1.50 GHz/6MB L3 cache

Shared memory across nodes

Scalable to 2,048 processors, 16TB memory

Nodes up to 64P, 4TB memory

Model 3300 Servers

(11)

dca 8/2003

(12)

Multi-Paradigm Architecture

Overview

C

C

C

C

C

C

C

C

C

C

C

V

R

R

R

R

IO

IO

IO

IO

MSE

MSE

MSE

MSE

• NUMAlink system interconnect

• General-purpose compute nodes

• Peer-attached general purpose I/O

• Mission-specific accel. and/or I/O (MSE)

• Integrated graphics/visualization

R

C

IO

MSE

V

(13)

Multi-Paradigm Architecture

Mission Specific Element (MSE)

MSE

NL4 Ports

(>5GB/s x 2)

FPGA,

DSP, or

ASIC

TIO ASIC

System

Interconnect

Fabric

>5GB/s

Memory

I/O

Implementation variants:

Customer supplied/loaded FPGA algorithms and/or ASIC/DSP

• Subroutine or standard library acceleration

• Specific use “appliance”

(14)

dca 8/2003

Embedded High Performance

Computing (EHPC)

(15)

EHPC Today

Different Development and Deployment Environments

EHPC Vision

Unified Development and Deployment

Enhanced software development productivity

Superior performance and HW utilization

Demonstration to deployment in days

Benefits from mainstream HPC advances

Development Environment

HPC methods, tools, hardware

Architecture provides freedom to develop new approaches

Platform Port

Months of work Millions of dollars Significant schedule risk

Significant architectural/performance risk

Deployment

Proprietary hardware and software

“Islands of memory” narrows algorithm options

Embedded High Performance Computing

Unified Development/Deployment Environment

SGI

EHPC

System

Scalable ccNUMA

System Architecture

Embedded Systems

Packaging

Seamless HPC Application

Development Model

Mission-Specific

Processing

(16)

Embedded High Performance Computing

EHPC Platform

• Unified software environment for development,

prototype and deployment

• Full multi-paradigm computing support

• Field-upgradeable mission specific processing

acceleration

• Highly optimized performance per watt and

performance per slot

• Design to fit into established environmentals, form

factors, and interfaces

• 6U Eurocard form factor

• Passive, slot configurable backplane

• PMC module connectivity to standard interfaces

• Able to address oceanic, ground and airborne

environmentals

SGI

EHPC

System

Scalable ccNUMA System Architecture Embedded Systems

Packaging

Seamless HPC Application

Development Model

Mission-Specific Processing

(17)

Embedded High Performance Computing

Family of 6U Form Factor Modules

6U Form Factor NUMAlink4 Ports Diagnostic & L1 Ports System Control PMC Module Site NUMAlink Controller PMC Module Site PIC 6U Form Factor NUMAlink4 Ports Diagnostic & L1 Ports uP Cache NUMAlink Controller Memory

DDR SDRAMDDR SDRAMMemory Memory DDR SDRAMDDR SDRAMMemory

Memory DDR SDRAMDDR SDRAMMemory

Memory DDR SDRAMDDR SDRAMMemory

uP Cache

System Control

Scalable General Purpose

(18)

6U Form Factor NUMAlink4 Port Diagnostic & L1 Ports TIO System Control Memory FPGA 6.4GB/s TIO Memory FPGA 6.4GB/s NUMAlink4 Port NL4 Port External I/O (Optional) 6U Form Factor NUMAlink4 Port Diagnostic & L1 Ports TIO System Control Memory TIO Memory NUMAlink4 Port NL4 Port External I/O (Optional) FPGA FPGA ASIC ASIC

Embedded High Performance Computing

Family of 6U Form Factor Modules

(19)

www.sgi.com

©2002 Silicon Graphics, Inc. All rights reserved. Silicon Graphics, SGI, IRIX, Challenge, Origin, Onyx, and the SGI logo are registered trademarks and OpenVault, FailSafe, Altix, XFS, and CXFS are trademarks of Silicon Graphics, Inc., in the U.S. and/or other countries worldwide. MIPS is a registered trademark of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. UNIX is a registered trademark of The Open Group in the U.S. and other countries. Intel and Itanium are registered trademarks of Intel Corporation. Linux is a registered trademark of Linus Torvalds. Mac is a registered trademark of Apple Computer, Inc. Windows is a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.Red Hat is a registered trademark of Red Hat, Inc. All other trademarks are the property of their respective owners. (1/03)

References

Related documents

By performing immunohistochemical staining in human HCC tissues and functional in vitro assays of Huh7 and HepG2 cell lines, we provided the first report on a

[r]

H ow ever, some participants (particularly some of the students) expressed the view that the complexity and the w ide range of competences presented seemed both highly demanding

All transactions are subject to the Company's Standard Trading Conditions (copies available on request from the Company) and which, in certain cases, exclude or limit the

The aims of the study were to investigate wheth- er (a) VPT/VLBW children of both age groups performed poorer than controls (deficit hypothesis) or caught up with increasing age

To the best of our knowledge, no prior study has theorized and empirically investigated how functional, organizational, and cultural boundaries interact with task

In late season trials at Aberdeen, Idaho and Hermiston, Oregon in 2001, total yields for Alpine Russet were significantly higher than Ranger Russet and Russet Burbank at Aberdeen

Two-port VNAs are common, so an alternative is to measure all port combinations two ports at a time and process the two-port S-parameter files into seven-port S-parameter