Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Academic year: 2021


(1)

Introduction to

High Performance Cluster Computing

Cluster Training for UCL

Part 1

(2)

What is HPC

• HPC = High Performance Computing
  • Includes Supercomputing
• HPCC = High Performance Cluster Computing
  • Note: these are NOT High Availability clusters
• HPTC = High Performance Technical Computing

(3)

Agenda

• Parallel Computing Concepts

• Clusters

(4)

Concurrency and Parallel Computing

A central concept in computer science is concurrency:

• Concurrency: computing in which multiple tasks are active at the same time.

There are many ways to use concurrency:

• Concurrency is key to all modern operating systems as a way to hide latencies.
• Concurrency can be used together with redundancy to provide high availability.
• Parallel computing uses concurrency to decrease program runtimes.
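As a minimal sketch of how concurrency hides latency, the example below keeps several simulated high-latency operations active at the same time so that their waiting periods overlap. The function names (`fetch`, `run_serial`, `run_concurrent`) and the 0.05 s delay are illustrative assumptions, not part of the course material.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(item, delay=0.05):
    """Simulate a high-latency operation (e.g. disk or network I/O)."""
    time.sleep(delay)          # the thread is idle while it "waits"
    return item * 2

def run_serial(items):
    # One task at a time: the total wait is the sum of all delays.
    return [fetch(i) for i in items]

def run_concurrent(items, workers=4):
    # Several tasks are active at once, so their waits overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, items))

if __name__ == "__main__":
    items = [1, 2, 3, 4]
    for fn in (run_serial, run_concurrent):
        start = time.perf_counter()
        result = fn(items)
        print(fn.__name__, result, f"{time.perf_counter() - start:.2f}s")
```

Both versions return the same results; only the elapsed time differs. Threads are enough here because the simulated work is pure waiting. Compute-bound work generally needs separate processes or nodes to see a speed-up.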

(5)

Hardware for Parallel Computing

Parallel computers are classified in terms of streams of data and streams of instructions:

• MIMD Computers: Multiple streams of instructions acting on multiple streams of data.
• SIMD Computers: A single stream of instructions acting on multiple streams of data.

Parallel hardware comes in many forms:

• On chip: Instruction-level parallelism (e.g. IPF, the Itanium Processor Family)
• Multicore: Multiple execution cores inside a single CPU.
• Multiprocessor: Multiple processors inside a single computer.
• Multicomputer: Networks of computers working together.

(6)

Hardware for Parallel Computing

Parallel Computers

• Single Instruction Multiple Data (SIMD)*
• Multiple Instruction Multiple Data (MIMD)
  • Shared Address Space
    • Symmetric Multiprocessor (SMP)
    • Non-uniform Memory Architecture (NUMA)
  • Disjoint Address Space
    • Massively Parallel Processor (MPP)
    • Cluster
    • Distributed Computing

(7)

What is an HPC Cluster

A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working together cooperatively as a single, integrated computing resource.

A typical cluster uses:

• Commodity off-the-shelf (COTS) parts
• Low-latency communication protocols between the disjoint address spaces (memory)

(8)

What is HPCC?

[Diagram: cluster layout showing the cluster management tools, the master node, and the file server / gateway]

(9)

Cluster Architecture View

[Diagram: cluster architecture layers, listed from top to bottom with the examples shown for each]

• Application: parallel benchmarks (Perf, Ring, HINT, NAS), real applications
• Middleware: shmem, MPI, PVM
• OS: Linux, other OSes
• Protocol: TCP/IP, VIA, proprietary
• Interconnect: Ethernet, Myrinet, Infiniband, Quadrics
• Hardware: desktop, workstation, server 1P/2P, server 4U +

(10)

Cluster Hardware

The Node

• A single element within the cluster
• Compute Node
  • Just computes – little else
  • Private IP address – no user access
• Master/Head/Front End Node
  • User login
  • Job scheduler
  • Public IP address – connects to external network
• Management/Administrator Node
  • Systems/cluster management functions
  • Secure administrator address
• I/O Node
  • Access to data

(11)

Interconnect

Interconnect         Typical latency (usec)   Typical bandwidth (MB/s)
InfiniBand*          2-4                      1400-2500
Myricom Myrinet*     2.2-3                    2500
10 Gb/s Ethernet     12-20                    800
1 Gbit/s Ethernet    60-90                    90
100 Mbps Ethernet    75                       80
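To see why both columns matter, a rough estimate of message transfer time can be made with a simple linear cost model: time = latency + message size / bandwidth. This model is an assumption added here for illustration and is not stated in the slides. The sketch plugs in the lower end of the quoted ranges for InfiniBand and 1 Gbit/s Ethernet; the function name and the dictionary are hypothetical.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way transfer time in microseconds.

    Assumed model: time = latency + size / bandwidth.
    With 1 MB = 10**6 bytes, a bandwidth of B MB/s moves B bytes per
    microsecond, so bytes / (MB/s) is already in microseconds.
    """
    return latency_us + message_bytes / bandwidth_mb_per_s

# (latency in usec, bandwidth in MB/s), lower end of the quoted ranges
INTERCONNECTS = {
    "InfiniBand": (2.0, 1400.0),
    "1 Gbit/s Ethernet": (60.0, 90.0),
}

if __name__ == "__main__":
    for size in (1_000, 1_000_000):          # a 1 KB and a 1 MB message
        for name, (lat, bw) in INTERCONNECTS.items():
            t = transfer_time_us(size, lat, bw)
            print(f"{size:>9} bytes over {name}: {t:.1f} usec")
```

Under this model small messages are dominated by latency and large messages by bandwidth, which is why tightly coupled applications that exchange many small messages are sensitive to the latency figure.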

(12)

Agenda

• Parallel Computing Concepts

• Clusters

(13)

Cluster Usage

Performance Measurements

Usage Model

Application Classification

Application Behaviour

(14)

The Mysterious FLOPS

1 GFlops = 1 billion floating point operations per second

Theoretical vs. real GFlops

Xeon Processor

• Theoretical peak per core = 4 x clock speed
  • Xeons have 128-bit SSE registers, which allow the processor to carry out 2 double-precision floating point add and 2 multiply operations per clock cycle
• 2 computational cores per processor
• 2 processors per node (4 cores per node)

Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent)
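The arithmetic above can be sketched as follows. The 3.0 GHz clock is taken from the dual-core example quoted elsewhere in these slides; the function names and default arguments are illustrative assumptions.

```python
def peak_gflops(clock_ghz, flops_per_cycle=4, cores_per_cpu=2, cpus_per_node=2):
    """Theoretical peak in GFlops per core, per CPU and per node."""
    per_core = clock_ghz * flops_per_cycle      # 2 adds + 2 multiplies per cycle
    per_cpu = per_core * cores_per_cpu
    per_node = per_cpu * cpus_per_node
    return per_core, per_cpu, per_node

def sustained_range(peak, low=0.35, high=0.80):
    """Expected sustained (Rmax) range for a given theoretical peak."""
    return peak * low, peak * high

if __name__ == "__main__":
    core, cpu, node = peak_gflops(3.0)
    print(f"peak: {core} per core, {cpu} per CPU, {node} per node (GFlops)")
    lo, hi = sustained_range(node)
    print(f"sustained estimate per node: {lo:.1f} to {hi:.1f} GFlops")
```

For a 3.0 GHz dual-core part this gives 12 GFlops per core, 24 per CPU and 48 per two-socket node, so the sustained estimate for one node is roughly 17 to 38 GFlops.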

(15)

Other measures of CPU performance

SPEC

• SPEC CPU2000/2006 Base – single core performance indicator
• SPEC CPU2000/2006 Rate – node performance indicator
• SPECfp – floating point performance
• SPECint – integer performance

Many other performance metrics may be required

• STREAM – memory bandwidth
• HPL – High Performance Linpack
• NPB (NAS Parallel Benchmarks) – suite of performance tests
• Pallas Parallel Benchmark – another suite
• IOZone – file system throughput

(16)

Technology Advancements in 5 Years

Codename    Release date   GHz   Number of cores   Peak FLOP per CPU cycle   Peak GFLOPS per CPU   Linpack on 256 Processors
Woodcrest   June 2006      3.0   2                 4                         24                    4781
Westmere    Nov 2009       3.0   6                 4                         72                    14500

* From November 2001 top500 supercomputer list (cluster of Dell Precision 530)

** Intel internal cluster built in 2006

(17)

Usage Model

[Diagram: spectrum of usage models, from many serial jobs (capacity) to one big parallel job (capability)]

• Batch Usage: many serial jobs (capacity); load balancing more important
  • Electronic Design
  • Monte Carlo
  • Design Optimisation
  • Parallel Search
• Normal Mixed Usage: job scheduling very important
  • Many Users
  • Mixed size Parallel/Serial jobs
  • Ability to Partition and Allocate Jobs to Nodes for Best Performance
• Appliance Usage: one big parallel job (capability); interconnect more important
  • Meteorology
  • Seismic Analysis
  • Fluid Dynamics
  • Molecular Chemistry

(18)

Application and Usage Model

HPC clusters run parallel applications, and applications in parallel!

One single application that takes advantage of multiple computing platforms

• Fine-Grained Application
  • Uses many systems to run one application
  • Shares data heavily across systems
  • Example: PDVR3D (eigenvalues and eigenstates of a matrix)
• Coarse-Grained Application
  • Uses many systems to run one application
  • Infrequent data sharing among systems
  • Example: Casino (Monte Carlo stochastic methods)
• Pleasurably Parallel/HTC (High Throughput Computing) Application
  • An instance of the entire application runs on each node
  • Little or no data sharing among compute nodes
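A minimal sketch of the last category: several independent instances of the same Monte Carlo calculation (here, an estimate of pi) run with different random seeds and share nothing until their results are combined at the end. The function names are hypothetical, and a process pool on one machine stands in for the compute nodes of a cluster.

```python
import random
from concurrent.futures import ProcessPoolExecutor

def count_hits(job):
    """One independent instance: count random points inside the quarter circle."""
    seed, n_samples = job
    rng = random.Random(seed)        # private generator, no shared state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(hit_counts, n_samples):
    """Combine the per-instance results once, at the very end."""
    return 4.0 * sum(hit_counts) / (n_samples * len(hit_counts))

def run_instances(n_instances, n_samples, parallel=True):
    jobs = [(seed, n_samples) for seed in range(n_instances)]
    if parallel:
        # One worker process per instance. In a script, call this function
        # from under an `if __name__ == "__main__":` guard.
        with ProcessPoolExecutor(max_workers=n_instances) as pool:
            hit_counts = list(pool.map(count_hits, jobs))
    else:
        hit_counts = [count_hits(job) for job in jobs]
    return estimate_pi(hit_counts, n_samples)
```

Calling `run_instances(4, 200_000)` starts four instances that never communicate with each other, which is why this class of application places few demands on the interconnect.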

(19)

Types of Applications

Forward Modelling

Inversion

Signal Processing

(20)

Forward Modelling

Solving linear equations

Grid Based

Parallelization by domain decomposition (split and distribute the data)

Finite element/finite difference
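Domain decomposition for a grid-based finite difference calculation can be sketched as below. A 1D grid is split into subdomains, each padded with one halo (ghost) cell per side copied from its neighbour, so that the stencil can be applied to every subdomain independently. The helper names are hypothetical, and a plain loop stands in for distributing the subdomains to different nodes.

```python
def laplacian(u):
    """Second-difference stencil on the interior points of a 1D grid."""
    return [u[i - 1] - 2.0 * u[i] + u[i + 1] for i in range(1, len(u) - 1)]

def decompose(u, n_parts):
    """Split the interior of u into n_parts subdomains, each padded with
    one halo cell per side taken from the neighbouring subdomain."""
    interior = len(u) - 2
    base, extra = divmod(interior, n_parts)
    parts, start = [], 1
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        parts.append(u[start - 1:start + size + 1])   # slice includes halos
        start += size
    return parts

def laplacian_decomposed(u, n_parts):
    """Apply the stencil per subdomain, then concatenate the results."""
    result = []
    for part in decompose(u, n_parts):   # each part could run on another node
        result.extend(laplacian(part))
    return result

if __name__ == "__main__":
    grid = [float(i * i) for i in range(12)]    # u(x) = x^2 sampled on 12 points
    print(laplacian_decomposed(grid, 3))
    print(laplacian_decomposed(grid, 3) == laplacian(grid))
```

In an iterative solver the halo cells would have to be exchanged between neighbouring subdomains after every step. That exchange is the communication cost that makes the interconnect matter for this class of application.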

(21)

Inversion

From measurements (F) compute models (M) representing properties (d) of the measured object(s).

Deterministic

• Matrix inversions

• Conjugate gradient

Stochastic

• Monte Carlo, Markov chain

• Genetic algorithms

Generally large amounts of shared memory

Parallelism through multiple runs with different models
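The conjugate gradient method listed under the deterministic approaches can be sketched in a few lines for a small dense system. This is a textbook version in pure Python, assuming a symmetric positive-definite matrix `A`; the matrix and right-hand side in the usage lines are hypothetical stand-ins and do not come from the slides.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A (list of lists)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual norm is small enough
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

The matrix-vector product dominates the cost of each iteration, and it is the step a production code would distribute across nodes.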

(22)

Signal Processing/Quantum Mechanics

Convolution model (stencil)

Matrix computations (eigenvalues…)

Conjugate gradient methods (matrix methods)

Normally not very demanding on latency and bandwidth

Some algorithms are embarrassingly parallel

Examples: seismic migration/processing, medical imaging, SETI@Home
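The convolution model can be illustrated with a direct discrete convolution, sketched below with a hypothetical two-point smoothing kernel. Every output sample depends only on a small window of the input, so blocks of the output can be computed independently of each other.

```python
def convolve(signal, kernel):
    """Full discrete convolution: out[n] = sum over k of signal[k] * kernel[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

if __name__ == "__main__":
    trace = [0.0, 1.0, 4.0, 1.0, 0.0]       # a made-up input trace
    smoothing = [0.5, 0.5]                   # two-point moving average
    print(convolve(trace, smoothing))
```

Production codes typically use FFT-based convolution for long kernels; the direct form is shown only because it makes the stencil structure visible.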

(23)

Searching/Comparing

Integer operations are more dominant than floating point

IO intensive

Pattern matching

Embarrassingly parallel – very suitable for grid computing

Examples: encryption/decryption, message interception, bio-informatics, data mining
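A minimal sketch of this kind of workload is shown below: an exact pattern is counted in a set of sequences, and each sequence is searched independently of the others. The sequence names and contents are made up for illustration, and real bio-informatics tools use far more sophisticated matching than a plain substring search.

```python
def count_matches(sequence, pattern):
    """Count occurrences of pattern in sequence, overlapping ones included."""
    if not pattern:
        return 0
    count, start = 0, sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count

def search_all(sequences, pattern):
    """Search every sequence independently. Each call shares no data with
    the others, so the work could be spread over nodes with no communication."""
    return {name: count_matches(seq, pattern) for name, seq in sequences.items()}

if __name__ == "__main__":
    sequences = {"seq1": "GATTACAGATTACA", "seq2": "ATATAT", "seq3": "CCCC"}
    print(search_all(sequences, "TA"))
```

The work is string comparison and integer counting rather than floating point arithmetic, and with large databases the time is often spent reading the data, which matches the integer-dominated and I/O-intensive profile described above.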

(24)

Application Classes

Applications

• FEA – Finite Element Analysis
  • The simulation of hard physical materials, e.g. metal, plastic
  • Crash test, product design, suitability for purpose
  • Examples: MSC Nastran, Ansys, LS-Dyna, Abaqus, ESI PAMCrash, Radioss
• CFD – Computational Fluid Dynamics
  • The simulation of soft physical materials, gases and fluids
  • Engine design, airflow, oil reservoir modelling
  • Examples: Fluent, Star-CD, CFX
• Geophysical Sciences
  • Seismic Imaging – taking echo traces and building a picture of the sub-earth geology
  • Reservoir Simulation – CFD specific to oil asset management

(25)

Application Classes

Applications

• Life Sciences
  • Understanding the living world – genome matching, protein folding, drug design, bio-informatics, organic chemistry
  • Examples: BLAST, Gaussian, others
• High Energy Physics
  • Understanding the atomic and sub-atomic world
  • Software from Fermilab or CERN, or home-grown
• Financial Modelling
  • Meeting internal and external financial targets, particularly regarding investment positions
  • VaR – Value at Risk – assessing the impact of economic and political factors on the bank’s investment portfolio
  • Trader Risk Analysis – what is the risk on a trader’s position, or on a group of traders
