Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

(2)

What is HPC?

• HPC = High Performance Computing

  • Includes Supercomputing

• HPCC = High Performance Cluster Computing

  • Note: these are NOT High Availability clusters

• HPTC = High Performance Technical Computing

(3)

Agenda

• Parallel Computing Concepts

• Clusters

(4)

Concurrency and Parallel Computing

A central concept in computer science is concurrency:

• Concurrency: Computing in which multiple tasks are active at the same time.

There are many ways to use concurrency:

• Concurrency is key to all modern Operating Systems as a way to hide latencies.

• Concurrency can be used together with redundancy to provide high availability.

• Parallel Computing uses concurrency to decrease program runtimes.

(5)

Hardware for Parallel Computing

Parallel computers are classified in terms of streams of data and streams of instructions:

• MIMD Computers: Multiple streams of instructions acting on multiple streams of data.

• SIMD Computers: A single stream of instructions acting on multiple streams of data.

Parallel Hardware comes in many forms:

• On chip: Instruction level parallelism (e.g. IPF)

• Multicore: Multiple execution cores inside a single CPU

• Multiprocessor: Multiple processors inside a single computer.

• Multicomputer: networks of computers working together.

(6)

Hardware for Parallel Computing

Parallel Computers

• Single Instruction Multiple Data (SIMD)*

• Multiple Instruction Multiple Data (MIMD)

  • Shared Address Space

    • Symmetric Multiprocessor (SMP)

    • Non-uniform Memory Architecture (NUMA)

  • Disjoint Address Space

    • Massively Parallel Processor (MPP)

    • Cluster

    • Distributed Computing

(7)

What is an HPC Cluster

A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.

A typical cluster uses:

• Commodity off the shelf (COTS) parts

• Low latency communication protocols between the disjoint address spaces (memory)

(8)

What is HPCC?

[Diagram labels: Cluster Management Tools; Master Node; File Server / Gateway]

(9)

Cluster Architecture View

[Layered diagram, top to bottom]

• Application: Parallel Benchmarks (Perf, Ring, HINT, NAS, …); Real Applications

• Middleware: MPI, PVM, shmem

• OS: Linux, Other OSes

• Protocol: TCP/IP, VIA, Proprietary

• Interconnect: Ethernet, Myrinet, InfiniBand, Quadrics

• Hardware: desktop, Workstation, Server 1P/2P, Server 4U +

(10)

Cluster Hardware

The Node

• A single element within the cluster

• Compute Node

  • Just computes – little else

  • Private IP address – no user access

• Master/Head/Front End Node

  • User login

  • Job scheduler

  • Public IP address – connects to external network

• Management/Administrator Node

  • Systems/cluster management functions

  • Secure administrator address

• I/O Node

  • Access to data

(11)

Interconnect

Interconnect          Typical Latency (usec)   Typical Bandwidth (MB/s)
100 Mbps Ethernet     75                       80
1 Gbit/s Ethernet     60-90                    90
10 Gb/s Ethernet      12-20                    800
Myricom Myrinet*      2.2-3                    2500
InfiniBand*           2-4                      1400-2500
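The latency and bandwidth figures above can be combined into a rough estimate of how long one message takes to cross the interconnect. The sketch below uses the common first-order model (time = latency + size / bandwidth). The function name `transfer_time_us` and the specific values picked from within the quoted ranges are assumptions for illustration, not measurements.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_s):
    """Estimated one-way transfer time in microseconds.

    First-order model: fixed latency plus size divided by bandwidth.
    With 1 MB taken as 10**6 bytes, 1 MB/s equals 1 byte per microsecond.
    """
    return latency_us + message_bytes / bandwidth_mb_s


# (latency in usec, bandwidth in MB/s): illustrative points chosen from
# within the ranges quoted on the slide (assumed values, not measurements).
INTERCONNECTS = {
    "1 Gbit/s Ethernet": (60.0, 90.0),
    "10 Gb/s Ethernet": (12.0, 800.0),
    "InfiniBand": (2.0, 1400.0),
}

for size in (8, 1_000_000):  # a tiny message and a 1 MB message
    for name, (latency, bandwidth) in INTERCONNECTS.items():
        t = transfer_time_us(size, latency, bandwidth)
        print(f"{size:>9} B over {name:<18}: {t:10.1f} usec")
```

Under this model, small messages are dominated by latency and large messages by bandwidth, which is why both columns matter when choosing an interconnect.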

(12)

Agenda

• Parallel Computing Concepts

• Clusters

(13)

Cluster Usage

Performance Measurements

Usage Model

Application Classification

Application Behaviour

(14)

The Mysterious FLOPS

1 GFlops = 1 billion floating point operations per second

Theoretical vs. real GFlops

Xeon Processor

• Theoretical peak per core = 4 x clock speed

• Xeons have 128-bit SSE registers, which allow each core to carry out 2 double precision floating point add and 2 multiply operations per clock cycle

• 2 computational cores per processor

• 2 processors per node (4 cores per node)

Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent)
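The peak arithmetic above can be written out as a short calculation. This is a sketch only: the 3.0 GHz clock is an assumed value for illustration, and the helper name `peak_gflops` is hypothetical.

```python
def peak_gflops(clock_ghz, flops_per_cycle=4, cores=1):
    """Theoretical peak in GFlops: clock (GHz) x flops per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


CLOCK_GHZ = 3.0  # assumed clock speed, for illustration only

per_core = peak_gflops(CLOCK_GHZ)           # 4 flops per cycle on one core
per_cpu = peak_gflops(CLOCK_GHZ, cores=2)   # 2 cores per processor
per_node = 2 * per_cpu                      # 2 processors per node

print(f"Peak per core: {per_core:.1f} GFlops")
print(f"Peak per CPU:  {per_cpu:.1f} GFlops")
print(f"Peak per node: {per_node:.1f} GFlops")

# Sustained performance quoted on the slide: roughly 35-80% of peak.
low, high = 0.35 * per_node, 0.80 * per_node
print(f"Sustained per node: about {low:.1f} to {high:.1f} GFlops")
```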

(15)

Other measures of CPU performance

• SPEC

  – SPEC CPU2000/2006 Base – single core performance indicator

  – SPEC CPU2000/2006 Rate – node performance indicator

  – SPECfp – Floating Point performance

  – SPECint – Integer performance

• Many other performance metrics may be required

  – STREAM – memory bandwidth

  – HPL – High Performance Linpack

  – NPB (NAS Parallel Benchmarks) – suite of performance tests

  – Pallas Parallel Benchmark – another suite

  – IOZone – file system throughput

(16)

Technology Advancements in 5 Years

Codename    Release date   GHz   Number of cores   Peak FLOP per CPU cycle   Peak GFLOPS per CPU   Linpack on 256 Processors
Woodcrest   June 2006      3.0   2                 4                         24                    4781
Westmere    Nov 2009       3.0   6                 4                         72                    14500

* From November 2001 TOP500 supercomputer list (cluster of Dell Precision 530)

** Intel internal cluster built in 2006
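The Linpack figures can be set against the peak figures to estimate efficiency. The sketch below rests on two assumptions that the slide does not state: that the Linpack column is in GFLOPS, and that "256 processors" means 256 CPUs, each with the listed per-CPU peak. The helper name `linpack_efficiency` is hypothetical.

```python
def linpack_efficiency(linpack_gflops, peak_gflops_per_cpu, cpus=256):
    """Fraction of theoretical peak achieved by the Linpack run."""
    return linpack_gflops / (peak_gflops_per_cpu * cpus)


# (codename, peak GFLOPS per CPU, Linpack on 256 processors) from the slide.
ROWS = [
    ("Woodcrest", 24, 4781),
    ("Westmere", 72, 14500),
]

for codename, peak, linpack in ROWS:
    eff = linpack_efficiency(linpack, peak)
    print(f"{codename:<10} {eff:.1%} of theoretical peak")
```

If those assumptions hold, both systems sit near the top of the ~35-80% sustained range quoted for Rmax.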

(17)

Usage Model

Many Serial Jobs (Capacity) to One Big Parallel Job (Capability)

• Batch Usage: Load Balancing More Important

  • Electronic Design

  • Monte Carlo

  • Design Optimisation

  • Parallel Search

• Normal Mixed Usage: Job Scheduling very important

  • Many Users

  • Mixed size Parallel/Serial jobs

  • Ability to Partition and Allocate Jobs to Nodes for Best Performance

• Appliance Usage: Interconnect More Important

  • Meteorology

  • Seismic Analysis

  • Fluid Dynamics

  • Molecular Chemistry

(18)

Application and Usage Model

HPC clusters run parallel applications, and applications in parallel!

One single application that takes advantage of multiple computing platforms

• Fine-Grained Application

  • Uses many systems to run one application

  • Shares data heavily across systems

  • PDVR3D (Eigenvalues and Eigenstates of a matrix)

• Coarse-Grained Application

  • Uses many systems to run one application

  • Infrequent data sharing among systems

  • Casino (Monte-Carlo stochastic methods)

• Pleasurably Parallel/HTC Application

  • An instance of the entire application runs on each node

  • Little or no data sharing among compute nodes
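The pleasurably parallel case can be sketched as a set of fully independent runs whose results are combined only at the end. In this minimal illustration a thread pool stands in for the compute nodes, and a Monte Carlo estimate of pi stands in for the application. Both choices, and the names `run_instance` and `estimate_pi`, are assumptions made for the example. On a real cluster each instance would typically be a separate job submitted through the scheduler.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_instance(seed, samples=20_000):
    """One complete, independent run: count random points inside the unit circle."""
    rng = random.Random(seed)  # private generator: no state shared between runs
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(seeds, samples=20_000):
    """Run one instance per seed and combine the results only at the end."""
    seeds = list(seeds)
    # The pool is a stand-in for compute nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=4) as pool:
        hits = list(pool.map(lambda s: run_instance(s, samples), seeds))
    return 4.0 * sum(hits) / (samples * len(seeds))


print(estimate_pi(range(8)))
```

The instances exchange no data while running, so the interconnect matters far less here than for fine-grained applications.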

(19)

Types of Applications

Forward Modelling

Inversion

Signal Processing

(20)

Forward Modelling

Solving linear equations

Grid Based

Parallelization by domain decomposition (split and distribute the data)

Finite element/finite difference
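Domain decomposition can be illustrated on a one-dimensional grid. In the sketch below the grid is split into contiguous subdomains, each subdomain borrows one ghost cell from each neighbour, and a simple finite-difference averaging stencil is applied locally. The stencil and the function names are assumptions chosen for brevity. On a cluster, each subdomain would live on a different node and the ghost cells would be exchanged over the interconnect.

```python
def stencil_step(u):
    """One finite-difference update: each interior point becomes the mean
    of its two neighbours; the two end points are held fixed."""
    interior = [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def decomposed_step(u, parts):
    """Same update, computed subdomain by subdomain.

    Assumes 1 <= parts <= len(u) and len(u) >= 2.
    """
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Extend the subdomain by one ghost cell on each side where a
        # neighbouring subdomain exists.
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)
        local = stencil_step(u[glo:ghi])
        # Keep only the cells this subdomain owns; discard the ghost cells.
        start = lo - glo
        result.extend(local[start:start + (hi - lo)])
    return result


grid = [float(i * i) for i in range(10)]
assert decomposed_step(grid, 3) == stencil_step(grid)
print(decomposed_step(grid, 3))
```

The decomposed result matches the serial one because each owned cell sees exactly the same neighbours in both versions.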

(21)

Inversion

From measurements (F) compute models (M) representing properties (d) of the measured object(s).

Deterministic

• Matrix inversions

• Conjugate gradient

Stochastic

• Monte Carlo, Markov chain

• Genetic algorithms

Generally large amounts of shared memory

Parallelism through multiple runs with different models
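A stochastic inversion of this kind can be sketched as many independent forward runs, one per candidate model, keeping the model that best fits the measurements. The straight-line forward model, the search bounds, and all function names below are assumptions for illustration. They are not taken from any particular inversion code.

```python
import random


def forward(model, xs):
    """Hypothetical forward model: predicted measurements for a (offset, slope) model."""
    offset, slope = model
    return [offset + slope * x for x in xs]


def misfit(model, xs, measured):
    """Sum of squared differences between predicted and measured values."""
    return sum((p - m) ** 2 for p, m in zip(forward(model, xs), measured))


def monte_carlo_inversion(xs, measured, trials=5000, seed=0):
    """Try random candidate models and keep the one with the lowest misfit.

    Each candidate is evaluated independently of the others, so the loop
    could be spread across nodes as separate runs with different models.
    """
    rng = random.Random(seed)
    candidates = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(trials)]
    return min(candidates, key=lambda m: misfit(m, xs, measured))


xs = [0, 1, 2, 3, 4]
measured = forward((1.0, 2.0), xs)  # synthetic measurements from a known model
best = monte_carlo_inversion(xs, measured)
print(best, misfit(best, xs, measured))
```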

(22)

Signal Processing/Quantum Mechanics

Convolution model (stencil)

Matrix computations (eigenvalues…)

Conjugate gradient methods (matrix methods)

Normally not very demanding on latency and bandwidth

Some algorithms are embarrassingly parallel

Examples: seismic migration/processing, medical imaging, SETI@Home
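The convolution model can be shown with a direct one-dimensional convolution. This is a minimal sketch with a hypothetical function name. Production signal processing codes would normally use an optimised library or FFT-based methods instead.

```python
def convolve(signal, kernel):
    """Direct (full) 1D convolution of a signal with a kernel."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out


# A two-point summing kernel applied to a short signal.
print(convolve([1, 2, 3], [1, 1]))
```

Independent signals (separate traces or images, for example) can each be convolved without reference to the others, which is one reason some of these algorithms are embarrassingly parallel.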

(23)

Searching/Comparing

Integer operations are more dominant than floating point

IO intensive

Pattern matching

Embarrassingly parallel – very suitable for grid computing

Examples: encryption/decryption, message interception, bio-informatics, data mining

(24)

Application Classes

Applications

• FEA – Finite Element Analysis

  • The simulation of hard physical materials, e.g. metal, plastic

  • Crash test, product design, suitability for purpose

  • Examples: MSC Nastran, Ansys, LS-Dyna, Abaqus, ESI PAMCrash, Radioss

• CFD – Computational Fluid Dynamics

  • The simulation of soft physical materials, gases and fluids

  • Engine design, airflow, oil reservoir modelling

  • Examples: Fluent, Star-CD, CFX

• Geophysical Sciences

  • Seismic Imaging – taking echo traces and building a picture of the sub-earth geology

  • Reservoir Simulation – CFD specific to oil asset management

(25)

Application Classes

Applications

• Life Sciences

  • Understanding the living world – genome matching, protein folding, drug design, bio-informatics, organic chemistry

  • Examples: BLAST, Gaussian, others

• High Energy Physics

  • Understanding the atomic and sub-atomic world

  • Software from Fermilab or CERN, or home-grown

• Financial Modelling

  • Meeting internal and external financial targets, particularly regarding investment positions

  • VaR – Value at Risk – assessing the impact of economic and political factors on the bank’s investment portfolio

  • Trader Risk Analysis – the risk on a trader’s position, or on that of a group of traders