
High Performance Computing

Course Notes 2007-2008

HPC Fundamentals

Introduction

What is High Performance Computing (HPC)?

• Difficult to define: it is a moving target.
  • In the late 1980s, a supercomputer performed about 100 MFLOPS.
  • Today (2007-2008), a 2 GHz desktop or laptop performs a few GFLOPS.
  • Today, a supercomputer performs tens of TFLOPS (Top500).
• High performance: O(1000) times more powerful than the latest desktops.
• Most supercomputers become obsolete in terms of performance before the end of their physical life.
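To make the "moving target" concrete, the short sketch below compares the orders of magnitude quoted above. The 100 MFLOPS figure comes from the notes; the desktop and supercomputer values are assumed stand-ins for "a few GFLOPS" and "tens of TFLOPS", chosen only for illustration.

```python
# Illustrative figures only: the notes give orders of magnitude, not exact values.
MEGA, GIGA, TERA = 1e6, 1e9, 1e12

supercomputer_1980s = 100 * MEGA  # "100 MFLOPS", from the notes
desktop_today = 4 * GIGA          # assumed value for "a few GFLOPS"
supercomputer_today = 20 * TERA   # assumed value for "tens of TFLOPS"


def ratio(faster: float, slower: float) -> float:
    """Return how many times more FLOPS one machine delivers than another."""
    return faster / slower


if __name__ == "__main__":
    # With the assumed values, the supercomputer is a few thousand times faster.
    print(f"supercomputer vs desktop: {ratio(supercomputer_today, desktop_today):,.0f}x")
    # The same desktop already outperforms the late-1980s supercomputer.
    print(f"desktop vs 1980s supercomputer: {ratio(desktop_today, supercomputer_1980s):,.0f}x")
```

Under these assumptions the gap between a current supercomputer and a current desktop is in the thousands, which is the O(1000) factor the notes use as a working definition.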

Applications of HPC

• HPC is driven by the demands of computation-intensive applications from various areas:
  • Medicine, biology, neuroscience (e.g. simulation of brains)
  • Finance (e.g. modelling the world economy)
  • Military and defence (e.g. modelling the explosion of nuclear weapons)
  • Engineering (e.g. simulation of a car crash or a new airplane design)

An Example of Demands in Computing Capability

Project: Blue Brain

• Aim: construct a simulated brain.
• The building blocks of a brain are neocortical columns. A column consists of about 60,000 neurons.
• The human brain contains millions of such columns.
• First stage: simulate a single column (each processor acting as one or two neurons).
• Then: simulate a small network of columns.
• Ultimate goal: simulate the whole human brain.
• IBM contributes a Blue Gene supercomputer.
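The scale of this demand can be estimated with simple arithmetic. The sketch below uses the 60,000 neurons per column and the one-or-two neurons per processor from the notes; the count of one million columns is an assumption, taken as a lower bound for "millions".

```python
NEURONS_PER_COLUMN = 60_000           # from the notes
ASSUMED_COLUMNS_IN_BRAIN = 1_000_000  # assumption: lower bound for "millions"


def processors_needed(neurons: int, neurons_per_processor: int) -> int:
    """Processors required when each one simulates a fixed number of neurons."""
    # Ceiling division, so a partially loaded processor still counts as one.
    return -(-neurons // neurons_per_processor)


if __name__ == "__main__":
    # First stage: a single column, with one or two neurons per processor.
    print("one column, 2 neurons/processor:", processors_needed(NEURONS_PER_COLUMN, 2))
    print("one column, 1 neuron/processor: ", processors_needed(NEURONS_PER_COLUMN, 1))
    # Ultimate goal: every column, under the assumed column count.
    total_neurons = NEURONS_PER_COLUMN * ASSUMED_COLUMNS_IN_BRAIN
    print("whole brain neurons (assumed):  ", total_neurons)
```

A single column already calls for 30,000 to 60,000 processors at this mapping, which is why the first stage alone needs a machine of Blue Gene's size.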

Related Technologies

HPC covers a wide range of technologies:

• Computer architecture
  • CPU, memory, VLSI
• Compilers
  • Identify inefficient implementations
  • Make use of the characteristics of the computer architecture
  • Choose a suitable compiler for a given architecture
• Algorithms (for parallel and distributed systems)
  • How to program on parallel and distributed systems
• Middleware
  • From Grid computing technology
  • Application -> middleware -> operating system
  • Resource discovery and sharing

History of High Performance Computing

• 1960s: Scalar processor
  • Processes one data item at a time
• 1970s: Vector processor
  • Can process an array of data items in one go
  • Architecture
  • Overhead
  • Difference between vector processor and scalar processor
• Late 1980s: Massively Parallel Processing (MPP)
  • Up to thousands of processors, each with its own memory and OS
  • Breaks a problem down into parts
  • Difference between MPP and vector processor
• Late 1990s: Cluster
  • Not a new term in itself, but one attracting renewed interest
  • Connects stand-alone computers with a high-speed network
  • Difference between cluster and MPP
• Late 1990s: Grid
  • Tackles collaboration among geographically distributed organisations
  • Draws an analogy with the power grid
  • Difference between Grid and cluster

Parallel computing vs. distributed computing

• Parallel computing
  • Breaks the problem to be computed into parts that can run simultaneously on different processors
  • Example: an MPI program that performs matrix multiplication
  • Solves tightly coupled problems
• Distributed computing
  • Parts of the work are computed in different places (note: this does not necessarily imply simultaneous processing)
  • Example: the client/server (C/S) model
  • Solves loosely coupled problems (not much communication)
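The matrix multiplication example can be sketched without an MPI installation. The code below is not MPI: it uses Python threads from the standard library as a stand-in, purely to show the decomposition an MPI program would typically use (split the rows of A among workers, give each worker all of B, then gather the row blocks). The function names and the row-block split are illustrative choices, not taken from the notes.

```python
from concurrent.futures import ThreadPoolExecutor

Matrix = list[list[float]]


def multiply_rows(a_rows: Matrix, b: Matrix) -> Matrix:
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_rows]


def parallel_matmul(a: Matrix, b: Matrix, workers: int = 2) -> Matrix:
    """Split A into contiguous row blocks and multiply each block in its own worker."""
    block = max(1, -(-len(a) // workers))  # ceiling division: rows per worker
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker receives one row block of A and the whole of B.
        partial = list(pool.map(multiply_rows, blocks, [b] * len(blocks)))
    # Concatenating the blocks in order plays the role of a gather step.
    return [row for part in partial for row in part]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(parallel_matmul(a, b))
```

The row blocks are independent once B is available, so the parts can run at the same time. In CPython, threads will not speed up this arithmetic; a real implementation would place each block on a separate process or node.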

Architecture Types

• SMP (Symmetric Multi-Processing)
  • Multiple CPUs, single memory, shared I/O
  • All resources in an SMP machine are equally available to each CPU
  • Does not scale well to a large number of processors (typically fewer than 8). Scalability is a measure of how close to linearly the system performance improves with the number of processing elements.
• NUMA (Non-Uniform Memory Access)
  • Multiple CPUs
  • Each CPU has fast access to its local area of memory, but slower access to other areas
  • Scales well to a large number of processors
  • Complicated memory access pattern and system bus
• MPP (Massively Parallel Processing)
• Cluster
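The definition of scalability given for SMP can be turned into two numbers, speedup and parallel efficiency. The sketch below uses these standard measures; the timings are hypothetical and serve only to show a machine that stops scaling as processors are added.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the one-processor run."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Speedup per processor; 1.0 means perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical run times in seconds, keyed by processor count.
    timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 20.0}
    t1 = timings[1]
    for p, tp in timings.items():
        print(f"p={p}: speedup={speedup(t1, tp):.2f}, efficiency={efficiency(t1, tp, p):.2f}")
```

Efficiency that falls well below 1.0 as the processor count grows is the signature of a system that does not scale linearly.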

Illustration for Architecture Types

Shared memory (uniform memory access - SMP)

• Processors share access to a common memory space.
  • Implemented over a shared memory bus or communication network.
• Support for critical sections is required.
• Local cache is critical:
  • Without it, bus contention (or network traffic) reduces the system's efficiency.
  • For this reason, pure shared memory systems do not scale naturally.
  • Caches introduce the problem of coherency (ensuring that stale cache lines are invalidated when other processors alter shared memory).

[Figure: PE 0 ... PE n connected through an interconnect to a shared memory]
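The need for critical sections on a shared-memory machine can be sketched with threads that update one shared counter. This is a minimal illustration using Python's standard `threading.Lock`; the `SharedCounter` class and `run_workers` helper are hypothetical names introduced here.

```python
import threading


class SharedCounter:
    """A value in shared memory, protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Critical section: only one thread at a time may read, add and write back.
        with self._lock:
            self.value += 1


def run_workers(threads: int, increments: int) -> int:
    """Start several threads that all update the same counter, then return its value."""
    counter = SharedCounter()

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(threads=4, increments=1000))
```

Without the lock, two threads could read the same old value and one update would be lost; the lock serialises the read-modify-write so every increment is counted.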

Illustration for Architecture Types

Shared memory (non-uniform memory access: NUMA)

• A PE may fetch from local or remote memory, hence the non-uniform access times.

cc-NUMA (cache-coherent Non-Uniform Memory Access)

• Groups of processors are connected together by a fast interconnect (SMP).
• These groups are then connected together by a high-speed interconnect.
• Global address space.

[Figure: m shared-memory groups joined by an interconnect. Group 1 holds PE 1 ... PE n with Shared Memory 1; group m holds PE (m-1)n+1 ... PE m.n with Shared Memory m]
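The PE numbering in the figure (group m holds PE (m-1)n+1 to PE m.n) implies a simple mapping from a PE to its group, and from there to whether a memory access is local or remote. The sketch below assumes PEs and groups are numbered from 1, as in the figure; the two cost constants are hypothetical relative values, not measurements.

```python
LOCAL_COST = 1.0   # hypothetical relative cost of a local memory access
REMOTE_COST = 3.0  # hypothetical relative cost of a remote memory access


def group_of(pe: int, n: int) -> int:
    """Return the 1-based group holding PE `pe` when every group has n PEs."""
    return (pe - 1) // n + 1


def access_cost(pe: int, memory_group: int, n: int) -> float:
    """Relative cost for PE `pe` to access the shared memory of `memory_group`."""
    return LOCAL_COST if group_of(pe, n) == memory_group else REMOTE_COST


if __name__ == "__main__":
    n = 4  # PEs per group, chosen for illustration
    for pe in (1, 4, 5, 8):
        print(f"PE {pe} is in group {group_of(pe, n)}")
    print("PE 5 -> memory 2:", access_cost(5, 2, n))
    print("PE 5 -> memory 1:", access_cost(5, 1, n))
```

The same address space is visible to every PE, but the cost of reaching it depends on which group owns the memory, which is what makes the access times non-uniform.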

Illustration for Architecture Types

Distributed memory (MPP, cluster)

• Each processor has its own local memory.
• When processors need to exchange (or share) data, they must do so through explicit communication.
  • Message passing (e.g. the MPI library standard)
• Typically larger latencies between PEs (especially if they communicate over network interconnections).
• Scalability is good if the problems can be sufficiently contained within PEs.

[Figure: PE 0 ... PE n, each with its own local memory M 0 ... M n, connected by an interconnect]
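The message-passing style can be imitated on one machine. The sketch below is not MPI: each "PE" is a Python thread that touches only the data handed to it and communicates its result through a queue, which stands in for explicit send and receive calls. The names `pe` and `distributed_sum` are hypothetical.

```python
import queue
import threading


def pe(rank: int, local_data: list[int], outbox: queue.Queue) -> None:
    """A processing element: it works only on its own local data."""
    partial = sum(local_data)
    outbox.put((rank, partial))  # explicit "send" of the result


def distributed_sum(chunks: list[list[int]]) -> int:
    """Give each PE one chunk, then combine the partial sums it sends back."""
    inbox: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=pe, args=(rank, chunk, inbox))
        for rank, chunk in enumerate(chunks)
    ]
    for w in workers:
        w.start()
    total = 0
    for _ in workers:
        _rank, partial = inbox.get()  # explicit "receive", one message per PE
        total += partial
    for w in workers:
        w.join()
    return total


if __name__ == "__main__":
    print(distributed_sum([[1, 2, 3], [4, 5], [6]]))
```

Each PE does all of its work locally and sends a single small message, which is the situation the notes describe as scaling well: the problem is contained within the PEs and communication is kept to a minimum.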

Goals of HPC

• Minimise the execution time for a given number of applications (strong scaling).
• Maximise the number of applications completed in a given amount of time (weak scaling).
• Identify a compromise between performance and cost.
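For the weak-scaling goal, one common way to quantify it is sketched below. It assumes the conventional benchmarking definition, which the notes do not spell out: the amount of work per processor is held fixed, so the total work grows with the processor count and the ideal run time stays constant. The timings are hypothetical.

```python
def weak_scaling_efficiency(t_one: float, t_p: float) -> float:
    """Ratio of the one-processor time to the p-processor time.

    Assumes work per processor is fixed, so the p-processor run does p times
    the total work. A value of 1.0 means the larger job took no longer.
    """
    return t_one / t_p


if __name__ == "__main__":
    # Hypothetical run times in seconds, with total work proportional to p.
    timings = {1: 10.0, 4: 10.5, 16: 12.5}
    for p, tp in timings.items():
        print(f"p={p}: weak-scaling efficiency = {weak_scaling_efficiency(timings[1], tp):.2f}")
```

An efficiency close to 1.0 means more work is completed in roughly the same time, which matches the goal of maximising what gets done in a given period.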
