
Using & Managing HPC Systems. Presenter: William Lu, Ph.D., Marketing Director Date: October 7, 2008


(1)

Using & Managing HPC Systems

Presenter: William Lu, Ph.D., Marketing Director

Date: October 7, 2008

(2)

Challenges on HPC systems

Organizations that use and manage HPC systems face several challenges:

• Complex hardware: performance relies on CPU, memory, storage, interconnects, …

• Reliability: commodity hardware has good performance/cost, but…

• Why are my jobs waiting?

(3)
(4)

What is HPC Management Software?

The software between the application and the operating system that enables users to develop, run and manage compute- or data-intensive applications and manage HPC systems.

Stack, top to bottom:

• Applications

• HPC Management Software

• Hardware and OS

(5)
(6)

Platform Computing - Leader in HPC

• 2,000 customers worldwide

• 16 years of profitable growth

• 500 employees in 15 offices

• No. 1 leader in HPC

(7)

Industries Served by Platform

• Financial Services: BNP, Citigroup, Fortis, HSBC, KBC Financial, JPMC, Lehman Brothers, LBBW, Mass Mutual, MUFG, Nomura, Prudential, Sal. Oppenheim, Société Générale

• Industrial Mfg.: Airbus, BAE Systems, Boeing, Bombardier, Deere & Company, Ericsson, Honda, General Electric, General Motors, Goodrich, Lockheed Martin, Nissan, Northrop Grumman, Pratt & Whitney, Toyota, Volkswagen

• Electronics: AMD, ARM, Broadcom, Cadence, Cisco, Infineon, MediaTek, Motorola, NVidia, Qualcomm, Samsung, Sony, ST Micro, Synopsys, TI, Toshiba

• Oil & Gas: Agip, BP, British Gas, China Petroleum, ConocoPhillips, EMGS, Gaz de France, Hess, Kuwait Oil, PetroBras, Petro Canada, PetroChina, Shell, StatoilHydro, Total, Woodside

• Life Sciences: Abbott Labs, AstraZeneca, Celera, DuPont, Eli Lilly, Johnson & Johnson, Merck, National Institutes of Health, Novartis, Partners Health Network, Pharsight, Pfizer, Sanger Institute

• Gov, Research & Edu: CERN, DoD (US), DoE (US), ENEA, Georgia Tech, Harvard Medical School, Japan Atomic Energy Inst., Max Planck Inst., MIT, SSC (China), Stanford Medical, TACC, U. of Georgia, U. Tokyo, Washington U.

• Other Industries: GE, Bell Canada, IRI, AT&T, Cingular, Telecom Italia, Telefonica, DreamWorks Animation SKG, Walt Disney Co.

(8)

Solutions with Partners

• Platform OCS 5 and Platform Manager integrated in Dell cluster systems

• Platform LSF, Platform Manager form key parts of Unified Cluster Portfolio

• Platform enterprise solutions support a wide range of IBM HPC systems

• Integrates Platform LSF and Platform Symphony in grid solutions

• Platform OCS 5 powers the Red Hat® HPC Solution

• OEMs Platform’s core technology in SAS® applications

(9)

Platform HPC Management Solutions

Platform MPI

Platform LSF

Platform LSF MultiCluster

EnginFrame

Platform EGO

Platform Manager

Platform RTM

Platform LSF License Scheduler

Platform Analytics

Platform Process Manager

Platform VM Orchestrator

Platform LSF Session Scheduler

Platform Open Cluster Stack 5: Accelerate & manage single Linux cluster

(10)

Accelerate Parallel Application – Platform MPI

• Platform MPI contains forward-looking MPI features:

Parallel checkpoint/restart

Dynamic affinity (load monitoring)

Windows CCS support (OS-agnostic clusters)

Dynamic process creation

Network failover

Replace run time (.so file) without recompile/relink

(11)

Platform MPI Benchmarks

[Chart: FL5L2 Performance. Performance (jobs/day) vs. # CPUs (4, 8, 16, 32) for Scali MPI 4.3.1 and MPICH 1.2.4.]

[Chart: Jobs/24h vs. number of nodes, Scali MPI Connect 5.5 and HP MPI.]

Number of nodes                  1      2      4      8      16
Scali MPI Connect 5.5 (jobs/24h) 29.7   57.9   107.3  192.4  257.1
HP MPI (jobs/24h)                28.8   56.3   105.0  155.1  189.1
Scali relative to HP MPI (100%)  103%   103%   102%   124%   136%

Commercial MPI: Platform MPI out-performs by ~30%
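The percentages on this slide can be checked directly from the jobs/24h figures. The short Python sketch below recomputes Scali MPI Connect 5.5 throughput as a percentage of HP MPI at each node count; the variable and function names are illustrative only, and the numbers are copied from the slide.

```python
# Jobs per 24 hours by node count, copied from the benchmark slide.
NODES = [1, 2, 4, 8, 16]
SCALI_MPI_CONNECT = [29.7, 57.9, 107.3, 192.4, 257.1]
HP_MPI = [28.8, 56.3, 105.0, 155.1, 189.1]


def relative_throughput(candidate, baseline):
    """Return candidate throughput as a rounded percentage of baseline."""
    return [round(100 * c / b) for c, b in zip(candidate, baseline)]


if __name__ == "__main__":
    percentages = relative_throughput(SCALI_MPI_CONNECT, HP_MPI)
    for nodes, pct in zip(NODES, percentages):
        print(f"{nodes:>2} nodes: {pct}% of HP MPI")
```

The gap is small up to 4 nodes and widens at 8 and 16 nodes (124% and 136%), which is consistent with the "~30%" figure quoted for the larger runs.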

(12)

Powerful workload scheduler – Platform LSF

• Robust, fault tolerant, and high performance

• Powers mission-critical environments

• Special price for education

By scheduling workloads intelligently according to policy, Platform LSF reduces

application run-time and optimizes resource use.
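To make "scheduling according to policy" concrete, here is a minimal Python sketch of one very simple policy: dispatch the highest-priority pending jobs that fit into the free CPU slots. This is a toy illustration under stated assumptions, not Platform LSF's scheduling algorithm; the `Job` class, its fields, and the `dispatch` function are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # hypothetical policy field: higher runs first
    slots: int     # CPU slots the job requests


def dispatch(pending, free_slots):
    """Greedy priority dispatch (toy policy, not LSF's algorithm).

    Walks jobs from highest to lowest priority and starts each one that
    fits in the remaining free slots. Returns (started, still_waiting).
    """
    started, waiting = [], []
    # sorted() is stable, so equal-priority jobs keep submission order.
    for job in sorted(pending, key=lambda j: -j.priority):
        if job.slots <= free_slots:
            free_slots -= job.slots
            started.append(job)
        else:
            waiting.append(job)
    return started, waiting


if __name__ == "__main__":
    queue = [Job("sim", 50, 8), Job("render", 80, 4), Job("big", 90, 16)]
    started, waiting = dispatch(queue, free_slots=12)
    print("started:", [j.name for j in started])
    print("waiting:", [j.name for j in waiting])
```

In this example the highest-priority job needs 16 slots and only 12 are free, so it waits while two smaller jobs start. That is one simple answer to "why are my jobs waiting?"; a production scheduler weighs many more policies than this sketch does.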

(13)
(14)

Centralized monitoring/reporting – Platform RTM

• Tool for LSF administrators and operators

• Near-real-time monitoring for LSF, extendable to other physical devices and application software

• Drill-down to individual job statistics

• Shows your clusters' job history graphically, without the use of b-commands

• Based on open source Cacti

Platform RTM enables system administrators to make timely decisions for proactively managing compute resources against business demand.

(15)
(16)

Single Cluster All-in-One Solution

• Simplify HPC cluster


HPC Management Software for Single Linux Cluster: Platform Open Cluster Stack (OCS)

• Functions: Develop, Schedule & Run, Manage

• Kits and components: Intel Software Tools Kit; MPI & HPX Kit; OFED Tools Kit; Intel Cluster Ready Kit; Platform Lava; PVFS2; Core Kusu Management; Nagios, Cacti, NTOP Tools

(17)

Creating an OCS 5 Installer Node

Installation steps, in order:

• Boot from CD

• Choose language

• Configure networks, DNS, & gateway

• Root password

• Partitioning & LVM

• Adding kits

• Install summary

• Installing packages

• Installer node boot

Installation complete. Installer node is ready to use.

(18)

Platform OCS 5 Features

• Repository Management: multiple repositories; repository snapshots; full dependency checking; package management

• Cluster Update: update from RHN; update from Yum repository; update individual rpms

• Configuration Management: node templates/classes; cluster file synchronization; driver management; cluster network configuration

• Cluster Administration:

Cacti configured to monitor: memory on a node; CPU load on a node; logged-in users; processes on a node

logwatch configured to monitor activity on the cluster

Pdsh for administering nodes on the cluster

• Nagios configured to manage & notify: nodes in the cluster; web server; load; logged-in users; DNS server status; MySQL server status; NTP daemon status; node 'PING' status; root partition disk; SSH daemon; swap space; total processes on node; email notifications to administrator

• Plone CMS containing: manual pages for OCS; documentation for Platform OCS 5; list of installed kits on the cluster

• Operating Systems: RHEL 4, 5*; Fedora 6; Fedora 7; CentOS 5

• Message Passing Libraries: MPICH 1, 2; MVAPICH 1; OpenMPI; Intel MPI*

• Workload Management: Platform Lava; Platform LSF*

• Resource Management: Platform EGO*

• Application Portals: NICE EnginFrame*

• Certification Tools: Intel Cluster Checker+

• Applications & Benchmarks: Atlas, blacs, bonnie++, fftw, hdf5, iozone, iperf, linpack, modules, netcdf, scalapack, stream

• Networking Hardware: Gigabit Ethernet; Cisco Infiniband; Qlogic Infiniband; Mellanox Infiniband

• Cluster Provisioning Methods: package; diskless; imaged

*Optional commercial products that must be purchased separately

+Available only from Intel Cluster Ready Partners

(19)

Platform Technology Roadmap

• Workload-based dynamic power management

Turn machines on/off

Avoid hot spots

Slow down CPU speed when an MPI task is waiting.

• Workload driven dynamic provisioning

Switch machines between Windows and Linux

automatically based on workload

• “Cloud” Toolkit

User web portal

Dynamic provisioning

Workload scheduling
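As an illustration of the first roadmap item, workload-based dynamic power management, the Python sketch below decides how many machines to turn on or off from the number of pending job slots. It is a hypothetical sketch of the idea, not a description of any Platform product; the function name `plan_power` and all of its parameters are assumptions made for the example.

```python
import math


def plan_power(pending_slots, idle_nodes, off_nodes, slots_per_node):
    """Return (nodes_to_power_on, nodes_to_power_off).

    Hypothetical policy: keep just enough idle nodes powered on to
    absorb the pending workload, and power off the rest.
    """
    # Nodes required to run everything that is currently pending.
    needed = math.ceil(pending_slots / slots_per_node)
    if needed > idle_nodes:
        # Not enough idle capacity: wake as many powered-off nodes as exist.
        return min(needed - idle_nodes, off_nodes), 0
    # Idle capacity covers demand: power off the surplus idle nodes.
    return 0, idle_nodes - needed


if __name__ == "__main__":
    # 20 pending slots, 1 idle node, 4 nodes powered off, 8 slots per node.
    print(plan_power(20, 1, 4, 8))
    # No pending work and 3 idle nodes.
    print(plan_power(0, 3, 0, 8))
```

A real implementation would also need to account for boot time, hot-spot avoidance, and how often nodes are cycled; the sketch only shows the basic demand-versus-capacity decision.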

(20)

World-class Support & Services

“Platform’s standard of

support has been excellent.”

“Platform has been

proactive, involved and

very, very friendly in

providing support.”

Henry Neeman

Director, Oklahoma University Supercomputing Centre

Tim Cutts

Platform LSF Administrator Sanger Institute

24x7 Support

across the globe

(21)

Summary

• Platform’s value proposition

Total solution for HPC management

16 years of HPC experience and a broad customer base

(22)

www.platform.com

[email protected]
