• No results found

Technical Computing Suite Job Management Software

N/A
N/A
Protected

Academic year: 2021

Share "Technical Computing Suite Job Management Software"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

Technical Computing Suite

Job Management Software

PRIMERGY x86 cluster Supercomputer PRIMEHPC FX10

Toshiaki Mikamo

Fujitsu Limited

(2)

Copyright 2012 FUJITSU LIMITED

Outline

System Configuration and Software Stack

Features

The major functions of job scheduler

Efficient Resource Usage

Fair Share Scheduling

System-optimal Resource Assignment

Summary and Future

(3)

Hybrid System Configuration

File management nodes Job management nodes Control nodes Login nodes User Management nodes Administrator

Global file system

(Data storage area)

Local file system

(Temporary area occupied by jobs) 6D mesh/torus Interconnect (Tofu)

IO network (IB), management network (GbE)

• Login

• Compilation • Job submission

• System operations management • Job operations management

Supercomputer PRIMEHPC FX10

Fat-Tree Interconnect (Infiniband)

(4)

Technical Computing Suite

System Software Stack

User/ISV Applications

High-performance file system

 Lustre-based distributed file system  High scalability

 IO bandwidth guarantee  High reliability & availability

HPC Portal / System Management Portal

Supercomputer PRIMEHPC FX10

System operations management

 System configuration management  System control

 System monitoring

 System installation & operation

Job operations management

 Job manager  Job scheduler

 Resource management

 Parallel execution environment

VISIMPACTTM

 Shared L2 cache on a chip  Hardware intra-processor synchronization Compilers Support Tools MPI Library  Scalability of High-Func.  Barrier Comm.  IDE

 Profiler & Tuning tools  Interactive debugger

 Hybrid parallel programming  Sector cache support

 SIMD / Register file extensions

Linux-based enhanced Operating System

PRIMERGY x86 cluster Red Hat Enterprise Linux

(5)

Features

Same job operations

in FX10 and PRIMERGY

Efficient, fair and system-optimal

job scheduling

 See slide below for details

Resource / Access control

 Elapsed time limit / CPU time limit / Physical memory limit

 Enable / Disable execute permission of job operation commands

Reduce OS jitter / Power saving control

Job statistical information

 The amount of CPU time / Memory / IO

(6)

Job Scheduler

Renew our job scheduler for large-scale system

Our job scheduler features:

Multi-process

enable to coexist multiple scheduler in a cluster.

Multi-thread

(7)

Efficient Resource Usage

Backfill scheduling

for keeping the resources busy

Our scheduler manages space(compute nodes) and time.

It will backfill the low priority jobs so as not to prevent high priority jobs.

Time Job A Job B Job D Running job Job D Job A Job B Job C Running job Job C Now t1 t2 t3 Not backfilled Backfilled

Job D Job C

(8)

Fair Share Scheduling

Fairly share resources between users/groups based on past usage.

① Fair share value is issued in advance for each user/group.

② The value is changed by the result of resource usage.

③ The job execution priority is determined dynamically according to the value. Fair share value is like money.

time Payment Return of overpaid Deposit Fai r sh are va lu e (m o ney )

Payment[P] = (#Node allocated) x (Elapsed time limit of job) Deposit[D] = (Elapsed time) x (Recovery rate)

Return of overpaid[R]

(9)

Optimal Job Scheduling for FX10

Interconnect topology-aware resource assignment

One interconnect unit : 12 nodes (2 x 3 x 2)

Job assignment rule: rectangular solid shape

 Guaranteeing neighbor communication

 Avoiding interfering with other jobs

Rotates rectangular solid of interconnect unit to reduce fragmentation

In-use unoccupied

x

z y

8 4 6 8 6 4 6 8 4 4 8 6

(10)

Asynchronous file staging

Login

nodes Global file system

(Data storage area)

IO network (IB), management network (GbE)

PRIMEHPC FX10 Time Stage IN Stage IN Job A Job B

Running job Job C

Now t1 t2 t3 Compute nodes IO nodes Stage OUT Stage IN Stage IN

Asynchronously transfer files from Global to Local FS before the job starts.

Co-scheduling of

computation and file transfer.

Interconnect

IO nodes Compute nodes

Local file system

Stage IN/OUT

Job A

Stage OUT

Stage OUT

Asynchronously transfer files from Local to Global FS after the job ends.

Async. Async.

Optimal Job Scheduling for FX10

(11)

Fine-grained node assignment

 Node selection method : balancing / concentration

 Rank placement policy : pack / unpack

 Priority control of allocated nodes

 Execution mode : node is occupied or not by a job.

Strict core assignment

 Processes are bound to cores in the job territory.

 No process can move to cores in other job territory.

Node concentration

Job A

Job B

Job C

Node#0 Node#1 Node#2

Job D Job B Job A Rank unpack R0 R1 R0 R1 Rank pack Node#1 Node#2 Node#0 Node 0 1 2 3 4 5 6 7 Job A Job B P core

(12)

Summary and Future

We developed the job management software.

Unified operability on PRIMEHPC FX10 and PRIMERGY

New job scheduler : Efficiency, Fairness and System-optimization

Practical resource control and job statistical information

Future Work

Operation simulator

Administrator will be able to simulate the operation situation

subsequent to operation parameter changes.

(13)

References

Related documents

genetic resources, phylogenetic analysis, chloroplast genome, gene duplication, genome sequencing, amylose content, bulk segregant analysis, starch biosynthesis genes, genetic

In order to effectively teach these strategies the educator must participate in “Guided Reading,” allowing the focus of teaching before, during, and after reading activities to

The Africa Capacity Report 2014 is therefore meant to serve as a guide to African governments, development partners, regional economic communities (RECs) and continental

This unit is designed to introduce students to new directions in the study of diplomacy, security and intelligence, to help develop a fundamental knowledge of strategic studies

These studies were: INdacaterol: Value in COPD: Longer term Validation of Ef ficacy and safety (INVOLVE, Clinicaltrials.gov registration number NCT00393458), a double- blind

The overall goal of the research was to investigate the role of leadership in municipalities in the implementation of service delivery initiatives, with a

Several promising methods have been proposed to accommodate the varying amount of smoothness across the imaging space by using function-on-scalar regression in the functional

„ When a court intends to commit an offender with a serious and persistent mental illness, as defined in section 245.462, subdivision 20, paragraph (c), to the custody of the