PARIS*: Programming parallel and distributed systems for large scale numerical simulation applications. Christine Morin IRISA/INRIA

(1)

PARIS*: Programming pArallel and distRibuted systems for large scale numerical sImulation applicationS

Kerrighed, Vigne

Christine Morin

IRISA/INRIA

(2)

Members of PARIS Project (sept 05)

 Scientific leader

 T. Priol (DR INRIA)

 Researchers

 F. André (Prof IFSIC)

 G. Antoniu (CR INRIA)

 J-P. Banâtre (Prof IFSIC)

 M. Bertier (MdC INSA)

 L. Bougé (Prof ENS)

 Y. Jégou (CR INRIA)

 A-M. Kermarrec (DR INRIA)

 C. Morin (DR INRIA)

 J.L. Pazat (MdC INSA)

 C. Perez (CR INRIA)

 Post-docs

 A. Viana

 A. Ribes

 G. Vallée

 Engineers

 D. Margery (IR INRIA)

 P. Morillon (IE IFSIC)

 P. Gallard (DGA)

 V. Lefèvre (G5K - IFSIC)

 R. Lottiaux (DGA)

 G. Mornet (G5K - PRIR)

 P. Palosaari (CoreGRID)

 J. Parpaillon (Ing. Associé)

 PhD candidates

 H-L. Bouziane (INRIA 2)

 J. Buisson (MENRT 3)

 Y. Busnel (ENS 1)

 L. Cudennec (INRIA-Région 1)

 M. Fertré (MENRT 1)

 M. Jan (MENRT 3)

 E. Jeanvoine (CIFRE 2)

 S. Lacour (INRIA 3)

 E. Le Merrer (CIFRE FT)

 S. Monnet (INRIA-Région 3)

 Y. Radenac (MENRT 3)

 E. Riviere (MENRT 2)

 L. Rilling (ENS 3)

(3)

Studied Systems

Clusters

 A set of interconnected PCs used as a single computing resource

Grid

 A set of resources (processor, memory, disk, …) interconnected via the Internet

P2P systems

(4)

Research Directions

 Single system image operating system

Problem: clusters are difficult to program and use

Challenge: give the illusion that a cluster is a single machine

 Component based middleware

Problem: code coupling applications are complex

Challenge: how to facilitate the design of such applications while providing high performance?

 Advanced programming models

Problem: current programming models are not adequate for highly dynamic systems

Challenge: how to express computation and coordination in such an environment?

 Data sharing service

Problem: data sharing in large scale grids

Challenge: sharing mutable data

 P2P systems

Problem: mastering and optimizing a P2P system

Challenge: characterizing a P2P system and searching for relevant information

 Experimental grid platform

Problem: need to experiment to validate our research results

Challenge: building a reconfigurable grid

(5)

Grid 5000 Experimental Platform

 Contribution to the construction of Grid 5000

 9 sites, 5000 processors

 Rennes site

 500 processors (powerPC, Xeon, Opteron)

 Dual processor nodes

 Participants

 Researcher: Y. Jégou

 Engineers: V. Lefèvre, D. Margery, P. Morillon, G. Mornet

[Figure: Grid 5000 sites with processor counts, one site with 1000 and eight with 500; labels: Grid-5000, Rennes]

(6)

International Collaborations (funded)

Cluster OS

 University of Ulm (Germany), ORNL (USA), Rutgers University (USA)

Grids

 Pisa University (Italy), SNU (Korea)

Large scale data management

 UIUC (USA)

P2P systems

(7)

OS for Clusters and Grids

Kerrighed

 Single System Image (SSI) operating system for high performance computing on clusters

Vigne

 Operating system to ease the use and programming of very large grids

(8)

Kerrighed

Objectives

 Virtual shared memory multiprocessor

 Global and transparent resource management

 Tolerating node failures transparently for the applications

 High performance

Approach

 Design of distributed OS mechanisms within an

(9)

Kerrighed

 Achievements

 Customizable efficient full SSI operating system for high performance computing on clusters

 Small clusters (up to 256 nodes)

 Advanced research prototype

 Integration of the work of 3 Ph.D. students (R. Lottiaux (2001), Geoffroy Vallée (2004), P. Gallard (2004))

 Robust prototype able to execute real applications provided by EDF R&D and DGA

 Open source software

 www.kerrighed.org

 Stable version (K V1.0.2) based on Linux 2.4.29

 Demo LiveCD based on Knoppix

[email protected]

 Integrated in OSCAR (ssi-oscar.irisa.fr)

OSCAR is a snapshot of methods for building, programming, and using clusters. It consists of a fully integrated and easy to install software bundle designed for high performance cluster computing.

(10)

Efficient Operating System

 Comparison with other SSI for clusters

 OpenSSI, openMosix

 Results published in CC-GRID 2005

 Internship of Benoît Boissinot

 Efficient communication system

 Highly reactive communication system to support Kerrighed distributed operating system services

 Compatibility of Kerrighed with efficient communication drivers

(11)

Collaborations

 EDF R&D (since 2000)

 PhD and post-doc grants

 DGA (2003-2005)

 Funding for research engineers

 ORNL (S. Scott)

 Integration of Kerrighed in OSCAR

 University of Ulm (M. Schöttner)

 Fault tolerant SDSM

 Rutgers University (L. Iftode)

 High availability

 Invited researchers

 R. Badrinath (IIT Kharagpur)

 Isaac Scherson (UCI)

(12)

Current Research Directions

 Fault tolerance

 Large scale parallel application checkpointing

 System initiated checkpoints

 Checkpointing grid applications

 Master & PhD Thesis of Matthieu Fertré

 High availability

 Current work of Pascal Gallard and Renaud Lottiaux

 Tolerating hot node addition and eviction

 Phenix

 Investigating the backdoor approach in the context of Kerrighed

 Master Thesis of Benoît Boissinot

[Figure: Application on top of an SSI cluster OS spanning Node 1, Node 2, Node 3]

(13)

Technology Transfer

KerLabs (http://www.kerlabs.com)

 Start-up founded by Pascal Gallard and Renaud Lottiaux

 Software suite based on Kerrighed technologies

 EasyAdmin: Global cluster management

 EasyCheckpoint: Checkpoint/restart of parallel applications

 EasyRun: Application deployment & scheduling on clusters

 EasyCluster: the whole Kerrighed SSI solution

 Optimized support for high performance networking technologies

(14)

Vigne: a Grid OS

Design and implementation of a Grid OS to ease the use and programming of very large grids

 Highly decentralized system

 Algorithms based on local knowledge

 Self-healing system

 Dealing with multiple quasi-simultaneous reconfigurations

 Single System Image

 Flexible system

(15)

Vigne

 Infrastructure based on decentralized overlays

 Structured and unstructured overlays

 Application manager for reliable application execution

 Resource discovery & allocation service

 On-going work (PhD Thesis of Emmanuel Jeanvoine)

 Volatile data sharing service

 PhD Thesis of Louis Rilling

 Complex application deployment

 PhD Thesis of Boris Daix (co-advised with Christian Pérez)

(16)

Collaborations

EDF R&D

(17)

Future Work

XtreemOS Project

 Integrated Project (FP6 - Call 5)

 Under evaluation

Goal

 Building and Promoting a Linux-based Operating System to Support Virtual Organisations for Next Generation Grids

 18 partners

 Academic & industrial partners

 8 countries (including China)

(18)

XtreemOS Main Objectives

We will design, implement, evaluate and distribute an open source Grid OS with native support for virtual organizations.

 Development of a Grid Operating System

Enhance Linux to support VO

across multiple administrative domains

Manage very large and

Self-organizing and

self-healing system

 Available on PC, SMP, clusters, PDA and mobile phones

 Experimentation and evaluation with a comprehensive set of real use-cases provided by ISVs and end-users

 Integration in well-known Linux distributions

 Mandriva, Red Flag Linux

 XtreemOS software: 3 flavours

 Standard flavour for PC

 Federation flavour based on Kerrighed

 SD flavour for small devices

 XtreemOS software will make VO management easy for administrators and work within VOs easy, secure and efficient.

Building a reference open source Grid OS

[Figure: layered diagram with labels Application, Middleware, XtreemOS, and four Computer/Linux nodes]

(19)

Talks

Kerrighed

 Pascal Gallard

 Matthieu Fertré

Vigne

 Emmanuel Jeanvoine

JuxMem

 Sébastien Monnet
