
(1)

Experience of Data Transfer to the Tier-1 from a DiRAC Perspective

Lydia Heck

Institute for Computational Cosmology

Manager of the DiRAC-2 Data Centric

(2)

Talk layout

● Introduction to DiRAC
● The DiRAC computing systems
● What is DiRAC
● What type of science is done on the DiRAC facility?
● Why do we need to copy data to RAL?
● Copying data to RAL – network requirements
● Collaboration between DiRAC and RAL to produce the archive
● Setting up the archiving tools
● Archiving
● Open issues

(3)

Introduction to DiRAC

DiRAC -- Distributed Research utilising Advanced Computing, established in 2009 with DiRAC-1

Support of research in theoretical astronomy, particle physics and nuclear physics

Funded by STFC with infrastructure money allocated from the Department for Business, Innovation and Skills (BIS)

The running costs, such as staff costs and

(4)

Introduction to DiRAC, cont’d

● 2009 – DiRAC-1
– 8 installations across the UK, of which COSMA-4 at the ICC in Durham is one. Still a loose federation.
● 2011/2012 – DiRAC-2
– major funding of £15M for e-Infrastructure
– bidding to host: 5 installations identified, judged by peers
– successful bidders faced scrutiny and interview by representatives of BIS to see if we could deliver by a tight deadline

(5)

Introduction to DiRAC, cont’d

DiRAC has a full management structure.

Computing time on the DiRAC facility is allocated through a peer-reviewed procedure.

Current director: Dr Jeremy Yates, UCL

Current technical director: Prof Peter Boyle,

(6)

The DiRAC computing systems

● Blue Gene – Edinburgh
● Cosmos – Cambridge
● Complexity – Leicester
● Data Centric – Durham
● Data Analytic – Cambridge

(7)

The Blue Gene @ DiRAC

Edinburgh – IBM Blue Gene
– 98304 cores
– 1 Pbyte of GPFS storage
– designed around

(8)

COSMA @ DiRAC (Data Centric)

Durham – Data Centric system – IBM iDataPlex
– 6720 Intel Sandy Bridge cores
– 53.8 TB of RAM
– FDR10 InfiniBand 2:1 blocking
– 2.5 Pbyte of GPFS storage (2.2 Pbyte used!)

(9)

Complexity @ DiRAC

Leicester Complexity – HP system
• 4352 Intel Sandy Bridge cores
• 30 Tbyte of RAM
• FDR 1:1 non-blocking
• 0.8 Pbyte of Panasas

(10)

Cosmos @ DiRAC (SMP)

Cambridge COSMOS – SGI shared memory system
– 1856 Intel Sandy Bridge cores
– 31 Intel Xeon Phi co-processors
– 14.8 Tbyte of RAM
– 146 Tbyte of storage

(11)

HPCS @ DiRAC (Data Analytic)

Cambridge Data Analytic – Dell
• 4800 Intel Sandy Bridge cores
• 19.2 Tbyte of RAM
• FDR InfiniBand 1:1 non-blocking

(12)

What is DiRAC

A national service run/managed/allocated by the scientists who do the science, funded by BIS and STFC

The systems are built around and for the applications with which the science is done.

We do not rival a facility like ARCHER, as we do not aspire to run a general national service.

DiRAC is classed as a major research facility by

(13)

What is DiRAC, cont’d

● Long projects with a significant amount of CPU hours, typically allocated for 3 years on a specific system. Examples for 2012 – 2015:
– Cosmos-dp002: ~20M cpu hours on Cambridge Cosmos
– Virgo-dp004: 63M cpu hours on Durham DC
– UK-MHD-dp010: 40.5M cpu hours on Durham DC
– UK-QCD-dp008: ~700M cpu hours on Edinburgh BG
– Exeter-dp005: ~15M cpu hours on Leicester Complexity

(14)

What type of science is done on DiRAC?

For the highlights of science carried out on the DiRAC facility please see: http://www.dirac.ac.uk/science.html

Specific example: large scale structure calculations with the Eagle run

4096 cores
~8 GB RAM/core
47 days = 4,620,288 cpu hours
200 TB of data
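
The cpu-hour figure for the Eagle run follows directly from the core count and the run length. A minimal sketch of that arithmetic, assuming all 4096 cores were busy for the full 47 days (the function name is illustrative, not from the talk):

```python
def cpu_hours(cores, days):
    """CPU hours consumed if `cores` cores run continuously for `days` days."""
    return cores * days * 24


# Eagle run figures quoted on the slide: 4096 cores for 47 days.
eagle_cpu_hours = cpu_hours(4096, 47)
print(f"{eagle_cpu_hours:,} cpu hours")  # 4,620,288 cpu hours
```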

(15)

Why do we need to copy data (to RAL)?

● Original plan – each research project should make provisions for storing the research data
– requires additional storage resource at researchers’ home institutions
– not enough provision – will require additional funds
– data creation considerably above expectation?

(16)

Why do we need to copy data (to RAL)?

● Research data must now be shared with/available to interested parties
● Install DiRAC’s own archive – requires funds and currently there is no budget.
● We needed to get started:
– Jeremy Yates negotiated access to the RAL archive system
● Acquire expertise
● Identify bottlenecks and technical challenges
– submitted 2,000,000 files and created an issue at the file servers
● How can we collaborate and make use of previous experience?
● AND: copy data!

(17)

Copying data to RAL – network requirements

network bandwidth – situation for Durham

now:

● currently possible 300-400 Mbytes/sec
● required investment and collaboration from DU CIS
● upgrade to 6 Gbit/sec to JANET – Sep 2014
● will be 10 Gbit/sec by end of 2015 – infrastructure already installed

past:

(18)

Copying data to RAL – network requirements

network bandwidth – situation for Durham

investment to bypass the external campus firewall:

● two new routers (~£80k) – configured for throughput with minimal ACL, enough to safeguard the site
● deploying internal firewalls – part of the new security infrastructure, essential for such a venture
● Security now relies on front-end system of Durham

(19)

Copying data to RAL – network requirements

Result for COSMA and GridPP in Durham

guaranteed 2-3 Gbit/sec with bursts of up to 3-4 Gbit/sec (3 Gbit/sec outside of term time)

● pushed the network performance for Durham GridPP from bottom 3 in the country to top 5 of the UK GridPP sites
● achieves up to 300 – 400 Mbyte/sec throughput to
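
To put the quoted rates in context, the sketch below converts a link speed to Mbyte/sec and estimates how long a data set would take to move at a sustained rate. It is a back-of-the-envelope estimate only: it assumes decimal units (1 Tbyte = 1,000,000 Mbyte, 1 Gbit = 1000 Mbit) and ignores protocol overhead and per-file latency. The function names are illustrative.

```python
SECONDS_PER_DAY = 86400


def gbit_to_mbyte_per_sec(gbit_per_sec):
    """Convert a link speed in Gbit/sec to Mbyte/sec (decimal units assumed)."""
    return gbit_per_sec * 1000 / 8


def transfer_days(tbytes, mbyte_per_sec):
    """Days needed to move `tbytes` Tbyte at a sustained `mbyte_per_sec`."""
    seconds = tbytes * 1_000_000 / mbyte_per_sec
    return seconds / SECONDS_PER_DAY


# 3 Gbit/sec corresponds to 375 Mbyte/sec, within the 300-400 Mbyte/sec range.
print(gbit_to_mbyte_per_sec(3))

# Sizes taken from the slides: 200 Tbyte (Eagle run), 2200 Tbyte (COSMA storage used).
for rate in (300, 400):
    print(rate, round(transfer_days(200, rate), 1), round(transfer_days(2200, rate), 1))
```

Under these assumptions the 200 Tbyte Eagle output needs roughly 6 to 8 days of sustained transfer, and the 2.2 Pbyte in use on COSMA roughly 64 to 85 days.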

(20)

Collaboration between DiRAC and GridPP/RAL

Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation

Huge thanks to Jens Jensen and Brian Davies - there were many emails exchanged, many questions asked and many answers given.

Resulting document: “Setting up a system for data archiving using FTS3” by Lydia Heck, Jens Jensen and Brian Davies

(21)

Setting up the archiving tools

Identify appropriate hardware – could mean extra expense:

need freedom to modify and experiment with - cannot have HPC users logged in and working!

free to do very latest security updates

requires optimal connection to storage -

(22)

Setting up the archiving tools

Create an interface to access the file/archiving service at RAL using the GridPP tools

gridftp – Globus Toolkit – also provides Globus Connect

Trust anchors (egi-trustanchors)

voms tools (emi3-xxx)

(23)

Archiving?

long-lived voms proxy?

– myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation

● How to create a proxy and delegation that lasts weeks, even months? – still an issue

grid-proxy-init; fts-transfer-delegation

– grid-proxy-init -valid HH:MM
– fts-transfer-delegation -e time-in-seconds
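
The two commands take the lifetime in different units, so a small helper can keep them consistent. This is a sketch only: the helper name is hypothetical, the flags and their units are taken from the slide as written, and whether the services actually honour lifetimes of weeks or months is exactly the open question raised above.

```python
def lifetime_args(days):
    """Return (HH:MM string for -valid, seconds for -e) for a lifetime in days.

    Units follow the slide: grid-proxy-init takes HH:MM and
    fts-transfer-delegation takes seconds (as stated in the talk).
    """
    hours = days * 24
    return f"{hours}:00", days * 86400


valid, expire = lifetime_args(30)  # 30 days is an example value only
print(f"grid-proxy-init -valid {valid}")
print(f"fts-transfer-delegation -e {expire}")
```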

(24)

Archiving

● Large files – optimal throughput limited by network bandwidth
● Many small files – limited by latency; using ‘-r’ flag to fts-transfer-submit to re-use connection
● Transferred:
– ~40 Tbytes since 20 August
– ~2M files
– challenge to FTS service at RAL
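
The difference between the two regimes can be illustrated with a simple model: total time is the bandwidth-limited time plus a fixed overhead per file. The per-file overhead used below is a hypothetical value chosen for illustration, not a measurement from the talk; the volume, file count and 350 Mbyte/sec rate come from the slides (decimal units assumed).

```python
def average_file_mbyte(tbytes, n_files):
    """Average file size in Mbyte for `tbytes` Tbyte spread over `n_files` files."""
    return tbytes * 1_000_000 / n_files


def transfer_seconds(tbytes, n_files, mbyte_per_sec, per_file_overhead_sec):
    """Bandwidth-limited time plus a fixed per-file overhead (simple model)."""
    bandwidth_time = tbytes * 1_000_000 / mbyte_per_sec
    latency_time = n_files * per_file_overhead_sec
    return bandwidth_time + latency_time


# ~40 Tbyte in ~2M files gives an average of about 20 Mbyte per file.
print(average_file_mbyte(40, 2_000_000))

# ASSUMPTION: 0.5 s overhead per file is a made-up figure for illustration.
no_overhead = transfer_seconds(40, 2_000_000, 350, 0.0)
with_overhead = transfer_seconds(40, 2_000_000, 350, 0.5)
print(round(no_overhead / 86400, 1), round(with_overhead / 86400, 1))
```

With these illustrative numbers the per-file overhead (about 11.6 days) dwarfs the bandwidth-limited time (about 1.3 days), which is why re-using connections matters for many small files.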

(25)

Open issues

ownership and permissions are not preserved

depends on a single admin to carry out.

what happens when content in directories changes? – complete new archive sessions?

tries to archive all the files again but then ‘fails’ as the file already exists – should be more like rsync
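
One way to get rsync-like behaviour on the submitting side is to keep a manifest of what has already been archived and submit only new or changed files. The sketch below is not part of the tools described in the talk: the manifest format (relative path mapped to size and modification time) and the function name are assumptions, and the comparison mirrors rsync's default size-and-mtime quick check.

```python
import os


def files_to_archive(root, already_archived):
    """Return relative paths under `root` that are new or changed.

    `already_archived` is a hypothetical manifest from earlier sessions:
    {relative_path: (size_in_bytes, int_mtime)}.
    """
    pending = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            # Skip files whose size and mtime match the manifest entry.
            if already_archived.get(rel) != (st.st_size, int(st.st_mtime)):
                pending.append(rel)
    return sorted(pending)
```

Only the paths returned would then be handed to the transfer submission, so files already in the archive are never re-submitted and cannot ‘fail’ as duplicates.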

(26)

Conclusions

● With the right network speed we can archive the DiRAC data to RAL.
● The documentation has to be completed and shared with the system managers on the other DiRAC sites
● Each DiRAC site will have its own dirac0X account
● Start with and keep on archiving
● Collaboration between DiRAC and GridPP/RAL DOES work!
