Large scale Data Storage Services for Science at STFC. David Corney


(1)

David Corney TERENA 8th April 2008 1

Large scale Data Storage Services for Science at STFC

David Corney

Deputy Division Head, Data Services Division, e-Science centre, Rutherford Lab

Science and Technology Facilities Council

(2)

What is STFC?

The Science and Technology Facilities Council (UK)

Created on April 1, 2007

It is responsible for

– fundamental research in particle physics, nuclear physics, astronomy, space

– major UK facilities for the physical and life sciences: synchrotrons, light sources, lasers, neutrons

– national laboratories at RAL, Daresbury, UKATC

– international science projects: CERN, ESO, ESA, ILL, ESRF…

(3)


(4)
(5)


e-Infrastructure for scientific facilities

Physical facilities provide data for the information infrastructure

Diamond synchrotron

ISIS neutron and muon facility

Vulcan laser facility

(6)

providing the

e-infrastructure

throughout the

research

(7)


Curation and Preservation

Atlas Petabyte Store

[Diagram: Atlas Petabyte Store. Labels: four RS6000 servers (fsc0, fsc1; rmt1 to rmt8), Switch_1 and Switch_2, eight 9940B tape drives, STK 9310 “Powder Horn”, Gbit network, 4 x 1.2TB]

[Chart: Datastore Usage by Family. Gbytes (0 to 100,000), Jun-97 to Mar-04. Families: CR-AFRC, CRAYSUP, CR-EPSRC, CR-NERC, CR-PPARC, DCI-ISE, DCI-NET, DCI-OH, DCI-PC, DCI-VIS, DL-SRD, EDG, ESCIENCE, EXTERNAL, FACILMAN, FUJISUP, ITD-SER, ITD-SUP, NUCPHYS, RAL-ADM, RAL-ENG, RAL-SCI, RAL-TECH, SCALSUP, SCALUSER, SSD, SSD-EOD, SSD-PPAR]

Save the bits and Save the information

Digital Curation Centre: Edinburgh, Bath, Glasgow, STFC

[Diagram: OAIS functional model (Figure 4-1). Entities: Producer, Ingest, Archival Storage, Data Management, Access, Consumer, Administration, Preservation Planning, Management. Flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders]

(8)

Major User Communities

High Energy Physics Experiments (CMS, ATLAS, LHCb, …)

ISIS neutron and muon facility

British Atmospheric Data Centre

EISCAT (Radar research)

National Earth Observation Data Centre

Solar Physics World Data Centre

Central Laser Facility

Diamond Light Source

National Crystallography Service

Hartley Library, Southampton University

WASP, VIRGO Consortium, SOLAR-B, e-minerals

BBSRC archive, CICT

(9)


Growth in Data Capacity

Atlas Petabyte DataStore

5 Petabytes of on-line storage available to STFC facilities and the UK research community

(10)

Growth in data volume

[Chart: Total Holdings, in Pbytes (0.00 to 3.00), Jun-97 to Dec-07. Series: Castor, DMF, ADS]

(11)


Data Storage and Management Services - overview

DMF (Data Management Facility)

Atlas Data Store and Storage Resource Broker

CASTOR for LHC. Part of the UK Tier1 centre.

(12)

Data Management Facility

500TB total licence; 40TB front-end disc

Commercial HSM system. Mature; low staff costs

Operational services since 2005

Currently exploring GridFTP access for remote services to UK NERC data centres…

Recently integrated with Gresham Virtual Tape Library (VTL) to optimise transfer rates for small files (SOHO)

(13)


DMF Services

Project        Volume (TB)    #Files (x10^6)
BADC           114            3
SOHO, TRACE    20             18
SOLAR-B        4              6
Total          138            27
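The volume and file-count figures for the DMF projects imply very different mean file sizes, which is relevant to the small-file tuning mentioned for SOHO. A minimal sketch of that arithmetic follows; it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), and the function names are illustrative only.

```python
# Mean file size per DMF project, using the volume and file counts quoted on the slide.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6

# project -> (volume in TB, number of files in millions)
dmf_projects = {
    "BADC": (114, 3),
    "SOHO, TRACE": (20, 18),
    "SOLAR-B": (4, 6),
}


def mean_file_size_mb(volume_tb, files_millions):
    """Return the mean file size in MB for a given volume and file count."""
    return volume_tb * TB / (files_millions * 10**6) / MB


def totals(projects):
    """Sum the volume (TB) and file count (millions) over all projects."""
    volume = sum(v for v, _ in projects.values())
    files = sum(f for _, f in projects.values())
    return volume, files


if __name__ == "__main__":
    for name, (vol, files) in dmf_projects.items():
        print(f"{name}: {mean_file_size_mb(vol, files):.1f} MB per file")
    print("Total (TB, millions of files):", totals(dmf_projects))
```

Under these assumptions BADC files average about 38 MB, while SOHO/TRACE and SOLAR-B files average roughly 1 MB or less.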

(14)

Atlas Data Store & Storage

Resource Broker

(15)


[Diagram: Atlas Data Store architecture, production system and test system]

Tape devices: STK 9310 with 8 x 9940 tape drives; Brocade FC switches ADS_switch_1 and ADS_Switch_2, 4 drives to each switch

Host          OS        Role
ermintrude    AIX       dataserver
florence      AIX       dataserver
zebedee       AIX       dataserver
dougal        AIX       dataserver
mchenry1      AIX       Test flfsys
basil         AIX       test dataserver
brian         AIX       flfsys
ADS0CNTR      Redhat    counter
ADS0PT01      Redhat    pathtape
ADS0SB01      Redhat    SRB interface
dylan         AIX       Import/export
buxton        SunOS     ACSLS

Disk: array1, array2, array3, array4; catalogue and cache (shown twice)

User interfaces: SRB (Inq; S commands; MySRB); ADS tape; ADS sysreq; admin commands (create, query); User pathtape commands; Logging

Connection types: Physical connection (FC/SCSI); Sysreq udp command; User SRB command; VTP data transfer; SRB data transfer; STK ACSLS command; SRB pathtape commands

All sysreq, vtp and ACSLS connections to dougal also apply to the other dataserver machines, but are left out for clarity

(16)

Storage Layer – ADS

Tape based Storage Archive. In house system. 20 yrs old.

De-couple user and application from storage media.

Upgrades and media migration occur “behind the scenes”

High resilience - very few single points of failure

High reliability, high availability (24/7)

Lifetime data integrity checks

Fire safe and off-site backups; Tested disaster recovery procedures; media migration, recycling

(17)

SRB

Distributed data management client-server system.

– Provides searchable access to data (info held in metadata database).

– Uniform interface to different types of resources (each resource type has a plugin).

– Insulates the end user from needing to know where data is physically stored (logical-to-physical mapping of data and resources).

– Allows replication of data, proxy commands, grouping of resources.

– Supports X.509 certificate authentication.

– Supports GridFTP.
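The logical-to-physical mapping, replication and metadata search described in the list above can be illustrated with a toy catalogue. This is a sketch of the idea only and is not the SRB API: the class, method names, resource names and paths are all hypothetical.

```python
# Toy illustration of a logical-to-physical catalogue; this is NOT the SRB API.
# All class names, method names, resource names and paths are hypothetical.

class Catalogue:
    def __init__(self):
        self._replicas = {}   # logical name -> list of (resource, physical path)
        self._metadata = {}   # logical name -> dict of descriptive attributes

    def register(self, logical_name, resource, physical_path, **attrs):
        """Record one physical copy of a logical object, plus optional metadata."""
        self._replicas.setdefault(logical_name, []).append((resource, physical_path))
        self._metadata.setdefault(logical_name, {}).update(attrs)

    def locate(self, logical_name):
        """Return every physical copy; users only ever handle the logical name."""
        return list(self._replicas[logical_name])

    def search(self, **attrs):
        """Return logical names whose metadata matches all the given attributes."""
        return sorted(
            name for name, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in attrs.items())
        )


if __name__ == "__main__":
    cat = Catalogue()
    # The same logical object replicated on a disk cache and on tape.
    cat.register("/demo/run42.dat", "disk-cache", "/vault/a1/0001", facility="ISIS")
    cat.register("/demo/run42.dat", "tape", "volume-0123/file-17")
    print(cat.search(facility="ISIS"))
    print(cat.locate("/demo/run42.dat"))
```

Because callers hold only the logical name, a replica can be moved to new hardware by updating the catalogue entry, which is the insulation the slide refers to.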

(18)

SRB What does it give me?

Uniform, searchable access to distributed resources means I don't have to:

– remember physical location of data

– know how to access different storage types

– worry about migrating to new hardware (insulation)

Scalable system:

– I can add more resources of different types to the system dynamically.

Auditing:

– I can keep track of operations performed on data.

(19)


(20)

SRB Services

Biotechnology and Biological Sciences Research Council (BBSRC). In production since April 2006. Initial 10 year SLA. 50TB limit. Around 6000 scientists across 12 centres in the UK.

Arts and Humanities Data Service: more than 200K files, 2TB data. SRB used as a 'dark archive'; data is md5 checksummed and backed up to tape. In production since April 2007.

ISIS Facility – provide an SRB system to back up ISIS facility data. In production since 2006. Currently 2TB of data stored. System being extended to include experimental data.

Diamond, CLF Facilities – provide SRB-based data management systems providing distributed access to data. Systems currently being deployed. Will hold petabytes of data.

SRD Facility – provide a data management system with distributed access through web-based front-end. System currently being deployed.
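The 'dark archive' entry mentions md5 checksumming of archived data. A minimal sketch of how a file can be checksummed and later re-verified is shown below, using Python's standard `hashlib`; it is not the service's actual tooling, and the function names are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True when the file still matches the checksum recorded at ingest."""
    return md5_of_file(path) == expected_md5.lower()
```

Recording the digest at ingest and calling `verify` on each later copy (disk or recalled from tape) gives a simple end-to-end integrity check. MD5 detects accidental corruption; it is not suitable as protection against deliberate tampering.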

(21)

BBSRC SRB Archive process

[Diagram, dated 4/8/2008: data path across three sites linked by WAN / JANET WAN. Remote Institute Site: local machines, Local Storage, Local Vault, Local SRB Server, firewall. Central ‘Cache’ Site: Central SRB Server, Central “cache” Vault, firewall. RAL Site: firewall, SRB-ADS Server (ads0sb01.cc.rl.ac.uk), ADS SRB Disk Cache Resource, ADS Tape Resource. Arrow labels: Ingestion, Sphymove into container, Sreplcont, Ssyncont. Also labelled: disk, Filer, Tape Traffic]

1 Archive Submission Interface

- Data ingestion of collection hierarchy into SRB
- Uses Java Jargon API interface (equivalent of Sput -b)
- Ingested to /bbsrc/institute/scratch/project/year/user/dateandtime
- At end of ingestion data logically moved using Smv to: /bbsrc/institute/local-archive/project/year/user/dateandtime

2 Scheduled transfer to Central SRB Server (driven from Central SRB Server)

- Smkcont command used to create container on Central SRB Server
- Data moved from Site SRB to container on Central SRB Server using Sphymove
- Upon data transfer completion archived data is logically moved with Smv to /bbsrc/institute/remote-archive/project/year/user/dateandtime

3 Scheduled transfer to ADS resource

- Implemented via cron job using Sreplcont command, which is driven by Central SRB Server
- Entire container replicated using Sreplcont command
- Logical structure preserved as /bbsrc/institute/remote-archive/project/year/user/dateandtime

4 Synchronization of container to tape resource and removal of original container from Central SRB Server

- Ssyncont -d -a command used, allowing for a family of containers
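The archive process moves each submission through three logical collections (scratch, local-archive, remote-archive) that differ only in one path component. A small sketch of that naming scheme follows; the path layout is taken from the slide, while the helper functions and stage tuple are hypothetical and not part of SRB or Jargon.

```python
# Logical collection layout as given on the slide. The helper functions and
# the STAGES tuple are hypothetical illustrations, not part of SRB or Jargon.
STAGES = ("scratch", "local-archive", "remote-archive")


def collection_path(stage, institute, project, year, user, dateandtime):
    """Build the logical collection path for one stage of the archive process."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return f"/bbsrc/{institute}/{stage}/{project}/{year}/{user}/{dateandtime}"


def next_stage(stage):
    """Return the stage that follows the given one, or None after remote-archive."""
    index = STAGES.index(stage)
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


if __name__ == "__main__":
    stage = "scratch"
    while stage is not None:
        # Example values are made up for illustration.
        print(collection_path(stage, "institute", "project", "2008", "user", "20080408-1200"))
        stage = next_stage(stage)
```

Keeping everything after the stage component identical means the logical structure of a submission is preserved as it is moved from site to central server to tape.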

(22)

BBSRC Accumulated Data

Volume

(23)

Diamond Data Flow

Roger Downing

(24)
(25)
(26)

ALICE

CMS

4 Experiments

(27)


ATLAS Detector

7,000 tonnes

42m long, 22m wide, 22m high (about the height of a 5 storey building)

2,000 physicists, 150 institutes, 34 countries

(28)
(29)


Tier Structure

Tier 0: CERN computer centre (online system, offline farm)

Tier 1: National centres (RAL, UK; France; Italy; Germany; USA)

Tier 2: Regional groups (ScotGrid, NorthGrid, SouthGrid, London)

Institutes (Glasgow, Edinburgh, Durham)

Useful model for Particle Physics but not necessary for

(30)

Team Organisation (GRIDPP2)

Grid Services (Grid/Support): Ross, Condurache, Hodges, Klein (EGEE), Vacancy1 50/50 PPD, Vacancy2 50/50 PPD

Fabric (H/W and OS): Bly (team leader), Wheeler, Holt, Thorne, White (OS support), Adams (HW support)

CASTOR (SW/Robot): Corney (GL), Strong (Service Manager), Folkes (HW Manager), deWitt, Jensen, Kruk, Ketley, Bonnet

2.5 FTE effort

CICT Machine Room operations (1.8 FTE)

CICT Networking Support (0.5 FTE)

Database Support (Brown) 0.5 FTE

(31)


Storage Layer - CASTOR

Massively Scaleable GRID based HSM (Currently 10PB ~ 100 million files at CERN)

Developed by CERN in collaboration with STFC (SRM) and others. ORACLE engine

Deployed at STFC to manage relatively large volumes of LHC data for the UK (2-3 PB per year)

Aim to make CASTOR the de-facto storage system for STFC – eventually (SRB interface for CASTOR)
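The figures quoted for CASTOR can be turned into two rough planning numbers: the mean file size at CERN and the sustained ingest rate implied by 2-3 PB per year. A minimal sketch, assuming decimal units and a 365-day year; the function names are illustrative.

```python
# Rough planning arithmetic from the figures quoted on the slide.
# Assumptions: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes), 365-day year.
PB = 10**15
MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600


def mean_file_size_mb(total_bytes, n_files):
    """Mean file size in MB for a store of total_bytes holding n_files files."""
    return total_bytes / n_files / MB


def sustained_rate_mb_s(bytes_per_year):
    """Average ingest rate in MB/s needed to absorb bytes_per_year evenly over a year."""
    return bytes_per_year / SECONDS_PER_YEAR / MB


if __name__ == "__main__":
    print(f"CERN mean file size: {mean_file_size_mb(10 * PB, 100 * 10**6):.0f} MB")
    for pb in (2, 3):
        print(f"{pb} PB/year is about {sustained_rate_mb_s(pb * PB):.1f} MB/s sustained")
```

Under these assumptions the CERN store averages about 100 MB per file, and 2-3 PB per year corresponds to roughly 63-95 MB/s averaged over the year; peak rates would be higher.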

(32)

Hardware: Tape

Tape Drives

– 8 9940B drives: used on legacy ADS/dCache service – phase out soon

– 18 T10K tape drives and associated servers delivered, 15 in production, remainder soon

Planned bandwidth 50MB/s per drive

Actual bandwidth (8-80MB/s) - a work in progress

Media

(33)


Hardware: Disk

Production capacity: 138 Servers, 2800 drives, 850TB (usable)

1.6PB capacity delivered in January by Viglen

– 91 Supermicro 3U servers with dual AMD 2220E (2.8GHz) dual-core CPUs, 8GB RAM, IPMI
   1 x 3ware 4 port 9650 PCIe RAID controller with 2 x 250GB WD HDD
   1 x 3ware 16 port 9650 PCIe RAID controller with 14 x 750GB WD HDD

– 91 Supermicro 3U servers with dual Intel E5310 (1.6GHz) quad-core CPUs, 8GB RAM, IPMI
   1 x 3ware 4 port 9650 PCIe RAID controller with 2 x 400GB Seagate HDD
   1 x 3ware 16 port 9650 PCIe RAID controller with 14 x 750GB Seagate HDD

Acceptance test running – scheduled to be available end of March.

– 5400 spinning drives after planned phase out in April (expect drive failure every 3 days)
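The expectation of one drive failure every 3 days across 5400 drives corresponds to an annualised failure rate of a little over 2% per drive. The sketch below shows the arithmetic; it assumes failures are spread evenly over a 365-day year, and the function names are illustrative.

```python
# Annualised failure rate implied by "5400 drives, one failure every 3 days".
# Assumption: failures are spread evenly over a 365-day year.

def failures_per_year(days_between_failures):
    """Expected number of drive failures per year across the whole estate."""
    return 365 / days_between_failures


def annualised_failure_rate(n_drives, days_between_failures):
    """Fraction of the drive population expected to fail in one year."""
    return failures_per_year(days_between_failures) / n_drives


if __name__ == "__main__":
    rate = annualised_failure_rate(5400, 3)
    print(f"{failures_per_year(3):.0f} failures/year, {rate * 100:.2f}% per drive per year")
```

That is roughly 122 replacements a year, which is why drive swaps become a routine operational task at this scale.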

(34)

Test Architecture

[Diagram: CASTOR test architecture. Three instances: Development, Preproduction and Certification Testbed. Labels per instance: stager / DLF / LSF, Oracle stager, Oracle DLF, Oracle repack, repack, 1 Diskserver (variable), Tape Server. Shared Services: Name Server + vmgr with Oracle NS + vmgr]

(35)


CASTOR Production Architecture

[Diagram: CASTOR production architecture. Stager instances: CMS, Atlas, LHCb, and Repack and Small User. Each instance has stager / DLF / LSF with Oracle stager and Oracle DLF databases, plus diskservers; the Repack and Small User instance also has repack with an Oracle repack database and 1 diskserver. Shared Services: Name Server 1 + vmgr and Name Server 2, with Oracle NS + vmgr. Six tape servers]

(36)

CASTOR Memory Lane

[Timeline: 4Q05 to 1Q08, by quarter. Milestone labels: CASTOR1 tests OK; CASTOR2 Core Running, hard to install + dependencies; CSA07 encouraging; OC Committees note improvement but concerned; CMS on CASTOR for CSA06, encouraging; 2.1.4 upgrade goes well, Disk 1 support!; CSA08 reasonably successful; Problems with functionality and performance – it doesn’t work!; Happy days!; 2.1.2 bad; ATLAS on CASTOR; Service stopped for extended upgrade; 2.1.3 good but missing functionality]

(37)


(38)

Large scale Data Storage

Services for Science at STFC

David Corney


Deputy Division Head, Data Services

Division, e-Science centre, Rutherford Lab

Science and Technology Facilities Council

(39)


(40)

Common European Multiple Science Data Infrastructure (CEMSDI)

A project planned for FP7 INFRA-2008-1.2.5 Scientific Data Infrastructure

(41)


Aims and Objectives

Build a core European grid-based data infrastructure (RAL, DESY? CNAF? IN2P3?) (est. 2009 – 2012)

Build on existing LHC grid and storage expertise at Tier 1 sites

Provide set of generic data storage services and curation services useful for many science communities across Europe

Federated to other services across the world

Initially funded via FP7 – sustained long-term by cost neutral service charge (2013 onwards)

(42)

(Some of the) High Priority

Requirements

Check data integrity:

– at block, device, location, archive level

– via automatic policies, timed if necessary to repeat at intervals

– detect storage device and media failures

Rules for

– data replication and backup

– error detection

– integrity verification

(43)


(Some of the) High Priority

Requirements

Scaleable from bit to Terabyte to Petabyte and Exabyte.

High integrity redundant storage for data capture levels (i.e. when the data first gets written).

Security: Control over who can read and write data sets and metadata.

Security: Control over who can change data sets and metadata.

Audit: Log of actions taken by system for normal and exceptional operation

Audit: Log of performance data

Audit: Log of system/device utilisation

Audit: Reporting of used capacity and projection of requirements based on usage over time period.
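The last requirement, projecting capacity needs from usage over a time period, could be met in its simplest form by fitting a straight line to past usage and extrapolating. The sketch below does this with an ordinary least-squares fit; the sample figures are made up for illustration and are not STFC data, and a linear trend is only one possible model.

```python
# Simplest possible capacity projection: least-squares straight line through
# past usage, extrapolated forward. The sample data below is invented for
# illustration only; real holdings may grow faster than linearly.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(xs, ys, future_x):
    """Extrapolate the fitted line to future_x (same units as xs)."""
    slope, intercept = fit_line(xs, ys)
    return slope * future_x + intercept


if __name__ == "__main__":
    months = [0, 1, 2, 3]            # months since the start of the reporting period
    used_tb = [100, 110, 120, 130]   # hypothetical used capacity in TB
    print(f"Projected usage at month 12: {project(months, used_tb, 12):.0f} TB")
```

A report built this way would state both the fitted growth rate and the projection horizon, so that the assumption of steady growth is visible to whoever plans the next procurement.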
