David Corney TERENA 8th April 2008 1
Large scale Data Storage
Services for Science at STFC
David Corney
Deputy Division Head, Data Services Division, e-Science Centre, Rutherford Lab
Science and Technology Facilities Council
What is STFC?
The Science and Technology Facilities Council (UK)
Created on April 1, 2007
It is responsible for
– fundamental research in particle physics, nuclear physics, astronomy, space
– major UK facilities for the physical and life sciences
  − synchrotrons, light sources, lasers, neutrons
– national laboratories at RAL, Daresbury, UKATC
– international science projects
  − CERN, ESO, ESA, ILL, ESRF…
e-Infrastructure for scientific facilities
Physical facilities provide data for the information infrastructure
Diamond synchrotron
ISIS neutron and muon facility
Vulcan laser facility
providing the e-infrastructure throughout the research
Curation and Preservation
Atlas Petabyte Store
[Diagram: Atlas Petabyte Store hardware. An STK 9310 “Powder Horn” tape library with eight 9940B drives (rmt1 to rmt8) is attached through two switches (Switch_1, Switch_2) to four RS6000 servers, each with two adapters (fsc0, fsc1). The diagram also labels four 1.2TB units and a Gbit network.]
Datastore Usage by Family
[Chart: usage in Gbytes (0 to 100,000), Jun-97 to Mar-04, by family: CR-AFRC, CRAYSUP, CR-EPSRC, CR-NERC, CR-PPARC, DCI-ISE, DCI-NET, DCI-OH, DCI-PC, DCI-VIS, DL-SRD, EDG, ESCIENCE, EXTERNAL, FACILMAN, FUJISUP, ITD-SER, ITD-SUP, NUCPHYS, RAL-ADM, RAL-ENG, RAL-SCI, RAL-TECH, SCALSUP, SCALUSER, SSD, SSD-EOD, SSD-PPAR]
Save the bits and save the information
Digital Curation Centre
Edinburgh, Bath, Glasgow, STFC
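[Diagram: OAIS functional model (Figure 4-1). A Producer submits SIPs to Ingest; Ingest passes AIPs to Archival Storage and Descriptive Info to Data Management; Access handles queries, result sets and orders and delivers DIPs to the Consumer; Administration, Preservation Planning and Management sit alongside these functions.]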
Major User Communities
High Energy Physics Experiments (CMS, ATLAS, LHCb, …)
ISIS neutron and muon facility
British Atmospheric Data Centre
EISCAT (Radar research)
National Earth Observation Data Centre
Solar Physics World Data Centre
Central Laser Facility
Diamond Light Source
National Crystallography Service
Hartley Library, Southampton University
WASP, VIRGO Consortium, SOLAR-B, e-minerals
BBSRC archive, CICT
Growth in Data Capacity
Atlas Petabyte DataStore
5 Petabytes of on-line storage available to STFC facilities and the UK research community
Growth in data volume
[Chart: Total Holdings in Pbytes (0.00 to 3.00), Jun-97 to Dec-07, for Castor, DMF and ADS]
Data Storage and Management Services - overview
DMF (Data Management Facility)
Atlas Data Store and Storage Resource Broker
CASTOR for LHC. Part of the UK Tier1 centre.
Data Management Facility
500TB total licence; 40TB front-end disk
Commercial HSM system. Mature; low staff costs
Operational services since 2005
Currently exploring GridFTP access for remote services to UK NERC data centres…
Recently integrated with Gresham Virtual Tape Library (VTL) to optimise transfer rates for small files (SOHO)
DMF Services
Project        Volume (TB)    #Files (x10^6)
BADC           114            3
SOHO, TRACE    20             18
SOLAR-B        4              6
Total          138            27
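The figures in the DMF Services table imply very different mean file sizes per project, which is one way to see why small-file transfer rates matter for SOHO. The following is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^6 MB); the function and variable names are illustrative only.

```python
# Figures copied from the DMF Services table: (volume in TB, files in millions).
DMF_PROJECTS = {
    "BADC": (114, 3),
    "SOHO, TRACE": (20, 18),
    "SOLAR-B": (4, 6),
}


def mean_file_size_mb(volume_tb: float, files_millions: float) -> float:
    """Mean file size in MB, assuming decimal units (1 TB = 1e6 MB)."""
    return (volume_tb * 1e6) / (files_millions * 1e6)


if __name__ == "__main__":
    for project, (tb, mfiles) in DMF_PROJECTS.items():
        print(f"{project}: {mean_file_size_mb(tb, mfiles):.2f} MB per file")
```

On these numbers BADC averages tens of MB per file, while SOHO/TRACE and SOLAR-B average around 1 MB or less.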
Atlas Data Store & Storage Resource Broker
[Diagram: Atlas Data Store and SRB architecture]
Tape hardware: STK 9310; 8 x 9940 tape drives; Brocade FC switches (ADS_switch_1, ADS_Switch_2); 4 drives to each switch
Host          OS       Role
ermintrude    AIX      dataserver
florence      AIX      dataserver
zebedee       AIX      dataserver
dougal        AIX      dataserver
mchenry1      AIX      Test flfsys
basil         AIX      test dataserver
brian         AIX      flfsys
ADS0CNTR      Redhat   counter
ADS0PT01      Redhat   pathtape
ADS0SB01      Redhat   SRB interface
dylan         AIX      Import/export
buxton        SunOS    ACSLS
Other components: User; array1 to array4; catalogue and cache (Production system and Test system); Tape devices; Logging
User interfaces: SRB (Inq; S commands; MySRB); ADS tape; ADS sysreq; admin commands; create query; User pathtape commands
Connection types: Physical connection (FC/SCSI); Sysreq udp command; User SRB command; VTP data transfer; SRB data transfer; STK ACSLS command; SRB pathtape commands
All sysreq, vtp and ACSLS connections to dougal also apply to the other dataserver machines, but are left out for clarity
Storage Layer – ADS
Tape-based storage archive. In-house system. 20 years old.
De-couples user and application from storage media.
Upgrades and media migration occur “behind the scenes”
High resilience: very few single points of failure
High reliability, high availability (24/7)
Lifetime data integrity checks
Fire safe and off-site backups; tested disaster recovery procedures; media migration, recycling
SRB
Distributed data management client-server system.
– Provides searchable access to data (info held in metadata database).
– Uniform interface to different types of resources (each resource type has a plugin).
– Insulates the end user from needing to know where data is physically stored (logical-to-physical mapping of data and resources).
– Allows replication of data, proxy commands, grouping of resources.
– Supports X509 certificate authentication.
– Supports GridFTP.
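The logical-to-physical mapping and metadata search described above can be illustrated with a toy catalogue. This is a conceptual sketch only and is not the SRB API: the class names, resource names and paths are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical resource name, e.g. "disk-cache" or "tape"
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalogue:
    """Toy metadata catalogue mapping logical names to replicas and attributes."""

    replicas: dict[str, list[Replica]] = field(default_factory=dict)
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica, **attrs: str) -> None:
        """Record one more physical copy of a logical object, plus metadata."""
        self.replicas.setdefault(logical_name, []).append(replica)
        self.attributes.setdefault(logical_name, {}).update(attrs)

    def resolve(self, logical_name: str, preferred: tuple[str, ...] = ()) -> Replica:
        """Pick a replica, favouring preferred resources; KeyError if unknown."""
        candidates = self.replicas[logical_name]
        for resource in preferred:
            for replica in candidates:
                if replica.resource == resource:
                    return replica
        return candidates[0]

    def search(self, **attrs: str) -> list[str]:
        """Return logical names whose metadata matches every given attribute."""
        return sorted(
            name
            for name, meta in self.attributes.items()
            if all(meta.get(k) == v for k, v in attrs.items())
        )


if __name__ == "__main__":
    cat = Catalogue()
    cat.register("/home/proj/run1.dat", Replica("tape", "/vol7/0001"), facility="ISIS")
    cat.register("/home/proj/run1.dat", Replica("disk-cache", "/cache/a/run1.dat"))
    print(cat.resolve("/home/proj/run1.dat", preferred=("disk-cache",)))
    print(cat.search(facility="ISIS"))
```

The user only ever handles the logical name; adding a replica on a new resource, or retiring an old one, changes the catalogue but not the name the user works with.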
SRB: What does it give me?
Uniform, searchable access to distributed resources means I don't have to:
– remember physical location of data
– know how to access different storage types
– worry about migrating to new hardware (insulation)
Scalable system:
– I can add more and different types of resources dynamically to the system.
Auditing:
– I can keep track of operations performed on data.
SRB Services
Biotechnology and Biological Sciences Research Council (BBSRC). In production since April 2006. Initial 10 year SLA. 50TB limit. Around 6000 scientists across 12 centres in the UK.
Arts and Humanities Data Service: more than 200K files, 2TB data. SRB used as a 'dark archive'; data is MD5 checksummed and backed up to tape. In production since April 2007.
ISIS Facility: provide an SRB system to back up ISIS facility data. In production since 2006. Currently 2TB of data stored. System being extended to include experimental data.
Diamond, CLF Facilities: provide SRB-based data management systems providing distributed access to data. Systems currently being deployed. Will hold petabytes of data.
SRD Facility: provide a data management system with distributed access through web-based front-end. System currently being deployed.
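The MD5 checksumming mentioned for the 'dark archive' can be sketched as a manifest that is built at ingest and re-checked later. This is an illustrative sketch using Python's standard library, not the tooling actually used by the service; the function names and manifest layout are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 digest."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    damaged = []
    for rel, expected in manifest.items():
        candidate = root / rel
        if not candidate.is_file() or md5_of(candidate) != expected:
            damaged.append(rel)
    return damaged
```

A manifest built before data goes to tape can be compared against the files after any restore; an empty list from `verify` means every file matched its recorded digest.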
BBSRC SRB Archive process: Data Path
[Diagram: three sites linked over the WAN / JANET WAN. Remote Institute Site: local machines, local storage, local vault, local SRB server, firewall. Central ‘Cache’ Site: central SRB server, central “cache” vault, firewall. RAL Site: firewall, SRB-ADS server (ads0sb01.cc.rl.ac.uk), ADS SRB disk cache resource, ADS tape resource (disk, filer, tape traffic). Numbered steps 1 to 4: ingestion, Sphymove into container, Sreplcont, Ssyncont.]
1 Archive Submission Interface
- Data ingestion of collection hierarchy into SRB
- Uses Java Jargon API interface (equivalent of Sput -b)
- Ingested to /bbsrc/institute/scratch/project/year/user/dateandtime
- At end of ingestion data logically moved using Smv to:
/bbsrc/institute/local-archive/project/year/user/dateandtime
2 Scheduled transfer to Central SRB Server (driven from Central SRB Server)
- Smkcont command used to create container on central SRB Server
- Data moved from Site SRB to container on central SRB Server using Sphymove
- Upon data transfer completion archived data is logically moved with Smv to
/bbsrc/institute/remote-archive/project/year/user/dateandtime
3 Scheduled transfer to ADS resource
- Implemented via cron job using Sreplcont command, which is driven by central SRB Server
- Entire container replicated using Sreplcont command
- Logical structure preserved as /bbsrc/institute/remote-archive/project/year/user/dateandtime
4 Synchronization of container to tape resource and removal of original container from Central SRB Server
- Ssyncont -d -a command used, allowing for a family of containers
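The archive process keeps the same collection layout at every stage and changes only the stage component of the logical path (scratch, local-archive, remote-archive). The sketch below models just that naming scheme; it makes no SRB calls. The timestamp format, the use of the ingestion year for the year component, and the function names are assumptions for illustration.

```python
from datetime import datetime

# Stage names as they appear in the logical paths on the slide.
STAGES = ("scratch", "local-archive", "remote-archive")


def collection_path(institute: str, stage: str, project: str,
                    user: str, when: datetime) -> str:
    """Build /bbsrc/institute/stage/project/year/user/dateandtime."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    stamp = when.strftime("%Y%m%dT%H%M%S")  # assumed dateandtime format
    return f"/bbsrc/{institute}/{stage}/{project}/{when.year}/{user}/{stamp}"


def move_stage(path: str, new_stage: str) -> str:
    """Return the same collection path under a different stage."""
    parts = path.split("/")
    # Expected: ['', 'bbsrc', institute, stage, project, year, user, dateandtime]
    if len(parts) != 8 or parts[1] != "bbsrc" or parts[3] not in STAGES:
        raise ValueError(f"not a BBSRC collection path: {path}")
    if new_stage not in STAGES:
        raise ValueError(f"unknown stage: {new_stage}")
    parts[3] = new_stage
    return "/".join(parts)


if __name__ == "__main__":
    p = collection_path("inst-a", "scratch", "proj1", "alice",
                        datetime(2008, 4, 8, 9, 30, 0))
    print(p)
    print(move_stage(p, "local-archive"))
    print(move_stage(p, "remote-archive"))
```

Because only one path component changes, the project/year/user/dateandtime hierarchy is preserved from ingestion through to the tape-backed archive.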
BBSRC Accumulated Data Volume
Diamond Data Flow
Roger Downing
ALICE
CMS
4 Experiments
ATLAS Detector
7,000 tonnes
42m long, 22m wide, 22m high (about the height of a 5 storey building)
2,000 physicists, 150 institutes, 34 countries
Tier Structure
Tier 0: CERN computer centre (online system, offline farm)
Tier 1: National centres: RAL (UK), France, Italy, Germany, USA
Tier 2: Regional groups: ScotGrid, NorthGrid, SouthGrid, London
Institutes: Glasgow, Edinburgh, Durham
Useful model for Particle Physics but not necessary for
Team Organisation (GRIDPP2)
Grid Services (Grid/Support): Ross, Condurache, Hodges, Klein (EGEE), Vacancy1 50/50 PPD, Vacancy2 50/50 PPD
Fabric (H/W and OS): Bly (team leader), Wheeler, Holt, Thorne, White (OS support), Adams (HW support)
CASTOR (SW/Robot): Corney (GL), Strong (Service Manager), Folkes (HW Manager), deWitt, Jensen, Kruk, Ketley, Bonnet
2.5 FTE effort
CICT Machine Room operations (1.8 FTE)
CICT Networking Support (0.5 FTE)
Database Support (Brown) 0.5 FTE
Storage Layer - CASTOR
Massively scalable Grid-based HSM (currently 10PB, ~100 million files at CERN)
Developed by CERN in collaboration with STFC (SRM) and others. Oracle engine
Deployed at STFC to manage relatively large volumes of LHC data for the UK (2-3 PB per year)
Aim to make CASTOR the de facto storage system for STFC, eventually (SRB interface for CASTOR)
Hardware: Tape
Tape Drives
– 8 9940B drives
  − Used on legacy ADS/dCache service – phase out soon
– 18 T10K tape drives and associated servers delivered, 15 in production, remainder soon
  − Planned bandwidth 50MB/s per drive
  − Actual bandwidth (8-80MB/s) - a work in progress
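As a rough sanity check on the tape figures, the aggregate drive bandwidth can be compared with the average ingest rate implied by the 2-3 PB per year quoted on the CASTOR slide. This is a back-of-the-envelope sketch assuming decimal units (1 PB = 10^9 MB) and a perfectly even load over the year; it ignores reads, repack, mount and seek time, and peaks.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def aggregate_mb_per_s(drives: int, mb_per_s_per_drive: float) -> float:
    """Total streaming bandwidth if every drive runs at the given rate."""
    return drives * mb_per_s_per_drive


def required_mb_per_s(pb_per_year: float) -> float:
    """Average write rate needed to absorb a yearly volume (1 PB = 1e9 MB)."""
    return pb_per_year * 1e9 / SECONDS_PER_YEAR


if __name__ == "__main__":
    drives_in_production = 15  # T10K drives in production, from the slide
    for label, rate in (("planned", 50), ("worst observed", 8), ("best observed", 80)):
        total = aggregate_mb_per_s(drives_in_production, rate)
        print(f"{label}: {total:.0f} MB/s aggregate")
    for volume in (2, 3):
        print(f"{volume} PB/year needs {required_mb_per_s(volume):.1f} MB/s on average")
```

On these assumptions the planned aggregate is 750 MB/s against an average write requirement of roughly 63 to 95 MB/s, so the headroom is there to cover recalls and bursts rather than steady-state ingest.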
Media
Hardware: Disk
Production capacity: 138 servers, 2800 drives, 850TB (usable)
1.6PB capacity delivered in January by Viglen
– 91 Supermicro 3U servers with dual AMD 2220E (2.8GHz) dual-core CPUs, 8GB RAM, IPMI
  − 1 x 3ware 4 port 9650 PCIe RAID controller with 2 x 250GB WD HDD
  − 1 x 3ware 16 port 9650 PCIe RAID controller with 14 x 750GB WD HDD
– 91 Supermicro 3U servers with dual Intel E5310 (1.6GHz) quad-core CPUs, 8GB RAM, IPMI
  − 1 x 3ware 4 port 9650 PCIe RAID controller with 2 x 400GB Seagate HDD
  − 1 x 3ware 16 port 9650 PCIe RAID controller with 14 x 750GB Seagate HDD
Acceptance test running – scheduled to be available end of March.
– 5400 spinning drives after planned phase out in April (expect drive failure every 3 days)
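Two of the disk figures can be cross-checked with simple arithmetic: the annual failure rate implied by "a drive failure every 3 days" across 5400 drives, and the raw capacity of the data drives in the new delivery (two batches of 91 servers, 14 x 750GB each). The sketch assumes decimal units; the slide does not state the RAID level, so the gap between raw capacity and the quoted 1.6PB is left unexplained here.

```python
def implied_annual_failure_rate(drives: int, days_between_failures: float) -> float:
    """Fraction of the drive population expected to fail per year."""
    failures_per_year = 365 / days_between_failures
    return failures_per_year / drives


def raw_capacity_tb(servers: int, data_drives_per_server: int, drive_gb: int) -> float:
    """Raw capacity of the data drives in TB (1 TB = 1000 GB)."""
    return servers * data_drives_per_server * drive_gb / 1000


if __name__ == "__main__":
    afr = implied_annual_failure_rate(drives=5400, days_between_failures=3)
    print(f"Implied annual failure rate: {afr:.2%}")
    raw = raw_capacity_tb(servers=91 + 91, data_drives_per_server=14, drive_gb=750)
    print(f"Raw data-drive capacity of new delivery: {raw:.0f} TB")
```

One failure every 3 days corresponds to a little over 2% of drives per year, and the new servers hold about 1.9PB of raw data-drive capacity against the 1.6PB quoted.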
Test Architecture
[Diagram: three test instances (Development, Preproduction, Certification Testbed). Each has a stager, DLF and LSF, one diskserver (variable), and Oracle databases for stager, DLF and repack. Shared services (name server + vmgr, backed by an Oracle NS+vmgr database) and tape servers appear alongside.]
CASTOR Production Architecture
[Diagram: shared services (Name Server 1 + vmgr, Name Server 2, Oracle NS+vmgr database) and six tape servers, serving four stager instances: CMS Stager Instance, Atlas Stager Instance, LHCb Stager Instance, and Repack and Small User Stager Instance. Each instance has its own stager, DLF and LSF, its own diskservers, and Oracle stager and DLF databases; the repack instance also has an Oracle repack database.]
CASTOR Memory Lane
[Timeline: 4Q05 to 1Q08. Labelled events:]
– CASTOR1 tests OK
– CASTOR2 core running. Hard to install + dependencies
– CSA07 encouraging
– OC Committees note improvement but concerned
– CMS on CASTOR for CSA06. Encouraging.
– 2.1.4 upgrade goes well. Disk 1 support!
– CSA08 reasonably successful
– Problems with functionality and performance – it doesn’t work!
– Happy days!
– 2.1.2 bad
– ATLAS on CASTOR
– Service stopped for extended upgrade
– 2.1.3 good but missing functionality
Common European Multiple Science Data Infrastructure (CEMSDI)
A project planned for FP7 INFRA-2008-1.2.5 Scientific Data Infrastructure
Aims and Objectives
Build a core European grid-based data infrastructure (RAL, DESY? CNAF? IN2P3?) (est. 2009 – 2012)
Build on existing LHC grid and storage expertise at Tier 1 sites
Provide a set of generic data storage services and curation services useful for many science communities across Europe
Federated to other services across the world
Initially funded via FP7 – sustained long-term by cost-neutral service charge (2013 onwards)
(Some of the) High Priority Requirements
Check data integrity:
– at block, device, location, archive level
– via automatic policies, timed if necessary to repeat at intervals
– detect storage device and media failures
Rules for
– data replication and backup
– error detection
– integrity verification
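The requirement for policy-driven integrity checks that repeat at intervals can be sketched as follows. This is a minimal in-memory illustration of the idea, not a design for CEMSDI: the class names, the SHA-256 choice and the single-interval policy are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    name: str
    data: bytes              # stands in for the bytes read back from storage
    checksum: str            # SHA-256 hex digest recorded at ingest
    last_verified: datetime


@dataclass(frozen=True)
class IntegrityPolicy:
    interval: timedelta      # how often each object must be re-verified


def due_for_check(objects, policy, now):
    """Objects whose last verification is at least one interval old."""
    return [o for o in objects if now - o.last_verified >= policy.interval]


def run_checks(objects, policy, now):
    """Re-verify due objects; return names whose checksum no longer matches."""
    failures = []
    for obj in due_for_check(objects, policy, now):
        if hashlib.sha256(obj.data).hexdigest() == obj.checksum:
            obj.last_verified = now  # healthy: restart the timer
        else:
            failures.append(obj.name)  # candidate for repair from a replica
    return failures
```

A scheduler would call `run_checks` periodically; names it returns are the ones a replication or backup rule would then need to repair.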
(Some of the) High Priority Requirements
Scalable from bit to Terabyte to Petabyte and Exabyte.
High-integrity redundant storage for data capture levels (i.e. when the data first gets written).
Security: Control over who can read and write data sets and metadata.
Security: Control over who can change data sets and metadata.
Audit: Log of actions taken by system for normal and exceptional operation
Audit: Log of performance data
Audit: Log of system/device utilisation
Audit: Reporting of used capacity and projection of