Forschungszentrum Karlsruhe
in der Helmholtz-Gemeinschaft
GridKa: Roles and Status
Forschungszentrum Karlsruhe GmbH
Institute for Scientific Computing
P.O. Box 3640
D-76021 Karlsruhe, Germany
Holger Marten
History
10/2000: First ideas about a German Regional Centre for LHC Computing; planning and cost estimates
05/2001: Start of a BaBar Tier-B with the Universities of Bochum, Dresden and Rostock
07/2001: German HEP communities send “Requirements for a Regional Data and Computing Centre in Germany (RDCCG)”; more planning and cost estimates
12/2001: Launching committee establishes RDCCG (later renamed “Grid Computing Centre Karlsruhe, GridKa”)
04/2002: First prototype
High Energy Physics experiments served by GridKa
LHC experiments: Alice, Atlas, CMS, LHCb
non-LHC experiments: BaBar (SLAC, USA), Compass (CERN), CDF (FNAL, USA), D0 (FNAL, USA)
• Committed to Grid Computing
• Have real data already today
Other sciences later
GridKa Project Organization
Overview Board
• BMBF
• Physics Committees
• HEP Experiments
• LCG
• FZK Management
• Head FZK Comp. Centre
• Chairman of TAB
• Project Leader
Technical Advisory Board
• Alice
• Atlas
• CMS
• LHCb
• BaBar
• CDF
• D0
• Compass
• Physics Committees
• DESY
• Project Leader
GridKa
• Planning
• Development
• Technical realization
German Users of GridKa
22 institutions
44 user groups
350 scientists
Aachen (4)●, Bielefeld (2)●, Bochum (2)●, Bonn (3)●, Darmstadt (1)▲, Dortmund (1)●, Dresden (2)●, Erlangen (1)●, Frankfurt (1)●, Freiburg (2)●, Hamburg (1)▲, Heidelberg (1)▲ (6)●, Karlsruhe (2)●, Mainz (3)●, Mannheim (1)●, München (1)● (5)▲, Münster (1)●, Rostock (1)●, Siegen (1)●, Wuppertal (2)●
GridKa in the network of international Tier-1 centres
France: IN2P3, Lyon
Germany: Forschungszentrum Karlsruhe
Italy: CNAF, Bologna
Japan: ICEPP, University of Tokyo
Spain: PIC, Barcelona
Switzerland: CERN, Geneva
Taiwan: Academia Sinica, Taipei
UK: Rutherford Laboratory, Chilton
USA: Fermi Laboratory, Batavia, IL
USA: BNL
The fifth LHC subproject
[Diagram: The global LHC Computing Centre]
Tier 0: Tier 0 Centre at CERN (CMS, ATLAS, LHCb)
Tier 1: Germany (FZK), USA (Fermi, BNL), UK (RAL), France (IN2P3), Italy (CNAF), CERN Tier 1, ...
Tier 2: Uni-CCs, Lab-CCs (Lab x, Lab y, Lab z, Lab i, Uni a, Uni b, Uni c, Uni d, Uni e)
Tier 3: Institute computers
Tier 4: Desktop
Working Groups, Virtual Organizations
LHC Computing Model (simplified!!)
[Diagram labels: RAL, IN2P3, BNL, FZK, CNAF, PIC, ICEPP, FNAL, USC, NIKHEF, Krakow, CIEMAT, Rome, Taipei, TRIUMF, CSCS, Legnaro, UB, IFCA, IC, MSU, Prague, Budapest, Cambridge, Santiago, Weizmann; legend: Tier-1, Tier-2, small centres, desktops, portables]
• Tier-0 – the accelerator centre
– Filter raw data
– Reconstruction → summary data (ESD)
– Record raw data and ESD
– Distribute raw and ESD to Tier-1
• Tier-1 – “online” to data acquisition process
– high availability (24h x 7d)
– managed mass storage
– long-term commitment
– resources: 50% of “average Tier-1”
– Permanent storage and management of raw, ESD, calibration data, meta-data, analysis data and databases → grid-enabled data service
– Data-heavy analysis
– Re-processing raw → ESD
– National, regional support
Source: Les Robertson, GDB, May 2004
GridKa School 2004, September 20-23, 2004, Karlsruhe, Germany
• Tier-2 –
– Well-managed disk storage – grid-enabled
– Simulation
– End-user analysis – batch and interactive
– High performance parallel analysis (PROOF?)
• Each Tier-2 is associated with a Tier-1 that
– Serves as the primary data source
– Takes responsibility for long-term storage and management of all of the data generated at the Tier-2 (grid-enabled mass storage)
– May also provide other support services (grid expertise, software distribution, maintenance, …)
• CERN will not provide these services for Tier-2s except by special arrangement
Source: Les Robertson, GDB, May 2004
GridKa planned resources
[Chart: planned CPU (kSI95), disk and tape (TByte) per year, 2002-2009, with LCG Phase I, Phase II and Phase III marked]
Distribution of planned resources at GridKa
[Charts: LHC vs. non-LHC share (0-100%) of CPU, disk and tape per year, 2002-2009; status Jan-2004]
Significant contributions to non-LHC !!
• BaBar Tier-A
• D0, CDF Regional Centre
GridKa Environment
[Picture labels: IWR 441, 442; Tape Storage; Main building]
Worker Nodes & Test beds
Production environment
•  97x dual PIII, 1.26 GHz; 1 GB RAM, 40 GB HD; 97 kSI2000
•  64x dual PIV, 2.2 GHz; 1 GB RAM, 40 GB HD; 102 kSI2000
•  72x dual PIV, 2.667 GHz; 1 GB RAM, 40 GB HD; 130 kSI2000
• 267x dual PIV, 3.06 GHz; 1 GB RAM, 40/80 GB HD; 534 kSI2000
•  36x dual Opteron 246; 2 GB RAM, 80 GB HD; 90 kSI2000
• Σ 536 nodes, 1072 CPUs, 953 kSI2000
• installed with RH7.3, LCG 2.2.0 (except for Opterons)
Test environment
• additional 30 machines in several test beds
Next OS
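The totals quoted for the production environment can be cross-checked from the per-batch figures. The following is a minimal sketch of that arithmetic only; the dictionary layout and variable names are illustrative and not part of any GridKa tooling, and "dual" is taken to mean two CPUs per node.

```python
# Per hardware batch: (number of nodes, CPUs per node, total kSI2000),
# copied from the slide. "dual" is assumed to mean 2 CPUs per node.
batches = {
    "dual PIII 1.26 GHz": (97, 2, 97),
    "dual PIV 2.2 GHz": (64, 2, 102),
    "dual PIV 2.667 GHz": (72, 2, 130),
    "dual PIV 3.06 GHz": (267, 2, 534),
    "dual Opteron 246": (36, 2, 90),
}

total_nodes = sum(nodes for nodes, _, _ in batches.values())
total_cpus = sum(nodes * cpus for nodes, cpus, _ in batches.values())
total_ksi2000 = sum(ksi for _, _, ksi in batches.values())

print(f"{total_nodes} nodes, {total_cpus} CPUs, {total_ksi2000} kSI2000")
```

The sums reproduce the slide's figures of 536 nodes, 1072 CPUs and 953 kSI2000.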
PBSPro fair share according to requirements (1-Oct-2004)
experiment   percentage   share    kSI2000
Alice        13.2         14 300   143
Atlas        13.9         15 000   150
CMS          12.9         14 000   140
LHCb          5.2          5 600    56
BaBar        19.4         21 000   210
CDF           4.6          5 000    50
Dzero        26.2         28 300   283
Compass       4.6          5 000    50
45% LHC, 55% nLHC
The default (test) queue is not handled by the fair share. These 20-30 CPUs are kept free for test jobs.
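The percentages in the fair-share table are consistent with each experiment's kSI2000 requirement divided by the sum of all requirements. The slide does not state this rule explicitly, so the sketch below is an assumption checked only against the published numbers. It is plain arithmetic, not PBSPro configuration syntax, and the function and variable names are illustrative.

```python
# kSI2000 requirements per experiment, as listed on the slide (1-Oct-2004).
requirements_ksi2000 = {
    "Alice": 143, "Atlas": 150, "CMS": 140, "LHCb": 56,
    "BaBar": 210, "CDF": 50, "Dzero": 283, "Compass": 50,
}
LHC_EXPERIMENTS = {"Alice", "Atlas", "CMS", "LHCb"}


def fair_share_percentages(requirements):
    """Return each experiment's share of the total requirement in percent."""
    total = sum(requirements.values())
    return {name: round(100.0 * value / total, 1)
            for name, value in requirements.items()}


percentages = fair_share_percentages(requirements_ksi2000)

total = sum(requirements_ksi2000.values())
lhc_total = sum(value for name, value in requirements_ksi2000.items()
                if name in LHC_EXPERIMENTS)
lhc_percent = round(100.0 * lhc_total / total)

for name, pct in percentages.items():
    print(f"{name:8s} {pct:5.1f} %")
print(f"LHC: {lhc_percent} %, non-LHC: {100 - lhc_percent} %")
```

The share column of the table is simply the kSI2000 value multiplied by 100.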
Disk Space available for HEP experiments: 202 TB
[Bar chart, Oct 04: disk space in TByte (axis 0-60) for ALICE, ATLAS, CMS, LHCb, BaBar, CDF, D0, Compass]
29% LHC, 71% nLHC
Online Storage I
• about 40 TB stored in NAS (better: DAS)
• dual CPU, 16 EIDE disks, 3Ware controller
Experience
• hardware cheap, but not very reliable
• RAID software & management messages not always useful
• good throughput for a few simultaneous jobs, but doesn’t scale to a few hundred simultaneous file accesses
Workarounds
• disk mirroring
• “management software” (“managed disks”): file copies on multiple boxes
Online Storage: I/O Design with NAS (DAS)
[Diagram labels: Compute nodes, TCP/IP/NFS, Expansion, Alice, Atlas]
~ 30 MB/s r/w; bottleneck: disk access
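To see why a roughly 30 MB/s per-box limit does not scale to a few hundred simultaneous file accesses, it helps to divide that limit among the jobs reading from one box. The sketch below assumes the bandwidth is shared evenly and ignores seek overhead, which in practice makes things worse. The function name and the job counts are illustrative choices, not measurements from GridKa.

```python
def per_job_throughput_mb_s(box_throughput_mb_s: float, simultaneous_jobs: int) -> float:
    """Evenly split one storage box's throughput among simultaneous jobs."""
    if simultaneous_jobs < 1:
        raise ValueError("need at least one job")
    return box_throughput_mb_s / simultaneous_jobs


BOX_LIMIT_MB_S = 30.0  # approximate r/w limit per box quoted on the slide

# Hypothetical job counts, from "a few" to "a few hundred".
for jobs in (1, 5, 50, 300):
    rate = per_job_throughput_mb_s(BOX_LIMIT_MB_S, jobs)
    print(f"{jobs:4d} jobs -> {rate:6.2f} MB/s per job")
```

Under this optimistic even-split assumption, 300 concurrent readers on one box would each see about 0.1 MB/s.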
Online Storage II
• about 160 TB stored in a SAN
• SCSI disks (10k rpm) with redundant controllers
• parallel file system on a file server cluster, exported via NFS to the WNs
Online Storage: Scalable I/O Design
[Diagram labels: Compute nodes, TCP/IP/NFS, file server cluster, SAN/SCSI Fibre Channel, RAID 5 storage, Expansion, Alice, Atlas]
striping + parallel file system
Online Storage II
Advantages
• high availability through multiple redundant servers
• load balancing via automounter program map
Experience
• many teething problems (bugs, learning how to configure, ...)
• ratio (CPU/wall clock) close to 1 in some applications
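Load balancing via an automounter program map can be sketched as follows. With autofs, a program map is an executable that receives the lookup key as its first argument and prints the map entry (mount options and location) on standard output. The slides do not say how GridKa's map chose a server, so the random choice below is an assumption. The server names, export path and mount options are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Sketch of an autofs program map that spreads mounts over several NFS servers."""
import random
import sys

# Hypothetical file servers that all export the same parallel file system.
FILE_SERVERS = ["fs1.example.org", "fs2.example.org", "fs3.example.org"]
EXPORT_ROOT = "/export"          # hypothetical export path
MOUNT_OPTIONS = "-fstype=nfs,rw"  # hypothetical mount options


def map_entry(key, servers=FILE_SERVERS, rng=random):
    """Build the map entry for `key`, picking one server at random."""
    server = rng.choice(servers)
    return f"{MOUNT_OPTIONS} {server}:{EXPORT_ROOT}/{key}"


def main(argv):
    """Print the entry for the key in argv[1]; non-zero exit means no entry."""
    if len(argv) != 2 or not argv[1] or "/" in argv[1]:
        return 1
    print(map_entry(argv[1]))
    return 0


# Only act when a key is passed on the command line, as the automounter does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Because each worker node resolves the key independently, mounts end up spread across the file servers. A round-robin or load-aware choice would fit the same interface.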
Why are we telling you all this?
Because we need your experience and feedback as users !
Tape Space available for HEP experiments: 374 TB
[Bar chart, Oct 04: tape space in TByte (axis 0-120) for ALICE, ATLAS, CMS, LHCb, BaBar, CDF, D0, Compass]
27% LHC, 73% nLHC
Tape Storage
• tape library IBM 3584 LTO Ultrium
• 8 drives LTO-1, 4 drives LTO-2
• 375 TB native (uncompressed)
• Tivoli Storage Manager (TSM) for Backup and Archive
• installation of dCache in progress
  - tape backend interfaced to Tivoli Storage Manager
  - installation with 1 head and 3 pool nodes currently tested by CMS & CDF
• other
  - SAM station caches for D0 and CDF
  - JIM (Job information management) station for D0
  - tape connection via scripts (D0)
  - CORBA Naming service (for CDF)
GridKa – Plan for WAN connectivity
[Timeline 2001-2008: 34 Mbps, 155 Mbps, 2 Gbps, 10 Gbps, 20 Gbps; marked events: “Start discussion with Dante !”, “Start 10 Gbps”]
Sept 2004: DFN upgraded the capacity from Karlsruhe to Géant to 10 Gbps; tests have been started !
Routing (full 10 Gbps): GridKa – DFN (Karlsruhe) – DFN (Frankfurt) – Géant (Frankfurt) – Géant (Milano) – Géant (Geneva) – CERN
Further services & sources of information
GGUS (Global Grid User Support) www.ggus.org
User information
www.gridka.de → GridKa Info
- user registration
- globus installation
- batch system PBS
- backup & archive
- getting a certificate from GermanGrid CA
- listserver / mailing lists
- monitoring status with Ganglia
www.gridka.de → HEP experiments
- experiment specific information
www.ggus.org
- FAQ
- Documentation
- ...
Tools
Final remarks
Europe on the way to e-science
EU-Project EGEE
April 2004 to March 2006
32 million euros for personnel
70 partner institutes in 27 countries, organized in 9 federations
applications: LHC grid, Biomed, ...
[Map legend: Operations Management Centre (OMC), Core Infrastructure Centre (CIC), Regional Operations Centre (ROC); map label: Russia]
“Provide distributed European research communities with a common market of computing, offering round-the-clock access to major computing resources, independent of geographic location, ...”
http://goc.grid-support.ac.uk/lcg2
Status of LCG / EGEE
Last but not least
We want to
- help our users on our systems
- support/discuss cluster installations at other institutes
- support/discuss middleware installations at other centres
- help create a German Grid Infrastructure
We will continue the balancing act between
- testing & Data Challenges
- production with real data
and...
We appreciate the continuous interest and support by the Federal Ministry of Education and Research, BMBF.