ALICE GRID
&
Kolkata Tier-2
Site Name :-
IN-DAE-VECC-01
& IN-DAE-VECC-02
VO :- ALICE
City:- KOLKATA
Country :- INDIA
Vikas Singhal
VECC, Kolkata
Events at LHC
Luminosity: 10^34 cm^-2 s^-1
40 MHz: every 25 ns, 20 events overlaying
CMS
ATLAS
LHCb
CERN
Tier 0 Centre at CERN
The Grid Computing Model
[Figure: tiered grid model. CERN Tier 0 at the centre; Tier 1 centres (Germany, USA, UK, France, Italy, Scandinavia, Japan, CERN); Tier 2 sites (Lab a, Lab b, Lab c, Lab m, Uni a, Uni b, Uni n, Uni x, Uni y); Tier 3 physics departments; desktops.]
ALICE computing model
[Figure: ALICE tiered computing model. Labels: Online System, Online Farm, CERN Computer Center (Tier 0); Tier 1 regional centers (APROC Taiwan, France, Germany, Italy); Tier 2 centers, including Kolkata Tier 2; Tier 3 institutes; Tier 4; physics data cache. Link speeds shown: ~40 Gb/s, 10 Gb/s, 1-10 Gb/s, 155/622 Mb/s, 100-1000 Mb/s.]
RAW data delivered by the DAQ undergo calibration and reconstruction, which produce three kinds of objects for each event:
1. ESD object
2. AOD object
3. Tag object
Further reconstruction and calibration of RAW data will be done at Tier 1 and Tier 2.
DPD (Derived Physics Data) objects will be processed at Tier 3 and Tier 4.
The generation, reconstruction, storage and distribution of Monte Carlo simulated data will be the main task of Tier 1 and Tier 2.
HMPID
Muon Arm
TRD
PHOS
PMD
ITS
TOF
TPC
Indian contribution to ALICE : PMD, Muon Arm
Size: 16 x 26 meters
Total weight: 10,000 t
Overall diameter: 16.00 m
Overall length: 25 m
Magnetic field: 0.4 T
ALICE Collaboration
About half the size of ATLAS or CMS, about twice LHCb: ~1100 people
30 countries, 80 Institutes
The ALICE collaboration & detector
Data volumes
• RAW data – 2.5 PB/year
• Two distinct periods: p+p (~7.5 months) and Pb+Pb (~40 days)
• Reconstructed and simulated data
• 1.5PB – first level RAW filtering (ESDs)
• 200TB – second level RAW filtering (AODs)
• 1PB of simulated data
• User generated data ~500TB
• Total ~5 PB of data per year (without replicas)
• Replication 2x RAW, 3x ESD/AODs, 2x user files
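The replication factors can be applied to the yearly volumes to estimate the storage actually needed. The following is a minimal sketch, not an official ALICE accounting: the slide gives no replication factor for simulated data, so a single copy is assumed, and the dictionary and function names are made up for illustration. The itemized single-copy volumes sum to about 5.7 PB, in line with the rounded total of ~5 PB.

```python
# Yearly ALICE data volumes in PB, as listed on the slide.
VOLUMES_PB = {
    "RAW": 2.5,
    "ESD": 1.5,
    "AOD": 0.2,        # 200 TB
    "simulated": 1.0,
    "user": 0.5,       # ~500 TB
}

# Replication factors from the slide. The slide gives none for simulated
# data, so one copy is ASSUMED here.
REPLICAS = {"RAW": 2, "ESD": 3, "AOD": 3, "simulated": 1, "user": 2}


def total_without_replicas(volumes):
    """Sum of the single-copy volumes, in PB."""
    return sum(volumes.values())


def total_with_replicas(volumes, replicas):
    """Sum of the volumes, each multiplied by its replication factor."""
    return sum(size * replicas[kind] for kind, size in volumes.items())


if __name__ == "__main__":
    print(f"single copy:   {total_without_replicas(VOLUMES_PB):.1f} PB")
    print(f"with replicas: {total_with_replicas(VOLUMES_PB, REPLICAS):.1f} PB")
```

Under these assumptions the replicated volume comes to roughly 12 PB per year, more than twice the single-copy figure.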
Processing
• RAW data reconstruction ~10K CPU cores
• MC processing ~15K CPU cores
• User analysis ~7K CPU cores (450 distinct users)
• ~40 million jobs per year
• ~1.3 jobs completed every second
• ½ production, ½ user jobs
• 200 million files per year
Taken from L. Betev's slides at the T1-T2 meeting at KIT
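The job rate quoted above follows directly from the yearly totals. A quick check, assuming a 365-day year (the function names are illustrative only):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year, an assumption

JOBS_PER_YEAR = 40e6    # yearly job count from the slide
FILES_PER_YEAR = 200e6  # yearly file count from the slide


def jobs_per_second(jobs_per_year):
    """Average job completion rate over a full year."""
    return jobs_per_year / SECONDS_PER_YEAR


def files_per_job(files_per_year, jobs_per_year):
    """Average number of files produced per job."""
    return files_per_year / jobs_per_year


if __name__ == "__main__":
    print(f"{jobs_per_second(JOBS_PER_YEAR):.2f} jobs/s")
    print(f"{files_per_job(FILES_PER_YEAR, JOBS_PER_YEAR):.0f} files/job")
```

This gives about 1.27 jobs per second, consistent with the quoted ~1.3, and an average of 5 files per job.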
ALICE Sites on MonALISA
Europe
Asia
North America Africa
72 active computing sites
South America
Why Tier 2 ?
1. Tier-2 is the lowest level accessible by the entire collaboration.
2. Each sub-detector of ALICE has to be associated with at least a Tier-2 because of the large volume of calibration and simulated data.
3. PMD is one of the important sub-detectors of ALICE.
4. We are solely responsible for PMD, from conception to commissioning.
Vikas Singhal, VECC, INDIA
Grid Site As per WLCG &
Experiment Requirement
[Figure: components of the Kolkata site, or of a general grid site]
Central services: WMS, MyProxy, VOMS….
Site services: CREAM-CE, Site BDII, LCG-UI, VO-BOX, SE (DPM or pure XrootD, with XrootD redirector and XrootD disk servers)
Worker nodes (WNs): more and more WNs; 64-bit blade servers with blade enclosures; 32- or 64-bit 1U and 2U servers; a few tower servers
Storage: more and more disks and disk arrays; new SAN box, old NAS, older NAS, even older DAS
Supporting servers: NFS server, PBS server, DNS server, UI server, HA server, monitoring server, installation and DHCP server, etc.; Tier-3 management server and cluster
Network: local and global network, fiber line
Infrastructure: cooling, UPS, fire alarm, access control, etc.
Vendors: HP, Dell, IBM, etc.
Frontend component of Site & Installation
Components: LCG-CE, SE, CREAM-CE, Site BDII, LCG-UI, VO-BOX, pure XrootD
Grid middleware meta-packages are installed through YUM and configured through YAIM.
The middleware changes from time to time, for example from gLite to EMI (follow the manual).
During installation and configuration of the Kolkata site we ran into RPM dependency problems with Java, security packages, etc.
The community and the mailing lists help a lot. For most problems we got the solution from a mailing list.
Middleware installed on
IN-DAE-VECC-02 Site
1. Installed the SLC 5.8 (x86_64) operating system on x86_64 machines.
2. Upgrading the middleware packages below to EMI middleware:
glite-VOBOX
CREAM-CE (64-bit)
glite-BDII
Pure XROOTD redirector as Storage Element
glite-WN (64-bit)
Service hosts: grid01.tier2-kol.res.in, gridce02.tier2-kol.res.in, dcache-server.tier2-kol.res.in
Worker nodes: wn045-wn123.internal.tier2-kol.res.in, 79 worker nodes (476 cores)
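The worker-node range expands to individual host names, for example when preparing a node list for the batch system. A minimal sketch, assuming three-digit zero-padded numbering as in the range above; `worker_node_names` is a hypothetical helper, not part of any middleware:

```python
def worker_node_names(first, last, domain="internal.tier2-kol.res.in"):
    """Expand a numeric range into host names such as wn045.<domain>."""
    return [f"wn{number:03d}.{domain}" for number in range(first, last + 1)]


nodes = worker_node_names(45, 123)

if __name__ == "__main__":
    print(len(nodes), "worker nodes")
    print(nodes[0], "...", nodes[-1])
```

The range wn045 to wn123 yields 79 host names, matching the worker-node count quoted for the site.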
Backend Component of SITE
Router & Switch
Two networks: one public and one private.
Domain Name Server
The DNS server is a critical component. We have two redundant name servers, naamak and suchak, for high availability.
Time Server
Configured NTP.
Installer
Network installation and automated configuration using Quattor-like tools.
Storage Server
Common shared space mounted over NFS.
PBS Server
CE and PBS batch scheduler on one server. Configured a firewall (through iptables) and NAT on it.
TIER-3 Cluster
Separate cluster for local users, with interactive and non-interactive nodes.
Monitoring Server
Configured MRTG (network traffic monitoring) and a cluster monitoring tool.
Kolkata TIER-2 centre logical diagram
[Figure: logical diagram of the Kolkata Tier-2 centre]
Network: Internet link (300 Mbps) through router and switch; public network 144.16.112.xx/27; private network 192.168.x.x (standby); SINP; 1 Gbps fiber backbone
IN-DAE-VECC-02 site with 64-bit machines: grid01, gridce02, gridse001, dcache-server, naamak, suchak, installer, backup server (130 TB backup)
Computing nodes: wn045 to wn123 on Switch-1 and Switch-2; Dell and HP blade servers with multi-core Xeon 3.0 GHz
Storage: 4 XrootD disk servers with 230 TB of IBM and HP SAN systems
Grid-Peer Tier-3 cluster with 32- and 64-bit machines: 25 computing nodes, wn001 to wn025, on Switch-1 and Switch-2; Dell and Wipro blades; 25 TB of storage
ALICE Tier-2 Grid Started in 2002
CERN
512Kbps Ethernet Bandwidth
Operating System
› Scientific Linux 3.05
Middleware
› Alice Environment with PBS as batch system
Hardware (CPU, Disk)
› 1 x dual Xeon, 4 GB compute node
› 2 x dual Xeon, 2 GB WNs
› 2 x 80 GB disk space
Bandwidth
› 512Kbps Shared
Started in 2002 by S. K. Pal and T. Samanta.
From 2 Core to 700 Cores
Started with
2002: 2 desktop machines
2003: 2 tower-like servers
2004: 9 HP 1U servers
2006: 17 Wipro 1U servers, single core
2008: 40 HP blades, dual core
2009: 8 HP blades, quad core
2011: 32 Dell blades, dual processor, dual core
[Photographs labelled 2007, 2009 and 2011]
From 512MB Disk to 300TB Disk
Started with
2002: 512 MB in desktop machine
2003: 40 GB in tower-like servers as DAS
2004: 400 GB in HP MSA 500
2006: 2 TB Wipro NAS
2008: 108 TB HP EVA SAN
2009: 25 TB iSCSI
2011: 200 TB IBM DS 5100
[Photographs labelled 2006, 2008 and 2010]
From 128 Kbps to 1 Gbps Network Link
Started with
2002: 128 Kbps shared link
2003: 512 Kbps
2004: 2 Mbps dedicated link
2006: 4 Mbps from Bharti
2008: 30 Mbps from Reliance
2009: 100 Mbps from VSNL (ERNET)
2011: 300 Mbps from NKN
2012: upgrading to 1 Gbps
Efficient Cooling Concept and Implementation
Hot and cool air are separated.
For air separation, a cold aisle containment is created.
The cold aisle containment is the least accessible area.
Cool only the hardware racks, not people, walls, etc.
Human access to the cold aisle containment is restricted.
All management and monitoring of the servers and storage is done from outside the cold aisle containment.
All power and Ethernet cables are also routed from outside the cold aisle containment.
The temperature gradient between the cold and hot aisles is 5 °C.
Major Achievements
Consistently more than 400 ALICE jobs have been running since the efficient cooling solution was commissioned.
Kolkata Tier-2 provided a total of 6.0K HEP-SPEC06 of CPU and 230 TB of disk storage.
Performance:
~1M jobs successfully completed during the last year
[Plot: jobs completed]
Total Kolkata Tier-2 Resources
Computing Resources:-
Total :- 476 Cores
DELL Blades 32 * 8 = 256
HP Quad Core Blades 8*8= 64
HP Dual Core Blades 39 * 4 = 156
Storage :- 230TB under one HP 2U Management Server
74TB : HP EVA 6100 under 2 * 2U HP disk server
156TB : IBM DS 5100 under 2 * 1U IBM disk server
Network speed: 300 Mbps, to be increased to 1 Gbps.
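The totals above can be cross-checked from the per-model figures. A small sketch; the dictionary names are illustrative, and the per-core benchmark value is only a rough average derived from the 6.0K HEP-SPEC figure quoted earlier in these slides:

```python
# (number of blades, cores per blade), as listed on the slide.
CORES = {
    "DELL blades": (32, 8),
    "HP quad-core blades": (8, 8),
    "HP dual-core blades": (39, 4),
}

# Disk capacity per storage system in TB, as listed on the slide.
STORAGE_TB = {"HP EVA 6100": 74, "IBM DS 5100": 156}

total_cores = sum(blades * cores for blades, cores in CORES.values())
total_storage_tb = sum(STORAGE_TB.values())

# Rough site-wide average, using the 6.0K HEP-SPEC total quoted earlier.
hepspec_per_core = 6000 / total_cores

if __name__ == "__main__":
    print(total_cores, "cores,", total_storage_tb, "TB")
    print(f"about {hepspec_per_core:.1f} HEP-SPEC per core on average")
```

The blade counts add up to 476 cores and the two storage systems to 230 TB, as stated.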
Grid-Peer Tier-3 Cluster
1U Sliding LCD Monitor with 16 port KVM
Dell(TM) PowerEdge(TM) M1000e Blade Server Chassis.
16 Dell(TM) PowerEdge(TM) M610 high-performance Intel blades.
Each blade has 2 Nehalem-based Intel quad-core Xeon E5530 2.4 GHz CPUs with 8 MB cache.
Each blade has 16 GB RAM.
Each blade has 2 x 146 GB disks mounted as RAID1.
Installed SLC 5.6 x86_64 OS (kernel version 2.6.18-164.6.1.el5).
Dell™ ISCSI EqualLogic Storage
16 * 2TB SAS hard disks.
24.88TB Usable space after RAID5 and Hot Spare.
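A rough upper bound on the usable space can be estimated from the disk count. This is a sketch under stated assumptions: one hot spare, one disk's worth of RAID5 parity, and decimal terabytes for the disk size. It does not model controller or formatting overhead, so it comes out somewhat above the 24.88 TB reported.

```python
def raid5_usable_bytes(disks, disk_bytes, hot_spares=1):
    """Usable bytes of a single RAID5 set: one disk's worth goes to parity.

    hot_spares=1 is an ASSUMPTION; the slide only says "Hot Spare".
    """
    data_disks = disks - hot_spares - 1
    if data_disks < 1:
        raise ValueError("not enough disks for RAID5")
    return data_disks * disk_bytes


usable_bytes = raid5_usable_bytes(disks=16, disk_bytes=2 * 10**12)
usable_tib = usable_bytes / 2**40  # binary terabytes, as most tools report

if __name__ == "__main__":
    print(f"{usable_bytes / 10**12:.0f} TB decimal, {usable_tib:.2f} TiB")
```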
A total of 25 nodes for VECC users and PMD collaborators.
12 32-bit nodes
13 64-bit computing nodes
The 32-bit nodes run on the oldest hardware, procured in 2004 (we will gradually retire them because of their high noise, power consumption and heat generation).
25 TB of total storage.
50+ active users (across India).
30+ active users (in VECC).
Quotas implemented.
User-specific software such as ROOT, Geant3, AliRoot, AliEn and Fortran is installed to match the hardware (32-bit or 64-bit).
Extensively used by the users; needs to be extended.
Intra-DAE Grid
EU-India Grid
Health Grid
IGCA
GARUDA Grid
By-products of the WLCG Grid
Thank You
Main data types in ALICE
• ESD – run/event numbers, trigger word, primary vertex, arrays of tracks/vertices, detector info
• AOD standard – cleaned-up ESDs, reducing the size by a factor of 5
– Can be extended on user demand with extra information
• ESD and AOD inherit from the same base class (keeping the same event interface)
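The shared base class means analysis code can run unchanged on either data type. The sketch below illustrates the idea in Python with made-up class names (`VEvent`, `ESDEvent`, `AODEvent`); it is not the AliRoot implementation, and tracks are reduced to plain transverse-momentum values for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class VEvent(ABC):
    """Hypothetical common event interface shared by ESD and AOD."""

    @abstractmethod
    def run_number(self): ...

    @abstractmethod
    def n_tracks(self): ...


@dataclass
class ESDEvent(VEvent):
    run: int
    event: int
    trigger_word: int
    primary_vertex: tuple
    tracks: list = field(default_factory=list)        # here: pt values only
    detector_info: dict = field(default_factory=dict)

    def run_number(self):
        return self.run

    def n_tracks(self):
        return len(self.tracks)


@dataclass
class AODEvent(VEvent):
    run: int
    event: int
    tracks: list = field(default_factory=list)
    extra: dict = field(default_factory=dict)  # optional user-requested info

    def run_number(self):
        return self.run

    def n_tracks(self):
        return len(self.tracks)


def filter_esd(esd, keep_track):
    """ESD filtering: keep only selected tracks, drop detector-level info."""
    return AODEvent(esd.run, esd.event, [t for t in esd.tracks if keep_track(t)])


def describe(event):
    """Analysis code written against the interface works on both types."""
    return f"run {event.run_number()}: {event.n_tracks()} tracks"


esd = ESDEvent(run=1, event=7, trigger_word=0, primary_vertex=(0.0, 0.0, 0.1),
               tracks=[0.2, 1.5, 3.0])
aod = filter_esd(esd, keep_track=lambda pt: pt > 1.0)
```

Calling `describe(esd)` and `describe(aod)` exercises the same code path on both objects.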
[Figure: reconstruction and analysis data flow]
Inputs to AliRoot RECONSTRUCTION: raw data; conditions, calibration and alignment data from the OCDB (updated by pass0 to passN); AliEn FC
Event Summary Data: Pass1 at T0, Pass2 at T1, PassN at T1
ESD filtering: AOD standard, AOD standard + extra
Analysis on ESD and AOD
Monte Carlo
Job submission
[Figure: job flow between the user, the ALICE central services and the site]
User: submits job to the ALICE Job Catalogue.
Optimizer: splits each job by input files (lfn):
Job 1 (lfn1, lfn2, lfn3, lfn4): Job 1.1 (lfn1), Job 1.2 (lfn2), Job 1.3 (lfn3, lfn4)
Job 2 (lfn1, lfn2, lfn3, lfn4): Job 2.1 (lfn1, lfn3), Job 2.2 (lfn2, lfn4)
Job 3 (lfn1, lfn2, lfn3): Job 3.1 (lfn1, lfn3), Job 3.2 (lfn2)
ALICE central services: matchmaking on close SEs and software; sends job agent to site; updates the task queue (TQ).
Site (VO-Box, AliEn CE, LCG WMS, CE, WN, packman): the WN executes the job agent, which checks the environment (Env OK? Yes: asks for work-load; No: die with grace), receives and retrieves the work-load, runs the user job, sends the job result and registers the output.
ALICE File Catalogue: one entry per file, lfn guid {se's}.
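The splitting step can be sketched as grouping a job's input files so that each sub-job reads from one storage element. This is an illustration only: the grouping criterion, the `split_job` helper and the storage element names are assumptions, chosen so that the result reproduces the split of Job 1 in the figure.

```python
from collections import defaultdict


def split_job(job_id, lfns, catalogue):
    """Group input files by the SE holding them; one sub-job per group.

    `catalogue` maps lfn -> SE name and stands in for a file catalogue
    lookup (hypothetical, single replica per file for simplicity).
    """
    by_se = defaultdict(list)
    for lfn in lfns:
        by_se[catalogue[lfn]].append(lfn)
    # Sub-jobs are numbered in the order their SE first appears.
    return {
        f"{job_id}.{index}": files
        for index, files in enumerate(by_se.values(), start=1)
    }


# Hypothetical placement: lfn3 and lfn4 sit on the same storage element.
catalogue = {"lfn1": "SE_A", "lfn2": "SE_B", "lfn3": "SE_C", "lfn4": "SE_C"}
subjobs = split_job("1", ["lfn1", "lfn2", "lfn3", "lfn4"], catalogue)

if __name__ == "__main__":
    for name, files in subjobs.items():
        print(f"Job {name}: {', '.join(files)}")
```

Grouping by location lets the matchmaking send each sub-job to a site close to its data.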
Xrootd architecture
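[Figure: a client, a redirector (head node) and a cluster of data servers A, B and C]
1. The client asks the redirector to open file X.
2. The redirector asks the data servers: who has file X?
3. Server C has it, and the redirector tells the client: go to C.
4. The client opens file X on C.
On a 2nd open of X, the redirector answers "go to C" directly: redirectors cache file locations.
The client sees all servers as xrootd data servers.
All storages are on WAN.
Global redirector (not in picture): intra-site storage collaboration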
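The redirect-and-cache behaviour can be modelled in a few lines. This is a toy model, not the xrootd protocol or its API: `DataServer`, `Redirector` and `locate` are hypothetical names, and the query counter exists only to show the effect of the cache.

```python
class DataServer:
    """Toy data server that knows which files it holds."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def has(self, path):
        return path in self.files


class Redirector:
    """Toy head node: asks the servers once, then answers from its cache."""

    def __init__(self, servers):
        self.servers = servers
        self.cache = {}    # path -> server name
        self.queries = 0   # number of "who has file X?" questions asked

    def locate(self, path):
        if path in self.cache:
            return self.cache[path]
        for server in self.servers:
            self.queries += 1
            if server.has(path):
                self.cache[path] = server.name
                return server.name
        raise FileNotFoundError(path)


redirector = Redirector([
    DataServer("A", []),
    DataServer("B", []),
    DataServer("C", ["X"]),
])
first = redirector.locate("X")         # queries A, B and C
queries_after_first = redirector.queries
second = redirector.locate("X")        # served from the cache
queries_after_second = redirector.queries
```

The second lookup returns the same server without asking the data servers again.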
Grid security (in a nutshell!)
Important to be able to identify and authorise users
Possibly to enable/disable certain actions
Using X509 certificates
The Grid passport, delivered by a certification authority. (IGCA for India)
For using the Grid, create short-lived “proxies”
Same information as the certificate
… but only valid for the time of the action
Possibility to add “group” and “role” to a proxy
Using the VOMS extensions
Allows the same person to wear different hats (e.g. normal user or production manager)
Your certificate is your passport: you should sign whenever you use it, and never give it away!
Less danger if a proxy is stolen (short lived)
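The relation between a certificate and a short-lived proxy can be sketched as plain data. This is a conceptual model only: it performs no cryptography, the classes and `make_proxy` are hypothetical, the subject and issuer strings are placeholders, and the 12-hour default lifetime is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str
    not_after: datetime


@dataclass(frozen=True)
class Proxy:
    subject: str          # same identity information as the certificate
    issuer: str
    not_after: datetime   # but a much shorter validity
    group: str = ""       # optional VOMS-style attributes
    role: str = ""

    def is_valid(self, now):
        return now < self.not_after


def make_proxy(cert, now, lifetime=timedelta(hours=12), group="", role=""):
    """Derive a proxy that never outlives the certificate it comes from."""
    expires = min(now + lifetime, cert.not_after)
    return Proxy(cert.subject, cert.issuer, expires, group, role)


now = datetime(2012, 1, 1, 9, 0)
cert = Certificate("/DC=example/CN=Some User", "Example CA",
                   not_after=datetime(2013, 1, 1))
proxy = make_proxy(cert, now, group="alice", role="production")
```

A stolen proxy in this model stops working after a few hours, while the certificate itself stays valid for the rest of its term.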
The VOBOX
The VOBOX is a WLCG service, developed in 2006, that lets the experiments:
a) Run their own services.
b) Access the experiment software area through the file system.
The concept of the VOBOX is not the same for the 4 LHC experiments:
a) ALICE requires the STANDARD WLCG VOBOX
Storage strategy
[Figure: a worker node (WN) accesses the SE head node, an xrootd manager, which fronts xrootd workers on different back ends: disk; DPM with SRM; Castor with MSS and SRM; dCache with MSS and SRM, through an xrootd emulation worker]
DPM, CASTOR and dCache are LCG-developed SEs; xrootd is entering as a strategic solution.
Figure annotations: "Old implementation"; "Current version 2.1.8"; "Working, but severe limits with multiple clients"
What is MonALISA ?
Caltech project started in 2002
http://monalisa.caltech.edu/
Java-based set of distributed, self-describing services
Offers the infrastructure to collect any type of information
Can process it in near real time
The services can cooperate in performing the monitoring tasks
Can act as a platform for running distributed user agents
MonALISA software components and the connections between them
[Figure: layered architecture]
Data consumers: clients, HL services
Multiplexing layer: proxies; helps firewalled endpoints connect
Registration and discovery: network of JINI lookup services, secure and public
Data gathering services: MonALISA services, agents
Fully Distributed System with no Single Point of Failure
PROOF
Parallel ROOT Facility
Interactive parallel analysis on a local cluster
Parallel processing of (local) data
Fast Feedback
Output handling with direct visualization
PROOF is part of ROOT
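[Figure: a client on a local PC runs root and sends ana.C to a remote PROOF cluster; root runs on node1 to node4, each next to its data; stdout/result returns to the client]
PROOF Schema
[Figure: the Proof master hands work to the Proof slaves; each slave processes its data and returns a result to the master]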
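The master and slave pattern in the schema can be imitated with a thread pool: each worker analyses its own chunk of data and the master merges the partial results. This is an analogy in plain Python, not PROOF or ROOT code; `analyse` stands in for ana.C, and the bin width of 10 is arbitrary.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def analyse(chunk):
    """Stand-in for ana.C: histogram one chunk of data in bins of width 10."""
    return Counter(value // 10 for value in chunk)


def run_master(chunks, workers=4):
    """Send one chunk to each worker, then merge the partial histograms."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(analyse, chunks))
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)
    return merged


# One chunk per node, mirroring node1 to node4 in the figure.
chunks = [[1, 12], [25, 3], [14], [29, 8]]
histogram = run_master(chunks)

if __name__ == "__main__":
    for bin_index in sorted(histogram):
        print(f"bin {bin_index}: {histogram[bin_index]} entries")
```

Because the partial histograms are merged by simple addition, the result does not depend on which worker processed which chunk.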