
LHCb activities at PIC


Academic year: 2021


(1)

CCRC08 post-mortem

LHCb activities at PIC

G. Merino

(2)

LHCb Computing

• Main user analysis supported at CERN + 6 Tier-1s

• Tier-2s essentially Monte Carlo production facilities

(3)

CCRC08: Planned tasks

• May activities: Maintain equivalent of 1 month data taking assuming a 50% machine cycle efficiency

• Raw data distribution from pit → T0 centre

• Raw data distribution from T0 → T1 centres

– Use of FTS - T1D0

• Reconstruction of raw data at CERN & T1 centres

– RAW (T1D0) → rDST (T1D0)

• Stripping of data at CERN & T1 centres

– RAW & rDST (T1D0) → DST (T1D1)

• Distribution of DST data to all other centres

(4)

Activities across the sites

• Planned breakdown of processing activities (CPU needs) prior to CCRC08

Site          Fraction (%)
CERN          14
FZK           11
IN2P3         25
CNAF           9
NIKHEF/SARA   26
PIC            4
RAL           11

(5)

Tier 0 → Tier 1

• FTS from CERN to Tier-1 centres

– Transfer of RAW will only occur once data has migrated to tape & checksum is verified

– Rate out of CERN ~35 MB/s averaged over the period

– Peak rate far in excess of requirement

(6)
(7)

Tier 0 → Tier 1

• To first order all transfers eventually succeeded

– Plot shows efficiency on 1st attempt

[Plot annotations: issue with UK certificates; restart of IN2P3 SRM endpoint; CERN outage; CERN SRM endpoint problems]

(8)

Reconstruction

• Used SRM 2.2

– LHCb space tokens are:

• LHCb_RAW (T1D0); LHCb_RDST (T1D0)

• Data shares need to be preserved

– Important for resource planning

• Input 1 RAW file & output 1 rDST file (1.6 GB)

• Reduced number of events per reconstruction job from 50k to 25k (job ~12 hour duration on 2.8 kSI2k machine)

– In order to fit within the available queues

– Need to get queues at all sites that match our processing time
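As a rough check of the job-length figures above, the sketch below derives the per-event CPU cost implied by a 25k-event job lasting about 12 hours on a 2.8 kSI2k machine, and the duration a 50k-event job would have on the same machine. It assumes that processing time scales linearly with the number of events, which the slides do not state; the function and constant names are illustrative only.

```python
# Rough arithmetic on the job-length figures quoted in the slide.
# Assumption (not stated in the slides): processing time scales
# linearly with the number of events in a job.

EVENTS_PER_JOB = 25_000   # events per reconstruction job after the change
JOB_HOURS = 12.0          # approximate job duration quoted in the slide
MACHINE_KSI2K = 2.8       # CPU power of the reference machine


def cost_per_event(events: int, hours: float, ksi2k: float) -> float:
    """CPU cost of one event, in kSI2k-seconds."""
    return hours * 3600.0 * ksi2k / events


def job_hours(events: int, cost_ksi2k_s: float, ksi2k: float) -> float:
    """Duration in hours of a job with `events` events on a `ksi2k` machine."""
    return events * cost_ksi2k_s / ksi2k / 3600.0


cost = cost_per_event(EVENTS_PER_JOB, JOB_HOURS, MACHINE_KSI2K)
hours_50k = job_hours(50_000, cost, MACHINE_KSI2K)

if __name__ == "__main__":
    print(f"cost per event: {cost:.2f} kSI2k*s")
    print(f"50k-event job:  {hours_50k:.1f} h")
```

Under that linear assumption, halving the events per job halves the job length, which is the motivation given for fitting within the available queues.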

(9)

Reconstruction

• After data transfer file should be online, as job submitted immediately

– NOTE: in principle only LHCb has this requirement of “online reconstruction”

• Reco jobs will read the input data from the T1D0 write buffer

• Just in case… LHCb pre-stages files (srm_bringonline) & then checks on the status of the file (srm_ls) before submitting pilot job via GFAL

– Pre-stage should ensure access availability from cache

– Only issue at NL-T1 with reporting of file status
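The pre-stage, check, submit sequence described above can be sketched as follows. This is not the LHCb or GFAL code: `bring_online`, `file_status` and `submit_pilot` are hypothetical callables standing in for the srm_bringonline request, the srm_ls status query and the pilot submission, and the `"ONLINE"` status string is an assumption made for illustration.

```python
import time


def wait_until_online(surl, bring_online, file_status,
                      timeout_s=3600.0, poll_s=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Request staging of `surl`, then poll until it is reported online.

    `bring_online` and `file_status` are hypothetical stand-ins for the
    srm_bringonline and srm_ls steps; they are injected so the logic can
    be exercised without a storage element.
    """
    bring_online(surl)
    deadline = clock() + timeout_s
    while True:
        if file_status(surl) == "ONLINE":   # status string is an assumption
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def submit_when_online(surl, bring_online, file_status, submit_pilot, **kwargs):
    """Submit the pilot job only once the input file is reported online."""
    if wait_until_online(surl, bring_online, file_status, **kwargs):
        submit_pilot(surl)
        return True
    return False
```

Polling with a timeout matters here because, as noted for NL-T1, the reported file status may itself be unreliable; a job should not wait indefinitely on it.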

(10)

Reconstruction

• 41.2k reconstruction jobs submitted

• 27.6k jobs proceeded to done state

• Done/created ~67%

Site      Sub jobs       Done jobs     Done/Sub
CERN      6.1k (14%)     5.3k (13%)    86%
CNAF      3.9k (9%)      2.8k (7%)     72%
GridKa    4.1k (11%)     3.1k (7%)     76%
IN2P3     10.3k (25%)    6.1k (14%)    56%
NIKHEF    10.3k (26%)    2.3k (6%)     23%
PIC       1.8k (4%)      1.6k (4%)     89% ☺
RAL       4.7k (11%)     3.5k (8%)     74%
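The Done/Sub column can be recomputed from the per-site job counts quoted above. The sketch below does so as a consistency check; because the counts on the slide are rounded to 0.1k, the recomputed percentages agree with the quoted ones only to within a few points. The dictionary layout and function names are illustrative.

```python
# Site: (submitted, done) in thousands of jobs, followed by the Done/Sub
# percentage quoted on the slide. Counts are rounded to 0.1k on the slide.
RECO_JOBS = {
    "CERN":   (6.1, 5.3, 86),
    "CNAF":   (3.9, 2.8, 72),
    "GridKa": (4.1, 3.1, 76),
    "IN2P3":  (10.3, 6.1, 56),
    "NIKHEF": (10.3, 2.3, 23),
    "PIC":    (1.8, 1.6, 89),
    "RAL":    (4.7, 3.5, 74),
}


def done_over_sub(submitted: float, done: float) -> float:
    """Percentage of submitted jobs that reached the done state."""
    return 100.0 * done / submitted


def recomputed_ratios(table):
    """Recompute Done/Sub for every site from the rounded job counts."""
    return {site: done_over_sub(sub, done)
            for site, (sub, done, _quoted) in table.items()}


total_submitted = sum(sub for sub, _done, _quoted in RECO_JOBS.values())

if __name__ == "__main__":
    for site, pct in recomputed_ratios(RECO_JOBS).items():
        print(f"{site:7s} {pct:5.1f}%  (slide: {RECO_JOBS[site][2]}%)")
    print(f"total submitted: {total_submitted:.1f}k")
```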

(11)

Reconstruction

• 27.6k reconstruction jobs in done state

– 21.2k jobs processed 25k events

– Done/25k events ~77%

• 3.0k jobs failed to upload rDST to local SE

– Only 1 attempt before trying Failover

– Failover/25k events ~13%

Site      25k events     Fail upload    Success/Created
CERN      5.2k (100%)    0.7k (14%)     76%
CNAF      2.6k (95%)     0.0k (1%)      67%
GridKa    3.0k (99%)     0.7k (22%)     58%
IN2P3     5.1k (90%)     0.7k (14%)     43%
NIKHEF    1.2k (53%)     0.9k (70%)     4%
PIC       1.6k (99%)     0.0k (0%)      89% ☺
RAL       3.1k (89%)     0.0k (1%)      68%

(12)
(13)

Human error at PIC:

A WN with a misconfigured network acted as a black hole from 24 to 27 May (ticket-4386)

(14)

Reconstruction

CPU efficiency: ratio of CPU time to wall-clock time on running jobs

CNAF: more jobs than cores on a WN …

IN2P3 & RAL: problems reading input data

(15)

Reconstruction

CPU efficiency: ratio of CPU time to wall-clock time on running jobs

PIC: the most CPU-efficient T1
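A minimal sketch of the efficiency figure discussed on these slides, taking CPU efficiency in its usual sense of CPU time divided by wall-clock time. The job timings in the example are made up for illustration and are not CCRC08 measurements.

```python
def cpu_efficiency(cpu_s: float, wall_s: float) -> float:
    """CPU efficiency of one job: CPU time divided by wall-clock time."""
    if wall_s <= 0:
        raise ValueError("wall-clock time must be positive")
    return cpu_s / wall_s


def site_efficiency(jobs) -> float:
    """Aggregate efficiency of a site: total CPU time over total wall time.

    `jobs` is an iterable of (cpu_seconds, wall_seconds) pairs.
    """
    jobs = list(jobs)
    total_cpu = sum(cpu for cpu, _wall in jobs)
    total_wall = sum(wall for _cpu, wall in jobs)
    return cpu_efficiency(total_cpu, total_wall)


# Hypothetical timings: two jobs sharing one core each get about half
# of it, so their wall-clock time doubles and the efficiency halves.
dedicated_core = [(3600.0, 3600.0), (3600.0, 3600.0)]
shared_core = [(3600.0, 7200.0), (3600.0, 7200.0)]

if __name__ == "__main__":
    print(site_efficiency(dedicated_core))
    print(site_efficiency(shared_core))
```

The second case mirrors the CNAF observation: scheduling more jobs than cores on a worker node lowers the efficiency without any fault in the jobs themselves, while slow input reads (the IN2P3 and RAL case) lower it by adding idle wall-clock time.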

(16)

dCache Observations

• Official LCG recommendation - 1.8.0-15p3

• LHCb ran smoothly at half of T1 dCache sites

– PIC OK - version 1.8.0-12p6 (dcap)

– GridKa OK - version 1.8.0-15p2 (dcap)

– IN2P3 - problematic - version 1.8.0-12p6 (gsidcap)

• Seg faults - needed to ship version of GFAL to run

• Could explain CGSI-gSOAP problem?

– NL-T1 - problematic (gsidcap)

• Many versions during CCRC to solve number of issues

• 1.8.0-14 -> 1.8.0-15p3 -> 1.8.0-15p4

(17)

Databases

• Conditions DB used at CERN & Tier-1 centres

– No replication tests of conditions DB Pit ↔Tier-0 (and beyond)

– Switched to using Conditions DB 15th May for reconstruction

• LFC

– Use “streaming” to populate the read-only instance at T1 from CERN

– Problem with CERN instance revealed local instances not being used by LHCb!

T ti d

(18)

Stripping

• Stripping on rDST files

• 1 rDST file & associated RAW file

• Space tokens: LHCb_RAW & LHCb_rDST

• DST files & ETC produced during the process stored locally on T1D1 (add storage class)

• Space tokens: LHCb_M-DST

• DST & ETC file then distributed to all other computing centres on T0D1 (except CERN T1D1)

(19)

Stripping

• 31.8k stripping jobs were submitted

• 9.3k jobs ran to “Done”

• Major issues with LHCb book-keeping

Site                          Subm     Done
CERN                          2.4k     2.3k
CNAF                          2.3k     2.0k
GridKa                        2.0k     2.0k
IN2P3                         4.5k     0.2k
NIKHEF                        0.3k     <0.1k
PIC                           1.1k     1.1k
RAL                           2.2k     1.6k
Failed to resolve datasets    17.0k
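The stripping totals can be cross-checked against the per-site figures. The sketch below assumes that the 17.0k "failed to resolve datasets" figure counts submitted jobs, which is consistent with the 31.8k total, and takes the NIKHEF "<0.1k" done count at its upper bound of 0.1k. Variable names are illustrative.

```python
# Site: (submitted, done) in thousands of stripping jobs, as quoted on
# the slide. NIKHEF's "<0.1k" done count is taken as its upper bound, 0.1.
STRIPPING_JOBS = {
    "CERN":   (2.4, 2.3),
    "CNAF":   (2.3, 2.0),
    "GridKa": (2.0, 2.0),
    "IN2P3":  (4.5, 0.2),
    "NIKHEF": (0.3, 0.1),
    "PIC":    (1.1, 1.1),
    "RAL":    (2.2, 1.6),
}
# Assumption: jobs that failed to resolve their datasets are counted
# among the submitted jobs but are not attributed to any site.
UNRESOLVED = 17.0

resolved_submitted = sum(sub for sub, _done in STRIPPING_JOBS.values())
total_submitted = resolved_submitted + UNRESOLVED
total_done = sum(done for _sub, done in STRIPPING_JOBS.values())

# Share of all submitted jobs lost to dataset resolution, and success
# rate among the jobs that did reach a site.
unresolved_share = UNRESOLVED / total_submitted
done_share_at_sites = total_done / resolved_submitted

if __name__ == "__main__":
    print(f"total submitted:      {total_submitted:.1f}k")
    print(f"total done:           {total_done:.1f}k")
    print(f"unresolved share:     {unresolved_share:.0%}")
    print(f"done share at sites:  {done_share_at_sites:.0%}")
```

Under these assumptions, more than half of the submitted stripping jobs never reached a site, which matches the slide's emphasis on book-keeping as the major issue.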

(20)

Stripping: T1-T1 transfers

[Plots of T1-T1 transfers, panels: CNAF, PIC, GridKa, RAL. Annotations: initial problems uploading to M-DST token at PIC; catch-up OK once solved]

(21)

Conclusions

• Despite being the smallest LHCb Tier-1, PIC delivered the highest quality of service in CCRC08

• The following Tier-1 processes were tested

– Reception of data from CERN

– Reconstruction

– Stripping and shipping of DSTs to other Tier-1s

• The results at PIC were positive

– Reception of data from CERN (~5 MB/s)

– Reading of data from WNs (dcap) - OK

– Demonstrated replication of DSTs to other Tier-1s at a higher rate than required (catch-up)

• The exercise was also useful for LHCb to detect the weak points of its Grid infrastructure, DIRAC
