• No results found

HEP computing and Grid computing & Big Data

N/A
N/A
Protected

Academic year: 2021

Share "HEP computing and Grid computing & Big Data"

Copied!
43
0
0

Loading.... (view fulltext now)

Full text

(1)

CERN IT Department CH-1211 Genève 23 Switzerland

www.cern.ch/it

HEP computing and

Grid computing

&

Big Data

Massimo Lamanna/CERN

IT department - Data Storage Services group

May 11

th

2014

(2)

CERN IT Department CH-1211 Genève 23 Switzerland

www.cern.ch/it

My objectives for today!

Why High-Energy Physics (HEP) needs computing?

Computing: CPU cycles and Data Storage

How can we deliver enough computing to HEP?

Lots of CPU, disks and tapes!

Any relevance beyond HEP?

Who is CERN IT? Staff, fellows and students!

And of coursetrying to answer all your questions!

(3)

What are all these computers/disks/networks... for?

3

Theory and

(Detector) simulation

Data analysis and

Results interpretation

Data acquisition and

(4)

From Physics to Raw Data

4

Basic physics Fragmentation,

Decay Interaction with detector material Multiple scattering, interactions Detector response Noise, pile-up, cross-talk, inefficiency, ambiguity, resolution, response function, alignment, temperature 2037 2446 1733 1699 4003 3611 952 1328 2132 1870 2093 3271 4732 1102 2491 3216 2421 1211 2319 2133 3451 1942 1121 3429 3742 1288 2343 7142 Raw data (Bytes) Read-out addresses, ADC, TDC values, Bit patterns

e+,q

e-,q

f

f

Z0

_

_

(5)

From Raw Data to Physics

5

e+,q

e-,q

f

f

Z0

Basic physics Results Fragmentation, Decay Physics analysis Interaction with detector material Pattern, recognition, Particle identification Detector response apply calibration, alignment, 2037 2446 1733 1699 4003 3611 952 1328 2132 1870 2093 3271 4732 1102 2491 3216 2421 1211 2319 2133 3451 1942 1121 3429 3742 1288 2343 7142 Raw data Convert to physics quantities

Reconstruction

Simulation (Monte-Carlo)

Analysis

_

_

(6)

(Innovation in) Computing in High-Energy Physics

Demanding science

Demanding computing

Power usage

Innovation

Web invention

Grid computing (LHC Computing Grid)

(7)

7

Invented by/in/for the HEP community

Now at the hearth of our lives

(8)

WLCG (Worldwide LHC Computing Grid)

• The Worldwide LHC Computing Grid (WLCG) collaboration was set up in 2002 for the Large Hadron Collider (LHC) physics programme at CERN. It links up national and international grid infrastructures.

• WLCG mission is to provide global computing resources to store, distribute and analyse the ~25 Petabytes (25 million Gigabytes) of data annually generated by the Large Hadron Collider (LHC) at CERN

• Challenging points:

– Large data volumes

– Large computing needs (simulation and data reconstruction and analysis)

– Very large community (including large number of computer centres and other institutes not necessary part of a LHC experiment)

– Aiming to total:CERN resource ratio of the order of 5:1 or more (20%)

• Historical notes:

– Computing activities in the late 1990 (e.g. Monarc project: http://monarc.web.cern.ch/MONARC/ )

– “Grid” concepts (I. Foster and K. Kesselman) matched with our approach

– Distributed approach “matches” our distributed user communities

– Participation to multiscience Grid projects (notably EGEE 2004-2010)

• WLCG now operates a distributed infrastructure with 200+ major cooperating computer centres worldwide and provides the foundation for all data processing activities. WLCG delivers access to LHC data and computing power for extensive data analysis to virtually every LHC physicists.

• Several tens of PB per year of data files are collected and distributed: experimental data (collected at rates up to several GB/s) supplemented by derived data (processing and filtering) and simulation data (generated on the same infrastructure). Depending on data type and usage, files are replicated on disk farms and tape repositories to ensure

(9)

Typical data rates 100-1000 MB/s aver multiple 10 Gb fibres.

Large (and final) filtering takes place at the experiment (trigger filter farms). All data are then kept (RAW data)

Bd 513 9 AMS ALICE ATLAS LHCb CMS COMPASS

(10)

10

FR-CCIN2P3

DE-KIT

NL-T1

ES-PIC

IT-INFN-CNAF

NDGF

US-T1-BNL

CA-TRIUMF

TW-ASGC

UK-T1-RAL

12 Tier1

SK-KISTI

US-FNAL

(11)

Intercontinental links

(

data and jobs

)

AP area: Taipei (Tier1), Tokyo, Beijing, Seul, Melbourne, Mumbai, …

North America: BNL and Fermilab (US Tier1), Victoria (Canada Tier1) and

many Tier2 like Stanford, MIT, Wisconsin, Argonne, …

South America: several Tier2s Africa: few sites as the South

Africa Tier2s

(12)

Data distribution (e.g. CERN distributes RAW data to Tier1s)

12 centres on-line

Tier1s being built in Moscow area (@Dubna and @Kurchatov Institute)

(Large) processing campaigns on preplaced data (e.g. Reprocessing)

Download small sample for local analysis? … but

Scale out with grid jobs (user executable dispatched where data are)

Using file parallelism (data set

list of files

list of independent jobs)

Federating storages

(13)

Grid at work (computing power)

CPU contributions across the worldwide

LHC Computing Grid

See next slides: ~80,000 cores

(14)

CMS processing: wall

clock consumption

Tier0 and Tier1 processing

Top/Bottom: particle

collisions off/o

Sizeable even if no data

taking: continuous

reprocessing

Reconstruction activities

RAW

reconstructed

objects

Organised processing

Output for physicists

analysis

They can access RAW data if

needed

Final analysis more efficient

on the files containing the

“reconstructed objects”

The grid never sleeps…

(15)

Data on the Grid (CMS experiment)

Much more data than

RAW!

Primary data from

collisions

+ their replicas

Different levels of

reconstructed and

filtered data sets

+ their replicas

70 PB

(16)

LHC run 1 (2012) data (transfers)

Data taking is part of the full

picture

Primary data

Replication, multi-step

analysis and simulation

“generates” more data

And network traffic (data

placement)

(17)

Data handling is an operational

challenge

Source destination correlation matrix

(18)

Analysis flow (user view)

18

Raw data

(Exp. Data)

Production System Production System User analysis

Simulation

Data selection (quality, type, config…)

But how this is done

in practice

? Of course we need CPUs, disks, networks etc..

The problem is orchestrating hardware resources, software and humans :)

“All” data are stored in files (aggregated as “datasets” = collections of files). Only a small

fraction of data in real DBs (e.g calibrations). This is one characteristics of HEP computing.

(19)

19

Central services

Regional services

Tier2/3 (data on disk + CPU)

Execution site

2011 data; quality; run conditions… to be analyse it with myprog.exe [f1,f2,…fn]

Federation/location: fjis in [site1,site2,…sitem] Executed myprog.exe(/storage/data/2011/run4/fj)

In general is from [site1,site2,…sitem] but not always(flexibility, robustness, efficiency)

Toy model of user analysis:

(20)

20

CMS user analysis

as May the 11th

(21)

Close collaboration with CERN IT

COMPASS was the first/among the first to:

Migration to Linux PC farms

CHEP2000

CCF at CERN

ACID in Trieste

Use of Objectivity/DB at the 10

2

TB scale

First production user of CASTOR

Deliver and operate CORAL (C++ framework)

for data processing

And the algorithmic parts, notably the

RICH reconstruction

CCF at CERN

ACID in Trieste (INFN)

Still in use

Still in use

Eventually decommissioned

(22)

Design

Value (35 MB/s)

5thof May 2014 CASTOR statistics (tape files only)

Total size: 87.337 PB

Total number of files: 262.601 M Top-6 size (by group)

30.436 PB with group ATLAS (id 1307) 17.266 PB with group CMS (id 1399) 13.888 PB with group COMPASS (id 1665)

9.616 PB with group ALICE (id 1395) 6.585 PB with group LHCb (id 1470) 1.839 PB with group zf (id 1018)

(23)

23

(24)

CERN Computer Centre

CERN computer centre:

• Built in the 70s on the CERN site (Meyrin-Geneva) • ~3000 m2(3 main machine rooms)

• 3.5 MW for equipment • Est. PUE ~ 1.6

New extension:

• Located at Wigner (Budapest) • ~1000 m2

• 2.7 MW for equipment

• Connected to the Geneva CC with 2x100Gb links (21 and 24 ms RTT)

Another room ~20 ms away…

(25)

CASTOR and EOS

CASTOR and EOS are using the same commodity HW

RAID-1 for CASTOR

2 copies in the mirror (same box, multiple copies possible)

JBOD with RAIN for EOS

enforce replicas to be on different disk servers

tunable number (usually 2 replicas)

Erasure coding

“Arbitrary” configurable durability

Disaster recovery: cross-site replication

CASTOR and EOS:

developed at CERN

(IT department)

Workhorses for

physics data

(archival,

reconstruction,

analysis and

distribution)

25

(26)

26

Data durability

CERN

Tier1s

Simplified schema: 1 copy + 1 “distributed” copy (2 tape copies)

Scalability + data durability (disaster recovery)

(27)

27

Data durability (2)

MGM

NS + Stager

D

t ~ several 10

3

s

D

t ~ several 10

3

s

Single disk failure

: probability p;

healing time:

D

t

Double failure

: p

2

effect; overlapping

failure during healing time

 D

t;

(disk

x

,disk

y

)

Single disk failure

: probability p;

Double failure

: p

2

effect; overlapping

failure during healing time

 D

t;

(disk

x

,disk

y

)

(28)

Data & Storage Services

Total installed capacity: 25PB ~900 disk servers

Total installed disk capacity: 13.0PB ~600 disk servers

Mainly used as disk front-end

(includes LHC data repository)

Disk-only data (analysis)

CERN data management (physics data)

Unique function across LCG:

• Primary repository for LHC data (raw)

• Source of prompt reconstruction

• Distribution point for LHC data (raw)

28

“Low” numbers (maintenance due to LHC technical stop)

(29)

Availability and I/O performances

Data taken driven

Analysis driven

pp 2012

pA 2013

29

HW types

Different generations of PC boxes with 20-200 TB disks space

(30)

Multiple copy

• Geolocalisation

• 1 in Meyrin (CERN Geneva), 1 in

Budapest (Wigner)

• Whenever possible

Multiple MGM

deployment

• Complex deployment to guarantee low

latency

• Xroot protocol: redirect clients

• Single master MGM

• Most operations can be done on R/O

MGM

• Transparent redirection

(31)

CERN Disk/Tape Storage Management @ storage-day.ch

CASTOR (tape archive)

31

Data:

~90 PB of data on tape; 250 M files

Up to 4.5 PB new data per month

Over 10GB/s (R+W) peaks

Infrastructure:

~ 52K tapes (1TB, 4TB, 5TB, 8TB)

9 Robotic libraries (IBM and Oracle)

(32)

Other main services:

TSM

Internal/beta services:

Hadoop (*)

CERNBOX (cloud sync)

(*) Initiated by a Technical Student (S. Russo) from TS+UD uni (ATLAS experiment)

(33)

CERNBOX to enter beta soon

CERNBOX provides a cloud synchronisation service for all CERN users between personal devices (laptops, smartphone, tablets) and a centrally-managed data storage.

Coherent data handling (with our other services): SLA, ACL, curation... CERNBOX is built on top of the ownCloud software. CERNBOX

supports access via web browsers, desktop clients (Linux, Mac and Windows) and mobile-device applications (Android and iOS).

http://owncloud.com

(34)

Ceph Architecture and Use-Cases

• Open-software storage

solution

• Scalable from 10s to

10,000s of machines,

from Terabytes to

Exabytes (10

3

PB)

• Fault tolerant with no

SPOF, uses commodity

hardware

• Self-managing,

self-healing

(35)

Deploying a 3-PB prototype

• Fully puppetized deployment

• Automated machine

commissioning and maintenance

• Add a server to the hostgroup (osd, mon, radosgw)

• OSD disks are detected, formatted, prepared, auth’d

• Also after disk replacement

• Auto-generated ceph.conf

• Last step is manual/controlled: service ceph start

• Mcollective for bulk operations on

the servers

• Ceph rpm upgrades • daemon restarts

# df -h /mnt/ceph

Filesystem Size Used Avail Use% Mounted on

(36)

Beyond HEP

Ground floor exposition

Sharing experience with other activities (e.g. Biology)

Planning strategic partnership with other sciences (HelixNebula)

Collaboration with technology providers (OpenLab)

(37)
(38)

The 3 Vs in HEP

Volume

• For sure at somewhere between 10 and 100 PB you are getting big

Variety

• # of different technical solutions, operational procedures, users and workflows?

• # of different types (>3? >30?)

Velocity

• Dvolume/Dt in 1 PB/month (disk installation) • Last 2 years of EOS@CERN

• Dvolume/Dt in 4 PB/month (recording, processing and redistributing)

• LHC running (end of pp Run1) But do not forget:

• Max time you can be down! • Max integrated time in a month! • # of human interventions per day!

...and we want to read high volume of data at high velocity in a variety of ways 

User program

Output data

∀ event in 2012 CMS AOD …

(39)

Our “Big Data”

Physics Data on CASTOR/EOS

LHC experiments produce ~10GB/s 25PB/year

User Data on AFS & DFS

Home directories for 30k users

Physics analysis development

Project spaces for applications

Service Data on AFS/NFS/Filers

Databases, admin applications

Tape archival with CASTOR/TSM

Physics data (RAW and derived/additional)

Desktop/Server backups

Data growth

2 PB/month over the last 2 years (raw disk

space)

4 PB/month new data at the end of LHC phase 1

● Factor of 2 more at the beginning of phase 2?

Service Size Files

AFS 290TB 2.3B

CASTOR 89.0PB 325M

(40)

Staff, fellows and students!

Xav ie r Co rt ad a (w ith th e p arti ci pati o n o f ph ysic ist P et e Ma rko w itz ), "I n s earc h o f th e H igg s bo so n: H -> ZZ" , d ig ital art , 2013. 40

Staff, fellow and students

cern.ch/jobs

– Many opportunities for students!

• Physicists & Engineers (incl. Computer science) – Challenging but exciting!

• Summer Student (Jan/Feb 8-12 weeks in summer)

• OpenLab Summer Students (Jan/Feb 8-12 weeks in summer)

• Technical Students (2 times per year 6-12 months)

(41)
(42)

But finally why so much data?

Ldt

Luminosity

It is a measurement of the potential statistics accumulated in the collisions over time (here it accounts for the 2011 and 2012 data taking)

Think to the exposure time of a camera when taking pictures Events with 2 Zs decaying in 2-lepton pairs (here e+e-and m+m- )

(43)

But finally why so much data?

Ldt

Luminosity

It is a measurement of the potential statistics accumulated in the collisions over time (here it accounts for the 2011 and 2012 data taking)

Think to the exposure time of a camera when taking pictures Events with 2 Zs decaying in 2-lepton pairs (here e+e-and m+m- )

References

Related documents

These included: (1) a branch came off proximal to the ischial spine and coursed through the sacrotuberous ligament (these branches are often entrapped where they penetrate the

All Polling Places subject to change for reasons beyond control of the Election Board... All Polling Places subject to change for reasons beyond control of the

PPE placed on the market must satisfy the basic health and safety requirements laid down in the Regulation on PPE and must not endanger the safety of users, other individuals,

Finally, it is shown that the importance of leadership and its influence on organisational learning is due to the leader having to develop management policies of human resources

ILO core labour standards (but does not make reference to discrimination based on gender) and includes a reference to acceptable conditions of work with respect to

The presented statistics illustrate that reference portfolios of average quality (a 7.63% default probability over 10 years for all loans, conforming to a Baa rating, according

„ Hopefully, in the future SQL Developer will Hopefully, in the future SQL Developer will be available on the actual Oracle CD?. be available on the actual