Data analysis in Particle Physics

Academic year: 2021

(1)

Data analysis in Particle Physics

(2)

$ whoami

Lukasz (Luke) Kreczko – Particle Physicist

Graduated in Physics from the University of Hamburg in 2009

2009–2013: PhD in Particle Physics at the University of Bristol

Currently Computing Research Assistant at the

(3)

Outline

Data taking at the Compact Muon Solenoid (CMS) experiment

Data format (and distribution)

Data analysis procedure

(6)

What is CERN?

Conseil Européen pour la Recherche Nucléaire, aka European Laboratory for Particle Physics

Between Geneva and the Jura mountains, straddling the Swiss-French border

Founded in 1954 by an international treaty

Our business is fundamental particles and how our universe works

What is the origin of mass? We are a step closer with the Higgs!

What is 96% of the universe made of? We only see 4%!

Why isn't there antimatter in the universe?

(7)
(8)

Large Hadron Collider

Mankind's biggest machine (27 km circumference)

Hotter than the centre of the sun: collisions are 100,000 times hotter

Colder than deep space: (super) liquid helium cooling at 1.9 K (-271 °C)

(9)
(10)
(11)

The experiment: a big digital camera

40 million "pictures" per second

Each "picture" around 1 MB!

(12)
(13)

The data: a structured mess

(15)

What do we do?

Experiment

Local computing farm

CERN data centre

Globally distributed data centres

My computer

Paper

(16)

The experiment - CMS

Input from LHC

40 million collisions per second

40 terabytes per second

Hardware trigger (L1)

Low resolution

Makes decision in 3 microseconds

Reduces output to 100

(17)

High Level Trigger

Input from experiment

100,000 collisions per second

Software trigger (HLT)

"Poor man's" reconstruction

High resolution

Writes around 700 Hz (700 MB/s) in ROOT data format
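The rates quoted for the two trigger stages can be checked with a little arithmetic. The sketch below is only an illustration: it takes 40 million collisions per second from the LHC, the 100,000 per second quoted as input to the High Level Trigger, the 700 Hz written out, and the roughly 1 MB per "picture" from the camera slide. The function and constant names are made up for this example.

```python
# Rates quoted on the slides, in events per second.
COLLISION_RATE_HZ = 40_000_000   # input from the LHC
HLT_INPUT_RATE_HZ = 100_000      # what reaches the High Level Trigger
HLT_OUTPUT_RATE_HZ = 700         # what is written out in ROOT format
EVENT_SIZE_MB = 1.0              # "each picture around 1 MB" (approximate)


def rejection_factor(rate_in_hz, rate_out_hz):
    """How many events enter a trigger stage for each one that leaves it."""
    return rate_in_hz / rate_out_hz


def bandwidth_mb_per_s(rate_hz, event_size_mb=EVENT_SIZE_MB):
    """Data rate in MB/s for a given event rate and event size."""
    return rate_hz * event_size_mb


l1_factor = rejection_factor(COLLISION_RATE_HZ, HLT_INPUT_RATE_HZ)
hlt_factor = rejection_factor(HLT_INPUT_RATE_HZ, HLT_OUTPUT_RATE_HZ)
total_factor = rejection_factor(COLLISION_RATE_HZ, HLT_OUTPUT_RATE_HZ)

# 40 million events/s at 1 MB each is 40 TB/s (decimal units).
raw_tb_per_s = bandwidth_mb_per_s(COLLISION_RATE_HZ) / 1e6
stored_mb_per_s = bandwidth_mb_per_s(HLT_OUTPUT_RATE_HZ)

print(f"L1 keeps 1 in {l1_factor:.0f}, HLT keeps 1 in {hlt_factor:.0f}")
print(f"Overall 1 in {total_factor:.0f}; {raw_tb_per_s:.0f} TB/s in, "
      f"{stored_mb_per_s:.0f} MB/s stored")
```

With these inputs the hardware trigger keeps about one collision in 400, and the two stages together keep roughly one in 57,000.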

(18)

(Event) Reconstruction

http://en.wikipedia.org/wiki/Event_reconstruction

Reading the detector information and bundling it into particles

Detector response from different detector regions helps to identify particles

In addition, algorithms look for specific particle behaviour (e.g. a b-quark travels half a millimetre before decaying) and use it to identify particles

(19)

ROOT

ROOT (http://root.cern.ch, http://root.cern.ch/git/root.git)

Developed in 1995

ROOT is a lot of things: http://root.cern.ch/drupal/content/about

Most used features (subjective):

(20)

ROOT

Also has a C/C++ interpreter (CINT)

Blessing and curse: ask any student which one is more accurate

177 PB of LHC data stored in ROOT format

"ROOT – The Next Generation": https://indico.cern.ch/conferenceTimeTable.py?

(21)

ROOT data format

http://root.cern.ch/drupal/content/root-files-1

Binary storage for C++ objects

Serialisation via TObject class

Supports partial reads (i.e. a subset of objects)

Objects grouped by event (e.g. file.GetEvent(10).electron.at(0).energy())

Supports read-ahead (tuneable parameter for
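The two access patterns named on this slide, objects grouped by event and partial reads of a subset of objects, can be mimicked with a toy in-memory structure. This is not the ROOT API: the class `ToyEventFile`, its methods and the branch names are all hypothetical, and the data are invented for illustration.

```python
class ToyEventFile:
    """Toy stand-in for an event file: one list ("branch") per quantity,
    with one entry per event. Purely illustrative, not ROOT."""

    def __init__(self, branches):
        lengths = {len(column) for column in branches.values()}
        if len(lengths) > 1:
            raise ValueError("all branches must hold the same number of events")
        self._branches = branches

    def get_event(self, index):
        """Event-wise access: gather entry `index` from every branch."""
        return {name: column[index] for name, column in self._branches.items()}

    def read_branch(self, name):
        """Partial read: touch only the requested branch."""
        return self._branches[name]


# Invented data: 20 events, each with one electron and no muons.
toy_file = ToyEventFile({
    "electron_energy_gev": [[20.0 + i] for i in range(20)],
    "muon_pt_gev": [[] for _ in range(20)],
})

# Analogue of file.GetEvent(10).electron.at(0).energy()
first_electron_energy = toy_file.get_event(10)["electron_energy_gev"][0]

# Analogue of a partial read: only the electron branch is visited.
electrons_per_event = [len(e) for e in toy_file.read_branch("electron_energy_gev")]
```

Keeping each quantity in its own column is what makes a partial read cheap in this toy: reading the electrons never touches the muon data.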

(22)

CERN T0 – data reconstruction

Input:

300-350 collisions per second

Rest is done when machine is shut down

Reconstruction

Connecting the dots

(23)

Analysing all data

CMS records 10,000 terabytes of data every year (around 70 years of full HD movies)

+ same amount of simulation

To analyse this on a single computer would
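A quick back-of-the-envelope check of these numbers, as a sketch. The 10,000 TB per year and the 70 years of video come from the slide; the single-machine read rate of 100 MB/s is purely an assumption for illustration and is not a figure from the talk.

```python
BYTES_PER_TB = 10**12                      # decimal terabytes
SECONDS_PER_YEAR = 365.25 * 24 * 3600

recorded_tb_per_year = 10_000              # from the slide
movie_years = 70                           # from the slide

# Video bitrate implied by "10,000 TB is about 70 years of full HD movies".
recorded_bytes = recorded_tb_per_year * BYTES_PER_TB
implied_bitrate_mbit_s = recorded_bytes * 8 / (movie_years * SECONDS_PER_YEAR) / 1e6

# Data plus the same amount of simulation, in petabytes.
total_pb = 2 * recorded_tb_per_year / 1000

# ASSUMPTION: one machine streaming at 100 MB/s, reading only, no processing.
ASSUMED_READ_RATE_MB_S = 100.0
single_machine_years = (total_pb * 1e15) / (ASSUMED_READ_RATE_MB_S * 1e6) / SECONDS_PER_YEAR

print(f"Implied video bitrate: {implied_bitrate_mbit_s:.1f} Mbit/s")
print(f"Data + simulation: {total_pb:.0f} PB per year")
print(f"One pass at the assumed rate: {single_machine_years:.1f} years")
```

The implied bitrate comes out at roughly 36 Mbit/s, and data plus simulation add up to the 20 PB quoted as the analysis input on the summary slide.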

(25)

The LHC grid

Distributing on a global scale

This is where the analysis

(26)

The data: a much nicer picture

[Event display] Run: 163583, Event: 26579562

Muon: pT = 71.5 GeV/c, η = -0.82

Missing ET: 22.3 GeV

Jet: pT = 89.0 GeV/c, η = 2.14

Jet: pT = 85.3 GeV/c, η = 2.02

Jet: pT = 90.5 GeV/c, η = -1.40

Jet: pT = 84.1 GeV/c, η = -2.24

m(F) = 1.2 TeV/c²

(27)

The goal: extend our knowledge

[Event display: Run 163583, Event 26579562, with one muon, missing ET and four jets]

Billions of [event display image] + simulation

[Figure: CMS diphoton invariant mass spectrum, m_γγ (GeV) from 110 to 150; S/(S+B) weighted events / 1.5 GeV; data, S+B fit, B fit component with ±1σ and ±2σ bands; √s = 7 TeV, L = 5.1 fb⁻¹ and √s = 8 TeV, L = 5.3 fb⁻¹; inset: unweighted events / 1.5 GeV]

(29)

Analysis

Data preparation

Data reduction

Event selection

Histogramming

Corrections: applying the newest knowledge about the experiment

(30)

Analysis  

Filtering: we know more or less what we are looking for

(31)

Analysis  

Selection: very refined selection to increase signal purity (usually a tiny effect compared to backgrounds)

[Event display]

Muon: pT = 71.5 GeV/c

Missing ET: 22.3 GeV

Jet: pT = 89.0 GeV/c, η = 2.14

Jet: pT = 85.3 GeV/c, η = 2.02

Jet: pT = 90.5 GeV/c, η = -1.40

Jet: pT = 84.1 GeV/c, η = -2.24
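A selection of this kind boils down to a set of cuts applied to each event. The following is a minimal sketch using the muon and jets of the event shown on this slide. The thresholds (one muon above 30 GeV/c, at least four jets above 30 GeV/c with |η| < 2.5) are hypothetical values chosen for illustration, not the cuts of any actual analysis, and the second "soft" event is invented.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    pt_gev: float      # transverse momentum in GeV/c
    eta: float = 0.0   # pseudorapidity (0.0 where the slide gives none)


# HYPOTHETICAL thresholds, for illustration only.
MIN_MUON_PT_GEV = 30.0
MIN_JET_PT_GEV = 30.0
MAX_JET_ABS_ETA = 2.5
MIN_GOOD_JETS = 4


def passes_selection(muons, jets):
    """Require exactly one good muon and at least MIN_GOOD_JETS good jets."""
    good_muons = [m for m in muons if m.pt_gev > MIN_MUON_PT_GEV]
    good_jets = [
        j for j in jets
        if j.pt_gev > MIN_JET_PT_GEV and abs(j.eta) < MAX_JET_ABS_ETA
    ]
    return len(good_muons) == 1 and len(good_jets) >= MIN_GOOD_JETS


# The event from the slide.
slide_muons = [Particle(71.5)]
slide_jets = [
    Particle(89.0, 2.14),
    Particle(85.3, 2.02),
    Particle(90.5, -1.40),
    Particle(84.1, -2.24),
]

# An invented event with only two soft jets, which should be rejected.
soft_jets = [Particle(25.0, 0.3), Particle(22.0, -1.1)]

slide_event_passes = passes_selection(slide_muons, slide_jets)
soft_event_passes = passes_selection(slide_muons, soft_jets)
```

Under these assumed cuts the slide's event is kept and the invented soft event is dropped.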

(32)

Analysis  

Analysis: apply algorithms (produce derived data)

Histograms: data reduction

[Figure: CMS diphoton invariant mass spectrum, m_γγ (GeV); S/(S+B) weighted events / 1.5 GeV; data, S+B fit and B fit component; √s = 7 TeV, L = 5.1 fb⁻¹ and √s = 8 TeV, L = 5.3 fb⁻¹; inset: unweighted events / 1.5 GeV]
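Histogramming is where the per-event values collapse into a handful of bin counts. Here is a minimal sketch in plain Python, using the 1.5 GeV bin width from the plot. The mass values, the lower edge of 110 GeV and the 27 bins are assumptions made for this example.

```python
def fill_histogram(values, low, bin_width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins
    starting at `low`. Values outside the range are ignored."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - low) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Invented diphoton masses in GeV; 200.0 lies outside the histogram range.
masses_gev = [125.0, 125.3, 126.4, 111.0, 200.0]

counts = fill_histogram(masses_gev, low=110.0, bin_width=1.5, n_bins=27)

# Bin 10 covers 125.0 to 126.5 GeV and collects three of the entries.
print(counts[10], sum(counts))
```

However many events go in, the output stays at 27 numbers, which is the sense in which a histogram is a data reduction.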

(33)

Analysis  

Rinse & repeat

(35)

Analysis in Big data terms

Data preparation: MAP

Data reduction: MAP

Event selection: REDUCE

Histogramming: REDUCE

LHC Grid

Usually local site
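The map/reduce analogy can be sketched in a few lines. The example below only illustrates the pattern: a per-event correction plays the MAP role and folding selected events into a histogram plays the REDUCE role. The calibration factor, the mass window and the event values are hypothetical, and the real workflows run on the grid rather than in a single Python process.

```python
from collections import Counter
from functools import reduce

# HYPOTHETICAL numbers, for illustration only.
CALIBRATION_SCALE = 1.01      # stand-in for "the newest knowledge"
WINDOW_LOW_GEV = 110.0
WINDOW_HIGH_GEV = 150.0
BIN_WIDTH_GEV = 1.5


def apply_correction(event):
    """MAP step: transform one event independently of all others."""
    return {**event, "mass_gev": event["mass_gev"] * CALIBRATION_SCALE}


def fold_into_histogram(histogram, event):
    """REDUCE step: select the event and add it to the running histogram."""
    mass = event["mass_gev"]
    if WINDOW_LOW_GEV <= mass < WINDOW_HIGH_GEV:
        histogram[int(mass // BIN_WIDTH_GEV)] += 1
    return histogram


raw_events = [
    {"mass_gev": 124.0},
    {"mass_gev": 124.5},
    {"mass_gev": 130.0},
    {"mass_gev": 100.0},   # falls outside the window after correction
]

merged = reduce(fold_into_histogram, map(apply_correction, raw_events), Counter())
print(dict(merged))
```

Because the map step treats every event independently, it can be spread over many machines, while the reduce step only has to combine small partial results.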

(36)

Summary

The data from the experiments are reduced before being stored to disk/tape

All data are stored in ROOT format: either as classes or as basic data types

Heavy workflows are performed on the LHC grid; frequent and fast work is usually done on local servers

The final result is a histogram (or table), a huge reduction from the input (20 PB -> 100 MB)

(37)

Questions?

Thank you for listening.

 

(38)
(39)

ROOT and CMS

https://indico.cern.ch/getFile.py/access?contribId=16&resId=0&materialId=slides&confId=217511

 

(40)
(41)

What do we do?

Experiment

Local computing farm

CERN data centre

Globally distributed data centres

My computer

Paper

online

offline
