Open access to data and analysis tools from the CMS experiment at the LHC

Thomas McCauley

(for the CMS Collaboration and QuarkNet) University of Notre Dame, USA


Outline

• CMS at the LHC

• 1st public release of CMS data

• CMS masterclasses

• Large data release

• Open data portal


CMS at the LHC

• CMS (Compact Muon Solenoid) is one of the two general-purpose experiments at the LHC

• Over 350 papers published describing searches for SUSY and exotica; measurements of QCD, electroweak, top, b, forward, and heavy-ion physics; and the discovery of the Higgs boson and measurements of its properties

• Collected ~28 1/fb of proton-proton collision data at centre-of-mass energies up to 8 TeV

• Nearly 3000 physicists and ~800 engineers from over 40 countries

CMS public data (i)

The CMS experiment has allowed the release of the following data to the public for use in education and outreach:

• 2000 events each of J/ψ → μμ, J/ψ → ee

• 2000 events each of Υ → μμ, Υ → ee

• 500 events each of Z → μμ, Z → ee

• 1000 events each of W → μν, W → eν

• 100,000 events each of di-muon, di-electron, and di-jet events in the energy range 2-110 GeV

• 19 Higgs candidate events: 10 γγ, 1 2e2μ, 1 4e, 1 4μ, 2 bb, 2 ττ, 2 WW in the mass range 120-130 GeV

• ~50 1/pb single muons for top quark analysis

Bold: indicates datasets already delivered and/or in use

Masterclasses

• Masterclasses: students travel to nearby universities and research laboratories to listen to lectures, analyze real LHC data, and interact with other groups via videoconference.

• International masterclasses are organized under the auspices of IPPOG, the International Particle Physics Outreach Group (http://ippog.web.cern.ch), with central organization at TU Dresden and Notre Dame. In 2014 (from Feb 12 - Apr 12) there were 69 CMS masterclasses in 26 countries in 12 languages.

• CMS masterclass developed in collaboration with QuarkNet (http://quarknet.fnal.gov)

CMS masterclasses in 2014

https://quarknet.i2u2.org/content/running-cms-wzh-path-masterclass

2014 CMS masterclass exercise

• Students use up to 30 separate datasets, each with 100 events containing samples of W, Z, and di-lepton events (one 4-lepton and two di-photon Higgs candidate events included)

• Each group views up to 100 events in an event display and attempts to determine whether each one is a W or a Z (di-lepton) event.

• If a W, did it decay into an electron and a neutrino or into a muon and a neutrino? What is the charge of the lepton?

• If a Z, is it di-electron or di-muon? What is the invariant mass?

• What is the W+:W- ratio? What does it imply about the proton and its structure?

• What does the invariant mass spectrum look like? (There will be several unexpected peaks from the di-lepton background)

• 2015: content the same, data analysis tools improved (covered later); what follows shows
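
The invariant-mass question in the exercise can be made concrete with a short calculation. The following is a minimal sketch, not the masterclass tool itself: it assumes each lepton is given as an (E, px, py, pz) four-vector in GeV (natural units, c = 1) and uses m² = (E1 + E2)² − |p1 + p2|². The function name and the sample values are illustrative and are not taken from CMS data.

```python
import math


def invariant_mass(p1, p2):
    """Invariant mass (GeV) of a two-particle system.

    Each argument is a four-vector (E, px, py, pz) in GeV.
    """
    e = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m2 = e * e - (px * px + py * py + pz * pz)
    # Rounding can push m2 slightly below zero for collinear massless pairs.
    return math.sqrt(max(m2, 0.0))


# Illustrative values only: two back-to-back, effectively massless leptons.
mu_plus = (45.6, 0.0, 45.6, 0.0)
mu_minus = (45.6, 0.0, -45.6, 0.0)

print(f"di-lepton invariant mass: {invariant_mass(mu_plus, mu_minus):.1f} GeV")
```

Filling a histogram with this quantity for every two-lepton event gives the invariant mass spectrum the students are asked to inspect.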

After an introduction by the moderator covering HEP and the experiment, start by opening the event display:

Select a set of 100 W, Z, J/ψ, and Υ events (each with a Higgs candidate

Event display: electron? significant MET?

Mark the answer on the spreadsheet (hosted on Google docs):

Event display: muon, muon

...students correctly identified an electron 90% of the time and a muon 93% of the time

...students correctly identified an event as a W 91% of the time

...students correctly identified an event as a Z candidate (i.e. an event with 2 leptons) 92% of the time

...when the students correctly identified an event as W → μν (W → eν), they correctly identified the charge 84% (81%) of the time. 11% (16%) of these events were assigned no charge

2014 results

CMS value


Videoconference

Students communicate and discuss results with other masterclass groups using Vidyo (http://cern.ch/vidyo), with support from CERN and FNAL IT:

http://cds.cern.ch/record/1693152

For 2015

• Exercise to remain the same

• New IPPOG masterclasses start next month

• Masterclasses for CERN visitors start next week

• New browser-based tool developed by RWTH Aachen will replace Google spreadsheets and include creation of plots on-the-fly

• New event display

Beyond 2015: new opportunity to use open data from CMS to develop new exercises

https://www.i2u2.org/elab/cms/cima/index.php

Web-based data entry and histogram tool developed by RWTH Aachen


CMS Open Data policy

CMS has drafted and adopted a data preservation, re-use, and open-access policy which includes:

• Commitment to publication in open-access journals

• Release of data to the public

• Preservation and release of software and documentation needed for reconstruction and analysis

• In the future: a commitment to release data after a suitable embargo period

https://cms-docdb.cern.ch/cgi-bin/PublicDocDB/ShowDocument?docid=6032

New release (i)

The new release of CMS data is much larger and more extensive than previous releases:

• Half of reconstructed data from 2010 proton-proton collisions at 7 TeV (tens of 1/pb)

• ~30 TB in size

• In CMS Analysis Object Data (AOD) format (ROOT files)

CMS AOD

• Contains information needed for an analysis such as physics objects, tracks, calo hits, vertices, trigger info, etc.

• ROOT-based format that requires CMSSW in order to read and analyze

• Q: How can/will the public handle such a dataset?

• A (partially): Initially focus on an already-proven,

Open Data Portal

• Data, tools, and resources for analysis have been made available via an “open data portal”

• Portal is divided into two main areas: “Education” and “Research”

• Datasets are distinguished as either “primary” or “derived”

• Philosophy: include and build upon the previous and current success of public data in education and outreach but also include the possibility for more in-depth, complex analysis

• Built with Invenio digital library software: http://invenio-software.org

The portal is a collaboration between CERN, CMS, ATLAS, ALICE, and LHCb: what follows is a description of the CMS content

Derived dataset record

• A “derived” dataset is a dataset that has been created from a primary dataset and contains reduced information (like four-vectors)

• Software with which to create the derived datasets is provided

• Analysis of derived datasets does not require special CMS software (but production of
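
Because a derived dataset carries reduced information such as four-vectors, it can be analysed with general-purpose tools. The sketch below illustrates that idea using only the Python standard library. The CSV layout, the column names (`E1`, `px1`, ...), and the three sample rows are hypothetical, chosen for illustration; they are not the format or content of any actual CMS derived dataset.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical layout: one di-lepton event per row, energies and momenta in GeV.
# The rows are made-up illustrative values, not CMS data.
SAMPLE_CSV = """E1,px1,py1,pz1,E2,px2,py2,pz2
45.6,0.0,45.6,0.0,45.6,0.0,-45.6,0.0
1.55,1.55,0.0,0.0,1.55,-1.55,0.0,0.0
45.0,45.0,0.0,0.0,46.0,-46.0,0.0,0.0
"""


def dilepton_mass(row):
    """Invariant mass (GeV) of the two leptons stored in one CSV row."""
    e = float(row["E1"]) + float(row["E2"])
    px = float(row["px1"]) + float(row["px2"])
    py = float(row["py1"]) + float(row["py2"])
    pz = float(row["pz1"]) + float(row["pz2"])
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))


def mass_histogram(rows, bin_width=10.0):
    """Count events per mass bin; keys are the lower bin edges in GeV."""
    counts = Counter()
    for row in rows:
        low_edge = int(dilepton_mass(row) // bin_width) * bin_width
        counts[low_edge] += 1
    return dict(sorted(counts.items()))


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hist = mass_histogram(rows)
for low_edge, n in hist.items():
    print(f"{low_edge:6.1f} GeV and up: {n} event(s)")
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle and the column names adjusted to match the dataset's own documentation.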


CMS-specific CERN VM

• Analysis of primary datasets requires CMSSW environment; we provide it in a virtual machine image

• VM contains SLC5, CMS software environment, access to primary datasets via XRootD

• Example code also

Invenio and CERN support

• Open data portal built with Invenio (a familiar example of an application using Invenio is CERN Document Server http://cdsweb.cern.ch)

• Invenio provides document organization, search capability, and handling of metadata

• The portal relies on CERN support and services for data storage, access to and distribution of data, and security and bandwidth restrictions

Data re-use

• Data released under the Creative Commons CC0 waiver: essentially releasing it into the public domain http://creativecommons.org/publicdomain/zero/1.0

• Data are identified with digital object identifiers (DOI) and it is expected that third parties will access the data using these

Outlook

• CMS public data has reached thousands of students all over the world via CMS masterclasses

• Re: open data portal: “We can conclude that about ~82k distinct users visited our site since the launch, out of which ~600 people downloaded EOS files over HTTP, ~5k read About pages, ~21k viewed collections, ~16k used event display, ~3k used histogramming, ~21k viewed records, and ~10k used search.” - T. Simko (Invenio team), 19 Dec 2014

• Next: improve tools, together with a new, large data release
