• No results found

Data Quality Monitoring. workshop

N/A
N/A
Protected

Academic year: 2021

Share "Data Quality Monitoring. workshop"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Data Quality

Monitoring

(2)

What this presentation is not

What it is and how it is organized

Definition of DQM

Overview of systems and frameworks

Specific chosen aspects

oData collection, Storage, Visualization, analyses

Operations

Qualitative assessment & discussion

Future

(3)

Online feedback on the quality of data

Make sure to take and record high quality

data

Use the data taking time and the precious

bandwidth in an optimal way

Identify and solve problem(s) early

Data Quality Monitoring (1)

(4)

Data Quality Monitoring (DQM) involves

Online gathering of data

Analysis by user-defined algorithm(s)

Storage of monitoring results Visualization

Data Quality Monitoring (2)

Dat afl ow Results (histos, values…) Storage Analysis Data samples Vizualization

(5)

Requirements

Range from formal documents to oral

tradition

Changing difficulties

No interference with data taking

Low latency update during runs

Centralized results

Modular, fast update of users code

Data Quality Monitoring (3)

(6)

• DQParameter  An histo • DQRegion  A subset of detector • DQAlgorithm  A specific analysis • DQResult  A color assessment

Overview: ATLAS

Dat afl ow Web app Configura tion DB DQResults DQMF DQParam DQRegion

• « Generic, yet flexible and

nicely scalable »

• Shared online/offline and

by ~10 independent communities • C++ – Qt – ROOT – XML – JS – CGI - Python MT MT MT 150MTs 50MTs Infor-mation Services Histos 70000 Events Histos EMON 1Hz/MT 10-15%

(7)

• DQM framework part of reconstruction framework

• SMPS allows for trigger path selection

• 1 DQM app per subsystem

 Re-run reco according to needs

Run certain analysis

• C++, Python, Java, Oracle DB

Overview: CMS

12/03/2013 Barthélémy von Haller, ALICE, CERN

Web Server

Collector Memory &

Storage WEB GUI • Monitoring element  Metadata  Reference histo  Quality • Web based

HTTP for data exchange

 Web UI only WS WBM Condi-tion DB Qua nt iti e s V ia DI P Dat afl ow Events Histos SMPS Summary Shifter Experts DQM DQM DQM 1 app/subsys 6/26 TCP/IP Mon itoring El e ment s 70000 10%

(8)

DIM

Network RPC

3 distinct

histo-gramming places

Overview: LHCb

Da ta flows Histogram Presenter Event data From HLT Event data Reco Monitoring

Farm Savesets Histo

Automatic analysis Control System Alarms A larms • C++ - ROOT • Monitoring on 70-100% of data

• Auto. analysis every 15

minutes • Automatic actions ! Aggregated histos ~100 ~10 0 Hi st os 20 x500 5kHz 3-5kHz (>70%) 50Hz Alarms

(9)

• AMORE framework

 C++ - ROOT

Plugin architecture Notifications via DIM MySQL database

• Plugins developed in 20

institutes worldwide

Overview: ALICE

12/03/2013 Barthélémy von Haller, ALICE, CERN

Dat

afl

ow

• Histograms or values also

retrieved in other systems

• MonitorObject

 Metadata (run, quality, expert-shifter flag)

ROOT base type encapsulated

(TH1, TObject, scalar, …)

Events

Monito

-ring Expert GUIs

Generic GUI eLogbook Alarms Orthos Agent Agent Agent ~40 agents Data Pool MonitorObjects ~11500 /min 8/26 HLT DCS Trigger Histos Values …

(10)

Main input to DQM

Events (all) and sub-events (ALICE, ATLAS)

Histograms or values coming from other

systems, esp. HLT.

Challenges

Bandwidth issues: monitoring processes do

events selection themselves instead of using a proxy (ATLAS, ALICE)

Merge of all histograms from HLT (LHCb,

CMS)

(11)

Most typical analysis include

Check for dead channels (holes in

distribution)

Check for noisy channels (peaks in

distribution)

Check for differences with a reference

Check for non-empty error histograms

Also

Check for infrastructure issues Check for reconstruction issues

Check for trigger/dead time issues

Data analysis (1)

(12)

Offline

-

Online

Same framework ? Same analysis ?

ATLAS

CMS

LHCb

ALICE

Data analysis (2)

(13)

Online Storage

In-memory (ATLAS, CMS, LHCb)

Database (ALICE)

Online recent history

Short-term, detailed (all)

Archiving

Long-term per run/per time interval (all)

Does long-term archiving make sense ?

Is it still online DQM ? Rather offline QA ?

Storage

(14)

Desktop GUI (all but CMS)

C++, Qt or ROOT

Very similar display showing online

Also past objects (LHCb)

Summary panel (ATLAS)

Easy customization of

the layout

Quality and alarms

Visualization (Desktop)

Object(s) display Objects Tree Extra information on the object(s) Menu bar Tool bar
(15)

Visualization (Desktop)

(16)

• How can experts access DQM results ?

• Remote ssh, interactive access to control room

 LHCb: open to collaboration

 ATLAS: provides tokens for a few hours access + replica

 CMS: closed but has a collaboration replica  ALICE: closed

• Web GUI

 Images and ROOT objects download (ALICE)  Generated static set of pages (ATLAS)

 Full-fledged web application (CMS)

(17)

Visualization (Web, ALICE)

(18)
(19)

Visualization (Web, CMS)

12/03/2013 Barthélémy von Haller, ALICE, CERN

• Remote monitoring

 Layouts and render plugins

Dynamic zoom and

drawing options

 Caching

 Home-made DB for quick retrieval histograms

(20)

A typical DQM shifter

Check alarms

Go through plots layouts (LHCb, ALICE,

ATLAS)

Find information in a Wiki or directly the GUI Contact experts / shifters (if present)

Inform the shift leader

(21)

• Is it worth it ? Yes !

• ATLAS

• Transition Radiation Tracker timing problem spotted

• CMS

• Wrong DAQ configuration of SST detector used during

beam "ramp"  audio alarm triggered by DQM fix before

stable beam was declared.

• LHCb

• Bad mass plot -> anomaly -> inverse value of the magnetic field used by the HLT

• ALICE

• HLT started compressing the TPC laser events 

recons-truction problems  DQM shifter spotted anomalies fix

Operations (2)

(22)

Personal

hit-list

ATLAS: scalability, service-oriented

CMS: interactive and dynamic web interface

LHCb: automatic actions

ALICE: versatility

General:

oFlexibility and reusability

(23)

Biggest challenges

ATLAS: initial display was in Java and

couldn’t cope with the number of histograms (70k)

CMS: handling huge quantities of histograms

in the web gui and yet be responsive

LHCb: nothing dramatic, summing up HLT

histograms with data rates of the order of a LEP experiment

ALICE: ensure stability despite of the

multiple contributions

Qualitative review (2)

(24)

Could we have used the same system ?

4 efficient and scalable DQM

Similar architecture and technologies I believe the answer is: Yes…

… up to a certain extent

If not the framework, maybe have the

same (Web) UI ?

Could we at least share more ?

(Regular) common meetings ?

(25)

LS1

Algorithm and thresholds adjusting on run

conditions (ALICE, ATLAS)

Adapt SW to dependencies changes (LHCb, CMS) Adapt framework for multi-core multi-thread

environment (CMS, ALICE)

Web GUI and tools (ALICE, [LHCb]) Full archiving (ATLAS)

Proxy monitoring (ALICE)

LS2

Complete rewrite along with online-offline

framework (ALICE)

Future

(26)

Much similarities

General LHC DQM framework ?

Trans-experiments DQM meetings ?

DQM in all 4 experiments

Work well and up to the demand

Proved to be of great importance

Future

DQM is crucial, still often considered late Today is an opportunity to attack it early !

(27)

A big thanks to:

Markus Frank

Clara Gaspar

Serguei Kolos

Marco Rovere

Questions and discussion

(28)
(29)

Remote monitoring

Layouts and render plugins

Dynamic zoom and drawing options

Caching

Visualization (Web, CMS)

12/03/2013 Barthélémy von Haller, ALICE, CERN

Web Server Homemade DB (ROOT files) Web Browser Javascript 6:parsing 7:HTML+CSS generation Python & C++

4:Image & JSON generation ?

1:Async

Request 2:Ask data

3:Monit oring elemen t 5:JSON Histo renderer Shared memory Collector Li ve A rchi v e 28/26

(30)

Visualization (Web, ATLAS)

Web Server Information Service Web Browser JS parsing CGI Request Ask XML XML ATLAS
(31)

Services approach

C++ – Qt – ROOT – XML – JS

Framework DQMF based on DQMCore

Overview ATLAS

12/03/2013 Barthélémy von Haller, ALICE, CERN

• DQParameter  An histo • DQRegion  A subset of detector • DQAlgorithm  A specific analysis • DQResult  A color assessment  [New histo] 30/26

(32)

Overview: ATLAS

Dat afl ow Online Histo- gramming Infor-mation Services MT Data samples Display GUI DQMF EMON MT MT Histos Histos Histos Web app DQResults Configura tion DB DQParam DQAlgo

References

Related documents

30-36” - Warm Season Grass - Full Sun or Part Shade This dwarf variety has narrow leaves that grow in an arching shape, and tan colored plumes in early fall.. 30” - Warm

Buah salak dapat menjadi bahan baku dalam usaha pengolahan dodol salak, keripik salak, manisan salak, sirup salak, kurma salak dan lain sebagainya (Raranta, J., Ruauw, E.,

the composite of all-cause mortality, myocardial (re)MI, disabling stroke and re-hospitalization for cardiovascular causes or severe bleeding within 12 months.

The submenu allows the user to configure Console Redirection settings to support Out-of- Band Serial Port management.. EMS (Emergency Management Services)

By default, if Hungarian is selected as viewer language preference, the receiver should display the EIT text information in accordance with the Latin-2 character

If operation is continued without draining the water separator, fuel feed to the engine is restricted and may cause damage to the engine or fuel injection pump.. Exhaust

DIF items for the WOMAC physical functioning sub- scale, using the Zumbo criteria only one item was identi- fied for DIF by age and gender, using the SR criteria, 14 of the 17

(The Community’s Water Supply Contingency Plan must include information identifying ways to reduce the vulnerability of the water supply system to disruption and to improve