• No results found

The H.E.S.S. Data Acquisition System

N/A
N/A
Protected

Academic year: 2021

Share "The H.E.S.S. Data Acquisition System"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

An overview

The H.E.S.S.

Data Acquisition System

Arnim Balzer

for the H.E.S.S. DAQ group

Technical Seminar Zeuthen, 26.06.2012

The Optical Sky

>Energy of radiation: 1 eV

(2)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 3

The Very High Energy Gamma-Ray Sky before 1989

>Energy of radiation: >100 GeV

http://tevcat.uchicago.edu/

The Very High Energy Gamma-Ray Sky in 1989

>Energy of radiation: >100 GeV

(3)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 5

The Very High Energy Gamma-Ray Sky in 1999

>Energy of radiation: >100 GeV

http://tevcat.uchicago.edu/

The Very High Energy Gamma-Ray Sky in 2012

>Energy of radiation: >100 GeV

(4)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 7

The Very High Energy Gamma-Ray Sky in 2012

>Energy of radiation: >100 GeV

http://tevcat.uchicago.edu/

(5)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 9

The High Energy Stereoscopic System

(6)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 11

Imaging Atmospheric Cherenkov Technique

photon

~ 10 km air shower

Imaging Atmospheric Cherenkov Technique

photon ~ 10 km Cherenkov light cone air shower 120 m 1°

(7)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 13

Single telescope event Four telescope event in common camera plane

photon

~ 10 km Cherenkov

light cone air shower

Imaging Atmospheric Cherenkov Technique

> Intensity Energy

> Orientation Direction

> Shape Primary Particle

A 960 Pixel vs. a 2048 Pixel Digital Camera

H.E.S.S.

Phase 1 H.E.S.S. II

H.E.S.S. II H.E.S.S.

(8)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 15

Most Important Camera Feature

1/10000 s (100 Xs) 1/100000 s (10 Xs) 1/1000000 s (1 Xs) 1/10000000 s (100 ns) 16/100000000 s (16 ns)

>Very short “exposure time”

Most Important Camera Feature

1/10000 s (100 Xs) 1/100000 s (10 Xs) 1/1000000 s (1 Xs) 1/10000000 s (100 ns) 16/100000000 s (16 ns)

(9)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 17

Most Important Camera Feature

1/10000 s (100 Xs) 1/100000 s (10 Xs) 1/1000000 s (1 Xs) 1/10000000 s (100 ns) 16/100000000 s (16 ns)

>Very short “exposure time”

Most Important Camera Feature

1/10000 s (100 Xs) 1/100000 s (10 Xs) 1/1000000 s (1 Xs) 1/10000000 s (100 ns) 16/100000000 s (16 ns)

(10)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 19

Most Important Camera Feature

1/10000 s (100 Xs) 1/100000 s (10 Xs) 1/1000000 s (1 Xs) 1/10000000 s (100 ns) 16/100000000 s (16 ns)

>Very short “exposure time”

field of view: 5° vs. 3.5°

event rate: ~500 Hz vs. ~3500 Hz

(11)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 21

Overview

>Members of the DAQ group

>DAQ requirements

>Hardware

Server farm & network

Cluster monitoring & remote administration >Software

General DAQ design

State machine & transitions

Communication & messaging

Data flow & node switching

GUIs & displays

Monitoring array performance

DAQ virtual machine

Current Members of the H.E.S.S. DAQ Group

Daniel Göring Christian Schneider Mathieu de Naurois Christian Stegmann Matthias Füßling Ullrich Schwanke

Michael Gajdus Anton Lopatin Arnim Balzer

(12)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 23

Requirements for the H.E.S.S. II Data Acquisition

>Be able to process the data rates of the 5 cameras:

500 Hz * 4 Telescopes * 960 Pixels * 2 Channels * 4 byte 15 MB/s

3500 Hz * 1 Telescopes * 2048 Pixels * 2 Channels * (4+2+2) byte 115 MB/s

>Enough storage for at least 2 month of data taking

3177 GB in 228 days with H.E.S.S. Phase 1 420 GB per Month

420 GB per Month * 115 MB/s / 15 MB/s 3220 GB per Month

>Enough computing power for: Event building

Real time analysis & data quality checks

Server Farm on Site

>5 Supermicro storage server RAID 6 + Hot Spare + Backup

12 TB each

Intel Xeon E5420 4*2.5 GHz

8 GB DDR2 RAM

>10 Supermicro worker nodes Intel Xeon E5450 8*3.0 GHz

16 GB DDR2 RAM

>Operating system: Fedora 12 x64

(13)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 25

Network Layout

>Keep networks separated Especially data and user network

Central Gateway

Internet

Data Network User Network

Firewall 10.1.X.X

10.0.X.X

ssh

Network Layout

>Keep networks separated Especially data and user network

Central Gateway

Internet

Data Network User Network

Administration Network Firewall 10.1.X.X 10.0.X.X 192.168.5.X ssh

(14)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 27

Network Layout

>Keep networks separated Especially data and user network

Central Gateway

Internet

Data Network User Network

Administration Network OpenVPN Network Firewall 10.1.X.X 10.0.X.X 192.168.5.X 192.168.9.1 ssh openVPN

Cluster Monitoring

>Ganglia >Collectd >Logwatch

(15)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 29

Current Cluster Usage

Remote Administration

>openVPN for easy access to all

important networks

>IPMI cards for remote

administration of server and nodes

>VNC server for direct access to machines for problem solving

Central control pc in control room

All display machines in control room

(16)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 31

General DAQ Design

>The DAQ is a multi-machine, multi-process, multi-threaded system

23 machines + dedicated hardware

158 processes called controller

>Every piece of hardware has its own controller

Responsible for activation, management and deactivation of hardware

>Dedicated management controllers called manager

Responsible for synchronization and error handling

>Several other controller types with different purposes in use Receiver, displays, sound, alerts, real time analysis, calibration

>Supported programming languages:

C++

Python

State Machine

>Every controller implements a common state machine

>There is only one transition for a given state and direction!

>Default states:

Day time: Safe Night time: Ready Data taking: Running

Safe Ready Configured Running

GettingReady Configuring Starting

Stopping Unconfiguring

(17)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 33

Transition Scheme

>Controller only receive a target state

>Target state does not need to be an adjacent state

>Controller will wait for its dependencies to finish their current state transition before proceeding with their own

>“Immediate” transitions will ignore dependencies!

Inter Process Communication

>DAQ uses omniORB implementation of the CORBA inter process

communication standard Client Caller object Stub foo() Language A Machine 1 Server Instance Skeleton virtual foo() Language B Machine 2 foo() CORBA

(18)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 35

Messaging System

>Processes can subscribe to messages of other processes

>Used to monitor changes in the state of other processes Error handling,

>Used for logging and shifter information

Process Manager SubArray Manager Run Manager Info Fatal Error Warning Caution Stop Stop, GoToSafe Stop Stop, GoToSafe Stop Stop, GoToSafe

Chain of Command

>Example for the hierarchy of the processes inside of the DAQ

Resource Handler Process SubArray Manager CT Manager Node Manager Process Process Process Process Process Process SlowControl Manager Receiver Manager Services Manager Process Process Process Process Process

(19)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 37

Data Flow

>Own receiver for each hardware controller

>Every receiver may have a displayer

>Data can be “pulled” or “pushed” between processes

Push: guaranteed data integrity

Pull: data may be dropped

Controller Hardware Receiver Displayer Managers push data push data pull data data data file display

Camera Readout and Node switching

CT2/Camera CT4/Camera

CT1/Camera CT3/Camera CT5/Camera

Central Trigger Controller

Node01 Node02 Node03

Central Trigger Connector sends node

(20)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 39

Camera Readout and Node switching

CT2/Camera CT4/Camera

CT1/Camera CT3/Camera CT5/Camera

Central Trigger Controller

Node01 Node02 Node03

Central Trigger Connector sends node

trigger

Camera Readout and Node switching

CT2/Camera CT4/Camera

CT1/Camera CT3/Camera CT5/Camera

Central Trigger Controller

Node01 Node02 Node03

Central Trigger Connector sends node

trigger

(21)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 41

Camera Readout and Node switching

CT2/Camera CT4/Camera

CT1/Camera CT3/Camera CT5/Camera

Central Trigger Controller

Node01 Node02 Node03

Central Trigger Connector sends node trigger readout running data trigger data

Camera Readout and Node switching

CT2/Camera CT4/Camera

CT1/Camera CT3/Camera CT5/Camera

Central Trigger Controller

Node01 Node02 Node03

Central Trigger Connector sends node trigger readout running data trigger data

(22)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 43

Camera Readout and Node switching

CT2/Camera CT4/Camera

CT1/Camera CT3/Camera CT5/Camera

Central Trigger Controller

Node01 Node02 Node03

Central Trigger Connector sends node trigger readout running data trigger data

The Displayer

>One generic display controller in the DAQ

>Configured solely via a database

>Uses ROOT object introspection capabilities

>Able to display any data member of any storage class

>Can use every histogram and graphic class of ROOT

(23)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 45

Control Room Displays and Central DAQ GUI

>Dedicated display machines

X11 forwards from any machine in the data network

>Most displays use ROOT graphics and histograms

>Some displays written in Python using PyGTK 2.4

>Central DAQ-GUI and camera controllers also in Python

(24)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 47

Monitoring the Array Performance

>H.E.S.S. II DAQ operational and in use since Spring 2011

Telescope down time Data taking efficiency

H.E.S.S. DAQ Virtual Machine

>Common platform for controller development

>Intended to test controllers and hardware

(25)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 49

Lessons learned

>Use unit tests, apply test driven development and refactor your code

>Error handling is a key feature

>Keep the software modular to keep test and development setups simple

>Focus on one programming language

>Use websites to display data

>Test software with real hardware

>Choose a language for which a good IDE is available

>Have and use a coding standard

>Use a modern versioning system

Conclusions

>The H.E.S.S. DAQ is working fine and it is well prepared for H.E.S.S. II

Taken from “Clean Code” by Robert C. Martin

(26)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 51

The Microwave Sky

>Energy of radiation: 0.000001 eV = 1 XeV

(27)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 53

NASA, DOE, Fermi LAT Collaboration

The High Energy Gamma-Ray Sky

>Energy of radiation: 1000000000 eV = 1 GeV

(28)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 55

Two Sources Up Close

HESS J1303–631

pulsar wind nebula

RX J1713-3946

shell like supernova remnant

Propagation of Particles

>Gamma radiation points back to its origin

proton

(29)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 57

Uninterruptible Power Supplies

>3 APC Smart-UPS RT 6000 VA

Each with a APC Smart-UPS battery pack

currently at 35% of max. capacity

! "# $! %# ! "&# !'

Additional Equipment

>2 Meinberg M300 GPS time server

Used for ntp services for the farm

>2 Transtec 7600 LTO-4 tape drives Transport of data to Europe every month

>6 HP ProCurve 2510G switches

(30)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 59

Hardware Virtualization

>Types of machines:

Two camera boot servers

Central trigger boot server

Machine for backward compatibility to 32 bit software and old kernel >Running on one of the servers

RAID6

Backup

>Almost no maintenance

>High availability

Handling Controller Dependencies

>A controller can depend on any number of other controllers Global database table, must be applicable for every run type

>Controller participation in a run can be specified in a database table Required: Controller must take part in the run, no errors allowed

Optional: Controller can take part in the run, errors or crashes do not stop data taking

Disturbing: Controller is send to safe state and does not take part in the run

(31)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 61

Current Controller Dependencies

Data Storage and Access

>Data is stored using ROOT object serialization mechanism

>ROOT graphics and histograms used in online and offline software

>Data storage classes are experiment specific

>Storage classes can be send within the DAQ using CORBA

>ROOT object introspection capabilities used for generic access to easily display data in real time

Access to every member of every storage class

Displays configured via a database

>Data calibration, reconstruction, algorithms can be used in online and offline software

(32)

Arnim Balzer | The H.E.S.S. DAQ | 26.06.2012 | Page 63

Monitoring the DAQ Performance

Total transition durations: 01.01.2010 until 30.09.2011

TT table getting too big HV dependency fixed

First Upgrade Second Upgrade

Transition Time Tools – Available Plots

Performance

Tel Down Time Tel Participation

Efficiency

Controller Performance DAQ restarts over time

References

Related documents

Similar protocol was also succ ssfully us d to ‘activat ’ AEMFCs operated with air containing low concentrations of CO 2 (filtered air). 19, Copyright 2009; permission

T h e second approximation is the narrowest; this is because for the present data the sample variance is substantially smaller than would be expected, given the mean

Er bringe jedoch, so die Studie, auch „schwerwiegende Probleme“ mit sich, die ihn als normativen Bewertungsan- satz für gentechnische Fragen prinzipiell disqualifizierten

As can be seen in Table 1, comparing the values of the nonlinear absorption coefficient ( β ) for two samples synthesized by laser ablation of magnesium target in acetone

Tranversal images of the [ 123 I]FP-CIT SPECT scans rated visually as abnormal (PD-like; due to relatively low binding in the left putamen vs left caudate nucleus) on the

in LNCaP tumor-bearing SCID mice (Fig. A fourfold increase of the activity in the LNCaP tumor xenograft at 24 h p.i. The high 177 Lu- 11 retention in the tumor xenograft is explained

We use the characterization of the reduced Lefschetz number in Theorem 1.1 to prove the Lefschetz-Hopf theorem in a quite natural manner by showing that the fixed-point index

We hypothesized that if prediction error is higher near event boundaries, then our index of predictive looking would be effective; specifically, we predicted that looks to the