ATLAS Software and Computing Week April 4-8, 2011 General News


April 19, 2011

• Resource requests (originally made in 2010) refactored according to the expected running conditions (running in 2012, shutdown in 2013)

– 20% more CPU required at T0, which will be redeployed from the CAF

– No extra CPU for T2s in 2012, but 11% more disk requested

– T1 CPU up 15%, disk and tape OK

• Expect Group and User activities to increase as the experiment matures

• Change in data distribution model:

– 1 disk copy of RAW at T1’s

– Rolling buffer of 10% of ESD at T1’s + small streams

– 10 copies of AOD over the clouds (3 copies of the previous version)

– 2 copies of DESD at T1s, 4 at T2s

– Larger pileup means ~double the event size

– Work done already to reduce the reconstruction time

• New version of Oracle (11g) to be introduced at the end of 2011 (a somewhat disruptive move from the current 10g)


• Break the original cloud and DDM model, as network architectures have evolved away from the original MONARC model

• Allow DDM to transfer data freely between all sites

• T2Ds (D = Directly connected) are an attempt to break the cloud boundaries where it makes sense

• In the new model, every T1 is topologically close to the other T1s, its own T2s and all the T2Ds

• Requirements for being/becoming a T2D

– Need to satisfy transfer metrics with all T1s

• Strictly quantifiable via SONAR tests

– Need to provide a certain level of commitment and reliability

• Not quantifiable at the moment; up to ATLAS central operations to decide

Data Movement and Cloud Model

Number of files transferred:
   SMALL    ≤3           4            ≥5
   MEDIUM   ≤2           3            ≥4

Avg(Byterate)+StD(Byterate):
   SMALL    <0.05 MB/s   <0.1 MB/s    ≥0.1 MB/s
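The thresholds above can be applied mechanically to SONAR test results. The sketch below is illustrative only: the slide gives no column headers, so the labels "fail", "warning" and "pass" (worst to best) are assumptions, as are the use of the population standard deviation and the function names.

```python
from statistics import mean, pstdev

# Column labels are assumptions: the slide gives thresholds but no headers.
LABELS = ("fail", "warning", "pass")

# Upper bound of the worst column for "Number of files transferred".
FILE_LIMITS = {"SMALL": 3, "MEDIUM": 2}


def classify_files(size: str, n_files: int) -> str:
    """Classify the number of files transferred in a SMALL or MEDIUM test."""
    low = FILE_LIMITS[size]
    if n_files <= low:           # e.g. SMALL: <=3
        return LABELS[0]
    if n_files == low + 1:       # e.g. SMALL: exactly 4
        return LABELS[1]
    return LABELS[2]             # e.g. SMALL: >=5


def classify_byterate(byterates_mb_s: list[float]) -> str:
    """Classify SMALL-file transfers by Avg(Byterate) + StD(Byterate) in MB/s."""
    score = mean(byterates_mb_s) + pstdev(byterates_mb_s)  # population StD assumed
    if score < 0.05:
        return LABELS[0]
    if score < 0.1:
        return LABELS[1]
    return LABELS[2]


print(classify_files("SMALL", 4))
print(classify_files("MEDIUM", 5))
print(classify_byterate([0.02, 0.04, 0.06]))
```

Keeping the thresholds in small tables rather than nested conditionals makes it easy to adjust them if the criteria are revised.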


• Many T2Ds commissioned in less than 2 months

– Represent over 80% of ATLAS resources

• But many clouds have no T2Ds, and fewer than 50% of T2s are represented

• LHCONE should address some of these issues

• Proposal that only T2Ds participate in multi-cloud production activities, and even host GROUPDISK

• T2Ds could become primary replica sites, especially for GROUPDISK replication

• Clouds strongly urged to get as many T2Ds as possible, but it is up to squads and sites to do the commissioning

• perfSONAR and independent FTS tests at US sites have been immensely helpful in commissioning and monitoring

• Won’t discuss LHCONE here but refer you to slides

• Intercloud transfers may need direct T2-T2 transfers (a new DDM feature, based on measurements)


• 15.6.X.Y – last of 2010 analyses

• 16.0.X.Y – recommended for all pp analyses

• 16.2.X.Y – HI reprocessing

• 16.6.X.Y (latest 16.6.3.4) – for 2011 MC and data

• Release 17 due out at the beginning of May, to be validated in July for the September reprocessing

– Hope for significant cleanup (compilation warnings, Savannah bugs, etc.)

• 64-bit builds maybe in 16.6.3.Y

– CPU performance gain but 1.5x more memory (reco ~3.5 GB)

• Cleanup of releases needed before deployment on CVMFS

• People needed to help the effort (2-3 people at 30-50% FTE)


• MC10a production (for 2011 data)

– Starting now

– Reprocess (digi+reco) all MC10 HITS (16.6.3.Y)

– 800M events, ~6 weeks to reprocess

• MC11 Geant4 in validation now (16.6.X.Y)

– All 800M samples redone from scratch

– ~3 months

– HITS go to TAPE until reconstruction in summer with rel 17.

• Multi-cloud working reasonably well, with a few mostly local concerns

– Monitoring, foreign T2s stealing jobs

• Clouds tend to empty at the same time – this is good
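For a sense of scale, the campaign sizes quoted above imply the following sustained throughput. This is back-of-the-envelope arithmetic only: it assumes a constant rate over the whole campaign and takes "~3 months" as 90 days, which is an assumption rather than a figure from the slide.

```python
SECONDS_PER_DAY = 86_400


def sustained_rate(n_events: float, days: float) -> tuple[float, float]:
    """Return (events per day, events per second) needed to finish in `days`."""
    per_day = n_events / days
    return per_day, per_day / SECONDS_PER_DAY


# MC10a: reprocess 800M events in ~6 weeks (42 days)
mc10a_day, mc10a_hz = sustained_rate(800e6, 6 * 7)
# MC11: redo 800M events from scratch in ~3 months (assumed 90 days)
mc11_day, mc11_hz = sustained_rate(800e6, 90)

print(f"MC10a: {mc10a_day / 1e6:.1f}M events/day, {mc10a_hz:.0f} Hz")
print(f"MC11:  {mc11_day / 1e6:.1f}M events/day, {mc11_hz:.0f} Hz")
```

The MC10a reprocessing therefore needs roughly twice the event throughput of the MC11 campaign, about 19M events/day versus about 9M events/day.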

MC Production

ATLAS-Canada Tier-1/2 Computing Meeting Leslie Groer


• NoSQL databases (e.g. MongoDB, Cassandra, HBase)

• Cloud Computing

• xRootD Federation and File Level Caching

• Event Data Caching

• Tier-3 Monitoring

• CVMFS

• Multicores

– AthenaMP – reduce memory footprint

– Reco prototype working in 16.5.0.3 with caveats

• Network and Transfer Monitoring

• New ATLAS Development Meeting, Wednesdays at 17:00 CET

– Open meeting with people invited from outside of ATLAS

– Discuss future computing projects


• Dashboard

– e.g. Historical Views for Job accounting, DDM Dashboard 2.0, Global Job monitoring

– Some still in validation, so please report feature requests and bugs

• ADC Monitoring

• ATLAS Grid Information System development:

• Discussions on converging monitoring tools within the Dashboard framework

• Some useful new pages

– Data Distribution (T0T1, T1T1)

http://panda.cern.ch:25880/server/pandamon/query?mode=listCR

• Site Status Board

• or shifter view

• Autopyfactory monitoring

• Discussion of more stringent site availability criteria to avoid flip-flopping (e.g. as in CMS: site OK on >5 of the last 7 days, or in the last 2 days)
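A damped availability rule of this kind can be sketched as follows. The reading of "or in last 2 days" as "OK on both of the last 2 days" is an assumption, as is the function name; the point is that a site alternating between good and bad days no longer flips its status every day.

```python
def site_ok(daily_ok: list[bool]) -> bool:
    """Damped site status from per-day test results, oldest first.

    Assumed reading of the CMS-style rule: OK if the site passed on more
    than 5 of the last 7 days, or on both of the last 2 days.
    """
    last7 = daily_ok[-7:]
    last2 = daily_ok[-2:]
    mostly_ok = sum(last7) > 5
    recently_ok = len(last2) == 2 and all(last2)
    return mostly_ok or recently_ok


# A site alternating between good and bad days is not marked OK ...
print(site_ok([True, False, True, False, True, False, True]))
# ... while one bad day after a good week does not demote it.
print(site_ok([True] * 6 + [False]))
```

The second clause lets a recovered site come back after two consecutive good days instead of waiting out a full week.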

ATLAS Monitoring and Reporting


• 50 PB, 2.5M datasets, 185M files, 800 end-points and growing

• 14M/0.6M reads/writes per day (162/7 Hz), 500k deletes/day

• New accounting service dq2-list-accounting

– E.g. Count all data10 datasets at CERN by datatype:

dq2-list-accounting project=data10* location=cern* datatype

• Automatic notification to squads of suspicious files, and tagging of datasets

• Multi-hop/direct transfer services being worked on for T2Ds

• Dropping Python 2.3/2.4 support; currently 2.5/2.6

• dq2 Stable release 0.1.36 (tested with Python 2.6)

– A few name changes to maintain compatibility, e.g. dq2-put2 replaces the old dq2-put

– Python files now in /opt/dq2/lib

• Working on DQ3 code-named “Rucio”

– More hooks for group datasets, symlinks, multiple replicas, cloud computing, etc.
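The accounting query above filters datasets by project and location patterns and then groups them by data type. The sketch below reproduces that logic over an in-memory list; it does not call DQ2, the dataset and site names are made up, and the field positions assume the usual ATLAS dataset nomenclature (project.run.stream.step.datatype.version).

```python
from collections import Counter
from fnmatch import fnmatchcase

# Made-up replica records: (dataset name, storage endpoint).
REPLICAS = [
    ("data10_7TeV.00152166.physics_MinBias.merge.AOD.r1297_p161", "CERN-PROD_DATADISK"),
    ("data10_7TeV.00152166.physics_MinBias.merge.ESD.r1297", "CERN-PROD_DATADISK"),
    ("data10_7TeV.00152166.physics_MinBias.merge.AOD.r1297_p161", "TRIUMF-LCG2_DATADISK"),
    ("data10_7TeV.00155073.physics_Egamma.merge.AOD.r1299_p165", "CERN-PROD_TZERO"),
    ("data11_7TeV.00178044.physics_Muons.merge.AOD.f354_m765", "CERN-PROD_DATADISK"),
]


def count_by_datatype(replicas, project: str, location: str) -> Counter:
    """Count distinct datasets per data type, filtered by shell-style patterns."""
    matched = {
        name
        for name, site in replicas
        # project is the first dot-separated field; sites match case-insensitively
        if fnmatchcase(name.split(".")[0], project)
        and fnmatchcase(site.lower(), location.lower())
    }
    # data type is assumed to be the fifth dot-separated field
    return Counter(name.split(".")[4] for name in matched)


print(count_by_datatype(REPLICAS, project="data10*", location="cern*"))
```

Collecting names into a set first means a dataset with several replicas at matching endpoints is counted once.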


• Distribute read-only binaries (immutable, SHA-1 hashed)

• Files/metadata downloaded on demand and cached locally (FUSE)

• Self-contained (e.g. /cvmfs/atlas.cern.ch/)

• Local load-balanced squids with fail-over capability

• Already deployed at a few sites (RAL, QMUL, Wuppertal)

• V2.0 at the end of May

• Recommended for grid sites: easy installation and no local software area whose space or performance needs managing, but a Squid proxy is required

• Caveat: a local disk cache is needed (at least 8 GB recommended), or shared memory can be used

• KV works transparently

• Installation DB recognizes CVMFS sites and removes concurrent job restrictions
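The "load-balanced squids with fail-over" bullet can be illustrated with a small sketch. It assumes a proxy specification in which `|` separates load-balanced proxies within a group and `;` separates fail-over groups, in the style of the `CVMFS_HTTP_PROXY` client setting; the host names and the `fetch` callback are hypothetical stand-ins for real HTTP requests.

```python
import random
from typing import Callable, Optional


def parse_proxy_chain(spec: str) -> list[list[str]]:
    """Split 'a|b;c' into fail-over groups [['a', 'b'], ['c']]."""
    return [
        [proxy.strip() for proxy in group.split("|") if proxy.strip()]
        for group in spec.split(";")
        if group.strip()
    ]


def fetch_with_failover(
    spec: str,
    fetch: Callable[[str], Optional[bytes]],
    rng: random.Random,
) -> Optional[bytes]:
    """Try each group in order; within a group, try proxies in random order."""
    for group in parse_proxy_chain(spec):
        candidates = list(group)
        rng.shuffle(candidates)          # spread load across the group
        for proxy in candidates:
            data = fetch(proxy)          # None signals a failed request
            if data is not None:
                return data
    return None                          # every proxy in every group failed


# Hypothetical site: two local squids, one remote backup.
SPEC = ("http://squid1.example.org:3128|http://squid2.example.org:3128;"
        "http://backup.example.org:3128")
DOWN = {"http://squid1.example.org:3128", "http://squid2.example.org:3128"}


def fake_fetch(proxy: str) -> Optional[bytes]:
    return None if proxy in DOWN else f"served by {proxy}".encode()


print(fetch_with_failover(SPEC, fake_fetch, random.Random(0)))
```

With both local squids down, the request falls through to the backup group; if only one local squid fails, its partner in the same group answers.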

VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw

Athena software will be installed in $VO_ATLAS_SW_DIR/software/<version>
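A job wrapper could resolve a release directory from this layout as sketched below. The function name is hypothetical, the fallback path is simply the value quoted above, and "16.6.3" is only an example version string; real sites may set `VO_ATLAS_SW_DIR` differently.

```python
import os
import posixpath
from typing import Mapping, Optional

# Value quoted on the slide; individual sites may export something else.
DEFAULT_SW_DIR = "/cvmfs/atlas.cern.ch/repo/sw"


def athena_release_dir(version: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return $VO_ATLAS_SW_DIR/software/<version>, falling back to the CVMFS path."""
    if env is None:
        env = os.environ
    sw_dir = env.get("VO_ATLAS_SW_DIR", DEFAULT_SW_DIR)
    return posixpath.join(sw_dir, "software", version)


print(athena_release_dir("16.6.3", env={}))
```

Passing the environment in explicitly keeps the helper easy to test without touching the real process environment.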

Cern VM File System (CVMFS)


· /cvmfs/atlas.cern.ch: Production Software

– 5 release managers, 24 releases SLC4 + 31 releases SLC5

– 590 GB, 11 million files, 16 million entries (shadow tree)

– 85 GB and 1.5 million files (repository)

· /cvmfs/atlas-condb.cern.ch: ATLAS Conditions Flat Files

– Release manager machine hosted by CERN IT

– Automatic update several times a day

– 30 GB, 110,000 files, 7,000 directories, 3,000 symlinks (shadow tree)

– 30 GB, 70,000 files (repository)

– Only a fraction of all conditions data

· /cvmfs/atlas-nightlies.cern.ch: ATLAS Nightlies (to be done)
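The gap between the shadow-tree and repository figures above is consistent with each distinct file being stored only once, named by the SHA-1 hash of its content, and compressed. The toy store below illustrates that idea; it is not the CVMFS implementation or its on-disk format, and the class name and paths are hypothetical.

```python
import hashlib
import zlib


class ContentAddressedStore:
    """Toy store: each distinct file content is kept once, compressed,
    under the SHA-1 hex digest of its uncompressed bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}   # digest -> compressed content
        self.catalog: dict[str, str] = {}     # logical path -> digest

    def add(self, path: str, content: bytes) -> str:
        digest = hashlib.sha1(content).hexdigest()
        if digest not in self.objects:        # identical files share one object
            self.objects[digest] = zlib.compress(content)
        self.catalog[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        digest = self.catalog[path]
        data = zlib.decompress(self.objects[digest])
        # Immutable, hash-named objects can be verified on every read.
        if hashlib.sha1(data).hexdigest() != digest:
            raise ValueError(f"corrupt object for {path}")
        return data

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.objects.values())


store = ContentAddressedStore()
lib = b"\x7fELF" + b"\x00" * 10_000                   # stand-in for a shared library
store.add("/sw/release-a/lib/libExample.so", lib)
store.add("/sw/release-b/lib/libExample.so", lib)     # same bytes, new path
print(len(store.catalog), "paths,", len(store.objects), "object,",
      store.stored_bytes(), "bytes stored vs", 2 * len(lib), "logical")
```

Because objects never change once written, clients and caches can keep them indefinitely and check integrity simply by re-hashing.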


• Merging of output files for analysis jobs

• Event picking in PD2P

• Optimize rebrokerage of jobs (general agreement to reduce from 3 days to 1 day for analysis jobs)

• LFC registration by the PanDA server

• Pilot changes to support new functionality

• T1T1 PD2P for secondary copies based on usage

– Split dataset containers based on MoU shares

• Further copies made to T2s based on job backlogs (log10 function)

• Algorithm will be tuned with experience

– Maybe add popularity and history (needs additional tables in the DB)
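The slide names the ingredients of the PD2P policy but not the formulas, so the following is a hypothetical sketch: one extra T2 copy per decade of job backlog (floor of log10), and a container split across T1s in proportion to MoU shares using largest-remainder rounding. The cap of 5 copies, the T1 names and the share values are all made up for illustration.

```python
def extra_t2_copies(waiting_jobs: int, max_copies: int = 5) -> int:
    """One extra T2 copy per decade of backlog: floor(log10(waiting_jobs)).

    Uses the digit count, which equals floor(log10(n)) exactly for n >= 1,
    avoiding floating-point rounding at powers of ten.
    """
    if waiting_jobs < 1:
        return 0
    return min(max_copies, len(str(waiting_jobs)) - 1)


def split_by_shares(datasets: list[str], shares: dict[str, int]) -> dict[str, list[str]]:
    """Split a container's datasets across T1s in proportion to their shares
    (largest-remainder rounding, so every dataset is assigned exactly once)."""
    total = sum(shares.values())
    quotas = {t1: len(datasets) * s / total for t1, s in shares.items()}
    counts = {t1: int(q) for t1, q in quotas.items()}
    leftover = len(datasets) - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda t1: quotas[t1] - counts[t1], reverse=True)
    for t1 in by_remainder[:leftover]:
        counts[t1] += 1
    result, start = {}, 0
    for t1 in shares:                       # contiguous slices, in share order
        result[t1] = datasets[start:start + counts[t1]]
        start += counts[t1]
    return result


print([extra_t2_copies(n) for n in (0, 9, 10, 999, 1000, 10**7)])
container = [f"ds{i}" for i in range(7)]
split = split_by_shares(container, {"T1_A": 50, "T1_B": 30, "T1_C": 20})
print({t1: len(v) for t1, v in split.items()})
```

A logarithmic rule grows copies slowly, so a very large backlog does not trigger a flood of replicas; tuning would replace the cap and base as experience accumulates.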


Backups


• Revise model due to the trigger rate increase from 200 Hz to 400 Hz

• No long-term storage of ESDs on disk for physics streams

– Muon, JetTauEtmiss, Egamma, MinBias

• Do not store any ESD on tape

– This removes the possibility of reprocessing from ESD

• Provide 2 replicas of ESD from all physics streams for ~10% of the data

– Last 2 months ‘rolling buffer’ for Tier0 produced ESDs

– Specific data period corresponding to ~10% of the data for ESDs from reprocessing

• Provide 2 replicas of ESD for some small streams

– CosmicCalo, ZeroBias, Standby & express

• Reduce the ESD size by ~30% by dropping unused or redundant information

• Provide 1 copy of RAW data on disk for the physics stream for data taken in the last year

– In addition to the copy of RAW on tape

– Compress RAW data on disk (can achieve a factor of ~2)
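Taken together, the physics-stream changes above shrink the disk footprint considerably. The arithmetic below uses only the factors quoted on the slide and expresses each result as a fraction of one full, uncompressed copy of that format; it treats "~10%", "~30%" and "~2" as exact and ignores the small streams, so it is a rough estimate only.

```python
# Fractions relative to one full, uncompressed copy of each format.
ESD_REPLICAS = 2            # "2 replicas of ESD"
ESD_FRACTION = 0.10         # "~10% of the data"
ESD_SIZE_FACTOR = 1 - 0.30  # "reduce the ESD size by ~30%"
RAW_DISK_COPIES = 1         # "1 copy of RAW data on disk"
RAW_COMPRESSION = 2         # "factor of ~2"

esd_on_disk = ESD_REPLICAS * ESD_FRACTION * ESD_SIZE_FACTOR
raw_on_disk = RAW_DISK_COPIES / RAW_COMPRESSION

print(f"ESD on disk: {esd_on_disk:.2f} x one full ESD copy")
print(f"RAW on disk: {raw_on_disk:.2f} x one full RAW copy")
```

In other words, physics-stream ESD on disk drops to roughly 14% of a single full copy, and the RAW disk copy costs about half its uncompressed size.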
