CERN Site Report

Andrei Dumitru CERN IT Department 12th of October 2020


CERN

European Organisation for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)

Founded in 1954, 23 member states today
World's largest particle physics laboratory
Located at the Franco-Swiss border near Geneva

≈2500 staff members, ≈17000 users


CERN IT Department

Enabling the laboratory to fulfill its mission
Providing ICT services for the laboratory
Main data centre at the Meyrin site

> 485 PB of data stored


Computing Facilities

- Preparing the infrastructure for installations planned over the next 18 months

- ~4000 CPU servers

- ~900 JBODs (300 PB) of storage

- Installation and retirement activities impacted by CERN safe mode (COVID)

- New Data Centre (PCC): tender underway

- Upgrade of water-cooled racks in the Vault completed (10-year-old doors replaced)

- Updated ServiceNow user portal (moving away from custom development to out-of-the-box)

- ServiceNow updated to the New York release

- Normal CPU and storage procurements, but with delivery delays due to COVID


Computing Facilities - openDCIM

Good progress made with configuring openDCIM for use at CERN


Configuration and Tools

Puppet 6 upgrade under way

- automatic testing of Puppet 6 catalogue compilation, with visualisation of catalogue compilation differences to ease the migration (see the sketch below)
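
To illustrate the catalogue-diff idea, here is a minimal Python sketch that compares two compiled Puppet catalogues exported as JSON (for example, one compiled under Puppet 5 and one under Puppet 6) and lists added, removed and changed resources. The file names and the export workflow are assumptions, not the actual CERN tooling.

```python
#!/usr/bin/env python3
"""Minimal sketch: diff two compiled Puppet catalogues in JSON format."""
import json
import sys


def load_resources(path):
    """Return a dict keyed by (type, title) -> parameters for one catalogue."""
    with open(path) as fh:
        catalog = json.load(fh)
    return {
        (r["type"], r["title"]): r.get("parameters", {})
        for r in catalog.get("resources", [])
    }


def diff_catalogs(old_path, new_path):
    """Compare two catalogues and return added, removed and changed resources."""
    old, new = load_resources(old_path), load_resources(new_path)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed


if __name__ == "__main__":
    # Hypothetical usage: catalog_diff.py node.puppet5.json node.puppet6.json
    added, removed, changed = diff_catalogs(sys.argv[1], sys.argv[2])
    for label, items in (("added", added), ("removed", removed), ("changed", changed)):
        print(f"{label}: {len(items)}")
        for rtype, title in items:
            print(f"  {rtype}[{title}]")
```

Visualising the differences (as mentioned above) would sit on top of output like this; the real pipeline is not shown here.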

Strengthening bastion hosts

- Two-factor authentication now in production on the CERN administrative bastion (aiadm)

- Access to Secrets Service (TEIGI) applications restricted to bastion nodes only

Deployed a prototype to get Podman, Docker and OCI containers working on LXPLUS

- accessible to users for testing


Configuration and Tools

Automatic OpenStack image generation and validation (see the sketch at the end of this slide)

- Generates new OpenStack CC7 and CC8 images and runs automatic tests on VMs and on Ironic bare metal

- Likewise for Docker images, which are uploaded to the GitLab registry and Docker Hub

- These new image-generation pipelines ran autonomously end-to-end

Studied options to provide crash core dump analysis infrastructure

- Deployed FAF (ABRT Analytics) in production by enabling it through the abrt module

Implemented GitLab-driven package distribution from source software packages (rpmci)
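
As a rough illustration of the image validation step mentioned above, the sketch below uploads a freshly built image, boots a throwaway test VM from it and cleans up, using the openstacksdk Python library. The cloud name, image file, flavor and network names are placeholders, not the actual CERN pipeline configuration.

```python
#!/usr/bin/env python3
"""Minimal sketch: upload a newly built image and smoke-test it on OpenStack."""
import openstack

# Credentials come from the environment or clouds.yaml ("mycloud" is a placeholder).
conn = openstack.connect(cloud="mycloud")

# Upload the image produced by the build stage (file name is illustrative).
image = conn.image.create_image(
    name="cc8-test-candidate",
    filename="cc8.qcow2",
    disk_format="qcow2",
    container_format="bare",
)

# Boot a throwaway VM from the candidate image and wait for it to become ACTIVE.
server = conn.compute.create_server(
    name="image-validation-vm",
    image_id=image.id,
    flavor_id=conn.compute.find_flavor("m1.small").id,
    networks=[{"uuid": conn.network.find_network("test-net").id}],
)
server = conn.compute.wait_for_server(server)
print(f"Server {server.name} is {server.status}")

# Remove the test VM; keeping or promoting the image on success is not shown.
conn.compute.delete_server(server)
```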


Monitoring

Migrated collectd for the CERN DC and services to CentOS 8

- base metrics and alarms

Completed integration of Prometheus metrics and alarms in MONIT (see the sketch below)

Introduced a new 13-month data retention policy for logs in MONIT

- implements current data retention recommendations

Completed migration of the WLCG SSB and SAM3 dashboards and reports to MONIT
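
For context on the Prometheus integration, here is a minimal sketch of a service exposing a metric that a Prometheus-based scraping pipeline (and, downstream, MONIT) could pick up, using the standard prometheus_client library; the metric name and port are illustrative.

```python
#!/usr/bin/env python3
"""Minimal sketch: expose a service metric for Prometheus scraping."""
import random
import time

from prometheus_client import Gauge, start_http_server

# Example gauge; the name is a placeholder, not an actual MONIT metric.
QUEUE_LENGTH = Gauge("demo_service_queue_length",
                     "Current length of the demo service queue")

if __name__ == "__main__":
    start_http_server(8000)  # metrics served on http://localhost:8000/metrics
    while True:
        QUEUE_LENGTH.set(random.randint(0, 100))  # stand-in for a real measurement
        time.sleep(15)
```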


Batch

- Batch pre-emptible jobs in production with the experiments (a submission sketch follows this list)

- Azure spot, local cloud and SLURM backfill

- Significantly improved monitoring, including user-facing job profiling tools

- GPUs in production for batch, interactive and visualisation platforms (HSE)

- Service costing prototyping activity started, with engagement from all IT groups

- Significantly improved automation (Terraform) tooling in production for both local and external cloud resources

- VOMS upgraded to an OC11-proof (data privacy protection) version
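
The CERN batch service is based on HTCondor; as a rough illustration, the sketch below submits a single GPU job via the htcondor Python bindings. The executable and the ClassAd attributes shown are illustrative, and in particular the attribute flagging a job as pre-emptible is a placeholder, not the production configuration.

```python
#!/usr/bin/env python3
"""Minimal sketch: submit a GPU batch job with the HTCondor Python bindings."""
import htcondor

submit_description = htcondor.Submit({
    "executable": "train.sh",   # hypothetical user payload
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "4",
    "request_gpus": "1",        # ask the pool for one GPU
    "request_memory": "8GB",
    "+Preemptible": "True",     # placeholder custom attribute for pre-emptible jobs
})

schedd = htcondor.Schedd()      # talk to the local schedd
result = schedd.submit(submit_description, count=1)
print(f"Submitted cluster {result.cluster()}")
```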


This Tuesday

Roadmap for the DNS Load Balancing Service at CERN by Kristian Kouros

This Thursday

Anomaly detection for the Centralized Elasticsearch service by Ulrich Schwickerath

CERN Cloud Infrastructure status update & Kubernetes in the CERN Cloud by Thomas George Hartland


Storage

EOSHOME / CERNBox: 12 PB (+44%)

- Sync&Share, SAMBA, FuseX

- 38K home folders (DFS migration)

CEPH: 16 PB (+37%)

- RBD, S3, CEPHFS

- Major version upgrades (Nautilus)

CVMFS (+48%)

- Major optimizations with S3 backend

AFS (-10%)

Business Continuity

- S3 is heavily used for internal backups (e.g. restic backups for CERNBox); see the sketch at the end of this slide

- Second S3 region at the Prévessin site deployed in 2020

- Part of a wider Business Continuity plan

- replication and snapshots across availability zones

- review of network dependencies (single switch, etc.)
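
As a rough illustration of S3-based internal backups, the sketch below uploads a backup archive to one S3 endpoint and checks that the object is visible at a second-region endpoint, using boto3. The endpoint URLs, bucket name and credentials are placeholders; the actual setup relies on restic and server-side replication rather than a manual check like this.

```python
#!/usr/bin/env python3
"""Minimal sketch: write a backup object to S3 and verify it in a second region."""
import boto3

BUCKET = "internal-backups"  # hypothetical bucket name

primary = boto3.client(
    "s3",
    endpoint_url="https://s3.region-a.example.ch",  # placeholder primary endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
secondary = boto3.client(
    "s3",
    endpoint_url="https://s3.region-b.example.ch",  # placeholder second-region endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload today's backup archive to the primary region.
primary.upload_file("cernbox-backup.tar", BUCKET, "cernbox-backup.tar")

# Verify the object is visible in the second region once replication has caught up.
head = secondary.head_object(Bucket=BUCKET, Key="cernbox-backup.tar")
print(f"Replica size: {head['ContentLength']} bytes")
```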


Storage

CERN Tape Archive (CTA) deployed in production

- ATLAS and ALICE have moved from CASTOR to CTA

- CTA is the archival platform for Run 3


Storage

New tape library

- CERN deployed its first production Spectra Logic tape library as part of the Run 3 preparation

- The library will be dedicated to LTO technology

LHC Run 3 preparation

- R&D activity for the EOS ALICE O2 storage cluster

- FTS evolution (tokens, QoS, archive monitoring)


Storage - Sync&Share Storage Federation

CS3MESH project coordinated by CERN

- 13 partners, 6M EUR total budget, 2020-2022

Main objectives

- Federation of on-premise Sync&Share services

  - CERNBox, SURFdrive, SWITCHdrive, etc.

- Borderless sharing

- Federated

  - Collaboration Services (e.g. OnlyOffice, CodiMD)

  - Science Environments (e.g. SWAN/Jupyter)

  - Digital Repositories (e.g. Zenodo)

- Joint efforts with other projects (e.g. ESCAPE)


Disk Pool Manager

Community discussion on future funding of DPM devops

- Discussion started last year at the GDB

- Active DPM development at CERN culminated in 2020

- Release 1.14.0: added tokens/OIDC/macaroon support; xrootd 5 support in preparation

- CERN is ramping down to minimal necessary maintenance with reduced effort

We would like to help the DPM community (~50 sites) take its future into its own hands.


This Wednesday

EOS storage for Alice O2 by Michal Kamil Simon

CVMFS service evolution and infrastructure improvements by Enrico Bocchi


This Wednesday

FTS: Towards tokens, QoS, archive monitoring and beyond by Mihai Patrascoiu

Production deployment of the CERN Tape Archive (CTA) for ATLAS by Julien Leduc


Network Infrastructure Upgrades

Technical Network

- Replacing single HP routers with dual Juniper routers to renew the hardware and provide redundancy

- Expected to be completed by the end of LS2

Data Centre

- Brocade MLXe being replaced by Juniper QFX10008

- Backbone & external router replacement completed

- Good progress in the data centre despite COVID; work to be completed by end-2020


Network Infrastructure Upgrades

Campus network

- Upgrade of the aging campus infrastructure has started

- New switches (ICX7150) are being deployed first, then the routers will be upgraded

- All fibre infrastructure is being upgraded to be 10 Gb capable, so a new router model can be selected at the end of 2021 with no need to support 1 Gb uplinks

- All non-Aruba access points have been replaced by Aruba models

- Controller-based Wi-Fi service coverage has been extended to many technical areas

- Wi-Fi is no longer a separate service offering, but a feature of the Campus network service


External Connectivity

Upgraded 100 Gbps LHCOPN links for Nikhef, CNAF, KIT, IN2P3 and RAL (work in progress)

- Upgrades expected before Run 3 for NDGF and PIC, as well as for the KIT backup link

Two additional 100 Gbps links to GEANT expected by end of October

- LHCONE traffic

- generic R&E traffic and traffic to cloud service providers with GEANT access

Dedicated 400 Gbps link to Amsterdam is up and running, and peering has been established with Nikhef

- No data transfers yet, but discussions are ongoing about using this link for WLCG data transfer tests

- Expected to stress the storage infrastructure more than the network links


Fixed Telephony and IoT

Fixed Telephony

- Production-ready Android and iOS CERNphone applications, plus the necessary infrastructure

- IT-wide pilot to start soon!

LoRaWAN introduced as a service

- Joint effort of the IT-CS and IT-DB groups

- Successful test of signal transmission in tunnels via radiating cable

- Infrastructure to be reinforced to support COVID tracker devices


App distribution for iOS

Apple is now enforcing a restriction that apps addressing a closed community, rather than being of general interest, may not be distributed via the App Store.

- Such apps can, however, be distributed via the Developer Enterprise Program (to employees) or via Apple Business/School Manager to people whose App Store account is linked to the relevant country.

Apple has refused to allow us to distribute the CERNphone app via the App Store, and this may also affect future versions of the CERNBox client.

- The Business/School Manager programs require us to have a presence in many countries. Apple has suggested that the Developer Enterprise Program is not applicable, but we are investigating: contractors and "permitted users" may be covered, not just employees.

Institutes providing apps to students (rather than the general public) may be affected by this position.

- Use of the Business/School Manager option may be more feasible for them than for CERN, but it is likely a more complicated channel than the App Store.


This Tuesday

NOTED - identification of data transfer in FTS to understand network traffic by Joanna Waczynska

This Wednesday

Towards a redundant, robust, secure and reliable IoT network by Christoph Merscher

CERNphone - CERN’s upcoming softphone solution by German Cancio


Collaboration services

Indico 2.3 released: new paper review/editing workflow plus many other improvements

- Quite a few from external contributions (UN, IEEE)

Newdle: new scheduling tool, available as a CERN service and as open-source software

- Plans: tighter integration with Indico, room booking, and a new groupware service

Conference rooms

- CERN Main Auditorium technical upgrade with DANTE technology

- Limited deployment of the Intel Unite wireless presentation system in meeting rooms

Webcast

- Starting a project to replace the home-made lecture recording service with the open-source Opencast system

- Starting to train the MLLP automated transcription system with CERN data


Application and Devices

COVID response

- Experimental VPN access to internal resources, including licence servers and, in special cases, Active Directory and device management

- Adjusted endpoint hardware provisioning and repair procedures to allow for off-site pick-up and configuration

- Leveraged CERNBox for backing up work while users telework (information campaign)

- Scaled out the infrastructure


Infrastructure scale-out for telework

Scale-out of the infrastructure required to cope with increased load due to telework

- Windows Terminal Server clusters scaled out to serve 4000 interactive sessions per day

- Terminal Services Gateway, which provides access to office PCs and to the terminal servers dedicated to the accelerator complex, scaled out to serve up to 1000 concurrent connections


Windows Client Support and DFS to CERNBox homedir migration

Windows Client Support: completed the migration from Windows 7 to Windows 10

- Massive in-place migration campaign over many months

- Some machines remained on Windows 7, mostly due to hardware or software compatibility issues; other mitigations apply

DFS to CERNBox home folder migration

- Migrated 30000 home folders since September 2019

- Mostly transparent to end users (folders remapped)

- Training courses organised to demonstrate new features (collaborative work, sharing folders, etc.)

- New accounts get CERNBox home folders by default


This Monday

CERN Appstore for BYOD devices by Tamas Bato


Computer Security

This Tuesday

Computer Security Update by Nikolaos Filippakis


Talks from CERN this week

Computer Security Update

- by Nikolaos Filippakis

Anomaly detection for the Centralized Elasticsearch service at CERN

- by Ulrich Schwickerath

Simulated phishing campaigns at CERN

- by Sebastian Lopienski

CERN Cloud Infrastructure status update

- by Thomas George Hartland

Kubernetes in the CERN Cloud

- by Thomas George Hartland

EOS storage for Alice O2

- by Michal Kamil Simon


Talks from CERN this week

CERN’s Business Continuity Working Group

- by Helge Meinhard

CERNphone - CERN’s upcoming softphone solution

- by German Cancio

FTS: Towards tokens, QoS, archive monitoring and beyond

- by Mihai Patrascoiu

Roadmap for the DNS Load Balancing Service at CERN

- by Kristian Kouros

MySQL High Availability in the Database On Demand service

- by Abel Cabezas Alonso

Towards a redundant, robust, secure and reliable IoT network

- by Christoph Merscher


Talks from CERN this week

NOTED - identification of data transfer in FTS to understand network traffic

- by Joanna Waczynska

CVMFS service evolution and infrastructure improvements

- by Enrico Bocchi

MAlt Project

- by Maite Barroso Lopez

CERN Appstore for BYOD devices

- by Tamas Bato

Production deployment of the CERN Tape Archive (CTA) for ATLAS

- by Julien Leduc


home.cern
