CERN Site Report
Andrei Dumitru, CERN IT Department
12 October 2020
CERN
European Organisation for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
Founded in 1954, 23 member states today
World’s largest particle physics laboratory
Located on the Franco-Swiss border near Geneva
≈2500 staff members, ≈17000 users
CERN IT Department
Enabling the laboratory to fulfill its mission
Providing ICT services for the laboratory
Main data centre on the Meyrin site
> 485 PB of data stored
Computing Facilities
- Preparing the infrastructure for installations planned over the next 18 months
  - ~4000 CPU servers
  - ~900 JBODs (300 PB) of storage
- Installation and retirement activities impacted by CERN safe-mode (COVID)
- New Data Centre (PCC) tender underway
- Upgrade of water-cooled racks in the Vault completed (10-year-old doors replaced)
- Updated ServiceNow user portal (moving away from custom development to out-of-the-box)
- ServiceNow updated to the New York release
- Normal CPU and storage procurements, but with delivery delays due to COVID
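As a rough sanity check on the storage figure above, the average capacity per JBOD can be worked out from the two quoted numbers. This is a minimal sketch, assuming decimal units (1 PB = 1000 TB) and that the 300 PB total is spread evenly over the ~900 enclosures; the function name is illustrative only and is not taken from any CERN tool.

```python
def average_capacity_tb(total_pb: float, enclosures: int) -> float:
    """Return the average capacity per enclosure in TB (assumes 1 PB = 1000 TB)."""
    if enclosures <= 0:
        raise ValueError("enclosures must be positive")
    return total_pb * 1000 / enclosures


# Figures quoted on the slide: ~900 JBODs holding 300 PB in total.
per_jbod_tb = average_capacity_tb(300, 900)
print(f"~{per_jbod_tb:.0f} TB per JBOD on average")
```

Under these assumptions the result is only an order-of-magnitude estimate, since both inputs are approximate.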
Computing Facilities - openDCIM
Good progress made in configuring openDCIM for use at CERN
Configuration and Tools
Puppet 6 upgrade under way
- Automatic testing of Puppet 6 catalogue compilation, with visualisation of catalogue compilation differences to ease migration
Strengthening bastion hosts
- Two-factor authentication now in production on the CERN administrative bastion (aiadm)
- Access to Secrets Service (TEIGI) applications restricted to bastion nodes only
Deployed a prototype to get Podman, Docker and OCI containers working on LXPLUS
- Accessible to users for testing
Automatic OpenStack Image generation and validation
- Generates new OpenStack CC7 and CC8 images and automatically tests them on VMs and Ironic bare metal
- Likewise for Docker images, which are uploaded to the GitLab registry and Docker Hub
- These new image-generation pipelines run autonomously end-to-end
Studied options to provide Crash Core Dump analysis infrastructure
- Deployed FAF (ABRT Analytics) in production, activated via the abrt module
Implemented GitLab-driven Package Distribution from Source Software Packages (rpmci)
Monitoring
Migrated Collectd for the CERN DC and services to CentOS 8
- Base metrics and alarms
Completed integration of Prometheus metrics and alarms in MONIT
Introduced a new 13-month data retention policy for logs in MONIT
- Implements current data retention recommendations
Completed migration of WLCG SSB and SAM3 dashboards and reports to MONIT
Batch
- Batch pre-emptible jobs in production with experiments
  - Azure spot, local cloud and SLURM backfill
- Significantly improved monitoring, including user-facing job profiling tools
- GPUs in production for batch, interactive and visualisation platforms (HSE)
- Service costing prototyping activity started with engagement from all IT groups
- Significantly improved automation tools (Terraform) in production for both local and external cloud resources
- VOMS upgrade to OC11-proof (data privacy protection) version
This Tuesday
Roadmap for the DNS Load Balancing Service at CERN by Kristian Kouros
This Thursday
Anomaly detection for the Centralized Elasticsearch service by Ulrich Schwickerath
CERN Cloud Infrastructure status update & Kubernetes in the CERN Cloud by Thomas George Hartland
Storage
EOSHOME / CERNBox: 12PB (+44%)
- Sync&Share, SAMBA, FuseX
- 38K home folders (DFS migration)
CEPH: 16PB (+37%)
- RBD, S3, CEPHFS
- Major version upgrades (Nautilus)
CVMFS (+48%)
- Major optimizations with S3 backend
AFS (-10%)
Business Continuity
- S3 is intensely used for internal backups (e.g. RESTIC backup for CERNBox)
- Second S3 region deployed at the Prevessin site in 2020
- Part of a wider Business Continuity plan
  - Replication and snapshots across availability zones
  - Review of network dependencies (single switch, etc.)
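The growth figures quoted for EOSHOME / CERNBox and CEPH on this slide can be turned back into approximate earlier volumes. The sketch below assumes, since the slide does not say so, that each percentage is growth relative to a single earlier reference point, so that previous = current / (1 + growth). The function name is hypothetical.

```python
def previous_capacity_pb(current_pb: float, growth_percent: float) -> float:
    """Back out the earlier volume from the current one and a growth percentage."""
    if growth_percent <= -100:
        raise ValueError("growth_percent must be greater than -100")
    return current_pb / (1 + growth_percent / 100)


# Values as quoted on the slide; the reference period is an assumption.
services = [("EOSHOME / CERNBox", 12, 44), ("CEPH", 16, 37)]
for name, current_pb, growth in services:
    earlier = previous_capacity_pb(current_pb, growth)
    print(f"{name}: about {earlier:.1f} PB before, {current_pb} PB now")
```

CVMFS and AFS are left out because the slide gives only their percentages, not absolute volumes.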
CERN Tape Archive (CTA) deployed in production
- ATLAS and ALICE have moved from CASTOR to CTA
- CTA is the archival platform for Run 3
New tape library
- CERN deployed its first production SpectraLogic tape library as part of Run 3 preparation
- The library will be dedicated to LTO technology
LHC Run 3 preparation
- R&D activity for EOS ALICE O2 storage cluster
- FTS evolution (tokens, QoS, archive monitoring)
Storage - Sync&Share Storage Federation
CS3MESH project coordinated by CERN
- 13 partners, 6M EUR total budget, 2020-2022
Main objectives
- Federation of on-premise Sync&Share services
  - CERNBox, SURFdrive, SWITCHdrive, etc.
- Borderless sharing
- Federated
  - Collaboration Services (e.g. OnlyOffice, CodiMD)
  - Science Environments (e.g. SWAN/Jupyter)
  - Digital Repositories (e.g. Zenodo)
- Joint efforts with other projects (e.g. ESCAPE)
Disk Pool Manager
Community discussion on future funding of DPM devops
- Discussion started last year at GDB
- Active DPM development at CERN culminated in 2020
- Release 1.14.0: added tokens/OIDC/macaroon support, xrootd5 in preparation
- CERN is ramping down to minimal necessary maintenance with reduced effort
We would like to help the DPM community (~50 sites) to take the future into their own hands.
This Wednesday
EOS storage for Alice O2 by Michal Kamil Simon
CVMFS service evolution and infrastructure improvements by Enrico Bocchi
FTS: Towards tokens, QoS, archive monitoring and beyond by Mihai Patrascoiu
Production deployment of the CERN Tape Archive (CTA) for ATLAS by Julien Leduc
Network Infrastructure Upgrades
Technical Network
- Replacing single HP routers with dual Juniper routers to renew hardware and provide redundancy
- Expected to complete by the end of LS2
Data Centre
- Brocade MLXE being replaced by Juniper QFX10008
- Backbone & External router replacement completed
- Good progress in the data centre despite COVID - work to complete by end-2020
Campus network
- Upgrade of aging campus infrastructure has started
- Deploy new switches (ICX7150) first, then upgrade the routers
- Upgrade all fibre infrastructure to be 10Gb capable, so that the new router model can be selected at the end of 2021 with no need to support 1Gb uplinks
- All non-Aruba access points have been replaced by Aruba models
- Controller-based Wi-Fi service coverage has been extended to many technical areas
- Wi-Fi is no longer a separate service offering, but a feature of the Campus network service
External Connectivity
Upgraded 100Gbps LHCOPN links for Nikhef, CNAF, KIT, IN2P3 and RAL (WIP)
- Upgrades expected before Run 3 for NDGF and PIC, as well as the KIT backup link
Two additional 100Gbps links to GEANT expected by end-October
- LHCONE traffic
- Generic R&E traffic and traffic to cloud service providers with GEANT access
Dedicated 400Gbps link to Amsterdam is up and running, with peering established with Nikhef
- No data transfer as yet, but discussions are ongoing to use this link for WLCG data transfer tests
- Expected to stress the storage infrastructure more than the network links
Fixed Telephony and IoT
Fixed Telephony
- Production-ready Android and iOS CERNphone application plus the necessary infrastructure
- IT-wide pilot to start soon!
LoRaWAN introduced as a service
- Joint effort from IT-CS & IT-DB groups
- Successful test of signal transmission in tunnels via radiating cable
- Infrastructure to be reinforced to support COVID tracker devices
App distribution for iOS
Apple is now enforcing a restriction on App Store distribution of apps that address a closed community rather than being of general interest.
- Such apps can, however, be distributed via the Developer Enterprise Program (to employees) or via the Apple Business/School Manager to people who have an App Store account linked to the relevant country.
Apple has refused to allow us to distribute the CERNphone app via the App Store, and this may also affect future versions of the CERNBox client.
- The Business/School Manager programs require us to have a presence in many countries. Apple have suggested the Developer program is not applicable, but we are investigating: contractors and "permitted users" may be covered, not just employees.
Institutes providing apps to students (rather than the general public) may be affected by this position.
- For them, the Business/School Manager option may be more feasible than it is for CERN, but it is likely a more complicated channel than the App Store.
This Tuesday
NOTED - identification of data transfer in FTS to understand network traffic by Joanna Waczynska
This Wednesday
Towards a redundant, robust, secure and reliable IoT network by Christoph Merscher
CERNphone - CERN’s upcoming softphone solution by German Cancio
Collaboration services
Indico 2.3 released - new paper review/editing workflow + many other improvements
- Quite a few are external contributions (UN, IEEE)
Newdle - new scheduling tool as a CERN service and Open Source software
- Plans: tighter integration with Indico, room booking, and new groupware service
Conference rooms
- CERN Main Auditorium technical upgrade with DANTE technology
- Limited deployment of Intel Unite wireless presentation system in meeting rooms
Webcast
- Starting project to replace home-made lecture recording service with Open Source Opencast system
- Starting to train the MLLP automated transcription system with CERN data
Application and Devices
COVID response
- Experimental VPN access to allow access to internal resources, including licence servers and - in special cases - Active Directory and device management
- Adjusted endpoint hardware provisioning and repair procedures to allow for off-site pick-up and configuration
- Leveraged CERNBox for work backup while users telework (info campaign)
- Scaled out infrastructure
Infrastructure scale-out for telework
Scale-out of the infrastructure required to cope with increased load due to telework
- Windows Terminal Server clusters scaled out to serve 4000 interactive sessions per day
- Terminal Services Gateway - for access to office PCs and to terminal servers dedicated to the accelerator complex - scaled out to serve up to 1000 concurrent connections
Windows Client Support and DFS to CERNBox homedir migration
Windows Client Support - completed migration from Windows 7 to Windows 10
- Massive in-place migration campaign over many months
- Some machines remained on Windows 7, mostly due to hardware or software compatibility issues - other mitigations apply
DFS to CERNBox homefolder migration
- Migrated 30000 home folders since Sep ’19
- Mostly transparent to end-users (folders remapped)
- Training courses organised to demonstrate new features (collaborative work, sharing folders, etc.)
- New accounts get CERNBox home folders by default
This Monday
CERN Appstore for BYOD devices by Tamas Bato
Computer Security
This Tuesday
Computer Security Update by Nikolaos Filippakis
Talks from CERN this week
Computer Security Update
- by Nikolaos Filippakis
Anomaly detection for the Centralized Elasticsearch service at CERN
- by Ulrich Schwickerath
Simulated phishing campaigns at CERN
- by Sebastian Lopienski
CERN Cloud Infrastructure status update
- by Thomas George Hartland
Kubernetes in the CERN Cloud
- by Thomas George Hartland
EOS storage for Alice O2
- by Michal Kamil Simon
CERN’s Business Continuity Working Group
- by Helge Meinhard
CERNphone - CERN’s upcoming softphone solution
- by German Cancio
FTS: Towards tokens, QoS, archive monitoring and beyond
- by Mihai Patrascoiu
Roadmap for the DNS Load Balancing Service at CERN
- by Kristian Kouros
MySQL High Availability in the Database On Demand service
- by Abel Cabezas Alonso
Towards a redundant, robust, secure and reliable IoT network
- by Christoph Merscher
NOTED - identification of data transfer in FTS to understand network traffic
- by Joanna Waczynska
CVMFS service evolution and infrastructure improvements
- by Enrico Bocchi
MAlt Project
- by Maite Barroso Lopez
CERN Appstore for BYOD devices
- by Tamas Bato
Production deployment of the CERN Tape Archive (CTA) for ATLAS
- by Julien Leduc
home.cern