U.S. CMS Software and Computing Project
Lothar A T Bauerdick, Fermilab
U.S. CMS Collaboration Meeting at UC Riverside, May 2001
CMS Software and Computing
Scale of LHC computing is greater than anything we have tried in HEP
LHC Collaboration
Number of collaborating physicists
Number of collaborating institutions
Number of collaborating nations
Geographic distribution
Projected length of the experiment
Size and complexity of dataset
Distributed object database
This demands a new model of computing
CMS worldwide: 1800 physicists, 150 institutes, 30 countries
U.S. Part of World-Wide Computing for CMS
LHC Computing Model:
distributed multi-tier architecture
Regional Computing centers outside CERN
CMS Tier1 centers:
U.S. Tier1 at Fermilab
also Lyon, RAL, INFN, Moscow
CMS Tier2 centers:
5 U.S. Tier2 centers
CMS in total ~25
Coordinated use of distributed computing resources
Communication and collaboration at a distance
Data Grids as a new form of distributed systems
Software Engineering:
distributed software development for physics analysis
software and computing R&D
Research e.g. in Data Grids, distributed object database, …
data challenges as check-points
prototype software and prototype computing installations
funding for prototypes during 2001 to 2004
These are key elements to its eventual success
(U.S.) CMS Distributed Computing Model
Tiered Computing model puts U.S. CMS Computing at Labs and Universities
Draws strength from and exploits the synergy between
U.S. Universities and Fermilab
Software Professionals and Physicists
Takes advantage of and contributes to key developments in the US in information technology
Drive towards a high speed network infrastructure
Development of ever better network software and applications such as Grid computing concepts
Significant funding, manpower and technologies from Grid projects
It takes advantage through the Tier2 centers of the significant strengths of US universities in the area of computer science and information technology
Software and Computing Deliverables
Deliverables of the S&C Project
CORE SOFTWARE: Engineered Infrastructure such as distributed object database, software development, analysis tools and processes
MAJOR USER FACILITIES and Services: Tier1 and Tier2 Regional Centers, production management systems for distributed analysis, interface to wide area networks
Non-Deliverables of the S&C Project
Sub-detector and Physics Software
Local Computing (Workgroup servers, desktops) at home institutions
NB: Physicists working on computing R&D, management are off-project
Will be covered by S&C MOU
Computing R&D + User Facility
Develop software / systems architectures and implementations to deal with a new level of complexity:
Complexity in data: rate, #channels, pile up
Complexity in data structures: persistent objects, database computing, Grid
Support for an active and growing user community -> User Facility
Detector simulation and reconstruction, Higher Level Trigger, Physics TDR
“Production” in 2000/1:
~5M events simulated and reconstructed worldwide (1.5M at Fermilab)
~10 Terabyte data stored in an Object Database
Distributed computing strategies have to be verified by applying them to real world computing problems!
Tier1 and Tier2 prototypes start in 2000/1, increase complexity
techniques and strategies for distributed computing
operation and management of complex systems
50% complexity in 2004
Resources are in 3 Categories
User Facilities Equipment
Prototypes
Tier1 at Fermilab
Tier2s at 5 U.S. Institutions
User Facilities Staff
Computing Professionals for Computing-related R&D
Tier1 at Fermilab
Tier2 Maintenance
Core Application Engineers
Computing Professionals/Software Engineers for CMS Core Software
Support for specific U.S. activities
Prototype: 50% complexity in 2004
User Facilities: 3 Phases
Tier1 and Tier2 regional centers: R&D, Equipment, Staff
Prototyping: has started in 2000
Computing R&D
Computing hardware prototyping and test-beds
Computing for Physics Reconstruction and Selection
Deployment: 2005-2007
Assumes LHC startup in 2006 and design luminosity in 2007
Procurement Model: Start deployment in 2005, 30%, 30%, 40% costs
Ramp-up of User Facility Staff
Maintenance and Operations: 2007 on
Constant staff level
“Rolling Replacement” of hardware components, yearly budget 1/3 of initial investment
Buy late, but not too late …
Requirements based on the Computing Model allow us to estimate the Computing Cost
Large extrapolation factors O(20)
Large uncertainties -> build to cost!
Tier1 Computing Resources needed in 2007:
CPU 200k SI95 (~5000 1GHz Pentium III)
Disk ~1 Petabyte
Tapes 1.5 Petabyte
Initial Computing Hardware Costs: $10M ~ O(1 RunII exp) = $9.1M
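The Tier1 sizing and the procurement model lend themselves to a quick back-of-the-envelope calculation. The sketch below is only illustrative: the function and constant names are invented here, the $10M figure is taken as the initial investment, and mapping the 30%/30%/40% split onto 2005, 2006 and 2007 is an assumption based on the stated deployment window.

```python
# Back-of-the-envelope sketch of the Tier1 sizing and procurement numbers.
# Function and constant names are illustrative, and mapping the 30/30/40
# split onto 2005, 2006 and 2007 is an assumption.

TIER1_CPU_SI95 = 200_000      # CPU capacity needed in 2007, in SI95
TIER1_CPU_COUNT = 5_000       # ~5000 1 GHz Pentium III
INITIAL_HW_COST = 10.0        # initial computing hardware cost, in M$


def si95_per_cpu(total_si95, cpu_count):
    """Average SI95 rating implied for one CPU."""
    return total_si95 / cpu_count


def deployment_profile(initial_cost, fractions=(0.30, 0.30, 0.40), start_year=2005):
    """Split the initial investment over consecutive deployment years."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must add up to 1")
    return {start_year + i: initial_cost * f for i, f in enumerate(fractions)}


def rolling_replacement_budget(initial_cost, fraction=1.0 / 3.0):
    """Yearly hardware replacement budget once operations start."""
    return initial_cost * fraction


if __name__ == "__main__":
    print(f"SI95 per CPU: {si95_per_cpu(TIER1_CPU_SI95, TIER1_CPU_COUNT):.0f}")
    for year, cost in deployment_profile(INITIAL_HW_COST).items():
        print(f"{year}: {cost:.1f} M$")
    print(f"Replacement per year: {rolling_replacement_budget(INITIAL_HW_COST):.2f} M$")
```

With these inputs the implied rating is 40 SI95 per CPU, the deployment spending is 3, 3 and 4 M$, and rolling replacement comes to roughly 3.3 M$ a year.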
Hardware Costs and Procurements
[Chart: hardware costs and procurements; label: Moore]
User Facilities Staff
Detailed bottom-up analysis of needs for Tier1 and Tier2 facilities
Full time staff working on User Facilities tasks:
WBS item                            2001   2002   2003   2004   2005   2006   2007
1.1 Tier 1 Regional Center          -      -      -      -      13     17     20
1.2 System and User Support         1.3    2.25   3      3      5      4.5    4.5
1.3 Operations and Infrastructure   1      1.25   1.25   3      5      4.5    4.5
1.4 Tier 2 RCs                      2.5    3.5    4      5      6.5    7.5    7.5
1.5 Networking                      0.5    1      2      2.5    3      3      3
1.6 Computing and Software R&D      3      4      4.5    1.4    0.6    0.6    0.6
1.7 Construction Phase Computing    2      2      2      0.25   -      -      -
1.8 Support FNAL based Computing    0.4    1.4    1      2.6    2.25   2      2
User Facilities (total)             10     15     18     18     35     39     42
(Without T2 Personnel)              7.5    11.5   14     13     29     32     35
Comparison to RunII: DØ / CDF have 36/33.5 FTE!
CMS Software Cycles and Milestones
The OO “Functional Prototype” phase completed in 2000
Entering new development iteration:
Fully Functional pre-production Software in 2001-2
Distributed Object Database System
Production System Software to be ready for first run in 2006
These Milestones are linked with the major CMS Milestones:
Technical Design Reports on DAQ, Software, Physics
[Chart, May 2001: CMS MILESTONES, CORE SOFTWARE; time axis 1998 to 2005; marker: End of Fortran dev]
Milestone levels: 1 Proof of Concept, 2 Funct. Prototype, 3 Fully functional, 4 Production system
Items tracked through levels 1 2 3 4: CMS GEANT4 simulation, Framework, Det Reconstruction, Physics Object Reconstruction, User Analysis Environment
Milestone dates, in chart order:
Jun-98 Dec-99 Jun-00 Dec-03
Jun-98 Dec-99 Dec-01 Dec-03
Dec-98 Dec-99 Jun-02 Jun-04
Mar-99 Jun-00 Dec-02 Dec-04
Dec-98 Jun-00 Dec-02 Dec-04
Software Engineering
U.S. contributes its share of ~25% to total Core Software Engineering effort
Main U.S. contributions in
Software Architecture
Interactive Analysis
Distributed Computing
8 Engineers now, ramping to 14
This will be subject of an iMOU on Software and Computing (End 2001)
FTE Profile CMS Core Software and Computing
Year                                 2000   2001   2002   2003   2004   2005   2006
Total CMS Offline + Online S/W FTE   16     29     40     43     44     46     42
U.S. CMS Core-SW contribution        7      8      9      10     10     10     10
[Pie chart: U.S. Share of CMS Software FTEs; legend: USA (DOE and NSF), Italy, CERN (CMS RDMS- France-Germany, UK, Hungary, Belgium, India, Other, China, Spain, Switzerland (ETHZ)]
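The ~25% U.S. share can be checked against the FTE profile with a few lines of arithmetic. This is a minimal sketch using the two series from the profile; the variable and function names are illustrative only.

```python
# Sketch: U.S. share of CMS core software FTEs, year by year and integrated.
# Data are the two series of the FTE profile; names are illustrative.

YEARS = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
TOTAL_FTE = [16, 29, 40, 43, 44, 46, 42]   # total CMS offline + online S/W FTE
US_FTE = [7, 8, 9, 10, 10, 10, 10]         # U.S. CMS core-SW contribution


def share_percent(us, total):
    """Per-year U.S. share in percent."""
    return [100.0 * u / t for u, t in zip(us, total)]


def integrated_share_percent(us, total):
    """U.S. share of the FTE-years summed over the whole period."""
    return 100.0 * sum(us) / sum(total)


if __name__ == "__main__":
    for year, pct in zip(YEARS, share_percent(US_FTE, TOTAL_FTE)):
        print(f"{year}: {pct:.1f}%")
    print(f"2000-2006 integrated: {integrated_share_percent(US_FTE, TOTAL_FTE):.1f}%")
```

The yearly share starts high in 2000 (7 of 16 FTE) and settles between roughly 22% and 24% from 2002 on; integrated over 2000 to 2006 it is 64 of 260 FTE-years, about 24.6%, in line with the quoted ~25%.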
CMS and U.S. CMS S&C
US plans and progress are important driving forces for discussions of LHC Computing at CERN and elsewhere
U.S. Model and Costing sanctioned by “Hoffmann Review”
LHC Computing Review Panels
Software Project
Worldwide Analysis/Computing Model
Management and Resources
Report of the Steering Group published end of Feb 2001
Findings
total LHC computing hardware initial costs estimated 240 MCHF
CERN Tier0 and Tier1 represents ~1/3 of overall capacity
Recognize need for Prototyping and Software Engineering
LHC Total Initial Computing Cost: ~$150M
US CMS S&C Project Costs (prelim.)
Total Costs FY01 .. 06 is 54.7 M$ (escalated) + 17.5 in FY07
Integral Funding Profile is 52.5 FY01 to FY06, assume 17.5 in out-years
Changes since Nov 2000: shift in LHC schedule
[Chart: Total Project Costs and Funding Profile, Million AY$ by FY]
FY                  2001   2002   2003   2004   2005   2006   2007
DOE profile         2      2.5    3.5    5.5    9.5    12.5   12.5
DOE+NSF (assumed)   3.5    4      5.5    8.5    13.5   17.5   17.5
Cost categories: CAS labor, UF Tier1 labor, UF Tier2 labor, UF Tier1 h/w, UF Tier2 h/w, Project Office, Mgmt Reserve
Chart annotations: Spread out R&D; Tier1/2 deployment
Delayed deployment at T1 (starts 2005, in time!) will need some thinking
–3.5 FTE from UF ‘til 2003
–1 FTE from CAS ‘til 2003
–1 FTE in PO
Put in “reality” for 2001
Need to work on FY2002
Staging of R&D in UF
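The relation between the funding profile and the quoted integrals can be reproduced directly from the chart values. The sketch below assumes that the smaller series is the DOE profile and the larger one is DOE+NSF (assumed), which is a reading of the chart legend; all names are illustrative.

```python
# Sketch: integrate the funding profile read off the chart (Million AY$).
# Which series is "DOE" and which is "DOE+NSF (assumed)" is a reading of
# the chart legend; names are illustrative.

FISCAL_YEARS = [2001, 2002, 2003, 2004, 2005, 2006, 2007]
DOE_PROFILE = [2.0, 2.5, 3.5, 5.5, 9.5, 12.5, 12.5]
DOE_NSF_PROFILE = [3.5, 4.0, 5.5, 8.5, 13.5, 17.5, 17.5]
TOTAL_COST_FY01_FY06 = 54.7   # M$, escalated


def integrate(profile, first_year, last_year, years=FISCAL_YEARS):
    """Sum a yearly profile over an inclusive range of fiscal years."""
    return sum(v for y, v in zip(years, profile) if first_year <= y <= last_year)


def implied_nsf(doe, doe_nsf):
    """NSF part implied by the difference of the two series."""
    return [round(total - d, 2) for d, total in zip(doe, doe_nsf)]


if __name__ == "__main__":
    funding = integrate(DOE_NSF_PROFILE, 2001, 2006)
    print(f"Integral funding FY01-FY06: {funding:.1f} M$")
    print(f"Cost minus funding FY01-FY06: {TOTAL_COST_FY01_FY06 - funding:.1f} M$")
    print(f"Implied NSF profile: {implied_nsf(DOE_PROFILE, DOE_NSF_PROFILE)}")
```

Summing the DOE+NSF series over FY01 to FY06 gives 52.5, matching the integral funding profile, and leaves a difference of 2.2 M$ to the escalated cost of 54.7 M$ over the same years.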
Management Team, Committees
Project Management team in place since November 2000
Level 1 Project Manager
Lothar Bauerdick/Fermilab, Lucas Taylor/Northeastern U deputy
Level 2 Project Manager User Facilities
Vivian O’Dell/Fermilab
Level 2 Project Manager Core Application Software
Ian Fisk/UC San Diego
U.S. Advisory Software and Computing Board
Elected: Irwin Gaines/Fermilab chair
Paul Avery/Florida, Bob Clare/UC Riverside, Sridhara Dasu/Wisconsin, David Stickland/Princeton, Shuichi Kunori/Maryland
Ex-officio: L1PM, L2PMs, chair U.S. CB, PM CCS, L1PM construction, U.S. physics coordinator, head Fermilab CD
Fermilab Oversight: PMG
U.S. CMS PMG sub-group for Software and Computing
Chair: Mike Shaevitz, Fermilab Associate Director of Research
Added to PMG: L1/2PMs, U.S. ASCB chair, U.S. CB chair, Fermilab CD head
3 experts: Ruth Pordes, Erik Gottschalk, Mike Diesburg
This is our interface to U.S. CMS!
Reports, reviews etc
2nd Quarterly Progress Report released last Friday: FY2001Q2
SCOP review, reporting to Fermilab Directors for Project Oversight
Friday May 25, 9:00 .. 15:00 at Fermilab
Status report to DOE/NSF at Washington/NSF May 30
Proposed date of DOE/NSF review
New date November 27-30 at Fermilab (was: October 9-12)
plan: baseline the project…
Common meetings of (U.S.) LHC S&C community
Networking meeting at Indiana U. June 1-2
FY2001 Plans and Funding
Make the most of Functional Prototype Software
use to simulate and reconstruct events for Higher Level Trigger studies
develop tools and systems needed to deliver large and high quality data samples
Proceed towards Fully Functional Software milestone
Iterate and extend on Software Architecture
Develop Distributed Production and Analysis Environment
Deploy User Facilities and R&D Systems at Tier1 and Tier2 centres
Project Funding in FY2001 from DOE and NSF to support that program:
(but problems with NSF part of the funds!)
U.S. CMS Software and Computing          Caltech  UCSD   FNAL   NEU    Princeton  UC Davis  pT2'2  TOTAL
Core Applications Software (CAS) FTE     2.0      -      1.5    2.5    1.0        1.0       -      8.0
User Facilities (UF) FTE                 0.5      0.5    4.0    -      -          -         0.8    5.8
TOTAL FTE                                2.5      0.5    5.5    2.5    1.0        1.0       0.8    13.8
CAS Personnel (salary, PC, travel,...)   317.8    -      238.3  400.0  158.9      119.2     -      1,234.2
UF Personnel (salary, PC, travel,...)    75.0     75.0   635.4  -      -          -         100.0  885.4
UF Tier 1 Equipment                      -        -      410.0  -      -          -         -      410.0
UF Tier 2 Equipment                      200.0    200.0  -      -      -          -         350.0  750.0
Project Office, Management Reserve       -        -      220.4  -      -          -         -      220.4
Funding Status
Received $1500k from DOE in February and April + $500k loan from Construction Project
Support CAS and UF personnel, start Project Office
Go-ahead for Tier-1 upgrade -> Viv’s talk yesterday
Requested $1500k from NSF
NSF Advice: $1000k planned for FY2001 + $500k for prototype Tier 2 end FY2000
Received only $320k for CAS engineers (NEU) (+ $80k expected)
No concrete news from NSF!
Status report meeting at NSF on May 30
All funds in AY$ x 1000
                                      FY2000   FY2001 Requested   FY2001 Received   Total
DOE                                   1164.7   1500.0             1500.0            2664.7
NSF                                   310.0    1500.0             400.0             1810.0
Loan from U.S. CMS Detector Project   -        500.0              500.0             500.0
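The funding status can be cross-checked against the FY2001 plan with simple sums. The following sketch is illustrative only: the dictionary layout and names are invented here, and booking the loan entirely in FY2001 is an assumption about the table layout.

```python
# Cross-check of the FY2001 funding numbers (all values in AY$ x 1000).
# Dictionary layout and names are illustrative. Booking the loan entirely
# in FY2001 is an assumption about the table layout.

FUNDING = {
    "DOE": {"fy2000": 1164.7, "requested": 1500.0, "received": 1500.0},
    "NSF": {"fy2000": 310.0, "requested": 1500.0, "received": 400.0},
    "Loan": {"fy2000": 0.0, "requested": 500.0, "received": 500.0},
}

# Row totals of the FY2001 plan: CAS personnel, UF personnel,
# Tier 1 equipment, Tier 2 equipment, project office and reserve.
FY2001_PLAN = [1234.2, 885.4, 410.0, 750.0, 220.4]


def column_sum(funding, key):
    """Sum one column (e.g. 'requested') over all funding sources."""
    return sum(source[key] for source in funding.values())


def outstanding(funding):
    """Requested minus received, per funding source."""
    return {name: round(src["requested"] - src["received"], 1)
            for name, src in funding.items()}


if __name__ == "__main__":
    print(f"FY2001 plan total: {sum(FY2001_PLAN):.1f}")
    print(f"FY2001 requested:  {column_sum(FUNDING, 'requested'):.1f}")
    print(f"FY2001 received:   {column_sum(FUNDING, 'received'):.1f}")
    print(f"Outstanding:       {outstanding(FUNDING)}")
```

The plan total and the requested funds both come to 3500.0, while 2400.0 has been received; the outstanding 1100.0 is entirely on the NSF side. The Total column of the table corresponds to FY2000 plus the FY2001 requested amount.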
Grid Projects
Two Grid R&D proposals submitted with substantial U.S. CMS contributions
PPDG (DOE SciDAC, CMS PI: Harvey Newman): approved!
deployment of Grid technologies in vertically integrated systems for CMS
CMS deliverables worked on w/ PPDG provided FTE + CS groups
Funding for personnel at Caltech, UCSD, Fermilab
iVDGL (NSF ITR, PI: Paul Avery)
Funding for CMS Tier-2 prototype efforts: (+ATLAS, LIGO, SDSS)
mostly facilities + maintenance
Some funds for Grid integration into CMS s/w
See Paul’s talk
IMHO The production Tier-2 funds should be handled in a different way:
The Tier-2 funds should be controlled by the CMS project
This will require
Prototype Tier-2 centers
Successful deployment and commissioning of (1st stage of) Cal pT2!
University of Florida was selected as 2nd prototype Tier-2 site
Procedure as described in my letter to collaboration
Solicitation for proposals (deadline April 15)
Report of UF Level-2 PM to Level-1 PM
Discussion to get input from U.S. ASCB (meeting April 24)
Selection of Florida unanimous
Funding of the prototypes through iVDGL will start FY2002
WBS, MOU and SOW in preparation (also w/ Caltech/UCSD)
selection procedure for final Tier2 sites will need discussion in collaboration
Should be proposal driven; criteria: U.S. ASCB, but much discussion required
Review board to evaluate proposals, membership TBD
Recommends sites/proposals with greatest benefits to U.S. CMS and CMS
Final decision by L1PM
Time scale: end of 2002
Core Applications Software (CAS)
4 areas of software development
WBS 2.1 CMS Software Architecture
U.S.: e.g. get CARF, Object Database, C++ know-how
WBS 2.2 Interactive User Analysis
U.S.: e.g. schema for “AOD”, prep for Physics TDR tools
WBS 2.3 Distributed Data Mgmt and Processing
U.S.: e.g. production tools at e.g. Fermilab, Caltech, UCSD, Wisconsin, concepts and tools for distributed analysis
WBS 2.4 User Support: 25% of each engineer’s time
U.S.: e.g. software engineering support; physicist formulates a project for CP, engineer can help designing / implementing; need to estimate time & involvement
8 CAS software engineers on project at 5 institutions
Vladimir Litvin     Caltech               DDMP / Prod. / Arch.
Iosef Legrand       Caltech (CERN)        DDMP
Tony Wildish        Princeton (CERN)      Prod. / DDMP / Arch.
Michael Case        UC Davis              Arch.
Lassi Tuura         Northeastern (CERN)   Arch. / IGUANA
Ianna Gaponenko     Northeastern (CERN)   IGUANA
Hans Wenzel         Fermilab              Prod. / Arch.
Greg Graham         Fermilab              DDMP
Software req. from PRS (Paris, CPT week)
Analysis tools: Needed soon. The sooner the better
Physics TDR(s):
Clearly, aim is not to determine/influence detector design
Probably biggest side-effect/purpose will be the training of people
We’ll use “near-final” software and “near-final” tools
Therefore: analysis tools must be ready and people trained on them at least a year before the Physics TDR. That’s End 2003.
Therefore, year 2003 is the year of deployment/training of people (tight, would be better to extend into 2002…)
Will need first full deployment end 2002. With the DAQ TDR. Timing would be good; finish TDR, presumably more time(?)
Estimated size of event samples needed for Physics TDR studies
Expect that we can continue at ~ the same rate as up to now
Above statement accurate to a factor 2 (but: CPU for tracking?)
More would be welcome (and even helpful)
Data access issue(s) could be looked at
Organization could be strengthened
PRS Groups Getting More Involved Into Production
Summary of Engineering Efforts
Spent effort is assessed on the basis of regular effort reports from engineers and managers, using a bottom-up approach, where for each WBS item expenditure of labor is accumulated.
For the User Facilities subproject this is reconciled with the monthly effort reports from the departments of Fermilab Computing Division.
As part of the reporting for the Core Application Software subproject the engineers submit monthly reports for effort tracking.
This system is successful!
User Facilities
WBS item                                        FY2001Q1   FY2001Q2   Total effort projected FY2001   Total project-funded effort FY2001
WBS 1.1 Tier 1 Regional Center                  inactive
WBS 1.2 User Support                            0.26       0.32       1.31
WBS 1.3 Maintenance and Operations              0.03       0.02       0.63
WBS 1.4 Tier 2 Regional Centers                 0.25       0.20       2.50
WBS 1.5 Tier 1 Network                          0.02       0.02       0.50
WBS 1.6 Software and Computing R&D              0.44       0.34       2.69
WBS 1.7 Detector Construction Phase Computing   0.41       0.49       2.00
WBS 1.8 Support for Fermilab Based Computing    0.03       0.01       0.38
Total FTEs at Fermilab Regional Center          1.19       1.20       7.50                            4.05
Total FTEs                                      1.44       1.40       10.00                           6.55

Core Application Software
WBS item                                        FY2001Q1   FY2001Q2   Total effort projected FY2001   Total project-funded effort FY2001
WBS 2.1 Software Architecture                              0.44
WBS 2.2 Interactive Graphics and User Analysis             0.06
WBS 2.3 Distributed Data Management and Proc               0.75
WBS 2.4 User Support                                       0.75
Total FTEs                                      0.00       2.00       8.50                            7.75
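The bottom-up accumulation of effort per WBS item can be sketched as a small consistency check on the reported totals. Names are illustrative, and treating "Total FTEs at Fermilab Regional Center" as everything except WBS 1.4 (Tier 2) is an assumption that happens to reproduce the reported numbers.

```python
# Sketch: accumulate reported effort per WBS item and compare with totals.
# Tuples are (FY2001Q1, FY2001Q2, projected FY2001) in FTE. Treating the
# Fermilab total as "all items except WBS 1.4" is an assumption.

UF_EFFORT = {
    "WBS 1.2": (0.26, 0.32, 1.31),
    "WBS 1.3": (0.03, 0.02, 0.63),
    "WBS 1.4": (0.25, 0.20, 2.50),   # Tier 2 Regional Centers
    "WBS 1.5": (0.02, 0.02, 0.50),
    "WBS 1.6": (0.44, 0.34, 2.69),
    "WBS 1.7": (0.41, 0.49, 2.00),
    "WBS 1.8": (0.03, 0.01, 0.38),
}
CAS_EFFORT_Q2 = {"WBS 2.1": 0.44, "WBS 2.2": 0.06, "WBS 2.3": 0.75, "WBS 2.4": 0.75}


def column_totals(effort, exclude=()):
    """Sum each column over all WBS items, optionally skipping some items."""
    rows = [values for item, values in effort.items() if item not in exclude]
    return tuple(round(sum(column), 2) for column in zip(*rows))


if __name__ == "__main__":
    print("UF total FTEs:      ", column_totals(UF_EFFORT))
    print("UF without Tier 2:  ", column_totals(UF_EFFORT, exclude=("WBS 1.4",)))
    print("CAS FY2001Q2 total: ", round(sum(CAS_EFFORT_Q2.values()), 2))
```

The quarterly sums reproduce the reported 1.44 and 1.40 FTE (1.19 and 1.20 without Tier 2), and the CAS items add up to 2.00 FTE for FY2001Q2. The projected column sums to 10.01 and 7.51, which agrees with the reported 10.00 and 7.50 up to rounding of the individual entries.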
U.S. CMS S&C Summary
A large part of the computing and software engineering for CMS will need to be provided from outside CERN to fully realize the compelling physics potential of CMS
The U.S. CMS S&C Project is going to deliver the necessary efforts and computing resources to enable U.S. physicists to make a major impact on physics analysis within CMS