Responding to Canada’s
Research Computing Needs:
Researcher input and the national
planning process
Jonathan Dursi, CTO
Memorial University of Newfoundland
McGill University Concordia University
Dalhousie University Université de Sherbrooke
Saint Mary's University
Université de Montréal Université du Québec à Montréal
Carleton University
Université du Québec à Trois-Rivières
Queen’s University
Université Laval
University of New Brunswick St. Francis Xavier University
University of British Columbia University of Calgary
University of Alberta
Simon Fraser University
University of Saskatchewan University of Regina
University of Victoria
Genome Sciences Centre
University of Prince Edward Island
Laurentian University Lakehead University University of Ottawa University of Manitoba
Compute Canada Project
Goals
• National Platform for Advanced Research Computing Expertise, Services, and Infrastructure
• Bring an entire national collection of expertise,
services, and compute/data
resources to bear on individual research problems
Memorial University of Newfoundland
McGill University University of Ontario Institute of Technology (UOIT) Concordia University Dalhousie University Université de Sherbrooke
Saint Mary's University Université de Montréal
Université du Québec à Montréal
University of Toronto
Toronto Hospitals
Wilfrid Laurier University University of Waterloo
University of Guelph McMaster University
Brock University Carleton University Université du Québec à Trois-Rivières
Queen’s University
Université Laval University of New Brunswick
St. Francis Xavier University University of British Columbia
University of Calgary University of Alberta Simon Fraser University
University of Saskatchewan
Member University and Personnel Site Member University, Personnel, and Infrastructure Site
University of Regina
University of Victoria
Genome Sciences Centre
University of Prince Edward Island
Member University
York University Laurentian University Lakehead University University of Ottawa University of Manitoba
Who is Compute Canada?
• The Compute Canada project team:
• National office of the same name
• Keep the funding flowing, keep coordination on track, advocacy, national/international collaborations, etc
• Regional organizations with autonomy, flexibility,
responsibility to implement parts of the platform
• WestGrid in Western Canada
• National-level coordination on services, interoperability,
knowledge exchange
What’s it all for? - Now
• Now: Enable Canadian research that
needs HPC by
• Providing access to HPC resources
• Providing a national network of
experts (140 technical staff) to help
Researcher Input
• Cannot be successful without strong
mechanisms for Researcher Input.
• Must provide what researchers need
• while also keeping an eye out for
upcoming technologies researchers will need.
Researcher Input
• Many regional organizations are very well
integrated into existing research communities
• Get constant feedback
• Not always disseminated, combined across
the country
• Get different types of input
• Weaker connections to, input from,
research communities not already big users (= successful with current setup)
National Researcher Input
• Try to get broad, routine, national,
researcher input to bring into
• Operations
• Longer term planning
National Researcher Input
• ACOR (Advisory Council on Research)
• National, broad, panel of researchers; advisory to the board
• Recent Researcher Needs survey
• National ~15 min online survey, 425 completed
• Strategic Plan consultations
• ~25 national meetings, 500 people engaged
• Chief Science Officer, Dugan O’Neil, SFU
Today
• Want to tell you today about
• The planning process - Strategic Plan, Management Plan
• The results of the national consultations
(strategic plan town halls, survey)
• How these inputs are shaping plans
• Want your feedback on those results
• Want your feedback on mechanisms for
Strategic Plan
• High-level set of basic purpose, goals of
project
• Build written national agreement about
what business we’re in, what broadly we should be doing nationally to enable
research
• Will not lay out specific projects to meet
those goals - more cloud computing, different training programs, etc.
Strategic Plan
• Needed because, though Compute Canada has existed since
~2006, there is still not widespread agreement about its basic mission, what does and
doesn’t fall within its remit.
• Also, CFI condition on MSI grant.
Management Plan
• Or “Operations Plan”, or…
• Putting meat on the bones of the
Management Plan
• If these are our priorities, what do we do,
and how do we measure success?
• What is process for determining
Strategic Plan Town Halls
• 24 in-person town halls, 3 online
• St John’s to Victoria
• Hundreds of attendees
Strat Plan: What we heard
• People
• Storage
• Ease of use/interoperability
• Funding
• New use cases
• Industry engagement
• National/Regional organization
What we heard: People
• Importance of technical staff in getting research done
• Very high quality of technical staff
• Consulting/support, training extremely
important and must be supported and grow
• Access to resources vital - but for many
groups, without access to that technical expertise, the resources have much less value.
What we heard: Storage
• Not nearly enough, even for “traditional”
compute/simulation intensive workflows
• New data-intensive work extremely poorly
supported
• Different types of storage (archival,
What we heard: Usability
• Ease of use/interoperability
• For many, existing systems throw up
roadblocks.
• Systems highly heterogeneous; use a (say)
WG system, start from scratch to use a (say) CQ system.
• Insufficient access to interactive resources
• Insufficient access to web-based interfaces
What we heard: Funding
• Wide agreement about need for predictable,
sustained funding
• For hardware refreshes, staff, operations..
• This concern is shared throughout the entire
research community (Digital Infrastructure summit)
• Advocacy has to be an activity of the national
What we heard: New uses
• Data-intensive: storage
• Cloud-type workflows
• Need more support for researchers in disciplines with
emerging compute requirements
• Humanities
• Social Sciences
• Bioinformatics
• Big Data
• …
What we heard: Industry
• Need to encourage, develop programs for,
private-sector engagement.
• Make funders happy
• Incoming stream of interesting applied
research problems
• Opportunities for grad students
• Increase HPC adoption
What we heard: Structure
• National/Regional organization
• How will this work?
• Many examples of similar organizations
• Relationship need not be the same in
every region
What we heard: New Tech
• Cutting edge technologies - hardware,
software, etc
• Need leading edge for
• Researchers who want it now
• Evaluation for researchers unsure
• Need to help shape technologies of
particular importance to Canadian research
Strat Plan: What we heard
• People
• Storage
• Ease of use/interoperability
• Funding
• New use cases
• Industry engagement
• National/Regional organization
Strategic Plan
• Is being reflected in the Strategic Plan.
• Stronger emphasis on people, services
• Growth into broader use cases without
sacrificing existing HPC users
• Emphasis on engaging with funders for
predictable, infrastructure-like funding
• Emphasis on engaging w/ private sector R&D
• Negotiating clarity in role of nat’l office,
Researcher Needs Survey
• In the field, Oct 2013
• Short (15 min) survey
• Questions about
• Current research tasks
• Current pain points, priorities
• Future growth
Researcher Needs Survey
• Well-suited to more immediate management plan drafting
• Concrete, specific computing needs
Goals of Survey
• Broad community results
• Identify groups for followup - to occur
• One-on-one interviews
• Deep dive into future needs in representative groups
• Develop a picture of what is needed by
Complementary Data
• International surveys on academic researcher needs (XSEDE, PRACE)
• International surveys on commercial researcher needs (IDC, Intersect360)
• Ontario: ORION/Ontario Needs Assessment (~50-100 interviews)
Responses
[Chart: Responses by Sector. Categories: Academia, Federal Govt, Prov Govt, Hospital, Other, Not for Profit, Int'l Enterprise, Cdn Enterprise, Cdn. SME]
• ~425 responses
• Overwhelmingly academic
• Results reflect:
• Relative lack of communications power outside academic circles
• Consultation fatigue: CIHR, SSHRC, CFI were all in field simultaneously
Text fields
• Asked about
• current research projects,
• data,
• analysis,
• current bottlenecks,
• problems with existing services/resources
What we heard: Storage
• Storage
• Researchers wasting time shuffling, compressing, re-generating data
• More storage of various types needed
Aging Infrastructure
• Aging infrastructure was a repeated problem
• ~50% of respondents said they could do more, better research if they had more access to compute/storage
• Unreliability, related in part to aging
Usage Barriers
• Lack of interactive access
• Lack of large-memory nodes
• Long queue times
• Last-mile connectivity
• Uptime/reliability
Sharing
• Many researchers would greatly like to enable web interfaces to their work
• Data sharing (secured or public) was
Support
• Very strong vote of confidence overall in support, training
• Many don’t seem to know the breadth of support available
• Many people need help with automation, optimization
Range of disciplines
• 2/3 still traditional physical science/HPC areas
[Chart: General Research Area. Categories: Other, Natural Sciences, Health Sciences, Engineering, Arts And Letters]
CC significant fraction of research computing
• (amongst respondents)
• Very little use of commercial cloud
• Significant use of independent resources
[Chart: Where Computation is Done. Categories: PC, Multiple PCs, Group Server, Dept Server, Group Cluster, Dept Cluster, Collab Cluster, CC, Abroad, $$ Cloud, Other]
Current pain points
• Existing resources:
• Too slow/small
• Delays in access
• Lack of particular hardware/software
[Chart: Current Problems. Categories: Too Slow, Too Small, Missing S/W, Mismatched H/W, Bad Interface, Delays in Access, Unreliable, Lack of Viz, Other]
Current priorities
• Traditional compute, storage, training are top three priorities
[Chart categories: Training, Storage, Web/Database, Traditional Compute, Collaboration, Hosting, Viz, Helpdesk, Consultancy]
Training
• Almost all groups think that
having additional expertise to bring to bear on their problems would be useful
• …for a variety of reasons
[Chart axis: % Respondents]
[Word cloud of stemmed terms from free-text survey responses, e.g. comput, data, run, simul, larger, analyz, access, storag]
CC Compute
• Users mostly satisfied or very
satisfied with existing compute resources…
• But ~1/3 of users will need more
than 3x as many resources to stay internationally competitive
[Chart: Satisfaction with CC Computational Resources. Scale: 5 (Very Satisfied) to 1 (Very Dissatisfied), Not applicable]
[Chart categories: Unsure, Decrease <3x, Decrease >3x, Same, Increase <3x, Increase >3x]
Storage Usage
• Mostly fairly modest
• Mostly satisfied with CC storage
where applicable
• But will again need substantial
increases to stay internationally competitive
[Chart: Storage Currently Used, % Respondents. Bins: Less than 1 TB, 1 TB ... 10 TB, 10 TB ... 100 TB, 100 TB ... 500 TB, 500 TB ... 1 PB, More than 1 PB, Not sure]
[Chart scale: 4, 3, 2, 1 (Very Dissatisfied)]
[Chart categories: Unsure, Decrease <3x, Decrease >3x, Same, Increase <3x, Increase >3x]
Not using CC
• Good, in a way
• More non-use because researchers weren’t
aware (fixable) or don’t need it (yet) than because it is unsuitable
• But still need to work on the
unsuitable
[Chart categories: Wasn't Aware, Don't Need, Too hard, Specific Needs, Other]
Next Steps: Survey
[Chart: Responses by Sector]
• Next steps:
• Begin followup surveys
• Continue push into underrepresented sectors
• Compare with
Next Steps Broadly
• Share SP draft in a couple of weeks
• Begin management plan as SP converges
• Incorporate Research needs survey
Future Researcher Consultation
• What should we do?
• ACOR
• New web page, updated with useful information, planning, etc.
• Annual town halls (+ town hall at HPCS?)