June 2011
Date: 27/06/2011
Report of the e-Infrastructure
Advisory Group
Executive summary
This report sets out the findings and recommendations of the e-Infrastructure Advisory Group
commissioned and chaired by BIS into the activities and recommendations put forward by the 2009 RCUK
International Review of e-Science and the 2010 report “Delivering the UK’s e-Infrastructure for Research
and Innovation”.
The advisory group was composed of representatives from UK Research Funders (RCUK and Wellcome
Trust), Higher Education Funding Councils (DELNI, HEFCE and SFC) and Universities UK. Its membership
and Terms of Reference are at Annex A to this report.
In addition to the review of these reports, a consultation exercise was undertaken to provide a baseline of the
current level of infrastructure provision at the local and national level, an understanding of the processes
by which future provision is currently determined and to highlight infrastructure elements where support
and development are required to enable sustainability. This consultation canvassed UK research councils,
UK HEIs and selected end-user organisations to provide a range of perspectives to take into account
geographical issues (e.g. consortia, regional funding etc), institutional research focuses and degree of
utilisation and experience of e-infrastructure.
Key points
The Advisory Group recognised the following emergent points from the review and consultation exercise:
• A priority need for the continuation and development of a dedicated research network infrastructure
such as that currently provided by JANET for researchers requiring the highest data bandwidth and
lowest latencies;
• The decline in research grade e-literacy among UK researchers was seen as a concern and this was
magnified by the decreasing flow of computational specialists from undergraduate to postgraduate
level;
• Addressing big, small, and persistent data issues requires much more coordination and clearer
leadership to understand the variation in requirements and importance across the research base in
order to design a clear and tractable response;
• The high level of investment and commitment to the development of e-infrastructure at the HEI level
is encouraging and has led to significant growth in UK capability. To ensure continued growth there is
now a need for coordination to ensure that this is sustainable and joined-up to national and regional
strategic drivers as well as the ambition of individual HEIs;
• There is a clear and ongoing need for provision of a HPC National service to provide the capability to
tackle leading edge computational science and simulation with mid-range resource being provided
on a regional/research cluster basis;
• Scientific software development is a clear area for action to derive the maximum benefit from
current and next generation computer architectures and provision paradigms;
• Future leadership and development of strategy for e-infrastructures needs to be taken forward with
a matrix of stakeholders rather than a single organisation. Development will also need to reflect the
varying rates of development and maturity for each element in the e-infrastructure;
• Funding from Central Government should be reserved for infrastructure that would provide a
recognised National capability and/or provide benefit to more than one research institution or
community;
Recommendations
To address the key points identified the Advisory Group makes the recommendations that:
• Future development of activities be taken forward via a stranded approach recognising the fact that
each element of the overall e-infrastructure has its own timescales with respect to methodological
and technological development;
• The following strands form the focus for developments: Research Networks, People & Skills, Data,
Compute Infrastructure, Software Development and Security & Authentication;
• The continuing provision and development of a UK Research Intensive Network Infrastructure be
treated as the highest priority. It is further recommended that this strand be taken forward with the
full engagement of UK Research Funders and research leaders;
• Responsibility for leading the People and Skills strand lie with the Higher Education Funding Councils
and that this be taken forward in partnership with the RCUK and learned Societies to ensure
alignment with research priorities and skills shortage areas;
• Development of the Data strand to be taken forward by a consortium of Research Funders (RCUK and
Charities) to cover the breadth of need across the research base and to capture the texture of
differing applications within the strand;
• Research Councils retain their coordinating role for investment in National High Performance
Computing facilities and that the scope for partnership in the provision of compute resource be
widened to include engagement with research intensive universities to identify synergies or
alternative models for HPC provision, including analysis of application areas that may be efficiently
migrated to appropriate “Cloud” resources;
• Current investments in software development be reviewed to identify successful models of support
with a view to treatment of scientific software as a Research Infrastructure. This to be taken forward
by Researchers and Research Funders working in partnership;
• Development of authentication and security be continued and improved by JISC to ensure that
transparent use and benefits of a Nationally shared e-infrastructure can become a reality;
Next Steps
The Advisory Group has suggested an outline governance structure for progression in section 1.4 of
this paper and recommends that BIS take responsibility for its implementation and development of the
overall strategy for an integrated e-infrastructure. Within this it is recommended that:
• RCUK lead the development of the Compute Infrastructure, Data and Software Development strands;
• JISC lead on the development of the strands for Research Networks and Authentication & Security;
• UK Higher Education Funding Councils lead on the development of the People and Skills strand;
These assignments are recommended on the basis of the current and future expected remits and
responsibilities of these organisations.
1.0 Summary and recommendations
1.1 Introduction
This report sets out the findings and recommendations of the e-Infrastructure Advisory Group
commissioned and chaired by BIS into the activities and recommendations put forward by the 2009 RCUK
International Review of e-Science and the 2010 report “Delivering the UK’s e-Infrastructure for Research
and Innovation”. The Advisory Group’s terms of reference and its membership are at Annex A to this
paper.
The advisory group was composed of representatives from UK Research Funders (RCUK and Wellcome
Trust), Higher Education Funding Councils (DELNI, HEFCE and SFC) and Universities UK (Annex A).
In reviewing the recommendations of both the 2009 International Review and 2010’s e-Infrastructure
report, the Group determined that its principal focus should be to provide a framework within which to
advance the 2010 e-Infrastructure report’s recommendation that:
“The UK’s Research and Innovation e-infrastructure needs to be led and driven to deliver a UK wide vision
for research e-infrastructure, embedded in the international context essential to today’s research
challenges. The leadership must provide a multi-year perspective, identify best practice, coordinate
stakeholder investment and champion relevant and fit for purpose cross-disciplinary standards to
facilitate coordination”
To aid in the development of the framework, a consultation exercise was undertaken to provide a
baseline of the current level of infrastructure provision at the local and national level, an understanding
of the processes by which future provision is currently determined and to highlight infrastructure
elements where support and development are required to enable sustainability.
As well as canvassing UK research councils, the consultation targeted UK HEIs and selected end-user
organisations (Annex B). These organisations were chosen to provide a range of perspectives to take into
account geographical issues (e.g. consortia, regional funding etc), institutional research focuses and
degree of utilisation and experience of e-infrastructure; the invitations to participate in the consultation
are reproduced at Annex C. A summary of the identified trends across all organisations is given at
Section Y.
1.2 Principal findings
The consultation exercise and review by the Advisory Group highlighted the following key points:
• The need for a dedicated computer network for researchers requiring the highest bandwidth and
lowest latencies is a critical element of the UK e-Infrastructure. This was evidenced by the almost
unanimous support for Super JANET 6 or a network with equivalent functionality in the responses
received from HEIs;
• The Group noted that many of the responses highlighted a strong connection between software,
people & skills. Although the Group recognised and appreciated the linkages and need for
development in these areas, it was felt that they need to be separated in order to ensure appropriate
treatment. The Group also noted the challenges and potential sensitivities surrounding the
development of a skills strand given current policy relating to Higher Education and student choice.
HEFCE would liaise with the other HE funding bodies and, in England, consider the supply of
graduates and skills through the HE system as part of the review of Teaching Funding.
• Dealing with data is an area where further coordination and scoping needs to occur in order to both
understand the variation of requirement and the relative importance of data across the research
base. This will be necessary to generate an appropriate and tractable response at both a local and
national level.
• In recent years, there had been a high level of bottom up activity within HEIs and a lot of hardware
(especially in the area of HPC) had been funded through this mode. Although the Group saw these
significant local investments as encouraging, there appeared to be an emergent need for further
coordination to ensure that appropriate growth could be sustained and that future investments are
“joined-up” to regional and national strategic drivers. Leading on from this there is a clear and
developing need for sharing of hardware resources, especially as the scale of requirement stretches
beyond the ability of individual organisations to finance, procure and provide the necessary support
infrastructures. As well as maintaining the competitiveness of the UK Research Base, equipment
sharing would deliver concomitant benefits in increased accessibility, robustness with respect to
network security, integrity, system resilience and efficiency savings in excess of current
arrangements.
• The Group also identified from the responses that there is a clear and ongoing need for provision
of HPC systems at the National level with an emerging trend toward provision of mid-range systems
on a regional/research cluster basis. However the need for dedicated systems should be reviewed
against emerging cloud provision targeted at traditional HPC application areas;
• Software development at all levels of the software stack is a clear area for further action in order to
fully utilise current investments and to put the UK in a position to exploit next generation computing
architectures (e.g. many-core and GPU architectures) and paradigms (e.g. cloud computing and data
intensive computing);
• The Group recommends that future strategy development in the area of e-infrastructure needs to be
taken forward by a matrix of stakeholders rather than any single council, institution or sector taking
on both leadership and development.
• With regard to Research Funder or direct Government funding of infrastructure, the Group felt that
this should be reserved for projects that would deliver a National capability (i.e. at a level that could
not be provided without strategic investment) or would provide benefit to more than one research
institution.
1.3 Recommendations
The e-Infrastructure Advisory Group has made the following recommendations for the strategy
development and progression of activities:
The group recommends that future strategy development and activities need to be taken forward via a
stranded approach. This approach recognises the fact that each element of the overall e-infrastructure
has its own characteristics and timescales with respect to methodological and technological
development. An all-encompassing strategy would not have the resolution to adequately capture the
level of detail or be able to respond on an appropriate timescale to opportunities these developments
may bring. The Advisory Group identified six strands that were to be taken forward, namely: Research
Networks, People & Skills, Data, Compute, Software Development and Security & Authentication.
Within these priorities the Advisory Group agreed and recommended that the continuing provision of a UK
network infrastructure, capable of addressing the future needs and aspirations of research intensive
institutions was a critically important element in the provision of a competitive e-infrastructure for the
UK Research Base and should be treated as a priority. With respect to the other identified strands, the
Advisory Group agreed that a coordinated approach in each of these areas would deliver a suite of enabling tools and
services that would deliver a significant net benefit to the Research Base enhancing both scientific
capability and the potential for international collaboration. The advisory group made the following
comments and recommendations on each strand of activity:
• Networks – The Group felt that this was of the highest priority and that a strategy should be
developed for the delivery of a “Research Intensive Network Infrastructure” aligned to, and driven
by, the needs of the UK’s most research-intensive universities and institutes. An organisation with a
role such as that currently provided by JISC would be best placed to take forward the responsibility of
leading this strand given its background with the current JANET network and experience of
undertaking other initiatives of this kind. Strategy development should be undertaken with a full
appreciation and understanding of user needs. As such:
o Engagement with UK Research Funders (Higher Education Funding Councils, Research Councils
as well as Charities) and Research Leaders (individuals and institutions) is strongly
recommended and will be crucial in delivery of this strategy.
• People & Skills – This was another area where the group felt that there needed to be coordinated
action due to the decreasing flow of highly skilled software engineers into research. The Group also
felt that consideration should be given to ways in which the level of general e-literacy in software
development could be increased within future postgraduate cohorts to increase the potential for
spillover into computationally intensive research and provide individuals with a firm basis on which
to continue their own careers. This need was highlighted in many of the responses received to the
consultation. However, as stated above the group felt that there were sensitivities with respect to
the evolving situation on student fees and any strategy in this area would have to be carefully shaped
to address this. As such, the Advisory Group recommends that:
o Responsibility for leading this strand lie with the Higher Education Funding Councils and be
taken forward with input from Research Councils and Learned Societies to ensure alignment
with research priorities and skills shortage areas.
This broad range of engagement recognises the very wide spectrum of needs and abilities within and
across research domains and that there appears to be little or no correlation (nor anti-correlation)
between academic seniority and degree of e-literacy. In order to respond to this challenge the
Advisory Group were agreed that the response will need to institute a broad series of actions to
reflect the variation in skills levels across domains and that training would be required at all levels (of
ability and seniority). It is unlikely that this will be amenable to a one size fits all approach.
• Data – Many of the responses to the consultation focussed on common themes regarding data such
as its production, retention, curation and strategies for dealing with these issues and the Group
agreed these were areas where a coordinated approach would be necessary. In addition:
o The Group recommended that this list should be expanded to include the rapidly increasing
amounts of descriptive metadata associated with the underlying data. As well as these
considerations, the underlying standards and interoperability issues need to be addressed in
order to ensure future relevance and usability of data;
o The Group also recommended that existing models, methodologies and infrastructures
developed by research councils and charities should be reviewed for applicability to other
research areas, exploiting the lessons learned, rather than starting from the ground up.
• It was also recognised that data was an area where there would be considerable variation in needs
and stakeholders.
o The advisory group recommended that a consortium of Research Funders (Research Councils
and Charities) lead on coordinating this strand to cover the breadth of need across the research
base.
• The exact composition and chairing of this group will need to be decided and will need to take into
account linkages with HEIs and JISC. Particular issues stemmed from the very rapid increase in rates
of data generation in a variety of fields, how these quantities of data might require (and permit) new
kinds of science, and how the availability of data played to the Open Access agenda while raising
serious legal and ethical issues and (for individuals) issues of identifiability even in nominally
anonymised data.
• Compute – The group identified that future compute provision was an area where there was a clear
and pressing need for equipment sharing given the increasing scale and complexity of procuring,
supporting and provisioning for HPC systems. In addition, it was recognised that the current model of
provision at the mid-range, where each research-intensive institution invested in its own HPC
hardware could falter under the increasing need for efficiencies outlined under the Wakeham review
and its effects on capital investment and indirect cost elements of fEC. It was recommended that:
o Research Councils should retain their coordinating role for National investments in High
Performance Computing.
o The scope should be widened to include engagement with research-intensive universities in
order to determine synergies or alternative models for HPC provision.
o Further analysis of the potential use of Cloud resources should be conducted to determine areas
of the Research Base where Cloud would provide a technologically viable alternative to local
cluster systems and that capacity planning be undertaken to determine the potential usage and
economic viability of Cloud as a replacement for the capabilities currently provided locally.
o Any future road mapping exercise should take into account energy efficiency and green provisioning
(e.g. shared data-centres, systems and increased infrastructure efficiency) issues alongside
traditional metrics for investment appraisal.
• Software – As noted earlier, software development at all levels of the software stack is a clear area
for further action in order to fully utilise current investments and to put the UK in a position to
exploit next generation computing architectures (e.g. many-core and GPU architectures) and
paradigms (e.g. cloud computing and data intensive computing) across application domains. In order
to ensure that this can take place, future developments in this strand will need to be informed and
directed by the UK Research Base to ensure a full understanding of, and to develop an appropriate
response to, the challenges across research domains. To facilitate this it is recommended that:
o Researchers and Research Funders work closely with each other to make the appropriate
choices e.g. in the choice of commercial software, new community generated or re-engineering
of existing application codes to support a research area.
Initially this may take the form of the Research Councils’ application-to-architecture matching
activities, which have the ultimate aim of creating tool-kits and best practice to enable informed
investment and usage decisions to be made on such aspects as compiler and architecture type. The
group also recommended that:
o Current investments in software development should be reviewed with a view to developing
models of support for software as a sustained infrastructure in the long term, as opposed to
being supported by significant one off investments;
• Authentication and Security – The group recommended that
o The development of robust authentication and security systems to enable trusted users to utilise
shared infrastructures in an open manner needed to be treated as a clear priority if the benefits
of shared infrastructure and collaboration were to be realised.
As the area of interest would be related to the resources shared over JANET or a future Research
Network it was recommended that responsibility for developing this strand should again lie with JISC and
take into account expert advice from the e-science community in future development to ensure any
framework is fit for purpose.
The Advisory Group feels that the strands identified above form a clear programme of strategic areas for
activity and provide a framework upon which an integrated e-infrastructure strategy can be built. It is
envisaged that the scope, strategy, deliverables and phasing of those deliverables
will be developed within each of the strands. The group recognises that effective coordination and the
integrated phasing of activities will be a highly important factor in the successful deployment of the
framework. Given the complexity involved in the above activities and the stakeholder relationships it is
recommended that the leadership role lie with BIS.
1.4 Next Steps
The advisory group recognised that as well as the need for agility, each identified strand would need to
capture the research texture and user requirements within each strand. This is necessary to ensure that
sufficient breadth and appropriate concentration of provision for each of the elements is taken into
account. In order to progress this it is suggested that:
• Each strand be assigned to and led by a body with the appropriate research oversight (e.g. Research
Councils) or responsibility for technical provision (e.g. JISC);
• Policy and strategy development within a strand be informed and driven by a Research Strategy
and/or Technical Advisory Stream(s) which will incorporate relevant knowledge, research leadership
and technical expertise to ensure relevance and benefit of proposed developments/activities to the
research base and ensure technical feasibility. It is anticipated that this membership would be drawn
largely from the UK Research Base and include representation from the lead body for the strand to
keep informed from a funder/provider perspective;
• To ensure tensioning and communication between the infrastructure strands, an e-Infrastructure
Board/Forum would be set up with senior representation from each strand along with
representatives from the lead bodies for each strand. This would be the highest-level body in this
structure and be responsible for the development of an integrated roadmap of activities for
progression in the short, medium and longer term. The Board would also commission research
funders to formulate the response. It is suggested that this board/forum would initially be chaired by
BIS;
The Advisory Group suggests the above as an outline structure only and is agreed that the final leadership
and governance models for development of the framework be taken forward by BIS in partnership with
Research Funders and other key community stakeholders.
The Advisory Group recommends that RCUK take the lead on Compute, Data and Software
Development strands with JISC taking on responsibility for Networks and Authentication and Security.
It also recommends that the People and Skills agenda be taken forward by the Higher Education
Funding Councils, reflecting the need for action at the Undergraduate as well as Post Graduate level.
2.0 Responses received to the e-Infrastructure Advisory Group consultation
exercise – January 2011
2.1 Background to the consultation
2.1.1 As part of its evidence gathering, the Advisory Group agreed that a short consultation be
conducted to gain perspectives from HEIs, Research Councils and end-user
organisations with a perceived dependence on e-infrastructure in their business.
2.1.2 The UK academic research institutions were chosen to give an appropriate mix of coverage to
account for geographical issues (e.g. consortia, regional funding etc), organisational research
remit and degree of utilisation and experience in using e-infrastructure.
2.1.3 The following summarises the responses from each stakeholder group under four key headings
to enable cross comparison: “Compute”, “Data”, “Networks”, and “People, skills and software”.
2.1.4 A summary of the emergent findings from the consultation is presented at section 4.
2.2 Research Councils
2.2.1 Networks
Although the drivers for Research Councils with respect to networks are different, all are agreed that
continued evolution in terms of the bandwidth and support provided by JANET is key to research
delivery, given the data issues outlined above and the move to more collaborative, multi-site modes of
working. Of greater importance to researchers in the EPSRC space, rather than the ability to accumulate
large, constantly accessed, highly available data sets, is the ability to transport results of completed
simulations to their institutions for post processing and visualisation activities. The links into European
network infrastructures such as GEANT and beyond are also crucial especially for those research areas
that are involved in major international collaborations.
2.2.2 People, Skills and Software
Across all councils the need for well developed, robust and readily usable software is seen as key to
science delivery. In addition, the need for a steady stream of people with the skills necessary to harness current and
future infrastructure is also recognised. However, there are very few explicit examples of current support
structures or delivery mechanisms for these elements in the responses provided. Notable current
examples of activity include EPSRC’s reshaping of its balance of investment in e-infrastructure to include
software development as a key strand of its plans, funding short, medium and long-term development
activities to enable maximum return from current research platforms with a view to future research
needs. Provision has also been made available for short courses at the PhD and Post Doctoral level. ESRC
also provides embedded support to increase supply of skilled individuals through the National Centre for
Research Methods and as part of their strategy for supporting postgraduate training at doctoral training
centres.
2.2.3 Data
Data in terms of its creation, transmission, curation and use is an area that has come rapidly to
prominence for the research councils. Reasons for this are that advances in experimental techniques (e.g.
next generation DNA sequencing), experimental complexity (e.g. increased detector resolution and
fidelity) and experimental sophistication/scale (e.g. geospatial sensor and data sets, model data) have
become increasingly pervasive in day-to-day research as opposed to the preserve of a few key groups.
The so called “data deluge” is recognised by BBSRC, NERC, STFC, MRC and ESRC and feeds directly into
consideration for future provision in terms of the types of service models that could support the growing
needs for computing hardware, software and networking capabilities for data driven science within those
councils, which is fuelling interest in cloud type system solutions. EPSRC recognises the need for a
longer-term plan for research data from simulation and data. However, consultation with the EPSRC community
revealed this to be of secondary importance compared to availability of internationally competitive
compute capabilities.
2.2.4 Compute – All councils recognise the need for appropriate compute resource across their remits.
The level and types of resources currently provided are in line with the research challenges and
bottlenecks that each faces; more resolution on individual needs is provided in the individual
responses.
• High Performance Computing - In the case of EPSRC, STFC and NERC there is a clear and
continuing need for High Performance Computing (mid to high Tera- through to Petascale) to
tackle close-coupled problems that are not currently amenable to solution via a distributed
or cloud resource. These councils are looking to continued National (mid-term) and
International (long-term) collaborations/partnerships to enable continued competitiveness
in dependent fields such as turbulence simulation, climate change, local or long term
weather prediction, and quantum chromodynamics. BBSRC has previously seen a need for
this type of compute provision (HPCx) although its increased investment in
HECToR is essentially unused at the current time. Possible contributory factors for this
are stated in their response. ESRC and MRC do not currently see the lack of HPC in their
portfolio as a constraint in delivering their plans although ESRC does anticipate increasing
computational demand on the 2014 timescale.
• Shared/distributed, cloud and ‘novel’ resources – At a National level the Research Councils
do not collectively fund a distributed compute resource, although at an individual research
council level EPSRC has funded the National Grid Service as part of the core e-Science
initiative; direct funding for this will be discontinued post March 2011. Individual
councils do operate shared infrastructure although this is generally to exploit investments
made in experimental facilities (e.g. STFC investment in the e-Science Centre at RAL
supporting ISIS, DLS and CLF) and increasingly for transfer and access to large data sets in the
case of STFC, ESRC and NERC, rather than for use as a distributed/cloud type compute
resource. With respect to cloud computing, although this is a new paradigm (arguably based
upon previous e-research) it has gained a very high degree of interest from all Councils.
Principally this is in areas where data analysis is a key consideration i.e. where high levels of
storage (Petascale and beyond) and an elastic compute capability are needed in close
proximity for analysis and interpretation. Councils that have stated a clear interest in this
area are MRC, BBSRC, ESRC and NERC. EPSRC is also looking at the opportunities for access
to compute via the Cloud and, working with JISC, has recently funded (start date Feb 2011) a
small number of projects to evaluate their use. In the area of novel resources/architectures
there is very little take up of GPU, FPGA and other such architectures due to
the currently high barrier to usage (through software coding complexity) by the average
user. EPSRC has invested in a small test bed facility associated with the HECToR service to
enable potential users to have access to a well-supported system and a high standard of
training. This will be available from March 2011 to all HECToR users. In addition, the Council
has also undertaken an extensive Architecture Comparison Exercise to develop a suite of
knowledge and tools to guide future procurements to ensure the best match possible
between user code needs and underlying hardware architecture.
Strategy for provision – EPSRC and ESRC both have clear, council-initiated and council-led strategies for compute
provision constructed with community input and reflecting the multi-year research strategies of those
councils; a top-down approach would be an appropriate description. Whilst all other councils recognise
the need for compute provision as part of their strategies, fulfilment of need is directed in a bottom-up
manner from the community, with provision on the following basis: project-by-project (STFC, NERC,
BBSRC, MRC); block community resource (STFC’s HPC provision for some communities with LFCF funding);
involvement in external partnerships (NERC Met Office, EPSRC PRACE). Both mechanisms have pros and
cons. However, there is a degree of uncertainty over the sustainability of a multi-stranded, multi-funder
approach such as that used by STFC for HPC provision to parts of its community; this is recognised in the Council’s
response.
2.3 Higher Education Institutions
2.3.1 Networks
All institutions responding to the consultation consider that the local networking capacity available to
them is adequate for current research applications, with a transition from one to ten Gbit/s seeming to be the
norm across the responses. The majority also have this as an ongoing strand in their development
strategies. The universities are unanimous in their support for the continuation and further development
of the JANET network infrastructure (increased bandwidth and decreased latency) and in stressing the criticality of its
continued presence to them. Southampton and UCL make specific reference to the fact that it is currently
quicker to courier 1TB of data on a portable drive than to transfer it over the network. Additionally, respondents note that continued investment is key to enabling
any future uptake of Cloud systems and that this investment should be centrally provided.
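As a rough illustration of the Southampton and UCL observation, the short sketch below estimates how long 1TB takes to move over links of one and ten Gbit/s. It is not drawn from the consultation responses: the function name `transfer_hours` and the `efficiency` parameter (the fraction of nominal bandwidth actually achieved end to end) are assumptions introduced here, and a decimal terabyte (10^12 bytes) is assumed.

```python
def transfer_hours(size_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move size_tb terabytes over a network link.

    size_tb      -- data volume in decimal terabytes (1 TB = 10**12 bytes, an assumption)
    link_gbit_s  -- nominal link speed in Gbit/s
    efficiency   -- hypothetical fraction of nominal bandwidth achieved in practice
    """
    if size_tb < 0 or link_gbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                      # bytes to bits
    seconds = bits / (link_gbit_s * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative values only; the efficiencies are not measured figures.
    for link in (1.0, 10.0):
        for eff in (1.0, 0.3):
            hours = transfer_hours(1.0, link, eff)
            print(f"1 TB at {link:g} Gbit/s, {eff:.0%} efficiency: {hours:.2f} h")
```

At full nominal rate, 1TB takes a little over two hours at one Gbit/s and roughly thirteen minutes at ten Gbit/s. Sustained end-to-end throughput on shared links is often well below the nominal rate, which is consistent with the point that a courier can be competitive.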
Following on from physical infrastructure, the ability for researchers to seamlessly use resources whilst
visiting host institutions via single sign-on authentication is a continuing priority, with the current eduroam
and Shibboleth systems being widely seen as a success.
2.3.2 People, Skills and Software
In the academic context these are inextricably linked; as stated in the Edinburgh response: “the major
directions of change for computationally-enabled science and commerce are toward extreme scale: both
in terms of analysing vast and disparate data sets and a further thousand fold increase in computer
speed. Both will involve technology development, but also revolutions in algorithms, software and
research methods.” This statement is echoed throughout many of the responses, as there is a concern
that the flow of well-trained, experienced and specialised software developers into the research area is
decreasing at a time of increasing need. Although there are pockets of expertise such as the Edinburgh
Parallel Computing Centre, the Hartree Centre/CSED at Daresbury and some locally available expertise it
is felt this will not be sufficient to sustain future development of science applications. Although the
Bristol response makes reference to up-skilling or re-skilling of staff such as librarians this may not be
enough. As such there is also a need for increasing the general “e-literacy” of postgraduate researchers
across the Research Council remits in general computer science and software engineering skills to
increase the possibility of spill-over into the creation of highly skilled scientific software engineers of the
future and to ensure postgraduates are armed with the skills necessary to further their careers.
2.3.3 Data
In common with the Research Councils, the universities recognise that research has become very data
intensive and that the demand for storage is growing at an accelerated rate. Both councils and
universities recognise that the issues surrounding data are not solely related to the provision of storage;
the enabling framework that gives data its usability and value also has to be considered from the
ground up, i.e. retention, management, accessibility, security and ownership. As examples of this, UCL,
Southampton and Bristol are each investing in Petascale research data centres to supply well-supported data
facilities in their institutions. However, in common with other universities, they are increasingly concerned
about the ongoing cost of data curation, especially given the requirement of funders (both charities and
research councils) to store this data for timescales that significantly exceed that of the grant or award
that first generated it. As such, universities are increasingly looking for research councils to set data
management policies for research data produced from grants that are funded by them. A point related to
“strategy for provision” in the previous section is that universities are again looking to regional alliances
and partnerships in order to achieve economies of scale in co-locating storage and the support
infrastructures needed for longer-term data management.
2.3.4 Compute
• High Performance Computing - Many of the universities responding to the consultation have a long
and successful history of using HPC systems and this is evident from the responses they have
submitted. At one time this capability was mainly facilitated through departmental “Commercial Off
the Shelf Technology” (COTS) type clusters and access to National facilities such as HPCx. The SRIF 3
funding round (2004) facilitated an explosion in the provision of HPC at the local level, with some of
the responding institutions making significant investments in HPC systems that were individually comparable
to the then operating UK National and European systems. Cambridge, Cardiff, Bristol, Southampton
and UCL are notable examples, procuring systems firmly between the outgoing 12TF HPCx system and
the incoming HECToR Phase 1 system at 60TF. These systems have been widely adopted within these
institutions, with hundreds of registered users routinely using them to conduct their research and
thus support the “international excellence” strand of many universities’ strategies. These systems are
now seen as being a key element in the research infrastructure of all responding universities and
perceived as an attractor to international academic talent. All universities responding have some
form of centralised HPC. This is either in addition to departmental hardware (as above) or a
consolidation of departmental cluster systems to counteract the expense of duplicating multiple
systems and support costs across their estates, thus increasing sustainability of provision (Manchester
is a notable example of this move). As such, the universities see a clear financial as well as scientific
case for their sustained investment in HPC. In addition to supporting institutional strategies, the
systems provide a much easier transition from local to the much higher capability investments at the
National (HECToR Phase 2 and above) and international level (PRACE, TeraGrid) thus increasing the
ambition and collaborative reach of these organisations.
• Shared/distributed, cloud and ‘novel’ resources – With respect to shared and distributed resources,
all responding universities have deployed CONDOR services within their institutions as a
relatively low cost way of harnessing spare processing capabilities of linked PCs spread across their
estates. Although this provides a cost effective way of creating a large and useful computational
resource that would otherwise go unused, the limitations are recognised. However, the resource
does provide a step toward meeting researchers’ needs before the transition to institutional HPC
systems. As with the Research Councils, Cloud computing has created a high degree of interest in the
respondents’ institutions and all are very much aware of the possibilities that could be opened
through the Cloud model. Although the possibilities and potential of Cloud are well known to
respondents, some are wary of large-scale implementation or replacement of current services.
Barriers to entry are seen as the ongoing service costs associated with provision by a third party,
the contractual arrangements necessary and, to some degree, the lack of control in what is actually
provided over time. This is broadly in line with the JISC cloud computing for research report. This
being noted, the Cloud model of provision is still seen as highly valid and a watching brief is being
maintained and individual researchers are being directed toward Cloud where workloads are seen as
compatible with that model. With respect to novel architectures, a number of institutions have
introduced small-scale evaluation/production systems based on GPGPU technology attracted by the
high performance vs. cost ratio. However, it is unclear from the responses what the utilisation of
these systems is like, what application areas are using them and whether these institutions intend to
scale up their commitment. From discussions with representatives of HPC-SIG, it seems unlikely there
will be mass uptake of such technology for some time due to the current complexity of porting legacy
applications to such systems because of the current lack of a mature programming environment. As
such, uptake will be restricted, as at the national level, to those with the technical programming staff
needed to overcome this barrier, or where there is a clear and specific priority to be addressed that
requires the investment.
Strategy for Provision – The Universities responding to the consultation have comprehensive strategies
for the provision of research computing and infrastructure within their own institutions. All institutions
have a clear appreciation of the direction of travel for e-infrastructure within their organisations and this
is informed by direct interaction with users and analysis of their current and future needs. Sustainability
is a key issue in these strategies and, although modest developments can be incorporated through
reinvestment of fEC income, additional funding sources to bolster this investment should be made
available. Aligned with this is the universities’ own realisation that their individual requirements have
grown to such a scale that significant further development and funding cannot be undertaken in the
context of a single organisation due to physical infrastructure restrictions (power, cooling and space). As
a result these organisations are now starting to pursue partnership agreements (UCL, Oxford,
Southampton) or are becoming more open to regional alliances for all aspects of e-infrastructure (Bristol,
Cardiff, Manchester). However, the responses tend to indicate that organisations would need a steer or
framework from an interested, though independent, third party such as BIS to set boundaries
and provide enabling investment for these activities to take place.
Examples of the responses received to the consultation from the University of Bristol and Cardiff
University are included at Annexe C.
2.4 Wellcome Trust – Sanger Institute
The response received from the Sanger Institute clearly articulates the role that e-Infrastructure has to
play in the continuing success of the Institute and the wider future applications of genomics in the delivery
of healthcare. The WTSI is driven by the production, analysis and storage of data from sequencing
technologies. As a result, it has been drawn out here to distinguish its requirements and drivers from
those of HEIs with wider applications drivers.
• Networks – Although no specific networking environment is stated in the report, it is assumed that
as for HEIs there is a growing requirement for the ability to move large data sets in good time. In
addition, the need for robust, scalable and secure authentication systems is stated as being a priority,
especially as data becomes increasingly used in the clinical realm.
• People, Skills and Software – In common with the research councils and university responses, these
issues have become a high priority for attention. With respect to people, the WTSI states that
recruitment of sufficient numbers of skilled staff is challenging and that this trend is likely to continue
unless further investment is put into training. On the issue of software, as with high performance
computing, the underlying hardware is advancing at such a pace that continual
development of software is required in order to keep up.
• Data – The current and growing data requirements for keeping pace with Next Generation
Sequencing (NGS) technologies used by the WTSI are considerable. The WTSI currently has twelve
Petabytes of storage with the associated EBI having similar amounts of raw storage space available to
it. To give some idea of the scale of this investment the HECToR National HPC service only has a
capacity of just over one Petabyte associated with the system.
• Compute – The Institute has found that centralised rather than distributed facilities have proved
to be most cost-effective in taking forward its research, and it currently has its own large server farm that
has undergone significant expansion in the last 12 months. These in-house capabilities are supplemented by
other virtualised systems and cloud computing resource on an as-needed
basis. In terms of the overall IT requirement, this is evaluated on a weekly basis by a standing
committee.
2.5 External Perspectives
• IBM, Microsoft and Hewlett Packard were asked to provide an input to the exercise from an external
perspective and as organisations centred on providing infrastructural IT solutions (such as cloud
offerings). The full responses from these organisations, given in Annexe E, raise interesting
observations on e-infrastructure and support many of the trends identified through the RC, HEI and
WTSI responses.
3.0 European and International Activity in e-infrastructures
3.1 At the 7th December meeting, the group requested that an overview of European and
international activity be included with this report to give a context to this review:
3.1.1 European Commission Activity
3.1.2 Funding for e-Infrastructure from the European Commission is delivered by the “Information and
Society” directorate. The major infrastructures all appear on the ESFRI² roadmap for large-scale
research infrastructures coordinated by the Research Directorate.
3.1.3 The e-Infrastructures activity, as a part of the Research Infrastructures programme, focuses on
ICT-based infrastructures and services that cut across a broad range of user disciplines. It aims at
empowering researchers with easy and controlled online access to facilities, resources and
collaboration tools, bringing to them the power of ICT for computing, connectivity, storage and
instrumentation. This allows for instant access to data and remote instruments, "in silico"
experimentation, as well as the setup of virtual research communities (i.e. research
collaborations formed across geographical, disciplinary and organisational boundaries).
3.1.4 e-Infrastructures foster the emergence of e-Science, i.e. new working methods based on the
shared use of ICT tools and resources across different disciplines and technology domains.
Furthermore, e-Infrastructures enable the circulation of knowledge in Europe online and
therefore constitute an essential building block for the European Research Area (ERA).
3.1.5 The Communication from the European Commission on ICT Infrastructures for e-Science (COM
(2009)) puts into context the relationship between modern science and ICT-based infrastructures and
presents a renewed strategy for achieving leadership in Science, developing world-class
e-Infrastructures and exploiting their innovation potential.
3.1.6 The Digital Agenda for Europe initiative is one of the seven flagship initiatives of the Europe
2020 Strategy for smart, sustainable and inclusive growth. It recommends sufficient financial
support to joint ICT research infrastructures and innovation clusters, further development of
e-Infrastructures and the establishment of an EU strategy for "cloud computing",
notably for government and science.
3.1.7 Under FP7, the e-Infrastructures activity is part of the Research Infrastructures programme,
funded under the FP7 'Capacities' Specific Programme. It focuses on the further development
and evolution of the high-capacity and high-performance communication network (GÉANT),
distributed computing infrastructures (grids and clouds), supercomputer infrastructures,
simulation software, scientific data infrastructures, e-Science services as well as on the
adoption of e-Infrastructures by user communities.
2
3.1.8 UK engagement in the major EU activities includes:
• The JANET network is the UK link into GEANT³; it is recognised that the benefits of the UK’s early
involvement in developing JANET were very great, and it is envisioned that having a
single means of connecting JANET to GEANT will lead to similar benefits in scientific collaboration
across Europe.
• The National Grid Service, with support from JISC, acts as the UK national grid initiative partner in
the European Grid Initiative⁴, now the European Grid Infrastructure (EGI). EGI is a new organisation, formally created in
Feb 2010 and based in the Netherlands, which was established with grant support from the EC and
aims to co-ordinate the activities of the national grid infrastructures. The EGI also incorporates
activities formerly supported by the Commission under the EGEE project, which, in the UK, was
important for enabling particle physics to develop the necessary ICT for transporting, storing and
analysing data in readiness for the LHC.
• The Partnership for Advanced Computing in Europe (PRACE)⁵ was formed in 2010 with grant support from
the EC. EPSRC, on behalf of the research councils, is one of the founding members of the Association.
PRACE aims to coordinate the European investment in leading large-scale supercomputing
systems and make access to these systems available to researchers across Europe. The EC sees
this route as providing opportunities for industrial competitiveness as well as academic
excellence within Europe. In the initial phase of PRACE the UK participates as a “General” rather
than “Hosting” Partner. This decision to enter into PRACE at a lower level than comparable
European countries (Germany, France) was mainly due to the financial climate at the time the
decision to enter PRACE was taken, and the significant risk posed to continued access to HECToR
for UK-based researchers at the full Hosting Partner level (circa £100M over the initial five-year phase).
• The EU has funded an 18-month project to develop a European road map for exascale software
development (called EESI)⁶ such that Europe can play a strong role in a developing international
initiative in this area. The activity is led by EDF in France with EPSRC as a member of the
consortium on behalf of researchers at the University of Edinburgh, STFC Daresbury and
Rutherford laboratories and Numerical Algorithms Group Ltd. Individual projects are also funded
in this area with UK universities as partners.
• The EU has provided preparatory phase funding to the European Life Science Infrastructure for
Biological Information (ELIXIR)⁷ project looking at developing a biological sciences data
infrastructure. BBSRC are playing a leading role in this infrastructure.
3