Edinburgh Compute & Data Facility - December 2014
ECDF Infrastructure Refresh - Requirements Consultation Document
Introduction
In order to sustain the University’s central research data and computing service, Information Services will be investing in a compute service refresh this financial year. This document has been produced for circulation throughout the University's research community to help inform the requirements consultation which will take place in January 2015. Detail on providing feedback is given at the end of this document.
Overview
The refresh will deliver:
A refresh of the (ECDF) Eddie compute cluster service.
A path for migration of existing parallel computational workloads, such as MPI, from the current service to the national HPC service, Archer.
A computational cloud service.
Infrastructure improvements to further support Data Science, such as an implementation of the Hadoop framework.
An extendable service, to allow for direct researcher capital investment.
This complements the substantial investment in data storage and management services that has already been made in 2013 and 2014.
Roadmap
Consultation: January 2015.
Procurement commences: February 2015.
Equipment delivered and installed: May 2015.
Equipment acceptance testing & validation: June/July 2015.
Eddie refresh available: August 2015.
Cloud service early adopters: from October 2015.
The funding allocated is for the University's 2014-2015 financial year, and thus the procurement aspect of the project must be completed before 31st July 2015.
Deliverables
Eddie Refresh
The current incarnation of Eddie, the University's research computing cluster, was purchased in 2010 and 2011. The refresh of the service will deliver:
Performance improvements in the underlying infrastructure, such as a more modern generation of processor architecture and a faster network backbone for more efficient migration and processing of large research data sets.
An expandable infrastructure core, allowing for easier integration of custom hardware and bespoke research solutions.
A range of memory-per-core options, to better accommodate the diverse memory requirements across the University.
High-performance disk storage tightly coupled with the compute cluster to store and render data on request.
Improved resource management and prioritisation of resources on the compute cluster through the Grid Engine job scheduler.
Details of the specific technical and configuration improvements the upgrade will provide can be found in Appendix A.
Parallel computational workloads
The intention is to cease provision of low-latency compute node interconnects with this refresh of Eddie. Tasks that require multi-node low-latency communication will be transitioned to the national HPC service, Archer, on which the University has guaranteed institutional capacity. This transition will be confirmed during the service consultation process.
Cloud services
An OpenStack cloud service will be included in the project, supported by the equipment supplier. This computational cloud will open up the infrastructure to more researchers and to a wider range of use cases.
In traditional compute cluster services, a researcher’s specific software environment needs to be integrated into the cluster environment, e.g. porting a new software package onto an existing operating system and compiler environment. With a computational cloud the researcher’s software environment can be encapsulated within a virtual machine in which they have complete control of the environment. This makes it simpler for a researcher to manage their own analysis environment and it allows for pre-packaged environments to be easily run on common infrastructure. For example, Biolinux is a popular Linux distribution which already has a number of common Bioinformatics and Biological data processing tools available. This refresh will deliver a service which can deploy such bespoke environments via a computational cloud, which will enhance usability and open up the service to more researchers.
We aim for this service to also allow for user-driven self-provisioning of compute environments, with a simple user interface to allow for rapid deployment.
Supporting Data Science
The Eddie service has always had a strong focus on data services, and this will be further developed with the move to a "Data Centric Computing" service model. This will integrate the Eddie storage and the University's DataStore service, allowing for easy access to data held on DataStore from the Eddie service, and vice versa. Eddie's local file systems will be tuned for performance, rather than capacity. They will effectively act as a cache of data held on the DataStore service, with the system automatically moving data to Eddie on demand, and copying all new data from Eddie back to the DataStore. This will offer many advantages, including the ability to securely share data between research groups and the ability to make data available on multiple different systems.
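The caching behaviour described above can be illustrated with a toy model. This is only a sketch of the general read-on-demand, copy-back pattern, not a description of how the Eddie and DataStore integration will actually be implemented. The class name CachedFileSystem, its methods, and the use of Python dictionaries to stand in for the two storage systems are all assumptions made for illustration.

```python
class CachedFileSystem:
    """Toy model: a fast local store acting as a cache of a larger backing store.

    Hypothetical illustration only. 'backing_store' stands in for DataStore
    and 'cache' stands in for Eddie's local file systems.
    """

    def __init__(self, backing_store):
        self.backing_store = backing_store
        self.cache = {}

    def read(self, path):
        # Data is moved into the cache on demand, the first time it is read.
        if path not in self.cache:
            self.cache[path] = self.backing_store[path]
        return self.cache[path]

    def write(self, path, data):
        # New data lands in the cache and is copied back to the backing store.
        self.cache[path] = data
        self.backing_store[path] = data

    def evict(self, path):
        # Dropping a cached copy is safe because the backing store holds it.
        self.cache.pop(path, None)


datastore = {"/group/input.dat": b"raw data"}
eddie = CachedFileSystem(datastore)

eddie.read("/group/input.dat")             # fetched from the backing store
eddie.write("/group/result.dat", b"out")   # copied back automatically
eddie.evict("/group/result.dat")           # frees cache space, data is retained
```

In this model the cache only needs to be large enough for the data currently being worked on, which is the property the service model relies on.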
The service refresh will also look to deploy the appropriate architectures for the emerging needs of the University’s Data Science community. There is existing use of the Hadoop framework for data analysis within the institution, and the most appropriate way to sustain this will be considered. The consultation process will also investigate the infrastructure requirements for newer technologies such as Apache Spark and for large scale structured data.
Extension capability
Core to the new research computing infrastructure will be the ability for research groups to add their own purchased equipment to the service. To facilitate this, vacant serviced rack space will be provisioned, pre-cabled with power and Ethernet connectivity. The cluster provisioning environment will be given a significant overhaul, moving towards a virtualised provisioning environment to maximise the flexibility in hardware compatibility and deployment modes. This will allow for rapid commissioning of new equipment, making each new node available in the cloud environment on the same day it is removed from its packaging.
A simple service costing model will be developed for this type of provisioning, to make it attractive for Principal Investigators who hold capital-only grant funding.
A flexible and capable provisioning system will be included in the service, allowing new pieces of equipment to be rapidly deployed and installed to a working state, without overly onerous processes, licenses or configuration being required.
Consultation and feedback
In January 2015 we will engage in a consultation with the University's research community to capture specific requirements for this refresh. We intend for this document to be widely circulated and hope that it will act as input for the consultation process.
We intend to arrange an open meeting within HSS, CMVM and each school in CSE within the first three weeks of January, and complete the consultation process by the end of January.
Please provide feedback and thoughts to either your nominated school/college contact (please see Appendix B), or directly to the ECDF through either Nicholas Moir or Kenton D'Mellow, via the service consultation mailing list:
Appendix A
Technical detail
Processors
The current Eddie nodes use Intel Westmere processors, with DDR3 memory. The Intel processors available on the market now include Ivy Bridge (with DDR3 memory) and Haswell (with DDR4 memory). For large memory systems, DDR3 currently offers significantly more capacity per pound than DDR4, and so we can expect that it may make economic sense to use the Ivy Bridge processors in such systems.
Ivy Bridge processors offer approximately 2.5x the performance of Westmere processors, and Haswell offers approximately 3x the performance of Westmere processors, so use of either technology will represent a significant boost in node performance.
Hyperthreading will be enabled on the majority of serial workload nodes, making more slots available per node and giving better utilisation of CPU resources.
Memory
Eddie currently provides 2GB of memory per processor core. Over the lifetime of Eddie, this amount has become insufficient for some use cases which emerged over that time (particularly in life sciences). However, many use cases still require substantially less than that. Furthermore, very large (1-2TB) memory is required for a small number of use cases. To accommodate the variety of use cases, the new Eddie cluster will be specified with a range of memory options, and the service configured to make best use of the available configurations. Specifically, a balance of approximately the following memory-per-core will be specified for the compute nodes:
4GB per core: 75%
12GB per core: 25%
2TB total system memory: 2 nodes
The service will be reshaped to allow smaller memory slots to be available, thus freeing up larger slots on systems with the lowest memory per core. For example, a 16-core node with 4GB of memory per core will have 64GB of RAM, and this could be divided into service slots as:
4x 1GB
6x 2GB
6x 8GB
The exact profile of memory allocation will be determined operationally in response to observed and predicted use.
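As a quick check of the arithmetic in the example above, the short sketch below confirms that the proposed slot layout uses exactly the 64GB available on a 16-core node with 4GB per core, and that it yields one slot per core. The helper name slot_layout_totals is hypothetical and the layout is only the illustrative one given in this appendix, not a committed configuration.

```python
def slot_layout_totals(layout):
    """Return (total memory in GB, number of slots) for (count, size_gb) pairs.

    Hypothetical helper for illustration only.
    """
    total_gb = sum(count * size_gb for count, size_gb in layout)
    slots = sum(count for count, _ in layout)
    return total_gb, slots


CORES = 16
GB_PER_CORE = 4
node_ram_gb = CORES * GB_PER_CORE  # 64GB on the example node

# The example division into service slots: 4x 1GB, 6x 2GB, 6x 8GB.
example_layout = [(4, 1), (6, 2), (6, 8)]
total_gb, slots = slot_layout_totals(example_layout)

print(f"Node RAM: {node_ram_gb}GB, layout uses {total_gb}GB across {slots} slots")
```

The same check can be applied to any alternative layout proposed during the consultation, by changing the (count, size) pairs.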
Specialist hardware
The refresh will include appropriate accelerator nodes. The existing two Phi nodes provide a many integrated core (MIC) architecture and will be maintained. The use of GPGPUs remains an important part of some niche use cases. The Eddie service has provided a small GPGPU cluster environment to allow for initial proof of concept work and small scale production work, and this will be extended with the inclusion of two GPGPU nodes (each with two latest generation NVIDIA cards) in the cluster. The existing GPGPU nodes will remain available for use while the hardware remains functional.
Networking
The primary use of Eddie has always been for non-MPI batch processing tasks. The requirement for running multi-node distributed memory jobs has diminished greatly with the increase in node CPU core count, along with many use cases targeting a "commodity" compute environment. Alongside this, the University now has access to the Archer HPC service, which is designed explicitly to service multi-node distributed memory workloads. Therefore, the Eddie service will no longer be provisioned with a network targeted specifically at MPI tasks. Instead, the general purpose network will be upgraded from 1GE to 10GE to provide improved large data handling, and also provide some opportunity for multi-node distributed memory jobs (albeit with potentially lower performance than a dedicated InfiniBand network would offer). Pending confirmation from the University's research community, tasks which require multi-node communication will be moved to the Archer service.
The core of the Eddie network will be upgraded to 40GE to accommodate the increased edge bandwidth of 10GE, and the GPFS storage servers will be connected directly to this 40GE core, offering improved storage bandwidth. The storage network will also be upgraded to balance this.
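To give a rough sense of what the move from 1GE to 10GE means for large data handling, the sketch below estimates the best-case time to move a data set to a single node at each link speed. The 1TB data set size is an assumed example, and the figures are theoretical line-rate values that ignore protocol overhead, storage throughput and contention, so real transfers will take longer. The function name transfer_seconds is hypothetical.

```python
def transfer_seconds(size_bytes, link_gbit_per_s, efficiency=1.0):
    """Best-case transfer time in seconds over a link of the given speed.

    'efficiency' is an assumed fraction of line rate actually achieved;
    the default of 1.0 gives the theoretical lower bound.
    """
    bits = size_bytes * 8
    return bits / (link_gbit_per_s * 1e9 * efficiency)


ONE_TB = 10**12  # assumed example data set size, in bytes

t_1ge = transfer_seconds(ONE_TB, 1)
t_10ge = transfer_seconds(ONE_TB, 10)

print(f"1TB over 1GE:  {t_1ge / 60:.0f} minutes at line rate")
print(f"1TB over 10GE: {t_10ge / 60:.0f} minutes at line rate")
```

Lowering the efficiency argument (for example to 0.7) gives a more conservative estimate, but the tenfold ratio between the two link speeds is unchanged.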
Storage
The disk infrastructure will be refreshed with cost effective, performance driven storage, providing sufficient I/O to drive the enhanced compute cluster. The "IOPS per node" ratio will be increased to account for the increased core density, and new file systems will be created with the latest performance enhancements implemented (in particular, a large file system block size will be used).
The current /exports/work filesystem data will be migrated over to the new file system by August 2015.
GPFS will continue to be the file system used on Eddie, having proven itself to be highly capable and functional over the life of the Eddie service, and proving to be market competitive.
By using GPFS to integrate the HPC storage with the DataStore service (see "Supporting Data Science" above), researchers will no longer need to store large volumes of unused data on the HPC file systems. HPC storage becomes more cost effective as a result, with adequate space only needed for an appropriately sized, user defined "working data set", rather than for all data for a given group. Data will also be easily available from other systems.
Extension Capability
Two vacant 42U racks will be provisioned, configured with 1/10GE networking and rack PDUs sufficient for connecting at least 24 hosted systems per rack. Space will be mapped out to allow for further expansion, both of further vacant rack space and of full-rack/multi-rack solutions. If required, per-socket metered PDUs will be installed to allow for exact measurement of power consumption by each hosted device.
Appendix B
Nominated School/College contacts
CSE
Biology- Alastair Kerr [email protected]
Physics- Bob Mann [email protected]
Geosciences- Mike Mineter [email protected]
Chemistry- Carole Morrison [email protected]
Mathematics- Steve Law [email protected]
Engineering- Antonis Giannopoulos [email protected]
Informatics- Jim Bednar [email protected]
HSS
Fraser Muir [email protected]