Command and Control of a Massively Parallel GALS Environment
Cameron Patterson
Supervisor: Steve Furber
SpiNNaker Team, APT Group,
University of Manchester, UK.
Management
• SpiNNaker
– ASIC for modelling Artificial Neural Networks
– Utilises Asynchronous Interconnects
• On-chip System and Communication NoCs
• Off-chip inter-connects to other SpiNNaker chips
– Up to 20 cores per chip
– Each core simulates up to 1,000 neurons
– System scales to >1M
Management
• Resource Contention
– In the brain there are billions of neurons and massive interconnectivity, but biological operation is slow
• Computing technology is fast, so we can multiplex
• Contention is mapped by anticipated statistics
• Machine is homogeneous
• However, behaviour of the NN is organic: we don't know where 'hot-spots' will form
Proposed Research (1 of 3)
• System Command and Control
– Many thousands of components in a large system:
– Chips, RAMs, Links, Ethernets
– They could go wrong: 'Fault Alerting'
– Other facets of system management:
• Capacity Management
• Accounting
• Performance Analysis
• Security
Proposed Research (2 of 3)
• Real-time Software Monitoring
– Neural Software running in biological time
– The system doesn't have to be like a 'batch system' / 'black box'
– Customers may be neuro-scientists or psychologists
– Which parts of the artificial brain are 'lit up'
– Save this activity for later timeline analysis
* Picture from: Powell K: Economy of the Mind. PLoS Biol 1/3/2003: e77. Rewarding the Brain – Ventral midbrain activity
Proposed Research (3 of 3)
• In-Flight Reconfiguration Management
– Dealing with the consequences of a h/w or s/w issue
– A core is overloaded
– A chip is failing
– A link is congested
– Take automated remedial action
– Remap neurons to different fascicle processors
– Re-route data around the fault in the machine
– Turn on QoS
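The mapping from observed problems to remedial actions could be sketched roughly as below. This is only an illustration: the thresholds, the `Status` fields and the action names are assumptions made for the example and are not part of the SpiNNaker design.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the slides give no numeric limits.
CORE_LOAD_LIMIT = 0.9   # fraction of core time in use
LINK_LOAD_LIMIT = 0.8   # fraction of link bandwidth in use


@dataclass
class Status:
    """Snapshot of one monitored region (illustrative fields only)."""
    core_load: float     # 0.0 to 1.0
    link_load: float     # 0.0 to 1.0
    chip_healthy: bool   # False once the chip is reported as failing


def remedial_actions(status: Status) -> List[str]:
    """Return the actions to take for this snapshot, most urgent first."""
    actions = []
    if not status.chip_healthy:
        # A failing chip needs traffic steered away and its neurons moved.
        actions.append("reroute_around_fault")
        actions.append("remap_neurons")
    elif status.core_load > CORE_LOAD_LIMIT:
        # An overloaded core sheds neurons to other fascicle processors.
        actions.append("remap_neurons")
    if status.link_load > LINK_LOAD_LIMIT:
        # A congested link is handled by prioritising traffic.
        actions.append("enable_qos")
    return actions
```

A healthy region returns an empty list, so the common case costs nothing beyond the comparison itself.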
Infrastructure
• To retrieve the information from the system:
– Two components:
– Protocol to get/set data
– Data stored
– Common solution used for network attached systems:
• SNMP *
• MIB *
• Research attempting implementation of SNMP & MIB on SpiNNaker
* J Case, M Fedor, M Schoffstall, and C Davin. RFC 1067 A Simple Network Management Protocol (SNMP), 1989
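The two components (a get/set protocol and a data store) can be pictured with a toy in-memory model. The sketch below is not an SNMP implementation and uses no SNMP library; `ToyMib`, the OIDs and the stored values are all hypothetical. It relies only on the fact that OIDs are ordered lexicographically, which Python tuples reproduce.

```python
from bisect import bisect_right


class ToyMib:
    """Toy data store keyed by OID, with get / set / get-next operations."""

    def __init__(self):
        self._values = {}  # OID tuple -> value

    def set(self, oid, value):
        self._values[tuple(oid)] = value

    def get(self, oid):
        # Returns None when the OID is not present.
        return self._values.get(tuple(oid))

    def get_next(self, oid):
        """Return (oid, value) for the first OID after `oid`, or None."""
        keys = sorted(self._values)  # tuple order == lexicographic OID order
        index = bisect_right(keys, tuple(oid))
        if index == len(keys):
            return None
        return keys[index], self._values[keys[index]]


# Hypothetical OIDs under a placeholder enterprise arc (not a real assignment).
CORE_LOAD = (1, 3, 6, 1, 4, 1, 99999, 1, 1)
LINK_LOAD = (1, 3, 6, 1, 4, 1, 99999, 1, 2)

mib = ToyMib()
mib.set(CORE_LOAD, 42)
mib.set(LINK_LOAD, 7)
```

Repeated `get_next` calls starting from a prefix walk the whole store in order, which is how a management station can discover values without knowing every OID in advance.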
Infrastructure
• SpiNNaker is a very large scale system
• It may therefore require many command and control machines to monitor the whole system
• Functions can be split into management domains, e.g.:
• Function: Capacity, Faults, Software Visualisation etc.
• Type: processor utilisation, memory use, link capacity
• Location: cluster of chips by
Issues to Overcome
• SpiNNaker resources are limited
– Small instruction memory
– Restricted I/O via Ethernet links
– Want to limit all but essential load to maximise neural computation
• The management system domain will be large
• A solution is required that is scalable and minimises use of system resources
– Hierarchical System of both NMSs and agents
• AgentX * permits a master/slave agent relationship
– delegates the collection and data store for the end systems
* M Daniele, B Wijnen, M Ellison, and D Francisco. RFC 2741 Agent Extensibility (AgentX) Protocol, 2000
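The master/slave agent idea can be illustrated with a small model in which sub-agents register the OID subtrees they own and the master forwards each request to the most specific owner. This is a conceptual sketch only: it does not implement the AgentX wire protocol, and the class names, OIDs and values are hypothetical.

```python
class SubAgent:
    """Holds the data for one part of the machine (illustrative only)."""

    def __init__(self, name, values):
        self.name = name
        self.values = dict(values)  # OID tuple -> value

    def get(self, oid):
        return self.values.get(oid)


class MasterAgent:
    """Forwards each request to the sub-agent that registered the subtree."""

    def __init__(self):
        self._regions = {}  # subtree prefix (tuple) -> SubAgent

    def register(self, subtree, agent):
        self._regions[tuple(subtree)] = agent

    def get(self, oid):
        oid = tuple(oid)
        # Every registered prefix that the requested OID falls under.
        matches = [p for p in self._regions if oid[:len(p)] == p]
        if not matches:
            return None
        # The longest prefix is the most specific registration.
        owner = self._regions[max(matches, key=len)]
        return owner.get(oid)


# Hypothetical subtrees: one per cluster of chips.
BASE = (1, 3, 6, 1, 4, 1, 99999)
cluster_a = SubAgent("cluster-a", {BASE + (1, 5): 10})
cluster_b = SubAgent("cluster-b", {BASE + (2, 5): 20})

master = MasterAgent()
master.register(BASE + (1,), cluster_a)
master.register(BASE + (2,), cluster_b)
```

The management station only ever talks to `master`, so adding another cluster means one more registration rather than one more endpoint to poll.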
Proposal
• Combine 'system' and 'neural' management data into a single standardised command/control SNMP framework
• Offload as much processing as possible from the SpiNNaker machine
• A protocol translator is therefore proposed, presenting a suitable face to each side:
– To the SpiNNaker side, it looks like a low cost native host
– To the NMS, it looks like a standard SNMP agent
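A rough sketch of the translator's two faces follows. The 8-byte report layout, the metric numbering and the OID scheme are assumptions invented for this example; the slides do not define the native protocol or the MIB, and a real translator would answer actual SNMP requests rather than a Python method call.

```python
import struct

# Hypothetical native report: chip x, chip y, core, metric id (1 byte each),
# then a 32-bit unsigned value, all in network byte order.
REPORT = struct.Struct("!BBBBI")

# Placeholder enterprise arc, not a real assignment.
BASE_OID = (1, 3, 6, 1, 4, 1, 99999)


class ProtocolTranslator:
    """Caches cheap native reports and answers OID lookups from the cache."""

    def __init__(self):
        self._table = {}  # OID tuple -> latest reported value

    def on_native_report(self, payload: bytes) -> None:
        """SpiNNaker-facing side: decode one fixed-size report and store it."""
        x, y, core, metric, value = REPORT.unpack(payload)
        self._table[BASE_OID + (metric, x, y, core)] = value

    def snmp_get(self, oid):
        """NMS-facing side: serve the cached value; SpiNNaker is not queried."""
        return self._table.get(tuple(oid))


translator = ProtocolTranslator()
translator.on_native_report(REPORT.pack(1, 2, 3, 4, 500))
```

Because lookups are served from the cache, repeated polling by the NMS adds no load to the SpiNNaker machine; the only cost on the chip side is emitting the fixed-size report.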
Visualisation
What might the management systems look like?
Work so Far
• Design of low processing cost IP compatible protocol for the SpiNNaker Ethernet links
– Permits routing - collaboration with partners off site & resiliency
• Test Software – Doughnut Hunter
– Neural Network Application
– Implemented on test systems
– Validation of IP protocol
• Diagnostic testing and specification
– Existing Test Chip
– Proposals for next iteration of chip design
Future Work
• Implementing the SNMP and Protocol Translator system for SpiNNaker
– Devising a MIB for hardware and software
– Software creation, optimisation, comparison with other solutions
– Topological Testing – placing the P.T. internally to the system
• Creation of Neural Visualisation
– Explore standard tools vs. bespoke
– Extend P.T. to provide standard neural imaging output format
• In-Flight System reconfiguration
– Use management output to command/control the system
– Look at automating re-routing around hot spots and failures without requiring the system to stop
Conclusions
• The Protocol Translator is a promising way to offload management functions while still providing a standard SNMP interface to an NMS
• Using the same framework to support both hardware and software visualisation also appears to be a valid approach
• The IP protocol developed has already been validated as a low-cost way to significantly improve network functionality
Questions?
Contact Details