

10 Conclusions and future prospects

The RTE inverter has been presented as the first device ever specifically designed to invert the radiative transfer equation aboard a space-borne instrument. The stringent time and power-consumption constraints of space instrumentation, such as the Polarimetric and Helioseismic Imager for Solar Orbiter, made the development a real challenge, which has finally been met. Two high-performance scientific computing architectures on FPGA have been proposed; one of them will be implemented in the real instrument.

The RTE inversion is an involved, iterative, non-linear least-squares minimization of a merit function. This merit function measures the distance (goodness of fit) between the observed and synthetic Stokes profiles of a given spectral line. An optimized RTE inversion code called C-MILOS has been presented; it is based on a previous version (MILOS) written by scientists in IDL. We have demonstrated that C-MILOS working in single-precision floating point is as reliable and robust as MILOS, which works in double precision. We have assessed the computational cost and how performance is affected by the working spectral line, the number of wavelength samples, and the convolutions related to the instrumental broadening profile.
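As an illustration of what such a merit function can look like, the following minimal Python sketch computes a weighted, normalized sum of squared differences between observed and synthetic Stokes profiles. The exact form used by MILOS and C-MILOS is not given in this chapter, so the chi-squared-like form, the function name `merit_function`, and the parameters `noise_sigma`, `weights`, and `n_free_params` are all assumptions made for illustration.

```python
def merit_function(observed, synthetic, noise_sigma=1.0,
                   weights=(1.0, 1.0, 1.0, 1.0), n_free_params=0):
    """Chi-squared-like distance between Stokes profiles (illustrative only).

    observed, synthetic: four sequences (Stokes I, Q, U, V), each holding
    one value per wavelength sample.
    noise_sigma, weights, n_free_params: hypothetical parameters, not taken
    from the thesis.
    """
    if len(observed) != len(synthetic):
        raise ValueError("observed and synthetic need the same number of profiles")
    total = 0.0
    n_points = 0
    for weight, obs_profile, syn_profile in zip(weights, observed, synthetic):
        if len(obs_profile) != len(syn_profile):
            raise ValueError("profiles must share the same wavelength sampling")
        for obs, syn in zip(obs_profile, syn_profile):
            # Weighted residual, scaled by the assumed noise level.
            total += (weight * (obs - syn) / noise_sigma) ** 2
            n_points += 1
    degrees_of_freedom = n_points - n_free_params
    if degrees_of_freedom <= 0:
        raise ValueError("not enough samples for the number of free parameters")
    return total / degrees_of_freedom
```

A perfect fit returns 0, and the value grows as the synthetic profiles move away from the observed ones. An iterative inversion would adjust the model parameters to drive this value down.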

We have designed two new processor architectures that adapt parallel computing paradigms to the RTE inversion problem using current high-performance computing techniques such as multi-core architectures, code optimization, and efficient domain-specific processors. We have taken well-established, state-of-the-art computing models such as MIMD and SIMD and applied novel ideas to them in order to enhance their computing capabilities.

Both multiprocessor architectures are designed to achieve high floating-point performance, the MIMD architecture on the Xilinx Virtex-5 FPGA and the SIMD architecture on the Virtex-4, while making the most of all the FPGA resources.

We have proposed an MIMD multiprocessor architecture as a firm candidate for FPGA-based embedded systems, mainly because of its ability to exploit both functional and data parallelism in algorithms. This architecture is original because of its pipelined execution, which is based on a novel programming method called intensive-pipelining software. Using this method, the architecture can increase system performance. With the proposed design, synchronization and communication between processors have been simplified. The implementation of this architecture using simplified processors, pProcessors, has been shown. These pProcessors are designed to eliminate latency and to exploit the computing power that the FPGA provides.

On the other hand, a SIMD multiprocessor architecture has been presented; it is the one finally in charge of carrying out the scientific analysis aboard the SO/PHI instrument, within the instrument's DPU. This new proposal was developed because the Virtex-5 FPGA was ultimately not accepted by ESA. Although the SIMD architecture is slower than the MIMD version, it saves resources by having a single control unit for all the processors. In addition, one of the main contributions of this work is the saving of resources achieved by allocating operation cores in a shared operation block. The pipelined processors of this architecture are tailored to reach a high instruction throughput, aiming to execute one instruction per clock cycle. An innovative memory address space has been introduced in order to feed the processor with its operands as fast as possible. The memory works as if it were a cache and is statically scheduled by the compiler.

The proposed architectures are closely focused on the RTE inversion problem, but we have pointed out that they can be used for other embarrassingly parallel problems, since the number of processors in the architectures can be configured and adapted. Thus, the architectures are presented as scalable and configurable.

We have presented a software tool, TAPAS, which follows given design guidelines and makes the proposed MIMD and SIMD architectures easier to use and program. It also makes debugging and testing easier because it provides simulations of both the architectures and their running code.

This tool uses advanced software pipelining techniques. Specifically, the compiler is decisive in this work, since it is responsible for reordering the instructions and organizing the memories in order to exploit the architectures to the maximum. The associated programming language makes it easier to program the architecture using a C-like style and isolates the code from the underlying system.

The RTE inversion algorithm needs to perform the singular value decomposition (SVD) of a correlation matrix within its iterative procedure. A specific pipelined SVD architecture, able to diagonalize two correlation matrices at the same time, has been developed. The final SVD architecture has been integrated within the SIMD architecture; it outperforms the best existing FPGA systems in both execution time and precision.
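To make the diagonalization step concrete, here is a minimal Python sketch of the cyclic Jacobi method for a symmetric matrix. Jacobi rotations are a common choice for hardware-oriented SVD, but this chapter does not state which algorithm the SVD core uses, so the method and the name `jacobi_diagonalize` are assumptions. For a symmetric positive semi-definite matrix such as a correlation matrix, the eigenvalues returned here coincide with the singular values.

```python
import math


def jacobi_diagonalize(matrix, sweeps=20, tol=1e-20):
    """Diagonalize a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvectors are stored as the
    columns of the second result. Illustrative sketch, not the thesis design.
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        # Stop once the off-diagonal part is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]; kept within [-pi/4, pi/4].
                d = a[q][q] - a[p][p]
                y = 2.0 * a[p][q]
                if d < 0.0:
                    d, y = -d, -y
                theta = 0.5 * math.atan2(y, d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v
```

For the matrix [[2, 1], [1, 2]] the routine returns the eigenvalues 1 and 3, up to rounding and in no guaranteed order. Each rotation touches only two rows and two columns, which is one reason this family of methods maps well onto pipelined hardware.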

The impact of SEU-induced errors on the proposed architecture has been discussed, and two different strategies for detecting and mitigating errors within the RTE inverter have been proposed. Furthermore, one of the strategies is able to detect and correct errors in most elements of the architecture.

A software protocol for communication between the RTE inverter and the DPU, based on register operations, has been detailed.

By using TAPAS and a Virtex-5 FPGA, a fully configured pProcessor MIMD architecture has been obtained, along with its internal communication network. The proposed system calculates the synthesis and spectral response functions almost 63 million times per minute, achieving the science goal. Computationally, the system reaches high single-precision floating-point performance: over 10 GFLOPS running at 200 MHz and using less than 50% of all available resources on the FPGA. This means that it is able to outperform a commercial desktop processor by a factor of 40.
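The figures quoted above can be related to each other with some back-of-envelope arithmetic, sketched here in Python. The sketch treats "almost 63 million" and "over 10 GFLOPS" as exact values, and it assumes that all floating-point throughput is spent on the synthesis and response-function evaluations, so the derived numbers are rough estimates and not results from the thesis.

```python
# Figures quoted in the text, taken as exact for this estimate.
EVALS_PER_MINUTE = 63e6   # synthesis + response function evaluations
PEAK_FLOPS = 10e9         # single-precision FLOPS
CLOCK_HZ = 200e6          # FPGA clock frequency
SPEEDUP_VS_DESKTOP = 40   # quoted factor over a desktop processor

# Evaluations completed every second: 1.05 million.
evals_per_second = EVALS_PER_MINUTE / 60

# Floating-point operations retired per clock cycle across the whole design: 50.
flop_per_cycle = PEAK_FLOPS / CLOCK_HZ

# Upper bound on operations per evaluation, assuming all throughput goes
# into these evaluations (an assumption, not a figure from the thesis).
flop_per_evaluation = PEAK_FLOPS / evals_per_second

# Desktop performance implied by the quoted factor of 40: 0.25 GFLOPS.
implied_desktop_gflops = PEAK_FLOPS / SPEEDUP_VS_DESKTOP / 1e9
```

Under these assumptions the design sustains about 50 floating-point operations per clock cycle, which is only possible with many arithmetic cores working in parallel.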

Using the SIMD architecture, the challenge of carrying out the RTE inversion in less than 15 minutes has been met. The architecture has not only demonstrated that it is able to do so, but it also improves on the computing capabilities of ground systems by more than ten times while using a relatively slow (and 10-year-old) Virtex-4 FPGA device.

This dissertation has demonstrated that FPGAs offer sufficient floating-point capabilities and allow the allocation of domain-specific processors to solve highly demanding scientific problems, even when embedded aboard a space-borne instrument.

The RTE inverter prototype has been tested using real images taken by another instrument. It works as accurately as conventional computers in terms of scientific precision. In addition, it has satisfied the stringent requirements on power consumption and processing time.

In summary, this thesis has provided the scientific community with high-performance computing architectures, compilers, configuration and simulation tools, and specific mathematical cores such as the SVD core which, assembled together, are able to solve the same computing problem as a cluster of PCs while using only FPGA devices.

Future work

This thesis opens up many research opportunities. Among them, let us enumerate the following:

1) Development of a fault-tolerant architecture.

2) Implementation of the proven solutions in other devices of the Virtex family.

3) Use of another type of bus (e.g. AMBA) that allows easy integration into other systems.

4) Extensive comparison with GPU solutions.

5) Extension of this type of architecture to other algorithms and benchmarks.

6) Convergence of the SIMD and MIMD architectures in TAPAS.

7) Comparison of power consumption between the MIMD architecture and TopGreen500.

In document High-Performance scientific computing on FPGA aboard the SOLAR ORBITER PHI instrument (Page 164-168)